From van@lbl-csam.arpa  Mon Feb 22 02:50:38 1988
Posted-Date: Mon, 22 Feb 88 02:49:56 PST
Received-Date: Mon, 22 Feb 88 02:50:38 PST
Received: from LBL-CSAM.ARPA by venera.isi.edu (5.54/5.51)
	id AA22185; Mon, 22 Feb 88 02:50:38 PST
Received: by lbl-csam.arpa (5.58/1.18)
	id AA15794; Mon, 22 Feb 88 02:49:56 PST
Message-Id: <8802221049.AA15794@lbl-csam.arpa>
To: eman@melange.lcs.mit.edu (Eman Hashem)
Cc: end2end-interest@venera.isi.edu, ddc@lcs.mit.edu
Subject: Re: RTT Behavior 
In-Reply-To: Your message of Thu, 18 Feb 88 12:20:58 EST.
Date: Mon, 22 Feb 88 02:49:56 PST
From: Van Jacobson <van@lbl-csam.arpa>
Status: R

Eman -

The thing I couldn't understand in the picture Dave sent out
was the large gaps between the retransmitted bursts.  These
gaps must equal rto but they appear to be 10 to 20 times larger
than rtt (I'm getting an rtt estimate from the spacing of the
stairsteps when there are no retransmissions, like from the
region between 40,000 & 44,000 bytes).  It looks like the rtt
is around 100-150ms but the smallest rto I see is 500ms and
some, like the big interval just before 40s, are around 2sec.

I wanted to know if these were just due to clock granularity
(since the 4bsd retransmit clock runs at 500ms) or if they
were due to simulating a known bug in the 4.2 timer code.  If
the big gaps are due to timer granularity, I might suggest
that you run the simulations such that the round trip time
(the time to exchange a full window of data) is at least 10
times a clock tick rather than 1/5 a clock tick.  Otherwise
your measured throughput is a mixture of behavior due to clock
quantization and behavior due to protocol performance, different
effects that are confusing when you see them combined.
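
A minimal sketch of the quantization effect described above, in
Python.  It assumes a simplified timer that can only fire on tick
boundaries and always waits at least one tick; this is an
illustrative model, not the actual 4bsd timer code.  The function
name and the rto = 2 * rtt rule are assumptions made for the example.

```python
def effective_timeout_ms(rto_ms, tick_ms=500):
    """Round a requested timeout up to the next whole clock tick.

    Simplified model (an assumption, not the real 4bsd timer code):
    the timer fires only on tick boundaries and waits at least one tick.
    """
    ticks = max(1, -(-rto_ms // tick_ms))  # ceiling division on integers
    return ticks * tick_ms


# Two round trip times in ms: 1/5 of a tick, and 10 ticks.
for rtt_ms in (100, 5000):
    rto_ms = 2 * rtt_ms  # rto = 2 * rtt, assumed here for illustration
    fired_ms = effective_timeout_ms(rto_ms)
    print(rtt_ms, rto_ms, fired_ms, fired_ms / rto_ms)
```

With a 100ms rtt the requested 200ms timeout stretches to 500ms, 2.5
times what the protocol asked for.  With an rtt of 10 ticks the
rounding error is at most one tick, 5% or less of the rto, so any
large gap in a trace can be attributed to the protocol rather than
to the clock.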

I'm not sure from your note whether you actually do simulate the
4.2 algorithm.  4.2 effectively timed only one packet in each
window exchanged (this wasn't by design but that's how the
algorithm performs).  4.2 also doesn't stop the timer on a
retransmit.  This leads to the following bug:  Say the actual
rtt is R and the rtt estimate is correct (srtt = R), we send
packets 1 & 2, are timing 2 and 1 gets lost.  After 2R, 1 is
retransmitted.  Since 2 was cached, when 1 arrives 2 is acked.
On the ack we notice that 2 was timed & compute srtt with an rtt
of 3R (2R got added by the timeout for packet 1).  So rto now
goes from 2R to 6R (I'll ignore the srtt filter to keep the math
simple -- The filter has no effect on this behavior).  Say we're
in one of the repetitive failure modes that shows up so clearly
in Dave's graph.  So we send 3 & 4, timing 4, and 3 gets lost.
Same behavior as before but now the measured rtt for 4 will be
7R and the next rto will be 14R.
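
The arithmetic above can be sketched in a few lines of Python.  This
is only a model of the scenario as described, with the same
simplifications: rto = 2 * srtt, the srtt filter is ignored, and each
round loses the untimed packet.  The function name is hypothetical
and times are in units of R.

```python
def rto_sequence(failures, r=1.0):
    """Return the successive rto values, starting from srtt = r.

    Each failure models one round: the untimed packet is lost, the
    timer on the timed packet keeps running through the timeout, and
    the resulting measurement replaces srtt (filter ignored).
    """
    srtt = r
    rto = 2 * srtt
    sequence = [rto]
    for _ in range(failures):
        measured_rtt = rto + r  # timeout wait plus one real round trip
        srtt = measured_rtt     # no smoothing, to keep the math simple
        rto = 2 * srtt
        sequence.append(rto)
    return sequence


print(rto_sequence(2))  # [2.0, 6.0, 14.0]
```

Each step follows rto_next = 2 * (rto + R), which reproduces the 2R,
6R, 14R progression in the text.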

It should be obvious that rto will increase exponentially to
whatever its maximum value is (30 sec under 4.2bsd) then stay
there.  Although the above scenario may seem unlikely, in
fact it *always* happened when 4.2 got into a repetitive
failure mode (the reason why is complicated to explain).
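
To get a feel for how quickly the maximum is reached, here is a rough
Python sketch that counts consecutive failures until rto hits the 30
sec ceiling.  It assumes the simplified update rto_next = 2 * (rto + R)
from the scenario above, with no srtt filter and no clock granularity;
the function name is hypothetical.

```python
def failures_until_max(r_sec, max_rto_sec=30.0):
    """Count consecutive failures before rto reaches max_rto_sec.

    Starts from a correct estimate (rto = 2 * R) and applies the
    simplified update rto = 2 * (rto + R) once per failure.
    """
    rto = 2 * r_sec
    failures = 0
    while rto < max_rto_sec:
        rto = 2 * (rto + r_sec)
        failures += 1
    return failures


print(failures_until_max(0.10))  # 7
print(failures_until_max(0.15))  # 6
```

Under these assumptions, an rtt in the 100-150ms range reaches the
30 sec maximum after only six or seven repetitions.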

This kind of failure is interesting since it shows an algorithm
that fails catastrophically under a particular combination of
events; a combination that no one would have expected to occur
but which turned out to be quite likely.  (This bug is also
interesting to me because it's a graphic demonstration that some
people who have said "timing retransmitted packets won't make
much difference" are badly confused.)

So, the reason for the `at least 10 ticks to 1 rtt' statement
above was so that time gaps in the trace will always be a
clue that "interesting" behavior is occurring and won't be
overlooked.

 - Van


