From van@helios.ee.lbl.gov  Fri Sep 16 15:41:21 1988
Posted-Date: Fri, 16 Sep 88 15:42:10 PDT
Received-Date: Fri, 16 Sep 88 15:41:21 PDT
Received: from vs.ee.lbl.gov by venera.isi.edu (5.54/5.51)
	id AA10530; Fri, 16 Sep 88 15:41:21 PDT
Received: by helios.ee.lbl.gov (5.59/s2.2)
	id AA09291; Fri, 16 Sep 88 15:42:13 PDT
Message-Id: <8809162242.AA09291@helios.ee.lbl.gov>
To: end2end-interest@venera.isi.edu
Cc: boggs@decwrl.dec.com
Subject: some more ethernet/tcp throughput tests
Date: Fri, 16 Sep 88 15:42:10 PDT
From: Van Jacobson <van@helios.ee.lbl.gov>
Status: R

Dave Boggs has been doing more ethernet tests & was kind enough
to forward some of the data to me.  In one fascinating series of
tests with 16 independent sources, the delay stayed constant
with load while the throughput increased linearly right up to
100% utilization.  In the tests, a) each source generated
traffic at a fixed frequency and b) the (fixed) time-to-next-
packet was measured from the completion of the previous send.
(B) means that relative phase between sources is what I would
call a "network free parameter" -- a parameter adjusted by the
dynamics of the network.  After discussing the data, I think we
agreed that the combination of (a) & (b) meant collisions tend
to adjust the phase of different sources so that future sends
don't interfere.  That led to very efficient media utilization.
(Enthusiasts may immediately see (a) as rate-based flow control
and say "of course it gets great utilization".  Actually,
because of (b), the test may resemble window flow control more
than rate-based and "of course it gets great utilization" :-)

Dave embarked on a new series of tests with more randomness in
the source packet frequency which he claimed gave more
"reasonable" results.  I thought the fixed frequency results
were reasonable for a heavily loaded net because transport
protocols tend to impose fixed frequencies on the traffic.  I
did some tcp tests which tended to support this theory.

I thought this might be of interest to the group both because of
the protocol design issues raised and because it's an example
where self-organization helps rather than hurts.  So, with Dave's
permission, I'm forwarding a note on some tcp tests and an earlier
note theorizing about the self-organization mechanism.  Hopefully,
Dave will soon be publishing his test data (which is MUCH more
interesting than my blatherings about it.)

 - Van

------- Forwarded Messages

From:    Van Jacobson <van>
Date:    Thu, 15 Sep 88 09:04:12 PDT
To:      boggs@decwrl.dec.com
Subject: more ethernet throughput

Dave -

Bob Fink dropped off the latest graph.  Thanks!  But I think the
numbers you're getting now are conservative (i.e., too low) for
real life data transfer across an ethernet.  The reason is that
real transport protocols with congestion & flow control
introduce a lot of periodicity into the data flows.  With
multiple, simultaneous flows, this periodicity lets the traffic
self-organize to minimize interference (as it did in the round
of tests you did before the current series).

I tried to get some TCP data to support this:  This morning I
grabbed 4 of our workstations, 3 Sun-3/60's (20MHz 68020 w/LANCE
chip) and 1 Sun-3/280 (25MHz 68020 w/Intel 82586 chip), and ran
some tests with two simultaneous TCP transfers going between
pairs of workstations.  On a fairly quiet ethernet (a background
load averaging 20 pps & ~150 kbps due to the ND traffic from 14
diskless Sun's), I first ran some tests to get the single-
conversation throughput:

Single Conversation:

               data   ack
  Pair	 KB/s   p/s   p/s   mb/s
  ----   ----  ----   ---   ----
  A > B	  870   610   305   7.70
  C > D	  650	456   228   5.75

A, B & D are 3/60's, C is a 3/280 (although the 3/280 is much
faster than the 3/60, the brain-dead 82586 makes its ethernet
interface a real dog.  I would rather have done a test with 4
3/60s but we only have 3).  KB/s is the task-to-task data
throughput in Kbytes/sec. (K=1024).  "data p/s" is the number of data
packets per second on the wire.  Every two data packets result
in an ack packet the other direction ("ack p/s").  Every packet
has a 40 byte tcp/ip header, 14 byte ethernet header & 24 bytes
of CRC, IPG & sync.  Data packets have 1460 bytes of user data
so their total size on the wire is 1538 bytes.  Acks have no data
so their size is 78 bytes.  Multiplying the rates by the sizes
gives the total bit rate on the wire in megabits/sec ("mb/s", m =
10^6).

The above numbers are for a single, active TCP conversation (I
wanted single conversation throughputs to make sure that two
active conversations could demand more than 10 mb/s).  Each
number above is the average of 8 tests.  Each test consisted of
recording the total time taken to send 16MB between the two
machines.  (The times were ~20s and the clock resolution is 10ms
so the throughputs should be accurate to ~0.1%).

I then simultaneously fired off data transfers between A & B and
C & D.  Each pair transferred 32MB.  The test program was
modified to record the time after each 8MB was sent.  I computed
throughputs using the middle 16MB (so there would be no "end
effects" where only one of the two conversations was active).
The averages for 8 trials were:

Two Simultaneous Conversations:

               data   ack
  Pair	 KB/s   p/s   p/s   mb/s
  ----   ----  ----   ---   ----
  A > B	  767	538   269   6.79
  C > D	  318	223   112   2.81
  -----  ----   ---   ---   ----
  Total  1085   761   381   9.60

(If you add in the 150 kb/s background traffic, the last number
says that the average Ethernet utilization was 98% over every
40 sec. test!  I was impressed.)


Discussion
----------
With one active conversation (and a decent ethernet chip), I
consistently get 80-85% of the ethernet bandwidth.  This number
seems to be independent of processor speed (I've tried 15MHz,
16.7MHz and 20MHz 68020s and all that happens is the idle time
increases as the processor speed increases).  I believe that
because the ack traffic is synchronized with the data traffic,
the receiver frequently collides with the sender.  These
collisions generate holes (idle time) on the wire and, with only
one conversation, nothing can fill these holes.

With two or more active conversations, one conversation can fill
another's holes.  In fact, because of the interference effects
we talked about last time, the conversations will adjust their
relative phases so that one guy's sends tend to line up with the
other guy's holes.  Thus the collision rate stays about
constant, the wire idle time goes down and the total throughput
goes up.  My guess is that the total throughput will stay about
100% as more host pairs are added up to around 8-10 active
conversations where it will start to fall off to the theorists 86%.

 - Van

-------------------

Date: Tue, 06 Sep 88 00:18:32 PDT
To:   boggs@decwrl.dec.com (David Boggs)
From: Van Jacobson <van>
Subject: Re: transmission delays 

Dave -

I think most of your data makes sense (that's shorthand for "I
believe I can construct a theory that predicts what you observe").
My understanding of the tests is:

	On each of N hosts do some large, fixed number of times:
		transmit a packet of size S
		when transmit done,
			record delay
			wait additional time D

I suspect the critical difference between this & the SIGCOMM
tests is that in this test all the hosts run at a fixed
frequency.  That makes the system organize itself for best-case
performance at loads <100% and worst-case performance at loads
>100% with a sharp transition between the two regimes.

The self-organization works likes this:  Say you were doing the
test with two hosts, P & Q and running at 100%.  I.e., each host
generates a packet then delays one packet time.  On the first
packet from each host, there are two possibilities: it doesn't
collide with the other guy or it does.  If there's no collision
on the first packet, there will never be one since the two hosts
are running at the same frequency but 180 deg. out of phase.
So, say there's a collision and P defers.  Eventually P will
find a time slot for its packet.  The only free time slots are
Q's delay slots so the start of P's packet will be the end of
Q's and the end of P's packet will coincide with the start of
Q's next packet.  Since the delay starts at transmit done, P
will be in its delay loop while Q is sending and, thus, the two
are now running 180 deg. out of phase & we're back to the first
case.  I.e., the first collision corrects the phases so it's the
only collision.  Since we only measure average delay over many
iterations, we never see that one collision in the average & it
looks like the delay is constant and minimum.

It's probably obvious that the above mechanism works for any
fixed number of hosts and any frequency (i.e., any value of S
and D).  In fact it works even if S and D are different for
every host, just as long as they're fixed.  The easiest way for
me to see this is to picture a collisions as a "force" acting on
the packets.  If two or more packets overlap, this force pushes
them apart; if they don't overlap, there's no force.  As long as
there's nothing else that can change the relative phase of the
packets (e.g., no randomness), you have to end up with none of
the packets overlapped (if this is possible, e.g., as long as
the load is <100%).  So, seeing a constant, minimum delay at low
loads isn't much more surprising than dropping a handfull of
sand in a glass then noticing that, after everything stops moving,
no grain is inside any other and they're all touching.  (An even
closer analogy is to note that this mechanism is exactly the way
you make a laser: fix the frequency (via an external "pump")
then let mutual interference effects rearrange the phases so
everything's coherent.)

The only change I'd expect as you vary S & D would be in the
length of the turn-on transient -- the time & collisions it
takes the system to rearrange itself so nothing touches.  My
guess is this would be longest if all the S's & D's are
different and mutually relatively prime.  (I used to know how
to calculate this -- it's just a Gibbs ensemble & we want
to maximize Z, the sum-over-states.  But, I haven't done that
for years and my intuition is rapidly fading.)

The reason there's such a sharp increase in delay at 100% is
also because the interference makes things pack so well but it's
a lot harder to explain without pictures (but there was a
Scientific American article a couple of months back on "Squeezed
Light" that was a pretty good description of what's probably
going on).  In your SIGCOMM tests, the randomness (generated by
the random backoffs) made some holes in the data stream on the
wire.  Some collisions fill the holes (which lowers average
delay) and some generate collisions with later packets (which
raises average delay).  The balance between these positive and
negative effects is what gives you the smooth, linear delay
increase with load in the SIGCOMM data.  In this test, there are
no holes and each collision generates a cascade of later
collisions which is what makes the step at 100%.  But, as the
load keeps going up, the random backoff decision lets some early
colliding guys generate holes that later colliding guys can land
in.  This is why the delay increases much slower than linearly
with load (i.e., after the step at 100%, delay only increases
like the square root of the load).

Anyway, that's my theory.  Behavior on a real ethernet is
certainly going to be a mix of what you observed in the SIGCOMM
tests (i.e., with lots of source randomness) and what you
observed in this test (i.e., with no source randomness) but I
think the mix will skew toward this test.  That is, the random
sources tend to generate a small fraction of the load and things
that generate a lot of load, big data transfers, tend to run at
fixed frequency (i.e., they do the same processing on each
packet).  Certainly all my ethernet TCP throughput tests have
shown the same step up in delay (and thus down in throughput)
around 100% that you observe (the steps in my tests were below
100% in part because of the random traffic and in part because
of the asymmetry in sender & receiver overhead but, qualitatively,
the same thing is happening.)  So, I think this test shows that,
under a typical data transfer load, an ethernet behaves much
better than you think it might (i.e., as long as you ask for less
than 100%, it gives you everything you ask for).

 - Van

------- End of Forwarded Messages

