From van@lbl-csam.arpa  Thu Feb 18 19:16:31 1988
Posted-Date: Thu, 18 Feb 88 19:16:00 PST
Received-Date: Thu, 18 Feb 88 19:16:31 PST
Received: from LBL-CSAM.ARPA by venera.isi.edu (5.54/5.51)
	id AA21372; Thu, 18 Feb 88 19:16:31 PST
Received: by lbl-csam.arpa (5.58/1.18)
	id AA04615; Thu, 18 Feb 88 19:16:02 PST
Message-Id: <8802190316.AA04615@lbl-csam.arpa>
To: braden@venera.isi.edu
Cc: jon@cs.ucl.ac.uk, end2end-interest@venera.isi.edu
Subject: Re: Randomising events 
In-Reply-To: Your message of Wed, 17 Feb 88 10:38:06 PST.
Date: Thu, 18 Feb 88 19:16:00 PST
From: Van Jacobson <van@lbl-csam.arpa>
Status: R

Bob -

I think that network queuing theorists may have given math a bad
name.  Chemists and physicists are, occasionally, practical
people with practical problems to solve.  They almost always
want to understand how to make some particular thing work or
stop working.  But, because of the scale of things they deal
with (e.g., 10^23 molecules in a gram of air), they found that
most understanding was rooted in a mathematical description of
the observed phenomena.  As the scale of our networks increases
from tens to hundreds or thousands of hosts, I suspect we will
be forced into that same mode of understanding.

Mathematical descriptions tend to wrap a lot of information into
a small package.  Also, because such descriptions derive from
formal proofs, they always tell you their domain of applicability.
Thus, given a formal description of a synchronization
phenomenon, one should be able to get insights on the
complementary desynchronization by inverting the question one
asks of the equation (e.g., find maxima instead of minima).  Or,
if the description is monotone, the equation has told you than
no system following the constraints of the derivation will work.
Since you know the constraints, you know what class of things to
change while searching for something that will do what you want.

In particular, Smoluchowski's result said that uniformity
vanished over time something like 1/(1 + c0 g t)^2.  Since t is
monotone increasing and c0 & g are positive, this has
immediately told you that *any* such system will eventually
synchronize.  It has even told you how long `eventually' is:
The above is not appreciably different from 1 as long as t <<
1/(c0 g).  Thus, if you know g and the packet density, you know
how long the system will stay unsynchronized.  If this time
turns out to be a year, for example, you have found there's
nothing to worry about (until your service gets popular and the
density goes up :-).  If this time is within a few orders of
magnitude of what's reasonable, you look at either limiting the
density (e.g., reducing the frequency of update messages) or
reducing the geometric factor (e.g., increase the network
bandwidth so there's less time displacement due to media
contention).

A sensible person might decide they don't have enough knowledge
or control of the hosts that will eventually run the distributed
algorithm.  That probably rules out the coefficient modification
procedures above and leads one down a harder road of
understanding the derivation.  In the derivation, the only real
randomness was the initial positions and velocities (for
networks, substitute `send time' for `position' and there is no
`velocity').  Since the interaction forces are deterministic,
they can only remove randomness from the system.  If the only
randomness is initial, then the system can only synchronize over
time.  G tells you the average magnitude of the interaction
forces and c0 tells you how many things can interact so the
product c0 g is a measure of the total force available to remove
randomness from the system (i.e., the rate at which things will
synchronize).

After looking at the Smoluchowski's derivation (although I don't
claim to understand it), I don't see any way to prevent
synchronization using a deterministic algorithm.  That suggests
looking for algorithms that always add some randomness (e.g.,
instead of doing routing updates at 2 minute intervals, do the
next update at now + R where R is a uniformly distributed random
number between 0 and 4 minutes).  And, once again Smoluchowski
tells you something useful: the minimum randomness needed to
counter the interaction forces (which must still exist), c0 g.
[Perhaps you can see why I thought that making the chemist's
method of analytically determining g work for a computer network
was a useful activity.  I think there might be a few numbers,
like g and the hydrodynamic Reynolds number, that would have a
lot of explantory power for the strange things we observe in a
complex network.]

The c0 g dependence unfortunately means that you have to tailor
the amount of injected randomness to match a particular network.
Some applications have enough time budget to live with
reasonable worst-case assumptions (e.g., one might find that
rwho random update with a 10 minute interval will be stable for
an ethernet with up to 100 hosts or a network of up to 13 hosts
connected by 1200 baud SLIP links).  But most applications
(e.g., multiple, independent NETBLTs) probably wouldn't want to
inject any more random delay than they absolutely had to.  Since
we have a model for how order evolves, it shouldn't be hard to
construct an estimator for the current entropy gradient, then an
adaptive control that injects more randomness if the entropy is
decreasing.

Or, if you want to be at the leading edge of what's known, you
might look at how biological systems generate randomness to get
themselves out of the same kind of synchronized ruts.  This
might take you off into the fascinating world of non-linear maps
and chaotic dynamics and can lead to things like the `epidemic'
algorithm that the Xerox PARC people just developed to counter
the global synchronization of Clearinghouse updates (by `global'
I really mean global -- the Worldwide coorporate network was
running lockstep trying to update the user database.  John
Larson has some total traffic vs. time graphs that are truly
amazing).  This kind of algorithm doesn't have a built-in, fixed
amount of randomness.  Instead, it uses a small random `seed' and
a non-linear, birth-death map to automagically increases the
global randomness as the `scale' of the network increases.

But I would want to be pretty heavily armored with theory before
starting these investigations:  As the weather people found out
long ago, and the PARC people learned, "unpredictable" and
"random" are different.  Most chaotic algorithms lead to low
frequency limit cycles that can do almost as much damage as the
original synchronization.  But ecologists tell us that our
complicated, highly interconnected planet uses chaotic maps to
evolve and survive (and these mechanisms worked well until
humans started mucking with the machinery) so there must good
stuff waiting for us off in that direction.  But I wouldn't hold
my breath waiting for a non-mathematical insight or explanation.
(Or maybe it was accident that the PARC team that developed
epidemic algorithms included an ex-physicist and a mathematician
in addition to some hardcore network hackers).

 - Van