From mankin@gateway.mitre.org Wed Apr 26 11:26:25 1989
To: braden@venera.isi.edu
Subject: Shenker response
Date: Wed, 26 Apr 89 14:23:59 -0400
From: mankin@gateway.mitre.org
Status: R


------- Forwarded Message

Return-Path: Shenker.pa@Xerox.COM
Return-Path: <Shenker.pa@Xerox.COM>
Received: by gateway.mitre.org (5.54/SMI-2.2)
	id AA13223; Sun, 9 Apr 89 16:44:34 EDT
Received: from Cabernet.ms by ArpaGateway.ms ; 09 APR 89 13:35:50 PDT
Date: 9 Apr 89 13:35:43 PDT (Sunday)
From: Shenker.pa@Xerox.COM
Subject: Comments
In-Reply-To: <8904051851.AA19352@gateway.mitre.org>
To: mankin@gateway.mitre.org (Allison J. Mankin)
Cc: Shenker.pa@Xerox.COM
Message-Id: <890409-133550-2836@Xerox>

Allison, 

Here are some random comments.  I look forward to continuing this
discussion in July.  By then I should have finished my game theory paper
(which, by the way, argues that Fair Queueing is the only way to go!), and
also some simulations comparing Random Drop to Fair Queueing.


COMMENTS ON DRAFT

Interloper that I am, I apologize for entering the discussion at such a
late date (and only through email).  I hope that a more comprehensive
dialogue can take place at the July meeting (which I plan to attend).
Whatever truth is contained in the following is due to conversations with
my coauthors Alan Demers and Srinivasan Keshav; however, since they did not
get a chance to review these comments, they bear no responsibility for any
of my errors.


*General Comments:

I assume that this document's primary purpose is to propose adoption of the
Random Drop (RD) policy.  I am completely in favor of this, at least in its
simplest form of just dropping packets randomly whenever the gateway buffer
overflows.  This version of the RD policy (I will call it vanilla RD) has
very little risk and significant potential gain.  Vanilla RD drops no more
packets than the present drop-from-end-of-queue policy, yet it avoids some
of the pathological segregation effects.  Furthermore, this vanilla RD is
simple to implement and requires no tuning of parameters.  Thus, as a
simple, low risk, temporary fix, it seems clear that vanilla RD is
advisable.  While I am in agreement on this point, I do have some
reservations about the draft.  

First, I have my doubts about turning RD into an aggressive congestion
avoidance policy by dropping packets even for reasonably small queue sizes.
Simulations of the DECbit algorithm indicate that such an aggressive policy
can waste a significant amount of bandwidth, especially on high-latency
links.  Furthermore, this would greatly increase the vulnerability of
well-behaved sources to ill-behaved ones (ill-behaved sources such as NFS
servers may not always overflow the buffer, but would often cause the
gateway to exceed the optimal operating point criterion advocated in the
draft).  Even if one were to devise an RD policy that avoided these
pitfalls, the sensitivity and unpredictability of its behavior greatly
increases the risk of implementing RD.  Thus, I would not feel comfortable
recommending implementation until extensive simulation tests were made.
However, I would not object to immediate adoption of a slight modification
to the vanilla algorithm in which the drop rate was a function of the
average queue size, just as long as this function did not introduce
significant dropping rates until a sizable fraction (say 1/2) of the buffer
was occupied.

Second, some of the comments in the minutes refer to complicated schemes
for detecting noncooperative sources and singling them out for punishment.
My primary objection is that these noncompliance detection schemes have
many of the problems of Fair Queueing (complicated algorithm, aggregation
of users, etc.) with few of the advantages.  My other objection is that I
don't believe that one can reliably detect noncompliance.  But, even if you
could, it's the wrong thing to do.  The term cooperate has a emotional tone
to it (everyone should cooperate!), but noncooperation may merely reflect
heterogeneity of flow control algorithms.  We need to prepare the network
for operating in a heterogeneous environment, not only because of the
impending tcp-ip -> ISO transition but also because I believe future
networks should allow applications to employ flow control algorithms which
are tuned to their particular needs.  This can work if gateways insulate
users from each other, so that an ill-behaved source hurts nobody but
itself (a la Fair Queueing).  Having the gateway implementations dictate
which flow control algorithms are acceptable is a large step in the wrong
direction, and will hamper growth and and stifle innovation (slow-start
never would have been implemented if the gateways had dictated the
acceptable flow control algorithm). 

Third, I don't see any variant of RD being a long-term solution.  Any
gateway policy which retains the FIFO service discipline has severe
limitations.  My guess is that the working group doesn't see RD as a
long-term solution either, but the draft didn't make that clear to me.

As you can see, I am pushing for a conservative approach to RD, and would
urge that the draft lower its sights and make a more solid case for a more
limited proposal.  I would suggest adding to the draft more explicit
acknowledgement of the short-term and limited utility aspects of the fix.
For instance, it should be made clear exactly what problems you think RD
will fix.  Personally, I think RD will fix the pathologies of segregation
and other extreme nonuniformities in service; I do not think it will
provide fairness or congestion control (at least not in the power
optimizing sense).

Below are some more detailed comments.


*Evaluation Criteria:

A network should be viewed as providing a service to users, and it should
ultimately be evaluated in terms of user satisfaction.  This approach is
best expressed using game theory and utility functions.  I am currently
writing a paper describing this approach, but here I will just make a few
sketchy comments.  Clearly, users that are doing a large file transfer care
mostly about the throughput of their stream (aka transport connection), and
care very little about the individual packet delays.  Conversely, a Telnet
user cares almost solely about the packet delays, and not at all about the
bandwidth available.  Users of Voice and Video applications will have
different preferences, involving bounded delays and other constraints.  We
should aim toward building networks that satisfy all of these different
type-of-service requirements.  However, as long as we use a FIFO packet
service discipline, all users will see roughly the same average delay at
each gateway and there is no single operating point that will
simultaneously satisfy the various desires.  Thus, as we argue in our FQ
paper, one must abandon the FIFO service discipline to simultaneously
provide satisfactory service to users with different preferences.

However, it appears that changing the queueing discipline is not an option
in the near term.  Thus, if we must use FIFO, then the best we can do is to
(1) pick a reasonable compromise operating target, (2) design flow control
algorithms to arrive at this operating point if everyone cooperates, and
(3) attempt to deal with sources that don't cooperate.  I interpret
optimizing "power" as merely a heuristic to reach a compromise operating
point given that FIFO scheduling is used.  The draft discusses maximization
of "power" in the opening paragraph, as if it were some fundamental truth,
and that sets the wrong tone.

Another concept that is more easily treated with game theory is "Fairness".
For FIFO service disciplines, where delays are uniform across users,
fairness just means equal allocation of throughput (or packet drops, etc.).
The crucial point is to decide to which entities should we try to be fair.
One choice of fairness entity would be individual human users (one person,
one byte/sec!).  Since we don't individually sign each packet, the
information needed to provide equal throughput to each human user isn't
available.  We must then choose some substitute fairness entity which
hopefully will give approximate fairness to human users.  The draft
advocates using fairness over streams as the correct goal (the more
accurate statement is that fairness over streams is what the RD proposal
will deliver (though only in very limited circumstances, see below), and
the draft deems RD fair).  In our FQ paper we adopt fairness to
source-destination pairs as our goal.  Neither of these is perfect.
Source-destination pairs may aggregate many human users inappropriately.
A single human user can open up several streams, and single streams can
contain packets from many human users.  My problem with the draft is that
it presents fairness to streams as the true goal, rather than a way of
approximating the true goal of fairness over human users.


*Conjectures on the performance of RD:

In the following, Random Drop refers to a simple (not quite vanilla)
version of the algorithm (i.e., a drop rate is determined from congestion
criterion, and packets to be dropped are chosen randomly).  The following
statements are based on back-of-the-envelope analysis, and have not been
tested through simulation.  These statements apply only to an equilibrium
situation, where the streams last forever, all transients have decayed, and
the gateway is dropping packets at some approximately constant rate.  I
will ignore all microscopic dynamical effects (such as detailed
correlations between fluctuations in individual source behavior and the
gateway drop rate), so that all streams will have the same fraction of
their packets dropped.  The statement that all streams see the same drop
rate is crucial to understanding the behavior of RD (but may not be true in
practice, where transients are relevant).  This equal dropping rate
property gives a very different impression of the algorithm than the
statement on page 8: "Dropping the randomly selected packet causes the
proper user, one whose demand is higher than the fair share, to reduce
demand".

One can model a window adjusting flow control algorithm with a transition
graph.  In the face of a random drop rate, the flow control algorithm will
execute a random walk over this transition graph.  Given a transition
graph, the occupation probabilities in the various states, which completely
determine the average window size W, depend only on the average rate at
which packets are dropped.  Consider several streams converging on a single
bottleneck gateway.  A stream's average throughput is proportional to
WP/RTT, where P is the packet size and RTT is the round trip time.  Thus,
if all of the streams have implemented the same flow control algorithm, so
that they all have the same average window size W, their average throughput
should be roughly proportional to P/RTT.  RD allocates throughput fairly to
streams ONLY if the quantity P/RTT is the same for all streams.

Now consider a network with one stream traversing n congested gateways
(each of which is dropping packets at some constant rate r).  This stream
will perceive a cumulative drop rate of (1-(1-r)**n).  For lare n, the
stream will achieve a throughput that decreases exponentially with n.  This
is a serious scaling problem, and one that renders RD only a short-term
fix.  One might argue that typical paths have only a single congested
gateway.  However, as networks grow, multiple congested gateways will
become more common.  Furthermore, if RD is implemented as a congestion
avoidance device, with dropping initiated quite aggressively to prevent
queue build-up, then almost all gateways will be dropping packets (even if
carrying a single well-behaved stream!).


*Comments about Fair Queueing:

OK, I'll admit it, I'm a big fan of Fair Queueing and I think its the
answer to flow-control, routing, type-of-service, hunger, and world peace.
BUT, I am aware that (1) FQ has some drawbacks (see the last section of our
paper), (2) major changes to gateways would be needed, and (3) the effect
of FQ on common flow control algorithms is somewhat unpredictable.  Thus,
FQ is not a low-risk, easily implementable proposal and can be rejected (as
far as the draft's recommendation is concerned) solely on those grounds.

However, the comments in the draft are somewhat disconcerting.  FQ is most
certainly a congestion control mechanism.  While, by itself, it does not
prevent congestion (and neither does RD), it does control the effect of
congestion.  Quite simply, it prevents your congestion from hurting my
performance.  The fact that I no longer require your cooperation to achieve
good performance is in sharp contrast to the comment on top of page 8 in
draft.  I think this insulation property is a crucial requirement for
building stable and robust networks.  How (and even whether) I choose to
control my congestion is then my own concern (and the gateway should not
try to dictate what I do).

FQ prevents any conversation (source-destination pair) from utilizing more
than its fair share of buffers or throughput.  This does two things.
First, as I claimed above, it protects conversations from other ill-behaved
sources.  Second, it makes dropped packets and high delays reliable
congestion indicators (there is no need to worry that this is due to a
burst of packets from an ill-behaved source).  This gives the stream better
data on which to base flow control algorithms.

In addition, the ability to provide different delays to different
conversations is a tremendous advantage over the FIFO based congestion
control algorithms.  Our simulations indicate that the "optimal" operating
point of the network with FIFO gateways is typically much worse than the
normal operating point of a network with FQ gateways, because the
low-throughput Telnet sources automatically get very low delays no matter
what the FTP sources do.

The comment in the draft about FQ aggregating users is certainly true.
However, the corresponding comment about RD not being fair because (1)
human users could share the same stream, and (2) a single human user could
open many streams, is not made in the draft.  Now, there is a serious
research and public policy question as to which entity fairness should be
delivered; all I know is that neither FQ nor RD get it right.
Unfortunately, the draft gives the incorrect impression that RD delivers
perfect fairness whereas FQ does not deliver either fairness or congestion
control.

------- End of Forwarded Message


----- End Included Message -----