From braden  Wed May 24 10:32:45 1989
Received-Date: Wed, 24 May 89 10:32:45 -0700
Received: from braden.isi.edu by venera.isi.edu (5.61/5.51)
	id <AA08627>; Wed, 24 May 89 10:32:45 -0700
Date: Wed, 24 May 89 10:32:19 PDT
From: braden
Posted-Date: Wed, 24 May 89 10:32:19 PDT
Message-Id: <8905241732.AA04601@braden.isi.edu>
Received: by braden.isi.edu (5.54/5.51)
	id AA04601; Wed, 24 May 89 10:32:19 PDT
To: end2end-interest
Subject: Minutes of end2end video meeting
Status: R


        End-To-End Task Force Video Meeting
                  March 20, 1989

    (Via MCI Video Link, thanks to Vint Cerf)

Attendees:

    Braden
    Aguilar
    Cheriton
    Cooper
    Deering
    Jacobson
    Nowicki
    Partridge
    Bob Sidebotham of CMU (guest)
    
ACTION ITEM SUMMARY:

    ACTION ITEM: Van to distribute notes about the Patricia routing
	algorithm (DONE).

    ACTION ITEM: Van to incorporate distance-vector multicast router into
    4.4BSD kernel.

    ACTION ITEM: Deering to visit Milo Medin (of the OSPFIGP camp)
	and make sure multicast routing can be incorporated easily.

    ACTION ITEM: Cheriton to distribute Multicast MazeWar.
    
    ACTION ITEM: Cooper to see if he can distribute Satya's paper to us.

    ACTION ITEM: Van to give us citation that proves gateways need to
    give feedback to get us to the knee.  He also mentions an article
    by Batcher in Bell System Technical Journal in '76 or '78.
	

Multicasting (Deering):

    A new release of IP multicast software (host and RIP-based router
    code) is done, and has been tested in SunOS 4.0 and 4.3BSD.

    This raised the question of when Berkeley will release multicast
    code.  Van answered: multicasting is/will be in 4.4BSD.  There is
    some chance that 4.4 will be delayed, but if it gets delayed too
    much Berkeley will do an interim release of the new networking code
    (with OSI stuff + Van's latest), and multicasting will be in that
    interim release.  He hopes for release by the end of summer '89.

    Questions were then raised about whether the multicast router code
    should go into the BSD release.  Brief discussion of the merits of
    SPF vs. distance-vector routing protocols and whether the
    distance-vector implementation should be distributed (with no firm
    resolution but see below).  Van volunteered that he would
    incorporate the multicast routing code and host code into 4.4BSD
    (including the router).  He also noted that the new BSD code will
    include the Patricia routing algorithm.

    ACTION ITEM: Van to distribute notes about the Patricia routing
	algorithm (DONE).

    ACTION ITEM: Van to incorporate distance-vector multicast router into
    4.4BSD kernel.
    
    Braden: Question of whether OSPFIGP will beat out RIP and thus make
    the distance vector router worthless.  Partridge: DVMRP is not
    RIP.  Braden: So we could run SPF multicast routing algorithms in
    parallel with RIP? (Yes).  Jacobson: Problem with OSPFIGP is that
    its address fields are not large enough to include source address;
    need both destination and source address for rooted multicast
    trees.  However, the machinery is close to being there, and it
    should obviate the need for additional, multicast-specific routing
    protocols.  Noted that OSPFIGP introduces complexities due to its
    hierarchical routing.

    ACTION ITEM: Deering to visit Milo Medin (of the OSPFIGP camp)
	and make sure multicast routing can be incorporated easily.

    Braden: What about applications that use multicasting?  Cheriton:
    Multicast MazeWar is nearly done.  Jacobson: A multicast Xtrek is
    needed -- he suspects it would reduce traffic on NSFNET by about
    1/3 (!).  Braden:  Doesn't Xtrek need a multicast TCP?  Jacobson: Not
    necessarily, although Van still has the notes in his TCP on how to
    support multicast TCP.  Cooper: Satya has a replicated distributed
    file system (CODA) based on multicast, and a paper on it is due out
    shortly.  He noted that Satya's grad students are the ones at CMU
    who need to be persuaded to upgrade to the latest multicast spec. 

    ACTION ITEM: Cooper to see if he can distribute Satya's paper to us.

    ACTION ITEM: Cheriton to distribute Multicast MazeWar.
    
    Sidebotham: Considering putting multicasting in Rx.

    NTP also considering multicasting, and NNTP is working on a multicast
    protocol.

    Braden: Worried that Internet is headed toward making key decisions
    on the future of routing and that if we don't move soon, multicasting
    might be precluded.


VMTP (Cheriton):

    A release was made at the end of November '88.  Lots of people have
    picked up the code but there's been little feedback.  Dave talked
    to ANSI about VMTP; Greg Chesson was also at the ANSI meeting and
    Dave noted that XTP seemed to be getting a moderately hostile
    reception.

    It was noted that VMTP's current application interface in Unix is
    hard to use, since there are no light-weight processes in Unix and
    an asynchronous socket interface has not been provided.

    There is a version of Mazewars running on VMTP that will be
    available soon.  Van asked whether Dave has looked at NFS over
    VMTP.  Dave has a student interested in modifying the SUN rpcgen
    library to use VMTP.  They are writing a file server daemon that
    could be expanded to do multicast file transfer.  Dave also
    observed that his group doesn't use UNIX or NFS much, which slows
    work on VMTP-related UNIX implementation and support.

    Dave mentioned that an Ada implementation of VMTP now exists and
    that there's some work being done w/Berkeley (?) to develop a
    transaction system call.

    Dave also mentioned he's looking into fast encryption algorithms
    for VMTP and that Merkle at Xerox PARC has a fast DES-like
    algorithm that runs at 4 Mbits/sec in software on a SUN.

Rx:

    Cheriton: Is it better to pass around state information explicitly,
    as Rx does, or to infer it, as TCP does?  Wonders if all the state
    information being passed around is useful.

    Sidebotham: Much of the state information is "kitchen sink"
    variety.  He included it in the expectation that it would be grist
    for experimentation.

    Braden: is the Rx rate control scheme, where receiver says
    "interpacket delay suggests congestion", actually useful?

    Van: interpacket delay is not useful in TCP.  Clocks are not
    fine-grain enough, and the Internet clumps packets anyway.  Warns
    that you really want the receiver to be very deterministic, or else
    the delayed feedback can result in instability.  Van suggested
    reading chapter 3 of any control theory textbook as a place to
    pursue this question further.  We need faster clocks.

    Much further discussion of control theory and its implications on
    the amount of control information to be passed within a protocol.
    Advice for Rx is inconclusive.

    Sidebotham: Suggests that Rx transmission scheme is more stable
    than TCP because it has no timeouts.

    Van: Actually, it is less "stable" (where we are talking about
    network stability) because it resembles Fast Retransmit without
    clamping (slow-start), and simulations have shown this can lead to
    instability.  Van cites his well-known unpublished paper (!!) as
    authority.

    Sidebotham: Suggests that the skew mechanism in Rx minimizes the
    retransmissions.

    Van: believes skew hard to do reliably and if you get it wrong
    you will spuriously retransmit.  

    Van then launched into a discussion of the merits of selective
    retransmission; it is useful over some paths but may be a bad idea
    over others.

    The problem is that we have only one bit of information: the
    datagram was lost.  In the absence of more information, we must
    decide whether to treat loss as damage, and send more data ASAP --
    or treat the loss as congestive loss, and send less data ASAP.  Van
    says the trend is toward more reliable subnets, so it is better to
    assume loss implies congestion.  Selective retransmission is thus
    probably a bad idea.

    Further provoked, Van goes on to argue that selective
    retransmission is only a win if the delay * bandwidth product is
    much much greater than the packet size (or to handle "grotesque
    implementations").  His argument ran as follows.  When you incur
    loss, you (must) go into some form of slow start.  So knowing more
    than the first "hole" (where the first segment is missing) doesn't
    help because you can't send more than the first segment until you
    get an ACK for the retransmission.  Partridge: but when you send
    two segments after the first ack, how do you know where to put the
    second segment of the two (i.e. isn't it possible the holes are
    discontinuous?).  Van:  usually the gaps are continuous, so not an
    issue.
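
    To make the "delay * bandwidth product much greater than the
    packet size" condition concrete, here is a minimal sketch that
    expresses the product in packets.  The function name and all of
    the link speeds, RTTs and packet sizes are illustrative
    assumptions, not figures from the meeting.

```python
def packets_in_flight(bandwidth_bps: float, rtt_s: float, packet_bytes: int) -> float:
    """Delay * bandwidth product expressed as a number of packets."""
    return bandwidth_bps * rtt_s / (8 * packet_bytes)


# Illustrative paths only (assumed values, not measurements).
EXAMPLE_PATHS = [
    ("56 kbit/s, 100 ms RTT", 56e3, 0.1),
    ("1.5 Mbit/s, 100 ms RTT", 1.5e6, 0.1),
    ("1 Gbit/s, 60 ms RTT", 1e9, 0.06),
]

for name, bandwidth_bps, rtt_s in EXAMPLE_PATHS:
    count = packets_in_flight(bandwidth_bps, rtt_s, packet_bytes=512)
    print(f"{name}: about {count:.1f} packets of 512 bytes in flight")
```

    On the slowest path only a packet or two fits in the pipe, so
    there is little for selective retransmission to select among; on
    the fastest there are thousands.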

    Braden: What's the idea behind CMU's National File System?

    Sidebotham: Idea is to promote Andrew, and it is doing pretty
    well.  4 sites currently, 10 to 15 coming on-line shortly. 
    Apparently a very popular service.

    Braden: Is the source for Rx available?

    Sidebotham: Yes but I'm not happy with it.  Could we wait for
    a while before delivery?

    Cheriton: Is the channel construct really useful?  How much locality
    of reference is there in the communication state?

    Sidebotham: Probably not.  Amortizes cost of calls in parallel,
    but they aren't common.

    Sidebotham: Is VMTP Entity ID concept sufficient to support
    migration of objects, or does Entity ID only work for processes?

    Cheriton: Was trying to solve process migration problem.  Entity
    IDs are not a complete solution.

    Sidebotham: Why does the transport protocol have to support ID
    notion?  Why not put it over the transport layer (in a session
    layer?) and let the transport protocol use a simpler mechanism?

    Cheriton:  Argues that ID mechanism ensures that transport layer
    is independent of network layer.

    Van: Note that using an RPC-like paradigm across a long delay
    requires a very big unit, and this kills the network.  Packet
    sizes are going up much slower than network speeds.
    
Congestion Control (Jacobson):

    Braden asks Van to report on his congestion control work.  Van says
    he's been taking some time off to spend with his girlfriend
    (Cheriton is unimpressed).

    Lattice gas model:  In the 1970s there was an attempt to model
    gas turbulence using a simple model of balls placed on a
    lattice.  The idea is to perturb the balls so they move along the
    paths in the lattice and to have rules about what happens to balls
    arriving at interconnection points in the lattice.  Using such a
    model, with simple rules at interconnection points, you get
    turbulence.

    Van's thought was that since networks show signs of turbulent
    behavior and since networks are physical instantiations of lattices,
    we might be able to use a lattice gas to model a (sufficiently
    large) network.  He hoped to see analogs to the state transitions
    observed in the physical world.

    He lacks the funding to pursue this idea very far, but given some
    ARPANET data from one of the congestion collapses, he did a
    simulation to see what happened.  He got results reminiscent of the
    ARPANET, but the lattice gas transition required a much higher
    traffic level (x20) than was going through the ARPANET when it
    collapsed, and the gas required on the order of 100 nodes to
    change state.  He thinks early collapse of the ARPANET reflects the
    fact that the ARPANET is connected to higher speed Ethernets, so
    that part of the turbulence is caused by the high speed flows
    crashing into the low speed ARPANET.  But he doesn't intend to
    pursue this idea because modeling it requires tensors.

    Van also noted that one can think of the length between points in
    the lattice as corresponding to viscosity.  This led Cheriton to
    wonder if reducing the viscosity in the network would be a good
    thing.  Van's view was that while reducing viscosity might be 
    desirable, the cost would be exorbitant.  If we tried to lengthen
    network paths, we'd be moving closer to direct connectivity (with
    O(N**2) connect cost).  A hierarchical topology develops "hot
    spots".  To damp oscillations ("turbulence"), you have to have
    more "degrees of freedom" in the network, e.g., more cross-country
    links.
    
    Braden: how about Van's more immediate ideas on congestion
    control for the Internet?
    
    Gateways measure flow:  Currently hosts can figure out the
    pipe size in the network by injecting increasingly large flows
    and measuring the throughput they get.  Gateways currently
    can't do this, but gateways also need to be able to measure their
    load.  Van says the way to do this is to average the number of
    datagrams going through the gateway over a time interval that is
    roughly the average RTT.
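
    A minimal sketch of such a load measurement follows, assuming the
    gateway already has some estimate of the average RTT to use as
    its window.  The class name LoadMeter and its methods are
    hypothetical, not taken from any gateway code discussed here.

```python
from collections import deque


class LoadMeter:
    """Counts datagrams seen during the most recent `window` seconds."""

    def __init__(self, window: float):
        self.window = window      # assumed to be roughly one average RTT
        self.arrivals = deque()   # arrival timestamps, oldest first

    def record(self, now: float) -> None:
        """Note that one datagram passed through the gateway at time `now`."""
        self.arrivals.append(now)
        self._expire(now)

    def rate(self, now: float) -> float:
        """Average datagrams per second over the last window."""
        self._expire(now)
        return len(self.arrivals) / self.window

    def _expire(self, now: float) -> None:
        # Forget arrivals that have aged out of the window.
        while self.arrivals and self.arrivals[0] <= now - self.window:
            self.arrivals.popleft()


meter = LoadMeter(window=2.0)
for t in (0.0, 0.5, 1.0, 1.5):
    meter.record(t)
print(meter.rate(1.5))   # all four arrivals are still inside the window
print(meter.rate(3.0))   # only the arrival at t=1.5 remains
```

    A sliding window is only one way to average; an exponentially
    weighted average with a time constant near the RTT would serve
    the same purpose with less state.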
    
    He mentions "token-ring" behavior as a reason to want average RTT.
    In the token ring model, we see e.g. a 1 second slot circulating
    every 30 seconds; this 1:30 ratio is a function of the topology.  A
    great range of simulations show that a network tends to organize
    itself, oscillating between maximum and minimum queues; this
    results in clumps of packets.  Therefore, you cannot just measure
    queue lengths to do congestion control.  You must distinguish the
    reason for the queue: upstream (hosts overloading the network)
    vs. downstream (blockage by other "clumps").  You don't want
    to react to downstream blocking by reducing offered traffic,
    because this would simply ensure that when the "token" next
    appears, you will get less bandwidth than you should.

    Partridge: What's wrong with using the Jain, Ramakrishnan and Chiu
    scheme?

    Van: It scales poorly.  Essentially measuring return times of
    random walks from zero-length queue to zero-length queue.  It
    scales proportionately to the cube of the load on the system,
    so as load goes up, noise increases strongly.  RJC scheme only
    good to a load of 75%.

    Cooper: Could a gateway predict the RTT by pinging selected pairs
    of source-destinations observed by the gateway and averaging the
    results?

    Van: routetrace has shown that on long paths, the forward and
    return paths are almost always asymmetric.

    Van then moved on to suggest a new scheme.  Recall his IAB workshop
    presentation which showed a simple loop at steady state into which
    one additional datagram was inserted.  The queue length at the
    first hop gateway jumped from 0 to 1 while the extra datagram was
    in the system, and then dropped to 0 after the datagram left, and
    the ack pattern resync'ed.  In other words, the queue length went
    to 1 for exactly one RTT and then back to 0.

    Now consider n steady-state connections, each with the same RTT,
    and some random perturbation.  Sum the queues at each instant and
    do a Fourier transform on the graph; the fundamental RTT would
    dominate.  With multiple RTTs, you get a series of peaks from which
    you can extract the average RTT.
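
    The single-RTT case can be sketched with a toy signal.  The
    assumptions here are entirely illustrative: three flows share an
    RTT of 20 ticks, each holds one queue slot for half of every RTT,
    and the phase offsets are fixed rather than random so the result
    is repeatable.  The function name dominant_period is hypothetical.

```python
import cmath


def dominant_period(samples):
    """Period, in samples, of the strongest non-DC component of a plain DFT."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]   # remove the DC term
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k


RTT_TICKS = 20          # assumed common RTT, in sampling ticks
OFFSETS = [0, 3, 5]     # assumed phase offset of each flow

# Summed queue length at each tick: one slot per flow that is "on".
total_queue = [
    sum(1 for off in OFFSETS if (t - off) % RTT_TICKS < RTT_TICKS // 2)
    for t in range(200)
]

print(dominant_period(total_queue))
```

    With several distinct RTTs the same scan would show several
    peaks; picking an average out of them is not attempted here.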

    Now, note that the Fourier transform is equivalent to fitting a
    time series model to the inter-arrival times.  The inter-arrival
    time for packet n is:

	    I(n) = aI(n-1) + bI(n-2) + c

    In other words, the inter-arrival time of packet n is a function of
    the inter-arrival times of n-1 and n-2 plus some constant.  We want
    to estimate a, b and c.  Then b/c is the average RTT, a is the load
    hosts are putting on the gateway, and b is how much the hosts are
    listening to the gateway's congestion advice.
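
    Estimating a, b and c is an ordinary least-squares problem.  The
    following is a minimal sketch under stated assumptions: the
    function name fit_ar2 is hypothetical, the input series is
    synthetic and noise-free (generated from known coefficients so
    the fit can be checked), and no attempt is made to interpret the
    coefficients as RTT or load.

```python
def fit_ar2(series):
    """Least-squares estimate of (a, b, c) in I(n) = a*I(n-1) + b*I(n-2) + c."""
    rows = [(series[n - 1], series[n - 2], 1.0) for n in range(2, len(series))]
    targets = [series[n] for n in range(2, len(series))]

    # Normal equations (X^T X) w = X^T y as an augmented 3x4 matrix.
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(3)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]

    return tuple(m[i][3] / m[i][i] for i in range(3))


# Synthetic inter-arrival times from assumed coefficients.
TRUE_A, TRUE_B, TRUE_C = 0.5, -0.3, 1.0
series = [1.0, 2.0]
for _ in range(10):
    series.append(TRUE_A * series[-1] + TRUE_B * series[-2] + TRUE_C)

a, b, c = fit_ar2(series)
print(round(a, 6), round(b, 6), round(c, 6))
```

    Real inter-arrival traces are noisy, so a gateway would more
    likely update the estimates recursively per packet than solve a
    batch problem like this one.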

    The control policy ought to be Random Drop; the average RTT from
    this measurement would be used to clamp the control policy.

    Van wants to implement this mechanism in the NSFnet backbone
    switches.

    Braden:  What do we need to do about congestion today?  Are Van's
    TCP algorithms saving us now?

    Van:  No, we are being saved by the excess NSFnet backbone capacity
    right now.  TCP can only guarantee that we run the network just
    short of the congestion cliff.  Need algorithms in the gateways to
    push us back to the knee of the performance curve.

    ACTION ITEM: Van to give us citation that proves gateways need to
    give feedback to get us to the knee.  He also mentions an article
    by Batcher in Bell System Technical Journal in '76 or '78.

    Van: As an interim step, it will be sufficient to use a fixed
    percentage Random Drop algorithm.  The queue length in the gateway
    when you drop packets is greater than the delay * bandwidth
    product, so you are seeing one round-trip time worth of network
    traffic in your queue (a complete work cycle) and can make rational
    decisions from it.  As pipe size increases, this won't work; the
    number of packets in a full queue will no longer be a
    representative slice of traffic.  Then we will hit scaling laws for
    queue-based algorithms.
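
    One common reading of Random Drop is sketched below: when an
    arrival overflows the queue, the victim is chosen uniformly at
    random from the queued packets plus the new arrival, so a flow's
    losses track its share of the queue.  This is an assumption about
    what was meant; the "fixed percentage" parameter is not specified
    in these minutes and is not modeled.  The class name
    RandomDropQueue is hypothetical.

```python
import random


class RandomDropQueue:
    """FIFO queue that evicts a uniformly random packet on overflow."""

    def __init__(self, capacity: int, rng: random.Random):
        self.capacity = capacity
        self.rng = rng
        self.packets = []

    def enqueue(self, packet):
        """Add a packet; return the dropped packet on overflow, else None."""
        self.packets.append(packet)
        if len(self.packets) <= self.capacity:
            return None
        # Overflow: the arrival is as likely a victim as any queued packet.
        victim_index = self.rng.randrange(len(self.packets))
        return self.packets.pop(victim_index)

    def dequeue(self):
        """Remove and return the oldest packet, or None if the queue is empty."""
        return self.packets.pop(0) if self.packets else None


queue = RandomDropQueue(capacity=3, rng=random.Random(0))
dropped = [queue.enqueue(seq) for seq in range(10)]
print("still queued:", queue.packets)
print("dropped:", [d for d in dropped if d is not None])
```

    Because nothing is dequeued in this run, seven of the ten
    arrivals are dropped; which seven depends on the seed.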
    
Limits of the Internet Architecture:

    Braden: What experiments could test limits of the Internet
    architecture?

    Two problem areas were suggested: (1) add long-lived,
    high-bandwidth streams (e.g. real-time video); (2)  current traffic
    patterns persist but bandwidth*delay product becomes very large.

    Van: Real-time video on a gigabit network looks like FTP on an
    Internet.  So end points and intermediate points can adapt to the
    existence of a video flow.

    If you need to guarantee bandwidth you can have the gateways watch
    distinct flows (if you identify them somehow).  Then the gateway
    will know the total demand and individual demand.  Then policies
    can be enforced to properly apportion resources over the demand.

    The place you are likely to have problems is that scaling up the
    bandwidth means that many connections will be finished sending
    before the first ack comes back.  There's no time for the sender to
    learn anything about the network.
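
    A back-of-the-envelope check of this point, as a minimal sketch.
    It assumes the sender transmits at full line rate with no window
    limit; the function names, transfer size, link speeds and RTTs
    are illustrative assumptions, not figures from the meeting.

```python
def send_time_s(transfer_bytes: int, bandwidth_bps: float) -> float:
    """Time to put the whole transfer on the wire at line rate."""
    return 8 * transfer_bytes / bandwidth_bps


def done_before_first_ack(transfer_bytes: int, bandwidth_bps: float,
                          rtt_s: float) -> bool:
    """True if the last byte leaves before any ACK can have returned."""
    return send_time_s(transfer_bytes, bandwidth_bps) < rtt_s


TRANSFER_BYTES = 100_000   # assumed size of a modest file transfer

for name, bandwidth_bps, rtt_s in [("56 kbit/s, 100 ms RTT", 56e3, 0.1),
                                   ("1 Gbit/s, 60 ms RTT", 1e9, 0.06)]:
    verdict = done_before_first_ack(TRANSFER_BYTES, bandwidth_bps, rtt_s)
    print(f"{name}: finished before first ACK? {verdict}")
```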

    A three-way analysis was suggested: streams, TCP-like connections,
    and "other".
    
Next Meeting:

    June 7-8 at LBL, Berkeley, California.