From braden  Tue Dec  8 14:12:38 1987
Received-Date: Tue, 8 Dec 87 14:12:38 PST
Received: from braden.isi.edu by venera.isi.edu (5.54/5.51)
	id AA29748; Tue, 8 Dec 87 14:12:38 PST
Date: Tue, 8 Dec 87 14:11:37 PST
From: braden
Posted-Date: Tue, 8 Dec 87 14:11:37 PST
Message-Id: <8712082211.AA15336@braden.isi.edu>
Received: by braden.isi.edu (5.54/5.51)
	id AA15336; Tue, 8 Dec 87 14:11:37 PST
To: end2end-interest
Subject: Oct '87 Meeting Minutes -- Final Version
Cc: Lixia@xx.lcs.mit.edu, kseo@g.bbn.com, stevens@d.isi.edu
Content-Length: 19519
X-Lines: 497
Status: RO





                   End-to-End Task Force Meeting
                      October 22-23, 1987
                      
                      MIT Endicott House
                     Dedham, Massachusetts
                     
                     
Minutes taken by: Eric Cooper, CMU

Attending:

  Bob Braden, ISI
  Joel Emer, DEC/MIT
  Craig Partridge, BBN
  Eric Cooper, CMU
  Steve Deering, Stanford
  Bill Nowicki, Sun
  Dave Clark, MIT
  Van Jacobson, LBL
  
  Karen Seo, BBN  (visitor)
  Lixia Zhang, MIT  (visitor)
  Jim Stevens, Rockwell (visitor)

Absent:

  Lorenzo Aguilar, SRI
  Gerd Beling, FGAN
  Dave Cheriton, Stanford
  Jon Crowcroft, UCL


1. STATUS REPORTS


** Joel Emer:

Joel reported on the status of MIT's Common system, now renamed Mercury. 
The Lisp veneer and a simple stub generator for C are working.
The project is looking at asynchronous interfaces for RPC. In C, the
interface returns an event flag to the caller, which can later wait for
that event. In Argus, a mechanism called a "promise" (like a Multilisp
"future") is returned. They are currently using a thin layer over TCP to
multiplex multiple streams.
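
The promise style of asynchronous call can be pictured with a short Python
sketch, using the standard concurrent.futures.Future as a rough analogue.
This only illustrates the calling pattern and is not Mercury's or Argus's
actual interface; remote_add and async_call are hypothetical names, and the
"remote" procedure simply runs in a local thread.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for a remote procedure; here it simply runs locally.
def remote_add(a, b):
    return a + b

_pool = ThreadPoolExecutor(max_workers=4)

def async_call(proc, *args) -> Future:
    """Start the call and return at once with a promise for its result."""
    return _pool.submit(proc, *args)

def demo():
    # Issue several calls without waiting for any of them...
    promises = [async_call(remote_add, i, i) for i in range(3)]
    # ...then claim the results later, blocking only if one is not ready.
    return sum(p.result() for p in promises)
```

The caller decides when to block, which is the property the event-flag and
promise interfaces share.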

Joel mentioned difficulties in mapping the Mercury notion of stream-call
onto VMTP's streaming model, for example the impossibility of buffering
several requests or responses in a single VMTP segment. (Dave Clark added
that Mercury would like to see more of the buffering details than current
transport protocols expose to an application.)

With Mark Lambert, Joel is working on adding Mercury client stubs to GNU
Emacs, to support remote invocations of the editor.

Joel took an action item to write a description of Mercury's transport 
protocol requirements.


** Craig Partridge:

Craig has mainly been working on network management.  In the area of RDP, he
has been looking at acknowledgment strategies for lossy environments.  He
presented a paper on RTT estimation with Phil Karn.  

BBN has money to deploy Cronus clusters on the Wideband net, and he
expects to do some work on transport protocols in this area.
Further work on IP multicast at BBN is awaiting DARPA funding. 

About 5 copies of the multicast code have been distributed. 
(Joel Emer asked about the Internet impact of providing multicast
facilities with no access controls.)


** Karen Seo:

Karen discussed TCP/Satnet performance measurements.  The SATNET/ICB
community has formed a taskforce to investigate SATNET performance issues
and in particular to explain the response problems that the users were
seeing (TELNET/FTP).  This measurement taskforce is composed of members
from the SATNET sites (UCL, NTA-RE, CNUCE, RSRE, DFVLR, BBN) and has been
chaired by BBN (originally Claudio Topolcic, currently Karen Seo).  They
started by performing tests to measure throughput, delay and packet loss
at the IP level.  This work is still ongoing.

Under normal traffic conditions, the taskforce has found a mean RTT of 2
seconds with high variance.  Packet loss rate (true "bit rot", not
congestive losses) has been under 1%.  (This was followed by some
discussion as to whether link-level forward error correction, currently
turned off, should be enabled.  Dave Clark stated his rule of thumb that
the link level should try to ensure less than 1% packet loss.)

This work has uncovered some problems -- multiple-of-8 bug, throughput
bottleneck in UK, channel bug.  But so far, much of the performance
bottleneck seems to be due to bad interactions between TCP/IP and Arpanet
congestion (large and variable delays, packet loss).  The taskforce has
recently begun work on TCP-level performance measurements.  In
particular, Jon Crowcroft (UCL) has been building a TCP-measurement tool
which incorporates some of Van Jacobson's TCP windowing and
retransmission strategies.


** Eric Cooper:

Eric described the use of VMTP in Mach and some problems they have
encountered with it.  

  *  The retransmission algorithms in the UNIX implementation of VMTP do
     not work reliably across gateways with appreciable delay.  Although
     the ESP subprotocol could be used for RTT calculations for adaptive
     retransmission, it is not currently done.  [A later discussion,
     described below, suggested the use of Slow Start in VMTP].

     
  *  The security aspects of VMTP are largely irrelevant in the Mach
     environment, which uses host-to-host rather than end-to-end encrypted
     communication.  
     
  *  The 16K limit on a single transmission complicates the Mach code,
     which needs to be able to ship an entire address space.
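
For the adaptive retransmission mentioned in the first item, one familiar
form is the smoothed round-trip estimate from the TCP specification (RFC
793).  The sketch below is borrowed from TCP, not from the VMTP code; the
class name, the constants, and the initial timeout are illustrative
assumptions.

```python
class RttEstimator:
    """Smoothed round-trip estimate in the style of the TCP specification."""

    def __init__(self, alpha=0.875, beta=2.0, lbound=1.0, ubound=60.0,
                 initial=3.0):
        self.alpha = alpha      # weight given to the old estimate
        self.beta = beta        # safety factor applied to the estimate
        self.lbound = lbound    # minimum timeout, seconds
        self.ubound = ubound    # maximum timeout, seconds
        self.initial = initial  # assumed timeout before any measurement
        self.srtt = None        # no measurement yet

    def sample(self, rtt):
        """Fold one measured round-trip time (seconds) into the estimate."""
        if self.srtt is None:
            self.srtt = rtt
        else:
            self.srtt = self.alpha * self.srtt + (1 - self.alpha) * rtt
        return self.srtt

    def timeout(self):
        """Retransmission timeout derived from the current estimate."""
        if self.srtt is None:
            return self.initial
        return min(self.ubound, max(self.lbound, self.beta * self.srtt))
```

Each measured exchange would feed sample(), and the retransmission timer
would be set from timeout() instead of a fixed constant.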

The student slated to adopt IP multicasting into Camelot has not yet begun
work.


** Lixia Zhang:

Lixia attended a design meeting on routing for the next generation Internet
(i.e. the successor to EGP). The designers agreed that routing between
autonomous systems should not depend on traffic load, and that it should
support type-of-service routing.

(Dave Clark interjected that types of service should include cost/billing
in addition to throughput, delay, and reliability. He commented that the
Internet cannot allow TOS to be used in an unregulated way, since people
will abuse it; charging real dollars may be the only workable scheme.
He noted that in a routing discussion at SIGCOMM, all agreed
on the necessity for multiple administrative domains and a billing
mechanism.)
 
(It was pointed out that TOS is relevant to end2end services through the
design of the host interface to TOS.  Van Jacobson added that the
Berkeley implementations of network applications such as telnet and ftp
now advise the transport layer of their preferred type of service using a
Set Socket Option call.)
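
A minimal sketch of such a call, written in Python for brevity rather than
taken from the Berkeley sources.  The two constants are the low-delay and
high-throughput bits of the IP type-of-service octet defined in RFC 791;
open_with_tos is a hypothetical helper name.

```python
import socket

# Type-of-service bit values from the IP specification (RFC 791).
IPTOS_LOWDELAY = 0x10
IPTOS_THROUGHPUT = 0x08

def open_with_tos(tos):
    """Create a UDP socket and tell IP which type of service to request."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock
```

An interactive program might ask for IPTOS_LOWDELAY and a bulk transfer for
IPTOS_THROUGHPUT; whether the network honors the request is a separate
matter.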
 
 
** Steve Deering:

Steve has organized the VMTP/IP multicast distribution tape and put out a
new release. Changes to VMTP include a new checksum algorithm and headers in
Big Endian order only. Byte order of the data is still indicated by a bit
in the entity ID.  Steve is spending his time fixing bugs in VMTP.
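
The split between a fixed header order and a flagged data order can be
sketched as follows.  The 8-byte layout and the flag value are hypothetical
and are not the VMTP packet format; in particular, VMTP carries the data
byte-order bit in the entity ID, where this sketch uses a separate flags
field.

```python
import struct

# Hypothetical 8-byte header: 32-bit identifier, 16-bit flags, 16-bit length.
# ">" makes struct use big-endian order regardless of the host CPU.
HEADER = struct.Struct(">IHH")
DATA_LITTLE_ENDIAN = 0x0001   # hypothetical flag bit describing the payload

def pack_message(ident, values, little_endian_data):
    """Header is always big-endian; payload order is recorded in the flags."""
    flags = DATA_LITTLE_ENDIAN if little_endian_data else 0
    order = "<" if little_endian_data else ">"
    payload = struct.pack(f"{order}{len(values)}I", *values)
    return HEADER.pack(ident, flags, len(payload)) + payload

def unpack_message(msg):
    """Read the header, then decode the payload in the order it declares."""
    ident, flags, length = HEADER.unpack_from(msg)
    order = "<" if flags & DATA_LITTLE_ENDIAN else ">"
    payload = msg[HEADER.size:HEADER.size + length]
    return ident, list(struct.unpack(f"{order}{length // 4}I", payload))
```

The sender never has to swap its data; only a receiver of the opposite
byte order pays for conversion.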

A summer student at Stanford has changed the domain name resolver to
use multicast (via either UDP or VMTP) in its queries; however, there is no
plan or manpower to extend this work.


** Bill Nowicki:

Bill presented some performance figures.  TCP runs at 4.8 Mbit/sec
(630kbytes/sec) between Sun-4s on an Ethernet, and 6.1 Mbit/sec between
Sun-3s on a Pronet-80 token ring.  The new implementation of virtual
memory paging over NFS turned out to be 2-5 times slower than the old
system; tuning is in progress.

Bill, Van Jacobson, and Dave Clark agreed to cooperate on running a TCP
performance test between Sun-4s over a Pronet-80 ring.

Sun has changed the socket buffering mechanism to allow limits by either
bytes or packets. He mentioned that NFS retransmission timers do "back
off" but do not measure round trip delays dynamically. This reminded us
of the need to write a "Transport Follies" paper, about the pitfalls of
reliable-transmission algorithms in the application layer (e.g., above
UDP).  It was suggested that someone should actually write this paper,
perhaps for SIGCOMM next summer.

(Dave Clark took the group on a detour about OS/2 and the future of
operating systems.)

Sun also has a new version of the RPC protocol spec.  There was general
discussion as to whether the NFS v2 spec should go out as an RFC, or
whether we should wait for v3.


** Dave Clark:

Dave reported NETBLT performance of 20 Kbit/sec over the Arpanet. Testing
over SATNET seems to be waiting for Jon Crowcroft to take the initiative,
since the best path (involving no Arpanet slough) is UCL-Norway.

Dave is working on dynamic rate adjustment algorithms. He has found that
Gallager and Bertsekas have promising algorithms.

A student is working on an X-based simulation tool, which should be
available in a few months. He said "it only really looks good in color".

(General discussion about Greg Chesson's work on a Protocol Engine
for FDDI followed.)


** Van Jacobson:

Van mentioned that he has been experimenting with bandwidth allocation in
gateways, using a probabilistic algorithm to detect streams. 

Van mentioned that he would like to include the IP multicast code with
the forthcoming public domain Berkeley networking release.  Unfortunately,
it seems that this is not possible.

The rest of Van's report was deferred to the discussion of performance.


** Bob Braden:

Bob described work on NSFnet statistics gathering by promiscuous spying
on Ethernets, and on a background FTP service using batch mode
third-party control.

--------

2. NETWORK COMPUTING FORUM

The Network Computing Forum (NCF) is concentrating on a workstation+LAN
environment, looking at vendor-independent protocols, services, and
architectures for "network computing".  It currently has 75-80 member
organizations.  There will be a meeting of 300-400 people in November in
the Bay Area.

Several of the NCF technical groups appear to have interests in common with
our TF, in particular the groups looking at core network services, network
administration, and wide area networking.  However, Dave Clark cautioned
about the dangers of "dancing with an elephant", and pointed out that the
TF's primary focus on research is largely orthogonal to the goals of the NCF.
There was a consensus that no action is appropriate at present.

Bill Nowicki took an action item to find out about attending the NCF meeting.

--------

3. DIAMOND AND POSTSCRIPT

There was a discussion of a Diamond mailing list, for interchange
of documents among those with access to Diamond.

It became apparent that PostScript printing capabilities were available to
all TF members present, and since Diamond can produce PostScript output, it
represented a better choice for interchange.  It was suggested that 
Jon Postel ought to accept RFC's in PostScript form.

Van mentioned the availability of a PostScript previewer (ups/xps)
for X from Berkeley.  He will send out a message with the information on
FTPing the TAR file(s).

Eric mentioned an X-based drawing editor at CMU that produces PostScript
output.

--------

4. MULTICASTING

When should IP multicasting be blessed as an Internet standard?  The IAB
has asked for plans and schedules for projected Internet standards.

The following schedule was proposed:
	January 1988	- presentation to IAB
	July 1988	- adoption as Internet standard

Unfortunately, the threat of standardization forced Steve to admit that
he wants to change the protocol!  He is considering modifications to
eliminate the Create, Join, and Leave Group operations, leaving only the
Confirm Group operation.

Steve said that he would have a revised RFC by the end of December.
Revision of the spec makes the proposed schedule somewhat questionable,
but we decided to proceed with it as if it were real.

--------

5. DATA REPRESENTATION

Dave Clark began the discussion by noting that one of his students had
"benchmarked" X.409 (ASN.1) against Courier by encoding the same data
structure into C for each; however, he is not sure his test really used
ASN.1 as it is intended.  He suggested that it is probably not useful to
"benchmark" ASN.1 without a well-defined model of the application's
usage.

There followed a discussion of the philosophy of ASN.1/X.409, and
self-describing data versus the "compiled" Courier and XDR styles.

--------

6. PERFORMANCE

Van has compared the performance improvements of TCP with Slow Start and
his new dynamic window sizing algorithm.  He found that Slow Start gives
a 70% improvement, and dynamic window sizing gives an additional 30%
improvement.

He described the dynamic window algorithm, which opens exponentially
until the first drop is experienced, falls back to half of that window,
and then increases linearly until another drop is experienced.  Basically,
he is using packet loss as a measure of congestion; this is the main
difference between his algorithm and the one due to Raj Jain, who uses a
"congestion experienced" bit (also known as the DEC bit) to measure
congestion. Van's approach requires no modification to headers or
gateways and exhibits better fairness in the presence of bad guys.
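
The window behavior described above can be sketched as follows.  This is a
toy model and not Van's TCP code: it assumes one adjustment per round trip,
treats any window above a fixed capacity as causing a drop, and assumes
that every later drop halves the window again.  The names next_window and
trace are hypothetical.

```python
def next_window(cwnd, threshold, dropped):
    """Return (cwnd, threshold) after one round trip.

    cwnd is the window in packets; threshold is half the window at which
    the last drop occurred, or None while no drop has been seen yet.
    """
    if dropped:
        threshold = max(cwnd // 2, 1)
        return threshold, threshold          # fall back to half
    if threshold is None:
        return cwnd * 2, None                # exponential opening
    return cwnd + 1, threshold               # linear increase


def trace(capacity, rounds):
    """Window sizes over time when any window above `capacity` loses a packet."""
    cwnd, threshold, history = 1, None, []
    for _ in range(rounds):
        history.append(cwnd)
        cwnd, threshold = next_window(cwnd, threshold, dropped=cwnd > capacity)
    return history
```

With a capacity of 10 packets, trace(10, 12) gives
1, 2, 4, 8, 16, 8, 9, 10, 11, 5, 6, 7: the only input is whether a packet
was lost, so no header or gateway change is needed.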

He currently does not attempt to filter drops, since the dominant factor in
the current Internet is the gateways' reluctance to drop packets.
  [[****I'm sorry, I don't know what this sentence means... Bob***]]

He has concluded that for improved performance, gateways should be more
aggressive about dropping packets when their queues get long, and that 
this dropping should be RANDOM in order to discriminate against bad
hosts. This random loss has some excellent properties, including the
general statement that cooperation works better than competition among
hosts if packets are randomly tossed.
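
One way to picture why random dropping falls hardest on the heaviest
sender is the small simulation below.  It is an illustrative model, not
Van's gateway code: the overflow policy (discard one randomly chosen queued
packet when the queue exceeds its limit), the service rate, and all names
are assumptions.

```python
import random
from collections import Counter

def run_gateway(arrivals, queue_limit, service_every, rng):
    """Queue packets (labelled by sending host) and return drops per host.

    When an arrival overflows the queue, one queued packet chosen at
    random (the newcomer included) is discarded.
    """
    queue, drops = [], Counter()
    for i, host in enumerate(arrivals, start=1):
        queue.append(host)
        if len(queue) > queue_limit:
            victim = queue.pop(rng.randrange(len(queue)))
            drops[victim] += 1
        if i % service_every == 0 and queue:
            queue.pop(0)                     # forward the packet at the head
    return drops

def demo(seed=1987):
    # Host "A" offers nine packets for every one from host "B".
    arrivals = (["A"] * 9 + ["B"]) * 100
    return run_gateway(arrivals, queue_limit=10, service_every=2,
                       rng=random.Random(seed))
```

Because a host's chance of being the victim tracks its share of the queue,
the host that floods the gateway absorbs most of the losses.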

"Anything that reacts faster than one RTT must be unstable, because
RTT is the fundamental frequency of the system."

It is necessary to distinguish intrinsic ("pipeline") delay from
queue delays; the window size should be bounded by the intrinsic delay.
Satnet has much more intrinsic delay than the Arpanet. Van finds it
hard to get failures of his algorithms on the Arpanet, which nearly
always yields 6-packet windows with his algorithms.
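
The bound set by intrinsic delay is the familiar bandwidth-delay product.
A small sketch, with hypothetical figures that are not measurements of
Satnet or the Arpanet:

```python
import math

def pipeline_window(bandwidth_bps, intrinsic_rtt_s, packet_bytes):
    """Packets needed to fill the path when no queues have built up."""
    bits_in_flight = bandwidth_bps * intrinsic_rtt_s
    return max(1, math.ceil(bits_in_flight / (8 * packet_bytes)))
```

For an assumed 64 kbit/sec path carrying 512-byte packets,
pipeline_window(64000, 0.5, 512) is 8 while
pipeline_window(64000, 0.1, 512) is 2; a window beyond that figure adds
queueing delay without adding throughput.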

Van has a very general argument that a stable windowing algorithm must
necessarily involve exponential decrease of traffic but linear increase.
The argument is based upon the solutions to the first-order difference
equation governing window-based round-trip delays (there is no time-
dependent noise, so RTT is the fundamental frequency).  
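
The sketch below does not reproduce Van's difference-equation argument.  It
is a toy illustration of the increase/decrease rule itself, assuming that
all flows adjust once per round trip and all see the same congestion
signal; the function name share and the figures are hypothetical.

```python
def share(windows, capacity, rounds):
    """Evolve competing window sizes under a shared congestion signal."""
    windows = list(windows)
    for _ in range(rounds):
        if sum(windows) > capacity:
            windows = [w / 2 for w in windows]     # exponential decrease
        else:
            windows = [w + 1 for w in windows]     # linear increase
    return windows
```

Starting from share([1.0, 40.0], capacity=50, rounds=200), the two windows
end up nearly equal: the linear step leaves the gap between flows
unchanged, and every halving cuts it in two.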

Rate-based algorithms, on the other hand, are governed by a second-order
equation (time-dependent noise, caused by different path lengths), which
may have oscillatory solutions; Van does not know how to guarantee
stability of such algorithms in general.  There were several interesting
periods inside and outside the meeting when Van tried to convince Dave
Clark with his arguments.  Once Upon a Time, there were Good Oscillations
and Bad Oscillations...

Dave Clark described the bandwidth-reservation ideas that he is exploring
for NETBLT.  Suppose you have n flows using up all the bandwidth, and the
n+1st flow comes along.  He has an algorithm that will converge to each
having 1/(n+1) of the bandwidth.

Bob Braden asked Dave Clark if he planned to try Van's idea for adapting
Slow Start to NETBLT, subdividing the delay interval rather than the
window.  Dave thought the learning time using Slow Start would be
excessive, and instead he plans to pursue the bandwidth reservation approach
just discussed.

Van noted some interesting effects in multi-conversation stability:
although bandwidth sharing was always quite good, the odd numbered
arrivals managed to gain a slight advantage, which was most pronounced
for conversation number 3.  Lixia mentioned a Chinese proverb that
explains the situation: when two people are fighting over something, a
third can take it away from them both.

At the last meeting, we discussed the "freezing" of Van's TCP
improvements into a TCP/IP update for 4.2 and 4.3BSD.  Van assured us
that Mike Karels is cooperating on this goal, and that it should be
accomplished soon.  The package will include TCP, IP, ICMP, UDP,
and socket code.

Van expects the next version of this package will include caching of
RTT, MTU, and loss statistics in the kernel.
 [[** IS THIS RIGHT?]]

--------

7. TRANSACTION PROTOCOLS

Jim Stevens presented an introduction to TTP, a reliable message
protocol.  TTP was developed as part of the SURAN effort, and is slated
to be implemented by John Hight at SRI.  Unfortunately, Jim Stevens is
not funded to do any more work on TTP, although he is continuing work on
his own time as part of a PhD thesis effort.

TTP uses port addressing, and limits a message to 4K bytes (because of
the size of a selective-retransmission mask).  [Like all Birrell&Nelson-
based protocols] it allows only one outstanding message at a time
per connection, i.e., per socket pair, and uses a quiet time at startup
to get rid of old duplicates after a crash.

TTP uses a three-level message number/segment number/retransmission
number scheme, and does selective retransmission using a 16-bit bitmask.
Since retransmissions are explicitly numbered, calculation of RTT and
related timers is simple, and is completely spelled out in the protocol spec.
It was suggested in subsequent discussion that it may be TOO complete!
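
The relationship between the 16-bit mask and the 4K message limit can be
sketched as follows.  This is not the TTP packet format: the segment size
of 256 bytes is an assumption obtained by dividing 4K evenly among 16 mask
bits, and the function names are hypothetical.

```python
MASK_BITS = 16                      # width of the mask, from the minutes
SEGMENT_BYTES = 4096 // MASK_BITS   # assumed: 4K message split evenly

def split(message):
    """Cut a message of at most 4K bytes into numbered segments."""
    if len(message) > MASK_BITS * SEGMENT_BYTES:
        raise ValueError("message exceeds what a 16-bit mask can describe")
    return [message[i:i + SEGMENT_BYTES]
            for i in range(0, len(message), SEGMENT_BYTES)]

def received_mask(segment_numbers):
    """Receiver side: set bit n for every segment n that has arrived."""
    mask = 0
    for n in segment_numbers:
        mask |= 1 << n
    return mask

def missing(mask, segment_count):
    """Sender side: list the segments whose bits are still clear."""
    return [n for n in range(segment_count) if not mask & (1 << n)]
```

If segments 0, 2, and 3 of a four-segment message arrive, the mask is
binary 1101 and the sender retransmits only segment 1.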

TTP does not require request/response alternation, but does take
advantage of it when it occurs for piggybacking of ACKs, achieving
minimum packet exchanges.  [TTP embodies ideas that Robert Cole presented
to the Task Force two years ago and that were incorporated into the UCL
ESP protocol]. Eric Cooper observed that in this respect TTP might meet
the requirements of Mach better than VMTP. Joel made a similar
observation about Mercury -- strict request/response is not quite the
right model.

It was felt that publishing the RFC on TTP would be worthwhile, even
if Jim does not have a chance to update the current draft.  

It was observed that NFS places a lighter requirement on its transport
protocol -- it needs at-least-once delivery, but not sequencing.
On the other hand, there are few applications like this.

Bob Braden suggested that Slow Start would work well to provide "soft"
flow control for transaction protocols. It would be particularly effective
with a cache of RTT experienced.  Clark pointed out that over DS3
networks, almost everything, even FTP's, will look like transactions.
It would seem desirable to try Slow Start in VMTP.

Van and Bill plan to try Slow Start with NFS and Sun RPC.  

--------

8. FUNDING AND FUTURE DIRECTIONS

A number of possible funding sources were discussed, including DARPA, NRI
(Bob Kahn), NSF, NASA (Barry Leiner), DOE, DCA, RADC, NBS, the RBOC's,
Bellcore, and Bell Labs.

A discussion of where to go next produced the following suggestions:
completion of multicast work, transaction protocols, reliable multicasting,
and multimedia streams.

Craig suggested a special issue of IEEE Computer on transport
protocols, including transaction protocols.  It should be possible to get
paper(s) on experience with RPC -- e.g., Birrell (Xerox), Sun NFS, Cronus,
Andrew file system at CMU, and REX.

--------

9. NEXT MEETING

There will be an attempt to schedule a video meeting in January,
and a physical meeting in the Bay Area (at Sun?) on April 12 and 13.

--------

10. ACTION ITEMS

Short-term: 

   a. Send message to TF list with info on ups/xps retrieval --
      Jacobson.
      
   b. Run TCP performance test between Sun-4's over Pronet-80 ring --
      Nowicki, Jacobson, Clark.
      
   c. Distribute REX documents to TF membership --
      Clark [DONE].
      
   d. Find out status of IP multicasting implementation in Butterfly
      gateway --
      Partridge.
            
   e. Write description of Mercury's transport protocol requirements --
      Emer.   
      
   f. Attend NCF meeting if possible --
      Nowicki  [DONE].
      
   g. Talk to Barry Leiner about TTP politics --
      Braden [DONE].
      
   h. Talk to Jon Postel about PostScript for RFC's --
      Braden [DONE].
      
   i. Arrange next meeting --
      Braden.
      
Longer-term:

   a. Write "Transport Follies" paper --
      (unassigned).
      
   b. Put out 4.*BSD TCP/IP update -- 
      Jacobson (and Karels).
      
   c. Revise multicast RFC --
      Deering.
   
   d. Prepare TTP RFC --
      Stevens.
      
   e. NETBLT test over SATNET (NTA-UCL) --
      Crowcroft.
      
   f. Try Slow Start in NFS/UDP --
      Jacobson, Nowicki.
      
   

