From braden  Wed Nov 30 13:26:18 1988
Received-Date: Wed, 30 Nov 88 13:26:18 PST
Received: from braden.isi.edu by venera.isi.edu (5.54/5.51)
	id AA00229; Wed, 30 Nov 88 13:26:18 PST
Date: Wed, 30 Nov 88 13:26:04 PST
From: braden
Posted-Date: Wed, 30 Nov 88 13:26:04 PST
Message-Id: <8811302126.AA02848@braden.isi.edu>
Received: by braden.isi.edu (5.54/5.51)
	id AA02848; Wed, 30 Nov 88 13:26:04 PST
To: end2end-interest
Subject: Minutes of End-to-End TF meeting
Status: R

It has been our policy to send copies of the completed minutes of TF
meetings to the -interest list.  Unfortunately, there has often been a gap
between our policy and our actions.  I want to catch up now, by sending out
minutes of meetings since April 88, in this and following messages.



                    End-to-End Task Force Meeting
                        
                          April 11-12, 1988 
            
                          Sun Microsystems
                          Mountain View, CA
                          
                   MINUTES TAKEN BY: Lorenzo Aguilar
                          
Attending members:

  Lorenzo Aguilar, SRI
  Bob Braden, ISI
  Dave Cheriton, Stanford 
  Eric Cooper, CMU
  Steve Deering, Stanford
  Joel Emer, DEC/MIT
  Van Jacobson, LBL
  Bill Nowicki, Sun
  Craig Partridge, BBN
  
Absent:

  Gerd Beling, FGAN
  Dave Clark, MIT
  Jon Crowcroft, UCL


The meeting agenda was:

			April 12

1. STATUS REPORTS
2. RECURSIVE RPC ARCHITECTURE
3. REPORT ON NETWORKING AT SUN
4. LARGE FILE TRANSFER PERFORMANCE
5. END-TO-END PROTOCOLS TF, WHITHER ? (First Day)

			April 13
			
6. RATE-BASED BUFFER MANAGEMENT FOR GATEWAYS
7. MULTICASTING
8. MERCURY TRANSPORT
9. END-TO-END PROTOCOLS TF, WHITHER ? (Second Day)
10. ACTION ITEMS



1. STATUS REPORTS

** Bill Nowicki **

SunOS Release 4.0 has been announced and will start shipping in 2-3 weeks. 
It includes Van's new TCP.  Subsequent improvements that Van may make will
be included in later updates to 4.0, although the socket interface will
be frozen.

Bill has done some work on an adaptive timeout for RPC, but nothing to
report yet.

AT&T is now licensing Wollongong's TCP/UDP, and System V 4.0 will have
the new TCP/UDP interface. An effort will be made to reduce interface
differences between BSD and System V.

Sun will produce about 30K tapes for the 4.0 release. They are leasing a
jumbo-jet to carry the tapes and documentation to Europe, which will
finally give us an experimental test of the bandwidth of this approach!
Sun is considering 6-7 month release cycles and the possibility of fixing
release dates while leaving open which features will be included in new
releases.

The NFS Revision 3 spec that the Task Force reviewed has been changed
again to accommodate the Macintosh file system.  NFS will be used to
support the Macintosh file system with a product called TOPS.  The
current TOPS protocol is essentially to be thrown away and NFS will be
used by running Sun's RPC over AppleTalk transport (DDP, ATP).  The Apple
Transport Protocol ATP follows a request-response model, while DDP is
equivalent to UDP.  Bill pointed out that this should prove that Sun XDR
and RPC are independent of transport protocol, as claimed.

Bill has made little progress on TCP measurements across a Pronet 80 ring.
Sending 3K bytes per segment on Pronet-80 vs. 1K bytes per segment on an
Ethernet gave a throughput improvement of x2.  He is beginning tests
on an FDDI prototype.

** Van Jacobson **

Van and Mike Karels have been discussing the internals of the BSD memory
management. Van has hacked a kernel that allocates two-size memory blocks
(64K, 2K) depending on type of request. The idea is to let processes use
more memory, but for a shorter time.  Sun's 4.0 will include memory
management developed jointly with Berkeley.  It has copy-on-write which
takes advantage of the memory-mapped I/O in SunOS 4.0.

Van made the observation that outboard protocol processors will not
improve overall performance much, because the important overhead is the
protocol interface to the system bus and memory.  Van wants to try the
conversion of user (page) units to net-size units only in the TCP
(transport) layer, rather than in the socket layer (or above), as the
system currently does. The idea is to minimize memory accesses to the
data, since memory access is the bottleneck.  For example, checksumming
should be combined with this one copy.  Joel Emer remarked that this
might be the wrong optimization, since data really has to be
marshalled/demarshalled in the application layer anyway; it was
suggested that mbufs be passed between the transport layer and the
marshall/demarshall code, and that checksumming should be done during
marshalling/demarshalling.  Dave Cheriton questioned the advisability of
doing so. Van indicated that, in his view, the right model should allow
for marshalling/demarshalling at some level of the layer stack,
contingent on a particular architecture.
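
To illustrate what "combining checksumming with the one copy" means,
here is a minimal sketch in Python.  It is not the BSD code; the
function name and buffers are hypothetical.  It moves the data and
accumulates the 16-bit one's-complement Internet checksum in the same
loop, so each byte is fetched from memory once rather than twice.

```python
def copy_and_checksum(src: bytes, dst: bytearray) -> int:
    """Copy src into dst and return the 16-bit Internet checksum of src.

    Hypothetical helper for illustration: a single loop touches each
    byte once, instead of one pass to copy and a second to checksum.
    """
    if len(dst) < len(src):
        raise ValueError("destination buffer too small")
    total = 0
    n = len(src)
    i = 0
    while i + 1 < n:
        hi, lo = src[i], src[i + 1]
        dst[i], dst[i + 1] = hi, lo      # the copy
        total += (hi << 8) | lo          # the checksum, same pass
        i += 2
    if i < n:                            # odd trailing byte, zero-padded
        dst[i] = src[i]
        total += src[i] << 8
    while total >> 16:                   # fold carries (one's complement)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


buf = bytearray(8)
checksum = copy_and_checksum(bytes.fromhex("0001f203f4f5f6f7"), buf)
```

A real kernel would do this a word at a time in the copy routine; the
point of the sketch is only that the two operations share one pass.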

Mike Karels wants to have a very thin TCP user-interface layer which can
be layered on top of diverse service layers.

Van pointed out that the public domain release of the BSD networking code
has been announced.  This covers all the network code in the kernel
(vmunix), in three pieces: TCP upgrade, non-TCP network files that have
changed, and the rest of the 4.3 networking code (some files have changed
since 4.3).  They do not plan to ship vendor tapes of the release until
summer.  Bob Braden observed that all vendors should have access to the
new TCP features, not only those with Internet access.

A rumor was spread that the next version of Ultrix will NOT include 
Van's new TCP [This turned out to be FALSE: RTB].

There are two major extensions that did not make it into this release:
IP multicasting, and per-route control of MTU, socket buffer sizes, etc.
The latter includes cached RTT's and congestion window sizes.  Van hopes
to include these extensions in the next BSD release, which Mike Karels
plans for summer 1988.  This release will have only mods external to UNIX
kernel. 

Van acknowledged that he has a partially-completed RFC on TCP timers.

ACTION: Steve Deering to meet with Mike Karels to discuss multicasting
I/O interface. Mike wants to preserve the transparency of multicast
to network interfaces.

ACTION: Joel Emer, Van Jacobson, and Bob Braden will check out the rumor
about Ultrix, and attempt to fix it if it is broken.

** Eric Cooper **

Eric reported that Mach is using TCP rather than VMTP for moving messages
between hosts.  The semantics of Mach IPC match TCP streams better,
and also TCP's performance beats VMTP's, thanks to the more effective RTT
estimation of TCP.  Typically, a Mach host maintains a number of TCP
connections (about 10) which remain open until an "unactive" timer expires.
There also is a UDP-based keepalive mechanism operating outside the TCP
connections.

Mach maintains many protocols in the kernel to handle well-behaved
cases.  Errors and exceptions are handled by user-level protocol code.
This attains the performance and generality benefits of both approaches.

IP multicasting is being used to multicast Satya's RPC's in a file
server system.  CMU is interested in the UCL work on bulk multicast,
since they are doing similar work.

Eric is now working on the Nectar fiber optic bus prototype, which will
have 32 nodes. The communications board offloads XON/XOFF processing,
among other things, from the CPU.  It is being architected as a backplane
for heterogeneous machines like systolic arrays.  The controller
processor will be a Sun SPARC chip.  Several prototype boards will be
built during the next few months, and a net with a few nodes will be
ready in the summer.

The Nectar transport model follows request-response IPC, and it takes
advantage of unidirectionality of most applications; this produces a
simpler state machine.  This communications board will handle FDDI rates
(100 Mbps), with a connection setup time around one microsecond.  The
main focus of his effort is moving functionality from host to board,
following the same lines as Dave Clark.  Eric is considering marshalling/
demarshalling on the board.

There followed a spirited debate about inboard vs. outboard protocol
processing.  

 * Van: offloading functions to outboard processors implies moving control 
   info across processor boundaries; there is no natural cut-point in
   the protocols. 
   
 * Eric: the Nectar communication controllers will have all the info they 
   need.
   
 * Dave Cheriton: the VMTP NAB will move data across memory/bus at bus
   speed and across bus/net at net speed. It reduces the memory bandwidth
   load on the CPU's memory hierarchy.
      
 * Joel: DEC has generally found that board technology always lags
   behind central processor technology, favoring onboard processing.
   
 * Van: Cross transfers do not justify a processor dedicated to 
   communications; synchronization overhead may 
   negate gains in data transfer speed.

** Steve Deering **

Steve has generated a new multicasting RFC, and has submitted the 4.3BSD
IP multicast implementation to Berkeley. Note that the code sent to
Berkeley no longer includes a multicast router.  Steve has a new
draft proposal for a multicast routing algorithm, and is launching a
new effort in collaboration with BBN to implement multicast routers. 

Craig reported that BBN now has DARPA funding and some people to work on
multicast routing in Butterfly and 4.3BSD gateways; however, this funding
will disappear around August 30.  On 4/15, Steve Deering will meet at
Stanford with BBNers who will be doing this work:  Dave Weizman (BSD),
Rob Foster (Butterfly), and possibly Craig Partridge. Multicasting is
getting high-level attention: Ross Callon and Marianne Gardner as well as
Craig himself are all interested in putting a little time into it!  The
Wideband Net group is also interested.

Bob Braden expressed concern about the BBN proprietary problems
biting us again; Craig promised to be sensitive to the issue.

Steve also attended an IETF meeting and talked about multicasting.

ACTION: Bob Braden and Dave Clark to try to persuade DARPA to not
transfer this funding to another activity.

** Dave Cheriton **

Dave Cheriton has talked with Navy's SAFENET folks about VMTP to fill
their "real-time protocol" requirement. NBS has been tasked to look for such
protocols. One of the desired features is deadline scheduling support.
Dave observed that REX was intended to address real-time support, but at
the time of the London meeting (Sept. 86), REX was lacking many real-time
features.

Dave has gotten little feedback on his VMTP RFC. He is now revising the
UNIX VMTP implementation.  Some companies, like HP, are experimenting
with the current UNIX VMTP version.  Eric Nordmark, one of the student
authors of the UNIX VMTP, will be coming to Sun in May to port VMTP to
SunOS. The main issue is how much of the SunOS has to be changed in order
to accommodate VMTP. The current UNIX VMTP is compatible with the previous
BSD release, and Steve Deering believes it will be compatible with last
week's BSD release.

As proof of the widespread acceptance of his protocols, Dave pointed out
that in Finland, the V-System is being used to control a pulp mill.

Dave Lynch has asked D. Cheriton to talk about multicasting, VMTP, and
the NAB at the ACE conference in September.

ACTION: Bill Nowicki, Dave Cheriton: Oversee porting VMTP to SunOS

** Craig Partridge **

Craig expects to finish net management work by June 88. His main
concern at present is to drive the MIB (Management Information Base)
working group to make some good, tough decisions.

Those involved in the selection/definition of management protocols for
the Internet recognized that HEMS was technically better than SGMP or
CMIP. However, SGMP is already a product and CMIP is an OSI standard,
and thus Craig decided to withdraw HEMS in order to help reach
closure on a short-term solution (SGMP) and a long-term one
(CMIP).

As a result of working with external representations in
network management, Craig and Marshall Rose have written up a review
of some of the external data formats (XDR, ASN.1 and Apollo's NDR)
in which they conclude that Apollo's NDR violates the M * N rule.

RFC 1009 will be a guide for the IP host-requirements document to be
produced by the IETF. Craig is writing the material concerning
transport requirements.

Craig and Len Bozack hope to write a paper on driving TCP/IP at Gbps
rates.  Len has placed several interfaces in a single board; IP
headers are checked only to choose a forwarding interface. This
switching could be further sped up if the IP checksum processing
could be bypassed.  Van commented that he has done experiments running
TCP at about 170 Mbps; Van wants to reach I/O channel rates of 800
Mbps.

Dave Cheriton warned about investing effort and resources in hw/sw
speed-ups of outdated protocols. Van replied that the resulting
performance improvements will keep honest those who call for new
protocols to handle high speed nets; speed limitations imposed by
technology and implementations are not enough justification for new
protocol development.

The discussion about boosting protocol performance continued through
lunch, after Craig finished his report.

Dave Cheriton and Dave Clark have proposed that offboard protocol support
should include "intelligence" similar to that of intelligent disk
controllers which do error code insertion, selection of disk areas, and
other optimizations. On the down side, the increasing diversity of
machine architectures is an obstacle to the generality of offboard
solutions like the NAB, since these cards are not portable across
architectures. Several machines (like the connection machine) can be
regarded as peripherals for a general purpose host, but such machines do
not want an intermediary to access the net because they need fast access
to the incoming data.

** Joel Emer **

Joel has been implementing RPC calls from within Gnu-Emacs. These calls
enable invocation of services like bibliographic references and news.
It takes 3 lines of code to use these RPCs; this economy is possible
thanks to the LISP function support.  Work has been continuing on the
ARGUS implementation; it is still fragile but close to being up.
There is now a way to handle out-of-band messages, like aborts, which
should bypass other stream traffic.  There are also ways to handle
communication failures so that "at-most-once" RPC semantics are fully
enforced on top of a stream protocol. In addition, there has been a
shortening of the window during which a server may receive an RPC call
and die before answering.

It was pointed out that keeping connections open, even without traffic,
provides a conduit to ping servers when doing database transactions. Dave
Cheriton replied that such a "pinging" channel incurs the cost of keeping
connection state.

Joel reported, for absent Dave Clark, on progress on the MIT network
simulator. The simulator displays diagrams of net configurations with
information such as queue lengths displayed for particular nodes. The
display is dynamically updated while the simulation is running. X10 is
being used, since X11 is painfully slow.

ACTION: Joel will report to Dave Clark the urgent desire of task force 
members to start experimenting with the network simulator.

** Lorenzo Aguilar **

Lorenzo reported on internal R&D work proposed to SRI. It consists of
developing an experimental prototype that aggregates computing power from
several networked Suns for solid-modelling 3D view generation.  This
prototype will aim at providing good quality realistic 3D-graphics
without specialized hardware. A major problem will be to attain
acceptable interactive response times since large amounts of traffic will
be exchanged over LANs, which may involve bridges that forward packets.

Lorenzo decided to use octrees, instead of ray tracing, in order to
control the selection of operational points within the ranges of
several important tradeoffs. In several aspects of the view generation
processing, one has to trade degree of parallelism versus network
traffic (and the consequent delays). For example, lighting and shading
of solid objects can be all done at the machine where the prototype
operator sits (originator); this incurs minimum network traffic but
has zero parallelism for this stage of the processing. In contrast,
the lighting and shading can be distributed among all the cooperating
machines attaining maximum parallelism, but then there is a large
volume of traffic from the cooperators towards the originator. The use
of octrees will enable the selection of the parallelism/traffic mix
desired, depending on LAN traffic and computational load at the
cooperators.


** Bob Braden **

The RFC that Bob and Van are co-authoring has been delayed as a
consequence of Van's teaching load.

Bob then summarized the current political environment of the task force.

Representatives from several US Government agencies have formed the
Federal Research Internet Coordinating Council (FRICC). This group makes
some decisions that may affect networking research funding, and plans to
move towards the goal of a National Research Internet.  The FRICC does
not have any obligation to take the IAB's advice, but it has expressed
interest in benefiting from the IAB expertise.

The main networking concerns of the FRICC deal with issues like
administration, accounting, access control. An unresolved issue is
whether each FRICC member network will have its own Network
Information Center. There is a possibility of sharing high-speed
lines, but probably no sharing of low-speed ones (like 56 kbps).

The IRI TF will issue a report on Internet architecture improvements
needed to meet new demands-- before OSI arrives. The report will
recommend research areas worthy of funding. The FRICC and other
agencies are expected to provide the funds. One area already
identified is scalable routing protocols. Van suggested this is
probably the principal deficiency of current architecture; all the
hierarchical models he has seen assume a central authority that
defines the hierarchical levels.  In Van's opinion, good routing
algorithms will require theoretical breakthroughs.

On July 1, the NSF Backbone Net will start operation with MCI lines,
IBM's RT switches (with pieces of Berkeley code), and management by
MERIT.  The routing protocol is DECNET Phase 5, which is a
proposed ANSI standard.  The regional NSF nets sorely need funds.


2. RECURSIVE RPC ARCHITECTURE -- Dave Cheriton

Since most of the attendees had read Dave Cheriton's paper on the
subject, he decided to suppress a planned presentation and instead go
directly into discussing the paper.

Bob questioned the paper's premise of replacing TCP with VMTP; he
raised the question of how VMTP would support Telnet or file transfer.
Dave Cheriton replied that TCP, NETBLT, and other protocols are not
excluded by VMTP; such alternative protocols would be used for slow,
high-delay nets.

Dave guessed that with good implementations and friendly conditions,
recursive RPCs (over Sun/3 and Ethernets) would add only about 100
microsecs to the local processing case.

Van warned about bootstrapping-like problems, such as using the file
system to start the file system. Eric Cooper pointed out the advantage
of solutions that use generalized mechanisms for common cases and
fall back to specialized protocols for ill-behaved, unusual cases.

Dave Cheriton stated that it is necessary to have a standard Internet RPC
to preclude proliferation of specialized protocols. He suggested that a
research program for standardization should look at issues like
monitoring gateways and accessing their services through RPCs. The
program should also look into transport-level gateways, since transport
is needed for RPC. Lorenzo indicated that Internet-wide IPC based on
recursive RPC will require simple and widely-available mechanisms for
distributing service-interfaces of remote procedures. He also endorsed
Dave's plea for an Internet RPC standard, unless one is willing to adopt
whatever ISO decides is good for the entire World. Bob Braden replied
that the standardization on ASN.1 may preempt wide acceptance of another
RPC-presentation representation.

Eric noticed that there are two areas of tough problems facing
RPC-based internet operation. One area involves the bootstrapping of
net nodes up to the point where they can interact via RPCs. Another
area involves the language bindings of RPC interfaces.  Eric advised a
presentation-layer approach to the migration of the Internet toward an
RPC-oriented interaction.


3. REPORT ON NETWORKING AT SUN  -- Bill Nowicki

3a NETWORK PERFORMANCE

Bill Nowicki has been working on performance improvements for SunOS
networking code, and in the process he has instrumented the code.
Some interesting results were:

* 90% of ARP resolutions use the same address as previous one.

* 80% of incoming requests go to the same socket as previous one.

* IP header checksum can be performed in-line when there are no IP
  options.  The implied lesson is that every IP implementation should
  have the fastest IP checksum possible--  an ad-hoc one if necessary.
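
The first two measurements suggest that remembering just the previous
lookup pays off.  The following Python sketch shows such a one-entry
("one-behind") cache placed in front of a slower table lookup.  The
class name and the sample ARP table are hypothetical and are not taken
from the SunOS code.

```python
class OneBehindCache:
    """Keep only the most recent (key, value) pair in front of a slow lookup."""

    def __init__(self, slow_lookup):
        self._slow_lookup = slow_lookup
        self._valid = False
        self._last_key = None
        self._last_value = None
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        # Fast path: same key as the previous call.
        if self._valid and key == self._last_key:
            self.hits += 1
            return self._last_value
        # Slow path: consult the full table and remember the answer.
        self.misses += 1
        value = self._slow_lookup(key)
        self._last_key, self._last_value, self._valid = key, value, True
        return value


# Made-up ARP table for illustration only.
arp_table = {"10.0.0.1": "08:00:20:00:00:01", "10.0.0.2": "08:00:20:00:00:02"}
cache = OneBehindCache(arp_table.__getitem__)
for ip in ["10.0.0.1"] * 9 + ["10.0.0.2"]:
    cache.lookup(ip)
```

With a traffic pattern like the one measured, nearly every lookup takes
the fast path and the full table is consulted only when the key changes.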

However, the official SunOS 4.0 benchmarks did not show an overall
performance improvement as a result of the IP upgrades he did. He thinks
this was due to additional software layers and the benchmark
configurations.

Sun Microsystems experienced an Ethernet "melt-down" caused by
high-priority Ethernet error messages. This was fixed by suppressing
display of an error message until 2 sec after displaying a similar
one.
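
The fix amounts to rate-limiting each kind of message.  A minimal
Python sketch follows, under the assumption that "similar" means "same
message kind"; the class and its names are hypothetical, not Sun's code.

```python
class ErrorMessageLimiter:
    """Suppress a message kind for hold_off seconds after it was last shown."""

    def __init__(self, hold_off_seconds=2.0):
        self.hold_off = hold_off_seconds
        self._last_shown = {}            # message kind -> time last displayed

    def should_display(self, message_kind, now):
        last = self._last_shown.get(message_kind)
        if last is not None and now - last < self.hold_off:
            return False                 # too soon after a similar message
        self._last_shown[message_kind] = now
        return True


limiter = ErrorMessageLimiter()
shown = [limiter.should_display("collision", t) for t in (0.0, 1.0, 1.9, 2.0)]
```

Passing the clock value in as an argument keeps the sketch testable; a
real driver would read the system clock instead.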


3b. ADAPTIVE NFS RETRANSMISSION

Bill Nowicki has looked into adaptive setting of NFS parameters like
timeouts, readsize, and writesize, using Van's algorithms.  Currently,
these parameters are set by the network administrator.
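
The minutes do not spell out which algorithm was tried.  As an
assumption, the Python sketch below uses the smoothed mean/deviation
round-trip estimator in the form later standardized for TCP in RFC 6298
(gains of 1/8 and 1/4, timeout = mean + 4 * deviation).  The class name
is hypothetical, and applying it to NFS requests is only an
illustration.

```python
class AdaptiveTimeout:
    """Retransmission timeout from smoothed RTT mean and deviation."""

    def __init__(self, min_rto=0.0):
        self.srtt = None                 # smoothed round-trip time
        self.rttvar = None               # smoothed mean deviation
        self.min_rto = min_rto

    def sample(self, rtt):
        """Feed one measured round-trip time; return the new timeout."""
        if self.srtt is None:            # first measurement
            self.srtt = rtt
            self.rttvar = rtt / 2.0
        else:
            err = rtt - self.srtt
            self.rttvar += (abs(err) - self.rttvar) / 4.0
            self.srtt += err / 8.0
        return self.rto()

    def rto(self):
        if self.srtt is None:
            raise ValueError("no RTT samples yet")
        return max(self.min_rto, self.srtt + 4.0 * self.rttvar)


timer = AdaptiveTimeout()
timeouts = [timer.sample(rtt) for rtt in (1.0, 1.0, 2.0)]
```

The deviation term is what lets the timeout widen quickly when response
times become erratic, which matters for a server with bursty load.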

The performance limitations on NFS transactions are:

1.- Lookups: speed of server.

    This type of transaction will dominate the traffic load.

2.- Reads: speed of network.

3.- Writes: speed of disk.

Joel Emer pointed out that if several requests are queued, then lookup
speed cannot be the limiting factor. Steve Deering observed the
importance of good RTT estimation for RPC service, which underlies NFS.
Van suggested one really wants rate-based control over RPC, to space out
multiple independent requests.

Bill wants to test NFS across the ARPANET. Bob Braden noted that network
packet losses can really hurt NFS because it fragments and reassembles large
(16KB) IP datagrams.  Bill's observations of NFS show small response
times, but large variances.
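
Bob's point can be made concrete with a little arithmetic: a fragmented
datagram is lost if any one of its fragments is lost.  The Python
sketch below assumes a 1500-byte MTU, a 20-byte IP header, an 8-byte
UDP header, and independent packet losses.  None of these figures come
from the minutes, and a long-haul path would typically have a smaller
MTU and therefore more fragments.

```python
import math


def fragments_needed(ip_payload_bytes, mtu=1500, ip_header=20):
    """Number of IP fragments needed to carry ip_payload_bytes."""
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil(ip_payload_bytes / per_fragment)


def datagram_loss_probability(packet_loss, fragments):
    """Probability that at least one of the fragments is lost."""
    return 1.0 - (1.0 - packet_loss) ** fragments


n_frags = fragments_needed(16 * 1024 + 8)       # 16KB of data plus UDP header
p_lost = datagram_loss_probability(0.01, n_frags)
```

Under these assumptions a 1% packet loss rate turns into roughly an 11%
loss rate for whole 16KB datagrams, each of which must then be resent
in full.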

ACTION: Bill will report to the group on a high-resolution timer chip
that can be plugged into a Sun board.


4. LARGE FILE TRANSFER PERFORMANCE -- Joel Emer

Joel presented vugraphs (from Tim Sheppard) showing data transfer
performance measurements using both an old TCP version and 4.3 TCP.


5. END-TO-END PROTOCOLS TF, WHITHER ? (First Day) -- Bob Braden

Bob Braden invited ideas about new research endeavors for the TF.

The discussion largely covered two performance issues:

 ** Gateways randomly dropping packets when the queues are full.
 
    Van suggested this should be investigated before the gateways
    reach a desperate condition of congestion.  He plans to do some
    instrumented measurements on a test-bed at LBL: a heavily-loaded 
    gateway with an instrumented kernel. 
    
 ** A rate-based algorithm for NFS.
 
    Van still plans to work with Bill Nowicki on this.  He thinks that
    injecting some randomness into the timer may help prevent destructive
    phasing.  As an example, he added a small random number to TCP RTO and
    found a dramatic improvement in ARPANET performance (!). The offsets
    break host retransmission synchronization.  
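
A minimal Python sketch of the idea of adding a small random offset to
a retransmission timeout.  The function name and the 10% bound are
assumptions for illustration; the minutes do not say how large Van's
random offset was.

```python
import random


def jittered_timeout(base_rto, max_jitter_fraction=0.1, rng=random):
    """Return base_rto plus a random offset of up to the given fraction.

    Hosts that lost packets at the same instant then retransmit at
    slightly different times instead of in lockstep.
    """
    return base_rto + rng.uniform(0.0, max_jitter_fraction * base_rto)


# Two hosts with the same base timeout, each with its own random stream.
host_a = jittered_timeout(1.0, rng=random.Random(1))
host_b = jittered_timeout(1.0, rng=random.Random(2))
```

The offset only needs to be large relative to the time it takes to
send a packet, so that the retransmissions no longer collide.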
    
Bob Braden observed that in order to deploy VMTP for applications like
network management, we must have a self-configuring VMTP implementation.
In addition, VMTP must be usable over thin bandwidth nets (like
Internet); this requires that the VMTP communicating ends exploit the
information about capacity available at intermediate gateways, which
implies getting the info out of the gateways. Dave Cheriton commented he
has a student looking into setting VMTP rates for rate-based flow
control.

Van suggested that the consequences of running VMTP everywhere in the
Internet are an interesting (hard) research problem.  He believes that
rate-based flow control will tend to frequency lock ("phase entrainment
to self-destruction").  Dave Cheriton challenged the idea that there is
a PhD-level research problem here. That prompted an extended debate.

Van proposed to demonstrate formally that protocols based on
deterministic algorithms fall into such undesirable stable states. Van
guesses that these undesirable stable states occur regardless of link
speeds. In fact, protocols seem to gravitate toward worst-case resource
utilization states, like lots of retransmissions that congest the net. The
problem of congesting retransmissions will really show-up on T1-based
nets, although 56kbps links do not have enough storage capacity for the
problem to produce catastrophes.

Dave Cheriton remarked that the feedback on buffer state provided by
Van's window algorithms is too indirect and too old to be of real value.
Van replied that the TCP buffer state feedback has proper frequency and
speed of response; feedback faster than the fundamental frequency of the
Internet, the RTT, would lead to an unstable system. Moreover, in
networks with high delay-bandwidth products, reserving bandwidth in order
to get fresh buffering state information would waste large portions of
the net bandwidth.  Van strongly recommended that fast nets be provided
with larger buffer capacity than the Internet, since lack of adequate
buffering is the single major Internet problem.

Dave Cheriton backed a hop-by-hop buffer reservation scheme.  Van retorted
that such a scheme would lead to "instant deadlock", and suggested that
instability can be formally proved. The main argument is that many of
the interferences (congestions) occur far away from places where changes
have to be effected, like controlling host transmission rates.

Joel suggested that a solution might be probabilistic reservations,
which would be like the telephone system.  Van replied that virtual
circuits would not provide a good solution to the congestion problem in
very large nets. In fact, capacity reservation increases the resources
allocated as the net diameter increases, since resources are held for a
time length which is a function of the RTT.

The first day of the meeting was adjourned at this point.


5. RATE-BASED BUFFER MANAGEMENT FOR GATEWAYS -- Dave Cheriton

Van was unable to attend the second day, due to his teaching load.
 
Dave, having accepted the challenge from the previous day, presented his
ideas about managing congestion in the Internet; the scenario assumes
that rate-based, flow-control protocols like VMTP produce a substantial
portion of the traffic.  These minutes report only major points and
points not included in the hardcopies of the vugraphs distributed to
meeting attendees.

Dave wants to avoid exacerbating congestion at constipation points, and
thus proposes an alternative to the exchange of control messages. His
alternative is backpressure that propagates as an increase in buffer
demands, at gateways, along the transmission paths.  [Van had pointed
out that congestion feedback propagation can take a long time to reach
original transmitters, across the net from congestion points.]

Under Dave's flow control scheme, the traffic flowing out of a gateway
still equals the traffic flowing in; this, however, requires a new
"output", namely duplicate packets dropped by gateways, which keep
soft state to detect duplicates. Craig wondered whether the savings in
bandwidth is worth the duplicate-detection processing. Joel Emer warned
that Dave's flow control scheme may require very large memories at
gateways.

Dave presented his concept of a gateway buffer pool, the underlying
assumptions, a rate-control algorithm for gateways, and open issues.
He concluded with the conjecture that the memory needed per buffer
pool, at a gateway, is given by:

			K * (bandwidth * prop.delay )

where:
	K: some average hop count through the network
	bandwidth: link bandwidth
	prop.delay: per net hop
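
To give a feel for the magnitudes involved, here is the conjectured
formula evaluated in Python.  The sample values (K = 5, a T1 link at
1.544 Mbps, 10 ms of propagation delay per hop) are made up for
illustration, and the conversion from bits to bytes is an assumption
about units that the conjecture itself does not state.

```python
def gateway_buffer_bytes(k_hops, bandwidth_bps, prop_delay_s):
    """Conjectured buffer-pool memory: K * (bandwidth * prop.delay).

    bandwidth_bps is in bits per second and prop_delay_s in seconds;
    the result is converted from bits to bytes.
    """
    return k_hops * bandwidth_bps * prop_delay_s / 8.0


# Hypothetical figures, not from the presentation.
pool_bytes = gateway_buffer_bytes(k_hops=5, bandwidth_bps=1.544e6,
                                  prop_delay_s=0.010)
```

With these figures the pool comes to about 9650 bytes; the requirement
grows linearly with link speed, so the same path at 1 Gbps would need
several megabytes per pool.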

Dave's final argument was that VMTP needed rate control anyway, because
of the problem of losing back-to-back packets on a LAN.  Bob Braden
suggested this is a link-level problem and ought to be solved at the link
level (e.g., in hardware), rather than at the transport level. Dave
questioned the advisability of hardware solutions to a problem which may
be better solved by higher-level software.


6. PROTOCOL ENGINE -- Greg Chesson

Greg gave a thorough presentation on the Protocol Engine (PE) and the
Xpress Transfer Protocol (XTP). Greg handed out an XTP/PE overview,
and an XTP protocol definition; accordingly, these minutes report only
highlights of the discussion.

XTP is a lightweight transport implemented by PE and it provides
distributed processing communication support similar to that of VMTP.
XTP aims at completing processing of packets within packet arrival
time (order of 10 microseconds), and the system architecture is intended
to scale up to 1 Gbps.  All PE specs and the chip kit will be placed in
the public domain, and standardization is being pursued through ANSI
X3S3.  Ten companies are prototyping the PE, including Silicon Graphics
and Apollo; HP probably will follow suit, while IBM is unlikely to do so.

He has a simulator program in "C" which he calls the "Reference Model".
Its XTP is only 911 lines of "C", while the support code is 2300 lines.
He essentially compiles the "C" for XTP into microcode for the chip.

Greg considers addressing issues to be orthogonal to transport service.
XTP treats different address types (logical, group,...) as typed fields.
This address independence is claimed to buy PE compatibility with other
protocol architectures.

There is no significant difference between XTP datagrams and
connections.  A connection is established with arrival of the first
packet, and an unlimited number of concurrent open connections are
allowed.  In addition, the PE can do traffic load splitting w/o source
routing -- "path switching" in XTP lingo.

Greg indicated that at FDDI rates (100 Mbps), 32 bit transport sequence
numbers are OK, however 64 bit numbers are needed at 1 Gbps.
Furthermore, 64-bit and 32-bit sequence numbers can interoperate, using
intermediate gateways which keep connection state.
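
The reasoning behind these figures is presumably the time it takes for
the sequence space to wrap.  The Python sketch below works that out
under the assumption, not stated in the minutes, that a sequence number
counts bytes and the link is kept full.

```python
def wrap_time_seconds(seq_bits, link_bps):
    """Seconds to send 2**seq_bits bytes at link_bps bits per second."""
    return (2 ** seq_bits) * 8 / link_bps


fddi_32 = wrap_time_seconds(32, 100e6)      # about 343.6 seconds
gbps_32 = wrap_time_seconds(32, 1e9)        # about 34.4 seconds
gbps_64 = wrap_time_seconds(64, 1e9)        # thousands of years
```

At 100 Mbps a 32-bit space lasts several minutes, comfortably longer
than a packet is likely to survive in the net.  At 1 Gbps it wraps in
about half a minute, which leaves much less margin against old
duplicates, while a 64-bit space effectively never wraps.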

PE chips communicate through a bus using a shared memory paradigm and
without the impact of user-space paging.  The functional partitioning,
among processing units, was not influenced by chip technology (how much
can go into a chip).  The PE maintains data in contiguous memory and
keeps a pointer to it.  This pointer is claimed to leave open the
possibility of interfacing to diverse operating environments and memory
management schemes. Joel Emer warned that the contiguous-data requirement
could be restricting.

One of the interesting aspects of XTP is its reliable multicasting model,
which provides reliable delivery from a single source to N destinations.
It uses an (ID, key) pair unique in an Internet; the key field is stored
at intermediate nodes to label "next-logical-circuit" (a la virtual
circuit).  All timers are located at the sender side, and a receiver
sends acks only on request.  If the receiver sees an error, it sends a
multicast-Reject (REJ) specifying missing sequence numbers, and the
transmitter retransmits.  Retransmissions are go-back-N and require a
handshake (between retransmission requester and sender), but unnecessary
retransmissions are eliminated.

The use of negative acks precludes positive-ack implosions towards the
sender, and the multicaster host normally does not require positive
acks.  There is, however, a way for receivers to send positive acks if
the sender wants to know the number of multicast copies received.  The N
receivers use a "slotted phase response", choosing a slot by hashing on
the machine ID.
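
A minimal sketch of slot selection by hashing, to show how receivers
could spread their positive acks over time.  The minutes do not give the
hash function, the number of slots, or the slot length, so SHA-256 and
all names and parameters below are assumptions made for illustration.

```python
import hashlib


def response_slot(machine_id, num_slots):
    """Map a machine ID to a slot in the range [0, num_slots).

    SHA-256 is an arbitrary stand-in for whatever hash XTP uses.
    """
    digest = hashlib.sha256(machine_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_slots


def response_delay(machine_id, num_slots, slot_seconds):
    """Delay before this receiver sends its positive ack."""
    return response_slot(machine_id, num_slots) * slot_seconds
```

Each receiver computes its own slot without coordination, so acks from
different machines tend to arrive in different slots rather than all at
once; two machines can still collide when their IDs hash to the same
slot.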

A critical multicasting parameter is the length of time a multicaster
must hold data waiting for retransmission requests ("reject transmission"
messages); it waits for two RTT's.
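
The two-RTT holding rule can be pictured as a buffer that frees data
once it has been held for twice the round-trip time.  This is a sketch
under assumptions: the minutes do not say how the RTT is measured or how
the buffer is organized, and the class below is hypothetical.  Times are
integer milliseconds supplied by the caller.

```python
class HoldBuffer:
    """Keeps multicast data available for retransmission for 2 * RTT."""

    def __init__(self, rtt_ms):
        self.hold_ms = 2 * rtt_ms   # holding time from the minutes
        self.entries = {}           # seq -> (sent_at_ms, payload)

    def add(self, seq, payload, now_ms):
        """Record a packet sent at time now_ms."""
        self.entries[seq] = (now_ms, payload)

    def expire(self, now_ms):
        """Drop entries held for at least 2 * RTT; return their seqs."""
        done = sorted(seq for seq, (sent_at, _) in self.entries.items()
                      if now_ms - sent_at >= self.hold_ms)
        for seq in done:
            del self.entries[seq]
        return done
```

A reject that arrives after its data has expired can no longer be
served from the buffer, which is why the holding time is a critical
parameter.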

Eric Cooper warned about perils of calling XTP's multicast "reliable",
because one cannot state a clean semantics (like for point-to-point) for
all receivers. The semantics concerns whether the multicast targets
received a multicast or not, and it should be stated independently of
"probabilities" (of receiving a multicast).


7. MULTICASTING -- Steve Deering

There was some discussion of Steve's draft RFC containing the new
multicast spec.  Minor changes were suggested, but the group was
generally very pleased with the memo.

Eric Cooper suggested proposing an application that benefits from IP
multicasting. This would encourage implementation of this IP option.

Sun is open to integrating multicasting into SunOS 4.1. Craig raised the
issue of a possible long gap from the time IP vendors provide level-1
multicast conformance to the time they provide full conformance.

ACTION: Steve will revise and submit the RFC to Jon Postel by next week.

ACTION: Bob Braden will lobby to get the RFC published as soon as
possible, and to alert the Internet community of our intention of making
it a standard.


8. MERCURY TRANSPORT --  Joel Emer

Joel presented performance results for Mercury's RPC running on top of
TCP.

The RPC performance for long strings (arguments) deteriorated faster
than Stream Call performance; this was caused largely by naive
marshalling. Joel found large variances in Stream Call times: some ran
as long as 250 msec, but most were under 50 msec.

Joel characterizes his RPC service as an RPC layer on top of somebody
else's transport layer. The RPC service is independent of Mercury's
paradigm and of programming language. The language interface to this
RPC is a veneer which defines {the type of service/presentation ?}.
The C language veneer is ready, and the LISP one is in the works.
There's already a C stub generator coded in LISP. The RPC presents a
request-response service to the client process, but the transport has
streaming characteristics. The client can send requests w/o responses
and receive exception notifications. One of the features needed by the
Mercury RPC (and missing in VMTP) is to batch several RPC requests as
a single packet group and receive back individual responses. Joel will
add performance optimizations, like state caching, before publishing
results.
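
The batching feature mentioned above (several requests sent as one
group, with responses coming back individually) can be sketched as
below.  This is not Mercury's interface; the function names, the
dictionary layout, and the use of in-process calls in place of a
transport are all assumptions made for illustration.

```python
def make_batch(calls):
    """Bundle (procedure_name, args) pairs into one request group.

    Each request gets an id so its response can be matched later.
    """
    return [{"id": i, "proc": proc, "args": args}
            for i, (proc, args) in enumerate(calls)]


def serve_batch(batch, procedures):
    """Execute a request group, yielding one response per request."""
    for req in batch:
        result = procedures[req["proc"]](*req["args"])
        yield {"id": req["id"], "result": result}
```

Because serve_batch is a generator, the caller can consume each response
as soon as it is produced instead of waiting for the whole group.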


9. END-TO-END PROTOCOLS TF, WHITHER ? (Second Day)

Bob invited more ideas on research topics, activities, and
directions for the TF.

Dave Cheriton gave a status report on the DSAB/Communication Protocols
TF.  Dennis Perry's departure from DARPA left the DSAB without a strong
supporter and without a driver of its program. The DSAB seems to lack
focus, influence, and support, and this has dampened participation of
good contributors in the DSAB, which therefore has produced limited
results.

Bob Braden suggested that the END2END TF might organize more open
meetings (like seminars, symposiums) in order to disseminate the TF ideas
and to get new ideas.  The whole TF, as a group, could undertake the
organization of a conference. Candidate topics are: RPC, performance,
multicasting, hardware architecture for protocol accelerators,
transport-level authentication.  ACM sponsorship was suggested.

Dave Cheriton proposed that DARPA/FRICC support participation of IAB
members in ANSI/ISO groups, so they can feed the Internet research
into the standards process. At present, the USA does not have a
transport expert rep in ISO. The problem is time... there are typically
4 ANSI meetings of 4-5 days each per year!

It was commented that ISO is open to another ASN.x (Abstract Syntax
Notation) if a proposal is worth the existence of two standards. Major
drawbacks of ASN.1 are the extra processing required by its poor data
alignment and by the ever-present tags.
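
To make the tag and alignment complaint concrete, the sketch below
compares a fixed-layout record with the same two integers encoded as
tag-length-value items.  It ASSUMES the Basic Encoding Rules (the
minutes do not name an encoding), handles only non-negative integers
with short-form lengths, and omits any enclosing SEQUENCE.  The function
names are hypothetical.

```python
import struct


def ber_integer(n):
    """Encode a non-negative int as a BER INTEGER (tag 0x02)."""
    if n < 0:
        raise ValueError("sketch handles non-negative integers only")
    # Minimal two's-complement length, leaving room for a zero sign bit.
    length = n.bit_length() // 8 + 1
    if length > 127:
        raise ValueError("sketch handles short-form lengths only")
    return bytes([0x02, length]) + n.to_bytes(length, "big")


def fixed_record(a, b):
    """Two unsigned 32-bit big-endian fields at fixed offsets 0 and 4."""
    return struct.pack(">II", a, b)


def ber_record(a, b):
    """The same two values as consecutive tag-length-value items."""
    return ber_integer(a) + ber_integer(b)
```

In the fixed record the second field always starts at byte 4.  In the
tagged form its position depends on the length of the first value, so a
decoder has to parse each tag and length in turn and cannot rely on
word-aligned loads.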

Transport level security was suggested as a research topic for the TF.
Ferrari of UCB (from a DSAB TF) has a security model.  Eric pointed out
that the Andrew network does security quite well, with multiple
compartments.  Perhaps we could get someone from Kerberos.


10. ACTION SUMMARY
 
ACTION: Steve Deering to meet with Mike Karels to discuss multicasting
I/O interface. Mike wants to preserve the transparency of multicast
to network interfaces. [effectively DONE]
 
ACTION: Joel Emer, Van Jacobson, and Bob Braden to check out the rumor
about Ultrix, and attempt to fix it if it is broken [DONE: it was
not broken].
 
ACTION: Bob Braden and Dave Clark to try to persuade DARPA to not
transfer this funding to another activity. [OBE]
 
ACTION: Bill Nowicki, Dave Cheriton: Oversee porting VMTP to SunOS.
[DONE] 

ACTION: Joel Emer to report to Dave Clark the urgent desire of task force 
members to start experimenting with the MIT network simulator. 
 
ACTION: Bill Nowicki to report to the group on a high-resolution timer chip
that can be plugged into a Sun board.
 
ACTION: Steve to revise and submit multicasting RFC to Jon Postel by 
next week [DONE].
 
ACTION: Bob Braden to lobby to get the RFC published as soon as possible,
and to alert the Internet community of our intention of making it a
standard. [DONE]
 
ACTION: Bob Braden to talk to Dan Lynch about publicizing BSD/XTCP.

ACTION: Bill Nowicki to publicize Sun's "747 transport" protocol.


11. NEXT MEETING --

The next two TF meetings have been planned: a video conference on June 29
and a face-to-face meeting on October 18-19, 1988.  [Ed note: neither took
place as planned.  Instead, a one-day meeting was held the day before
SIGCOMM '88 at Stanford, and a full two-day meeting was held at MIT
Nov. 3-4.]




