From van@helios.ee.lbl.gov  Mon Dec 12 08:36:46 1988
Posted-Date: Mon, 12 Dec 88 08:37:35 PST
Received-Date: Mon, 12 Dec 88 08:36:46 PST
Received: from vs.ee.lbl.gov by venera.isi.edu (5.54/5.51)
	id AA24790; Mon, 12 Dec 88 08:36:46 PST
Received: by helios.ee.lbl.gov (5.59/s2.2)
	id AA04353; Mon, 12 Dec 88 08:37:38 PST
Message-Id: <8812121637.AA04353@helios.ee.lbl.gov>
To: Jon Crowcroft <jon@cs.ucl.ac.uk>
Cc: end2end@venera.isi.edu
Subject: Re: NFS and MMDF, 4.0 tcpdump
In-Reply-To: Your message of Sat, 10 Dec 88 16:10:13 GMT.
Date: Mon, 12 Dec 88 08:37:35 PST
From: Van Jacobson <van@helios.ee.lbl.gov>
Status: R

Jon -

There are a bunch of NFS bugs that result in chunks of nulls
randomly inserted in files (I just grepped our SunSpots archives
and came up with 30 different error reports).  If clients of the
Sun-2 server are running 4.0, you're probably being bitten by a
horrible bug in the NFS client caching that results in trashed
files on the server.  The last time we reported the problem to
Sun (it's one of the reasons we don't run 4.0), we heard it's
logged as bug id #1011525 and is fixed in 4.0.1.

The problems you're seeing with tcpdump have to do with nit and
streams bugs in 4.0.  Attached are excerpts from some messages
that describe the problems.  Bill Nowicki has fixed at least one
of these (the nit_if one) in some future release (>4.0.1?).  In
the interim, you can ftp a fixed nit_if.o from venera.isi.edu
or rtsg.ee.lbl.gov, or delete the two offending lines in your
source.  When I last talked to Bill, the nit_buf bug wasn't
fixed (in fact, our report didn't even make it into the bugs
database) but I'm sure it will be in some future release -- Sun
is pretty good about fixing bugs if you ever manage to get the
report past the "bozo filter" at the Sun bugs hotline.  There
are several different one-line changes in nit_buf.c that will
fix the problem (I think; I haven't tried any of them).

The crashes you see are a combination of various things:  The
streams code used by nit was taken straight from System V,
i.e., it's garbage.  Whenever any one of its fixed-size resource
pools is exhausted, it will crash.  This crummy code gets
over-exercised because the nit examples are wrong:  they suggest
using 8K buffers when, in fact, there are several limits built
into the system that prevent anything more than a 6K buffer from
working correctly (and, actually, 3K appears to be a better
number -- there's some discussion of why below).

Are you sure you really want to switch to 4.0?

 - Van

 -------------------
   ... the bug is in the system module net/nit_buf.c --
it botches flush requests:  After any FLUSHR, nit input changes
from buffered to unbuffered.

Not having source when we were doing 4.0 testing, we originally
found the bug using kadb & disassembling the kernel.  Once we
knew what was happening, we decided to kluge around the bug by
removing the I_FLUSH calls from etherfind/tcpdump.  If you've
got a section of code that looks like:

        /*
         * Flush the read queue, to get rid of anything that
         * accumulated before the device reached its final configuration.
         */
        if (ioctl(if_fd, I_FLUSH, (char *)FLUSHR) < 0) {
                perror("ioctl (I_FLUSH)");
                exit(1);
        }

put "#ifdef notdef"s around it & the problem may go away ...
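For reference, the patched section might look like the sketch
below.  setup_read_queue() is a hypothetical wrapper (in the real
etherfind/tcpdump source the code sits inline in the nit setup
path), and the flush counter exists only so the effect can be
checked.  Since nothing defines notdef, the preprocessor discards
everything up to the #endif, so no FLUSHR ever reaches nit_buf and
the disabled lines don't even need the ioctl/I_FLUSH declarations.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the nit setup code.  Returns how many
 * I_FLUSH requests were issued; with the block compiled out this
 * is always 0, so nit_buf stays in buffered mode.
 */
static int setup_read_queue(int if_fd)
{
    int flushes = 0;    /* I_FLUSH requests actually issued */

    assert(if_fd >= 0);
#ifdef notdef
    /*
     * Flush the read queue, to get rid of anything that
     * accumulated before the device reached its final configuration.
     */
    if (ioctl(if_fd, I_FLUSH, (char *)FLUSHR) < 0) {
        perror("ioctl (I_FLUSH)");
        exit(1);
    }
    flushes++;
#endif
    return flushes;
}
```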
 ---------------------
 ... The [truncation] problem seems to be
in the system module net/nit_if.c, routine snit_cpmsg.  Some code
that reads

        blen = nif->nif_bodylen;
        if (snaplen > 0) {
                if (blen > snaplen - sizeof (struct ether_header))
                        blen = snaplen - sizeof (struct ether_header);
 --             else
 --                     blen = 0;
        }

should have the "else blen = 0;" deleted.
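The effect of the stray else is easiest to see in a stand-alone
model of that fragment.  This is a sketch, not the kernel routine:
snap_body_len(), its "buggy" switch and the ETHER_HDR_LEN constant
(standing in for sizeof (struct ether_header), 14 bytes on
Ethernet) are hypothetical, and the arithmetic is done in plain int.

```c
#include <assert.h>

/* Ethernet header: two 6-byte addresses plus a 2-byte type field. */
#define ETHER_HDR_LEN 14

/*
 * Body length the snit_cpmsg fragment would keep for one packet.
 * With buggy != 0 the "else blen = 0;" is left in, so any body that
 * already fits within the snapshot has its length zeroed.
 */
static int snap_body_len(int bodylen, int snaplen, int buggy)
{
    int blen = bodylen;

    assert(bodylen >= 0);
    if (snaplen > 0) {
        if (blen > snaplen - ETHER_HDR_LEN)
            blen = snaplen - ETHER_HDR_LEN;   /* clip to the snapshot */
        else if (buggy)
            blen = 0;                         /* the two deleted lines */
    }
    return blen;
}
```

With a 68-byte snaplen, a 40-byte body comes back as 0 with the
else in place and as 40 without it, while a 1500-byte body is
clipped to 54 either way.  So the deletion only matters for packets
short enough to fit in the snapshot, which would be consistent with
the truncation described above.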
 ...
I can see at least two problems that might account for [a large
number of drops]:  There's code in net/nit_if.c:snit_ioctl that
sets the queue limit to 64*snaplen bytes.  nit_buf "hides" the
accumulated packets until it gets a full chunk or a timeout but,
from the time a chunk is full until your read completes, the
queue is credited with the 8128-byte chunk.  Since this exceeds
the 6784-byte limit, packets that arrive during this window will
be dropped (and, if your task gets preempted by "update" or
blocked by ^Z or ptrace, the window could be long).  I suspect
the way around this problem is to keep the chunk size less than
half the built-in limit, say around 3K bytes.
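The queue-limit arithmetic can be checked with a small sketch that
uses only the figures quoted above (the 6784-byte limit and the
8128-byte credited chunk); 6144 and 3072 stand in for the 6K and
3K sizes.  The function names are hypothetical, and reading "under
half the limit" as leaving room for a second chunk to accumulate
while the first one waits to be read is an assumption, not
something spelled out above.

```c
#include <assert.h>

/* Built-in queue limit quoted above (set from 64*snaplen in snit_ioctl). */
#define QUEUE_LIMIT 6784

/*
 * Nonzero if a full chunk, which the queue is credited with until
 * the read completes, exceeds the limit all by itself; packets
 * arriving during that window are then dropped.
 */
static int chunk_overflows(int chunk_bytes)
{
    assert(chunk_bytes > 0);
    return chunk_bytes > QUEUE_LIMIT;
}

/* Nonzero if the chunk stays under half the limit (the suggested rule). */
static int chunk_under_half_limit(int chunk_bytes)
{
    assert(chunk_bytes > 0);
    return 2 * chunk_bytes < QUEUE_LIMIT;
}
```

With these numbers the 8128-byte chunk overflows outright, a 6K
chunk fits but leaves no room for a second one, and 3072 bytes is
the only one of the three under 3392 (half of 6784).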

The other problem is stream buffer pool exhaustion:  The streams
code is taken straight from System V and AT&T has never heard of
dynamic resource allocation (considering how badly they code,
that's probably for the best).  Every packet you snap takes one
128-byte dblk & one 64-byte dblk from the stream buffer pool.
The system configuration file, param.c, allows for only 96
128-byte dblks.  Filling one 8192-byte nit "chunk" uses up 64 of
them so, if for some reason it takes more than 32 packet times
to complete a nit read, packets will be dropped because of dblk
pool exhaustion.  Dropping the chunk size to 3K should also help
here since it will give you 72 packet times of elasticity in the
system rather than 32.  You could also bump the NBLK128 define
in param.c up to a more reasonable value, say 160, and regen the
kernel (if anything else on the system is using nit, you almost
certainly will have to bump up this define).
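The dblk budget works out as below.  This is a sketch under stated
assumptions: the 128 bytes of chunk per snapped packet is inferred
from the figures above (an 8192-byte chunk uses 64 dblks), not a
documented constant, and the function names are hypothetical.  Each
packet held in a full chunk pins one 128-byte dblk until the read
completes; whatever is left over is the "elasticity".

```c
#include <assert.h>

/* 128-byte dblks allowed by NBLK128 in param.c as shipped. */
#define NBLK128_DEFAULT 96

/*
 * Chunk bytes per snapped packet.  Inferred from 8192 / 64 above,
 * not taken from the kernel source.
 */
#define CHUNK_BYTES_PER_PACKET 128

/* Packets (and so 128-byte dblks) tied up by one full chunk. */
static int packets_per_chunk(int chunk_bytes)
{
    assert(chunk_bytes >= CHUNK_BYTES_PER_PACKET);
    return chunk_bytes / CHUNK_BYTES_PER_PACKET;
}

/*
 * Packet times of slack: how many more packets can arrive while a
 * full chunk waits to be read before the dblk pool runs dry.
 */
static int dblk_slack(int nblk128, int chunk_bytes)
{
    int held = packets_per_chunk(chunk_bytes);

    assert(nblk128 >= held);
    return nblk128 - held;
}
```

dblk_slack(NBLK128_DEFAULT, 8192) gives the 32 packet times and
dblk_slack(NBLK128_DEFAULT, 3072) the 72 quoted above; with NBLK128
raised to 160 the same chunks give 96 and 136.  The model ignores
the 64-byte dblks and any other streams user drawing on the same
pool, which is why a second nit user pushes you to a bigger NBLK128.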