Re: DVI Incompatibility
Author: Van Jacobson
Date: 1995/06/27
Forum: ucb.digital-video

> There is one key point that you did not address. Henning did
> not make up the idea of having the first 16 bits be the first
> (uncompressed) sample. I have not checked the references, but
> Henning said this is what the IMA and Microsoft DVI ADPCM
> Wave type spec says. You may be one of many who have no great
> love for Microsoft, but I believe it was this algorithm that
> Jack Jansen was implementing.

Steve,

I was objecting to the variation Henning proposed because it doesn't work -- I'm not sure it matters whether Henning or Microsoft came up with the idea (I concede that both are good at coming up with things that don't work).

The DVI coder appears to be a simple, fixed, first-order predictor with a non-adaptive log quantizer on the slope. Because of this simple structure, certain choices of frequency and gain give large quantization errors. Try the following experiment: run a moderate-amplitude (say 1/2 FS), medium-frequency (say 500Hz) pure tone through the encoder & decoder. If the coder is implemented the way Jack did it, there's a half-cycle turn-on transient, then the output settles down to a fairly good representation of the input with no frequency structure other than the 500Hz. If the coder is implemented as Henning proposed, the turn-on transient is twice as long and there are ~10% spikes every 160 samples (i.e., there will be clearly audible 50Hz noise).

This happens because Jack's decoder only sees quantized values, so it settles down to self-consistent, linear behavior fairly quickly. Henning's decoder sees an unquantized value every 160 samples, then quantized values for the next 159. Since there can be a substantial difference between the reconstructions based on quantized vs. unquantized values, Henning's/Microsoft's scheme can introduce artifacts at the start of every frame -- i.e., every 20ms for 160-sample frames. This makes noise.

There is also a secondary problem with the longer turn-on transient in Henning's scheme (which happens because it essentially uses a 0th-order estimator on the first sample of a frame -- the slope after the first sample is always 0 -- which screws up the slope-tracking 1st-order estimator for several following samples). I imagine this would be audible whenever there were large changes in frequency content happening at small multiples of the frame time (e.g., a mixture of voiced & unvoiced phonemes), but I haven't tested this. (I'm sure it would be a much smaller effect than the 50Hz noise.)

Based on other things they've done, my impression is that Microsoft does not publish standards for the same reasons we do. Since it takes Microsoft many years to accomplish anything, their `standards' appear intended more as a strategic weapon to delay or derail implementation of new ideas by their more agile, cleverer competition -- if the competition does things right, Microsoft says it's non-standard; if they follow `the standard', they waste time in a dead-end rathole (a friend once showed me a long, long list of `standards' that Microsoft published for others then later totally ignored). If this is the case, it's not surprising that `the standard' doesn't work -- it was designed to not work. (E.g., a `standard' that requires an odd number of samples in a frame.)

Since we're already completely incompatible with `the standard' if we use sane frame sizes, I don't see that we gain anything by screwing up the coder just to be `less incompatible'.
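For concreteness, here's a minimal sketch of the decode step I'm describing -- my paraphrase using the usual IMA tables, not Jack's actual adpcm.c. The reconstruction is a first-order predictor driven entirely by 4-bit quantized slopes, with the step size walking a log-spaced table:

    /* Standard IMA step-size and index-adjustment tables. */
    static const int stepTab[89] = {
            7,     8,     9,    10,    11,    12,    13,    14,    16,    17,
           19,    21,    23,    25,    28,    31,    34,    37,    41,    45,
           50,    55,    60,    66,    73,    80,    88,    97,   107,   118,
          130,   143,   157,   173,   190,   209,   230,   253,   279,   307,
          337,   371,   408,   449,   494,   544,   598,   658,   724,   796,
          876,   963,  1060,  1166,  1282,  1411,  1552,  1707,  1878,  2066,
         2272,  2499,  2749,  3024,  3327,  3660,  4026,  4428,  4871,  5358,
         5894,  6484,  7132,  7845,  8630,  9493, 10442, 11487, 12635, 13899,
        15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
    };
    static const int idxTab[16] = { -1, -1, -1, -1, 2, 4, 6, 8,
                                    -1, -1, -1, -1, 2, 4, 6, 8 };

    /* Decode one 4-bit code; (*valpred, *index) persist across calls. */
    short decode_step(int code, int *valpred, int *index)
    {
        int step = stepTab[*index];
        int diff = step >> 3;                 /* reconstruct the quantized slope */
        if (code & 4) diff += step;
        if (code & 2) diff += step >> 1;
        if (code & 1) diff += step >> 2;
        if (code & 8) diff = -diff;           /* sign bit */

        *valpred += diff;                     /* first-order predictor update */
        if (*valpred >  32767) *valpred =  32767;
        if (*valpred < -32768) *valpred = -32768;

        *index += idxTab[code & 0x0f];        /* walk the step table */
        if (*index < 0)  *index = 0;
        if (*index > 88) *index = 88;

        return (short)*valpred;
    }

Since the encoder runs exactly the same update, both sides track the identical quantized reconstruction -- the self-consistency that Jack's framing preserves.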
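The frame-boundary difference between the two schemes then looks like this (again a sketch; the function names and the unpacked-code calling convention are mine):

    /* Jack's scheme: state simply carries across frames, so the
       decoder never sees anything but quantized values. */
    void decode_frame_jack(const int *codes, short *out,
                           int *valpred, int *index)
    {
        int i;
        for (i = 0; i < 160; i++)
            out[i] = decode_step(codes[i], valpred, index);
    }

    /* Henning's/Microsoft's scheme: the first 16 bits of each frame
       are the raw, unquantized first sample.  The predictor jumps
       from the quantized reconstruction to the unquantized input at
       every frame start, and that sample carries no slope (a
       0th-order estimate) -- the two effects behind the per-frame
       spikes and the longer turn-on transient.  Only the raw sample
       is carried here, per the description above; the step index
       just carries over. */
    void decode_frame_henning(short first_sample, const int *codes,
                              short *out, int *valpred, int *index)
    {
        int i;
        *valpred = first_sample;      /* unquantized reset */
        out[0] = first_sample;
        for (i = 1; i < 160; i++)     /* 159 codes -- note the odd count */
            out[i] = decode_step(codes[i - 1], valpred, index);
    }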
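And here's the 500Hz experiment spelled out as a throwaway test program (same assumptions as above; the encode step is the obvious mirror of the decode step, updating the same shared state so encoder and decoder stay in lockstep):

    #include <math.h>
    #include <stdio.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Encode one 16-bit sample; mirrors decode_step so both sides
       track the same quantized reconstruction. */
    int encode_step(int sample, int *valpred, int *index)
    {
        int step = stepTab[*index];
        int diff = sample - *valpred;
        int code = 0;

        if (diff < 0)          { code  = 8; diff = -diff; }
        if (diff >= step)      { code |= 4; diff -= step; }
        if (diff >= step >> 1) { code |= 2; diff -= step >> 1; }
        if (diff >= step >> 2)   code |= 1;

        decode_step(code, valpred, index);   /* shared state update */
        return code;
    }

    int main(void)
    {
        enum { FRAMES = 16, N = 160 };       /* 160-sample frames */
        static short in[FRAMES * N], outj[FRAMES * N], outh[FRAMES * N];
        int ev, ei, dv, di, i, f;

        /* 1/2 FS, 500Hz pure tone sampled at 8kHz. */
        for (i = 0; i < FRAMES * N; i++)
            in[i] = (short)(16384 * sin(2 * M_PI * 500.0 * i / 8000.0));

        /* Jack's framing: state carries straight through. */
        ev = dv = 0; ei = di = 0;
        for (i = 0; i < FRAMES * N; i++)
            outj[i] = decode_step(encode_step(in[i], &ev, &ei), &dv, &di);

        /* Henning's framing: the raw first sample resets both sides. */
        ev = dv = 0; ei = di = 0;
        for (f = 0; f < FRAMES; f++) {
            ev = dv = in[f * N];             /* unquantized reset */
            outh[f * N] = in[f * N];
            for (i = 1; i < N; i++)
                outh[f * N + i] =
                    decode_step(encode_step(in[f * N + i], &ev, &ei),
                                &dv, &di);
        }

        /* Worst error in the first few samples of each frame. */
        for (f = 1; f < FRAMES; f++) {
            int ej = 0, eh = 0;
            for (i = 0; i < 8; i++) {
                int a = outj[f * N + i] - in[f * N + i];
                int b = outh[f * N + i] - in[f * N + i];
                if (a < 0) a = -a;
                if (b < 0) b = -b;
                if (a > ej) ej = a;
                if (b > eh) eh = b;
            }
            printf("frame %2d: jack %5d  henning %5d\n", f, ej, eh);
        }
        return 0;
    }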
Why not simply say that "the DVI coder was developed by Jack Jansen and is loosely based on a Microsoft spec with the same name"? That way we're left with a working coder that has the structure that every DSP text in the world says it has to have, & Microsoft can continue to do whatever it is they're going to do with their `standard'.

 - Van