Re: DVI Incompatibility
Author: Van Jacobson
Date: 1995/06/27
Forum: ucb.digital-video

> There is one key point that you did not address. Henning did
> not make up the idea of having the first 16 bits be the first
> (uncompressed) sample. I have not checked the references, but
> Henning said this is what the IMA and Microsoft DVI ADPCM
> Wave type spec says. You may be one of many who have no great
> love for Microsoft, but I believe it was this algorithm that
> Jack Jansen was implementing.

Steve,

I was objecting to the variation Henning proposed because it doesn't work -- I'm not sure it matters whether Henning or Microsoft came up with the idea (I concede that both are good at coming up with things that don't work).

The DVI coder appears to be a simple, fixed, first-order predictor with a non-adaptive log quantizer on the slope. Because of this simple structure, certain choices of frequency and gain give large quantization errors. Try the following experiment: run a moderate-amplitude (say 1/2 FS), medium-frequency (say 500Hz) pure tone through the encoder & decoder. If the coder is implemented the way Jack did it, there's a half-cycle turn-on transient, then the output settles down to a fairly good representation of the input with no frequency structure other than the 500Hz. If the coder is implemented as Henning proposed, the turn-on transient is twice as long and there are ~10% spikes every 160 samples (i.e., there will be clearly audible 50Hz noise).

This happens because Jack's decoder only sees quantized values, so it settles down to self-consistent, linear behavior fairly quickly. Henning's decoder sees an unquantized value every 160 samples, then quantized values for the next 159. Since there can be a substantial difference between the reconstructions based on quantized vs. unquantized values, Henning's/Microsoft's scheme can introduce artifacts at the start of every frame -- i.e., every 20ms for 160-sample frames. This makes noise.

There is also a secondary problem with the longer turn-on transient in Henning's scheme (which happens because it essentially uses a 0th-order estimator on the first sample of a frame -- the slope after the first sample is always 0 -- which screws up the slope-tracking 1st-order estimator for several following samples). I imagine this would be audible whenever there were large changes in frequency content happening at small multiples of the frame time (e.g., a mixture of voiced & unvoiced phonemes), but I haven't tested this. (I'm sure it would be a much smaller effect than the 50Hz noise.)

Based on other things they've done, my impression is that Microsoft does not publish standards for the same reasons we do. Since it takes Microsoft many years to accomplish anything, their `standards' appear intended more as a strategic weapon to delay or derail implementation of new ideas by their more agile, cleverer competition -- if the competition does things right, Microsoft says it's non-standard; if they follow `the standard', they waste time in a dead-end rathole (a friend once showed me a long, long list of `standards' that Microsoft published for others then later totally ignored). If this is the case, it's not surprising that `the standard' doesn't work -- it was designed to not work. (E.g., a `standard' that requires an odd number of samples in a frame.)

Since we're already completely incompatible with `the standard' if we use sane frame sizes, I don't see that we gain anything by screwing up the coder just to be `less incompatible'.
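For concreteness, here's a minimal sketch of the decode step I'm describing -- my paraphrase using the usual IMA tables, not Jack's actual adpcm.c. The reconstruction is a first-order predictor driven entirely by 4-bit quantized slopes, with the step size walking a log-spaced table:

    /* Standard IMA step-size and index-adjustment tables. */
    static const int stepTab[89] = {
            7,     8,     9,    10,    11,    12,    13,    14,    16,    17,
           19,    21,    23,    25,    28,    31,    34,    37,    41,    45,
           50,    55,    60,    66,    73,    80,    88,    97,   107,   118,
          130,   143,   157,   173,   190,   209,   230,   253,   279,   307,
          337,   371,   408,   449,   494,   544,   598,   658,   724,   796,
          876,   963,  1060,  1166,  1282,  1411,  1552,  1707,  1878,  2066,
         2272,  2499,  2749,  3024,  3327,  3660,  4026,  4428,  4871,  5358,
         5894,  6484,  7132,  7845,  8630,  9493, 10442, 11487, 12635, 13899,
        15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
    };
    static const int idxTab[16] = { -1, -1, -1, -1, 2, 4, 6, 8,
                                    -1, -1, -1, -1, 2, 4, 6, 8 };

    /* Decode one 4-bit code; (*valpred, *index) persist across calls. */
    short decode_step(int code, int *valpred, int *index)
    {
        int step = stepTab[*index];
        int diff = step >> 3;                 /* reconstruct the quantized slope */
        if (code & 4) diff += step;
        if (code & 2) diff += step >> 1;
        if (code & 1) diff += step >> 2;
        if (code & 8) diff = -diff;           /* sign bit */

        *valpred += diff;                     /* first-order predictor update */
        if (*valpred >  32767) *valpred =  32767;
        if (*valpred < -32768) *valpred = -32768;

        *index += idxTab[code & 0x0f];        /* walk the step table */
        if (*index < 0)  *index = 0;
        if (*index > 88) *index = 88;

        return (short)*valpred;
    }

Since the encoder runs exactly the same update, both sides track the identical quantized reconstruction -- the self-consistency that Jack's framing preserves.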
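The frame-boundary difference between the two schemes then looks like this (again a sketch; the function names and the unpacked-code calling convention are mine):

    /* Jack's scheme: state simply carries across frames, so the
       decoder never sees anything but quantized values. */
    void decode_frame_jack(const int *codes, short *out,
                           int *valpred, int *index)
    {
        int i;
        for (i = 0; i < 160; i++)
            out[i] = decode_step(codes[i], valpred, index);
    }

    /* Henning's/Microsoft's scheme: the first 16 bits of each frame
       are the raw, unquantized first sample.  The predictor jumps
       from the quantized reconstruction to the unquantized input at
       every frame start, and that sample carries no slope (a
       0th-order estimate) -- the two effects behind the per-frame
       spikes and the longer turn-on transient.  Only the raw sample
       is carried here, per the description above; the step index
       just carries over. */
    void decode_frame_henning(short first_sample, const int *codes,
                              short *out, int *valpred, int *index)
    {
        int i;
        *valpred = first_sample;      /* unquantized reset */
        out[0] = first_sample;
        for (i = 1; i < 160; i++)     /* 159 codes -- note the odd count */
            out[i] = decode_step(codes[i - 1], valpred, index);
    }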
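And here's the 500Hz experiment spelled out as a throwaway test program (same assumptions as above; the encode step is the obvious mirror of the decode step, updating the same shared state so encoder and decoder stay in lockstep):

    #include <math.h>
    #include <stdio.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Encode one 16-bit sample; mirrors decode_step so both sides
       track the same quantized reconstruction. */
    int encode_step(int sample, int *valpred, int *index)
    {
        int step = stepTab[*index];
        int diff = sample - *valpred;
        int code = 0;

        if (diff < 0)          { code  = 8; diff = -diff; }
        if (diff >= step)      { code |= 4; diff -= step; }
        if (diff >= step >> 1) { code |= 2; diff -= step >> 1; }
        if (diff >= step >> 2)   code |= 1;

        decode_step(code, valpred, index);   /* shared state update */
        return code;
    }

    int main(void)
    {
        enum { FRAMES = 16, N = 160 };       /* 160-sample frames */
        static short in[FRAMES * N], outj[FRAMES * N], outh[FRAMES * N];
        int ev, ei, dv, di, i, f;

        /* 1/2 FS, 500Hz pure tone sampled at 8kHz. */
        for (i = 0; i < FRAMES * N; i++)
            in[i] = (short)(16384 * sin(2 * M_PI * 500.0 * i / 8000.0));

        /* Jack's framing: state carries straight through. */
        ev = dv = 0; ei = di = 0;
        for (i = 0; i < FRAMES * N; i++)
            outj[i] = decode_step(encode_step(in[i], &ev, &ei), &dv, &di);

        /* Henning's framing: the raw first sample resets both sides. */
        ev = dv = 0; ei = di = 0;
        for (f = 0; f < FRAMES; f++) {
            ev = dv = in[f * N];             /* unquantized reset */
            outh[f * N] = in[f * N];
            for (i = 1; i < N; i++)
                outh[f * N + i] =
                    decode_step(encode_step(in[f * N + i], &ev, &ei),
                                &dv, &di);
        }

        /* Worst error in the first few samples of each frame. */
        for (f = 1; f < FRAMES; f++) {
            int ej = 0, eh = 0;
            for (i = 0; i < 8; i++) {
                int a = outj[f * N + i] - in[f * N + i];
                int b = outh[f * N + i] - in[f * N + i];
                if (a < 0) a = -a;
                if (b < 0) b = -b;
                if (a > ej) ej = a;
                if (b > eh) eh = b;
            }
            printf("frame %2d: jack %5d  henning %5d\n", f, ej, eh);
        }
        return 0;
    }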
Why not simply say that "the DVI coder was developed by Jack Jansen and is loosely based on a Microsoft spec with the same name"? That way we're left with a working coder that has the structure that every DSP text in the world says it has to have, & Microsoft can continue to do whatever it is they're going to do with their `standard'.

 - Van