MDEC Decompression goes through three steps
1) RLE decompression
2) IDCT conversion
3) Y-to-mono (or YUV-to-RGB) conversion
RLE is quite simple. Although, I've just discovered that it contains some hidden features: rounding, saturation and special quant table disable feature. Anyways, I am quite sure that I've figured out the exact RLE output. Ie. I am SURE that I know what exact values get passed to the IDCT section.
Y-to-mono is very simple: 8bit saturation, and optional signed-to-unsigned conversion. The only special case is that the saturation bugs when the input value exceeds 9bit range.
IDCT is not simple. I've spent a couple of days on running hardware tests, but I am absolutely unable to reproduce this step without rounding errors.
Basically it's doing two 8x8 matrix multiplications:
TempMatrix = DiagonallyMirrored.InputMatrix * ScaleTableMatrix
OutputMatrix = DiagonallyMirrored.TempMatrix * ScaleTableMatrix
InputMatrix is the RLE decoded data (signed 11bit values).
TempMatrix contains some temporary result (signed numbers with unknown amount of bits)
OutputMatrix is the Y data for Y-to-Mono conversion (signed 9bit values or bigger).
ScaleTableMatrix contains some signed 13bit values (which be assigned as an array of 16bit values via an MDEC command; the hardware uses only the upper 13bit of those 16bit values).
And "Diagonally.Mirrored" means that matrix is multiplication is done "column-by-column" (instead of the usual row-by-column).
Unfortunatley, the matrix multiplications include several multiplications, and additions. The hardware may strip fractional parts after the multiplications, and/or after ther additions, possibly stripping different amounts of bits in the TempMatrix and OutputMatrix steps, and possibly stripping some final fraction before passing the result to the Y-to-Mono section.
At least in some cases, the result is rounded UP before stripping fractions. And at least in same cases, that rounding is done only if the fractions are BIGGER than 0.5.
Other than that, I didn't find out anything useful, everytime that I've found a good formula for some test values... it did turn out to fail on other test values.
Is there anything on the decapped chip that could help on that problems? For example, knowing the width of the multiplication results would be great!
Did you already discover an MDEC section the chip? I hope there is some sort of an MDEC section (without the MDEC stuff being randomly scattered across the chip).
Some characteristic thing for identifying the MDEC hardware should be:
Several arrays with 64 entries (11bit rle output, 8bit quanttable, 13bit scaletable, and Nbit Y, Cr, Cb buffers).
Nbit * 13bit multiplication unit for the scaletable multiplication (since it's a matrix multiplication, there might be 8 such multipliers to speed up things).