I think I now understand the multiplication algorithm used in the IDCT multiplier, and at least the bottom part of the circuit (MUX ARRAYS CONTROL and MUX ARRAYS, as well as the very bottom of the multiplication tree). I'm just writing down my thoughts, so this may not be the best explanation. I also haven't thought about fixed-point or negative numbers yet, so some aspects may still be missing.

The algorithm leverages the fact that the product of an N-bit number A and an M-bit number B can always be expressed as the sum of ceil(N/2) terms, each a small (possibly negative) multiple of B shifted left by an even number of bits. For example, assume N=4 and M arbitrary. Then every possible product can be written as the sum of N/2=2 terms:

```
0 * B = 0 * B
1 * B = 1 * B
2 * B = (4 - 2) * B
3 * B = (4 - 1) * B
4 * B = 4 * B
...
13 * B = (-4 + 1) * B    (+ 16 * B is assumed)
...
```
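To convince myself that two terms are always enough for N=4, here is a small brute-force check. It is only a sketch: the names `DIGITS` and `two_term_forms` are made up for illustration, and it assumes each term is one of -2, -1, 0, 1, 2 times B, with the second term weighted by 4 and the result taken modulo 16 (the "+ 16 * B is assumed" part).

```python
# Hypothetical check, not taken from the circuit: every 4-bit multiplier a
# can be written as d0 + 4*d1 (mod 16) with both digits in {-2, ..., 2}.
DIGITS = (-2, -1, 0, 1, 2)

def two_term_forms(a):
    """Return all (d0, d1) pairs with d0 + 4*d1 == a (mod 16)."""
    return [(d0, d1) for d1 in DIGITS for d0 in DIGITS
            if (d0 + 4 * d1 - a) % 16 == 0]

for a in range(16):
    print(a, two_term_forms(a))
```

Some values have more than one form (2 can be `2 + 4*0` or `-2 + 4*1`); the circuit picks one of them by looking at the bit triples described next.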

The circuit looks at bit triples spaced two bits apart (so adjacent triples overlap by 1 bit) to achieve this; an implicit bit -1 with value 0 is assumed below the LSB. Each 12-bit number has 12/2=6 of these triples, which correspond to the 12/2=6 14-bit MUX outputs L0-L5. Each triple selects the factor that the input is multiplied by to form Ln:

```
000 := 0
001 := 1
010 := 1
011 := 2
100 := -2
101 := -1
110 := -1
111 := "-0"
```
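This chart matches what is usually called radix-4 (modified) Booth recoding. As a rough software model of the triple extraction (the names `FACTOR` and `recode` are mine, not signal names from the schematic), something like this should reproduce it:

```python
# Triple (bit 2n+1, bit 2n, bit 2n-1) -> factor, copied from the chart above.
FACTOR = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
          0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

def recode(a, nbits=12):
    """Split an nbits-wide value into nbits/2 overlapping triples, lowest first,
    and return the factor selected by each triple."""
    padded = (a & ((1 << nbits) - 1)) << 1   # append the implicit bit -1 = 0
    return [FACTOR[(padded >> (2 * n)) & 0b111] for n in range(nbits // 2)]

print(recode(0b1101, nbits=4))   # [1, -1]
```

The list index n corresponds to Ln, so for a 12-bit input the result has six entries.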

MUX ARRAYS CONTROL looks at the triples and multiplies the input by 2 (shifts it left by 1) if necessary (Ln_D1,2); that's why Ln is 14 bits long. If the bits of a triple are all 0 or all 1, the input is treated as -1 (all 1s). Apart from MUX ARRAYS CONTROL, the 2nd bit of the triple (counting from 0, i.e. its most significant bit) is used to get the complement of the input (all triples with the 2nd bit = 1 have a negative factor; Ln_C1,2). This fits the chart above perfectly. Note that the complement alone is only the one's complement; the +1 needed for the 2's complement is added in the multiplier tree: if the factor associated with a triple is negative or zero (zero because the input is then -1), +1 is added. This is checked by the 12-OAIs.
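Here is how I picture one MUX row in software. This is a guess pieced together from the description above, not a verified model of the gates: `mux_row`, `WIDTH` and `MASK` are invented names, and I am assuming that both zero-factor triples (000 and 111) produce an all-1s row that the +1 then cancels.

```python
WIDTH = 14                      # width of each Ln as stated above
MASK = (1 << WIDTH) - 1

def mux_row(triple, b):
    """Return (row, plus_one) for one 3-bit triple and an unsigned input b.
    Assumed behaviour, see the caveats in the text."""
    if triple in (0b000, 0b111):          # factor 0: row forced to all 1s (= -1)
        return MASK, 1                    # the +1 brings it back to 0
    # shift left by 1 when the factor has magnitude 2 (triples 011 and 100)
    value = (b << 1) if triple in (0b011, 0b100) else b
    if triple & 0b100:                    # top bit set: negative factor
        return ~value & MASK, 1           # one's complement; +1 comes from the tree
    return value & MASK, 0

for triple in range(8):
    row, plus_one = mux_row(triple, 5)
    print(f"{triple:03b}: row={row:014b} plus_one={plus_one}")
```

In every case `(row + plus_one)` modulo 2^14 equals factor * b, which is why deferring the +1 to the tree loses nothing.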

In the end, L0-L5 are summed to get the final result; each Ln contributes the actual value (Ln << 2*n). Here's an example for our 4-bit numbers:

```
13 * B = 1101b * B => assume implicit -1th bit: 1101.0
divide into overlapping triples:
L0: 010 := 1 * B
L1: 110 := (-1 * B)
=> ((L0 << 0) + (L1 << 2)) = (1 - (1 << 2)) * B = (1 - 4) * B = -3 * B; +16 * B is assumed so = 13*B
```
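The same example can be run end to end in software. Again this is only a sketch under my own naming (`booth_multiply`, `to_signed`); it uses ordinary Python integers instead of 14-bit rows, so it models the arithmetic and not the adder tree.

```python
FACTOR = (0, 1, 1, 2, -2, -1, -1, 0)   # indexed by the 3-bit triple

def booth_multiply(a, b, nbits=12):
    """Sum (factor_n * b) << 2n over all overlapping triples of a."""
    padded = (a & ((1 << nbits) - 1)) << 1   # implicit bit -1 = 0
    total = 0
    for n in range(nbits // 2):
        factor = FACTOR[(padded >> (2 * n)) & 0b111]
        total += (factor * b) << (2 * n)
    return total

def to_signed(a, nbits=12):
    """Interpret an nbits-wide pattern as a two's complement number."""
    return a - (1 << nbits) if a & (1 << (nbits - 1)) else a

print(booth_multiply(0b1101, 7, nbits=4))   # -21, i.e. -3 * 7
```

The result is -3 * B rather than 13 * B, which is the "+16 * B is assumed" effect: the recoding effectively treats A as a two's complement number, so `booth_multiply(a, b)` agrees with `to_signed(a) * b`.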

However, I still don't know exactly how the tree works (especially as there are still errors in the Logisim circuit). There aren't enough levels to allow for all the additions, so that's probably what the WS elements are for. The AND_FLAG input (as I think nocash has already mentioned) initializes the accumulation when it is 0; when it is 1, the result of the multiplication is added to the result of the multiplication from the previous cycle. I think one multiplication needs 2? cycles, but since it is pipelined, a result is output and accumulated every cycle.
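My current reading of AND_FLAG, written out as code. This is purely a behavioural guess that ignores the pipeline latency; `accumulate` is a made-up helper, not something from the schematic.

```python
def accumulate(products, flags):
    """Model the accumulator: flag 0 restarts the sum with the new product,
    flag 1 adds the new product to the running sum (assumed behaviour)."""
    acc = 0
    outputs = []
    for product, flag in zip(products, flags):
        acc = product if flag == 0 else acc + product
        outputs.append(acc)
    return outputs

print(accumulate([3, 4, 5, 10, 1], [0, 1, 1, 0, 1]))   # [3, 7, 12, 10, 11]
```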