Comments (4)
- Each of the M sub-matrices has its own codebook, so M codebooks need to be stored.
- To learn D and B, you only need unlabeled images from the training subset. The optimization approximates each layer's activations of the original network, so no category labels are needed. However, if you want to fine-tune the network, then labeled training data is required.
- The inner products are computed only when an input image is fed to the network; they are input-dependent. Different input images will produce different look-up tables.
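Learning D (the codebooks) and B (the codeword assignments) for one sub-matrix can be sketched with plain k-means over the sub-matrix columns. This is a minimal illustration under assumed names and shapes, not the repository's actual implementation:

```python
import numpy as np

def quantize_submatrix(W_sub, K, n_iter=20, seed=0):
    """Learn a codebook D (d x K) and assignments b for one d x n sub-matrix.

    Each of the n columns of W_sub is approximated by its nearest codeword.
    Plain k-means for illustration; the real training procedure may differ.
    """
    d, n = W_sub.shape
    rng = np.random.default_rng(seed)
    # initialize the K codewords with randomly chosen columns
    D = W_sub[:, rng.choice(n, size=K, replace=False)].copy()
    for _ in range(n_iter):
        # assignment step: nearest codeword for every column, shape (K, n)
        dists = ((W_sub[:, None, :] - D[:, :, None]) ** 2).sum(axis=0)
        b = dists.argmin(axis=0)
        # update step: each codeword becomes the mean of its assigned columns
        for k in range(K):
            cols = W_sub[:, b == k]
            if cols.size:
                D[:, k] = cols.mean(axis=1)
    return D, b
```

Because only unlabeled inputs are needed to match the original layer activations, this step requires no category labels, as the comment above notes.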
from quantized-cnn.
-
Then there must be something wrong in my understanding; please correct me.
Each sub-matrix has its own codebook, so couldn't we just set the codeword in the codebook to be the same as the sub-matrix? In other words, there would be only one word in the codebook, and that word would be exactly the sub-matrix.
And then M codebooks would need to be stored instead of the whole weight matrix. But is the total size of the M codebooks smaller than the original matrix?
-
So the computation cost of the product should also be included in the FLOPs? And why do we need the look-up table, given that every input results in a different product?
Thanks very much for your prompt reply :)
-
The codebook is a collection of codewords, and each codeword is a vector, not a matrix.
Let us consider the first fully-connected layer in AlexNet, which takes a 9216-D vector (9216 = 256*6*6) as input and outputs a 4096-D vector. Here, we set the number of subspace dimensions to 4, so the number of subspaces, i.e. M, equals 9216/4 = 2304. We therefore split the 9216x4096 weight matrix, i.e. W, into 2304 sub-matrices, each of size 4x4096. For each sub-matrix, we learn a codebook of size 4xK, which consists of K codewords, each a 4-D vector. Note that K is much smaller than 4096. Then we use this codebook to quantize the sub-matrix, i.e. each column in the sub-matrix is approximated (or replaced) by a codeword selected from this codebook.
-
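To answer the storage question concretely, here is the arithmetic for the layer above, assuming 32-bit floats and an assumed codebook size of K = 128 (other values of K would change the numbers):

```python
# storage for the 9216 x 4096 fc layer of AlexNet, 32-bit floats
M, d, n, K = 2304, 4, 4096, 128           # K = 128 is an assumed value

original_bytes = 9216 * 4096 * 4           # full weight matrix
codebook_bytes = M * d * K * 4             # M codebooks, each d x K floats
index_bytes = M * n * 1                    # one byte (log2 K <= 8 bits) per column
quantized_bytes = codebook_bytes + index_bytes

print(original_bytes / quantized_bytes)    # roughly 10.7x compression
```

Most of the quantized storage is the per-column codeword indices, not the codebooks themselves, which is why the scheme is much smaller than the original matrix despite storing M codebooks.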
The computation cost of inner products is included in the FLOPs.
Continue with the above example. For a 9216-D input vector, we split it into 2304 sub-vectors, each 4-D. For each sub-vector, its inner products with all the 4096 column vectors in the corresponding sub-matrix are needed. Since the sub-matrix is quantized with the codebook, we can compute a look-up table of K elements, which are the sub-vector's inner products with all codewords in that codebook. So we reduce the number of inner product computations from 4096 to K. The look-up table contains all the 4096 inner products we need to compute the layer response.
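The look-up-table trick above can be sketched in a few lines. Shapes follow the example (M sub-vectors of dimension d, a d x K codebook per subspace, a codeword index per output column); all names here are assumptions for illustration:

```python
import numpy as np

def layer_response(x, codebooks, assignments):
    """Approximate W^T x using per-subspace look-up tables.

    x           : (M*d,)   input vector
    codebooks   : (M, d, K) one codebook per subspace
    assignments : (M, n)   codeword index for every output column
    """
    M, d, K = codebooks.shape
    x_sub = x.reshape(M, d)                # split input into M sub-vectors
    # look-up tables: inner product of each sub-vector with its K codewords
    tables = np.einsum('md,mdk->mk', x_sub, codebooks)          # (M, K)
    # gather the cached inner products per column and sum over subspaces
    return np.take_along_axis(tables, assignments, axis=1).sum(axis=0)
```

Only the tables depend on the input, so each layer evaluation costs K inner products per subspace instead of n = 4096, and the 4096 column responses become table look-ups and additions.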
Oh, I see! Each codeword is a vector, not a matrix.
Thanks very much for your explanation!!