
Confusion about "Quantization" (slalom) · CLOSED

ftramer commented on June 9, 2024
Confusion about "Quantization"


Comments (4)

ftramer commented on June 9, 2024

Great questions!

1) To apply Freivald's algorithm to verify matrix multiplication, you have to operate over an algebraic field (https://en.wikipedia.org/wiki/Field_(mathematics)). The simplest example of a field is the "integers modulo a prime". E.g., for p=11, you work over the set of integers Z_11 = {0,1,2,...,10}, and map an arbitrary integer N to N mod p. So for instance 11 becomes 0, 13 becomes 2 (i.e., 13=1*11+2), 37 becomes 4 (i.e., 37=3*11+4), etc.
So to verify DNN computations, we need to evaluate the DNN over Z_p for some prime p. The problem is that we also need to ensure that computing the DNN over Z_p gives us the same results as computing the DNN over "regular" integers (otherwise we'd be destroying our model's accuracy). We thus choose a prime p large enough so that we never actually need to compute a modulo (i.e., all intermediate values computed by the DNN are in the interval [-p/2, p/2]).
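
For illustration, here is a minimal sketch of Freivald's check over Z_p. It is a toy example with made-up dimensions, not the Slalom implementation; only the prime p is taken from the repo's quant_layers.py.

import numpy as np

# Toy Freivald's check over Z_p: to verify a claimed product C = A @ B (mod p),
# pick a random vector r and test whether C @ r == A @ (B @ r) (mod p).
# A wrong C passes a single test with probability at most 1/p.
p = 2**23 + 2**21 + 7  # the prime used in Slalom's quant_layers.py

def freivalds_check(A, B, C, p, trials=2):
    n = B.shape[1]
    for _ in range(trials):
        r = np.random.randint(0, p, size=(n, 1), dtype=np.int64)
        lhs = (C @ r) % p              # O(n^2) work
        rhs = (A @ ((B @ r) % p)) % p  # O(n^2) work, vs O(n^3) to recompute A @ B
        if not np.array_equal(lhs, rhs):
            return False
    return True

# Small dimensions so all int64 intermediate values stay exact.
A = np.random.randint(0, p, size=(8, 8), dtype=np.int64)
B = np.random.randint(0, p, size=(8, 8), dtype=np.int64)
C = (A @ B) % p                     # honest product, reduced mod p
print(freivalds_check(A, B, C, p))  # True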

2) We first convert all numbers in our neural network (weights and inputs) to integers, by scaling them by 2^k and then rounding. You're right that to get back to the "original" representation, you should scale the output of a linear layer by 2^{-2k}. But there are usually more layers after that, which again need their inputs to be of the form round(2^k * x). So, we just rescale by 2^{-k} to keep outputs of one layer in the right format expected by the inputs of the next layer.
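
A small sketch of this fixed-point bookkeeping; the value k = 8 and the layer sizes below are arbitrary choices for the example, not parameters from the paper.

import numpy as np

k = 8                          # quantization scale 2^k (arbitrary for this example)
x = np.random.randn(16)        # layer input, floating point
w = np.random.randn(16, 16)    # layer weights, floating point

x_q = np.round(2**k * x)       # integer encoding of x, scale 2^k
w_q = np.round(2**k * w)       # integer encoding of w, scale 2^k

y_q = x_q @ w_q                # integer output of the linear layer, scale 2^(2k)

# Rescaling by 2^(-2k) would recover a plain float approximation of x @ w, but
# the next layer expects inputs of the form round(2^k * y), so we only divide
# by 2^k and round back to an integer.
y_next = np.round(y_q * 2.0**(-k))

print(np.max(np.abs(y_next * 2.0**(-k) - x @ w)))  # small quantization error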

3.1) Yes, correct.

3.2) As described above, we choose a prime p so that all intermediate values in a DNN evaluation are in the interval [-p/2, p/2]. We choose p to be close to 2^24, because that is (roughly) the largest integer representable in a float, without loss of precision.
In the end, we select quantization parameters (i.e., k) so that we can do all integer computations in our DNN using floats, and we never end up with a value larger than 2^24.
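
As a rough illustration of that constraint, the arithmetic below checks a worst-case bound; the activation range, weight range, and fan-in are assumptions picked for the example, not values from the paper.

k = 6            # assumed quantization scale for activations and weights
x_max = 1.0      # assumed bound on activation magnitudes
w_max = 1.0      # assumed bound on weight magnitudes
fan_in = 512     # assumed number of terms in one dot product

# Worst-case magnitude of one quantized dot product: it must stay below
# p/2 ~ 2^23, and hence below 2^24.
worst_case = fan_in * (2**k * x_max) * (2**k * w_max)
print(worst_case, worst_case < 2**23)  # 2097152.0 True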


gutouyu commented on June 9, 2024

@ftramer Your explanation is very clear. Thanks a lot. I now understand why we do this quantization: we need a finite field for Freivald's algorithm. In detail, I still have some questions, described below:

  1. "Intermediate values in a DNN evaluation are in the interval [-p/2, p/2]." We choose a large prime p and reduce every number mod p, which gives values in [0, p). Why is the interval [-p/2, p/2] instead? Could you please explain in detail where -p/2 and p/2 come from? Thanks.
  2. "We choose p to be close to 2^24. It's roughly the largest integer representable in a float." As far as I know, float32 can represent values much larger than 2^24, around 10^38. I still don't understand why it's 2^24.
  3. Some confusion about the code in python/slalom/quant_layers.py:
import numpy as np

P = 2**23 + 2**21 + 7
INV_P = 1.0 / P
MID = P // 2
assert(P + MID < 2**24)
q = float(round(np.sqrt(MID))) + 1
inv_q = 1.0 / q

So this P is the large prime we choose, right? Where does 2**23 + 2**21 + 7 come from exactly? Also, why do we still need q and MID, what do they mean, and what are they used for? And why must we guarantee P + MID < 2^24?

I really want to understand everything in this paper and the code. Looking forward to your reply. Best wishes. 😆


ftramer commented on June 9, 2024

1. You can view the output of a modulo as lying in the interval [0, p] or [-p/2, p/2]. They are equivalent, it is just a choice of convention. E.g., 5 mod 7 = -2 is perfectly correct. We prefer using [-p/2, p/2] as the values in the DNN can be positive or negative.
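
A tiny example of the two conventions side by side (my own illustration):

def mod_centered(x, p):
    """Map x to its representative in [-p/2, p/2] instead of [0, p)."""
    r = x % p
    return r - p if r > p // 2 else r

print(5 % 7)               # 5  (representative in [0, 7))
print(mod_centered(5, 7))  # -2 (same residue class mod 7, centered convention)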

2. Yes floats can go as high as 10^38, but with loss of precision. E.g., if you do operations with integers bigger than 2^24, the results will not necessarily be exact.
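
Concretely, float32 has a 24-bit significand, so exactness stops right at 2^24:

import numpy as np

print(np.float32(2**24 - 1) == np.float32(2**24))  # False: both integers are still exact
print(np.float32(2**24) == np.float32(2**24 + 1))  # True: 2^24 + 1 rounds to 2^24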

3. p=2**23 + 2**21 + 7 is the largest prime such that p+p/2 is smaller than 2^24. This is a small (and not particularly important) optimization for the privacy part. There, we end up performing operations of the form x+r mod p where x is in [0, p] and r is in [-p/2, p/2]. So x+r can be as high as p+p/2 and we make sure this never overflows 2^24.
The q parameter is another optimization quirk. When evaluating convolutions on encrypted data on the GPU, we need to use double precision to avoid precision loss. But CUDA is extremely bad at doing double convolutions. So instead, we take the input x in [-p/2, p/2] and write it as x = x1 + q*x2, where |x1| and |x2| are smaller than sqrt(p). Then, we can compute Conv(x) = Conv(x1 + q*x2) = Conv(x1) + q*Conv(x2). So we compute two floating-point convolutions on smaller inputs rather than one double convolution on the full input.
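
A sketch of that split, using the constants from python/slalom/quant_layers.py; the rest is only an illustration of the identity, not the actual GPU kernel code.

import numpy as np

P = 2**23 + 2**21 + 7
MID = P // 2
q = float(round(np.sqrt(MID))) + 1       # same constants as quant_layers.py

x = np.random.randint(-MID, MID + 1, size=(8,)).astype(np.float64)

x2 = np.round(x / q)                     # "high" part
x1 = x - q * x2                          # "low" part, |x1| <= q/2

assert np.allclose(x1 + q * x2, x)       # the split is exact
print(np.max(np.abs(x1)), np.max(np.abs(x2)))  # both at most a few thousand (~sqrt(P))

# By linearity of convolution: Conv(x) = Conv(x1) + q * Conv(x2), so two
# single-precision convolutions on small inputs replace one double-precision one.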


gutouyu commented on June 9, 2024

Wow! It is so clear, thanks for your reply.

