Comments (4)
Great questions!
1) To apply Freivalds' algorithm to verify matrix multiplication, you have to operate over an algebraic field (https://en.wikipedia.org/wiki/Field_(mathematics)). The simplest example of a field is the "integers modulo a prime". E.g., for p=11, you work over the set of integers Z_11 = {0, 1, 2, ..., 10}, and map an arbitrary integer N to N mod p. So for instance 11 becomes 0, 13 becomes 2 (i.e., 13 = 1*11 + 2), 37 becomes 4 (i.e., 37 = 3*11 + 4), etc.
So to verify DNN computations, we need to evaluate the DNN over Z_p for some prime p. The problem is that we also need to ensure that computing the DNN over Z_p gives us the same results as computing the DNN over "regular" integers (otherwise we'd be destroying our model's accuracy). We thus choose a prime p large enough so that we never actually need to compute a modulo (i.e., all intermediate values computed by the DNN are in the interval [-p/2, p/2]).
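As a toy sketch (not the Slalom implementation), Freivalds' check over Z_p can look like this; the matrix sizes and entry ranges are arbitrary illustrative choices:

```python
import numpy as np

def freivalds_check(A, B, C, p, trials=10):
    # Verify A @ B == C (mod p) without recomputing the full product:
    # each trial checks A @ (B @ r) == C @ r (mod p) for a random
    # vector r, which costs O(n^2) instead of O(n^3). A wrong C
    # slips through a single trial with probability at most 1/p.
    n = C.shape[1]
    for _ in range(trials):
        r = np.random.randint(0, p, size=n, dtype=np.int64)
        if not np.array_equal((A @ ((B @ r) % p)) % p, (C @ r) % p):
            return False
    return True

p = 2**23 + 2**21 + 7                      # the prime used in Slalom
A = np.random.randint(0, 100, size=(50, 50), dtype=np.int64)
B = np.random.randint(0, 100, size=(50, 50), dtype=np.int64)
C = (A @ B) % p
assert freivalds_check(A, B, C, p)         # honest product passes
C[0, 0] = (C[0, 0] + 1) % p
assert not freivalds_check(A, B, C, p)     # tampered product is caught
```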
2) We first convert all numbers in our neural network (weights and inputs) to integers, by scaling them by 2^k and then rounding. You're right that to get back to the "original" representation, you should scale the output of a linear layer by 2^{-2k}. But there are usually more layers after that, which again need their inputs to be of the form round(2^k * x). So, we just rescale by 2^{-k} to keep the outputs of one layer in the format expected by the inputs of the next layer.
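The rescaling convention can be sketched as follows (K=8 is an illustrative choice here, not Slalom's actual parameter):

```python
import numpy as np

np.random.seed(0)
K = 8                      # fixed-point precision bits (illustrative)
SCALE = 2.0 ** K

def quantize(x):
    # Map floats onto the integer grid: round(2^k * x)
    return np.round(SCALE * x)

def linear_layer(w_q, x_q):
    # The product of two 2^k-scaled values carries a 2^{2k} factor.
    # Rescaling by 2^{-k} (rather than 2^{-2k}) keeps the output in
    # the same round(2^k * x) format that the next layer expects.
    return np.round((w_q @ x_q) / SCALE)

w = np.random.randn(4, 4)
x = np.random.randn(4)
y_q = linear_layer(quantize(w), quantize(x))
# Dequantizing recovers the float result up to quantization error:
print(np.max(np.abs(y_q / SCALE - w @ x)))
```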
3.1) Yes, correct.
3.2) As described above, we choose a prime p so that all intermediate values in a DNN evaluation are in the interval [-p/2, p/2]. We choose p to be close to 2^24, because that is (roughly) the largest integer representable in a float without loss of precision.
In the end, we select quantization parameters (i.e., k) so that we can do all integer computations in our DNN using floats, and we never end up with a value larger than 2^24.
from slalom.
@ftramer Your explanation is very clear, thanks a lot. I now understand why we are doing this quantization: we want a finite field for Freivalds' algorithm. I still have some questions, described below:
- Intermediate values in a DNN evaluation are in the interval [-p/2, p/2]. We found a large prime P, and every number mod P lands in [0, P-1]. Why is the interval [-p/2, p/2] instead? Could you please explain in detail where the -p/2 and p/2 come from? Thanks.
- We choose p to be close to 2^24, since it's roughly the largest integer representable in a float. But as far as I know, float32 can go much larger than 2^24, something around 10^38. I still don't understand why it's 2^24.
- Some confusion about the code file python/slalom/quant_layers.py:
P = 2**23 + 2**21 + 7
INV_P = 1.0 / P
MID = P // 2
assert(P + MID < 2**24)
q = float(round(np.sqrt(MID))) + 1
inv_q = 1.0 / q
So this P is the large prime we choose, right? How was 2**23 + 2**21 + 7 chosen exactly? Also, why do we still need q and MID? What do they mean and what are they used for? And why must we guarantee P + MID < 2^24?
I really want to understand everything in this paper and the code. Looking forward to your reply. Best wishes.
1. You can view the output of a modulo as lying in the interval [0, p-1] or [-p/2, p/2]. They are equivalent; it is just a choice of convention. E.g., 5 mod 7 = -2 is perfectly correct. We prefer using [-p/2, p/2], as the values in the DNN can be positive or negative.
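To make the convention concrete, here is a tiny helper (illustrative only, not from the Slalom code) that reduces into the symmetric interval:

```python
def centered_mod(n, p):
    # Reduce n mod p, then shift the result into the symmetric
    # interval [-p//2, p//2] instead of [0, p-1].
    r = n % p
    return r - p if r > p // 2 else r

print(centered_mod(5, 7))    # -2, matching "5 mod 7 = -2"
print(centered_mod(-2, 7))   # -2: both representatives agree
print(centered_mod(10, 7))   # 3
```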
2. Yes, floats can go as high as 10^38, but with loss of precision. E.g., if you do operations with integers bigger than 2^24, the results will not necessarily be exact.
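You can see this loss of precision directly: float32 has a 24-bit significand, so integers are exact only up to 2^24.

```python
import numpy as np

# 2^24 + 1 is not representable in float32 and rounds back to 2^24,
# so adding 1 is silently lost above the 2^24 threshold.
x = np.float32(2**24)
print(x + np.float32(1) == x)   # True: the +1 disappears
y = np.float32(2**23)
print(y + np.float32(1) == y)   # False: still exact below 2^24
```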
3. p = 2**23 + 2**21 + 7 is the largest prime such that p + p/2 is smaller than 2^24. This is a small (and not particularly important) optimization for the privacy part. There, we end up performing operations of the form x + r mod p, where x is in [0, p] and r is in [-p/2, p/2]. So x + r can be as high as p + p/2, and we make sure this never overflows 2^24.
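Plugging in the constants from python/slalom/quant_layers.py makes the bound concrete:

```python
P = 2**23 + 2**21 + 7   # the chosen prime, 10485767
MID = P // 2            # bound on the magnitude of the blinding term r

# Naive sanity check that P is indeed prime (cheap: sqrt(P) ~ 3238).
assert all(P % d for d in range(2, int(P**0.5) + 1))

# x + r, with x in [0, P] and r in [-P/2, P/2], is at most P + P/2;
# this must stay below 2^24 so float32 arithmetic remains exact.
assert P + MID < 2**24
print(P + MID, "<", 2**24)   # 15728650 < 16777216
```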
The q parameter is another optimization quirk. When evaluating convolutions on encrypted data on the GPU, we need to use double precision to avoid precision loss. But CUDA is extremely bad at double-precision convolutions. So instead, we take the input x in [-p/2, p/2] and write it as x = x1 + q*x2, where |x1| and |x2| are smaller than sqrt(p). Then, we can compute Conv(x) = Conv(x1 + q*x2) = Conv(x1) + q*Conv(x2). So we compute two floating-point convolutions on smaller inputs rather than one double-precision convolution on the full input.
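A minimal sketch of this decomposition, using a plain matmul as a stand-in for Conv (both are linear, so the same identity applies) and the q from quant_layers.py; this is an illustration, not the Slalom GPU code:

```python
import numpy as np

P = 2**23 + 2**21 + 7
MID = P // 2
q = float(round(np.sqrt(MID))) + 1   # base of the two-digit decomposition

def split(x):
    # Write x = x1 + q * x2, so both halves are roughly sqrt(p)-sized
    # and each piece is small enough for exact float arithmetic.
    x2 = np.trunc(x / q)
    x1 = x - q * x2
    return x1, x2

w = np.random.randint(-100, 100, size=(8, 8)).astype(np.float64)
x = np.random.randint(-MID, MID, size=(8,)).astype(np.float64)
x1, x2 = split(x)
# Linearity: f(x) = f(x1) + q * f(x2), exactly.
assert np.array_equal(w @ x, (w @ x1) + q * (w @ x2))
```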
Wow! It is so clear, thanks for your reply.