diffq's Introduction

Differentiable Model Compression via Pseudo Quantization Noise


DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight or group of weights, in order to achieve a given trade-off between model size and accuracy.

Go read our paper for more details.

What's up?

See the changelog for details on releases.

  • 2022-08-24: v0.2.3: fixed a bug when loading old quantized states.
  • 2021-11-25: v0.2.2: added support for TorchScript.

Requirements

DiffQ requires Python 3.7 and a reasonably recent version of PyTorch (ideally 1.7.1). To install DiffQ, run the following from the root of the repository:

pip install .

You can also install directly from PyPI with pip install diffq.

Usage

import torch
from torch.nn import functional as F
import diffq
from diffq import DiffQuantizer

model = MyModel()
optim = ...  # The optimizer must be created before the quantizer
quantizer = DiffQuantizer(model)
quantizer.setup_optimizer(optim)

# Distributed data parallel must be created after DiffQuantizer!
dmodel = torch.nn.parallel.DistributedDataParallel(...)

penalty = 1e-3
model.train()  # call model.eval() at evaluation time to automatically use the true quantized weights.
for batch in loader:
    ...
    optim.zero_grad()

    # The `penalty` parameter controls the tradeoff between model size and model accuracy.
    loss = F.mse_loss(x, y) + penalty * quantizer.model_size()
    loss.backward()
    optim.step()

# To get the true model size when doing proper bit packing:
print(f"Model is {quantizer.true_model_size():.1f} MB")

# When you want to dump your final model:
torch.save(quantizer.get_quantized_state(), "some_file.th")

# You can later load back the model with
model = MyModel()
diffq.restore_quantized_state(model, torch.load("some_file.th"))

# For DiffQ models, we support exporting the model to TorchScript with optimal storage.
# Once loaded, the model will be stored in fp32 in memory (int8 support coming up).
from diffq.ts_export import export
export(quantizer, 'quantized.ts')

Documentation

See the API documentation for detailed documentation. We cover a few aspects hereafter.

Quantizer object

A Quantizer is attached to a model at its creation. All Quantizer objects provide the same basic capabilities:

  • automatically switch to quantized weights on the forward pass when the model is in eval mode (see the sketch after this list).
  • run quantizer-specific code on the training forward pass (e.g. STE for UniformQuantizer with QAT, noise injection for DiffQ).
  • provide access to the quantized model size and state.
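
Below is a minimal sketch of this behaviour (MyModel and the input x are hypothetical placeholders, as in the Usage example above): switching between train and eval mode is enough to toggle between the noisy training forward and the true quantized weights.

from diffq import DiffQuantizer

model = MyModel()
quantizer = DiffQuantizer(model)

model.train()
y_train = model(x)  # training forward: pseudo quantization noise is injected

model.eval()
y_eval = model(x)   # eval forward: true quantized weights are used automatically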

Quantized size and state

The method quantizer.model_size() provides a differentiable model size (for DiffQ), while quantizer.true_model_size() provides the true, optimally bit-packed model size (non-differentiable). With quantizer.compressed_model_size() you can get the model size using gzip. This can actually be larger than the true model size, and reveals interesting information on the entropy usage of a specific quantization method.
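
As a minimal sketch (assuming a quantizer has already been created as in the Usage example, and that all three methods report sizes in MB like true_model_size() does):

# Differentiable size estimate, usable as a penalty inside the training loss.
size_penalty = quantizer.model_size()

# True, optimally bit-packed size (non-differentiable) and gzip-compressed size.
print(f"true size: {quantizer.true_model_size():.2f} MB")
print(f"compressed size: {quantizer.compressed_model_size():.2f} MB")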

The bit-packed quantized state is obtained with quantizer.get_quantized_state() and restored with quantizer.restore_quantized_state(). Bit packing is optimized for speed and can suffer from some overhead (in practice no more than 120 B for Uniform and LSQ, and no more than 1 kB for DiffQ).

If you do not have access to the original quantizer, for instance at inference time, you can load the state with diffq.restore_quantized_state(model, quantized_state).
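
A minimal sketch of this inference-time workflow (the file name and MyModel are hypothetical):

import torch
import diffq

# At training time, after training with the quantizer:
torch.save(quantizer.get_quantized_state(), "quantized_state.th")

# At inference time, without access to the original quantizer:
model = MyModel()
diffq.restore_quantized_state(model, torch.load("quantized_state.th"))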

Quantizer and optimization

Some quantizers (DiffQuantizer and LSQ) add extra optimizable parameters. Those parameters can require different optimizers or hyper-parameters than the main model weights. Typically, DiffQ bits parameters are always optimized with Adam. For that reason, you should always create the main optimizer before the quantizer. You can then set up the quantizer with this optimizer or another:

model = MyModel(...)
opt = torch.optim.Adam(model.parameters())
quantizer = diffq.DiffQuantizer(model)
quantizer.setup_optimizer(opt, **optim_overrides)

This offers the freedom to use separate hyper-parameters. For instance, DiffQuantizer will always deactivate weight_decay for the bits parameters.

If the main optimizer is SGD, it is advised to use a second Adam optimizer for the quantizer parameters.
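
For instance, here is a minimal sketch with SGD for the model weights and a separate Adam instance handed to the quantizer. It assumes setup_optimizer simply registers the quantizer parameters with whichever optimizer it is given; the dummy tensor (PyTorch optimizers refuse an empty parameter list) and the learning rates are placeholders.

model = MyModel()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Separate Adam optimizer that will only hold the quantizer's extra parameters.
opt_quant = torch.optim.Adam([torch.zeros(1)], lr=1e-3)
quantizer = diffq.DiffQuantizer(model)
quantizer.setup_optimizer(opt_quant)

# In the training loop, remember to call zero_grad() and step() on both optimizers.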

Warning: you must always wrap your model with DistributedDataParallel after having created the quantizer, otherwise the quantizer parameters won't be optimized!

TorchScript support

At the moment the TorchScript support is experimental. We support saving the model with TorchScript to disk with optimal storage. Once loaded, the model is stored in FP32 in memory. We are working towards adding support for int8 in memory. See the diffq.ts_export.export function in the API.
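
As a minimal sketch, assuming the exported file can be loaded back with the standard torch.jit.load (x is a hypothetical input tensor):

import torch
from diffq.ts_export import export

# Save the quantized model with TorchScript, using optimal bit-packed storage on disk.
export(quantizer, 'quantized.ts')

# Load it back later; the weights are held in fp32 in memory.
ts_model = torch.jit.load('quantized.ts')
out = ts_model(x)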

Examples

We provide three examples in the examples/ folder. One is for CIFAR-10/100, using standard architectures such as Wide-ResNet, ResNet or MobileNet. The second is based on the DeiT visual transformer. The third is a language modeling task on Wikitext-103, using Fairseq.

The DeiT and Fairseq examples are provided as patches on the original codebases at a specific commit. You can initialize the git submodules and apply the patches by running:

make examples

For more details on each example, check out their specific READMEs.

Installation for development

This will install diffq in developer mode (changes to the files are reflected directly), along with the dependencies needed to run the unit tests.

pip install -e '.[dev]'

Updating the patch based examples

In order to update the patches, first run make examples to properly initialize the sub repos. Then make all the changes you want, commit them, and run make patches; this will update the patch files for each repo. Once this is done, and you have checked that all your changes are properly included in the new patch files, you can run make reset (this will remove all the changes you made from the submodules, so do check the patch files before calling this), then run git add -u .; git commit -m "my changes" and push.
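
The workflow above as a command sequence (the commit message is a placeholder):

make examples                              # initialize the sub repos
# ... edit the submodules and commit your changes inside them ...
make patches                               # regenerate the patch files
# check the new patch files, then:
make reset                                 # removes your changes from the submodules
git add -u .; git commit -m "my changes"   # commit the updated patches and push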

Test

You can run the unit tests with

make tests

Citation

If you use this code or results in your paper, please cite our work as:

@article{defossez2021differentiable,
  title={Differentiable Model Compression via Pseudo Quantization Noise},
  author={D{\'e}fossez, Alexandre and Adi, Yossi and Synnaeve, Gabriel},
  journal={TMLR},
  year={2022}
}

License

This repository is released under the CC-BY-NC 4.0 license, as found in the LICENSE file, except for the following parts, which are under the MIT license. The files examples/cifar/src/mobilenet.py and examples/cifar/src/src/resnet.py are taken from kuangliu/pytorch-cifar, released as MIT. The file examples/cifar/src/wide_resnet.py is taken from meliketoy/wide-resnet, released as MIT. See each file's headers for the detailed license.

diffq's People

Contributors

adefossez, eurus-holmes


diffq's Issues

Number of parameters doubled

โ“ Questions

I tried to use DiffQ with PyTorch Lightning and found that the number of parameters doubled. Is this behavior expected? I am forced to reduce the batch size to avoid out-of-memory errors. Minimal example:

import os

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl
from sklearn.datasets import make_classification

from diffq import DiffQuantizer


class DataSet(Dataset):
    def __init__(self):
        self.X, self.y = make_classification(
            n_samples=100, 
            n_features=512, 
            shuffle=True, 
            random_state=0,
        )
  
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, i):
        x = torch.tensor(self.X[i]).float()
        y = torch.tensor(self.y[i]).long()
        return {'x': x, 'y': y}
    
    
class Model(pl.LightningModule):
    def __init__(self, q=True):
        super().__init__()
        self.q = q
        self.l1 = nn.Linear(512, 2**12)
        self.l2 = nn.Linear(2**12, 2)
        self.criterion = nn.NLLLoss()
        
    def forward(self, x): 
        y = self.l1(x)
        y = self.l2(y)
        return y
    
    def training_step(self, batch, batch_idx):
        y = batch['y']
        y_ = self(batch['x'])
        loss = self.criterion(y_,y)
        if self.q:
            loss += 1e-3 * self.quantizer.model_size()
        return loss
    
    def configure_optimizers(self):
        opt = torch.optim.Adam(self.parameters(), lr=1e-5)
        if self.q:
            self.quantizer = DiffQuantizer(self)
            self.quantizer.setup_optimizer(opt) 
        return {'optimizer': opt}
    
    
loader = DataLoader(
    DataSet(),
    batch_size=32,
    num_workers=os.cpu_count()
)

model = Model(q=bool(0))
trainer = pl.Trainer(max_epochs=2)
trainer.fit(model, loader)

model = Model(q=bool(1))
trainer = pl.Trainer(max_epochs=2)
trainer.fit(model, loader)

Getting error by pip install diffq on Windows

๐Ÿ› Bug Report

I'm trying to install diffq on Windows. I tried PowerShell and cmd in administrator mode, but it didn't work; I get the same error every time.

I found that a specific command does not work:

"F:\C#\Microsoft Visual Studio\VC\Tools\MSVC\14.35.32215\bin\HostX86\x64\cl.exe" /E /C /nologo /O2 /W3 /GL /DNDEBUG /MD -Ic:\Users\chow\Desktop\programme\python\project\.venv\include -IC:\Users\chow\AppData\Local\Programs\Python\Python311\include -IC:\Users\chow\AppData\Local\Programs\Python\Python311\Include "-IF:\C#\Microsoft Visual Studio\VC\Tools\MSVC\14.35.32215\include" "-IF:\C#\Microsoft Visual Studio\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" /Tcbitpack.c /Fobuild\temp.win-amd64-cpython-311\Release\bitpack.obj
This command outputs the error:

error: c1 fatal error c1083 cannot open source file: 'bitpack.c': no such file or directory

I tried a different cl.exe stored in .\Hostx64\x64 and added /E to the cl command, but it still did not work.

So I think there is something wrong with VS, so I installed the Python toolkit in the VS installer, and it still does not work.

  • Python version: 3.10.2
  • Operating system and version (desktop or mobile): Windows 10

require 'override' keyword

๐Ÿ› Bug Report

I ran cd examples/cifar/ and followed the README, but got the following error:

envs/diffq/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:389: UserWarning: In config.yaml: Invalid overriding of hydra/job_logging:
Default list overrides requires 'override' keyword.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/defaults_list_override for more information.

Changing examples/cifar/conf/config.yaml from

defaults:
  - hydra/job_logging: colorlog
  - hydra/hydra_logging: colorlog

to

defaults:
  - override hydra/job_logging: colorlog
  - override hydra/hydra_logging: colorlog

fixes the issue.

To Reproduce

(Write your steps here:)

pip install .
make examples
cd examples/cifar
pip install -r requirements
./train.py db.name=cifar100 model=mobilenet quant.bits=3 quant.qat=True

Expected behavior

(Write what you thought would happen.)

Actual Behavior

(Write what happened. Add screenshots, if applicable.)

Your Environment

  • Python and PyTorch version:
  • Operating system and version (desktop or mobile):
  • Hardware (gpu or cpu, amount of RAM etc.):

Will diffq make model faster?

โ“ Questions

I tried DiffQ and the model file gets smaller, but during inference the GPU memory cost and inference speed stay almost the same. Is this usual?

where the activation/feature-map is quantized?

โ“ Questions

Hi, after studying the code, I found that the UniformQuantizer quantizes the layer weights using a forward pre-hook.
But what confuses me is that I have not found where the quantize operation on the activations happens;
it seems that the hook does not operate on the input.
Thanks

Quantized Model Output NaN / 0

โ“ Questions

Hi, I want to apply DiffQ to my source separation model with the PyTorch Lightning framework, and I added the quantizer following the Usage section of the README.

In my callback function during training, it works fine when evaluating the unquantized model with SDR. But when I use the quantized model to run separation on the MUSDB test set or other pop songs, the separation result is NaN or contains lots of 0s.

Do you have any comments or suggestions on this? Hope to get your reply!
Sincerely,
