duanzhiihao / lossy-vae

Authors' PyTorch implementation of lossy image compression methods that are based on hierarchical VAEs



Lossy Image Compression using Hierarchical VAEs

This repository contains the authors' implementation of several deep learning-based methods for lossy image compression.
This project is under active development.

Models

Implemented Methods (Pre-Trained Models Available)

  • Lossy Image Compression with Quantized Hierarchical VAEs [arxiv] [cvf] [ieee]
  • QARV: Quantization-Aware ResNet VAE for Lossy Image Compression [arxiv] [ieee]
    • Published at TPAMI 2023
    • Abstract: an improved version of the previous model, with variable-rate coding, faster decoding, and better performance.
    • [Code & pre-trained models]: lossy-vae/lvae/models/qarv
  • An Improved Upper Bound on the Rate-Distortion Function of Images [arxiv] [ieee]
    • Published at ICIP 2023
    • Abstract: a 15-layer VAE model used to estimate the information rate-distortion function R(D). This model shows that a -30% BD-rate w.r.t. VTM is theoretically achievable.
    • [Code & pre-trained models]: lossy-vae/lvae/models/rd

Features

Progressive coding: our models learn a deep hierarchy of latent variables and compress/decompress images in a coarse-to-fine fashion. This feature comes from the hierarchical nature of ResNet VAEs.

Compression performance: our models perform well in both rate-distortion and decoding speed. Please see the Results section below.

Results

Bpp-PSNR results in JSON format

Notes on metric computation:

  • Bpp and PSNR are first computed for each image and then averaged over all images in a dataset.
  • Bpp is the saved file size (in bits) divided by the number of image pixels.
  • PSNR is computed in RGB space (not YUV).
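The notes above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the repository's evaluation code: the helper names (`bits_per_pixel`, `psnr_rgb`, `dataset_average`) are hypothetical, and each image is assumed to be a flat sequence of 8-bit R, G, B sample values.

```python
import math

def bits_per_pixel(file_size_bytes, height, width):
    """Bpp: compressed file size in bits divided by the number of image pixels."""
    return file_size_bytes * 8 / (height * width)

def psnr_rgb(original, reconstructed, max_value=255.0):
    """PSNR of one image, with the MSE taken over all R, G, B samples.

    Both arguments are flat sequences of sample values (hypothetical layout).
    """
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

def dataset_average(per_image_values):
    """Average a metric that was already computed separately for each image."""
    values = list(per_image_values)
    return sum(values) / len(values)
```

For example, a 49152-byte file for a 512x768 image gives `bits_per_pixel(49152, 512, 768) == 1.0`. Averaging per-image values (rather than pooling bits and pixels over the whole dataset) matches the first note above.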

Encoding/decoding latency on CPU/GPU, and BD-rate

Model Name | CPU* Enc. | CPU* Dec. | 3080 ti Enc. | 3080 ti Dec. | BD-rate* (lower is better)
qres34m    | 0.899s    | 0.441s    | 0.116s       | 0.083s       | -3.95 %
qarv_base  | 0.757s    | 0.295s    | 0.096s       | 0.063s       | -7.26 %

*Time is the latency to encode/decode a 512x768 image, averaged over 24 Kodak images. Tested in plain PyTorch (v1.13 + CUDA 11.7) code, i.e., no mixed-precision, torchscript, ONNX/TensorRT, etc.
*CPU is Intel 10700k.
*BD-rate is w.r.t. VTM 18.0, averaged on three common test sets (Kodak, Tecnick TESTIMAGES, and CLIC 2022 test set).
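
The per-image latency averaging described in the notes above could be measured with a helper along these lines. This is only a sketch, not the authors' benchmark script: `average_latency` and its `sync` parameter are hypothetical names. On a GPU, one would typically pass `torch.cuda.synchronize` as `sync` so that asynchronous kernels have finished before the clock is read.

```python
import time

def average_latency(fn, inputs, sync=None):
    """Return the mean wall-clock time in seconds of fn(x) over all inputs.

    sync: optional zero-argument callable invoked right before each clock read
    (for example torch.cuda.synchronize when timing GPU code).
    """
    inputs = list(inputs)
    total = 0.0
    for x in inputs:
        if sync is not None:
            sync()  # make sure earlier work is not counted
        start = time.perf_counter()
        fn(x)
        if sync is not None:
            sync()  # wait for the timed work to actually finish
        total += time.perf_counter() - start
    return total / len(inputs)
```

Here `fn` would be a one-argument callable that encodes or decodes a single image, and `inputs` would be the 24 Kodak images (or their file paths).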

Install

Requirements:

Download and Install:

  1. Download the repository.
  2. Modify the dataset paths in lossy-vae/lvae/paths.py.
  3. [Optional] pip install the repository in development mode:
cd /path/to/lossy-vae
python -m pip install -e .

Usage

Get pre-trained weights

from lvae import get_model
model = get_model('qarv_base', pretrained=True) # weights are downloaded automatically
model.eval()
model.compress_mode(True) # initialize entropy coding

Compress images

Encode an image:

model.compress_file('/path/to/image.png', '/path/to/compressed.bits')

Decode an image:

im = model.decompress_file('/path/to/compressed.bits')
# im is a torch.Tensor of shape (1, 3, H, W). RGB. pixel values in [0, 1].
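
The two calls above can be combined into a small round-trip helper. This is a sketch that relies only on the `compress_file` and `decompress_file` methods shown above; the function name `round_trip` is hypothetical and is not part of the `lvae` package.

```python
import os

def round_trip(model, image_path, bits_path):
    """Compress image_path to bits_path, decode it back, and report the file size."""
    model.compress_file(image_path, bits_path)
    compressed_bytes = os.path.getsize(bits_path)  # size of the saved bitstream
    # Per the usage notes above, the result is a (1, 3, H, W) RGB tensor in [0, 1].
    im = model.decompress_file(bits_path)
    return im, compressed_bytes
```

The returned byte count, multiplied by 8 and divided by the number of image pixels, gives the Bpp value as defined in the Results section.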

Datasets

COCO

  1. Download the COCO dataset "2017 Train images [118K/18GB]" from https://cocodataset.org/#download
  2. Unzip the images anywhere, e.g., at /path/to/datasets/coco/train2017
  3. Edit lossy-vae/lvae/paths.py such that
known_datasets['coco-train2017'] = '/path/to/datasets/coco/train2017'

Kodak (link), Tecnick TESTIMAGES (link), and CLIC (link)

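python scripts/download-dataset.py --name kodak         --datasets_root /path/to/datasets
python scripts/download-dataset.py --name clic2022-test --datasets_root /path/to/datasets
python scripts/download-dataset.py --name tecnick       --datasets_root /path/to/datasets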

Then, edit lossy-vae/lvae/paths.py such that known_datasets['kodak'] = '/path/to/datasets/kodak', and similarly for other datasets.

Custom Dataset

  1. Prepare a folder containing images. The folder should contain only images (may contain subfolders).
  2. Edit lossy-vae/lvae/paths.py such that known_datasets['custom-name'] = '/path/to/my-dataset', where custom-name is the name of your dataset, and /path/to/my-dataset is the path to the folder containing images.
  3. Then, you can use custom-name as the dataset name in the training/evaluation scripts.
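
Before registering a custom folder, it may help to confirm that it really contains only images. The following is a minimal sketch using the standard library only: `check_image_folder` and `IMAGE_EXTENSIONS` are hypothetical names, and the extension list is an assumption, not the set of formats the repository actually accepts.

```python
from pathlib import Path

# Assumed set of image extensions; the repository may accept a different set.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}

def check_image_folder(root):
    """Return (image_paths, other_paths) for all files under root, recursively."""
    images, others = [], []
    for p in sorted(Path(root).rglob("*")):
        if not p.is_file():
            continue  # skip subfolders themselves; their files are still visited
        if p.suffix.lower() in IMAGE_EXTENSIONS:
            images.append(p)
        else:
            others.append(p)
    return images, others
```

If `others` comes back non-empty, those files can be moved elsewhere before the folder is added to known_datasets.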

Training and evaluation scripts

Training and evaluation scripts vary from model to model. For example, qres34m uses a fixed-rate train/eval scheme, while qarv_base uses a variable-rate train/eval scheme.
Detailed training/evaluation instructions are provided in each model's subfolder (see the section Models).

License

Code in this repository is freely available for non-commercial use.


lossy-vae's Issues

Availability of lmb=1024 for qres17m model

Hi Zhihao,
Thanks for sharing this great codebase.
Is the checkpoint for lambda=1024 for the qres17m model available to download? I saw it was mentioned in Fig. 5(a) of the paper, but not available for download.
Thanks!

Lossless one code

Hi Zhihao,

Could you please also share or publish the code for lossless image compression with the trained model?

Thanks!
