
imgcomp-cvpr's Introduction

Conditional Probability Models for Deep Image Compression

TensorFlow implementation of Conditional Probability Models for Deep Image Compression, published in CVPR 2018.

Prerequisites

  • Download checkpoints/pre-trained models here and extract them to ckpts
  • Python 3 (tested with Python 3.4.5)
  • TensorFlow (tested with tensorflow-gpu version 1.4.1)
  • Python packages as specified by requirements.txt (pip install -r requirements.txt)
  • A CUDA-compatible GPU

Notes about naming in code vs. paper

  • qbar is the output of Eq. (4), i.e., qhard in the forward pass, and qsoft in the backward pass, where qhard corresponds to Eq. (2) and qsoft to Eq. (3).
  • We quantize z to one value in the centers C. We refer to the indices in C as symbols. So, if e.g. C = {-2, -1, 0, 1, 2} and z = 0.75, z is quantized to C[3] = 1, making qhard = 1 and symbol = 3 (indices start from 0).
  • Our context model (Fig. 3) is called a probability classifier in the code, since it resembles a classifier (predicting the symbols). The relevant file is probclass.py, and the name is frequently abbreviated to pc.
  • The auto-encoder is found in autoencoder.py and abbreviated ae.
  • The importance map is called heatmap in the code.
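As a rough illustration of the naming above, the following NumPy sketch quantizes values to their nearest center and also computes a softmax-weighted soft value. The function name quantize, the sigma argument, and the use of squared distances are assumptions made for this example; it is not the code in quantizer.py.

```python
import numpy as np

def quantize(z, centers, sigma=1.0):
    """Return (qsoft, qhard, symbols) for a 1-D array z and 1-D centers.

    Hypothetical helper for illustration only.
    """
    z = np.asarray(z, dtype=np.float64)
    centers = np.asarray(centers, dtype=np.float64)
    # Squared distance from every z to every center: shape (len(z), len(centers)).
    dist = (z[:, None] - centers[None, :]) ** 2
    # Hard quantization: pick the nearest center; symbols are indices into centers.
    symbols = np.argmin(dist, axis=1)
    qhard = centers[symbols]
    # Soft quantization: softmax over negative distances, then a weighted sum of centers.
    logits = -sigma * dist
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    qsoft = weights @ centers
    return qsoft, qhard, symbols

centers = [-2, -1, 0, 1, 2]
qsoft, qhard, symbols = quantize([0.75], centers)
print(qhard[0], symbols[0])  # 1.0 3
```

In TensorFlow, a common way to obtain qhard in the forward pass while back-propagating through qsoft is qsoft + tf.stop_gradient(qhard - qsoft); whether the repository uses exactly this form is not stated here.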

Inference

[Figure: Plot Kodak]

NOTE: Inference only works on CUDA-compatible GPUs.


To do inference, use the following command

python val.py ../ckpts MODEL_ID DATASET --save_ours

where MODEL_ID is one of

  • 0515_1103: Point A in the plot (on Kodak: bpp: 0.370, MS-SSIM: 0.975)
  • 0515_1309: Point B in the plot (on Kodak: bpp: 0.677, MS-SSIM: 0.987)
  • 0515_1310: Point C in the plot (on Kodak: bpp: 1.051, MS-SSIM: 0.992)

and DATASET is either the path to a directory of png files or an escaped glob (e.g., some/images/\*/\*.jpg). All images readable with PIL should be supported.

This will save outputs in ckpts/MODEL_ID\ DATASET/imgs and display the mean bpp and MS-SSIM on console. Detailed measures per image are written to ckpts/MODEL_ID\ DATASET/measures.csv. Note that some images may be padded.

Encoding to bitstream

By default, val.py will use cross entropy to estimate the actual bitrate. In our experiments, this is very close to the real bitrate (<0.1% difference for most images). But to evaluate this yourself, you can use

python val.py ../ckpts MODEL_ID DATASET --save_ours --real_bpp

which will use an arithmetic encoder to write the symbols of an image to a file, count the number of bits, and then decode the bits to restore the symbols. We note that this is not optimized at all (images from the Kodak validation set take ~350s to encode and ~200s to decode). For a practical implementation, the following should be done:

  • A faster arithmetic encoder should be used (we use the clean but non-optimized code from here).
  • The probability classifier network should output the logits for all symbols in parallel, instead of sequentially.
  • Decoding should re-use activations, as in Fast PixelCNN++, which achieves speedups of up to 183x.
  • Like in classical approaches, the image could be split into blocks and those blocks could be encoded in parallel.
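To make the cross-entropy estimate concrete: if the probability classifier assigns probability p to the symbol that actually occurs, an ideal arithmetic coder spends about -log2(p) bits on it. The sketch below sums these costs and divides by the number of pixels. The name estimated_bpp and its arguments are hypothetical and are not functions from val.py.

```python
import math

def estimated_bpp(symbol_probs, num_pixels):
    """Estimate bits per pixel from the probabilities assigned to the true symbols.

    symbol_probs: iterable of probabilities, one per encoded symbol (assumed > 0).
    num_pixels:   height * width of the original image.
    """
    total_bits = sum(-math.log2(p) for p in symbol_probs)
    return total_bits / num_pixels

# Toy case: 4 symbols, each predicted with p = 0.5, cost 1 bit each.
# Spread over a 2x2 image, that is 1 bit per pixel.
print(estimated_bpp([0.5] * 4, num_pixels=4))  # 1.0
```

A real arithmetic coder adds a small overhead on top of this ideal cost, which is consistent with the small gap between estimated and real bitrate reported above.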

Plot

The plot above was created using

python plotter.py ../ckpts 0515_1103,0515_1309,0515_1310 kodak --style mean --ids A B C --latex

For reference, the curve corresponding to our model in Fig. 1 in the paper can be reproduced with the following data:

# bpp -> MS-SSIM on Kodak
CVPR_FIG1 = [
    (0.1265306, 0.9289356),
    (0.1530612, 0.9417454),
    (0.1795918, 0.9497924),
    (0.2061224, 0.9553684),
    (0.2326531, 0.9598574),
    (0.2591837, 0.9636625),
    (0.2857143, 0.9668663),
    (0.3122449, 0.9695684),
    (0.3387755, 0.9718446),
    (0.3653061, 0.9738012),
    (0.3918367, 0.9755308),
    (0.4183673, 0.9770696),
    (0.4448980, 0.9784622),
    (0.4714286, 0.9797252),
    (0.4979592, 0.9808753),
    (0.5244898, 0.9819255),
    (0.5510204, 0.9828875),
    (0.5775510, 0.9837722),
    (0.6040816, 0.9845877),
    (0.6306122, 0.9853407),
    (0.6571429, 0.9860362),
    (0.6836735, 0.9866768),
    (0.7102041, 0.9872690),
    (0.7367347, 0.9878184),
    (0.7632653, 0.9883268),
    (0.7897959, 0.9887977),
    (0.8163265, 0.9892346),
    (0.8428571, 0.9896379)]
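One possible use of this data is to read off the MS-SSIM of the paper's curve at a bitrate of interest. The following sketch does this by linear interpolation between neighbouring points. The helper name interpolate is hypothetical, and only three points are copied here to keep the example short; in practice the full CVPR_FIG1 list would be passed.

```python
def interpolate(curve, bpp):
    """Linearly interpolate MS-SSIM at `bpp` from sorted (bpp, ms_ssim) pairs."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= bpp <= x1:
            t = (bpp - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("bpp outside the range covered by the curve")

# Three neighbouring points copied from CVPR_FIG1 above.
CURVE_EXCERPT = [
    (0.3653061, 0.9738012),
    (0.3918367, 0.9755308),
    (0.4183673, 0.9770696),
]

# Query the curve at 0.370 bpp, the Kodak bitrate listed for model 0515_1103.
value = interpolate(CURVE_EXCERPT, 0.370)
print(round(value, 4))
```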

Training

If you want to train on the ImageNet dataset as described in the paper, follow the steps below (Prepare ImageNET). After doing this, you can pass --dataset_train imgnet_train --dataset_test imgnet_test to train.py (make sure you set $RECORDS_ROOT for this, see below). Otherwise, set --dataset_train and --dataset_test to an escaped glob matching image files (e.g. some/images/\*/\*.jpg).

python train.py ae_configs/cvpr/AE_CONFIG pc_configs/cvpr/PC_CONFIG \
        --dataset_train TRAIN_DATASET \
        --dataset_test TEST_DATASET \
        --log_dir_root LOG_DIR_ROOT

where AE_CONFIG and PC_CONFIG are one of the configs in the respective folders. The models in ckpts were obtained with the following configs:

  • 0515_1103: ae_configs/cvpr/low pc_configs/cvpr/res_shallow
  • 0515_1309: ae_configs/cvpr/med pc_configs/cvpr/res_shallow
  • 0515_1310: ae_configs/cvpr/high pc_configs/cvpr/res_shallow

Various options are available for train.py, such as --restore to continue training from a previous checkpoint. See python train.py -h.
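When --dataset_train and --dataset_test are given as escaped globs, the shell passes the pattern through unexpanded and the program expands it itself. Before starting a long run it can help to check what a pattern matches. The sketch below does this with Python's standard glob module on a temporary directory; preview_glob is a hypothetical helper and not part of train.py.

```python
import glob
import os
import tempfile

def preview_glob(pattern, limit=5):
    """Return the number of matches and the first few paths for a glob pattern."""
    matches = sorted(glob.glob(pattern))
    return len(matches), matches[:limit]

# Build a small throwaway tree: two sub-directories with three .jpg files in total.
with tempfile.TemporaryDirectory() as root:
    for sub, name in [("a", "1.jpg"), ("a", "2.jpg"), ("b", "3.jpg"), ("b", "notes.txt")]:
        os.makedirs(os.path.join(root, sub), exist_ok=True)
        open(os.path.join(root, sub, name), "w").close()
    # Same shape as some/images/*/*.jpg: one directory level, then the files.
    count, first = preview_glob(os.path.join(root, "*", "*.jpg"))

print(count)  # 3
```

A pattern that matches nothing, or a plain directory path where a glob is expected, is worth ruling out first when training fails right at startup.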

Prepare ImageNET

The following instructions assume that you have the following tools installed:

  • GNU parallel (you can do without but it might take a really long time. Installing should be as simple as (wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash, see e.g. here)
  • ImageMagick to downscale images to 256 pixels
  • fjcommon (pip install fjcommon) to create TF Records

Note that creating all records will likely take several hours. The following was tested using zsh.

1. Get ImageNET, in the proper format

You need to download ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar (a good resource is the Inception download_imagenet.sh script). For the following instructions, we assume both tar files are located in a directory data.

    # in data/

    mkdir train val

    pushd train
    tar xvf ../ILSVRC2012_img_train.tar
    popd

    pushd val
    tar xvf ../ILSVRC2012_img_val.tar
    popd

This will unpack 1000 .tar containers into train/ and 50000 .JPEG images into val/. Now, we need to extract the training images. This may take a while depending on your setup.

    # in data/
    pushd train
    find . -name "n*.tar" | parallel -j64 'mkdir -vp {/.} && tar xf {} -C {/.}'
    popd

2. Downsample

We downsample each image to have 256 pixels on the shorter side, by executing the following command in data/. Again, this is very time-consuming, so if you have access to some CPU cluster, it might make sense to run it there.

    # in data/
    find . -name "*.JPEG" |  parallel -j64 convert -verbose {} -resize "256x256^" {}
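For intuition, the "256x256^" geometry scales each image so that its shorter side becomes 256 pixels while keeping the aspect ratio. Since no ">" flag is given, smaller images are enlarged as well. The sketch below computes the resulting size in pure Python; shorter_side_resize is a hypothetical helper, and ImageMagick's own rounding may differ by a pixel in some cases.

```python
def shorter_side_resize(width, height, target=256):
    """Return (new_width, new_height) with the shorter side scaled to `target`."""
    scale = target / min(width, height)
    new_width = max(target, round(width * scale))
    new_height = max(target, round(height * scale))
    return new_width, new_height

print(shorter_side_resize(500, 375))  # (341, 256)
print(shorter_side_resize(100, 200))  # (256, 512)
```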

3. Create records

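Now it's time to pack the images into TF record files. If your fjcommon version rejects the mk_img_rec mode used below, python -m fjcommon tf_records -h lists the modes it supports (some versions name it mk_img_recs). We will save the records in data/records/: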

    # in data/
    mkdir -p records/train
    mkdir -p records/val

    pushd train
    find . -name "*.JPEG" | parallel --bar -j64 -N 1250 \
        'OUTPUT_PATH=$(printf "../records/train/train-%05d.tfrecord" {#});' \
        'python -m fjcommon tf_records mk_img_rec {} -o $OUTPUT_PATH --feature_key image/encoded'
    popd

    pushd val
    find . -name "*.JPEG" | parallel --bar -j16 -N 1250 \
        'OUTPUT_PATH=$(printf "../records/val/val-%05d.tfrecord" {#});' \
        'python -m fjcommon tf_records mk_img_rec {} -o $OUTPUT_PATH --feature_key image/encoded'
    popd
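With -N 1250, GNU parallel hands 1250 file names to each job, and {#} is the job sequence number, which starts at 1. The number of record files and their names can therefore be predicted from the image count. The sketch below does this for the 50000 validation images; shard_names is a hypothetical helper written for this example.

```python
import math

def shard_names(num_images, per_shard=1250, prefix="val"):
    """List the record file names produced for `num_images` input files."""
    num_shards = math.ceil(num_images / per_shard)
    # parallel's {#} counts jobs from 1, and printf "%05d" zero-pads to 5 digits.
    return ["{}-{:05d}.tfrecord".format(prefix, i) for i in range(1, num_shards + 1)]

val = shard_names(50000, prefix="val")
print(len(val), val[0], val[-1])  # 40 val-00001.tfrecord val-00040.tfrecord
```

Counting the files in records/val after the run and comparing against this number is a quick way to spot jobs that failed.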

4. Set RECORDS_ROOT

Make sure the following environment variable is set before running train.py:

    export RECORDS_ROOT=path_to_data/records

Citation

If you use this code for your research, please cite this paper:

@inproceedings{mentzer2018conditional1,
    Author = {Mentzer, Fabian and Agustsson, Eirikur and Tschannen, Michael and Timofte, Radu and Van Gool, Luc},
    Booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    Title = {Conditional Probability Models for Deep Image Compression},
    Year = {2018}}


imgcomp-cvpr's Issues

Softmax with _HARD_SIGMA

In quantizer.py (line 82), there are two softmax usages over distances. I wonder if there is any special meaning to phi_hard softmax.

I would say that argmax could have been done directly on distances, but I might be missing something here.

Decoding of Bitstream.

Hello, I have some questions about decoding the bitstream.
1. How do we get the distribution at each location during adaptive arithmetic decoding? Is it computed location by location through the context model, as in an auto-regressive model?
2. What information should be encoded into the bitstream?
Looking forward to your reply.

Saving quantized images.

If I save the quantized volume after the quantizer, the numpy file is larger than the original image. How can I save the compressed form so that it takes less space than the original image?

about soft and hard quantization

Hello, I don't understand why, when testing the model, hardout is used as the input of the P network. In my opinion, hardout and the symbols are equivalent in a sense.

compressed images are large

Hello!
I used your models and my trained models to compress images. Even though the bpp is low, the images reconstructed are large in size, e.g. the original image is 671.48kB, the compressed image is 500kB of 0.388 bpp, however, the image compressed by JPEG is 350kB of 0.6 bpp. I cannot find out why the JPEG-compressed image of a larger bpp is smaller than the model-compressed image of a small bpp.

inference using real_bpp

Hello!

When I run val.py with --real_bpp, I get the error "Expected bpp_theory to match loss!". I can't understand why, since bpp_real and bpp_theory are equal. You can see the results in the picture below.

Thank you.

Capture

Question about 'distortion_to_minimize'

Hello!

If I want to optimize for MSE, what should I change?
I thought
"imgcomp-cvpr/code/ae_configs/cvpr/base
distortion_to_minimize=ms_ssim
-> distortion_to_minimize=mse" would optimize for MSE.
Is that right?

Thank you.

log_dir_root

I don't understand the meaning of "log_dir_root" in the training process.

inference error

I'm sorry! I did just as you advised, but the error still exists:
[image: 1]
Here is the directory structure of the code:
[image: 2]
I wonder whether the error is related to the directory structure. Hoping for your reply!

training

In the training process:

    python train.py ae_configs/cvpr/low pc_configs/cvpr/res_shallow \
        --dataset_train ~/data/train \
        --dataset_test ~/data/val \
        --log_dir_root ~/imgcomp-cvpr-master/code/log

I got this error:

    *- STARTING TRAINING ------------------------------------------------------------
    Traceback (most recent call last):
      File "train.py", line 530, in <module>
        main()
      File "train.py", line 526, in main
        description=flags.description if not flags.temporary else None)
      File "train.py", line 194, in train
        train_flags, logdir, saver, is_restored=restore_manager is not None)
      File "/public/home/xqqstu/anaconda3/envs/py35/lib/python3.5/contextlib.py", line 66, in __exit__
        next(self.gen)
      File "/public/home/xqqstu/anaconda3/envs/py35/lib/python3.5/site-packages/fjcommon/tf_helpers.py", line 39, in start_queues_in_sess
        coord.join(threads)
      File "/public/home/xqqstu/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
        six.reraise(*self._exc_info_to_raise)
      File "/public/home/xqqstu/anaconda3/envs/py35/lib/python3.5/site-packages/six.py", line 703, in reraise
        raise value
      File "/public/home/xqqstu/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
        enqueue_callable()
      File "/public/home/xqqstu/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1231, in _single_operation_run
        target_list_as_strings, status, None)
      File "/public/home/xqqstu/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
        c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.FailedPreconditionError: /public/home/xqqstu/data/train; Is a directory
    [[Node: input_glob__public_home_xqqstu_data_train/images_decoded/ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](input_glob__public_home_xqqstu_data_train/images_decoded/WholeFileReaderV2, input_glob__public_home_xqqstu_data_train/images_decoded/input_producer)]]

training stops at "-STARTING TRAINING-------------"

I am sorry for bothering you again, but please allow me to show my issue one last time. After I prepared the whole environment, including the Python packages and TF records, my training always stops at the string "-STARTING TRAINING-------------". It then shows no further information and never finishes. I don't know why. Here is my training command:

why only mask the last channel?

I noticed that in the create_first/other_mask functions in probclass.py, you only set the causality mask on the last one on the C or D channel, which is different from Algo 1 in the supplemental material, where causality is supposed to be set across C/D.
In other words, why not
mask[:, K // 2, K // 2:] instead of mask[-1, K // 2, K // 2:] ?

Thank you

CodecDistance

Sorry to bother you again, but a warning appears when I train a model using ImageNet:

python version

After I created the Python 3.4.5 environment, I used "pip install -r requirements.txt" to install the packages, and a warning appeared: imageio requires Python '>=3.5' but the running Python is 3.4.5. I don't know whether I should switch to Python 3.5.

create tf_records argument error

When I used the following command to create the TF records of the ImageNet dataset:

    pushd train
    find . -name "*.JPEG" | parallel --bar -j64 -N 1250 \
        'OUTPUT_PATH=$(printf "../records/train/train-%05d.tfrecord" {#});' \
        'python -m fjcommon tf_records mk_img_rec {} -o $OUTPUT_PATH --feature_key image/encoded'
    popd

    pushd val
    find . -name "*.JPEG" | parallel --bar -j16 -N 1250 \
        'OUTPUT_PATH=$(printf "../records/val/val-%05d.tfrecord" {#});' \
        'python -m fjcommon tf_records mk_img_rec {} -o $OUTPUT_PATH --feature_key image/encoded'
    popd

I got the error log:

usage: __main__.py [-h] {mk_img_recs,mk_img_recs_dist,join,extract,check} ...
__main__.py: error: argument mode: invalid choice: 'mk_img_rec' (choose from 'mk_img_recs', 'mk_img_recs_dist', 'join', 'extract', 'check')

The size of your trained model

@fab-jul
Hello, the size of the checkpoint you provide (0515_1103) is 136530KB = 1KB(checkpoint) + 117723KB(.data) + 32KB(.index) + 18687KB(.meta) + 60KB(.pkl).
But the size of the model I trained with the network you provided is 232814 KB = 1KB(checkpoint) + 196160KB(.data) + 47KB(.index) + 36511KB(.meta) + 95KB(.pkl).

I think models trained on the same network should have the same size. So what I am confused about is whether the ckpts you provided were trained with the network given here. What is the difference between the network you trained and the network provided here?
I am looking forward to your reply.

bpp comparison

I want to compare the real bpp with the estimated bpp, and I put the Kodak dataset, named 'kodak', under the folder "code/ckpts". When I use the command "python val.py ../ckpts 0515_1103 ../ckpts/kodak --save_ours --real_bpp", why can't I see any results other than "***All given job_ids validated"? Is there something wrong?

restore

Hello, when I use the --restore argument, something goes wrong:
OutOfRangeError (see above for traceback): RandomShuffleQueue '2_input_glob__data_wency_CLIC_mobile_train_.png/shuffle_batch_join/random_shuffle_queue' is closed and has insufficient elements (requested 30, current size 0)

Can you give me some advice? Thank you~

entropy estimation

Hello, I am confused about the context model part of your code. What does the output of the context model represent? And what is the meaning of pc_loss = beta * tf.maximum(H_soft - H_target, 0)? What is the reason for setting H_target? Thank you very much, and I look forward to your reply.

plot

In the plot chapter, how can we get the *.csv files?

performance issue

@fab-jul Excuse me. The MS-SSIM of the model I trained is almost as high as described in the article, but its PSNR and SSIM do not reach the performance of JPEG compression. Is this also the case for your trained model?

How to optimize with mse?

Hello
I don't know if you still answer issues related to this code, but I'll share mine with you.

I would like to optimize the model for MSE using the code you provided.

I thought
"imgcomp-cvpr/code/ae_configs/cvpr/base
distortion_to_minimize=ms_ssim
-> distortion_to_minimize=mse" would optimize for MSE.

However, the above method causes the following error.
For your information, nothing was changed except the distortion (MS-SSIM optimization worked well).

error

I tried to solve it by myself, but I'm a beginner, so I don't know how to solve it.
Thank you for sharing such a wonderful code. I'd appreciate it if you could help me with my problem.

Thank you :)

about soft_quantize sigma parameter

In the quantizer.py file, soft quantization is used. How do I update the sigma parameter during training? Is sigma only 1 during training?

thank you

quantize center

Hello, the centers of the quantized values are selected randomly. Why? If they are randomly allocated, how is the arithmetic coding part realized? Looking forward to your reply.

how to train a lower bit rate model

Excuse me. I'm confused by the configuration of the training part.
I want to train a lower bit rate model, such as 0.2 bpp or lower.
I used ae_configs/cvpr/low directly, or modified it with H_target = 2*0.1.
But after about 200000 iterations, the average bpp of the models on the Kodak dataset is still about 0.35.
How can I train a model with a lower bit rate on the Kodak dataset?
Thanks!

test with val.py

Hi, I cannot fully understand the parameters of val.py, such as job_ids. What kind of value should I give? I would very much appreciate a detailed explanation of the other parameters. Thanks a lot.

test error

I did everything as the README said, but when I run the test code val.py like this: python val.py ../ckpts 0515_1103 ./Kodak --save_ours, an error occurs:
image
