
TransTIC: Transferring Transformer-based Image Compression from Human Visualization to Machine Perception

Accepted to ICCV 2023

This repository contains the source code of our ICCV 2023 paper, TransTIC (arXiv).

Abstract

This work aims to transfer a Transformer-based image compression codec from human vision to machine perception without fine-tuning the codec. We propose a transferable Transformer-based image compression framework, termed TransTIC. Inspired by visual prompt tuning, we propose an instance-specific prompt generator that injects instance-specific prompts into the encoder and task-specific prompts into the decoder. Extensive experiments show that our proposed method is capable of transferring the codec to various machine tasks and significantly outperforming the competing methods. To the best of our knowledge, this work is the first attempt to utilize prompting for the low-level image compression task.
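For intuition, the sketch below shows how learnable prompt tokens can be concatenated with image tokens before self-attention, in the spirit of VPT. The class name, shapes, and hyperparameters are illustrative assumptions and do not reproduce TransTIC's actual implementation.

import torch
import torch.nn as nn

# Illustrative sketch only (not TransTIC's actual code): learnable prompt tokens
# are prepended to the image tokens before self-attention, in the spirit of VPT.
class PromptedAttentionBlock(nn.Module):
    def __init__(self, dim=128, num_heads=4, num_task_prompts=4):
        super().__init__()
        # task-specific prompts, shared across all inputs
        self.task_prompts = nn.Parameter(torch.zeros(1, num_task_prompts, dim))
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, instance_prompts=None):
        # x: (B, N, dim) image tokens
        # instance_prompts: (B, P, dim) prompts produced per input image by a
        # prompt generator (hypothetical interface here)
        prompts = self.task_prompts.expand(x.size(0), -1, -1)
        if instance_prompts is not None:
            prompts = torch.cat([prompts, instance_prompts], dim=1)
        tokens = self.norm(torch.cat([prompts, x], dim=1))
        out, _ = self.attn(tokens, tokens, tokens)
        # keep only the image-token positions; prompts steer attention but are dropped
        return x + out[:, prompts.size(1):, :]

In TransTIC, instance-specific prompts are injected into the encoder and task-specific prompts into the decoder; this sketch merely illustrates the token-concatenation mechanism.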

Install

git clone https://github.com/NYCU-MAPL/TransTIC
cd TransTIC
pip install -U pip && pip install -e .
pip install timm tqdm click

Install Detectron2 for object detection and instance segmentation.
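For example, Detectron2 can be installed from source (check the official installation guide for the build matching your PyTorch/CUDA setup):

python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'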

Dataset

The following datasets are used and need to be downloaded:

  • Flicker2W (download here, and use this script for preprocessing)
  • ImageNet1K
  • COCO 2017 Train/Val
  • Kodak

Example Usage

Specify the data paths, target rate point, corresponding lambda, and checkpoint in the config file accordingly.

Base Codec (for PSNR)

python examples/train.py -c config/base_codec.yaml

Classification

python examples/classification.py -c config/classification.yaml
Add argument -T for evaluation.
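For example, to evaluate a trained classification model:

python examples/classification.py -c config/classification.yaml -T

The same -T flag applies to the detection and segmentation scripts below.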

Object Detection

python examples/detection.py -c config/detection.yaml
Add argument -T for evaluation.

Instance Segmentation

python examples/segmentation.py -c config/segmentation.yaml
Add argument -T for evaluation.

Pre-trained Weights

Pre-trained weights are provided for each task at four rate points:

  • Base codec (TIC): 1, 2, 3, 4
  • Classification: 1, 2, 3, 4
  • Object Detection: 1, 2, 3, 4
  • Instance Segmentation: 1, 2, 3, 4

Citation

If you find our project useful, please cite the following paper.

@inproceedings{TransTIC,
  title={TransTIC: Transferring Transformer-based Image Compression from Human Visualization to Machine Perception},
  author={Chen, Yi-Hsin and Weng, Ying-Chieh and Kao, Chia-Hao and Chien, Cheng and Chiu, Wei-Chen and Peng, Wen-Hsiao},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={},
  year={2023}
}

Acknowledgement

Our work is based on the framework of CompressAI. The base codec is adopted from TIC/TinyLIC and the prompting method is modified from VPT. We thank the authors for open-sourcing their code.


transtic's Issues

Question about training detail

Excellent work! I'm attempting to reproduce this research, but some of the training details in the paper are not entirely clear. Could you provide information on the training hardware, the number of epochs used, and the approximate training time (hours or days)? Thanks a lot! I'm looking forward to your response.

I have questions about two parameters

Firstly, I would like to express my gratitude to the authors for open-sourcing this project, which has provided significant support for my work. I have a couple of questions regarding two parameters: MODEL_DECODER and VPT_LAMBDA. What does the first parameter signify? Is it a switch for saving weights or for training the decoder? For the second parameter, should the same setting be maintained for all four models? I appreciate your response. Could you also provide an email address for further communication?

In detection.py, distortion_loss is very large

Hello, when I run detection.py, bpp_loss is 0.3, distortion_loss decreases from 54030, and the P2, P3, P4, P5, and P6 features generated from d and x_hat are all on the order of 8e+02, even though I did not make any modifications. Is this a normal situation?

Details for reproducing Figure A5 in the paper?

Hi! Thank you for sharing your great work!
I'm currently trying to reproduce the detection results in your paper, especially Figure A5 (c).
I was able to obtain the TIC and TransTIC results thanks to the pre-trained weights you shared,
but I cannot reproduce the VVC (VTM-20.0) results, which show better performance than TIC.
Could you share the details of the VVC coding process used in this experiment?
