
An Inverse Scaling Law for CLIP Training

This repo contains the official PyTorch and JAX implementations of CLIPA, introduced in our paper: An Inverse Scaling Law for CLIP Training

Overview of the Inverse Scaling Law: larger image/text encoders enable training with fewer image/text tokens while maintaining competitive performance

News

[2023.10.4] We have successfully scaled our model up to bigG/14, attaining an impressive 83.0% zero-shot top-1 accuracy on ImageNet-1K. For the detailed training configuration, please refer to the t5x branch. You can also access the pre-trained and fine-tuned weights for both the JAX and PyTorch versions on Google Drive.

[2023.9.21] Our paper was accepted to NeurIPS 2023!

[2023.6.16] We release CLIPA-v2. Compared to the prior best publicly available CLIP model, CLIPA-v2 can be trained significantly faster and yields stronger performance. Our best model, H/14@336x336 trained on DataComp-1B, reaches 81.8% zero-shot ImageNet accuracy, and its estimated training cost is under $15k!

[Note] All of our CLIPA-v2 models were trained on TPUs using our JAX codebase. We followed the same pre-training process as CLIPA-v1, but with a more efficient fine-tuning strategy. To replicate our results, we have provided the training configurations (e.g., the H/14 model config in this folder), along with the pre-trained weights, configurations, and logs, which can be found here.

| model | data | schedule | GPU hours | estimated cost | zero-shot IN-1K | model weight |
|---|---|---|---|---|---|---|
| H/14 | LAION-2B | 12.8B@84 + 512M@224 + 128M@336 | 8640 | $13613 | 79.1 | PyTorch / JAX |
| L/14 | DataComp-1B | 12.8B@84 + 512M@224 + 128M@336 | 4520 | $7124 | 80.3 | PyTorch / JAX |
| H/14 | DataComp-1B | 12.8B@84 + 512M@224 + 128M@336 | 8640 | $13613 | 81.8 | PyTorch / JAX |
| bigG/14 | DataComp-1B | 12.8B@84 + 512M@224 + 128M@336 | 23742 | $39056 | 83.0 | PyTorch / JAX |

CLIPA-v2's GPU hours are estimated using an 8x A100 80GB GPU machine on Google Cloud, and the corresponding training costs are estimated from 80GB A100 cloud pricing.
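To read the schedule column: an entry such as 12.8B@84 + 512M@224 + 128M@336 denotes samples-seen @ input resolution, i.e. 12.8B image-text pairs at 84x84 inputs, followed by short fine-tuning stages of 512M pairs at 224x224 and 128M pairs at 336x336. Because the number of ViT patch tokens grows quadratically with resolution, the long low-resolution stage is by far the cheapest per sample. A rough back-of-the-envelope sketch (assuming a 14x14 patch size, matching the /14 model names):

```python
# Rough arithmetic for the schedule column: ViT patch tokens per image at each
# stage, assuming a 14x14 patch size (as in L/14, H/14, bigG/14).
PATCH = 14

def image_tokens(resolution: int, patch: int = PATCH) -> int:
    """Number of patch tokens for a square input (ignoring the class token)."""
    return (resolution // patch) ** 2

# (samples seen, input resolution) for the 12.8B@84 + 512M@224 + 128M@336 schedule
schedule = [(12.8e9, 84), (512e6, 224), (128e6, 336)]

for samples, res in schedule:
    tokens = image_tokens(res)
    print(f"{res:>3}px: {tokens:>3} tokens/image, {samples * tokens:.2e} image tokens in total")
```

Under this reading, the 84px stage accounts for about 95% of the samples seen but only roughly 70% of the image tokens processed, which is why pre-training at low resolution yields most of the compute savings.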

Introduction

CLIP, the first foundation model that connects images and text, has enabled many recent breakthroughs in computer vision. However, its associated training cost is prohibitively high, imposing a significant barrier to its widespread exploration. In this paper, we present a surprising finding that there exists an inverse scaling law for CLIP training, whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training. Moreover, we showcase that the strategy for reducing image/text token length plays a crucial role in determining the quality of this scaling law.
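Concretely, shortening the token sequences can be as simple as feeding the image encoder smaller inputs (fewer ViT patches) and truncating the tokenized captions. The sketch below illustrates these two reductions in PyTorch; it is not the exact set of strategies ablated in the paper, and the function names and the 84px / 16-token settings are placeholders.

```python
import torch
import torch.nn.functional as F

# Illustrative only: two simple ways to shorten the token sequences a CLIP
# model sees during training. Function names and default values are placeholders.

def shrink_images(images: torch.Tensor, resolution: int = 84) -> torch.Tensor:
    """Resize a batch (B, 3, H, W) so the ViT yields fewer patch tokens,
    e.g. (84/14)^2 = 36 tokens instead of (224/14)^2 = 256."""
    return F.interpolate(images, size=(resolution, resolution),
                         mode="bilinear", align_corners=False)

def truncate_text(token_ids: torch.Tensor, max_len: int = 16) -> torch.Tensor:
    """Keep only the first max_len tokens of each tokenized caption (B, L)."""
    return token_ids[:, :max_len]

# Toy batch: 224px images and 77-token captions (77 is CLIP's context length).
images = torch.randn(8, 3, 224, 224)
captions = torch.randint(0, 49408, (8, 77))   # 49408 = CLIP BPE vocab size
print(shrink_images(images).shape)    # torch.Size([8, 3, 84, 84])
print(truncate_text(captions).shape)  # torch.Size([8, 16])
```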

As a result of this finding, we are able to successfully train CLIP even by using academic resources. For example, on an A100 eight-GPU server, our CLIP models achieve zero-shot top-1 ImageNet accuracies of 63.2% in about 2 days, 67.8% in about 3 days, and 69.3% in about 4 days. By reducing the computation barrier associated with CLIP, we hope to inspire more research in this field, particularly from academics.
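For context, the zero-shot ImageNet numbers above follow the standard CLIP evaluation protocol: embed one text prompt per class, embed the image, and predict the class whose text embedding is most similar. Below is a minimal sketch against the OpenCLIP API that our PyTorch codebase builds on; the model name, pretrained tag, class names, and image path are generic placeholders, not the released CLIPA checkpoints.

```python
import torch
import open_clip
from PIL import Image

# Minimal zero-shot classification sketch using the OpenCLIP API.
# Model/tag below are generic placeholders, not CLIPA weights.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

classnames = ["dog", "cat", "goldfish"]  # stand-ins for the 1000 IN-1K classes
prompts = tokenizer([f"a photo of a {c}" for c in classnames])
image = preprocess(Image.open("example.jpg")).unsqueeze(0)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(prompts)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(classnames[probs.argmax().item()])
```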

TPU Usage

Our experiments are conducted on both GPUs and TPUs, and both the JAX and PyTorch implementations support TPU training. For how to gain access to and set up TPU machines, check this brief doc. In a nutshell, you can access TPU machines on Google Cloud for free!

License

This project is under the Apache 2.0 License.

Acknowledgement

The JAX repo is built on big_vision, and the PyTorch repo is built on OpenCLIP. We have also borrowed some code from TIMM and MAE. Many thanks to the awesome works from the open-source community!

We are also very grateful that this work is supported by a gift from Open Philanthropy, the TPU Research Cloud (TRC) program, and the Google Cloud Research Credits program.

Citation

@inproceedings{li2023clipa,
      title={An Inverse Scaling Law for CLIP Training}, 
      author={Xianhang Li and Zeyu Wang and Cihang Xie},
      booktitle={NeurIPS},
      year={2023},
}
@article{li2023clipav2,
      title={CLIPA-v2: Scaling CLIP Training with 81.1\% Zero-shot ImageNet Accuracy within a \$10,000 Budget; An Extra \$4,000 Unlocks 81.8\% Accuracy},
      author={Xianhang Li and Zeyu Wang and Cihang Xie},
      journal={arXiv preprint arXiv:2306.15658},
      year={2023},
}

Contact

If you have any questions, please feel free to raise an issue or contact us directly: Xianhang Li: [email protected]; Zeyu Wang: [email protected]

