nvlabs / fastervit

[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention

Home Page: https://arxiv.org/abs/2306.06189

License: Other

Dockerfile 0.02% Python 91.93% Shell 0.47% C++ 0.69% Cuda 6.90%
ade20k backbone deep-learning image-net pre-trained-model self-attention vision-transformer visual-recognition coco object-detection semantic-segmentation foundation-models image-classification

fastervit's Introduction

FasterViT: Fast Vision Transformers with Hierarchical Attention

Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention.


Ali Hatamizadeh, Greg Heinrich, Hongxu (Danny) Yin, Andrew Tao, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing


FasterViT achieves a new SOTA Pareto front in terms of Top-1 accuracy and throughput, without extra training data!

We introduce a new self-attention mechanism, denoted as Hierarchical Attention (HAT), that captures both short- and long-range information by learning cross-window carrier tokens.
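To make the idea concrete, below is a highly simplified, unofficial sketch of hierarchical attention: each window is summarized by one pooled carrier token, carrier tokens attend to each other globally, and each window then attends locally over its own tokens plus its carrier token. It uses plain torch.nn.MultiheadAttention and omits positional biases, the multi-token carrier grid, and other details of the repository's actual HAT module.

import torch
import torch.nn as nn

class HierarchicalAttentionSketch(nn.Module):
    """Conceptual sketch only: one pooled carrier token per window, no positional bias."""

    def __init__(self, dim=64, num_heads=4, window=7):
        super().__init__()
        self.window = window
        self.ct_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)     # global, over carrier tokens
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)  # local, within each window

    def forward(self, x):
        # x: (B, H*W, C) with H = W = n_windows * window (square input assumed for brevity)
        B, N, C = x.shape
        side = int(N ** 0.5)
        nw = side // self.window
        # Partition into non-overlapping windows: (B * nw * nw, window * window, C).
        x = x.reshape(B, nw, self.window, nw, self.window, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(B * nw * nw, self.window ** 2, C)
        # Carrier tokens: one pooled summary token per window.
        ct = x.mean(dim=1, keepdim=True)                    # (B*nw*nw, 1, C)
        ct_global = ct.reshape(B, nw * nw, C)               # all carrier tokens of the image
        # 1) Cheap global attention among carrier tokens (long-range information).
        ct_global, _ = self.ct_attn(ct_global, ct_global, ct_global)
        ct = ct_global.reshape(B * nw * nw, 1, C)
        # 2) Local window attention over [carrier token; window tokens] (short-range information).
        tokens = torch.cat([ct, x], dim=1)
        tokens, _ = self.local_attn(tokens, tokens, tokens)
        # Drop the carrier slot and restore the (B, N, C) layout.
        x = tokens[:, 1:, :].reshape(B, nw, nw, self.window, self.window, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(B, N, C)
        return x

layer = HierarchicalAttentionSketch()
out = layer(torch.rand(2, 28 * 28, 64))   # 28 = 4 windows of size 7
print(out.shape)                          # torch.Size([2, 784, 64])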

(Teaser figure: Top-1 accuracy vs. throughput Pareto front.)

Note: Please use the latest NVIDIA TensorRT release to enjoy the benefits of optimized FasterViT ops.

πŸ’₯ News πŸ’₯

  • [04.02.2024] πŸ”₯ Updated manuscript now available on arXiv !
  • [01.24.2024] πŸ”₯πŸ”₯πŸ”₯ Object Tracking with MOTRv2 + FasterViT is now open-sourced (link) !
  • [01.17.2024] πŸ”₯πŸ”₯πŸ”₯ FasterViT paper has been accepted to ICLR 2024 !
  • [10.14.2023] πŸ”₯πŸ”₯ We have added the FasterViT object detection repository with DINO !
  • [08.24.2023] πŸ”₯ FasterViT Keras models with pre-trained weights published in keras_cv_attention_models !
  • [08.20.2023] πŸ”₯πŸ”₯ We have added ImageNet-21K SOTA pre-trained models for various resolutions !
  • [07.20.2023] We have created official NVIDIA FasterViT HuggingFace page.
  • [07.06.2023] FasterViT checkpoints are now also accessible in HuggingFace!
  • [07.04.2023] ImageNet pretrained FasterViT models can now be imported with 1 line of code. Please install the latest FasterViT pip package to use this functionality (also supports Any-resolution FasterViT models).
  • [06.30.2023] We have further improved the TensorRT throughput of FasterViT models by 10-15% on average across different models. Please use the latest NVIDIA TensorRT release to benefit from these throughput gains.
  • [06.29.2023] Any-resolution FasterViT models can now be initialized from pre-trained ImageNet-resolution (224 x 224) models.
  • [06.18.2023] We have released the FasterViT pip package !
  • [06.17.2023] Any-resolution FasterViT model is now available ! The model can be used for a variety of applications such as detection, segmentation, or high-resolution fine-tuning with arbitrary input image resolutions.
  • [06.09.2023] πŸ”₯πŸ”₯ We have released source code and ImageNet-1K FasterViT-models !

Quick Start

Object Detection

Please see FasterViT object detection repository with DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection for more details.

Classification

We can import pre-trained FasterViT models with one line of code. First, install the FasterViT pip package:

pip install fastervit

Note: Please upgrade to fastervit>=0.9.8 if you have already installed the package, in order to use the pretrained weights.

A pretrained FasterViT model with default hyper-parameters can be created as follows:

>>> from fastervit import create_model

# Define fastervit-0 model with 224 x 224 resolution

>>> model = create_model('faster_vit_0_224', 
                          pretrained=True,
                          model_path="/tmp/faster_vit_0.pth.tar")

model_path sets the directory where the pretrained checkpoint is downloaded.

We can also simply test the model by passing a dummy input image. The output is the logits:

>>> import torch

>>> image = torch.rand(1, 3, 224, 224)
>>> output = model(image) # torch.Size([1, 1000])

We can also use the any-resolution FasterViT model to accommodate arbitrary image resolutions. In the following, we define an any-resolution FasterViT-0 model with an input resolution of 576 x 960, window sizes of 12 and 6 in the 3rd and 4th stages, a carrier token size of 2 and an embedding dimension of 64:

>>> from fastervit import create_model

# Define any-resolution FasterViT-0 model with 576 x 960 resolution
>>> model = create_model('faster_vit_0_any_res', 
                          resolution=[576, 960],
                          window_size=[7, 7, 12, 6],
                          ct_size=2,
                          dim=64,
                          pretrained=True)

Note that the above model is initialized from the original ImageNet pre-trained FasterViT with its original resolution of 224 x 224. As a result, missing keys and size mismatches are expected, since new layers (e.g. additional carrier tokens) are added.
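If you want to see exactly which parameters are affected, one hypothetical way to inspect the differences is to compare the downloaded 224 x 224 checkpoint against the freshly built any-resolution model; the checkpoint path below reuses the earlier example and is an assumption.

import torch

# Hypothetical inspection of the 224 x 224 checkpoint vs. the any-resolution model;
# the checkpoint path is an assumption (reuse the file downloaded earlier).
ckpt = torch.load("/tmp/faster_vit_0.pth.tar", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)   # some checkpoints wrap the weights
model_state = model.state_dict()            # `model` from the snippet above

mismatched = [k for k in state_dict
              if k in model_state and state_dict[k].shape != model_state[k].shape]
missing = [k for k in model_state if k not in state_dict]
print(f"shape mismatches: {len(mismatched)}, missing keys: {len(missing)}")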

We can test the model by passing a dummy input image. The output is the logits:

>>> import torch

>>> image = torch.rand(1, 3, 576, 960)
>>> output = model(image) # torch.Size([1, 1000])

Catalog

  • ImageNet-1K training code
  • ImageNet-1K pre-trained models
  • Any-resolution FasterViT
  • FasterViT pip-package release
  • Add capability to initialize any-resolution FasterViT from ImageNet-pretrained weights.
  • ImageNet-21K pre-trained models
  • Detection code + models

Results + Pretrained Models

ImageNet-1K

FasterViT ImageNet-1K Pretrained Models

Name Acc@1(%) Acc@5(%) Throughput(Img/Sec) Resolution #Params(M) FLOPs(G) Download
FasterViT-0 82.1 95.9 5802 224x224 31.4 3.3 model
FasterViT-1 83.2 96.5 4188 224x224 53.4 5.3 model
FasterViT-2 84.2 96.8 3161 224x224 75.9 8.7 model
FasterViT-3 84.9 97.2 1780 224x224 159.5 18.2 model
FasterViT-4 85.4 97.3 849 224x224 424.6 36.6 model
FasterViT-5 85.6 97.4 449 224x224 975.5 113.0 model
FasterViT-6 85.8 97.4 352 224x224 1360.0 142.0 model

ImageNet-21K

FasterViT ImageNet-21K Pretrained Models (ImageNet-1K Fine-tuned)

Name Acc@1(%) Acc@5(%) Resolution #Params(M) FLOPs(G) Download
FasterViT-4-21K-224 86.6 97.8 224x224 271.9 40.8 model
FasterViT-4-21K-384 87.6 98.3 384x384 271.9 120.1 model
FasterViT-4-21K-512 87.8 98.4 512x512 271.9 213.5 model
FasterViT-4-21K-768 87.9 98.5 768x768 271.9 480.4 model

Raw pre-trained ImageNet-21K model weights for FasterViT-4 are also available for download at this link.

Robustness (ImageNet-A - ImageNet-R - ImageNet-V2)

All models use crop_pct=0.875. Results are obtained by running inference on ImageNet-1K pretrained models without finetuning.

Name A-Acc@1(%) A-Acc@5(%) R-Acc@1(%) R-Acc@5(%) V2-Acc@1(%) V2-Acc@5(%)
FasterViT-0 23.9 57.6 45.9 60.4 70.9 90.0
FasterViT-1 31.2 63.3 47.5 61.9 72.6 91.0
FasterViT-2 38.2 68.9 49.6 63.4 73.7 91.6
FasterViT-3 44.2 73.0 51.9 65.6 75.0 92.2
FasterViT-4 49.0 75.4 56.0 69.6 75.7 92.7
FasterViT-5 52.7 77.6 56.9 70.0 76.0 93.0
FasterViT-6 53.7 78.4 57.1 70.1 76.1 93.0

A, R and V2 denote ImageNet-A, ImageNet-R and ImageNet-V2 respectively.
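For reference, one way to reproduce the crop_pct=0.875 eval preprocessing mentioned above is timm's transform factory; this is a sketch of a reasonable setup, not necessarily the exact pipeline behind these numbers.

from timm.data import create_transform

# Sketch: eval transform with the crop_pct used in the robustness evaluation.
eval_transform = create_transform(input_size=224, is_training=False, crop_pct=0.875)
print(eval_transform)  # Resize -> CenterCrop(224) -> ToTensor -> Normalize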

Installation

We provide a Dockerfile. In addition, assuming that a recent PyTorch package is installed, the dependencies can be installed by running:

pip install -r requirements.txt

Training

Please see TRAINING.md for detailed training instructions of all models.

Evaluation

The FasterViT models can be evaluated on the ImageNet-1K validation set using the following:

python validate.py \
--model <model-name> \
--checkpoint <checkpoint-path> \
--data_dir <imagenet-path> \
--batch-size <batch-size-per-gpu>

Here --model is the FasterViT variant (e.g. faster_vit_0_224_1k), --checkpoint is the path to the pretrained model weights, --data_dir is the path to the ImageNet-1K validation set and --batch-size is the per-GPU batch size. We also provide a sample script here.
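If you prefer to evaluate from Python with the pip package instead of validate.py, a minimal top-1 accuracy loop could look like the sketch below; the validation path, batch size and crop_pct are assumptions, not the exact configuration of validate.py.

import torch
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from timm.data import create_transform
from fastervit import create_model

# Minimal top-1 evaluation sketch; dataset path and transform settings are assumptions.
model = create_model('faster_vit_0_224', pretrained=True).cuda().eval()
transform = create_transform(input_size=224, is_training=False, crop_pct=0.875)
loader = DataLoader(ImageFolder('/path/to/imagenet/val', transform=transform),
                    batch_size=128, num_workers=8)

correct = total = 0
with torch.no_grad():
    for images, targets in loader:
        preds = model(images.cuda(non_blocking=True)).argmax(dim=1).cpu()
        correct += (preds == targets).sum().item()
        total += targets.numel()
print(f"Top-1 accuracy: {100.0 * correct / total:.2f}%")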

ONNX Conversion

We provide an ONNX conversion script to enable dynamic batch size inference. For instance, to generate an ONNX model for faster_vit_0_any_res with resolution 576 x 960 and ONNX opset 17, the following can be used:

python onnx_convert.py --model-name faster_vit_0_any_res --simplify --resolution-h 576 --resolution-w 960 --onnx-opset 17
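After conversion, a quick sanity check with ONNX Runtime might look like the following; the output file name is an assumption, so use whatever path onnx_convert.py actually wrote.

import numpy as np
import onnxruntime as ort

# Hypothetical sanity check of the exported model with ONNX Runtime.
sess = ort.InferenceSession("faster_vit_0_any_res.onnx",   # assumed file name
                            providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 576, 960).astype(np.float32)
logits = sess.run(None, {input_name: dummy})[0]
print(logits.shape)  # expected: (1, 1000)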

CoreML Conversion

To generate FasterViT CoreML models, please install coremltools==5.2.0 and use our provided script.

It is recommended to benchmark the performance using Xcode 14 or newer releases.
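As a rough sketch of what such a conversion involves (the repository's provided script may differ; the model name and input size here are assumptions), a traced model can be passed to coremltools:

import torch
import coremltools as ct
from fastervit import create_model

# Sketch of a CoreML export via TorchScript tracing with coremltools==5.2.0;
# this is not necessarily the repository's exact conversion script.
model = create_model('faster_vit_0_224', pretrained=True).eval()  # eval() so stochastic-depth ops are not traced
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(traced, inputs=[ct.TensorType(name="image", shape=example.shape)])
mlmodel.save("faster_vit_0_224.mlmodel")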

Star History

Star History Chart

Third-party Extensions

We always welcome third-party extensions, implementations, and usage for other purposes. The following are third-party contributions by other users.

Name Link Contributor Framework
keras_cv_attention_models Link @leondgarse Keras

If you would like your work to be listed in this repository, please raise an issue and provide us with detailed information.

Citation

Please consider citing FasterViT if this repository is useful for your work.

@article{hatamizadeh2023fastervit,
  title={FasterViT: Fast Vision Transformers with Hierarchical Attention},
  author={Hatamizadeh, Ali and Heinrich, Greg and Yin, Hongxu and Tao, Andrew and Alvarez, Jose M and Kautz, Jan and Molchanov, Pavlo},
  journal={arXiv preprint arXiv:2306.06189},
  year={2023}
}

Licenses

Copyright Β© 2023, NVIDIA Corporation. All rights reserved.

This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.

For license information regarding the timm repository, please refer to its repository.

For license information regarding the ImageNet dataset, please see the ImageNet official website.

Acknowledgement

This repository is built on top of the timm repository. We thank Ross Wightman for creating and maintaining this high-quality library.

fastervit's People

Contributors

ahatamiz, chohk88, developer0hye, longlee0622, molchanovp, saeejithnair, sunnyqgg


fastervit's Issues

Size mismatch when loading pretrained weights.

After installing fastervit (0.9.8) from pip, I tried to run the example test code:

from fastervit import create_model

model = create_model('faster_vit_0_any_res', 
                     resolution=[576, 960],
                     window_size=[7, 7, 12, 6],
                     ct_size=2,
                     dim=64,
                     pretrained=True)

However, a size-mismatch error occurs for some parameters:

The model and loaded state dict do not match exactly

size mismatch for levels.2.blocks.0.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 256]) from checkpoint, the shape in current model is torch.Size([1, 144, 256]).
size mismatch for levels.2.blocks.0.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 23, 23, 2]).
size mismatch for levels.2.blocks.0.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]).
size mismatch for levels.2.blocks.0.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 148, 148]).
size mismatch for levels.2.blocks.0.hat_attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 7, 7, 2]) from checkpoint, the shape in current model is torch.Size([1, 13, 13, 2]).
size mismatch for levels.2.blocks.0.hat_attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([16, 16]) from checkpoint, the shape in current model is torch.Size([49, 49]).
size mismatch for levels.2.blocks.0.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 60, 60]).
size mismatch for levels.2.blocks.1.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 256]) from checkpoint, the shape in current model is torch.Size([1, 144, 256]).
size mismatch for levels.2.blocks.1.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 23, 23, 2]).
size mismatch for levels.2.blocks.1.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]).
size mismatch for levels.2.blocks.1.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 148, 148]).
size mismatch for levels.2.blocks.1.hat_attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 7, 7, 2]) from checkpoint, the shape in current model is torch.Size([1, 13, 13, 2]).
size mismatch for levels.2.blocks.1.hat_attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([16, 16]) from checkpoint, the shape in current model is torch.Size([49, 49]).
size mismatch for levels.2.blocks.1.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 60, 60]).
size mismatch for levels.2.blocks.2.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 256]) from checkpoint, the shape in current model is torch.Size([1, 144, 256]).
size mismatch for levels.2.blocks.2.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 23, 23, 2]).
size mismatch for levels.2.blocks.2.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]).
size mismatch for levels.2.blocks.2.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 148, 148]).
size mismatch for levels.2.blocks.2.hat_attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 7, 7, 2]) from checkpoint, the shape in current model is torch.Size([1, 13, 13, 2]).
size mismatch for levels.2.blocks.2.hat_attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([16, 16]) from checkpoint, the shape in current model is torch.Size([49, 49]).
size mismatch for levels.2.blocks.2.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 60, 60]).
size mismatch for levels.2.blocks.3.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 256]) from checkpoint, the shape in current model is torch.Size([1, 144, 256]).
size mismatch for levels.2.blocks.3.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 23, 23, 2]).
size mismatch for levels.2.blocks.3.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]).
size mismatch for levels.2.blocks.3.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 148, 148]).
size mismatch for levels.2.blocks.3.hat_attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 7, 7, 2]) from checkpoint, the shape in current model is torch.Size([1, 13, 13, 2]).
size mismatch for levels.2.blocks.3.hat_attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([16, 16]) from checkpoint, the shape in current model is torch.Size([49, 49]).
size mismatch for levels.2.blocks.3.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 60, 60]).
size mismatch for levels.2.blocks.4.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 256]) from checkpoint, the shape in current model is torch.Size([1, 144, 256]).
size mismatch for levels.2.blocks.4.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 23, 23, 2]).
size mismatch for levels.2.blocks.4.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]).
size mismatch for levels.2.blocks.4.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 148, 148]).
size mismatch for levels.2.blocks.4.hat_attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 7, 7, 2]) from checkpoint, the shape in current model is torch.Size([1, 13, 13, 2]).
size mismatch for levels.2.blocks.4.hat_attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([16, 16]) from checkpoint, the shape in current model is torch.Size([49, 49]).
size mismatch for levels.2.blocks.4.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 60, 60]).
size mismatch for levels.2.blocks.5.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 256]) from checkpoint, the shape in current model is torch.Size([1, 144, 256]).
size mismatch for levels.2.blocks.5.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 23, 23, 2]).
size mismatch for levels.2.blocks.5.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]).
size mismatch for levels.2.blocks.5.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 148, 148]).
size mismatch for levels.2.blocks.5.hat_attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 7, 7, 2]) from checkpoint, the shape in current model is torch.Size([1, 13, 13, 2]).
size mismatch for levels.2.blocks.5.hat_attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([16, 16]) from checkpoint, the shape in current model is torch.Size([49, 49]).
size mismatch for levels.2.blocks.5.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 60, 60]).
size mismatch for levels.3.blocks.0.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 512]) from checkpoint, the shape in current model is torch.Size([1, 36, 512]).
size mismatch for levels.3.blocks.0.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 11, 11, 2]).
size mismatch for levels.3.blocks.0.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([36, 36]).
size mismatch for levels.3.blocks.0.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 16, 49, 49]) from checkpoint, the shape in current model is torch.Size([1, 16, 36, 36]).
size mismatch for levels.3.blocks.1.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 512]) from checkpoint, the shape in current model is torch.Size([1, 36, 512]).
size mismatch for levels.3.blocks.1.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 11, 11, 2]).
size mismatch for levels.3.blocks.1.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([36, 36]).
size mismatch for levels.3.blocks.1.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 16, 49, 49]) from checkpoint, the shape in current model is torch.Size([1, 16, 36, 36]).
size mismatch for levels.3.blocks.2.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 512]) from checkpoint, the shape in current model is torch.Size([1, 36, 512]).
size mismatch for levels.3.blocks.2.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 11, 11, 2]).
size mismatch for levels.3.blocks.2.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([36, 36]).
size mismatch for levels.3.blocks.2.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 16, 49, 49]) from checkpoint, the shape in current model is torch.Size([1, 16, 36, 36]).
size mismatch for levels.3.blocks.3.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 512]) from checkpoint, the shape in current model is torch.Size([1, 36, 512]).
size mismatch for levels.3.blocks.3.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 11, 11, 2]).
size mismatch for levels.3.blocks.3.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([36, 36]).
size mismatch for levels.3.blocks.3.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 16, 49, 49]) from checkpoint, the shape in current model is torch.Size([1, 16, 36, 36]).
size mismatch for levels.3.blocks.4.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 512]) from checkpoint, the shape in current model is torch.Size([1, 36, 512]).
size mismatch for levels.3.blocks.4.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 11, 11, 2]).
size mismatch for levels.3.blocks.4.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([36, 36]).
size mismatch for levels.3.blocks.4.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 16, 49, 49]) from checkpoint, the shape in current model is torch.Size([1, 16, 36, 36]).
unexpected key in source state_dict: levels.2.blocks.0.hat_pos_embed.relative_bias, levels.2.blocks.0.hat_pos_embed.cpb_mlp.0.weight, levels.2.blocks.0.hat_pos_embed.cpb_mlp.0.bias, levels.2.blocks.0.hat_pos_embed.cpb_mlp.2.weight, levels.2.blocks.1.hat_pos_embed.relative_bias, levels.2.blocks.1.hat_pos_embed.cpb_mlp.0.weight, levels.2.blocks.1.hat_pos_embed.cpb_mlp.0.bias, levels.2.blocks.1.hat_pos_embed.cpb_mlp.2.weight, levels.2.blocks.2.hat_pos_embed.relative_bias, levels.2.blocks.2.hat_pos_embed.cpb_mlp.0.weight, levels.2.blocks.2.hat_pos_embed.cpb_mlp.0.bias, levels.2.blocks.2.hat_pos_embed.cpb_mlp.2.weight, levels.2.blocks.3.hat_pos_embed.relative_bias, levels.2.blocks.3.hat_pos_embed.cpb_mlp.0.weight, levels.2.blocks.3.hat_pos_embed.cpb_mlp.0.bias, levels.2.blocks.3.hat_pos_embed.cpb_mlp.2.weight, levels.2.blocks.4.hat_pos_embed.relative_bias, levels.2.blocks.4.hat_pos_embed.cpb_mlp.0.weight, levels.2.blocks.4.hat_pos_embed.cpb_mlp.0.bias, levels.2.blocks.4.hat_pos_embed.cpb_mlp.2.weight, levels.2.blocks.5.hat_pos_embed.relative_bias, levels.2.blocks.5.hat_pos_embed.cpb_mlp.0.weight, levels.2.blocks.5.hat_pos_embed.cpb_mlp.0.bias, levels.2.blocks.5.hat_pos_embed.cpb_mlp.2.weight

I noticed that all the mismatches concern the position embeddings within window attention. Is this a bug, or does it work this way by design?

About model profiling.

Hello,

I'm intrigued by the NVIDIA DLSIM profiling tool as mentioned in Appendix H of the paper. My searches on Google haven't yielded much information about it. Could you please provide guidance on how to use it for profiling my model?

Thank you in advance.

import error

Even though I used the recommended version of timm, I get the following error:


ImportError Traceback (most recent call last)
Cell In[10], line 14
12 import torch.nn as nn
13 from timm.models.registry import register_model
---> 14 from timm.models.layers import trunc_normal_, DropPath, LayerNorm2d
15 from timm.models._builder import resolve_pretrained_cfg, _update_default_kwargs
16 from .registry import register_pip_model

File /opt/conda/lib/python3.10/site-packages/timm/models/layers/__init__.py:40
38 from timm.layers.selective_kernel import SelectiveKernel
39 from timm.layers.separable_conv import SeparableConv2d, SeparableConvNormAct
---> 40 from timm.layers.space_to_depth import SpaceToDepthModule
41 from timm.layers.split_attn import SplitAttn
42 from timm.layers.split_batchnorm import SplitBatchNorm2d, convert_splitbn_model

ImportError: cannot import name 'SpaceToDepthModule' from 'timm.layers.space_to_depth' (/opt/conda/lib/python3.10/site-packages/timm/layers/space_to_depth.py

FSDP causes graph break in forward

Hello,

I tried to use FSDP to accelerate training, but got a graph-break-in-forward error at PosEmbMLPSwinv2D's self.relative_bias = self.pos_emb assignment in the forward function. However, the error went away when I disabled torch.compile(). In my other projects torch.compile works well with FSDP, so I'm curious what happened here. Any insight is highly appreciated. Thank you!

HAT

Hello author, the work you have done is very good. I hope to use HAT in other places, but I don't know how to modify the code for that. I hope you can advise me. Thank you, and I look forward to your early reply.

Window size and ct_size setting when resolution_h doesn't equal resolution_w

Hello, really impressive work! I notice that FasterViT is trained on ImageNet with resolution (224, 224). In the code, everything is written assuming that resolution_h equals resolution_w. However, if I need to use FasterViT (e.g. faster_vit_0) for other resolutions such as (576, 960), where the size isn't 224 and resolution_h doesn't equal resolution_w, how should I set the window size and ct_size so that the model achieves its best performance? I really need some advice here; thanks in advance.

Drawing Heatmap

(attached: example heatmap visualization)

I want to apply this model to my downstream task, and I need to draw heatmaps of where the model focuses.

So, is it possible to draw a heatmap like the picture above with this model? If so, which layer should I use to draw the heatmap?

Thank you.

TokenInitializer for any resolution

When we create a model at 224x224, we can no longer input an image of 1024x1024 or 600x800.
How does the carrier-token TokenInitializer support arbitrary resolutions for multi-scale training tasks such as detection and segmentation?

Bug in fastervit package: faster_vit_2_any_res

Downloaded FasterViT with "pip install fastervit" today (July 25th); I get an error when using faster_vit_2_any_res:

model = FasterViT(depths=depths,
...
   1073                   drop_path_rate=drop_path_rate,
-> 1074                   hat=hat
   1075                   **kwargs)
   1076 model.default_cfg = default_cfgs['faster_vit_2_any_res']
   1077 if pretrained:

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'dict'

It appears there's no comma after the hat parameter in faster_vit_2_any_res. The comma does exist in the GitHub repo, though.

Inquiry for detection codes

Hi author,

Thanks for your impressive work! May I ask if you have plans to release your Segmentation/ Detection codes and checkpoints?

export onnx error

When I run python onnx_convert.py, I get the following error:

    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[64, 1, 1], expected input with shape [*, 64, 1, 1], but got input of size[1, 64, 56, 56]

How can I solve this problem?

any_res is not truly any resolution

Resolution=[576, 960] works, but [1152, 1920], [1088, 1920], [1024, 1024], [384, 384], etc. are all infeasible. Therefore, it is difficult for us to conduct multi-scale training.

Object Detection with FasterViT-0 and 1

Hi, thanks for releasing the code. Excellent paper, indeed.
I just wonder whether you conducted object detection experiments using the smaller backbones, such as the -0 and -1 sizes.
In Table 3, I see that FasterViT-2 gains almost +2 mAP compared to Swin-T at 1.78x the speed.
This suggests that FasterViT-0 and -1 could reach roughly the same mAP as Swin-T while being about 2.5x faster (based on Table 1).

Lastly, is the speed in Table 3 reported with TensorRT or PyTorch?
Thank you.

Detection and segmentation models

Hi, I saw in the catalog that you have plans to train models for detection and segmentation. I was wondering:

  1. Do you plan to modify the network structure for the detection task?
  2. When do you plan to release the related models?
  3. I want to replace the Swin-T camera backbone in BEVFusion with FasterViT and use the existing ImageNet-1K pre-trained models. Do you think this would be a good solution?

Improved fastervit_any_res_0 has larger TensorRT latency than the original version

Hello, I used your effective fastervit_any_res_0 with input resolution (576, 960) as the backbone of my occupancy prediction model last month. I exported the model to ONNX and then to TensorRT, and everything worked well. Then I noticed that you improved the TensorRT throughput of the models, so I switched to the new fastervit_any_res_0 (576, 960) from your updated script https://github.com/NVlabs/FasterViT/blob/main/fastervit/models/faster_vit_any_res.py. However, the TensorRT latency increased instead of decreasing as I expected.

input resolution: (3, 3, 576, 960)
GPU: NVIDIA GeForce 3090 Ti

Original faster_vit_any_res_0 :
(screenshot: trtexec results for the original model)

Improved faster_vit_any_res_0 :
(screenshot: trtexec results for the improved model)

The throughput decreases from 90 to 64, and the mean latency increases from 13.28 ms to 17.92 ms. I wonder why this result is the opposite of what is described in the News. Please help. Thanks.

How to optimize the performance of Fastervit's int8

Fastervit has very powerful performance. Thank you for your work.

I found that with TensorRT, the int8 (best) and fp16 latencies are very close, at 1.46077 ms and 1.36375 ms, respectively. Because the last two stages of the network are fused into a single Myelin layer, it is not possible to analyze the timing in detail.

If I want to improve the int8 performance of FasterViT, are there any feasible directions?

TensorRT version: 8.6.1
machine: Tesla T4
ONNX opset: 17

trtexec --onnx=./deployment/faster_vit_0_224_17.onnx --best
trtexec --onnx=./deployment/faster_vit_0_224_17.onnx --fp16

ONNX simplify error occurs when either width or height is not a multiple of 7

I'm using the provided onnx_convert.py function to export the faster_vit_0_any_res model, which accepts input images of size 800x1536, into the ONNX format.

However, I've encountered an issue when attempting to simplify the ONNX model (I included the ONNX simplify process). This problem arises when either the width or height of the image isn't divisible by 7. The error message is as follows:

onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] (op_type:Slice, node name: /levels.2/blocks.0/Slice): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (3) vs (0)
root@5820-Tower:/develop/FasterViT# python onnx_convert.py --model-name faster_vit_0_any_res --resolution-h 576 --resolution-w 960 --onnx-opset 17
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3435.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/develop/FasterViT/fastervit/models/faster_vit_any_res.py:853: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if pad_r > 0 or pad_b > 0:
/develop/FasterViT/fastervit/models/faster_vit_any_res.py:352: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  seq_length = int(seq_length**0.5)
/develop/FasterViT/fastervit/models/faster_vit_any_res.py:280: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if n_global_feature > 0 and self.ct_correct:
/develop/FasterViT/fastervit/models/faster_vit_any_res.py:297: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if n_global_feature>0 and self.ct_correct:
/develop/FasterViT/fastervit/models/faster_vit_any_res.py:866: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if pad_r > 0 or pad_b > 0:
  if n_global_feature>0 and self.ct_correct:
========== Diagnostic Run torch.onnx.export version 1.14.0a0+44dac51 ===========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Traceback (most recent call last):
  File "onnx_convert.py", line 129, in <module>
    main()
  File "onnx_convert.py", line 53, in main
    export_onnx(model,
  File "onnx_convert.py", line 107, in export_onnx
    model_simp, check = simplify(onnx_model)
  File "/usr/local/lib/python3.8/dist-packages/onnxsim/onnx_simplifier.py", line 197, in simplify
    model_opt_bytes = C.simplify(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] (op_type:Slice, node name: /levels.2/blocks.0/Slice): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (3) vs (0)

For instance, the following commands work as expected:

python onnx_convert.py --model-name faster_vit_0_any_res --resolution-h 224 --resolution-w 672 --onnx-opset 17

or

python onnx_convert.py --model-name faster_vit_0_any_res --resolution-h 896 --resolution-w 896 --onnx-opset 17

However, I encounter the error when I run:

python onnx_convert.py --model-name faster_vit_0_any_res --resolution-h 224 --resolution-w 256 --onnx-opset 17

or

python onnx_convert.py --model-name faster_vit_0_any_res --resolution-h 800 --resolution-w 800 --onnx-opset 17

This error occurs during the ONNX simplification step.

I set the batch size to 1 (non-dynamic batch size) in torch.onnx.export and added ONNX simplification before shape inference, as shown in the code snippet below:

import io
import os

import onnx
import onnx_graphsurgeon as gs
import torch
from onnxsim import simplify

def export_onnx(
    model: torch.nn.Module,
    sample_inputs,
    export_params: bool = False,
    opset_version: int = 17,
    result_dir: str = "",
    batch_first: bool = True,
    is_training: bool = False,
    onnx_file_name: str ="",
):
    f = io.BytesIO()
    torch.onnx.export(
        model,
        # ONNX has issue to unpack the tuple of parameters to the model.
        # https://github.com/pytorch/pytorch/issues/11456
        (sample_inputs,) if type(sample_inputs) == tuple else sample_inputs,
        f,
        export_params=export_params,
        training=torch.onnx.TrainingMode.TRAINING
        if is_training
        else torch.onnx.TrainingMode.EVAL,
        do_constant_folding=True,
        opset_version=opset_version,
    )
    onnx_model = onnx.load_model_from_string(f.getvalue(), onnx.ModelProto)
    onnx.checker.check_model(onnx_model)

    model_simp, check = simplify(onnx_model)
    f.close()
    model_simp_shape = onnx.shape_inference.infer_shapes(model_simp)

    # Constant folding to simplify the ONNX
    graph = gs.import_onnx(model_simp_shape)
    graph.fold_constants().cleanup()

    onnx.save(
        gs.export_onnx(graph),
        os.path.join(
            result_dir, onnx_file_name
        ),
    )
    return model_simp_shape

I've been trying to solve this problem by examining the parts that add padding or perform slicing. Unfortunately, I haven't figured out the underlying cause yet. Could you provide any help or insights?

model in deepcopy will get error.

Hello, I tried to use train.py with --model-ema set, and I get the following error:
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
How can I solve this problem? Looking forward to your reply!

different resolution between train and inference

Thanks for your nice work. I want to know whether it is possible to train at a low resolution and run inference at a high resolution with FasterViT. I tested create_model with the any_res FasterViT at resolution 256x256, and inference at a different resolution, 512x512, gives this error: RuntimeError: shape '[-1, 2, 2, 4, 4, 512]' is invalid for input of size 73728

How to profile the DL training and inference

Hello, it's a wonderful work.

I'm interested in the FasterViT profiling in Appendix H. I understand that NVIDIA DLSIM is an internal tool.
Could you please recommend some open-source tools to profile latency, FLOPs, and memory?
I have long been looking for a similar profiling tool.

Thanks a lot.

ModuleNotFoundError: No module named 'utils'

When I run 'from fastervit import create_model', I get the following error. How can I solve it? Thank you very much!
Traceback (most recent call last):
File "/remote-home/cs_cs_lhy/code/kaggle/bm/breast_cancer_classification_benign_malignantgai.py", line 29, in
from fastervit import create_model
File "/root/anaconda3/envs/kbm/lib/python3.10/site-packages/fastervit/__init__.py", line 1, in
from .models.registry import create_model
File "/root/anaconda3/envs/kbm/lib/python3.10/site-packages/fastervit/models/__init__.py", line 2, in
from .faster_vit_any_res import *
File "/root/anaconda3/envs/kbm/lib/python3.10/site-packages/fastervit/models/faster_vit_any_res.py", line 17, in
from utils.checkpoint import load_checkpoint
ModuleNotFoundError: No module named 'utils'

Training FasterViT on VisDrone2019 dataset

Hello,

I am trying to train a FasterViT model on the VisDrone2019 dataset for small object detection. Are there any resources that can guide me on this task? Is it possible to train FasterViT on VisDrone2019? Is there any API or Python code available to train the FasterViT model?

Any help is appreciated.

Issue with faster_vit_4_224 checkpoint

Hi,
Interesting paper, and thanks for maintaining the repo :)

There seems to be an issue with the faster_vit_4_224 checkpoint, specifically. For example:

from fastervit.models.faster_vit import faster_vit_4_224

model = faster_vit_4_224(pretrained=True)

will lead to the following size mismatch:

size mismatch for levels.2.blocks.0.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 16, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 53, 53]).
size mismatch for levels.2.blocks.0.attn.pos_emb_funct.cpb_mlp.2.weight: copying a param with shape torch.Size([16, 512]) from checkpoint, the shape in current model is torch.Size([8, 512]).
size mismatch for levels.2.blocks.0.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 16, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 16, 16]).
size mismatch for levels.2.blocks.0.hat_attn.pos_emb_funct.cpb_mlp.2.weight: copying a param with shape torch.Size([16, 512]) from checkpoint, the shape in current model is torch.Size([8, 512]).
...
size mismatch for levels.3.blocks.4.attn.pos_emb_funct.cpb_mlp.2.weight: copying a param with shape torch.Size([32, 512]) from checkpoint, the shape in current model is torch.Size([16, 512]).

Perhaps the number of heads was incorrect for levels 3 and 4?

I've checked faster_vit_0 through faster_vit_2 and they seem to be fine. However, there is still a size mismatch when using faster_vit_{0~6}_any_res with a resolution of [224, 224]. I'm not sure why the size mismatch occurs when the _any_res resolution matches the pretraining size.

Issue Converting FasterViT to CoreML

First off, I want to thank you for the incredible work on FasterViT. It's evident that it's a state-of-the-art model and its performance has been truly outstanding.

I'm writing to bring up an issue I encountered when attempting to convert FasterViT to CoreML. Given the model's promise for speed and power, it was surprising to find that there's no straightforward way to convert it to CoreML for deployment on Apple devices.

I got: RuntimeError: PyTorch convert function for op 'bernoulli_' not implemented.

Is there a known workaround to successfully convert FasterViT to CoreML?
Are there any plans to provide support for CoreML conversion in the near future?
Having this capability would significantly broaden the potential deployment platforms for the model, especially in mobile and edge environments.

Thank you for your time and looking forward to your feedback.

Best regards,
Rotem

level-dimension

Thanks for your work. I used SegFormer before; FasterViT has a similar dimension structure to its encoder. When I process an image of size 320x320:
In the SegFormer encoder, the feature shapes are (1,3,320,320), s1: (1,64,80,80), s2: (1,128,40,40), s3: (1,256,20,20), s4: (1,512,10,10).
In the faster_vit_0_any_res encoder, I added a print(x.shape) after x = level(x) in FasterViT's forward_features. The feature shapes are (1,3,320,320), s1: (1,128,40,40), s2: (1,256,20,20), s3: (1,512,10,10), s4: (1,512,10,10).
I don't understand why s3 and s4 have the same shape, and shouldn't s1 be H,W // 4, i.e. (80,80)?

Also, I find that in PatchEmbed the output is (1,64,80,80), not H,W // 2. Is the first conv, nn.Conv2d(in_chans, in_dim, 3, 2, 1, bias=False), correct, rather than (3, 1, 1) as in Fig. 3?

ImageNet-21K checkpoint without fine-tuning!

Congrats on the acceptance and great work!

Any chance you could provide an ImageNet-21K FasterViT checkpoint without ImageNet-1K fine-tuning?

I would be very interested to see how the ImageNet-21K-based learned features can be used for downstream tasks.

Thanks a lot and have a nice day!

N.A.

Paper error

On page 13 of the paper, in Figure S.1, two panels are labeled "(d) FasterViT-3" and "(d) FasterViT-4".
The second label, "(d) FasterViT-4", should be "(e) FasterViT-4".

How to determine the args.mesa ratio?

First, thank you for sharing your wonderful work as an open source. I appreciate your work.
I found in the TRAINING.md you provided that the args.mesa parameter settings differ for each model as follows:
FasterViT-0 : args.mesa 0.1
FasterViT-1 : args.mesa 0.2
FasterViT-2/3 : args.mesa 0.5
FasterViT-4/5/6 : args.mesa 5.0

I am curious about the criteria used to determine the mesa parameter for each model and whether the mesa-start-ratio is uniformly applied at 25% for all models.
Your response would be greatly appreciated. Thank you.

confusion about ct_dewindow

The dimensions of the initial carrier tokens and the current carrier tokens differ when applying ct_dewindow. How can this be handled for multi-scale training?

Loading pre-trained model

Hello, thank you for sharing your work.

I'm attempting to load a pre-trained model using the following code:

from fastervit import create_model

model = create_model('faster_vit_0_224', pretrained = '/checkpoint_path/fastervit_0_224_1k.pth.tar')

However, I'm encountering an error message which suggests that the checkpoint I'm trying to load is incompatible with the model architecture: 'Unexpected key(s) in state_dict: "epoch", "arch", "state_dict", "optimizer", "version", "args", "amp_scaler", "metric"'. This typically occurs when the model architecture does not match the architecture of the pretrained model, which may be due to differences in the number or types of layers, layer parameters, naming of layers, or the number of output classes. Could you please assist me in resolving this issue?

Kind regards,

Model Evaluation

Hello,

I am trying to run an inference test using the FasterViT model (faster_vit_1_224_1k) and evaluate it using metrics like IoU, mAP, average precision and average recall. I did not see a command or API to run the inference test and evaluate the model. Could you please provide resources or guidelines for performing the model evaluation?

Your help is highly appreciated.

Regards,
Bijay Shakya

Help

Hello, I want to know how to draw the throughput/accuracy diagram on the front page of your paper. Looking forward to your reply!

Use FasterViT model like a backbone for segmentation model

Hi!

First of all, thank you for your perfect work.

I'm working on training a segmentation model in the mmseg pipeline and I'm wondering about:

  1. Which model is more suitable for the segmentation task: the classic fastervit or faster_vit_any_res?
  2. Are you going to create models for segmentation?

Thank you in advance!

Training Issue

Hello,

I was trying to train the DINO_4scale_faster_vit_0_224 model on the COCO 2017 dataset for object detection. The default dataset URL /comp_robot/cv_public_dataset/COCO2017/ does not work, so I used the COCO 2017 dataset from Google Drive. When I train the model for the first pass, it works. But when I resume training on the same dataset after an interruption, training runs into issues and a few images get lost during training. A snippet of the output is attached:
(screenshot: training error output)

Could you provide me with a resource that explains the commands to train and evaluate the FasterViT DINO model for object detection?

I am currently using this Python command in Google Colab Pro:

!python main.py --config_file ./config/DINO/DINO_4scale_faster_vit_0_224.py --coco_path ./FasterViT_Object_Detection/FasterViT/object_detection/coco-2017-dataset/coco2017 --num_workers=6 --amp --output_dir ./outputs/ --save_results --save_log

The structure of the dataset directory is as follows:

coco-2017-dataset

  • coco2017
    - train2017
    - test2017
    - val2017
    - annotations

I appreciate your guidance.

Regards,
Bijay

Allow running HAT via a separate flag

Based on the conversation in #56, it would be a good feature to also allow running HAT via a separate flag, to avoid adding new layers when the goal is only to use a different resolution for inference.

A new MR could address this.

ImportError: cannot import name

(screenshot: ImportError traceback)

Two weeks ago, I could create the model easily and without any problem using code based on the FasterViT docs. Now, when I try to use create_model, I get the import error attached in the screenshot: 'ImportError: cannot import name '_update_default_kwargs' from 'timm.models._builder' (/opt/conda/lib/python3.10/site-packages/timm/models/_builder.py)'.

Here is my code (same as docs):

from fastervit import create_model

fastermodel = create_model('faster_vit_0_224', 
                          pretrained=True,
                          model_path="/tmp/faster_vit_0.pth.tar")

And here is the error (detailed):

--------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[98], line 7
      1 from transformers import ViTForImageClassification
      3 model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224-in21k',
      4                                                   id2label=id2label,
      5                                                   label2id=label2id)
----> 7 from fastervit import create_model
      9 fastermodel = create_model('faster_vit_0_224', 
     10                           pretrained=True,
     11                           model_path="/tmp/faster_vit_0.pth.tar")

File /opt/conda/lib/python3.10/site-packages/fastervit/__init__.py:1
----> 1 from .models.registry import create_model

File /opt/conda/lib/python3.10/site-packages/fastervit/models/__init__.py:1
----> 1 from .faster_vit import *
      2 from .faster_vit_any_res import *
      4 from .registry import create_model

File /opt/conda/lib/python3.10/site-packages/fastervit/models/faster_vit.py:15
     13 from timm.models.registry import register_model
     14 from timm.models.layers import trunc_normal_, DropPath, LayerNorm2d
---> 15 from timm.models._builder import resolve_pretrained_cfg, _update_default_kwargs
     16 from .registry import register_pip_model
     17 from pathlib import Path

ImportError: cannot import name '_update_default_kwargs' from 'timm.models._builder' (/opt/conda/lib/python3.10/site-packages/timm/models/_builder.py)

Any idea how to solve this? Thanks
