nvidia-ai-iot / nanoowl

A project that optimizes OWL-ViT for real-time inference with NVIDIA TensorRT.

License: Apache License 2.0

Languages: Dockerfile 0.46%, Shell 0.60%, Python 98.94%
Topics: detect, fast, inference, jetson, jetson-agx-orin, jetson-orin-nano, nvidia, real-time, tensorrt, transformers

nanoowl's Introduction

NanoOWL

๐Ÿ‘ Usage - โฑ๏ธ Performance - ๐Ÿ› ๏ธ Setup - ๐Ÿคธ Examples
- ๐Ÿ‘ Acknowledgment - ๐Ÿ”— See also

NanoOWL is a project that optimizes OWL-ViT to run 🔥 real-time 🔥 on NVIDIA Jetson Orin Platforms with NVIDIA TensorRT. NanoOWL also introduces a new "tree detection" pipeline that combines OWL-ViT and CLIP to enable nested detection and classification of anything, at any level, simply by providing text.

Interested in detecting object masks as well? Try combining NanoOWL with NanoSAM for zero-shot open-vocabulary instance segmentation.

๐Ÿ‘ Usage

You can use NanoOWL in Python like this:

import PIL.Image

from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owlvit-base-patch32-image-encoder.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

output = predictor.predict(image=image, text=["an owl", "a glove"], threshold=0.1)

print(output)

Or better yet, to use OWL-ViT in conjunction with CLIP to detect and classify anything, at any level, check out the tree predictor example below!

See Setup for instructions on how to build the image encoder engine.
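
The output returned by predict contains the detection boxes, labels, and scores. A minimal sketch for drawing them onto the image with Pillow, assuming OwlDecodeOutput exposes boxes in (x1, y1, x2, y2) pixel coordinates plus labels and scores tensors (the bundled example scripts render the same information):

import PIL.ImageDraw

# Assumption: output exposes .boxes, .labels, and .scores tensors,
# matching the OwlDecodeOutput printed by the snippet above.
text = ["an owl", "a glove"]
draw = PIL.ImageDraw.Draw(image)
for box, label, score in zip(output.boxes, output.labels, output.scores):
    x1, y1, x2, y2 = (float(v) for v in box)
    draw.rectangle((x1, y1, x2, y2), outline="green", width=2)
    draw.text((x1, y1), f"{text[int(label)]} ({float(score):.2f})", fill="green")
image.save("owl_predict_out.jpg")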

โฑ๏ธ Performance

NanoOWL runs real-time on Jetson Orin Nano.

| Model † | Image Size | Patch Size | ⏱️ Jetson Orin Nano (FPS) | ⏱️ Jetson AGX Orin (FPS) | 🎯 Accuracy (mAP) |
|---|---|---|---|---|---|
| OWL-ViT (ViT-B/32) | 768 | 32 | TBD | 95 | 28 |
| OWL-ViT (ViT-B/16) | 768 | 16 | TBD | 25 | 31.7 |

๐Ÿ› ๏ธ Setup

  1. Install the dependencies

    1. Install PyTorch

    2. Install torch2trt

    3. Install NVIDIA TensorRT

    4. Install the Transformers library

      python3 -m pip install transformers
    5. (optional) Install NanoSAM (for the instance segmentation example)

  2. Install the NanoOWL package.

    git clone https://github.com/NVIDIA-AI-IOT/nanoowl
    cd nanoowl
    python3 setup.py develop --user
  3. Build the TensorRT engine for the OWL-ViT vision encoder

    mkdir -p data
    python3 -m nanoowl.build_image_encoder_engine \
        data/owl_image_encoder_patch32.engine
  4. Run an example prediction to ensure everything is working

    cd examples
    python3 owl_predict.py \
        --prompt="[an owl, a glove]" \
        --threshold=0.1 \
        --image_encoder_engine=../data/owl_image_encoder_patch32.engine

That's it! If everything is working properly, you should see a visualization saved to data/owl_predict_out.jpg.
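
As an extra sanity check from Python, you can point the Usage snippet at the engine built in step 3 (same API as the Usage section; only the engine path differs):

import PIL.Image

from nanoowl.owl_predictor import OwlPredictor

# Load the predictor with the engine built in step 3 of Setup.
predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owl_image_encoder_patch32.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

print(predictor.predict(image=image, text=["an owl", "a glove"], threshold=0.1))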

🤸 Examples

Example 1 - Basic prediction

This example demonstrates how to use the TensorRT optimized OWL-ViT model to detect objects by providing text descriptions of the object labels.

To run the example, first navigate to the examples folder

cd examples

Then run the example

python3 owl_predict.py \
    --prompt="[an owl, a glove]" \
    --threshold=0.1 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine

By default the output will be saved to data/owl_predict_out.jpg.

You can also use this example to profile inference. Simply set the flag --profile.
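
For example, appending the flag to the command above:

python3 owl_predict.py \
    --prompt="[an owl, a glove]" \
    --threshold=0.1 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine \
    --profile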

Example 2 - Tree prediction

This example demonstrates how to use the tree predictor class to detect and classify objects at any level.

To run the example, first navigate to the examples folder

cd examples

To detect all owls, and then detect all wings and eyes within each detected owl region of interest, type

python3 tree_predict.py \
    --prompt="[an owl [a wing, an eye]]" \
    --threshold=0.15 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine

By default the output will be saved to data/tree_predict_out.jpg.

To classify the image as indoors or outdoors, type

python3 tree_predict.py \
    --prompt="(indoors, outdoors)" \
    --threshold=0.15 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine

To classify the image as indoors or outdoors, and if it's outdoors then detect all owls, type

python3 tree_predict.py \
    --prompt="(indoors, outdoors [an owl])" \
    --threshold=0.15 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine
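
The same tree pipeline can be driven from Python. A minimal sketch, assuming the Tree and TreePredictor classes that examples/tree_predict.py is built on; check that script for the exact constructor and predict signature:

import PIL.Image

from nanoowl.owl_predictor import OwlPredictor
from nanoowl.tree_predictor import TreePredictor
from nanoowl.tree import Tree

# TreePredictor wraps an OwlPredictor (for detection) and CLIP (for classification).
predictor = TreePredictor(
    owl_predictor=OwlPredictor(
        "google/owlvit-base-patch32",
        image_encoder_engine="data/owl_image_encoder_patch32.engine"
    )
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

# Brackets detect within the parent region; parentheses classify it.
tree = Tree.from_prompt("[an owl [a wing, an eye]]")

output = predictor.predict(
    image=image,
    tree=tree,
    clip_text_encodings=predictor.encode_clip_text(tree),
    owl_text_encodings=predictor.encode_owl_text(tree),
    threshold=0.15
)

print(output)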

Example 3 - Tree prediction (Live Camera)

This example demonstrates the tree predictor running on a live camera feed with live-edited text prompts. To run the example

  1. Ensure you have a camera device connected

  2. Launch the demo

    cd examples/tree_demo
    python3 tree_demo.py ../../data/owl_image_encoder_patch32.engine
  3. Open your browser to http://<ip address>:7860

  4. Type whatever prompt you like to see what works! Here are some examples

    • Example: [a face [a nose, an eye, a mouth]]
    • Example: [a face (interested, yawning / bored)]
    • Example: (indoors, outdoors)

๐Ÿ‘ Acknowledgement

Thanks to the authors of OWL-ViT for the great open-vocabulary detection work.

🔗 See also

nanoowl's People

Contributors

burningion, jaybdub, ssmmoo1, tokk-nv


nanoowl's Issues

Instructions to train / fine tune on our own data

Hey

Thank you for releasing nanoowl, I think it's really helpful for my ongoing work. Is there a way to fine-tune the weights for my own data?

Instructions on how to train / fine-tune would be great!

Thank you

Torch2TRT not being found

I have done a fresh install of JetPack 5.1.2 on my Seeed Orin NX 8GB using a J401 carrier board. I have installed Torch 2.1.0, Torchvision 0.16, and torch2trt. I can import torch2trt in Python3 when I first start up the device. However, when I attempt to start NanoOWL, it cannot find torch2trt (ModuleNotFoundError: No module named 'torch2trt'). The command that causes this is predictor = OwlPredictor(args.model, image_encoder_engine=args.image_encoder_engine). Once I call this, torch2trt can no longer be imported in Python3, as it could before I called this command. Is OwlPredictor somehow changing library paths? Thoughts?

TracerWarning, but the code process is killed. Why?

nx8@ubuntu:~/Downloads/nanoowl$ TRANSFORMERS_OFFLINE=1 \
    python3 -m nanoowl.build_image_encoder_engine \
    ./data/owl_image_encoder_patch32.engine

/home/nx/.local/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/nx/transformers/src/transformers/models/owlvit/modeling_owlvit.py:386: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/home/nx/transformers/src/transformers/models/owlvit/modeling_owlvit.py:429: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
ๅทฒๆ€ๆญป

Please share the name of the "post_layernorm" output in OwlPredictor.encode_image_torch for output-blob-names in the DeepStream configuration

in OwlPredictor.encode_image_torch

def encode_image_torch(self, image: torch.Tensor):
    vision_outputs = self.model.owlvit.vision_model(image)
    last_hidden_state = vision_outputs[0]
    image_embeds = self.model.owlvit.vision_model.post_layernorm(last_hidden_state)
    class_token_out = image_embeds[:, :1, :]
    image_embeds = image_embeds[:, 1:, :] * class_token_out
    image_embeds = self.model.layer_norm(image_embeds)

Please share the name of the post_layernorm output to set output-blob-names in the DeepStream configuration file.
This is my config file:

[property]
gpu-id=0
model-engine-file=/nanoowl_utils/data/owl_image_encoder_patch32.engine
process-mode=2
network-mode=2
net-scale-factor=0.0146
offsets=122.77;116.75;104.094
secondary-reinfer-interval=0
gie-unique-id=2
output-blob-names=LayerNorm
output-tensor-meta=1
network-type=1
operate-on-gie-id=1
operate-on-class-ids=2
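
One way to answer this locally, sketched with the TensorRT 8.x binding-style Python API (the engine path is taken from the config above): deserialize the engine and list its input/output binding names. The names printed for the outputs are what output-blob-names expects.

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize the engine referenced by model-engine-file above.
with open("/nanoowl_utils/data/owl_image_encoder_patch32.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Print each binding: input/output, name, and shape.
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(kind, engine.get_binding_name(i), engine.get_binding_shape(i))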

Batch inference time

Hi,

We have successfully created a batch version of the model using ONNX and TRT. We are trying this on an A10 GPU, and here is what we have observed: for a batch of 16 we get 96 ms inference time, whereas running the same 16 images in non-batch mode takes 224 ms.

I wanted to check these numbers with you and see if they make sense. Also, do you have a batch implementation that we can compare against?
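
A minimal sketch of how such numbers can be collected consistently on a GPU; predict_batch and predict below are hypothetical stand-ins for whatever entry points the ONNX/TRT export exposes, and the essential detail is synchronizing CUDA around the timed region, since kernel launches are asynchronous:

import time
import torch

def time_fn(fn, iters=20, warmup=5):
    # Warm up so one-time allocations and autotuning don't skew the timing.
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / iters

# Hypothetical usage, with predict_batch/predict standing in for your exports:
# batch_ms = 1000 * time_fn(lambda: predict_batch(images16))
# seq_ms = 1000 * time_fn(lambda: [predict(im) for im in images16])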

Camera image not loading in live demo

I followed the instructions here https://www.jetson-ai-lab.com/vit/tutorial_nanoowl.html?=&linkId=100000237007328 on my Jetson device.

$ ls /dev/video*
/dev/video0  /dev/video1

After port-forwarding to my local machine to open the demo in Chrome, the camera feed looks broken:

[screenshot of the broken camera feed attached]

Here are the logs from my command:
root@jetson001:/opt/nanoowl/examples/tree_demo# python3 tree_demo.py ../../data/owl_image_encoder_patch32.engine
/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py:123: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
config.json: 100%|████| 4.42k/4.42k [00:00<00:00, 685kB/s]
model.safetensors: 100%|████| 613M/613M [00:31<00:00, 19.5MB/s]
preprocessor_config.json: 100%|████| 392/392 [00:00<00:00, 77.2kB/s]
tokenizer_config.json: 100%|████| 775/775 [00:00<00:00, 703kB/s]
vocab.json: 100%|████| 1.06M/1.06M [00:00<00:00, 4.16MB/s]
merges.txt: 100%|████| 525k/525k [00:00<00:00, 3.87MB/s]
special_tokens_map.json: 100%|████| 460/460 [00:00<00:00, 352kB/s]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[01/22/2024-23:13:43] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
100%|████| 338M/338M [00:19<00:00, 18.5MiB/s]
INFO:root:Opening camera.
[ WARN:0] global /opt/opencv/modules/videoio/src/cap_gstreamer.cpp (1760) handleMessage OpenCV | GStreamer warning: Embedded video playback halted; module v4l2src0 reported: Internal data stream error.
[ WARN:0] global /opt/opencv/modules/videoio/src/cap_gstreamer.cpp (888) open OpenCV | GStreamer warning: unable to start pipeline
[ WARN:0] global /opt/opencv/modules/videoio/src/cap_gstreamer.cpp (480) isPipelinePlaying OpenCV | GStreamer warning: GStreamer: pipeline have not been created
INFO:root:Loading predictor.
======== Running on http://0.0.0.0:7860 ========
(Press CTRL+C to quit)
[ WARN:1] global /opt/opencv/modules/videoio/src/cap_v4l.cpp (1004) tryIoctl VIDEOIO(V4L2:/dev/video0): select() timeout.
INFO:root:handle_index_get
INFO:aiohttp.access:127.0.0.1 [22/Jan/2024:23:16:25 +0000] "GET / HTTP/1.1" 200 235 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
INFO:root:Websocket connected.
INFO:aiohttp.access:127.0.0.1 [22/Jan/2024:23:16:25 +0000] "GET /favicon.ico HTTP/1.1" 404 172 "http://localhost:9000/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
INFO:root:handle_index_get
INFO:aiohttp.access:127.0.0.1 [22/Jan/2024:23:16:57 +0000] "GET / HTTP/1.1" 304 176 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
INFO:aiohttp.access:127.0.0.1 [22/Jan/2024:23:16:25 +0000] "GET /ws HTTP/1.1" 101 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
INFO:root:Websocket connected.
INFO:aiohttp.access:127.0.0.1 [22/Jan/2024:23:16:57 +0000] "GET /ws HTTP/1.1" 101 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
INFO:root:Websocket connected.

Bug: Single threshold results in single label

The first example in your readme (https://github.com/NVIDIA-AI-IOT/nanoowl/tree/main#-usage) implies that calling predict with a single threshold should apply that threshold to each class. However, it seems using a single threshold causes the model to instead ignore all but the first class.

The example from the readme with a lower threshold for demonstration purposes:

import PIL.Image

from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owlvit-base-patch32-image-encoder.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

output = predictor.predict(image=image, text=["an owl", "a glove"], threshold=0.01, text_encodings=None)

print(output)

results in
OwlDecodeOutput(labels=tensor([0, 0, 0, 0, 0, 0, 0, 0]...

whereas

import PIL.Image

from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owlvit-base-patch32-image-encoder.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

output = predictor.predict(image=image, text=["an owl", "a glove"], threshold=[0.01, 0.01], text_encodings=None)

print(output)

results in
OwlDecodeOutput(labels=tensor([0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0]...

Instructions to run Docker image

Hey

Could the README.md be updated to include instructions for running the Docker image? I see there is code already implemented, and I could help contribute too.

Thanks

Bug: Model can't detect all the labels in a given class

The second example in the readme (https://github.com/nv-vankit/nanoowl) gives a single threshold for a single class. However, it looks like the single threshold given to the single class doesn't detect the inner multi-labels. In the Example 2 prompt, we have the single class label "owl" and inside it we detect the regions of interest "wing, eye". But with the given code, it only detects wings, not eyes.

The example from the readme is as follows:
python3 tree_predict.py \
    --prompt="[an owl [a wing, an eye]]" \
    --threshold=0.15 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine

The output image has been attached (tree_predict_out).

nanoowl container downloads /root/.cache/clip/ViT-B-32.pt every time

The ViT model file is not included in the docker image, so it will be downloaded when tree_demo is started.
I modified the Dockerfile as follows so that the created image contains the ViT model file:
diff --git a/packages/vit/nanoowl/Dockerfile b/packages/vit/nanoowl/Dockerfile
index 1254f00..c0b40e9 100644
--- a/packages/vit/nanoowl/Dockerfile
+++ b/packages/vit/nanoowl/Dockerfile
@@ -40,6 +40,9 @@ RUN cd /opt/nanoowl/examples/ && \
     --threshold=0.1 \
     --image_encoder_engine=../data/owl_image_encoder_patch32.engine
 
+RUN cd /opt/nanoowl/examples/ && \
+    python3 tree_predict.py
+
 COPY benchmark.py /opt/nanoowl/
 
-WORKDIR /opt/nanoowl
\ No newline at end of file
+WORKDIR /opt/nanoowl

Image roi bug in OwlPredictor leads to cropped encoding

There is a bug in the OwlPredictor class as the image roi dimensions are incorrectly specified. This bug causes incorrect encoding of image regions of interest (rois) as the height and width are swapped. This effectively causes the images to get cropped before being fed to the encoder. Please check the attached images for an example of the input image and the encoded image with pad_square param set to true.

[attached: input image and encoded roi image]
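
For context on the swap: torchvision's roi_align expects boxes as (x1, y1, x2, y2) in pixel coordinates, so constructing a full-image roi from a (height, width) shape without reordering crops non-square images exactly as described. A minimal sanity check, independent of nanoowl:

import torch
from torchvision.ops import roi_align

# NCHW image: height 480, width 640.
image = torch.zeros(1, 3, 480, 640)

# A roi covering the full image must be (x1, y1, x2, y2) = (0, 0, width, height).
# Swapping width/height here reproduces the cropped-encoding symptom.
full_roi = torch.tensor([[0.0, 0.0, 640.0, 480.0]])

out = roi_align(image, [full_roi], output_size=(768, 768))
print(out.shape)  # torch.Size([1, 3, 768, 768])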

data/owlvit-base-patch32-image-encoder.engine" not found

Hi @jaybdub ,
I have a question about populating the data folder with different engine files. How should one populate this folder? After installing all the dependencies, when I run the script below, I encounter a FileNotFoundError:


import PIL.Image

from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owlvit-base-patch32-image-encoder.engine"
)

image = PIL.Image.open("assets/owl_glove_small.jpg")

output = predictor.predict(image=image, text=["an owl", "a glove"], threshold=0.1)

print(output)

The error is:
FileNotFoundError: [Errno 2] No such file or directory: 'data/owlvit-base-patch32-image-encoder.engine'

Thank you very much for your patience and help.

Best,
Ehsan
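
For reference, the engine file is generated locally rather than shipped with the repo: per the Setup section, a build along these lines should populate the data folder (output path adjusted to match the snippet above):

mkdir -p data
python3 -m nanoowl.build_image_encoder_engine \
    data/owlvit-base-patch32-image-encoder.engine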

Implementing NanoOWL in nvInfer for DeepStream on Jetson

Hello NVIDIA-AI-IOT Team,

I have successfully followed the steps outlined in the NanoOWL tutorial on Jetson AI Lab, and have managed to get it functioning with various types of cameras.

I am currently exploring the integration of NanoOWL into a DeepStream pipeline on a Jetson device, specifically leveraging the nvInfer plugin. Given the real-time capabilities and advanced features of NanoOWL, it seems like a promising addition to a DeepStream application.

Could you provide insights or confirm if it's feasible to implement NanoOWL within the NVInfer plugin of a DeepStream pipeline? Furthermore, if this integration is possible, I would appreciate guidance on the configuration.

Additionally, are there any demos or examples available that showcase the integration of NanoOWL with DeepStream? Such resources would be incredibly helpful for understanding the implementation process and best practices.

Thank you for your time and assistance.

Error generating TensorRT engines

Hello!

I am using a docker image running JP5.1 to test nanoowl on the jetson Orin. I followed the instructions for generating the engine file but I am getting errors. I believe it could also be a TensorRT problem / memory issue. I tried downloading the latest torch2trt build, gave trtexec rwx permissions and set the workspace size to 2048 according to NVIDIA/TensorRT#1581 (comment)

the logs:

root@ultraviolet:/home/ultraviolet/nanoowl# python3 -m nanoowl.build_image_encoder_engine     data/owl_image_encoder_patch32.engine
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/usr/local/lib/python3.8/dist-packages/transformers/models/owlvit/modeling_owlvit.py:401: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.8/dist-packages/transformers/models/owlvit/modeling_owlvit.py:444: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
============ Diagnostic Run torch.onnx.export version 2.0.0+nv23.05 ============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=/tmp/tmpjw0kswm6/image_encoder.onnx --saveEngine=data/owl_image_encoder_patch32.engine --fp16 --shapes=image:1x3x768x768 --workspace=2048
[10/19/2023-14:48:00] [I] === Model Options ===
[10/19/2023-14:48:00] [I] Format: ONNX
[10/19/2023-14:48:00] [I] Model: /tmp/tmpjw0kswm6/image_encoder.onnx
[10/19/2023-14:48:00] [I] Output:
[10/19/2023-14:48:00] [I] === Build Options ===
[10/19/2023-14:48:00] [I] Max batch: explicit batch
[10/19/2023-14:48:00] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/19/2023-14:48:00] [I] minTiming: 1
[10/19/2023-14:48:00] [I] avgTiming: 8
[10/19/2023-14:48:00] [I] Precision: FP32+FP16
[10/19/2023-14:48:00] [I] LayerPrecisions: 
[10/19/2023-14:48:00] [I] Calibration: 
[10/19/2023-14:48:00] [I] Refit: Disabled
[10/19/2023-14:48:00] [I] Sparsity: Disabled
[10/19/2023-14:48:00] [I] Safe mode: Disabled
[10/19/2023-14:48:00] [I] DirectIO mode: Disabled
[10/19/2023-14:48:00] [I] Restricted mode: Disabled
[10/19/2023-14:48:00] [I] Build only: Disabled
[10/19/2023-14:48:00] [I] Save engine: data/owl_image_encoder_patch32.engine
[10/19/2023-14:48:00] [I] Load engine: 
[10/19/2023-14:48:00] [I] Profiling verbosity: 0
[10/19/2023-14:48:00] [I] Tactic sources: Using default tactic sources
[10/19/2023-14:48:00] [I] timingCacheMode: local
[10/19/2023-14:48:00] [I] timingCacheFile: 
[10/19/2023-14:48:00] [I] Heuristic: Disabled
[10/19/2023-14:48:00] [I] Preview Features: Use default preview flags.
[10/19/2023-14:48:00] [I] Input(s)s format: fp32:CHW
[10/19/2023-14:48:00] [I] Output(s)s format: fp32:CHW
[10/19/2023-14:48:00] [I] Input build shape: image=1x3x768x768+1x3x768x768+1x3x768x768
[10/19/2023-14:48:00] [I] Input calibration shapes: model
[10/19/2023-14:48:00] [I] === System Options ===
[10/19/2023-14:48:00] [I] Device: 0
[10/19/2023-14:48:00] [I] DLACore: 
[10/19/2023-14:48:00] [I] Plugins:
[10/19/2023-14:48:00] [I] === Inference Options ===
[10/19/2023-14:48:00] [I] Batch: Explicit
[10/19/2023-14:48:00] [I] Input inference shape: image=1x3x768x768
[10/19/2023-14:48:00] [I] Iterations: 10
[10/19/2023-14:48:00] [I] Duration: 3s (+ 200ms warm up)
[10/19/2023-14:48:00] [I] Sleep time: 0ms
[10/19/2023-14:48:00] [I] Idle time: 0ms
[10/19/2023-14:48:00] [I] Streams: 1
[10/19/2023-14:48:00] [I] ExposeDMA: Disabled
[10/19/2023-14:48:00] [I] Data transfers: Enabled
[10/19/2023-14:48:00] [I] Spin-wait: Disabled
[10/19/2023-14:48:00] [I] Multithreading: Disabled
[10/19/2023-14:48:00] [I] CUDA Graph: Disabled
[10/19/2023-14:48:00] [I] Separate profiling: Disabled
[10/19/2023-14:48:00] [I] Time Deserialize: Disabled
[10/19/2023-14:48:00] [I] Time Refit: Disabled
[10/19/2023-14:48:00] [I] NVTX verbosity: 0
[10/19/2023-14:48:00] [I] Persistent Cache Ratio: 0
[10/19/2023-14:48:00] [I] Inputs:
[10/19/2023-14:48:00] [I] === Reporting Options ===
[10/19/2023-14:48:00] [I] Verbose: Disabled
[10/19/2023-14:48:00] [I] Averages: 10 inferences
[10/19/2023-14:48:00] [I] Percentiles: 90,95,99
[10/19/2023-14:48:00] [I] Dump refittable layers:Disabled
[10/19/2023-14:48:00] [I] Dump output: Disabled
[10/19/2023-14:48:00] [I] Profile: Disabled
[10/19/2023-14:48:00] [I] Export timing to JSON file: 
[10/19/2023-14:48:00] [I] Export output to JSON file: 
[10/19/2023-14:48:00] [I] Export profile to JSON file: 
[10/19/2023-14:48:00] [I] 
[10/19/2023-14:48:00] [I] === Device Information ===
[10/19/2023-14:48:00] [I] Selected Device: Orin
[10/19/2023-14:48:00] [I] Compute Capability: 8.7
[10/19/2023-14:48:00] [I] SMs: 8
[10/19/2023-14:48:00] [I] Compute Clock Rate: 1.3 GHz
[10/19/2023-14:48:00] [I] Device Global Memory: 30587 MiB
[10/19/2023-14:48:00] [I] Shared Memory per SM: 164 KiB
[10/19/2023-14:48:00] [I] Memory Bus Width: 128 bits (ECC disabled)
[10/19/2023-14:48:00] [I] Memory Clock Rate: 0.612 GHz
[10/19/2023-14:48:00] [I] 
[10/19/2023-14:48:00] [I] TensorRT version: 8.5.2
[10/19/2023-14:48:00] [I] [TRT] [MemUsageChange] Init CUDA: CPU +220, GPU +0, now: CPU 249, GPU 11612 (MiB)
[10/19/2023-14:48:03] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +302, GPU +285, now: CPU 574, GPU 11913 (MiB)
[10/19/2023-14:48:03] [I] Start parsing network model
[10/19/2023-14:48:03] [I] [TRT] ----------------------------------------------------------------
[10/19/2023-14:48:03] [I] [TRT] Input filename:   /tmp/tmpjw0kswm6/image_encoder.onnx
[10/19/2023-14:48:03] [I] [TRT] ONNX IR version:  0.0.8
[10/19/2023-14:48:03] [I] [TRT] Opset version:    16
[10/19/2023-14:48:03] [I] [TRT] Producer name:    pytorch
[10/19/2023-14:48:03] [I] [TRT] Producer version: 2.0.0
[10/19/2023-14:48:03] [I] [TRT] Domain:           
[10/19/2023-14:48:03] [I] [TRT] Model version:    0
[10/19/2023-14:48:03] [I] [TRT] Doc string:       
[10/19/2023-14:48:03] [I] [TRT] ----------------------------------------------------------------
[10/19/2023-14:48:03] [W] [TRT] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/19/2023-14:48:07] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/19/2023-14:48:08] [I] Finish parsing network model
[10/19/2023-14:48:09] [I] [TRT] ---------- Layers Running on DLA ----------
[10/19/2023-14:48:09] [I] [TRT] ---------- Layers Running on GPU ----------
[10/19/2023-14:48:09] [I] [TRT] [GpuLayer] COPY: /vision_model/Cast
[10/19/2023-14:48:09] [I] [TRT] [GpuLayer] CONVOLUTION: /vision_model/embeddings/patch_embedding/Conv
[10/19/2023-14:48:09] [I] [TRT] [GpuLayer] MYELIN: {ForeignNode[parent.model.owlvit.vision_model.embeddings.position_embedding.weight.../Concat]}
[10/19/2023-14:48:11] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +438, now: CPU 1457, GPU 12789 (MiB)
[10/19/2023-14:48:11] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +127, now: CPU 1540, GPU 12916 (MiB)
[10/19/2023-14:48:11] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[10/19/2023-14:49:21] [I] [TRT] Total Activation Memory: 32084996096
[10/19/2023-14:49:21] [I] [TRT] Detected 1 inputs and 5 output network tensors.
[10/19/2023-14:49:21] [I] [TRT] Total Host Persistent Memory: 2656
[10/19/2023-14:49:21] [I] [TRT] Total Device Persistent Memory: 0
[10/19/2023-14:49:21] [I] [TRT] Total Scratch Memory: 28402688
[10/19/2023-14:49:21] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 182 MiB, GPU 570 MiB
[10/19/2023-14:49:21] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 9 steps to complete.
[10/19/2023-14:49:21] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.264098ms to assign 7 blocks to 9 nodes requiring 30769664 bytes.
[10/19/2023-14:49:21] [I] [TRT] Total Activation Memory: 30769664
[10/19/2023-14:49:21] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[10/19/2023-14:49:21] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[10/19/2023-14:49:21] [W] [TRT] Check verbose logs for the list of affected weights.
[10/19/2023-14:49:21] [W] [TRT] - 163 weights are affected by this issue: Detected subnormal FP16 values.
[10/19/2023-14:49:21] [W] [TRT] - 66 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[10/19/2023-14:49:21] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +170, GPU +256, now: CPU 170, GPU 256 (MiB)
[10/19/2023-14:49:21] [E] Saving engine to file failed.
[10/19/2023-14:49:21] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=/tmp/tmpjw0kswm6/image_encoder.onnx --saveEngine=data/owl_image_encoder_patch32.engine --fp16 --shapes=image:1x3x768x768 --workspace=2048
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ultraviolet/nanoowl/nanoowl/build_image_encoder_engine.py", line 34, in <module>
    predictor.build_image_encoder_engine(
  File "/home/ultraviolet/nanoowl/nanoowl/owl_predictor.py", line 444, in build_image_encoder_engine
    return self.load_image_encoder_engine(engine_path, max_batch_size)
  File "/home/ultraviolet/nanoowl/nanoowl/owl_predictor.py", line 375, in load_image_encoder_engine
    with open(engine_path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/owl_image_encoder_patch32.engine'
root@ultraviolet:/home/ultraviolet/nanoowl# 

Any feedback or help is appreciated.

Unable to stream camera on Jetson Orin NX 8GB, gstreamer warnings

I am attempting to run the NanoOWL examples on my Seeed ReComputer J4011 (Orin NX 8GB with J401 carrier board). I can get the container to work with the Owl test, but not the Tree test. Same results for this repo running directly on the Jetson, outside the container. I do not appear to be able to stream the live video at all through the code with the following errors:

(test_owl.py:3347): Gdk-CRITICAL **: 17:22:29.833: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
/home/user/.local/lib/python3.8/site-packages/torch/functional.py:505: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3490.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
INFO:root:Opening camera.
[ WARN:0] global /tmp/build_opencv/opencv/modules/videoio/src/cap_gstreamer.cpp (1760) handleMessage OpenCV | GStreamer warning: Embedded video playback halted; module v4l2src0 reported: Internal data stream error.
[ WARN:0] global /tmp/build_opencv/opencv/modules/videoio/src/cap_gstreamer.cpp (888) open OpenCV | GStreamer warning: unable to start pipeline
[ WARN:0] global /tmp/build_opencv/opencv/modules/videoio/src/cap_gstreamer.cpp (480) isPipelinePlaying OpenCV | GStreamer warning: GStreamer: pipeline have not been created
INFO:root:Loading predictor.
======== Running on http://0.0.0.0:7860 ========
(Press CTRL+C to quit)
[ WARN:1] global /tmp/build_opencv/opencv/modules/videoio/src/cap_v4l.cpp (1004) tryIoctl VIDEOIO(V4L2:/dev/video1): select() timeout.

This occurs on both camera ports. I am able to stream the two camera feeds with Gstreamer just fine through my own Python code.

Thoughts on where to diagnose? I have even built OpenCV with CUDA support for the device, and it has passed the basic tests.

data/owl_image_encoder_patch32.engine file does not exist in the main directory

While following the setup guide, I encountered an error when attempting to build the TensorRT engine for the OWL-ViT vision encoder using the provided command:
python3 -m nanoowl.build_image_encoder_engine data/owl_image_encoder_patch32.engine

The error message received was:
FileNotFoundError: [Errno 2] No such file or directory: 'data/owl_image_encoder_patch32.engine'

suggesting the data/owl_image_encoder_patch32.engine file does not exist in the main directory.

Thank you.
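
A hedged observation based on the Setup section: the engine path is the build's output, not an input, and trtexec appears unable to save into a directory that does not exist, so create the data directory first, as Setup does:

mkdir -p data
python3 -m nanoowl.build_image_encoder_engine \
    data/owl_image_encoder_patch32.engine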

RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/tmp/pytorch/c10/cuda/CUDACachingAllocator.cpp":1154, please report a bug to PyTorch.

t-tech@ubuntu:~/nanoowl/examples$ python3 tree_predict.py \
    --prompt="[an owl [a wing, an eye]]" \
    --threshold=0.15 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine
/home/t-tech/.local/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/t-tech/.local/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
/home/t-tech/.local/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /tmp/pytorch/aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "/home/t-tech/nanoowl/examples/tree_predict.py", line 51, in <module>
    output = predictor.predict(
  File "/home/t-tech/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/t-tech/nanoowl/nanoowl/tree_predictor.py", line 121, in predict
    owl_image_encodings[label_index] = self.owl_predictor.encode_rois(image_tensor, boxes[label_index])
  File "/home/t-tech/nanoowl/nanoowl/owl_predictor.py", line 267, in encode_rois
    roi_images, rois = self.extract_rois(image, rois, pad_square, padding_scale)
  File "/home/t-tech/nanoowl/nanoowl/owl_predictor.py", line 257, in extract_rois
    roi_images = roi_align(image, [rois], output_size=self.get_image_size())
  File "/home/t-tech/.local/lib/python3.10/site-packages/torchvision/ops/roi_align.py", line 236, in roi_align
    return _roi_align(input, rois, spatial_scale, output_size[0], output_size[1], sampling_ratio, aligned)
  File "/home/t-tech/.local/lib/python3.10/site-packages/torchvision/ops/roi_align.py", line 168, in _roi_align
    val = _bilinear_interpolate(input, roi_batch_ind, y, x, ymask, xmask)  # [K, C, PH, PW, IY, IX]
  File "/home/t-tech/.local/lib/python3.10/site-packages/torchvision/ops/roi_align.py", line 62, in _bilinear_interpolate
    v1 = masked_index(y_low, x_low)
  File "/home/t-tech/.local/lib/python3.10/site-packages/torchvision/ops/roi_align.py", line 55, in masked_index
    return input[
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/tmp/pytorch/c10/cuda/CUDACachingAllocator.cpp":1154, please report a bug to PyTorch. 

FPS can't reach the performance numbers in the README

Hi,

I am trying to run this example on Jetson AGX Orin.
Following the readme and running nvpmodel -m 0, the FPS is only around 60.
Is there any trick to get to 95 FPS?
Has anyone else reached 95 FPS?

Thanks

Request for steps to optimize original OWL-ViT to nanoowl

Optimising OWL-ViT into nanoowl makes live detection possible; however, the optimisations make it hard for us to fine-tune the model on our domain-specific dataset. Would you consider adding a section that teaches users how to take compatible OWL-ViT models from the original repo and optimise them to run fast enough for live detection?

Such a tutorial or guide would be much appreciated!
