ailab-cvc / yolo-world

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Home Page: https://www.yoloworld.cc

License: GNU General Public License v3.0

Python 98.78% Shell 0.16% Dockerfile 0.21% CMake 0.19% C++ 0.66%

yolo-world's People

Contributors

capjamesg, digma, eltociear, greatv, hechenghui, jradikk, liuhuicnn, natanbagrov, onuralpszr, partheee, prashantdixit0, shaswatpanda, skalskip, stevengrove, swalehmwadime, taofuyu, wondervictor, zsxkib

yolo-world's Issues

config file

I am able to run the new model with CC3M-lite using the two yolo_world_l config files. Is there supposed to be a new config file for this new model, or which of the two am I supposed to run?

Hugging Face is blocked in China

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like pretrained_models/clip-vit-base-patch32-projection is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1402) of binary: /usr/bin/python

Hugging Face is blocked in China and my server has no proxy. I want to download the files myself, put them on the server, and load them from local paths. What do I need to modify?
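One common workaround (not from this thread): download the CLIP weights on a machine that has access, copy them to the server, and make the config's text-model path resolve to that local directory. A minimal sketch, assuming the text encoder corresponds to openai/clip-vit-base-patch32:

# run on a machine that can reach huggingface.co, then copy the folder to the
# server; the target path below matches the directory the error message expects
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="openai/clip-vit-base-patch32",
    local_dir="pretrained_models/clip-vit-base-patch32-projection",
)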

Possible to run Gradio demo in Windows?

I cloned the repo and downloaded the weights file
yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth

But when I attempt to run the demo, I get this:

D:\Python Stuff\yolo-world>python demo.py configs/pretrain/yolo_world_l_dual_vlpan_vlpan_l2norm_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth
2024-02-15 18:53:13.530020: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-02-15 18:53:13.530312: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ D:\Python Stuff\yolo-world\demo.py:145 in <module>                                               │
│                                                                                                  │
│   142 │   args = parse_args()                                                                    │
│   143 │                                                                                          │
│   144 │   # load config                                                                          │
│ ❱ 145 │   cfg = Config.fromfile(args.config)                                                     │
│   146 │   if args.cfg_options is not None:                                                       │
│   147 │   │   cfg.merge_from_dict(args.cfg_options)                                              │
│   148                                                                                            │
│                                                                                                  │
│ C:\Users\Max\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\config\config.py │
│ :459 in fromfile                                                                                 │
│                                                                                                  │
│    456 │   │   filename = str(filename) if isinstance(filename, Path) else filename              │
│    457 │   │   if lazy_import is False or \                                                      │
│    458 │   │      lazy_import is None and not Config._is_lazy_import(filename):                  │
│ ❱  459 │   │   │   cfg_dict, cfg_text, env_variables = Config._file2dict(                        │
│    460 │   │   │   │   filename, use_predefined_variables, use_environment_variables,            │
│    461 │   │   │   │   lazy_import)                                                              │
│    462 │   │   │   if import_custom_modules and cfg_dict.get('custom_imports', None):            │
│                                                                                                  │
│ C:\Users\Max\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\config\config.py │
│ :943 in _file2dict                                                                               │
│                                                                                                  │
│    940 │   │   except Exception as e:                                                            │
│    941 │   │   │   if osp.exists(temp_config_dir):                                               │
│    942 │   │   │   │   shutil.rmtree(temp_config_dir)                                            │
│ ❱  943 │   │   │   raise e                                                                       │
│    944 │   │                                                                                     │
│    945 │   │   # check deprecation information                                                   │
│    946 │   │   if DEPRECATION_KEY in cfg_dict:                                                   │
│                                                                                                  │
│ C:\Users\Max\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\config\config.py │
│ :885 in _file2dict                                                                               │
│                                                                                                  │
│    882 │   │   │   │   │   │   temp_config_file.name):                                           │
│    883 │   │   │   │   │   base_cfg_path, scope = Config._get_cfg_path(                          │
│    884 │   │   │   │   │   │   base_cfg_path, filename)                                          │
│ ❱  885 │   │   │   │   │   _cfg_dict, _cfg_text, _env_variables = Config._file2dict(             │
│    886 │   │   │   │   │   │   filename=base_cfg_path,                                           │
│    887 │   │   │   │   │   │   use_predefined_variables=use_predefined_variables,                │
│    888 │   │   │   │   │   │   use_environment_variables=use_environment_variables,              │
│                                                                                                  │
│ C:\Users\Max\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\config\config.py │
│ :841 in _file2dict                                                                               │
│                                                                                                  │
│    838 │   │   Returns:                                                                          │
│    839 │   │   │   Tuple[dict, str]: Variables dictionary and text of Config.                    │
│    840 │   │   """                                                                               │
│ ❱  841 │   │   if lazy_import is None and Config._is_lazy_import(filename):                      │
│    842 │   │   │   raise RuntimeError(                                                           │
│    843 │   │   │   │   'The configuration file type in the inheritance chain '                   │
│    844 │   │   │   │   'must match the current configuration file type, either '                 │
│                                                                                                  │
│ C:\Users\Max\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\config\config.py │
│ :1657 in _is_lazy_import                                                                         │
│                                                                                                  │
│   1654 │   def _is_lazy_import(filename: str) -> bool:                                           │
│   1655 │   │   if not filename.endswith('.py'):                                                  │
│   1656 │   │   │   return False                                                                  │
│ ❱ 1657 │   │   with open(filename, encoding='utf-8') as f:                                       │
│   1658 │   │   │   codes_str = f.read()                                                          │
│   1659 │   │   │   parsed_codes = ast.parse(codes_str)                                           │
│   1660 │   │   for node in ast.walk(parsed_codes):                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\Python
Stuff\\yolo-world\\configs\\pretrain\\../../third_party/mmyolo/configs/yolov8/yolov8_l_syncbn_fast_8xb16-500e_coco.py'

I checked the Hugging Face repo at this link:
https://huggingface.co/spaces/stevengrove/YOLO-World/tree/main

and found that there are many more files in the mmyolo folder to download. Could you please clarify what I should download to make it work?
Thanks!

image_demo.py

I have a problem when running image_demo.py

Loads checkpoint by local backend from path: yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_cc3mlite_train_pretrained-7a5eea3b.pth
02/16 23:06:23 - mmengine - INFO - Load checkpoint from yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_cc3mlite_train_pretrained-7a5eea3b.pth
[ ] 0/1, elapsed: 0s, ETA:/home/kemove/anaconda3/envs/YOLO_World/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Traceback (most recent call last):
  File "image_demo.py", line 154, in <module>
    inference_detector(runner,
  File "image_demo.py", line 78, in inference_detector
    pred_instances.scores.float() > score_thr]
TypeError: '>' not supported between instances of 'Tensor' and 'str'
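The error indicates that score_thr reaches the comparison as a string, most likely because it was parsed without a type. A hedged sketch of the usual fix, assuming the demo reads the threshold via argparse:

import argparse

parser = argparse.ArgumentParser()
# declare the threshold as a float so the Tensor comparison succeeds
parser.add_argument('--score-thr', type=float, default=0.3)
args = parser.parse_args()
# the filter then works, e.g.:
# pred_instances = pred_instances[pred_instances.scores.float() > args.score_thr]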

Improved visualisation of results with Supervision

Hi 👋🏻 First of all, I am very impressed with your model!

I noticed you have supervision in the dependency list, but I don't see it used in the code. I am the creator of this package. Would you be willing to accept a PR updating the visualization of the results?

If you have any other ideas on how to use supervision, I am also very open to help.
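For illustration, a minimal sketch (not the repo's code) of what such a visualization could look like with supervision; the annotator API varies slightly between supervision versions, and the arrays below are placeholders for real YOLO-World outputs:

import cv2
import numpy as np
import supervision as sv

image = cv2.imread("demo.jpg")                  # placeholder image path
boxes = np.array([[30.0, 40.0, 200.0, 220.0]])  # xyxy detections
scores = np.array([0.82])
class_ids = np.array([0])

detections = sv.Detections(xyxy=boxes, confidence=scores, class_id=class_ids)
annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections)
cv2.imwrite("annotated.jpg", annotated)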

TensorRT

Thanks for your awesome work. Could you release a TensorRT version of YOLO-World? The inference time of the YOLO-World ONNX model is somewhat slow.

How to extract object nouns from caption?

In your paper, you propose a module that extracts object nouns from user-provided captions. It's very interesting, but I have not found this processor in your demo. Could you provide it?
Thanks very much.
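Noun extraction of this kind is commonly done with an off-the-shelf NLP tool; below is a sketch using spaCy noun chunks, which may differ from the module the paper actually uses:

import spacy  # requires: python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
caption = "a man riding a bicycle past a red fire hydrant"
# take the head noun of each noun chunk as a candidate object word
nouns = [chunk.root.text for chunk in nlp(caption).noun_chunks]
print(nouns)  # e.g. ['man', 'bicycle', 'hydrant']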

Error during Evaluation

While I was running the evaluation script, I ended up with the following error. I tried signing in to Hugging Face with huggingface-cli login, but still ended up with the same error.

Repository Not Found for url: https://huggingface.co/pretrained_models/clip-vit-base-patch32-projection/resolve/main/tokenizer_config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.

Along with this error:

OSError: pretrained_models/clip-vit-base-patch32-projection is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>
Traceback (most recent call last):
File "/home/mu480317/.conda/envs/yoloworldnew/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
response.raise_for_status()
File "/home/mu480317/.conda/envs/yoloworldnew/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/pretrained_models/clip-vit-base-patch32-projection/resolve/main/tokenizer_config.json

My virtual env has:

Pytorch 2.0.0
CUDA 11.8
Transformers 4.37.2
huggingface-hub 0.20.3

Please help me solve this error. Thanks!

A question

Is the class_text_path in the config file (data/captions/coco_class_captions.json) something you generated yourselves? Could you provide an example? Thanks.
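For illustration, class-text files in this style are typically a JSON list with one inner list of phrases per class; a hedged sketch of generating such a file (class names illustrative, not confirmed by this thread):

import json

# one inner list per class; the list form allows synonyms for a class
class_texts = [["person"], ["bicycle"], ["car"], ["traffic light"]]
with open("coco_class_captions.json", "w") as f:
    json.dump(class_texts, f)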

How to fine-tune a model for my own dataset?

Hello, this is outstanding work! Amazing! I'd like to ask how to fine-tune a model suited to my own data on top of the pretrained model. In short: fine-tune with a small amount of data, then use the fine-tuned model to annotate a large amount of similar data.

Some undefined keys leading to error while finetuning

Hi, I am trying to finetune the model on my object detection dataset. I am using the config for Efficient YOLO-World with the no-mask annotation _base_.

In the finetune config file yolo_world_l_dual_vlpan_2e-4_80e_8gpus_finetune_coco.py, there are fields such as _base_.copypaste_prob or _base_.min_area_ratio that are not present in the base file yolov8_l_syncbn_fast_8xb16-500e_coco.py.

I am currently unable to run the finetune scripts due to the errors caused by these keys. Can you provide some methods to tackle this problem? If any more context is needed, please let me know.

Changing texts in the demo.py web UI doesn't work

Texts entered in the Gradio web UI only take effect once; unless I restart the server, it doesn't work correctly.
I wonder whether this is by design, or is there a way to reset the runner state to load a new text prompt?


Support Torch==2.0.0?

Hey Authors,
Thanks for the excellent job! I am wondering whether your project can support pytorch==2.0.0, torchvision==0.15.0, torchaudio==2.0.0, pytorch-cuda=11.8, or in general any PyTorch>2.0.0? I believe the runtime efficiency and compatibility of PyTorch>2.0.0 are much better for Python >= 3.10.

I am looking forward to deploying your model, instead of Grounding-DINO, in a real-time on-board system.

Thanks!

Can't create an instance of YOLOWorldDetector

I'm having problems creating an instance of YOLOWorldDetector because the model is not present in the mmengine registry. I'm using the model configuration from the file: https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov8/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco.py. I think it could be caused by different versions of mmengine, mmdet, or mmyolo, but it's not clear from the requirements.txt file which versions of the libraries are needed. I also couldn't run YOLO-World without also installing mmengine and mmcv, which isn't mentioned in there either. This is the error I get:

File "scripts/yolo_world_db.py", line 93, in _load_model
runner = Runner.from_cfg(model_dict)
File "/miniconda3/envs/yolo_world/lib/python3.10/site-packages/mmengine/runner/runner.py", line 462, in from_cfg
runner = cls(
File "/miniconda3/envs/yolo_world/lib/python3.10/site-packages/mmengine/runner/runner.py", line 429, in init
self.model = self.build_model(model)
File "/miniconda3/envs/yolo_world/lib/python3.10/site-packages/mmengine/runner/runner.py", line 836, in build_model
model = MODELS.build(model)
File "/miniconda3/envs/yolo_world/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/miniconda3/envs/yolo_world/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/miniconda3/envs/yolo_world/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 100, in build_from_cfg
raise KeyError(
KeyError: 'YOLOWorldDetector is not in the mmengine::model registry. Please check whether the value of YOLOWorldDetector is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'
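The error message itself points at the likely cause: nothing imported the YOLO-World package, so its register_module decorators never ran. A hedged sketch of the two usual mmengine fixes, assuming YOLO-World is installed as the yolo_world package:

# option 1: import the package before building the runner,
# so its model classes get registered with mmengine
import yolo_world  # noqa: F401

# option 2: let mmengine import it for you by adding this line to the config:
# custom_imports = dict(imports=['yolo_world'], allow_failed_imports=False)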

Inference on test image

Is there a sample inference script on a test image?
With NMS, number of boxes, and text parameters?

ONNX export questions

Hello, I've been trying to experiment with the ONNX export from the Hugging Face demo.
I spun up a quick ONNX Runtime test script but hit a problem when executing:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running NonMaxSuppression node. Name:'/NonMaxSuppression' Status Message: non_max_suppression.cc:91 onnxruntime::NonMaxSuppressionBase::PrepareCompute boxes and scores should have same spatial_dimension.

It seems the box and score lists have mismatched lengths during NMS. This should be very close to the end of the execution, so I assume that if I had messed up the inputs, the model would have crashed much earlier.

Can you provide some information if you have any idea what's going on? I can provide more info on the test code if needed, but it's very basic (load image, resize to 640x640, reshape to 1x3x640x640, create an input dict with "images" as the input name, and run the model); a self-contained version is sketched below.
I tried with and without standard RGB normalization, as I wasn't sure whether it was required, but both give the same error.
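For reference, a self-contained version of the test procedure described above (resize to 640x640, NCHW blob, "images" input name); the file names are placeholders:

import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolo_world.onnx", providers=["CPUExecutionProvider"])
img = cv2.resize(cv2.imread("test.jpg"), (640, 640))
blob = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
blob = blob.transpose(2, 0, 1)[None]  # 1x3x640x640
outputs = session.run(None, {"images": blob})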

Deployment Options / Instructions

Very impressive work! It would be great if this could also be published to Hugging Face as a model, allowing easy API deployments; Hugging Face doesn't allow deploying the current one.
Or at least provide some instructions on how to deploy the ONNX/PyTorch pre-trained model in a minimal setup.

I found a workable Dockerfile to run the demo, FYI

Dockerfile Content

Assume the file is saved at ./docker/Dockerfile.

FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

RUN apt-get update && apt-get install -y software-properties-common && rm -rf /var/lib/apt/lists/*
RUN add-apt-repository ppa:ubuntu-toolchain-r/test
RUN apt-get update && apt-get install -y --no-install-recommends \
    libcurl4-openssl-dev \
    wget \
    vim \
    zlib1g-dev \
    git \
    pkg-config \
    sudo \
    ssh \
    libssl-dev \
    pbzip2 \
    pv \
    bzip2 \
    unzip \
    devscripts \
    lintian \
    fakeroot \
    dh-make \
    libgl1-mesa-glx \
    python3 \
    python3-pip \
    python3-dev \
    python3-wheel \
    gcc \
    g++ \
    && cd /usr/local/bin \
    && ln -s /usr/bin/python3 python \
    && ln -s /usr/bin/pip3 pip \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip
RUN pip install torch torchvision torchaudio --no-cache-dir --index-url https://download.pytorch.org/whl/cu118
ENV FORCE_CUDA="1"
ENV MMCV_WITH_OPS=1
RUN pip install gradio --no-cache-dir -i  https://pypi.tuna.tsinghua.edu.cn/simple

Build Scripts

docker build -f ./docker/Dockerfile -t yolo_world:0.1 .

Run Demo

# start container in code dir
docker run -it --runtime=nvidia -p 8080:8080 -v ${PWD}:${PWD} -w ${PWD} yolo_world:0.1 bash
# install deps
pip3 install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple
# download the pretrained model

# start demo
python3 ./demo.py ./configs/pretrain/yolo_world_m_dual_vlpan_l2norm_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py  ./model/yolo_world_m_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-2b7bd1be.pth 

Visit the web UI

http://0.0.0.0:8080/

Open for Contributions

Hi @StevenGrove and the YOLO-World team,
Great work! I tried demo.py and it worked great.
But I see scope for lots of integrations and improvements. Are you open to community contributions?

What is the difference between supervised object detection and closed (fixed) vocabulary detection?

Hello. I've got a question regarding the article about fine-tuning on a custom dataset.

I would like to finetune YOLO-World on my custom dataset.

Here, I am just wondering if I have to name my custom classes for fine-tuning.

If so, how would it differ from ordinary supervised object detection, which requires labeling custom classes?

Could anyone provide assistance with this? Thank you in advance :)

ONNX export

I was trying to export the PyTorch model to ONNX using the export_model function in the Hugging Face Spaces demo file:

https://huggingface.co/spaces/stevengrove/YOLO-World/blob/main/tools/demo.py#L92

However, doing so results in the error:

torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of operator adaptive_max_pool2d, output size that are not factor of input size. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues  [Caused by the value '1303 defined in (%1303 : Long(2, strides=[1], device=cpu) = onnx::Constant[value= 3  3 [ CPULongType{2} ]]()

I am not sure how I should proceed here.

Pre-trained model ckpt file: yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth

And the config file used for runner loading: HERE

A question I'd like to ask

GroundingDINO's inference needs to encode the text and the image at the same time, because cross-attention is computed between them. YOLO-World, by contrast, works in an offline manner, but I don't quite understand what the offline manner in the paper means. My understanding is: the text and the image don't have to be processed together; during inference, text and image encoding don't happen at the same time. Is that a correct way to understand it?
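For context on the "offline" wording: the idea is to encode the class texts once, cache the embeddings, and reuse them for every image, so no text encoder needs to run per inference. A minimal sketch of the caching step; the CLIP checkpoint name is an assumption, not taken from this thread:

import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")

tokens = tokenizer(["person", "dog", "traffic cone"], padding=True, return_tensors="pt")
with torch.no_grad():
    text_embeds = text_encoder(**tokens).text_embeds  # computed once, then cached
# at inference time only the image branch runs against the cached text_embeds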

ONNX input

Currently, when exporting the YOLOWorldDetector model to ONNX format, the text input to the model is fixed for all possible input images at inference time.
Could ONNX model weights where the model accepts both image and text as inputs be shared?
Otherwise, could the PyTorch-to-ONNX conversion script be modified such that it is also possible to provide text (tokenized input IDs and attention mask) as input to the ONNX model at inference time? A sketch of the two-input export pattern follows.
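For illustration only, the two-input export pattern itself is straightforward; the model below is a dummy stand-in (not YOLO-World), just to show torch.onnx.export taking image and text tensors as separate named inputs:

import torch

class DummyOpenVocabHead(torch.nn.Module):
    def forward(self, images, text_embeds):
        feats = images.mean(dim=(2, 3))                    # (B, 3) fake image features
        feats = torch.nn.functional.pad(feats, (0, 509))   # pad to 512 dims
        # similarity of every text embedding against the image features: (B, N)
        return torch.matmul(text_embeds, feats.unsqueeze(-1)).squeeze(-1)

torch.onnx.export(
    DummyOpenVocabHead(),
    (torch.randn(1, 3, 640, 640), torch.randn(1, 80, 512)),
    "two_input_demo.onnx",
    input_names=["images", "text_embeds"],
    output_names=["scores"],
)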

FileNotFoundError with demo.py

Ran into a bit of trouble trying to get the demo running with one of the finetune_coco configs. I'm getting a FileNotFoundError when I run the demo script; it looks like a file is missing from the mmyolo configs.

  • Cloned the repo and set everything up as per the instructions.
  • Tried running the demo with this command:
python demo.py YOLO-World-master/configs/finetune_coco/yolo_world_l_dual_vlpan_2e-4_80e_8gpus_finetune_coco.py YOLO-World-master/yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-0e566235.pth 
  • And hit this error:
FileNotFoundError: [Errno 2] No such file or directory: 'YOLO-World-master/configs/finetune_coco/../../third_party/mmyolo/configs/yolov8/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco.py'

Any ideas on what's going on or how to fix this? Would really appreciate the help!

Thanks!

TypeError: __init__() got an unexpected keyword argument 'lines'

I ran into this problem while running demo.py. How can I solve it? I used this command:
root@zmj:/build/YOLO-World# python demo.py configs/pretrain/yolo_world_l_dual_vlpan_vlpan_l2norm_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py models/yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-0e566235.pth
The error is as follows:
(screenshot of the traceback)

Segmentation results

Could the demo output segmentation results rather than detection bounding boxes?

How to run the demo?

I have created the conda env and downloaded yolo_world_l_clip_base_dua....pth

Then when I run the command following the steps in README.md:

python3 demo.py configs/pretrain/yolo_world_s_dual_vlpan_l2norm_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py  yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-0e566235.pth

or

python3 demo.py configs/pretrain/yolo_world_l_dual_vlpan_l2norm_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_val.py  yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-0e566235.pth

An error occurs:

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like pretrained_models/clip-vit-base-patch32-projection is not the path to a directory containing a file named config.json.

So, should I download pretrained_models/clip-vit-base-patch32-projection first? If so, where can I get it?

Great work & a comparison with Detic

Thanks to the authors; this is great work!!! I want to use this model from a detect-everything perspective, so I took the same test image and compared inference against Detic. Two questions:
1. YOLO-World's detection confidence is slightly lower, but this doesn't affect usage.
2. YOLO-World's detection rate on this image is slightly worse; is that because Detic is a large model?
(test image attached)
I also used the word 'cone' with the confidence threshold set to 0.01.
(comparison screenshots attached)

Deployment Issue

Thank you very much for the excellent work done by the authors! I have a few questions and would like to discuss them:

  1. When attempting to perform inference using the exported ONNX model with ONNX Runtime, I encountered the following error:
Traceback (most recent call last):
  File "main_onnxruntime.py", line 111, in <module>
    main()
  File "main_onnxruntime.py", line 86, in main
    decoder_outputs = decoder(
  File "/home/cvhub/workspace/projects/python/detection/YOLO-World/yolo_world/easydeploy/examples/numpy_coder.py", line 43, in __call__
    feats = [
  File "/home/cvhub/workspace/projects/python/detection/YOLO-World/yolo_world/easydeploy/examples/numpy_coder.py", line 44, in <listcomp>
    np.ascontiguousarray(feat[0].transpose(1, 2, 0))
ValueError: axes don't match array

The relevant command used for running is:

python main_onnxruntime.py /home/cvhub/workspace/projects/python/detection/YOLO-World/third_party/mmyolo/demo/dog.jpg /home/cvhub/workspace/projects/python/detection/YOLO-World/work_dirs/yolow-l.onnx --type YOLOV5

Upon observation, the inference process seems normal, but there appears to be an issue with decoding.

  2. How can open-vocabulary detection be supported? I noticed that the provided demo and running examples are based on the 80 classes of the COCO dataset for detection results.

Question about pretrained weights

Hi, YOLO-World Team!
Big shoutout to the team for such excellent work! 🚀 Bringing an open-vocabulary detector to the real-time world! Thanks! 😄
I'm a core maintainer and ML engineer of Ultralytics YOLOv8, and recently I've been trying to migrate the YOLO-World weights into our YOLOv8 repo.

I've gotten really close. However, today I found that the weights in the Hugging Face YOLO-World repo are somewhat different from the ones in the current GitHub YOLO-World repo.
(screenshots of the two weight listings)
From the mAP tables it seems the ones on the GitHub page are better, but just to confirm, I'd like to ask which set of weights is the primary (better) one and what the difference between them is. Thanks!

Export TensorRT for Triton server

First, I want to say thanks so much to the authors for this work!
Can we export YOLO-World to ONNX or TensorRT now?
Thank you in advance!

GPU memory growth during training

Did you encounter growing GPU memory usage when training the YOLO World model with mmyolo and mmdet? I built a similar setup using RTMDet from mmdet, but there is a GPU memory leak during training.

About fine-tuning

Hello, while debugging train.py, runner = Runner.from_cfg(cfg) hangs and won't proceed; is there a way to fix this? Another question: when fine-tuning on my own dataset (10,000 images with only one class, "baby"), the localization loss stays at 0, and after a few epochs the classification loss also drops to 0. But after training for 40 epochs, testing with image_demo detects no targets at all (not a single object in the image). Do you have any suggestions?

Running demo.py on my local machine

I got the error below when I ran demo.py:

usage: demo.py [-h] [--work-dir WORK_DIR] [--cfg-options CFG_OPTIONS [CFG_OPTIONS ...]] config checkpoint
demo.py: error: the following arguments are required: config, checkpoint

Missing script

python setup.py build develop

The error:
FileNotFoundError: [Errno 2] No such file or directory: 'yolo_world/version.py'

How to solve the low-confidence problem?

(screenshot attached)
If I use the confidence score to judge whether something is the object in question, many of the scores are quite low. How can this be optimized?
annotations = {
"tv": 13.4,
"laptop": 26.7,
"keyboard": 20.9,
"dining table": 13.1,
"chair": 76.6,
"potted plant": 44.2
}

`UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 25` while installing in Colab

Hi 👋🏻 I was trying to install YOLO-World in Google Colab, but unfortunately, an error appears after executing the python setup.py build develop command.

Installed /usr/local/lib/python3.10/dist-packages/semantic_version-2.10.0-py3.10.egg
Searching for ruff>=0.1.7
Reading https://pypi.org/simple/ruff/
Downloading https://files.pythonhosted.org/packages/07/1e/fa1c65330787f08e73980e8401f7996882f0c556975f0cb31ef742b9908a/ruff-0.2.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=3826fb34c144ef1e171b323ed6ae9146ab76d109960addca730756dc19dc7b22
Best match: ruff 0.2.0
Processing ruff-0.2.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Installing ruff-0.2.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl to /usr/local/lib/python3.10/dist-packages
Adding ruff 0.2.0 to easy-install.pth file
Traceback (most recent call last):
  File "/content/YOLO-World/setup.py", line 163, in <module>
    setup(
  File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 107, in setup
    return distutils.core.setup(**attrs)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1244, in run_command
    super().run_command(command)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/develop.py", line 130, in install_for_development
    self.process_distribution(None, self.dist, not self.no_deps)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/easy_install.py", line 750, in process_distribution
    distros = WorkingSet([]).resolve(
  File "/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py", line 827, in resolve
    dist = self._resolve_dist(
  File "/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py", line 863, in _resolve_dist
    dist = best[req.key] = env.best_match(
  File "/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py", line 1133, in best_match
    return self.obtain(req, installer)
  File "/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py", line 1145, in obtain
    return installer(requirement)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/easy_install.py", line 677, in easy_install
    return self.install_item(spec, dist.location, tmpdir, deps)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/easy_install.py", line 705, in install_item
    self.process_distribution(spec, dist, deps)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/easy_install.py", line 731, in process_distribution
    self.install_egg_scripts(dist)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/develop.py", line 152, in install_egg_scripts
    return easy_install.install_egg_scripts(self, dist)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/easy_install.py", line 607, in install_egg_scripts
    dist.get_metadata('scripts/' + script_name)
  File "/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py", line 1519, in get_metadata
    return value.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 25: invalid start byte in scripts/ruff file at path: /usr/local/lib/python3.10/dist-packages/ruff-0.2.0-py3.10-linux-x86_64.egg/EGG-INFO/scripts/ruff

Finetuning

I know this will only be released later, but I would like to fine-tune the model on a very small dataset.

I have two questions:

Is there a good example of what a dataset should look like? (See the sketch after these questions.)
Is it possible to fine-tune on just one GPU with 40 GB of VRAM?
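Not from the thread, but the repo's finetune_coco configs consume standard COCO-format annotations, so a dataset would typically look like this (a hedged sketch; every value is illustrative):

import json

# minimal COCO-style annotation structure
coco = {
    "images": [
        {"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100, 120, 50, 80],   # x, y, width, height in pixels
         "area": 4000, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "my_class"}],
}
with open("instances_train.json", "w") as f:
    json.dump(coco, f)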
