ailab-cvc / yolo-world

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Home Page: https://www.yoloworld.cc

License: GNU General Public License v3.0

Python 98.78% Shell 0.16% Dockerfile 0.21% CMake 0.19% C++ 0.66%

yolo-world's People

Contributors

capjamesg, digma, eltociear, greatv, hechenghui, jradikk, liuhuicnn, natanbagrov, onuralpszr, partheee, prashantdixit0, shaswatpanda, skalskip, stevengrove, swalehmwadime, taofuyu, wondervictor, zsxkib

yolo-world's Issues

config file

I am able to run the new model with CC3M-lite using the two yolo_world_l config files. Is there supposed to be a new config file for this new model, or which of the two am I supposed to run?

Hugging Face is blocked in China

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like pretrained_models/clip-vit-base-patch32-projection is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1402) of binary: /usr/bin/python

Hugging Face is blocked in China and my server has no proxy. I want to download the files myself, put them on the server, and load them from local paths. What do I need to modify?
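One common workaround (not from this thread): download the CLIP weights on a machine that has access, copy them to the server, and make the config's text-model path resolve to that local directory. A minimal sketch, assuming the text encoder corresponds to openai/clip-vit-base-patch32:

# run on a machine that can reach huggingface.co, then copy the folder to the
# server; the target path below matches the directory the error message expects
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="openai/clip-vit-base-patch32",
    local_dir="pretrained_models/clip-vit-base-patch32-projection",
)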

Possible to run Gradio demo in Windows?

I cloned the repo and downloaded the weights file
yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth

But when I attempt to run the demo, I get this:

D:\Python Stuff\yolo-world>python demo.py configs/pretrain/yolo_world_l_dual_vlpan_vlpan_l2norm_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth
2024-02-15 18:53:13.530020: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-02-15 18:53:13.530312: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ D:\Python Stuff\yolo-world\demo.py:145 in <module>                                               │
│                                                                                                  │
│   142 │   args = parse_args()                                                                    │
│   143 │                                                                                          │
│   144 │   # load config                                                                          │
│ ❱ 145 │   cfg = Config.fromfile(args.config)                                                     │
│   146 │   if args.cfg_options is not None:                                                       │
│   147 │   │   cfg.merge_from_dict(args.cfg_options)                                              │
│   148                                                                                            │
│                                                                                                  │
│ C:\Users\Max\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\config\config.py │
│ :459 in fromfile                                                                                 │
│                                                                                                  │
│    456 │   │   filename = str(filename) if isinstance(filename, Path) else filename              │
│    457 │   │   if lazy_import is False or \                                                      │
│    458 │   │      lazy_import is None and not Config._is_lazy_import(filename):                  │
│ ❱  459 │   │   │   cfg_dict, cfg_text, env_variables = Config._file2dict(                        │
│    460 │   │   │   │   filename, use_predefined_variables, use_environment_variables,            │
│    461 │   │   │   │   lazy_import)                                                              │
│    462 │   │   │   if import_custom_modules and cfg_dict.get('custom_imports', None):            │
│                                                                                                  │
│ C:\Users\Max\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\config\config.py │
│ :943 in _file2dict                                                                               │
│                                                                                                  │
│    940 │   │   except Exception as e:                                                            │
│    941 │   │   │   if osp.exists(temp_config_dir):                                               │
│    942 │   │   │   │   shutil.rmtree(temp_config_dir)                                            │
│ ❱  943 │   │   │   raise e                                                                       │
│    944 │   │                                                                                     │
│    945 │   │   # check deprecation information                                                   │
│    946 │   │   if DEPRECATION_KEY in cfg_dict:                                                   │
│                                                                                                  │
│ C:\Users\Max\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\config\config.py │
│ :885 in _file2dict                                                                               │
│                                                                                                  │
│    882 │   │   │   │   │   │   temp_config_file.name):                                           │
│    883 │   │   │   │   │   base_cfg_path, scope = Config._get_cfg_path(                          │
│    884 │   │   │   │   │   │   base_cfg_path, filename)                                          │
│ ❱  885 │   │   │   │   │   _cfg_dict, _cfg_text, _env_variables = Config._file2dict(             │
│    886 │   │   │   │   │   │   filename=base_cfg_path,                                           │
│    887 │   │   │   │   │   │   use_predefined_variables=use_predefined_variables,                │
│    888 │   │   │   │   │   │   use_environment_variables=use_environment_variables,              │
│                                                                                                  │
│ C:\Users\Max\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\config\config.py │
│ :841 in _file2dict                                                                               │
│                                                                                                  │
│    838 │   │   Returns:                                                                          │
│    839 │   │   │   Tuple[dict, str]: Variables dictionary and text of Config.                    │
│    840 │   │   """                                                                               │
│ ❱  841 │   │   if lazy_import is None and Config._is_lazy_import(filename):                      │
│    842 │   │   │   raise RuntimeError(                                                           │
│    843 │   │   │   │   'The configuration file type in the inheritance chain '                   │
│    844 │   │   │   │   'must match the current configuration file type, either '                 │
│                                                                                                  │
│ C:\Users\Max\AppData\Local\Programs\Python\Python310\lib\site-packages\mmengine\config\config.py │
│ :1657 in _is_lazy_import                                                                         │
│                                                                                                  │
│   1654 │   def _is_lazy_import(filename: str) -> bool:                                           │
│   1655 │   │   if not filename.endswith('.py'):                                                  │
│   1656 │   │   │   return False                                                                  │
│ ❱ 1657 │   │   with open(filename, encoding='utf-8') as f:                                       │
│   1658 │   │   │   codes_str = f.read()                                                          │
│   1659 │   │   │   parsed_codes = ast.parse(codes_str)                                           │
│   1660 │   │   for node in ast.walk(parsed_codes):                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\Python
Stuff\\yolo-world\\configs\\pretrain\\../../third_party/mmyolo/configs/yolov8/yolov8_l_syncbn_fast_8xb16-500e_coco.py'

I checked the Hugging Face repo at this link:
https://huggingface.co/spaces/stevengrove/YOLO-World/tree/main

and found that there are many more files in the mmyolo folder to download. Could you please clarify what I should download to make it work?
Thanks!

image_demo.py

I have a problem when running image_demo.py

Loads checkpoint by local backend from path: yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_cc3mlite_train_pretrained-7a5eea3b.pth
02/16 23:06:23 - mmengine - INFO - Load checkpoint from yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_cc3mlite_train_pretrained-7a5eea3b.pth
[ ] 0/1, elapsed: 0s, ETA:/home/kemove/anaconda3/envs/YOLO_World/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Traceback (most recent call last):
  File "image_demo.py", line 154, in <module>
    inference_detector(runner,
  File "image_demo.py", line 78, in inference_detector
    pred_instances.scores.float() > score_thr]
TypeError: '>' not supported between instances of 'Tensor' and 'str'
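The error indicates that score_thr reaches the comparison as a string, most likely because it was parsed without a type. A hedged sketch of the usual fix, assuming the demo reads the threshold via argparse:

import argparse

parser = argparse.ArgumentParser()
# declare the threshold as a float so the Tensor comparison succeeds
parser.add_argument('--score-thr', type=float, default=0.3)
args = parser.parse_args()
# the filter then works, e.g.:
# pred_instances = pred_instances[pred_instances.scores.float() > args.score_thr]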

Improved visualisation of results with Supervision

Hi 👋🏻 First of all, I am very impressed with your model!

I noticed you have supervision in the dependency list, but I don't see it used in the code. I am the creator of this package. Would you be willing to accept a PR updating the visualization of the results?

If you have any other ideas on how to use supervision, I am also very open to help.
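For illustration, a minimal sketch (not the repo's code) of what such a visualization could look like with supervision; the annotator API varies slightly between supervision versions, and the arrays below are placeholders for real YOLO-World outputs:

import cv2
import numpy as np
import supervision as sv

image = cv2.imread("demo.jpg")                  # placeholder image path
boxes = np.array([[30.0, 40.0, 200.0, 220.0]])  # xyxy detections
scores = np.array([0.82])
class_ids = np.array([0])

detections = sv.Detections(xyxy=boxes, confidence=scores, class_id=class_ids)
annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections)
cv2.imwrite("annotated.jpg", annotated)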

TensorRT

Thanks for your awesome work. Could you release a TensorRT version of YOLO-World? The inference time of the YOLO-World ONNX model is somewhat slow.

How to extract object nouns from caption?

In your paper, you propose a module that extracts object nouns from user-provided captions. It's very interesting, but I have not found this processor in your demo. Could you provide it?
Thanks very much.
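Noun extraction of this kind is commonly done with an off-the-shelf NLP tool; below is a sketch using spaCy noun chunks, which may differ from the module the paper actually uses:

import spacy  # requires: python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
caption = "a man riding a bicycle past a red fire hydrant"
# take the head noun of each noun chunk as a candidate object word
nouns = [chunk.root.text for chunk in nlp(caption).noun_chunks]
print(nouns)  # e.g. ['man', 'bicycle', 'hydrant']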

Error during Evaluation

While I was running the evaluation script, I ended up with the following error. I tried signing in to Hugging Face with huggingface-cli login, but still ended up with the same error.

Repository Not Found for url: https://huggingface.co/pretrained_models/clip-vit-base-patch32-projection/resolve/main/tokenizer_config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.

Along with this error:

OSError: pretrained_models/clip-vit-base-patch32-projection is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>
Traceback (most recent call last):
File "/home/mu480317/.conda/envs/yoloworldnew/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
response.raise_for_status()
File "/home/mu480317/.conda/envs/yoloworldnew/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/pretrained_models/clip-vit-base-patch32-projection/resolve/main/tokenizer_config.json

My virtual env has:

Pytorch 2.0.0
CUDA 11.8
Transformers 4.37.2
huggingface-hub 0.20.3

Please help me solve this error. Thanks!

A question

Is the class_text_path in the config file (data/captions/coco_class_captions.json) something you generated yourselves? Could you provide an example? Thanks.
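For illustration, class-text files in this style are typically a JSON list with one inner list of phrases per class; a hedged sketch of generating such a file (class names illustrative, not confirmed by this thread):

import json

# one inner list per class; the list form allows synonyms for a class
class_texts = [["person"], ["bicycle"], ["car"], ["traffic light"]]
with open("coco_class_captions.json", "w") as f:
    json.dump(class_texts, f)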

How to fine-tune a model for my own dataset?

Hello, this is outstanding work! Amazing! I'd like to ask how to fine-tune a model suited to my own data on top of the pretrained model. In short: fine-tune with a small amount of data, then use the fine-tuned model to annotate a large amount of similar data.

Some undefined keys leading to error while finetuning

Hi, I am trying to finetune the model on my object detection dataset. I am using the config for Efficient YOLO-World with the no-mask annotation _base_.

In the finetune config file yolo_world_l_dual_vlpan_2e-4_80e_8gpus_finetune_coco.py, there are fields such as _base_.copypaste_prob or _base_.min_area_ratio that are not present in the base file yolov8_l_syncbn_fast_8xb16-500e_coco.py.

I am currently unable to run the finetune scripts due to the errors caused by these keys. Can you provide some methods to tackle this problem? If any more context is needed, please let me know.

Changing texts in the demo.py web UI doesn't work

Texts entered in the Gradio web UI only take effect once; unless I restart the server, it doesn't work correctly.
I wonder whether this is by design, or is there a way to reset the runner state to load a new text prompt?


Support Torch==2.0.0?

Hey Authors,
Thanks for the excellent job! I am wondering whether your project can support pytorch==2.0.0, torchvision==0.15.0, torchaudio==2.0.0, pytorch-cuda=11.8, or in general any PyTorch>2.0.0? I believe the runtime efficiency and compatibility of PyTorch>2.0.0 are much better for Python >= 3.10.

I am looking forward to deploying your model, instead of Grounding-DINO, in a real-time on-board system.

Thanks!

Can't create an instance of YOLOWorldDetector

I'm having problems creating an instance of YOLOWorldDetector because the model is not present in the mmengine registry. I'm using the model configuration from the file: https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov8/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco.py. I think it could be caused by different versions of mmengine, mmdet, or mmyolo, but it's not clear from the requirements.txt file which versions of the libraries are needed. I also couldn't run YOLO-World without also installing mmengine and mmcv, which isn't mentioned in there either. This is the error I get:

File "scripts/yolo_world_db.py", line 93, in _load_model
runner = Runner.from_cfg(model_dict)
File "/miniconda3/envs/yolo_world/lib/python3.10/site-packages/mmengine/runner/runner.py", line 462, in from_cfg
runner = cls(
File "/miniconda3/envs/yolo_world/lib/python3.10/site-packages/mmengine/runner/runner.py", line 429, in init
self.model = self.build_model(model)
File "/miniconda3/envs/yolo_world/lib/python3.10/site-packages/mmengine/runner/runner.py", line 836, in build_model
model = MODELS.build(model)
File "/miniconda3/envs/yolo_world/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/miniconda3/envs/yolo_world/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/miniconda3/envs/yolo_world/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 100, in build_from_cfg
raise KeyError(
KeyError: 'YOLOWorldDetector is not in the mmengine::model registry. Please check whether the value of YOLOWorldDetector is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'
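The error message itself points at the likely cause: nothing imported the YOLO-World package, so its register_module decorators never ran. A hedged sketch of the two usual mmengine fixes, assuming YOLO-World is installed as the yolo_world package:

# option 1: import the package before building the runner,
# so its model classes get registered with mmengine
import yolo_world  # noqa: F401

# option 2: let mmengine import it for you by adding this line to the config:
# custom_imports = dict(imports=['yolo_world'], allow_failed_imports=False)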

Inference on test image

Is there a sample inference script on a test image?
With NMS, number of boxes, and text parameters?

ONNX export questions

Hello, I've been trying to experiment with the ONNX export from the Hugging Face demo.
I spun up a quick ONNX Runtime test script but hit a problem when executing:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running NonMaxSuppression node. Name:'/NonMaxSuppression' Status Message: non_max_suppression.cc:91 onnxruntime::NonMaxSuppressionBase::PrepareCompute boxes and scores should have same spatial_dimension.

It seems the box and score lists have mismatched lengths during NMS. This should be very close to the end of the execution, so I assume that if I had messed up the inputs, the model would have crashed much earlier.

Can you provide some information if you have any idea what's going on? I can provide more info on the test code if needed, but it's very basic (load image, resize to 640x640, reshape to 1x3x640x640, create an input dict with "images" as the input name, and run the model); a self-contained version is sketched below.
I tried with and without standard RGB normalization, as I wasn't sure whether it was required, but both give the same error.
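For reference, a self-contained version of the test procedure described above (resize to 640x640, NCHW blob, "images" input name); the file names are placeholders:

import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolo_world.onnx", providers=["CPUExecutionProvider"])
img = cv2.resize(cv2.imread("test.jpg"), (640, 640))
blob = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
blob = blob.transpose(2, 0, 1)[None]  # 1x3x640x640
outputs = session.run(None, {"images": blob})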

Deployment Options / Instructions

Very impressive work! It would be great if this could also be published to Hugging Face as a model, allowing easy API deployments; Hugging Face doesn't allow deploying the current one.
Or at least provide some instructions on how to deploy the ONNX/PyTorch pre-trained model in a minimal setup.

I found a workable Dockerfile to run the demo, FYI

Dockerfile Content

Assume the file is saved at ./docker/Dockerfile.

FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

RUN apt-get update && apt-get install -y software-properties-common && rm -rf /var/lib/apt/lists/*
RUN add-apt-repository ppa:ubuntu-toolchain-r/test
RUN apt-get update && apt-get install -y --no-install-recommends \
    libcurl4-openssl-dev \
    wget \
    vim \
    zlib1g-dev \
    git \
    pkg-config \
    sudo \
    ssh \
    libssl-dev \
    pbzip2 \
    pv \
    bzip2 \
    unzip \
    devscripts \
    lintian \
    fakeroot \
    dh-make \
    libgl1-mesa-glx \
    python3 \
    python3-pip \
    python3-dev \
    python3-wheel \
    gcc \
    g++ \
    && cd /usr/local/bin \
    && ln -s /usr/bin/python3 python \
    && ln -s /usr/bin/pip3 pip \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip
RUN pip install torch torchvision torchaudio --no-cache-dir --index-url https://download.pytorch.org/whl/cu118
ENV FORCE_CUDA="1"
ENV MMCV_WITH_OPS=1
RUN pip install gradio --no-cache-dir -i  https://pypi.tuna.tsinghua.edu.cn/simple

Build Scripts

docker build -f ./docker/Dockerfile -t yolo_world:0.1 .

Run Demo

# start container in code dir
docker run -it --runtime=nvidia -p 8080:8080 -v ${PWD}:${PWD} -w ${PWD} yolo_world:0.1 bash
# install deps
pip3 install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple
# download the pretrained model

# start demo
python3 ./demo.py ./configs/pretrain/yolo_world_m_dual_vlpan_l2norm_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py  ./model/yolo_world_m_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-2b7bd1be.pth 

Visit the web UI

http://0.0.0.0:8080/

Open for Contributions

Hi @StevenGrove and the YOLO-World team,
Great work! I tried demo.py and it worked great.
But I see scope for lots of integrations and improvements. Are you open to community contributions?

What is the difference between supervised object detection and closed (fixed) vocabulary detection?

Hello. I've got a question regarding the article about fine-tuning on a custom dataset.

I would like to finetune YOLO-World on my custom dataset.

Here, I am just wondering if I have to name my custom classes for fine-tuning.

If so, how would it differ from ordinary supervised object detection, which requires labeling custom classes?

Could anyone provide assistance with this? Thank you in advance :)

ONNX export

I was trying to export the PyTorch model to ONNX using the export_model function in the Hugging Face Spaces demo file:

https://huggingface.co/spaces/stevengrove/YOLO-World/blob/main/tools/demo.py#L92

However, doing so results in the error:

torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of operator adaptive_max_pool2d, output size that are not factor of input size. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues  [Caused by the value '1303 defined in (%1303 : Long(2, strides=[1], device=cpu) = onnx::Constant[value= 3  3 [ CPULongType{2} ]]()

I am not sure how I should proceed here.

Pre-trained model ckpt file: yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth

And the config file used for runner loading: HERE

A question I'd like to ask

GroundingDINO's inference needs to encode the text and the image at the same time, because cross-attention is computed between them. YOLO-World, by contrast, works in an offline manner, but I don't quite understand what the offline manner in the paper means. My understanding is: the text and the image don't have to be processed together; during inference, text and image encoding don't happen at the same time. Is that a correct way to understand it?
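For context on the "offline" wording: the idea is to encode the class texts once, cache the embeddings, and reuse them for every image, so no text encoder needs to run per inference. A minimal sketch of the caching step; the CLIP checkpoint name is an assumption, not taken from this thread:

import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")

tokens = tokenizer(["person", "dog", "traffic cone"], padding=True, return_tensors="pt")
with torch.no_grad():
    text_embeds = text_encoder(**tokens).text_embeds  # computed once, then cached
# at inference time only the image branch runs against the cached text_embeds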

ONNX input

Currently, when exporting the YOLOWorldDetector model to ONNX format, the text input to the model is fixed for all possible input images at inference time.
Could ONNX model weights where the model accepts both image and text as inputs be shared?
Otherwise, could the PyTorch-to-ONNX conversion script be modified such that it is also possible to provide text (tokenized input IDs and attention mask) as input to the ONNX model at inference time? A sketch of the two-input export pattern follows.
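For illustration only, the two-input export pattern itself is straightforward; the model below is a dummy stand-in (not YOLO-World), just to show torch.onnx.export taking image and text tensors as separate named inputs:

import torch

class DummyOpenVocabHead(torch.nn.Module):
    def forward(self, images, text_embeds):
        feats = images.mean(dim=(2, 3))                    # (B, 3) fake image features
        feats = torch.nn.functional.pad(feats, (0, 509))   # pad to 512 dims
        # similarity of every text embedding against the image features: (B, N)
        return torch.matmul(text_embeds, feats.unsqueeze(-1)).squeeze(-1)

torch.onnx.export(
    DummyOpenVocabHead(),
    (torch.randn(1, 3, 640, 640), torch.randn(1, 80, 512)),
    "two_input_demo.onnx",
    input_names=["images", "text_embeds"],
    output_names=["scores"],
)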

FileNotFoundError with demo.py

Ran into a bit of trouble trying to get the demo running with one of the finetune_coco configs. I'm getting a FileNotFoundError when I run the demo script; it looks like a file is missing from the mmyolo configs.

  • Cloned the repo and set everything up as per the instructions.
  • Tried running the demo with this command:
python demo.py YOLO-World-master/configs/finetune_coco/yolo_world_l_dual_vlpan_2e-4_80e_8gpus_finetune_coco.py YOLO-World-master/yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-0e566235.pth 
  • And hit this error:
FileNotFoundError: [Errno 2] No such file or directory: 'YOLO-World-master/configs/finetune_coco/../../third_party/mmyolo/configs/yolov8/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco.py'

Any ideas on what's going on or how to fix this? Would really appreciate the help!

Thanks!

TypeError: __init__() got an unexpected keyword argument 'lines'

I ran into this problem while running demo.py. How can I solve it? I used this command:
root@zmj:/build/YOLO-World# python demo.py configs/pretrain/yolo_world_l_dual_vlpan_vlpan_l2norm_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py models/yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-0e566235.pth
The error is as follows:
(screenshot of the traceback)

Segmentation results

Could the demo output segmentation results rather than detection bounding boxes?

How to run the demo?

I have created the conda env and downloaded yolo_world_l_clip_base_dua....pth

Then when I run the command following the steps in README.md:

python3 demo.py configs/pretrain/yolo_world_s_dual_vlpan_l2norm_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py  yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-0e566235.pth

or

python3 demo.py configs/pretrain/yolo_world_l_dual_vlpan_l2norm_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_val.py  yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-0e566235.pth

An error occurs:

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like pretrained_models/clip-vit-base-patch32-projection is not the path to a directory containing a file named config.json.

So, should I download pretrained_models/clip-vit-base-patch32-projection first? If so, where can I get it?

Great work & a comparison with Detic

Thanks to the authors; this is great work!!! I want to use this model from a detect-everything perspective, so I took the same test image and compared inference against Detic. Two questions:
1. YOLO-World's detection confidence is slightly lower, but this doesn't affect usage.
2. YOLO-World's detection rate on this image is slightly worse; is that because Detic is a large model?
(test image attached)
I also used the word 'cone' with the confidence threshold set to 0.01.
(comparison screenshots attached)

Deployment Issue

Thank you very much for the excellent work done by the authors! I have a few questions and would like to discuss them:

  1. When attempting to perform inference using the exported ONNX model with ONNX Runtime, I encountered the following error:
Traceback (most recent call last):
  File "main_onnxruntime.py", line 111, in <module>
    main()
  File "main_onnxruntime.py", line 86, in main
    decoder_outputs = decoder(
  File "/home/cvhub/workspace/projects/python/detection/YOLO-World/yolo_world/easydeploy/examples/numpy_coder.py", line 43, in __call__
    feats = [
  File "/home/cvhub/workspace/projects/python/detection/YOLO-World/yolo_world/easydeploy/examples/numpy_coder.py", line 44, in <listcomp>
    np.ascontiguousarray(feat[0].transpose(1, 2, 0))
ValueError: axes don't match array

The relevant command used for running is:

python main_onnxruntime.py /home/cvhub/workspace/projects/python/detection/YOLO-World/third_party/mmyolo/demo/dog.jpg /home/cvhub/workspace/projects/python/detection/YOLO-World/work_dirs/yolow-l.onnx --type YOLOV5

Upon observation, the inference process seems normal, but there appears to be an issue with decoding.

  2. How can open-vocabulary detection be supported? I noticed that the provided demo and running examples are based on the 80 classes of the COCO dataset for detection results.

Question about pretrained weights

Hi, YOLO-World Team!
Big shoutout to the team for such excellent work! 🚀 Bringing an open-vocabulary detector to the real-time world! Thanks! 😄
I'm a core maintainer and ML engineer of Ultralytics YOLOv8, and recently I've been trying to migrate the YOLO-World weights into our YOLOv8 repo.

I've gotten really close. However, today I found that the weights in the Hugging Face YOLO-World repo are somewhat different from the ones in the current GitHub YOLO-World repo.
(screenshots of the two weight listings)
From the mAP tables it seems the ones on the GitHub page are better, but just to confirm, I'd like to ask which set of weights is the primary (better) one and what the difference between them is. Thanks!

Export TensorRT for Triton server

First, I want to say thanks so much to the authors for this work!
Can we export YOLO-World to ONNX or TensorRT now?
Thank you in advance!

GPU memory growth during training

Did you encounter growing GPU memory usage when training the YOLO World model with mmyolo and mmdet? I built a similar setup using RTMDet from mmdet, but there is a GPU memory leak during training.

About fine-tuning

Hello, while debugging train.py, runner = Runner.from_cfg(cfg) hangs and won't proceed; is there a way to fix this? Another question: when fine-tuning on my own dataset (10,000 images with only one class, "baby"), the localization loss stays at 0, and after a few epochs the classification loss also drops to 0. But after training for 40 epochs, testing with image_demo detects no targets at all (not a single object in the image). Do you have any suggestions?

Running demo.py on my local machine

I got the error below when I ran demo.py:

usage: demo.py [-h] [--work-dir WORK_DIR] [--cfg-options CFG_OPTIONS [CFG_OPTIONS ...]] config checkpoint
demo.py: error: the following arguments are required: config, checkpoint

Missing script

python setup.py build develop

The error:
FileNotFoundError: [Errno 2] No such file or directory: 'yolo_world/version.py'

How to solve the low-confidence problem?

(screenshot attached)
If I use the confidence score to judge whether something is the object in question, many of the scores are quite low. How can this be optimized?
annotations = {
"tv": 13.4,
"laptop": 26.7,
"keyboard": 20.9,
"dining table": 13.1,
"chair": 76.6,
"potted plant": 44.2
}

`UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 25` while installing in Colab

Hi 👋🏻 I was trying to install YOLO-World in Google Colab, but unfortunately, an error appears after executing the python setup.py build develop command.

Installed /usr/local/lib/python3.10/dist-packages/semantic_version-2.10.0-py3.10.egg
Searching for ruff>=0.1.7
Reading https://pypi.org/simple/ruff/
Downloading https://files.pythonhosted.org/packages/07/1e/fa1c65330787f08e73980e8401f7996882f0c556975f0cb31ef742b9908a/ruff-0.2.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=3826fb34c144ef1e171b323ed6ae9146ab76d109960addca730756dc19dc7b22
Best match: ruff 0.2.0
Processing ruff-0.2.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Installing ruff-0.2.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl to /usr/local/lib/python3.10/dist-packages
Adding ruff 0.2.0 to easy-install.pth file
Traceback (most recent call last):
  File "/content/YOLO-World/setup.py", line 163, in <module>
    setup(
  File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 107, in setup
    return distutils.core.setup(**attrs)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1244, in run_command
    super().run_command(command)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/develop.py", line 130, in install_for_development
    self.process_distribution(None, self.dist, not self.no_deps)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/easy_install.py", line 750, in process_distribution
    distros = WorkingSet([]).resolve(
  File "/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py", line 827, in resolve
    dist = self._resolve_dist(
  File "/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py", line 863, in _resolve_dist
    dist = best[req.key] = env.best_match(
  File "/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py", line 1133, in best_match
    return self.obtain(req, installer)
  File "/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py", line 1145, in obtain
    return installer(requirement)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/easy_install.py", line 677, in easy_install
    return self.install_item(spec, dist.location, tmpdir, deps)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/easy_install.py", line 705, in install_item
    self.process_distribution(spec, dist, deps)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/easy_install.py", line 731, in process_distribution
    self.install_egg_scripts(dist)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/develop.py", line 152, in install_egg_scripts
    return easy_install.install_egg_scripts(self, dist)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/easy_install.py", line 607, in install_egg_scripts
    dist.get_metadata('scripts/' + script_name)
  File "/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py", line 1519, in get_metadata
    return value.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 25: invalid start byte in scripts/ruff file at path: /usr/local/lib/python3.10/dist-packages/ruff-0.2.0-py3.10-linux-x86_64.egg/EGG-INFO/scripts/ruff

Finetuning

I know this will only be released later, but I would like to fine-tune the model on a very small dataset.

I have two questions:

Is there a good example of what a dataset should look like? (See the sketch after these questions.)
Is it possible to fine-tune on just one GPU with 40 GB of VRAM?
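Not from the thread, but the repo's finetune_coco configs consume standard COCO-format annotations, so a dataset would typically look like this (a hedged sketch; every value is illustrative):

import json

# minimal COCO-style annotation structure
coco = {
    "images": [
        {"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100, 120, 50, 80],   # x, y, width, height in pixels
         "area": 4000, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "my_class"}],
}
with open("instances_train.json", "w") as f:
    json.dump(coco, f)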
