
megvii-basedetection / yolox


YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/

License: Apache License 2.0

Python 92.07% C++ 7.93%
yolox yolov3 onnx tensorrt ncnn openvino pytorch megengine object-detection yolo


yolox's People

Contributors

791136190, ankandrew, armaxik, deftruth, developer0hye, f0xzz, fatescript, goatmessi7, haolongzhangm, jario-jin, joker316701882, manangoel99, nihui, r-b-g-b, rangilyu, roachsinai, sauravkdeo, sped0n, stephanxu, swhl, tonysy, waynemao, woowonjin, wwqgtxx, xin-li-67, xxr3376, yancie-yjr, yuangpeng, yulv-git, zhiqwang


yolox's Issues

Training error: An error has been caught in function 'launch', process 'MainProcess'

I'm trying to run 'python tools/train.py -n yolox-s -d 1 -b 6 --fp16 -o', and the following error occurs.

2021-07-22 15:18:00 | ERROR | yolox.core.launch:68 - An error has been caught in function 'launch', process 'MainProcess' (15429), thread 'MainThread' (139668994281664):
Traceback (most recent call last):

AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):

File "/home/lijq/deeplearn/YOLOX/yolox/data/datasets/mosaicdetection.py", line 91, in getitem
img, _labels, _, _ = self._dataset.pull_item(index)
File "/home/ll/deeplearn/YOLOX/yolox/data/datasets/coco.py", line 99, in pull_item
assert img is not None
AssertionError

PermissionError: [Errno 13] Permission denied: '/data'

(YOLOX) ncy@Lenovo:~/PycharmProjects/YOLOX$ python tools/demo.py image -n yolox-s -c yolox_s.pth.tar --path assets/dog.jpg --conf 0.3 --nms 0.65 --tsize 640 --save_result
Traceback (most recent call last):
File "/home/ncy/PycharmProjects/YOLOX/tools/demo.py", line 277, in
main(exp, args)
File "/home/ncy/PycharmProjects/YOLOX/tools/demo.py", line 210, in main
os.makedirs(file_name, exist_ok=True)
File "/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
File "/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
File "/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/os.py", line 225, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/data'

ImportError: /home/ncy/anaconda3/envs/YOLOX/lib/python3.9/site-packages/yolox-0.1.0-py3.9-linux-x86_64.egg/exps/default/yolox_s.py doesn't contains class named 'Exp'

(YOLOX) ncy@Lenovo:~/PycharmProjects/YOLOX$ python tools/trt.py -n yolox-s -c yolox_s.pth.tar
2021-07-20 20:34:44.442 | ERROR | __main__:<module>:77 - An error has been caught in function '<module>', process 'MainProcess' (180525), thread 'MainThread' (140272824201600):
Traceback (most recent call last):

File "/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/site-packages/yolox-0.1.0-py3.9-linux-x86_64.egg/yolox/exp/build.py", line 13, in get_exp_by_file
current_exp = importlib.import_module(os.path.basename(exp_file).split(".")[0])
│ │ │ │ │ └ '/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/site-packages/yolox-0.1.0-py3.9-linux-x86_64.egg/exps/default/yolox_s.py'
│ │ │ │ └ <function basename at 0x7f93ceec7e50>
│ │ │ └ <module 'posixpath' from '/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/posixpath.py'>
│ │ └ <module 'os' from '/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/os.py'>
│ └ <function import_module at 0x7f93cee564c0>
└ <module 'importlib' from '/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/importlib/__init__.py'>
File "/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
│ │ │ │ │ └ 0
│ │ │ │ └ None
│ │ │ └ 0
│ │ └ 'yolox_s'
│ └ <function _gcd_import at 0x7f93cef8c310>
└ <module 'importlib._bootstrap' (frozen)>
File "", line 1030, in _gcd_import
File "", line 1007, in _find_and_load
File "", line 984, in _find_and_load_unlocked

ModuleNotFoundError: No module named 'yolox_s'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/home/ncy/PycharmProjects/YOLOX/tools/trt.py", line 77, in
main()
└ <function main at 0x7f93045813a0>

File "/home/ncy/PycharmProjects/YOLOX/tools/trt.py", line 36, in main
exp = get_exp(args.exp_file, args.name)
│ │ │ │ └ 'yolox-s'
│ │ │ └ Namespace(experiment_name=None, name='yolox-s', exp_file=None, ckpt='yolox_s.pth.tar')
│ │ └ None
│ └ Namespace(experiment_name=None, name='yolox-s', exp_file=None, ckpt='yolox_s.pth.tar')
└ <function get_exp at 0x7f9304581310>

File "/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/site-packages/yolox-0.1.0-py3.9-linux-x86_64.egg/yolox/exp/build.py", line 50, in get_exp
return get_exp_by_name(exp_name)
│ └ 'yolox-s'
└ <function get_exp_by_name at 0x7f9304581280>
File "/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/site-packages/yolox-0.1.0-py3.9-linux-x86_64.egg/yolox/exp/build.py", line 34, in get_exp_by_name
return get_exp_by_file(exp_path)
│ └ '/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/site-packages/yolox-0.1.0-py3.9-linux-x86_64.egg/exps/default/yolox_s.py'
└ <function get_exp_by_file at 0x7f93045811f0>
File "/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/site-packages/yolox-0.1.0-py3.9-linux-x86_64.egg/yolox/exp/build.py", line 16, in get_exp_by_file
raise ImportError("{} doesn't contains class named 'Exp'".format(exp_file))
└ '/home/ncy/anaconda3/envs/YOLOX/lib/python3.9/site-packages/yolox-0.1.0-py3.9-linux-x86_64.egg/exps/default/yolox_s.py'

ImportError: /home/ncy/anaconda3/envs/YOLOX/lib/python3.9/site-packages/yolox-0.1.0-py3.9-linux-x86_64.egg/exps/default/yolox_s.py doesn't contains class named 'Exp'
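For context (not from the original thread): get_exp_by_file imports the experiment file by its bare module name, so the import only succeeds if the file's directory is on sys.path; with the egg-installed layout above it is not, which produces the ModuleNotFoundError that is re-raised as this ImportError. A hedged workaround sketch, loading the Exp class directly from the file path using only the standard library (load_exp_from_file is an illustrative helper, not part of YOLOX):

    import importlib.util
    import os
    import sys

    def load_exp_from_file(exp_file):
        """Load the 'Exp' class from an experiment file given its full path."""
        module_name = os.path.basename(exp_file).split(".")[0]
        spec = importlib.util.spec_from_file_location(module_name, exp_file)
        module = importlib.util.module_from_spec(spec)
        sys.modules[module_name] = module
        spec.loader.exec_module(module)
        if not hasattr(module, "Exp"):
            raise ImportError(f"{exp_file} doesn't contain a class named 'Exp'")
        return module.Exp()

    # e.g. exp = load_exp_from_file("exps/default/yolox_s.py")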

cannot import name 'UnencryptedCookieSessionFactoryConfig'

When running inference with YOLOX-main via python openvino_inference.py -m /project/train/yolox/YOLOX-main/yolox_nano/yolox_10.xml -i /home/1..jpg, the following problem occurs:
Traceback (most recent call last):
File "openvino_inference.py", line 17, in
from yolox.data.data_augment import preproc as preprocess
File "/project/train/yolox/YOLOX-main/yolox/init.py", line 4, in
from .utils import configure_module
File "/project/train/yolox/YOLOX-main/yolox/utils/init.py", line 10, in
from .ema import ModelEMA
File "/project/train/yolox/YOLOX-main/yolox/utils/ema.py", line 7, in
import apex
File "/usr/local/lib/python3.6/dist-packages/apex/init.py", line 13, in
from pyramid.session import UnencryptedCookieSessionFactoryConfig
ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig'

bug: The dataset sampler during single GPU training should be infinite

In yolox/exp/yolox_base.py:122

if is_distributed:
    batch_size = batch_size // dist.get_world_size()
    sampler = InfiniteSampler(
        len(self.dataset), seed=self.seed if self.seed else 0
    )
else:
    sampler = torch.utils.data.RandomSampler(self.dataset)

torch.utils.data.RandomSampler is not infinite and will raise an exception after one epoch of training.
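A minimal sketch of a possible fix, assuming InfiniteSampler is importable from yolox.data exactly as it is used in the distributed branch quoted above (this mirrors the reporter's snippet; it is not a verified patch):

    # sketch: use an infinite sampler in both the single-GPU and distributed cases
    if is_distributed:
        batch_size = batch_size // dist.get_world_size()

    sampler = InfiniteSampler(
        len(self.dataset), seed=self.seed if self.seed else 0
    )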

Training AP is 0.0 after 10 epochs

2021-07-21 17:32:55.097 | INFO | yolox.core.trainer:after_iter:245 - epoch: 10/300, iter: 970/973, mem: 8646Mb, iter_time: 1.567s, data_time: 0.027s, total_loss: 3.7, iou_loss: 1.9, l1_loss: 0.0, conf_loss: 1.2, cls_loss: 0.7, lr: 9.993e-03, size: 512, ETA: 4 days, 23:01:45
2021-07-21 17:32:59.200 | INFO | yolox.core.trainer:save_ckpt:307 - Save weights to ./YOLOX_outputs/yolox_voc_s
2021-07-21 17:32:59.893 | INFO | yolox.core.trainer:after_train:184 - Training of experiment is done and the best AP is 0.00

Training time too long

I train yolox-s with batch size 128 on 8 x V100, and the task takes about 2 days 6 hours for 300 epochs. Is this training time normal?

Is this state normal?

(base) zjnyly@zjnyly:~/Desktop/YOLOX$ python ./tools/demo.py image -n yolox-s -c ./yolox_s.pth.tar --path ./assets/dog.jpg --conf 0.3 --nms 0.65 --tsize 640 --save_result
2021-07-20 19:03:23 | INFO     | __main__:219 - Args: Namespace(camid=0, ckpt='./yolox_s.pth.tar', conf=0.3, demo='image', exp_file=None, experiment_name='yolox_s', fp16=False, fuse=False, name='yolox-s', nms=0.65, path='./assets/dog.jpg', save_result=True, trt=False, tsize=640)
/home/zjnyly/.local/lib/python3.8/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
2021-07-20 19:03:23 | INFO     | __main__:229 - Model Summary: Params: 8.97M, Gflops: 26.81
2021-07-20 19:03:25 | INFO     | __main__:240 - loading checkpoint

The output after I run the code is as above, but I can't find any output in the YOLOX_outputs folder.

multi-gpu training error

I train yolox-s with batch size 128 and 8 x A100, but after 10 epochs an error occurs. Here is the training error log:

main_func(*args)
│ └ (╒══════════════════╤════════════════════════════════════════════════════════════════════════════════════════════════════════...
└ <function main at 0x7f723fd8d3b0>

File "/workdir/yolox/tools/train.py", line 101, in main
trainer.train()
│ └ <function Trainer.train at 0x7f7159aa0cb0>
└ <yolox.core.trainer.Trainer object at 0x7f7159125650>

File "/workdir/yolox/yolox/core/trainer.py", line 71, in train
self.train_in_epoch()
│ └ <function Trainer.train_in_epoch at 0x7f7159a35290>
└ <yolox.core.trainer.Trainer object at 0x7f7159125650>

File "/workdir/yolox/yolox/core/trainer.py", line 81, in train_in_epoch
self.after_epoch()
│ └ <function Trainer.after_epoch at 0x7f715916bcb0>
└ <yolox.core.trainer.Trainer object at 0x7f7159125650>

File "/workdir/yolox/yolox/core/trainer.py", line 212, in after_epoch
all_reduce_norm(self.model)
│ │ └ DistributedDataParallel(
│ │ (module): YOLOX(
│ │ (backbone): YOLOPAFPN(
│ │ (backbone): CSPDarknet(
│ │ (stem): Focus(
│ │ ...
│ └ <yolox.core.trainer.Trainer object at 0x7f7159125650>
└ <function all_reduce_norm at 0x7f7163f70ef0>

File "/workdir/yolox/yolox/utils/allreduce_norm.py", line 99, in all_reduce_norm
states = all_reduce(states, op="mean")
│ └ OrderedDict([('module.backbone.backbone.stem.conv.bn.weight', tensor([0.5405, 0.7837, 1.4152, 1.2409, 0.7082, 1.0130, 1.0755,...
└ <function all_reduce at 0x7f7163f70e60>

File "/workdir/yolox/yolox/utils/allreduce_norm.py", line 68, in all_reduce
group = _get_global_gloo_group()
└ <functools._lru_cache_wrapper object at 0x7f7163f6f0f0>

File "/workdir/yolox/yolox/utils/dist.py", line 103, in _get_global_gloo_group
return dist.new_group(backend="gloo")
│ └ <function new_group at 0x7f71647869e0>
└ <module 'torch.distributed' from '/workdir/anaconda3/envs/yolox/lib/python3.7/site-packages/torch/distributed/__init__.py'>

File "/workdir/anaconda3/envs/yolox/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 2037, in new_group
timeout=timeout)
└ datetime.timedelta(seconds=1800)
File "/workdir/anaconda3/envs/yolox/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 521, in _new_process_group_helper
timeout=timeout)
└ datetime.timedelta(seconds=1800)

RuntimeError: [enforce fail at /opt/conda/conda-bld/pytorch_1607370156314/work/third_party/gloo/gloo/transport/tcp/device.cc:83] ifa != nullptr. Unable to find address for: ib0
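For context (not from the thread): the "Unable to find address for: ib0" failure means the Gloo backend picked an InfiniBand interface that has no address on this host. A commonly suggested workaround is to point Gloo at a real interface via the GLOO_SOCKET_IFNAME environment variable before any process group is created; the interface name below is only an example.

    # Hedged workaround sketch: force Gloo (and optionally NCCL) onto a network
    # interface that actually exists on this machine. "eth0" is only an example;
    # check `ip addr` for a valid name. Set this before launching training.
    import os

    os.environ["GLOO_SOCKET_IFNAME"] = "eth0"            # interface with a valid address
    os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")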

colab demo

Hi, can you please add a Google Colab demo for inference?

YOLOX fails to install

Hi, when I run the command pip3 install -v -e . # or python3 setup.py develop, the following error is reported:
[screenshot]

PermissionError

Traceback (most recent call last):
  File "tools/demo.py", line 277, in <module>
    main(exp, args)
  File "tools/demo.py", line 210, in main
    os.makedirs(file_name, exist_ok=True)
  File "/home/lengyihong/anaconda3/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/lengyihong/anaconda3/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/lengyihong/anaconda3/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/data'
Has anyone met this issue?

Error when running the demo

(base) lw@lw-System:~/work/YOLOX$ python3 tools/demo.py image -n yolox-s -c path/to/yolox_s.pth.tar --path assets/dog.jpg --conf 0.3 --nms 0.65 --tsize 640 --save_result
2021-07-22 12:13:29 | INFO | __main__:219 - Args: Namespace(camid=0, ckpt='path/to/yolox_s.pth.tar', conf=0.3, demo='image', exp_file=None, experiment_name='yolox_s', fp16=False, fuse=False, name='yolox-s', nms=0.65, path='assets/dog.jpg', save_result=True, trt=False, tsize=640)
/home/lw/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
2021-07-22 12:13:29 | INFO | __main__:229 - Model Summary: Params: 8.97M, Gflops: 26.81

ConnectionResetError

Exception in thread Thread-1:
Traceback (most recent call last):
File "/root/anaconda3/envs/yolox2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Exception in thread Thread-1:
Traceback (most recent call last):
File "/root/anaconda3/envs/yolox2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/root/anaconda3/envs/yolox2/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/root/anaconda3/envs/yolox2/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
self.run()
File "/root/anaconda3/envs/yolox2/lib/python3.8/threading.py", line 870, in run
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/root/anaconda3/envs/yolox2/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/resource_sharer.py", line 58, in detach
self._target(*self._args, **self._kwargs)
File "/root/anaconda3/envs/yolox2/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
return reduction.recv_handle(conn)
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/reduction.py", line 189, in recv_handle
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/queues.py", line 116, in get
return recvfds(s, 1)[0]
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/reduction.py", line 157, in recvfds
msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_SPACE(bytes_size))
return _ForkingPickler.loads(res)
ConnectionResetError File "/root/anaconda3/envs/yolox2/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
: [Errno 104] Connection reset by peer
fd = df.detach()
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/connection.py", line 509, in Client
deliver_challenge(c, authkey)
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/connection.py", line 740, in deliver_challenge
response = connection.recv_bytes(256) # reject large message
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Exception in thread Thread-1:
Traceback (most recent call last):
File "/root/anaconda3/envs/yolox2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/root/anaconda3/envs/yolox2/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/root/anaconda3/envs/yolox2/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/root/anaconda3/envs/yolox2/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/connection.py", line 508, in Client
answer_challenge(c, authkey)
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/connection.py", line 757, in answer_challenge
response = connection.recv_bytes(256) # reject large message
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
2021-07-21 14:23:59 | INFO | yolox.core.trainer:183 - Training of experiment is done and the best AP is 0.00
2021-07-21 14:23:59 | ERROR | yolox.core.launch:104 - An error has been caught in function '_distributed_worker', process 'SpawnProcess-1' (75815), thread 'MainThread' (139758017422912):
Traceback (most recent call last):

File "", line 1, in
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
│ │ └ 3
│ └ 36
└ <function _main at 0x7f1bea0ad820>
File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/spawn.py", line 129, in _main
return self._bootstrap(parent_sentinel)
│ │ └ 3
│ └ <function BaseProcess._bootstrap at 0x7f1bea1178b0>

File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
│ └ <function BaseProcess.run at 0x7f1bea107ee0>

File "/root/anaconda3/envs/yolox2/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └
│ │ │ └ (<function _distributed_worker at 0x7f1b92170670>, 0, (<function main at 0x7f1b092a2160>, 4, 4, 0, 'nccl', 'tcp://127.0.0.1:5...
│ │ └
│ └ <function _wrap at 0x7f1bd6a41040>

File "/root/anaconda3/envs/yolox2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
│ │ └ (<function main at 0x7f1b092a2160>, 4, 4, 0, 'nccl', 'tcp://127.0.0.1:57533', (╒══════════════════╤══════════════════════════...
│ └ 0
└ <function _distributed_worker at 0x7f1b92170670>

File "/data/zhangyong/workspace/detect/YOLOX-main/yolox/core/launch.py", line 104, in _distributed_worker
main_func(*args)
│ └ (╒══════════════════╤════════════════════════════════════════════════════════════════════════════════════════════════════════...
└ <function main at 0x7f1b092a2160>

File "/data/zhangyong/workspace/detect/YOLOX-main/tools/train.py", line 101, in main
trainer.train()
│ └ <function Trainer.train at 0x7f1b10c650d0>
└ <yolox.core.trainer.Trainer object at 0x7f1b092c2070>

File "/data/zhangyong/workspace/detect/YOLOX-main/yolox/core/trainer.py", line 70, in train
self.train_in_epoch()
│ └ <function Trainer.train_in_epoch at 0x7f1b09321430>
└ <yolox.core.trainer.Trainer object at 0x7f1b092c2070>

File "/data/zhangyong/workspace/detect/YOLOX-main/yolox/core/trainer.py", line 80, in train_in_epoch
self.after_epoch()
│ └ <function Trainer.after_epoch at 0x7f1b093298b0>
└ <yolox.core.trainer.Trainer object at 0x7f1b092c2070>

File "/data/zhangyong/workspace/detect/YOLOX-main/yolox/core/trainer.py", line 209, in after_epoch
all_reduce_norm(self.model)
│ │ └ DistributedDataParallel(
│ │ (module): YOLOX(
│ │ (backbone): YOLOPAFPN(
│ │ (backbone): CSPDarknet(
│ │ (stem): Focus(
│ │ ...
│ └ <yolox.core.trainer.Trainer object at 0x7f1b092c2070>
└ <function all_reduce_norm at 0x7f1bd6678940>

File "/data/zhangyong/workspace/detect/YOLOX-main/yolox/utils/allreduce_norm.py", line 99, in all_reduce_norm
states = all_reduce(states, op="mean")
│ └ OrderedDict([('module.backbone.backbone.stem.conv.bn.weight', tensor([0.7139, 0.7288, 0.4413, 1.3903, 0.7023, 0.4169, 1.4530,...
└ <function all_reduce at 0x7f1bd66788b0>

File "/data/zhangyong/workspace/detect/YOLOX-main/yolox/utils/allreduce_norm.py", line 68, in all_reduce
group = _get_global_gloo_group()
└ <functools._lru_cache_wrapper object at 0x7f1bd66783a0>

File "/data/zhangyong/workspace/detect/YOLOX-main/yolox/utils/dist.py", line 103, in _get_global_gloo_group
return dist.new_group(backend="gloo")
│ └ <function new_group at 0x7f1bd6cfc940>
└ <module 'torch.distributed' from '/root/anaconda3/envs/yolox2/lib/python3.8/site-packages/torch/distributed/__init__.py'>

File "/root/anaconda3/envs/yolox2/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2032, in new_group
pg = _new_process_group_helper(group_world_size,
│ └ 4
└ <function _new_process_group_helper at 0x7f1bd6cfb790>
File "/root/anaconda3/envs/yolox2/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 517, in _new_process_group_helper
pg = ProcessGroupGloo(
└ <class 'torch.distributed.ProcessGroupGloo'>

RuntimeError: [enforce fail at /pytorch/third_party/gloo/gloo/transport/tcp/device.cc:83] ifa != nullptr. Unable to find address for: ib0

Question about inference speed

YOLOv5's official inference speed:

[screenshot]

YOLOv5's inference speed as reported in the paper:

[screenshot]
Both are measured on a V100, so why do they differ this much?! Could someone explain?

My running outputs remain the same for a long time

Hi, when I run the code, the output stays the same and it doesn't continue running.

2021-07-21 12:24:37 | INFO | yolox.core.trainer:130 - Model Summary: Params: 99.00M, Gflops: 281.52
2021-07-21 12:24:37 | INFO | apex.amp.frontend:328 - Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
2021-07-21 12:24:37 | INFO | apex.amp.frontend:329 - Defaults for this optimization level are:
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - enabled : True
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - opt_level : O1
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - cast_model_type : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - patch_torch_functions : True
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - keep_batchnorm_fp32 : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - master_weights : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:331 - loss_scale : dynamic
2021-07-21 12:24:37 | INFO | apex.amp.frontend:336 - Processing user overrides (additional kwargs that are not None)...
2021-07-21 12:24:37 | INFO | apex.amp.frontend:354 - After processing overrides, optimization options are:
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - enabled : True
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - opt_level : O1
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - cast_model_type : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - patch_torch_functions : True
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - keep_batchnorm_fp32 : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - master_weights : None
2021-07-21 12:24:37 | INFO | apex.amp.frontend:356 - loss_scale : dynamic
2021-07-21 12:24:37 | INFO | apex.amp.scaler:69 - Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
2021-07-21 12:24:37 | INFO | yolox.core.trainer:283 - loading checkpoint for fine tuning
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.0.weight in checkpoint is torch.Size([80, 320, 1, 1]), while shape of head.cls_preds.0.weight in model is torch.Size([3, 320, 1, 1]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.0.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.0.bias in model is torch.Size([3]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.1.weight in checkpoint is torch.Size([80, 320, 1, 1]), while shape of head.cls_preds.1.weight in model is torch.Size([3, 320, 1, 1]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.1.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.1.bias in model is torch.Size([3]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.2.weight in checkpoint is torch.Size([80, 320, 1, 1]), while shape of head.cls_preds.2.weight in model is torch.Size([3, 320, 1, 1]).
2021-07-21 12:24:38 | WARNING | yolox.utils.checkpoint:26 - Shape of head.cls_preds.2.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.2.bias in model is torch.Size([3]).
2021-07-21 12:24:38 | INFO | yolox.data.datasets.coco:44 - loading annotations into memory...
2021-07-21 12:24:38 | INFO | yolox.data.datasets.coco:44 - Done (t=0.28s)
2021-07-21 12:24:38 | INFO | pycocotools.coco:92 - creating index...
2021-07-21 12:24:38 | INFO | pycocotools.coco:92 - index created!
2021-07-21 12:24:38 | INFO | yolox.core.trainer:149 - init prefetcher, this might take one minute or less...

What's the problem?

detector

A minimal detector example is necessary for YOLOX to become more popular.
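A hedged sketch of what such a minimal example could look like, using the helpers that tools/demo.py relied on at the time (get_exp, preproc, postprocess); the exact signatures, the checkpoint filename, and the mean/std values are assumptions taken from that demo, not an official API:

    import cv2
    import torch

    from yolox.exp import get_exp
    from yolox.data.data_augment import preproc
    from yolox.utils import postprocess

    exp = get_exp(None, "yolox-s")                       # built-in experiment by name
    model = exp.get_model().eval()
    ckpt = torch.load("yolox_s.pth.tar", map_location="cpu")
    model.load_state_dict(ckpt["model"])

    img = cv2.imread("assets/dog.jpg")
    # mean/std as used by tools/demo.py then; newer preproc versions drop these args
    img_in, ratio = preproc(img, exp.test_size, (0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    with torch.no_grad():
        outputs = model(torch.from_numpy(img_in).unsqueeze(0).float())
        outputs = postprocess(outputs, exp.num_classes, exp.test_conf, exp.nmsthre)
    # outputs[0] is None if nothing passes the thresholds, otherwise rows of
    # [x1, y1, x2, y2, obj_conf, cls_conf, cls]; divide boxes by `ratio` to map back
    print(outputs[0])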

How to remove apex?

Hello, could you please help me remove apex? I ran into difficulties when compiling apex.
Thanks!
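One hedged approach (not an official patch): replace apex.amp with PyTorch's built-in torch.cuda.amp, which needs no extra compilation. The loop below is only a sketch; model, optimizer, dataloader and use_fp16 are placeholders, and the "total_loss" key is assumed from the trainer's loss dict:

    import torch

    scaler = torch.cuda.amp.GradScaler(enabled=use_fp16)      # use_fp16: your flag

    for inps, targets in dataloader:                           # your training loop
        with torch.cuda.amp.autocast(enabled=use_fp16):
            outputs = model(inps, targets)
            loss = outputs["total_loss"]                       # assumed loss-dict key
        optimizer.zero_grad()
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()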

multiclass_nms post-processing error

Traceback (most recent call last):
File "D:/coding/YOLO-stream/yolox-ort.py", line 178, in
yolox_detect(args.model, args.names, frame)
File "D:/coding/YOLO-stream/yolox-ort.py", line 153, in yolox_detect
dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.65, score_thr=0.1)
File "D:/coding/YOLO-stream/yolox-ort.py", line 103, in multiclass_nms
return np.concatenate(final_dets, 0)
File "<array_function internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
When I run ONNX Runtime (ort) inference, the bug above is triggered whenever the camera sees no objects (or only a single object). I suggest changing the post-processing so NMS handles these cases directly.
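A hedged guard for this crash, assuming final_dets is the list built inside the demo's multiclass_nms helper right before the failing call; returning None (and checking for it in the caller) avoids concatenating an empty list:

    import numpy as np

    # ... inside multiclass_nms, after per-class filtering has filled final_dets ...
    if len(final_dets) == 0:
        return None                    # caller should check for None before drawing
    return np.concatenate(final_dets, 0)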

no module named yolox

Traceback (most recent call last):
  File "./tools/demo.py", line 15, in <module>
    from yolox.data.data_augment import preproc
ModuleNotFoundError: No module named 'yolox'

How can I import yolox?
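Two common remedies: install the package in development mode (pip3 install -v -e . from the repository root), or, as a hedged workaround sketch, put the repository root on sys.path before the import; the path below is only an example:

    import sys

    sys.path.insert(0, "/path/to/YOLOX")           # directory containing the yolox/ package

    from yolox.data.data_augment import preproc   # should now resolve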

The video demo produces no results

The image demo saves the dog result, but the video output is blank.

~/dev/YOLOX$ python3 tools/demo.py video -n yolox-s -c pretrained_models/yolox_s.pth.tar --path /assets/ch14_0616-0625.mp4 --conf 0.3 --nms 0.65 --tsize 640 --save_result
2021-07-22 14:34:23 | INFO | __main__:219 - Args: Namespace(camid=0, ckpt='pretrained_models/yolox_s.pth.tar', conf=0.3, demo='video', exp_file=None, experiment_name='yolox_s', fp16=False, fuse=False, name='yolox-s', nms=0.65, path='/assets/ch14_0616-0625.mp4', save_result=True, trt=False, tsize=640)
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
2021-07-22 14:34:23 | INFO | __main__:229 - Model Summary: Params: 8.97M, Gflops: 26.81
2021-07-22 14:34:28 | INFO | __main__:240 - loading checkpoint
2021-07-22 14:34:29 | INFO | __main__:245 - loaded checkpoint done.
2021-07-22 14:34:29 | INFO | __main__:183 - video save_path is ./YOLOX_outputs/yolox_s/vis_res/2021_07_22_14_34_29/ch14_0616-0625.mp4
However, when I check that directory, there is no mp4 file.

Cannot create Swish layer Mul_43 id:38

When running the OpenVINO C++ demo, the following error occurred:
'''
Cannot create Swish layer Mul_43 id:38
C:\j\workspace\private-ci\ie\build-windows-icc2018\b\repos\closed-dldt\inference-engine\src\inference_engine\ie_ir_parser.cpp:424
C:\j\workspace\private-ci\ie\build-windows-icc2018\b\repos\closed-dldt\inference-engine\src\inference_engine\ie_core.cpp:493
'''

Env:
windows10, vs2019, openvino 2020.3.194

Model used:
YOLOX-S-nano, YOLOX-S downloaded from:
https://github.com/Megvii-BaseDetection/YOLOX/tree/main/demo/OpenVINO/cpp

Do I need to convert the model from ONNX format to the OpenVINO format myself?

Segmentation fault (core dumped)

Hi, thanks for open-sourcing the code! When I train on my own VOC-style dataset, the training process is normal, but the evaluation step raises 'Segmentation fault (core dumped)'. It seems to be a val_dataloader problem.

Problem with the demo results

Hi, I downloaded the official yolox-nano model and ran the demo program on dog.jpg. The output is slightly off: there is one extra detection. The command is:
python tools/demo.py image -n yolox-nano -c ./models/yolox_nano.pth.tar --path assets/dog.jpg --conf 0.3 --nms 0.65 --tsize 416 --save_result
The displayed result is shown in the screenshot below:
[screenshot]
The printed outputs are:
tensor([[ 71.5272, 116.8364, 172.1584, 295.3528, 0.9587, 0.8872, 16.0000], [ 60.9691, 75.9739, 307.4900, 231.2223, 0.9442, 0.8800, 1.0000], [252.5838, 41.6913, 375.4042, 91.2227, 0.7677, 0.8870, 2.0000], [252.0094, 41.5889, 375.6715, 93.2385, 0.4352, 0.8493, 7.0000]], device='cuda:0')
Another strange thing: after converting the PyTorch model to ncnn, the detection results are normal. I used the official yolox.cpp directly with the same parameter settings as above; the detection results are shown below:
[screenshot]
16 = 0.84702 at 132.07 215.84 185.92 x 329.64
1 = 0.82856 at 113.21 140.14 454.67 x 286.03
2 = 0.68373 at 466.35 77.03 226.64 x 91.41
Thank you very much for this project!

An error is reported when running the demo

resource.setrlimit(resource.RLIMIT_NOFILE, (ulimit_value, rlimit[1]))
ValueError: current limit exceeds maximum limit

How can this problem be solved?
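For context (not from the thread): this ValueError means the requested soft RLIMIT_NOFILE exceeds the hard limit allowed for the user. A hedged sketch of the usual fix is to clamp the requested value to the hard limit; the target value below is only an example:

    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    desired = 8192                                   # example target, not a recommendation
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(desired, hard), hard))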

comments on configs

Hi,

thank you for sharing such wonderful work. I would appreciate it if you could add comments for some of the parameters in the config files and the hyperparameters in the model structure. They would be really helpful for beginners.

Best

Distributed worker error during training caused by multiprocess connection error

I was training yolox-s on eight 2080 GPUs with batch size 64. Each time, the issue happens at epoch 10/300.
The error log is as follows:

Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd
fd = df.detach()
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/connection.py", line 508, in Client
answer_challenge(c, authkey)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/connection.py", line 752, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
raise EOFError

EOFError

Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd
fd = df.detach()
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/connection.py", line 509, in Client
deliver_challenge(c, authkey)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/connection.py", line 740, in deliver_challenge
response = connection.recv_bytes(256) # reject large message
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd
fd = df.detach()
File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
2021-07-21 09:11:10 | ERROR | yolox.core.launch:104 - An error has been caught in function '_distributed_worker', process 'SpawnProcess-1' (28776), thread 'MainThread' (140335833473600):
Traceback (most recent call last):

  File "<string>", line 1, in <module>
  File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               │     │   └ 3
               │     └ 37
               └ <function _main at 0x7fa27a2465e0>
  File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/spawn.py", line 129, in _main
    return self._bootstrap(parent_sentinel)
           │    │          └ 3
           │    └ <function BaseProcess._bootstrap at 0x7fa27a369820>
           └ <SpawnProcess name='SpawnProcess-1' parent=28716 started>
  File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
    │    └ <function BaseProcess.run at 0x7fa27a37fe50>
    └ <SpawnProcess name='SpawnProcess-1' parent=28716 started>
  File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <SpawnProcess name='SpawnProcess-1' parent=28716 started>
    │    │        │    └ (<function _distributed_worker at 0x7fa0130f5550>, 0, (<function main at 0x7fa00a620af0>, 8, 8, 0, 'nccl', 'tcp://127.0.0.1:4...
    │    │        └ <SpawnProcess name='SpawnProcess-1' parent=28716 started>
    │    └ <function _wrap at 0x7fa01731edc0>
    └ <SpawnProcess name='SpawnProcess-1' parent=28716 started>
  File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
    │  │   └ (<function main at 0x7fa00a620af0>, 8, 8, 0, 'nccl', 'tcp://127.0.0.1:48781', (╒══════════════════╤══════════════════════════...
    │  └ 0
    └ <function _distributed_worker at 0x7fa0130f5550>

> File "/home/hdzhang/YOLOX/yolox/core/launch.py", line 104, in _distributed_worker
    main_func(*args)
    │          └ (╒══════════════════╤════════════════════════════════════════════════════════════════════════════════════════════════════════...
    └ <function main at 0x7fa00a620af0>

  File "/home/hdzhang/YOLOX/tools/train.py", line 101, in main
    trainer.train()
    │       └ <function Trainer.train at 0x7fa01767d5e0>
    └ <yolox.core.trainer.Trainer object at 0x7fa00a635d30>

  File "/home/hdzhang/YOLOX/yolox/core/trainer.py", line 70, in train
    self.train_in_epoch()
    │    └ <function Trainer.train_in_epoch at 0x7fa00a8e8310>
    └ <yolox.core.trainer.Trainer object at 0x7fa00a635d30>

  File "/home/hdzhang/YOLOX/yolox/core/trainer.py", line 80, in train_in_epoch
    self.after_epoch()
    │    └ <function Trainer.after_epoch at 0x7fa00a6005e0>
    └ <yolox.core.trainer.Trainer object at 0x7fa00a635d30>

  File "/home/hdzhang/YOLOX/yolox/core/trainer.py", line 209, in after_epoch
    all_reduce_norm(self.model)
    │               │    └ DistributedDataParallel(
    │               │        (module): YOLOX(
    │               │          (backbone): YOLOPAFPN(
    │               │            (backbone): CSPDarknet(
    │               │              (stem): Focus(
    │               │       ...
    │               └ <yolox.core.trainer.Trainer object at 0x7fa00a635d30>
    └ <function all_reduce_norm at 0x7fa016ef95e0>

  File "/home/hdzhang/YOLOX/yolox/utils/allreduce_norm.py", line 99, in all_reduce_norm
    states = all_reduce(states, op="mean")
             │          └ OrderedDict([('module.backbone.backbone.stem.conv.bn.weight', tensor([1.4156, 2.5198, 2.6882, 1.5280, 3.4103, 2.3906, 2.5711,...
             └ <function all_reduce at 0x7fa016ef9550>

  File "/home/hdzhang/YOLOX/yolox/utils/allreduce_norm.py", line 68, in all_reduce
    group = _get_global_gloo_group()
            └ <functools._lru_cache_wrapper object at 0x7fa016ef9040>

  File "/home/hdzhang/YOLOX/yolox/utils/dist.py", line 103, in _get_global_gloo_group
    return dist.new_group(backend="gloo")
           │    └ <function new_group at 0x7fa017800820>
           └ <module 'torch.distributed' from '/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/site-packages/torch/distributed/__ini...

  File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2694, in new_group
    pg = _new_process_group_helper(group_world_size,
         │                         └ 8
         └ <function _new_process_group_helper at 0x7fa0177ff1f0>
  File "/home/hdzhang/miniconda3/envs/centernet/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 616, in _new_process_group_helper
    pg = ProcessGroupGloo(
         └ <class 'torch._C._distributed_c10d.ProcessGroupGloo'>

Variable names in Focus Layer

Thanks for sharing your work!

patch_top_left = x[..., ::2, ::2]
patch_top_right = x[..., ::2, 1::2]
patch_bot_left = x[..., 1::2, ::2]
patch_bot_right = x[..., 1::2, 1::2]

I think these variable names can cause misunderstanding. The tensors produced by this slicing are not top-left, top-right, bottom-left, and bottom-right patches; the slicing selects elements whose y and x indices are even or odd.

It's hard to name these values... I tried to think of the best names for these variables, but I failed.

 tensor_even_y_even_x = x[..., ::2, ::2] 
 tensor_even_y_odd_x = x[..., ::2, 1::2] 
 tensor_odd_y_even_x = x[..., 1::2, ::2] 
 tensor_odd_y_odd_x = x[..., 1::2, 1::2] 
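A tiny, purely illustrative demonstration of what these strided slices actually select, which supports the even/odd naming above:

    import torch

    x = torch.arange(16).reshape(1, 1, 4, 4)   # one 4x4 channel, values 0..15
    print(x[..., ::2, ::2])    # rows 0,2 / cols 0,2 -> even-y, even-x samples
    print(x[..., ::2, 1::2])   # rows 0,2 / cols 1,3 -> even-y, odd-x samples
    # Each result is a 2x2 grid sampled across the whole image, not a spatial
    # "top-left" or "top-right" quadrant of the input.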
