train error about centernet (open)

duankaiwen avatar duankaiwen commented on September 13, 2024
train error

from centernet.

Comments (56)

Duankaiwen avatar Duankaiwen commented on September 13, 2024 1

The CUDA version may be too high; try CUDA 8.0 or CUDA 9.0. See this: sangwoomo/instagan#4
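
For reference, a quick way to confirm what is actually installed (a minimal sketch; the thread suggests an older CUDA such as 8.0/9.0 and an older PyTorch such as 0.4.1, so compare the printed values against the repository's requirements):

import torch
print(torch.__version__)               # installed PyTorch version
print(torch.version.cuda)              # CUDA version PyTorch was built against
print(torch.backends.cudnn.version())  # cuDNN version
print(torch.cuda.is_available())       # True if the GPU is usable from PyTorch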

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

@Duankaiwen Thank you!

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

Can I see your full log?

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

loading all datasets...
using 4 threads
loading from cache file: cache/coco_trainval2014.pkl
No cache file found...
loading annotations into memory...
Done (t=25.19s)
creating index...
index created!
118287it [01:18, 1509.02it/s]
loading annotations into memory...
Done (t=20.50s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=18.08s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=20.29s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=23.57s)
creating index...
index created!
loading from cache file: cache/coco_minival2014.pkl
No cache file found...
loading annotations into memory...
Done (t=1.26s)
creating index...
index created!
5000it [00:03, 1478.28it/s]
loading annotations into memory...
Done (t=0.61s)
creating index...
index created!
system config...
{'batch_size': 48,
'cache_dir': 'cache',
'chunk_sizes': [6, 6, 6, 6, 6, 6, 6, 6],
'config_dir': 'config',
'data_dir': './data',
'data_rng': <mtrand.RandomState object at 0x7fac46fc3870>,
'dataset': 'MSCOCO',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 480000,
'nnet_rng': <mtrand.RandomState object at 0x7fac46fc38b8>,
'opt_algo': 'adam',
'prefetch_size': 6,
'pretrain': None,
'result_dir': 'results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CenterNet-104',
'stepsize': 450000,
'test_split': 'testdev',
'train_split': 'trainval',
'val_iter': 500,
'val_split': 'minival',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 80,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.7,
'gaussian_radius': -1,
'input_size': [511, 511],
'kp_categories': 1,
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 70,
'weight_exp': 8}
len of db: 118287
start prefetching data...
start prefetching data...
shuffling indices...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
building model...
module_file: models.CenterNet-104
start prefetching data...
shuffling indices...
total parameters: 210062960
setting learning rate to: 0.00025
training start...
0%| | 0/480000 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 203, in <module>
train(training_dbs, validation_db, args.start_iter)
File "train.py", line 138, in train
training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
loss_kp = self.network(xs, ys)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
return scatter_map(inputs)
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
return list(map(list, zip(*map(scatter_map, obj))))
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error (10): invalid device ordinal (check_status at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: torch::cuda::scatter(at::Tensor const&, at::ArrayRef, at::optional<std::vector<long, std::allocator > > const&, long, at::optional<std::vector<CUDAStreamInternals*, std::allocator<CUDAStreamInternals*> > > const&) + 0x4e1 (0x7fac77038871 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: + 0xc42a0b (0x7fac77040a0b in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x38a5cb (0x7fac767885cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #13: THPFunction_apply(_object*, _object*) + 0x38f (0x7fac76b66a2f in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

How many GPUs do you have?

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

I have put the val set into the training set as you said, but this error occurred.
Thank you for helping me @Duankaiwen

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

16G

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

How many GPUs, not GPU memory?

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

Sorry, sorry, 8 GB.

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

Only one, a 2070.

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2]
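
For reference, a minimal consistency check along these lines (a sketch only, not part of the repository; the config path is assumed): the length of 'chunk_sizes' must not exceed the number of visible GPUs, and its sum must equal 'batch_size'.

import json
import torch

with open("config/CenterNet-104.json") as f:
    system = json.load(f)["system"]

batch_size  = system["batch_size"]
chunk_sizes = system["chunk_sizes"]
num_gpus    = torch.cuda.device_count()

# Too many entries in chunk_sizes -> "invalid device ordinal";
# sum(chunk_sizes) != batch_size -> "chunk sizes don't sum up to the tensor's size".
assert len(chunk_sizes) <= num_gpus, (len(chunk_sizes), num_gpus)
assert sum(chunk_sizes) == batch_size, (sum(chunk_sizes), batch_size)
print("config looks consistent:", batch_size, chunk_sizes)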

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

thank you

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

Best wishes to you.
I have tried it.

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

Traceback (most recent call last):
File "train.py", line 203, in <module>
train(training_dbs, validation_db, args.start_iter)
File "train.py", line 138, in train
training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
loss_kp = self.network(xs, ys)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
return scatter_map(inputs)
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
return list(map(list, zip(*map(scatter_map, obj))))
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: given chunk sizes don't sum up to the tensor's size (sum(chunk_sizes) == 16, but expected 2) (scatter at torch/csrc/cuda/comm.cpp:135)
frame #0: + 0xc42a0b (0x7f94eb489a0b in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: + 0x38a5cb (0x7f94eabd15cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #12: THPFunction_apply(_object*, _object*) + 0x38f (0x7f94eafafa2f in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

Ah, I am going crazy now.
It still doesn't work. God, help me.

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

Can I see your config/CenterNet-104.json?

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

{
"system": {
"dataset": "MSCOCO",
"batch_size": 48,
"sampling_function": "kp_detection",

    "train_split": "trainval",
    "val_split": "minival",

    "learning_rate": 0.00025,
    "decay_rate": 10,

    "val_iter": 500,

    "opt_algo": "adam",
    "prefetch_size": 6,

    "max_iter": 480000,
    "stepsize": 450000,
    "snapshot": 5000,

    "chunk_sizes": [6,6,6,6,6,6,6,6],

    "data_dir": "./data"
},

"db": {
    "rand_scale_min": 0.6,
    "rand_scale_max": 1.4,
    "rand_scale_step": 0.1,
    "rand_scales": null,

    "rand_crop": true,
    "rand_color": true,

    "border": 128,
    "gaussian_bump": true,

    "input_size": [511, 511],
    "output_sizes": [[128, 128]],

    "test_scales": [1],

    "top_k": 70,
    "categories": 80,
    "kp_categories": 1,
    "ae_threshold": 0.5,
    "nms_threshold": 0.5,

    "max_per_image": 100
}

}

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

I showed you the original; I have also modified it, but it still doesn't work.
Do you know Chinese?

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

Can I see your own config/CenterNet-104.json?

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

I don't have my own config; this is the one I downloaded.

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

This is what I used for training.

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

You said you have modified config/CenterNet-104.json, and I want to know what the modified file looks like. The log shows that there are some errors related to config/CenterNet-104.json. I need to know the details of config/CenterNet-104.json to help you.

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

{
"system": {
"dataset": "MSCOCO",
"batch_size": 2,
"sampling_function": "kp_detection",

    "train_split": "trainval",
    "val_split": "minival",

    "learning_rate": 0.00025,
    "decay_rate": 10,

    "val_iter": 500,

    "opt_algo": "adam",
    "prefetch_size": 6,

    "max_iter": 480000,
    "stepsize": 450000,
    "snapshot": 5000,

    "chunk_sizes": [2,2,2,2,2,2,2,2],

    "data_dir": "./data"
},

"db": {
    "rand_scale_min": 0.6,
    "rand_scale_max": 1.4,
    "rand_scale_step": 0.1,
    "rand_scales": null,

    "rand_crop": true,
    "rand_color": true,

    "border": 128,
    "gaussian_bump": true,

    "input_size": [511, 511],
    "output_sizes": [[128, 128]],

    "test_scales": [1],

    "top_k": 70,
    "categories": 80,
    "kp_categories": 1,
    "ae_threshold": 0.5,
    "nms_threshold": 0.5,

    "max_per_image": 100
}

}

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

This is what it looks like after I modified it.

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

Modify 'chunk_sizes' to [2], not [2,2,2,2,2,2,2,2]
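
For illustration only, the arithmetic behind the error above: the scatter checks that the chunk sizes sum to the batch size.

batch_size  = 2
chunk_sizes = [2, 2, 2, 2, 2, 2, 2, 2]
print(sum(chunk_sizes) == batch_size)   # False: sum is 16, but the batch only has 2 samples
chunk_sizes = [2]
print(sum(chunk_sizes) == batch_size)   # True: the single-GPU setting that works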

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

OK, OK, thank you!

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

You are really enthusiastic, thank you.

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

loading all datasets...
using 4 threads
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=24.58s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=19.16s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=18.12s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=23.61s)
creating index...
index created!
loading from cache file: cache/coco_minival2014.pkl
loading annotations into memory...
Done (t=0.78s)
creating index...
index created!
system config...
{'batch_size': 2,
'cache_dir': 'cache',
'chunk_sizes': [2],
'config_dir': 'config',
'data_dir': './data',
'data_rng': <mtrand.RandomState object at 0x7f75971ee900>,
'dataset': 'MSCOCO',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 480000,
'nnet_rng': <mtrand.RandomState object at 0x7f75971ee948>,
'opt_algo': 'adam',
'prefetch_size': 6,
'pretrain': None,
'result_dir': 'results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CenterNet-104',
'stepsize': 450000,
'test_split': 'testdev',
'train_split': 'trainval',
'val_iter': 500,
'val_split': 'minival',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 80,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.7,
'gaussian_radius': -1,
'input_size': [511, 511],
'kp_categories': 1,
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 70,
'weight_exp': 8}
len of db: 118287
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
building model...
module_file: models.CenterNet-104
start prefetching data...
shuffling indices...
total parameters: 210062960
setting learning rate to: 0.00025
training start...
0%| | 0/480000 [00:00<?, ?it/s]THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument

Traceback (most recent call last):
File "train.py", line 203, in <module>
train(training_dbs, validation_db, args.start_iter)
File "train.py", line 138, in train
training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
loss_kp = self.network(xs, ys)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 68, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 20, in forward
preds = self.model(*xs, **kwargs)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 32, in forward
return self.module(*xs, **kwargs)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/CenterNet-master/models/py_utils/kp.py", line 289, in forward
return self._train(*xs, **kwargs)
File "/home/xinyu/CenterNet-master/models/py_utils/kp.py", line 193, in _train
inter = self.pre(image)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/CenterNet-master/models/py_utils/utils.py", line 14, in forward
conv = self.conv(x)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp:663

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

Hello, I followed what you said, but it still doesn't work.

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

What's your version of CUDA?

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

cuda 10

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

Hello author, I am bothering you again. I really appreciate your help yesterday. I changed CUDA to 9, but I still can't train; however, testing works.

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

loading all datasets...
using 4 threads
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=22.31s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=19.15s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=18.04s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=23.53s)
creating index...
index created!
loading from cache file: cache/coco_minival2014.pkl
loading annotations into memory...
Done (t=0.78s)
creating index...
index created!
system config...
{'batch_size': 1,
'cache_dir': 'cache',
'chunk_sizes': [1],
'config_dir': 'config',
'data_dir': './data',
'data_rng': <mtrand.RandomState object at 0x7fd20efd0870>,
'dataset': 'MSCOCO',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 480000,
'nnet_rng': <mtrand.RandomState object at 0x7fd20efd08b8>,
'opt_algo': 'adam',
'prefetch_size': 6,
'pretrain': None,
'result_dir': 'results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CenterNet-104',
'stepsize': 450000,
'test_split': 'testdev',
'train_split': 'trainval',
'val_iter': 500,
'val_split': 'minival',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 80,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.7,
'gaussian_radius': -1,
'input_size': [511, 511],
'kp_categories': 1,
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 70,
'weight_exp': 8}
len of db: 118287
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
building model...
module_file: models.CenterNet-104
start prefetching data...
shuffling indices...
total parameters: 210062960
setting learning rate to: 0.00025
training start...
0%| | 0/480000 [00:00<?, ?it/s]THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument

Traceback (most recent call last):
File "train.py", line 203, in <module>
train(training_dbs, validation_db, args.start_iter)
File "train.py", line 138, in train
training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
loss_kp = self.network(xs, ys)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 68, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 20, in forward
preds = self.model(*xs, **kwargs)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 32, in forward
return self.module(*xs, **kwargs)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/CenterNet-master/models/py_utils/kp.py", line 289, in forward
return self._train(*xs, **kwargs)
File "/home/xinyu/CenterNet-master/models/py_utils/kp.py", line 193, in _train
inter = self.pre(image)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/CenterNet-master/models/py_utils/utils.py", line 14, in forward
conv = self.conv(x)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp:663

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

loading parameters at iteration: 480000
building neural network...
module_file: models.CenterNet-104
total parameters: 210062960
loading parameters...
loading model from cache/nnet/CenterNet-104/CenterNet-104_480000.pkl
locating kps: 0%| | 0/5000 [00:00<?, ?it/s]THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
locating kps: 72%|██████████████████ | 3624/5000 [34:05<14:17, 1.60it/

This is the test log.

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

How about now?

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

RuntimeError: Expected object of type CUDAByteType but found type CUDAFloatType for argument #0 'result' (checked_cast_tensor at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/ATen/Utils.h:30)
frame #0: + 0xf5cb33 (0x7f961bb32b33 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/lib/libATen.so)
frame #1: at::CUDAFloatType::s_gt_out(at::Tensor&, at::Tensor const&, at::Tensor const&) const + 0x26 (0x7f961bb361c6 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/lib/libATen.so)
frame #2: torch::autograd::VariableType::s_gt_out(at::Tensor&, at::Tensor const&, at::Tensor const&) const + 0x19c (0x7f96351b7f2c in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #3: at::Type::gt_out(at::Tensor&, at::Tensor const&, at::Tensor const&) const + 0x118 (0x7f961bc8fdd8 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/lib/libATen.so)
frame #4: pool_backward(at::Tensor, at::Tensor) + 0x435 (0x7f95ed4c3315 in /home/xinyu/anaconda3/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/right_pool.cpython-36m-x86_64-linux-gnu.so)
frame #5: + 0x112cb (0x7f95ed4cd2cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/right_pool.cpython-36m-x86_64-linux-gnu.so)
frame #6: + 0x1149e (0x7f95ed4cd49e in /home/xinyu/anaconda3/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/right_pool.cpython-36m-x86_64-linux-gnu.so)
frame #7: + 0x11e1c (0x7f95ed4cde1c in /home/xinyu/anaconda3/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/right_pool.cpython-36m-x86_64-linux-gnu.so)
frame #8: _PyCFunction_FastCallDict + 0x154 (0x555e9c1c2744 in python)
frame #9: + 0x19842c (0x555e9c24942c in python)
frame #10: _PyEval_EvalFrameDefault + 0x30a (0x555e9c26e38a in python)
frame #11: PyEval_EvalCodeEx + 0x329 (0x555e9c244289 in python)
frame #12: + 0x194094 (0x555e9c245094 in python)
frame #13: PyObject_Call + 0x3e (0x555e9c1c254e in python)
frame #14: _PyEval_EvalFrameDefault + 0x19ec (0x555e9c26fa6c in python)
frame #15: + 0x1918e4 (0x555e9c2428e4 in python)
frame #16: _PyFunction_FastCallDict + 0x1bc (0x555e9c243c4c in python)
frame #17: _PyObject_FastCallDict + 0x26f (0x555e9c1c2b0f in python)
frame #18: _PyObject_Call_Prepend + 0x63 (0x555e9c1c76a3 in python)
frame #19: PyObject_Call + 0x3e (0x555e9c1c254e in python)
frame #20: torch::autograd::PyFunction::apply(std::vector<torch::autograd::Variable, std::allocator<torch::autograd::Variable> > const&) + 0x199 (0x7f9635197579 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #21: torch::autograd::Engine::evaluate_function(torch::autograd::FunctionTask&) + 0x1d1e (0x7f963518254e in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #22: torch::autograd::Engine::thread_main(torch::autograd::GraphTask*) + 0xe7 (0x7f9635182f17 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #23: torch::autograd::Engine::thread_init(int) + 0x72 (0x7f963517f822 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #24: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7f96351ad8aa in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #25: + 0xb8678 (0x7f96187ea678 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6)
frame #26: + 0x76ba (0x7f96454606ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #27: clone + 0x6d (0x7f964519641d in /lib/x86_64-linux-gnu/libc.so.6)

This error occurs and it still cannot run.

from centernet.

hheavenknowss avatar hheavenknowss commented on September 13, 2024

Hello, I have the same issue, and I have two GPUs. Do I need to change the config like above? If so, please tell me how. I have been troubled for 2 weeks; I'll be very appreciative if I can fix it.

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

please show your log

from centernet.

hheavenknowss avatar hheavenknowss commented on September 13, 2024

please show your log

Thank you for replying this fast, but something suddenly went wrong with my environment. I'll post it later. Thanks again.

from centernet.

hheavenknowss avatar hheavenknowss commented on September 13, 2024

please show your log

Thank you for replying this fast, but something suddenly went wrong with my environment. I'll post it later. Thanks again.

loading all datasets...
using 4 threads
loading from cache file: cache/coco_hepaticvessel_001.pkl
No cache file found...
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
49it [00:00, 36524.06it/s]
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading from cache file: cache/coco_hepaticvessel_001.pkl
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading from cache file: cache/coco_hepaticvessel_001.pkl
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading from cache file: cache/coco_hepaticvessel_001.pkl
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading from cache file: cache/coco_hepaticvessel_001.pkl
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
system config...
{'batch_size': 48,
'cache_dir': 'cache',
'chunk_sizes': [6, 6, 6, 6, 6, 6, 6, 6],
'config_dir': 'config',
'data_dir': './data',
'data_rng': <mtrand.RandomState object at 0x7f5fd2cc5ab0>,
'dataset': 'MSCOCO',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 480000,
'nnet_rng': <mtrand.RandomState object at 0x7f5fd2cc5af8>,
'opt_algo': 'adam',
'prefetch_size': 6,
'pretrain': None,
'result_dir': 'results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CenterNet-104',
'stepsize': 450000,
'test_split': 'testdev',
'train_split': 'trainval',
'val_iter': 500,
'val_split': 'minival',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 80,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.7,
'gaussian_radius': -1,
'input_size': [512, 512],
'kp_categories': 1,
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 70,
'weight_exp': 8}
len of db: 49
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
building model...
module_file: models.CenterNet-104
start prefetching data...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
total parameters: 210062960
setting learning rate to: 0.00025
training start...
0%| | 0/480000 [00:00<?, ?it/s]shuffling indices...
shuffling indices...

Traceback (most recent call last):
File "train.py", line 203, in <module>
train(training_dbs, validation_db, args.start_iter)
File "train.py", line 138, in train
training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
File "/dfsdata/pengxf2_data/CenterNet-master/nnet/py_factory.py", line 82, in train
loss_kp = self.network(xs, ys)
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
return scatter_map(inputs)
File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
return list(map(list, zip(*map(scatter_map, obj))))
File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
File "/usr/local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
File "/usr/local/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: Device index must be -1 or non-negative, got -1687419088 (Device at /pytorch/torch/lib/tmp_install/include/ATen/Device.h:47)
frame #0: + 0xc4964b (0x7f5f6517b64b in /usr/local/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: + 0x39120b (0x7f5f648c320b in /usr/local/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #12: THPFunction_apply(_object*, _object*) + 0x38f (0x7f5f64ca166f in /usr/local/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "train.py", line 51, in pin_memory
data = data_queue.get()
File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError

Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "train.py", line 51, in pin_memory
data = data_queue.get()
File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2]

from centernet.

hheavenknowss avatar hheavenknowss commented on September 13, 2024

Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2]

I'll try it, thank you. I've tried batch_size 8 and chunk_sizes [4,4] and it worked. I wonder if most of these issues are about the batch_size and chunk_sizes settings?

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

Yes

from centernet.

hheavenknowss avatar hheavenknowss commented on September 13, 2024

Yes

Thank you for your answer and patience

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

No problem

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

loading all datasets...
using 4 threads
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=2406.72s)
creating index...
index created!
Traceback (most recent call last):
File "train.py", line 193, in <module>
training_dbs = [datasets[dataset](configs["db"], train_split) for _ in range(threads)]
File "train.py", line 193, in <listcomp>
training_dbs = [datasets[dataset](configs["db"], train_split) for _ in range(threads)]
File "/home/zq/辛宇/CenterNet-master/db/coco.py", line 69, in __init__
self._load_coco_data()
File "/home/zq/辛宇/CenterNet-master/db/coco.py", line 85, in _load_coco_data
data = json.load(f)
File "/home/zq/anaconda3/envs/CenterNet/lib/python3.6/json/__init__.py", line 296, in load
return loads(fp.read(),
File "/home/zq/anaconda3/envs/CenterNet/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError

I'm sorry to disturb you again. When I run this code on a 1080 (CUDA 9 and torch 0.4.1), this happens. How can I solve it?

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

Try this:
cd /data/coco/PythonAPI
make

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

Thank you for answering my question late at night. I did what you said, but this happened.
python setup.py build_ext --inplace
running build_ext
skipping 'pycocotools/_mask.c' Cython extension (up-to-date)
building 'pycocotools._mask' extension
creating build
creating build/common
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/pycocotools
gcc -pthread -B /home/zq/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/zq/anaconda3/lib/python3.6/site-packages/numpy/core/include -I../common -I/home/zq/anaconda3/include/python3.6m -c ../common/maskApi.c -o build/temp.linux-x86_64-3.6/../common/maskApi.o -Wno-cpp -Wno-unused-function -std=c99
../common/maskApi.c: In function ‘rleToBbox’:
../common/maskApi.c:141:31: warning: ‘xp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if(j%2==0) xp=x; else if(xp<x) { ys=0; ye=h-1; }
^
gcc -pthread -B /home/zq/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/zq/anaconda3/lib/python3.6/site-packages/numpy/core/include -I../common -I/home/zq/anaconda3/include/python3.6m -c pycocotools/_mask.c -o build/temp.linux-x86_64-3.6/pycocotools/_mask.o -Wno-cpp -Wno-unused-function -std=c99
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/pycocotools
gcc -pthread -shared -B /home/zq/anaconda3/compiler_compat -L/home/zq/anaconda3/lib -Wl,-rpath=/home/zq/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/../common/maskApi.o build/temp.linux-x86_64-3.6/pycocotools/_mask.o -o build/lib.linux-x86_64-3.6/pycocotools/_mask.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/pycocotools/_mask.cpython-36m-x86_64-linux-gnu.so -> pycocotools
rm -rf build

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

This error occurred when I ran the program.

zq@zq-G1-SNIPER-B7:~/辛宇/CenterNet-master$ python train.py CornerNet
Traceback (most recent call last):
File "train.py", line 18, in <module>
from nnet.py_factory import NetworkFactory
File "/home/zq/辛宇/CenterNet-master/nnet/py_factory.py", line 8, in <module>
from models.py_utils.data_parallel import DataParallel
File "/home/zq/辛宇/CenterNet-master/models/py_utils/__init__.py", line 6, in <module>
from ._cpools import TopPool, BottomPool, LeftPool, RightPool
File "/home/zq/辛宇/CenterNet-master/models/py_utils/_cpools/__init__.py", line 8, in <module>
import top_pool, bottom_pool, left_pool, right_pool
ImportError: /home/zq/.local/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/top_pool.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN5torch4barfEPKcz

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

what's your torch version?

from centernet.

nuist-xinyu avatar nuist-xinyu commented on September 13, 2024

On this computer:

import torch
print(torch.__version__)
0.5.0a0+ce8e8fe

from centernet.

lolongcovas avatar lolongcovas commented on September 13, 2024

please, check here

from centernet.

Orange-Ocean-hh avatar Orange-Ocean-hh commented on September 13, 2024

Traceback (most recent call last):
File "train.py", line 43, in prefetch_data
data, ind = sample_data(db, ind, data_aug=data_aug)
File "/data/data/fxy/CenterNet-master/sample/coco.py", line 199, in sample_data
return globals()[system_configs.sampling_function](db, k_ind, data_aug, debug)
File "/data/data/fxy/CenterNet-master/sample/coco.py", line 99, in kp_detection
image, detections = random_crop(image, detections, rand_scales, input_size, border=border)
File "/data/data/fxy/CenterNet-master/sample/utils.py", line 57, in random_crop
image_height, image_width = image.shape[0:2]
AttributeError: 'NoneType' object has no attribute 'shape'
Hello, sorry to disturb. How can I fix this error? I'm not sure if there is any problem with 'shape'.
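
A minimal diagnostic sketch for this (the directory below is a placeholder; it assumes the images are loaded with cv2.imread, which returns None for a missing or unreadable file and later produces exactly this 'NoneType' error):

import os
import cv2

image_dir = "data/coco/images/trainval2014"   # adjust to your own layout
bad = [name for name in sorted(os.listdir(image_dir))
       if cv2.imread(os.path.join(image_dir, name)) is None]
print(len(bad), "unreadable image file(s)")
print(bad[:10])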

from centernet.

WuChannn avatar WuChannn commented on September 13, 2024

@Duankaiwen Hello, Kaiwen, could you please show where to specify the IDs of the GPUs used? Or will the code use all the GPUs automatically? Thank you.

from centernet.

Duankaiwen avatar Duankaiwen commented on September 13, 2024

@WuChannn Specifying the GPU ids is not supported, but you can specify 'chunk_sizes' and 'batch_size' in config/CenterNet-xxx.json, where the length of 'chunk_sizes' is the number of GPUs you will use and each item in 'chunk_sizes' is the batch size for the corresponding GPU. sum(chunk_sizes) should be equal to 'batch_size'.
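
A small illustration of that rule (example values only, e.g. two GPUs with 'batch_size' 8 and 'chunk_sizes' [4, 4], which was reported working earlier in this thread):

import torch

batch_size  = 8
chunk_sizes = [4, 4]                    # one entry per GPU = that GPU's share of the batch
assert sum(chunk_sizes) == batch_size   # required, otherwise the scatter raises

images = torch.randn(batch_size, 3, 511, 511)
per_gpu = torch.split(images, chunk_sizes, dim=0)   # roughly what the custom scatter does
print([chunk.shape[0] for chunk in per_gpu])        # [4, 4]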

from centernet.

WuChannn avatar WuChannn commented on September 13, 2024

from centernet.
