Comments (5)

Duankaiwen commented on September 13, 2024

@li10141110 CUDA 9 is OK. Try reducing 'batch_size' (currently 48) and 'chunk_sizes' (currently [6, 6, 6, 6, 6, 6, 6, 6]).

from centernet.

li10141110 commented on September 13, 2024

Thank you for your reply!

li10141110 commented on September 13, 2024

@Duankaiwen Thank you for your time, but after reducing 'batch_size' to 8 and 'chunk_sizes' to [1, 1, 1, 1, 1, 1, 1, 1] I got the following error:
##################################################
/home/imdl/anaconda3/envs/centernet/bin/python /home/imdl/workspace/CenterNet-master_train/train.py CenterNet-104
loading all datasets...
using 1 threads
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=10.99s)
creating index...
index created!
loading from cache file: cache/coco_minival2014.pkl
loading annotations into memory...
Done (t=0.31s)
creating index...
index created!
system config...
{'batch_size': 8,
'cache_dir': 'cache',
'chunk_sizes': [1, 1, 1, 1, 1, 1, 1, 1],
'config_dir': 'config',
'data_dir': '../data',
'data_rng': <mtrand.RandomState object at 0x7faa115bb870>,
'dataset': 'MSCOCO',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 480000,
'nnet_rng': <mtrand.RandomState object at 0x7faa115bb8b8>,
'opt_algo': 'adam',
'prefetch_size': 6,
'pretrain': None,
'result_dir': 'results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CenterNet-104',
'stepsize': 450000,
'test_split': 'testdev',
'train_split': 'trainval',
'val_iter': 500,
'val_split': 'minival',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 80,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.7,
'gaussian_radius': -1,
'input_size': [511, 511],
'kp_categories': 1,
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 70,
'weight_exp': 8}
len of db: 118287
start prefetching data...
building model...
module_file: models.CenterNet-104
start prefetching data...
shuffling indices...
shuffling indices...
total parameters: 210062960
setting learning rate to: 0.00025
training start...
0%| | 0/480000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/imdl/workspace/CenterNet-master_train/train.py", line 204, in <module>
train(training_dbs, validation_db, args.start_iter)
File "/home/imdl/workspace/CenterNet-master_train/train.py", line 138, in train
training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
File "/home/imdl/workspace/CenterNet-master_train/nnet/py_factory.py", line 82, in train
loss_kp = self.network(xs, ys)
File "/home/imdl/anaconda3/envs/centernet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/imdl/workspace/CenterNet-master_train/models/py_utils/data_parallel.py", line 66, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
File "/home/imdl/workspace/CenterNet-master_train/models/py_utils/data_parallel.py", line 77, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
File "/home/imdl/workspace/CenterNet-master_train/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
File "/home/imdl/workspace/CenterNet-master_train/models/py_utils/scatter_gather.py", line 25, in scatter
return scatter_map(inputs)
File "/home/imdl/workspace/CenterNet-master_train/models/py_utils/scatter_gather.py", line 18, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/home/imdl/workspace/CenterNet-master_train/models/py_utils/scatter_gather.py", line 20, in scatter_map
return list(map(list, zip(*map(scatter_map, obj))))
File "/home/imdl/workspace/CenterNet-master_train/models/py_utils/scatter_gather.py", line 15, in scatter_map
return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
File "/home/imdl/anaconda3/envs/centernet/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
File "/home/imdl/anaconda3/envs/centernet/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error (10): invalid device ordinal (check_status at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: torch::cuda::scatter(at::Tensor const&, at::ArrayRef, at::optional<std::vector<long, std::allocator > > const&, long, at::optional<std::vector<CUDAStreamInternals*, std::allocator<CUDAStreamInternals*> > > const&) + 0x4e1 (0x7fa9f42b9871 in /home/imdl/anaconda3/envs/centernet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: + 0xc42a0b (0x7fa9f42c1a0b in /home/imdl/anaconda3/envs/centernet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x38a5cb (0x7fa9f3a095cb in /home/imdl/anaconda3/envs/centernet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #3: _PyCFunction_FastCallDict + 0x154 (0x556dd444e7c4 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #4: + 0x19c10c (0x556dd44dc10c in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #5: _PyEval_EvalFrameDefault + 0x30a (0x556dd450041a in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #6: + 0x1950a6 (0x556dd44d50a6 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #7: + 0x1960e1 (0x556dd44d60e1 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #8: + 0x19c1e5 (0x556dd44dc1e5 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #9: _PyEval_EvalFrameDefault + 0x30a (0x556dd450041a in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #10: PyEval_EvalCodeEx + 0x329 (0x556dd44d6bf9 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #11: + 0x197a14 (0x556dd44d7a14 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #12: PyObject_Call + 0x3e (0x556dd444e5ce in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #13: THPFunction_apply(_object*, _object*) + 0x38f (0x7fa9f3de7a2f in /home/imdl/anaconda3/envs/centernet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #14: _PyCFunction_FastCallDict + 0x91 (0x556dd444e701 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #15: + 0x19c10c (0x556dd44dc10c in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #16: _PyEval_EvalFrameDefault + 0x30a (0x556dd450041a in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #17: + 0x1954ce (0x556dd44d54ce in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #18: _PyFunction_FastCallDict + 0x1bb (0x556dd44d65bb in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #19: _PyObject_FastCallDict + 0x26f (0x556dd444eb8f in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #20: + 0x129e32 (0x556dd4469e32 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #21: PyIter_Next + 0xe (0x556dd449275e in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #22: PySequence_Tuple + 0xf9 (0x556dd4497519 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #23: _PyEval_EvalFrameDefault + 0x549b (0x556dd45055ab in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #24: + 0x1954ce (0x556dd44d54ce in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #25: _PyFunction_FastCallDict + 0x1bb (0x556dd44d65bb in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #26: _PyObject_FastCallDict + 0x26f (0x556dd444eb8f in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #27: + 0x129e32 (0x556dd4469e32 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #28: PyIter_Next + 0xe (0x556dd449275e in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #29: PySequence_Tuple + 0xf9 (0x556dd4497519 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #30: _PyEval_EvalFrameDefault + 0x549b (0x556dd45055ab in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #31: + 0x1954ce (0x556dd44d54ce in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #32: + 0x1960e1 (0x556dd44d60e1 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #33: + 0x19c1e5 (0x556dd44dc1e5 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #34: _PyEval_EvalFrameDefault + 0x30a (0x556dd450041a in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #35: + 0x1954ce (0x556dd44d54ce in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #36: + 0x1960e1 (0x556dd44d60e1 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #37: + 0x19c1e5 (0x556dd44dc1e5 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #38: _PyEval_EvalFrameDefault + 0x30a (0x556dd450041a in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #39: + 0x1950a6 (0x556dd44d50a6 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #40: + 0x1960e1 (0x556dd44d60e1 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #41: + 0x19c1e5 (0x556dd44dc1e5 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #42: _PyEval_EvalFrameDefault + 0x10bb (0x556dd45011cb in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #43: + 0x195eab (0x556dd44d5eab in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #44: + 0x19c1e5 (0x556dd44dc1e5 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #45: _PyEval_EvalFrameDefault + 0x30a (0x556dd450041a in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #46: + 0x1950a6 (0x556dd44d50a6 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #47: _PyFunction_FastCallDict + 0x3db (0x556dd44d67db in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #48: _PyObject_FastCallDict + 0x26f (0x556dd444eb8f in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #49: _PyObject_Call_Prepend + 0x63 (0x556dd4453773 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #50: PyObject_Call + 0x3e (0x556dd444e5ce in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #51: _PyEval_EvalFrameDefault + 0x1a88 (0x556dd4501b98 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #52: + 0x1950a6 (0x556dd44d50a6 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #53: _PyFunction_FastCallDict + 0x1bb (0x556dd44d65bb in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #54: _PyObject_FastCallDict + 0x26f (0x556dd444eb8f in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #55: _PyObject_Call_Prepend + 0x63 (0x556dd4453773 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #56: PyObject_Call + 0x3e (0x556dd444e5ce in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #57: + 0x16a307 (0x556dd44aa307 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #58: _PyObject_FastCallDict + 0x8b (0x556dd444e9ab in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #59: + 0x19c25e (0x556dd44dc25e in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #60: _PyEval_EvalFrameDefault + 0x30a (0x556dd450041a in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #61: + 0x1950a6 (0x556dd44d50a6 in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #62: _PyFunction_FastCallDict + 0x3db (0x556dd44d67db in /home/imdl/anaconda3/envs/centernet/bin/python)
frame #63: _PyObject_FastCallDict + 0x26f (0x556dd444eb8f in /home/imdl/anaconda3/envs/centernet/bin/python)

Process finished with exit code 1
##############################################
I googled the error and found nothing useful.
Any help would be appreciated. Thank you in advance.

Duankaiwen commented on September 13, 2024

@li10141110 Try using 'batch_size': 16 and 'chunk_sizes': [2, 2, 2, 2, 2, 2, 2, 2].

li10141110 commented on September 13, 2024

Thank you. Using 'batch_size': 16 and 'chunk_sizes': [2, 2, 2, 2, 2, 2, 2, 2] failed again, but when I used 'batch_size': 16 and 'chunk_sizes': [16], train.py started. I only have one GPU, so I guess the number of entries in chunk_sizes has to equal the number of GPUs (len(chunk_sizes) == N_GPU). Thank you again!
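
For anyone hitting the same "invalid device ordinal" error, here is a minimal sketch of the rule guessed above: one chunk_sizes entry per GPU, with the entries summing to batch_size. The helper names make_chunk_sizes and check_chunk_sizes are hypothetical and not part of CenterNet, and the rule itself is an assumption based on this thread rather than something confirmed from the repository code.

```python
def make_chunk_sizes(batch_size, n_gpus):
    """Split batch_size into one chunk per GPU, as evenly as possible.

    Hypothetical helper (not part of CenterNet). It assumes the config
    wants len(chunk_sizes) == n_gpus and sum(chunk_sizes) == batch_size.
    """
    if n_gpus < 1:
        raise ValueError("need at least one GPU")
    if batch_size < n_gpus:
        raise ValueError("batch_size must be >= number of GPUs")
    base, extra = divmod(batch_size, n_gpus)
    # The first `extra` GPUs take one more image each.
    return [base + 1 if i < extra else base for i in range(n_gpus)]


def check_chunk_sizes(batch_size, chunk_sizes, n_gpus):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if len(chunk_sizes) != n_gpus:
        problems.append(
            "chunk_sizes has %d entries but %d GPU(s) are available"
            % (len(chunk_sizes), n_gpus)
        )
    if sum(chunk_sizes) != batch_size:
        problems.append(
            "chunk_sizes sums to %d but batch_size is %d"
            % (sum(chunk_sizes), batch_size)
        )
    return problems


# On a real machine n_gpus could come from torch.cuda.device_count();
# it is hard-coded here so the sketch runs without PyTorch.
n_gpus = 1
print(make_chunk_sizes(16, n_gpus))
print(check_chunk_sizes(8, [1] * 8, n_gpus))
```

With one GPU, make_chunk_sizes(16, 1) gives [16], which matches the setting that worked for me. check_chunk_sizes(8, [1] * 8, 1) flags the eight-entry list from my earlier attempt as a mismatch against a single GPU.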
