tlc-pack / tenset
License: Apache License 2.0
Hi @merrymercy,
I ran into some problems while training the model. When I make the dataset with the option --sample-in-files 100, python train_model.py works fine. But if I add the option --hold-out all_five, python train_model.py reports errors. The details are below:
(base) zhaiyi@linke8:~/tenset/scripts$ CUDA_VISIBLE_DEVICES='7' python /data/workspace/zhaiyi/tenset/scripts/train_model.py --dataset=dataset_all.pkl
Arguments: Namespace(dataset=['dataset_all.pkl'], models='mlp', seed=0, split_scheme='within_task', train_ratio=0.9, use_gpu=False)
Load all tasks...
Load dataset...
Train set: 7415211. Task 0 = LearningTask(workload_key='["142c0886579d3901e9f6db0e30878395", 1, 8, 8, 512, 3, 3, 512, 512, 1, 1, 1, 512, 1, 8, 8, 512, 1, 8, 8, 512]', target='llvm -keys=cpu -link-params=0 -mcpu=skylake-avx512 -model=platinum-8272')
Test set: 823967. Task 0 = LearningTask(workload_key='["142c0886579d3901e9f6db0e30878395", 1, 8, 8, 512, 3, 3, 512, 512, 1, 1, 1, 512, 1, 8, 8, 512, 1, 8, 8, 512]', target='llvm -keys=cpu -link-params=0 -mcpu=skylake-avx512 -model=platinum-8272')
Segmentation fault (core dumped)
Then I tried to debug with VS Code breakpoints, and the error was reported here:
if device is None:
    if torch.cuda.device_count():
        device = 'cuda:0'
    else:
        device = 'cpu'
print(device)
Then I modified the source code as below, but another error was reported:
class MLPModelInternal:
    def __init__(self, device=None, few_shot_learning="base_only", use_workload_embedding=True,
                 use_target_embedding=False, loss_type='lambdaRankLoss'):
        if device is None:
            # if torch.cuda.device_count():
            #     device = 'cuda:0'
            # else:
            #     device = 'cpu'
            device = 'cuda:0'
        print(device)
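A more defensive fallback might check torch.cuda.is_available() before selecting CUDA. This is only a minimal sketch of that idea, not the original implementation, and I have not verified that it avoids the crash:

import torch

def pick_device(device=None):
    # Sketch of a safer fallback: only select CUDA when torch reports that
    # a device is actually usable; otherwise fall back to the CPU.
    if device is None:
        if torch.cuda.is_available() and torch.cuda.device_count() > 0:
            device = 'cuda:0'
        else:
            device = 'cpu'
    return device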
(base) zhaiyi@linke8:~/tenset/scripts$ CUDA_VISIBLE_DEVICES='7' python /data/workspace/zhaiyi/tenset/scripts/train_model.py --dataset=dataset_all.pkl
Arguments: Namespace(dataset=['dataset_all.pkl'], models='mlp', seed=0, split_scheme='within_task', train_ratio=0.9, use_gpu=False)
Load all tasks...
Load dataset...
Train set: 7415211. Task 0 = LearningTask(workload_key='["142c0886579d3901e9f6db0e30878395", 1, 8, 8, 512, 3, 3, 512, 512, 1, 1, 1, 512, 1, 8, 8, 512, 1, 8, 8, 512]', target='llvm -keys=cpu -link-params=0 -mcpu=skylake-avx512 -model=platinum-8272')
Test set: 823967. Task 0 = LearningTask(workload_key='["142c0886579d3901e9f6db0e30878395", 1, 8, 8, 512, 3, 3, 512, 512, 1, 1, 1, 512, 1, 8, 8, 512, 1, 8, 8, 512]', target='llvm -keys=cpu -link-params=0 -mcpu=skylake-avx512 -model=platinum-8272')
cuda:0
============================================================
Fit a net. Train size: 7415211
malloc(): invalid next size (unsorted)
Aborted (core dumped)
Ubuntu 20.04.1 LTS
CUDA Version: 11.4
torch 1.8.2+cu111
torchaudio 0.8.2
torchvision 0.9.2+cu111
Memory size: 504 GB
I don't think this error is related to torch, because there is no error in training if the dataset is made with the option --sample-in-files 100.
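For reference, the two dataset builds being compared are roughly the following (a sketch; the log path follows the tutorial commands quoted later in this thread and may differ locally):
python3 make_dataset.py --logs dataset/measure_records/e5-2673/*.json --sample-in-files 100   # train_model.py works
python3 make_dataset.py --logs dataset/measure_records/e5-2673/*.json --hold-out all_five     # train_model.py crashes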
Do you have any advice on this issue? Thank you, @merrymercy.
Is there a way to look at the generated programs/subgraphs? For example, if there is some CUDA/Python code that's generated and measured, is it possible to look at that code file?
Apologies if the answer to this is obvious; I'm really new to TVM.
Thanks,
Akash
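Regarding the question above about viewing the generated code: the auto_scheduler record APIs appear to expose the measured programs. A rough sketch, assuming TVM ~0.8, where task is an auto_scheduler.SearchTask you already have and records.json is a hypothetical log file holding its measured records:

import tvm

# Print the best measured program for this task as a TE schedule,
# or as generated CUDA source (for CUDA targets).
print(task.print_best("records.json", print_mode="schedule"))
print(task.print_best("records.json", print_mode="cuda"))

# Alternatively, build the best schedule and dump the generated device code.
sch, args = task.apply_best("records.json")
lib = tvm.build(sch, args, target=task.target)
print(lib.imported_modules[0].get_source())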
Command (from tutorial): python3 tune_network.py --network resnet_50 --n-trials 100 --cost-model mlp-no-update --load-model mlp.pkl --transfer-tune
Error:
Traceback (most recent call last):
File "tune_network.py", line 180, in <module>
args.result_file, args.transfer_tune, args.search_type)
File "tune_network.py", line 93, in tune_and_evaluate
tuner.transfer_tune(tuning_opt, search_policy=policy)
File "<pwd>/tenset/python/tvm/auto_scheduler/task_scheduler.py", line 574, in transfer_tune
few_shot_learning='plus_mix_task'
File "<pwd>/tenset/python/tvm/auto_scheduler/task_scheduler.py", line 121, in make_search_policies
cost_model.model.fit_local(local_dataset)
File "<pwd>/tenset/python/tvm/auto_scheduler/cost_model/mlp_model.py", line 464, in fit_local
self.local_model[task] = diff_model
TypeError: 'NoneType' object does not support item assignment
This tutorial contains a minimal example of training a cost model and using it for search.
pip3 install gdown
gdown https://drive.google.com/uc?id=1hciRGyXcGY9fK_owgvlJow8P_l8xYIVJ
Put dataset_v3.1.zip under tvm-cost-model/scripts and run unzip dataset_v3.1.zip. A folder dataset will appear in tvm-cost-model/scripts. See this readme.
Go to tvm-cost-model/scripts and run:
python3 make_dataset.py --logs dataset/measure_records/e5-2673/*.json --sample-in-files 100
python3 make_dataset.py --logs dataset/measure_records/e5-2673/*.json
python3 train_model.py
python3 tune_network.py --network resnet_50 --n-trials 100 --cost-model xgb-no-update --load-model xgb.pkl
Hi, I am collecting data on my desktop (2060 GPU) and an error occurs:
Traceback (most recent call last):
File "measure_programs.py", line 137, in
remeasure_file(i, task, target, args.target_host, args.batch_size, measurer_kwargs)
File "measure_programs.py", line 82, in remeasure_file
res_batch = measurer.measure(task, empty_policy, inp_batch)
File "/root/.local/lib/python3.6/site-packages/tvm-0.8.dev1882+g5a2eed60b-py3.6-linux-x86_64.egg/tvm/runtime/object.py", line 65, in getattr
return _ffi_node_api.NodeGetAttr(self, name)
File "/root/.local/lib/python3.6/site-packages/tvm-0.8.dev1882+g5a2eed60b-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/packed_func.py", line 237, in call
raise get_last_ffi_error()
TypeError: Traceback (most recent call last):
3: TVMFuncCall
2: _ZNSt17_Function_handlerI
1: tvm::NodeGetAttr(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
0: tvm::ReflectionVTable::GetAttr(tvm::runtime::Object*, tvm::runtime::String const&) const
File "/home/lenovo/Desktop/zzh/phd/tvm/tvm/include/tvm/node/reflection.h", line 390
TypeError: auto_scheduler.ProgramMeasurer is not registered via TVM_REGISTER_NODE_TYPE
I have tried several backbones (resnet18, xception, etc.); the same error occurs.
I am using tvm-0.8.
After I ran setup.py to install the default dependencies and tried the 'get-started' example, I met a few issues, and it seems that the versions for torch, xgboost, and other dependencies need to be specified. Could anyone please share the dependency versions? Thanks.
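For anyone with a working setup, a quick way to report the exact versions is to print them directly (a small sketch; it only covers the packages mentioned in this thread):

import torch
import torchvision
import xgboost
import tvm

# Print the versions of the main dependencies so they can be pinned later.
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("xgboost:", xgboost.__version__)
print("tvm:", tvm.__version__)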
I encountered two bugs when collecting data on an NVIDIA TX2. The settings are:
Device: NVIDIA TX2 (CUDA GPU)
Model:
network_keys = []
for batch_size in [1]:
    for image_size in [256]:
        for layer in [18]:
            network_keys.append((f'resnet_{layer}',
                                 [(batch_size, 3, image_size, image_size)]))
shell:
python measure_programs.py --target "cuda -keys=cudagpu -arch=SM_62" --target-host="llvm"
The bugs encountered are:
root@kyrie-desktop:/home/kyrie/Desktop/ijcai/tenset/scripts# python measure_programs.py --target "cuda -keys=cudagpu -arch=SM_62" --target-host="llvm"
Load all tasks...
===== task: 0 programs: 0/193 =====
Get 128 programs to measure:
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
Time elapsed for measurement: 708.33 s
===== task: 0 programs: 128/193 =====
Get 65 programs to measure:
........TTTTTTTT
........TTTTTTTT
........TTTTTTTT
[10:37:55] /home/kyrie/Desktop/ijcai/tenset/src/auto_scheduler/measure.cc:337: warning: Too many errors happened during tuning. Switching to debug mode.
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
[10:38:39] /home/kyrie/Desktop/ijcai/tenset/src/auto_scheduler/measure.cc:337: warning: Too many errors happened during tuning. Switching to debug mode.
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
[10:39:23] /home/kyrie/Desktop/ijcai/tenset/src/auto_scheduler/measure.cc:337: warning: Too many errors happened during tuning. Switching to debug mode.
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
[10:40:07] /home/kyrie/Desktop/ijcai/tenset/src/auto_scheduler/measure.cc:337: warning: Too many errors happened during tuning. Switching to debug mode.
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
[10:40:52] /home/kyrie/Desktop/ijcai/tenset/src/auto_scheduler/measure.cc:337: warning: Too many errors happened during tuning. Switching to debug mode.
Placeholder:
Placeholder:
Placeholder:
Placeholder:
Placeholder:
==================================================
No: 190 GFLOPS: 0.00 / 0.00 results: MeasureResult(error_type:RunTimeoutError, error_msg:, all_cost:6.02, Tstamp:1636080085.83)
==================================================
Placeholder:
Placeholder:
Placeholder:
[10:41:36] /home/kyrie/Desktop/ijcai/tenset/src/auto_scheduler/measure.cc:337: warning: Too many errors happened during tuning. Switching to debug mode.
Placeholder:
[10:41:42] /home/kyrie/Desktop/ijcai/tenset/src/auto_scheduler/measure.cc:337: warning: Too many errors happened during tuning. Switching to debug mode.
The error starts to happen at the 153rd program of each task.
#networks: 108
#tasks: 1577 (sorted according to FLOPs, small to large)
#programs: 1577 * 3000 = 4731000
https://github.com/merrymercy/tvm-cost-model/tree/main/scripts#data-collection-procedure
Check the consistency of measurement
Hi @merrymercy,
In your open-source code, the input dimension of the MLP model is 164, which is aligned with Ansor, but in Appendix C of the TenSet paper the input dimension is set to 324. Did you change anything there?
Looking forward to your reply.
I want to profile the kernels using ncu --target-processes all python3 measure_programs.py --target cuda, but no kernels are profiled. Is this normal? How can I profile the kernels with an NVIDIA profiler (ncu, nsys, or nvprof)?
Hello everyone,
I'm currently building my own dataset, since the device I am using does not match the provided dataset. I would like to know how to obtain the collected measurement records.
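Once records are collected, they can be inspected with the standard auto_scheduler record reader (a sketch; my_records.json is a hypothetical file produced by measure_programs.py):

from tvm import auto_scheduler

# Iterate over measured records: each entry is a (MeasureInput, MeasureResult) pair.
for inp, res in auto_scheduler.load_records("my_records.json"):
    costs = [v.value for v in res.costs]
    print(inp.task.workload_key, res.error_no, costs)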
Hi @merrymercy,
When executing python measure_programs.py --target=cuda, I get some errors: TVM errors and timeout errors.
I tried increasing the build timeout and the run timeout to 30 seconds each, but there is no difference.
What target should I specify when measuring programs on a 2080 Ti? I didn't find documentation on setting the target when using CUDA. Is it enough to just set --target=cuda?
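For reference, a fully specified target for a Turing GPU such as the 2080 Ti might look like the command below. This is only a sketch modeled on the Jetson target string quoted later in this thread (sm_75 is the Turing compute capability); the other parameters are assumptions that should be checked against the actual device:
python measure_programs.py --target "cuda -arch=sm_75 -max_num_threads=1024 -thread_warp_size=32" --target-host "llvm"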
Thank you, @merrymercy.
6: ffi_call_unix64
5: TVMArrayFree
at /root/tenset/src/runtime/ndarray.cc:295
4: _ZN3tvm7runtime7NDArray8Int
3: tvm::runtime::NDArray::FFIDecRef(DLTensor*)
at /root/tenset/include/tvm/runtime/ndarray.h:383
2: tvm::runtime::Object::DecRef()
at /root/tenset/include/tvm/runtime/object.h:781
1: tvm::runtime::NDArray::Internal::DefaultDeleter(tvm::runtime::Object*)
0: tvm::runtime::CUDADeviceAPI::FreeDataSpace(DLContext, void*)
at /root/tenset/src/runtime/cuda/cuda_device_api.cc:127
File "/root/tenset/src/runtime/cuda/cuda_device_api.cc", line 127
TVMError: ---------------------------------------------------------------
An internal invariant was violated during the execution of TVM.
Please read TVM's error reporting guidelines.
More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
---------------------------------------------------------------
Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading == false: CUDA: misaligned address
Exception ignored in: <function NDArrayBase.__del__ at 0x7f92d8afc820>
Traceback (most recent call last):
File "/root/tenset/python/tvm/_ffi/_ctypes/ndarray.py", line 82, in __del__
check_call(_LIB.TVMArrayFree(self.handle))
File "/root/tenset/python/tvm/_ffi/base.py", line 346, in check_call
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
43: 0xffffffffffffffff
42: __clone
41: start_thread
at /build/glibc-uZu3wS/glibc-2.27/nptl/pthread_create.c:463
40: 0x0000000000619066
39: 0x00000000006390f7
38: PyObject_Call
37: 0x0000000000537506
36: _PyObject_Call_Prepend
35: _PyEval_EvalFrameDefault
34: PyVectorcall_Call
33: 0x00000000005ce97f
32: _PyFunction_Vectorcall
31: 0x000000000045b756
30: 0x0000000000609bbf
29: _PyFunction_Vectorcall
28: 0x000000000045b756
27: 0x0000000000609bbf
26: _PyFunction_Vectorcall
25: _PyEval_EvalFrameDefault
24: PyVectorcall_Call
23: _PyFunction_Vectorcall
22: _PyEval_EvalCodeWithName
21: _PyEval_EvalFrameDefault
20: PyVectorcall_Call
19: _PyFunction_Vectorcall
18: 0x0000000000500cb4
17: 0x0000000000501310
16: 0x0000000000535307
15: PyObject_CallFinalizerFromDealloc
14: 0x00000000005fa0c5
13: 0x0000000000535c2b
12: _PyFunction_Vectorcall
11: 0x000000000045c107
10: _PyObject_MakeTpCall
9: 0x00007f939f8cc763
8: _ctypes_callproc
7: ffi_call
6: ffi_call_unix64
5: TVMArrayFree
at /root/tenset/src/runtime/ndarray.cc:295
4: _ZN3tvm7runtime7NDArray8Int
3: tvm::runtime::NDArray::FFIDecRef(DLTensor*)
at /root/tenset/include/tvm/runtime/ndarray.h:383
2: tvm::runtime::Object::DecRef()
at /root/tenset/include/tvm/runtime/object.h:781
1: tvm::runtime::NDArray::Internal::DefaultDeleter(tvm::runtime::Object*)
0: tvm::runtime::CUDADeviceAPI::FreeDataSpace(DLContext, void*)
at /root/tenset/src/runtime/cuda/cuda_device_api.cc:127
File "/root/tenset/src/runtime/cuda/cuda_device_api.cc", line 127
TVMError: ---------------------------------------------------------------
An internal invariant was violated during the execution of TVM.
Please read TVM's error reporting guidelines.
More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
---------------------------------------------------------------
Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading == false: CUDA: misaligned address
Exception ignored in: <function NDArrayBase.__del__ at 0x7f92d8afc820>
Traceback (most recent call last):
File "/root/tenset/python/tvm/_ffi/_ctypes/ndarray.py", line 82, in __del__
check_call(_LIB.TVMArrayFree(self.handle))
File "/root/tenset/python/tvm/_ffi/base.py", line 346, in check_call
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
42: 0xffffffffffffffff
41: __clone
40: start_thread
at /build/glibc-uZu3wS/glibc-2.27/nptl/pthread_create.c:463
39: 0x0000000000619066
38: 0x00000000006390f7
37: PyObject_Call
36: 0x0000000000537506
35: _PyObject_Call_Prepend
34: _PyEval_EvalFrameDefault
33: PyVectorcall_Call
32: 0x00000000005ce97f
31: _PyFunction_Vectorcall
30: 0x000000000045b756
29: 0x0000000000609bbf
28: _PyFunction_Vectorcall
27: 0x000000000045b756
26: 0x0000000000609bbf
25: _PyFunction_Vectorcall
24: _PyEval_EvalFrameDefault
23: PyVectorcall_Call
22: _PyFunction_Vectorcall
21: _PyEval_EvalCodeWithName
20: _PyEval_EvalFrameDefault
19: PyVectorcall_Call
18: _PyFunction_Vectorcall
17: 0x0000000000500cb4
16: 0x0000000000535307
15: PyObject_CallFinalizerFromDealloc
14: 0x00000000005fa0c5
13: 0x0000000000535c2b
12: _PyFunction_Vectorcall
11: 0x000000000045c107
10: _PyObject_MakeTpCall
9: 0x00007f939f8cc763
8: _ctypes_callproc
7: ffi_call
6: ffi_call_unix64
5: TVMArrayFree
at /root/tenset/src/runtime/ndarray.cc:295
4: _ZN3tvm7runtime7NDArray8Int
3: tvm::runtime::NDArray::FFIDecRef(DLTensor*)
at /root/tenset/include/tvm/runtime/ndarray.h:383
2: tvm::runtime::Object::DecRef()
at /root/tenset/include/tvm/runtime/object.h:781
1: tvm::runtime::NDArray::Internal::DefaultDeleter(tvm::runtime::Object*)
0: tvm::runtime::CUDADeviceAPI::FreeDataSpace(DLContext, void*)
at /root/tenset/src/runtime/cuda/cuda_device_api.cc:127
File "/root/tenset/src/runtime/cuda/cuda_device_api.cc", line 127
TVMError: ---------------------------------------------------------------
An internal invariant was violated during the execution of TVM.
Please read TVM's error reporting guidelines.
More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
---------------------------------------------------------------
Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading == false: CUDA: misaligned address
terminate called after throwing an instance of 'tvm::runtime::InternalError'
what(): [16:04:22] /root/tenset/src/runtime/cuda/cuda_module.cc:61: CUDAError: cuModuleUnload(module_[i]) failed with error: CUDA_ERROR_MISALIGNED_ADDRESS
Stack trace:
0: tvm::runtime::CUDAModuleNode::~CUDAModuleNode()
at /root/tenset/src/runtime/cuda/cuda_module.cc:61
1: tvm::runtime::SimpleObjAllocator::Handler<tvm::runtime::CUDAModuleNode>::Deleter_(tvm::runtime::Object*)
at /root/tenset/include/tvm/runtime/memory.h:138
2: tvm::runtime::Object::DecRef()
at /root/tenset/include/tvm/runtime/object.h:781
3: tvm::runtime::ObjectPtr<tvm::runtime::Object>::reset()
at /root/tenset/include/tvm/runtime/object.h:442
4: tvm::runtime::ObjectPtr<tvm::runtime::Object>::~ObjectPtr()
at /root/tenset/include/tvm/runtime/object.h:396
5: tvm::runtime::ObjectRef::~ObjectRef()
at /root/tenset/include/tvm/runtime/object.h:502
6: tvm::runtime::Module::~Module()
at /root/tenset/include/tvm/runtime/module.h:48
7: void std::_Destroy<tvm::runtime::Module>(tvm::runtime::Module*)
at /usr/include/c++/7/bits/stl_construct.h:98
8: void std::_Destroy_aux<false>::__destroy<tvm::runtime::Module*>(tvm::runtime::Module*, tvm::runtime::Module*)
at /usr/include/c++/7/bits/stl_construct.h:108
9: void std::_Destroy<tvm::runtime::Module*>(tvm::runtime::Module*, tvm::runtime::Module*)
at /usr/include/c++/7/bits/stl_construct.h:137
10: void std::_Destroy<tvm::runtime::Module*, tvm::runtime::Module>(tvm::runtime::Module*, tvm::runtime::Module*, std::allocator<tvm::runtime::Module>&)
at /usr/include/c++/7/bits/stl_construct.h:206
11: std::vector<tvm::runtime::Module, std::allocator<tvm::runtime::Module> >::~vector()
at /usr/include/c++/7/bits/stl_vector.h:434
12: tvm::runtime::ModuleNode::~ModuleNode()
at /root/tenset/include/tvm/runtime/module.h:114
13: tvm::runtime::LibraryModuleNode::~LibraryModuleNode()
at /root/tenset/src/runtime/library_module.cc:38
14: tvm::runtime::SimpleObjAllocator::Handler<tvm::runtime::LibraryModuleNode>::Deleter_(tvm::runtime::Object*)
at /root/tenset/include/tvm/runtime/memory.h:138
15: tvm::runtime::Object::DecRef()
at /root/tenset/include/tvm/runtime/object.h:781
16: tvm::runtime::ObjectPtr<tvm::runtime::Object>::reset()
at /root/tenset/include/tvm/runtime/object.h:442
17: tvm::runtime::ObjectPtr<tvm::runtime::Object>::~ObjectPtr()
at /root/tenset/include/tvm/runtime/object.h:396
18: ~<lambda>
at /root/tenset/src/runtime/library_module.cc:73
19: _M_destroy
at /usr/include/c++/7/bits/std_function.h:207
20: _M_manager
at /usr/include/c++/7/bits/std_function.h:231
21: std::_Function_base::~_Function_base()
at /usr/include/c++/7/bits/std_function.h:276
22: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::~function()
at /usr/include/c++/7/bits/std_function.h:389
23: tvm::runtime::PackedFunc::~PackedFunc()
at /root/tenset/include/tvm/runtime/packed_func.h:75
24: ~<lambda>
at /root/tenset/src/runtime/rpc/rpc_module.cc:370
25: _M_destroy
at /usr/include/c++/7/bits/std_function.h:207
26: _M_manager
at /usr/include/c++/7/bits/std_function.h:231
27: std::_Function_base::~_Function_base()
at /usr/include/c++/7/bits/std_function.h:276
28: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::~function()
at /usr/include/c++/7/bits/std_function.h:389
29: tvm::runtime::PackedFunc::~PackedFunc()
at /root/tenset/include/tvm/runtime/packed_func.h:75
30: TVMFuncFree
at /root/tenset/src/runtime/c_runtime_api.cc:463
31: ffi_call_unix64
32: ffi_call
33: _ctypes_callproc
34: 0x00007f939f8cc763
35: _PyObject_MakeTpCall
36: 0x000000000045c107
37: _PyFunction_Vectorcall
38: 0x0000000000535c2b
39: 0x00000000005fa0c5
40: PyObject_CallFinalizerFromDealloc
41: 0x0000000000535307
42: 0x00000000005ce6d4
43: 0x000000000052cd24
44: 0x00000000005d4882
45: 0x0000000000500cb4
46: _PyFunction_Vectorcall
47: PyVectorcall_Call
48: _PyEval_EvalFrameDefault
49: _PyEval_EvalCodeWithName
50: _PyFunction_Vectorcall
51: PyVectorcall_Call
52: _PyEval_EvalFrameDefault
53: _PyFunction_Vectorcall
54: 0x0000000000609bbf
55: 0x000000000045b756
56: _PyFunction_Vectorcall
57: 0x0000000000609bbf
58: 0x000000000045b756
59: _PyFunction_Vectorcall
60: 0x00000000005ce97f
61: PyVectorcall_Call
62: _PyEval_EvalFrameDefault
63: _PyObject_Call_Prepend
64: 0x0000000000537506
65: PyObject_Call
66: 0x00000000006390f7
67: 0x0000000000619066
68: start_thread
at /build/glibc-uZu3wS/glibc-2.27/nptl/pthread_create.c:463
69: __clone
70: 0xffffffffffffffff
*T*T*T*T*T*T*E*T*T*T*T*E*T*T*E*T*T*T*E*T*E*T*T*T*T*E*T*E*T*T*T*E*T*T*T*T*E*T*E*E*T*EException ignored in: <function NDArrayBase.__del__ at 0x7f92d8afc820>
Traceback (most recent call last):
File "/root/tenset/python/tvm/_ffi/_ctypes/ndarray.py", line 82, in __del__
check_call(_LIB.TVMArrayFree(self.handle))
File "/root/tenset/python/tvm/_ffi/base.py", line 346, in check_call
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
43: 0xffffffffffffffff
42: __clone
41: start_thread
at /build/glibc-uZu3wS/glibc-2.27/nptl/pthread_create.c:463
40: 0x0000000000619066
39: 0x00000000006390f7
38: PyObject_Call
37: 0x0000000000537506
36: _PyObject_Call_Prepend
35: _PyEval_EvalFrameDefault
34: PyVectorcall_Call
33: 0x00000000005ce97f
32: _PyFunction_Vectorcall
31: 0x000000000045b756
30: 0x0000000000609bbf
29: _PyFunction_Vectorcall
28: 0x000000000045b756
27: 0x0000000000609bbf
26: _PyFunction_Vectorcall
25: _PyEval_EvalFrameDefault
24: PyVectorcall_Call
23: _PyFunction_Vectorcall
22: _PyEval_EvalCodeWithName
21: _PyEval_EvalFrameDefault
20: PyVectorcall_Call
19: _PyFunction_Vectorcall
18: 0x0000000000500cb4
17: 0x0000000000501310
16: 0x0000000000535307
15: PyObject_CallFinalizerFromDealloc
14: 0x00000000005fa0c5
13: 0x0000000000535c2b
12: _PyFunction_Vectorcall
11: 0x000000000045c107
10: _PyObject_MakeTpCall
9: 0x00007f939f8cc763
8: _ctypes_callproc
7: ffi_call
6: ffi_call_unix64
5: TVMArrayFree
at /root/tenset/src/runtime/ndarray.cc:295
4: _ZN3tvm7runtime7NDArray8Int
3: tvm::runtime::NDArray::FFIDecRef(DLTensor*)
at /root/tenset/include/tvm/runtime/ndarray.h:383
2: tvm::runtime::Object::DecRef()
at /root/tenset/include/tvm/runtime/object.h:781
1: tvm::runtime::NDArray::Internal::DefaultDeleter(tvm::runtime::Object*)
0: tvm::runtime::CUDADeviceAPI::FreeDataSpace(DLContext, void*)
at /root/tenset/src/runtime/cuda/cuda_device_api.cc:127
File "/root/tenset/src/runtime/cuda/cuda_device_api.cc", line 127
TVMError: ---------------------------------------------------------------
An internal invariant was violated during the execution of TVM.
Please read TVM's error reporting guidelines.
More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
---------------------------------------------------------------
Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading == false: CUDA: misaligned address
Exception ignored in: <function NDArrayBase.__del__ at 0x7f92d8afc820>
Traceback (most recent call last):
File "/root/tenset/python/tvm/_ffi/_ctypes/ndarray.py", line 82, in __del__
check_call(_LIB.TVMArrayFree(self.handle))
File "/root/tenset/python/tvm/_ffi/base.py", line 346, in check_call
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
43: 0xffffffffffffffff
42: __clone
41: start_thread
at /build/glibc-uZu3wS/glibc-2.27/nptl/pthread_create.c:463
40: 0x0000000000619066
39: 0x00000000006390f7
38: PyObject_Call
37: 0x0000000000537506
36: _PyObject_Call_Prepend
35: _PyEval_EvalFrameDefault
34: PyVectorcall_Call
33: 0x00000000005ce97f
32: _PyFunction_Vectorcall
31: 0x000000000045b756
30: 0x0000000000609bbf
29: _PyFunction_Vectorcall
28: 0x000000000045b756
27: 0x0000000000609bbf
26: _PyFunction_Vectorcall
25: _PyEval_EvalFrameDefault
24: PyVectorcall_Call
23: _PyFunction_Vectorcall
22: _PyEval_EvalCodeWithName
21: _PyEval_EvalFrameDefault
20: PyVectorcall_Call
19: _PyFunction_Vectorcall
18: 0x0000000000500cb4
17: 0x0000000000501310
16: 0x0000000000535307
15: PyObject_CallFinalizerFromDealloc
14: 0x00000000005fa0c5
13: 0x0000000000535c2b
12: _PyFunction_Vectorcall
11: 0x000000000045c107
10: _PyObject_MakeTpCall
9: 0x00007f939f8cc763
8: _ctypes_callproc
7: ffi_call
6: ffi_call_unix64
5: TVMArrayFree
at /root/tenset/src/runtime/ndarray.cc:295
4: _ZN3tvm7runtime7NDArray8Int
3: tvm::runtime::NDArray::FFIDecRef(DLTensor*)
at /root/tenset/include/tvm/runtime/ndarray.h:383
2: tvm::runtime::Object::DecRef()
at /root/tenset/include/tvm/runtime/object.h:781
1: tvm::runtime::NDArray::Internal::DefaultDeleter(tvm::runtime::Object*)
0: tvm::runtime::CUDADeviceAPI::FreeDataSpace(DLContext, void*)
at /root/tenset/src/runtime/cuda/cuda_device_api.cc:127
File "/root/tenset/src/runtime/cuda/cuda_device_api.cc", line 127
TVMError: ---------------------------------------------------------------
An internal invariant was violated during the execution of TVM.
Please read TVM's error reporting guidelines.
When I was dumping the network information for dcgan using dump_network_info.py (I modified this file to dump only the information for dcgan), I got the following error:
Traceback (most recent call last):
File "dump_network_info.py", line 243, in
dump_network(key, target)
File "dump_network_info.py", line 122, in dump_network
mod, params, inputs = get_network_with_key(network_key)
File "dump_network_info.py", line 103, in get_network_with_key
mod, params = relay.testing.dcgan.get_workload(
File "/usr/tvm/python/tvm/relay/testing/dcgan.py", line 170, in get_workload
net = get_net(batch_size, random_len, oshape=oshape, ngf=ngf, layout=layout, dtype=dtype)
File "/usr/tvm/python/tvm/relay/testing/dcgan.py", line 87, in get_net
assert oshape[-1] == 64, "Only support 64x64 image"
AssertionError: Only support 64x64 image
Can I directly delete the assert statements in the file "/usr/tvm/python/tvm/relay/testing/dcgan.py"?
Thanks!
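As an alternative to editing the library file, the get_workload call shown in the traceback accepts an output shape, so one could request the supported 64x64 shape instead. A minimal sketch (batch_size=1 is just an example value):

from tvm.relay.testing import dcgan

# Request the 64x64 output shape that the assertion expects,
# instead of removing the assert.
mod, params = dcgan.get_workload(batch_size=1, oshape=(3, 64, 64))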
I'm trying to run dump_network_info.py on an A100 machine running Red Hat. The script runs fine for all other networks but causes a segmentation fault for bert_large.
Has anyone faced this before, or do you have any idea why it might be happening?
Hello everyone.
We are interested in optimizing vision DNN models for Jetson devices, so we tried to use the TenSet dataset to optimize DNN models on a Jetson Xavier NX.
With some modifications to use auto_scheduler.RPCRunner in tune_network.py, we were able to tune networks on the Jetson Xavier NX.
We evaluated the models available in tune_network.py with --n-trials 10000, and we found that Ansor with the TenSet-pretrained model finds better programs than Ansor without the TenSet model in the first few thousand trials, and the final results are slightly better.
We expected that enabling the --transfer-tune option would make the results better, because --transfer-tune appears to improve the cost model by using results measured on the real device. However, when we tried the --transfer-tune option, it produced slower programs for all models. The following are the results:
compiler | execution time | # trials |
---|---|---|
w/o transfer tune | 7.21 | 10044 |
w/ transfer tune | 10.24 | 1692 |
compiler | execution time | # trials |
---|---|---|
w/o transfer tune | 15.56 | 10044 |
w/ transfer tune | 22.62 | 1692 |
compiler | execution time | # trials |
---|---|---|
w/o transfer tune | 2.49 | 10048 |
w/ transfer tune | 2.9 | 2048 |
compiler | execution time | # trials |
---|---|---|
w/o transfer tune | 3.06 | 10048 |
w/ transfer tune | 3.53 | 3328 |
compiler | execution time | # trials |
---|---|---|
w/o transfer tune | 35.32 | 10044 |
w/ transfer tune | 48.8 | 1692 |
compiler | execution time | # trials |
---|---|---|
w/o transfer tune | 15.61 | 10044 |
w/ transfer tune | 17.88 | 4604 |
compiler | execution time | # trials |
---|---|---|
w/o transfer tune | 29.08 | 10015 |
w/ transfer tune | 45.46 | 3487 |
We use the following commands for evaluation:
n_trials=10000
target="cuda -keys=cudagpu -arch=sm_72 -max_num_threads=1024 -max_threads_per_block=1024 -registers_per_block=65536 -shared_memory_per_block=49152 -thread_warp_size=32"
target_host="llvm -keys=arm_cpu -mtriple=aarch64-linux-gnu -mattr=+neon"
# w/o transfer tune
python3 tune_network.py --network ${model} --n-trials ${n_trials} --cost-model xgb-no-update --load-model xgb.pkl --target "$target" --target-host "$target_host"
# w/ transfer tune
python3 tune_network.py --network ${model} --n-trials ${n_trials} --cost-model xgb-no-update --transfer-tune --load-model xgb.pkl --target "$target" --target-host "$target_host"
To investigate the slower results of transfer tuning, we read the code related to the --transfer-tune option and found some seemingly strange points in its implementation:
Could you please explain the intention behind this implementation, or how to improve the results of transfer tuning?
Hi @merrymercy, this is great work!
When I collect data in my own environment (Tesla V100-SXM2-16GB), I encounter the error shown in the figure below. Is this normal, and how should I solve it?
When I enlarge the timeout, the error error_type:RunTimeoutError still exists.
Looking forward to your reply.