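This happened while running the tutorial "PaddleX 10-minute quick start: MobileNetV3_ssld image classification"
https://aistudio.baidu.com/aistudio/projectdetail/439860
The earlier dataset-import steps all ran fine locally.
But the call to model.train raised an error. Could someone take a look at the error output below and suggest a fix? Thanks!
Off to bed now, good night everyone.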
num_classes = len(train_dataset.labels)
model = pdx.cls.MobileNetV3_large_ssld(num_classes=num_classes)
model.train(num_epochs=12,
            train_dataset=train_dataset,
            train_batch_size=32,
            eval_dataset=eval_dataset,
            lr_decay_epochs=[6, 8],
            save_interval_epochs=1,
            learning_rate=0.00625,
            save_dir='output/mobilenetv3_large_ssld',
            use_vdl=True)
2020-06-19 15:58:21,935-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!
Downloading MobileNetV3_large_x1_0_ssld_pretrained.tar
[==================================================] 100.00%
Uncompress /home/wsuser/.paddlehub/tmp/tmpx64j5fu9/MobileNetV3_large_x1_0_ssld_pretrained.tar
[==================================================] 100.00%
2020-06-19 15:58:34 [INFO] Load pretrain weights from output/mobilenetv3_large_ssld/pretrain/MobileNetV3_large_x1_0_ssld.
2020-06-19 15:58:34 [WARNING] [SKIP] Shape of pretrained weight output/mobilenetv3_large_ssld/pretrain/MobileNetV3_large_x1_0_ssld/fc_weights doesn't match.(Pretrained: (1280, 1000), Actual: (1280, 6))
2020-06-19 15:58:34 [WARNING] [SKIP] Shape of pretrained weight output/mobilenetv3_large_ssld/pretrain/MobileNetV3_large_x1_0_ssld/fc_offset doesn't match.(Pretrained: (1000,), Actual: (6,))
2020-06-19 15:58:34,387-WARNING: output/mobilenetv3_large_ssld/pretrain/MobileNetV3_large_x1_0_ssld.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-06-19 15:58:34 [INFO] There are 268 varaibles in output/mobilenetv3_large_ssld/pretrain/MobileNetV3_large_x1_0_ssld are loaded.
/opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
EnforceNotMet Traceback (most recent call last)
in
9 learning_rate=0.00625,
10 save_dir='output/mobilenetv3_large_ssld',
---> 11 use_vdl=True)
/opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddlex/cv/models/classifier.py in train(self, num_epochs, train_dataset, train_batch_size, eval_dataset, save_interval_epochs, log_interval_steps, save_dir, pretrain_weights, optimizer, learning_rate, warmup_steps, warmup_start_lr, lr_decay_epochs, lr_decay_gamma, use_vdl, sensitivities_file, eval_metric_loss, early_stop, early_stop_patience, resume_checkpoint)
201 use_vdl=use_vdl,
202 early_stop=early_stop,
--> 203 early_stop_patience=early_stop_patience)
204
205 def evaluate(self,
/opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddlex/cv/models/base.py in train_loop(self, num_epochs, train_dataset, train_batch_size, eval_dataset, save_interval_epochs, log_interval_steps, save_dir, use_vdl, early_stop, early_stop_patience)
454 self.parallel_train_prog,
455 feed=data,
--> 456 fetch_list=list(self.train_outputs.values()))
457 outputs_avg = np.mean(np.array(outputs), axis=1)
458 records.append(outputs_avg)
/opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddle/fluid/executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache, return_merged, use_prune)
1069 warnings.warn(
1070 "The following exception is not an EOF exception.")
-> 1071 six.reraise(*sys.exc_info())
1072
1073 def _run_impl(self, program, feed, fetch_list, feed_var_name,
/opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
691 return getattr(self, assertNotRegex)(*args, **kwargs)
692
--> 693
694 if PY3:
695 exec = getattr(moves.builtins, "exec")
/opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddle/fluid/executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache, return_merged, use_prune)
1064 use_program_cache=use_program_cache,
1065 use_prune=use_prune,
-> 1066 return_merged=return_merged)
1067 except Exception as e:
1068 if not isinstance(e, core.EOFException):
/opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddle/fluid/executor.py in _run_impl(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache, return_merged, use_prune)
1154 use_program_cache=use_program_cache)
1155
-> 1156 program._compile(scope, self.place)
1157 if program._is_inference:
1158 return self._run_inference(program._executor, feed)
/opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddle/fluid/compiler.py in _compile(self, scope, place)
441 use_cuda=isinstance(self._place, core.CUDAPlace),
442 scope=self._scope,
--> 443 places=self._places)
444 return self
445
/opt/conda/envs/Python-3.6-CUDA/lib/python3.6/site-packages/paddle/fluid/compiler.py in _compile_data_parallel(self, places, use_cuda, scope)
394 cpt.to_text(self._loss_name)
395 if self._loss_name else six.u(''), self._scope, self._local_scopes,
--> 396 self._exec_strategy, self._build_strategy, self._graph)
397
398 def _compile_inference(self):
EnforceNotMet:
C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::platform::dynload::GetNCCLDsoHandle()
3 void std::__once_call_impl<std::_Bind_simple<paddle::platform::dynload::DynLoad__ncclCommInitAll::operator()<ncclComm**, int, int*>(ncclComm**, int, int*)::{lambda()#1} ()> >()
4 paddle::platform::NCCLContextMap::NCCLContextMap(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, ncclUniqueId*, unsigned long, unsigned long)
5 paddle::platform::NCCLCommunicator::InitFlatCtxs(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, std::vector<ncclUniqueId*, std::allocator<ncclUniqueId*> > const&, unsigned long, unsigned long)
6 paddle::framework::ParallelExecutorPrivate::InitNCCLCtxs(paddle::framework::Scope*, paddle::framework::details::BuildStrategy const&)
7 paddle::framework::ParallelExecutorPrivate::InitOrGetNCCLCommunicator(paddle::framework::Scope*, paddle::framework::details::BuildStrategy*)
8 paddle::framework::ParallelExecutor::ParallelExecutor(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, std::vector<std::string, std::allocator<std::string> > const&, std::string const&, paddle::framework::Scope*, std::vector<paddle::framework::Scope*, std::allocator<paddle::framework::Scope*> > const&, paddle::framework::details::ExecutionStrategy const&, paddle::framework::details::BuildStrategy const&, paddle::framework::ir::Graph*)
Error Message Summary:
Error: Failed to find dynamic library: libnccl.so ( libnccl.so: cannot open shared object file: No such file or directory )
Please specify its path correctly using following ways:
Method. set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac OS.
For instance, issue command: export LD_LIBRARY_PATH=...
Note: After Mac OS 10.11, using the DYLD_LIBRARY_PATH is impossible unless System Integrity Protection (SIP) is disabled. at (/paddle/paddle/fluid/platform/dynload/dynamic_loader.cc:177)
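The error summary says the loader could not open libnccl.so. One way to narrow this down is a small diagnostic like the sketch below. It is not a confirmed fix. It checks whether the dynamic loader can open the library and lists matching files in the directories from LD_LIBRARY_PATH plus a few extra directories. The helper names (find_library_files, can_load, restrict_to_single_gpu) and the extra directories are made up for illustration, and the idea that exposing a single GPU avoids the NCCL code path is an assumption about this environment, not something the log confirms.

```python
import ctypes
import os


def find_library_files(name, search_dirs):
    """Return sorted paths of files in search_dirs whose filename starts with name."""
    matches = []
    for directory in search_dirs:
        # Skip empty entries and directories that do not exist.
        if not directory or not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            if entry.startswith(name):
                matches.append(os.path.join(directory, entry))
    return matches


def can_load(name):
    """Return True if the dynamic loader can open the shared library name."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def restrict_to_single_gpu(index=0):
    """Expose only one GPU to the process (hypothetical helper).

    Assumption: with one visible device, multi-GPU communication via NCCL
    may not be initialized. Call this before importing paddle or paddlex.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


if __name__ == "__main__":
    dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    # Extra directories are guesses; adjust them for the actual machine.
    dirs += ["/usr/lib", "/usr/local/lib", "/usr/local/cuda/lib64"]
    print("loadable:", can_load("libnccl.so"))
    print("candidates:", find_library_files("libnccl.so", dirs))
```

If a candidate file turns up in a directory that is not on LD_LIBRARY_PATH, adding that directory with export LD_LIBRARY_PATH=... as the error message suggests is the next thing to try. If only a versioned file such as libnccl.so.2 exists, the loader may still fail to find the unversioned name. If nothing is found at all, NCCL is probably not installed in this environment.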