OK, we have located the problem. It seems that the lstm layer uses some AVX instructions. We will fix it in a few days.
from paddle.
Before building paddle, the error "version GLIBC_2.14 not found" occurred, so I updated glibc from 2.12 to 2.14. Is this OK?
It's very strange that PaddlePaddle didn't print a call stack. If it's convenient for you, could you rebuild PaddlePaddle with the flag '-DCMAKE_BUILD_TYPE=Debug' and rerun the training? Or could you give us the core dump files?
You can refer to this link: http://stackoverflow.com/questions/17965/how-to-generate-a-core-dump-in-linux-when-a-process-gets-a-segmentation-fault
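For reference, enabling core dumps before rerunning is just a ulimit setting. A minimal sketch; where the core file lands depends on the kernel's core_pattern setting, which varies by distro:

```shell
# allow unlimited-size core files in the current shell
ulimit -c unlimited
# verify the new soft limit took effect (should print "unlimited")
ulimit -c
# see where the kernel will write core files for crashed processes
cat /proc/sys/kernel/core_pattern
```

Run the trainer from the same shell afterwards so the limit applies to it.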
@reyoung
I0907 17:26:32.151026 1053 Util.cpp:144] commandline: /data11/dis_ml/deeplearning/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lstm.py --save_dir=./output_lstm --trainer_count=4 --log_period=1000 --num_passes=15 --use_gpu=false --show_parameter_stats_period=2000 --test_all_data_in_one_period=1
I0907 17:26:32.151208 1053 Util.cpp:113] Calling runInitFunctions
I0907 17:26:32.151401 1053 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-07 17:26:32,723 networks.py:1122] The input order is [word, label]
[INFO 2016-09-07 17:26:32,723 networks.py:1125] The output order is [cost_0]
I0907 17:26:32.740944 1053 Trainer.cpp:169] trainer mode: Normal
I0907 17:26:32.826501 1053 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 17:26:32.856484 1053 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 17:26:32.856694 1053 GradientMachine.cpp:134] Initing parameters..
I0907 17:26:33.070418 1053 GradientMachine.cpp:141] Init parameters done.
I0907 17:26:33.346114 1062 ThreadLocal.cpp:39] thread use undeterministic rand seed:1063
I0907 17:26:33.367995 1065 ThreadLocal.cpp:39] thread use undeterministic rand seed:1066
I0907 17:26:33.373780 1064 ThreadLocal.cpp:39] thread use undeterministic rand seed:1065
Current Layer forward/backward stack is
LayerName: lstmemory_0
LayerName: fc_layer_0
LayerName: embedding_0
LayerName: word
*** Aborted at 1473240393 (unix time) try "date -d @1473240393" if you are using GNU date ***
Current Layer forward/backward stack is
PC: @ 0x8024f0 (unknown)
Current Layer forward/backward stack is
*** SIGILL (@0x8024f0) received by PID 1053 (TID 0x7f50fe12e700) from PID 8398064; stack trace: ***
Current Layer forward/backward stack is
@ 0x7f510f76c710 (unknown)
Current Layer forward/backward stack is
@ 0x8024f0 (unknown)
Current Layer forward/backward stack is
@ 0x587470 paddle::LstmCompute::forwardOneSequence<>()
Current Layer forward/backward stack is
@ 0x5879fa paddle::LstmCompute::forwardBatch<>()
Current Layer forward/backward stack is
@ 0x581d4c paddle::LstmLayer::forwardBatch()
Current Layer forward/backward stack is
@ 0x58538a paddle::LstmLayer::forward()
Current Layer forward/backward stack is
@ 0x616d74 paddle::NeuralNetwork::forward()
Current Layer forward/backward stack is
@ 0x6211c6 paddle::TrainerThread::forward()
Current Layer forward/backward stack is
@ 0x623374 paddle::TrainerThread::computeThread()
Current Layer forward/backward stack is
@ 0x7f510e8743d2 execute_native_thread_routine
Current Layer forward/backward stack is
@ 0x7f510f7649d1 start_thread
Current Layer forward/backward stack is
@ 0x7f510e0598fd clone
/data11/dis_ml/deeplearning/paddle/bin/paddle: line 46: 1053 Illegal instruction ${DEBUGGER}
@NIULQfromNJU Hello, it seems that PaddlePaddle uses some CPU instructions (AVX) that your CPU does not support. Please rebuild PaddlePaddle with AVX support disabled using
-DWITH_AVX=OFF
; that will solve your problem.
There is a TODO in the CMake file to automatically select the AVX flag depending on the machine's CPU, but it has not been implemented yet.
Please set -DCMAKE_BUILD_TYPE=Debug -DWITH_AVX=OFF to rebuild PaddlePaddle and make sure there are no errors. Then set -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_AVX=OFF and install it to train your model.
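Assuming an out-of-source CMake build directory (the usual layout; the exact paths here are illustrative, not from the thread), the two-step rebuild described above looks roughly like:

```shell
# step 1: debug build with AVX disabled, to confirm the tree compiles cleanly
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug -DWITH_AVX=OFF
make -j"$(nproc)"
# step 2: once that succeeds, switch to an optimized build with debug info
cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_AVX=OFF
make -j"$(nproc)" && make install
```

CMake caches variables between runs, so re-running cmake in the same directory with new -D flags is enough to switch build types.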
hi @reyoung, I rebuilt paddle with -DWITH_AVX=OFF and then ran the quick start demo, but I hit the same problem as before: LR, WE+LR, and WE+CNN run successfully while WE+LSTM aborts. So strange! Is there any other instruction in the LSTM example that my CPU does not support?
The following is the error output:
I0907 20:30:21.711181 10069 Util.cpp:144] commandline: /data11/paddle/pd/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lstm.py --save_dir=./output --trainer_count=4 --log_period=20 --num_passes=15 --use_gpu=false --show_parameter_stats_period=100 --test_all_data_in_one_period=1
I0907 20:30:21.711364 10069 Util.cpp:113] Calling runInitFunctions
I0907 20:30:21.711556 10069 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-07 20:30:22,156 networks.py:1122] The input order is [word, label]
[INFO 2016-09-07 20:30:22,157 networks.py:1129] The output order is [cost_0]
I0907 20:30:22.174654 10069 Trainer.cpp:169] trainer mode: Normal
I0907 20:30:22.262153 10069 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 20:30:22.288261 10069 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 20:30:22.288434 10069 GradientMachine.cpp:134] Initing parameters..
I0907 20:30:22.491011 10069 GradientMachine.cpp:141] Init parameters done.
I0907 20:30:22.681430 10100 ThreadLocal.cpp:39] thread use undeterministic rand seed:10101
I0907 20:30:22.683939 10101 ThreadLocal.cpp:39] thread use undeterministic rand seed:10102
I0907 20:30:22.699645 10098 ThreadLocal.cpp:39] thread use undeterministic rand seed:10099
I0907 20:30:22.701810 10099 ThreadLocal.cpp:39] thread use undeterministic rand seed:10100
Current Layer forward/backward stack is
LayerName: lstmemory_0
LayerName: fc_layer_0
LayerName: embedding_0
LayerName: word
*** Aborted at 1473251422 (unix time) try "date -d @1473251422" if you are using GNU date ***
Current Layer forward/backward stack is
PC: @ 0x8024f0 (unknown)
Current Layer forward/backward stack is
*** SIGILL (@0x8024f0) received by PID 10069 (TID 0x7f92afa00700) from PID 8398064; stack trace: ***
Current Layer forward/backward stack is
@ 0x7f92c202d710 (unknown)
Current Layer forward/backward stack is
@ 0x8024f0 (unknown)
Current Layer forward/backward stack is
@ 0x587470 paddle::LstmCompute::forwardOneSequence<>()
Current Layer forward/backward stack is
@ 0x5879fa paddle::LstmCompute::forwardBatch<>()
Current Layer forward/backward stack is
@ 0x581d4c paddle::LstmLayer::forwardBatch()
Current Layer forward/backward stack is
@ 0x58538a paddle::LstmLayer::forward()
Current Layer forward/backward stack is
@ 0x616d74 paddle::NeuralNetwork::forward()
Current Layer forward/backward stack is
@ 0x6211c6 paddle::TrainerThread::forward()
Current Layer forward/backward stack is
@ 0x623374 paddle::TrainerThread::computeThread()
Current Layer forward/backward stack is
@ 0x7f92c11353d2 execute_native_thread_routine
Current Layer forward/backward stack is
@ 0x7f92c20259d1 start_thread
Current Layer forward/backward stack is
@ 0x7f92c091a8fd clone
/data11/paddle/pd/bin/paddle: line 46: 10069 Illegal instruction ${DEBUGGER}
@reyoung great!
@NIULQfromNJU Please give us your CPU info: just run cat /proc/cpuinfo
processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
stepping : 2
cpu MHz : 2401.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 10
cpu cores : 4
apicid : 21
initial apicid : 21
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 4800.24
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
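Note that the flags line above lists sse4_1, sse4_2, and aes but has no avx entry: the E5620 is a Westmere-era Xeon, which predates AVX, so an AVX-enabled build dies with SIGILL. A quick check against that flags line (abridged here to a subset of the entries pasted above):

```shell
# abridged subset of the flags line from the cpuinfo output above
flags="mmx fxsr sse sse2 ssse3 sse4_1 sse4_2 popcnt aes"
if echo "$flags" | grep -qw avx; then
  echo "AVX available"
else
  echo "no AVX: build with -DWITH_AVX=OFF"   # this branch fires for the E5620
fi
```

The same grep against the real /proc/cpuinfo is the check that the auto-detection TODO mentioned earlier would need to perform.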
@NIULQfromNJU The code that fixes this error is under review: #51
@NIULQfromNJU The fix has been merged into the master branch. Please check out the latest code; lstm should be OK now.
@reyoung Well done! The updated paddle now runs lstm successfully! thx~
@NIULQfromNJU You're welcome.
If there is anything I can help with, don't hesitate to ask.
Thank you for your attention.