LSTM running error! (about paddle, closed)

paddlepaddle commented on May 4, 2024
LSTM running error!

from paddle.

Comments (13)

reyoung commented on May 4, 2024

OK, we have located the problem. It seems that the LSTM layer uses some AVX instructions. We will fix it in a few days.

lqniunjunlper commented on May 4, 2024

Before building paddle, the error "version GLIBC_2.14 not found" occurred, so I updated glibc from 2.12 to 2.14. Is this OK?

reyoung commented on May 4, 2024

It's very strange that PaddlePaddle didn't print a call stack. If it is convenient for you, could you rebuild PaddlePaddle with the flag '-DCMAKE_BUILD_TYPE=Debug' and rerun this training? Or could you give us the core dump files?

For how to generate a core dump, you can refer to this link: http://stackoverflow.com/questions/17965/how-to-generate-a-core-dump-in-linux-when-a-process-gets-a-segmentation-fault
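
For readers who want to script the core dump step, here is a minimal sketch, not part of PaddlePaddle. It assumes a Linux machine and uses only Python's standard `resource` and `subprocess` modules; the helper names `raise_core_limit` and `run_with_core_dumps` are made up for illustration. It raises the soft core-file size limit to the hard limit for a child process, which is roughly what `ulimit -c` does in a shell.

```python
import resource
import subprocess


def raise_core_limit():
    """Raise the soft core-file size limit to the current hard limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_with_core_dumps(argv):
    """Run argv as a child process whose core limit has been raised.

    preexec_fn runs in the child just before exec, so the limit of the
    calling Python process is left unchanged.
    """
    completed = subprocess.run(argv, preexec_fn=raise_core_limit)
    # On POSIX, a negative return code -N means the child was killed by signal N.
    return completed.returncode
```

You would pass the same trainer command line you normally run as `argv`. Where the core file ends up (or whether one is written at all) depends on the hard limit and on the system's core pattern settings, so treat this as a starting point only.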

lqniunjunlper commented on May 4, 2024

@reyoung
I0907 17:26:32.151026 1053 Util.cpp:144] commandline: /data11/dis_ml/deeplearning/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lstm.py --save_dir=./output_lstm --trainer_count=4 --log_period=1000 --num_passes=15 --use_gpu=false --show_parameter_stats_period=2000 --test_all_data_in_one_period=1
I0907 17:26:32.151208 1053 Util.cpp:113] Calling runInitFunctions
I0907 17:26:32.151401 1053 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-07 17:26:32,723 networks.py:1122] The input order is [word, label]
[INFO 2016-09-07 17:26:32,723 networks.py:1125] The output order is [cost_0]
I0907 17:26:32.740944 1053 Trainer.cpp:169] trainer mode: Normal
I0907 17:26:32.826501 1053 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 17:26:32.856484 1053 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 17:26:32.856694 1053 GradientMachine.cpp:134] Initing parameters..
I0907 17:26:33.070418 1053 GradientMachine.cpp:141] Init parameters done.
I0907 17:26:33.346114 1062 ThreadLocal.cpp:39] thread use undeterministic rand seed:1063
I0907 17:26:33.367995 1065 ThreadLocal.cpp:39] thread use undeterministic rand seed:1066
I0907 17:26:33.373780 1064 ThreadLocal.cpp:39] thread use undeterministic rand seed:1065
Current Layer forward/backward stack is
LayerName: lstmemory_0
LayerName: fc_layer_0
LayerName: embedding_0
LayerName: word
*** Aborted at 1473240393 (unix time) try "date -d @1473240393" if you are using GNU date ***
Current Layer forward/backward stack is
PC: @ 0x8024f0 (unknown)
Current Layer forward/backward stack is
*** SIGILL (@0x8024f0) received by PID 1053 (TID 0x7f50fe12e700) from PID 8398064; stack trace: ***
Current Layer forward/backward stack is
@ 0x7f510f76c710 (unknown)
Current Layer forward/backward stack is
@ 0x8024f0 (unknown)
Current Layer forward/backward stack is
@ 0x587470 paddle::LstmCompute::forwardOneSequence<>()
Current Layer forward/backward stack is
@ 0x5879fa paddle::LstmCompute::forwardBatch<>()
Current Layer forward/backward stack is
@ 0x581d4c paddle::LstmLayer::forwardBatch()
Current Layer forward/backward stack is
@ 0x58538a paddle::LstmLayer::forward()
Current Layer forward/backward stack is
@ 0x616d74 paddle::NeuralNetwork::forward()
Current Layer forward/backward stack is
@ 0x6211c6 paddle::TrainerThread::forward()
Current Layer forward/backward stack is
@ 0x623374 paddle::TrainerThread::computeThread()
Current Layer forward/backward stack is
@ 0x7f510e8743d2 execute_native_thread_routine
Current Layer forward/backward stack is
@ 0x7f510f7649d1 start_thread
Current Layer forward/backward stack is
@ 0x7f510e0598fd clone
/data11/dis_ml/deeplearning/paddle/bin/paddle: line 46: 1053 Illegal instruction ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

reyoung commented on May 4, 2024

@NIULQfromNJU Hello, it seems that PaddlePaddle uses some CPU instructions (AVX) that your CPU does not support. Please rebuild PaddlePaddle with AVX support disabled by passing -DWITH_AVX=OFF. That will solve your problem.

There is a TODO in the CMake file to select the AVX flag automatically depending on the machine's CPU, but it is not implemented yet.

Please rebuild PaddlePaddle with -DCMAKE_BUILD_TYPE=Debug -DWITH_AVX=OFF first and make sure there is no error. Then you can set -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_AVX=OFF and install it to train your model.
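
Until the automatic selection mentioned above exists, the choice can be scripted outside CMake. The following is a rough sketch under stated assumptions, not PaddlePaddle's build logic: it assumes a Linux `/proc/cpuinfo`, assumes `-DWITH_AVX=ON` is the accepted counterpart of the `-DWITH_AVX=OFF` flag quoted in this comment, and the function names are hypothetical.

```python
def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags from the first 'flags' line of the file."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


def avx_cmake_args(cpu_flags, build_type="Debug"):
    """Build the two CMake definitions discussed in this comment.

    WITH_AVX=ON is an assumption; only WITH_AVX=OFF appears in the thread.
    """
    with_avx = "ON" if "avx" in cpu_flags else "OFF"
    return ["-DCMAKE_BUILD_TYPE=" + build_type, "-DWITH_AVX=" + with_avx]
```

For example, `avx_cmake_args(read_cpu_flags(), "RelWithDebInfo")` would produce the arguments to append to the cmake command line on the build machine. Note that this only picks the flag; whether every layer honors it is a separate question for the build itself.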

lqniunjunlper commented on May 4, 2024

Hi @reyoung, I rebuilt paddle with -DWITH_AVX=OFF and then ran the quick start demo. But I have the same problem as before: LR, WE+LR, and WE+CNN run successfully, while WE+LSTM aborts. So strange! Is there any other instruction in the LSTM example that my CPU does not support?
The following is the error output:
I0907 20:30:21.711181 10069 Util.cpp:144] commandline: /data11/paddle/pd/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lstm.py --save_dir=./output --trainer_count=4 --log_period=20 --num_passes=15 --use_gpu=false --show_parameter_stats_period=100 --test_all_data_in_one_period=1
I0907 20:30:21.711364 10069 Util.cpp:113] Calling runInitFunctions
I0907 20:30:21.711556 10069 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-07 20:30:22,156 networks.py:1122] The input order is [word, label]
[INFO 2016-09-07 20:30:22,157 networks.py:1129] The output order is [cost_0]
I0907 20:30:22.174654 10069 Trainer.cpp:169] trainer mode: Normal
I0907 20:30:22.262153 10069 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 20:30:22.288261 10069 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 20:30:22.288434 10069 GradientMachine.cpp:134] Initing parameters..
I0907 20:30:22.491011 10069 GradientMachine.cpp:141] Init parameters done.
I0907 20:30:22.681430 10100 ThreadLocal.cpp:39] thread use undeterministic rand seed:10101
I0907 20:30:22.683939 10101 ThreadLocal.cpp:39] thread use undeterministic rand seed:10102
I0907 20:30:22.699645 10098 ThreadLocal.cpp:39] thread use undeterministic rand seed:10099
I0907 20:30:22.701810 10099 ThreadLocal.cpp:39] thread use undeterministic rand seed:10100
Current Layer forward/backward stack is
LayerName: lstmemory_0
LayerName: fc_layer_0
LayerName: embedding_0
LayerName: word
*** Aborted at 1473251422 (unix time) try "date -d @1473251422" if you are using GNU date ***
Current Layer forward/backward stack is
PC: @ 0x8024f0 (unknown)
Current Layer forward/backward stack is
*** SIGILL (@0x8024f0) received by PID 10069 (TID 0x7f92afa00700) from PID 8398064; stack trace: ***
Current Layer forward/backward stack is
@ 0x7f92c202d710 (unknown)
Current Layer forward/backward stack is
@ 0x8024f0 (unknown)
Current Layer forward/backward stack is
@ 0x587470 paddle::LstmCompute::forwardOneSequence<>()
Current Layer forward/backward stack is
@ 0x5879fa paddle::LstmCompute::forwardBatch<>()
Current Layer forward/backward stack is
@ 0x581d4c paddle::LstmLayer::forwardBatch()
Current Layer forward/backward stack is
@ 0x58538a paddle::LstmLayer::forward()
Current Layer forward/backward stack is
@ 0x616d74 paddle::NeuralNetwork::forward()
Current Layer forward/backward stack is
@ 0x6211c6 paddle::TrainerThread::forward()
Current Layer forward/backward stack is
@ 0x623374 paddle::TrainerThread::computeThread()
Current Layer forward/backward stack is
@ 0x7f92c11353d2 execute_native_thread_routine
Current Layer forward/backward stack is
@ 0x7f92c20259d1 start_thread
Current Layer forward/backward stack is
@ 0x7f92c091a8fd clone
/data11/paddle/pd/bin/paddle: line 46: 10069 Illegal instruction ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

lqniunjunlper commented on May 4, 2024

@reyoung great!

reyoung commented on May 4, 2024

@NIULQfromNJU Please give us your CPU info. Just run cat /proc/cpuinfo and paste the output.

lqniunjunlper commented on May 4, 2024

@reyoung

processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
stepping : 2
cpu MHz : 2401.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 10
cpu cores : 4
apicid : 21
initial apicid : 21
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 4800.24
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
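
The relevant detail in this listing is the flags line: it contains sse4_1 and sse4_2 but no avx entry, which is consistent with the SIGILL in the LSTM layer. A small sketch of how one might check this programmatically is shown here; `parse_cpu_flags` is a hypothetical helper, and the sample string is an abbreviated copy of the listing above, not the full output.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Abbreviated from the listing above: only a few of the reported flags are kept.
sample = (
    "model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz\n"
    "flags : fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt aes\n"
)

flags = parse_cpu_flags(sample)
print("avx" in flags)     # False: this CPU does not report AVX
print("sse4_2" in flags)  # True
```

On a live machine the same function can be fed the contents of /proc/cpuinfo instead of the sample string.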

reyoung commented on May 4, 2024

@NIULQfromNJU The code that will fix this error is under review: #51

reyoung commented on May 4, 2024

@NIULQfromNJU The fix has been merged into the master branch. Please check out the latest master; LSTM should be OK now.

lqniunjunlper commented on May 4, 2024

@reyoung Well done! The updated paddle now runs LSTM successfully! Thanks~

reyoung commented on May 4, 2024

@NIULQfromNJU You're welcome.

If there is anything I can help with, don't hesitate to ask.

Thank you for your attention.
