TrainingStopReason.NumericOverflow. It happens quite a lot but the settings have a hug

Getting a lot of NumericOverflow. (neuralnetwork.net, CLOSED)

sergio0694 commented on May 14, 2024

Getting a lot of NumericOverflow.

from neuralnetwork.net.

Comments (5)

Sergio0694 commented on May 14, 2024

No worries, that was just a general question, I'm glad the CPU part worked great for you! 😄
Feel free to open other issues in the future should you find bugs or have other questions.
Have a nice day!


Sergio0694 commented on May 14, 2024

Hi, thank you for your kind words, happy to hear you like the library 😄
Numeric overflows can happen for a variety of reasons during training, and as you already noticed they depend on the settings that are used. For instance, one of the most common reasons is a learning rate that's too high, or an optimization method that doesn't work well with a specific network architecture.
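To make the learning-rate point concrete, here is a minimal sketch in plain Python (not NeuralNetwork.NET code). It runs gradient descent on the toy function f(w) = w * w and stops as soon as the weight is no longer finite. The function name `gradient_descent` and the returned reason strings are hypothetical, chosen only to mirror the stop reason discussed in this thread.

```python
import math

def gradient_descent(lr, steps=2000, w0=1.0):
    """Minimize f(w) = w * w with plain gradient descent.

    Returns (reason, step, w). The reason strings are illustrative only
    and are not taken from any library.
    """
    w = w0
    for step in range(steps):
        grad = 2.0 * w            # derivative of w * w
        w = w - lr * grad         # each update multiplies w by (1 - 2 * lr)
        if not math.isfinite(w):  # inf or nan: stop instead of training on garbage
            return "NumericOverflow", step, w
    return "Completed", steps, w

print(gradient_descent(lr=0.1)[0])  # |1 - 2 * lr| < 1, so w shrinks toward 0
print(gradient_descent(lr=1.5)[0])  # |1 - 2 * lr| > 1, so |w| doubles every step
```

With `lr=1.5` every step doubles the magnitude of `w`, so it eventually exceeds the floating-point range. A real network is not this simple, but the same mechanism (updates that grow instead of shrink) is one way a rate that is too high ends in an overflow.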

Have you tried to lower your learning rate a bit? That should help.
Another trick that is sometimes used is to pre-train the network with a very low learning rate, so that it becomes more stable to high values in the gradient that may pop up, and then start the actual training. This is because using a high learning rate from the start can throw the network off while the weights are entirely randomized (as they are at the beginning).
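The pre-training trick can be sketched as a two-phase learning-rate schedule. This is a generic Python illustration under assumed values, not the library's API: `learning_rate`, `warmup_lr`, `warmup_steps`, and the numbers below are all hypothetical.

```python
def learning_rate(step, base_lr=0.01, warmup_lr=0.0001, warmup_steps=500):
    """Two-phase schedule: a small constant rate first, then the full rate."""
    if step < warmup_steps:
        return warmup_lr  # "pre-training" phase while the weights are still random
    return base_lr        # actual training phase

for step in (0, 499, 500, 5000):
    print(step, learning_rate(step))
```

In a library that does not expose a schedule, the same effect can be had by running a short training session with the low rate and then a second session with the normal rate on the same network.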

Other possible causes are activation functions that don't work well with the specific network or dataset, or an incorrect initialization of the weights.
Take a look at this: https://stackoverflow.com/questions/33962226/common-causes-of-nans-during-training
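As an illustration of a weight initialization that keeps early activations in a reasonable range, here is a sketch of Glorot (Xavier) uniform initialization in plain Python. It is one widely used scheme, and whether it suits a given network is an assumption. The helper name `glorot_uniform` and the layer sizes are hypothetical.

```python
import math
import random

def glorot_uniform(fan_in, fan_out, seed=0):
    """Sample a fan_in x fan_out weight matrix from U(-limit, limit),
    with limit = sqrt(6 / (fan_in + fan_out))."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

# Hypothetical layer: 16 inputs feeding 32 hidden units.
weights = glorot_uniform(16, 32)
print(len(weights), len(weights[0]))
```

The bound shrinks as the layer gets wider, which keeps the sum over many inputs from growing too large at the start of training.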

I'm sorry that there isn't a single, well-defined answer to this question 😕
Can you post the code you're using for the network? It might be useful to try to figure out what's going on.


Aangbaeck commented on May 14, 2024

Great suggestions. I think I will add a pretraining phase with a lower initial learning rate. From what I can see, that solves the issue. Is this initial instability something that is considered and mitigated in other frameworks? I know it's a thing, I have just never experienced it in Keras, for example.

The data is just a simple 16-byte array as features and a binary output. I couldn't use a 1-dimensional convnet, so I used a linear input and 2 fully connected layers. I use Adam as the optimizer. It's quite basic, and your library is the perfect Keras-like abstraction that solves this problem for me.


Sergio0694 commented on May 14, 2024

Yeah, using a small pretraining is often enough, assuming the rest of the parameters are fine.
I can confirm that the issue is present in other frameworks as well (it's a mathematical consequence of how the training is done, not an implementation detail). For instance, it happened quite a lot with TensorFlow in some situations.

If you've never experienced it in Keras, my best guess would be that it's using more conservative settings by default (which would make sense, since Keras is arguably a pretty entry-level framework compared to TensorFlow and others). Or it could be doing an automatic initial mini training like you're doing now, or some other workaround like that.

Other than this, let me know if using the pretraining method allowed you to train the network correctly!

EDIT: just to get some general feedback, are you using the CPU or GPU APIs? If you're running the network on GPU, have you had trouble setting the library up to run correctly?


Aangbaeck commented on May 14, 2024

I have not used the GPU settings yet. I need very high-frequency evaluation (maybe about 5000 forward passes a second) and I think the overhead of the GPU means it won't benefit me. For harder problems it's definitely worth it, but I have not tried yet, so I don't really know. I will test it in the future when I get some more time and will post here then.

