Comments (5)

Sergio0694 commented on May 14, 2024

No worries, that was just a general question, I'm glad the CPU part worked great for you! 😄
Feel free to open other issues in the future should you find bugs or have other questions.
Have a nice day!

from neuralnetwork.net.

Sergio0694 commented on May 14, 2024

Hi, thank you for your kind words, happy to hear you like the library 😄
Numeric overflows can happen for a variety of reasons during training, and as you already noticed they depend on the settings that are used. For instance, two of the most common causes are a learning rate that's too high and an optimization method that doesn't work well with a specific network architecture.
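
To make the learning rate point concrete, here is a minimal sketch in plain Python (not NeuralNetwork.NET code). It assumes a toy one-parameter problem, f(w) = w**2, and the function name and rates are made up for illustration. With a small rate the weight shrinks toward the minimum, while with a rate that is too high every step overshoots further until the value overflows.

```python
import math

def gradient_descent(learning_rate, steps=2000, w=1.0):
    """Minimize the toy function f(w) = w**2 with plain gradient descent."""
    for _ in range(steps):
        grad = 2.0 * w                # derivative of w**2
        w = w - learning_rate * grad  # standard gradient descent update
    return w

# Each step multiplies w by (1 - 2 * learning_rate).
stable = gradient_descent(0.1)    # factor 0.8: w shrinks toward 0
diverged = gradient_descent(1.5)  # factor -2: |w| doubles every step

print(abs(stable) < 1e-6)         # the small rate converged
print(math.isfinite(diverged))    # the large rate overflowed
```

A real network fails in a less regular way, but the mechanism is the same: updates that overshoot make the weights grow until they no longer fit in a floating point value.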

Have you tried to lower your learning rate a bit? That should help.
Another trick that is sometimes used is to pre-train the network with a very low learning rate, so that it becomes more robust to large gradient values that may pop up, and then start the actual training. This is because using a high learning rate from the start can throw the network off while the weights are entirely randomized (as they are at the beginning).
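
The pre-training trick can be sketched as a two-phase learning rate schedule. This is plain Python and only an illustration: the function name, the number of warm-up steps and both rates are assumptions, not values taken from NeuralNetwork.NET or from this thread.

```python
def warmup_learning_rate(step, warmup_steps=100, warmup_rate=1e-4, base_rate=1e-2):
    """Return a very low rate during the warm-up phase, then the regular rate.

    All default values here are hypothetical and would need tuning.
    """
    if step < warmup_steps:
        return warmup_rate
    return base_rate

# Rate used at a few points of a hypothetical training run.
for step in (0, 99, 100, 5000):
    print(step, warmup_learning_rate(step))
```

With a library that does not expose a schedule, the same effect can be had by running one short training session with the low rate and then a second one with the regular rate on the same network.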

Other possible causes are activation functions that don't work well with the specific network or dataset, or an incorrect initialization of the weights.
Take a look at this: https://stackoverflow.com/questions/33962226/common-causes-of-nans-during-training
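
As an example of a reasonable weight initialization, here is a sketch of Glorot (Xavier) uniform initialization in plain Python. It is a common general-purpose scheme, shown here as an assumption about what a sensible setup could look like, not as what NeuralNetwork.NET does internally. The layer sizes (16 inputs, 8 outputs) are hypothetical.

```python
import math
import random

def glorot_uniform(fan_in, fan_out, rng=None):
    """Sample a fan_in x fan_out weight matrix from U(-limit, limit),
    with limit = sqrt(6 / (fan_in + fan_out))."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

# Hypothetical layer: 16 input features feeding 8 hidden units.
weights = glorot_uniform(16, 8)
largest = max(abs(w) for row in weights for w in row)
print(len(weights), len(weights[0]), largest <= 0.5)
```

Scaling the range by the layer size keeps the initial activations and gradients at a moderate magnitude, which makes early overflows less likely.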

I'm sorry that there isn't a single, well-defined answer to this question 😕
Can you post the code you're using for the network? It might be useful to try to figure out what's going on.

Aangbaeck commented on May 14, 2024

Great suggestions. I think I will add an initial pretraining phase with a lower learning rate. From what I can see, that solves the issue. Is this initial instability something that is considered and mitigated in other frameworks? I know it's a thing, I have just never experienced it in Keras, for example.

The network just takes a simple 16-byte array as features and produces a binary output. I couldn't use a one-dimensional convnet, so I used a linear input and 2 fully connected layers. I use Adam as the optimizer. It's quite basic, and your library is the perfect Keras-like abstraction that solves this problem for me.

Sergio0694 commented on May 14, 2024

Yeah, using a small pretraining is often enough, assuming the rest of the parameters are fine.
I can confirm that the issue is present in other frameworks as well, since it's a mathematical consequence of how the training is done rather than an implementation detail. For instance, it happened quite a lot with TensorFlow in some situations.

If you've never experienced it in Keras, my best guess would be that it uses more conservative settings by default (which would make sense, since Keras is arguably a pretty entry-level framework compared to TensorFlow and others). It could also be doing an automatic initial mini training like you're doing now, or some other workaround like that.

Other than this, let me know if using the pretraining method allowed you to train the network correctly!

EDIT: just to get some general feedback, are you using the CPU or GPU APIs? If you're running the network on GPU, have you had trouble setting the library up to run correctly?

Aangbaeck commented on May 14, 2024

I have not used the GPU settings yet. I need very frequent evaluation (maybe about 5000 forward passes a second), and I think the GPU overhead would outweigh any benefit for me. For harder problems it's definitely worth it, but I have not tried it yet, so I don't really know. I will test it in the future when I get some more time, and will post here then.
