Comments (7)

MaxGhi8 commented on September 27, 2024

Hi @brfi3983,
I'm not an expert, but I can try to help you find the bug...

  • The first thing I notice is the scheduler... Personally, I would try increasing the step_size (maybe 10 is a better value than 1); a minimal sketch follows this list.

  • In the test loader it can be helpful to increase the batch_size (maybe 32, depending on the specs of your computer), but this does not affect your results, only the computation time.
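
A rough sketch of the scheduler suggestion (the model, learning rate, and epoch count below are placeholders, not values from the paper):

import torch

model = torch.nn.Linear(16, 16)  # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Decay the learning rate by `gamma` every 10 epochs instead of every epoch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.98)

for epoch in range(200):
    # ... one training epoch would go here ...
    scheduler.step()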

Let me know if this can be helpful for you.

brfi3983 commented on September 27, 2024

Hi @MaxGhi8 ,

Thanks for the response.

  1. So, I am following the paper, where they mention that the learning rate is multiplied by a factor $\gamma$ every epoch. In that sense, I use $\gamma=0.98$, and since I am using PyTorch Lightning, the scheduler is configured to step every epoch (a minimal sketch of this wiring follows the list). I could configure it to look more like their code and try a step_size of 10 or 15, but I am just trying to follow their best-hyperparameter table to reproduce errors of ~3% on NS.
  2. Batch size of 1 is just for convenience as my script iterates through it - but as you said, it does not affect results - so I am more concerned about my setup for training.
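
A minimal sketch of the Lightning wiring I mean (the module, layer, learning rate, and weight decay below are placeholders rather than the paper's values):

import torch
import pytorch_lightning as pl

class NSOperator(pl.LightningModule):
    # Hypothetical stand-in module; only the optimizer/scheduler wiring matters here.
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(64, 64)

    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.parameters(), lr=1e-3, weight_decay=1e-6)
        # Multiply the learning rate by gamma = 0.98 once per epoch.
        scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "interval": "epoch", "frequency": 1},
        }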

Are you able to get 3% on any of the models with the NS dataset (1024 samples)? If so, would you mind sharing your hyperparameter setup?

Thanks again!

MaxGhi8 commented on September 27, 2024

Hi @brfi3983, I agree with your hyperparameter setup... The only thing that seems to be different is the normalization of the output functions. It looks like you do not perform any normalization on the output, whereas the authors apply min-max scaling to the outputs as well. Let me know if this resolves your issue...

bogdanraonic3 commented on September 27, 2024

I agree with what @MaxGhi8 mentioned and have nothing more to add. Perhaps you could try using the exact loaders and evaluation metrics that we use. I don't see any issues with your code. Please let me know if @MaxGhi8's suggestions were helpful.

brfi3983 commented on September 27, 2024

Hi @MaxGhi8 and @bogdanraonic3,

So I was initially confused by the suggestion about the scaling, as I had indeed scaled my test set with the parameters from the training set. However, while drafting a reply to this suggestion, I realized what you meant, and I think that might have been the issue.

More specifically, I believe you were referring to this line:

labels = (labels - self.min_model)/(self.max_model - self.min_model)

I then reran my model and am now getting about a 4% relative-L1 test error (after 200 epochs and some slight overfitting), which seems to be close enough to the 3.57% error shown in the paper (for the FNO example, at least).
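
For reference, a generic relative L1 error of the kind reported here can be sketched as follows (an illustrative implementation; the repository's own metric may differ in its exact reduction and normalization):

import torch

def relative_l1_error(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Flatten everything except the batch dimension.
    pred = pred.reshape(pred.shape[0], -1)
    target = target.reshape(target.shape[0], -1)
    err = (pred - target).abs().sum(dim=1) / target.abs().sum(dim=1)
    return 100.0 * err.mean()  # percentage, averaged over the batch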

To ask a follow-up question: I was always used to scaling my input variables for better optimization, but I had the understanding that it was not really necessary to also scale the labels, since, if the model learns properly, it is just mapping to a different scale of outputs, regardless of whether the labels are in one unit/scale or another. Is there an intuitive explanation for this? Is there a reason why it would make such a massive difference? My intuition wants to say that by scaling the labels you are changing the underlying manifold that the network has to map onto, which might make it easier to find optima, but this is just a guess. Is scaling the labels a common practice in CV/SciML, or is it a new paradigm with NOs because of some specific model assumption?

Thanks again for all the support!

MaxGhi8 commented on September 27, 2024

Hi,
I was thinking of exactly the line that you marked!
I have the same intuition about scaling the data. Indeed, if you think of linear regression and suppose you have two input features, one ranging over [0, 1000] and the other over [0, 0.001], then learning can be hard because the data are strongly compressed in one direction and small changes in the network's parameters can drastically change the results.
By similar reasoning, you should scale the output as well. The network's parameters are typically initialized with zero mean, so if your labels take values in [100, 1000] the gradients can be huge; hence you should normalize the labels too. If you plot the loss during training, I expect that without label normalization the curve is very irregular, whereas with normalization it is smoother and without large jumps (a toy illustration of this is sketched below).
I have always seen this practice in the context of NOs, but I think it is good practice in standard ML and deep learning as well.
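
A toy numeric illustration of that point (my own sketch, not code from the repository): with labels in [100, 1000], the initial loss and gradients of a freshly initialized layer are several orders of magnitude larger than with min-max scaled labels.

import torch

torch.manual_seed(0)
layer = torch.nn.Linear(8, 1)   # freshly initialized, so its outputs are near zero
x = torch.randn(64, 8)

raw_labels = torch.empty(64, 1).uniform_(100.0, 1000.0)
scaled_labels = (raw_labels - 100.0) / (1000.0 - 100.0)   # min-max scale to [0, 1]

for name, y in [("raw", raw_labels), ("scaled", scaled_labels)]:
    layer.zero_grad()
    loss = torch.nn.functional.mse_loss(layer(x), y)
    loss.backward()
    print(name, float(loss), float(layer.weight.grad.abs().mean()))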

Of course, the normalization parameters (min-max or mean-std) have to be computed on the training set, and the same values must then be used for the test set (as the authors do in this repository).
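
In code, that pattern looks roughly like the following (the tensor shapes and names are placeholders):

import torch

# Stand-in tensors; the real code would load the NS training/test outputs.
train_labels = torch.randn(1024, 64, 64)
test_labels = torch.randn(128, 64, 64)

# Statistics computed on the training set only...
min_model = train_labels.min()
max_model = train_labels.max()

# ...and reused unchanged for the test set.
train_labels = (train_labels - min_model) / (max_model - min_model)
test_labels = (test_labels - min_model) / (max_model - min_model)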

As a final rule of thumb, I encourage you to give as much importance to the data as to the model; this can be a real time-saver.

Let me know if this answers your question or if you have any other questions or curiosities.
Have a nice day!

brfi3983 commented on September 27, 2024

Hi,

Yes, I was able to play around with the normalization on the input as well as the output and see its effects. The gradient argument also makes sense - I think I was previously used to classification problems, which is why I had the preconceived notion of not normalizing the labels.

In any case, I was able to test out different normalization combinations (channel-wise, assuming the min is 0 or the mean is 0, and switching to standardization), and all seem to be working as intended. As such, I will be closing this issue, as I feel satisfied and appreciate all the feedback and help received :)

Have a good day too!
