
bogdanraonic3 / convolutionalneuraloperator

124 stars · 5 watching · 12 forks · 15.04 MB

This repository is the official implementation of the paper Convolutional Neural Operators for robust and accurate learning of PDEs

Home Page: https://arxiv.org/abs/2302.01178

Python 75.43% C++ 5.98% Cuda 18.59%
deep-learning ml4physics benchmark cno ml4science neural-operator neural-operators pytorch fno navier-stokes navier-stokes-equations partial-differential-equations pde pde-solver scientific-computing

convolutionalneuraloperator's Issues

Simple Code Snippets to load different datasets

Hi,
It would be great if you could provide code snippets for loading each of the different datasets from their h5 files, as that would show how the data is stored and how to split it into train and test sets.
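A minimal sketch of what I have in mind, assuming each sample is stored as an HDF5 group with 'input' and 'output' datasets (the layout suggested by the loading code in the reproduction report further down; the file name and split sizes are illustrative, not from the repo):

import h5py
import numpy as np

def load_h5_pairs(path):
    # Read every sample group's 'input'/'output' datasets into memory.
    with h5py.File(path, 'r') as f:
        inputs = np.stack([np.array(f[key]['input']) for key in f.keys()])
        outputs = np.stack([np.array(f[key]['output']) for key in f.keys()])
    return inputs, outputs

X, y = load_h5_pairs('NavierStokes_64x64_IN.h5')  # hypothetical file name
X_train, y_train = X[:768], y[:768]
X_valid, y_valid = X[768:896], y[768:896]
X_test, y_test = X[896:1024], y[896:1024]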

Unable to understand how it's mesh invariant

Hi,
I went through the paper and codebase. I have doubts about how the model is invariant to mesh size: if we train on 48x48, how do we transfer the solution to 85x85, like super-resolution in FNO?
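For context, the generic workaround I know is to resample: interpolate the 85x85 input down to the 48x48 training grid, run the model, and interpolate the prediction back up, as sketched below. Whether the CNO layers can instead be evaluated directly on the finer grid (the way FNO's spectral layers can) is exactly what I am unsure about; this snippet is an assumption on my part, not the repo's method.

import torch
import torch.nn.functional as F

def predict_at_resolution(model, u, train_size=48):
    # u: (batch, channels, 85, 85) input sampled on the finer grid.
    out_size = u.shape[-1]
    # Resample down to the resolution the model was trained on ...
    u_low = F.interpolate(u, size=(train_size, train_size),
                          mode='bicubic', align_corners=False)
    with torch.no_grad():
        v_low = model(u_low)
    # ... and resample the prediction back to the original grid.
    return F.interpolate(v_low, size=(out_size, out_size),
                         mode='bicubic', align_corners=False)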

Trouble reproducing results shown in the paper

Overview

For whatever reason, I cannot seem to reproduce the results shown in the paper. I suspect I am doing something specific that is skewing the results, as I cannot match the reported accuracy for UNet, FNO, or CNO. I am using my own setup rather than a clone of the repo, but with the same models and configuration, so I would expect the same results. Unfortunately, my error for every model on the Navier-Stokes case is at least 10%, not the ~3% shown in the paper. Below I outline exactly what my setup is; if there is a glaring mistake, I would appreciate any feedback.

Data

Dataset: Navier-Stokes 64x64 dataset. (Assume this is the case for everything below.)

Data splits: I split the data into training, validation, and test sets the same way you do:

# Import data
import h5py
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import transforms

f = h5py.File('../NavierStokes_64x64_IN.h5', 'r')
x = []
y = []
for key in f.keys():
    x.append(np.array(f[key]['input']))   # read each sample into memory
    y.append(np.array(f[key]['output']))
f.close()                                  # safe to close once copies are made
X = np.array(x)
y = np.array(y)

# Add a channel axis
X = X[:, np.newaxis, ...]
y = y[:, np.newaxis, ...]

# Split data up into train, val, test
X_train, y_train = X[:768], y[:768]
X_valid, y_valid = X[768:768 + 128], y[768:768 + 128]
X_test, y_test = X[768 + 128:768 + 128*2], y[768 + 128:768 + 128*2]

# Transform data (it seems you use this normalization, based on your code)
min_data, max_data = np.min(X_train), np.max(X_train)
# min_model, max_model = np.min(y_train), np.max(y_train)  # not used - I only apply the transformation to X

class NormalizeMinMax(torch.nn.Module):
    def __init__(self, img_min, img_max):
        super().__init__()  # required when subclassing nn.Module
        self.img_min = img_min
        self.img_max = img_max

    def __call__(self, img):
        new_img = (img - self.img_min) / (self.img_max - self.img_min)
        return new_img
    
transform = transforms.Compose(
    [
        NormalizeMinMax(min_data, max_data),
    ]
)

# Convert to tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)

X_valid = torch.tensor(X_valid, dtype=torch.float32)
y_valid = torch.tensor(y_valid, dtype=torch.float32)

X_train = transform(X_train)
X_valid = transform(X_valid)
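One variant worth ruling out (purely a hypothesis on my side, not code from the repo): only X is normalized above, and the commented-out min_model/max_model line suggests target normalization was at least considered. If the reference pipeline also min-max normalizes the targets, my model would be trained against a different scale. A sketch of that variant:

# Hypothetical: normalize targets too, and invert before computing errors.
min_model, max_model = y_train.min(), y_train.max()
y_train = (y_train - min_model) / (max_model - min_model)
y_valid = (y_valid - min_model) / (max_model - min_model)

def denormalize(pred):
    # Map predictions back to the original output scale.
    return pred * (max_model - min_model) + min_model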

Hyperparameters

Here, I use the models supplied in the repository within a PyTorch Lightning setup. Below is some pseudocode describing the hyperparameter/optimization setup. I step the LR scheduler once per epoch, so after each epoch the LR becomes LR * 0.98.

loss_function = nn.L1Loss() # CNO and UNet
loss_function = nn.SmoothL1Loss() # FNO

batch_size = 32 # CNO and FNO
batch_size = 10 # UNet

optimizer = torch.optim.AdamW(self.parameters(), lr=1e-3, weight_decay=1e-6) # FNO
optimizer = torch.optim.AdamW(self.parameters(), lr=1e-3, weight_decay=1e-10) # CNO
optimizer = torch.optim.AdamW(self.parameters(), lr=5e-4, weight_decay=1e-6) # UNet

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.98) #unet, cno, and fno
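For completeness, here is roughly how I wire this into Lightning so the scheduler steps once per epoch (a sketch of my setup, not code from the repo; the weight decay shown is the CNO value):

import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    # ... training_step, validation_step, etc. omitted ...
    def configure_optimizers(self):
        # AdamW + per-epoch exponential decay: LR <- LR * 0.98 after every epoch.
        optimizer = torch.optim.AdamW(self.parameters(), lr=1e-3, weight_decay=1e-10)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.98)
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "interval": "epoch"},
        }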

Model Setup

Here are some code snippets showing how I initialize each of the models. Each uses the best-performing hyperparameters you mention in the paper.

# CNO
class CNO2d(nn.Module):
    def __init__(self,
                in_dim = 1,                # Number of input channels.
                out_dim = 1,               # Number of output channels.
                size = 64,                 # Input and output spatial size (required)
                N_layers = 3,              # Number of (D) or (U) blocks in the network
                N_res = 1,                 # Number of (R) blocks per level (except the neck)
                N_res_neck = 8,            # Number of (R) blocks in the neck
                channel_multiplier = 32,   # How the number of channels evolves
                use_bn = False,            # Add BN? We do not add BN in the lifting/projection layer
                ):
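A quick sanity check of how I call the model, assuming the full CNO2d implementation from the repo (the shapes are what I expect, not verified against any repo documentation):

cno = CNO2d(in_dim=1, out_dim=1, size=64, N_layers=3,
            N_res=1, N_res_neck=8, channel_multiplier=32, use_bn=False)
u = torch.randn(32, 1, 64, 64)  # (batch, channels, H, W)
v = cno(u)                      # expected output shape: (32, 1, 64, 64)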
# FNO
class FNO2d(nn.Module):
    def __init__(self, in_channels = 1, out_channels = 1, device=None):
        super(FNO2d, self).__init__()

        """
        The overall network. It contains 4 layers of the Fourier layer.
        1. Lift the input to the desired channel dimension by self.fc0 .
        2. 4 layers of the integral operators u' = (W + K)(u).
            W defined by self.w; K defined by self.conv .
        3. Project from the channel space to the output space by self.fc1 and self.fc2 .

        input: the solution of the coefficient function and locations (a(x, y), x, y)
        input shape: (batchsize, x=s, y=s, c=3)
        output: the solution
        output shape: (batchsize, x=s, y=s, c=1)
        """
        self.modes1 = 16 #16
        self.modes2 = 16 #16
        self.width = 128 #64
        self.n_layers = 5 #5
        self.retrain_fno = 4 #4
        self.padding = 0 #0
        self.include_grid = 1 #1
        self.input_dim = in_channels
        self.act  = nn.LeakyReLU() 
        self.device = device
# UNet
class UNet(nn.Module):
    def __init__(self, n_channels = 1, n_classes = 1, bilinear=False):
        super(UNet, self).__init__()
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.bilinear = bilinear

        self.inc = (DoubleConv(n_channels, 64))
        self.down1 = (Down(64, 128))
        self.down2 = (Down(128, 256))
        self.down3 = (Down(256, 512))
        factor = 2 if bilinear else 1
        self.down4 = (Down(512, 1024 // factor))
        self.up1 = (Up(1024, 512 // factor, bilinear))
        self.up2 = (Up(512, 256 // factor, bilinear))
        self.up3 = (Up(256, 128 // factor, bilinear))
        self.up4 = (Up(128, 64, bilinear))
        self.outc = (OutConv(64, n_classes))

Error Calculations

Finally, I compute the relative L1 error as defined in your paper. I apply the function below to each sample in my test set via a simple enumerated for loop over y_predicted, as sketched after the snippet.

X_test = torch.tensor(X_test, dtype=torch.float32)
X_test = transform(X_test)
test_loader = DataLoader(list(X_test), shuffle=False, batch_size=1)
y_predicted = trainer.predict(network, test_loader)

def relative_l1_error(y_pred, y_true):
    # Mean absolute error normalized by the mean magnitude of the target.
    return torch.mean(torch.abs(y_pred - y_true)) / torch.mean(torch.abs(y_true))
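The evaluation loop itself looks roughly like this (trainer.predict returns one prediction per batch, and batch_size=1 here, so each element is a single sample):

y_test = torch.tensor(y_test, dtype=torch.float32)

errors = []
for i, y_pred in enumerate(y_predicted):
    # Compare each single-sample prediction against its ground truth.
    errors.append(relative_l1_error(y_pred, y_test[i:i+1]).item())
print(f"Mean relative L1 error: {100 * np.mean(errors):.2f}%")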

Results

Now come the questions. As explained above, I cannot get results below 10% error on any of the models, while I am aiming for the 2.5-3.5% relative L1 errors reported in the paper for CNO, FNO, and UNet.

My training loss with SmoothL1Loss goes down to 1e-5 and the validation loss down to 0.005; however, with L1Loss I cannot seem to break 10%, and the loss simply stagnates, even with the given hyperparameters.

All models were trained without early stopping for up to 1000 epochs, but since the loss stopped improving much earlier, I usually ended training sooner.

Conclusion

If there is anything in this setup that is clearly an issue, I would appreciate the feedback. If my understanding of any of the results is incorrect, that would be good to know as well. My main questions concern the model setups: the code in the repo and the paper differ noticeably, so I was not sure which setup was intended. I assumed the paper was correct and followed its tables regardless of the code in the repo. Thank you very much for taking the time to read this far; I hope the fix is simple and that it was just a mistake on my part, but either way I am interested to hear what you say. Have a great rest of your day :)
