FourCastNeXt

Overview

This repo contains scripts to perform FourCastNeXt training and inference using ERA5 data from NCI project rt52.

For technical details of the model and training methods, please refer to the preprint.

Citation

@article{guo2024fourcastnext,
  title={FourCastNeXt: Improving FourCastNet Training with Limited Compute},
  author={Edison Guo and Maruf Ahmed and Yue Sun and Rahul Mahendru and Rui Yang and Harrison Cook and Tennessee Leeuwenburg and Ben Evans},
  journal={arXiv preprint arXiv:2401.05584},
  year={2024}
}

Setup

  • Request to join NCI project rt52 on Mancini.

  • Run bash setup.sh to set up the environment. This script sets up a Python virtualenv with all the dependencies. The virtualenv directory python_env is in the same directory as setup.sh.

  • The entry point for training is run_trainer.pbs. The inference script is run_inference.pbs. Before you run these scripts, open them in a text editor and fill in <your NCI project> in run_trainer.pbs, and <output path> and <checkpoint path> in run_inference.pbs.
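If you prefer not to edit the placeholders by hand, the substitution can be scripted. The following is a minimal sketch, not part of this repo: the helper name fill_placeholders, the project code ab12, and the example script line are all hypothetical, and the only thing taken from the instructions above is the placeholder names.

```python
def fill_placeholders(script_text, values):
    """Replace each "<name>" placeholder in script_text with its value.

    Raises KeyError if a placeholder is not present, so a typo in a
    placeholder name is reported instead of silently ignored.
    """
    for name, value in values.items():
        token = f"<{name}>"
        if token not in script_text:
            raise KeyError(f"placeholder {token} not found")
        script_text = script_text.replace(token, value)
    return script_text


# Hypothetical example line; the real run_trainer.pbs may look different.
template = "#PBS -P <your NCI project>\n"
print(fill_placeholders(template, {"your NCI project": "ab12"}), end="")
```

For the real scripts, you would read the file contents, pass them through the helper with your own project code and paths, and write the result back before submitting the job.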

Training cluster

run_trainer.pbs sets up a training cluster. The training cluster consists of a GPU cluster for Distributed Data Parallel (DDP) training and a Ray cluster for data loading. The Ray cluster uses the current GPU node as the coordinator and launches three separate CPU jobs on Gadi for the data workers. The data workers join the Ray cluster as soon as their CPU jobs start, and they are shut down automatically when the Ray coordinator shuts down.
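The coordinator and data worker lifecycle described above can be illustrated with a small stand-in. This sketch deliberately uses Python threads and queues from the standard library instead of Ray and PBS jobs, so it shows only the pattern (workers join, serve data requests, and stop when the coordinator shuts down), not this repo's implementation. All names in it, and the fake sample data, are hypothetical.

```python
import queue
import threading

NUM_DATA_WORKERS = 3  # mirrors the three CPU jobs mentioned above
SHUTDOWN = object()   # sentinel the coordinator sends to stop a worker


def data_worker(worker_id, tasks, batches):
    """Serve sample requests until the coordinator signals shutdown."""
    while True:
        item = tasks.get()
        if item is SHUTDOWN:
            break
        # Hypothetical stand-in for loading one ERA5 sample.
        batches.put((worker_id, item, [float(item)] * 4))


def run_coordinator(num_samples):
    """Start workers, collect num_samples results, then shut workers down."""
    tasks = queue.Queue()
    batches = queue.Queue()
    workers = [
        threading.Thread(target=data_worker, args=(i, tasks, batches))
        for i in range(NUM_DATA_WORKERS)
    ]
    for w in workers:
        w.start()  # workers "join the cluster" as soon as they start
    for idx in range(num_samples):
        tasks.put(idx)
    received = [batches.get() for _ in range(num_samples)]
    # Coordinator shutdown: every worker receives one sentinel and exits.
    for _ in workers:
        tasks.put(SHUTDOWN)
    for w in workers:
        w.join()
    return received, [w.is_alive() for w in workers]


received, alive = run_coordinator(6)
print(sorted(idx for _, idx, _ in received), alive)
```

In the real setup the trainer on the GPU node plays the coordinator role and the workers run in separate CPU jobs, but the ordering of events is the same: workers attach when they start, and none outlive the coordinator.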
