
pankajmehar / pycrop-yield-prediction


This project forked from gabrieltseng/pycrop-yield-prediction


A PyTorch Implementation of Jiaxuan You's Deep Gaussian Process for Crop Yield Prediction

License: MIT License



PyCrop Yield Prediction

A PyTorch implementation of Jiaxuan You's 2017 Crop Yield Prediction Project.

Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data

This paper won the Food Security category of the World Bank's 2017 Big Data Innovation Challenge.

Introduction

This repo contains a PyTorch implementation of the Deep Gaussian Process for Crop Yield Prediction. It draws from the original TensorFlow implementation.

Deep Gaussian Processes combine the expressivity of Deep Neural Networks with Gaussian Processes' ability to leverage spatial and temporal correlations between data points.
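The following sketch makes the Gaussian Process half of that idea concrete. It fits a zero-mean GP to the residuals of a network's predictions. The kernel is a squared-exponential over county location and year, so nearby counties and adjacent years share information about where the network tends to be wrong. This is a minimal NumPy sketch under stated assumptions: the function name, the length scales `r_loc` and `r_year`, and the noise level `sigma_e` are hypothetical. It is not this repository's implementation.

```python
import numpy as np


def gp_residual_correction(loc_train, year_train, resid_train, loc_test, year_test,
                           sigma=1.0, r_loc=0.5, r_year=1.5, sigma_e=0.1):
    """Estimate the network's error at test counties from its errors at training counties.

    loc_*: (n, 2) arrays of county coordinates; year_*: (n,) arrays of years;
    resid_train: (n,) observed yield minus the network's prediction.
    The hyperparameter defaults are illustrative assumptions, not tuned values.
    """

    def kernel(loc_a, year_a, loc_b, year_b):
        # Squared distances in space and in time, each scaled by its own length scale.
        d_loc = ((loc_a[:, None, :] - loc_b[None, :, :]) ** 2).sum(axis=-1) / r_loc ** 2
        d_year = (year_a[:, None] - year_b[None, :]) ** 2 / r_year ** 2
        return sigma ** 2 * np.exp(-0.5 * (d_loc + d_year))

    k_train = kernel(loc_train, year_train, loc_train, year_train)
    k_cross = kernel(loc_test, year_test, loc_train, year_train)
    # Posterior mean of a zero-mean GP: K_test,train (K_train + sigma_e^2 I)^-1 r_train
    noisy = k_train + sigma_e ** 2 * np.eye(len(resid_train))
    return k_cross @ np.linalg.solve(noisy, resid_train)
```

The corrected prediction for a test county is then `nn_prediction + gp_residual_correction(...)`. The correction is close to the observed residuals near training counties and fades to zero far from any of them in space or time.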

In this pipeline, a Deep Gaussian Process is used to predict soybean yields in US counties.

Results

These results were generated using early stopping with a patience of 10. They can be replicated by running the pipeline with all the default arguments.
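Early stopping with a patience of 10 means that training halts once the validation loss has gone 10 consecutive epochs without improving, and the best epoch seen so far is kept. The sketch below is minimal and framework-agnostic. The `EarlyStopping` helper is hypothetical and is not this repository's code.

```python
class EarlyStopping:
    """Track validation loss and signal when training should stop."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
            # In a real training loop, the best model weights would be saved here.
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Toy losses: improvement stalls after epoch 2, so epoch 13 is never reached.
losses = [5.0, 4.0, 3.5] + [3.6] * 10 + [1.0]
stopper = EarlyStopping(patience=10)
for epoch, loss in enumerate(losses):
    if stopper.step(epoch, loss):
        break
print(epoch, stopper.best_epoch, stopper.best_loss)  # prints: 12 2 3.5
```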

  • A comparison of the RMSE (in bushels per acre) of the two models, LSTM and CNN, each with and without the Gaussian Process. As in the original paper, this was generated by averaging the results of two runs, to account for random initialization in the neural network:

    Year   LSTM   LSTM + GP   CNN    CNN + GP
    2009   5.18   6.37        6.07   5.56
    2010   7.27   7.30        6.75   7.03
    2011   6.82   6.72        6.77   6.40
    2012   7.01   6.46        5.91   5.72
    2013   5.91   5.83        6.41   6.00
    2014   5.99   4.65        5.28   4.87
    2015   6.14   5.13        6.18   5.36

  • A plot of the errors of the CNN model for the year 2014, with and without the Gaussian Process. The color represents prediction error, in bushels per acre:

[Figure: CNN errors for 2014, with and without the Gaussian Process]
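Each RMSE in the table is the square root of the mean squared difference between predicted and observed county yields. The sketch below computes RMSE for each run and then averages across runs. It assumes that the per-run RMSEs were averaged, rather than the predictions being averaged first. The numbers in it are toy values, not results from this project.

```python
import numpy as np


def rmse(pred: np.ndarray, true: np.ndarray) -> float:
    """Root mean squared error, in the units of the inputs (here bushels per acre)."""
    return float(np.sqrt(np.mean((pred - true) ** 2)))


def mean_rmse_over_runs(run_predictions, true: np.ndarray) -> float:
    """Average the RMSE of several independently initialized runs."""
    return float(np.mean([rmse(p, true) for p in run_predictions]))


# Toy observed yields and two runs' predictions for four counties.
true = np.array([40.0, 50.0, 45.0, 55.0])
run_a = np.array([42.0, 48.0, 45.0, 58.0])
run_b = np.array([40.0, 53.0, 41.0, 55.0])
print(round(mean_rmse_over_runs([run_a, run_b], true), 3))
```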

Pipeline

The main entry point into the pipeline is run.py. The pipeline is split into four major components. Each component reads the files written by the previous step and saves all the files that later steps will need into the data folder.

The parameters that can be passed to each step are documented in run.py. The default parameters are all taken from the original repository.

Python Fire is used to generate command line interfaces.
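The step-by-step contract above can be sketched as a small class whose public methods each map to one command. In this sketch, each step refuses to run until the folder written by the previous step exists. The folder names, the `RunTask` class and the `data_dir` argument are all hypothetical. The real run.py documents its own parameters.

```python
from pathlib import Path

# step name -> (folder it reads, folder it writes); names are illustrative only.
STEPS = {
    "export": (None, "exported"),
    "process": ("exported", "processed"),
    "engineer": ("processed", "histograms"),
    "train_cnn": ("histograms", "models/cnn"),
    "train_rnn": ("histograms", "models/rnn"),
}


class RunTask:
    """Each public method corresponds to one `python run.py <method>` command."""

    def __init__(self, data_dir: str = "data"):
        self.data_dir = Path(data_dir)

    def _run(self, step: str) -> Path:
        reads, writes = STEPS[step]
        if reads is not None and not (self.data_dir / reads).is_dir():
            raise FileNotFoundError(
                f"'{step}' needs the output of an earlier step in {self.data_dir / reads}")
        out_dir = self.data_dir / writes
        out_dir.mkdir(parents=True, exist_ok=True)
        # The real work of the step would happen here, writing into out_dir.
        return out_dir

    def export(self):
        return self._run("export")

    def process(self):
        return self._run("process")

    def engineer(self):
        return self._run("engineer")

    def train_cnn(self):
        return self._run("train_cnn")

    def train_rnn(self):
        return self._run("train_rnn")
```

With Python Fire installed, calling `fire.Fire(RunTask)` at the bottom of the script exposes these methods as the `export`, `process`, `engineer`, `train_cnn` and `train_rnn` commands. Constructor arguments become flags such as `--data_dir`.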

Exporting

python run.py export

Exports data from Google Earth Engine to Google Drive. To make the export more efficient, all the bands from a county, across all the export years, are concatenated, which reduces the number of files to be exported.
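The concatenation trick cuts the number of exports because each county's images for every export year are stacked along the band axis into one multi-band image. The alternative would be one file per county per year. Preprocessing later reverses this by splitting the stack back into years. The NumPy sketch below shows that round trip on local arrays. It assumes every year contributes the same number of bands, and the function names are hypothetical. The actual export operates on Earth Engine images.

```python
import numpy as np


def concatenate_years(images_by_year):
    """Stack per-year (height, width, bands) arrays into one (height, width, years * bands) array."""
    years = sorted(images_by_year)
    stacked = np.concatenate([images_by_year[y] for y in years], axis=-1)
    return stacked, years


def split_years(stacked, years):
    """Invert concatenate_years; np.split raises if bands do not divide evenly across years."""
    per_year = np.split(stacked, len(years), axis=-1)
    return dict(zip(years, per_year))
```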

Downloading the data used in the paper (MODIS images of the top 11 soybean-producing states in the US) requires just over 110 GB of storage. This can be done in steps, because the export class allows for checkpointing.
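The sketch below shows one way such checkpointing can work. It records each county in a progress file as soon as its export finishes, and a later session skips counties that are already recorded. The JSON progress file, the `export_with_checkpoint` function and its `limit` parameter are hypothetical. They are not the export class's actual mechanism.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def export_with_checkpoint(counties: Iterable[str], export_one: Callable[[str], None],
                           checkpoint: Path, limit: Optional[int] = None) -> List[str]:
    """Export counties not yet recorded in `checkpoint`; stop after `limit` new exports."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    exported = []
    for county in counties:
        if county in done:
            continue  # already exported in an earlier session
        if limit is not None and len(exported) >= limit:
            break  # stop early; the next call resumes from here
        export_one(county)
        done.add(county)
        exported.append(county)
        checkpoint.write_text(json.dumps(sorted(done)))  # persist after every county
    return exported
```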

Preprocessing

python run.py process

Takes the exported and downloaded data and splits it by year. In addition, the temperature and reflectance .tif files are merged, and the mask is applied so that only farmland is considered. Files are saved as .npy files.
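The sketch below covers the merge-and-mask step. It assumes that a year's reflectance and temperature rasters are already aligned (height, width, bands) arrays and that the mask is a boolean (height, width) farmland array. The function name and the zero fill value for non-farmland pixels are assumptions, not the repository's choices.

```python
import numpy as np


def merge_and_mask(reflectance, temperature, farmland):
    """Concatenate the two band stacks and zero out every non-farmland pixel."""
    if temperature.shape[:2] != reflectance.shape[:2] or farmland.shape != reflectance.shape[:2]:
        raise ValueError("rasters and mask must share the same height and width")
    merged = np.concatenate([reflectance, temperature], axis=-1)
    # farmland[..., None] has shape (H, W, 1), so it broadcasts across every band.
    return np.where(farmland[..., None], merged, 0)
```

The result can then be written with `np.save(path, merged)`, matching the .npy output described above.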

The size of the processed files is 97 GB. Running with the flag delete_when_done=True will delete the .tif files as they get processed.

Feature Engineering

python run.py engineer

Takes the processed .npy files and generates histograms that can be input into the models.
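The histograms discard pixel positions. For each band at each time step, only the distribution of farmland pixel values is kept, which gives the models a fixed-size input regardless of county size. The NumPy sketch below builds that input. The input layout (height, width, times, bands), the bin count, the value range and the choice to drop zero-valued (masked) pixels are illustrative assumptions rather than the repository's defaults.

```python
import numpy as np


def county_histogram(image, num_bins=32, value_range=(1, 5000)):
    """Turn a (H, W, times, bands) array into (num_bins, times, bands) normalized histograms."""
    _, _, n_times, n_bands = image.shape
    hist = np.zeros((num_bins, n_times, n_bands))
    for t in range(n_times):
        for b in range(n_bands):
            values = image[:, :, t, b].ravel()
            values = values[values > 0]  # masked (non-farmland) pixels were set to zero
            counts, _ = np.histogram(values, bins=num_bins, range=value_range)
            total = counts.sum()
            if total > 0:
                hist[:, t, b] = counts / total  # each non-empty histogram sums to 1
    return hist
```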

Model training

python run.py train_cnn

and

python run.py train_rnn

These train the CNN and RNN models, respectively, with a Gaussian Process. The trained models are saved in data/models/<model_type>, and the results are saved as CSV files in the same folders. If a Gaussian Process is used, the results of the model without the Gaussian Process are also saved for analysis.

Setup

Anaconda (running Python 3.7) is used as the package manager. To set up an environment, install Anaconda and, from this directory, run

conda env create -f environment.yml

This will create an environment named crop_yield_prediction with all the necessary packages to run the code. To activate this environment, run

conda activate crop_yield_prediction

Running this code also requires you to sign up for Earth Engine. Once you have done so, activate the crop_yield_prediction environment and run

earthengine authenticate

and follow the instructions. To test that everything has worked, run

python -c "import ee; ee.Initialize()"

Note that Earth Engine exports files to Google Drive by default, to the same Google account used to sign up for Earth Engine.

Contributors

gabrieltseng
