
OpenMapFlow 🌍


Rapid map creation with machine learning and earth observation data.

Example projects: Cropland, Buildings, Maize


Tutorial

Colab notebook tutorial demonstrating data exploration, model training, and inference over a small region (video available).

Prerequisites:

Creating a map from scratch

To create your own maps with OpenMapFlow, you need to:

  1. Generate your own OpenMapFlow project. This will allow you to:
     • Add your own labeled data,
     • Train a model using that labeled data, and
     • Create a map using the trained model.

(Diagram: the OpenMapFlow pipeline.)

Generating a project

A project can be generated either by following the documentation below or by running the Colab notebook linked above.

Prerequisites:

Once all prerequisites are satisfied, run the following inside your GitHub repository:

pip install openmapflow
openmapflow generate

The command will prompt for project configuration values, such as the project name and Google Cloud project ID. Several prompts show defaults in square brackets; these are used if nothing is entered.

After all configuration is set, the following project structure will be generated:

<YOUR PROJECT NAME>
│   README.md
│   datasets.py             # Dataset definitions (how labels should be processed)
│   evaluate.py             # Template script for evaluating a model
│   openmapflow.yaml        # Project configuration file
│   train.py                # Template script for training a model
│
└─── .dvc/                  # https://dvc.org/doc/user-guide/what-is-dvc
│
└─── .github
│   │
│   └─── workflows          # GitHub Actions
│       │   deploy.yaml     # Automated Google Cloud deployment of trained models
│       │   test.yaml       # Automated integration tests of labeled data
│
└─── data
    │   raw_labels/                     # User added labels
    │   datasets/                       # ML ready datasets (labels + earth observation data)
    │   models/                         # Models trained using datasets
    │   raw_labels.dvc                  # Reference to a version of raw_labels/
    │   datasets.dvc                    # Reference to a version of datasets/
    │   models.dvc                      # Reference to a version of models/

GitHub Actions Secrets: When code is pushed to the repository, a GitHub Action runs to verify project configuration, data integrity, and script functionality. This action pulls data using dvc and therefore needs access to remote storage (your Google Drive). To allow the GitHub Action to access the data, add a new repository secret (instructions).

  • In step 5 of the instructions, name the secret: GDRIVE_CREDENTIALS_DATA
  • In step 6, enter the value found in .dvc/tmp/gdrive-user-credentials.json (in your repository)

When a new model is pushed to the repository, a GitHub Action runs to deploy that model to Google Cloud. To allow the GitHub Action to access Google Cloud, add another repository secret (instructions).

  • In step 5 of the instructions, name the secret: GCP_SA_KEY
  • In step 6, enter your Google Cloud Service Account Key

After this, the GitHub Actions should run successfully.

GCloud Bucket: A Google Cloud bucket must be created for the labeled earth observation files. Assuming gcloud is installed, run:

gcloud auth login
gsutil mb -l <YOUR_OPENMAPFLOW_YAML_GCLOUD_LOCATION> gs://<YOUR_OPENMAPFLOW_YAML_BUCKET_LABELED_EO>

Adding data

Adding already existing data

Prerequisites:

Add a reference to an already existing dataset in your datasets.py:

from openmapflow.datasets import geowiki_landcover_2017, togo_crop_2019

datasets = [geowiki_landcover_2017, togo_crop_2019]

Download and push datasets

openmapflow create-dataset  # Download datasets
dvc commit && dvc push      # Push data to version control

git add .
git commit -m 'Created new dataset'
git push
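
Once the datasets have been created and pushed, each dataset object can be loaded and inspected. A minimal sketch, reusing the load_df call shown in the "Accessing existing datasets" section below:

from openmapflow.datasets import togo_crop_2019

# Load the dataset's labels (with earth observation data) as a pandas DataFrame
df = togo_crop_2019.load_df()
print(len(df))              # number of labeled examples
print(df.columns.tolist())  # available columns (e.g. eo_data, class_prob)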

Adding custom data

Data can be added either by following the documentation below or by running the Colab notebook linked above.

Prerequisites:

Move raw labels into the project:

export RAW_LABEL_DIR=$(openmapflow datapath RAW_LABELS)
mkdir "$RAW_LABEL_DIR/<my dataset name>"
cp -r <path to my raw data files> "$RAW_LABEL_DIR/<my dataset name>"

Add a reference to the data using a CustomLabeledDataset object in datasets.py, for example:

# Import paths follow the generated datasets.py template; adjust if your
# openmapflow version differs
from openmapflow.labeled_dataset import CustomLabeledDataset
from openmapflow.raw_labels import RawLabels

datasets = [
    CustomLabeledDataset(
        dataset="example_dataset",
        country="Togo",
        raw_labels=(
            RawLabels(
                filename="Togo_2019.csv",
                longitude_col="longitude",
                latitude_col="latitude",
                class_prob=lambda df: df["crop"],
                start_year=2019,
            ),
        ),
    ),
    ...
]
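
The class_prob argument maps each row of the labels file to a probability (0 to 1) of belonging to the positive class; above, Togo_2019.csv already stores that value in a crop column. For labels stored as category strings instead, a minimal sketch of an equivalent mapping (the land_cover column name and its values are hypothetical, not part of OpenMapFlow):

import pandas as pd

# Hypothetical labels with the class stored as strings in a "land_cover" column
df = pd.DataFrame({"land_cover": ["crop", "non_crop", "crop"]})

# Equivalent of the class_prob argument above: 1.0 for the positive class, else 0.0
class_prob = lambda df: (df["land_cover"] == "crop").astype(float)
print(class_prob(df).tolist())  # [1.0, 0.0, 1.0]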

Run dataset creation:

earthengine authenticate    # For getting new earth observation data
gcloud auth login           # For getting cached earth observation data

openmapflow create-dataset  # Initiates or checks progress of dataset creation

dvc commit && dvc push      # Push new data to data version control

git add .
git commit -m 'Created new dataset'
git push

Training a model

A model can be trained either by following the documentation below or by running the Colab notebook linked above.

Prerequisites:

# Pull in latest data
dvc pull

# Set model name, train model, record test metrics
export MODEL_NAME="<YOUR MODEL NAME>"
python train.py --model_name $MODEL_NAME    
python evaluate.py --model_name $MODEL_NAME 

# Push new models to data version control
dvc commit 
dvc push  

# Make a Pull Request to the repository
git checkout -b "$MODEL_NAME"
git add .
git commit -m "$MODEL_NAME"
git push --set-upstream origin "$MODEL_NAME"

After the pull request is merged, the model will be deployed to Google Cloud.

Creating a map

Prerequisites:

Map creation is only available through the Colab notebook linked above. The cloud architecture must first be deployed using the deploy.yaml GitHub Action.

Accessing existing datasets

from openmapflow.datasets import togo_crop_2019

df = togo_crop_2019.load_df()
x = df.iloc[0]["eo_data"]
y = df.iloc[0]["class_prob"]
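
To assemble these columns into model inputs, the per-example values can be stacked into arrays. A minimal sketch, assuming each eo_data entry is an array-like time series and class_prob is a float as shown above (the 0.5 threshold is an assumption):

import numpy as np

# Stack per-example earth observation arrays into one input array
X = np.stack(df["eo_data"].to_list())

# Binarize label probabilities into 0/1 classes (threshold assumed)
y = (df["class_prob"] > 0.5).astype(int).to_numpy()
print(X.shape, y.shape)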
