Coder Social home page Coder Social logo

adria-synthetic-data's Introduction

ADRIA-synthetic-data

Repository for the creation of synthetic input data layers for ADRIA.

Set-up

Create the environment by running,

conda env create -f ADRIA_synth_data_env.yml

This environment can then be selected in your Python editor of choice.

Add the original data package you want to create synthetic data off to the original_data folder.

Creating site data

Synthetic site data can be generated from the site-data-generation.py file in the examples folder. Add the name of your chosen original data package at the top of the file as orig_data_package = "name of file". Adjust the parameters N1, N2 and N3 as desired also. N1 is the number of unconditionalised samples to generate. N2 is the final number of spatially conditionalised sites to generate. N3 is the number of nodes to generate the final site positions in randomised radii around. Using the site data model automatically creates the synthetic data package and the package name will be given in the modal outputs as synth_site_data_fn.

Creating initial coral cover data

Synthetic initial coal cover data can be generated from the coral-cover-generation.py file in the examples folder. Add the name of your chosen original data package at the top of the file as orig_data_package = "name of file" and the name of the synthetic site data file you want to base the cover data on: e.g. root_site_data_synth = "synth_2023-7-24_152038.csv".

Creating environmental data

Synthetic wave and DHW data can be generated from the env-data-generation.py file in the examples folder. Several inputs at the beginning of the file can be changed to adjust the output of the data model. An example is shown below:

layer = "Ub"
rcp = "45"
root_original_file = "name of file"
root_site_data_synth = "synth_2023-7-24_152038.csv"
nsamples = 10
nreplicates = 5

The layer variable designates the type of data to generate, so dhw for DHW data and Ub for wave data. rcp is the RCP to use to generate data from in the original data file. nsamples is the number of samples to generate from each climate replicate. nreplicates is the number of climate replicates to use from the original dataset. In the example above, the final dataset will have 10*5 replicates based on 5 replicates from the original dataset.

Creating connectivity data

Synthetic connectivity data can be generated from the connectivity-generation.py file in the examples folder. Several inputs at the beginning of the file can be changed to adjust the output of the data model. An example is shown below:

root_original_file = "name of file"
root_site_data_synth = "synth_2023-7-24_152038.csv"
years = ["2015", "2016", "2017"]  # connectivity data years to use
num = ["1", "2", "3"]  # connectivity data sample number to use
model_type = "GAN"  # "GaussianCopula"

years designates how many years to base the synthestic connectivity dataset off, with the average being used to generate the final dataset. num designates any replicates to be used in each year (also averaged over). model_type designates whether to use the GAN model from tensorflow/keras (slower but better quality) or "Gaussian Copula" model from SDV (faster but lesser quality).

Creating a synthetic JSON for the data package

A synthetic JSON file can be added to the synthetic data package using the file data-package-json-creation.py.

Creating the whole data package

The whole data package can be created by running data-package-creation.py. The same inputs will need to be adjusted in this file as in the files described above, namely the sample number parameters and original dataset file name etc.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.