License: BSD 3-Clause "New" or "Revised" License

Synthetic Data Generation with Machine Learning (SYNDATA)

SYNDATA includes a suite of statistical/machine learning models for generating discrete/categorical synthetic data. To train each model, the user must provide the input data from which the model parameters are inferred. Once trained, the models can be used to generate entirely synthetic data. In addition to the models themselves, SYNDATA includes code to process data, evaluate results (based on cross-validation), and create a PDF report.
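To illustrate the train-then-generate workflow described above, here is a minimal sketch of about the simplest possible categorical model: each column is fitted and then sampled independently from its empirical category frequencies. The class IndependentMarginals, its fit/sample methods, and the toy dataset are hypothetical and are not SYNDATA's API; the sketch assumes pandas and NumPy are installed.

```python
from typing import Optional

import numpy as np
import pandas as pd


class IndependentMarginals:
    """Hypothetical baseline model for categorical data (not SYNDATA's API).

    Each column is modeled on its own by its empirical category frequencies.
    """

    def fit(self, data: pd.DataFrame) -> "IndependentMarginals":
        # "Training" = inferring the parameters: one probability vector per column.
        self.columns_ = list(data.columns)
        self.marginals_ = {
            col: data[col].value_counts(normalize=True) for col in self.columns_
        }
        return self

    def sample(self, n: int, seed: Optional[int] = None) -> pd.DataFrame:
        # Generation = drawing each column independently from its fitted distribution.
        rng = np.random.default_rng(seed)
        synthetic = {
            col: rng.choice(probs.index.to_numpy(), size=n, p=probs.to_numpy())
            for col, probs in self.marginals_.items()
        }
        return pd.DataFrame(synthetic, columns=self.columns_)


# Example with a small, made-up dataset.
real = pd.DataFrame({
    "age_group": ["30-39", "40-49", "40-49", "50-59"],
    "recurrence": ["no", "no", "yes", "no"],
})
synthetic = IndependentMarginals().fit(real).sample(n=100, seed=0)
print(synthetic.head())
```

Because every column is sampled on its own, this baseline reproduces the per-variable frequencies but discards all dependencies between variables; capturing those dependencies is what more expressive models are for.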

For more details on the methods implemented and the metrics used to evaluate synthetic data generation models, please refer to our paper: Generation and evaluation of synthetic patient data.
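As a minimal sketch of one common fidelity check (not necessarily one of the exact metrics SYNDATA reports), the snippet below compares each column's category frequencies in real and synthetic data using the Kullback-Leibler divergence; values near zero mean the synthetic marginals closely match the real ones. The function names marginal_kl and mean_marginal_kl and the smoothing constant eps are hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd


def marginal_kl(real: pd.Series, synthetic: pd.Series, eps: float = 1e-10) -> float:
    """KL(real || synthetic) between the category frequencies of one column."""
    # Union of categories seen in either dataset (missing values are ignored).
    categories = sorted(set(real.dropna()) | set(synthetic.dropna()), key=str)
    p = real.value_counts(normalize=True).reindex(categories, fill_value=0.0).to_numpy()
    q = synthetic.value_counts(normalize=True).reindex(categories, fill_value=0.0).to_numpy()
    # Smooth q so categories absent from the synthetic data do not cause division by zero.
    q = (q + eps) / (q + eps).sum()
    mask = p > 0  # categories with p == 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def mean_marginal_kl(real_df: pd.DataFrame, synthetic_df: pd.DataFrame) -> float:
    """Average the per-column KL divergence over all columns of the real data."""
    return float(np.mean([marginal_kl(real_df[c], synthetic_df[c]) for c in real_df.columns]))
```

Note that KL divergence is asymmetric; this sketch measures how poorly the synthetic frequencies explain the real ones, so a synthetic column that never produces a category present in the real data is penalized heavily.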

Installation

This software suite requires specific versions of Python and its libraries. We recommend creating a dedicated Python environment and installing all dependencies from the requirements.txt file. To create the environment, run:

python3 -m venv datagen_env

Then activate the environment:

source datagen_env/bin/activate

Finally, install all dependencies:

python -m pip install -r requirements.txt

Done. You can now start running your experiments.

Quick Start

A demo script, demo.py, is available in the experiments/ folder. It runs an experiment on the UCI Breast Cancer dataset, and you can build on it to create new experiments. From the experiments/ folder, run:

python demo.py

A folder containing logs and a PDF report will be created in the outputs/ folder; check it once the experiment finishes. The demo.py script may take a few minutes to complete, so we recommend using a GPU-powered computer for faster execution.

Authors:

  • Andre Goncalves (LLNL)
  • Rui Meng (LLNL)
  • Braden Soper (LLNL)
  • Priyadip Ray (LLNL)
  • Ana Paula Sales (LLNL)

Code Release

LLNL-CODE-831774
