RelDDPM

Introduction

This repository contains the source code for the paper "Controllable Tabular Data Synthesis Using Diffusion Models".

Quick Start

Environment Setup

Before running the code, make sure your Python version is above 3.7. We recommend running the code in a virtual environment:

conda create -n relddpm_env python=3.8
conda activate relddpm_env

Then install the required packages:

pip install -r requirements.txt
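
The version requirement above can be checked from inside the activated environment before installing anything. This is a minimal sketch and not part of the repository: python_version_ok is a hypothetical helper, and it assumes that "above 3.7" means 3.7 or any later release.

```python
import sys


def python_version_ok(version_info=sys.version_info, minimum=(3, 7)):
    """Return True if the (major, minor) version is at least `minimum`.

    Assumption: "above 3.7" in this README is read as "3.7 or later".
    """
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_version_ok() else "too old"
    print(f"Python {current}: {status}")
```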

Code Structure

|-- datasets
    |-- minority_class_oversampling # datasets used in minority class oversampling task
    |-- missing_tuple_completion # datasets used in missing tuple completion task
|-- ddpm # the denoising diffusion probabilistic model package
|-- lib_completion # the library used in the missing tuple completion task
|-- lib_oversampling # the library used in the minority class oversampling task
|-- data_utils.py # dataset preprocessing class
|-- eval_utils.py # evaluation class
|-- eval.py # evaluation script
|-- main.py # main script

Run

Minority Class Oversampling.

Run the code to generate synthetic data for minority class oversampling with the following command:

python main.py --task-name=oversampling --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]

The value of --dataset-name should be "default", "shoppers" or "weatherAUS".

For example:

python main.py --task-name=oversampling --dataset-name=default --device=0 --save-name=default_output

Missing Tuple Completion.

Run the code to generate synthetic data for missing tuple completion with the following command:

python main.py --task-name=completion --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]

The value of --dataset-name should be "heart", "airbnb" or "imdb".
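
Each task accepts only its own datasets, so it can help to validate the pairing before launching a run. The following is a minimal sketch and not part of the repository: VALID_DATASETS and build_command are hypothetical names, the save name heart_output is only an illustration, and the command uses only the flags documented above.

```python
# Task -> dataset names, as listed in this README.
VALID_DATASETS = {
    "oversampling": ("default", "shoppers", "weatherAUS"),
    "completion": ("heart", "airbnb", "imdb"),
}


def build_command(task, dataset, device, save_name, script="main.py"):
    """Return the argument list for one run, rejecting unsupported pairings."""
    if task not in VALID_DATASETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(VALID_DATASETS)}")
    if dataset not in VALID_DATASETS[task]:
        raise ValueError(
            f"dataset {dataset!r} is not listed for task {task!r}; "
            f"expected one of {VALID_DATASETS[task]}"
        )
    return [
        "python",
        script,
        f"--task-name={task}",
        f"--dataset-name={dataset}",
        f"--device={device}",
        f"--save-name={save_name}",
    ]


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(" ".join(build_command("completion", "heart", 0, "heart_output")))
```

The returned list can be passed directly to subprocess.run if the command should be executed rather than printed.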

Evaluation

Evaluate the results of either task (minority class oversampling or missing tuple completion) with the following command:

python eval.py --task-name=[task name] --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]

For example, to evaluate minority class oversampling on the default dataset, assuming the synthetic results were saved as default_output:

python eval.py --task-name=oversampling --dataset-name=default --device=0 --save-name=default_output
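
Generation and evaluation take the same arguments, so the two steps can be chained with a shared save name. This is a minimal sketch and not part of the repository: pipeline_commands and run_pipeline are hypothetical helpers, and the sketch assumes eval.py picks up the results that main.py wrote under the same --save-name. It only prints the commands unless dry_run is set to False.

```python
import subprocess


def pipeline_commands(task, dataset, device, save_name):
    """Build the generation command followed by the matching evaluation command."""
    shared = [
        f"--task-name={task}",
        f"--dataset-name={dataset}",
        f"--device={device}",
        f"--save-name={save_name}",
    ]
    return [["python", script, *shared] for script in ("main.py", "eval.py")]


def run_pipeline(task, dataset, device, save_name, dry_run=True):
    """Print both commands; execute them in order only when dry_run is False."""
    commands = pipeline_commands(task, dataset, device, save_name)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises if generation fails, so evaluation is skipped.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline("oversampling", "default", 0, "default_output")
```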
