This is the source code of the paper Controllable Tabular Data Synthesis Using Diffusion Models
Before running the code, please make sure your Python version is above 3.7. We recommend running the code under a virtual environment:
conda create -n relddpm_env python=3.8
conda activate relddpm_env
Then install the necessary packages by :
pip install -r requirements.txt
|-- datasets
|-- minority_class_oversampling # datasets used in minority class oversampling task
|-- missing_tuple_completion # datasets used in missing tuple completion task
|-- ddpm # the denoise diffusion probabilistic model package
|-- lib_completion # the library used in missing tuple completion task
|-- lib_oversampling # the library used in minority class oversampling task
|-- data_utils.py # the class to preprocess the dataset
|-- eval_utils.py # the class to evaluate
|-- eval.py # code of the evaluation
|-- main.py # main code
Run the code to generate synthetic data for minority class oversampling with the following command:
python main.py --task-name=oversampling --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]
The parameter "dataset" should be "default", "shoppers" or "weatherAUS".
For example:
python main.py --task-name=oversampling --dataset-name=default --device=0 --save-name=default_output
Run the code to generate synthetic data for missing tuple completion with the following command:
python main.py --task-name=completion --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]
The parameter "dataset" should be "heart", "airbnb" or "imdb".
Run the code to evaluate the results of the minority class oversampling/missing tuple completion with the following command:
python eval.py --task-name=[task name] --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]
To evaluate the performance of minority class oversampling on default dataset, assume the synthetic results are saved in default_output, the command is:
python eval.py --task-name=oversampling --dataset-name=default --device=0 --save-name=default_output