
KU Data Preprocessing Package

1. Repository Structure

.
├── dataset
│   ├── ecg_mitbih_test.csv
│   ├── imputed_data
│   │   └── ecg_mitbih_test_imputed.csv
│   ├── decomposed_data
│   │   ├── ecg_mitbih_test_imputed.csv
│   │   └── trend_decomposed.csv
│   └── synchronized_data
│       └── synchronized_dtw.csv
├── imputation.py
├── seasonal_trend_decomposition.py
├── synchronization.py
└── README.md

2. Preprocessing Modules

2.1 Missing Value (NA) Imputation

2.1.1 Supported Options & Sample Usage

Impute the missing values in a dataset and save the result.

  • Simple imputation with the mean, median, most_frequent value, or a constant value
# Sample Usage
python imputation.py --data_path='./dataset/ecg_mitbih_test.csv' \
                     --option='simple' \
                     --strategy='mean' \
                     --output_path='./dataset/imputed_data/ecg_mitbih_test_imputed.csv'
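The four strategy names match those of scikit-learn's SimpleImputer, but this README does not say how imputation.py implements them. As a minimal pure-Python sketch of what column-wise simple imputation does (simple_impute and is_missing are hypothetical names, and None or NaN is assumed to mark a missing cell):

```python
import math
from collections import Counter
from statistics import mean, median


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def simple_impute(rows, strategy="mean", fill_value=None):
    """Fill each column's missing cells with one statistic of its observed cells.
    Assumes every column has at least one observed value (except for "constant")."""
    fills = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        if strategy == "mean":
            fills.append(mean(observed))
        elif strategy == "median":
            fills.append(median(observed))
        elif strategy == "most_frequent":
            # most_common breaks ties by first occurrence in the column
            fills.append(Counter(observed).most_common(1)[0][0])
        elif strategy == "constant":
            fills.append(fill_value)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    # Build a new table; the input rows are left untouched.
    return [[fills[j] if is_missing(v) else v for j, v in enumerate(row)] for row in rows]
```

For example, with strategy="mean" a column holding 1.0, NaN, 3.0 becomes 1.0, 2.0, 3.0.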
  • KNN (k-nearest neighbors) imputation
# Sample Usage
python imputation.py --data_path='./dataset/ecg_mitbih_test.csv' \
                     --option='knn' \
                     --n_neighbors=5 \
                     --output_path='./dataset/imputed_data/ecg_mitbih_test_imputed.csv'
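KNN imputation fills a missing cell from rows that look similar on the columns they share. The sketch below is an illustration only, not the code in imputation.py: it assumes a NaN-aware Euclidean distance rescaled by the fraction of shared columns (the same idea as scikit-learn's nan_euclidean metric) and an unweighted mean over the n_neighbors closest donor rows. knn_impute and nan_euclidean are hypothetical names.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nan_euclidean(a, b):
    """Euclidean distance over coordinates observed in both rows,
    scaled up by (total coordinates / shared coordinates)."""
    shared = [(x, y) for x, y in zip(a, b) if not is_missing(x) and not is_missing(y)]
    if not shared:
        return math.inf
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sq * len(a) / len(shared))


def knn_impute(rows, n_neighbors=5):
    """Replace each missing cell with the mean of that column over the
    n_neighbors nearest rows in which the column is observed.
    Distances use the original (un-imputed) rows, so cell order does not matter.
    Assumes every column has at least one observed value."""
    result = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not is_missing(value):
                continue
            donors = [
                (nan_euclidean(row, other), other[j])
                for k, other in enumerate(rows)
                if k != i and not is_missing(other[j])
            ]
            donors.sort(key=lambda d: d[0])  # stable sort: ties keep row order
            nearest = [v for _, v in donors[:n_neighbors]]
            result[i][j] = sum(nearest) / len(nearest)
    return result
```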
  • MICE (Multivariate Imputation by Chained Equations)
# Sample Usage
python imputation.py --data_path='./dataset/ecg_mitbih_test.csv' \
                     --option='mice' \
                     --strategy='mean' \
                     --output_path='./dataset/imputed_data/ecg_mitbih_test_imputed.csv'
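MICE imputes each column that has missing values by regressing it on the other columns, cycling through the columns until the fills settle. The sketch below is a simplified, deterministic variant (one imputed dataset, linear least squares, column-mean initialization), whereas full MICE draws several imputations from a predictive distribution. How --option='mice' and --strategy interact in imputation.py is not documented here, so reading --strategy='mean' as the initial fill is an assumption; chained_impute, fit_ols and solve are hypothetical helpers.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.
    Raises ZeroDivisionError on a zero pivot (an exactly singular system)."""
    n = len(b)
    m = [list(a[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[r][n] / m[r][r] for r in range(n)]


def fit_ols(X, y):
    """Ordinary least squares via the normal equations; returns [intercept, *slopes]."""
    design = [[1.0] + list(x) for x in X]
    p = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * t for r, t in zip(design, y)) for a in range(p)]
    return solve(xtx, xty)


def chained_impute(rows, n_iter=10):
    """Deterministic chained-equations imputation (a simplified MICE)."""
    n_cols = len(rows[0])
    mask = [[is_missing(v) for v in row] for row in rows]
    data = [list(row) for row in rows]
    # Initial fill: column means of the observed values (assumes at least one per column).
    for j in range(n_cols):
        observed = [row[j] for row, miss in zip(rows, mask) if not miss[j]]
        col_mean = sum(observed) / len(observed)
        for row, miss in zip(data, mask):
            if miss[j]:
                row[j] = col_mean
    # Round-robin: regress each incomplete column on all others, refill, repeat.
    for _ in range(n_iter):
        for j in range(n_cols):
            if not any(miss[j] for miss in mask):
                continue
            others = [k for k in range(n_cols) if k != j]
            train = [row for row, miss in zip(data, mask) if not miss[j]]
            coef = fit_ols([[row[k] for k in others] for row in train], [row[j] for row in train])
            for row, miss in zip(data, mask):
                if miss[j]:
                    row[j] = coef[0] + sum(c * row[k] for c, k in zip(coef[1:], others))
    return data
```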

2.1.2 Testing the Imputation Module by Adding Random NAs to a Temporary Dataset

To test the module, add the --test_module argument to the command line. When --test_module is given, imputation.py automatically adds random NAs to the dataset and then imputes the missing values.

# Sample Usage
python imputation.py --data_path='./dataset/ecg_mitbih_test.csv' \
                     --option='simple' \
                     --strategy='mean' \
                     --output_path='./dataset/imputed_data/ecg_mitbih_test_imputed.csv' \
                     --test_module
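The README does not specify how many NAs --test_module inserts or how the cells are chosen. A minimal sketch of the idea, assuming a uniform random choice of a fixed fraction of cells (add_random_nans, rate and seed are hypothetical names):

```python
import random


def add_random_nans(rows, rate=0.1, seed=None):
    """Return a copy of rows in which round(rate * n_cells) cells, chosen
    uniformly at random without replacement, are replaced by NaN."""
    rng = random.Random(seed)  # private generator: reproducible for a fixed seed
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    n_missing = round(rate * len(cells))
    corrupted = [list(row) for row in rows]
    for i, j in rng.sample(cells, n_missing):
        corrupted[i][j] = float("nan")
    return corrupted
```

Keeping the seed fixed makes the corrupted cells reproducible, so imputed values can be compared against the original values at exactly those positions.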

2.2 Seasonal Trend Decomposition and Prediction (STL)

2.2.1 Seasonal Trend Detection using Seasonal-Trend LOESS (STL)

2.2.2 Diagnosis of Patterns in Time-Series Data

# Sample Usage
python seasonal_trend_decomposition.py --data_path='./dataset/machine_temperature_system_failure.csv' \
                                       --seasonal_output_path='./dataset/decomposed_data/seasonal_decomposed.csv' \
                                       --trend_output_path='./dataset/decomposed_data/trend_decomposed.csv'
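STL itself smooths with LOESS and is too long to sketch here. The block below swaps in the simpler classical additive decomposition (centered moving-average trend, per-phase seasonal means, residual remainder) to show what the seasonal and trend outputs represent. It is not the STL algorithm and not necessarily what seasonal_trend_decomposition.py does; the seasonal period must be supplied, and classical_decompose is a hypothetical name. For actual STL, statsmodels provides statsmodels.tsa.seasonal.STL.

```python
def centered_moving_average(y, period):
    """Centered moving average of length `period`; None where the window does not fit.
    For an even period this is the usual 2 x period average: period + 1 points,
    with half weight on the two end points."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def classical_decompose(y, period):
    """Additive decomposition y = trend + seasonal + residual (classical, not STL).
    Assumes at least two full periods so every phase has a detrended value."""
    trend = centered_moving_average(y, period)
    # Average the detrended values that fall on each phase of the cycle.
    buckets = [[] for _ in range(period)]
    for t, (value, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(value - tr)
    phase_means = [sum(b) / len(b) for b in buckets]
    # Center the seasonal pattern so it sums to zero over one period.
    offset = sum(phase_means) / period
    pattern = [m - offset for m in phase_means]
    seasonal = [pattern[t % period] for t in range(len(y))]
    resid = [None if tr is None else value - tr - s
             for value, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, resid
```

The moving average leaves the first and last period // 2 trend values undefined; STL's LOESS smoothing avoids this and also lets the seasonal pattern change over time, which is one reason to prefer it on real sensor data.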

2.3 Synchronization using DTW and soft-DTW

# Sample Usage
python synchronization.py --data_path='./dataset/power_voltage.csv' \
                          --dtw_output_path='./dataset/synchronized_data/synchronized_dtw.csv' \
                          --plot_output_path='./dataset/synchronized_data' \
                          --option='dtw' \
                          --distance=2
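DTW aligns two series by finding the monotone warping path with the lowest accumulated cost; soft-DTW replaces the hard minimum in the recursion with a smooth soft-minimum controlled by gamma, which makes the result differentiable. The sketch below assumes a squared-difference local cost and one possible way of using the path for synchronization (averaging the values of y matched to each index of x); neither is confirmed as what synchronization.py does, and the meaning of its --distance flag is not documented here. dtw, align_to_reference, softmin and soft_dtw are hypothetical names.

```python
import math


def dtw(x, y):
    """Classic DTW with squared-difference local cost.
    Returns (accumulated cost, warping path as (i, j) index pairs)."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(D[i - 1][j - 1], i - 1, j - 1), (D[i - 1][j], i - 1, j), (D[i][j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return D[n][m], path


def align_to_reference(x, y):
    """Resample y onto x's time axis: average the y values matched to each x index."""
    _, path = dtw(x, y)
    matched = [[] for _ in x]
    for i, j in path:
        matched[i].append(y[j])
    return [sum(v) / len(v) for v in matched]


def softmin(values, gamma):
    """Smooth minimum -gamma * log(sum(exp(-v / gamma))), computed stably."""
    lo = min(values)
    if lo == math.inf:
        return math.inf
    return lo - gamma * math.log(sum(math.exp(-(v - lo) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value: the DTW recursion with min replaced by softmin."""
    n, m = len(x), len(y)
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i][j] = cost + softmin((R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]), gamma)
    return R[n][m]
```

As gamma approaches 0, soft_dtw approaches dtw; for larger gamma it can even become negative, since the soft-minimum always lies at or below the true minimum.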
