
License: MIT License

Topics: deep-learning, pytorch, domain-adaptation, time-series, eeg, sleep-stage-classification, human-activity-recognition, transfer-learning, unsupervised-domain-adaptation, cdan

adatime's Introduction

[TKDD 2023] AdaTime: A Benchmarking Suite for Domain Adaptation on Time Series Data [Paper] [Cite]

by: Mohamed Ragab*, Emadeldeen Eldele*, Wee Ling Tan, Chuan-Sheng Foo, Zhenghua Chen, Min Wu, Chee-Keong Kwoh, Xiaoli Li
* Equal contribution
☨ Corresponding author

AdaTime is a PyTorch suite to systematically and fairly evaluate different domain adaptation methods on time series data.

Requirements:

  • Python3
  • PyTorch==1.7
  • NumPy==1.20.1
  • scikit-learn==0.24.1
  • Pandas==1.2.4
  • skorch==0.10.0 (for DEV risk calculations)
  • openpyxl==3.0.7 (for classification reports)
  • wandb==0.12.7 (for sweeps)
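Assuming a pip-based environment, the pins above can be installed in one go. Note that the PyPI package for PyTorch is torch and for Wandb it is wandb; the exact torch wheel depends on your CUDA setup, so this line is only a sketch:

```shell
pip install torch==1.7.0 numpy==1.20.1 scikit-learn==0.24.1 \
            pandas==1.2.4 skorch==0.10.0 openpyxl==3.0.7 wandb==0.12.7
```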

Datasets

Available Datasets

We used four public datasets in this study, and we also provide preprocessed versions of them.

Adding New Dataset

Structure of data

To add a new dataset (e.g., NewData), place it in a folder named NewData inside the datasets directory.

Since NewData has several domains, each domain should be split into train/test parts named "train_x.pt" and "test_x.pt", where x identifies the domain.

Each data file should be a dictionary of the form train_x.pt = {"samples": data, "labels": labels}, and similarly for test_x.pt.
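As an illustrative sketch of that dictionary layout, plain Python lists stand in for the torch tensors so the structure is visible without dependencies; real files hold tensors and are written with torch.save. The file name "train_1.pt" is just one instance of the train_x.pt pattern:

```python
# Sketch of the expected per-file dictionary (lists in place of tensors).
train = {
    "samples": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # one entry per time-series sample
    "labels": [0, 1],                               # one class label per sample
}
# In practice:
# torch.save(train, "datasets/NewData/train_1.pt")  # and similarly test_1.pt
```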

Configurations

Next, add a class named NewData in the configs/data_model_configs.py file; the classes for the existing datasets can serve as guidelines. Also, specify the cross-domain scenarios in the self.scenarios variable.

Last, add another class named NewData in the configs/hparams.py file to specify the training parameters.
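A hypothetical sketch of the dataset config class is below. Only `scenarios` is named by the README; every other field is illustrative, so copy the actual field list from an existing dataset class in the repo:

```python
# Hypothetical NewData config for configs/data_model_configs.py.
class NewData:
    def __init__(self):
        # cross-domain scenarios as (source_domain, target_domain) pairs
        self.scenarios = [("0", "2"), ("1", "6")]
        self.sequence_len = 128  # illustrative field
        self.num_classes = 6     # illustrative field
```

A second class with the same name, holding the training hyperparameters, then goes in configs/hparams.py.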

Domain Adaptation Algorithms

Existing Algorithms

Adding New Algorithm

To add a new algorithm, place it in the algorithms/algorithms.py file.
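The interface in the skeleton below (constructor arguments, an update() returning a dict of losses) is purely illustrative; mirror an existing algorithm class in that file, e.g. DANN, for the real base class and method signatures:

```python
# Hypothetical skeleton for a new UDA method in algorithms/algorithms.py.
class NewAlgorithm:
    def __init__(self, backbone, configs, hparams, device=None):
        self.hparams = hparams
        # build the feature extractor / classifier from `backbone` here

    def update(self, src_x, src_y, trg_x):
        """One adaptation step on a labeled source batch and an unlabeled target batch."""
        # compute source classification loss + domain alignment loss here
        return {"Src_cls_loss": 0.0, "Domain_loss": 0.0}
```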

Training procedure

The experiments are organised in a hierarchical way such that:

  • Several experiments are collected under one directory assigned by --experiment_description.
  • Each experiment can have different trials, each specified by --run_description.
  • For example, to experiment with different UDA methods using a CNN backbone, we can assign --experiment_description CNN_backbones --run_description DANN, then --experiment_description CNN_backbones --run_description DDC, and so on.

Training a model

To train a model:

python main.py  --phase train  \
                --experiment_description exp1  \
                --run_description run_1 \
                --da_method DANN \
                --dataset HHAR \
                --backbone CNN \
                --num_runs 5

To test a model:

python main.py  --phase test  \
                --experiment_description exp1  \
                --run_description run_1 \
                --da_method DANN \
                --dataset HHAR \
                --backbone CNN \
                --num_runs 5

Launching a sweep

Sweeps are run through Weights & Biases (wandb), which makes it easier to visualize and follow training progress, organize sweeps, and collect results.

python main_sweep.py  --experiment_description exp1_sweep  \
                --run_description sweep_over_lr \
                --da_method DANN \
                --dataset HHAR \
                --backbone CNN \
                --num_runs 5 \
                --sweep_project_wandb TEST \
                --num_sweeps 50

Once the sweep is running, you can follow its progress on the specified project page in wandb.

Note: If you get a CUDA out-of-memory error during testing, it is probably due to the DEV risk calculations.

Upper and Lower bounds

  • To obtain the source-only (lower bound) results, set da_method to NO_ADAPT.
  • To obtain the target-only (upper bound) results, set da_method to TARGET_ONLY.
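For example, the lower bound could be obtained with the same flags as the training command shown earlier:

```shell
python main.py  --phase train \
                --experiment_description exp1 \
                --da_method NO_ADAPT \
                --dataset HHAR \
                --backbone CNN \
                --num_runs 5
```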

Results

  • Each run will contain the results of all cross-domain scenarios in the format src_to_trg_run_x, where x is the run ID (you can launch multiple runs with the --num_runs argument).
  • Under each directory, you will find the classification report, a log file, a checkpoint, and the different risk scores.
  • At the end of all the runs, you will find the overall average and standard-deviation results in the run directory.

Citation

If you find this work useful, please consider citing it:

@article{adatime,
  author = {Ragab, Mohamed and Eldele, Emadeldeen and Tan, Wee Ling and Foo, Chuan-Sheng and Chen, Zhenghua and Wu, Min and Kwoh, Chee-Keong and Li, Xiaoli},
  title = {ADATIME: A Benchmarking Suite for Domain Adaptation on Time Series Data},
  year = {2023},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  issn = {1556-4681},
  url = {https://doi.org/10.1145/3587937},
  doi = {10.1145/3587937},
  journal = {ACM Trans. Knowl. Discov. Data},
  month = {mar}
}

Contact

For any issues/questions regarding the paper or reproducing the results, please contact any of the following.

Mohamed Ragab: mohamedr002{at}e.ntu.edu.sg

Emadeldeen Eldele: emad0002{at}e.ntu.edu.sg

School of Computer Science and Engineering (SCSE),
Nanyang Technological University (NTU), Singapore.

adatime's People

Contributors: emadeldeen24, mohamedr002


adatime's Issues

Number of sweeps provided as argument ignored

Hi emadeldeen24,
I'm trying to reproduce the results listed in your paper with the following setup:
python main.py --experiment_description domain-adapt-test --run_description domain-adapt-run --da_method Deep_Coral --dataset HAR --sweep_project_wandb domain-adapt-sweep --num_runs 1 --device cpu --is_sweep True --num_sweeps 1
Somehow the parameter is ignored and wandb runs an unlimited number of sweeps regardless (I stopped the run at 50).
Can you help?
Thanks a lot,
Nicole

Sweep doesn't work

Hi,

I'm using sweep.py to sweep the hyperparameters for my algorithm. However, with trainer.train(), I'm only able to run the regular training process. And if I change the code to trainer.sweep(), I get KeyError('batch_size') from wandb.

[screenshots omitted]

Loss of CDAN

Hi,

First, thank you for this huge piece of work, it's a very useful one. I have a little question about the CDAN loss. I saw that you added a conditional entropy loss computed on the target features only; this doesn't seem to be implemented (or maybe not in this way) in the original CDAN code. What is the purpose of this loss and where does it come from?

Best regards

Validation set for the datasets

First of all, thanks a lot for all your effort!

I just have a small concern regarding the usage of preprocessed versions of the data that you provide. Preprocessed datasets always include train and test splits, however, preprocessing scripts seem to include validation sets as well. (In some cases lines for val set are commented out, in some cases they are not.)

To create a validation set (e.g., for early stopping), what would you suggest? Should I, for instance, further split the train split into train and val? In that case, is there anything I should take into account to prevent information leakage across the splits?

Best wishes,

PS: To make the issue more concrete, you can see in the WISDM preprocessing script that the dataset is first split into train and test, and the train set is further split into train and val. However, val set is not saved.
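The README does not prescribe a validation scheme, so the sketch below is only one hedged possibility: a reproducible random split of the provided train split. For window-based HAR/EEG data, splitting by subject or recording rather than by individual window is the usual guard against leakage across splits:

```python
import random

# Minimal, dependency-free sketch of carving a validation split out of a
# training split. Indices are shuffled once with a fixed seed so the
# split is reproducible across runs.
def train_val_split(samples, labels, val_frac=0.2, seed=0):
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(idx) * val_frac)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    take = lambda seq, ids: [seq[i] for i in ids]
    return (take(samples, train_idx), take(labels, train_idx),
            take(samples, val_idx), take(labels, val_idx))

tr_x, tr_y, va_x, va_y = train_val_split(list(range(10)), list(range(10)))
# 10 items -> 8 training and 2 validation items, disjoint
```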

I found this error in config.py for WISDM

I can run the HAR dataset, but I hit this error when running CoTMix on WISDM:

[screenshot omitted]

because the WISDM class doesn't have step_size and lr_decay in config.py.

One more error:

[screenshot omitted]

because the size of the target batch isn't equal to the size of the source batch in the last batch.
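The last-batch size mismatch described here is a generic artifact of unequal source/target dataset sizes. The dependency-free sketch below illustrates it; the drop_last flag mirrors the option of the same name on PyTorch's DataLoader, a common way to avoid the mismatch:

```python
# Compute the batch sizes a loader would produce over n_items items.
# drop_last mimics torch.utils.data.DataLoader(drop_last=True), which
# discards the smaller trailing batch.
def batch_sizes(n_items, batch_size, drop_last=False):
    sizes = []
    for start in range(0, n_items, batch_size):
        size = min(batch_size, n_items - start)
        if drop_last and size < batch_size:
            break
        sizes.append(size)
    return sizes

print(batch_sizes(10, 4))                  # [4, 4, 2] -> trailing batch of 2
print(batch_sizes(10, 4, drop_last=True))  # [4, 4]
```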

Dataset not found: HAR

Hello! I'm trying to run the project, but I have a little trouble.

  1. Clone the project and put the folder HAR under data:
har # tree -L 2 .
.
├── algorithms
│   ├── algorithms.py
│   └── __pycache__
├── configs
│   ├── data_model_configs.py
│   ├── hparams.py
│   ├── __pycache__
│   └── sweep_params.py
├── data
│   ├── HAR -> /home/xxx/research/dataset/HAR
│   └── README.md
...

13 directories, 20 files
  2. Run:
python main.py --experiment_description exp1  \
                --run_description run_1 \
                --da_method DANN \
                --backbone CNN \
                --num_runs 5 \
                --is_sweep False
  3. The reported error:
Traceback (most recent call last):
  File "main.py", line 45, in <module>
    trainer = cross_domain_trainer(args)
  File "/home/xxx/research/code/AdaTime/trainer.py", line 59, in __init__
    self.dataset_configs, self.hparams_class = self.get_configs()
  File "/home/xxx/research/code/AdaTime/trainer.py", line 203, in get_configs
    dataset_class = get_dataset_class(self.dataset)
  File "/home/xxx/research/code/AdaTime/configs/data_model_configs.py", line 4, in get_dataset_class
    raise NotImplementedError("Dataset not found: {}".format(dataset_name))
NotImplementedError: Dataset not found: HAR
