
microsoft / semi-supervised-learning


A Unified Semi-Supervised Learning Codebase (NeurIPS'22)

Home Page: https://usb.readthedocs.io

License: MIT License

Jupyter Notebook 14.14% Python 85.81% Dockerfile 0.04%
classification semi-supervised-learning transformer computer-vision deep-learning machine-learning natural-language-processing pytorch audio-classification low-resource

semi-supervised-learning's Introduction


USB: A Unified Semi-supervised learning Benchmark for CV, NLP, and Audio Classification
Paper · Benchmark · Demo · Docs · Issue · Blog · Blog (Pytorch) · Blog (Chinese) · Video · Video (Chinese)

Table of Contents
  1. News and Updates
  2. Introduction
  3. Getting Started
  4. Usage
  5. Benchmark Results
  6. Model Zoo
  7. Community
  8. License
  9. Acknowledgments

News and Updates

  • [03/16/2024] Add EPASS, SequenceMatch, and ReFixMatch. Fixed some typos.

  • [07/07/2023] Add DeFixmatch. Fixed some bugs. Release semilearn==0.3.1.

  • [06/01/2023] USB has officially joined the Pytorch ecosystem! [Pytorch blog]

  • [01/30/2023] Update semilearn==0.3.0. Add FreeMatch and SoftMatch. Add imbalanced algorithms. Update results and add wandb support. Refer to CHANGE_LOG for details. [Results][Logs][Wandb]. Older classic logs can be found here: [TorchSSL Log].

  • [10/16/2022] Dataset download link and process instructions released! [Datasets]

  • [10/13/2022] We have finished the camera ready version with updated [Results]. [Openreview]

  • [10/06/2022] Training logs and results of USB have been updated! Available datasets will be uploaded soon. [Logs] [Results]

  • [09/17/2022] The USB paper has been accepted by NeurIPS 2022 Dataset and Benchmark Track! [Openreview]

  • [08/21/2022] USB has been released!

Introduction

USB is a Pytorch-based Python package for Semi-Supervised Learning (SSL). It is easy to use and extend, affordable to small groups, and comprehensive for developing and evaluating SSL algorithms. USB provides implementations of 14 SSL algorithms based on Consistency Regularization, and 15 evaluation tasks from the CV, NLP, and Audio domains.

Code Structure


Getting Started

This is an example of how to set up USB locally. To get a local copy up and running, follow these simple example steps.

Prerequisites

USB is built on pytorch, with torchvision, torchaudio, and transformers.

To install the required packages, you can create a conda environment:

conda create --name usb python=3.8

then use pip to install required packages:

pip install -r requirements.txt

From now on, you can start using USB by typing

python train.py --c config/usb_cv/fixmatch/fixmatch_cifar100_200_0.yaml

Installation

We provide a Python package semilearn of USB for users who want to start training/testing the supported SSL algorithms on their data quickly:

pip install semilearn


Development

You can also develop your own SSL algorithm and evaluate it by cloning USB:

git clone https://github.com/microsoft/Semi-supervised-learning.git


Prepare Datasets

Detailed instructions for downloading and processing the datasets are given in Dataset Download. Please follow them to download the datasets before running or developing algorithms.


Usage

USB is easy to use and extend. Going through the examples below will help you become familiar with USB for quick use, for evaluating an existing SSL algorithm on your own dataset, or for developing new SSL algorithms.

Quick Start with USB package

Please see Installation to install USB first. We provide colab tutorials for:

Start with Docker

Step1: Check your environment

You need to properly install Docker and the NVIDIA driver first. To use a GPU in a Docker container, you also need to install nvidia-docker2 (Installation Guide). Then, check your CUDA version via nvidia-smi.

Step2: Clone the project

git clone https://github.com/microsoft/Semi-supervised-learning.git

Step3: Build the Docker image

Before building the image, you may modify the Dockerfile according to your CUDA version. The CUDA version we use is 11.6. You can change the base image tag according to this site. You also need to change the --extra-index-url according to your CUDA version in order to install the correct version of Pytorch. You can find the URL on the Pytorch website.

Use this command to build the image

cd Semi-supervised-learning && docker build -t semilearn .

Job done. You can use the image you just built for your own project. Don't forget to pass the --gpus argument to docker run when you want to use a GPU in a container.

Training

Here is an example to train FixMatch on CIFAR-100 with 200 labels. Training other supported algorithms (on other datasets with different label settings) can be specified by a config file:

python train.py --c config/usb_cv/fixmatch/fixmatch_cifar100_200_0.yaml
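The config path in the command above appears to encode the benchmark, the algorithm, the dataset, the number of labels, and a trailing index. As a minimal sketch, assuming this naming pattern also holds for other configs, the hypothetical helper config_path below builds such a path. The pattern is inferred from the file names shown in this document only, and the meaning of the trailing index (possibly a seed) is an assumption.

```python
from pathlib import PurePosixPath


def config_path(benchmark: str, algorithm: str, dataset: str,
                num_labels: int, index: int = 0) -> str:
    """Build a config path that follows the naming of the example above.

    Hypothetical helper: the pattern is inferred from example file names and
    is not guaranteed to hold for every config in the repository.
    """
    file_name = f"{algorithm}_{dataset}_{num_labels}_{index}.yaml"
    return str(PurePosixPath("config") / benchmark / algorithm / file_name)


if __name__ == "__main__":
    path = config_path("usb_cv", "fixmatch", "cifar100", 200)
    # Print the training command for the assembled config path.
    print(f"python train.py --c {path}")
```

Check that the resulting file actually exists in the config directory before launching a run.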

Evaluation

After training, you can check the evaluation performance in the training logs, or run the evaluation script:

python eval.py --dataset cifar100 --num_classes 100 --load_path /PATH/TO/CHECKPOINT

Develop

Check the developing documentation for creating your own SSL algorithm!

For more examples, please refer to the Documentation


Benchmark Results

Please refer to Results for benchmark results on different tasks.


Model Zoo

TODO: add pre-trained models.


TODO

  • Finish Readme
  • Updating SUPPORT.MD with content about this project's support experience
  • Multi-language Support
    • Chinese

See the open issues for a full list of proposed features (and known issues).


Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

If you have a suggestion that would make USB better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the project
  2. Create your branch (git checkout -b your_name/your_branch)
  3. Commit your changes (git commit -m 'Add some features')
  4. Push to the branch (git push origin your_name/your_branch)
  5. Open a Pull Request


Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

License

Distributed under the MIT License. See LICENSE.txt for more information.


Community and Contact

The USB community is maintained by:


Citing USB

Please cite us if you find this project helpful for your project/paper:

@inproceedings{usb2022,
  doi = {10.48550/ARXIV.2208.07204},
  url = {https://arxiv.org/abs/2208.07204},
  author = {Wang, Yidong and Chen, Hao and Fan, Yue and Sun, Wang and Tao, Ran and Hou, Wenxin and Wang, Renjie and Yang, Linyi and Zhou, Zhi and Guo, Lan-Zhe and Qi, Heli and Wu, Zhen and Li, Yu-Feng and Nakamura, Satoshi and Ye, Wei and Savvides, Marios and Raj, Bhiksha and Shinozaki, Takahiro and Schiele, Bernt and Wang, Jindong and Xie, Xing and Zhang, Yue},
  title = {USB: A Unified Semi-supervised Learning Benchmark for Classification},
  booktitle = {Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year = {2022}
}

@article{wang2023freematch,
  title={FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning},
  author={Wang, Yidong and Chen, Hao and Heng, Qiang and Hou, Wenxin and Fan, Yue and Wu, Zhen and Wang, Jindong and Savvides, Marios and Shinozaki, Takahiro and Raj, Bhiksha and Schiele, Bernt and Xie, Xing},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2023}
}

@article{chen2023softmatch,
  title={SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning},
  author={Chen, Hao and Tao, Ran and Fan, Yue and Wang, Yidong and Wang, Jindong and Schiele, Bernt and Xie, Xing and Raj, Bhiksha and Savvides, Marios},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2023}
}

@article{zhang2021flexmatch,
  title={FlexMatch: Boosting Semi-supervised Learning with Curriculum Pseudo Labeling},
  author={Zhang, Bowen and Wang, Yidong and Hou, Wenxin and Wu, Hao and Wang, Jindong and Okumura, Manabu and Shinozaki, Takahiro},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2021}
}

Acknowledgments

We thank the following projects, which served as references for creating USB:


semi-supervised-learning's People

Contributors

adamtupper, bblwbtd, beandkay, dependabot[bot], devo002, gorarakelyan, hhhhhhao, hugoschmutz, jindongwang, limberc, memoriesj, microsoft-github-operations[bot], microsoftopensource, parskatt, pm25, qianlanwyd, tao0420, thomasbohm, yue-fan, zhengxiangshi


semi-supervised-learning's Issues

Default value for ulb_num_labels

When I run the updated code, I find a bug if ulb_num_labels is None.

The error is: "ssl/semilearn/datasets/utils.py", line 101, in sample_labeled_unlabeled_data: UnboundLocalError: local variable 'ulb_samples_per_class' referenced before assignment.

A default value has to be set for ulb_num_labels.
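The failure mode described above can be reproduced in isolation. The sketch below is not the repository's code: both function names are hypothetical, and the fallback of using all available samples when ulb_num_labels is None is an assumption about what a sensible default could be.

```python
from typing import Optional


def samples_per_class_buggy(total: int, num_classes: int,
                            ulb_num_labels: Optional[int]) -> int:
    # The variable is only bound inside this branch, so passing None
    # leaves it undefined when the return statement runs.
    if ulb_num_labels is not None:
        ulb_samples_per_class = ulb_num_labels // num_classes
    return ulb_samples_per_class  # raises UnboundLocalError for None


def samples_per_class_fixed(total: int, num_classes: int,
                            ulb_num_labels: Optional[int] = None) -> int:
    # Assumed default: fall back to every available sample.
    if ulb_num_labels is None:
        ulb_num_labels = total
    return ulb_num_labels // num_classes


if __name__ == "__main__":
    print(samples_per_class_fixed(50000, 10))
    try:
        samples_per_class_buggy(50000, 10, None)
    except UnboundLocalError as err:
        print("buggy version failed:", err)
```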

Extra algorithms used in Benchmark Results

Hi, I am wondering about the details of the extra algorithms mentioned in Benchmark Results, e.g., freematch/softmatch/mpl. Could you provide detailed information or the articles about them?

Resume with previous learning rate

Currently, the resume option only loads model parameters.
In order to resume from the exact point where the training left off, saving and loading optimizer parameter is required.
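To illustrate why optimizer state matters for exact resumption, here is a toy sketch in plain Python. MomentumSGD is a hypothetical stand-in for a real optimizer, and the scalar problem f(x) = x**2 is chosen only for brevity. In PyTorch, the corresponding step would be to store optimizer.state_dict() (and the scheduler's state, if one is used) alongside model.state_dict() in the checkpoint.

```python
class MomentumSGD:
    """Toy SGD with momentum for a single scalar parameter."""

    def __init__(self, lr: float = 0.1, momentum: float = 0.9):
        self.lr = lr
        self.momentum = momentum
        self.velocity = 0.0  # state that a params-only checkpoint loses

    def step(self, param: float, grad: float) -> float:
        self.velocity = self.momentum * self.velocity + grad
        return param - self.lr * self.velocity

    def state_dict(self) -> dict:
        return {"lr": self.lr, "momentum": self.momentum,
                "velocity": self.velocity}

    def load_state_dict(self, state: dict) -> None:
        self.lr = state["lr"]
        self.momentum = state["momentum"]
        self.velocity = state["velocity"]


def train(param: float, opt: MomentumSGD, steps: int) -> float:
    """Minimise f(x) = x**2, whose gradient is 2 * x."""
    for _ in range(steps):
        param = opt.step(param, 2.0 * param)
    return param


def run_experiment():
    # Uninterrupted reference run of 10 steps.
    full = train(1.0, MomentumSGD(), 10)

    # Interrupted run: 5 steps, then save a checkpoint.
    opt = MomentumSGD()
    param = train(1.0, opt, 5)
    checkpoint = {"model": param, "optimizer": opt.state_dict()}

    # Resume with the parameter only (fresh optimizer state).
    params_only = train(checkpoint["model"], MomentumSGD(), 5)

    # Resume with both the parameter and the optimizer state.
    restored = MomentumSGD()
    restored.load_state_dict(checkpoint["optimizer"])
    exact = train(checkpoint["model"], restored, 5)
    return full, params_only, exact


if __name__ == "__main__":
    full, params_only, exact = run_experiment()
    print("uninterrupted:", full)
    print("params only:  ", params_only)
    print("with opt state:", exact)
```

Only the run that restores the optimizer state matches the uninterrupted trajectory; the params-only run restarts with zero momentum and ends up elsewhere.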

Exception: Size of sample #521 is invalid (=(1029, 0)) since max_positions=(1024, 1024), skip this example with --skip-invalid-size-inputs-valid-test

Hi,

Thanks for this nice project.

I encounter an error when running python preprocess/preprocess_aclimdb.py. The error Exception: Size of sample #521 is invalid (=(1029, 0)) since max_positions=(1024, 1024), skip this example with --skip-invalid-size-inputs-valid-test happens at the cur_aug_sen_0 = de2en.translate(en2de.translate(cur_ori_sen, sampling = True, temperature = 0.9), sampling = True, temperature = 0.9). I am not familiar with fairseq. How could I fix this?

Thanks a lot.

unbalanced classification

Hi, when will the version that includes the unbalanced classification algorithms, such as COSSL and DASO, be released?

Example for Custom Dataset Usage in NLP

🚀 Feature

How can I use a custom NLP dataset to try these algorithms? I only saw an example for a custom CV dataset. The main part I am interested in is the train_transform step for a custom NLP dataset.

AttributeError: 'DistributedSampler' object has no attribute 'num_samples'


AttributeError Traceback (most recent call last)

in
1 # define data loaders
----> 2 train_lb_loader = get_data_loader(config, lb_dataset, config.batch_size)
3 train_ulb_loader = get_data_loader(config, ulb_dataset, int(config.batch_size * config.uratio))
4 eval_loader = get_data_loader(config, eval_dataset, config.eval_batch_size)

1 frames

/content/drive/MyDrive/RUPESH_RESEARCH_IMPLEMENTATIONS/Semi-supervised-learning/semilearn/datasets/utils.py in get_data_loader(args, dset, batch_size, shuffle, num_workers, pin_memory, data_sampler, num_epochs, num_iters, generator, drop_last, distributed)
161 num_samples = per_epoch_steps * batch_size * num_replicas
162 # print(num_samples)
--> 163 return DataLoader(dset, batch_size=batch_size, shuffle=False, num_workers=num_workers, collate_fn=collact_fn,pin_memory=pin_memory, sampler=data_sampler(dset, num_replicas=num_replicas, rank=rank, num_samples=num_samples),
164 generator=generator, drop_last=drop_last)
165

/content/drive/MyDrive/RUPESH_RESEARCH_IMPLEMENTATIONS/Semi-supervised-learning/semilearn/datasets/samplers/sampler.py in __init__(self, dataset, num_replicas, rank, num_samples)
29 def __init__(self, dataset, num_replicas=None, rank=None, num_samples=None):
30 if not isinstance(num_samples, int) or num_samples <= 0:
---> 31 raise ValueError("num_samples should be a positive integeral value, but got num_samples={}".format(self.num_samples))
32
33 if num_replicas is None:

AttributeError: 'DistributedSampler' object has no attribute 'num_samples'

Any suggestions on how to solve this issue?
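Reading the traceback, the AttributeError seems to hide the intended ValueError: the error message formats self.num_samples before that attribute has been assigned. The sketch below reproduces the pattern with hypothetical classes (BuggySampler and FixedSampler are not the repository's sampler). It only shows how to surface the real message; the underlying cause, a num_samples value that is not a positive int, still has to be traced in the caller.

```python
class BuggySampler:
    def __init__(self, dataset, num_samples=None):
        if not isinstance(num_samples, int) or num_samples <= 0:
            # self.num_samples is not set yet, so building the message
            # raises AttributeError instead of the intended ValueError.
            raise ValueError(
                "num_samples should be a positive integer value, "
                "but got num_samples={}".format(self.num_samples))
        self.dataset = dataset
        self.num_samples = num_samples


class FixedSampler:
    def __init__(self, dataset, num_samples=None):
        if not isinstance(num_samples, int) or num_samples <= 0:
            # Format the local argument, which always exists.
            raise ValueError(
                "num_samples should be a positive integer value, "
                "but got num_samples={}".format(num_samples))
        self.dataset = dataset
        self.num_samples = num_samples


if __name__ == "__main__":
    for sampler_cls in (BuggySampler, FixedSampler):
        try:
            sampler_cls([], num_samples=1.5)
        except (AttributeError, ValueError) as err:
            print(sampler_cls.__name__, "->", type(err).__name__, err)
```

With the corrected message, the offending value becomes visible, which makes it easier to see whether the computed num_samples is a float, zero, or None.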

Question about seed and the number of GPUs when you train

Hi,
Thanks for your great efforts.

I am currently trying to reproduce the results that you shared in this repo.
However, when I follow the recipes that you released, the loss and accuracy do not exactly match your released logs.

Even with the same seed, the loss can be different.

In my case, I use a single GPU (RTX 3090).
I would like to know how many GPUs you used when training the models.

Thank you.

ResNet50 pretrained is ignored

Bug

Hey,

The args use_pretrain and pretrain_path are currently ignored for ResNet50, and an untrained model is always returned.
I prepared a small PR to fix this.

Thanks a lot!

Batch size mismatch in the paper and the code

image
image

Wonderful work!
But I found a batch size mismatch between the paper and the code. The paper reports a batch size of 16 for both labeled and unlabeled data, but the code uses a batch size of 8 for both.

Random logit interpolation seems not to be applied in AdaMatch

Hi there, as I went through the code, I found that random logit interpolation does not seem to be applied in the AdaMatch algorithm.

In the paper, the logits of the labeled data are randomly interpolated with some logits of the unlabeled data, as shown below.
image

However, in your implementation, no such interpolation seems to be performed on the logits of the labeled data, which are the same as those in FixMatch. If the logit interpolation is implemented elsewhere, please let us know.

Improve EMA module

🚀 Feature

The current EMA module should be improved to save GPU memory and to make the EMA model easier to apply.

Motivation

The current EMA model is on the same device as the model:

model.model = send_model_cuda(args, model.model)

In addition, the EMA update is separated from ema_model:

The EMA update should be made more flexible and should save GPU memory.

Alternatives

The timm implementation is a good reference:
https://github.com/rwightman/pytorch-image-models/blob/a520da9b495422bc773fb5dfe10819acb8bd7c5c/timm/utils/model_ema.py#L82
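As a rough illustration of the request, an EMA helper can own both the averaged weights and the update rule. The sketch below is neither the timm implementation nor the repository's: SimpleEMA is a hypothetical name, and plain floats stand in for tensors. In a tensor framework, the shadow copy could additionally be kept on a different device to save GPU memory, which this toy version does not show.

```python
class SimpleEMA:
    """Keeps an exponential moving average of named parameters."""

    def __init__(self, params: dict, decay: float = 0.999):
        self.decay = decay
        # Independent copy, so later changes to params do not alias it.
        self.shadow = dict(params)

    def update(self, params: dict) -> None:
        # shadow <- decay * shadow + (1 - decay) * current value
        for name, value in params.items():
            self.shadow[name] = (self.decay * self.shadow[name]
                                 + (1.0 - self.decay) * value)


if __name__ == "__main__":
    weights = {"w": 1.0}
    ema = SimpleEMA(weights, decay=0.9)
    weights["w"] = 2.0  # pretend an optimizer step changed the weight
    ema.update(weights)
    print(ema.shadow)
```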

FixMatch with EMA ?

Hi, thanks for the code base. However, it seems that ema has not been integrated into FixMatch, which is different to the implementation in the original paper.

Typo in Results of Classic CV?

Looking at the results table for classic_cv, I noticed that for cifar10 there is a cifar10_4000 column.

image

But according to the explanations you gave in the README.md of Semi-supervised-learning/results/, it should be 4, 25, and 100 labels per class and therefore cifar10_40, cifar10_250, and cifar10_1000. Or am I missing something?

Greetings,
Paul

Train data with 4 channels

🚀 Feature

Can the tool train 4 channels of data?

Motivation

We construct our data with 3 channels and feed it into the TorchSSL model. We want to use more information to improve the accuracy of the trained model. However, after we construct our data with 4 channels, the model gives an error.

image

Pitch

Alternatives

Additional context

Regression Extension

Hi,
thank you for the work on this project!

I am interested in applying this code for regression or even multi-target regression tasks. You stated in your paper that regression is currently not supported. I was wondering if you see any specific hurdles by extending the code to reach this goal? Does it even make sense to use these algorithms for non-classification tasks?

Thx in advance!

Best,
Andreas

Performance about Beginning example

I ran Beginning_example.ipynb and found that the accuracy was only about 0.1. I modified the optimization configs according to the comments, but it didn't help.

This is the output.

Epoch: 0
[2023-02-19 06:32:03,336 INFO] confusion matrix
[2023-02-19 06:32:03,337 INFO] [[0.00719161 0.35677718 0.01109215 0.14383228 0.01389566 0.01791809
  0.07557289 0.20075573 0.05741102 0.11555339]
 [0.0442976  0.39304944 0.0150514  0.05347528 0.02300538 0.00709741
  0.12628488 0.12983358 0.04111601 0.16678904]
 [0.00696056 0.41262669 0.00512883 0.08291611 0.02369032 0.01428746
  0.07839785 0.18219563 0.08511418 0.10868238]
 [0.02588839 0.38038833 0.01318842 0.10037856 0.02955184 0.01184516
  0.09842472 0.15337648 0.07681036 0.11014776]
 [0.01193521 0.50468883 0.00815979 0.04859335 0.02094751 0.01181342
  0.05285592 0.1592985  0.09000122 0.09170625]
 [0.02499695 0.41970491 0.00390196 0.07438117 0.02694793 0.01512011
  0.08194123 0.16144373 0.08462383 0.10693818]
 [0.01517005 0.53890384 0.00293614 0.03816981 0.02581356 0.00599462
  0.07805236 0.1225838  0.06435038 0.10802545]
 [0.02513115 0.40856411 0.01207759 0.05904599 0.01402952 0.01390753
  0.05916799 0.20336709 0.10076857 0.10394047]
 [0.00782779 0.30112524 0.01492172 0.11778376 0.02800881 0.01284247
  0.12402153 0.18113992 0.05088063 0.16144814]
 [0.03947849 0.35335689 0.026319   0.09491897 0.0240039  0.01279396
  0.11441452 0.1657122  0.0459364  0.12306568]]
[2023-02-19 06:32:03,339 INFO] evaluation metric
[2023-02-19 06:32:03,339 INFO] acc: 0.0996
[2023-02-19 06:32:03,339 INFO] precision: 0.0900
[2023-02-19 06:32:03,340 INFO] recall: 0.0997
[2023-02-19 06:32:03,340 INFO] f1: 0.0755
model saved: ./saved_models/fixmatch/latest_model.pth
model saved: ./saved_models/fixmatch/model_best.pth
Epoch: 1
[2023-02-19 06:33:48,335 INFO] confusion matrix
[2023-02-19 06:33:48,336 INFO] [[0.00804486 0.34921989 0.00609459 0.15102389 0.01377377 0.02291565
  0.07874208 0.19246709 0.05143832 0.12627986]
 [0.04123837 0.3726138  0.01199217 0.06069506 0.02227117 0.00599608
  0.13350465 0.13264807 0.04698972 0.17205091]
 [0.00879228 0.39821712 0.00610575 0.08889974 0.02173648 0.0162413
  0.08242765 0.16998413 0.10514104 0.10245451]
 [0.03284894 0.37330565 0.0122115  0.10037856 0.02869703 0.01599707
  0.09170839 0.15533032 0.08352668 0.10599585]
 [0.0140056  0.51613689 0.00840336 0.06199001 0.02289611 0.01071733
  0.04871514 0.14297893 0.09426379 0.07989283]
 [0.0241434  0.41263261 0.00195098 0.0915742  0.0241434  0.01304719
  0.0768199  0.15534691 0.08474576 0.11559566]
 [0.01517005 0.53976022 0.00391485 0.04404208 0.02165402 0.0118669
  0.07120137 0.11707854 0.07413751 0.10117446]
 [0.03098695 0.39929242 0.00902769 0.06490179 0.01512749 0.01280956
  0.07002562 0.18409174 0.11821398 0.09552275]
 [0.00978474 0.30858611 0.00990705 0.13478474 0.02495108 0.01516634
  0.11056751 0.16976517 0.05234834 0.16413894]
 [0.0433776  0.34324357 0.02254173 0.09650299 0.02217619 0.01376873
  0.11319605 0.17131717 0.0456927  0.12818326]]
[2023-02-19 06:33:48,338 INFO] evaluation metric
[2023-02-19 06:33:48,338 INFO] acc: 0.0958
[2023-02-19 06:33:48,338 INFO] precision: 0.0880
[2023-02-19 06:33:48,339 INFO] recall: 0.0959
[2023-02-19 06:33:48,339 INFO] f1: 0.0734
model saved: ./saved_models/fixmatch/latest_model.pth
Epoch: 2
[2023-02-19 06:35:27,222 INFO] confusion matrix
[2023-02-19 06:35:27,223 INFO] [[0.00499756 0.35031692 0.00792296 0.14553876 0.01572404 0.01487079
  0.07423208 0.21635787 0.05216967 0.11786933]
 [0.04038179 0.39880078 0.01590798 0.05580029 0.02337249 0.00697504
  0.13839941 0.12652961 0.04307391 0.15075869]
 [0.00696056 0.41396996 0.00537306 0.08975455 0.01990475 0.01526438
  0.07058249 0.18671388 0.08291611 0.10856026]
 [0.02075956 0.3963854  0.00818171 0.09720357 0.02381243 0.01697399
  0.08377091 0.15997069 0.08254976 0.11039199]
 [0.01108269 0.51345756 0.00608939 0.05943247 0.01863354 0.00682012
  0.05334308 0.16721471 0.08439898 0.07952746]
 [0.02499695 0.42653335 0.00292647 0.07706377 0.02292403 0.0120717
  0.07779539 0.16717473 0.08279478 0.10571881]
 [0.01920724 0.52263274 0.00587228 0.04685588 0.02361145 0.00990947
  0.06912161 0.13861023 0.05713237 0.10704673]
 [0.02598512 0.40234232 0.01012566 0.06307186 0.01329755 0.0089057
  0.06307186 0.20702696 0.10272051 0.10345248]
 [0.00880626 0.29977984 0.0199364  0.12597847 0.02605186 0.01663405
  0.10518591 0.20303327 0.043909   0.15068493]
 [0.03375168 0.35603753 0.02034848 0.09467528 0.01705861 0.01584014
  0.11989765 0.17533813 0.03777263 0.12927988]]
[2023-02-19 06:35:27,224 INFO] evaluation metric
[2023-02-19 06:35:27,224 INFO] acc: 0.0986
[2023-02-19 06:35:27,225 INFO] precision: 0.0853
[2023-02-19 06:35:27,225 INFO] recall: 0.0986
[2023-02-19 06:35:27,225 INFO] f1: 0.0729
model saved: ./saved_models/fixmatch/latest_model.pth
Epoch: 3
[2023-02-19 06:37:10,505 INFO] confusion matrix
[2023-02-19 06:37:10,505 INFO] [[0.00804486 0.34982935 0.00901999 0.15845929 0.00877621 0.02303754
  0.07996099 0.17710873 0.0591175  0.12664554]
 [0.03830152 0.38705335 0.01297112 0.05861478 0.02116985 0.00403818
  0.13864415 0.13179148 0.04674498 0.16067058]
 [0.01099035 0.40896324 0.00915863 0.09048724 0.02088167 0.01721822
  0.07644401 0.1681524  0.09415069 0.10355355]
 [0.02491147 0.36451337 0.01025766 0.09842472 0.02955184 0.01502015
  0.09720357 0.15984858 0.0918305  0.10843815]
 [0.012057   0.49409329 0.00815979 0.05821459 0.02569724 0.01266594
  0.05370844 0.14590184 0.10157106 0.08793082]
 [0.02402146 0.42043653 0.00621875 0.09230582 0.02304597 0.01524204
  0.08669674 0.15376174 0.07877088 0.09950006]
 [0.01211157 0.54110595 0.00489356 0.03609004 0.02789332 0.00893076
  0.07303646 0.11365305 0.07621727 0.10606802]
 [0.03086495 0.38684885 0.01012566 0.07173356 0.01317555 0.0176894
  0.07039161 0.18274979 0.12602172 0.09039893]
 [0.00782779 0.30858611 0.01920254 0.12597847 0.0239726  0.01418787
  0.11876223 0.16964286 0.05418297 0.15765656]
 [0.04349945 0.32167662 0.0152309  0.09869623 0.01815523 0.01669307
  0.11758255 0.17667845 0.05397831 0.13780919]]
[2023-02-19 06:37:10,507 INFO] evaluation metric
[2023-02-19 06:37:10,507 INFO] acc: 0.0991
[2023-02-19 06:37:10,508 INFO] precision: 0.0941
[2023-02-19 06:37:10,508 INFO] recall: 0.0991
[2023-02-19 06:37:10,508 INFO] f1: 0.0764
model saved: ./saved_models/fixmatch/latest_model.pth
Epoch: 4
[2023-02-19 06:38:48,838 INFO] confusion matrix
[2023-02-19 06:38:48,839 INFO] [[0.00694783 0.34678206 0.01316431 0.1582155  0.0088981  0.01974647
  0.07557289 0.19953681 0.05936129 0.11177474]
 [0.0324278  0.38423886 0.01395007 0.06558982 0.02129222 0.00697504
  0.13240333 0.13754283 0.04478708 0.16079295]
 [0.0097692  0.39040176 0.00708267 0.09268531 0.02686531 0.01734033
  0.07729882 0.17755526 0.09524973 0.10575162]
 [0.0211259  0.36781048 0.01123458 0.10636219 0.03174991 0.01697399
  0.09085358 0.15410917 0.09354011 0.10624008]
 [0.0130313  0.49092681 0.00815979 0.06150286 0.02581902 0.01071733
  0.05468274 0.15150408 0.09292413 0.09073194]
 [0.02389952 0.4229972  0.00414584 0.08596513 0.02292403 0.01719303
  0.07572247 0.1630289  0.08011218 0.10401171]
 [0.01321263 0.53792513 0.00587228 0.03682408 0.02361145 0.00990947
  0.06667482 0.12796672 0.07095669 0.10704673]
 [0.02598512 0.41319995 0.01110162 0.06294986 0.01329755 0.0118336
  0.06502379 0.18933756 0.10479444 0.10247652]
 [0.00684932 0.31152153 0.01700098 0.11925147 0.02005871 0.01394325
  0.11460372 0.20205479 0.04880137 0.14591487]
 [0.04179359 0.33727306 0.01913001 0.10746923 0.02010479 0.01376873
  0.10759108 0.18228342 0.0480078  0.12257829]]
[2023-02-19 06:38:48,841 INFO] evaluation metric
[2023-02-19 06:38:48,841 INFO] acc: 0.0974
[2023-02-19 06:38:48,841 INFO] precision: 0.0921
[2023-02-19 06:38:48,842 INFO] recall: 0.0975
[2023-02-19 06:38:48,842 INFO] f1: 0.0745
model saved: ./saved_models/fixmatch/latest_model.pth
Epoch: 5

Wrong configurations

Bug

I saw that you have mistakenly changed the model used for training on cifar-100 from wrn-28-8 to wrn-28-2.
Moreover, it seems that the argument amp is not used; the code uses use_amp instead.

ViT backbone

🚀 Feature

Thanks for your code. Do you plan to release code based on a pretrained ViT model?

question about training time in classic_cv and use_amp

First, thanks for your code base!

It must be acknowledged that the ViT backbone in USB_cv is excellent and gets better results in less time than classic_cv, but I think using pretrained weights is more like transfer learning than SSL. So after using the ViT backbone, I went back to training with the WRN backbone and found that it takes too much time: almost 10 days to train fixmatch on an RTX6000 (and 8 days in the provided log). Is there any way to reduce the training time without reducing accuracy?

Another question: the training logs before and after I set use_amp=True are the same, and I really don't know why.

Thanks!

Adaptation to Active Learning

I have implemented some of the methods you presented (pseudo-labels, pi-model, fixmatch) with the goal of using them in an active learning setting. I am currently using a non-pretrained WRS2810. However, the biggest obstacle at the moment is the training time. For example, training fixmatch for 1024 epochs of 1024 training steps each (1,048,576 steps in total) with a uratio of 7 (the example for CIFAR10 with 40 labeled samples) takes a very long time and would also have to be repeated several times in an active learning setting, which makes the whole approach unusable. My questions are therefore:

  • Are ViT and pretraining a better choice for this purpose?
  • Why is it necessary to train for so many epochs?
  • Is there an alternative training method that reduces training time while causing minimal performance degradation?
  • What do you think about applying these methods to active learning? Is your approach to training adaptable to this scenario?

Thanks for your help!
Paul

Reproducing results in the paper

Hi, thanks for your great work!

I'm trying to reproduce the CV results in your paper (Table 6). However, I found some gaps between the paper and what I reproduced. I just ran training with the config files in the repository.

For CIFAR100 with 200 labels, the results I got are as follows.
image

Algorithm   Paper top-1 error   Reproduced top-1 error   Difference
fixmatch    56.43               45.98                    10.45
flexmatch   29.59               36.34                    -6.75
simmatch    35.94               30.39                    5.55

I know I used just one repetition without averaging over multiple seeds. But those differences seem larger than seed-to-seed deviation.

So I started reading your paper more closely and found that my reproduced fixmatch result is close to the number in Table D.1 (lr=5e-5, the same as in the configuration file).
image

It leads me to think that Table 6 results may come from other hyperparameter settings. Would you help me to correctly reproduce the results reported in the paper? Thanks in advance! 😃

p.s. Table 6 has typos.
image

Help Needed

Error

Hi,

When I try to run the code, I get the following error.

AttributeError: 'Namespace' object has no attribute 'use_amp'.

Please help me fix this error.

Thank you.
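An error of this kind typically appears when code reads an attribute that the parsed arguments never defined. As a defensive sketch, the hypothetical helper resolve_use_amp below reads use_amp if present and otherwise falls back to an amp attribute. Both the helper and the idea that a config might supply amp instead of use_amp are assumptions for illustration, not the repository's behaviour.

```python
from argparse import Namespace


def resolve_use_amp(args: Namespace) -> bool:
    """Return the mixed-precision flag without raising AttributeError.

    Hypothetical helper: prefers args.use_amp, then args.amp, else False.
    """
    if hasattr(args, "use_amp"):
        return bool(args.use_amp)
    return bool(getattr(args, "amp", False))


if __name__ == "__main__":
    # A namespace that only defines `amp`, as an older config might.
    legacy_args = Namespace(amp=True)
    print(resolve_use_amp(legacy_args))
    # A namespace with neither attribute falls back to False.
    print(resolve_use_amp(Namespace()))
```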

Pretrained model did not improve

When I used the same parameters as in config/.../stl10_40_0.yaml (vit/wrn),
the precision and top-1 accuracy did not increase but dropped from 0.65 to 0.45 (at 20k iters).
simmatch and fixmatch had the same problem. Did I do something wrong?
These results were obtained with the original code.

Arguments Display bug

Hi,

Thanks for providing this wonderful code base.
I understand that this is a huge amount of work, so mistakes are possible.
While I was trying to run distributed parallel training, I found that the argument display for batch size might be wrong.

i.e., train.py line 255 -> send_model_cuda -> misc.py line 52
This line halves the batch size again while declaring and defining the EMA model.

I would like to double-check this with you and would appreciate it if you could clarify.

By the way, will FreeMatch also be included in the next commit?

Cheers and Thanks

Failed to reproduce results

I'm trying to reproduce the result of FlexMatch on the CIFAR100-400 experiment and only got around 56% acc, which is lower than the reported result of 60%. I simply used the command python train.py --c config/classic_cv/flexmatch/flexmatch_cifar100_400_0.yaml and here is my training curve:

screenshot-20221121-115003

Is there something I've missed?

Code for SimiS

When will the code for SimiS [1] be released? All configs are available, but the code is missing.

[1] An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning.

Colab examples crashing

Hi!

The beginner and custom dataset examples in Colab are crashing. I have tried with several Google accounts. Colab just reports that the session has crashed when trying to import semilearn.

Also, I've been trying hard to work with USB. Yet even when I use Docker, I am unable to run the training process; it returns CUDA:Unknown Error. Maybe I should open another issue for that. Still, I wonder whether anyone has recently been able to run and use this library seamlessly.

Dataset Files Missing

Hi,

The train.json in yahoo_answers and test.json in amazon_review are missing from your dataset zip file.

Could you please update it?

Thanks

A wrong implementation in training data sampling

Bug

I believe there is an error in semilearn/datasets/utils.py: Line 100.

Reproduce the Bug

Here is the line 100: ulb_idx.extend(idx[lb_samples_per_class[c]:lb_samples_per_class[c]+ulb_samples_per_class[c]])

If I am not wrong, this line is meant to construct the unlabeled data for training.
The variable "ulb_samples_per_class" is simply obtained by (total number of training data / number of classes). However, the dataset may not be exactly balanced. For example, consider a binary classification task with 1,000 positive and 1,200 negative data points. In this case, "ulb_samples_per_class" is (1,000 + 1,200) / 2 = 1,100, and the total number of unlabeled data points should be 2,200.

However, when line 100 is executed, only 2,100 unlabeled data points are collected (1,000 from the positive class, which has fewer than 1,100 samples, plus 1,100 from the negative class).
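
To make the arithmetic concrete, here is a minimal sketch of the slicing pattern quoted from line 100. It is not the repository's code: the function name `collect_unlabeled` is made up for illustration, and the labeled count per class is set to 0 as an assumption so that the numbers match the example in this report.

```python
import numpy as np


def collect_unlabeled(targets, num_classes, lb_per_class, ulb_per_class):
    """Hypothetical re-creation of the per-class slicing pattern.

    Each class contributes idx[lb : lb + ulb]; a slice silently stops at the
    end of the array, so a class with fewer samples than the quota
    contributes fewer indices than requested.
    """
    ulb_idx = []
    for c in range(num_classes):
        idx = np.where(targets == c)[0]
        ulb_idx.extend(idx[lb_per_class:lb_per_class + ulb_per_class])
    return ulb_idx


# 1,000 positive and 1,200 negative samples, as in the example.
targets = np.array([0] * 1000 + [1] * 1200)
quota = len(targets) // 2  # 1,100 per class
collected = collect_unlabeled(targets, num_classes=2, lb_per_class=0, ulb_per_class=quota)
print(len(collected))  # 2100, not 2200
```

The 100 surplus samples of the larger class are never selected, because the equal quota caps that class at 1,100 while the smaller class cannot fill its quota.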

"Beginner Example" Notebook shows no performance improvement after many epochs

Bug

When run, the beginner example notebook does not demonstrate the algorithm is learning. After 30 epochs, each output report produces the exact same output, included below. This is after adjusting the config parameters to match the recommendations (e.g. 100 epochs, 102400 iterations, etc.)

Reproduce the Bug

System: Ubuntu 20.04.4

  1. git clone the repository
  2. cd into repository
  3. Make anaconda environment with python 3.8 and cudatoolkit 11.3.
  4. pip install -r requirements.txt
  5. Verify torch works and can access the GPUs
  6. Install jupyterlab
  7. Update config with appropriate hyperparameters. The only ones that are changed are as follows:
    'epoch': 100,  # set to 100
    'num_train_iter': 102400,  # set to 102400
    'num_eval_iter': 1024,   # set to 1024
    'num_log_iter': 256,    # set to 256
  8. Open jupyterlab and run notebooks/Beginner_Example.ipynb

Error Messages and Logs

Output:

Epoch: 0
/home/.conda/envs/nlpenv/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[2023-02-19 15:33:47,914 INFO] confusion matrix
[2023-02-19 15:33:47,915 INFO] [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
[2023-02-19 15:33:47,917 INFO] evaluation metric
[2023-02-19 15:33:47,918 INFO] acc: 0.1011
[2023-02-19 15:33:47,919 INFO] precision: 0.0101
[2023-02-19 15:33:47,919 INFO] recall: 0.1000
[2023-02-19 15:33:47,920 INFO] f1: 0.0184
model saved: ./saved_models/fixmatch/latest_model.pth
(... truncated by poster ...)
Epoch: 17
/home/.conda/envs/nlpenv/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[2023-02-19 15:34:40,825 INFO] confusion matrix
[2023-02-19 15:34:40,826 INFO] [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
[2023-02-19 15:34:40,828 INFO] evaluation metric
[2023-02-19 15:34:40,829 INFO] acc: 0.1011
[2023-02-19 15:34:40,830 INFO] precision: 0.0101
[2023-02-19 15:34:40,830 INFO] recall: 0.1000
[2023-02-19 15:34:40,831 INFO] f1: 0.0184
model saved: ./saved_models/fixmatch/latest_model.pth
Epoch: 18
/home/.conda/envs/nlpenv/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[2023-02-19 15:35:33,695 INFO] confusion matrix
[2023-02-19 15:35:33,697 INFO] [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
[2023-02-19 15:35:33,699 INFO] evaluation metric
[2023-02-19 15:35:33,700 INFO] acc: 0.1011
[2023-02-19 15:35:33,700 INFO] precision: 0.0101
[2023-02-19 15:35:33,701 INFO] recall: 0.1000
[2023-02-19 15:35:33,702 INFO] f1: 0.0184
model saved: ./saved_models/fixmatch/latest_model.pth
Epoch: 19
/home/.conda/envs/nlpenv/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[2023-02-19 15:36:26,505 INFO] confusion matrix
[2023-02-19 15:36:26,506 INFO] [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
[2023-02-19 15:36:26,508 INFO] evaluation metric
[2023-02-19 15:36:26,509 INFO] acc: 0.1011
[2023-02-19 15:36:26,510 INFO] precision: 0.0101
[2023-02-19 15:36:26,510 INFO] recall: 0.1000
[2023-02-19 15:36:26,512 INFO] f1: 0.0184
model saved: ./saved_models/fixmatch/latest_model.pth
Epoch: 20
/home/.conda/envs/nlpenv/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[2023-02-19 15:37:19,113 INFO] confusion matrix
[2023-02-19 15:37:19,115 INFO] [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
[2023-02-19 15:37:19,117 INFO] evaluation metric
[2023-02-19 15:37:19,118 INFO] acc: 0.1011
[2023-02-19 15:37:19,119 INFO] precision: 0.0101
[2023-02-19 15:37:19,119 INFO] recall: 0.1000
[2023-02-19 15:37:19,120 INFO] f1: 0.0184
model saved: ./saved_models/fixmatch/latest_model.pth
Epoch: 21
/home/.conda/envs/nlpenv/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[2023-02-19 15:38:11,989 INFO] confusion matrix
[2023-02-19 15:38:11,990 INFO] [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
[2023-02-19 15:38:11,992 INFO] evaluation metric
[2023-02-19 15:38:11,993 INFO] acc: 0.1011
[2023-02-19 15:38:11,994 INFO] precision: 0.0101
[2023-02-19 15:38:11,994 INFO] recall: 0.1000
[2023-02-19 15:38:11,995 INFO] f1: 0.0184
model saved: ./saved_models/fixmatch/latest_model.pth
Epoch: 22
/home/.conda/envs/nlpenv/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[2023-02-19 15:39:04,797 INFO] confusion matrix
[2023-02-19 15:39:04,798 INFO] [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
[2023-02-19 15:39:04,801 INFO] evaluation metric
[2023-02-19 15:39:04,802 INFO] acc: 0.1011
[2023-02-19 15:39:04,802 INFO] precision: 0.0101
[2023-02-19 15:39:04,803 INFO] recall: 0.1000
[2023-02-19 15:39:04,803 INFO] f1: 0.0184
model saved: ./saved_models/fixmatch/latest_model.pth
Epoch: 23
/home/.conda/envs/nlpenv/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[2023-02-19 15:39:57,722 INFO] confusion matrix
[2023-02-19 15:39:57,723 INFO] [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
[2023-02-19 15:39:57,726 INFO] evaluation metric
[2023-02-19 15:39:57,727 INFO] acc: 0.1011
[2023-02-19 15:39:57,727 INFO] precision: 0.0101
[2023-02-19 15:39:57,728 INFO] recall: 0.1000
[2023-02-19 15:39:57,729 INFO] f1: 0.0184
model saved: ./saved_models/fixmatch/latest_model.pth
Epoch: 24
/home/.conda/envs/nlpenv/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
[2023-02-19 15:40:50,675 INFO] confusion matrix
[2023-02-19 15:40:50,677 INFO] [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
[2023-02-19 15:40:50,679 INFO] evaluation metric
[2023-02-19 15:40:50,680 INFO] acc: 0.1011
[2023-02-19 15:40:50,681 INFO] precision: 0.0101
[2023-02-19 15:40:50,681 INFO] recall: 0.1000
[2023-02-19 15:40:50,682 INFO] f1: 0.0184

Can someone provide guidance on why I don't see any learning occur? I kept the notebook as-is except the updates to the config I noted above.
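
As a sanity check on the log, the reported numbers are what macro-averaged metrics look like when a model predicts a single class for every sample, which is what the confusion matrix shows (every row puts its mass on class 6). The sketch below is only an illustration: the class counts are hypothetical, chosen so that class 6 holds 10.11% of the evaluation samples, and the helper name is not part of the library.

```python
import numpy as np


def constant_predictor_metrics(class_counts, predicted_class):
    """Accuracy and macro precision/recall/F1 when every sample is assigned
    to `predicted_class`. Classes that are never predicted get precision 0,
    which mirrors the zero_division behaviour in the sklearn warning."""
    counts = np.asarray(class_counts, dtype=float)
    n_classes = counts.size
    p = counts[predicted_class] / counts.sum()  # precision of the one predicted class

    precision = np.zeros(n_classes)
    recall = np.zeros(n_classes)
    f1 = np.zeros(n_classes)
    precision[predicted_class] = p
    recall[predicted_class] = 1.0  # every true member of that class is "found"
    f1[predicted_class] = 2 * p / (p + 1.0)

    accuracy = p
    return accuracy, precision.mean(), recall.mean(), f1.mean()


# Hypothetical evaluation set of 10,000 samples; class 6 has 1,011 of them.
counts = [998, 998, 999, 999, 999, 999, 1011, 999, 999, 999]
acc, prec, rec, f1 = constant_predictor_metrics(counts, predicted_class=6)
print(round(acc, 4), round(prec, 4), round(rec, 4), round(f1, 4))
# 0.1011 0.0101 0.1 0.0184
```

Under these assumptions the values match the log line for line, which suggests the network output has collapsed to one class rather than that evaluation itself is broken.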

Does multi-GPU really accelerate the training?

In the code for multi-GPU training, you multiply the number of samples by the number of GPUs (https://github.com/microsoft/Semi-supervised-learning/blob/main/semilearn/core/utils/build.py#L158), but you never decrease the number of iterations, so multi-GPU training cannot really accelerate the training.

Another repo divides the batch size by the number of GPUs instead (https://github.com/LeeDoYup/FixMatch-pytorch/blob/main/train.py#L155). What do you think?
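
The two conventions being compared can be sketched as plain arithmetic. This is not a reading of either repository's code: the batch size, GPU count, and iteration count are hypothetical, and the helper names are made up for illustration.

```python
def samples_processed(per_gpu_batch: int, num_gpus: int, num_iters: int) -> int:
    """Total samples consumed when every GPU processes `per_gpu_batch`
    samples per iteration."""
    return per_gpu_batch * num_gpus * num_iters


def per_gpu_batch_from_global(global_batch: int, num_gpus: int) -> int:
    """Split a fixed global batch evenly across GPUs."""
    if global_batch % num_gpus != 0:
        raise ValueError("global batch must be divisible by the number of GPUs")
    return global_batch // num_gpus


BATCH, GPUS, ITERS = 64, 4, 1000  # hypothetical settings

# Baseline: one GPU, batch 64.
single_gpu = samples_processed(BATCH, 1, ITERS)

# Convention A: keep the per-GPU batch and the iteration count.
# Each step costs about the same wall-clock time, but 4x the data is consumed.
scaled_batch = samples_processed(BATCH, GPUS, ITERS)

# Convention B: keep the global batch and split it across GPUs.
# The data consumed matches the baseline and each step handles 16 samples per GPU.
split_batch = samples_processed(per_gpu_batch_from_global(BATCH, GPUS), GPUS, ITERS)

print(single_gpu, scaled_batch, split_batch)  # 64000 256000 64000
```

Under convention A, a speed-up only appears if the iteration count is reduced in proportion to the number of GPUs; under convention B, the speed-up comes from the smaller per-GPU step.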

Do we really support wandb?

Thanks for your wonderful work. However, I am using --use_wandb to track my experiments during training, and wandb is not logging anything at all.

Tensor shape mismatch

Hi,
I am trying to train the FixMatch algorithm on a custom dataset. The following line raises an exception:
trainer.fit(train_lb_loader, train_ulb_loader, eval_loader)

Exception is: The size of tensor a (12545) must match the size of tensor b (257) at non-singleton dimension 1.

Do you have any idea regarding what could be the possible problem?
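
One possible reading, which the report does not confirm, is that a ViT backbone is receiving images of a different size than its position embedding was built for. For a ViT with a class token, the sequence length is (image size / patch size)^2 + 1, and both numbers in the error fit that formula. The sizes below are guesses that happen to reproduce the two numbers; other combinations (for example 224 with patch size 14) also give 257.

```python
def vit_num_tokens(img_size: int, patch_size: int, has_cls_token: bool = True) -> int:
    """Sequence length of a ViT for square inputs: one token per patch,
    plus an optional class token."""
    if img_size % patch_size != 0:
        raise ValueError("image size must be divisible by the patch size")
    per_side = img_size // patch_size
    return per_side * per_side + (1 if has_cls_token else 0)


# Hypothetical configurations, not taken from the report.
print(vit_num_tokens(32, 2))   # 257   (16 x 16 patches + class token)
print(vit_num_tokens(224, 2))  # 12545 (112 x 112 patches + class token)
```

If this reading applies, resizing the custom images to the resolution the chosen backbone expects, or picking a backbone configured for the dataset's resolution, would be the first thing to check.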

Customize datasets tutorial cant work

Bug

When running the function get_algorithm, a class such as FixMatch is created, which has the superclass AlgorithmBase. In this superclass, the datasets and dataloaders are created from one of the built-in public datasets rather than from my own.

Reproduce the Bug

Simply run the Colab notebook and you will see it.

Error Messages and Logs

/usr/local/lib/python3.7/dist-packages/semilearn/algorithms/__init__.py in get_algorithm(args, net_builder, tb_log, logger)
47 net_builder=net_builder,
48 tb_log=tb_log,
---> 49 logger=logger
50 )
51 return alg

/usr/local/lib/python3.7/dist-packages/semilearn/algorithms/fixmatch/fixmatch.py in __init__(self, args, net_builder, tb_log, logger)
31 """
32 def __init__(self, args, net_builder, tb_log=None, logger=None):
---> 33 super().__init__(args, net_builder, tb_log, logger)
34 # fixmatch specificed arguments
35 self.init(T=args.T, p_cutoff=args.p_cutoff, hard_label=args.hard_label)

/usr/local/lib/python3.7/dist-packages/semilearn/core/algorithmbase.py in __init__(self, args, net_builder, tb_log, logger, **kwargs)
77
78 # build dataset
---> 79 self.dataset_dict = self.set_dataset()
80
81 # build data loader

/usr/local/lib/python3.7/dist-packages/semilearn/core/algorithmbase.py in set_dataset(self)
107 if self.rank != 0 and self.distributed:
108 torch.distributed.barrier()
--> 109 dataset_dict = get_dataset(self.args, self.algorithm, self.args.dataset, self.args.num_labels, self.args.num_classes, self.args.data_dir)
110 self.args.ulb_dest_len = len(dataset_dict['train_ulb']) if dataset_dict['train_ulb'] is not None else 0
111 self.args.lb_dest_len = len(dataset_dict['train_lb'])

/usr/local/lib/python3.7/dist-packages/semilearn/core/utils/build.py in get_dataset(args, algorithm, dataset, num_labels, num_classes, data_dir, include_lb_to_ulb)
103 lb_dset, ulb_dset, eval_dset, test_dset = get_json_dset(args, algorithm, dataset, num_labels, num_classes, data_dir=data_dir, include_lb_to_ulb=include_lb_to_ulb)
104 else:
--> 105 raise NotImplementedError
106
107 dataset_dict = {'train_lb': lb_dset, 'train_ulb': ulb_dset, 'eval': eval_dset, 'test': test_dset}

Which pretrain is used in the CV task?

Thanks for your great work in accelerating SSL research. After reading the paper, I have not figured out which pretrained model is used in the CV tasks. Is it unsupervised or supervised pretraining on ImageNet?

Reproduction of the DebiasPL

Hi, when I reproduced the result of FixMatch and DebiasPL (the imbalanced algorithm), I found something wrong. It may be a bug.

If I understand correctly, if I set debiaspl_tau=0, then the training result of fixmatch_debiaspl should be the same as that of fixmatch.
Thus, I ran the two configs:

  1. DebiasPL_0: config/classic_cv_imb/fixmatch_debiaspl/fixmatch_debiaspl_cifar10_lb1500_100_ulb3000_100_0.yaml, and add debiaspl_tau=0 into the config.
  2. FixMatch: config/classic_cv_imb/fixmatch/fixmatch_cifar10_lb1500_100_ulb3000_100_0.yaml

However, their performance differs. FixMatch shows a significant accuracy drop in the late phase of training, which DebiasPL_0 does not. Please see the curves below.

The last iteration accuracy and best accuracy are 77.46 / 77.95 (DebiasPL_0) and 74.48 / 77.94 (FixMatch).
[Two images: accuracy curves for the DebiasPL_0 and FixMatch runs]

I ran the experiments multiple times and found no clues. Is there a bug, or is there something I missed?
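
The expectation that tau=0 should reduce to FixMatch can be illustrated with a small sketch of the pseudo-labeling step only. It assumes the common DebiasPL formulation in which logits are adjusted as logits - tau * log(p_hat) before thresholding; the function names and the p_cutoff value are illustrative and are not taken from the repository, and any other difference between the two training code paths is outside what this sketch covers.

```python
import numpy as np


def softmax(x):
    z = x - x.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fixmatch_pseudo_labels(logits, p_cutoff=0.95):
    """Hard pseudo-labels plus a confidence mask."""
    probs = softmax(logits)
    return probs.argmax(axis=1), probs.max(axis=1) >= p_cutoff


def debiased_pseudo_labels(logits, p_hat, tau, p_cutoff=0.95):
    """Assumed DebiasPL-style adjustment: subtract tau * log(p_hat),
    where p_hat is a running estimate of the predicted class distribution."""
    return fixmatch_pseudo_labels(logits - tau * np.log(p_hat), p_cutoff)


rng = np.random.default_rng(0)
logits = 5.0 * rng.normal(size=(8, 10))
p_hat = softmax(rng.normal(size=(1, 10)))[0]  # arbitrary non-uniform estimate

labels_fm, mask_fm = fixmatch_pseudo_labels(logits)
labels_d0, mask_d0 = debiased_pseudo_labels(logits, p_hat, tau=0.0)
print(np.array_equal(labels_fm, labels_d0), np.array_equal(mask_fm, mask_d0))  # True True
```

If the pseudo-labels are identical at tau=0 under this formulation, a persistent gap between the two runs would have to come from some other part of the pipeline, such as an extra loss term, a different data order, or nondeterminism.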

Missing key(s) in state_dict

Hi!

I followed the instructions in the README, but after training FixMatch the evaluation fails. Running the evaluation as described in the README generated a long list of missing keys and unexpected keys.

Traceback (most recent call last):
File "eval.py", line 54, in <module>
keys = net.load_state_dict(load_state_dict)
File "/home2/johannae/anaconda3/envs/usb/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for WideResNet:
Missing key(s) in state_dict: "conv1.weight", "conv1.bias", [...] "fc.bias".
Unexpected key(s) in state_dict: "cls_token", "pos_embed", "patch_embed.proj.weight", [...] , "head.bias".
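
The key names suggest a mismatch between the architecture built at evaluation time (WideResNet) and the one stored in the checkpoint (names such as cls_token, pos_embed, and patch_embed are typical of a ViT). A minimal sketch of the comparison that a strict state-dict load performs is shown below; it uses plain Python lists containing only the key names visible in the traceback, and the helper name is made up for illustration.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected): keys the model needs but the checkpoint
    lacks, and keys the checkpoint has but the model does not know."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Only the key names visible in the traceback; the real lists are longer.
wrn_keys = ["conv1.weight", "conv1.bias", "fc.bias"]
vit_keys = ["cls_token", "pos_embed", "patch_embed.proj.weight", "head.bias"]

missing, unexpected = diff_state_dict_keys(wrn_keys, vit_keys)
print(missing)     # ['conv1.bias', 'conv1.weight', 'fc.bias']
print(unexpected)  # ['cls_token', 'head.bias', 'patch_embed.proj.weight', 'pos_embed']
```

When the two sets are completely disjoint, as here, the usual cause is that the network requested at evaluation time differs from the one used for training, so checking that both runs name the same backbone is a reasonable first step.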

Training log

Hi, great work.
Can you provide the training logs as well as the checkpoints for classic cv training?

Does USB offer semi- or weakly- supervised Object detection?

🚀 Feature

The existing WSOD (Weakly Supervised Object Detection) algorithms are heavily tailored to the COCO dataset and are very unreliable on custom datasets. Models like omni-detr are also very resource-demanding, requiring expensive GPUs (32 GB and up) to train properly.

I was wondering whether USB offers or will offer object detection, similar to what DETR and Faster R-CNN offer, but weakly supervised.

NLP datasets download link

🚀 Feature

Can you add download links for the NLP datasets that are mentioned in the paper?

Motivation

I tried to use the Hugging Face datasets library to download the datasets and found an inconsistent number of training samples (e.g., for AGNews, your paper says there are 100,000 training samples, but Hugging Face reports 120,000). Even accounting for the 10,000 validation samples, 10,000 training samples are still missing.

Pitch

I can then have exactly the same training, validation, and test data for all NLP tasks and reproduce your results.

The pretrained model of STL-10

In Table 13, the reported pretrained model for STL-10 is ViT-B-P16-96, but in your code the pretrained backbone is mae_pretrain_vit_base. Which one was used in your paper?
