Comments (6)
DistributedSampler will replicate the data to fulfill training iterations in one epoch
from semi-supervised-learning.
In semi-supervised learning, figuring out what counts as an "epoch" is tricky. Classical semi-supervised methods, as implemented in this USB package, use batches that contain both labeled and unlabeled examples in a particular ratio (often called
PS: I may be wrong, but I believe that defining one epoch as 1024 steps everywhere in USB might originate from the original FixMatch choice on Cifar-100.
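To make the fixed-step definition of an epoch concrete, here is a minimal sketch of how many times the labeled set has to be traversed to fill one epoch. The function name and the example values (400 labeled samples, a labeled batch size of 64) are assumptions for illustration and are not taken from the USB code or configs; only the 1024 steps per epoch comes from the comment above.

```python
def labeled_passes_per_epoch(num_labeled, batch_size, steps_per_epoch=1024):
    """Return how many passes over the labeled set one fixed-step epoch needs.

    An epoch of `steps_per_epoch` steps consumes steps_per_epoch * batch_size
    labeled samples, so a small labeled set is repeated many times.
    """
    if num_labeled <= 0 or batch_size <= 0:
        raise ValueError("num_labeled and batch_size must be positive")
    return steps_per_epoch * batch_size / num_labeled


# Hypothetical values: 400 labeled samples, labeled batch size 64.
passes = labeled_passes_per_epoch(num_labeled=400, batch_size=64)
print(f"labeled set is traversed about {passes:.2f} times per epoch")
```

With few labels the ratio is large, which is consistent with the sampler having to replicate the labeled data to cover all iterations of an epoch.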
Thank you very much for clarifying my doubts.
I now have a clear understanding of the code organization and program execution flow in this repository, and I have read through all the recent papers on semi-supervised learning. I have gained a preliminary understanding of the methods used in the field of semi-supervised learning: supervised loss + auxiliary loss + pseudo-labeling loss. Building upon this foundation, the 'USB' code has done an excellent job abstracting the workflow for semi-supervised learning. You and your team have done great work.
Regarding data loading, with my own practice and your guidance, I believe I have grasped it quite well. Currently, I have divided my dataset into a training set, a validation set, and a test set in a ratio of 7:1:2. In the training set, 20% of the data is labeled and 80% is unlabeled. Since the 'train_step' function loads data using the labeled data as the reference point, all I need to do is divide the number of labeled samples by 'train_batch_size' to obtain 'num_train_iters'. This ensures that each labeled sample is used exactly once per epoch. I am building my own work on this method of data loading.
Once again, thank you for your explanations!
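The arithmetic described in this comment can be sketched as follows. The helper names, the dataset size of 10,000, and the batch size of 64 are hypothetical; the 7:1:2 split and the 20% labeled fraction come from the comment. The sketch rounds up so that a final partial batch still counts as one iteration; if the data loader drops the last incomplete batch, floor division would be the matching choice.

```python
import math


def split_sizes(total, ratios=(7, 1, 2)):
    """Split `total` samples into train/val/test sizes by integer ratio."""
    denom = sum(ratios)
    train = total * ratios[0] // denom
    val = total * ratios[1] // denom
    test = total - train - val  # remainder goes to the test set
    return train, val, test


def iters_for_one_labeled_pass(num_labeled, batch_size):
    """Iterations needed to see every labeled sample once (rounded up)."""
    return math.ceil(num_labeled / batch_size)


# Hypothetical dataset size and batch size.
total_samples = 10_000
train_batch_size = 64

train, val, test = split_sizes(total_samples)
num_labeled = train * 20 // 100        # 20% of the training set is labeled
num_unlabeled = train - num_labeled    # the remaining 80% is unlabeled
num_train_iters = iters_for_one_labeled_pass(num_labeled, train_batch_size)

print(train, val, test, num_labeled, num_unlabeled, num_train_iters)
```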
Thank you for opening this issue, it has enlightened me. As someone new to the field, I'm currently having difficulty understanding the execution flow within this repository, particularly how the label ratio is used in training the SSL algorithms and how to choose the num_labels parameter. Is there any intuition behind this?
It would be immensely helpful if you could provide a screenshot of the configuration used in the example you mentioned in your comment.
Additionally, I'm curious about your preferred method for running the code. Did you rely on the notebooks such as Beginner_Example.ipynb or Custom_Dataset.ipynb found in the notebooks folder, or is there a better approach?
Any guidance you can offer would be greatly appreciated. Thanks a lot.