We optimize the student model only based on the losses of the tokens which the student model predicts with high confidence. The corresponding line in the code is line 88 at commit 32f2698.
from bond.
Thanks for the reply. But in the line above, `pred_labels` comes from the teacher model, and you are getting the mask of the teacher model's confident predictions, right? My understanding is that you check which of the teacher's token predictions are confident and then compute the student's loss on those tokens only. Please correct me if I'm wrong.
Your understanding is correct. I meant to say that "we optimize the student model only based on the losses of the tokens which the teacher model predicts with high confidence". Sorry for the typo.
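To make the masking idea concrete, here is a minimal plain-Python sketch of the general technique, not the repository's actual code: the teacher's argmax is taken as each token's pseudo-label, tokens whose teacher confidence falls below a threshold are dropped, and the student's cross-entropy is averaged over the remaining tokens. The function names and the `threshold` default of 0.9 are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confident_token_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical confidence threshold
) -> float:
    """Mean student cross-entropy over tokens the TEACHER predicts confidently.

    The teacher's argmax class is the pseudo-label; tokens whose teacher
    max-probability is below `threshold` are masked out of the loss.
    """
    total, count = 0.0, 0
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        t_probs = softmax(t_logits)
        confidence = max(t_probs)
        if confidence < threshold:
            continue  # teacher is unsure: this token contributes no loss
        pseudo_label = t_probs.index(confidence)
        s_probs = softmax(s_logits)
        total += -math.log(s_probs[pseudo_label])
        count += 1
    # Return 0.0 when no token passes the mask, to avoid dividing by zero.
    return total / count if count else 0.0


teacher = [[4.0, 0.0, 0.0], [0.5, 0.4, 0.3]]  # token 1 confident, token 2 not
student = [[2.0, 1.0, 0.0], [0.0, 3.0, 0.0]]
print(confident_token_loss(teacher, student))
```

Only the first token survives the mask here, so the printed loss is just the student's cross-entropy on class 0 for that token. In a tensor framework the same idea is usually expressed as a boolean mask over the teacher's max probabilities multiplied into the per-token loss.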
Thanks @cliang1453. Just one more question. My task is also token-level classification. Would it make sense to just use a mean teacher for training? Something like:
- Train a model in stage 1 on a smaller amount of data.
- Use that model to initialize both the teacher and the student for the second stage.
- Give the teacher all the unlabeled data so that it generates pseudo-labels for the student to train on (computing the cross-entropy loss only on the teacher's confident predictions). Add a consistency cost between the soft labels of the student and the teacher, and then update the teacher with an exponential moving average of the student's weights. Repeat this for the remaining epochs.
It would be really helpful if I could get your thoughts on this.
Thanks in advance!
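The two mean-teacher ingredients in the last step of this proposal can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not BOND's code: parameters are stored as a dict of float lists (a hypothetical stand-in for a model's state dict), the consistency cost is the mean squared error between student and teacher softmax outputs (a common mean-teacher choice), and the names `ema_update`, `consistency_loss`, and the `decay` value are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]  # hypothetical flat parameter store


def ema_update(teacher: Params, student: Params, decay: float = 0.99) -> None:
    """In-place mean-teacher update: teacher <- decay * teacher + (1 - decay) * student."""
    for name, s_values in student.items():
        t_values = teacher[name]
        for i, s in enumerate(s_values):
            t_values[i] = decay * t_values[i] + (1.0 - decay) * s


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
) -> float:
    """MSE between student and teacher class probabilities, averaged over tokens."""
    if not student_logits:
        return 0.0
    total = 0.0
    for s_logits, t_logits in zip(student_logits, teacher_logits):
        s_probs, t_probs = softmax(s_logits), softmax(t_logits)
        total += sum((s - t) ** 2 for s, t in zip(s_probs, t_probs)) / len(s_probs)
    return total / len(student_logits)


teacher = {"w": [0.0, 0.0]}
student = {"w": [1.0, 2.0]}
ema_update(teacher, student, decay=0.9)  # teacher["w"] moves 10% toward student
print(teacher["w"])
print(consistency_loss([[1.0, 0.0]], [[0.0, 1.0]]))
```

In a full training step, the student would be optimized on the confident-token pseudo-label loss plus a weighted `consistency_loss`, and `ema_update` would be called after each optimizer step; the teacher itself is never updated by gradients.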
Related Issues (20)
- Testing new dataset
- Comparison with Positive-unlabeled learning
- question on stage 2 learning rate
- Trying BOND on new datasets and languages
- RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered
- Distant label generation code
- Results reproduction
- Questions About variable `self_training_hp_label`
- The file `dataset/BC5CDR-chem/turn.py` is missing
- What if i would like to use Electra model from huggingface transformers?
- Could you please provide the codes for matching distant labels?
- About the gazetteers information and distant label generation code
- one question about "tags_hp" in the preprocessing stage
- two questions about your paper
- Reproducing distant labels with gazetteer information
- Questions about "soft labels"
- Question about the results
- Can the NER model change from BERT+Linear_layer to BERT+CRF?
- what is the format of the dataset? how to convert any new dataset into this format?