Comments (10)
Thanks @dae. Just wanted to say on behalf of all my language learning friends/med school friends, we appreciate all you've done with Anki. :-)
Hey again @L-M-Sherlock,
Still familiarizing myself with burn; it seems like a neat project. This one looks pretty simple: poking around their repo, it looks like we can modify the grads in the step function before returning the TrainOutput struct.
Their example:
https://github.com/burn-rs/burn/blob/2fefc820996085c7e96763d96437876075e0f6ba/examples/text-generation/src/model.rs#L97-L106
So the step here:
https://github.com/open-spaced-repetition/fsrs-optimizer-burn/blob/938cc9286469c8b3e565109d95bf3e146172266f/src/training.rs#L46-L57
would change to something like this:
impl<B: ADBackend<FloatElem = f32>> TrainStep<FSRSBatch<B>, ClassificationOutput<B>> for Model<B> {
    fn step(&self, batch: FSRSBatch<B>) -> TrainOutput<ClassificationOutput<B>> {
        let item = self.forward_classification(
            batch.t_historys,
            batch.r_historys,
            batch.delta_ts,
            batch.labels,
        );
        let grads = item.loss.backward();
        // Change the grads to zero for the weights we want to freeze
        TrainOutput::new(self, grads, item)
    }
}
From what I can tell, when we run learner.fit, it creates a TrainEpoch struct and calls its run method, which calls the model.step above. It takes the grads returned from model.step and passes them into optim.step, which just does the optimizer-specific work. So I'd imagine that if we zero the gradients out in model.step, it should result in frozen weights? (I've sketched the loop after the link below.)
TrainEpoch run method:
https://github.com/burn-rs/burn/blob/2fefc820996085c7e96763d96437876075e0f6ba/burn-train/src/learner/epoch.rs#L108-L121
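Roughly, that loop boils down to this (a loose paraphrase of the linked run method for reasoning purposes, not burn's actual code; dataloader, optimizer, and lr are stand-ins):

// Loose paraphrase of TrainEpoch::run. model.step owns the backward pass
// and returns whatever grads it chooses; the optimizer consumes those
// grads verbatim, so an entry we zeroed in step never moves its weight.
for batch in dataloader.iter() {
    let output = model.step(batch); // forward + backward
    model = optimizer.step(lr, model, output.grads); // applies the returned grads
}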
I might be misunderstanding some stuff, but these are my findings. I'd like to try and build an implementation, but I don't have the SQLite file for testing. How might I get my hands on that?
The test file is available at https://github.com/open-spaced-repetition/fsrs-optimizer-burn/files/12394182/collection.anki21.zip
@L-M-Sherlock at the moment we're a little stuck. We can get the grads in tensor form and modify them much like you do in PyTorch, but once modified we can't get them back into the Gradients type that TrainOutput requires. I've talked to Nathaniel on Discord about it, and he said he could add a grad_replace method for this particular use case.
// training.rs l52
impl<B: ADBackend<FloatElem = f32>> TrainStep<FSRSBatch<B>, ClassificationOutput<B>> for Model<B> {
    fn step(&self, batch: FSRSBatch<B>) -> TrainOutput<ClassificationOutput<B>> {
        let item = self.forward_classification(
            batch.t_historys,
            batch.r_historys,
            batch.delta_ts,
            batch.labels,
        );
        let mut gradients = item.loss.backward();
        let grad_tensor = self.w.grad(&gradients).unwrap();
        let updated_grad_tensor = grad_tensor.slice_assign([0..4], Tensor::zeros([4]));
        // Can't get the updated tensor back into the B::Gradients type,
        // so Nathaniel said he could create something like this:
        self.w.grad_replace(&mut gradients, updated_grad_tensor);
        TrainOutput::new(self, gradients, item)
    }
}
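To convince ourselves the zeroing itself does what we want, here's a minimal standalone check outside the training loop (treating burn's ones/zeros/slice_assign as documented, and assuming 17 weights as in FSRS v4 with the first 4 being the initial-stability parameters, if I'm reading the PyTorch code right):

// Hypothetical sanity check: build a 17-element "gradient" of ones,
// zero the first 4 entries, and leave the remaining 13 untouched.
let grad = Tensor::<B, 1>::ones([17]);
let frozen = grad.slice_assign([0..4], Tensor::zeros([4]));
// frozen is now [0, 0, 0, 0, 1, 1, ..., 1]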
grad_replace was added in tracel-ai/burn#688
The clipping might be better done after tracel-ai/burn#689 is merged?
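If so, I'd expect it to mirror what the Python optimizer does after each step: clamp every weight into its own allowed range. A rough sketch only; LOWER_BOUNDS, UPPER_BOUNDS, max_pair, and min_pair are assumptions here, not confirmed burn API:

// Hypothetical post-step weight clipping: element-wise clamp of each
// weight into its own [lower, upper] range via max-then-min.
let lower = Tensor::<B, 1>::from_floats(LOWER_BOUNDS); // per-weight minimums
let upper = Tensor::<B, 1>::from_floats(UPPER_BOUNDS); // per-weight maximums
let clipped = model.w.val().max_pair(lower).min_pair(upper);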
> grad_replace was added in burn-rs/burn#688
Awesome, I'll make a PR for this later today
@L-M-Sherlock @dae is this something you want configurable in the ModelConfig struct, or should it be baked right into model.step? From what I can tell, it's just hardcoded in the PyTorch implementation.
I'll need to defer to @L-M-Sherlock on whether it should be hard-coded or whether it would make sense for it to be changeable by the user.
It should be hard-coded, because it relates to the details of the optimization, which users shouldn't be able to hack on.
By the way, freezing weights is only needed after we have implemented the pre-train stage. In the current version, the training process shouldn't apply weight freezing.
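Once pre-train lands, I'd expect the hard-coded version to be gated on the training stage, something like this (freeze_initial_stability is a hypothetical field set internally by the training pipeline, not exposed in ModelConfig):

// Hypothetical: zero the grads of the first 4 weights only during the
// fine-tuning stage that runs after pre-train has estimated them.
let mut gradients = item.loss.backward();
if self.freeze_initial_stability {
    let grad = self.w.grad(&gradients).unwrap();
    let frozen = grad.slice_assign([0..4], Tensor::zeros([4]));
    self.w.grad_replace(&mut gradients, frozen);
}
TrainOutput::new(self, gradients, item)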
Related Issues (20)
- [Enhancement] Use more splits while training with larger datasets
- Request: Ignore reviews before "Forget"
- Enhancement: Include incomplete revlogs even when training
- Consider time-frame limitation?
- TODO: speed up finding optimal retention via Brent's method
- Better outlier filter for trainset
- Skip reviews with time = 0 when calculating average answer times
- What's the difference between this repo and rs-fsrs?
- User guide
- Add an option to turn off outlier filter when benchmark
- Inference.rs uses the new power curve, but the default parameters are from v4
- Add a example file
- Reference usage?
- Pre-training Only when the number of reviews is less than 1000
- [BUG] Potential inconsistency in optimal_retention.rs
- [Question] How to choose "Days to simulate"?
- [Feature Request] Use two different sets of initial parameters, then average out the results
- Use the first revlog in the "known" review history for converting SM-2 ivl & ease to memory states
- Achieve parity with the Python optimizer
- support WASM