Comments (9)
Could this perhaps be solved by just sorting the input Vec<FSRSItem> before feeding it to the batcher?
from fsrs-rs.
Could this perhaps be solved by just sorting the input Vec<FSRSItem> before feeding it to the batcher?
Partially. We can sort the Vec by length or other properties and set shuffle=False for the batcher, but the batch size is still constant across all batches, so it's not flexible.
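To make the trade-off concrete, here is a minimal sketch of how much padding a fixed batch size needs with and without sorting. `Item`, `padding_slots`, and `items_with_lengths` are hypothetical names for illustration only; `Item` stands in for `FSRSItem`, and nothing here uses the real batcher.

```rust
/// Hypothetical stand-in for `FSRSItem`: only the number of reviews matters here.
#[derive(Clone, Debug)]
pub struct Item {
    pub reviews: Vec<u32>,
}

/// Count the padding slots needed when `items` are cut, in order, into
/// batches of `batch_size` and every batch is padded to its longest item.
pub fn padding_slots(items: &[Item], batch_size: usize) -> usize {
    assert!(batch_size > 0, "batch_size must be positive");
    items
        .chunks(batch_size)
        .map(|batch| {
            // Longest sequence in this batch decides the padded width.
            let longest = batch.iter().map(|i| i.reviews.len()).max().unwrap_or(0);
            batch.iter().map(|i| longest - i.reviews.len()).sum::<usize>()
        })
        .sum()
}

/// Build items with the given review counts (the contents are dummy values).
pub fn items_with_lengths(lengths: &[usize]) -> Vec<Item> {
    lengths.iter().map(|&n| Item { reviews: vec![0; n] }).collect()
}
```

For review counts 1, 9, 2, 8, 1, 9 and a batch size of 2, this counts 22 padding slots in the original order and 6 after `sort_by_key(|i| i.reviews.len())`. The batch size itself stays fixed either way, which is the limitation described above.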
It looks like we can define our own BatchStrategy implementation if we need more control - I imagine we could input a Vec<Vec<FSRSItem>> and have our strategy just return one of the inner vecs each time. I wonder how much difference it would make in practice though? With items sorted by length, the necessary padding shouldn't be huge, and I (perhaps naively) imagine that the actual model fitting work would dwarf the time it takes to prepare the input tensors.
Edit: I wasn't considering the extra work that would need to be done at fitting time with the padded values. Maybe the impact would be larger than I expected.
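As a rough illustration of the Vec<Vec<FSRSItem>> idea, the sketch below pre-groups items so that every batch contains sequences of a single length and therefore needs no padding. It deliberately does not implement burn's BatchStrategy trait; `SeqItem` and `batches_by_length` are hypothetical names, and `max_batch_size` is an assumed cap on batch size.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `FSRSItem`.
#[derive(Clone, Debug, PartialEq)]
pub struct SeqItem {
    pub reviews: Vec<u32>,
}

/// Group items so each batch holds sequences of one length only.
/// A group larger than `max_batch_size` is split into several batches,
/// so batch sizes vary but no batch ever needs padding.
pub fn batches_by_length(items: Vec<SeqItem>, max_batch_size: usize) -> Vec<Vec<SeqItem>> {
    assert!(max_batch_size > 0, "max_batch_size must be positive");

    // BTreeMap keeps the groups ordered from shortest to longest sequence.
    let mut by_len: BTreeMap<usize, Vec<SeqItem>> = BTreeMap::new();
    for item in items {
        by_len.entry(item.reviews.len()).or_default().push(item);
    }

    let mut batches = Vec::new();
    for group in by_len.into_values() {
        for chunk in group.chunks(max_batch_size) {
            batches.push(chunk.to_vec());
        }
    }
    batches
}
```

A custom strategy could then hand out one inner Vec per step. The cost is that rare lengths produce very small batches, so whether this beats sorting plus padding would have to be measured.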
Yeah, I haven't tested burn's performance on long sequences. But tinygrad is very slow on long sequences: an entire epoch takes me 1 minute, while the 90% of items with short sequences are done within 20 seconds.
This feature is low priority. We can implement #20 and #5 first.
On a moderately-sized dataset, the padding doesn't seem to make much of a difference - I see 11.75s vs 11.76s in the best of three runs.
diff --git a/src/convertor.rs b/src/convertor.rs
index 8b71ae6..099aff9 100644
--- a/src/convertor.rs
+++ b/src/convertor.rs
@@ -1,5 +1,6 @@
 use chrono::prelude::*;
 use chrono_tz::Tz;
+use itertools::Itertools;
 use rusqlite::{Connection, Result, Row};
 use std::collections::HashMap;
@@ -200,11 +201,13 @@ fn convert_to_fsrs_items(
 pub fn anki_to_fsrs() -> Vec<FSRSItem> {
     let revlogs = read_collection();
     let revlogs_per_card = group_by_cid(revlogs);
-    revlogs_per_card
+    let mut revlogs = revlogs_per_card
         .into_iter()
         .filter_map(|entries| convert_to_fsrs_items(entries, 4, Tz::Asia__Shanghai))
         .flatten()
-        .collect()
+        .collect_vec();
+    revlogs.sort_by_key(|r| r.reviews.len());
+    revlogs
 }
 #[cfg(test)]
On a moderately-sized dataset, the padding doesn't seem to make much of a difference - I see 11.75s vs 11.76s in the best of three runs.
I guess you forgot to remove shuffle?
On my collection, the original convertor without shuffle takes 70.70s, and the sorted convertor without shuffle takes 21.99s.
You can test this code: https://github.com/open-spaced-repetition/fsrs-optimizer-burn/tree/Feat/sort-FSRSItem-by-length
You're right, I forgot about shuffling. That's a nice extra performance win!