Comments (15)
@asukaminato0721 and I will deal with it.
from fsrs-rs.
I forgot to remove outliers in the Rust optimizer:
```python
def remove_outliers(group: pd.DataFrame) -> pd.DataFrame:
    grouped_group = (
        group.groupby(by=["r_history", "delta_t"], group_keys=False)
        .agg({"y": ["mean", "count"]})
        .reset_index()
    )
    sort_index = grouped_group.sort_values(
        by=[("y", "count"), "delta_t"], ascending=[True, False]
    ).index
    total = sum(grouped_group[("y", "count")])
    has_been_removed = 0
    for i in sort_index:
        count = grouped_group.loc[i, ("y", "count")]
        if has_been_removed + count >= total * 0.05:
            break
        has_been_removed += count
    group = group[
        group["delta_t"].isin(
            grouped_group[grouped_group[("y", "count")] >= count]["delta_t"]
        )
    ]
    return group


df[df["i"] == 2] = (
    df[df["i"] == 2]
    .groupby(by=["r_history", "t_history"], as_index=False, group_keys=False)
    .apply(remove_outliers)
)
df.dropna(inplace=True)
```
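The core idea of the filter is to drop the rarest `delta_t` buckets until just under 5% of the rows would be removed. A minimal, self-contained sketch of that thresholding logic on made-up data (one grouping key instead of two, hypothetical bucket sizes):

```python
import pandas as pd

# Toy first-review data: delta_t buckets with very different sizes.
df = pd.DataFrame({
    "delta_t": [1] * 50 + [2] * 30 + [3] * 15 + [30] * 3 + [100] * 2,
    "y": 1,
})

counts = df["delta_t"].value_counts().sort_values()  # rarest bucket first
total = len(df)
removed = 0
kept = set(counts.index)
for delta_t, count in counts.items():
    # Stop before crossing the 5% removal budget, as in remove_outliers.
    if removed + count >= total * 0.05:
        break
    removed += count
    kept.discard(delta_t)

filtered = df[df["delta_t"].isin(kept)]
print(sorted(kept))   # the delta_t buckets that survive
print(len(filtered))  # rows remaining after filtering
```

With these toy counts, only the 2-row `delta_t=100` bucket is dropped; removing the next-rarest bucket (3 rows) would reach the 5% budget, so the loop stops there.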
```python
def remove_non_continuous_rows(group):
    discontinuity = group["i"].diff().fillna(1).ne(1)
    if not discontinuity.any():
        return group
    else:
        first_non_continuous_index = discontinuity.idxmax()
        return group.loc[: first_non_continuous_index - 1]


df = df.groupby("card_id", as_index=False, group_keys=False).progress_apply(
    remove_non_continuous_rows
)
```
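A minimal, self-contained check of this function on a toy revlog: one card whose review index `i` jumps from 3 to 5 (e.g. because an earlier review was filtered out), so only the continuous prefix should survive.

```python
import pandas as pd

def remove_non_continuous_rows(group):
    # Keep only the prefix of reviews whose index `i` increases by exactly 1.
    discontinuity = group["i"].diff().fillna(1).ne(1)
    if not discontinuity.any():
        return group
    first_non_continuous_index = discontinuity.idxmax()
    return group.loc[: first_non_continuous_index - 1]

# One card's reviews with a gap: i jumps from 3 to 5.
group = pd.DataFrame({"i": [1, 2, 3, 5, 6]})
print(remove_non_continuous_rows(group)["i"].tolist())  # [1, 2, 3]
```

Note that `group.loc[: first_non_continuous_index - 1]` slices by label, which works here because the groups carry the default integer index.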
Increasing num_epochs could reduce the errors, but it will also slow down the optimization.
The left is generated by Anki; the right by the Python optimizer.
> Increasing num_epochs could reduce the errors, but it will also slow down the optimization.
By the way, how many epochs does the optimizer in the beta version use? Also, does it use splits, with averaging of the parameters afterwards?
And I don't think that the optimization becoming 2 or even 3 times slower is that important. Currently, the optimizer is blazingly fast, and even on a large collection optimization takes a minute or so. I don't think users will be very upset if the optimization takes 2-3 minutes instead of 1 minute.
> By the way, how many epochs does the optimizer in the beta version use? Also, does it use splits, with averaging of the parameters afterwards?
It uses 16 epochs and doesn't have splits because the framework doesn't support splits.
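For reference, the splits-with-averaging idea mentioned above can be sketched as follows. This is a hypothetical illustration, not the fsrs-rs or burn API: `train` is a stand-in for one optimizer run, and 17 is used here as the FSRS-4 parameter count.

```python
import numpy as np

def train(cards, seed):
    # Hypothetical stand-in for one optimizer run on a subset of cards;
    # returns a 17-element weight vector.
    rng = np.random.default_rng(seed)
    return rng.normal(loc=1.0, scale=0.1, size=17)

cards = list(range(100))
n_splits = 5
# Interleaved splits; the real optimizer may partition differently.
folds = [cards[k::n_splits] for k in range(n_splits)]
weights_per_fold = [train(fold, seed=k) for k, fold in enumerate(folds)]
# Average the per-split weight vectors element-wise.
final_weights = np.mean(weights_per_fold, axis=0)
print(final_weights.shape)  # (17,)
```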
Why are we talking about num_epochs here? Wouldn't this problem be solved just by adding the outlier filter to the Rust optimizer?
So the code related to removing outliers will be added in the next release?
I implemented the outlier filter, but the first four weights are still very different from the Python optimizer's, so the difference is not caused by the outlier filter. By the way, I think the weights generated by the Python optimizer don't fit the forgetting curve well:
A smaller value of stability would be better:
Maybe RMSE is not a good loss function here. I plan to use log loss.
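A sketch of how the two loss functions could be compared in pretrain, assuming the FSRS power forgetting curve R = (1 + t / (9 * S))^-1 and made-up aggregated first-review data; fitting is done by a simple grid search over candidate stabilities, which is not the actual optimizer's method.

```python
import numpy as np

def power_forgetting_curve(t, s):
    # Assumed FSRS power forgetting curve: R = (1 + t / (9 * s)) ** -1
    return (1 + t / (9 * s)) ** -1

# Hypothetical aggregated data: interval, observed recall rate, sample size.
delta_t = np.array([1, 2, 5, 10, 30])
recall = np.array([0.95, 0.90, 0.80, 0.65, 0.40])
count = np.array([100, 80, 60, 40, 10])

candidates = np.arange(0.1, 50, 0.1)

def rmse_loss(s):
    pred = power_forgetting_curve(delta_t, s)
    return np.sqrt(np.average((pred - recall) ** 2, weights=count))

def log_loss(s):
    pred = power_forgetting_curve(delta_t, s)
    ll = -(recall * np.log(pred) + (1 - recall) * np.log(1 - pred))
    return np.average(ll, weights=count)

s_rmse = candidates[np.argmin([rmse_loss(s) for s in candidates])]
s_ll = candidates[np.argmin([log_loss(s) for s in candidates])]
print(s_rmse, s_ll)  # the two losses can prefer different stabilities
```

Because log loss penalizes confident mispredictions much more heavily near 0 and 1, the two objectives can select different initial stabilities from the same data.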
> Maybe RMSE is not a good loss function here. I plan to use log loss.
I would recommend running the benchmark with both RMSE and logloss to determine whether there is a difference in the final RMSE.
> I implemented the outlier filter, but the first four weights are still very different from the Python optimizer's, so the difference is not caused by the outlier filter. By the way, I think the weights generated by the Python optimizer don't fit the forgetting curve well:
The supposed poor fit of the weights produced by the Python optimizer is definitely worth investigating. But, for now, the main focus should be on finding out why the first four weights generated by the Rust and Python optimizers are so different.
> Maybe RMSE is not a good loss function here. I plan to use log loss.
I know this issue is closed, but I'm curious, did you end up testing RMSE vs logloss in pretrain? If so, which one is better?
I tested it. Log loss is more robust than RMSE in pretrain.