maimemo / ssp-mmc

A Stochastic Shortest Path Algorithm for Optimizing Spaced Repetition Scheduling

Home Page: https://www.maimemo.com/paper/

License: MIT License

CMake 0.19% C++ 8.93% Python 90.88%
optimal-control research-paper spaced-repetition spaced-repetition-algorithm

SSP-MMC

Copyright (c) 2022 MaiMemo, Inc. MIT License.

Stochastic-Shortest-Path-Minimize-Memorization-Cost (SSP-MMC) is a spaced repetition scheduling algorithm used to help learners remember more words in MaiMemo, a language learning application in China.

This repository contains a public release of the data and code used for several experiments in the following paper (which introduces SSP-MMC):

Junyao Ye, Jingyong Su, and Yilong Cao. 2022. A Stochastic Shortest Path Algorithm for Optimizing Spaced Repetition Scheduling. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 4381–4390.

You can access this paper at: https://dl.acm.org/doi/10.1145/3534678.3539081?cid=99660547150

When using this data set and/or software, please cite this publication. A BibTeX record is:

@inproceedings{10.1145/3534678.3539081,
author = {Ye, Junyao and Su, Jingyong and Cao, Yilong},
title = {A Stochastic Shortest Path Algorithm for Optimizing Spaced Repetition Scheduling},
year = {2022},
publisher = {ACM},
doi = {10.1145/3534678.3539081},
pages = {4381–4390},
numpages = {10}
}

Software

The file data_preprocessing.py is used to preprocess data for the DHP model.

The file cal_model_param.py contains the DHP model and HLR model.

The file model/utils.py saves the parameters of the DHP model for training and simulation.

The file algo/main.cpp contains a C++ implementation of SSP-MMC, which searches for the optimal policy.

The file simulator.py provides an environment for comparing different scheduling algorithms.

Workflow

  1. Run data_preprocessing.py -> halflife_for_fit.tsv
  2. Run cal_model_param.py -> intercept_ and coef_ for the DHP model
  3. Save the parameters to the functions cal_recall_halflife and cal_forget_halflife in model/utils.py and to the function cal_next_recall_halflife in algo/main.cpp
  4. Run algo/main.cpp -> optimal policy in algo/result/
  5. Run simulator.py to compare SSP-MMC with several baselines.
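
Step 1 turns logged recall probabilities into half-lives for fitting. As a minimal sketch, assuming the exponential forgetting curve p = 2^(-delta_t / h) used by half-life models such as HLR, the half-life can be recovered from delta_t and p_recall as shown here. The function names are hypothetical and are not taken from data_preprocessing.py.

```python
import math


def halflife_from_recall(delta_t: float, p_recall: float) -> float:
    """Invert p = 2 ** (-delta_t / h) to obtain the half-life h.

    Hypothetical helper: assumes an exponential forgetting curve and is
    not the repository's own implementation.
    """
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    if not 0.0 < p_recall < 1.0:
        # p = 0 or p = 1 would give a zero or infinite half-life.
        raise ValueError("p_recall must be strictly between 0 and 1")
    return -delta_t / math.log2(p_recall)


def recall_probability(delta_t: float, halflife: float) -> float:
    """Forward model: probability of recall after delta_t has elapsed."""
    return 2.0 ** (-delta_t / halflife)
```

By construction, a word recalled with probability 0.5 after an interval has a half-life equal to that interval. Rows with p_recall of exactly 0 or 1 need separate handling, which this sketch leaves to the caller.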

Data Set and Format

The dataset is available on Dataverse (1.6 GB). It is a 7z-compressed TSV file containing the 220 million MaiMemo student memory behavior logs used in our experiments.

The columns are as follows:

  • u - student user ID who reviewed the word (anonymized)

  • w - spelling of the word

  • i - total number of times the user has reviewed the word

  • d - difficulty of the word

  • t_history - interval sequence of the historic reviews

  • r_history - recall sequence of the historic reviews

  • delta_t - time elapsed from the last review

  • r - result of the review

  • p_recall - probability of recall

  • total_cnt - number of users who showed the same memory behavior
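
To make the column layout concrete, here is a minimal sketch that reads rows with the columns above using Python's standard csv module. The sample row is invented for illustration, and the presence of a header line and the comma-separated encoding of t_history and r_history are assumptions; check the actual file before relying on them.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO

# Hypothetical sample in the column order listed above (values are invented).
SAMPLE_TSV = (
    "u\tw\ti\td\tt_history\tr_history\tdelta_t\tr\tp_recall\ttotal_cnt\n"
    "u001\tapple\t3\t5\t0,1,3\t0,1,1\t7\t1\t0.9\t120\n"
)


@dataclass
class ReviewLog:
    user: str
    word: str
    review_count: int
    difficulty: int
    t_history: List[int]
    r_history: List[int]
    delta_t: int
    recalled: bool
    p_recall: float
    total_cnt: int


def parse_history(field: str) -> List[int]:
    # Assumption: histories are stored as comma-separated integers.
    return [int(x) for x in field.split(",") if x]


def read_logs(handle: TextIO) -> List[ReviewLog]:
    """Parse tab-separated rows into typed records."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        ReviewLog(
            user=row["u"],
            word=row["w"],
            review_count=int(row["i"]),
            difficulty=int(row["d"]),
            t_history=parse_history(row["t_history"]),
            r_history=parse_history(row["r_history"]),
            delta_t=int(row["delta_t"]),
            recalled=row["r"] == "1",
            p_recall=float(row["p_recall"]),
            total_cnt=int(row["total_cnt"]),
        )
        for row in reader
    ]


logs = read_logs(io.StringIO(SAMPLE_TSV))
```

For the full 1.6 GB file, iterating over the reader row by row instead of building a list would keep memory use bounded.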

Contributors

celend, l-m-sherlock

ssp-mmc's Issues

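How is the word difficulty d adjusted?

Hello, your work has been a great inspiration to me, but there is one point I do not understand.
The paper states that after a Forget event, the word's difficulty d is increased by θ3. How is this parameter trained? I could not find the corresponding part in the code.
In the SSP-MMC algorithm later on, I noticed that word difficulty is divided into 18 levels and that each Forget adds 2 to the previous difficulty. What is the reasoning behind this setting? I look forward to your reply.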
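
For readers following the question, the behavior it describes can be sketched as below. This is only an illustration of the rule as stated in the issue (18 levels, plus 2 on each Forget). The numbering of levels from 1 to 18, the clamping at the top level, and leaving the difficulty unchanged on a successful recall are assumptions, and the function name is hypothetical.

```python
MAX_DIFFICULTY = 18  # number of difficulty levels mentioned in the issue
FORGET_STEP = 2      # increase applied after each Forget


def difficulty_after_review(d: int, recalled: bool) -> int:
    """Return the difficulty level after one review.

    Assumptions (not confirmed by the source): levels run from 1 to
    MAX_DIFFICULTY, a recall leaves the level unchanged, and a Forget
    is clamped at the highest level.
    """
    if not 1 <= d <= MAX_DIFFICULTY:
        raise ValueError("difficulty out of range")
    if recalled:
        return d
    return min(d + FORGET_STEP, MAX_DIFFICULTY)
```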
