Mahir

עָלָה מִבָּבֶל וְהוּא־סֹפֵר מָהִיר בְּתוֹרַת משֶׁה

Corpus-driven vocabulary review program for use with Text-Fabric and Jupyter notebook.


Example of Mahir Study Session in the Hebrew Bible


Looking at the gloss-side of a term

What is Mahir?

Mahir is a corpus-driven vocabulary review program with an emphasis on contextual learning of glosses. The student of ancient corpora such as the Hebrew Bible or Greek New Testament is often faced with two sub-optimal choices when it comes to vocabulary acquisition. They may rely on traditional flash card systems (Anki, Quizlet, etc.), in which they must learn a word as a lexicalized form stripped of context. This technique of rote memorization is an unnatural way of learning language. It is difficult, slow, and ultimately not very helpful for idiomatic terms that strongly rely on context. The other option is to simply try to acquire vocabulary from rapid reading of text. While this is better from the standpoint of contextuality, it loses the benefit of systematically covering wide and diverse terms. It also makes the student a slave to the lexicon, which itself dulls the enjoyment of the reading process. This is especially problematic for highly poetic and abstract texts (e.g. Job in the Hebrew Bible).

The corpus-driven approach blends the contextuality of reading the text with the systematic coverage of flash-card review. This is made possible by Text-Fabric. For any given term in a vocabulary set, Mahir randomly selects a verse/line where the term appears. The "front side" of the "card" is the plain text of the verse with the term in question highlighted. The user can then score the term based on familiarity. In difficult cases, the user may request a different context, which is again selected at random.
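The selection step described above can be sketched roughly as follows. This is a minimal illustration and not Mahir's actual code: the `OCCURRENCES` index, the function names `pick_context` and `front_side`, and the bracket-style highlighting are all assumptions made for the example, and the verse data is invented. In Mahir the occurrences would come from Text-Fabric.

```python
import random

# Hypothetical index: term -> list of (reference, verse text) pairs.
# The data is made up; in Mahir it would be drawn from Text-Fabric.
OCCURRENCES = {
    "king": [
        ("Verse A", "the king sat on the throne"),
        ("Verse B", "and the king answered him"),
        ("Verse C", "there was no king in those days"),
    ],
}


def pick_context(term, occurrences, previous=None, rng=random):
    """Pick a random context for `term`, avoiding `previous` when possible."""
    contexts = occurrences[term]
    # Fall back to the full list if the term has only one context.
    candidates = [c for c in contexts if c != previous] or contexts
    return rng.choice(candidates)


def front_side(term, context):
    """Render the card front: the verse with the term marked by brackets."""
    reference, text = context
    marked = text.replace(term, "[" + term + "]")
    return reference + ": " + marked


# Show a card, then ask for a different context for the same term.
first = pick_context("king", OCCURRENCES)
print(front_side("king", first))
second = pick_context("king", OCCURRENCES, previous=first)
print(front_side("king", second))
```

Passing the previous context back in is one simple way to make "request a different context" actually yield a new verse whenever the term occurs more than once.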

Review Strategy

The review strategy of Mahir is designed for reviewing a large number of vocabulary terms while still learning new ones. This is currently achieved through a rudimentary ranking system and a cycle period. The user sets the number of sessions in which they want to cycle through all of their "common terms". A "common term", in turn, is one that is seen once per cycle. Currently, I have these common terms set as "score 3". For instance, if a cycle is set to 30 days, then each score 3 term will be seen once in that period. Each score 2 term is seen every 4 days in the period; each score 1 term is seen every other day. Score 0 terms are a set number of new terms that are each seen every session until they are upgraded to score 1; new score 0 terms then move up to take their place. Finally, as I've acquired more and more terms, I've realized the need for super-cycle scores. So score 4 terms are seen every other cycle, score 5 every 2 cycles, score 6 every 4 cycles, and so on.
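To make the arithmetic concrete, here is a small sketch of the score-to-interval mapping for scores 0 through 4. It assumes one session per day, so that "days" and "sessions" are interchangeable, and it leaves out scores 5 and above. The function names `review_interval` and `showings_per_cycle` are hypothetical and do not come from Mahir itself.

```python
def review_interval(score, cycle_length):
    """Number of sessions between two showings of a term with this score."""
    intervals = {
        0: 1,                 # new terms: every session
        1: 2,                 # every other session
        2: 4,                 # every fourth session
        3: cycle_length,      # "common terms": once per cycle
        4: 2 * cycle_length,  # super-cycle: every other cycle
    }
    return intervals[score]


def showings_per_cycle(score, cycle_length):
    """Average number of times a term is shown during one cycle."""
    return cycle_length / review_interval(score, cycle_length)


# With a 30-session cycle:
for score in range(5):
    print(score, review_interval(score, 30), showings_per_cycle(score, 30))
```

With a 30-session cycle, a score 1 term comes up about 15 times per cycle, a score 3 term once, and a score 4 term once every two cycles.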

To-Do: The rudimentary strategy outlined above is sub-optimal. Scores should be more automated, based on a spaced-repetition algorithm similar to Anki's. One of the motivations of the current system is to keep the reviewer in full control. But maybe it is better to conceive of an automatic algorithm as a kind of autopilot: it allows intervention when desired, but otherwise adjusts terms in a way that is optimal for memorization. This issue requires further thought and work.

Set-Up and Use

Your corpus will need to be in the Text-Fabric corpus library. This requires 1) a corpus in TF format (instructions here), and 2) an app written to fit the corpus (instructions here). Each of these elements needs to be stored in a GitHub repository. The first should be in its own repo with a top-level directory called tf. The second must be stored under the annotation/ organization on GitHub. Please contact me for details. Of course, you may also simply rely on the numerous corpora already available in Text-Fabric.

You will also need a vocabulary .json file formatted in the way Mahir expects. See the sample vocabulary JSON for the expected format.

Every term has a score, which tells Mahir how often it should be shown. Scores range from 0 to 4, but higher scores can be configured. The higher the score, the less often the term is seen. For example, score 0 terms that fit in the daily quota are shown every session so they can be learned, and score 1 terms are shown every other session. Score 3 terms are seen once every cycle period, where a cycle period is defined as a set number of sessions. Score 4 terms are super-cycle terms: they are seen only once every other cycle. These parameters can be tweaked in the Session object of iMahir.py.
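As a rough picture of how such scores could translate into a session's deck, consider the sketch below. It is not the logic of the Session object in iMahir.py: the function `build_session`, its parameters `cycle_length` and `new_quota`, and the idea of spreading terms evenly by their position in a sorted list are all assumptions made for illustration.

```python
def build_session(scores, session, cycle_length=30, new_quota=5):
    """Return the terms to show in session number `session` (0-based).

    `scores` maps each term to its score (0-4).
    """
    # Sessions between showings for scores 1-4 (score 0 is handled by quota).
    intervals = {1: 2, 2: 4, 3: cycle_length, 4: 2 * cycle_length}

    # Group terms by score in a stable (sorted) order.
    by_score = {}
    for term in sorted(scores):
        by_score.setdefault(scores[term], []).append(term)

    # Score 0: the first `new_quota` new terms are shown every session.
    deck = by_score.get(0, [])[:new_quota]

    # Other scores: spread terms evenly so each appears once per interval.
    for score, interval in intervals.items():
        terms = by_score.get(score, [])
        deck += [t for i, t in enumerate(terms)
                 if i % interval == session % interval]
    return deck


vocab = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 3}
print(build_session(vocab, session=0, cycle_length=3, new_quota=2))
# → ['a', 'b', 'd', 'f']
print(build_session(vocab, session=1, cycle_length=3, new_quota=2))
# → ['a', 'b', 'e']
```

Spreading terms by index keeps session sizes roughly even, which is one reason a fixed interval per score is easy to reason about; a spaced-repetition scheduler would replace the fixed intervals with per-term due dates.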

Running Mahir

From within a Jupyter notebook, invoke:

    from iMahir import loadStudy

    study = loadStudy('sample_vocab/hebrew.json')
    study.learn()

The final call will invoke an interactive study session that shows terms in context.

Progress Notes

See here.


© 2020 Cody Kingham, MIT License
