Coder Social home page Coder Social logo

tassa-yoniso-manasi-karoto / irgen Goto Github PK

View Code? Open in Web Editor NEW
1.0 1.0 0.0 14.46 MB

Incremental reading card generator for Anki supporting Wikipedia & local HTML files

License: GNU General Public License v3.0

Go 94.02% CSS 5.98%
anki flashcards incremental-learning incremental-reading learning reading wikipedia

irgen's Introduction

irgen

Project status: pre-release. Incremental reading card generator for Anki supporting Wikipedia & local HTML files

At its core it just splits the HTML file using the heading tags into many notes.

However it is capable of creating exceptionally high context notes that would otherwise be too costly timewise to make by means of gathering individually for each note-to-be the surrounding pictures & tables of the document according to user-defined configuration.

How to use

Prerequisites

This is a CLI tool that will create a .txt file (tsv) that can be imported by Anki. AnkiConnect support will come eventually.

.docx / .odt / .epub can't be processed directly but you can save them as HTML using your favorite word processor, which can in turn be passed to irgen.

Because this is a standalone tool you need to edit your note's fields, front/back templates and CSS by hand first. You need to create a "RealTitle" and "Context" field. Your fields should be as follows:

Copy paste the content of files from the note directory of this repo to their corresponding template field in Anki.

The binary will parse a config.json file located in the directory where it itself is located. Before running irgen you need to provide the path to the directory where your media files are located in the config.json. Irgen will automatically download or copy images from the document to this location. This directory is named "collection.media". Please see the anki docs for how to find it.

On Windows you may want to install Notepad++ to edit that JSON file. Please be aware the syntax of JSON requires that the "\" in the path to your collection.media directory to be escaped using another "\" as shown here:

Running irgen

On Windows open the folder where irgen is located using the file manager and click on "File" (upper left corner), "Open Windows PowerShell".

A command line interface will open where you can now pass an URL or the path to a local HTML file like this:

config.json

Taking this local HTML file as reference, I will explain the entries of config.json. Let's take as reference for my examples the note-to-be located under "least important" title and that contains Lorem ipsum with the picture of a snake:

(h1, h2... are the heading tags that will appear in the raw HTML you don't need to add them to the text, this is just to illustrate)

Briefly, these are the keys of the config.json and how they will shape the output :

  • CollectionMedia : the path to your "collection.media" folder
  • DestDir : you can optionally set a default destination directory for the .txt file
  • MaxTitles : this is the max number of headings that will appear in Anki. With it set to 3, the card of my example will get a RealTitle like this "quite important: less important: least important", omitting "Very important title".
  • Fn and FnScope work as pairs. FnScope is the relative position above the heading of a note-to-be. In the example above 1 would correspond to the heading "less important" located one level above in importance to the heading of the note that contains Lorem ipsum with the snake. Currently only FromSuperior and FromSuperiorAndDescendants are implemented. FromSuperior will retrieve only the image of the camera where as FromSuperiorAndDescendants will capture both the image of the camera and the one with the helicopter.
  • ResXMax and ResYMax: on wikipedia each image is available in various resolutions and irgen will automatically download the highest quality available but you can limit the maximal resolution accepted using these values.

TODO

  • ADD VERSIONING
  • move PrefForHiRes to extractors
  • "pinned" user-defined group of high-value image that can be displayed none
  • ankiconnect support

About

I originally started this project many, many years ago. It lived first as a shell script, then as python script, then as a very poorly written Go codebase. I am providing it here after a near complete rewrite for public interest.

irgen's People

Contributors

tassa-yoniso-manasi-karoto avatar

Stargazers

Siddhartha Golu avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.