
digital_pali_dictionary_sqlite

Info

This Python script converts the dpd.zip file (GoldenDict data) from the Digital Pali Dictionary into an SQLite3 database file (dictionary.db, ~1.x GB).

How to use

  • System requirements: python3, zip (sudo apt install zip) and pyglossary (pip3 install -U pyglossary)

  • Run the script and answer its prompts:

python3 dpd_to_sqlite.py

Dev Notes

0. Only Roman Pali

All word entries are kept in Romanized Pali only; entries in other Pali scripts are stripped out so that the database does not grow excessively large. The script defines the Roman Pali character set as:

PALI_ROMAN_CHARS = r'[ĀĪŪṀṂṆḌḶṚṢŚÑṄāīūṁṃṇḍḷṛṣśñṅA-Za-z]'
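As an illustration of how such a character class can be used as a filter, here is a minimal sketch. The helpers is_roman_pali and keep_roman_headwords are hypothetical (not taken from dpd_to_sqlite.py), and the sketch assumes a headword is kept only when every character is a Roman Pali letter. It NFC-normalizes first so that decomposed accents (e.g. "a" + combining macron) still match the precomposed letters in the class.

```python
import re
import unicodedata

# Character class quoted from the README: one Roman Pali letter.
PALI_ROMAN_CHARS = r'[ĀĪŪṀṂṆḌḶṚṢŚÑṄāīūṁṃṇḍḷṛṣśñṅA-Za-z]'

# Hypothetical rule: the whole headword must consist of Roman Pali letters.
ROMAN_WORD = re.compile(PALI_ROMAN_CHARS + '+')


def is_roman_pali(word: str) -> bool:
    """Return True if `word` is written only in Roman Pali letters."""
    # Compose "a" + U+0304 into "ā" etc. so it matches the class above.
    word = unicodedata.normalize('NFC', word)
    return ROMAN_WORD.fullmatch(word) is not None


def keep_roman_headwords(headwords):
    """Drop headwords written in other scripts (e.g. Devanagari)."""
    return [w for w in headwords if is_roman_pali(w)]
```

Note that this strict version also rejects multi-word headwords such as "dhamma cakka"; allowing spaces would mean adding a space to the pattern.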

1. Inline CSS styles

The script removes the inline CSS styles and JavaScript snippets embedded in the dictionary entries and compiles them into a separate file named temp_extracted_styles.css. This keeps repeated markup out of every definition and reduces the size of the output database file.
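The extraction step could look roughly like the sketch below. It assumes that "inline CSS" means per-entry <style> blocks, and it simply discards <script> blocks instead of saving them; extract_styles and append_styles are hypothetical names, and the regex approach is a simplification (a real HTML parser would be more robust).

```python
import re

# Regex sketch only; an HTML parser would cope better with messy markup.
STYLE_BLOCK = re.compile(r'<style[^>]*>(.*?)</style>', re.DOTALL | re.IGNORECASE)
SCRIPT_BLOCK = re.compile(r'<script[^>]*>.*?</script>', re.DOTALL | re.IGNORECASE)


def extract_styles(html):
    """Remove <style> and <script> blocks from one definition.

    Returns (clean_html, styles); `styles` holds the CSS text of each
    removed <style> block so it can be written to a separate stylesheet.
    """
    styles = [css.strip() for css in STYLE_BLOCK.findall(html)]
    html = STYLE_BLOCK.sub('', html)
    html = SCRIPT_BLOCK.sub('', html)  # scripts are dropped in this sketch
    return html, styles


def append_styles(styles, path='temp_extracted_styles.css'):
    """Append extracted CSS to the temporary stylesheet for later review."""
    with open(path, 'a', encoding='utf-8') as f:
        for css in styles:
            f.write(css + '\n')
```

Only the cleaned HTML would then be stored in the defi column, while the collected CSS accumulates in the temporary file.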

Each word definition is wrapped in <section class="dp{number}">definition here</section>, where number runs from 1 to n.
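A minimal sketch of the wrapping format follows. The README does not say how numbers are assigned, so this sketch assumes (unconfirmed) that they are handed out sequentially in input order; wrap_definitions is a hypothetical helper. SECTION_TAG has the same shape as the pattern the Kotlin example below searches for.

```python
import re

# Opening tag produced by the wrapping step: <section class="dp{number}">
SECTION_TAG = re.compile(r'<section class="dp\d+">')


def wrap_definitions(definitions):
    """Wrap each definition in <section class="dp{number}">, numbering 1..n.

    Assumption: numbers are assigned sequentially in input order.
    """
    return [f'<section class="dp{number}">{defi}</section>'
            for number, defi in enumerate(definitions, start=1)]


wrapped = wrap_definitions(['definition A', 'definition B'])
```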

The script also attempts to parse temp_extracted_styles.css and generate the final stylesheet, done_parse_dpd.css. Review both files manually, because the inline CSS may change in future DPD releases.
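The README does not describe what the parsing step does. One plausible job is collapsing the many repeated rules collected from individual entries, so here is a hedged sketch of that idea: dedupe_css and build_final_css are hypothetical, only flat "selector { declarations }" rules are handled, and nested blocks such as @media or CSS comments would need extra care.

```python
import re

# Flat "selector { declarations }" rules only; nested @media blocks are not handled.
RULE = re.compile(r'([^{}]+)\{([^{}]*)\}')


def dedupe_css(css_text):
    """Return css_text with duplicate rules removed, keeping first occurrences."""
    seen = set()
    out = []
    for selector, body in RULE.findall(css_text):
        # Collapse whitespace so formatting differences do not hide duplicates.
        rule = ' '.join(selector.split()) + ' { ' + ' '.join(body.split()) + ' }'
        if rule not in seen:
            seen.add(rule)
            out.append(rule)
    return '\n'.join(out) + ('\n' if out else '')


def build_final_css(src='temp_extracted_styles.css', dst='done_parse_dpd.css'):
    """Read the temporary stylesheet and write the de-duplicated result."""
    with open(src, encoding='utf-8') as f:
        css = f.read()
    with open(dst, 'w', encoding='utf-8') as f:
        f.write(dedupe_css(css))
```

Because the output still has to be reviewed by hand, keeping rules in their original order makes it easier to compare against the temporary file.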

2. Synonyms

Synonyms are stored in a separate table within the database.
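Because the synonyms table (schema below) uses (synonym, word) as its primary key, the same pair can only be stored once. A minimal sketch of loading pairs while skipping duplicates, using Python's sqlite3 module on an in-memory database; store_synonyms is a hypothetical helper and the pairs are illustrative sample data.

```python
import sqlite3


def store_synonyms(conn, pairs):
    """Insert (synonym, word) pairs; repeated pairs are skipped by the primary key."""
    conn.executemany(
        'INSERT OR IGNORE INTO synonyms (synonym, word) VALUES (?, ?)', pairs)
    conn.commit()


conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE IF NOT EXISTS synonyms
                (synonym TEXT NOT NULL,
                word TEXT NOT NULL,
                PRIMARY KEY (synonym, word));''')

# Illustrative pairs; the third one duplicates the first and is ignored.
store_synonyms(conn, [('dhammā', 'dhamma'), ('dhammaṃ', 'dhamma'), ('dhammā', 'dhamma')])
count = conn.execute('SELECT COUNT(*) FROM synonyms').fetchone()[0]
```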

3. SQLite3 schema of the output dictionary.db

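import sqlite3

conn = sqlite3.connect("dictionary.db")

# One row per definition; a headword with several definitions has several rows
conn.execute('''CREATE TABLE IF NOT EXISTS dictionary
                (idx INTEGER PRIMARY KEY AUTOINCREMENT,
                word TEXT NOT NULL,
                defi TEXT NOT NULL);''')

# Maps each synonym to the headword(s) it points to
conn.execute('''CREATE TABLE IF NOT EXISTS synonyms
                (synonym TEXT NOT NULL,
                word TEXT NOT NULL,
                PRIMARY KEY (synonym, word));''')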
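To show how the two tables relate, here is a sketch of a single query that returns definitions whether the search term is a headword or a synonym. The lookup function and the sample rows are illustrative. The extra index on dictionary.word is an optional assumption, not part of the schema above: without it, a lookup by word has to scan the whole table.

```python
import sqlite3

# Matches rows where `term` is the headword itself or a synonym of it.
LOOKUP_SQL = '''
SELECT defi FROM dictionary
WHERE word = ?
   OR word IN (SELECT word FROM synonyms WHERE synonym = ?)
ORDER BY idx
'''


def lookup(conn, term):
    """Return the definitions for `term` in insertion order (by idx)."""
    return [row[0] for row in conn.execute(LOOKUP_SQL, (term, term))]


# Demo on an in-memory copy of the schema with made-up sample rows.
conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE IF NOT EXISTS dictionary
                (idx INTEGER PRIMARY KEY AUTOINCREMENT,
                word TEXT NOT NULL,
                defi TEXT NOT NULL);''')
conn.execute('''CREATE TABLE IF NOT EXISTS synonyms
                (synonym TEXT NOT NULL,
                word TEXT NOT NULL,
                PRIMARY KEY (synonym, word));''')
# Optional, not in the original schema: avoid full scans on word lookups.
conn.execute('CREATE INDEX IF NOT EXISTS dictionary_word ON dictionary (word)')
conn.executemany('INSERT INTO dictionary (word, defi) VALUES (?, ?)', [
    ('dhamma', '<section class="dp1">definition A</section>'),
    ('dhamma', '<section class="dp2">definition B</section>'),
])
conn.execute("INSERT INTO synonyms (synonym, word) VALUES ('dhammā', 'dhamma')")
conn.commit()
```

This does the same job as the two queries in the Kotlin example further down, but lets SQLite resolve the synonym in one round trip.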

4. Final check

After running the script, you should re-check done_parse_dpd.css and dictionary.db before integrating them into your application.

You can use a GUI tool such as DB Browser for SQLite to open dictionary.db and confirm that the data has been properly formatted and organized.
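Alongside a visual inspection, a few scripted checks can catch common problems. The sketch below is a hypothetical check_db helper (not part of the project): it counts rows, looks for definitions left empty or still containing a <script> tag, and lists synonym targets that have no matching headword. The demo uses an in-memory database with made-up rows; on the real file you would pass sqlite3.connect('dictionary.db') instead, and the join may be slow without an index on dictionary.word.

```python
import sqlite3


def check_db(conn):
    """Collect a few sanity figures for a converted dictionary database."""
    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    return {
        'entries': scalar('SELECT COUNT(*) FROM dictionary'),
        'synonyms': scalar('SELECT COUNT(*) FROM synonyms'),
        # Definitions left empty after styles and scripts were stripped
        'empty_definitions': scalar(
            "SELECT COUNT(*) FROM dictionary WHERE trim(defi) = ''"),
        # Definitions that still contain a <script> tag
        'leftover_scripts': scalar(
            "SELECT COUNT(*) FROM dictionary WHERE defi LIKE '%<script%'"),
        # Synonym targets with no matching headword in the dictionary table
        'orphan_synonym_targets': [row[0] for row in conn.execute(
            'SELECT DISTINCT s.word FROM synonyms AS s '
            'LEFT JOIN dictionary AS d ON d.word = s.word '
            'WHERE d.word IS NULL ORDER BY s.word')],
    }


conn = sqlite3.connect(':memory:')
conn.executescript('''
CREATE TABLE dictionary (idx INTEGER PRIMARY KEY AUTOINCREMENT,
                         word TEXT NOT NULL, defi TEXT NOT NULL);
CREATE TABLE synonyms (synonym TEXT NOT NULL, word TEXT NOT NULL,
                       PRIMARY KEY (synonym, word));
INSERT INTO dictionary (word, defi)
    VALUES ('dhamma', '<section class="dp1">definition A</section>');
INSERT INTO synonyms VALUES ('dhammā', 'dhamma'), ('buddhā', 'buddha');
''')
report = check_db(conn)
```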

Example usage

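The following Kotlin snippet shows how to look up a word in the database, together with any headwords it is listed as a synonym of. It assumes db is an open Android SQLiteDatabase for dictionary.db.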

// Assumes `db` is an open android.database.sqlite.SQLiteDatabase for dictionary.db
fun getDefinition(word: String): String {
    // Collect the original word plus every headword it is a synonym of
    val words = mutableListOf(word)
    db.query(
        "synonyms",
        arrayOf("word"),
        "synonym=?",
        arrayOf(word),
        null,
        null,
        null
    ).use { cursor ->
        val wordCol = cursor.getColumnIndexOrThrow("word")
        while (cursor.moveToNext()) {
            words.add(cursor.getString(wordCol))
        }
    }

    // Query the dictionary table for definitions of all candidate words
    val definitions = mutableListOf<String>()
    val placeholders = words.joinToString(separator = ",") { "?" }
    db.query(
        "dictionary",
        arrayOf("defi"),
        "word IN ($placeholders)",
        words.toTypedArray(),
        null,
        null,
        null
    ).use { cursor ->
        val defiCol = cursor.getColumnIndexOrThrow("defi")
        while (cursor.moveToNext()) {
            definitions.add(cursor.getString(defiCol))
        }
    }

    // Zero or one definition: return it unchanged
    if (definitions.size <= 1) return definitions.joinToString(separator = "")

    // Several definitions: insert a running number ("1. ", "2. ", and so on)
    // right after each opening <section class="dp{number}"> tag
    val sectionTag = Regex("<section class=\"dp\\d+\">")
    var counter = 1
    return definitions.joinToString(separator = "\n\n") { def ->
        def.replace(sectionTag) { match -> "${match.value}${counter++}. " }
    }
}

Attributions
