hrishikeshrt / pycdsl

Python Interface to Cologne Digital Sanskrit Lexicon (CDSL)

Home Page: https://pypi.org/project/PyCDSL/

License: Other

Makefile 3.02% Python 96.98%
python3 lexicon corpus-management dictionary-search sanskrit-language sanskrit-dictionaries console-interface programmatic-interface

pycdsl's Introduction

PyCDSL

PyCDSL is a Python interface to the Cologne Digital Sanskrit Lexicon (CDSL).

Features

  • CDSL Corpus Management (Download, Update, Access)
  • Unified Programmable Interface to access all dictionaries available at CDSL
  • Command Line Interfaces for quick and easy search
    • Console Command: cdsl
    • REPL Interface (powered by cmd2)
  • Extensive support for transliteration using indic-transliteration module
  • Search by key, value or both

Install

To install PyCDSL, run this command in your terminal:

$ pip install PyCDSL

Usage

PyCDSL can be used in a Python project, as a console command, and as an interactive REPL.

Using PyCDSL in a Project

Import PyCDSL in a project:

import pycdsl

Create a CDSLCorpus Instance:

# Default installation at ~/cdsl_data
CDSL = pycdsl.CDSLCorpus()

# Custom installation path can be specified with argument `data_dir`
# e.g. CDSL = pycdsl.CDSLCorpus(data_dir="custom-installation-path")

# Custom transliteration schemes for input and output can be specified
# with arguments `input_scheme` and `output_scheme`.
# Values should be valid names of the schemes from `indic-transliteration`
# If unspecified, `DEFAULT_SCHEME` (`devanagari`) would be used.
# e.g. CDSL = pycdsl.CDSLCorpus(input_scheme="hk", output_scheme="iast")

# Search mode can be specified to search values by key or value or both.
# Valid options for `search_mode` are "key", "value", "both".
# These are also stored in convenience variables, and it is recommended
# to use these instead of string literals.
# The variables are, SEARCH_MODE_KEY, SEARCH_MODE_VALUE, SEARCH_MODE_BOTH.
# The variable SEARCH_MODES will always hold the list of all valid modes.
# The variable DEFAULT_SEARCH_MODE will always point to the default mode.
# e.g. CDSL = pycdsl.CDSLCorpus(search_mode=pycdsl.SEARCH_MODE_VALUE)

Set up default dictionaries (["MW", "MWE", "AP90", "AE"]):

# Note: Any additional dictionaries that are installed will also be loaded.
CDSL.setup()

# For loading specific dictionaries only,
# a list of dictionary IDs can be passed to the setup function
# e.g. CDSL.setup(["VCP"])

# If `update` flag is True, update check is performed for every dictionary
# in `dict_ids` and if available, the updated version is installed
# e.g. CDSL.setup(["MW"], update=True)

Search in a dictionary:

# Any loaded dictionary is accessible using `[]` operator and dictionary ID
# e.g. CDSL["MW"]
results = CDSL["MW"].search("राम")

# Alternatively, they are also accessible like an attribute
# e.g. CDSL.MW, CDSL.MWE etc.
results = CDSL.MW.search("राम")

# Note: Attribute access and Item access both use the `dicts` property
# under the hood to access the dictionaries.
# >>> CDSL.MW is CDSL.dicts["MW"]
# True
# >>> CDSL["MW"] is CDSL.dicts["MW"]
# True

# `input_scheme` and `output_scheme` can be specified to the search function.
CDSL.MW.search("kṛṣṇa", input_scheme="iast", output_scheme="itrans")[0]
# <MWEntry: 55142: kRRiShNa = 1. kRRiShNa/ mf(A/)n. black, dark, dark-blue (opposed to shveta/, shukla/, ro/hita, and aruNa/), RV.; AV. &c.>

# Search using wildcard (i.e. `*`)
# e.g. To search all entries starting with kRRi (i.e. कृ)
CDSL.MW.search("kRRi*", input_scheme="itrans")

# Limit and/or Offset the number of search results, e.g.
# Show the first 10 results
CDSL.MW.search("kṛ*", input_scheme="iast", limit=10)
# Show the next 10 results
CDSL.MW.search("kṛ*", input_scheme="iast", limit=10, offset=10)

# Search using a different search mode
CDSL.MW.search("हृषीकेश", mode=pycdsl.SEARCH_MODE_VALUE)
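
The `limit` and `offset` arguments shown above can be combined to walk through a large result set page by page. The following is a minimal sketch, not part of PyCDSL: `iter_pages` and `fake_search` are hypothetical names, and the helper only assumes that the callable it receives accepts `limit` and `offset` keyword arguments and returns a list-like object, as `search()` appears to do in the examples above.

```python
from typing import Callable, Iterator, List


def iter_pages(search: Callable[..., List], pattern: str,
               page_size: int = 10, **kwargs) -> Iterator[List]:
    """Yield successive pages of results from a `search`-like callable.

    `search` must accept `limit` and `offset` keyword arguments.
    """
    offset = 0
    while True:
        page = search(pattern, limit=page_size, offset=offset, **kwargs)
        if not page:
            break
        yield page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            break
        offset += page_size


# Hypothetical stand-in for a loaded dictionary, so the sketch runs
# without any installed data.
def fake_search(pattern, limit=None, offset=0):
    data = [f"{pattern}-{i}" for i in range(25)]
    end = None if limit is None else offset + limit
    return data[offset:end]


pages = list(iter_pages(fake_search, "kR*", page_size=10))
print([len(page) for page in pages])  # [10, 10, 5]
```

With a loaded dictionary, the same helper could presumably be called as `iter_pages(CDSL.MW.search, "kṛ*", input_scheme="iast")`.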

Access an entry by ID:

# Access entry by `entry_id` using `[]` operator
entry = CDSL.MW["263938"]

# Alternatively, use `CDSLDict.entry` function
entry = CDSL.MW.entry("263938")

# Note: Access using `[]` operator calls the `CDSLDict.entry` function.
# The difference is that, in case an `entry_id` is absent,
# `[]` based access will raise a `KeyError`
# `CDSLDict.entry` will return None and log a `logging.ERROR` level message

# >>> entry
# <MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>

# Output transliteration scheme can also be provided

CDSL.MW.entry("263938", output_scheme="iast")
# <MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>
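
The difference between the two access styles matters when an ID may be absent. The sketch below illustrates it with a hypothetical stand-in class (`FakeDict`) that mimics only the documented behaviour: `[]` raises `KeyError`, while `entry()` returns `None`. It is not PyCDSL's implementation, and the real `entry()` additionally logs an error message.

```python
def lookup(cdsl_dict, entry_id, default=None):
    """Return an entry via `[]`, falling back to `default` on KeyError."""
    try:
        return cdsl_dict[entry_id]
    except KeyError:
        return default


class FakeDict:
    """Hypothetical stand-in mimicking the two documented access styles."""

    def __init__(self, entries):
        self._entries = entries

    def entry(self, entry_id):
        # Documented behaviour: None when the ID is absent.
        return self._entries.get(entry_id)

    def __getitem__(self, entry_id):
        # Documented behaviour: KeyError when the ID is absent.
        result = self.entry(entry_id)
        if result is None:
            raise KeyError(entry_id)
        return result


mw = FakeDict({"263938": "lord of the senses"})
print(lookup(mw, "263938"))                 # lord of the senses
print(lookup(mw, "0", default="<missing>"))  # <missing>
print(mw.entry("0"))                        # None
```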

The Entry class also supports transliteration after creation, so any entry fetched through either search() or entry() can be transliterated.

Transliterate a single entry:

CDSL.MW.entry("263938").transliterate("slp1")
# <MWEntry: 263938: hfzIkeSa = lord of the senses (said of Manas), BhP.>

Change transliteration scheme for a dictionary:

CDSL.MW.set_scheme(input_scheme="itrans")
CDSL.MW.search("rAma")

Change search mode for a dictionary:

CDSL.MW.set_search_mode(mode="value")
CDSL.MW.search("hRRiShIkesha")

Classes CDSLCorpus and CDSLDict are iterable.

  • Iterating over CDSLCorpus yields loaded dictionary instances.
  • Iterating over CDSLDict yields entries in that dictionary.

# Iteration over a `CDSLCorpus` instance
for cdsl_dict in CDSL:
    print(type(cdsl_dict))
    print(cdsl_dict)
    break

# <class 'pycdsl.lexicon.CDSLDict'>
# CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')

# Iteration over a `CDSLDict` instance
for entry in CDSL.MW:
    print(type(entry))
    print(entry)
    break

# <class 'pycdsl.models.MWEntry'>
# <MWEntry: 1: अ = 1. अ   the first letter of the alphabet>
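
Because a dictionary is iterable, simple statistics can be gathered in a single pass over its entries. The sketch below counts headwords and returns a summary shaped like the output of the REPL `stats` command. It assumes that each entry exposes its headword as a `key` attribute; `headword_stats` is a hypothetical helper, and the sample objects stand in for real entries.

```python
from collections import Counter
from types import SimpleNamespace


def headword_stats(entries, top_n=3):
    """Summarise headword frequencies over an iterable of entries.

    Assumption: every entry has a `key` attribute holding its headword.
    """
    counts = Counter(entry.key for entry in entries)
    return {
        "total": sum(counts.values()),
        "distinct": len(counts),
        "top": counts.most_common(top_n),
    }


# Stand-in entries; with PyCDSL one would pass a loaded dictionary instead.
sample = [SimpleNamespace(key=k)
          for k in ["rAma", "kfzRa", "rAma", "Siva", "rAma", "kfzRa"]]
stats = headword_stats(sample, top_n=2)
print(stats)
# {'total': 6, 'distinct': 3, 'top': [('rAma', 3), ('kfzRa', 2)]}
```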

Note: Please check the documentation of modules in the PyCDSL Package for more detailed information on available classes and functions.

https://pycdsl.readthedocs.io/en/latest/pycdsl.html

Using Console Interface of PyCDSL

Help for the Console Interface:

usage: cdsl [-h] [-i] [-s SEARCH] [-p PATH] [-d DICTS [DICTS ...]]
            [-sm SEARCH_MODE] [-is INPUT_SCHEME] [-os OUTPUT_SCHEME]
            [-hf HISTORY_FILE] [-sc STARTUP_SCRIPT]
            [-u] [-dbg] [-v]

Access dictionaries from Cologne Digital Sanskrit Lexicon (CDSL)

optional arguments:
  -h, --help            show this help message and exit
  -i, --interactive     start in an interactive REPL mode
  -s SEARCH, --search SEARCH
                        search pattern (ignored if `--interactive` mode is set)
  -p PATH, --path PATH  path to installation
  -d DICTS [DICTS ...], --dicts DICTS [DICTS ...]
                        dictionary id(s)
  -sm SEARCH_MODE, --search-mode SEARCH_MODE
                        search mode
  -is INPUT_SCHEME, --input-scheme INPUT_SCHEME
                        input transliteration scheme
  -os OUTPUT_SCHEME, --output-scheme OUTPUT_SCHEME
                        output transliteration scheme
  -hf HISTORY_FILE, --history-file HISTORY_FILE
                        path to the history file
  -sc STARTUP_SCRIPT, --startup-script STARTUP_SCRIPT
                        path to the startup script
  -u, --update          update specified dictionaries
  -dbg, --debug         turn debug mode on
  -v, --version         show version and exit

Common Usage:

$ cdsl -d MW AP90 -s हृषीकेश

Note: Arguments for specifying installation path, dictionary IDs, input and output transliteration schemes are valid for both interactive REPL shell and non-interactive console command.
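
When the console command is driven from another script, the argument list can be assembled programmatically. The sketch below uses only flags that appear in the help text above (`-d`, `-sm`, `-is`, `-os`, `-s`); `build_cdsl_command` is a hypothetical helper, not something PyCDSL provides.

```python
import shlex


def build_cdsl_command(pattern, dicts=(), input_scheme=None,
                       output_scheme=None, search_mode=None):
    """Build an argument list for the `cdsl` console command."""
    command = ["cdsl"]
    if dicts:
        command += ["-d", *dicts]
    if search_mode:
        command += ["-sm", search_mode]
    if input_scheme:
        command += ["-is", input_scheme]
    if output_scheme:
        command += ["-os", output_scheme]
    # The search pattern goes last so it cannot be read as a dictionary ID.
    command += ["-s", pattern]
    return command


command = build_cdsl_command("Davala", dicts=["MW", "AP90"],
                             input_scheme="slp1", output_scheme="devanagari")
print(shlex.join(command))
# cdsl -d MW AP90 -is slp1 -os devanagari -s Davala

# To run it (requires PyCDSL to be installed):
# import subprocess
# subprocess.run(command, check=True)
```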

Using REPL Interface of PyCDSL

REPL Interface is powered by cmd2, and thus supports persistent history, start-up script, and several other rich features.

To use REPL Interface to Cologne Digital Sanskrit Lexicon (CDSL):

$ cdsl -i

cmd2 Inherited REPL Features

  • Persistent History across sessions is maintained at ~/.cdsl_history.
  • If Start-up Script is present (~/.cdslrc), the commands (one per line) are run at the start-up.
  • Customized shortcuts for several useful commands, such as ! for shell, / for search and $ for show.
  • Aliases can be created at runtime.
  • Output Redirection works like the standard console, e.g. command args > output.txt will write the output of command to output.txt. Similarly, >> can be used to append the output.
  • Clipboard Integration is supported through Pyperclip. If the output file name is omitted, the output is copied to the clipboard, e.g., command args >. The output can even be appended to clipboard by command args >>.

Note: The locations of history file and start-up script can be customized through CLI options.
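
A start-up script is a plain text file with one REPL command per line. As a rough sketch, the helper below generates such a file from Python, using only commands that appear in the REPL session example in this document (`use`, `set input_scheme`, `set output_scheme`). `write_startup_script` is a hypothetical name, and the file is written to a temporary directory here rather than to `~/.cdslrc`.

```python
import tempfile
from pathlib import Path


def write_startup_script(path, dict_ids, input_scheme=None,
                         output_scheme=None):
    """Write REPL commands, one per line, to a start-up script."""
    lines = ["use " + " ".join(dict_ids)]
    if input_scheme:
        lines.append(f"set input_scheme {input_scheme}")
    if output_scheme:
        lines.append(f"set output_scheme {output_scheme}")
    path = Path(path)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# A temporary directory keeps the example free of side effects;
# the default start-up script location is ~/.cdslrc.
with tempfile.TemporaryDirectory() as tmp:
    script = write_startup_script(Path(tmp) / ".cdslrc", ["MW", "AP90"],
                                  input_scheme="itrans",
                                  output_scheme="iast")
    script_text = script.read_text(encoding="utf-8")

print(script_text)
```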

REPL Session Example

Cologne Sanskrit Digital Lexicon (CDSL)
---------------------------------------
Install or load dictionaries by typing `use [DICT_IDS..]` e.g. `use MW`.
Type any keyword to search in the selected dictionaries. (help or ? for list of options)
Loaded 4 dictionaries.

(CDSL::None) help -v

Documented commands (use 'help -v' for verbose/'help <topic>' for details):

Core
======================================================================================================
available             Display a list of dictionaries available in CDSL
dicts                 Display a list of dictionaries available locally
info                  Display information about active dictionaries
search                Search in the active dictionaries
show                  Show a specific entry by ID
stats                 Display statistics about active dictionaries
update                Update loaded dictionaries
use                   Load the specified dictionaries from CDSL.
                      If not available locally, they will be installed first.

Utility
======================================================================================================
alias                 Manage aliases
help                  List available commands or provide detailed help for a specific command
history               View, run, edit, save, or clear previously entered commands
macro                 Manage macros
quit                  Exit this application
run_script            Run commands in script file that is encoded as either ASCII or UTF-8 text
set                   Set a settable parameter or show current settings of parameters
shell                 Execute a command as if at the OS prompt
shortcuts             List available shortcuts
version               Show the current version of PyCDSL

(CDSL::None) help available
Display a list of dictionaries available in CDSL

(CDSL::None) help search

Usage: search [-h] [--limit LIMIT] [--offset OFFSET] pattern

    Search in the active dictionaries

    Note
    ----
    * Searching in the active dictionaries is also the default action.
    * In general, we do not need to use this command explicitly unless we
      want to search the command keywords, such as, `available` `search`,
      `version`, `help` etc. in the active dictionaries.


positional arguments:
pattern          search pattern

optional arguments:
  -h, --help       show this help message and exit
  --limit LIMIT    limit results
  --offset OFFSET  skip results

(CDSL::None) help dicts
Display a list of dictionaries available locally

(CDSL::None) dicts
CDSLDict(id='AP90', date='1890', name='Apte Practical Sanskrit-English Dictionary')
CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')
CDSLDict(id='MWE', date='1851', name='Monier-Williams English-Sanskrit Dictionary')
CDSLDict(id='AE', date='1920', name="Apte Student's English-Sanskrit Dictionary")

(CDSL::None) update
Data for dictionary 'AP90' is up-to-date.
Data for dictionary 'MW' is up-to-date.
Data for dictionary 'MWE' is up-to-date.
Data for dictionary 'AE' is up-to-date.

(CDSL::None) use MW
Using 1 dictionaries: ['MW']

(CDSL::MW) हृषीकेश

Found 6 results in MW.

<MWEntry: 263922: हृषीकेश = हृषी-केश a   See below under हृषीक.>
<MWEntry: 263934: हृषीकेश = हृषीकेश b m. (perhaps = हृषी-केश cf. हृषी-वत् above) id. (-त्व n.), MBh.; Hariv. &c.>
<MWEntry: 263935: हृषीकेश = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: हृषीकेश = of a Tīrtha, Cat.>
<MWEntry: 263937: हृषीकेश = of a poet, ib.>
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>

(CDSL::MW) show 263938

<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>

(CDSL::MW) show 263938 --show-data

<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>

Data:
<H3A><h><key1>hfzIkeSa<\/key1><key2>hfzIkeSa<\/key2><\/h>
<body>  lord of the senses (said of <s1 slp1="manas">Manas<\/s1>), <ls>BhP.<\/ls><info lex="inh"\/><\/body>
<tail><L>263938<\/L><pc>1303,2<\/pc><\/tail><\/H3A>

(CDSL::MW) $263938

<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>

(CDSL::MW) $263938 > output.txt
(CDSL::MW) !cat output.txt

<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>

(CDSL::MW) set input_scheme itrans
input_scheme - was: 'devanagari'
now: 'itrans'

(CDSL::MW) hRRiSIkesha

Found 6 results in MW.

<MWEntry: 263922: हृषीकेश = हृषी-केश a   See below under हृषीक.>
<MWEntry: 263934: हृषीकेश = हृषीकेश b m. (perhaps = हृषी-केश cf. हृषी-वत् above) id. (-त्व n.), MBh.; Hariv. &c.>
<MWEntry: 263935: हृषीकेश = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: हृषीकेश = of a Tīrtha, Cat.>
<MWEntry: 263937: हृषीकेश = of a poet, ib.>
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>

(CDSL::MW) set output_scheme iast
output_scheme - was: 'devanagari'
now: 'iast'

(CDSL::MW) hRRiSIkesha

Found 6 results in MW.

<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a   See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
<MWEntry: 263935: hṛṣīkeśa = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: hṛṣīkeśa = of a Tīrtha, Cat.>
<MWEntry: 263937: hṛṣīkeśa = of a poet, ib.>
<MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>

(CDSL::MW) set limit 2
limit - was: 50
now: 2

(CDSL::MW) hRRiSIkesha

Found 2 results in MW.

<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a   See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>

(CDSL::MW) set limit -1
limit - was: 2
now: None

(CDSL::MW) set search_mode value
search_mode - was: 'key'
now: 'value'

(CDSL::MW) hRRiSIkesha

Found 1 results in MW.

<MWEntry: 263938.1: hṛṣīkeśatva = hṛṣīkeśa—tva n.>

(CDSL::MW) set search_mode both
search_mode - was: 'value'
now: 'both'

(CDSL::MW) hRRiSIkesha

Found 7 results in MW.

<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a   See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
<MWEntry: 263935: hṛṣīkeśa = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: hṛṣīkeśa = of a Tīrtha, Cat.>
<MWEntry: 263937: hṛṣīkeśa = of a poet, ib.>
<MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>
<MWEntry: 263938.1: hṛṣīkeśatva = hṛṣīkeśa—tva n.>

(CDSL::MW) info
Total 1 dictionaries are active.
CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')

(CDSL::MW) stats
Total 1 dictionaries are active.
---
CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')
{'total': 287627, 'distinct': 194044, 'top': [('कृष्ण', 50), ('शिव', 46), ('विजय', 46), ('पुष्कर', 45), ('काल', 39), ('सिद्ध', 39), ('योग', 39), ('चित्र', 38), ('शुचि', 36), ('वसु', 36)]}

(CDSL::MW) use WIL

Downloading 'WIL.web.zip' ... (8394727 bytes)
100%|██████████████████████████████████████████████████████████████████████████████████████| 8.39M/8.39M [00:21<00:00, 386kB/s]
Successfully downloaded 'WIL.web.zip' from 'https://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/2020/downloads/wilweb1.zip'.
Using 1 dictionaries: ['WIL']

(CDSL::WIL)

(CDSL::WIL) use WIL MW
Using 2 dictionaries: ['WIL', 'MW']

(CDSL::WIL,MW) hRRiSIkesha

Found 1 results in WIL.

<WILEntry: 44411: hṛṣīkeśa = hṛṣīkeśa  m. (-śaḥ) KṚṢṆA or VIṢṆU. E. hṛṣīka an organ of sense, and īśa lord.>

Found 6 results in MW.

<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a   See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
<MWEntry: 263935: hṛṣīkeśa = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: hṛṣīkeśa = of a Tīrtha, Cat.>
<MWEntry: 263937: hṛṣīkeśa = of a poet, ib.>
<MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>

(CDSL::WIL,MW) use MW AP90 MWE AE
Using 4 dictionaries: ['MW', 'AP90', 'MWE', 'AE']

(CDSL::MW+3) use --all
Using 5 dictionaries: ['AP90', 'MW', 'MWE', 'AE', 'WIL']

(CDSL::AP90+3) use --none
Using 0 dictionaries: []

(CDSL::None) quit

Credits

This application uses data from Cologne Digital Sanskrit Dictionaries, Cologne University.

pycdsl's People

Contributors

hrishikeshrt

Forkers

drdhaval2785

pycdsl's Issues

normalization of headwords - sAmAnya and sAmAnyaM

  • PyCDSL version: 0.9.0
  • Python version: 3.10.10
  • Operating System: Manjaro Linux Talos 22.1.0

Problem

Different dictionaries use different headword conventions.
This creates difficulty in accessing the data.
I am giving two examples of the same word from VCP and SKD.
VCP uses 'sAmAnya' and SKD uses 'sAmAnyaM'.
Both refer to the same object, but because of the difference in the headword, the data is not accessible.
Fortunately, this is a solved problem for CDSL data.
See https://github.com/sanskrit-lexicon/hwnorm1/blob/master/sanhw1/hwnorm1c.txt for an amalgamated version of different headwords.

[create@dhaval-pc ~]$ cdsl -d VCP SKD -s sAmAnya

Found 1 results in the dictionary 'VCP'.

<VCPEntry: 45964: सामान्य = सामान्य  न० समानस्य भावः ष्यञ् । १ सादृश्यप्रयोजकधर्मे यथा मुखं पद्ममिव सुन्दरमित्यादौ सौन्दर्य्यादि । समा- नमेव स्वार्थे ष्यञ् । द्रव्यगुणकर्मसु तुल्यतया स्थितायां २ जातौ भाषा० । जातिशब्दे ३०९२ दृश्यम् । सामान्यञ्च द्विविधं सखण्डमखण्डञ्च सामान्यलक्षणा- शब्दे दीधित्युक्तिः । “सामान्यं विशेष इति बुद्ध्यपेक्षम्” कणा० । अनुगतधर्मत्वम् सामान्यमिति तल्लक्षणं तेन ३ अनु- गतधर्मस्वरूपं यथा प्रतियोगितासामान्ये यद्धर्मावच्छिन्न- त्वादीति दीधितिः । ४ अधिकविषयकत्वे यथा ब्राह्मणाय दधि दीयतां कौण्डिन्याय तक्रम् इत्यादौ दधिदानस्या- धिकब्राह्मणविषयकता । सामान्यविषयकशास्त्रञ्च विशेष- शास्त्रेण बाध्यते यथा “मा हिंस्यात् सर्वाभूतानीति” हिंसानिषेधः सर्वविषयः “वायव्यं श्वेतमालभेत” इत्यादि विशेषः हिंसाशास्त्रं तेन सामन्यथास्त्रं वैधेतरविषयएव प्रसरति । ५ अर्थालङ्कारभेदे ४०७ पृ० दृश्यम् ।>

[create@dhaval-pc ~]$ cdsl -d VCP SKD -s sAmAnyaM

Found 2 results in the dictionary 'SKD'.

<SKDEntry: 38760: सामान्यं = सामान्यं , क्ली, (समान एव । स्वार्थे ष्यञ् ।) जातिः । यथा, — “जातिर्जातञ्च सामान्यं व्यक्तिस्तु पृथगात्मिका” इत्यमरः । १ । ४ । ३० ॥ तद्द्विविधं यथा, भाषापरिच्छेदे । “सामान्यं द्विविधं प्रोक्तं परञ्चापरमेव च । द्रव्यादित्रिकवृत्तिस्तु सत्ता परतयोच्यते ॥ परभिन्ना च या जातिः सैवापरतयोच्यते । व्यापकत्वात् परापि स्यात् व्याप्यत्वादपरापि च द्रव्यत्वादिकजातिस्तु परापरतयोच्यते ॥” तल्लक्षणं यथा । नित्यत्वे सत्यनेकसमवेतत्वम । अनेकसमवेतत्वं संयोगादीनामप्यस्ति अतः सत्यन्तं नित्यत्वे सति समवेतत्वंगगनपरिमाणा दीनामप्यस्ति अत उक्तं अनेकेति नित्यत्वे सति अनेकवृत्तित्वमत्यन्ताभावस्याप्यस्ति अतोवृत्ति- सामान्यमुपेक्ष्य समवेतत्वमुक्तम् । एकव्यक्ति- वृत्तिस्तु न जातिः । तथा चोक्तम् । “व्यक्तेरभेदस्तुल्यत्वं सङ्करोऽथानवस्थितिः । रूपहानिरसम्बन्धो जातिवाधकसंग्रहः ॥” एकव्यक्तिकत्वात् आकाशत्वं न जातिः । तुल्य- वृत्तिकत्वात् घटत्वं कलसत्वं न जातिद्वयम् । संकीर्णत्वात् भूतत्वं मूर्त्तत्वं न जातिः । अन- वस्थाभयात् सामान्यत्वं न जातिः । विशेषस्य व्यावृत्तस्वभावस्य रूपहानिः स्यादतो विशेषत्वं न जातिः । समवायसम्बन्धाभावात् समवायो न जातिः । द्रव्यादित्रिकवृत्तिरिति परत्वं अधिकदेशवृत्तित्वं अपरत्वमल्पदेशवृत्तित्वम् । सकलजात्यपेक्षया अधिकदेशवृत्तित्वात् सत्तायाः परत्वम् । एतद्बोधनाय द्रव्यादीति । तदपेक्षया चान्यासां जातीनां अपरत्वम् । परभिन्ना सत्ता- भिन्ना । व्यापकत्वात् परापि स्यात् व्याप्यत्वाद- परापि च । पृथिवीत्वाद्यपेक्षया व्यापकत्वात् अधिकदेशवृत्तित्वात् द्रव्यत्वस्य परत्वम् । सत्तापे क्षया अल्पदेशवृत्तित्वाद्द्रव्यत्वस्य अपरत्वञ्च । तथा च धर्म्मद्वयसमावेशात् उभयमविरुद्धम् । इति सिद्धान्तमुक्तावली ॥ (सादृश्यम् । समा- [Page5-334-a+ 52] नत्वम् । यथा, महाभारते । १२ । २२८ । ४ । “सामान्यमृषिभिर्गत्वा ब्रह्मलोकनिवासिभिः । ब्रह्मेवामितदीप्तौजाः शान्तपाप्मा महातपाः । विचचार यथाकामं त्रिषु लोकेषु नारदः ॥”)>
<SKDEntry: 38761: सामान्यं = सामान्यं , त्रि, (समानस्य भावः । समान + ष्यञ् ।) अनेकसम्बन्ध्ये कवस्तु । साधारणम् । इत्यमरः । ३ । १ । ८२ ॥ यथा, देवलः । “सामान्यं पुत्त्रकन्यानां मृतायां स्त्रीधनं विदुः । अप्रजायां हरेद्भर्त्ता माता भ्राता पितापि वा ॥” इति दायतत्त्वम् ॥ (यथा च कुमारे । ७ । ४४ । “एकैव मूर्त्तिर्बिभिदे त्रिधा सा सामान्यमेषां प्रथमावरत्वम् । विष्णोर्हरस्तस्य हरिः कदाचित् वेधास्तयोस्तावपि धातुराद्यौ ॥”)>
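
Until headwords are normalized, a caller can work around the mismatch by searching for a few spelling variants of the same key. The sketch below is a deliberately naive heuristic that only toggles a trailing anusvara (`M` in SLP1); it is not the hwnorm1 mapping linked above, and `headword_variants`, `search_with_variants` and the fake index are all hypothetical.

```python
def headword_variants(key):
    """Return lookup candidates for an SLP1 `key`, original form first."""
    if key.endswith("M"):
        return [key, key[:-1]]
    return [key, key + "M"]


def search_with_variants(search, key):
    """Collect results of `search` for every variant of `key`."""
    results = []
    for variant in headword_variants(key):
        results.extend(search(variant))
    return results


# Stand-in index reproducing the VCP/SKD situation from this report.
fake_index = {
    "sAmAnya": ["VCP:45964"],
    "sAmAnyaM": ["SKD:38760", "SKD:38761"],
}
found = search_with_variants(lambda k: fake_index.get(k, []), "sAmAnya")
print(found)  # ['VCP:45964', 'SKD:38760', 'SKD:38761']
```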

Allow download of specified dictionaries during setup.

  • Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) version: 0.1.9
  • Python version: 3.6.9
  • Operating System: Bodhi Linux 6

Description

Allow user to download only selected dictionaries.
Currently, it seems that the user is forced to download all the dictionaries when they call CDSL.setup().
Some users only want to download specific dictionaries and skip the rest, to keep their system clutter-free.

What I Did

Python 3.6.9 (default, Dec  8 2021, 21:08:43) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycdsl
>>> CDSL = pycdsl.CDSLCorpus()
>>> CDSL.setup()
100%|███████████████████████████████████████████████████████████| 10.1M/10.1M [00:06<00:00, 1.56MB/s]
100%|███████████████████████████████████████████████████████████| 36.7M/36.7M [00:20<00:00, 1.77MB/s]
100%|███████████████████████████████████████████████████████████| 7.90M/7.90M [00:04<00:00, 1.67MB/s]
100%|███████████████████████████████████████████████████████████| 4.49M/4.49M [00:02<00:00, 1.66MB/s]
True

Not able to search in more than one dictionary in CLI

  • PyCDSL version: 0.3.3
  • Python version: 3.9.1
  • Operating System: Bodhi Linux 6

Description

I was trying to search across two dictionaries MW and AP90 via the commandline tool cdsl.
I could search in MW and AP90 individually, but could not search in both simultaneously.
Maybe I could not understand the usage instruction properly.
I tried passing both MW and AP90, separated by a space.
If there is a problem, kindly fix it.
If I misunderstood the syntax, kindly post an example of multi-dictionary search in the documentation and change the title of this report from bug to documentation.

What I Did

dhaval@dhaval-Aspire-5750:~$ cdsl -d MW -is slp1 -os devanagari -s Davala
<MWEntry: 100564: धवल = धवल mf(आ)n. (fr. √ 2. धाव्? cf. Uṇ. i, 108 Sch.) white, dazzling wh°, Var.; Kāv.; Pur. &c.>
<MWEntry: 100565: धवल = handsome, beautiful, L.>
<MWEntry: 100566: धवल = धवल m. white (the colour), L.>
<MWEntry: 100567: धवल = a kind of dove, Bhpr.>
<MWEntry: 100568: धवल = an old or excellent bull, Hcar.>
<MWEntry: 100569: धवल = a kind of camphor, L.>
<MWEntry: 100570: धवल = Anogeissus Latifolia, L.>
<MWEntry: 100571: धवल = (in music) N. of a Rāga>
<MWEntry: 100572: धवल = N. of a man, Kathās.>
<MWEntry: 100573: धवल = of one of the elephants of the quarters, R.>
<MWEntry: 100574: धवल = of a dog>
<MWEntry: 100575: धवल = धवल f(आ and ई).  a white cow, Kād.>
dhaval@dhaval-Aspire-5750:~$ cdsl -d AP90 -is slp1 -os devanagari -s Davala
<AP90Entry: 15994: धवल = धवल a. [धवं कंपं लाति ला-क; Tv.] 1 White; धवलातपत्र, धवलगृहं, धवलवस्त्रं &c.  2 Handsome.  3 Clear, pure. — लः 1 The white colour.  2 An excellent bull.  3 China camphor (चीनकर्पूर).  4 N. of a tree (धव). — लं  Whitepepper. — ला A woman with a white complexion. — ली A white cow; (  धवला also).  Comp. — उत्पलं the white water-lily (said to open at  moonrise). — गिरिः N. of the highest peak of the Himālaya mountain. — गृहं a house whitened with  chunam, a palace. — पक्षः  1 a goose.  2 the bright half of a lunar month. — मृत्तिका chalk.>
dhaval@dhaval-Aspire-5750:~$ cdsl -d AP90 MW -is slp1 -os devanagari -s Davala
<AP90Entry: 15994: धवल = धवल a. [धवं कंपं लाति ला-क; Tv.] 1 White; धवलातपत्र, धवलगृहं, धवलवस्त्रं &c.  2 Handsome.  3 Clear, pure. — लः 1 The white colour.  2 An excellent bull.  3 China camphor (चीनकर्पूर).  4 N. of a tree (धव). — लं  Whitepepper. — ला A woman with a white complexion. — ली A white cow; (  धवला also).  Comp. — उत्पलं the white water-lily (said to open at  moonrise). — गिरिः N. of the highest peak of the Himālaya mountain. — गृहं a house whitened with  chunam, a palace. — पक्षः  1 a goose.  2 the bright half of a lunar month. — मृत्तिका chalk.>
dhaval@dhaval-Aspire-5750:~$ cdsl -d MW AP90 -is slp1 -os devanagari -s Davala
<AP90Entry: 15994: धवल = धवल a. [धवं कंपं लाति ला-क; Tv.] 1 White; धवलातपत्र, धवलगृहं, धवलवस्त्रं &c.  2 Handsome.  3 Clear, pure. — लः 1 The white colour.  2 An excellent bull.  3 China camphor (चीनकर्पूर).  4 N. of a tree (धव). — लं  Whitepepper. — ला A woman with a white complexion. — ली A white cow; (  धवला also).  Comp. — उत्पलं the white water-lily (said to open at  moonrise). — गिरिः N. of the highest peak of the Himālaya mountain. — गृहं a house whitened with  chunam, a palace. — पक्षः  1 a goose.  2 the bright half of a lunar month. — मृत्तिका chalk.>

Allow providing model_map to CDSLCorpus instance

  • PyCDSL version: 0.6.0
  • Python version: 3.8.11
  • Operating System: Ubuntu 18.04.6

Problem

  • Currently, CDSLDict has a provision to specify custom models using model_map, however, the CDSLCorpus class that manages dictionaries has no provision for it.
  • It makes sense to remove model_map from CDSLDict and only keep it for CDSLCorpus while keeping only lexicon_model and entry_model for CDSLDict.

Feature Description

  • Add a way of specifying custom models (i.e. in CDSLCorpus) from the perspective of a programmer.
  • Remove model_map from CDSLDict, add it to CDSLCorpus

Reasons

It makes sense from the end-user perspective to never really need to instantiate CDSLDict by hand, and let CDSLCorpus do it, and to that end, it makes sense that arguments required for CDSLDict are specified through the corpus class.

model_map is basically a map of lexicon and entry models per dictionary. Semantically, one dictionary does not need to be aware of the maps for other dictionaries. The map makes more sense on the corpus-management level.

Request for a CLI tool based on Click

  • Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) version: 0.1.9
  • Python version: 3.6.9
  • Operating System: Bodhi Linux 6

Description

I love the REPL session which you have created with the python package.
I would also love to have a simple CLI tool based on Click python package too.

Something like cdsl [--dict] MW [--input_transliteration=slp1] [--output_transliteration=slp1] rAma would print the MW entries related to rAma on the terminal itself.

What I Did

Feature request.

Python 3.6 Compatibility

Issues #3 and #5 are related to Python 3.7+ dependence.
This needs more investigation; however, the primary cause seems to be some changes in the re and cmd modules.

Data related question

  • Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) version: Not tried
  • Python version: Not tried
  • Operating System: Not tried

Description

Congratulations for creating a tool for accessing Cologne Digital Sanskrit Dictionaries. I am currently involved in maintaining CDSL data at github. I would be greatly interested in helping the frontend tools which use CDSL data.

I am interested in knowing which data you use for downloading / accessing the data when someone invokes the package.
How frequently do you plan to update the data? And how?

What I Did

Just asked a question.

Use cmd2.Cmd instead of cmd.Cmd

  • PyCDSL version: 0.8.0
  • Python version: 3.8.11
  • Operating System: Ubuntu 18.04.6 LTS

Feature Description

Use cmd2.Cmd as the base class for shell.CDSLShell instead of shell.BasicShell, which is itself an extension of cmd.Cmd.

Reasons

cmd2 is full of useful features.
This change will add all the features of shell.BasicShell and much more, such as support for persistent history, running a basic set of commands at startup (such as setting dictionary choices, schemes, etc.), transcripts (towards unit testing), copy to clipboard, etc.

Caution: There are a few pitfalls in the way cmd2 handles default(), cmdloop(), etc., so this needs more attention.

PROs:

  • Many exciting features

CONs:

  • Extra dependency

BUG: cannot update dictionaries without giving a search term

  • PyCDSL version: 0.6.0
  • Python version: 3.8.11
  • Operating System: Ubuntu 18.04

Description

While trying to update dictionaries from CLI, an error is thrown that a search term must also be specified.
Ideally, we should just be able to update dictionaries by specifying cdsl -u or cdsl -u -d MW etc.

What I Did

$ cdsl -u
Must specify a search pattern in non-interactive mode.

$ cdsl -u -d MW
Must specify a search pattern in non-interactive mode.
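
One possible way to express the desired behaviour is to require a search pattern only when neither `--update` nor `--interactive` is given. The `argparse` sketch below illustrates that validation rule using a subset of the flags from the console help text; it is a hypothetical illustration, not the project's actual CLI code or fix.

```python
import argparse


def parse_args(argv):
    """Parse a reduced set of `cdsl` options with relaxed validation."""
    parser = argparse.ArgumentParser(prog="cdsl")
    parser.add_argument("-i", "--interactive", action="store_true")
    parser.add_argument("-s", "--search")
    parser.add_argument("-d", "--dicts", nargs="+", default=[])
    parser.add_argument("-u", "--update", action="store_true")
    args = parser.parse_args(argv)
    # A search pattern is only mandatory when there is nothing else to do.
    if not (args.interactive or args.search or args.update):
        parser.error("Specify a search pattern, --update or --interactive.")
    return args


args = parse_args(["-u", "-d", "MW"])
print(args.update, args.dicts, args.search)  # True ['MW'] None
```

Calling `parse_args([])` would still exit with an error, since none of the three modes is selected.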

transliterate error - unexpected argument

  • Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) version: 0.2.0
  • Python version: 3.9.1
  • Operating System: Bodhi Linux 6

Description

I upgraded the python3 installation on my system to Python 3.9.
I tried to check entry 177290 in MW in REPL mode.

What I Did

$ cdsl
Cologne Sanskrit Digital Lexicon (CDSL)
---------------------------------------
Install or load a lexicon by typing `use <DICT_ID>` e.g. `use MW`.
Type any keyword to search in the selected lexicon. (help or ? for list of options)
Loaded 4 dictionaries.
(CDSL::None) use MW
(CDSL::MW) show 177290
Traceback (most recent call last):
  File "/home/dhaval/.local/bin/cdsl", line 8, in <module>
    sys.exit(main())
  File "/home/dhaval/.local/lib/python3.9/site-packages/pycdsl/cli.py", line 222, in main
    cdsl.cmdloop()
  File "/home/dhaval/.local/lib/python3.9/site-packages/pycdsl/cli.py", line 210, in cmdloop
    super(self.__class__, self).cmdloop(intro="")
  File "/usr/local/lib/python3.9/cmd.py", line 138, in cmdloop
    stop = self.onecmd(line)
  File "/usr/local/lib/python3.9/cmd.py", line 217, in onecmd
    return func(arg)
  File "/home/dhaval/.local/lib/python3.9/site-packages/pycdsl/cli.py", line 170, in do_show
    result.transliterate(
TypeError: transliterate() got an unexpected keyword argument 'transliterate_key'

Search across multiple dictionaries or all dictionaries

  • Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) version: 0.2.1
  • Python version: 3.9.1
  • Operating System: Bodhi Linux 6

Description

  1. Users often want to see the dictionary entry for a given word in ALL dictionaries on their system. Give them an option "ALL" over and above the present MW, MWE, AP90, etc.

  2. Other than this, give them option to pass a comma separated list of dictionaries.
    If I pass MW,AP90 I should get entries from these two dictionaries.
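
The two requests could be served by a small parsing step that turns the user's input into a list of dictionary IDs. The sketch below is hypothetical: `resolve_dict_ids` is not a PyCDSL function, and the upper-casing of IDs and the handling of duplicates are assumptions made for illustration.

```python
def resolve_dict_ids(spec, installed):
    """Turn a spec such as "MW,AP90" or "ALL" into a list of IDs.

    `installed` is the list of dictionary IDs available locally.
    """
    if spec.strip().upper() == "ALL":
        return list(installed)
    resolved = []
    for part in spec.split(","):
        dict_id = part.strip().upper()
        # Skip empty fragments and repeated IDs, keeping the given order.
        if dict_id and dict_id not in resolved:
            resolved.append(dict_id)
    return resolved


installed = ["AP90", "MW", "MWE", "AE"]
print(resolve_dict_ids("MW,AP90", installed))  # ['MW', 'AP90']
print(resolve_dict_ids("ALL", installed))      # ['AP90', 'MW', 'MWE', 'AE']
```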

What I Did

Feature request.

Plan for stable release

  • PyCDSL version: 0.6.3
  • Python version: 3.8.11
  • Operating System: Ubuntu 18.04.6

We want to plan for a stable release. Most of the features seem to be stable now.
This issue is to discuss and track any specific features that one may like added in the stable version.

@drdhaval2785 Please comment. Also, if there are others who might be interested, please feel free to encourage them to test it out and provide feedback.

I have added a test suite as a requirement for the stable release, but I am not sure whether it is strictly required or just "good to have". Unit tests and integration tests do exist for individual modules, which should be sufficient.

Checklist

  • Complete test suite
    • Tests for Lexicon
    • Tests for Corpus
    • Test for Utils
    • Test for CLI (help wanted)
    • Test for Shell (help wanted)
  • Bugfixes

Global transliteration preferences

  • Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) version: 0.2.1
  • Python version: 3.9.1
  • Operating System: Bodhi Linux 6

Description

Allow global setting of input transliteration and output transliteration over and above dictionary / entry level control.
CDSL.input_transliteration('itrans')
CDSL.output_transliteration('devanagari')
This should take all queries in itrans and give all output in devanagari.
Users have transliteration predilections that cut across dictionaries. They usually use the same schemes for all of them.

What I Did

Feature request.

re.Match error

  • Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) version: 0.1.9
  • Python version: 3.6.9
  • Operating System: Bodhi Linux 6

Description

I tried to run the example given in the docs.

What I Did

>>> results = CDSL.MW.search("राम")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dhaval/.local/lib/python3.6/site-packages/pycdsl/pycdsl.py", line 281, in search
    for result in search_query
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 6923, in __iter__
    self.execute()
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 1911, in inner
    return method(self, database, *args, **kwargs)
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 1982, in execute
    return self._execute(database)
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 2155, in _execute
    cursor = database.execute(self)
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 3172, in execute
    sql, params = ctx.sql(query).query()
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 614, in sql
    return obj.__sql__(self)
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 2424, in __sql__
    ctx.literal(' WHERE ').sql(self._where)
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 614, in sql
    return obj.__sql__(self)
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 1483, in __sql__
    .sql(self.rhs))
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 618, in sql
    return self.sql(Value(obj))
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 614, in sql
    return obj.__sql__(self)
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 1383, in __sql__
    return ctx.value(self.value, self.converter)
  File "/home/dhaval/.local/lib/python3.6/site-packages/peewee.py", line 630, in value
    value = self.state.converter(value)
  File "/home/dhaval/.local/lib/python3.6/site-packages/pycdsl/models.py", line 63, in db_value
    return to_internal(value)
  File "/home/dhaval/.local/lib/python3.6/site-packages/pycdsl/models.py", line 36, in to_internal
    if isinstance(matchobj_or_str, re.Match):
AttributeError: module 're' has no attribute 'Match'
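
The traceback shows that `re.Match` is not available as an attribute on Python 3.6. One portable workaround, sketched below, is to derive the match type from an actual match object when the attribute is missing. The function `to_text` is a hypothetical stand-in and is not the `to_internal` function from `pycdsl/models.py`.

```python
import re

# Use re.Match where it exists; otherwise take the type of a real match object.
MatchType = getattr(re, "Match", type(re.match("", "")))


def to_text(matchobj_or_str):
    """Return the matched text for a match object, or the string unchanged."""
    if isinstance(matchobj_or_str, MatchType):
        return matchobj_or_str.group(0)
    return matchobj_or_str
```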

dictionary as an argument instead of attribute

  • Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) version: 0.2.1
  • Python version: 3.9.1
  • Operating System: Bodhi Linux 6

Description

Instead of results = CDSL.MW.search("राम")
Kindly provide a functionality like results = CDSL.search(query="राम", dictionary="MW")

This would let me pass the 'MW' string obtained from the CLI directly, without much hassle.
Otherwise I end up with CDSL."MW" instead of the CDSL.MW that the current interface requires.
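
For attribute-style access as described above, a dictionary ID held in a string can be resolved with Python's built-in `getattr`. The sketch below uses a stand-in class rather than PyCDSL itself, and `get_dictionary` is a hypothetical helper name.

```python
class FakeCorpus:
    """Stand-in for a corpus object that exposes dictionaries as attributes."""

    def __init__(self):
        self.MW = "MW dictionary object"
        self.AP90 = "AP90 dictionary object"


def get_dictionary(corpus, dict_id):
    """Look up a dictionary attribute by its string ID."""
    try:
        return getattr(corpus, dict_id)
    except AttributeError:
        raise KeyError("Dictionary not available: " + dict_id) from None


corpus = FakeCorpus()
dictionary = get_dictionary(corpus, "MW")  # same object as corpus.MW
```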

What I Did

Feature request.

Error while running REPL

  • Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) version: 0.1.9
  • Python version: 3.6.9
  • Operating System: Bodhi Linux 6

Description

I tried to use MW and search for rAma (slp1).
It gave me errors.

What I Did

(CDSL::None) use MW
(CDSL::MW) scheme slp1
Input scheme: slp1
(CDSL::MW) rAma
Traceback (most recent call last):
  File "/usr/lib/python3.6/cmd.py", line 214, in onecmd
    func = getattr(self, 'do_' + cmd)
AttributeError: 'CDSLShell' object has no attribute 'do_rAma'
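
For context, the standard library's `cmd.Cmd` looks up a `do_<command>` method for the first word of each input line and calls `default()` when no such method exists. A shell that should treat any other input as a search term can override `default()`. The class below is a hypothetical illustration built on the standard `cmd` module; it is not PyCDSL's `CDSLShell`, which is built on cmd2 and may behave differently.

```python
import cmd


class SearchShell(cmd.Cmd):
    """Hypothetical shell that treats unknown input as a search term."""

    prompt = "(demo) "

    def __init__(self):
        super().__init__()
        self.active = None
        self.searched = []

    def do_use(self, arg):
        """use DICT_ID: select the active dictionary."""
        self.active = arg

    def default(self, line):
        # Called by onecmd() when no do_<command> method matches the input.
        self.searched.append(line)
        print("Searching for:", line)


shell = SearchShell()
shell.onecmd("use MW")
shell.onecmd("rAma")  # no do_rAma exists, so default() handles it
```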

Write Unit Tests and Integration Tests

Feature Description

Write unit tests and integration tests using pytest.

Reasons

Tests help ensure code correctness and compatibility as the application grows.

Checklist

  • Tests for corpus
  • Tests for lexicon
  • Tests for utils
  • Overall Integration Tests
    • Tests for cli
    • Tests for shell

BUG: Search by value is also showing key matches

  • PyCDSL version: 0.7.0 (d792eb3)
  • Python version: 3.8.8
  • Operating System: Ubuntu 18.04.6 LTS

Description

When searching by value, matches by both key and value are shown.

What I Did

cdsl -sm value -s हृषीकेश -d MW

Credit to CDSL

  • PyCDSL version:
  • Python version:
  • Operating System:

Description

Add credits to CDSL as per credits policy of Cologne website.

Documentation

Fair use requirement.

CLI minor edit

  • PyCDSL version:
  • Python version:
  • Operating System:

Description

usage: CLI for PyCDSL [-h] [-i] [-s SEARCH] [-p PATH] [-d DICTS [DICTS ...]] [-is INPUT_SCHEME] [-os OUTPUT_SCHEME] [-u] [-dbg]

needs to be changed to

usage: cdsl [-h] [-i] [-s SEARCH] [-p PATH] [-d DICTS [DICTS ...]] [-is INPUT_SCHEME] [-os OUTPUT_SCHEME] [-u] [-dbg]
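
In argparse, the name printed after `usage:` is the parser's `prog` value, which is also the first positional parameter of `ArgumentParser`. How PyCDSL builds its parser is not shown here, so the following is only a sketch of one way to get the requested output; the long option names are assumptions.

```python
import argparse

parser = argparse.ArgumentParser(
    prog="cdsl",                   # name shown after "usage:"
    description="CLI for PyCDSL",  # descriptive text shown in the help body
)
parser.add_argument("-s", "--search", help="search pattern")
parser.add_argument("-d", "--dicts", nargs="+", help="dictionary IDs")

# The usage line now begins with "usage: cdsl".
print(parser.format_usage())
```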

Search Modes - by key, by value, by both

  • PyCDSL version: 0.7.0
  • Python version: 3.8.8
  • Operating System: Ubuntu 18.04.6 LTS

Problem

Was trying to find words with specific meaning / similar meaning.

Feature Description

  • search() function can have a mode argument, with possible values say 'key', 'value', 'both'

Reasons

One way is to use the English dictionaries, but there are further use cases, such as finding related words.
Allowing search by key, by value, or by both would be a good addition.
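
A minimal sketch of what the three modes could mean, using plain `(key, value)` pairs and substring matching. The function below is hypothetical and unrelated to PyCDSL's actual implementation; in particular, reading "both" as "match in the key or in the value" is an assumption.

```python
def search(entries, pattern, mode="key"):
    """Return the (key, value) pairs that match `pattern` under `mode`."""
    if mode not in ("key", "value", "both"):
        raise ValueError("mode must be 'key', 'value' or 'both'")

    results = []
    for key, value in entries:
        in_key = pattern in key
        in_value = pattern in value
        if mode == "key":
            matched = in_key
        elif mode == "value":
            matched = in_value
        else:  # "both": a match in either field counts (assumption)
            matched = in_key or in_value
        if matched:
            results.append((key, value))
    return results


entries = [
    ("rAma", "pleasing, charming"),
    ("kfzRa", "black, dark"),
    ("SyAma", "dark-coloured, black"),
]
```

Here `search(entries, "black", mode="value")` returns the second and third entries, while `mode="key"` returns nothing because no key contains "black".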

Use JSON Data from sanskrit-lexicon repository

If you want the data to stay updated without much hassle, you can use the data available at https://github.com/sanskrit-lexicon/csl-json/tree/main/ashtadhyayi.com. It is in JSON format, and www.ashtadhyayi.com uses it for its frontend.
This would additionally let the user view the scanned page directly from the dictionary entry.

The structure is simple. It gives an ID for every headword, and from that ID you can look up the dictionary entry.

Originally posted by @drdhaval2785 in #1 (comment)

Request for option for output transliteration

  • Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) version: 0.1.9
  • Python version: 3.6.9
  • Operating System: Bodhi Linux 6

Description

The current docs say that I can change the input transliteration.
Some people would also want to see the output in a transliteration of their choice.
They may want to see Devanagari content in IAST, SLP, ITRANS or any other scheme.
I kindly request this option in all three places: the Python package, the REPL session and the proposed Click CLI.
https://pypi.org/project/indic-transliteration/ may help with converting between the various transliteration schemes.

What I Did

Feature request.

search() command for CDSLCorpus class

Consider adding these methods to the CDSLCorpus class to facilitate usage like `CDSL.search('search-term', dict='MW')`, as described in #12 (comment).

I am not particularly fond of the usage regarding entry as requested in #12 (comment) as I don't see a specific use case where this would be required. Entry numbers are not unique across dictionaries, so if one wants to query using entry ID, one also has a dictionary in mind, and in such case, one might as well do CDSL.dicts[dict_id].entry.

Docs for downloading new dictionaries or updating downloaded dictionaries

  • Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) version: 0.1.9
  • Python version: 3.6.9
  • Operating System: Bodhi Linux 9

Description

The docs currently do not cover this point.
I see 4 dictionaries downloaded,
whereas the message copied into the docs shows 23 dictionaries.
The docs should explain how users can download the dictionaries of their choice or update the downloaded ones.

What I Did

$ cdsl
Cologne Sanskrit Digital Lexicon (CDSL)
Type any keyword to search in the selected lexicon. (help or ? for list of options)
Loaded 4 dictionaries.

(CDSL::None) 
