
jisho-api's People

Contributors

friendlypigeon, mmatlacz, patitotective, pedroallenrevez

jisho-api's Issues

Weird import conflict

TLDR:

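A file called linecache.py (standard library, pulled in through inspect when pydantic is imported) runs "import tokenize" and picks up jisho_api's tokenize package instead of the standard-library module, causing a circular import. Renaming the "tokenize" folder to "tokens" resolves the issue. The rest of this report explains how the bug was found.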

FIRST:

When launching the interpreter ($> python3.10) from the terminal in the jisho_api folder, I get this weird error (at launch, before attempting to write anything in actual python):

Exception ignored in: <module 'inspect' from '/usr/lib/python3.10/inspect.py'>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: partially initialized module 'inspect' has no attribute 'isgenerator' (most likely due to a circular import)
<frozen importlib._bootstrap>:241: RuntimeWarning: Cython module failed to patch module with custom type
Exception ignored in: <module 'inspect' from '/usr/lib/python3.10/inspect.py'>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: partially initialized module 'inspect' has no attribute 'isgenerator' (most likely due to a circular import)

Annoyingly, this somewhat breaks the interpreter and makes it so that I can't even test out the code.

However, I've isolated the cause to the following line from jisho_api/tokenize/__init__.py; commenting it out removes the error.

from .request import Tokens

No idea why it happens. It is probably not a name conflict, since renaming Tokens to Token does not seem to solve the issue. Maybe some error within tokenize/request.py propagates somehow?

EDIT: extra info

When running from kanji import Kanji, I get the following.

>>> from kanji import Kanji
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fulguritude/Workspace/Clones/jisho-api/jisho_api/kanji/__init__.py", line 1, in <module>
    from .request import Kanji
  File "/home/fulguritude/Workspace/Clones/jisho-api/jisho_api/kanji/request.py", line 8, in <module>
    from pydantic import BaseModel
  File "pydantic/__init__.py", line 2, in init pydantic.__init__
    from pathlib import Path
  File "pydantic/dataclasses.py", line 7, in init pydantic.dataclasses
    import builtins
  File "pydantic/main.py", line 310, in init pydantic.main
  File "pydantic/main.py", line 254, in pydantic.main.ModelMetaclass.__new__
  File "pydantic/class_validators.py", line 197, in pydantic.class_validators.extract_root_validators
  File "/usr/lib/python3.10/inspect.py", line 43, in <module>
    import linecache
  File "/usr/lib/python3.10/linecache.py", line 11, in <module>
    import tokenize
  File "/home/fulguritude/Workspace/Clones/jisho-api/jisho_api/tokenize/__init__.py", line 1, in <module>
    from .request import Tokens
  File "/home/fulguritude/Workspace/Clones/jisho-api/jisho_api/tokenize/request.py", line 8, in <module>
    from pydantic import BaseModel
ImportError: cannot import name 'BaseModel' from partially initialized module 'pydantic' (most likely due to a circular import) (/home/fulguritude/.local/lib/python3.10/site-packages/pydantic/__init__.cpython-310-x86_64-linux-gnu.so)

Note the "import tokenize" in linecache.py: it might be a name conflict after all.

So it's probably the BaseModel import in tokenize/request.py and/or somewhere else that causes the issue, by causing a conflict with the folder name tokenize itself. Moving it around in said file (up a line or two in the import order), I get slightly different behavior.

At launch:

Exception ignored in: <module 'inspect' from '/usr/lib/python3.10/inspect.py'>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: partially initialized module 'inspect' has no attribute 'isgenerator' (most likely due to a circular import)
<frozen importlib._bootstrap>:241: RuntimeWarning: Cython module failed to patch module with custom type

FINAL:

Renaming the "tokenize" folder to "tokens" indeed resolves the issue. What probably happened is that linecache tried to import jisho_api's tokenize, causing the circular import.
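The shadowing described above can be checked for mechanically. The following is only a minimal sketch, not part of jisho-api: the helper name find_shadowing_names and the throwaway folder layout are assumptions made for illustration. It lists modules and packages in a directory whose names collide with standard-library modules, which is what happens when the interpreter is started inside the jisho_api folder.

```python
import sys
import tempfile
from pathlib import Path

# sys.stdlib_module_names exists on Python 3.10+; the small fallback set is
# only an illustrative assumption for older interpreters.
STDLIB_NAMES = getattr(
    sys, "stdlib_module_names", frozenset({"tokenize", "linecache", "inspect"})
)


def find_shadowing_names(directory):
    """List modules/packages in `directory` named like a standard-library module."""
    found = []
    for entry in Path(directory).iterdir():
        is_module = entry.is_file() and entry.suffix == ".py"
        is_package = entry.is_dir() and (entry / "__init__.py").is_file()
        name = entry.stem if is_module else entry.name
        if (is_module or is_package) and name in STDLIB_NAMES:
            found.append(name)
    return sorted(found)


# Demo on a throwaway layout that mimics the folder names mentioned in the issue.
with tempfile.TemporaryDirectory() as tmp:
    for package in ("tokenize", "kanji", "tokens"):
        pkg_dir = Path(tmp) / package
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
    shadowing = find_shadowing_names(tmp)

print(shadowing)
```

In this demo only "tokenize" is reported, which is consistent with the rename to "tokens" making the conflict go away.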

PIP version not up to date

The version of this library that is hosted on PyPI is still at the old commit f636e0e, meaning the tokenizer issues fixed by #5 are still there.

It would be great if it were updated, as people pulling the library right now still don't have a working tokenizer (including me 😅)

Missing Proper Nouns in sentence

Hi,

I was doing a simple test with the sentence API:

from jisho_api.sentence import Sentence
r = Sentence.request('象る')

and got the following response:

meta=RequestMeta(status=200)
data=[
> SentenceConfig(japanese='神(かみ)は自ら(みずか)にかたどって人(ひと)を創造(そうぞう)された', en_translation='God created man in his own image.'), 
> SentenceConfig(japanese='はレークをかたどって池(いけ)を造った(つく)', en_translation="Ken'nichi made a pond in the shape of Lake Geneva.")
]

The second sentence as shown on Jisho is actually: 見日はレークジェニーバをかたどって池を造った。

So it looks like the proper nouns are missing for some reason. Is this expected behaviour?

Search tokenized words?

Hi - this is a really great tool!

Currently, the regular Jisho search (jisho.org/search/~) tokenizes a long phrase into its component words. For example, it splits 昨日すき焼きを食べました into 昨日/すき焼き/を/食べました. (For some reason, #sentence returns no results here.)

Would you consider adding an implementation to iterate through these individual words (returning a Word search for each one)? Each one has a data-word tag on it, so they're easy to pull from the soup.

I'm happy to contribute something like this if you think it'd be useful and if you let me know where it would fit best.
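As a rough illustration of pulling the data-word values out of a results page, here is a sketch using only the standard library's html.parser. The HTML snippet is a made-up stand-in for Jisho's markup (only the data-word attribute is taken from the description above), and DataWordCollector is a hypothetical name, not something in jisho-api.

```python
from html.parser import HTMLParser


class DataWordCollector(HTMLParser):
    """Collect the value of every data-word attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-word" and value:
                self.words.append(value)


# Hypothetical markup: the real page structure is an assumption here.
SAMPLE_HTML = """
<ul>
  <li data-word="昨日">昨日</li>
  <li data-word="すき焼き">すき焼き</li>
  <li data-word="を">を</li>
  <li data-word="食べました">食べました</li>
</ul>
"""

collector = DataWordCollector()
collector.feed(SAMPLE_HTML)
words = collector.words
print(words)
```

Each collected word could then be passed to a separate Word search, as suggested above.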

ValidationError in Tokenizer

Hi,
executing the following 2 lines of code:

from jisho_api.tokenize import Tokens

r = Tokens.request("だって僕は星だから")

gives me the following error:

Traceback (most recent call last):
  File "/tmp/tmp.TjV1XDIbOj/test.py", line 3, in <module>
    r = Tokens.request("だって僕は星だから")
  File "/home/finia2na/.local/share/virtualenvs/tmp.TjV1XDIbOj-ckSbLsEx/lib/python3.9/site-packages/jisho_api/tokenize/request.py", line 85, in request
    "data": Tokens.tokens(soup),
  File "/home/finia2na/.local/share/virtualenvs/tmp.TjV1XDIbOj-ckSbLsEx/lib/python3.9/site-packages/jisho_api/tokenize/request.py", line 59, in tokens
    tks.append(TokenConfig(
  File "pydantic/main.py", line 406, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for TokenConfig
pos_tag
  value is not a valid enumeration member; permitted: 'Noun', 'Particle', 'Verb', 'Determiner', 'Unknown' (type=type_error.enum; enum_values=[<PosTag.noun: 'Noun'>, <PosTag.particle: 'Particle'>, <PosTag.verb: 'Verb'>, <PosTag.det: 'Determiner'>, <PosTag.unk: 'Unknown'>])

I am running Python 3.9 with pipenv on Linux 5.15.

I also tried executing the tokenizer with the example sentence used elsewhere here (昨日すき焼きを食べました), which worked fine.
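The error message says the page returned a part-of-speech label outside the five permitted enum values. One possible way to tolerate unseen labels, shown here as a standalone sketch, is the standard enum hook _missing_. The member names and values are copied from the error message above; the "Conjunction" label is a hypothetical example, and this is not necessarily how the library handles it.

```python
from enum import Enum


class PosTag(Enum):
    noun = "Noun"
    particle = "Particle"
    verb = "Verb"
    det = "Determiner"
    unk = "Unknown"

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when `value` matches no member;
        # fall back to "Unknown" instead of raising ValueError.
        return cls.unk


tag_known = PosTag("Noun")
tag_unseen = PosTag("Conjunction")  # hypothetical label not in the enum
print(tag_known, tag_unseen)
```

The alternative is to add the missing labels to the enum explicitly, which keeps the information but needs an update every time a new label shows up.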

Errors with basic requests

Jisho must have updated their website.

Word requests crash

$>  jisho search word nichi
Traceback (most recent call last):
  File "/home/fulguritude/.local/lib/python3.10/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
  File "/home/fulguritude/.local/lib/python3.10/site-packages/simplejson/__init__.py", line 525, in loads
    return _default_decoder.decode(s)
  File "/home/fulguritude/.local/lib/python3.10/site-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/home/fulguritude/.local/lib/python3.10/site-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/fulguritude/.local/bin/jisho", line 8, in <module>
    sys.exit(make_cli())
  File "/home/fulguritude/.local/lib/python3.10/site-packages/jisho_api/cli.py", line 209, in make_cli
    main()
  File "/home/fulguritude/.local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/fulguritude/.local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/fulguritude/.local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/fulguritude/.local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/fulguritude/.local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/fulguritude/.local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/fulguritude/.local/lib/python3.10/site-packages/jisho_api/cli.py", line 147, in request_word
    w = Word.request(word, cache=flag)
  File "/home/fulguritude/.local/lib/python3.10/site-packages/jisho_api/word/request.py", line 74, in request
    r = requests.get(url).json()
  File "/home/fulguritude/.local/lib/python3.10/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
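"Expecting value: line 1 column 1 (char 0)" means the response body did not start with JSON at all (for example an empty body or an HTML page). A defensive decode step might look like the sketch below; decode_api_body is a hypothetical helper and the sample bodies are made up, so this only illustrates the idea of failing softly instead of crashing the CLI.

```python
import json


def decode_api_body(body):
    """Return the parsed JSON document, or None if the body is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


# Made-up bodies: one JSON payload and one HTML page.
good = decode_api_body('{"meta": {"status": 200}, "data": []}')
bad = decode_api_body("<!DOCTYPE html><html><body>Not JSON</body></html>")

print(good)
print(bad)
```

A caller could then report "no result" for a None value rather than surfacing the raw traceback.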

Kanji requests fail.

$> jisho search kanji 数
[Error] No kanji found with name 数.

Print list in reverse

Hello,
Absolutely love your implementation, but I was wondering whether it would be better to print the results in reverse, or at least to have an argument for it.
Currently, if I use jisho search word water it prints:

水 (みず), 水 (み) [JLPT: jlpt-n5]
1. water (esp. cool, fresh water, e.g. drinking water)
2. fluid (esp. in an animal tissue), liquid
[...]
水滴 (すいてき) [JLPT: jlpt-n2]
1. drop of water
2. vessel for replenishing inkstone water
──────────────────────────────
浄水器 (じょうすいき)
1. water filter, water purification system
2. Water purification

The problem is that if the result has multiple entries, the most common one might not be visible on screen, and then I either have to scroll up or pipe the result into tac or bat, both of which ruin the well-done highlighting.
I feel like having the most important result printed last is more intuitive.
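A sketch of what such an option could do internally: reverse the list just before printing, so the formatting of each entry is untouched. The function name and the most_relevant_last parameter are hypothetical and not part of the jisho CLI.

```python
def order_for_display(results, most_relevant_last=False):
    """Return results in print order; optionally put the first hit at the bottom."""
    return list(reversed(results)) if most_relevant_last else list(results)


results = ["水 (みず)", "水滴 (すいてき)", "浄水器 (じょうすいき)"]
shown = order_for_display(results, most_relevant_last=True)

for entry in shown:
    print(entry)
```

With the flag set, the top hit ends up directly above the shell prompt, so no scrolling or piping is needed.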

Adding A wiki

I'm sorry if this is not the proper way to suggest this, but I think it would be beneficial to add a wiki to help people find exactly what they're looking for more easily. Here's a start on a mock-up for the Kanji portion.


Key Classes:

General request information:
Each request notably returns a [Language part]Request Class with three important values

meta = 200 # this value always returns 200 if the request returns a class
data  : List["""Language Part"""Config] | KanjiConfig # stores most of the data 
Config : type:[BaseConfig] # if you're using this, you're beyond God's help

Since meta is always 200 whenever something is returned, the sections below focus on what data contains.
[image]
All requests are akin to searching on Jisho, so screenshots are included to help visualize the data being received.
Kanji

from jisho_api.kanji import Kanji
ten_thousand = '万'
data = Kanji.request(ten_thousand).data

This code is equivalent to this search:
[image]
Here's a picture of the basic values that you'll want to use. Remember that data as written in the code block above is the KanjiRequest class.
Basic Values:
[image]

It wasn't recognizing a utf-8 encoded txt, now it is, but I get nothing in the data folder

Hi, I would like to download definitions from Jisho.org for about 1000 kanji. If I understood correctly, this tool can do the job.

I've put each kanji on its own line and saved the list as a UTF-8 encoded txt file, but when I run "jisho scrape kanji name.txt" I get these errors.

C:\Jisho>jisho scrape kanji name.txt
Traceback (most recent call last):
  File "c:\users\username\appdata\local\programs\python\python38\lib\runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\username\appdata\local\programs\python\python38\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\username\AppData\Local\Programs\Python\Python38\Scripts\jisho.exe\__main__.py", line 7, in <module>
  File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\jisho_api\cli.py", line 209, in make_cli
    main()
  File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\jisho_api\cli.py", line 102, in scrape_words
    scraper(Word, _load_words(file_path), root_dump)
  File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\jisho_api\cli.py", line 89, in _load_words
    txt = fp.read()
  File "c:\users\username\appdata\local\programs\python\python38\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 39: character maps to <undefined>

What can I do to solve this issue?
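The traceback ends in encodings\cp1252.py, which suggests the file was opened with the Windows default encoding rather than UTF-8. The sketch below shows the general remedy of passing the encoding explicitly. load_lines is a hypothetical helper (not the library's _load_words), and the sample file is created on the fly for the demo.

```python
import tempfile
from pathlib import Path


def load_lines(file_path):
    """Read one entry per line, decoding the file explicitly as UTF-8."""
    with open(file_path, encoding="utf-8") as fp:
        return [line.strip() for line in fp if line.strip()]


# Demo with a throwaway file standing in for name.txt.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "name.txt"
    sample.write_text("数\n水\n万\n", encoding="utf-8")
    kanji = load_lines(sample)

print(kanji)
```

If the file was saved with a byte-order mark, encoding="utf-8-sig" may be the better choice. As a workaround that needs no code change, setting the environment variable PYTHONUTF8=1 before running the command may also help, since it switches Python 3.7+ into UTF-8 mode.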

Issue with scraping programatically using provided code sample

Hi there,

When I run the sample code provided for scraping programmatically:

from jisho_api.word import Word
from jisho_api.cli import scrape

word_requests = scrape(Word, ['water', 'fire'], '~/japanese/test')

I get this error:

Traceback (most recent call last):
  File "/home/frank/scripts/scrape_from_jisho.py", line 7, in <module>
    word_requests = scrape(Word, ['water', 'fire'], '~/japanese/test')
  File "/home/frank/dev/venv/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/frank/dev/venv/lib/python3.8/site-packages/click/core.py", line 1042, in main
    args = list(args)
TypeError: 'type' object is not iterable

Help is appreciated. Thanks.
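The traceback goes through click's __call__ and main, which suggests that scrape is a click command object rather than a plain function, so its first positional argument is treated as an argv-style list. The stand-in below is a deliberately simplified imitation (FakeCommand is not click's real code, and Word here is only a placeholder class) that reproduces the same TypeError.

```python
class FakeCommand:
    """Simplified stand-in for a CLI command object; not click's actual code."""

    def main(self, args=None, prog_name=None, complete_var=None):
        # The first parameter is expected to be a list of command-line strings.
        return list(args)

    def __call__(self, *args, **kwargs):
        return self.main(*args, **kwargs)


class Word:
    """Placeholder for jisho_api.word.Word."""


scrape = FakeCommand()

try:
    scrape(Word, ["water", "fire"], "~/japanese/test")
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Another traceback on this page shows cli.py calling scraper(Word, _load_words(file_path), root_dump), so the plain function intended for programmatic use may be scraper rather than scrape. That is an inference from the traceback, not something confirmed by the documentation.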
