pedroallenrevez / jisho-api
A jisho.org API made in Python
License: Apache License 2.0
TLDR:
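The standard library's linecache.py, which pydantic ends up importing through inspect, runs "import tokenize". When Python is started from inside the jisho_api folder, that import picks up jisho_api's tokenize folder instead of the standard library module, causing a cyclic dep. Renaming the "tokenize" folder to "tokens" indeed resolves the issue. The text below explains how the bug was discovered.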
FIRST:
When launching the interpreter ($> python3.10) from the terminal in the jisho_api folder, I get this weird error (at launch, before attempting to write anything in actual Python):
Exception ignored in: <module 'inspect' from '/usr/lib/python3.10/inspect.py'>
Traceback (most recent call last):
File "<string>", line 1, in <module>
AttributeError: partially initialized module 'inspect' has no attribute 'isgenerator' (most likely due to a circular import)
<frozen importlib._bootstrap>:241: RuntimeWarning: Cython module failed to patch module with custom type
Exception ignored in: <module 'inspect' from '/usr/lib/python3.10/inspect.py'>
Traceback (most recent call last):
File "<string>", line 1, in <module>
AttributeError: partially initialized module 'inspect' has no attribute 'isgenerator' (most likely due to a circular import)
Annoyingly, this somewhat breaks the interpreter and makes it so that I can't even test out the code.
However, I've isolated the cause as the following line from jisho_api/tokenize/__init__.py, which, when commented out, removes the error.
from .request import Tokens
No idea why it happens. It's probably not a name conflict, since renaming Tokens to Token does not seem to solve the issue. Maybe it's some error within the file tokenize/request.py that propagates somehow?
EDIT: extra info
When running from kanji import Kanji, I get the following.
>>> from kanji import Kanji
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/fulguritude/Workspace/Clones/jisho-api/jisho_api/kanji/__init__.py", line 1, in <module>
from .request import Kanji
File "/home/fulguritude/Workspace/Clones/jisho-api/jisho_api/kanji/request.py", line 8, in <module>
from pydantic import BaseModel
File "pydantic/__init__.py", line 2, in init pydantic.__init__
from pathlib import Path
File "pydantic/dataclasses.py", line 7, in init pydantic.dataclasses
import builtins
File "pydantic/main.py", line 310, in init pydantic.main
File "pydantic/main.py", line 254, in pydantic.main.ModelMetaclass.__new__
File "pydantic/class_validators.py", line 197, in pydantic.class_validators.extract_root_validators
File "/usr/lib/python3.10/inspect.py", line 43, in <module>
import linecache
File "/usr/lib/python3.10/linecache.py", line 11, in <module>
import tokenize
File "/home/fulguritude/Workspace/Clones/jisho-api/jisho_api/tokenize/__init__.py", line 1, in <module>
from .request import Tokens
File "/home/fulguritude/Workspace/Clones/jisho-api/jisho_api/tokenize/request.py", line 8, in <module>
from pydantic import BaseModel
ImportError: cannot import name 'BaseModel' from partially initialized module 'pydantic' (most likely due to a circular import) (/home/fulguritude/.local/lib/python3.10/site-packages/pydantic/__init__.cpython-310-x86_64-linux-gnu.so)
Note the "import tokenize" in linecache.py: it might be a name conflict after all.
So it's probably the BaseModel import in tokenize/request.py and/or somewhere else that causes the issue, by conflicting with the folder name tokenize itself. Moving that import around in the file (up a line or two in the import order) gives slightly different behavior.
At launch:
Exception ignored in: <module 'inspect' from '/usr/lib/python3.10/inspect.py'>
Traceback (most recent call last):
File "<string>", line 1, in <module>
AttributeError: partially initialized module 'inspect' has no attribute 'isgenerator' (most likely due to a circular import)
<frozen importlib._bootstrap>:241: RuntimeWarning: Cython module failed to patch module with custom type
FINAL:
Renaming the "tokenize" folder to "tokens" indeed resolves the issue. What happened was probably that linecache tried to import tokenize and got jisho_api's tokenize folder instead of the standard library module, causing the cyclic dep.
Hi,
I was doing a simple test with the sentence API:
from jisho_api.sentence import Sentence
r = Sentence.request('豑γ')
and got the following response:
meta=RequestMeta(status=200)
data=[
> SentenceConfig(japanese='η₯(γγΏ)γ―θͺγ(γΏγγ)γ«γγγ©γ£γ¦δΊΊ(γ²γ¨)γε΅ι (γγγγ)γγγ', en_translation='God created man in his own image.'),
> SentenceConfig(japanese='γ―γ¬γΌγ―γγγγ©γ£γ¦ζ± (γγ)γι γ£γ(γ€γ)', en_translation="Ken'nichi made a pond in the shape of Lake Geneva.")
]
The second sentence is actually on Jisho: θ¦ζ₯γ―γ¬γΌγ―γΈγ§γγΌγγγγγ©γ£γ¦ζ± γι γ£γγ
So it looks like the proper nouns are missing for some reason. Is this expected behaviour?
Hi - this is a really great tool!
Currently, the regular Jisho search (jisho.org/search/~) tokenizes a long phrase into its component words. For example, it splits 昨日すき焼きを食べました into 昨日/すき焼き/を/食べました. (For some reason, #sentence returns no results here.)
Would you consider adding an implementation to iterate through these individual words (returning a Word search for each one)? Each one has a data-word attribute on it, so they're easy to pull from the soup.
I'm happy to contribute something like this if you think it'd be useful and if you let me know where it would fit best.
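As a rough idea of what pulling the words out could look like, here is a small sketch. It uses the standard library's html.parser instead of the soup so that it runs on its own. The sample markup is invented for illustration and is not Jisho's real page structure; DataWordCollector and extract_words are hypothetical names.

```python
from html.parser import HTMLParser


class DataWordCollector(HTMLParser):
    """Collect the value of every data-word attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "data-word" and value:
                self.words.append(value)


def extract_words(html):
    """Return all data-word values found in an HTML string."""
    collector = DataWordCollector()
    collector.feed(html)
    collector.close()
    return collector.words


# Made-up markup for illustration only; the real page may differ.
SAMPLE = """
<ul>
  <li data-word="昨日"><span>昨日</span></li>
  <li data-word="すき焼き"><span>すき焼き</span></li>
  <li data-word="を"><span>を</span></li>
  <li data-word="食べました"><span>食べました</span></li>
</ul>
"""

print(extract_words(SAMPLE))
```

Each extracted word could then be passed to a Word search one at a time, which is the iteration asked about above.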
Hi,
executing the following 2 lines of code:
from jisho_api.tokenize import Tokens
r = Tokens.request("γ γ£γ¦εγ―ζγ γγ")
gives me the following error:
Traceback (most recent call last):
File "/tmp/tmp.TjV1XDIbOj/test.py", line 3, in <module>
r = Tokens.request("γ γ£γ¦εγ―ζγ γγ")
File "/home/finia2na/.local/share/virtualenvs/tmp.TjV1XDIbOj-ckSbLsEx/lib/python3.9/site-packages/jisho_api/tokenize/request.py", line 85, in request
"data": Tokens.tokens(soup),
File "/home/finia2na/.local/share/virtualenvs/tmp.TjV1XDIbOj-ckSbLsEx/lib/python3.9/site-packages/jisho_api/tokenize/request.py", line 59, in tokens
tks.append(TokenConfig(
File "pydantic/main.py", line 406, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for TokenConfig
pos_tag
value is not a valid enumeration member; permitted: 'Noun', 'Particle', 'Verb', 'Determiner', 'Unknown' (type=type_error.enum; enum_values=[<PosTag.noun: 'Noun'>, <PosTag.particle: 'Particle'>, <PosTag.verb: 'Verb'>, <PosTag.det: 'Determiner'>, <PosTag.unk: 'Unknown'>])
I am running Python 3.9 with pipenv on Linux 5.15.
I also tried executing the tokenizer with the example sentence used elsewhere here (昨日すき焼きを食べました), which worked fine.
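The validation error above says the page returned a pos_tag outside the five permitted values. One possible way to make such an enum tolerant is to fall back to the Unknown member instead of raising. The sketch below is a standalone stand-in built from the member names in the error message, not the library's actual class, and whether pydantic's validation would go through this hook is not verified here.

```python
from enum import Enum


class PosTag(Enum):
    # Member names and values copied from the error message above.
    noun = "Noun"
    particle = "Particle"
    verb = "Verb"
    det = "Determiner"
    unk = "Unknown"

    @classmethod
    def _missing_(cls, value):
        # Enum calls this when `value` matches no member;
        # return the fallback member instead of raising ValueError.
        return cls.unk


print(PosTag("Noun"))       # a known tag resolves normally
print(PosTag("Adjective"))  # an unlisted tag falls back to PosTag.unk
```

The alternative would be to add the missing tags as real members once it is known which ones Jisho emits.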
Jisho must have updated their website.
Word requests crash
$> jisho search word nichi
Traceback (most recent call last):
File "/home/fulguritude/.local/lib/python3.10/site-packages/requests/models.py", line 971, in json
return complexjson.loads(self.text, **kwargs)
File "/home/fulguritude/.local/lib/python3.10/site-packages/simplejson/__init__.py", line 525, in loads
return _default_decoder.decode(s)
File "/home/fulguritude/.local/lib/python3.10/site-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/home/fulguritude/.local/lib/python3.10/site-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/fulguritude/.local/bin/jisho", line 8, in <module>
sys.exit(make_cli())
File "/home/fulguritude/.local/lib/python3.10/site-packages/jisho_api/cli.py", line 209, in make_cli
main()
File "/home/fulguritude/.local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/fulguritude/.local/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/fulguritude/.local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/fulguritude/.local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/fulguritude/.local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/fulguritude/.local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/fulguritude/.local/lib/python3.10/site-packages/jisho_api/cli.py", line 147, in request_word
w = Word.request(word, cache=flag)
File "/home/fulguritude/.local/lib/python3.10/site-packages/jisho_api/word/request.py", line 74, in request
r = requests.get(url).json()
File "/home/fulguritude/.local/lib/python3.10/site-packages/requests/models.py", line 975, in json
raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Kanji requests fail.
$> jisho search kanji ζ°
[Error] No kanji found with name ζ°.
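For the word-request traceback above, "Expecting value: line 1 column 1 (char 0)" is what a JSON decoder reports when the body is empty or is not JSON at all (an HTML error page, for instance). A minimal sketch of guarding against that, using only the standard json module and a hypothetical helper name:

```python
import json


def decode_json_or_none(body):
    """Return the parsed JSON document, or None if `body` is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


# An empty body and an HTML page both fail at the very first character.
for body in ["", "<html>Service unavailable</html>", '{"data": []}']:
    print(repr(body), "->", decode_json_or_none(body))
```

With a guard like this the CLI could print a short "unexpected response" message instead of a double traceback.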
Hello,
Absolutely love your implementation, but I was wondering whether it would be better to print the results in reverse, or at least to have an argument for it.
Currently, if I use jisho search word water it prints:
水 (みず), 水 (み) [JLPT: jlpt-n5]
1. water (esp. cool, fresh water, e.g. drinking water)
2. fluid (esp. in an animal tissue), liquid
[...]
水滴 (すいてき) [JLPT: jlpt-n2]
1. drop of water
2. vessel for replenishing inkstone water
──────────────────────────────
浄水器 (じょうすいき)
1. water filter, water purification system
2. Water purification
The problem is that if the result has multiple entries, the most common one might not be on screen, and then I either have to scroll up or pipe the result into tac or bat, both of which ruin the well-done highlighting.
I feel like having the most important result printed last is more intuitive.
I'm sorry if this is not the proper method for suggesting this, but I think it would be beneficial to add a wiki to help people find exactly what they're looking for with greater ease. Here's a start on a mock-up for the Kanji portion.
Key Classes:
General request information:
Each request returns a [Language part]Request class with three important values:
meta = 200 # this value always returns 200 if the request returns a class
data : List["""Language Part"""Config] | KanjiConfig # stores most of the data
Config : type:[BaseConfig] # if you're using this, you're beyond God's help
Since meta will always be 200 whenever something is returned, the sections below focus on what data contains.
All requests are akin to searching on Jisho, so to help visualize what data we're receiving, there will be screenshots.
Kanji
from jisho_api.kanji import Kanji
ten_thousand = '万'
data = Kanji.request(ten_thousand).data
This code is equivalent to this search:
Here's a picture of the basic values that you'll want to use. Remember that data as written in the code block above is the KanjiRequest class.
Basic Values:
Hi, I would like to download definitions from Jisho.org for about 1000 kanji. If I understood correctly, this tool can do the job.
I've put every kanji on a new line and saved the list in a UTF-8 encoded txt file. When I run "jisho scrape kanji name.txt" I get these errors.
C:\Jisho>jisho scrape kanji name.txt
Traceback (most recent call last):
File "c:\users\username\appdata\local\programs\python\python38\lib\runpy.py", line 192, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\username\appdata\local\programs\python\python38\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\username\AppData\Local\Programs\Python\Python38\Scripts\jisho.exe\__main__.py", line 7, in <module>
File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\jisho_api\cli.py", line 209, in make_cli
main()
File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\jisho_api\cli.py", line 102, in scrape_words
scraper(Word, _load_words(file_path), root_dump)
File "c:\users\username\appdata\local\programs\python\python38\lib\site-packages\jisho_api\cli.py", line 89, in _load_words
txt = fp.read()
File "c:\users\username\appdata\local\programs\python\python38\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 39: character maps to <undefined>
What can I do to solve this issue?
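The last frames of the traceback show the file being decoded with cp1252, the Windows default, rather than UTF-8. A sketch of reading such a list with an explicit encoding is below. load_entries is a hypothetical stand-in, not the library's _load_words, whose actual parsing is not shown in the traceback.

```python
from pathlib import Path


def load_entries(file_path):
    """Read one entry per line from a UTF-8 text file, skipping blank lines."""
    # "utf-8-sig" also strips the byte order mark that some Windows editors add.
    text = Path(file_path).read_text(encoding="utf-8-sig")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Without touching any code, enabling Python's UTF-8 mode should have a similar effect on the existing command: run set PYTHONUTF8=1 in the same cmd window before jisho scrape (supported from Python 3.7), which makes open() default to UTF-8.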
Hi there,
When I run the sample code provided for scraping programmatically:
from jisho_api.word import Word
from jisho_api.cli import scrape
word_requests = scrape(Word, ['water', 'fire'], '~/japanese/test')
I get this error:
Traceback (most recent call last):
File "/home/frank/scripts/scrape_from_jisho.py", line 7, in <module>
word_requests = scrape(Word, ['water', 'fire'], '~/japanese/test')
File "/home/frank/dev/venv/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/frank/dev/venv/lib/python3.8/site-packages/click/core.py", line 1042, in main
args = list(args)
TypeError: 'type' object is not iterable
Help is appreciated. Thanks.