koichiyasuoka / unidic2ud

Tokenizer, POS-tagger, lemmatizer, and dependency-parser for modern and contemporary Japanese

License: MIT License

Languages: Python 62.20%, Jupyter Notebook 36.81%, Shell 0.99%
Topics: nlp, dependency-parser, japanese-language

unidic2ud's Issues

Bug in the file "unidic2ud/unidic2ud.py"

When I ran the demo code, it raised an error.

code

import unidic2ud
nlp=unidic2ud.load("kindai")
s=nlp("其國を治めんと欲する者は先づ其家を齊ふ")
print(s)

cmd output

------------------- ERROR DETAILS ------------------------
arguments: -r site-packages\unidic2ud\mecabrc -d site-packages\unidic2ud\download\gendai
[ifs] no such file or directory: site-packagesunidic2udmecabrc
----------------------------------------------------------

After I changed lines 224 and 231 of the file "unidic2ud/unidic2ud.py" as follows, the code runs.

224 self.mecab=Tagger("-r "+r+" -d "+d).parse
# self.mecab=Tagger("-r '"+r+"' -d '"+d+"'").parse
231 self.mecab=Tagger("-r "+r+" -d "+unidic_lite.DICDIR).parse
# self.mecab=Tagger("-r '"+r+"' -d '"+unidic_lite.DICDIR+"'").parse

shlex call in mecab breaks the library with downloaded dictionaries on Windows

A call to shlex.split(args) within mecab strips backslash characters from paths unless the paths are enclosed in double quotes (").

Line 224 in unidic2ud.py does not quote the paths:
self.mecab=Tagger("-r "+r+" -d "+d).parse

Thus the library does not work with downloaded dictionaries on Windows.
This modified line fixes the problem:

self.mecab=Tagger(f"""-r "{r}" -d "{d}" """).parse

mecab itself encloses paths in this way within its own code, so I find it highly unlikely that this modification will break anything on other platforms.
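
The behaviour described above can be reproduced with the standard library alone. This is a minimal sketch, assuming mecab calls shlex.split in its default POSIX mode; the relative paths are hypothetical and only mirror the shape of the ones in the error report, and no Tagger is created.

```python
import shlex

# Hypothetical Windows-style paths, for illustration only.
r = r"site-packages\unidic2ud\mecabrc"
d = r"site-packages\unidic2ud\download\gendai"

# Unquoted: in POSIX mode each backslash is treated as an escape
# character and dropped, so the path separators disappear.
unquoted = shlex.split("-r " + r + " -d " + d)

# Double-quoted: a backslash is kept unless it is followed by a
# double quote or another backslash.
quoted = shlex.split(f'-r "{r}" -d "{d}"')

print(unquoted)
print(quoted)
```

The unquoted form yields site-packagesunidic2udmecabrc, the same mangled path that appears in the "[ifs] no such file or directory" message, while the quoted form returns both paths unchanged. One caveat: even inside double quotes a doubled backslash collapses to a single one, so paths that start with \\ (for example UNC paths) or end with a backslash may still need separate handling.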

All the best :)