phatpiglet / autocorrect
Python 3 Spelling Corrector
License: MIT License
Hello,
I found a problem when using the spell checker: numbers are treated as errors and the library tries to correct them. For example, instead of 50 it returns "of".
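Until numbers are handled inside the library, a caller-side workaround is to skip numeric tokens before correcting. A minimal sketch, assuming whitespace tokenization is good enough; correct_text is a hypothetical helper, and `correct` can be any word-to-word callable such as spell():

```python
def correct_text(text, correct):
    """Apply `correct` to each whitespace-separated token, skipping numeric ones.

    `correct` is any word -> word callable (for example spell()).
    Tokens containing a digit (50, 3.14, 2nd, mp3) are returned unchanged.
    Note: tokens are re-joined with single spaces.
    """
    corrected = []
    for token in text.split():
        if any(ch.isdigit() for ch in token):
            corrected.append(token)  # leave numbers and alphanumerics alone
        else:
            corrected.append(correct(token))
    return " ".join(corrected)
```

Treating any token with a digit as "not a word" is a deliberately blunt rule; it also protects identifiers like mp3 at the cost of never correcting typos such as 2nf.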
The spell() function returns only a single value, the corrected word. Could you please add another function that returns two values, (word, confidence), where confidence is a score from 0.0 to 1.0 showing how confident the system is that the provided word was misspelled? It would also be good to provide multiple suggestions with a confidence level for each of them, for example:
> spell_scores('pinrt')
[('print', 0.9), ('pint', 0.7)]
Currently autocorrect misinterprets the typo "pinrt".
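A minimal sketch of the requested spell_scores(), assuming a Norvig-style candidate search over a word-to-frequency mapping. Only the function name comes from the request; the `counts` parameter, edits1 and the scoring rule are hypothetical, not the library's API. Each known word within two edits is scored by its share of the candidates' total frequency, which is a ranking signal rather than a calibrated probability. It also shows why "pinrt" is hard: "pint" is one deletion away, while "print" needs two edits, so a corrector that always prefers one-edit candidates picks "pint" whenever it knows that word.

```python
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace, or insert away (Norvig-style)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def spell_scores(word, counts, max_results=5):
    """Hypothetical API: known words within two edits, scored by relative frequency.

    `counts` maps known words to frequencies (an assumption for this sketch).
    Returns [(candidate, score), ...] sorted by score, highest first.
    """
    one = edits1(word)
    candidates = {w for w in one if w in counts}
    candidates |= {w2 for w1 in one for w2 in edits1(w1) if w2 in counts}
    if word in counts:
        candidates.add(word)
    total = sum(counts[w] for w in candidates)
    if total == 0:
        return []
    ranked = sorted(candidates, key=lambda w: counts[w], reverse=True)
    return [(w, counts[w] / total) for w in ranked[:max_results]]
```

With the toy counts {"print": 90, "pint": 10}, spell_scores("pinrt", counts) returns [("print", 0.9), ("pint", 0.1)]. Ranking by frequency across both edit distances, instead of always preferring distance 1, is the design choice that lets "print" win here.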
I traced the issue to this line in utils.py: with closing(t.extractfile(tar_path)) as f:
It seems the tarfile library doesn't behave as intended on Windows. For now I extracted the file manually and changed the code to open it using os.open, but I intend to make a pull request with a more general solution soon.
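One possible cause, offered as an assumption since the report does not show how tar_path is built: member names inside a tar archive always use forward slashes, and tarfile matches them essentially verbatim, so a path assembled with os.path.join (en\words.json on Windows) raises KeyError there while working on Linux and macOS. A sketch of a portable lookup; read_member and the member layout are hypothetical:

```python
import posixpath
import tarfile


def read_member(archive_path, *parts):
    """Hypothetical helper: return the raw bytes of a tar member on any OS.

    Tar member names use '/' separators on every platform, so the name is
    built with posixpath.join, not os.path.join (which uses '\\' on Windows).
    """
    member_name = posixpath.join(*parts)
    with tarfile.open(archive_path) as tar:  # compression is auto-detected
        fileobj = tar.extractfile(member_name)  # KeyError if no such member
        if fileobj is None:  # member exists but is not a regular file
            raise ValueError(member_name + " is not a regular file")
        with fileobj:
            return fileobj.read()
```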
Thanks for your effort
Hello,
I tried using your library on my work machine, which connects through a proxy. When downloading a dictionary for the Russian language, the error "Tunnel connection failed: 407 Proxy Authentication Required" appears. I know how to deal with such errors when downloading dictionaries for Numpy or when cloning libraries from GitHub, but I could not find how to do this for your library.
I could probably just download the data for the Russian language separately, but it's still not clear where to put the file to make it work.
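The report does not say how autocorrect downloads its data, so this is a workaround sketch rather than a fix: fetch the archive yourself through the proxy using only the standard library. urllib.request.ProxyHandler accepts credentials in the proxy URL (http://user:password@host:port) and sends them as a Basic Proxy-Authorization header, including on HTTPS tunnels; proxies that require NTLM or Kerberos need a different approach. The helper names, URL and destination path are placeholders, and where the library expects the file remains an open question here.

```python
import shutil
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that routes HTTP and HTTPS traffic through proxy_url.

    Credentials embedded as http://user:password@host:port are sent to the
    proxy as Basic Proxy-Authorization.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def download(url, dest_path, proxy_url):
    """Hypothetical helper: stream `url` to `dest_path` through the proxy."""
    opener = build_proxy_opener(proxy_url)
    with opener.open(url, timeout=60) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Usage (placeholders only): download("https://example.com/ru.tar.gz", "ru.tar.gz", "http://user:password@proxy.example:3128").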
No idea how this could possibly work, but it's definitely necessary for preprocessing text data.
utils -> words = set(...) levels the playing field
Hi,
I want to add some other words and suggestions to this library. Could you please tell me how I can do this?
Also, where in the source code are the reference files such as 'en_US_GB_CA_lower.txt' or 'en_US_GB_CA_mixed.txt'?
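The traceback in a later report on this page shows the constructor doing self.nlp_data = load_from_tar(lang) if nlp_data is None else nlp_data, which suggests a Speller can be given its own nlp_data, and the json.load call suggests that data is a word-to-frequency mapping. Both are inferences, not documented API. Under those assumptions, adding words amounts to merging extra entries into that mapping; add_words is a hypothetical helper:

```python
def add_words(counts, new_words, count=None):
    """Hypothetical helper: merge extra words into a word -> frequency mapping.

    Assumption: the corrector's data is such a mapping (e.g. passed as
    Speller(nlp_data=add_words(base_counts, [...])), which is unverified).
    New words get `count` occurrences; by default they get the largest
    existing count so they can compete with common words. Existing entries
    are kept and the input mapping is not modified.
    """
    default = count if count is not None else max(counts.values(), default=1)
    merged = dict(counts)
    for word in new_words:
        merged.setdefault(word, default)
    return merged
```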
spell('ç§�ã�Ÿã�¡ã�¯ãƒ‰ãƒ¼ãƒ“ルã�«2009å¹´7月ã�«4泊ã�—ã�¾ã�—ã�Ÿã€‚ 地下鉄ã�‹ã‚‰2分ã€�ブãƒ\xadードウェイã�¾ã�§æ\xad©ã�„ã�¦ã‚‚ã€�ã��ã‚“ã�ªã�«æ°—ã�«ã�ªã‚Šã�¾ã�›ã‚“ã�§ã�—ã�Ÿã�‹ã‚‰ã€�立地ã�§ã‚‚ã�™ã�¦ã��ã�ªãƒ›ãƒ†ãƒ«ã�§ã�™ã€‚経営者ã�‹ã�¨æ€�ã‚�れるè€�夫婦ã�¨å¨˜ã�•ã‚“ã€�ã�‚ã�¨2人ã�®ã‚¹ã‚¿ãƒƒãƒ•ã�«å‡ºä¼šã�„ã�¾ã�—ã�Ÿã€‚ ゴスペルã�®æ‰€åœ¨åœ°ã‚’å°‹ã�\xadã�Ÿã‚‰ã€�ãƒ�ットã�§åœ°å›³ã‚’プリントã�—ã�¦ã��ã‚Œã�¦ã€�親切ã�«èª¬æ˜Žã�—ã�¦ã��ã‚Œã�¾ã�—ã�Ÿã€‚ æ\xad´å�²ã‚’ä¿�ã�¨ã�†ã�¨ã�¨ã�—ã�¦ã�„るニューヨークをæ\xad©ã��æ‹\xa0点ã�¨ã�—ã�¦ã€�最é�©ã�ªãƒ›ãƒ†ãƒ«ã�§ã�™ã€‚ スターãƒ�ックスã€�マクドナルドã€�ã‚\xadングãƒ�ーガも近ã��ã�«ã�‚ã‚Šã€�エンパイアステートビル迄10分弱ã�§ã�™ã�Œã€�コリアã�®çµŒå–¶ã�™ã‚‹ã‚³ãƒ³ãƒ“ニ兼飲食店も数軒有りã€�ホテルã�®è£�手ã�®é€šã‚Šã�«ã�¯æ¶ˆè²»ç¨Žç„¡æ–™ã�®ã‚³ãƒ³ãƒ“ニもã�‚ã�£ã�¦ä¾¿åˆ©ã�§ã�—ã�Ÿã€‚ 100å¹´ã�®æ\xad´å�²ã�¨ã�„ã�£ã�¦ã‚‚改装ã�•ã‚Œã�¦ã�„ã�¦æ¸…æ½”ã�ªãƒ›ãƒ†ãƒ«ã�§ã�™ã€‚手動ã�®ã‚¨ãƒ¬ãƒ™ãƒ¼ã‚¿ã‚‚å�°è±¡ã�«æ®‹ã‚Šã�¾ã�—ã�Ÿã€‚ 冷蔵庫ã�Œã�ªã�„点ã�¨ã€�ウインドウタイプã�®ã‚¨ã‚¢ã‚³ãƒ³ã�Œã�¡ã‚‡ã�£ã�¨ä¸�便ã�ªç‚¹ä»¥å¤–ã�Šå‹§ã‚�ã�§ã�™ã€‚')
This makes memory usage climb to 6 GB+ within seconds.
It took me all day to figure this out!
It would be good to include some sort of blacklist of unusual characters to prevent this mysterious memory hog, e.g. by raising an error (and to provide an API to check whether a string is spellable).
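Assuming candidates are generated the way Peter Norvig's spelling corrector does it (every string one edit away, then every string one edit away from those), the work grows quickly with input length. With a 26-letter alphabet, one edit of a length-n string yields 54n + 25 strings before de-duplication: 295 for a 5-letter word, but 108,025 for a 2,000-character string, and expanding each of those again would exceed ten billion strings. A sketch of that count plus the requested pre-check; is_spellable and its limits are hypothetical, not the library's API:

```python
import string


def edits1_count(n, alphabet_size=26):
    """Strings produced by one edit of a length-n word, before de-duplication:
    n deletes + (n - 1) transposes + a*n replaces + a*(n + 1) inserts."""
    return n + max(n - 1, 0) + alphabet_size * n + alphabet_size * (n + 1)


def is_spellable(word, alphabet=string.ascii_letters, max_len=30):
    """Hypothetical pre-check: only short tokens made of `alphabet` pass.

    The default alphabet is English-only and rejects apostrophes and digits;
    pass the target language's letters for other languages.
    """
    return 0 < len(word) <= max_len and all(ch in alphabet for ch in word)
```

Calling is_spellable() before spell() turns the silent memory blow-up into a cheap rejection; max_len caps the quadratic second step.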
I installed the latest version, 2.3.0, on Ubuntu. When I run the code:
spell = Speller()
it reports the following error:
self.nlp_data = load_from_tar(lang) if nlp_data is None else nlp_data
File "/home/dgl/virtual_env/textR/lib/python3.5/site-packages/autocorrect/__init__.py", line 78, in load_from_tar
return json.load(file)
File "/usr/lib/python3.5/json/__init__.py", line 268, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'
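In this traceback json.load() receives a binary file: tarfile's extractfile() returns a file object whose read() yields bytes, and json.load()/json.loads() accept bytes only from Python 3.6 onward, so Python 3.5 raises this TypeError. Decoding before parsing works on every Python 3 version. A sketch, assuming the archive member is UTF-8 JSON; load_json_from_tar is a hypothetical helper, not the library's code:

```python
import json
import tarfile


def load_json_from_tar(archive_path, member_name):
    """Hypothetical helper: parse a UTF-8 JSON member of a tar archive."""
    with tarfile.open(archive_path) as tar:
        fileobj = tar.extractfile(member_name)  # binary file object
        if fileobj is None:  # member exists but is not a regular file
            raise ValueError(member_name + " is not a regular file")
        with fileobj:
            # Decode explicitly: json.loads() rejects bytes before Python 3.6
            return json.loads(fileobj.read().decode("utf-8"))
```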
Hello,
I've used this Python library in my project, but it seems to work only with the English dictionary. Does it work for other languages?
I would like to use it to autocorrect words in Portuguese.
Thanks,
Rita
It takes 0.25 seconds (!) to correct a single word ("paiin"). If this stems from the code/algorithm design, I suggest describing the package as a proof of concept or toy, not a spelling corrector.
Given a list of words, I'm looping through them to correct any misspelled ones. Right now I've been stuck on the 5,738th word for more than five minutes, with memory usage up to 12 GB of RAM and disk usage of 120 MB/s.
It would be nice to have a timeout parameter to abort if the search is taking too long. It is probably also possible to optimize memory usage.
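Until the library offers a timeout parameter, a caller can bound how long it waits. A sketch using a daemon thread; correct_with_timeout is a hypothetical helper that returns the word unchanged when the deadline passes. Caveat: Python threads cannot be killed, so a runaway call keeps running (and keeps its memory) in the background; actually reclaiming the 12 GB would need the call to run in a separate process that can be terminated.

```python
import threading


def correct_with_timeout(correct, word, timeout=1.0):
    """Return correct(word), or `word` unchanged if it takes longer than `timeout` s.

    Exceptions raised by `correct` within the deadline are re-raised here.
    """
    result = {}

    def worker():
        try:
            result["value"] = correct(word)
        except BaseException as exc:  # handed back to the caller below
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)  # won't block exit
    thread.start()
    thread.join(timeout)
    if "error" in result:
        raise result["error"]
    return result.get("value", word)
```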
It takes 6+ seconds to give me the corrected word.
print(spell('wednsday'))
wednesday
[Finished in 6.5s]
print(spell('hello'))
hello
[Finished in 6.6s]
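"[Finished in 6.5s]" appears to be the editor's total run time for the whole script, so it also counts interpreter start-up, imports and any dictionary loading. Both runs taking about the same time, even for the already-correct "hello", is consistent with most of the cost being setup rather than correction. A small sketch that times a one-off setup separately from each call; time_setup_and_calls is a hypothetical helper, and setup could be, for example, a lambda that builds a Speller once for reuse:

```python
import time


def time_setup_and_calls(setup, words):
    """Time setup() once, then time each call of the corrector it returns.

    Returns (setup_seconds, {word: seconds_for_that_call}).
    """
    start = time.perf_counter()
    correct = setup()  # build the corrector once and reuse it
    setup_seconds = time.perf_counter() - start
    per_call = {}
    for word in words:
        start = time.perf_counter()
        correct(word)
        per_call[word] = time.perf_counter() - start
    return setup_seconds, per_call
```

If setup dominates, constructing the corrector once and reusing it across a whole word list avoids paying that cost per word.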
I would love to contribute to this project. Please tell me your preferred method of contact. Until then, I will go through the code.