Comments (7)
spaCy probably marked that word as a person by mistake in that particular sentence. The word means "cataphract", which shouldn't be marked as a person or term. You can use the custom X-Ray feature to ignore this word, or increase the "Minimal X-Ray occurrences" value to two.
from worddumb.
You're completely right, and the fault was mine. I was using the custom X-Ray feature and made a mistake: I assigned the label "PERSON" to "catafracto". As there is only one occurrence in the singular, the behaviour seems correct. To be sure, I changed the label to "PRODUCT" and generated the file again. But the result is not good; the problem remains. There are many occurrences of the word in the book and only three are located:
If I delete this word from the custom X-Ray feature, the result is:
If I change the word in the custom X-Ray feature to the plural form, it locates all of the occurrences.
I don't know why, but the part of the plugin code that locates occurrences of this word does not work with the plural.
You want to add the words "catafracto" and "catafractos" as an X-Ray entity? You can create custom X-Ray data and add the plural form to the "Aliases" input box. The spaCy entity ruler can only find exactly matching strings, and custom X-Ray names are not fuzzy matched, so you have to add the precise strings you want to find.
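To illustrate why the plural needs its own alias, here is a minimal sketch of exact-string matching in plain Python. It is not the plugin's code and it does not use spaCy; the helper `find_exact_matches` and the sample sentence are made up for this example. It only shows the general idea that an exact matcher finds the strings it is given and nothing else.

```python
import re


def find_exact_matches(text, names):
    """Return (start, end, matched_text) for whole-word, exact-string hits.

    Hypothetical helper for illustration only; it mimics the idea of
    exact matching, not the plugin's or spaCy's real implementation.
    """
    # Try longer names first so the alternation prefers the longest form.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # \b on both sides means "catafracto" does not match inside "catafractos".
    pattern = re.compile(rf"\b(?:{alternatives})\b")
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


# Made-up sample text: one singular and two plural occurrences.
text = "El catafracto cargó. Los catafractos avanzaron y otros catafractos huyeron."

# Only the singular name is registered: the plural forms are missed.
only_singular = find_exact_matches(text, ["catafracto"])

# The plural is added as an alias: every occurrence is found.
with_alias = find_exact_matches(text, ["catafracto", "catafractos"])

print(len(only_singular), len(with_alias))
```

With only the singular registered, the plural occurrences are skipped; adding the plural as an alias picks them all up, which is the same effect as filling in the "Aliases" box.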
Yes, I want to add the words "catafracto" and "catafractos" as an X-Ray entity. I knew about the aliases, but I assumed that the plugin would look for the word and its plural forms on its own. It was my fault, sorry. Now I understand perfectly how it works.
But I think some strange behaviour remains.
If I delete "catafracto" from the custom list, spaCy locates "catafractos", but only two occurrences. It should find 22 occurrences.
Do you know why?
spaCy can't guarantee that it will find the same word in every sentence unless you add that word to the entity ruler via the custom X-Ray dialog. Other NLP software should behave the same way.
OK. Thanks for your help and the explanations. Bearing in mind that spaCy is the tool used by the plugin, I think you can close the issue.
OK, I'll close the issue now.