Comments (7)

xxyzz commented on July 22, 2024

spaCy probably marked that word as a person by mistake in that particular sentence. The word means "cataphract", which shouldn't be marked as a person or a term. You can use the custom X-Ray feature to ignore this word, or increase the "Minimal X-Ray occurrences" value to two.
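
As a rough illustration of what raising that threshold does, here is a plain-Python sketch. It is not WordDumb's actual code: the function name `filter_by_min_occurrences` and the sample mentions are hypothetical, and it assumes the setting simply drops entities found fewer times than the configured minimum.

```python
from collections import Counter


def filter_by_min_occurrences(mentions, min_occurrences):
    """Keep only entity names that were found at least min_occurrences times."""
    counts = Counter(mentions)
    return {name: n for name, n in counts.items() if n >= min_occurrences}


# Hypothetical mentions collected from a book: a single stray hit for "catafracto".
mentions = ["Belisario", "catafracto", "Belisario", "Roma", "Roma", "Roma"]

# With a minimum of two, the one-off "catafracto" hit is discarded.
kept = filter_by_min_occurrences(mentions, min_occurrences=2)
print(kept)
```

With a minimum of one, the stray "catafracto" hit would be kept, which matches the situation described above.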

from worddumb.

Phobooky commented on July 22, 2024

You're completely right, and the fault was mine. I was using the custom X-Ray feature and made a mistake: I assigned the label "PERSON" to "catafracto". As there is only one occurrence in the singular, the behaviour seems correct. To be sure, I changed the label to "PRODUCT" and generated the file again, but the result is still not good and the problem remains. The word occurs many times in the book, yet only three occurrences are found:

image

If I delete this word from the custom X-Ray feature, the result is:

image

If I change the word in the custom X-Ray feature to the plural form, it finds all of the occurrences.

image

I don't know why, but the part of the plugin's code that locates occurrences of this word does not work with the plural.

xxyzz commented on July 22, 2024

Do you want to add both "catafracto" and "catafractos" as an X-Ray entity? You can create custom X-Ray data and add the plural form to the "Aliases" input box. The spaCy entity ruler only finds exactly matching strings, and custom X-Ray names are not fuzzy matched, so you have to add the precise strings you want to find.
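
To illustrate why the plural needs its own alias, here is a small plain-Python sketch of exact string matching. It is neither spaCy's entity ruler nor the plugin's code; `count_exact_matches` and the sample text are made up for the example.

```python
import re


def count_exact_matches(text, names):
    """Count tokens that are exactly equal to one of the given names."""
    wanted = {name.lower() for name in names}
    tokens = re.findall(r"\w+", text.lower())
    return sum(1 for token in tokens if token in wanted)


# Hypothetical sample text with one singular and two plural occurrences.
text = "El catafracto cargó. Los catafractos avanzaron. Otros catafractos huyeron."

# Only the singular is registered: the two plural forms are not matched.
singular_only = count_exact_matches(text, ["catafracto"])
print(singular_only)

# The plural is added as an alias: all three occurrences are matched.
with_alias = count_exact_matches(text, ["catafracto", "catafractos"])
print(with_alias)
```

Because the comparison is on whole tokens, "catafracto" never matches inside "catafractos"; each surface form has to be listed explicitly.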

Phobooky commented on July 22, 2024

Yes, I want to add the words "catafracto" and "catafractos" as an X-Ray entity. I knew about the aliases, but I assumed the plugin would look for the word and its plural forms automatically. It was my fault, sorry. Now I understand perfectly how it works.
But I think some strange behaviour remains.
If I delete "catafracto" from the custom list, spaCy finds "catafractos", but only two occurrences. It should find 22 occurrences.
Do you know why?
image

xxyzz commented on July 22, 2024

spaCy can't guarantee that it will find the same word in every sentence unless you add that word to the entity ruler via the custom X-Ray dialog. Other NLP software should behave the same way.

Phobooky commented on July 22, 2024

OK. Thanks for your help and the explanations. Bearing in mind that spaCy is the tool used by the plugin, I think you can close the issue.

xxyzz commented on July 22, 2024

OK, I'll close the issue now.
