
opensemanticsearch / solr-ontology-tagger


Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri

Home Page: https://opensemanticsearch.org/solr-ontology-tagger

License: GNU General Public License v3.0

Language: Python 100.00%

Topics: solr, rdf, skos, thesaurus, ontology, ontologies, faceted-search, tagging, tagger, python

Contributors: mandalka, opensemanticsearch, wsldankers


solr-ontology-tagger's Issues

Optimize: Dedupe labels

For better performance, deduplicate the labels of each entity, since many multilingual ontologies repeat the same label (once per language).
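A minimal sketch of the idea, assuming the labels of one entity arrive as (text, language) pairs. `dedupe_labels` is a hypothetical helper, not a function from the tagger, and the sample labels are made up for illustration.

```python
from typing import Iterable, List, Tuple


def dedupe_labels(labels: Iterable[Tuple[str, str]]) -> List[str]:
    """Return each distinct label text once, keeping first-seen order.

    `labels` is assumed to be (text, language) pairs, e.g. collected from
    multilingual skos:prefLabel / skos:altLabel values. The language tag is
    ignored here, on the assumption that only the text is used for matching.
    """
    seen = set()
    unique = []
    for text, _lang in labels:
        text = text.strip()
        # Skip empty labels and texts already seen under another language
        if text and text not in seen:
            seen.add(text)
            unique.append(text)
    return unique


labels = [("Berlin", "en"), ("Berlin", "de"), ("Berlin", "fr"), ("Berlino", "it")]
print(dedupe_labels(labels))  # ['Berlin', 'Berlino']
```

With four labels reduced to two, the tagger would only need to run two queries for this entity instead of four.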

Search by synonyms is case sensitive

Hi,

I've added the synonyms "Orenitram", "Remodulin" and "TYVASO" for the name "TREPROSTINIL" and indexed documents containing the word "Treprostinil" or "treprostinil".

When I search for "remodulin" in the web UI, it finds the document with "treprostinil" but not the one with "Treprostinil".

Is that expected? How can I make the search case-insensitive? I tried adding lowercase entries to "/solr/opensemanticsearch/schema/analysis/synonyms/skos", but it did not work.

This is what I have in that resource (the entries are duplicated so that symmetric synonym search works; otherwise searching for the pref_label finds documents with aliases, but searching for an alias finds neither the pref_label nor the other aliases):
"Orenitram": ["Orenitram", "Remodulin", "TREPROSTINIL", "TYVASO"],
"Remodulin": ["Orenitram", "Remodulin", "TREPROSTINIL", "TYVASO"],
"TREPROSTINIL": ["TREPROSTINIL", "orenitram", "remodulin", "tyvaso"],
"TYVASO": ["Orenitram", "Remodulin", "TREPROSTINIL", "TYVASO"],
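One way to sidestep the case mismatch, sketched under the assumption that the field's analyzer lowercases tokens before the synonym filter (not verified against the Open Semantic Search schema): fold the whole mapping to lowercase so that every spelling of a term hits the same entry. `lowercase_synonyms` is a hypothetical helper; Solr's managed synonym resources also carry an `ignoreCase` init argument that may be worth checking instead.

```python
from typing import Dict, List


def lowercase_synonyms(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Fold a synonym mapping to lowercase, merging keys that differ only in case."""
    folded: Dict[str, set] = {}
    for term, synonyms in mapping.items():
        bucket = folded.setdefault(term.lower(), set())
        bucket.add(term.lower())  # keep the term itself in its own group
        bucket.update(s.lower() for s in synonyms)
    # Sorted lists give a stable, JSON-serializable result
    return {term: sorted(group) for term, group in folded.items()}


mapping = {
    "Orenitram": ["Orenitram", "Remodulin", "TREPROSTINIL", "TYVASO"],
    "Remodulin": ["Orenitram", "Remodulin", "TREPROSTINIL", "TYVASO"],
    "TREPROSTINIL": ["TREPROSTINIL", "orenitram", "remodulin", "tyvaso"],
    "TYVASO": ["Orenitram", "Remodulin", "TREPROSTINIL", "TYVASO"],
}
folded = lowercase_synonyms(mapping)
print(folded["treprostinil"])  # ['orenitram', 'remodulin', 'treprostinil', 'tyvaso']
```

If a folded mapping like this is uploaded to the managed synonym resource, Solr only picks up the change after a core reload, and if synonyms are applied at index time, documents indexed earlier keep their old tokens until they are re-indexed.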

Thanks!
Yoann

Applying uploaded RDF ontologies not working

Hi,

When we upload an RDF ontology (say ontology.xml) to OSS, we have the option of applying it to existing documents. This isn't working as expected, because solr_ontology_tagger.py only updates the fields test_xml_ss_preferred_label_ss and test_xml_ss_uri_ss; the field test_xml_ss (which is turned into a facet) is not updated. To fix this, I've changed

tagdata = add_value_to_facet(facet = target_facet + '_preferred_label_ss', value = preferred_label, data=tagdata)

to

tagdata[target_facet] = preferred_label
tagdata = add_value_to_facet(facet = target_facet + '_preferred_label_ss', value = preferred_label, data=tagdata)

Is it a reasonable fix?
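The body of `add_value_to_facet` isn't shown in this issue, so the following is a hypothetical stand-in that appends to a multi-valued field. It illustrates one thing to weigh in the proposed fix: if `tagdata` collects several concepts for the same document, assigning `tagdata[target_facet] = preferred_label` keeps only the last label, whereas routing `target_facet` through the same helper accumulates them. The field name and labels are illustrative only.

```python
from typing import Dict, List, Optional


def add_value_to_facet(facet: str, value: str,
                       data: Optional[Dict[str, List[str]]] = None) -> Dict[str, List[str]]:
    """Hypothetical stand-in: append value to a multi-valued field, skipping duplicates."""
    if data is None:
        data = {}
    values = data.setdefault(facet, [])
    if value not in values:
        values.append(value)
    return data


target_facet = "test_xml_ss"

# Variant from the proposed fix: direct assignment overwrites earlier labels
overwritten: Dict = {}
for preferred_label in ["Berlin", "Paris"]:
    overwritten[target_facet] = preferred_label

# Variant using the helper for target_facet as well: labels accumulate
tagdata: Dict[str, List[str]] = {}
for preferred_label in ["Berlin", "Paris"]:
    tagdata = add_value_to_facet(facet=target_facet, value=preferred_label, data=tagdata)
    tagdata = add_value_to_facet(facet=target_facet + "_preferred_label_ss",
                                 value=preferred_label, data=tagdata)

print(overwritten[target_facet])  # Paris
print(tagdata[target_facet])      # ['Berlin', 'Paris']
```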

Thanks!
Bassam

PS: As I'm sure you know, entity_linker and solr_ontology_tagger tag documents differently. If I understand correctly, entity_linker only tags a document if the facet search returns an exact (case-insensitive) match for a label:

if str(value).lower() == queries[query]['query'].lower():
    match = True

While experimenting with OSS, I found this causes problems when, for example, a dictionary term is 'hello ' (with a trailing space): the facet search returns 'hello', but entity_linker doesn't tag the document because 'hello' is not equal to 'hello '.
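A minimal sketch of a more forgiving comparison, assuming surrounding whitespace is the only extra difference worth ignoring: strip both sides before the case-insensitive check. `labels_match` is a hypothetical helper wrapping the condition quoted above, not part of entity_linker.

```python
def labels_match(value: object, query: str) -> bool:
    """Case-insensitive label comparison that also ignores surrounding whitespace."""
    return str(value).strip().lower() == query.strip().lower()


print(labels_match("hello", "hello "))       # True
print(labels_match("Hello", "hello"))        # True
print(labels_match("hello world", "hello"))  # False
```

Stripping keeps the match exact for everything else; collapsing internal runs of spaces (e.g. with `" ".join(s.split())`) would be a further, optional loosening.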

Bug in 'solr_ontology_tagger.py'

Hi,

When I try to apply an ontology (through “Manage structure” -> “Ontologies (Lists of names or concepts, Vocabularies, Dictionaries, Thesauri)” tab -> “Details” of an ontology -> “Apply ontology”), I get the error

No connection adapters were found for 'localhost:8983/solr/…

The error can be traced back to solr_ontology_tagger.py, where the Solr URL is set to

solr = 'localhost:8983/solr/'

instead of

solr = 'http://localhost:8983/solr/'

After changing it, I no longer get an error, and I get the success message "Ontology applied.", but none of the indexed documents have tags applied to them.
It seems the only way to tag documents with entities from an ontology is to re-index them with the enhance_entity_linking plugin.
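The "No connection adapters were found" message comes from the requests library, which raises it (as `requests.exceptions.InvalidSchema`) when a URL has no scheme it can handle, such as `http://`. A minimal sketch of a guard, assuming the Solr base URL comes from configuration; `ensure_http_scheme` is a hypothetical helper, not part of the tagger.

```python
def ensure_http_scheme(url: str, default_scheme: str = "http") -> str:
    """Prefix a Solr base URL with a scheme if it lacks http:// or https://.

    A plain prefix check is used because a host:port string without a
    scheme is ambiguous to generic URL parsers.
    """
    if url.lower().startswith(("http://", "https://")):
        return url
    return f"{default_scheme}://{url}"


print(ensure_http_scheme("localhost:8983/solr/"))         # http://localhost:8983/solr/
print(ensure_http_scheme("http://localhost:8983/solr/"))  # http://localhost:8983/solr/
```

This only removes the connection error; it does not explain why "Ontology applied." leaves the documents untagged.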

Best regards,
Bassam

Synonyms bidirectional

Find and highlight not only the entity name and its aliases, but also find documents that mention only the entity name (without any alias) when searching for an alias.
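A minimal sketch of the bidirectional expansion, assuming synonyms are stored as a term-to-list mapping like the one Solr's managed synonym resources use: every name and alias maps to the whole group, so searching for any member also finds documents that mention only another member. `symmetric_synonyms` is a hypothetical helper, and the sample terms are borrowed from the case-sensitivity issue above.

```python
from typing import Dict, Iterable, List


def symmetric_synonyms(name: str, aliases: Iterable[str]) -> Dict[str, List[str]]:
    """Map the entity name and each alias to the full, de-duplicated group."""
    group = list(dict.fromkeys([name, *aliases]))  # dedupe, keep first-seen order
    return {term: list(group) for term in group}


mapping = symmetric_synonyms("Treprostinil", ["Remodulin", "Tyvaso", "Remodulin"])
print(mapping["Tyvaso"])  # ['Treprostinil', 'Remodulin', 'Tyvaso']
```

Each entry gets its own list copy, so editing one term's synonyms later does not silently change the others.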
