sdm-tib / falcon2.0

Falcon 2.0 is a joint entity and relation linking tool over Wikidata.

Home Page: https://labs.tib.eu/falcon/falcon2/

License: MIT License

Python 100.00%
entity-linking relation-extraction entity-extraction wikidata dbpedia knowledge-graph natural-language-processing nlp

falcon2.0's People

Contributors

ahmadsakor, anerypatel, kulsingh

falcon2.0's Issues

Some kind of entity sorting error

Hi,

I've encountered errors when querying entities whose Wikidata IDs have a single digit, e.g. Earth (Q2).

The error is logged below.

    for entity in sorted(raw , key=lambda x: (-x[3],-x[2],int(x[1][x[1].rfind("/")+2:-1])))[:k]:
ValueError: invalid literal for int() with base 10: ''

I managed to fix this by removing the -1 from the slice:

    for entity in sorted(raw , key=lambda x: (-x[3],-x[2],int(x[1][x[1].rfind("/")+2:])))[:k]:

I believe the -1 unintentionally drops the last digit of the ID before it is converted to an integer for sorting. For example:

1. 'Q2' -> ''
2. 'Q123' -> '12'

Could you confirm whether this change is correct, or whether it could affect the rest of the package? Thanks 😄
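
The slicing behaviour described in this issue can be reproduced in isolation. The sketch below is not Falcon 2.0 code: `wikidata_numeric_id` and `top_k` are hypothetical helpers, and it assumes each item in `raw` carries its URI at index 1 and numeric scores at indices 2 and 3, as the lambda in the traceback suggests. It extracts the numeric ID with a regular expression, so URIs with or without a trailing `>` are both handled.

```python
import re

# Reproduce the slices from the issue on a bare URI (no angle brackets).
bare = "http://www.wikidata.org/entity/Q2"
print(repr(bare[bare.rfind("/") + 2:-1]))  # original slice: drops the only digit
print(repr(bare[bare.rfind("/") + 2:]))    # proposed slice: keeps it

# Hypothetical helper: numeric part of a Wikidata ID, brackets optional.
_ID_RE = re.compile(r"/[QP](\d+)>?$")


def wikidata_numeric_id(uri):
    match = _ID_RE.search(uri.strip())
    if match is None:
        raise ValueError(f"no Wikidata ID found in {uri!r}")
    return int(match.group(1))


def top_k(raw, k):
    # Same ordering as the line in the traceback: x[3] descending,
    # then x[2] descending, then the smaller numeric ID first.
    return sorted(
        raw, key=lambda x: (-x[3], -x[2], wikidata_numeric_id(x[1]))
    )[:k]
```

One thing to keep in mind with the proposed change: dropping the -1 works for bare URIs, but if a URI is wrapped in angle brackets (as in `<http://www.wikidata.org/entity/Q2>`), the slice would end in `>` and `int()` would fail again. Matching the digits explicitly avoids depending on either form.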

Handling `'s` in entity indexing

I had difficulty parsing the following:

print( process_text_E_R("Hong Kong's",rules) )

The resulting error is:

ValueError: 'Kong' is not in list

This seems to come from a mismatch: the code effectively evaluates ["Hong", "Kong's"].index("Kong"), and "Kong" is not an element of that list.

I've tried a fix by adding a new rule to the entity-cleaning code. Could you let me know whether this makes sense alongside the existing rules and parsing? Thank you 😸

            for ent in entities: 
                ent=ent.replace("?","")
                ent=ent.replace(".","")
                ent=ent.replace("!","")
                ent=ent.replace("\\","")
                ent=ent.replace("#","")
                ent=ent.replace("'s","")  # new rule added at line 439
                if token.text in ent:
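
The mismatch and one possible workaround can be shown without Falcon itself. This is a minimal sketch, not the project's code: `strip_possessive` is a hypothetical helper, and it assumes tokens are separated by whitespace. Unlike a blanket `replace("'s", "")`, it only removes a possessive at the end of a token.

```python
def strip_possessive(text):
    """Drop a trailing possessive ('s or ’s) from each whitespace-separated token."""
    cleaned = []
    for token in text.split():
        for suffix in ("'s", "\u2019s"):
            if token.endswith(suffix):
                token = token[: -len(suffix)]
                break
        if token:  # skip tokens that were nothing but a possessive
            cleaned.append(token)
    return " ".join(cleaned)


tokens = "Hong Kong's".split()
try:
    tokens.index("Kong")
except ValueError as err:
    print(err)  # the lookup fails because the token is "Kong's"

# After cleaning, the same lookup succeeds.
print(strip_possessive("Hong Kong's").split().index("Kong"))
```

The difference matters for text where `'s` appears somewhere other than the end of a word: `"'sun'".replace("'s", "")` yields `"un'"`, while the suffix-only version leaves the token unchanged.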

FileNotFound Error

Hi @AhmadSakor,
When I set up this code and ran evaluateFalconAPI.py, I got the error FileNotFoundError: [Errno 2] No such file or directory: 'datasets/results/test_api/falcon_lcquad2.csv'. The same error occurred when running evaluateFalconAPI_entities.py (falcon_simple_test.csv not found). Could you please provide these CSV files or suggest a way to resolve these errors?
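
Because the path in the error is relative, it is resolved against the directory the script is started from. A small pre-flight check like the one below can show exactly which absolute path is being looked up. This is only a sketch: `missing_files` is a hypothetical helper, and the list of expected files contains just the one path quoted in the error message.

```python
from pathlib import Path


def missing_files(relative_paths, base_dir="."):
    """Return the absolute paths (as strings) that do not exist under base_dir."""
    base = Path(base_dir).resolve()
    return [str(base / p) for p in relative_paths if not (base / p).is_file()]


# Path taken verbatim from the error message above.
expected = ["datasets/results/test_api/falcon_lcquad2.csv"]

for path in missing_files(expected):
    print("missing:", path)
```

If the printed path points outside the repository, the script is probably being run from a different working directory. If it points inside the repository, the file itself is absent.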

Elasticdump for wikidata dump takes a long time

Hi, I followed the instructions to load the Wikidata dump into Elasticsearch with elasticdump, but elasticdump has been running for a long time.

  • Is there an estimate of how long it will take to load the 9 GB of entity data alone?
  • Is there a smaller dataset that I can try this on?

Thanks.

Small query on the output format

Hi,

I would like to raise two questions:

(1) Should the doc type be doc or _doc in the Elastic submodule?
The source code uses doc by default, but elasticdump seems to add _doc by default.
It's a small point, but I thought it should be raised in case it affects adding new documents.

(2) How should the result be interpreted?
Trying Falcon on random questions gives the following results. How do we interpret the integers that come after the list of links? Thank you.

>>>    process_text_E_R('Who is Michelle Obama?',rules)
>>>    process_text_E_R('Where is Gracht?',rules)
0
['Who is Michelle Obama?', [], [['<http://www.wikidata.org/entity/Q13133>', 'Michelle obama']], 0, 0, 0, 0]
1
['Where is Gracht?', [], [['<http://www.wikidata.org/entity/Q896611>', 'Gracht']], 0, 0, 0, 0]
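
The meaning of the trailing integers is not stated in this issue, so the sketch below only reads the parts whose structure is visible in the output: the question at index 0 and the list of [URI, surface form] pairs at index 2. `linked_entities` is a hypothetical helper, and the sample data is copied from the output shown here.

```python
import re

# Sample results copied from the output in this issue.
results = [
    ['Who is Michelle Obama?', [],
     [['<http://www.wikidata.org/entity/Q13133>', 'Michelle obama']], 0, 0, 0, 0],
    ['Where is Gracht?', [],
     [['<http://www.wikidata.org/entity/Q896611>', 'Gracht']], 0, 0, 0, 0],
]


def linked_entities(result):
    """Return (Wikidata ID, surface form) pairs from the third element of a result."""
    pairs = []
    for uri, surface in result[2]:
        match = re.search(r"/(Q\d+)>?$", uri)
        # Fall back to the raw URI if it does not look like a Wikidata entity.
        pairs.append((match.group(1) if match else uri, surface))
    return pairs


for result in results:
    print(result[0], linked_entities(result))
```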

Named entity recognition

Hi, does this project include named entity recognition? I'm new to this area. If so, could you tell me which scripts implement it?

File not found

Hi, thanks for your effort in developing this useful tool.

I followed the instructions to create the indexes by calling

    propertyIndexAdd()
    entitiesIndexAdd()

but got the error:
FileNotFoundError: [Errno 2] No such file or directory: '../data/dbpredicateindex.json'

I want to use falcon2 as a relation linking tool. What should I do?

Also, importing wikidataentity.json into Elasticsearch is very slow for me. Do you have any idea why?

Thanks.
