ziqizhang / sti

Implementation of algorithms for semantic table interpretation, including the TableMiner+ method

Java 98.36% JavaScript 0.58% HTML 0.94% CSS 0.10% Shell 0.02%
semantic-table-interpretation entity-linking web-table webtable classification relation-extraction dbpedia freebase semantic-web

sti's Introduction

Semantic Table Interpretation

This repository contains an implementation of the TableMiner+ system (see below), a novel relational table annotation method that, given an existing knowledge base, 1) links mentions in table cells to named entities or their properties in the knowledge base; 2) annotates columns with concepts; 3) annotates relations between columns.

Keywords: webtable, web table, entity linking, table annotation, table interpretation, semantic web, classification, relation extraction

CITING

[1] Zhang, Z. 2017. Effective and efficient semantic table interpretation using TableMiner+. Semantic Web 8 (6), 921-957

DISCLAIMER

This project is based on the TableMiner+ system described in [1]. In addition to TableMiner+, this project provides implementations of several other semantic table interpretation algorithms, including Joint Inference based on Limaye2011 and Semantic Message Passing based on Mulwad2013. However, because many factors are outside our control (e.g., use of in-house software in the original works, different versions of knowledge bases), we cannot guarantee identical replication of the original systems or reproduction of their experimental results.

Part of this work was funded by the EPSRC project LODIE - Linked Open Data for Information Extraction, EP/J019488/1.

ANNOUNCEMENT

Apr 2018: about Bing web search: TableMiner+ uses Bing web search to detect subject columns. However, the API has since been deprecated and replaced. Due to lack of funding, there is no plan to migrate to the new service in the near future. If you are willing to contribute, we would be more than happy to merge your pull request. Otherwise, please disable it by setting sti.subjectcolumndetection.ws=false in sti.properties (this is already the default in the current distribution).

Oct 2016: STI now has a UI demo to visualise the table interpretation input, process, and output. See here on how to use the demo.

Sep 2016: about Freebase: TableMiner+ was developed using Freebase as the knowledge base. Freebase has been shut down since 2015 and can no longer be accessed online. While it is still possible to access Freebase data by mapping its topic IDs to Wikidata entries, this has not been implemented yet. If you are willing to contribute, please get in touch. For now, please use DBpedia instead. See kbsearch.properties for details.

QUICK START

To get started, please follow the instructions below and get in touch if you encounter any problems:

  • Place a copy of STI on your computer
  • Run Maven to install the two third-party libraries (in 'libs') into your local Maven repository. See https://maven.apache.org/guides/mini/guide-3rd-party-jars-local.html for instructions. There is also a script, install_missing_libs.sh, to help you with this.
  • Download test data, from here
  • Unzip the test data, into e.g., [sti_data]
  • Navigate into [sti_data/dataset] and unzip the archive for the test case you need: imdb.tar.gz for the IMDB dataset; musicbrainz.tar.gz for the MusicBrainz dataset; Limaye200.tar.gz for the Limaye200 dataset; Limaye_complete.tar.gz for the LimayeAll dataset
  • Configure your local copy of STI: please read within each .properties file for detailed descriptions of each parameter
    • open config/sti.properties, as a minimum, you need to change sti.home, and sti.cache.main.dir.
    • open kbsearch.properties, as a minimum, you need to change kb.search.result.stoplistfile, and fb.query.api.key to use your own Freebase API key
    • open websearch.properties, as a minimum, you need to change bing.keys to use your own bing web search key
  • STI uses log4j for logging. Make sure you have a copy of config/log4j.properties within your compiled java class folder for the progress to be displayed properly.
  • Run a test case. For example, to run TableMiner+ (TMP), use: uk.ac.shef.dcs.sti.experiment.TableMinerPlusBatch "[sti_data/Limaye200]" "[output_dir]" "/[sti_home_dir]/sti.properties"

Note: sti.properties distributed with code is a default configuration for Limaye200 and LimayeAll datasets; for IMDB and MusicBrainz datasets, you can edit a template inside /resources. For both IMDB and MusicBrainz, you may want to provide the VM variable '-Djava.util.logging.config.file=' to configure the logging output of the any23-sti module (which can produce too many logs).
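Before launching a batch run, it can help to confirm that the properties edited above are actually set. The following is a minimal sketch, not part of STI: the class StiConfigCheck and its methods are hypothetical names introduced for illustration, and only the property keys (sti.home, sti.cache.main.dir) come from the steps above.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper (not part of STI): reports required keys that are unset or blank. */
final class StiConfigCheck {

    /** Loads a .properties file from disk using the standard java.util.Properties format. */
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file)) {
            props.load(reader);
        }
        return props;
    }

    /** Returns the required keys that are missing or have only whitespace as their value. */
    static List<String> missingKeys(Properties props, String... requiredKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : requiredKeys) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

For example, calling StiConfigCheck.missingKeys(StiConfigCheck.load(Paths.get("config/sti.properties")), "sti.home", "sti.cache.main.dir") should return an empty list once both values have been filled in.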

LICENCE

Apache 2.0

Running slow/HTTP X errors

Semantic Table Interpretation requires fetching data from a knowledge base. This is currently configured to use a remote knowledge base by calling its APIs or web services, such as the DBpedia SPARQL endpoint. This is the part of the process that takes 99.99% of the processing time in a typical STI application. Also, when such a remote server is unreliable, you will often encounter an HTTP error such as HTTP 500. Therefore, if possible, please consider hosting a local copy of the knowledge base before you start. For example, you can deploy a local DBpedia server, which can improve performance by orders of magnitude.
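Where hosting a local copy is not an option, one common mitigation for transient HTTP 500 responses is to retry the remote call with an increasing delay. The sketch below is a generic retry-with-exponential-backoff wrapper, not something STI provides: the class Retry and its method withBackoff are hypothetical names, and the wrapped call stands in for whatever knowledge base query you issue.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of STI): retries a failing call with exponential backoff. */
final class Retry {

    /**
     * Runs the call up to maxAttempts times. After each failure except the last,
     * sleeps for the current delay and then doubles it. Rethrows the last failure.
     */
    static <T> T withBackoff(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // 1x, 2x, 4x, ... of the initial delay
                }
            }
        }
        throw last;
    }
}
```

Retrying only hides short outages and adds load to a public endpoint, so a local knowledge base remains the more robust choice.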

sti's People

Contributors

ir-ischool-uos, ziqizhang

sti's Issues

AttributeValueMatcher not stripping datatype string hence not matching values correctly

TODO

(Thanks to Josef Janoušek from the Odalic project)

"The AttributeValueMatcher in the method score ( https://github.com/ziqizhang/sti/blob/master/sti-main/src/uk/ac/shef/dcs/sti/core/scorer/AttributeValueMatcher.java#L104 ) was not able to match the input cell value 694 (of datatype NUMBER) and the attribute which has the value "694"^^http://www.w3.org/2001/XMLSchema#positiveInteger - so in the DBpedia knowledge base the text representation of the value of the literal attribute contains also the data type (according to XML Schema) - as shown at https://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3Fp+%3Fo+where+{%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FA_Game_of_Thrones%3E+%3Fp+%3Fo}&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on . So because it was not matched, all attributes had the score 0.0 and relation was not discovered.
So I made changes in collected attributes used for matching ( https://github.com/ziqizhang/sti/blob/master/sti-main/src/uk/ac/shef/dcs/sti/core/algorithm/tmp/TColumnColumnRelationEnumerator.java#L65 ) - when the attribute value contains "^^", then I cut the datatype part of the string and set only the number (e.g. 694) as value of the attribute, and also I set the valueURI of the attribute to null, because otherwise the method classifyAttributeValueDataType of AttributeValueMatcher ( https://github.com/ziqizhang/sti/blob/master/sti-main/src/uk/ac/shef/dcs/sti/core/scorer/AttributeValueMatcher.java#L185 ) sets datatype to named_entity. After these changes the value of the attribute is just 694 and datatype is set to NUMBER, so the score method of AttributeValueMatcher is able to match it with the input cell value and the relation is discovered."
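The datatype stripping described in the report can be sketched as follows. This is an illustration only, not the actual change made in TColumnColumnRelationEnumerator: the class TypedLiteral and its fields are hypothetical, and the input format (a quoted value, then "^^", then a datatype URI, optionally in angle brackets) is assumed from the example value quoted above.

```java
/** Hypothetical helper (not part of STI): splits a typed literal into value and datatype. */
final class TypedLiteral {
    final String lexicalValue; // e.g. 694
    final String datatypeUri;  // e.g. http://www.w3.org/2001/XMLSchema#positiveInteger, or null

    private TypedLiteral(String lexicalValue, String datatypeUri) {
        this.lexicalValue = lexicalValue;
        this.datatypeUri = datatypeUri;
    }

    /** Parses strings such as "694"^^http://www.w3.org/2001/XMLSchema#positiveInteger. */
    static TypedLiteral parse(String raw) {
        String s = raw.trim();
        String value = s;
        String datatype = null;
        // The datatype marker is the last "^^"; anything before it is the lexical form.
        int sep = s.lastIndexOf("^^");
        if (sep >= 0) {
            value = s.substring(0, sep);
            datatype = s.substring(sep + 2);
            // Some serialisations wrap the datatype URI in angle brackets.
            if (datatype.length() >= 2 && datatype.startsWith("<") && datatype.endsWith(">")) {
                datatype = datatype.substring(1, datatype.length() - 1);
            }
        }
        // Drop the surrounding double quotes of the lexical form, if present.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return new TypedLiteral(value, datatype);
    }
}
```

With the value reduced to 694 and the datatype kept separately, a matcher can compare the cell value against the plain number instead of the full typed-literal string.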

Upgrade to Any23 2.0

Hi folks, we recently released Any23 2.0, which has lots of improvements. Artifacts are available on Maven Central and there are no API breakages IIRC. Please reach us at user@any23.apache.org if you have any problems upgrading.

Exceptions after the LEARNING phase

Hi,
I installed the project (with NodeJS UI) following the instructions.

But I have some issues during/after the LEARNING phase; I tried two different tables, but the annotation process can't proceed further. With this table https://en.wikipedia.org/wiki/Commedia_all%27italiana I get an HttpException (detailed log following).
The KG endpoint is the default dbpedia.org/sparql and the parser is Wikipedia tables.
The process ends up with the following message:

"Your task is complete. Visit http://localhost:3000/user1/index.htm for your output. Thanks for using TableMiner+"

but index.htm is missing (error 404).
Inside the folder ui/tmp/user1 there are just 3 files:

_wiki_Commedia_all_27italiana.download.html (html page with red boxes)
_wiki_Commedia_all_27italiana.download.html.original (original html page)
xpaths.json (empty)
2019-07-25 09:39:24 INFO TableMinerPlusBatch:46 - Initializing entity cache...

2019-07-25 09:39:26 INFO TableMinerPlusBatch:50 - Initializing KBSearch...

2019-07-25 09:39:27 INFO TableMinerPlusBatch:67 - Initializing SUBJECT COLUMN DETECTION components ...

Thu Jul 25 09:39:27 UTC 2019 loading exception data for lemmatiser...

Thu Jul 25 09:39:27 UTC 2019 loading done

2019-07-25 09:39:28 INFO TableMinerPlusBatch:94 - Initializing LEARNING components ...

2019-07-25 09:39:28 INFO TableMinerPlusBatch:136 - Initializing UPDATE components ...

2019-07-25 09:39:28 INFO TableMinerPlusBatch:149 - Initializing RELATIONLEARNING components ...

2019-07-25 09:39:28 INFO TMPInterpreter:49 - >    PHASE: Detecting subject column...

2019-07-25 09:39:29 INFO TMPInterpreter:65 - >    PHASE: LEARNING ...

2019-07-25 09:39:29 INFO TMPInterpreter:82 - >> Column=0

2019-07-25 09:39:29 INFO LEARNINGPreliminaryColumnClassifier:70 - >> (LEANRING) Preliminary Column Classification begins

2019-07-25 09:39:29 INFO LEARNINGPreliminaryColumnClassifier:81 - >> cold start disambiguation, row(s) [12]/66,(The Last Judgement) NAMED_ENTITY

2019-07-25 09:39:29 INFO TCellDisambiguator:34 - >> (cold start disamb), candidates=10

2019-07-25 09:39:29 INFO TColumnClassifier:38 - >> update candidate clazz on column, existing=0

2019-07-25 09:39:29 INFO LEARNINGPreliminaryColumnClassifier:81 - >> cold start disambiguation, row(s) [18]/66,(Il diavolo) NAMED_ENTITY

2019-07-25 09:39:29 INFO TCellDisambiguator:34 - >> (cold start disamb), candidates=2

2019-07-25 09:39:29 INFO TColumnClassifier:38 - >> update candidate clazz on column, existing=2

2019-07-25 09:39:29 INFO LEARNINGPreliminaryColumnClassifier:117 - >> (LEARNING) Preliminary Column Classification converged, rows:2/66

2019-07-25 09:39:29 INFO LEARNINGPreliminaryDisamb:39 - >> (LEARNING) Preliminary Disambiguation begins

2019-07-25 09:39:29 INFO LEARNINGPreliminaryDisamb:46 - >> re-annotate cells involved in cold start disambiguation

2019-07-25 09:39:29 INFO LEARNINGPreliminaryDisamb:50 - >> constrained cell disambiguation for the rest cells in this column

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([14]/66,0) (March on Rome) DATE candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([59]/66,0) (La stanza del vescovo) SHORT_TEXT candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([58]/66,0) (The Career of a Chambermaid) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([35]/66,0) (A Question of Honour) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([50]/66,0) (Vogliamo i colonnelli) SHORT_TEXT candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([57]/66,0) (Brutti, sporchi e cattivi) SHORT_TEXT candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([60]/66,0) (Un borghese piccolo piccolo) SHORT_TEXT candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([27]/66,0) (The Birds, the Bees and the Italians) NAMED_ENTITY candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([41]/66,0) (The Libertine) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([48]/66,0) (The Seduction of Mimi) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([19]/66,0) (Il Boom) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([13]/66,0) (Mafioso) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([6]/66,0) (A Difficult Life) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([1]/66,0) (The Great War) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([47]/66,0) (Bello, onesto, emigrato Australia sposerebbe compaesana illibata) SHORT_TEXT candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([54]/66,0) (Profumo di donna) SHORT_TEXT candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([38]/66,0) (La ragazza con la pistola) SHORT_TEXT candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([3]/66,0) (Love and Larceny) NAMED_ENTITY candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([34]/66,0) (L'ombrellone) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([56]/66,0) (Amici miei) NAMED_ENTITY candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([7]/66,0) (Audace colpo dei soliti ignoti) SHORT_TEXT candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([43]/66,0) (Brancaleone alle Crociate) NAMED_ENTITY candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([5]/66,0) (Adua e le compagne) SHORT_TEXT candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([46]/66,0) (Secret Fantasy) NAMED_ENTITY candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([9]/66,0) (Divorce, Italian Style) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([15]/66,0) (The Conjugal Bed) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([22]/66,0) (Seduced and Abandoned) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([24]/66,0) (Il successo) NAMED_ENTITY candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([45]/66,0) (In nome del popolo italiano) SHORT_TEXT candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([49]/66,0) (Lo scopone scientifico) SHORT_TEXT candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([17]/66,0) (Alta Infedeltà) NAMED_ENTITY candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([26]/66,0) (Casanova 70) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([40]/66,0) (Il Commissario Pepe) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([25]/66,0) (Le bambole) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([63]/66,0) (Caro papà) NAMED_ENTITY candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([31]/66,0) (The Man, the Woman and the Money) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([20]/66,0) (Yesterday, Today and Tomorrow) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([36]/66,0) (The Tiger and the Pussycat) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([44]/66,0) (Between Miracles) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([64]/66,0) (Amici miei Atto II) NAMED_ENTITY candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([39]/66,0) (Il medico della mutua) SHORT_TEXT candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([52]/66,0) (C'eravamo tanto amati) SHORT_TEXT candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([2]/66,0) (Il vedovo) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([55]/66,0) (Romanzo popolare) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([23]/66,0) (Se permettete parliamo di donne) SHORT_TEXT candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([30]/66,0) (Io la conoscevo bene) SHORT_TEXT candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([4]/66,0) (Everybody Go Home) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([0]/66,0) (Big Deal on Madonna Street) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([11]/66,0) (The Easy Life) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([51]/66,0) (Pane e cioccolata) SHORT_TEXT candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([28]/66,0) (I complessi) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([33]/66,0) (L'Armata Brancaleone) NAMED_ENTITY candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([61]/66,0) (Traffic Jam (film)) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([8]/66,0) (The Fascist) NAMED_ENTITY candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([29]/66,0) (Il Gaucho) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([42]/66,0) (Vedo nudo) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([65]/66,0) (Café Express) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([16]/66,0) (I mostri) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([10]/66,0) (Boccaccio '70) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([37]/66,0) (The Witches) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([21]/66,0) (The Reunion) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([53]/66,0) (Swept Away by an Unusual Destiny in the Blue Sea of August) UNKNOWN candidates=0

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([32]/66,0) (Me, Me, Me... and the Others) NAMED_ENTITY candidates=1

2019-07-25 09:39:29 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([62]/66,0) (Signore e signori, buonanotte) SHORT_TEXT candidates=0

2019-07-25 09:39:29 INFO LEARNINGPreliminaryDisamb:87 - >> constrained cell disambiguation complete 40/66 rows
2019-07-25 09:39:29 INFO LEARNINGPreliminaryDisamb:88 - >> reset candidate column class annotations

2019-07-25 09:39:29 INFO TMPInterpreter:82 - >> Column=2

2019-07-25 09:39:29 INFO LEARNINGPreliminaryColumnClassifier:70 - >> (LEANRING) Preliminary Column Classification begins

2019-07-25 09:39:29 INFO LEARNINGPreliminaryColumnClassifier:81 - >> cold start disambiguation, row(s) [2, 3, 6, 11, 14, 16, 29, 34, 36, 42, 45, 54, 58, 59, 63]/27,(Dino Risi) NAMED_ENTITY

2019-07-25 09:39:29 INFO TCellDisambiguator:34 - >> (cold start disamb), candidates=2

2019-07-25 09:39:30 INFO TColumnClassifier:38 - >> update candidate clazz on column, existing=0

2019-07-25 09:39:30 INFO LEARNINGPreliminaryColumnClassifier:81 - >> cold start disambiguation, row(s) [0, 1, 26, 33, 38, 43, 50, 55, 56, 60, 64]/27,(Mario Monicelli) NAMED_ENTITY

2019-07-25 09:39:30 INFO TCellDisambiguator:34 - >> (cold start disamb), candidates=2

2019-07-25 09:39:30 INFO TColumnClassifier:38 - >> update candidate clazz on column, existing=47

2019-07-25 09:39:30 INFO LEARNINGPreliminaryColumnClassifier:117 - >> (LEARNING) Preliminary Column Classification converged, rows:26/27

2019-07-25 09:39:30 INFO LEARNINGPreliminaryDisamb:39 - >> (LEARNING) Preliminary Disambiguation begins

2019-07-25 09:39:30 INFO LEARNINGPreliminaryDisamb:46 - >> re-annotate cells involved in cold start disambiguation

2019-07-25 09:39:30 INFO LEARNINGPreliminaryDisamb:50 - >> constrained cell disambiguation for the rest cells in this column

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([23, 40, 52, 57]/27,2) (Ettore Scola) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([4, 49, 61]/27,2) (Luigi Comencini) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([35, 39, 47]/27,2) (Luigi Zampa) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([9, 22, 27]/27,2) (Pietro Germi) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([12, 19, 20]/27,2) (Vittorio De Sica) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([5, 30]/27,2) (Antonio Pietrangeli) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([41, 46]/27,2) (Pasquale Festa Campanile) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([7, 65]/27,2) (Nanni Loy) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([48, 53]/27,2) (Lina Wertmüller) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([13]/27,2) (Alberto Lattuada) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([18]/27,2) (Gian Luigi Polidoro) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([44]/27,2) (Nino Manfredi) NAMED_ENTITY candidates=0

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([31]/27,2) (Eduardo De Filippo, Marco Ferreri, Luciano Salce) NAMED_ENTITY candidates=0

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([24]/27,2) (Mauro Morassi, Dino Risi) NAMED_ENTITY candidates=0

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([25]/27,2) (Mauro Bolognini, Luigi Comencini, Dino Risi, Franco Rossi) NAMED_ENTITY candidates=0

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([15]/27,2) (Marco Ferreri) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([8]/27,2) (Luciano Salce) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([17]/27,2) (Mario Monicelli, Franco Rossi, Elio Petri, Luciano Salce) NAMED_ENTITY candidates=0

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([51]/27,2) (Franco Brusati) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([28]/27,2) (Dino Risi, Luigi Filippo D'Amico, Franco Rossi) NAMED_ENTITY candidates=0

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([62]/27,2) (Luigi Comencini, Nanni Loy, Mario Monicelli, Ettore Scola, Luigi Magni) UNKNOWN candidates=0

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([37]/27,2) (Luchino Visconti, Pier Paolo Pasolini, Vittorio De Sica, Franco Rossi, Mauro Bolognini) UNKNOWN candidates=0

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([10]/27,2) (Mario Monicelli, Federico Fellini, Luchino Visconti, Vittorio De Sica) NAMED_ENTITY candidates=0

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([21]/27,2) (Damiano Damiani) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO TCellDisambiguator:64 - >> (constrained disambiguation in LEARNING) , position at ([32]/27,2) (Alessandro Blasetti) NAMED_ENTITY candidates=1

2019-07-25 09:39:30 INFO LEARNINGPreliminaryDisamb:87 - >> constrained cell disambiguation complete 31/27 rows

2019-07-25 09:39:30 INFO LEARNINGPreliminaryDisamb:88 - >> reset candidate column class annotations

2019-07-25 09:39:30 INFO TMPInterpreter:82 - >> Column=3

2019-07-25 09:39:30 INFO LEARNINGPreliminaryColumnClassifier:70 - >> (LEANRING) Preliminary Column Classification begins

2019-07-25 09:39:30 INFO LEARNINGPreliminaryColumnClassifier:81 - >> cold start disambiguation, row(s) [0]/63,(Marcello Mastroianni, Vittorio Gassman, Totò) NAMED_ENTITY

uk.ac.shef.dcs.sti.STIException: uk.ac.shef.dcs.kbsearch.KBSearchException: HttpException: 500

at uk.ac.shef.dcs.sti.core.algorithm.tmp.TMPInterpreter.start(TMPInterpreter.java:105)
at uk.ac.shef.dcs.sti.experiment.STIBatch.process(STIBatch.java:326)
at uk.ac.shef.dcs.sti.ui.TableMinerPlusSingle.process(TableMinerPlusSingle.java:58)
at uk.ac.shef.dcs.sti.ui.TableMinerPlusSingle.main(TableMinerPlusSingle.java:147)
Caused by: uk.ac.shef.dcs.kbsearch.KBSearchException: HttpException: 500
at uk.ac.shef.dcs.kbsearch.sparql.DBpediaSearch.findEntityCandidates(DBpediaSearch.java:133)
at uk.ac.shef.dcs.sti.core.algorithm.tmp.LEARNINGPreliminaryColumnClassifier.runPreliminaryColumnClassifier(LEARNINGPreliminaryColumnClassifier.java:95)
at uk.ac.shef.dcs.sti.core.algorithm.tmp.LEARNING.learn(LEARNING.java:27)
at uk.ac.shef.dcs.sti.core.algorithm.tmp.TMPInterpreter.start(TMPInterpreter.java:83)
... 3 more
Caused by: HttpException: 500
at org.apache.jena.sparql.engine.http.HttpQuery.rewrap(HttpQuery.java:411)
at org.apache.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:355)
at org.apache.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:292)
at org.apache.jena.sparql.engine.http.QueryEngineHTTP.execResultSetInner(QueryEngineHTTP.java:359)
at org.apache.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:351)
at uk.ac.shef.dcs.kbsearch.sparql.SPARQLSearch.queryByLabel(SPARQLSearch.java:143)
at uk.ac.shef.dcs.kbsearch.sparql.DBpediaSearch.findEntityCandidates(DBpediaSearch.java:105)
... 6 more

missed: 0_https://en.wikipedia.org/wiki/Commedia_all%27italiana

java.io.FileNotFoundException: resources/failed.txt (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at java.io.FileWriter.<init>(FileWriter.java:78)
at uk.ac.shef.dcs.sti.experiment.STIBatch.recordFailure(STIBatch.java:354)
at uk.ac.shef.dcs.sti.ui.TableMinerPlusSingle.process(TableMinerPlusSingle.java:80)
at uk.ac.shef.dcs.sti.ui.TableMinerPlusSingle.main(TableMinerPlusSingle.java:147)

With this other table, https://it.wikipedia.org/wiki/Gand%C3%ADa_Shore, the exception is instead org.apache.jena.query.QueryParseException, because a cell in the table contains a double-quote (") character.
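A lexical error of this kind typically appears when a cell value is concatenated into a SPARQL string literal without escaping. The sketch below shows one possible way to escape the value first, following the escape sequences defined for SPARQL 1.1 string literals. It is an assumption about the cause, not the project's actual fix: the class SparqlLiterals, its methods, and the query shape in labelQuery are hypothetical and do not reflect what SPARQLSearch.queryByLabel really builds.

```java
/** Hypothetical helper (not part of STI): escapes text for use inside a SPARQL "..." literal. */
final class SparqlLiterals {

    /** Escapes backslashes, double quotes and common control characters. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Illustrative query only: looks up resources by an exact English rdfs:label. */
    static String labelQuery(String label) {
        return "SELECT DISTINCT ?s WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> \""
                + escape(label) + "\"@en }";
    }
}
```

Since the project already uses Apache Jena, its org.apache.jena.query.ParameterizedSparqlString class, which substitutes literals into a query template, may be a more robust alternative to manual escaping.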

2019-07-22 13:51:51 INFO TableMinerPlusBatch:46 - Initializing entity cache...

2019-07-22 13:51:53 INFO TableMinerPlusBatch:50 - Initializing KBSearch...

2019-07-22 13:51:54 INFO TableMinerPlusBatch:67 - Initializing SUBJECT COLUMN DETECTION components ...

Mon Jul 22 13:51:54 UTC 2019 loading exception data for lemmatiser...

Mon Jul 22 13:51:54 UTC 2019 loading done

2019-07-22 13:51:55 INFO TableMinerPlusBatch:94 - Initializing LEARNING components ...

2019-07-22 13:51:55 INFO TableMinerPlusBatch:136 - Initializing UPDATE components ...

2019-07-22 13:51:55 INFO TableMinerPlusBatch:149 - Initializing RELATIONLEARNING components ...

2019-07-22 13:51:55 INFO TMPInterpreter:49 - > PHASE: Detecting subject column...

2019-07-22 13:51:56 INFO TMPInterpreter:65 - > PHASE: LEARNING ...

2019-07-22 13:51:56 INFO TMPInterpreter:82 - >> Column=0

2019-07-22 13:51:56 INFO LEARNINGPreliminaryColumnClassifier:70 - >> (LEANRING) Preliminary Column Classification begins

2019-07-22 13:51:56 INFO LEARNINGPreliminaryColumnClassifier:81 - >> cold start disambiguation, row(s) [0]/8,(José "Labrador" Sancho) NAMED_ENTITY

uk.ac.shef.dcs.sti.STIException: uk.ac.shef.dcs.kbsearch.KBSearchException: org.apache.jena.query.QueryParseException: Lexical error at line 2, column 44. Encountered: "\"" (34), after : "Labrador"
at uk.ac.shef.dcs.sti.core.algorithm.tmp.TMPInterpreter.start(TMPInterpreter.java:105)
at uk.ac.shef.dcs.sti.experiment.STIBatch.process(STIBatch.java:326)
at uk.ac.shef.dcs.sti.ui.TableMinerPlusSingle.process(TableMinerPlusSingle.java:58)
at uk.ac.shef.dcs.sti.ui.TableMinerPlusSingle.main(TableMinerPlusSingle.java:147)
Caused by: uk.ac.shef.dcs.kbsearch.KBSearchException: org.apache.jena.query.QueryParseException: Lexical error at line 2, column 44. Encountered: "\"" (34), after : "Labrador"
at uk.ac.shef.dcs.kbsearch.sparql.DBpediaSearch.findEntityCandidates(DBpediaSearch.java:133)
at uk.ac.shef.dcs.sti.core.algorithm.tmp.LEARNINGPreliminaryColumnClassifier.runPreliminaryColumnClassifier(LEARNINGPreliminaryColumnClassifier.java:95)
at uk.ac.shef.dcs.sti.core.algorithm.tmp.LEARNING.learn(LEARNING.java:27)
at uk.ac.shef.dcs.sti.core.algorithm.tmp.TMPInterpreter.start(TMPInterpreter.java:83)
... 3 more
Caused by: org.apache.jena.query.QueryParseException: Lexical error at line 2, column 44. Encountered: "\"" (34), after : "Labrador"
at org.apache.jena.sparql.lang.ParserSPARQL11.perform(ParserSPARQL11.java:110)
at org.apache.jena.sparql.lang.ParserSPARQL11.parse$(ParserSPARQL11.java:52)
at org.apache.jena.sparql.lang.SPARQLParser.parse(SPARQLParser.java:34)
at org.apache.jena.query.QueryFactory.parse(QueryFactory.java:147)
at org.apache.jena.query.QueryFactory.create(QueryFactory.java:79)
at org.apache.jena.query.QueryFactory.create(QueryFactory.java:52)
at org.apache.jena.query.QueryFactory.create(QueryFactory.java:40)
at uk.ac.shef.dcs.kbsearch.sparql.SPARQLSearch.queryByLabel(SPARQLSearch.java:139)
at uk.ac.shef.dcs.kbsearch.sparql.DBpediaSearch.findEntityCandidates(DBpediaSearch.java:105)
... 6 more
java.io.FileNotFoundException: resources/failed.txt (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at java.io.FileWriter.<init>(FileWriter.java:78)
at uk.ac.shef.dcs.sti.experiment.STIBatch.recordFailure(STIBatch.java:354)
at uk.ac.shef.dcs.sti.ui.TableMinerPlusSingle.process(TableMinerPlusSingle.java:80)
at uk.ac.shef.dcs.sti.ui.TableMinerPlusSingle.main(TableMinerPlusSingle.java:147)

missed: 0_ https://it.wikipedia.org/wiki/Gandía_Shore
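
The lexical error points at the unescaped " inside the cell text `José "Labrador" Sancho`, which ends the SPARQL string literal early. Below is a minimal sketch of escaping a label before it is placed in a query, assuming the query is built by string concatenation. The class `SparqlLiteralEscaper`, its methods, and the query shape are hypothetical and are not taken from the STI source; the actual query built in `SPARQLSearch.queryByLabel` may look different.

```java
public final class SparqlLiteralEscaper {
    private SparqlLiteralEscaper() {}

    /** Escapes characters that would break a double-quoted SPARQL string literal. */
    public static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // backslash must be escaped first-class
                case '"':  sb.append("\\\""); break; // the character behind the reported error
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Hypothetical label lookup query; the real STI query is an assumption here. */
    public static String labelQuery(String label) {
        return "SELECT DISTINCT ?s WHERE {\n"
             + "  ?s <http://www.w3.org/2000/01/rdf-schema#label> \""
             + escape(label) + "\"@en .\n"
             + "}";
    }

    public static void main(String[] args) {
        // Cell text from the failing table, with the accented character as a Unicode escape.
        System.out.println(labelQuery("Jos\u00e9 \"Labrador\" Sancho"));
    }
}
```

An alternative that avoids hand-written escaping is Jena's `ParameterizedSparqlString`, which substitutes literals into a query template; whether it fits the existing code is something only the maintainers can judge.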

I also created the file resources/failed.txt manually, but I still got the same FileNotFoundException.
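
A possible explanation, not verified against the STI code: `resources/failed.txt` is a relative path, so Java resolves it against the working directory the JVM was started from, and `FileWriter` reports "No such file or directory" when the parent directory does not exist there. A file created by hand under a different directory would not help in that case. The sketch below shows one way to write such a failure log robustly; the class `FailureLog` and its `append` method are hypothetical and are not part of STI.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public final class FailureLog {
    private FailureLog() {}

    /**
     * Appends one line to the given file, creating missing parent
     * directories first. Returns the absolute path that was written,
     * which makes it easy to see where a relative path ended up.
     */
    public static Path append(String file, String message) throws IOException {
        Path path = Paths.get(file).toAbsolutePath();
        Path parent = path.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op if the directory already exists
        }
        byte[] line = (message + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        Files.write(path, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return path;
    }

    public static void main(String[] args) throws IOException {
        // Relative path as in the stack trace; resolved against the working directory.
        Path written = append("resources/failed.txt", "0_ example-table");
        System.out.println("Failure recorded in " + written);
    }
}
```

Printing the absolute path, or `System.getProperty("user.dir")`, from the failing run would show which directory the program actually expects `resources/` to be in.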

Could you help us solve these issues?

Thank you!
