thiagocf05 / webnlg
The enriched version of the WebNLG described at INLG 2018
The full test set including references was released a few months ago.
I think it would be beneficial, for completeness' sake, to delexicalize that data as well.
Reasoning:
Thanks for doing all this.
I have a question about coverage: did you test the coverage of your manual work?
Looking at train, 7triplets (the first file I opened), I see in the first sentence:
AGENT-1 was born in PATIENT-4 and is from the U.S. . AGENT-1 graduated in 1955 from PATIENT-3 . AGENT-1 worked as PATIENT-2 and for NASA in PATIENT-6 . AGENT-1 spent PATIENT-5 in space and is now retired .
"U.S." should be replaced with PATIENT-1, the entire "in 1955 from UT Austin with a B.S" with PATIENT-3, and "retired" should be replaced with PATIENT-7.
Would you say that these kinds of problems are to be expected? Did you run any coverage test to make sure nothing was left undelexicalized? (An automated test could catch two of the cases above.)
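A coverage check of the kind asked about here could look roughly like the sketch below. It is not the authors' tooling: the function names are hypothetical, the tag pattern (uppercase word, hyphen, number) is an assumption based on the AGENT-1 / PATIENT-4 style seen above, and the surface forms are taken from this issue. It flags expected tags that never appear in a template and surface forms that still appear verbatim.

```python
import re

# Assumed tag shape, based on tags such as AGENT-1 and PATIENT-4 quoted above.
TAG_PATTERN = re.compile(r"\b[A-Z]+-\d+\b")


def missing_tags(template, expected_tags):
    """Return the expected tags that never occur in the delexicalized template."""
    used = set(TAG_PATTERN.findall(template))
    return sorted(set(expected_tags) - used)


def find_leftovers(template, references):
    """Return (tag, surface form) pairs whose surface form is still in the template.

    `references` is a list of (tag, surface form) pairs. Plain substring
    matching is used, so very short surface forms can give false positives.
    """
    return [(tag, surface) for tag, surface in references
            if surface and surface in template]


template = (
    "AGENT-1 was born in PATIENT-4 and is from the U.S. . "
    "AGENT-1 graduated in 1955 from PATIENT-3 . "
    "AGENT-1 worked as PATIENT-2 and for NASA in PATIENT-6 . "
    "AGENT-1 spent PATIENT-5 in space and is now retired ."
)
expected = ["AGENT-1"] + ["PATIENT-%d" % i for i in range(1, 8)]

print(missing_tags(template, expected))
print(find_leftovers(template, [("PATIENT-1", "U.S."), ("PATIENT-7", "retired")]))
```

Both checks are cheap enough to run over every entry, and anything they report would still need a manual look before being treated as an error.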
Found on
Category="WrittenWork" lid="Id1", size="5"
https://github.com/ThiagoCF05/webnlg/blob/master/data/v1.5/en/dev/5triples/WrittenWork.xml#L857
Category="SportsTeam" lid="Id1", size="2"
https://github.com/ThiagoCF05/webnlg/blob/master/data/v1.5/en/dev/2triples/SportsTeam.xml#L510
Dear authors,
I really like your enriched WebNLG, and admire your efforts on updating it to the newest v1.5! The XML format can make it hard for people who want to get into the dataset quickly. Both transforming the format into a more user-friendly Python dictionary and cleaning the dataset take meticulous effort.
I made this data reader for my own research project: WebNLG Reader. I would like to share it with you to help spread your work. Future Python programmers who want to use your dataset can adapt my code and get their projects started more easily.
All in all, great work :D!
In the test files, each triple is contained inside a instead of a .
When scraping the file this becomes problematic.
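One possible workaround while the files are inconsistent is to read triples without relying on the element name at all. The sketch below is an assumption-heavy illustration: the tag names in the sample (`tripleA`, `tripleB`, `tripleset`) are placeholders, not names from the dataset, and the only thing it relies on is the `subject | predicate | object` text layout visible in the README example.

```python
import xml.etree.ElementTree as ET


def read_triples(tripleset):
    """Collect (subject, predicate, object) from every child of `tripleset`.

    The child tag name is ignored on purpose, so the same code works
    whichever element name a given file uses for its triples.
    """
    triples = []
    for child in tripleset:
        text = child.text or ""
        # Keep only children that look like "subject | predicate | object".
        if text.count("|") == 2:
            subj, pred, obj = (part.strip() for part in text.split("|"))
            triples.append((subj, pred, obj))
    return triples


# Placeholder tag names; only the text layout matters here.
sample = (
    "<tripleset>"
    "<tripleA>11th_Mississippi_Infantry_Monument | established | 2000</tripleA>"
    "<tripleB>11th_Mississippi_Infantry_Monument | category | Contributing_property</tripleB>"
    "</tripleset>"
)
print(read_triples(ET.fromstring(sample)))
```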
Thank you for making the WebNLG dataset with the alignment available!
We would like to align sentences in the original text and the triples in sortedtripleset.
Is there a function/procedure which replicates the segmentation perfectly?
Here is the example from the README to ground what I mean by the original text and sortedtripleset.
...
<lex comment="good" lid="Id1">
<!-- ordered tripleset segmented in sentences -->
<sortedtripleset>
<sentence ID="1">
<striple>11th_Mississippi_Infantry_Monument | location | Adams_County,_Pennsylvania</striple>
</sentence>
<sentence ID="2">
<striple>11th_Mississippi_Infantry_Monument | established | 2000</striple>
<striple>11th_Mississippi_Infantry_Monument | category | Contributing_property</striple>
</sentence>
</sortedtripleset>
<!-- extracted referring expressions -->
<references>
<reference entity="11th_Mississippi_Infantry_Monument" number="1" tag="AGENT-1" type="description">The 11th Mississippi Infantry Monument</reference>
<reference entity="Adams_County,_Pennsylvania" number="2" tag="PATIENT-1" type="name">Adams County , Pennsylvania</reference>
<reference entity="11th_Mississippi_Infantry_Monument" number="3" tag="AGENT-1" type="pronoun">It</reference>
<reference entity="2000" number="4" tag="PATIENT-2" type="name">2000</reference>
<reference entity="Contributing_property" number="5" tag="PATIENT-3" type="name">contributing property</reference>
</references>
<!-- original text -->
<text>
The 11th Mississippi Infantry Monument which is located in Adams County, Pennsylvania. It was established in 2000 and falls under the category of contributing property.
</text>
...
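To make the question concrete, here is a minimal sketch of one way such an alignment could be attempted. It does not replicate the authors' segmentation, which is not described here. The splitter is a naive regular expression (an assumption that breaks on abbreviations such as "U.S."), and `split_sentences` and `align` are hypothetical helper names. The XML is the README example above, trimmed to the two elements the code reads.

```python
import re
import xml.etree.ElementTree as ET

LEX_XML = """<lex comment="good" lid="Id1">
  <sortedtripleset>
    <sentence ID="1">
      <striple>11th_Mississippi_Infantry_Monument | location | Adams_County,_Pennsylvania</striple>
    </sentence>
    <sentence ID="2">
      <striple>11th_Mississippi_Infantry_Monument | established | 2000</striple>
      <striple>11th_Mississippi_Infantry_Monument | category | Contributing_property</striple>
    </sentence>
  </sortedtripleset>
  <text>The 11th Mississippi Infantry Monument which is located in Adams County, Pennsylvania. It was established in 2000 and falls under the category of contributing property.</text>
</lex>"""


def split_sentences(text):
    """Naive splitter: break after . ! or ? when a capital letter follows."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())


def align(lex):
    """Pair each text sentence with the striples of the same position.

    Returns None when the sentence counts differ, because a positional
    pairing would then be unreliable.
    """
    groups = [
        [striple.text.strip() for striple in sentence.findall("striple")]
        for sentence in lex.find("sortedtripleset").findall("sentence")
    ]
    sentences = split_sentences(lex.findtext("text"))
    if len(groups) != len(sentences):
        return None
    return list(zip(sentences, groups))


for sentence, triples in align(ET.fromstring(LEX_XML)):
    print(sentence)
    for triple in triples:
        print("   ", triple)
```

Returning None on a count mismatch keeps the entries where the naive splitter disagrees with sortedtripleset visible, instead of silently misaligning them.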
Some entities contain apostrophes in their original form (e.g., "Hook_'em_(mascot)"), but are represented without this symbol in the tags and .
Example:
https://github.com/ThiagoCF05/webnlg/blob/master/data/v1.5/en/dev/1triples/Astronaut.xml#L460
Reported by @abevieiramota.
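A check for this kind of mismatch could be sketched as follows. The function names are hypothetical, and the inputs are plain lists of entity strings. How those lists are extracted from the XML is left out, since the affected tag names are not preserved in this report. The second entity in the sample comes from the README example and is included only as a non-matching case.

```python
def strip_apostrophes(name):
    """Return the entity name with all apostrophes removed."""
    return name.replace("'", "")


def find_apostrophe_mismatches(original_entities, tagged_entities):
    """Return (original, tagged) pairs that differ only by dropped apostrophes."""
    originals = set(original_entities)
    # Map the apostrophe-free spelling back to the original entity.
    by_stripped = {strip_apostrophes(e): e for e in originals if "'" in e}
    return [
        (by_stripped[tagged], tagged)
        for tagged in tagged_entities
        if tagged in by_stripped and tagged not in originals
    ]


originals = ["Hook_'em_(mascot)", "11th_Mississippi_Infantry_Monument"]
tagged = ["Hook_em_(mascot)", "11th_Mississippi_Infantry_Monument"]
print(find_apostrophe_mismatches(originals, tagged))
```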