webnlg's People

Contributors

abevieiramota, thiagocf05

webnlg's Issues

Delexicalized Test Set

The full test set including references was released a few months ago.
I think it would be beneficial, for completeness' sake, to delexicalize that data as well.
Reasoning:

  • It would be a better test set for evaluating referring expressions, for both seen and unseen entities.
  • For systems that decouple referring expression generation from sentence realization, it would make testing individual parts of the system simpler.

Coverage tests

Thanks for doing all this.

I have a question about coverage: did you test the coverage of your manual work?

Looking at train, 7triplets (the first file I opened), I see in the first sentence:

AGENT-1 was born in PATIENT-4 and is from the U.S. . AGENT-1 graduated in 1955 from PATIENT-3 . AGENT-1 worked as PATIENT-2 and for NASA in PATIENT-6 . AGENT-1 spent PATIENT-5 in space and is now retired .

"U.S." should be replaced with PATIENT-1, the entire "in 1955 from UT Austin with a B.S" should be replaced with PATIENT-3, and "retired" should be replaced with PATIENT-7.

Would you say that these kinds of problems are to be expected? Did you run any coverage test to make sure nothing was left out? (For two of the cases above, an automated test could catch them.)
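
One way such an automated check could look is sketched below. It only catches tags that are missing from a template altogether, which covers the PATIENT-1 and PATIENT-7 cases above but not the partially replaced PATIENT-3 span. The tag pattern, the `missing_tags` helper, and the expected tag list (AGENT-1 plus PATIENT-1 through PATIENT-7, inferred from this issue rather than read from the dataset) are all assumptions made for illustration.

```python
import re

# Assumed tag shape, based only on the tags visible in this issue.
TAG_PATTERN = re.compile(r"\b(?:AGENT|PATIENT)-\d+\b")

def missing_tags(template, expected_tags):
    """Return the expected tags that never occur in the delexicalized template."""
    found = set(TAG_PATTERN.findall(template))
    return sorted(set(expected_tags) - found)

# Template quoted in the issue above.
TEMPLATE = (
    "AGENT-1 was born in PATIENT-4 and is from the U.S. . "
    "AGENT-1 graduated in 1955 from PATIENT-3 . "
    "AGENT-1 worked as PATIENT-2 and for NASA in PATIENT-6 . "
    "AGENT-1 spent PATIENT-5 in space and is now retired ."
)

# Hypothetical expectation: one agent and seven patients for a 7-triple entry.
EXPECTED = ["AGENT-1"] + [f"PATIENT-{i}" for i in range(1, 8)]

print(missing_tags(TEMPLATE, EXPECTED))
```

Running a check like this over every entry would give a list of templates to review by hand; it says nothing about whether the tags that are present cover the right span of text.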

Nice Work! And here I made a Python Reader :).

Dear authors,

I really like your enriched WebNLG, and admire your efforts on updating it to the newest v1.5! Sometimes it is hard for people who want to get into the dataset quickly because of the XML format. Both transforming the format into a more user-friendly Python dictionary and cleaning the dataset require meticulous effort.

I made this data reader for my own research project: WebNLG Reader. I would like to share it with you to help spread your work. Future Python programmers who want to use your dataset can adapt my code and get their projects started more easily.

All in all, great work :D!
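
To give a rough idea of what turning the XML into a Python dictionary involves, here is a minimal sketch using only the standard library. It is not the WebNLG Reader mentioned above. The `lex_to_dict` helper and the dictionary keys are hypothetical, and the sample is an abridged `<lex>` entry in the shape of the README example, with a closing tag added so it parses on its own.

```python
import xml.etree.ElementTree as ET

# Abridged, self-contained sample in the shape of the README example.
SAMPLE = """<lex comment="good" lid="Id1">
  <references>
    <reference entity="11th_Mississippi_Infantry_Monument" number="1" tag="AGENT-1" type="description">The 11th Mississippi Infantry Monument</reference>
    <reference entity="Adams_County,_Pennsylvania" number="2" tag="PATIENT-1" type="name">Adams County , Pennsylvania</reference>
  </references>
  <text>The 11th Mississippi Infantry Monument which is located in Adams County, Pennsylvania.</text>
</lex>"""

def lex_to_dict(lex):
    """Convert one <lex> element into a plain dictionary (hypothetical layout)."""
    return {
        "lid": lex.get("lid"),
        "comment": lex.get("comment"),
        "text": (lex.findtext("text") or "").strip(),
        "references": [
            # Keep every XML attribute and add the surface form as "surface".
            {**ref.attrib, "surface": (ref.text or "").strip()}
            for ref in lex.iterfind("references/reference")
        ],
    }

entry = lex_to_dict(ET.fromstring(SAMPLE))
print(entry["references"][1]["tag"], "->", entry["references"][1]["surface"])
```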

Nice dataset! Question regarding sentence segmentation in the alignment between "sortedtripleset" and the original text

Thank you for making the WebNLG dataset with the alignment available!

We would like to align the sentences in the original text with the triples in sortedtripleset.

Is there a function or procedure that replicates the segmentation exactly?

Here is the example from the README to ground what I mean by the original text and sortedtripleset.

...
<lex comment="good" lid="Id1">
        <!-- ordered tripleset segmented in sentences -->
        <sortedtripleset>
            <sentence ID="1">
                <striple>11th_Mississippi_Infantry_Monument | location | Adams_County,_Pennsylvania</striple>
            </sentence>
            <sentence ID="2">
                <striple>11th_Mississippi_Infantry_Monument | established | 2000</striple>
                <striple>11th_Mississippi_Infantry_Monument | category | Contributing_property</striple>
            </sentence>
        </sortedtripleset>
        <!-- extracted referring expressions -->
        <references>
            <reference entity="11th_Mississippi_Infantry_Monument" number="1" tag="AGENT-1" type="description">The 11th Mississippi Infantry Monument</reference>
            <reference entity="Adams_County,_Pennsylvania" number="2" tag="PATIENT-1" type="name">Adams County , Pennsylvania</reference>
            <reference entity="11th_Mississippi_Infantry_Monument" number="3" tag="AGENT-1" type="pronoun">It</reference>
            <reference entity="2000" number="4" tag="PATIENT-2" type="name">2000</reference>
            <reference entity="Contributing_property" number="5" tag="PATIENT-3" type="name">contributing property</reference>
        </references>
        <!-- original text -->
        <text>
            The 11th Mississippi Infantry Monument which is located in Adams County, Pennsylvania. It was established in 2000 and falls under the category of contributing property.
        </text>
...
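
For illustration, a naive version of this alignment could look like the sketch below. It does not replicate the dataset's own segmentation. It splits the text on sentence-final punctuation with a regular expression, which is an assumption and will break on abbreviations such as "U.S.", and it pairs the result with the `<sentence>` groups only when the counts match. The `split_sentences` and `align` helpers are hypothetical, and the sample repeats the README excerpt with a closing `</lex>` added so that it parses.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<lex comment="good" lid="Id1">
  <sortedtripleset>
    <sentence ID="1">
      <striple>11th_Mississippi_Infantry_Monument | location | Adams_County,_Pennsylvania</striple>
    </sentence>
    <sentence ID="2">
      <striple>11th_Mississippi_Infantry_Monument | established | 2000</striple>
      <striple>11th_Mississippi_Infantry_Monument | category | Contributing_property</striple>
    </sentence>
  </sortedtripleset>
  <text>
    The 11th Mississippi Infantry Monument which is located in Adams County, Pennsylvania. It was established in 2000 and falls under the category of contributing property.
  </text>
</lex>"""

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def align(lex):
    """Pair each text sentence with the triples of the matching <sentence> group."""
    groups = [
        [tuple(part.strip() for part in st.text.split("|"))
         for st in sent.iterfind("striple")]
        for sent in lex.iterfind("sortedtripleset/sentence")
    ]
    sentences = split_sentences(lex.findtext("text") or "")
    if len(groups) != len(sentences):
        # The naive splitter disagrees with the annotation; do not guess.
        raise ValueError(f"{len(sentences)} sentences vs {len(groups)} triple groups")
    return list(zip(sentences, groups))

pairs = align(ET.fromstring(SAMPLE))
for sentence, triples in pairs:
    print(sentence, triples)
```

Entries where the counts disagree raise an error instead of being paired arbitrarily, so they can be collected and inspected separately.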
