Comments (3)

amir-zeldes avatar amir-zeldes commented on August 29, 2024

Thanks for reporting - some of these are definitely wrong; it seems mostly in the XML speaker list, rather than the coref clustering. This is especially true for GUM_conversation_christmas, which is very complex. I was able to fix some of these in 85b042d now.

However, the full zip files contain too many false positives to go over manually, esp. since, as you guessed, generic 'you' (meaning "one") is not clustered with actual referential 'you'. Is there some way you could produce a narrower list of candidates for any further errors? At a minimum, I think "you know" should be assumed to be generic unless it is followed by a complement; this is almost always true.
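As a rough illustration of the "you know" heuristic, here is a minimal sketch in Python. It is not the script used for the zip files in this thread; the function name `candidate_you_indices` and the word list `COMPLEMENT_STARTERS` are hypothetical, and checking the next token against a fixed word list is only a crude stand-in for real complement detection.

```python
# Hypothetical list of words that may introduce a complement of "know".
# This is an assumption for illustration, not a linguistic inventory.
COMPLEMENT_STARTERS = {
    "that", "what", "how", "why", "where", "who", "when", "if", "whether",
}


def candidate_you_indices(tokens):
    """Return indices of 'you' tokens kept as possibly referential.

    A 'you' directly followed by 'know' is treated as generic and dropped,
    unless the token after 'know' looks like the start of a complement.
    """
    lowered = [t.lower() for t in tokens]
    keep = []
    for i, tok in enumerate(lowered):
        if tok != "you":
            continue
        followed_by_know = i + 1 < len(lowered) and lowered[i + 1] == "know"
        if followed_by_know:
            has_complement = (
                i + 2 < len(lowered) and lowered[i + 2] in COMPLEMENT_STARTERS
            )
            if not has_complement:
                continue  # discourse-marker "you know": assume generic
        keep.append(i)
    return keep


if __name__ == "__main__":
    print(candidate_you_indices("It was , you know , cold".split()))
    print(candidate_you_indices("Do you know what he said".split()))
```

This only narrows the candidate list; it cannot tell generic from referential 'you' in other contexts, and it does nothing about indirect speech.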

BTW if you're curious, you can find the original speaker info for the conversation data in the Santa Barbara Corpus, for example: https://www.linguistics.ucsb.edu/sites/secure.lsit.ucsb.edu.ling.d7/files/sitefiles/research/SBC/SBC048.trn . The addressee info was filled in by annotators based on their understanding of the conversation.


kybersutr avatar kybersutr commented on August 29, 2024

I've filtered out the "you know" occurrences:
GUM_speaker_inverse2.zip
GUM_speaker2.zip

Also, in the speaker_inverse, I've removed the entities containing multiple people.

However, I don't know of any simple way to differentiate between generic and referential 'you'. Also, there is sometimes indirect speech in the text, which generates some more false positives and which I cannot detect either.


amir-zeldes avatar amir-zeldes commented on August 29, 2024

This took a while to get to, but the speaker inverse cases should now be resolved in the source files. The compiled files in the remaining formats incl. conllu will propagate on the next release. Thanks again for reporting the issue!
