speech-nlp-datasets

Contains links to publicly available datasets for modeling various health outcomes using speech and language.

Speech-based Corpora

TalkBank Project

  • [Corpus] CHILDES Database
    Contains speech from children with a range of conditions (e.g. autism, Down syndrome, hearing impairment) across many languages (e.g. English, Dutch, Greek, Mandarin).
    MacWhinney, B. (2014). The CHILDES project: Tools for analyzing talk, Volume II: The database. Psychology Press.

  • [Corpus] DementiaBank (from TalkBank)
    Contains recordings of individuals with dementia across different languages. It includes around 400 subjects; the most notable subset, both for its size and for including control subjects, is:

    • English Pitt: Longitudinal neuropsychological assessments of 319 subjects (dementia + control) performing the Cookie Theft, Word Fluency, Story Recall, and Sentence Construction tasks (Becker et al., 1994).
  • [Corpus] Clinical TalkBank
    In addition to DementiaBank, TalkBank contains:

    • RHDBank: individuals with right-hemisphere disorder
    • TBIBank: individuals with traumatic brain injury
    • AphasiaBank: individuals with aphasia, a communication disorder that affects the ability to speak, write, and understand language, caused by damage to the language areas of the brain.
    • FluencyBank: individuals with disfluent speech, whether from learning a second language or from stuttering.
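TalkBank corpora such as DementiaBank are distributed as CHAT transcripts (`.cha` files). In CHAT, each main-tier utterance line starts with `*` plus a speaker code and a colon (e.g. `*PAR:` for the participant and `*INV:` for the investigator in the Pitt corpus), header lines start with `@`, and dependent tiers start with `%`. The minimal sketch below groups utterances by speaker so that, for example, only participant speech is kept for modeling. The function name `extract_utterances` is hypothetical, and the sketch assumes that a tab-indented line continues the preceding line. It does not strip CHAT annotation codes or media time markers; TalkBank's own CLAN tools handle the full format.

```python
import re
from typing import Dict, List, Optional

# Main-tier lines look like "*PAR:\tthe boy is on the stool ." (speaker code, colon, tab, text).
SPEAKER_LINE = re.compile(r"^\*([A-Z0-9]+):\s*(.*)$")


def extract_utterances(chat_text: str) -> Dict[str, List[str]]:
    """Hypothetical helper: group CHAT main-tier utterances by speaker code (e.g. PAR, INV)."""
    utterances: Dict[str, List[str]] = {}
    current: Optional[str] = None  # speaker of the main-tier line being continued, if any
    for line in chat_text.splitlines():
        match = SPEAKER_LINE.match(line)
        if match:
            current = match.group(1)
            utterances.setdefault(current, []).append(match.group(2).strip())
        elif line.startswith("\t") and current is not None:
            # Assumption: a tab-indented line continues the previous main-tier utterance.
            utterances[current][-1] += " " + line.strip()
        else:
            # Headers (@...), dependent tiers (%...) and blank lines end the current utterance,
            # so tab-indented continuations of dependent tiers are ignored.
            current = None
    return utterances


# Usage on a tiny hand-written transcript (not taken from DementiaBank).
sample = (
    "@Begin\n"
    "*INV:\tjust tell me what's happening .\n"
    "*PAR:\tthe boy is on the stool\n"
    "\tand it's falling .\n"
    "%mor:\tdet|the n|boy\n"
    "@End\n"
)
participant_speech = extract_utterances(sample).get("PAR", [])
```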

Text-based Corpora

  • [Corpus] Reddit Self-reported Depression Diagnosis (RSDD) dataset
    Contains Reddit posts from ~9,000 users who self-report a depression diagnosis and ~107,000 control users (Yates et al., 2017).
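With roughly 107,000 control users against ~9,000 users with depression (about 12 controls per positive user), a classifier trained on RSDD can look accurate simply by predicting "control" for everyone. One common remedy, shown here as a generic sketch and not as the dataset authors' method, is to weight each class inversely to its frequency, w_c = N / (K · n_c), which is the formula behind scikit-learn's `class_weight="balanced"`. The counts below are the approximate figures quoted above, not exact split sizes, and `balanced_class_weights` is a hypothetical helper name.

```python
from typing import Dict


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Inverse-frequency weights w_c = N / (K * n_c); each class then carries N / K total weight."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


# Approximate RSDD user counts from the description above (not exact dataset figures).
rsdd_counts = {"depression": 9_000, "control": 107_000}
weights = balanced_class_weights(rsdd_counts)
print({label: round(w, 3) for label, w in weights.items()})
# prints {'depression': 6.444, 'control': 0.542}
```

Each depression example then counts about 12 times as much as a control example in a weighted loss, so both classes contribute equally in aggregate.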

  • [Corpus] MIMIC III (Medical Information Mart for Intensive Care)
    Contains medical details and outcomes for 40,000+ patients (e.g. demographics, vital signs, laboratory tests, medications), as well as 2M+ free-text medical notes written by medical personnel (e.g. physicians, nurses) (Johnson et al., 2016).
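MIMIC-III distributes its free-text notes as the NOTEEVENTS table, which includes `SUBJECT_ID`, `CATEGORY`, `ISERROR` and `TEXT` columns. The sketch below, using only the standard-library `csv` module, groups note text per patient and optionally keeps a single note category. The column names are assumed to match the NOTEEVENTS CSV export (check them against your copy), `notes_by_patient` is a hypothetical helper, and the sample rows are invented for illustration. When reading the real file, open it with `newline=""` because note text spans multiple lines inside quoted fields.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Optional, TextIO


def notes_by_patient(handle: TextIO, category: Optional[str] = None) -> Dict[str, List[str]]:
    """Hypothetical helper: group note TEXT by SUBJECT_ID, optionally keeping one CATEGORY."""
    notes: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(handle):
        # Assumption: ISERROR == "1" marks a note flagged as entered in error; skip those.
        if row.get("ISERROR", "").strip() == "1":
            continue
        if category is not None and row["CATEGORY"].strip() != category:
            continue
        notes[row["SUBJECT_ID"]].append(row["TEXT"])
    return dict(notes)


# Invented sample rows in the assumed column layout (not real MIMIC-III data).
sample = (
    "SUBJECT_ID,CATEGORY,ISERROR,TEXT\n"
    '1,Nursing,,"Pt resting.\nVitals stable."\n'
    '1,Discharge summary,,"Discharged home."\n'
    '2,Nursing,1,"Entered in error."\n'
)
all_notes = notes_by_patient(io.StringIO(sample))
discharge_only = notes_by_patient(io.StringIO(sample), category="Discharge summary")
```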

  • i2b2/UTHealth NLP Task (contact authors for corpus?)
    Contains emergency medical records for 296 patients at Partners HealthCare, along with discharge notes and correspondence between medical personnel. Kumar et al. (2014) describe how the data were processed, and Stubbs et al. (2014) describe the 2014 task of identifying risk factors for heart disease over time.

  • Nun Study (contact authors for corpus?)
    Diaries of 93 nuns, used to evaluate cognitive impairment (Alzheimer's disease) in later life. Also contains neuropsychological test results and autopsy information (Snowdon et al., 1996).

Contributors

mak-sim, talhanai