The ScriptBase Corpus

If you use this data, please cite either or both of the following papers:

If you are using ScriptBase-alpha:

Philip John Gorinski and Mirella Lapata (2015). Movie Script Summarization as Graph-based Scene Extraction. In Proceedings of NAACL-HLT 2015, Denver, Colorado, USA.

If you are using ScriptBase-J:

Philip John Gorinski and Mirella Lapata (2018). What's this Movie about? A Joint Neural Network Architecture for Movie Content Analysis. In Proceedings of NAACL-HLT 2018, New Orleans, Louisiana, USA.

The ScriptBase Corpus is split into two parts:

ScriptBase-alpha: The first crawl of movie scripts

ScriptBase-J: Additional metadata from Jinni

ScriptBase-alpha can be found in the scriptbase_alpha folder.

It contains .tar.gz archives with the following data for 1,276 movies:

  • script.htm / script.html - the original HTML script, present when the script was crawled from a web page in HTML format
  • script.txt - plain-text version of the movie script
  • wiki.html - the movie's Wikipedia[1] page (2014 dump)
  • imdb.html - the movie's main IMDB[2] page (2014 dump)
  • keywords.html - the movie's IMDB keywords page
  • credits.html - the movie's IMDB credits page
  • summary.html - the movie's IMDB summaries page
  • synopsis.html - the movie's IMDB synopsis page
  • taglines.html - the movie's IMDB taglines page
  • processed/imdb_meta - metadata extracted from IMDB
  • processed/logTag.txt - the movie's log line(s) and tag line(s), if it has any
  • processed/wikiplot.txt - plain-text version of Wikipedia's plot section for the movie
  • processed/summaries/ - folder containing plain-text versions of the movie's IMDB summaries (if any)
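
The files listed above can be read straight from an archive without unpacking it to disk. The following is a minimal sketch using Python's standard tarfile module. The helper name read_member is hypothetical, and because the directory layout inside each archive is not described here, the sketch matches members by path suffix rather than assuming an exact path.

```python
import tarfile
from typing import Optional


def read_member(archive_path: str, suffix: str, encoding: str = "utf-8") -> Optional[str]:
    """Return the text of the first regular file whose path ends with `suffix`.

    Hypothetical helper: the internal archive layout is an assumption,
    so members are matched by suffix instead of by an exact path.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                handle = tar.extractfile(member)
                if handle is None:
                    continue
                # Undecodable bytes are replaced rather than raising an error.
                return handle.read().decode(encoding, errors="replace")
    # No matching file in this archive.
    return None
```

For example, read_member(path, "processed/wikiplot.txt") would return the plain-text Wikipedia plot for the movie stored in the archive at path, or None if that archive has no such file. The text encoding of the corpus files is not stated, so UTF-8 is only a default guess.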

ScriptBase-J can be found in the scriptbase_jinni folder.

It contains .tar.gz archives with the following additional data for 917 movies:

  • jinni.html - the movie's Jinni[3] page (2015 dump)
  • processed/script_clean.txt - plain-text version of the movie script, manually corrected for inconsistencies
  • processed/script.xml - XML version of the movie script, with various automatic annotations
  • processed/profile.txt - Jinni's movie profile in plain-text format
  • processed/genes.txt - all Jinni genes (attribute-value pairs) for the movie
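
The annotation schema of processed/script.xml is not documented here, so a reasonable first step is to inspect which element names the file actually contains. The sketch below counts element tags with Python's standard xml.etree.ElementTree module. The function name count_tags is hypothetical, and nothing about the real tag names is assumed.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_bytes: bytes) -> Counter:
    """Count how often each element tag occurs in an XML document.

    Hypothetical helper for exploring an undocumented schema; it takes
    the raw bytes of the file so the parser can honor any encoding
    declaration in the XML header.
    """
    root = ET.fromstring(xml_bytes)
    # root.iter() walks the root element and all of its descendants.
    return Counter(element.tag for element in root.iter())
```

Reading the file in binary mode and passing the bytes to count_tags gives a quick overview of the annotation layers before writing a schema-specific parser.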

References

[1] https://en.wikipedia.org/

[2] https://www.imdb.com/

[3] http://www.jinni.com/
