
size-of-an-x's Introduction

Looking at the changing frequencies of different size analogies in Google Books ngram data. For more context, check out my blog post.

sizeofthings.csv has a processed version of the data, in case you don't want to download several GB of raw ngram data and run the various munging scripts. The raw_total column is the raw number of times "size of $token" appeared in books from 1800-2008 (it is occasionally fractional because of some trigram hacks described in the appendix of the blog post). The total column and the per-century total columns are normalized by the number of words scanned per year.
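To make "normalized by the number of words scanned per year" concrete, here is a minimal sketch of one way such a total could be computed. It is not taken from the repo's scripts: the function name normalized_total, the dictionary inputs, and the choice to sum per-year relative frequencies are all assumptions made for illustration, and the numbers are made up.

```python
def normalized_total(match_counts, words_per_year, start=1800, end=2008):
    """Sum per-year relative frequencies (matches / words scanned) over a year range.

    Both arguments are hypothetical inputs: dicts mapping year -> count.
    """
    total = 0.0
    for year, count in match_counts.items():
        words = words_per_year.get(year)
        # Skip years outside the range or with no word count to divide by.
        if start <= year <= end and words:
            total += count / words
    return total


# Made-up numbers, purely for illustration.
counts = {1850: 2, 1950: 30, 2010: 99}
words = {1850: 1_000_000, 1950: 10_000_000, 2010: 50_000_000}

overall = normalized_total(counts, words)                 # 1800-2008 only
nineteenth = normalized_total(counts, words, 1800, 1899)  # a per-century total
print(overall, nineteenth)
```

Dividing each year's count by that year's corpus size before summing keeps heavily scanned years from dominating the total, which is presumably the point of the normalization.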

Code is pretty messy. Happy to explain/document it on request. If you're interested in extending/reusing some of this work, just drop me a line.

Rough steps for end-to-end repro:

  • Download the 'si' 4grams and 5grams files and unzip them
    • Also download the "as" 5grams and grep for "as a man 's X" and "as a grain of X" into asamans.tsv and asagrain.tsv. These are used by disambig.py, which is called by make_df.py.
  • Make sizeof.tsv and sizeof_5.tsv by grepping for '^size of an? ' (grep.sh)
  • Run make_df.py
  • Do stuff with the resulting dataframe/csv thing (see ipython notebooks)
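The grep step above can be sketched in Python. This is not the repo's grep.sh or make_df.py: the function name raw_totals is hypothetical, and the sketch assumes each ngram line is tab-separated as ngram, year, match count, volume count, which should be checked against the files you actually download. The sample lines are invented.

```python
import re
from collections import Counter

# Same pattern as the grep step: "size of a " or "size of an " at line start.
SIZE_OF = re.compile(r"^size of an? ")


def raw_totals(lines, start=1800, end=2008):
    """Sum match counts per token for lines that start with 'size of a(n) '."""
    totals = Counter()
    for line in lines:
        if not SIZE_OF.match(line):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # malformed line under the assumed layout
        ngram, year, match_count = fields[0], int(fields[1]), int(fields[2])
        if start <= year <= end:
            # Whatever follows "size of a(n) " is the thing being compared to.
            token = SIZE_OF.sub("", ngram)
            totals[token] += match_count
    return totals


# Invented sample lines in the assumed format.
sample = [
    "size of a pea\t1900\t5\t4\n",
    "size of a pea\t1950\t7\t6\n",
    "size of an egg\t1900\t3\t3\n",
    "size of the room\t1900\t9\t9\n",
    "size of a pea\t2012\t100\t50\n",
]
totals = raw_totals(sample)
print(totals)
```

For 5grams the remainder after the prefix is two tokens rather than one, so a real pipeline needs extra handling there.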

size-of-an-x's People

Contributors

colinmorris
