
ngramr

R package to query the Google Ngram Viewer

The package has been updated to deal with the change to Google's website.

Note: with the switch to using RCurl to access SSL pages, ngramr will generally no longer work behind a proxy.

The Google Books Ngram Viewer allows you to enter a list of phrases and then displays a graph showing how often the phrases have occurred in a large corpus of books (e.g., "British English", "English Fiction", "French") over time. The current corpus collected in 2012 contains almost half a trillion words for English alone.

The underlying data is hidden in the Web page, embedded in some JavaScript. This package extracts the data and provides it in the form of an R data frame. The code was adapted from a handy Python script, written by Jean-Baptiste Michel, that is available from Culturomics.
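To illustrate the general idea of the extraction step, here is a minimal base R sketch. It is not ngramr's code. It assumes, hypothetically, that the page embeds the series as a single JavaScript assignment of the form `var data = [...];`, with one `ngram` name and one `timeseries` array per phrase. Google's real page layout has changed over time. The helper `extract_ngram_data()` and the toy `html` string are made up for illustration. The column names `Year`, `Phrase` and `Frequency` mirror those used in the ggplot2 example below.

```r
# Toy stand-in for a downloaded Ngram Viewer page (hypothetical format;
# the real page layout has changed over time)
html <- paste0(
  "<script type=\"text/javascript\">\n",
  "var data = [{\"ngram\": \"hacker\", \"timeseries\": [1e-07, 2e-07, 3e-07]}, ",
  "{\"ngram\": \"programmer\", \"timeseries\": [4e-07, 5e-07, 6e-07]}];\n",
  "</script>"
)

# Hypothetical helper, not part of ngramr: returns one row per phrase and year
extract_ngram_data <- function(html, year_start) {
  # Isolate the array literal assigned to `data` in the embedded script
  m <- regmatches(html, regexec("var data = (.*?);\n", html, perl = TRUE))[[1]]
  if (length(m) < 2) stop("no embedded ngram data found")
  js <- m[2]

  # Phrase names, in the order they appear
  phrases <- regmatches(js, gregexpr("\"ngram\": \"[^\"]*\"", js, perl = TRUE))[[1]]
  phrases <- sub("\"ngram\": \"([^\"]*)\"", "\\1", phrases, perl = TRUE)

  # Matching comma-separated frequency series, one per phrase
  series <- regmatches(js, gregexpr("\"timeseries\": \\[[^\\]]*\\]", js, perl = TRUE))[[1]]
  series <- sub("\"timeseries\": \\[([^\\]]*)\\]", "\\1", series, perl = TRUE)

  # Stack each phrase's series into one long data frame, one year per value
  rows <- Map(function(p, s) {
    freq <- as.numeric(trimws(strsplit(s, ",")[[1]]))
    data.frame(Year = year_start + seq_along(freq) - 1,
               Phrase = p, Frequency = freq,
               stringsAsFactors = FALSE)
  }, phrases, series)
  out <- do.call(rbind, unname(rows))
  rownames(out) <- NULL
  out
}

ng <- extract_ngram_data(html, year_start = 1950)
head(ng)
```

Because the sketch relies on regular expressions rather than a JavaScript or JSON parser, it only copes with this flat, well-behaved layout; a more robust scraper would hand the extracted array to a JSON parser instead.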

Installing

This package requires R version 2.15 or higher. If you are using an older version of R you will be prompted to upgrade when you try to install the package, so you may as well upgrade now!

The official release of ngramr is available on CRAN. To install from CRAN, use the following command:

install.packages('ngramr')

If you have any problems installing the package on OSX, try installing from source:

install.packages("ngramr", type="source")

If you have devtools installed, you can install the latest stable version of this package directly from GitHub:

require(devtools)
# Current devtools expects the repository as "username/repo"
install_github("seancarmody/ngramr")
require(ngramr)

and if you are feeling a little more adventurous, you can install the development version:

install_github("seancarmody/ngramr", ref = "develop")

although it may not always work.

If you are behind a proxy, install_github may not work for you. Instead of fiddling around with the RCurl proxy settings, you can download the ZIP archive and use install_local instead.

Examples

Here is an example of how to use the ngram function:

require(ngramr)
require(ggplot2)
ng  <- ngram(c("hacker", "programmer"), year_start = 1950)
ggplot(ng, aes(x=Year, y=Frequency, colour=Phrase)) +
  geom_line()

The result is a ggplot2 line graph of the following form:

Ngram Chart

The same kind of chart can be produced even more simply with the ggram plotting wrapper, which supports many options, as in this example:

Ngram chart, with options

require(ngramr)
require(ggplot2)
ggram(c("monarchy", "democracy"), year_start = 1500, year_end = 2000, 
      corpus = "eng_gb_2012", ignore_case = TRUE, 
      geom = "area", geom_options = list(position = "stack")) + 
      labs(y = NULL)

The colors used by Google Ngram are available through the google_theme option, as in this example posted by Ben Zimmer at Language Log:

Ngram chart, with Google theme

require(ngramr)
require(ggplot2)
ng <- c("((The United States is + The United States has) / The United States)",
      "((The United States are + The United States have) / The United States)")
ggram(ng, year_start = 1800, google_theme = TRUE) +
  theme(legend.direction = "vertical")

Further Reading

For more information, read this Stubborn Mule post and the Google Ngram syntax documentation. If you would rather work with R and SQL on the raw Google Ngram datasets, see this post.

Contributors

briatte, empee584, karafso, seancarmody
