readr

The goal of readr is to provide a fast and friendly way to read tabular data into R. The most important functions are:

  • Read delimited files: read_delim(), read_csv(), read_tsv(), read_csv2().
  • Read fixed width files: read_fwf(), read_table().
  • Read lines: read_lines().
  • Read whole file: read_file().
  • Re-parse existing data frame: type_convert().
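Here is a minimal sketch of the delimited and fixed-width readers applied to small temporary files. The file contents, column names and widths are made up for illustration. The `delim`, `col_names`, `skip` and `col_types` arguments and `fwf_widths()` are documented readr arguments and helpers.

```r
library(readr)

# Hypothetical pipe-delimited file with one junk line before the data
pipe_path <- tempfile(fileext = ".txt")
writeLines(c("exported by some tool", "1|apple", "2|pear"), pipe_path)

# Supply the delimiter, column names, lines to skip and column types explicitly
fruit <- read_delim(pipe_path, delim = "|", col_names = c("id", "fruit"),
                    skip = 1, col_types = "ic")

# Hypothetical tab-separated file with a header row
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("x\ty", "1.5\ta", "2.5\tb"), tsv_path)
tsv <- read_tsv(tsv_path, col_types = "dc")

# Hypothetical fixed-width file: a 3-character id field, then a 5-character name
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("1  alice", "22 bob  "), fwf_path)
people <- read_fwf(fwf_path, fwf_widths(c(3, 5), c("id", "name")),
                   col_types = "ic")
```

By default, `read_fwf()` trims the padding whitespace from each field, so `name` comes back as `"alice"` and `"bob"`.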

Installation

readr is now available from CRAN.

install.packages("readr")

You can try out the dev version with:

# install.packages("devtools")
devtools::install_github("hadley/readr")

Usage

library(readr)
library(dplyr)

mtcars_path <- tempfile(fileext = ".csv")
write_csv(mtcars, mtcars_path)

# Read a csv file into a data frame
read_csv(mtcars_path)
# Read lines into a vector
read_lines(mtcars_path)
# Read whole file into a single string
read_file(mtcars_path)

See vignette("column-types") on how readr parses columns, and how you can override the defaults.
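As a rough sketch of overriding the defaults, the example below writes `mtcars` out as in the Usage section. It then uses `cols()` to force two columns to integer while the rest are still guessed, and `cols_only()` to keep just the listed columns. The choice of columns is arbitrary.

```r
library(readr)

mtcars_path <- tempfile(fileext = ".csv")
write_csv(mtcars, mtcars_path)

# Override the guessed type for two columns; all other columns are still guessed
cars <- read_csv(mtcars_path,
                 col_types = cols(cyl = col_integer(), gear = col_integer()))

# Keep only the listed columns, parsed with the given types
small <- read_csv(mtcars_path,
                  col_types = cols_only(mpg = col_double(), cyl = col_integer()))
```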

Output

read_csv() produces a data frame with the following properties:

  • Characters are never automatically converted to factors (i.e. no more stringsAsFactors = FALSE).

  • Column names are left as is, not munged into valid R identifiers (i.e. there is no check.names = TRUE).

  • The data frame is given class c("tbl_df", "tbl", "data.frame") so if you also use dplyr you'll get an enhanced display.

  • Row names are never set.
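The sketch below reads a small made-up file whose header contains a space, which lets you check these properties directly. The comparison with base `read.csv()` shows the `check.names = TRUE` munging that readr avoids.

```r
library(readr)

path <- tempfile(fileext = ".csv")
writeLines(c("first name,score", "Ada,90", "Grace,85"), path)

df <- read_csv(path, col_types = "cd")
names(df)                # "first name" "score": left as is
class(df$`first name`)   # "character", never "factor"
inherits(df, "tbl_df")   # TRUE, so dplyr gives the enhanced display

# Base R munges the name into a syntactic identifier
base_df <- read.csv(path)
names(base_df)           # "first.name" "score"
```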

Problems

If there are any problems parsing the file, the read_*() function throws a warning telling you how many problems there are. You can then call problems() to get a data frame that gives information about each problem:

df <- read_csv(col_types = "dd", col_names = c("x", "y"), skip = 1, "
1,2
a,b
")
#> Warning message: There were 2 problems. See problems(x) for more details
problems(df)
#>   row col expected actual
#> 1   2   1 a double      a
#> 2   2   2 a double      b

You will probably meet files that you can never load without some manual regexp-based munging in R. Load those columns with col_character(), fix them up as needed, then use type_convert() to re-run the automatic conversion on every character column in the data frame. Alternatively, you can use parse_integer(), parse_numeric(), parse_date(), etc. to parse a single character vector at a time.
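The workflow above can be sketched as follows. The messy price column and the regular expression are hypothetical examples: every column is loaded as character, the problem column is cleaned with `gsub()`, and `type_convert()` then re-guesses the types. The final lines parse single vectors with the `parse_*()` helpers.

```r
library(readr)

# Hypothetical messy file: prices carry a currency symbol and a thousands separator
path <- tempfile(fileext = ".csv")
writeLines(c("item,price", "apple,\"$1,000\"", "pear,$250"), path)

# 1. Load every column as character so nothing fails to parse
raw <- read_csv(path, col_types = cols(.default = col_character()))

# 2. Fix up the problem column with a regular expression
raw$price <- gsub("[$,]", "", raw$price)

# 3. Re-run the automatic type guessing on the character columns
clean <- type_convert(raw)

# Or parse one character vector at a time
ints  <- parse_integer(c("1", "20"))
dbls  <- parse_double(c("1.5", "2"))
dates <- parse_date("2015-01-02")
```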

Compared to base functions

Compared to the corresponding base functions, readr functions:

  • Use a consistent naming scheme for the parameters (e.g. col_names and col_types not header and colClasses).

  • Are much faster (up to 10x faster).

  • Have a helpful progress bar if loading is going to take a while.

  • Work exactly the same way regardless of the current locale. To override the US-centric defaults, use locale().
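As a small illustration of overriding the defaults, the sketch below reads a made-up European-style file that uses ";" as the delimiter and "," as the decimal mark. It passes `locale(decimal_mark = ",")` to `read_delim()`, and the same locale object to a `parse_*()` helper.

```r
library(readr)

# Hypothetical file: ";" separates fields and "," is the decimal mark
path <- tempfile(fileext = ".csv")
writeLines(c("x;y", "1,5;2,25"), path)

eu <- read_delim(path, delim = ";", col_types = "dd",
                 locale = locale(decimal_mark = ","))

# The same locale works with the parse_*() helpers
val <- parse_double("3,75", locale = locale(decimal_mark = ","))
```

The result does not depend on the machine's locale settings, only on the `locale()` object passed in.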

Compared to fread()

data.table has a function similar to read_csv() called fread(). Compared to fread(), readr:

  • Is slower (currently ~1.2-2x slower). If you want absolutely the best performance, use data.table::fread().

  • Has a slightly more sophisticated parser, recognising both doubled ("""") and backslash escapes ("\""). It also lets you read factors and date-times directly from disk.

  • Forces you to supply parameters such as the delimiter, whether or not the file has a header, and how many lines to skip. fread() saves you work by guessing these, and more, automatically.

  • Has a quite different underlying design. readr is designed to be general: dealing with a new type of rectangular data just requires implementing a new tokenizer. fread() is designed to be as fast as possible. fread() is written in pure C, while readr is written in C++ (using Rcpp).
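A minimal sketch of the parser features mentioned above, using made-up one-column files. Doubled quotes are handled by default. Backslash escapes are switched on here through `read_delim()`'s `escape_backslash` and `escape_double` arguments. Factors and date-times are read directly by naming `col_factor()` and `col_datetime()` in `col_types`.

```r
library(readr)

# Doubled quotes inside a quoted field are handled by default
doubled <- tempfile(fileext = ".csv")
writeLines(c("quote", '"She said ""hi"""'), doubled)
a <- read_csv(doubled, col_types = "c")

# Backslash-escaped quotes: enable backslash escapes explicitly
backslashed <- tempfile(fileext = ".csv")
writeLines(c("quote", '"She said \\"hi\\""'), backslashed)
b <- read_delim(backslashed, delim = ",", col_types = "c",
                escape_backslash = TRUE, escape_double = FALSE)

# Read a factor and a date-time straight from the file
typed <- tempfile(fileext = ".csv")
writeLines(c("fruit,when",
             "apple,2015-01-02T10:00:00",
             "pear,2015-01-03T11:30:00"), typed)
typed_df <- read_csv(typed, col_types = cols(
  fruit = col_factor(levels = c("apple", "pear")),
  when  = col_datetime()
))
```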

Acknowledgements

Thanks to:

  • Joe Cheng for showing me the beauty of deterministic finite automata for parsing, and for teaching me why I should write a tokenizer.

  • JJ Allaire for helping me come up with a design that makes very few copies, and is easy to extend.

  • Dirk Eddelbuettel for coming up with the name!
