foca / csvreader

This project forked from pedrozath/csvreader

csvreader library / gem - read tabular data in the comma-separated values (csv) format the right way (uses best practices out-of-the-box with zero-configuration)

License: Creative Commons Zero v1.0 Universal

csvreader - read tabular data in the comma-separated values (csv) format the right way (uses best practices out-of-the-box with zero-configuration)

Usage

require 'csvreader'

line = "1,2,3"
values = CsvReader.parse_line( line )
pp values
# => ["1","2","3"]

or use the convenience helpers:

txt = <<TXT
1,2,3
4,5,6
TXT

records = CsvReader.parse( txt )
pp records
# => [["1","2","3"],
#     ["4","5","6"]]

# -or-

records = CsvReader.read( "values.csv" )
pp records
# => [["1","2","3"],
#     ["4","5","6"]]

# -or-

CsvReader.foreach( "values.csv" ) do |rec|
  pp rec
end
# => ["1","2","3"]
# => ["4","5","6"]

What about headers?

Use the CsvHashReader if the first line is a header row (or, if the header row is missing, pass in the headers as an array) and you want your records as hashes instead of arrays of strings. Example:

txt = <<TXT
A,B,C
1,2,3
4,5,6
TXT

records = CsvHashReader.parse( txt )
pp records

# -or-

txt2 = <<TXT
1,2,3
4,5,6
TXT

records = CsvHashReader.parse( txt2, headers: ["A","B","C"] )
pp records

# => [{"A"=>"1", "B"=>"2", "C"=>"3"},
#     {"A"=>"4", "B"=>"5", "C"=>"6"}]

# -or-

records = CsvHashReader.read( "hash.csv" )
pp records
# => [{"A"=>"1", "B"=>"2", "C"=>"3"},
#     {"A"=>"4", "B"=>"5", "C"=>"6"}]

# -or-

CsvHashReader.foreach( "hash.csv" ) do |rec|
  pp rec
end
# => {"A"=>"1", "B"=>"2", "C"=>"3"}
# => {"A"=>"4", "B"=>"5", "C"=>"6"}
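To illustrate what the hash conversion above amounts to, here is a minimal plain-Ruby sketch that pairs each row's values with the header names. The helper `rows_to_hashes` is hypothetical and not part of csvreader's API; it assumes the rows have already been parsed into arrays of strings.

```ruby
# Hypothetical helper (not csvreader's API): turn parsed rows into hashes.
# If no headers are passed in, the first row is used as the header row.
def rows_to_hashes(rows, headers: nil)
  rows = rows.dup                 # don't mutate the caller's array
  headers ||= rows.shift          # first row doubles as the header row
  rows.map { |values| headers.zip(values).to_h }
end

rows = [["A", "B", "C"], ["1", "2", "3"], ["4", "5", "6"]]
pp rows_to_hashes(rows)
pp rows_to_hashes(rows.drop(1), headers: ["A", "B", "C"])
```

Both calls return the same two hashes, mirroring the two CsvHashReader.parse variants (header row in the data vs. headers passed in).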

Frequently Asked Questions (FAQ) and Answers

Q: What's CSV the right way? What best practices can I use?

Use best practices out-of-the-box with zero configuration. Do you want to skip blank lines, add # single-line comments, or trim leading and trailing spaces? No worries. It's all turned on by default.

So yes, you can write:

#######
# try with some comments
#   and blank lines even before header (first row)

Brewery,City,Name,Abv
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%

Bayerische Staatsbrauerei Weihenstephan,  Freising,  Hefe Weissbier,   5.4%
Brauerei Spezial,                         Bamberg,   Rauchbier Märzen, 5.1%
Hacker-Pschorr Bräu,                      München,   Münchner Dunkel,  5.0%
Staatliches Hofbräuhaus München,          München,   Hofbräu Oktoberfestbier, 6.3%

instead of strict "classic" (no blank lines, no comments, no leading and trailing spaces, etc.):

Brewery,City,Name,Abv
Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
Augustiner Bräu München,München,Edelstoff,5.6%
Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
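The lenient defaults described above (skip blank lines, skip # comment lines, trim spaces around values) can be sketched in a few lines of plain Ruby. The helper `read_lenient` is hypothetical, not csvreader's implementation, and to keep it short it splits on a bare comma, so it deliberately ignores quoted values.

```ruby
# Hypothetical helper (not csvreader's implementation) sketching the lenient
# defaults: skip blank lines, skip # comment lines, trim spaces around values.
# Simplification: splits on a bare comma, so quoted values are NOT handled.
def read_lenient(text)
  text.each_line.filter_map do |line|
    stripped = line.strip
    next if stripped.empty?            # blank (or whitespace-only) line
    next if stripped.start_with?("#")  # single-line comment
    stripped.split(",", -1).map(&:strip)
  end
end

txt = <<TXT
# try with some comments
#   and blank lines even before header

Brewery,City,Name,Abv
Brauerei Spezial,   Bamberg,   Rauchbier Märzen, 5.1%
TXT

pp read_lenient(txt)
# returns [["Brewery", "City", "Name", "Abv"],
#          ["Brauerei Spezial", "Bamberg", "Rauchbier Märzen", "5.1%"]]
```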

Q: How can I change the separator to semicolon (;) or pipe (|)?

Pass in the sep keyword option. Example:

CsvReader.parse_line( ..., sep: ';' )
CsvReader.parse( ..., sep: ';' )
CsvReader.read( ..., sep: ';' )
# ...
CsvReader.parse_line( ..., sep: '|' )
CsvReader.parse( ..., sep: '|' )
CsvReader.read( ..., sep: '|' )
# ...
# and so on

Note: If you use tab (\t), use the TabReader! Why? Tab != CSV. Yes, tab is its own (even) simpler format (e.g. no escape rules, no newlines in values, etc.); see TabReader.
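To show how little a strict tab format needs, here is a sketch assuming no quoting, no escapes and no newlines inside values: each line is simply split on the tab character. `parse_tab` is a hypothetical helper, not TabReader's API.

```ruby
# Hypothetical helper (not TabReader's API). Assumes a strict tab format:
# no quoting, no escapes, no newlines inside values.
def parse_tab(text)
  # -1 keeps trailing empty values, e.g. "1\t\t" => ["1", "", ""]
  text.each_line.map { |line| line.chomp.split("\t", -1) }
end

pp parse_tab("A\tB\tC\n1\t\t3\n")
# => [["A", "B", "C"], ["1", "", "3"]]
```

Double quotes get no special treatment here; they are just ordinary characters inside a value.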

Q: What's broken in the standard library CSV reader?

Two major design bugs and many, many minor ones.

(1) The CSV class uses line.split(',') with some kludges (†), claiming it's faster. What?! The right way: CSV needs its own purpose-built parser. There's no other way to handle all the (edge) cases with double quotes and escaped doubled-up double quotes. Period.

For example, the CSV class cannot handle leading or trailing spaces around double-quoted values such as 1,•"2","3"• (where • marks a space). Nor can it handle double quotes inside values, and so on and on.

(2) The CSV class returns nil for ,, but an empty string ("") for "","","". The right way: All values are always strings. Period.

If you want to use nil you MUST configure a string (or strings) such as NA, n/a, \N, or similar that map to nil.

(†): kludge - a workaround or quick-and-dirty solution that is clumsy, inelegant, inefficient, difficult to extend and hard to maintain
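A sketch of the explicit nil mapping from (2): values stay strings unless they exactly match one of the configured markers. The helper `map_missing` and its default marker list are illustrative assumptions, not csvreader options.

```ruby
# Hypothetical helper (not a csvreader option): values stay strings unless
# they exactly match a configured "missing value" marker, which maps to nil.
NA_MARKERS = ["NA", "n/a", "\\N"].freeze   # example markers, configure as needed

def map_missing(values, na: NA_MARKERS)
  values.map { |v| na.include?(v) ? nil : v }
end

pp map_missing(["1", "", "NA", "n/a", "\\N", "5"])
# => ["1", "", nil, nil, nil, "5"]
```

Note that the empty string stays an empty string; only the explicitly configured markers turn into nil.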

Appendix: Simple examples the standard csv library cannot read:

Quoted values with leading or trailing spaces e.g.

1, "2","3" , "4" ,5

=>

["1", "2", "3", "4", "5"]

"Auto-fix" unambiguous quotes in "unquoted" values e.g.

value with "quotes", another value

=>

["value with \"quotes\"", "another value"]

and some more.
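What "its own purpose-built parser" means in practice can be sketched as a small character-by-character scanner that handles double-quoted values, doubled-up double quotes, spaces around quoted values, literal quotes inside unquoted values (the "auto-fix" case above), and a configurable separator (compare the sep option). `sketch_parse_line` is hypothetical and not csvreader's actual parser; it assumes one record per line (no newlines inside quoted values) and a single-character separator other than space.

```ruby
# Hypothetical, minimal CSV line parser (not csvreader's actual code).
# Assumptions: one record per line (no newlines inside quoted values) and
# a single-character separator other than space.
def sketch_parse_line(line, sep: ",")
  chars  = line.chomp.chars
  values = []
  i = 0
  loop do
    # Skip leading spaces before a value.
    i += 1 while i < chars.length && chars[i] == " "

    if chars[i] == '"'
      # Quoted value: read up to the closing quote; "" stands for one quote.
      i += 1
      value = +""
      loop do
        raise ArgumentError, "unclosed quote" if i >= chars.length
        if chars[i] == '"' && chars[i + 1] == '"'
          value << '"'
          i += 2
        elsif chars[i] == '"'
          i += 1 # closing quote
          break
        else
          value << chars[i]
          i += 1
        end
      end
      # Skip trailing spaces after the closing quote.
      i += 1 while i < chars.length && chars[i] == " "
    else
      # Unquoted value: everything up to the next separator; any quotes
      # inside are kept as literal characters.
      start = i
      i += 1 while i < chars.length && chars[i] != sep
      value = chars[start...i].join.rstrip
    end

    values << value
    break if i >= chars.length
    raise ArgumentError, "expected #{sep.inspect} at column #{i}" unless chars[i] == sep
    i += 1 # consume the separator
  end
  values
end

pp sketch_parse_line(%q{1, "2","3" , "4" ,5})
# => ["1", "2", "3", "4", "5"]
pp sketch_parse_line(%q{value with "quotes", another value})
# => ["value with \"quotes\"", "another value"]
pp sketch_parse_line(%q{"say ""hi""";b;c}, sep: ";")
# => ["say \"hi\"", "b", "c"]
```

Empty fields come back as empty strings in both the ,, and the "","","" case, in line with "all values are always strings".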

Alternatives

See the Libraries & Tools section in the Awesome CSV page.

License

The csvreader scripts are dedicated to the public domain. Use them as you please with no restrictions whatsoever.

Questions? Comments?

Send them along to the wwwmake forum. Thanks!

Contributors: geraldb