This project forked from vincentlaucsb/csv-parser

A modern C++ library for reading, writing, and analyzing CSV (and similar) files.

License: MIT License

Vince's CSV Parser

Motivation

There are plenty of other CSV parsers in the wild, but I had a hard time finding what I wanted. Specifically, I wanted something with an interface similar to Python's csv module. I also wanted support for special use cases such as calculating statistics on very large files. This library was therefore created with the following goals in mind:

Performance

This CSV parser uses multiple threads to simultaneously pull data from disk and parse it. It is also capable of incremental streaming (parsing larger-than-RAM files) and of quickly parsing data types.

RFC 4180 Compliance

This CSV parser is much more than a fancy string splitter, and follows every guideline from RFC 4180. At the same time, it is robust and capable of handling deviations from the standard. An optional strict parsing mode can be enabled to sniff out errors in files.
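To illustrate why RFC 4180 parsing is more than splitting on commas, here is a minimal sketch of splitting a single record while honoring quoted fields and doubled quotes. The function name split_record is hypothetical and is not part of this library; the library's own parser is more involved (threads, streaming, multi-line records).

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (NOT this library's API): split one CSV record into
// fields using RFC 4180 quoting rules. Inside a quoted field, a doubled
// quote character stands for one literal quote.
std::vector<std::string> split_record(std::string_view line,
                                      char delim = ',',
                                      char quote = '"') {
    std::vector<std::string> fields;
    std::string current;
    bool in_quotes = false;

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (in_quotes) {
            if (c == quote) {
                if (i + 1 < line.size() && line[i + 1] == quote) {
                    current += quote;   // escaped quote ("")
                    ++i;
                } else {
                    in_quotes = false;  // closing quote
                }
            } else {
                current += c;           // delimiters are literal inside quotes
            }
        } else if (c == quote) {
            in_quotes = true;           // opening quote
        } else if (c == delim) {
            fields.push_back(current);  // end of field
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);          // last field has no trailing delimiter
    return fields;
}
```

A naive split on ',' would break the field "b,c" in two; the state flag in_quotes is what prevents that.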

Easy to Use and Well-Documented

In addition to being easy on your computer's hardware, this library is also easy on you, the developer. Some helpful features include:

  • Decent ability to guess the dialect of a file (CSV, tab-delimited, etc.)
  • Ability to handle common deviations from the CSV standard, such as inconsistent row lengths and leading comments
  • Ability to manually set the delimiter and quoting character of the parser
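As a rough idea of what dialect guessing can look like, the sketch below picks the candidate delimiter that occurs the same nonzero number of times on every sample line. This is one simple heuristic written for illustration, not a description of how this library guesses; guess_delimiter and its candidate list are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical heuristic (NOT this library's algorithm): choose the candidate
// that appears the same nonzero number of times on every sample line, and
// prefer the candidate with the higher count. Quoted delimiters are not
// handled here, so real files need something more careful.
char guess_delimiter(const std::vector<std::string>& sample_lines,
                     const std::vector<char>& candidates = {',', '\t', ';', '|'}) {
    char best = ',';                 // fallback when nothing is consistent
    std::ptrdiff_t best_count = 0;

    for (char candidate : candidates) {
        std::ptrdiff_t expected = -1;
        bool consistent = !sample_lines.empty();
        for (const std::string& line : sample_lines) {
            const std::ptrdiff_t n = std::count(line.begin(), line.end(), candidate);
            if (expected < 0) {
                expected = n;        // first line sets the expected count
            } else if (n != expected) {
                consistent = false;  // row lengths differ for this candidate
                break;
            }
        }
        if (consistent && expected > best_count) {
            best = candidate;
            best_count = expected;
        }
    }
    return best;
}
```

When the guess is wrong, setting the delimiter and quoting character by hand is the reliable fallback.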

Well Tested

In addition to using modern C++ features to build a memory-safe parser that still performs well, this parser has an extensive test suite.

All of this library's essentials are located under src/, with no dependencies aside from the STL. This is a C++17 library developed using Microsoft Visual Studio and compatible with g++ and clang. The CMakeLists.txt and Makefile contain instructions for building the main library, some sample programs, and the test suite.

GCC/Clang Compiler Flags: -pthread -O3 -std=c++17

CMake Instructions

If you're including this in another CMake project, you can simply clone this repo into your project directory, and add the following to your CMakeLists.txt:

include(${CMAKE_SOURCE_DIR}/.../csv-parser/CMakeLists.txt)

# ...

add_executable(<your program> ...)
target_link_libraries(<your program> csv)

Thirty-Second Introduction to Vince's CSV Parser

  • Parsing CSV Files from..
    • Files: csv::CSVReader(filename)
    • In-Memory Sources:
      • Small: csv::parse() or csv::operator""_csv();
      • Large: csv::CSVReader::feed();
  • Retrieving Parsed CSV Rows (from CSVReader)
    • csv::CSVReader::iterator (supports range-based for loop)
    • csv::CSVReader::read_row()
  • Working with CSV Rows
    • Index by number or name: csv::CSVRow::operator[]
    • Random access iterator: csv::CSVRow::iterator
    • Conversion: csv::CSVRow::operator std::vector<std::string>();
  • Calculating Statistics
    • Files: csv::CSVStat(filename)
    • In-Memory: csv::CSVStat::feed()
  • Utility Functions
    • Return column names: get_col_names()
    • Return the position of a column: get_col_pos();
    • Return column types (for uploading to a SQL database): csv_data_types();

Features & Examples

Reading a Large File (with Iterators)

With this library, you can easily stream over a large file without reading its entirety into memory.

C++ Style

# include "csv_parser.hpp"

using namespace csv;

...

CSVReader reader("very_big_file.csv");

for (CSVRow& row: reader) { // Input iterator
    for (CSVField& field: row) {
        // For efficiency, get<>() produces a string_view
        std::cout << field.get<>() << ...
    }
}

...

Old-Fashioned C Style Loop

...

CSVReader reader("very_big_file.csv");
CSVRow row;
 
while (reader.read_row(row)) {
    // Do stuff with row here
}

...

Indexing by Column Names

Retrieving values using a column name string is a cheap, constant-time operation.

# include "csv_parser.hpp"

using namespace csv;

...

CSVReader reader("very_big_file.csv");
double sum = 0;

for (auto& row: reader) {
    // Note: Can also use index of column with [] operator
    sum += row["Total Salary"].get<double>();
}

...
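A common way to make lookup by column name cheap is to build a hash map from name to position once, when the header row is read. The sketch below shows that general technique; the class ColumnIndex is hypothetical and is not taken from this library's source.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical index (NOT this library's API), built once from the header row.
class ColumnIndex {
public:
    explicit ColumnIndex(const std::vector<std::string>& names) {
        for (std::size_t i = 0; i < names.size(); ++i) {
            // emplace does not overwrite, so a repeated name keeps its
            // first position.
            positions_.emplace(names[i], i);
        }
    }

    // Returns the column position, or -1 if the name is unknown.
    // Lookup in std::unordered_map is constant time on average.
    long find(const std::string& name) const {
        const auto it = positions_.find(name);
        return it == positions_.end() ? -1 : static_cast<long>(it->second);
    }

private:
    std::unordered_map<std::string, std::size_t> positions_;
};
```

With such an index, each access by name costs one hash lookup plus an array access, regardless of how many columns the file has.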

Type Conversions

If your CSV has lots of numeric values, you can also have this parser (lazily) convert them to the proper data type. Type checking is performed on conversions to prevent undefined behavior.

# include "csv_parser.hpp"

using namespace csv;

...

CSVReader reader("very_big_file.csv");

for (auto& row: reader) {
    if (row["timestamp"].is_int()) {
        row["timestamp"].get<int>();
        
        // ..
    }
}
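The idea of checking a field before converting it can be sketched with the standard library alone. The helper below, try_parse_int, is hypothetical and only illustrates checked conversion with std::from_chars; it makes no claim about how is_int() or get<int>() are implemented in this library.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (NOT this library's API): convert a field to int only
// if the entire field is a valid, in-range integer.
std::optional<int> try_parse_int(std::string_view field) {
    int value = 0;
    const char* first = field.data();
    const char* last = first + field.size();

    const auto [ptr, ec] = std::from_chars(first, last, value);

    // Reject parse errors, out-of-range values, and trailing characters
    // (for example "4.2" parses "4" but leaves ".2" unread).
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

Returning std::optional forces the caller to handle the failure case instead of reading a garbage value.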

Specifying a Specific Delimiter, Quoting Character, etc.

Although the CSV parser has a decent guessing mechanism, in some cases it is preferable to specify the exact parameters of a file.

# include "csv_parser.hpp"
# include ...

using namespace csv;

CSVFormat format = {
    '\t',    // Delimiter
    '~',     // Quote-character
    2,       // Line number of header (an integer, not a character literal)
    {}       // Column names -- if empty, then filled by reading header row
};

CSVReader reader("wierd_csv_dialect.csv", {}, format);

for (auto& row: reader) {
    // Do stuff with rows here
}

Parsing an In-Memory String

# include "csv_parser.hpp"

using namespace csv;

...

// Method 1: Using parse()
std::string csv_string = "Actor,Character\r\n"
    "Will Ferrell,Ricky Bobby\r\n"
    "John C. Reilly,Cal Naughton Jr.\r\n"
    "Sacha Baron Cohen,Jean Giard\r\n";

auto rows = parse(csv_string);
for (auto& r: rows) {
    // Do stuff with row here
}
    
// Method 2: Using _csv operator
auto rows = "Actor,Character\r\n"
    "Will Ferrell,Ricky Bobby\r\n"
    "John C. Reilly,Cal Naughton Jr.\r\n"
    "Sacha Baron Cohen,Jean Giard\r\n"_csv;

for (auto& r: rows) {
    // Do stuff with row here
}

Writing CSV Files

# include "csv_writer.hpp"
# include ...

using namespace csv;
using std::vector;
using std::string;

...

std::stringstream ss; // Can also use ofstream, etc.
auto writer = make_csv_writer(ss);
writer << vector<string>({ "A", "B", "C" })
    << vector<string>({ "I'm", "too", "tired" })
    << vector<string>({ "to", "write", "documentation" });
    
...
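When writing CSV, fields that contain the delimiter, a quote, or a line break have to be quoted, and embedded quotes doubled, as RFC 4180 describes. The sketch below shows that rule in isolation; escape_field is a hypothetical name and is not claimed to match what make_csv_writer does internally.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (NOT this library's API): quote a field only when
// needed, doubling any embedded quote characters.
std::string escape_field(std::string_view field,
                         char delim = ',',
                         char quote = '"') {
    bool needs_quotes = false;
    for (char c : field) {
        if (c == delim || c == quote || c == '\r' || c == '\n') {
            needs_quotes = true;
            break;
        }
    }
    if (!needs_quotes) {
        return std::string(field);  // plain fields are written unchanged
    }

    std::string out(1, quote);      // opening quote
    for (char c : field) {
        if (c == quote) {
            out += quote;           // double each embedded quote
        }
        out += c;
    }
    out += quote;                   // closing quote
    return out;
}
```

A row is then the escaped fields joined by the delimiter and terminated by a line break.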

Contributing

Bug reports, feature requests, and so on are always welcome. Feel free to leave a note in the Issues section.

Contributors

bryceschober, gexclaude, nlohmann, vincentlaucsb
