ariafallah / csv-parser

Fast, header-only, extensively tested, C++11 CSV parser

License: MIT License

C++ 99.58% Makefile 0.05% JavaScript 0.04% Rust 0.12% CMake 0.22%
cpp csv parser csv-parser cpp11

csv-parser's Issues

Parsing UTF8 BOM files

Hi

I am trying:

	std::ifstream f(pathInfoDB);
	aria::csv::CsvParser parser(f);

	for (auto& row : parser) {
		for (auto& field : row) {
			CString a(field.c_str());
			::OutputDebugString(a);
		}
	}

But my files are UTF-8 with a BOM, so I get gibberish at the beginning of the output. How should I handle this?
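One possible approach is to consume the byte order mark before the stream is handed to the parser. The sketch below is not part of csv-parser: `skip_utf8_bom` is a hypothetical helper name, and it assumes a seekable stream such as `std::ifstream` or `std::istringstream`.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of csv-parser): consumes a leading UTF-8
// byte order mark (bytes EF BB BF) if one is present.
// Returns true when a BOM was skipped. Otherwise the stream is rewound to
// where it started, so no data is lost. Assumes the stream is seekable.
inline bool skip_utf8_bom(std::istream& in) {
    const std::istream::pos_type start = in.tellg();
    char head[3] = {0, 0, 0};
    in.read(head, 3);
    const bool has_bom = in.gcount() == 3 &&
        static_cast<unsigned char>(head[0]) == 0xEF &&
        static_cast<unsigned char>(head[1]) == 0xBB &&
        static_cast<unsigned char>(head[2]) == 0xBF;
    if (!has_bom) {
        in.clear();       // inputs shorter than 3 bytes set eofbit/failbit
        in.seekg(start);  // rewind so the parser sees every byte
    }
    return has_bom;
}
```

In the snippet from the question, this would mean calling `skip_utf8_bom(f);` between opening `f` and constructing `aria::csv::CsvParser parser(f);`.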

Crash in assignment of `std::unique_ptr<char[]>`

Hi, thanks for the nice work. I have been using csv-parser for a while and it works well. However, with the latest update there is a crash with clang-11 on Ubuntu 20.04. I traced it in the debugger to the following line:

std::unique_ptr<char[]> m_inputbuf = std::unique_ptr<char[]>(new char[INPUTBUF_CAP]{});

The code crashes with SIGSEGV. I could not immediately see what is wrong.

What license is the code under?

Hello Aria,
you have produced nice code.
What is the license of your code?
Public Domain, GPL, BSD, LGPL, MIT, ...?

I couldn't get any information on it.

Thanks

field_object.type is wrong

Sorry, my English is not good.

I have a CSV file with two empty lines.

When I run the code below:

auto field = csv.next_field();
std::cout << (field.type == FieldType::DATA) << std::endl;

The result is 1 (true).
I think field.type should be FieldType::ROW_END instead of FieldType::DATA.

Something is wrong with input stream

        explicit CsvParser(std::istream& input) : m_input(input) {
            // Reserve space upfront to improve performance
            m_fieldbuf.reserve(FIELDBUF_CAP);
            if (!m_input.good()) {
                throw std::runtime_error("Something is wrong with input stream");
            }
        }
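The constructor above throws whenever `good()` is false at construction time. One common way to reach that state is a file that failed to open. As a rough sketch, a check like the following can produce a more specific message before the parser is constructed; `check_csv_path` is a hypothetical helper, not part of csv-parser.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not part of csv-parser): returns an empty string when
// the file opens cleanly, otherwise a short description of the problem.
inline std::string check_csv_path(const std::string& path) {
    std::ifstream f(path);
    if (!f.is_open()) {
        return "could not open '" + path + "'";
    }
    if (!f.good()) {
        return "stream for '" + path + "' is in an error state";
    }
    return std::string();
}
```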

Empty fields at end of row are omitted

Using this example CSV:

F1,F2,F3,F4,F5,F6,F7
a,b,c,d,e,f,
A,B,C,D,E,F,G

Each row is supposed to contain 7 columns. However, the parser will only return 6 columns for the first data row. This seems to be caused by the fact that the last field is empty, i.e. the comma is directly followed by a CR LF.

This should not happen: according to RFC 4180, the last field in a record must not be followed by a comma, so a trailing comma means one more (empty) field follows. That empty field should be returned instead of being skipped.

It only happens with unquoted fields. For example, a,b,c,d,e,f,"" parses properly.


I was able to fix this issue locally by changing the finite state machine from

case State::START_OF_FIELD:
	m_cursor++;
	if (c == m_terminator) {
		handle_crlf(c);
		return Field(FieldType::ROW_END);
	}

to

case State::START_OF_FIELD:
	m_cursor++;
	if (c == m_terminator) {
		handle_crlf(c);
		m_state = State::END_OF_ROW;
		return Field(m_fieldbuf);
	}

From what I understand this should only handle the special case of a CR LF directly after a separator char, i.e. the case that seems to be handled improperly.
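To make the expected behaviour concrete, here is a standalone sketch, independent of csv-parser's state machine, that splits one unquoted line and keeps a trailing empty field. `split_unquoted` is a hypothetical name, and quoting is deliberately left out.

```cpp
#include <string>
#include <vector>

// Hypothetical illustration (not csv-parser code): splits one unquoted CSV
// line on `delim`. Quoted fields are not handled.
inline std::vector<std::string> split_unquoted(const std::string& line,
                                               char delim = ',') {
    std::vector<std::string> fields;
    std::string current;
    for (char c : line) {
        if (c == delim) {
            fields.push_back(current);
            current.clear();
        } else if (c != '\r' && c != '\n') {
            current.push_back(c);  // line terminators are dropped
        }
    }
    // The field after the last delimiter is always emitted, even when empty.
    fields.push_back(current);
    return fields;
}
```

With this logic, the row `a,b,c,d,e,f,` yields seven fields, the last of which is an empty string.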

Thank You! Fastest CSV parser

I have tested about 25 different CSV parsers and this one was the best performing of them all. Thank you for writing a parser that is efficient.

Parser can fail for large files (>128 KiB) due to stale reference

Description

When a parser is constructed, it stores a reference to the std::istream it is given. However, the lifetime of the std::istream can be shorter than that of the parser, leaving the parser with a stale reference.

This won't affect small files (<= INPUTBUF_CAP), since those are buffered completely on construction. With larger files, however, the parser keeps reading from the stream as parsing proceeds. If the reference has become stale by then, the read results in an error.

This can for example happen when a Parser is constructed and stored as a class member with a locally scoped std::ifstream.

Sample Code

struct Test {
  std::unique_ptr<aria::csv::CsvParser> parser;

  Test(std::string path) {
    std::ifstream stream(path, std::ios::in);
    parser = std::make_unique<aria::csv::CsvParser>(stream);
  } // stream is destroyed here, but parser still refers to it
};

Solutions/Workarounds

The problem can be avoided by making sure the std::istream doesn't go out of scope before the parser does, which makes this mostly a user error.

However, I think for a future revision it might be worthwhile to have the parser take ownership of the passed stream, for example in the form of a smart pointer.

Until then this issue might serve as help for the next poor fellow who falls into that trap. ;)
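A minimal sketch of the workaround, assuming the stream and the parser are stored in the same object. `RefParser` is a hypothetical stand-in for any parser that keeps a reference to its input, and `std::istringstream` replaces `std::ifstream` only to keep the example self-contained.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a parser that stores a reference to its input.
class RefParser {
public:
    explicit RefParser(std::istream& in) : m_in(in) {}
    bool next_line(std::string& out) {
        return static_cast<bool>(std::getline(m_in, out));
    }
private:
    std::istream& m_in;
};

// Owns both the stream and the parser. Members are initialized in
// declaration order and destroyed in reverse order, so the stream outlives
// the parser that refers to it.
class OwningReader {
public:
    explicit OwningReader(const std::string& data)
        : m_stream(data), m_parser(m_stream) {}
    // Copying would leave the copy's parser pointing at the original stream.
    OwningReader(const OwningReader&) = delete;
    OwningReader& operator=(const OwningReader&) = delete;

    bool next_line(std::string& out) { return m_parser.next_line(out); }
private:
    std::istringstream m_stream;  // declared first: destroyed last
    RefParser m_parser;           // refers to m_stream
};
```

The declaration order of the two members is the important part: swapping them would construct the parser from a stream that has not been initialized yet.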

'./catch.hpp' file not found

parser_test.cpp:4:10: fatal error: './catch.hpp' file not found
#include "./catch.hpp"
^~~~~~~~~~~~~
1 error generated.
