ariafallah / csv-parser
Fast, header-only, extensively tested, C++11 CSV parser
License: MIT License
Hi
I am trying:
std::ifstream f(pathInfoDB);
aria::csv::CsvParser parser(f);
for (auto& row : parser) {
    for (auto& field : row) {
        CString a(field.c_str());
        ::OutputDebugString(a);
    }
}
But my files are UTF-8 with a BOM, and as a result I am getting gibberish at the beginning. How do I handle this?
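One possible workaround, sketched here under assumptions, is to consume the three BOM bytes (EF BB BF) from the stream before the parser is constructed. The helper name `skip_utf8_bom` is hypothetical and not part of csv-parser, and the sketch assumes a seekable stream such as a file or string stream.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of csv-parser.
// Consumes a leading UTF-8 byte order mark (EF BB BF) if one is present and
// returns true. Otherwise rewinds to the starting position and returns false.
// Assumes the stream is seekable (std::ifstream, std::istringstream, ...).
inline bool skip_utf8_bom(std::istream& in) {
    static const char bom[3] = {'\xEF', '\xBB', '\xBF'};
    const std::istream::pos_type start = in.tellg();
    char head[3] = {0, 0, 0};
    in.read(head, 3);
    if (in.gcount() == 3 && head[0] == bom[0] && head[1] == bom[1] &&
        head[2] == bom[2]) {
        return true;
    }
    in.clear();       // a short read sets eofbit and failbit
    in.seekg(start);  // put the stream back where it was
    return false;
}
```

With the snippet from the question, this would mean calling `skip_utf8_bom(f);` between opening `f` and constructing `parser`, so the parser never sees the BOM bytes.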
Hi, thanks for the nice work. I have been using csv-parser for a while and it works well. However, with the latest update there is a crash with clang-11 on Ubuntu 20.04. I could trace it in the debugger to the following line:
Line 66 in 544c764
The code crashes with SIGSEGV. I could not immediately see what's wrong.
The first row in my CSV has the headings. Can we access the values based on the headings?
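Whether the library offers this directly is not stated here. A minimal sketch of doing it on top of any row-based parser follows, assuming each row can be treated as a `std::vector<std::string>`. The names `Row`, `index_headers`, and `field_by_name` are hypothetical and not part of csv-parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Assumption: a parsed row is a vector of string fields.
using Row = std::vector<std::string>;
using HeaderIndex = std::unordered_map<std::string, std::size_t>;

// Hypothetical helper: maps each heading to its column index.
// On duplicate headings the first occurrence wins.
inline HeaderIndex index_headers(const Row& header) {
    HeaderIndex index;
    for (std::size_t i = 0; i < header.size(); ++i) {
        index.emplace(header[i], i);
    }
    return index;
}

// Hypothetical helper: returns the field under `name`, or nullptr if the
// heading is unknown or the row is too short to contain that column.
inline const std::string* field_by_name(const Row& row,
                                        const HeaderIndex& index,
                                        const std::string& name) {
    const HeaderIndex::const_iterator it = index.find(name);
    if (it == index.end() || it->second >= row.size()) {
        return nullptr;
    }
    return &row[it->second];
}
```

The idea is to build the index once from the first row and then pass every later row through `field_by_name`.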
Hello Aria,
you have produced nice code.
What is the license of your code?
Public Domain, GPL, BSD, LGPL, MIT, ...?
I couldn't get any information on it.
Thanks
Sorry, my English is not good.
I have a CSV file with two empty lines.
When I run the code below:
auto field = csv.next_field();
std::cout << (field.type == FieldType::DATA) << std::endl;
the result is 1 (true). I think field.type should be FieldType::ROW_END instead of FieldType::DATA.
explicit CsvParser(std::istream& input) : m_input(input) {
    // Reserve space upfront to improve performance
    m_fieldbuf.reserve(FIELDBUF_CAP);
    if (!m_input.good()) {
        throw std::runtime_error("Something is wrong with input stream");
    }
}
Using this example CSV:
F1 | F2 | F3 | F4 | F5 | F6 | F7 |
---|---|---|---|---|---|---|
a | b | c | d | e | f | |
A | B | C | D | E | F | G |
F1,F2,F3,F4,F5,F6,F7
a,b,c,d,e,f,
A,B,C,D,E,F,G
Each row is supposed to contain 7 columns. However, the parser will only return 6 columns for the first data row. This seems to be caused by the fact that the last field is empty, i.e. the comma is directly followed by a CR LF.
This should not happen since, according to RFC 4180, "The last field in the record must not be followed by a comma". Therefore the last, empty field should be recognized as such instead of being skipped.
It only happens with unquoted CSV. For example, a,b,c,d,e,f,"" would parse properly.
I was able to fix this issue locally by adjusting the finite state machine from
case State::START_OF_FIELD:
    m_cursor++;
    if (c == m_terminator) {
        handle_crlf(c);
        return Field(FieldType::ROW_END);
    }
to
case State::START_OF_FIELD:
    m_cursor++;
    if (c == m_terminator) {
        handle_crlf(c);
        m_state = State::END_OF_ROW;
        return Field(m_fieldbuf);
    }
From what I understand this should only handle the special case of a CR LF directly after a separator char, i.e. the case that seems to be handled improperly.
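The expected field count can be illustrated with a deliberately simple splitter. This is not the library's state machine, only a sketch of the behaviour being asked for: a separator directly before the line end still yields a final, empty field. The function name `split_unquoted` is hypothetical, and quoting is not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical illustration, not csv-parser code.
// Splits one unquoted record (without its line terminator) on `sep`.
// A trailing separator produces a final empty field.
inline std::vector<std::string> split_unquoted(const std::string& line,
                                               char sep = ',') {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type pos = line.find(sep, start);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(start));  // last field, may be empty
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}
```

Applied to the two data rows of the example, both `a,b,c,d,e,f,` and `A,B,C,D,E,F,G` come out with 7 fields, the last field of the first row being empty.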
See title.
I have tested about 25 different CSV parsers and this one was the best performing of them all. Thank you for writing a parser that is efficient.
When constructing a parser, a reference to a std::istream is passed in and stored. However, the lifetime of the std::istream can be shorter than that of the parser, leaving the parser with a dangling reference.
This won't affect small files (<= INPUTBUF_CAP), since those will be buffered on construction. However, with larger files the parser will keep reading from the stream. If the reference is dangling at that point, it will result in an error.
This can happen, for example, when a Parser is constructed and stored as a class member while the std::ifstream is a local variable.
struct Test {
    std::unique_ptr<aria::csv::Parser> parser;
    Test(std::string path) {
        std::ifstream stream(path, std::ios::in);
        // `stream` is destroyed when the constructor returns, but `parser` still refers to it
        parser = std::make_unique<aria::csv::Parser>(stream);
    }
};
The problem can be avoided by making sure the std::istream doesn't go out of scope before the parser, which makes this mostly a user error.
However, I think for a future revision it might be worthwhile to have the parser take ownership of the passed stream, for example in the form of a smart pointer.
Until then this issue might serve as help for the next poor fellow who falls into that trap. ;)
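One way to stay out of the trap without changing the library is to keep the stream and the parser together in one object. This is a minimal sketch under assumptions: `OwningParser` is a hypothetical wrapper, and `LineCounter` is only a stand-in for any parser type that is constructed from a `std::istream&` and keeps that reference.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical wrapper, not part of csv-parser.
// Members are constructed in declaration order and destroyed in reverse, so
// `stream` is alive for the parser's whole lifetime.
template <class Parser, class Stream = std::ifstream>
struct OwningParser {
    Stream stream;  // declared first: outlives `parser`
    Parser parser;  // keeps a reference to `stream`

    // For std::ifstream, `source` is a path; for std::istringstream, the text.
    explicit OwningParser(const std::string& source)
        : stream(source), parser(stream) {}

    // Copying or moving would leave `parser` referring to the old `stream`.
    OwningParser(const OwningParser&) = delete;
    OwningParser& operator=(const OwningParser&) = delete;
};

// Stand-in for a real parser: stores the stream reference, reads it lazily.
struct LineCounter {
    std::istream& in;
    explicit LineCounter(std::istream& s) : in(s) {}
    int count() {
        int n = 0;
        std::string line;
        while (std::getline(in, line)) ++n;
        return n;
    }
};
```

Because the wrapper is neither copyable nor movable, a class like `Test` above would hold it through a `std::unique_ptr<OwningParser<...>>` or as a direct member initialized in the constructor's initializer list.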
To fix this problem, change
else if (!inquotes && (c == '\r' || c == '\n'))
to
else if (!inquotes && (c == '\r' || c == '\n' || in.eof()))
:)
parser_test.cpp:4:10: fatal error: './catch.hpp' file not found
#include "./catch.hpp"
^~~~~~~~~~~~~
1 error generated.