vincentlaucsb / csv-parser
A modern C++ library for reading, writing, and analyzing CSV (and similar) files.
License: MIT License
Add a method to CSVRow that converts it to a valid JSON string.
For example, if there's a row with column names "Artist" and "Album" with entries "Florida Georgia Line" and "Here's To The Good Times" respectively, then the output should be
{"Artist":"Florida Georgia Line","Album":"Here's To The Good Times"}
We may consider using another C++ library to handle JSON serialization instead of implementing it ourselves. That library can then be added as an optional dependency, i.e. users will need to have it in order to use JSON serialization. For unit tests, that library can be included under the /tests/ directory.
Suggested library: https://github.com/nlohmann/json
Is there a way to iterate over the values of a specific column using a column name (string)? I've searched the documentation but didn't find a clear answer or example.
Would you like to remove a statement like the following? Deleting a null pointer is a well-defined no-op in C++, so the delete here is unnecessary:
if (!in) { // Nullptr --> Die
- delete in;
break;
}
I think CSVField::get() is almost a const member function.
Please consider making it const.
I want to write the following code:
for (const auto& row : rows) {
for (const auto& field: row) {
field.get<std::string_view>();
}
}
What do you think about adjusting the build parameters in the CMake script so that multi-threading support is handled more safely?
I often encounter CSV files that use comments (lines that start with a specific character, usually ; in my experience) to encode metadata about the file in the first few lines.
Reading these files causes a crash, I suspect because the file looks like a single-column CSV at first and then the data starts with many columns.
example:
;Instrument ABCDEF
;Collected 14JUN2018 field site Alpha
time,temperature,humidity,pressure
12312351,23.3,120,234
12312352,24.0,122,233
...
...
I know comments are not mentioned in RFC 4180, but even if this library does not handle them, it should ignore them or throw gracefully.
Currently, the CSV parser stores every column. However, many use cases only require certain columns to be parsed, and there may be optimizations that can be performed.
For example, if a user only wants columns A, B, C out of A, B, … X, Y, Z, then we can speed up the parsing process by skipping to the next newline once we parse C.
We want to implement this enhancement without sacrificing performance for more general use cases, and without complicating this library's public API. If additional classes are necessary, then CSVReader should be refactored as a wrapper around these helper classes.
Master branch or single-header branch?
I prefer the single-header branch, but is that branch stable, and what's the difference between master and single-header?
Would you like to wrap any pointer variables with the class template std::unique_ptr?
Steps to reproduce:
change main cmake file to include
if(MSYS OR MINGW)
add_definitions(-DUNICODE -D_UNICODE)
endif()
change tests/CMakeLists.txt to include:
if(MSYS OR MINGW)
target_compile_options(
csv_test PRIVATE -municode )
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -municode")
endif()
change single_include_test/my_header.hpp
Add #include <windows.h> before #include "csv.hpp"
run "cmake ../ -G "MSYS Makefiles"" from build directory (already created)
then make.
Without the Windows header before the csv header, everything compiles successfully.
gcc --version
gcc.exe (Rev2, Built by MSYS2 project) 9.2.0
I tried your example code for file statistics from the csv::CSVStat class reference. Here is a minimal example:
csv::CSVStat stats(csvFilePath);
auto colDataTypes = stats.get_dtypes();
auto colNames = stats.get_col_names();
// Doesn't work for colDataTypes but works for other defined statistics like max, min and so on...
for (size_t it = 0; it < colNames.size(); it++){
std::cout << colDataTypes[it] << std::endl;
}
// Doesn't work either
for (auto &type : colDataTypes){
std::cout << type << std::endl;
}
How can I get the colDataTypes of each column printed out? If I understood correctly, that is what the get_dtypes() function is supposed to do.
For example, the csv library currently stores large integers as long double if they exceed the limits of 64-bit integers. However, this can lead to loss of precision for very large integers. Furthermore, very large floating point values, or floating point values with many significant digits, cannot be stored this way without losing information.
Investigate different methods of storing large numbers and pick one to implement.
The goal of this task is to at least provide support for parsing arbitrarily large numbers without losing information (and without affecting performance). Implementing arithmetic between big numbers is entirely optional, and does not need to be fast. End users who desire performance should combine this library with a dedicated bignum library such as GMP (https://gmplib.org/).
The library fails to correctly parse the header row of the simple CSV file contained in this archive: test_data.zip
The resulting column-names (from get_col_names()) are:
Note how the first column-name includes the comment lines that precede the header line.
Currently CSVReader moves parsed rows into std::deque<CSVRow> record_buffer. When the user uses a CSVReader iterator or calls read_row, records are pulled from this deque.
Perhaps the implementer can implement a subclass of CSVReader (call it CSVProcessor?) that has two deques: one for rows that haven't been processed by the whitespace stripper, and one for rows that have. This new design should maximize code reuse so that we don't have to reimplement iterators for CSVProcessor.
If we decide to implement CSVProcessor, we should consider allowing users to add their own custom processing logic.
Hi,
Firstly, I want to thank you for developing this library. Its usage is really elegant and easy.
Unfortunately, I ran into an issue where csv-parser parses fields incorrectly. Am I missing something, or is it a bug? (I generated the .csv file from GNU Octave; I think its format is correct.)
Scenario to reproduce the issue:
I have a csv file which contains 16384 columns and two rows. The first row is a header; the second row contains floating point values. The .csv file is delimited by ','. (.csv file attached as .zip)
#include <csv.hpp>
int main(int argc, char *argv[])
{
using namespace csv;
CSVReader reader("time-result.csv");
for (CSVRow& row: reader) { // Input iterator
auto i = 0;
for (CSVField& field: row) {
if ( i == 7003 ) // the 7003rd field is one of the wrongly parsed fields; there are more
std::cout << field.get<>() << std::endl;
++i;
}
}
}
Output :
e-069.99
But it should be a valid floating point numerical value string.
Library Version : 1.3.0
Compiler : gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
#include "csv.hpp"
int main() {
csv::CSVFormat format;
format.delimiter('|').column_names({"A", "B", "C"});
csv::CSVReader csv("foo.dat", format);
for (auto& row : csv) {
std::cout << row.to_json() << std::endl;
}
return 0;
}
with foo.dat:
1|2|3|
2|3|4|
causes a segmentation fault.
Changing to column_names({"A", "B", "C", "dummy"}) will do the job.
Is there a more elegant way to catch such an error than a segmentation fault? Also, is there any option to handle such a "dummy" column?
It appears the quote character defaults to '"', which works in many cases, but I have run into situations where the file has no quote character specified. In such cases, when '"' is encountered the parser produces incorrect results. It can even crash the program.
This is a Makefile enhancement:
Add targets: install, libcsv.
Add variables: STD, PREFIX.
Target libcsv builds a static (libcsv.a) and a shared (libcsv.so) library.
Target install installs the libraries into the $(PREFIX)/lib directory, the single header into the $(PREFIX)/single_include directory, and all other headers into the $(PREFIX)/include directory.
Variable STD is the argument for the g++ -std option; the default is c++11.
Variable PREFIX gives the base installation directory; the default is /usr.
The following fails to parse correctly and causes the unit test to fail.
std::string s("0.15");
long double out;
REQUIRE(data_type(s, &out) == CSV_DOUBLE);
REQUIRE(is_equal(out, 0.15));
The parsed value is actually 1.5 instead of 0.15.
A large file with 2000 columns of ~20-character-wide doubles, generated like this:
const int cols_n = 2000;
std::string big_filename = "MedDataset.txt";
std::ofstream ofstream(big_filename);
if (!ofstream.is_open()) {
std::cerr << "failed to open " << big_filename << '\n';
exit(1);
}
std::random_device rd;
std::mt19937 gen{rd()};
std::uniform_real_distribution<double> dist{0, 1};
ofstream << std::setprecision(16);
for (int r = 0; r < 1000; r++) {
for (int c = 0; c < cols_n; c++) {
double num = dist(gen);
ofstream << num;
if (c != cols_n -1) ofstream << ',';
}
ofstream << "\n";
}
ofstream.close();
parsing like this:
CSVReader reader("MedDataset.txt");
{
std::vector<double> r;
for (CSVRow& row: reader) { // Input iterator
for (CSVField& field: row) {
r.push_back(field.get<double>());
}
// use vector...
r.clear();
}
}
Getting this error during parsing:
terminate called after throwing an instance of 'std::runtime_error'
what(): Not a number.
If I reduce cols_n = 2000 to 1800, it runs just fine.
I have visually inspected the file and am not seeing any weird characters. All programmatically produced.
It feels like there's some sort of "buffer overflow" due to the very large row (roughly 32 KB?). 100% reproducible for me even though the values of the fields are random.
clang++ -O2 -std=c++17 ... -lpthread
clang++ --version
clang version 8.0.0-3 (tags/RELEASE_800/final)
Target: x86_64-pc-linux-gnu
https://help.github.com/en/articles/organizing-information-with-tables
MarkdownWriter class with the following:
operator<< that accepts CSVRow as input
operator<< that accepts any random-access iterator (over strings?) as input
set_column_names(std::vector<std::string>) method
How do I read the first line of a file without a header?
Is it possible to catch errors while reading the file?
Thanks in advance,
Sylm
Expected: when a field contains only whitespace and "trim" is set on the format, it will behave like an empty field, i.e. as in "1,,2".
However, in this case parsing of the row seems incorrect. See the attached test case for an example of the failure.
TEST_CASE("Test trim empty field") {
CSVFormat format;
format.column_names({ "A", "B", "C" })
.trim({' '});
std::stringstream csv_string;
csv_string << "1, two,3" << std::endl
<< "4, ,5" << std::endl
<< "6,7,8 " << std::endl;
auto rows = parse(csv_string.str(), format);
CSVRow row;
rows.read_row(row);
// First Row
REQUIRE(row[0].get<uint32_t>() == 1);
REQUIRE(row[1].get<std::string>() == "two");
REQUIRE(row[2].get<uint32_t>() == 3);
// Second Row
rows.read_row(row);
REQUIRE(row[0].get<uint32_t>() == 4);
REQUIRE(row[1].is_null());
REQUIRE(row[2].get<uint32_t>() == 5);
// Third Row
rows.read_row(row);
REQUIRE(row[0].get<uint32_t>() == 6);
REQUIRE(row[1].get<uint32_t>() == 7);
REQUIRE(row[2].get<uint32_t>() == 8);
}
The actual method used is:
CSVFormat csvFileFormat;
csvFileFormat.column_names(columnNames);
Error message:
internal\csv_row.hpp(204): error C2440: '' : cannot convert from 'csv::string_view' to 'std::string' No constructor could take the source type, or constructor overload resolution was ambiguous
I have a file with 16k columns and the API cannot parse it.
Neither the column names nor the lines are parsed correctly.
Can I send an example file?
I'm coming from version 1.1.2 and want to use 1.3.0. When I run cmake to build my project, I get these errors.
/CLionProjects/student_research/dev/hmmenc_client/main.cpp: In function ‘int main(int, char**)’:
/CLionProjects/student_research/dev/hmmenc_client/main.cpp:536:136: error: ‘type_name’ is not a member of ‘csv::internals’; did you mean ‘type_num’?
536 | cout << colNames[i] << " has: " << item.second << " entries of type: " << csv::internals::type_name(item.first) << endl;
| ^~~~~~~~~
| type_num
/CLionProjects/student_research/dev/hmmenc_client/main.cpp:820:80: error: ‘class csv::CSVStat’ has no member named ‘correct_rows’
820 | auto idManualLim = (uint64_t)stats.correct_rows; // All Rows
| ^~~~~~~~~~~~
/CLionProjects/student_research/dev/hmmenc_client/main.cpp:821:79: error: ‘class csv::CSVStat’ has no member named ‘correct_rows’
821 | if (idManualLim > (uint64_t)stats.correct_rows)
| ^~~~~~~~~~~~
/CLionProjects/student_research/dev/hmmenc_client/main.cpp:823:79: error: ‘class csv::CSVStat’ has no member named ‘correct_rows’
823 | idManualLim = (uint64_t)stats.correct_rows;
| ^~~~~~~~~~~~
/CLionProjects/student_research/dev/hmmenc_client/main.cpp:848:82: error: ‘type_name’ is not a member of ‘csv::internals’; did you mean ‘type_num’?
848 | columnNameType = csv::internals::type_name(item.first);
| ^~~~~~~~~
| type_num
I guess that stats.correct_rows from csv::CSVStat was changed to stats.num_rows, correct?
Next thing I use is:
// Get the type of the values in the column
auto columnNameIndex = (uint64_t)readerInfo.index_of(columnName);
string columnNameType;
for (auto item : colDataTypes[columnNameIndex])
{
columnNameType = csv::internals::type_name(item.first);
cout << columnName << " has " << item.second << " elements of type: " << columnNameType << endl;
}
Could you please tell me what I should use instead of type_name in csv::internals::type_name(item.first) to get the same effect? v1.3.0 doesn't have a member with that name in csv::internals.
Currently, CSVRow objects store their data in a contiguous string. However, a separate vector of index positions (size_t) is also maintained so we know where every individual field starts.
Creating this vector is responsible for the majority of calls to new and is a significant source of CPU overhead. Whoever is responsible for this task should either
on Windows
Visual Studio 2017
Library version 1.3.0
Trying to parse this file https://www.kaggle.com/austinreese/craigslist-carstrucks-data
Assertion failed!
Program: ...x64\Debug\Fileparse.exe
.. \csv.hpp
Line: 883
Expression: pos < size()
A uint16_t should handle values from 0 to 65535; however, when a string containing the value 65535 is parsed using CSVRow.get<uint16_t>(), it throws a C++ overflow error exception.
Add the ability to use unsigned integer types with CSVField::get<>(). Currently, attempting to do so will fail a static_assert.
Windows 10, Visual Studio 2017
My data
EMPLOYEEKEY FIRSTNAME HIREDATE LASTNAME TITLE
2 Kevin 2006-08-26 Brown Marketing Assistant
3 Roberto 2007-06-11 Tamburello Engineering Manager
4 Rob 2007-07-05 Walters Senior Tool Designer
5 Rob 2007-07-05 Walters Senior Tool Designer
6 Thierry 2007-07-11 D'Hers Tool Designer
7 David 2007-07-20 Bradley Marketing Manager
8 David 2007-07-20 Bradley Marketing Manager
9 JoLynn 2007-07-26 Dobney Production Supervisor - WC60
10 Ruth 2007-08-06 Ellerbrock Production Technician - WC10
My Code
void f0()
{
csv::Reader foo;
foo.configure_dialect("my_dialect")
.delimiter("\t")
.quote_character('"')
.double_quote(true)
.skip_initial_space(false)
.trim_characters(' ', '\t')
// .ignore_columns("foo", "bar")
.header(true)
.skip_empty_rows(true);
foo.read("sample.csv");
auto rows = foo.rows();
for (auto& row : rows)
{
auto key = row["EMPLOYEEKEY"];
auto fname = row["FIRSTNAME"];
auto hdate = row["HIREDATE"];
auto lname = row["LASTNAME"];
auto title = row["TITLE"];
std::cout << key << " " << fname << " " << hdate << " " << lname << " " << title << "\n";
}
}
Output:
Kevin 2006-08-26 Brown Marketing Assistant
Roberto 2007-06-11 Tamburello Engineering Manager
Rob 2007-07-05 Walters Senior Tool Designer
Rob 2007-07-05 Walters Senior Tool Designer
Thierry 2007-07-11 D'Hers Tool Designer
David 2007-07-20 Bradley Marketing Manager
David 2007-07-20 Bradley Marketing Manager
JoLynn 2007-07-26 Dobney Production Supervisor - WC60
Ruth 2007-08-06 Ellerbrock Production Technician - WC10
Hello. It looks like if a row starts with an empty field, that field and all the subsequent empty fields get initialized to the first non-empty field in the row.
E. g. parsing
category,subcategory,project name
,,foo-project
bar-category,,bar-project
gives
row 0
0 foo-project
1 foo-project
2 foo-project
row 1
0 bar-category
1
2 bar-project
Example code used is
std::string csvString(R"(category,subcategory,project name
,,foo-project
bar-category,,bar-project
)");
auto format = csv::CSVFormat();
csv::CSVReader reader(format);
reader.feed(csvString);
reader.end_feed();
auto rowNum = 0;
for (auto row: reader) {
qDebug() << "row" << rowNum;
auto colNum = 0;
for (auto col: row) {
qDebug() << colNum << col.get<>().c_str();
colNum += 1;
}
rowNum += 1;
}
return 0;
The CSVFormat object is below
CSVFormat csvFileFormat;
csvFileFormat.column_names("name,age");
csvFileFormat.strict_parsing();
The data in csv file is like
A,10
B,5000
C,30
D,100
E,100,5000
Note that the fifth row has three columns.
I am creating a CSV reader object, but it is not catching the std::runtime_error during the reader's construction:
try {
CSVReader reader(filePath,csvFileFormat);
}catch(std::runtime_error& e){
std::cout << "Error" << std::endl;
}
My main motive is to get all the malformed records present inside the csv file so that I can write those malformed records in a separate file.
With -Werror, I get the following compiler error:
csv-parser/src/csv_reader.cpp: In member function ‘void csv::CSVGuesser::second_guess()’:
csv-parser/src/csv_reader.cpp:82:14: error: variable ‘current_delim’ set but not used [-Werror=unused-but-set-variable]
char current_delim;
^
I observed that the col_names vector is overwritten when I pass in multiple delimiters.
This is because the CSVReader::CSVReader(csv::string_view filename, CSVFormat format) constructor overrides the format object that's passed in.
clang++ --version
clang version 8.0.0-3 (tags/RELEASE_800/final)
Target: x86_64-pc-linux-gnu
gives these warnings with the latest master.
clang++ -O2 -Wall -Wshadow -std=c++17 -o build/corr corr.cpp -lpthread
In file included from corr.cpp:3:
/home/oliver/c/leet/include/csv.hpp:4151:9: warning: explicitly defaulted move constructor is implicitly deleted [-Wdefaulted-function-deleted]
CSVReader(CSVReader&&) = default; // Move constructor
^
/home/oliver/c/leet/include/csv.hpp:4302:20: note: move constructor of 'CSVReader' is implicitly deleted because field 'feed_lock' has a deleted move constructor
std::mutex feed_lock; /**< Allow only one worker to write */
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_mutex.h:97:5: note: 'mutex' has been explicitly marked deleted here
mutex(const mutex&) = delete;
^
In file included from corr.cpp:3:
/home/oliver/c/leet/include/csv.hpp:4153:20: warning: explicitly defaulted move assignment operator is implicitly deleted [-Wdefaulted-function-deleted]
CSVReader& operator=(CSVReader&& other) = default;
^
/home/oliver/c/leet/include/csv.hpp:4302:20: note: move assignment operator of 'CSVReader' is implicitly deleted because field 'feed_lock' has a deleted move assignment operator
std::mutex feed_lock; /**< Allow only one worker to write */
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_mutex.h:98:12: note: 'operator=' has been explicitly marked deleted here
mutex& operator=(const mutex&) = delete;
-Wno-defaulted-function-deleted suppresses them, obviously.
This file crashes the parser. The file is parsed correctly in Excel and OpenOffice/LibreOffice.
My code:
void f0()
{
csv::CSVFormat format;
format.delimiter(',').quote('"').header_row(0);
csv::CSVReader reader("problem.txt",format);
auto column_names = reader.get_col_names();
std::cout << column_names.size() << std::endl;
for (auto& cv : column_names)
{
std::cout << cv << "\t";
}
std::cout << "\n";
for (csv::CSVRow& row : reader)
{
std::cout << row.size() << std::endl;
for (auto& rv : row)
{
std::cout << rv << "\t";
}
std::cout << "\n";
}
}
The data:
ACCOUNT_TYPE,ACCOUNT_NUMBER,TRANSACTION_DATE,CHEQUE_NUMBER,DESCRIPTION1,DESCRIPTION2,CAD,USD
Chequing,07451-1007186,1/2/1987,,"Bill Payment","Purchase Order",-4.00,,
Saving,07451-1007186,1/29/1987,,"Account Payable Pmt","Mac 6000 INCO",210424.25,,
Chequing,07451-1007186,2/1/1987,,"Misc Payment","Purchase Order",-200.00,,
Chequing,07451-1007186,2/5/1987,,"Membership fees","VAT-Y 4007633",-917.33,,
Chequing,07451-1007186,2/5/1987,,"Membership fees","TXINS 4007659",-950.69,,
Saving,07451-1007186,2/26/1987,,"Account Payable Pmt","Mac 6000 INCO",79034.35,,
Chequing,07451-1007186,2/28/1987,,"Membership fees","VAT-Y 7453902",-7905.02,,
Chequing,07451-1007186,2/28/1987,,"Membership fees","TXINS 7454013",-823.93,,
Chequing,07451-1007186,3/1/1987,,"Bill Payment","Purchase Order",-8.00,,
Saving,07451-1007186,3/4/1987,,"Online transfer sent - 1872","Great Outdoors",-17000.00,,
Hi, I found that CSVStat::get_mins in include/internal/csv_stat.cpp (commit 6323ff8) crashes with the attached .csv file (test.csv). I think this may be related to lines 41-42 of include/internal/csv_stat.cpp. The crash was observed on Ubuntu 18.04.3 with kernel 4.15.0-72-generic on x86_64.
The crash can be reproduced by the following command:
$./csv_stat test.csv
Here's the crash stack trace taken with GDB:
#0 0x00005555555da731 in std::__1::allocator::construct<long double, long double const&> (this=,
__p=0x555555a2e520, __args=) at /home/cockatiel01/LLVM/bin/../include/c++/v1/memory:1811
#1 std::__1::allocator_traits<std::__1::allocator >::__construct<long double, long double const&> (__a=...,
__p=0x555555a2e520, __args=) at /home/cockatiel01/LLVM/bin/../include/c++/v1/memory:1716
#2 std::__1::allocator_traits<std::__1::allocator >::construct<long double, long double const&> (__a=...,
__p=0x555555a2e520, __args=) at /home/cockatiel01/LLVM/bin/../include/c++/v1/memory:1562
#3 std::__1::vector<long double, std::__1::allocator >::__push_back_slow_path<long double const&> (
this=0x7fffffffdbc0, __x=) at /home/cockatiel01/LLVM/bin/../include/c++/v1/vector:1613
#4 0x00005555555c5821 in std::__1::vector<long double, std::__1::allocator >::push_back (this=,
__x=) at /home/cockatiel01/LLVM/bin/../include/c++/v1/vector:1632
#5 csv::CSVStat::get_mins (this=0x7fffffffdc50)
at /home/jihyunee/ang-csv-parser/csv-parser-fast/include/internal/csv_stat.cpp:52
#6 0x0000555555570604 in main (argc=, argv=)
at /home/jihyunee/ang-csv-parser/csv-parser-fast/programs/csv_stats.cpp:15
This crash was found with the Angora fuzzer, and test.csv originated from ints_join.csv in the tests/data/fake_data directory.
Hope this helps.
test.csv.zip
version 1.3.1
on Windows 10
enum class VariableColumnPolicy {
THROW = -1,
IGNORE = 0,
KEEP = 1
};
The constant IGNORE clashes with
#define IGNORE 0 // Ignore signal
in WinBase.h.
Currently, CSVReader rejects all rows that are not the same size as the predetermined header row. This causes issues when parsing CSV files which are not quite up to spec. Although it is possible to handle weird rows by creating a subclass of CSVReader and overriding CSVReader::bad_row_handler, that's kind of annoying.
CSVFormat will get a new method called allow_variable_lengths(false). CSVReader will then simply not perform row length checking until read_row() is called. This may even lead to performance improvements, as the nested if/else branches in CSVReader::write_record will no longer be necessary.
For the default case (reject rows of a different length), CSVReader will behave as it has before, i.e. bad rows are tossed out and ignored with no user intervention.
If a user wants to keep rows of different lengths but still use CSVReader's format-guessing ability, then when iterating over the read rows, the library will provide a size() method (and potentially others such as is_weird_length(), is_shorter(), etc.) so that the user can tell which rows are malformed.
If "foobar" is the name of the 16th column, and some malformed row has fewer than 16 columns, then row["foobar"] shall result in an error being thrown.
If a CSV mostly has 16 columns but some row has more than 16 columns, then the extra columns should only be retrievable using operator[](size_t) and not operator[](string). The CSVRow iterator should iterate through all entries of shorter and longer rows without crashing.
cmake complains about Doxygen being missing but goes on without it OK. During the make, I get this:
[ 45%] Building CXX object programs/CMakeFiles/csv_generator.dir/csv_generator.cpp.o
/home/chris/Apps/CsvParser/programs/csv_generator.cpp:2:10: fatal error: charconv: No such file or directory
#include <charconv>
^~~~~~~~~~
compilation terminated.
programs/CMakeFiles/csv_generator.dir/build.make:62: recipe for target 'programs/CMakeFiles/csv_generator.dir/csv_generator.cpp.o' failed
This is with clang++ 6.0.0-1ubuntu2. That little ^ pointer is actually under "<" in the header reference.
BTW, what package includes that header?
Thanks
madGambol
First of all, thanks for doing this library.
When compiling files in single_include_test directory, the following compilation errors occurred:
$ g++ -pthread --std=c++14 -o file1 file1.cpp
In file included from my_header.hpp:2:0,
from file1.cpp:1:
csv.hpp:3975:28: error: enclosing class of constexpr non-static member function ‘bool csv::CSVRow::iterator::operator==(const csv::CSVRow::iterator&) const’ is not a literal type
constexpr bool operator==(const iterator& other) const {
^~~~~~~~
csv.hpp:3945:15: note: ‘csv::CSVRow::iterator’ is not literal because:
class iterator {
^~~~~~~~
csv.hpp:3945:15: note: ‘csv::CSVRow::iterator’ has a non-trivial destructor
csv.hpp:3979:28: error: enclosing class of constexpr non-static member function ‘bool csv::CSVRow::iterator::operator!=(const csv::CSVRow::iterator&) const’ is not a literal type
constexpr bool operator!=(const iterator& other) const { return !operator==(other); }
^~~~~~~~
To fix the errors above, I've just modified the following lines (the commented lines are the originals):
csv.hpp
class iterator {
......
/** Two iterators are equal if they point to the same field */
//| constexpr bool operator==(const iterator& other) const {
inline bool operator==(const iterator& other) const {
return this->i == other.i;
};
//| constexpr bool operator!=(const iterator& other) const { return !operator==(other); }
inline bool operator!=(const iterator& other) const { return !operator==(other); }
file1.hpp
//|int foobar(int argc, char** argv) {
int main(int argc, char** argv) {
using namespace csv;
The file2.hpp is ok.
Add functions that take a filename as input, parse the data types of every column, and generate a CREATE TABLE command.
Suggested databases to support:
CSVStats can be used to determine the proper data types.
Let's say the total number of header columns is 5. From the above scenario, we can say that the second and third records are invalid: the second record has more columns than the header, and the third record has fewer.
I want to get these error records so that I can dump them in some new file.
Please help.
Something is wrong with the master branch. I am unable to build (using cmake) in a clean environment.
docker run -it ubuntu bash
> apt-get update -y && apt-get upgrade -y
> apt-get install -y build-essential doxygen git cmake make python3-dev python3
> git clone https://github.com/vincentlaucsb/csv-parser.git
> cd csv-parser
> mkdir build
> cd build
> cmake -DCSV_CXX_STANDARD=11 ../
> make
This is the initial error message that appears:
/csv-parser/programs/data_type_bench.cpp:1:10: fatal error: charconv: No such file or directory
#include <charconv>
^~~~~~~~~~
compilation terminated.
programs/CMakeFiles/data_type_bench.dir/build.make:62: recipe for target 'programs/CMakeFiles/data_type_bench.dir/data_type_bench.cpp.o' failed
make[2]: *** [programs/CMakeFiles/data_type_bench.dir/data_type_bench.cpp.o] Error 1
CMakeFiles/Makefile2:184: recipe for target 'programs/CMakeFiles/data_type_bench.dir/all' failed
make[1]: *** [programs/CMakeFiles/data_type_bench.dir/all] Error 2
Makefile:94: recipe for target 'all' failed
make: *** [all] Error 2
I see the same error when trying to build for C++17.
I am wondering whether csv-parser will support C++11 some time later?
Thanks.