d99kris / rapidcsv

C++ CSV parser library

License: BSD 3-Clause "New" or "Revised" License

CMake 3.05% C++ 94.62% Shell 2.03% Batchfile 0.30%

csv-parser c-plus-plus library linux macos windows c-plus-plus-11 utf8 utf16

rapidcsv's Issues

msvc compiler warnings

warning GB70A83FB: use of old-style cast [-Wold-style-cast]
        if (rowIdx >= (int) mData.size())

I'm seeing a lot of these warnings when building with MSVC.
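
The warning targets the C-style cast `(int) mData.size()`. A minimal sketch of the usual fix, replacing it with `static_cast`; `IsRowOutOfRange` and its parameters are hypothetical names used for illustration, not rapidcsv code:

```cpp
#include <string>
#include <vector>

// Hypothetical helper mirroring the flagged comparison; not part of rapidcsv.
// static_cast spells out the unsigned-to-signed conversion explicitly, which
// is what -Wold-style-cast asks for.
bool IsRowOutOfRange(int rowIdx, const std::vector<std::vector<std::string>>& data)
{
    return rowIdx >= static_cast<int>(data.size());
}
```

The behavior is unchanged; only the cast syntax differs.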

GetRow index

Does index 0 start from the header, or does it start with the first row after the header?

MSVC error upon compiling simple main.cpp

Hello! I'm trying to use your library to parse intraday and daily stock data and I get the following errors. (I removed some of my file paths)

[main] Building folder: avapi avapi_test
[build] Starting build
[proc] Executing command: ".../cmake.EXE" --build .../avapi/build --config Debug --target avapi_test -- /maxcpucount:14
[build] Microsoft (R) Build Engine version 16.8.3+39993bd9d for .NET Framework
[build] Copyright (C) Microsoft Corporation. All rights reserved.
[build] 
[build]   main.cpp
[build] .../rapidcsv.h(1199,47): error C2589: '(': illegal token on right side of '::'
[build] .../rapidcsv.h(1199,1): error C2062: type 'unknown-type' unexpected
[build] .../rapidcsv.h(1199,1): error C2059: syntax error: ')'
[build] Build finished with exit code 1
[main] Failed to prepare executable target with name 'undefined'

Here is the main.cpp I tried to compile for parsing the intraday data.

#include "../inc/avapi.h"
#include "../inc/rapidcsv.h"
#include <iostream>
#include <vector>

int main()
{
    rapidcsv::Document doc("../../data/intraday_data.csv");
    
    std::vector<float> col = doc.GetColumn<float>("open");
    
    for (auto &i : col) {
        std::cout << i << '\n';
    }
    std::cout << std::endl;
    return 0;
}

Here is the main.cpp I tried to compile for parsing the daily data.

#include "../inc/avapi.h"
#include "../inc/rapidcsv.h"
#include <iostream>
#include <vector>

int main()
{
    rapidcsv::Document doc("../../data/daily_data.csv",
                           rapidcsv::LabelParams(0, 0));

    std::vector<float> close = doc.GetRow<float>("2021-02-12");
    std::cout << "Read " << close.size() << " values." << std::endl;
    return 0;
}

And here is the CMakeLists.txt

cmake_minimum_required(VERSION 3.0.0)

project(avapi_test LANGUAGES CXX)


set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

find_package(CURL CONFIG REQUIRED)

set(PROJECT_SOURCES
        src/main.cpp
        src/avapi.cpp
        inc/avapi.h
        inc/rapidcsv.h
)

add_executable(avapi_test ${PROJECT_SOURCES})
target_link_libraries(avapi_test PRIVATE CURL::libcurl)

I just realized that clang-format ran on your header file when I saved it. Would that mess anything up in this case?
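
Error C2589 at a `::` followed by `(` is commonly seen with MSVC when `<windows.h>` (possibly pulled in indirectly by another header) defines `min` and `max` as function-like macros ahead of code that calls `std::min`, `std::max`, or `std::numeric_limits<T>::max()`. Whether that is the cause here is an assumption, since line 1199 of the header is not shown. The sketch below simulates those macros locally and shows the parenthesized-call workaround; the function names are hypothetical.

```cpp
#include <algorithm>
#include <limits>

// Simulate the function-like macros that <windows.h> defines unless NOMINMAX
// is set (assumption: this is what breaks the build reported above).
// For demonstration only; real code should not define these names.
#define max(a, b) (((a) > (b)) ? (a) : (b))
#define min(a, b) (((a) < (b)) ? (a) : (b))

// Wrapping the name in parentheses stops the preprocessor from treating it as
// a macro invocation, so the real library functions are called.
int LargestInt()
{
    return (std::numeric_limits<int>::max)();
}

int Smaller(int a, int b)
{
    return (std::min)(a, b);
}

#undef max
#undef min
```

The other usual remedy is to define `NOMINMAX` before the first include of `<windows.h>`, which keeps the macros from being defined at all.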

Improvement on CSV content parse.

I reviewed the source code of rapidcsv and have a suggestion for the code in lines 952-988.

if (buffer[i] == '"')
{
    if (cell.empty() || quoted)
    {
        quoted = !quoted;
    }
    else
    {
        // Throw exception for unpaired content.
    }
    cell += buffer[i];
}
else
{
    if (quoted)
    {
        cell += buffer[i];
    }
    else if (buffer[i] == mSeparatorParams.mSeparator)
    {
        row.push_back(cell);
        cell.clear();
    }
    else if (buffer[i] == '\r')
    {
        ++cr;
    }
    else if (buffer[i] == '\n')
    {
        ++lf;
        row.push_back(cell);
        cell.clear();
        mData.push_back(row);
        row.clear();
        quoted = false; // disallow line breaks in quoted string, by auto-unquote at linebreak
    }
}

Unicode support

I tested a file encoded as UTF-16 LE. The parser skips the first column name in this case.

[Feature] allow std::vector<char> column load

From the code, the focus appears to be on CSV files that contain numbers (int, long, float, double). In my case I have a lot of data in columns of just one char each (time-series CSV discretized using SAX), so my proposal is to include code to convert from char and obtain a std::vector<char>.

Add option to ignore comment lines

I often have csvs with a header, like this:

# some description of the data
# some more blablabla
# N, M, K
123,456,2
432,253,8

NumPy and Pandas have a parameter to ignore comment lines by specifying a comment character; in this case it would be #.

It would be really great to have this feature in rapidcsv.
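
Until such an option exists, one workaround is to filter the input before it reaches the parser. A minimal sketch, assuming comment lines start with the comment character in the first column; `StripCommentLines` is a hypothetical helper, not part of rapidcsv:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical pre-filter (not a rapidcsv API): returns the input text with
// every line that starts with pComment removed.
std::string StripCommentLines(std::istream& pInput, char pComment = '#')
{
    std::string filtered;
    std::string line;
    while (std::getline(pInput, line))
    {
        if (!line.empty() && line[0] == pComment)
        {
            continue; // skip comment line
        }
        filtered += line;
        filtered += '\n';
    }
    return filtered;
}
```

The filtered text can then be wrapped in a `std::istringstream` and handed to a stream-based parser. In the example above the column names (`# N, M, K`) sit inside a comment line, so they would be dropped along with the other comments.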

reading a csv stream from aws s3

Hi there, I am trying to read a CSV file on S3 using rapidcsv. To this end, I have a working snippet, which is simply this:
https://docs.aws.amazon.com/code-samples/latest/catalog/cpp-s3-get_object.cpp.html
and with it I can read the stream.

I would now like to use rapidcsv to get them into vectors and I have tried doing this:

...
    if (get_object_outcome.IsSuccess())
    {
        auto& retrieved_file = get_object_outcome.GetResultWithOwnership().
            GetBody();

      std::stringstream sstream(retrieved_file);
      rapidcsv::Document doc(sstream, rapidcsv::LabelParams(-1, -1));
...

but obviously this does not seem right :( Any clues on what I should be doing here would be great.

Thank you.

Getter most precise

I would like to get the names of the columns and rows from rapidcsv...

For now I have implemented this myself in my Konectik tool (ANN algorithm). Here is an example from my code:

CSVLoader::CSVLoader(string filepath, string separator){
    std::ifstream in(filepath);
    if (in.is_open()){
        string line;
        getline(in, line);
        in.close();
        vector<string> headercolnames;
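        size_t position = line.find(separator);
        while (position != string::npos){
            headercolnames.push_back(line.substr(0,position));//here I collect names of cols header...
            line = line.substr(position+separator.size());//skip the whole separator
            position = line.find(separator);
        }
        if (!line.empty()){
            headercolnames.push_back(line);//the last column name has no trailing separator
        }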
        rapidcsv::Document doc(filepath, rapidcsv::LabelParams(0, -1));
        int long npos = 0;
        for (auto it=headercolnames.begin(); it!=headercolnames.end(); ++it){
            vector<float> col = doc.GetColumn<float>((*it));
            //XDATA[lpos][npos] = val;
            int long lpos = 0;
            for (auto itc=col.begin(); itc!=col.end();  ++itc){
                if ((*it).find("y")==0){
                    YDATA[lpos][npos] = (*itc);
                }
                else {
                    XDATA[lpos][npos] = (*itc);
                }
                lpos++;
            }
            npos++;
        }
    }
    else{
        cerr << filepath << " CANNOT BE OPEN !" << endl;
        exit(1);
    }
}

If I have misunderstood something or overlooked an existing feature, please tell me.

Why debugging says ‘stoi’ is not a member of ‘std’?

This is my c_cpp_properties.json:

{
    "configurations": [
        {
            "name": "Linux",
            "includePath": [
                "${workspaceFolder}/**"
            ],
            "defines": [],
            "compilerPath": "/usr/bin/gcc",
            "cStandard": "c11",
            "cppStandard": "c++17",
            "intelliSenseMode": "gcc-x64",
            "compileCommands": "${workspaceFolder}/build/compile_commands.json"
        }
    ],
    "version": 4
}

However, the following error is reported in rapidcsv.h:
‘stoi’ is not a member of ‘std’ (about 78 errors)

#include <codecvt>
#include <BaseTsd.h>
typedef SSIZE_T ssize_t;

don't work. Thanks.

single threaded check

Hi, I just wanted to make sure that this code works on a single thread, without spinning off multiple threads. Would someone be able to confirm?

GetNextRow

Hi, I'm looking at the GetRow function and realising that it does a linear search through all rows to get the specific row. This can be non-optimal sometimes and would like to request a GetNextRow() feature that would just increment from the previous row.

Column not found exception. I don't know why

This CSV file

"VarName";"TimeString";"VarValue";"Validity";"Time_ms"
"TT_TK-001";"31/07/2020 11:34:43";0;0;44043482447,9051
"TT_TK-001";"31/07/2020 11:35:14";0;0;44043482796,0532
"TT_TK-001";"31/07/2020 11:44:43";48,6292;1;44043489384,1204
"TT_TK-001";"31/07/2020 11:45:13";48,19878;1;44043489730,9259
"$RT_OFF$";"31/07/2020 11:45:26";0;2;44043489885,5324
"TT_TK-001";"31/07/2020 11:52:34";25,58232;1;44043494838,6111
"TT_TK-001";"31/07/2020 11:53:04";25,54253;1;44043495185,4977
"$RT_OFF$";"31/07/2020 11:53:12";0;2;44043495274,5602
"$RT_COUNT$";9;;;;

and this code

rapidcsv::Document doc(path_2_csv.generic_string().c_str(),
                       rapidcsv::LabelParams(0, 0),
                       rapidcsv::SeparatorParams(';'));

auto varname_vector = doc.GetColumn("VarName");

throws the following exception at rapidcsv.h line 512

template<typename T>
std::vector<T> GetColumn(const std::string& pColumnName) const
{
    const ssize_t columnIdx = GetColumnIdx(pColumnName);
    if (columnIdx < 0)
    {
        throw std::out_of_range("column not found: " + pColumnName);
    }
    return GetColumn<T>(columnIdx);
}

Do you know why? Is it because of the last line?

Thanks

Confusing exception when a cell contains a newline

A CSV file threw an exception ("invalid vector subscript") when I called:

document.GetColumn<std::string>(someIndex);

This exception was confusing to me. someIndex was less than the result returned by document.GetColumnCount(), so I didn't understand what the problem was and had to debug the code to figure it out.

It turns out that the CSV file has a newline \n character in the middle of a quoted cell. So, if I set pQuotedLinebreaks to true in my SeparatorParams, it fixes the problem.

But, this was really non-obvious to me, and it seems strange that rapidcsv doesn't do any validation when parsing to catch that a row has the wrong number of cells and then assumes in GetColumn() that the number will be correct. The way the behavior currently works makes it seem like there is a problem with GetColumn(), when really the problem is with the source data.

I would suggest, in ParseCsv(std::istream& pStream, std::streamsize p_FileLength), some kind of check whenever mData.push_back(row) is about to be called to verify that row.size() == GetColumnCount() (or similar), and if it doesn't then an exception could be thrown. That would help identify what the problem really is (whether it's the result of a newline or just bad data) rather than having parsing apparently succeed but then unexpected errors happen when the results are used.
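
The suggested check can be sketched independently of the library. The function below is a hypothetical illustration (not rapidcsv code) that validates parsed rows held as a vector of string vectors and reports the first row whose cell count differs from the first row's:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical post-parse check (not rapidcsv code): throws as soon as a row
// has a different number of cells than the first row, naming that row.
void CheckRowWidths(const std::vector<std::vector<std::string>>& pData)
{
    if (pData.empty())
    {
        return;
    }
    const std::size_t expected = pData.front().size();
    for (std::size_t i = 1; i < pData.size(); ++i)
    {
        if (pData[i].size() != expected)
        {
            throw std::runtime_error("row " + std::to_string(i) + " has " +
                                     std::to_string(pData[i].size()) +
                                     " cells, expected " + std::to_string(expected));
        }
    }
}
```

An error raised at this point names the offending row, rather than surfacing later as an out-of-range access inside a column getter.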

Insert Rows and Columns - A Thank you!!!

Assuming you are interested in adding them, here is code for inserting rows and columns. Guess you will want to tweak things a bit as well as making sure I didn't mess up something. Tried to keep things in line with your other code. Check on the cCount thing I did in the row insert. Probably a better way since you will know what you are doing.

Thanks again.

jim

    /**
     * @brief   Insert row by index. Inserts at position so the new row will precede the specified row.
     * @param   pRowIdx               zero-based row index.
     */
    void InsertRow(const size_t pRowIdx, const string defValue = "", const int cCount = 0)
    {
        const ssize_t rowIdx = pRowIdx + (mLabelParams.mColumnNameIdx + 1);
        vector<string> insVector(GetColumnCount()+cCount, defValue);
        mData.insert(mData.begin() + rowIdx, insVector);
    }

    /**
     * @brief   Insert row by name. Inserts at position so the new row will precede the specified row.
     * @param   pRowName              row label name.
     */
    void InsertRow(const std::string& pRowName, const std::string newRowName = "", const string defValue = "")
    {
        ssize_t rowIdx = GetRowIdx(pRowName);
        if (rowIdx < 0)
        {
            throw std::out_of_range("row not found: " + pRowName);
        }

        InsertRow(rowIdx, defValue, 1);
        SetRowName(rowIdx, newRowName);
    }



    /**
     * @brief   Insert column by index. Inserts at the specified position so the new column will precede the column currently there.
     * @param   pColumnIdx            zero-based column index.
     */
    void InsertColumn(const size_t pColumnIdx, const string defValue = "")
    {
        const ssize_t columnIdx = pColumnIdx + (mLabelParams.mRowNameIdx + 1);
        for (auto itRow = mData.begin(); itRow != mData.end(); ++itRow)
        {
            itRow->insert(itRow->begin() + columnIdx, defValue);
        }
    }

    /**
     * @brief   Insert column by name. Inserts at the specified position so the new column will precede the column currently there.
     * @param   pColumnName           column label name.
     */
    void InsertColumn(const std::string& pColumnName, const string newColName = "", const string defValue = "")
    {
        ssize_t columnIdx = GetColumnIdx(pColumnName);
        if (columnIdx < 0)
        {
            throw std::out_of_range("column not found: " + pColumnName);
        }

        InsertColumn(columnIdx, defValue);
        SetColumnName(columnIdx,newColName);
    }

Using String for Input

First. Thanks for all the work on this project. It is nicely done.

Second. Hopefully I am not missing something obvious and wasting your time but given the following:

string txt_all = "1,2,3,4,5,6";
rapidcsv::Document csv_doc(txt_all, rapidcsv::LabelParams(-1, -1),
                           rapidcsv::SeparatorParams(delimiter[0] /* pSeparator */,
                                                     false /* pTrim */,
                                                     rapidcsv::sPlatformHasCR /* pHasCR */,
                                                     false /* pQuotedLinebreaks */,
                                                     true /* pAutoQuote */));

Can you help me see why this fails? I get a RunTime Library error. "This application has requested the Runtime to terminate in an unusual way...."

Using the exact same code with a file works fine. I only have this problem with a "string". Thanks.

Jim
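
A likely cause, stated as an assumption: since the same call works when given a file, the `std::string` argument is probably being interpreted as a file path, so the CSV text itself is looked up as a file name. Wrapping the text in a string stream avoids that. The sketch shows only the wrapping step using the standard library; `CountCells` is a hypothetical stand-in for any parser that accepts `std::istream&`.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a stream-based parser: counts the cells on the
// first line of pStream, split on pSeparator.
std::size_t CountCells(std::istream& pStream, char pSeparator)
{
    std::string line;
    if (!std::getline(pStream, line) || line.empty())
    {
        return 0;
    }
    std::size_t cells = 1;
    for (char c : line)
    {
        if (c == pSeparator)
        {
            ++cells;
        }
    }
    return cells;
}

// Wrap in-memory CSV text in a stream so it is parsed as data, not as a path.
std::size_t CountCellsInText(const std::string& pText, char pSeparator)
{
    std::istringstream stream(pText);
    return CountCells(stream, pSeparator);
}
```

With rapidcsv, the equivalent would be to build a `std::stringstream` from `txt_all` and pass that stream as the first constructor argument, in the same form as the `rapidcsv::Document doc(sstream, rapidcsv::LabelParams(-1, -1));` call quoted in the S3 question earlier in this list.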

Handle empty rows/columns

I am parsing a CSV that has empty columns (no column heading, no data, just columns) and getting this error

terminate called after throwing an instance of 'std::invalid_argument'
  what():  stod
[1]    20474 abort (core dumped)  ./test_trip

Which makes perfect sense, but is there a good workaround (apart from indexing the columns by column IDs)?
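
One possible workaround is to read the affected cells as strings and convert them with a helper that tolerates empty input. A minimal sketch; `CellToDouble` is a hypothetical name, and mapping unparsable cells to NaN is a design choice made for this illustration:

```cpp
#include <cmath>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a rapidcsv API): converts a cell to double and
// returns NaN for cells that std::stod rejects, such as empty strings.
// std::out_of_range (value too large for double) is deliberately not caught.
double CellToDouble(const std::string& pCell)
{
    try
    {
        return std::stod(pCell);
    }
    catch (const std::invalid_argument&)
    {
        return std::numeric_limits<double>::quiet_NaN();
    }
}
```

Callers can then test results with `std::isnan` to skip the empty cells.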

Add support for access original data for advanced process and load csv from string.

Thanks for your work on rapidcsv; I'm using it in my project.
I have some new requirements for advanced usage of rapidcsv, and I hope they can help improve it.

I suggest adding another function to load CSV data. Currently there are two ways to call ReadCSV: one takes the CSV filename as a string, the other takes an istream. Another common case is loading CSV from a string object, which I think should be added to rapidcsv for user convenience.
I added two new functions to access the original CSV data (mData in rapidcsv), which gives more powerful access to the original data. I am using the following code.
/**
 * @brief   Get Cell Data by index.
 * @param   pRowIdx               zero-based row index.
 * @param   pColumnIdx            zero-based column index.
 * @returns cell data.
 */
std::string GetCellData(const size_t pColumnIdx, const size_t pRowIdx) const
{
    const ssize_t columnIdx = pColumnIdx + (mLabelParams.mRowNameIdx + 1);
    const ssize_t rowIdx = pRowIdx + (mLabelParams.mColumnNameIdx + 1);
    return mData.at(rowIdx).at(columnIdx);
}

/**
 * @brief   Get Cell Data by name.
 * @param   pColumnName           column label name.
 * @param   pRowName              row label name.
 * @returns cell data.
 */
std::string GetCellData(const std::string& pColumnName, const std::string& pRowName) const
{
    const ssize_t columnIdx = GetColumnIdx(pColumnName);
    if (columnIdx < 0)
    {
        throw std::out_of_range("column not found: " + pColumnName);
    }
    const ssize_t rowIdx = GetRowIdx(pRowName);
    if (rowIdx < 0)
    {
        throw std::out_of_range("row not found: " + pRowName);
    }

    return GetCellData(columnIdx, rowIdx);
}

[Bug] apparent index error

There appears to be an index error. This is my test case:

sample.csv:

A,B,C
1,2,3
1,2,3
1,2,3
1,2,3

test.cpp

# include "rapidcsv.h"

int main(){
    rapidcsv::Document doc("sample.csv");
    doc.GetColumn<int>(0);
    doc.GetColumn<int>(1);
    doc.GetColumn<int>(2);

    return 0;
}

error:

$ g++ -pipe -std=c++11 -pedantic -Wall -Wextra -fexceptions -g test.cpp -o test
$ ./test 
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 3) >= this->size() (which is 3)
Aborted

Modern CMake?

Would you be willing to accept a patch to modernize CMake usage? Like defining a target instead of variables containing include paths?

I would be willing to create such a patch, just checking whether there's a reason like backwards compatibility for keeping it that way :-)

stod exception

Hello,
Thank you so much for this library ! It's amazing 👍
I struggle reading my first document though. I receive a stod exception, and I don't exactly know why.

my sample.csv:

ID,latitude,longitude
A003411,‐12.5799,134.3092
ABTC28343,‐12.8833,135.4667

the code:

      rapidcsv::Document doc("sample.csv");
      std::vector<std::string> IDs = doc.GetColumn<std::string>("ID");
      std::cout << IDs.at(0) << std::endl;
      std::vector<double> lats = doc.GetColumn<double>("latitude");
      std::cout << lats.at(0) << std::endl;
      std::vector<double> longs = doc.GetColumn<double>("longitude");

Output:

A003411
stod

I don't really know what's wrong :/ Thanks !
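
One thing worth checking: the minus signs in the sample above appear to be the Unicode hyphen U+2010 rather than the ASCII `-` (U+002D). `std::stod` does not accept that character and throws `std::invalid_argument`, which would match the output shown. A minimal sketch of detecting and normalizing such characters before conversion; both helpers are hypothetical and assume UTF-8 input:

```cpp
#include <stdexcept>
#include <string>

// Returns true if std::stod accepts a leading number in pCell.
bool ParsesAsDouble(const std::string& pCell)
{
    try
    {
        std::stod(pCell);
        return true;
    }
    catch (const std::logic_error&) // invalid_argument or out_of_range
    {
        return false;
    }
}

// Hypothetical pre-processing step (not part of rapidcsv): replaces the UTF-8
// encodings of U+2010 (hyphen) and U+2212 (minus sign) with ASCII '-'.
std::string NormalizeMinus(std::string pText)
{
    static const char* const kLookalikes[] = { "\xE2\x80\x90", "\xE2\x88\x92" };
    for (const char* lookalike : kLookalikes)
    {
        const std::string from(lookalike);
        std::string::size_type pos = 0;
        while ((pos = pText.find(from, pos)) != std::string::npos)
        {
            pText.replace(pos, from.size(), "-");
            pos += 1; // continue after the inserted '-'
        }
    }
    return pText;
}
```

Fixing the source file, for example by re-exporting it with plain ASCII minus signs, removes the need for this step.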

Get column names

Hi, this is a request to add a func called GetColumnNames() that would return the column names in the csv. I notice there is already a variable called mColumnNames.

[Feature] allow the use of GetRowCount and GetColumnCount methods, making them public

I have a use case where the input CSV has a variable number of columns, and I need to load them as vectors.

Currently, I think that I need to do something like this:

#include "rapidcsv/src/rapidcsv.h"
#include <iostream>

int main(){
    rapidcsv::Document doc(rapidcsv::Properties("sample.csv", 0, -1));

    std::vector<std::vector<char>> db;

    std::vector<char> col;
    unsigned int i;
    bool stop;

    for(i = 0, stop = 0; ! stop; ++i)
    {
        try
        {
            std::cout << i << " of unknown" << std::endl;
            col  = doc.GetColumn<char>(i);
            db.push_back(col);
        }
        catch(std::out_of_range)
        {
            std::cout << "Cannot obtain column " << i << std::endl;
            stop = 1;
        }
    }

    return 0;
}

to know at runtime that there are just 3 columns:

0 of unknown
1 of unknown
2 of unknown
3 of unknown
Cannot obtain column 3

when I could use the (currently private) doc.GetColumnCount():

# include "rapidcsv/src/rapidcsv.h"

#include <iostream>

int main(){
    rapidcsv::Document doc(rapidcsv::Properties("sample.csv", 0, -1));
    std::cout << "columns: " << doc.GetColumnCount() << std::endl;
    std::cout << "rows: " << doc.GetRowCount() << std::endl;

    std::vector<std::vector<char>> db;

    std::vector<char> col;
    unsigned int i, cols = doc.GetColumnCount();

    for(i = 0; i < cols; ++i)
    {
        std::cout << i << " of " << cols << std::endl;
        col  = doc.GetColumn<char>(i);
        db.push_back(col);
    }

    return 0;
}

and obtain:

columns: 3
rows: 5
0 of 3
1 of 3
2 of 3

PS: the data used is the following:

A,B,C
a,b,c
a,b,c
a,b,c
a,b,c

Adjust input/output parameter order

Swap reference and const reference argument order. It's a good practice to put all modified parameters at the end of function argument list.

rapidcsv is not that rapid!

Thanks for this module. However, I expected it to be faster than it is.

I changed the main character addition line from:
cell = cell + buffer[i]; to
cell += buffer[i];
and on Visual Studio 2017 64 bit it ran twice as fast!
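
A plausible explanation for the difference: `cell = cell + buffer[i]` builds a temporary copy of the whole string for every character, while `cell += buffer[i]` appends in place. The sketch below puts the two forms side by side; the function names are hypothetical and the loop is a simplified stand-in for the parser loop.

```cpp
#include <string>

// Appends by building a temporary: each iteration copies the whole cell.
std::string CopyAppend(const std::string& pBuffer)
{
    std::string cell;
    for (char c : pBuffer)
    {
        cell = cell + c;
    }
    return cell;
}

// Appends in place: amortized constant time per character.
std::string InPlaceAppend(const std::string& pBuffer)
{
    std::string cell;
    for (char c : pBuffer)
    {
        cell += c;
    }
    return cell;
}
```

Both functions return the same string; only the amount of copying differs, which grows with cell length in the first form.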

Is there a plan to release conan packages?

Load New Document using same variable.

I think I have everything working that I wanted to do, with one exception. I probably have a somewhat unique use case where I need to load a document into a global variable so I can make calls from other functions, but the situation is such that I cannot pass the document as a parameter. This works fine apart from the fact that I can't load a new file. It keeps the old one. Seems odd and I can't find any way to load another file.

Also related: how can I clear the contents from memory? If someone loads a large file and has finished with it, I would like to clear it. The answer to the first part will probably answer this one.

Again related and perhaps will also be answered by the first one. How can I load a document with the preferred settings to a global variable. It appears I have to do the following to load the document with settings which, of course, creates a local version. I have, sort of, worked around some of the issue by creating 4 different versions globally and one generic instance and have the user supply an option parameter and based on that use the Load() method and then copy the loaded one to the generic one. This seems like it shouldn't be that complicated so guessing I am missing something obvious.

rapidcsv::Document doccsv(istringstream(""), rapidcsv::LabelParams(-1, -1),
                          rapidcsv::SeparatorParams(',' /* pSeparator */,
                                                    false /* pTrim */,
                                                    rapidcsv::sPlatformHasCR /* pHasCR */,
                                                    false /* pQuotedLinebreaks */,
                                                    true /* pAutoQuote */));

Thank you again. Hopefully I am not wasting your time as I so blatantly did last time.

Jim

Add more examples to README

Proposed examples to be added:

List supported datatypes for Get/Set-functions, i.e. int, long, long long, unsigned, unsigned long, unsigned long long, float, double, long double, char, std::string.
How to construct a Document from a std::string containing CSV data, using stringstream - see #25
Examples showing reading three common CSV file types (with table illustrating file content); no headers, column headers, both column and row headers.

cmake option for building tests

Hi, I'm pulling in the library through FetchContent_Declare (CMake), and as a consequence all the tests become part of the build as well. It would be great if an option were added to CMakeLists.txt to prevent the tests from being added.

Sort headers in alphabetic order

The includes should be sorted alphabetically.

Reading line by line and ignoring empty lines

Hello

Is there an example of reading a file line by line?

Also how can I handle empty lines

e.g.

x, y, z
1,2,3
4,5,6

7,8,9
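
One way to handle empty lines without changing the parser is to drop them before parsing. A minimal sketch; `DropBlankLines` is a hypothetical helper, not part of rapidcsv, and it assumes blank lines never occur inside quoted cells:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical pre-filter (not a rapidcsv API): returns the input text without
// lines that are empty or contain only spaces, tabs, or a carriage return.
std::string DropBlankLines(std::istream& pInput)
{
    std::string result;
    std::string line;
    while (std::getline(pInput, line))
    {
        if (line.find_first_not_of(" \t\r") == std::string::npos)
        {
            continue; // empty or whitespace-only line
        }
        result += line;
        result += '\n';
    }
    return result;
}
```

The same filter also removes a trailing empty line at the end of a file. The result can be wrapped in a `std::istringstream` and passed to a stream-based parser.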

Using a different seperator?

Hey, I really love this project. It makes my life so much easier.
So far the rapidcsv.h header file has worked really well with files that are actually comma-separated. However, I could not find any parameters that would allow me to use separators like ';' or ' '.
So far I have only found a way to solve this for writing, by modifying the code around line 598. Do you have any suggestion for how different separators could be supported?

Assign empty string using std::string() rather than ""

It's better to replace const std::string& pPath = "" by const std::string& pPath = std::string() to avoid unnecessary type conversion.

Document copy constructor should not be explicit

The Document copy constructor should not be explicit, because that makes it pretty much useless. For example, it needs to be called implicitly when returning a Document by value from a function.

The explicit qualifier should be removed, or even better, the entire copy constructor could actually just be removed, because it's just a trivial copy constructor that the compiler will implicitly declare anyway.
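
The effect can be shown with two small stand-in types (hypothetical, unrelated to the real Document class). Returning by value uses copy-initialization, which does not consider explicit constructors, and `std::is_convertible` reports exactly that:

```cpp
#include <type_traits>

// Stand-in with an explicit copy constructor, as described above.
struct ExplicitCopy
{
    ExplicitCopy() {}
    explicit ExplicitCopy(const ExplicitCopy&) {}
};

// Stand-in that leaves the copy constructor to the compiler.
struct ImplicitCopy
{
    ImplicitCopy() {}
};

// is_convertible<T, T> models "can a T be returned by value from a T".
constexpr bool kExplicitReturnable =
    std::is_convertible<ExplicitCopy, ExplicitCopy>::value;
constexpr bool kImplicitReturnable =
    std::is_convertible<ImplicitCopy, ImplicitCopy>::value;

// Compiles because the implicitly declared copy/move constructors are usable.
ImplicitCopy MakeImplicit()
{
    ImplicitCopy doc;
    return doc;
}
```

A function analogous to `MakeImplicit` that returns `ExplicitCopy` from a local variable would be rejected by the compiler, which is the problem the issue describes.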

Make functions of `Converter` class as const

Make the functions of the Converter class const, as they don't modify the class.

Add explicit constructor for struct Properties

struct Properties has default implicit constructor

Can't call GetColumn on First Column

A call to GetColumn(std::string) always results in a std::out_of_range.
Tried this with ex001 and "Date" as the column.
I guess the problem is this: https://github.com/d99kris/rapidcsv/blob/master/src/rapidcsv.h#L1144

return mColumnNames.at(pColumnName) - (mLabelParams.mRowNameIdx + 1);
is returning -1

as mColumnNames.at(pColumnName) = 0 and mLabelParams.mRowNameIdx + 1 = 1
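
The arithmetic quoted above can be worked through in isolation. In the sketch below the function name is hypothetical; it mirrors only the quoted expression and assumes that a row-label index of -1 means "no row-label column", as in the `LabelParams(-1, -1)` calls used elsewhere in this list.

```cpp
// Mirrors the expression quoted above; the function name is hypothetical.
// pRawColumnIdx is the position of the label in the header row, pRowNameIdx
// is the column holding row labels (assumed -1 when there is none).
long DataColumnIdx(long pRawColumnIdx, long pRowNameIdx)
{
    return pRawColumnIdx - (pRowNameIdx + 1);
}
```

With row labels in column 0, the first header (raw index 0) maps to 0 - (0 + 1) = -1, which is the "not found" value, while the second header maps to data column 0. With a row-label index of -1, the first header maps to 0 - 0 = 0 and is readable as data.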

Doesn't support number in scientific notation

For example, if a number is written as 2.00E-07, the code does not recognize this scientific notation and will return 0.000000 instead when printed out as float.
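
One possible explanation, stated as an assumption because the report does not show the printing code: the value parses correctly but is printed in fixed notation with six decimals, where 2.00E-07 rounds to 0.000000. The sketch uses only the standard library; the function names are hypothetical.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Parses a cell with std::stod, which accepts scientific notation.
double ParseCell(const std::string& pCell)
{
    return std::stod(pCell);
}

// Fixed notation with six decimals, the same format printf("%f") uses.
std::string FormatFixed(double pValue)
{
    return std::to_string(pValue);
}

// Scientific notation keeps small magnitudes visible.
std::string FormatScientific(double pValue)
{
    std::ostringstream out;
    out << std::scientific << std::setprecision(2) << pValue;
    return out.str();
}
```

If the parsed value compares greater than zero but prints as 0.000000, the output format is the culprit rather than the parser.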

How to ignore empty lines at end of file?

Given the following CSV

COLUMN_A,COLUMN_B
value 1,value 2
value 3,value 4

Calling document.GetColumn<std::string>("COLUMN_A") works but document.GetColumn<std::string>("COLUMN_B") throws an exception:

  what():  vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)

If I remove the empty line at the document's end, it works as expected. How can I tell rapidcsv to automatically ignore the empty line at the end?

Why GetRowIdx is private?

Hi, I use rapidcsv with ImGui to show csv as a table.

I want to highlight the row in the table that corresponds to a row name.
To do so, GetRowIdx needs to be a public method, but it is not.
I think making GetRowIdx public would be good for use with other C++ libraries.
Could you tell me the reason the method is private?
If there is no reason, I'll send a PR to make it public.

rapidcsv is very convenient for processing CSV in C++, and I want to help improve this library :)

First column not being read

When I read a csv file as follows:
std::vector picseq = doc.GetColumn("picseq");

where picseq is the first column, I get a column not found error. It's attached, renamed with a .txt extension.
I seemed to have fixed it with the changes below, dropping some "+ 1" - however, let me stress I do not know what I am doing - I don't really understand the code and I just tried to make bad indices look good, and I have not investigated possible unwanted consequences of this change.
Thanks for your attention and your code.

@@ -454,7 +454,7 @@ namespace rapidcsv
     template<typename T>
     std::vector<T> GetColumn(const size_t pColumnIdx) const
     {
-      const ssize_t columnIdx = pColumnIdx + (mLabelParams.mRowNameIdx + 1);
+      const ssize_t columnIdx = pColumnIdx + (mLabelParams.mRowNameIdx);
       std::vector<T> column;
       Converter<T> converter(mConverterParams);
       for (auto itRow = mData.begin(); itRow != mData.end(); ++itRow)
@@ -1351,7 +1351,7 @@ namespace rapidcsv
       {
         if (mColumnNames.find(pColumnName) != mColumnNames.end())
         {
-          return mColumnNames.at(pColumnName) - (mLabelParams.mRowNameIdx + 1);
+          return mColumnNames.at(pColumnName) - (mLabelParams.mRowNameIdx);
         }
       }
       return -1;

VSL.txt

Add method to check if a column exists

It would be nice if rapidcsv had a public method to check whether a column with a specific name exists in a CSV file. So far, we can only load columns with a given name, and if the column does not exist it throws an exception and quits the program.

It would be good to give users of rapidcsv the ability to handle cases where a specific column does not exist.

CSV file with header row at other position

I have a CSV file whose header row is at row 6.

I tried using doc.RemoveRow() to delete the first few rows and then read the values of a column and it doesn't seem to work.

How can I delete rows 1 to 5 programmatically?

TimingSets2-10M.zip

Add support for removing double quotes around cells

This feature suggestion came in via email and I thought it made sense so I'm logging an issue.

Today when using rapidcsv to read a file with quoted cells, the actual double quotes are retained in the read data. So reading a file like
https://github.com/vincentarelbundock/Rdatasets/blob/master/csv/carData/TitanicSurvival.csv which has content like

"","survived","sex","age","passengerClass"
"Allen, Miss. Elisabeth Walton","yes","female",29,"1st"
"Allison, Master. Hudson Trevor","yes","male",0.916700006,"1st"
"Allison, Miss. Helen Loraine","no","female",2,"1st"
"Allison, Mr. Hudson Joshua Crei","no","male",30,"1st"

then getting a cell using GetCell<std::string> the resulting string will contain the double quotes, e.g. "Allison, Miss. Helen Loraine".

The proposed functionality is to enable automatic stripping of the start/end quotes (and escaped double quotes "" should just be a single double quote ").

This functionality should be controlled by a parsing option.

Similarly for writing CSV files, this option should control whether to automatically quote strings with space(s).
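
The reading half of the proposal can be sketched as a standalone function. `Unquote` is a hypothetical helper written for illustration, not the library's implementation: it strips one pair of enclosing double quotes and collapses each escaped `""` into a single `"`.

```cpp
#include <string>

// Hypothetical helper illustrating the proposal (not rapidcsv code): strips
// one pair of enclosing double quotes and collapses "" into ".
std::string Unquote(const std::string& pCell)
{
    if (pCell.size() < 2 || pCell.front() != '"' || pCell.back() != '"')
    {
        return pCell; // not a quoted cell, leave untouched
    }
    std::string result;
    for (std::string::size_type i = 1; i + 1 < pCell.size(); ++i)
    {
        result += pCell[i];
        if (pCell[i] == '"' && i + 2 < pCell.size() && pCell[i + 1] == '"')
        {
            ++i; // skip the second quote of an escaped pair
        }
    }
    return result;
}
```

Unquoted cells such as `29` pass through unchanged, so the function can be applied to every cell of a row.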

Add a trim column name option

Would it be possible to add a trim option for column names?
It would allow reading the following CSV and getting a, b, c as column names, rather than names with a leading space for b and c.

a, b, c
0, 1, 2

I guess it requires updating line 1012 of the file.
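
The trimming step itself is small. A minimal sketch of a helper that could be applied to each column name after parsing; `Trim` is a hypothetical name here, and which characters count as whitespace is an assumption:

```cpp
#include <string>

// Hypothetical helper (not rapidcsv code): removes leading and trailing
// spaces and tabs from pText.
std::string Trim(const std::string& pText)
{
    const char* const whitespace = " \t";
    const std::string::size_type first = pText.find_first_not_of(whitespace);
    if (first == std::string::npos)
    {
        return std::string(); // empty or whitespace-only input
    }
    const std::string::size_type last = pText.find_last_not_of(whitespace);
    return pText.substr(first, last - first + 1);
}
```

Applied to the header above, the names with a leading space become `b` and `c`.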

Make get-methods const

Support for Column Data Type that is string

I have seen that there is an issue that is about supporting column data type that is "char".
How about a string (in C/C++, it means a null terminated char array).

In the attached example csv format file, there are columns that contain strings.

Pin Voltage Levels.txt

Allow GetCell() by row index and column name

It would be convenient to allow direct access to a cell's value by row index and column name like this:

T GetCell(const std::string& pColumnName, const size_t pRowIdx) const;

Motivation: I'm iterating through a CSV with time series and I know the rows are in chronological order. However I only know the column names but cannot rely on their order.

Parse CSV from string/memory buffer

Hi @d99kris, thanks a lot for the great library!

I had a small question: is it possible to use rapidcsv to load CSV text that is already in memory, for example contained in a std::string or a similar memory buffer? This is convenient when the CSV text to parse is not part of a file in a filesystem but has, for example, been obtained over the network.

Not Reading First Column of CSV File

Hello,
First of all thank you for your work on this library it seems pretty awesome.
That said I have an issue that I don't understand:
I have a CSV file like this:

Nodes,Inputs,Outputs,Group,DATA SOURCE
Coatings,Bio-polymers,,1,barron18MarchMeeting.pdf p. 4
Chemicals,,,1,barron18MarchMeeting.pdf p. 4
Health,Lipids,,1,barron18MarchMeeting.pdf p. 4
Personal care,,,1,barron18MarchMeeting.pdf p. 4
.........................................................................

When I try to read it it ignores the first column...
To get it to read it I have to add commas before every value.
Here's my code:

vector<string> nodes;
rapidcsv::Document doc("InternetOfEnergy.csv");

nodes = doc.GetColumn<string>(0);

for (size_t i = 0; i != nodes.size(); ++i)
{
    cout << nodes.at(i) << endl;
}

Any idea why this is happening?
Thank you
Carlo