Comments (10)

vincentlaucsb avatar vincentlaucsb commented on May 11, 2024

Your header row has 7 commas (8 columns) while your data rows have 8 commas (9 columns).

You'll either want to strip out the extra commas before sending it through the CSVReader or create your own subclass of CSVReader which overrides bad_row_handler such as here: https://github.com/vincentlaucsb/csv-parser/blob/master/include/internal/csv_reader.cpp#L34
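To illustrate the first option (stripping the extra commas before the file reaches CSVReader), here is a minimal standard-library sketch. It assumes the extra comma sits at the end of each data row, which the thread does not actually state, and it splits naively on commas without handling quoted fields. The helper names `split_fields` and `trim_to_width` are hypothetical and not part of csv-parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split one CSV line on commas. Naive on purpose: quoted fields that
// contain commas are NOT handled (assumption for this sketch).
std::vector<std::string> split_fields(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    for (char c : line) {
        if (c == ',') {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);  // last field (empty if the line ends with ',')
    return fields;
}

// Drop trailing EMPTY fields until the row is no wider than `width`
// (the header's field count). Non-empty extra fields are left alone,
// so no data is silently discarded.
std::string trim_to_width(const std::string& line, std::size_t width) {
    std::vector<std::string> fields = split_fields(line);
    while (fields.size() > width && fields.back().empty()) {
        fields.pop_back();
    }
    std::string out;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) out += ',';
        out += fields[i];
    }
    return out;
}
```

With a three-column header, `trim_to_width("1,2,3,", 3)` yields `"1,2,3"`, while a row whose extra field carries data is returned unchanged so the mismatch stays visible.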

from csv-parser.

bangusi avatar bangusi commented on May 11, 2024

OK. But I would have thought the library should not silently fail when encountering such files.
BTW, I did not create this file; it was generated by a bank, so it is a legitimate example of the CSV files one may encounter in real life. As mentioned above, office applications such as LibreOffice are able to handle it.

vincentlaucsb avatar vincentlaucsb commented on May 11, 2024

I agree there should probably be a warning when the majority of rows are rejected.
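A rough sketch of what such a check could look like, independent of csv-parser's internals: count how many data rows match the header's field count and flag the file when more rows are rejected than accepted. The names `RowAudit`, `field_count`, `audit_rows`, and `majority_rejected` are hypothetical, and the comma counting ignores quoted fields.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Tally of data rows that do / do not match the header's width.
struct RowAudit {
    std::size_t accepted = 0;
    std::size_t rejected = 0;
};

// Number of fields in a line, counted as commas + 1.
// Assumption: no quoted fields containing commas.
std::size_t field_count(const std::string& line) {
    return static_cast<std::size_t>(
               std::count(line.begin(), line.end(), ',')) + 1;
}

// Treat lines[0] as the header and compare every later line against it.
RowAudit audit_rows(const std::vector<std::string>& lines) {
    RowAudit audit;
    if (lines.empty()) return audit;
    const std::size_t expected = field_count(lines[0]);
    for (std::size_t i = 1; i < lines.size(); ++i) {
        if (field_count(lines[i]) == expected) {
            ++audit.accepted;
        } else {
            ++audit.rejected;
        }
    }
    return audit;
}

// True when more data rows were rejected than accepted,
// i.e. the point at which a warning would be worth emitting.
bool majority_rejected(const RowAudit& audit) {
    return audit.rejected > audit.accepted;
}
```

A caller could print a warning to `std::cerr` whenever `majority_rejected(audit_rows(lines))` is true instead of dropping the rows quietly.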

However, I don't think it's a reasonably formatted CSV file. I don't know why you'd add an extra comma to your data rows when there's not one in your header row. I've heard of it before, but it's not present in the majority of CSV files.

This CSV parser is also capable of handling it; you just need to tell it that it's a weird file. I think the easiest way to do that is to turn the header row off, use auto-detection, and use bad_row_handler to capture the header row. I built this parser mainly to load CSV files into databases or run data analysis on them, which is why it's a bit anal about the column count.

bangusi avatar bangusi commented on May 11, 2024

Another suggestion would be to provide a configuration option that allows the user to ignore extra rows. I have seen this feature in other utilities that import files into databases.
It is not practical to modify the library for every weird case.

bangusi avatar bangusi commented on May 11, 2024

I just noticed that this issue #66 is supposed to address the same problem that I reported in #77.
Looking forward to the enhancement.
So-called "malformed rows" are actually not rare in CSV files.

vincentlaucsb avatar vincentlaucsb commented on May 11, 2024

The new version should handle variable-length rows in a first-class manner.

bangusi avatar bangusi commented on May 11, 2024

I'm wondering what the expected behaviour should be after the fix.
Now, instead of crashing, the library simply fails silently (it stops processing the file).
Is there a configuration change I need to make?
All I did was download version 1.3.0 and then run it against the same input file that was crashing it.

vincentlaucsb avatar vincentlaucsb commented on May 11, 2024

Try passing in a CSVFormat on which you have set variable_columns(true).

Example: https://github.com/vincentlaucsb/csv-parser#handling-variable-numbers-of-columns

bangusi avatar bangusi commented on May 11, 2024

Yes, now works.

vincentlaucsb avatar vincentlaucsb commented on May 11, 2024

Good to hear.
