
Comments (6)

Jeehut commented on May 21, 2024

When working with (potentially) large text files, it's a good idea to read the file line by line to get all the benefits mentioned in the README. But if you know your file isn't too big to handle, or if you don't care, you can always use this initializer to create a CSVImporter object from a String. To do this, you need to read the contents of the CSV file yourself. That gives you a String object which you can also use to get the total number of lines. The code could look something like this:

let contentString = try! String(contentsOfFile: "path/to/your/file.csv")
let totalLinesCount = contentString.components(separatedBy: CharacterSet.newlines).count
let importer = CSVImporter<[String: String]>(contentString: contentString)

You can also see this example in the tests here.

The above code is a workaround, though, and might not work perfectly depending on the line endings of your file. As you can see here, we already have the lines somewhere within CSVImporter, but the property isn't public, so you can't read it.
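To illustrate the line-ending caveat: `components(separatedBy: CharacterSet.newlines)` splits on individual Unicode scalars, so a Windows-style `\r\n` yields an extra empty component, and a trailing line break adds one more. A minimal sketch of a count that avoids both effects is shown below. The helper name `countLines(in:)` is hypothetical and not part of CSVImporter, and the sketch still counts line breaks inside quoted fields as separate lines.

```swift
/// Counts the lines in `text`, treating "\n", "\r\n" and "\r" as one line break each.
/// Hypothetical helper for illustration; it is not part of CSVImporter.
func countLines(in text: String) -> Int {
    // Swift models "\r\n" as a single Character, so it acts as one separator here.
    let pieces = text.split(omittingEmptySubsequences: false) { $0.isNewline }
    // A trailing line break (or an empty string) leaves one empty piece at the end.
    if let last = pieces.last, last.isEmpty {
        return pieces.count - 1
    }
    return pieces.count
}
```

Empty lines in the middle of the text are still counted, so the result may differ from the number of data rows if the file contains blank lines.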

To add official support for the total number of lines, I think we could add a public computed property to CSVImporter that returns an Optional. It could look like this:

public var totalDataLinesCount: Int? {
    // Only a String-based source keeps all lines in memory; otherwise the count is unknown.
    guard let stringSource = source as? StringSource else { return nil }
    return stringSource.lines.count
}

It would only work if you initialize CSVImporter with a String, but it would make sure you don't get into trouble with line endings.

@ambujpunn Would you like to add this feature with test and send a PR? 😃

from csvimporter.

ambujpunn commented on May 21, 2024

@Dschee Wouldn't this only work when loading an entire CSV file into one huge string? Ideally, we'd like to keep and extend the awesome behavior of CSVImporter, which is to read line by line rather than storing everything somewhere first.


Jeehut commented on May 21, 2024

Well, there's a logical problem there though, isn't there? I mean, if you wanna read a file "line by line", then you can't know how many lines the file has, since you haven't read the entire file yet, no? What you could do is guess the total number of lines based on the file size. But as this is not accurate by any means, I'd rather not include such a feature in CSVImporter. It's gonna result in this.

If you have any other idea of how we could do this, then please, explain and I'll consider adding it.


loukrieg commented on May 21, 2024

Just a suggestion, but perhaps a separate API could be added that would iterate through the file in chunks, so everything wouldn't need to be in memory at once, just counting the line endings (not within quoted strings).
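A rough sketch of what such a chunked counter might look like follows. Everything in it is an assumption for illustration: the names `CSVLineCounter` and `countCSVLines(atPath:chunkSize:)` are hypothetical and not part of CSVImporter, the file is assumed to be UTF-8 (or another ASCII-compatible encoding), and fields are assumed to be quoted with `"` and to escape quotes by doubling them.

```swift
import Foundation

/// Counts CSV lines incrementally, ignoring line breaks inside quoted fields.
/// Hypothetical type for illustration; it is not part of CSVImporter.
struct CSVLineCounter {
    private(set) var lineCount = 0
    private var insideQuotes = false
    private var previousWasCR = false
    private var hasPendingContent = false  // bytes seen since the last line break

    /// Feeds the next chunk of raw bytes; state carries over between chunks.
    mutating func consume(_ chunk: Data) {
        for byte in chunk {
            switch byte {
            case 0x22:  // double quote; an escaped "" toggles twice, so the state is unchanged
                insideQuotes.toggle()
                hasPendingContent = true
                previousWasCR = false
            case 0x0A where !insideQuotes:  // "\n"
                if !previousWasCR { lineCount += 1 }  // "\r\n" was already counted at "\r"
                hasPendingContent = false
                previousWasCR = false
            case 0x0D where !insideQuotes:  // "\r"
                lineCount += 1
                hasPendingContent = false
                previousWasCR = true
            default:
                hasPendingContent = true
                previousWasCR = false
            }
        }
    }

    /// Call once after the last chunk; counts a final line that has no terminator.
    mutating func finish() -> Int {
        if hasPendingContent {
            lineCount += 1
            hasPendingContent = false
        }
        return lineCount
    }
}

/// Reads the file in fixed-size chunks so it never has to be in memory as a whole.
func countCSVLines(atPath path: String, chunkSize: Int = 64 * 1024) -> Int? {
    guard let handle = FileHandle(forReadingAtPath: path) else { return nil }
    defer { handle.closeFile() }
    var counter = CSVLineCounter()
    while true {
        let chunk = handle.readData(ofLength: chunkSize)
        if chunk.isEmpty { break }
        counter.consume(chunk)
    }
    return counter.finish()
}
```

Because the quote and carriage-return state lives in the struct, a quoted field or a `\r\n` pair that straddles a chunk boundary is still handled. The file is nevertheless read once just for counting, in addition to the pass that processes the data.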


Jeehut commented on May 21, 2024

Yeah, that could be possible. But it would still mean that the file is traversed twice: once for checking the total number of lines and once for actually processing the data. Of course, in some cases this might not be a problem, so as long as the documentation is very clear about the performance drawback, I'd be happy to merge this feature into CSVImporter. Any volunteers? 'Cause I won't have much time in the coming months, maybe sometime in December ...


Jeehut commented on May 21, 2024

I'm closing this feature request, as not many people seemed to be interested in it and there's a workaround available by checking the file manually. Feel free to post a PR if you want this feature and are ready to implement it yourself.

