Stream files from the network (papaparse, closed)

mholt commented on August 17, 2024
Stream files from the network

Comments (4)

mholt commented on August 17, 2024

Some preliminary testing shows that this is not too difficult. Fortunately, both the FileReader API and AJAX requests are asynchronous, so the groundwork is already laid to parse a CSV file asynchronously, one chunk at a time. We just need to build in support for something like this:

$.ajax("/plu_codes.csv", {
    type: "GET",
    headers: {
        "Range": "bytes=0-1024"
    }
}).done(function(data) {
    // ... treat data just as if it were a chunk read from the FileReader
});
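
The byte arithmetic here is easy to get wrong because HTTP byte ranges are inclusive at both ends: `bytes=0-1024` asks for 1,025 bytes, not 1,024. A minimal sketch of the bookkeeping in plain JavaScript; `rangeHeader` and `parseContentRange` are hypothetical helpers, not Papa Parse functions. It assumes the server honors `Range` and replies `206 Partial Content` with a `Content-Range` header such as `bytes 0-1023/146515`, whose last number (the total size) tells the client when to stop asking for chunks.

```javascript
// Hypothetical helpers, not part of Papa Parse.
// HTTP byte ranges are inclusive: "bytes=0-1023" is exactly 1024 bytes.
function rangeHeader(start, chunkSize) {
    var end = start + chunkSize - 1;
    return "bytes=" + start + "-" + end;
}

// Parses a Content-Range response value such as "bytes 0-1023/146515".
// Returns null if the value is missing or malformed; total is null when
// the server reports an unknown length ("bytes 0-1023/*").
function parseContentRange(value) {
    var m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value || "");
    if (!m) return null;
    return {
        start: Number(m[1]),
        end: Number(m[2]),
        total: m[3] === "*" ? null : Number(m[3])
    };
}
```

With a 1,024-byte chunk size, `rangeHeader(0, 1024)` gives `"bytes=0-1023"` and `rangeHeader(1024, 1024)` gives `"bytes=1024-2047"`; the next request starts at `end + 1` from the parsed `Content-Range`.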

mholt commented on August 17, 2024

Thinking out loud here...

Already in the current version of Papa Parse, downloading a file and parsing it (assuming it fits comfortably in memory, say under 100 MB) is as easy as this:

$.get("some_file.csv", function(data) {
    var results = $.parse(data);
});

But that doesn't "stream" the file: if the file is too big for the browser tab to handle (even 1 GB, say), this would just cause the browser to lock up. To download and parse huge files while keeping Papa easy to use, what about invoking Papa like this, so that it uses the Range header as described above:

$.parse("some_file.csv", {
    ajax: true,
    step: function(data, jqxhr) {
        console.log(data.results);
    }
});

So you specify ajax: true in the config object to tell Papa that the string you gave it is the path of a CSV file to download with a GET request. If you also specify a step function, as we have here, it uses the Range header to stream the file a chunk at a time.
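
Streaming in chunks has one catch: a byte range almost never ends exactly on a row boundary, so the tail of each chunk is usually a partial row that has to be carried over to the next one. The sketch below shows that buffering with a hypothetical `makeRowBuffer` helper. It is deliberately simplified: it assumes rows end in `\n` or `\r\n`, that no quoted field contains a comma or line break (a real CSV parser has to track quote state across chunks too), and that each chunk is already-decoded text. A byte range can also split a multi-byte UTF-8 character, which this sketch does not handle.

```javascript
// Hypothetical helper: turns arbitrary text chunks into complete rows.
// Simplification: naive comma split, no quoted commas or line breaks.
function makeRowBuffer(onRow) {
    var leftover = ""; // incomplete last line carried from the previous chunk

    function emit(line) {
        // Tolerate Windows line endings by dropping a trailing "\r".
        if (line.charAt(line.length - 1) === "\r") line = line.slice(0, -1);
        onRow(line.split(","));
    }

    return {
        push: function (chunk) {
            var lines = (leftover + chunk).split("\n");
            leftover = lines.pop(); // the last piece may be a partial row
            lines.forEach(emit);
        },
        // Call once after the final chunk to flush a last row with no newline.
        finish: function () {
            if (leftover !== "") emit(leftover);
            leftover = "";
        }
    };
}
```

Each call to `push` emits only the rows it can complete, so a row split as `"40"` + `"11,Banan"` + `"a\r\n"` is reported once, whole, when its newline finally arrives.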

Two things to work out still:

  1. When doing AJAX parsing, the call to $.parse is asynchronous, so it needs a callback function. This is similar to how files are already parsed (you supply a complete callback). How should this work?
  2. The underlying AJAX requests aren't customizable. I'm worried that letting users pass in a config object for $.ajax would make it easy for them to break Papa unintentionally. In other words, the target file had better be accessible with a simple GET request. It's a tradeoff I think I'm willing to make, but I'll accept feedback if anyone has it.
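
For point 1, one option is the shape file parsing already has: `step` runs once per chunk, `complete` runs once at the end, and the download loop runs asynchronously in between. The sketch below shows only that control flow; `streamRanges` and its `fetchRange(start, end)` parameter are hypothetical. `fetchRange` must return a Promise of `{ text, total }`, with `end` inclusive and `total` the file size taken from `Content-Range`; in a browser it would wrap `$.ajax` or `fetch` with a `Range` header. Error handling and row reassembly are omitted.

```javascript
// Hypothetical driver: fetches consecutive byte ranges one at a time,
// calls step() for each chunk and complete() once after the last one.
// fetchRange(start, end) must return a Promise of { text, total },
// where end is inclusive and total is the full file size in bytes.
function streamRanges(fetchRange, chunkSize, callbacks) {
    function next(start) {
        var end = start + chunkSize - 1;
        return fetchRange(start, end).then(function (res) {
            callbacks.step(res.text);
            // Stop once this range reached (or passed) the last byte.
            if (end + 1 >= res.total) {
                callbacks.complete();
                return;
            }
            return next(end + 1); // request the following range
        });
    }
    return next(0);
}
```

Taking the request function as a parameter is also one possible middle ground on point 2: the default could stay a plain GET, while anyone who needs extra headers supplies their own `fetchRange`.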

Since this is for 3.0, I'm willing to make big breaking changes to keep Papa easy to use.

mholt commented on August 17, 2024

Okay, I think I've resolved both those things.

$.get.parse("files/asdf.csv", {
    config: {
        step: function(data, handle) {
            console.log(data, handle);
            // handle gives access to pause(), resume(), jqxhr, etc.
        }
    },
    complete: function(data) {
        console.log("Done!");
    }
});

Calling $.get.parse tells Papa that the string it is given should be downloaded via a GET request and then parsed. The second argument has basically the same object structure as when you parse a file, which resolves number (1) from above.
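
The `handle` passed to `step` is what lets a caller apply back-pressure: if processing a chunk is slow, `step` can call `pause()` and later `resume()`, and no further chunk is requested in between. A minimal sketch of such a handle, not Papa Parse's actual implementation; `createStreamer`, `start`, and the `nextChunk(cb)` source (which calls `cb(text)` for each chunk and `cb(null)` at the end) are hypothetical names. Pausing stops new requests; a chunk already in flight is still delivered to `step`.

```javascript
// Hypothetical sketch of a pausable chunk streamer (not Papa Parse's code).
// nextChunk(cb) must eventually call cb(text), or cb(null) when no data is left.
function createStreamer(nextChunk, step, complete) {
    var paused = false;
    var waiting = false;  // true while a chunk request is in flight
    var finished = false;

    var handle = {
        pause: function () { paused = true; },
        resume: function () {
            if (!paused) return;
            paused = false;
            pump();
        }
    };

    function pump() {
        if (paused || waiting || finished) return;
        waiting = true;
        nextChunk(function (text) {
            waiting = false;
            if (text === null) {
                finished = true;
                complete();
                return;
            }
            step(text, handle); // step may call handle.pause()
            pump();             // no-op if step paused the stream
        });
    }

    handle.start = pump;
    return handle;
}
```

Because the next request is only issued from `pump`, a paused stream simply stops asking for data; `resume` picks up where it left off.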

Number (2) is resolved because I've decided that the AJAX request Papa makes will be a simple GET request. However, the internal functions that make network requests, read files, and do the parsing will be exposed so users can use them at a lower level if desired.

mholt commented on August 17, 2024

Still have some tweaking and optimizing to do, but this is now done.
