Comments (12)
Prior art by @jeresig : http://ejohn.org/blog/node-js-stream-playground/
from docs.
modules you should use if you care about error handling (aka "why i dont use .pipe in production"):
other ones I use all the time
good streaming data format modules
- http://npmjs.org/csv-parser
- http://npmjs.org/ndjson
- http://npmjs.org/JSONStream
- http://npmjs.org/tar-stream
from docs.
So, what Jan was alluding to (re: twitter) is that I built a project that initially blew up node (due to buffering so much data in memory) so I moved to streams.
A quick list (off the top of my head) of the main pain points of working with streams (though I had it working in the end):
- There's no real information on when a stream starts or why it might stop
- Large streams with lots of `through` points can randomly stop (I was supposed to be processing half a million objects, but my streaming code would just randomly stop at ~120,000 objects)
- I'd regularly see streams bail halfway through, but I'd never see an actual error (and I wasn't entirely sure where I was supposed to put an error handler)
- Telling if the stream was no longer readable because it hadn't started or because it had finished wasn't clear (there's a `readable` property, but I couldn't find the documentation that went with the property, and I'm wary of relying on guesswork)
- I wrote a peek method to look at the first object in a stream, but it would frequently totally bork my stream
- I had tests working, but production code (wrapped up in promises - obviously bespoke to me)
I used through a lot/exclusively, and found it the simplest way of working.
You can see the project source code that I ended up with here: https://github.com/eHealthAfrica/universal-exporter/
The big issue I kept hitting is that all the examples and demos around streaming were for extremely small datasets, which is fine, but this isn't what streaming solves. Streaming is a perfect match for very large datasets. Buffered programming is less taxing on the brain, so if I'm committing to the new/better way of framing the problem around streams, I need to have examples that are dealing with equally large datasets (and yes, stdin is infinite, but you rarely push a gig or more through it during a demo).
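For a sense of what a larger example could look like, here is a minimal sketch that pushes half a million objects through a transform using only Node core. It assumes a Node version where `stream.pipeline` and `Readable.from` are available; the `records` generator and the `countProcessed` helper are hypothetical names made up for this illustration, not part of the project linked above.

```javascript
const { Readable, Transform, Writable, pipeline } = require('stream')

// Hypothetical source: lazily yields `total` small records, one at a time,
// so the full dataset never has to sit in memory.
function * records (total) {
  for (let id = 0; id < total; id++) yield { id, value: id % 7 }
}

// Push `total` records through a transform and count what reaches the sink.
function countProcessed (total, done) {
  let seen = 0
  pipeline(
    Readable.from(records(total)), // object-mode readable built from the generator
    new Transform({
      objectMode: true,
      transform (record, _enc, cb) {
        cb(null, { id: record.id, doubled: record.value * 2 })
      }
    }),
    new Writable({
      objectMode: true,
      write (_record, _enc, cb) {
        seen++
        cb()
      }
    }),
    (err) => done(err, seen) // called once, on success or on the first error
  )
}

countProcessed(500000, (err, seen) => {
  if (err) throw err
  console.log('processed %d records', seen)
})
```

The point of the sketch is only that the record count can grow without the code changing shape; a real exporter would replace the generator and the sink with actual I/O.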
from docs.
@remy great list! here's some random feedback
- for peeking we use https://www.npmjs.com/package/peek-stream
- definitely use through2 over through
- for error handling, always use `pump` with a callback and `.destroy`
- on debugging, I started this discussion rvagg/through2#33 but it is not resolved yet. the current idea is to monkeypatch the `Stream` base constructor to insert debug info but nobody has done this
in general I think debugging sucks (lack of stream specific debugging tools), but on perf and error handling I am happy
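To make the "one callback for the whole chain" idea concrete, here is a rough sketch using `stream.pipeline` from Node core, which follows the same pattern as `pump` (this assumes a Node version that ships `pipeline`; with `pump` itself the call shape is the same). The `runWithFailure` helper and its `failAt` parameter are hypothetical, made up for this illustration.

```javascript
const { Readable, Transform, Writable, pipeline } = require('stream')

// Run a three-stage object pipeline whose middle stage fails on one value.
function runWithFailure (failAt, done) {
  const received = []
  const source = Readable.from([1, 2, 3, 4, 5])
  const risky = new Transform({
    objectMode: true,
    transform (n, _enc, cb) {
      if (n === failAt) return cb(new Error('cannot process ' + n))
      cb(null, n * 10)
    }
  })
  const sink = new Writable({
    objectMode: true,
    write (n, _enc, cb) {
      received.push(n)
      cb()
    }
  })

  // A single callback covers every stage: it receives the first error raised
  // anywhere in the chain, and pipeline destroys the remaining streams.
  pipeline(source, risky, sink, (err) => done(err, received))
}

runWithFailure(3, (err, received) => {
  console.log(err ? 'pipeline failed: ' + err.message : 'pipeline finished')
  console.log('values that reached the sink:', received)
})
```

With plain `.pipe()` the error from the middle stage would not be forwarded to the other streams, which is one way a pipeline can appear to stop without any visible error.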
from docs.
@maxogden I did look at that peek-stream module and it made no sense to me. It only seemed to solve the specific problem of redirecting the stream. I read through the code several times before - trust me, I definitely did not want to re-invent anything!
Re: through2 vs through - why? When I asked, through2 just seemed to give more control/options and thus complexity.
What I'd love to see: a visualisation of stream pipelines, and some way of injecting a test object into the stream to watch it mutate through the stream(s). ...but I can dream on! :)
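Short of a real visualisation, one low-tech approximation is to drop a logging pass-through stage between the steps and feed a single test object in. The sketch below uses only Node core; `tap`, `map` and `trace` are hypothetical helpers invented for this example, not an existing module.

```javascript
const { Readable, Transform, Writable, pipeline } = require('stream')

// Hypothetical helper: passes objects through untouched, recording a
// snapshot (a JSON copy) of each one under the given stage label.
function tap (label, log) {
  return new Transform({
    objectMode: true,
    transform (obj, _enc, cb) {
      log.push({ stage: label, snapshot: JSON.parse(JSON.stringify(obj)) })
      cb(null, obj)
    }
  })
}

// Hypothetical helper: applies `fn` to every object.
function map (fn) {
  return new Transform({
    objectMode: true,
    transform (obj, _enc, cb) { cb(null, fn(obj)) }
  })
}

// Inject one test object and record what it looks like after each stage.
function trace (testObject, done) {
  const log = []
  pipeline(
    Readable.from([testObject]),
    tap('input', log),
    map((o) => ({ ...o, name: o.name.toUpperCase() })),
    tap('after uppercase', log),
    map((o) => ({ ...o, length: o.name.length })),
    tap('after length', log),
    new Writable({ objectMode: true, write (_o, _enc, cb) { cb() } }),
    (err) => done(err, log)
  )
}

trace({ name: 'remy' }, (err, log) => {
  if (err) throw err
  for (const entry of log) console.log(entry.stage, entry.snapshot)
})
```

This only shows the object at the points where a `tap` was inserted by hand, so it is a debugging aid rather than the tooling asked for above.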
from docs.
@remy ah ok, sorry about that. I'll try and improve the docs. To use it to only peek at the value you could treat it like a transform stream, e.g.
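var fs = require('fs')
var peek = require('peek-stream')
var pump = require('pump')
var read = fs.createReadStream('orig.txt')
var write = fs.createWriteStream('copy.txt')
// peek at the first chunk, then hand the rest of the stream to the writer
var peeker = peek(function (data, swap) {
  console.log(data)
  swap(null, write) // immediately swap; the first argument is an error, if any
})
pump(read, peeker, function (err) { })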
re: through vs through2, all the reasons are related to streams2, the tl;dr of which is that backpressure works differently, and better, in streams2 and above, so it's important to use streams2 modules with other streams2 modules to make sure things get buffered correctly etc
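As a rough illustration of what backpressure means in streams2 and later (this shows the core `Writable` contract, not the internals of through or through2): `write()` returns `false` once the internal buffer reaches `highWaterMark`, and the producer is expected to wait for `'drain'` before writing more. `slowSink` and `writeAll` are hypothetical helpers made up for this sketch.

```javascript
const { Writable } = require('stream')

// A deliberately slow sink: each write completes on a later tick,
// so the internal buffer fills up quickly.
function slowSink (highWaterMark) {
  return new Writable({
    objectMode: true,
    highWaterMark,
    write (_chunk, _enc, cb) { setImmediate(cb) }
  })
}

// Write `total` items, pausing whenever write() returns false and
// resuming on 'drain'. Reports how many times it had to wait.
function writeAll (sink, total, done) {
  let i = 0
  let pauses = 0
  sink.on('finish', () => done(null, pauses))
  sink.on('error', done)

  function next () {
    while (i < total) {
      const ok = sink.write({ n: i++ })
      if (!ok) {
        pauses++
        sink.once('drain', next) // buffer is full: wait before writing more
        return
      }
    }
    sink.end()
  }
  next()
}

writeAll(slowSink(4), 20, (err, pauses) => {
  if (err) throw err
  console.log('waited for drain %d times', pauses)
})
```

When streams are connected with `.pipe()` or `pump`, this pause-and-resume handshake happens automatically, which is why mixing in a module that ignores it can lead to unbounded buffering.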
from docs.
👍 to good stream tutorials! Feel free to do what you'd like with my playground, as well: https://github.com/jeresig/node-stream-playground
If you'd like to host it (!) that'd be super-appreciated, as well. Would be happy to transfer the domain name and everything. Let me know how I can help!
from docs.
@jeresig oooh maybe @finnp would be interested in collaborating there, he has been working on http://www.finnpauls.de/streams-editor/
from docs.
👍 Streams are rather confusing to new devs. There's a lot of magic in there.
We've discussed having separate guides for both using streams and for implementing streams, along with a topic page on understanding streams internals. PRs are of course welcome. Do you think that set of docs pages could cover everything well?
from docs.
The biggest antipattern I've seen in 'implementing streams' education is where people (both teachers and learners) think they have to know all aspects of streams1, 2 and 3, exhaustively list all caveats and properties, methods etc. In reality you don't need to know 80% of that stuff to get started implementing streams, and trying to learn it all at once always fails and causes people to give up. Also I don't think people should require('stream') personally to implement streams (it's just too hard to learn), they should use the abstractions that already exist.
from docs.
I wrote a small article about a real use case for streams in Node.js: https://hackernoon.com/node-js-streams-in-action-1495c22fafec
Maybe somebody will find it useful
from docs.
Closing as this repository is dormant and likely to be archived soon. If this is still an issue, feel free to open it as an issue on the main node repository.
from docs.