
Simple streams feedback · js-git · 64 comments · closed

creationix avatar creationix commented on September 27, 2024
Simple streams feedback


Comments (64)

dominictarr avatar dominictarr commented on September 27, 2024

I approve of consume.

Hmm, if you want this to appeal to a standards committee, you probably should expand the names into something more verbose, with camel case. read -> readNext, stop -> abortStream, consume -> readFrom

Of course, it will need a constructor type, and an isSimpleStream static method.

I'm actually being serious. Add extra stuff so that people can argue about it, and then vote to leave off the extras.


Raynos avatar Raynos commented on September 27, 2024

@dominictarr I think in recent times the W3C standards favor short names.

They probably will need a constructor though.


creationix avatar creationix commented on September 27, 2024

The problem with needing a constructor is it assumes a prototype and dependence on a proper "this" when calling. Also I much prefer defining an interface rather than a concrete type. I don't want to have to subclass Stream to create a stream.


Raynos avatar Raynos commented on September 27, 2024

+1 for avoiding this


creationix avatar creationix commented on September 27, 2024

As far as always requiring that sinks be objects, I think that's unnecessary. A stream is very often a value you pass around and makes sense to be an object because it has two very distinct channels (data and close). But a sink is usually either a standalone helper function in some library or a part of some other object. The only thing I want to standardize is the property name for the sink in the case where it's part of a duplex stream object. I think consume is fine. I chose sink to be consistent with min-streams and also because it's four letters just like read and stop.


Raynos avatar Raynos commented on September 27, 2024

@creationix

var writer = window.FileWriter(location1)
var stream = window.FileStream(location2)
writer.consume(stream)

I have a feeling that the W3C may like that sort of interface more. It also means you can do duck type checks to see whether something is a sink.


Raynos avatar Raynos commented on September 27, 2024

simple streams in generators

The example is while (part = yield stream.read) which feels weird.

Having while (part = yield stream.read()) would be nicer but would murder the API for generators convenience.


creationix avatar creationix commented on September 27, 2024

Yeah, the fact that read is a continuable is just a coincidence, not an API goal.


Raynos avatar Raynos commented on September 27, 2024

read(callback)

We should specify the allowed states. A stream is basically zero or more values followed by an end (an end which is either natural or caused by an error)

stop(err, [callback])

We should specify that once the callback fires any call to .read() will not return any more values. Optionally specify that when you call .stop() and even before the callback fires it should not return any values to the read callback. Basically it would be useful to spec out the relationship between stop() and read()

Also I don't think the callback should be optional.

fileStream.stop(null, function (err) {
  if (err) { /* disk close error */ } 
})

stop() may cause some kind of error when the stream tries to close itself; when that happens it should inform someone.
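To make that relationship concrete, here is a minimal sketch (mine, not from the proposal) of an array-backed source; arraySource is a hypothetical name. Once stop() has been acknowledged, read() only ever reports the end.

// Hypothetical array-backed source following the read/stop semantics above.
function arraySource(items) {
  var index = 0;
  var stopped = false;
  return {
    read: function (callback) {
      if (stopped || index >= items.length) return callback(); // natural end
      callback(null, items[index++]);                           // next value
    },
    stop: function (reason, callback) {
      stopped = true;   // no more values will ever be handed out
      callback(null);   // report a cleanup error here if closing failed
    }
  };
}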

pullTransform(stream) -> stream

I have called a pullTransform a duplex function before, and I would do duplex TCP like

// tcp.createServer := (Number, (Stream) => Stream)
tcp.createServer(8080, function (socket) {
  return socket
});

A more interesting one would be

// tcp.createServer := (Number, (Stream) => Stream)
tcp.createServer(8080, function (socket) {
  return jsonSerialize(app(jsonParse(socket)))
});


creationix avatar creationix commented on September 27, 2024

@Raynos I really like the tcp accepting a transformer instead of just providing a duplex object and ignoring the return value!

I'm fine with stop requiring the callback if you feel strongly about it. It will make the API a little harder to use, but considering it's not a hot path for most programs, that should be fine.

We should specify the allowed states. A stream is basically zero or more values followed by an end (an end which is either natural or caused by an error)

Yes

We should specify that once the callback fires any call to .read() will not return any more values. Optionally specify that when you call .stop() and even before the callback fires it should not return any values to the read callback. Basically it would be useful to spec out the relationship between stop() and read()

This gets tricky in larger chains. The source may be done sending events and indeed never send anything after calling the callback to .stop(), but the layers after it may still have data pending in the pipeline that eventually comes out. The stop channel is very fast and often will be a direct reference to the stop function in the source. The spec should then say:

The source will never output data events after a stop request has been received, but be aware that other layers downstream may still contain data.


creationix avatar creationix commented on September 27, 2024

Though, I guess since technically every layer exports a new "source" interface, it would need to observe the same rule for stop and clear its data queue when stop is called. This makes chaining stop much harder, hmmmm.

module.exports = function (source) {
  // ... code including definition of dataQueue
  return { read: read, stop: stop };
  function stop(err, callback) {
    dataQueue.length = 0;
    source.stop(err, callback);
  }
  function read(callback) {
    // do stuff
  }
}


creationix avatar creationix commented on September 27, 2024

Even truncating dataQueue, there can still be pending read calls to the parent source and a flag would need to be set that tells onRead to ignore the value when it finally returns (or call onRead directly or something otherwise crazy)
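One way to picture that flag (a sketch of mine, not spec text): the filter records that it has been stopped, and any read that was already in flight resolves as an end instead of delivering a stale value.

module.exports = function (source) {
  var stopped = false;
  return { read: read, stop: stop };
  function stop(err, callback) {
    stopped = true;                    // pending reads now resolve as ended
    source.stop(err, callback);
  }
  function read(callback) {
    source.read(function (err, value) {
      if (stopped) return callback();  // swallow values that arrive after stop
      callback(err, value);
    });
  }
};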


dominictarr avatar dominictarr commented on September 27, 2024

Oh, I just realized that this API is nearly identical to the iterator API used in https://github.com/rvagg/node-leveldown; see also https://github.com/dominictarr/async-iterator

We originally adopted this because it would be easy to wrap into either streams1 or streams2, and it modeled the underlying leveldb API pretty closely.

We chose next and end, but same idea otherwise.


dominictarr avatar dominictarr commented on September 27, 2024

regards a consume method, I think consistency trumps necessity.
There are places where you don't care whether something is a complete sink, or just one side of a transform. Also, if a sink is an object, then you can explain sources, explain sinks, and then explain a transform as both a source and a sink.

I started using the word "sink" to designate a stream that has no readable side - this reflects the usage of the word 'sink' in graph theory http://en.wikipedia.org/wiki/Sink_(disambiguation) . having a method "sink" on a transform stream breaks this.


creationix avatar creationix commented on September 27, 2024

@dominictarr oh cool. Do you think it would be confusing to use the same names, but have slightly different semantics? I see leveldown doesn't accept a reason (error) when ending and my data events won't have key and value, just value. Otherwise it seems very close.


creationix avatar creationix commented on September 27, 2024

I think we could get away with not having err in stop/end/close if we slightly changed how error propagation worked. In the new system, end would be used only to notify upstream that we won't be consuming anymore. We could send an appropriate error event downstream at the same time from wherever the error started. Upstream would never know about the error, but I think that's fine. It just needs to know it can safely clean up resources.
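A hedged sketch of how that could look, with a made-up parseLines filter (read pulls data, end notifies upstream): the layer where the error originates ends its upstream without a reason and hands the actual error downstream through its own read callback.

function parseLines(source) {
  return { read: read, end: source.end };
  function read(callback) {
    source.read(function (err, line) {
      if (line === undefined) return callback(err);   // pass ends through
      var value;
      try { value = JSON.parse(line); }
      catch (parseError) {
        return source.end(function () {  // upstream only learns "clean up"
          callback(parseError);          // downstream learns why it broke
        });
      }
      callback(null, value);
    });
  }
}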


dominictarr avatar dominictarr commented on September 27, 2024

hmm, yes I think that leveldown could be made compatible with this proposal. the difference is very slight.


creationix avatar creationix commented on September 27, 2024

@dominictarr

regards a consume method, I think consistency trumps necessity.
There are places where you don't care whether something is a complete sink, or just one side of a transform. Also, if a sink is an object, then you can explain sources, explain sinks, and then explain a transform as both a source and a sink.

I started using the word "sink" to designate a stream that has no readable side - this reflects the usage of the word 'sink' in graph theory http://en.wikipedia.org/wiki/Sink_(disambiguation) . having a method "sink" on a transform stream breaks this.

Having a transform that is an object with both readable and writable ends is interesting. (also much closer to how node works with readable and writable streams)

var source = fs.readStream("input.txt");
var sink = fs.writeStream("output.txt");

function transform() {
  return {
    next: function (callback) { ... },
    end: function (callback) { ... },
    consume: function (stream) { ... }
  }
}

// Very nice chaining API though
sink.consume(transform()).consume(source);

But this has issues. next and end have no meaning and nothing to pull from until consume is called setting up their data source. To enable chaining, consume could return the thing it just consumed.

The sink will start pulling from the transform before it has a chance to connect to its source. This would mean all next functions would need an extra state to handle this early call and defer calling the callback. The nice syntax may or may not be worth this cost. (keep in mind most transforms would already have internal queues, flags and checks. Adding one more isn't too bad in most cases)

var source = fs.readStream("input.txt");
var sink = fs.writeStream("output.txt");

function transform(source) {
  return {
    next: function (callback) { ... },
    end: function (callback) { ... }
  };
}

// Very simple consuming API
// note that I'm still expressing the sink as an object even though the transform is just a function.
sink.consume(transform(source));

If a transform was modeled as a function that accepted a source and returned a new source, it would be more straightforward.

Now duplex streams (as opposed to transform filters) are modeled great as a single object with {next, end, consume} and app logic where symmetry is needed is easy to model.

var jsonCodec = require('json-codec');
var lineCodec = require('line-codec');

tcp.createServer(8080, function (socket) {
  // socket is a "duplex" stream with {next, end, consume}
  // Apply protocol de-framing and framing on the duplex stream
  socket = lineCodec(socket);
  // Apply JSON parsing and Serialization to the duplex stream
  socket = jsonCodec(socket);
  // App is a simple echo server, so echo objects back
  socket.consume(socket);

  // or written as a single expression
  socket.consume(jsonCodec(lineCodec(socket)));
});

But if we make the TCP library act like a filter itself, then socket is no longer duplex.

var json = require('json-codec');
var line = require('line-codec');

tcp.createServer(8080, function (socket) {
  // Written as a sequence of actions
  socket = line.deframe(socket);
  socket = json.decode(socket);
  socket = app(socket);
  socket = json.encode(socket);
  socket = line.frame(socket);
  return socket;

  // Or written as one expression:
  return line.frame(json.encode(app(json.decode(line.deframe(socket)))));
});

function app(stream) {
  // Just an echo server
  return stream;
}


Raynos avatar Raynos commented on September 27, 2024

I'm fine with stop requiring the callback if you feel strongly about it.

I don't mind too much, I think it would make things simpler.

I think we could get away with not having err in stop/end/close if we slightly changed how error propagation worked

That could work nicely, all upstream would do with it is echo it back anyway so that's probably better.


Raynos avatar Raynos commented on September 27, 2024

I am wary of

function transform() {
  return {
    next: function (callback) { ... },
    end: function (callback) { ... },
    consume: function (stream) { ... }
  }
}

That is a function that takes no arguments and returns two things as a result. I am personally going to avoid this and focus on fn(input) -> output instead of fn() -> { input, output }


Raynos avatar Raynos commented on September 27, 2024

We can write a simple series helper like @creationix's series function and @dominictarr's continuable-series module to make working with a chain of duplex function stream things easier.

I am +1 for this as it allows us to model implementation details as functions and embed a helper flow control function in the app code without relying on a simple-stream flow control library.

It also means the implementation of "json-codec" probably has no dependencies, as it's a simple transform function

var json = require('json-codec');
var line = require('line-codec');

tcp.createServer(8080, function (socket) {
  return series(
    json.decode,
    line.deframe,
    app,
    line.frame,
    json.encode
  )(socket)
});

function app(stream) {
  // Just an echo server
  return stream;
}

function series() {
  var args = [].slice.call(arguments)
  return function duplex(stream) {
    for (var i = 0; i < args.length; i++) {
      stream = args[i](stream)
    }
    return stream
  }
}


creationix avatar creationix commented on September 27, 2024

@Raynos, so then it looks like you vote for modeling transforms as function (stream) -> stream and letting external libraries make it pretty (basically what I've been doing for min-streams all along)


Raynos avatar Raynos commented on September 27, 2024

yes!


creationix avatar creationix commented on September 27, 2024

Ok, so recap with the latest API proposal:

var stream = {
  next: function (callback) { ... },
  end: function (callback) { ... }
};

function transform(stream) {
  // ...
  return { next: next, end: end };
  function next(callback) { ... }
  function end(callback) { ... }
}

var duplexStream = {
  next: ...
  end: ...
  consume: ...
};

A transform function could be a duplex transform and accept a duplex stream and return a new duplex stream. Though it's probably better to write transforms as separate encode and decode. A generic duplex transform that accepted encode and decode would be easy to write.

function duplex(decode, encode) {
  return function (original) {
    var transformed = decode(original);
    transformed.consume = function (source) {
      original.consume(encode(source));
    };
    return transformed;
  };
}


creationix avatar creationix commented on September 27, 2024

Usage of duplex above would be:

// Create a duplex stream
var socket = tcp.connect(1337);

// The protocol is framed, let's remove that layer.
socket = duplex(line.deframe, line.frame)(socket);

// The protocol is also JSON encoded
socket = duplex(json.decode, json.encode)(socket);


creationix avatar creationix commented on September 27, 2024

Hmm, I just realized this duplex helper is just a weaker version of the series helper from above from the point of view of the consumer. It works differently inside because it's working with duplex streams and not simple streams though.


Raynos avatar Raynos commented on September 27, 2024

@creationix I think the symmetrical sugar is weak enough that something like

function serialize(stream) {
  return series(line.deframe, json.decode, stream, line.frame, json.encode)
}

Would cover most use cases.


Raynos avatar Raynos commented on September 27, 2024

pushTransform(emit) -> emit

I have yet to really understand the motivation for that one. I find function (stream) -> stream simple enough that I don't know when to use a push transform instead.
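For what it's worth, the shape itself is small; here is an illustrative sketch (not from the spec) of a push transform, using the emit(err, item) convention that appears later in this thread: it receives the downstream emit and returns the emit that upstream data gets pushed into.

function upcase(emit) {
  return function (err, item) {
    if (item === undefined) return emit(err);    // forward end or error
    emit(null, String(item).toUpperCase());      // push the transformed value
  };
}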


Raynos avatar Raynos commented on September 27, 2024

Sink with errors / finish

var sink = {
  consume: function (stream, callback) { }
}

If a sink took a callback, it could call it with an error if it failed to consume the source. It could also call it without an error to signal that it finished consuming the source.
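As a purely illustrative sketch (logSink is a made-up name), such a sink could look like this, with the callback settling on either failure or a fully drained input:

var logSink = {
  consume: function (stream, callback) {
    stream.next(function onNext(err, value) {
      if (value === undefined) return callback(err);  // error or natural finish
      console.log(value);
      stream.next(onNext);
    });
  }
};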


chrisdickinson avatar chrisdickinson commented on September 27, 2024

I took some time to put together a little playground to test simple-streams out. I have yet to test out the duplex pattern.

Some thoughts:

  • sinks need more of a spec. in particular, I didn't know how to communicate completion/errors out to client, non-stream code. I'm +1 on @Raynos's suggestion.
  • it feels much easier to get right than min-streams.
  • is it the responsibility of the sink to call .end on its input stream to allow for cleanup? or is stream.end() more exceptional than that?
  • what do we do with extra read callbacks? the dom example has a filter stream module that collects a list of callbacks. right now I simply truncate that buffer on end, then forward the end. is this correct?
  • there's more boilerplate than using through, but impressively, not much.


Raynos avatar Raynos commented on September 27, 2024

@chrisdickinson I would assume that if a stream ends with or without an error then you should NOT call .end() / .stop() / .abort() on said stream.

Also you can do filter way simpler

function filter(lambda) {
    return function duplex(stream) {
        return { next: next, end: stream.end }

        function next(callback) {
            stream.next(function onread(err, value) {
                if (value === undefined) {
                    return callback(err)
                }

                var keep = lambda(value)
                if (keep) {
                    return callback(null, value)
                }

                stream.next(onread)
            })
        }
    }
}


Raynos avatar Raynos commented on September 27, 2024

@creationix with min-streams I believe calling next() and end() concurrently was invalid, because both next() and end() were the same function.

Maybe we should disallow the consumer of a stream to call those two functions concurrently and force them to wait for a result from next() before they can call end()


creationix avatar creationix commented on September 27, 2024

sinks need more of a spec. in particular, I didn't know how to communicate completion/errors out to client, non-stream code. I'm +1 on @Raynos's suggestion.

In min-streams, sinks return continuables for exactly this reason. It's worked out great so far in my js-git code. The continuable will resolve with the end/error event in the stream.
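Roughly, in the terms used in this thread (discard is a hypothetical sink), that pattern looks like this: the sink returns a continuable that settles with the stream's end or error event.

function discard(stream) {
  return function (callback) {             // the continuable
    stream.next(function onNext(err, value) {
      if (value === undefined) return callback(err);  // end or error event
      stream.next(onNext);                 // drop the value, keep pulling
    });
  };
}

// usage: discard(source)(function (err) { /* fully consumed, or failed */ });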

it feels much easier to get right than min-streams.

That's the goal.

is it the responsibility of the sink to call .end on its input stream to allow for cleanup? or is stream.end() more exceptional than that?

No, you only call end if you want to end the stream early. Perhaps calling it end is confusing with the end in node's writable stream interface?

what do we do with extra read callbacks? the dom example has a filter stream module that collects a list of callbacks. right now I simply truncate that buffer on end, then forward the end. is this correct?
there's more boilerplate than using through, but impressively, not much.

After sleeping on it, I think it's best to say that calling .end() doesn't guarantee the data stream will stop right away. That adds too much boilerplate and extra code to each and every layer for little gain. Also I don't think the source needs to insert an end event into the stream when .end() is called. The "end" event in the stream is for natural ends. If a filter in the middle wants to clean up something it needs to listen for both "end" events and the callback to ".end(callback)" since either could end the stream (and both may happen sometimes).

As far as truncating callbacks, just make sure to always eventually call every callback. It really messes up programs when callbacks never get called.
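A hedged sketch of that dual listening (cleanupFilter and cleanup are hypothetical): a pass-through filter that runs its cleanup whichever way the stream finishes, whether through a natural end in the data channel or an early .end() call.

function cleanupFilter(source, cleanup) {
  var done = false;
  function finish(err) { if (!done) { done = true; cleanup(err); } }
  return {
    next: function (callback) {
      source.next(function (err, value) {
        if (value === undefined) finish(err);  // natural or error end
        callback(err, value);
      });
    },
    end: function (callback) {
      source.end(function (err) {
        finish(err);                           // early termination path
        callback(err);
      });
    }
  };
}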


Raynos avatar Raynos commented on September 27, 2024

btw I would vote for { read, abort } as more intuitive than { next, end }

Also "just make sure to always eventually call every callback." answers a lot of questions about read and abort interaction.


chrisdickinson avatar chrisdickinson commented on September 27, 2024

Yeah. I was thinking that cancel or abort would be more descriptive than end. +1 on a name change.



creationix avatar creationix commented on September 27, 2024

I think there needs to be very little interaction between read and abort. If you try to read from a source that's been stopped, it will simply emit end. So I take back what I said about pieces in the middle needing to intercept the abort call. Everyone can just keep reading from their source till value is undefined and then know the stream is done.

var stream = {
  read: function (callback) { /* callback(err, value) */ },
  abort: function (callback) { /* callback(err) */ }
};

function transform(source) {
  return { read: read, abort: source.abort };
  function read(callback) {
    source.read(function (err, value) {
      if (value === undefined) return callback(err);
      callback(null, value.toUpperCase());
    });
  }
}


creationix avatar creationix commented on September 27, 2024

Also I think I want to model tcp streams as stream transforms instead of duplex streams. Adding in duplex complicates the model a lot.

// Echo server
tcp.createServer(8080, function (socket) {
  return socket;
});

// Echo client
tcp.connect(8080, function (socket) {
  return socket;
});


creationix avatar creationix commented on September 27, 2024

As to @chrisdickinson's question about the cases where you really need a writable interface (where normal imperative logic is better than a state-machine transformer):

tcp.createServer(8080, function (input) {
  var output = writableSource();
  // output has both readable and writable interfaces:
  // { read(callback), abort(callback), write(value, callback), end(err, callback) }
  // or { read(callback), abort(callback), emit(err, value, callback) }
  // The exact interface for writable doesn't matter because it's not part of the simple-stream spec.
  return output;
});

Though you usually also want your input to be pushed to you in the app case, so the push-filter interface is best here I think.

tcp.createServer(8080, pushToPull(function (emit) {
  // Call emit(err, item) every time we want to write data outwards.
  return function (err, item) {
    // Called every time data is written to us
  };
}));


dominictarr avatar dominictarr commented on September 27, 2024

What if a transform stream is still a {consume} but it just returns itself? Sure, there is a thing here where someone might read from it before it has an input, but that can be abstracted, and is even useful in some cases.

At least, since the stated benefit of objects is that you get structural typing, making other streams objects but not transforms is inconsistent. Conceptually, they are still streams.


creationix avatar creationix commented on September 27, 2024

I don't see transforms as streams. That's just a pattern I've seen done in node where there are duplex streams. Transforms can be modeled as duplex streams, but I don't feel it properly represents them.

To me, having only streams be objects is quite symmetrical and clean. The only basic type here is the stream. It's the only thing in the spec that everything else has to agree on. Things like sources, filters (transforms), and sinks are simply functions that either accept or return streams. Push-filters are just an easy way to create normal stream consuming filters and are really not part of the spec.

  • source = function that returns a stream (and accepts optional stream setup arguments)
  • filter = function that accepts a stream and returns a stream (and may have additional option arguments)
  • sink = function that accepts a stream (and returns a continuable)
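Put together, with hypothetical names (fs.readStream, toUpper and countBytes are placeholders here, not real modules), the three roles compose like this:

var source = fs.readStream("input.txt");     // source: () -> stream
var upper = toUpper(source);                  // filter: stream -> stream
countBytes(upper)(function (err, total) {     // sink: stream -> continuable
  if (err) throw err;
  console.log(total, "bytes processed");
});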


creationix avatar creationix commented on September 27, 2024

So to further explain, let's take some known APIs from the non-streaming world. A string is like a stream, but instead of through time, it's a stream of bytes through memory, all seen in an instant. If I wanted to take a string of JSON and transform it into the object that it represents, I don't use an object to do that conversion, I use a function.

var obj = JSON.parse(json);

Likewise if I have a stream of raw json strings through time and want a new stream of parsed objects through time, I would do the same thing.

var objStream = json.decode(jsonStream);


creationix avatar creationix commented on September 27, 2024

The signature of JSON.parse is (jsonString) -> object and the signature of json.decode is (stream<jsonString>) -> stream<object>.


dominictarr avatar dominictarr commented on September 27, 2024

right, I see, so the only difference between this design and pull/min is that it's two separate functions, rather than one that combines them.


creationix avatar creationix commented on September 27, 2024

@dominictarr basically yes, and I changed the error handling slightly so that you don't send a reason when aborting the stream and you don't wait for the source to reflect the reason, but send your own error downstream directly.


creationix avatar creationix commented on September 27, 2024

Ok, I've updated the official spec to reflect these changes. It's looking real clean. https://github.com/creationix/js-git/blob/master/specs/simple-stream.md


creationix avatar creationix commented on September 27, 2024

@dominictarr also, since sink isn't part of the stream spec, you're free to implement your transforms/filters as objects if you want. I can still interop as long as we all use the same interface for the streams themselves. As for me personally, I much prefer filters being functions that accept and return streams. I aim to write all js-git related filters as functions.


dominictarr avatar dominictarr commented on September 27, 2024

so, I can think of a few situations where the reason might be important. example: on tcp you want to know if the stream failed because there was no server, or if it dropped the connection, or it timed out.

if you were writing to a file, you want to know if the error was that you didn't have permissions, or if you ran out of disk space. hmm, although I guess you could just use the continuable to get the error type...


Raynos avatar Raynos commented on September 27, 2024

@creationix

// Echo client
tcp.connect(8080, function (socket) {
  return socket;
});

THAT. THAT A MILLION TIMES. client and server just became the exact same code. This is amazing.


Raynos avatar Raynos commented on September 27, 2024

The only basic type here is the stream. It's the only thing in the spec that everything else has to agree on.

I have found the difference between standardizing on promise and continuable to be about standardizing the types and not the functions. With streams we can all agree on what Readable and Writable are and then we can go do our own things with transform functions, monads or duplex stream objects.


Raynos avatar Raynos commented on September 27, 2024

sink = function that accepts a stream (and returns a continuable)

It might make sense for sink to be a function that returns an object with a consume method for purposes of structural typing. That consume method then accepts a stream and returns a continuable


creationix avatar creationix commented on September 27, 2024

@dominictarr

so, I can think of a few situations where the reason might be important. example: on tcp you want to know if the stream failed because there was no server, or if it dropped the connection, or it timed out.

Sorry if I didn't explain right, but of course you need the reason downstream. And you'll still have it, that's what the err argument in read's callback is for. I was talking about the err argument that would be in abort before the callback.

All of these error cases you mentioned would still be reported and come out of the continuable that the sink returns.


creationix avatar creationix commented on September 27, 2024

The "reason" that's not important is for a source to know why its consumer is going to no longer consume from it. It doesn't care why, it just needs to know so it can clean up stuff. It's downstream, the consumer, that cares why stuff is broken.


creationix avatar creationix commented on September 27, 2024

@Raynos

sink = function that accepts a stream (and returns a continuable)

It might make sense for sink to be a function that returns an object with a consume method for purposes of structural typing. That consume method then accepts a stream and returns a continuable

There is no reason you can't do that. I just don't want to force such a verbose construct in the spec since it's not needed or even wanted most of the time.

Structural typing matters more for anonymous things that are passed around and used as return values and arguments all the time. Streams definitely fall under this category. Sinks are more like API endpoints that consume streams. I don't think they need structural typing as much. I know that fs.writeStream(stream, path, options) -> continuable is a sink because of its API docs, its name, and its documented signature.


creationix avatar creationix commented on September 27, 2024

So usage would be:

fs.writeStream(path, options).consume(stream)(callback);

vs

fs.writeStream(stream, path, options)(callback);

With the first one, I feel an urge to use a promise instead of a continuable so that it's .consume(stream).then(callback)


Raynos avatar Raynos commented on September 27, 2024

Aw man promises. That's going to be a hard battle.

W3C is going to be like "your api seems nice but read() should return a promise"


creationix avatar creationix commented on September 27, 2024

And I'll say, well if you insist on taking-no-args-and-then-returning-an-object-that-has-a-prototype-that-has-a-method-that-accepts-a-callback-and-an-errback-and-optional-progressback instead of just take-the-callback, then I guess you insist on complexity and better change the name away from "simple streams"


Raynos avatar Raynos commented on September 27, 2024

@creationix you are preaching to the choir.


Gozala avatar Gozala commented on September 27, 2024

I'm jumping into this a little late (& maybe I should not do it at all). But I'll still provide my feedback based on my experience working on streams / signals / channels or whatever you wanna call them.

  1. I think in nature there is just an input; if you look at it from the other side it will be an output. This is to say you don't need sink or duplex, you just need a data type for representing a collection of eventual data chunks. Then you can write transformation functions that transform a -> b. If you want duplex, it is just a pair of the same data type where data is pushed from the left side into the input end and from the right side on the output end. A sink is just a function that's aware of the data type's interface and therefore can read data out of it and do whatever it needs to. That's also where a reduce function is somewhat relevant, since it accumulates state by calling a reducer with the previous state and the next value.
  2. It took me a while but I came to understand that data types (or shapes if you prefer) are a lot more composable. That is to say that I dislike .end, .abort, .close. I think a stream / signal API does not need any of these, although they can be added as sugar if desired. If I did it today I'd define a stream / signal as simply as this:
var stream = {
  spawn: function(next) {
    next(1);
    next(2);
    next(END);
  }
};

Where END is whatever special value you desire; its type or shape does not matter as long as it's specified. If a stream has an error it can pass it along, and the good thing is JS already has an Error type for this.

var stream = {
  spawn: function(next) {
    next(1);
    next(2);
    next(Error("oops!"));
  }
};
  3. A sink, or I'd rather say a consumer, can have direct coordination with an input without having to keep a hold of it or having methods like abort on every transformation. All it needs to do is return a value:
function take(n, input) {
  return {
    spawn: function(next) {
      var left = n;
      input.spawn(function(value) {
        return left-- === 0 ? ABORT : next(value) // abort once n values have been taken
      })
    }
  }
}

Of course our input will have to recognize ABORT and send back END. If the input does not really recognize some of these messages, we can still wrap it in a normalizer to force it to comply, or at the very least prevent its brokenness from infecting the rest of the pipeline.

  4. Of course we can't talk about streams without back-pressure, but you may have noticed that 3. actually illustrates I/O coordination, and back-pressure is just a different form of it. To be more specific, the consumer can return data indicating that backpressure should be applied, and a respectful source will do that. And if you happen to deal with streams that don't really respect backpressure, it's still not a big deal, because you can write buffer(input) that will respect backpressure and buffer up the input for the consumer. I have explored this technique in the fs-reduce library, where all of the streams respect backpressure, but if you happen to face a stream (like an array) that does not, it will just be buffered up until the pressure is released.
  5. I don't think there is any winner in pull vs push type streams, and if there were one it would be push, since in nature we have events we can't actually pause (users clicking their mice, for example). But hey, that's not a big deal either, since that's just another flavor of I/O coordination; all you need to do is make pull(stream), which will give you a pull-based API (maybe that'll match the min-stream proposal), and all it will have to do is:
function pull(stream) {
  var buffer = [];
  var reads = [];
  var resume = function() { stream.spawn(accumulate); }

  function drain() {
    while (reads.length && buffer.length) reads.shift()(buffer.shift())
    return buffer.length ? PAUSE : null;
  }

  function accumulate(value) {
    // save value and pause stream
    buffer.push(value);
    return drain(buffer, reads);
  }

  // stream will provide a function to resume it.
  function PAUSE(go) { resume = go }

  return {
    read: function(callback) {
      reads.push(callback);
      drain();
    }
  }
}

This is a queue-like (imperfect) API; it should give an idea of how different kinds of APIs can easily be created in the form of simple functions that compose.

That sums up my opinion on how streams should work, based on the experience of building them at least 4 times over the past few years. I hope this will be helpful and not totally boring and insane.


mhart avatar mhart commented on September 27, 2024

I really like the abort method - it's something that still puzzles me about streams as they stand in v0.10.x - hence my as-yet-unanswered question on the node.js group.

The only other question I'd pose (not sure if it has been already) is whether you should include anything about ES6 iterators (and generators) in the spec - I notice they're not mentioned at all and figure they should at least be referred to, even if it's to say that there's no goal to make them compatible, or whatever.


creationix avatar creationix commented on September 27, 2024

@Gozala I was wondering when/if you would comment on this thread.

Yes, I agree that the only interface that needs to be specified is the readable stream.

As far as using special tokens for END, ABORT, and Error classes, I'd rather not. instanceof Error doesn't work if the error is from another context. There is no Error.isError helper function though Object.prototype.toString.call(err) === "[object Error]" seems to be reliable. I'd hate to force such a verbose type check on each and every data chunk that goes through the stream. Having two positional arguments tells us a lot that speeds up such checks.

Yes back-pressure can be done with a manual side-channel and pause and resume commands, but I much prefer the implicit backpressure provided by pull style. In my experience I'm much more likely to get it right if I'm using pull-streams than writing the back-pressure by hand using manual pause and resume.

Yes there are general helpers that convert between types. I am publishing a module right now called push-to-pull that lets you write the easier push filters, but use them as back-pressure honoring pull-filters without writing your own queues. A reduce transform could easily be written as could a filter transform.

Thanks for the input.


creationix avatar creationix commented on September 27, 2024

@mhart I'm glad you like abort. You can thank @dominictarr for convincing me to add that to the official spec. I really didn't want to.

I've also considered what a stream would look like if we had access to ES6 generators. I think the simplest construct would be a generator that yielded values.

function* source() {
  yield 1
  yield 2
  yield 3
}

// Consume like any other generator to get [1, 2, 3]

But like most I/O streams, you can't yield everything at once, so the generator could yield continuables instead of raw values.

By happy coincidence, simple-stream's read function is itself a continuable. So turning a simple-stream into a generator based stream is as simple as:

function* () {
  // Create a simple-stream
  var stream = fs.readStream("myfile.txt");
  // and yield it forever
  while (true) yield stream.read
}

In fact my gen-run library does something very much like this, but as a control-flow helper library.

run(function* () {
  var stream = fs.readStream("myfile.txt");
  var data;
  var items = [];
  while (data = yield stream.read) {
    items.push(data);
  }
  return items;
});

I don't want to require generators for streams since it will be a long time before most js environment can assume ES6 generators. I am, however, very aware of how they will interact and keep these things in mind.


Gozala avatar Gozala commented on September 27, 2024


On Tuesday, 2013-07-02 at 19:36, Tim Caswell wrote:

@Gozala (https://github.com/Gozala) I was wondering when/if you would comment on this thread.
Yes, I agree that the only interface that needs to be specified is the readable stream.
As far as using special tokens for END, ABORT, and Error classes, I'd rather not. instanceof Error doesn't work if the error is from another context. There is no Error.isError helper function though Object.prototype.toString.call(err) === "[object Error]" seems to be reliable. I'd hate to force such a verbose type check on each and every data chunk that goes through the stream. Having two positional arguments tells us a lot that speeds up such checks.

Actually you don't have to handle them at all; it's a matter of just having some sort of transform operation that passes meta values between input and output. For example you could have something like these folds:

https://github.com/Gozala/signalize/blob/master/core.js#L146-L172

Then all the filter, map, drop, etc. can be easily implemented without concerning yourself with either error checking or special value handling:
https://github.com/Gozala/signalize/blob/master/core.js#L192-L284

Yes back-pressure can be done with a manual side-channel and pause and resume commands, but I much prefer the implicit backpressure provided by pull style. In my experience I'm much more likely to get it right if I'm using pull-streams than writing the back-pressure by hand using manual pause and resume.

The main issue with pull is that it's inherently slower and you can't apply category theory to optimise the transformation pipeline. And of course you cannot represent streams that can't be paused or stopped, like user events. That is why I prefer to decouple the notion of a stream from consumption semantics, since pull is only one of the ways; there are more, and from case to case you may want different ones. I have started writing a spec for push & pull signals that in the best case perform as push and in the worst case de-optimize to plain pull, & of course any other case in between:
https://gist.github.com/Gozala/5314269

It's slightly out of date though

Yes there are general helpers that convert between types. I am publishing a module right now called push-to-pull that lets you write the easier push filters, but use them as back-pressure honoring pull-filters without writing your own queues. A reduce transform could easily be written as could a filter transform.
Thanks for the input.




dominictarr avatar dominictarr commented on September 27, 2024

There isn't gonna be one right answer to this stream thing, not with the languages we have today. Maybe in some future sci-fi language, but today the best we can do is hope to fit some fairly broad but non-exhaustive set of use-cases.

Anyway, it's not so hard to write custom stream stuff that there needs to be One True Stream. You can always convert from one to the other, and pick the stream that best suits the way you think and the sort of programming you do.


creationix avatar creationix commented on September 27, 2024

@Gozala, I'm having trouble understanding you. I do like the idea of the main data channel being only data and letting everything else go through a meta channel.

I'm pretty sure I want pull based for several reasons. Besides the natural back-pressure it provides, it also provides a nice 1:1 mapping between continuation chains since each callback will be called only once for each read call. This makes tracing and error handling much easier. I'm currently working on improving domains in node.js and wish that everything in node had this nice 1:1 mapping. It makes for a very simple and robust system when every stack has a direct and obvious parent stack that initiated it.

I know that you can't pause some inputs easily (like user clicks or http requests), but that doesn't mean pull-streams are a bad idea. You just buffer the events at the source waiting for someone to pull them. Even those cases can usually be paused somewhat in extreme cases (you could disable the UI button if the stream wasn't ready to handle it or tell the TCP socket to stop accepting connections)

We already have two data channels in the form of the two arguments to onRead callbacks (err, item). When item is anything other than undefined, then it's a data item. Otherwise, it's a meta value that signifies natural end or error end. The channel for closing an upstream source goes the other direction and so can't be encoded here.
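To make that convention concrete (a sketch of mine; handle, fail and done are placeholders), a consumer's onRead callback distinguishes exactly three cases:

stream.read(function onRead(err, item) {
  if (item !== undefined) {        // data channel: a value arrived
    handle(item);
    return stream.read(onRead);
  }
  if (err) return fail(err);       // meta channel: error end
  done();                          // meta channel: natural end
});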

Do you have any ideas that are modifications to the current design that could simplify this?

