Coder Social home page Coder Social logo

Comments (15)

kelunik avatar kelunik commented on June 8, 2024

The actual issue is the early return in case the stream isn't readable anymore. Currently we assume a stream is closed once it's closed the first time.

It just worked before, because the first read was deferred any there was always data to read after that.

return new Success; // Resolve with null on closed stream.

from byte-stream.

ostrolucky avatar ostrolucky commented on June 8, 2024

Well there are two issues. One is the one you are talking about (which should just mean read will be always empty), second one is that even if stream is closed, event loop should return context to pipe coroutine. I don't know why it doesn't so execution never goes out of this one while loop.

from byte-stream.

kelunik avatar kelunik commented on June 8, 2024

It doesn't, because of your while (true) loop in combination with Success, which is an immediately resolved promise.

from byte-stream.

ostrolucky avatar ostrolucky commented on June 8, 2024

Indeed. Still two problems though. But yes, both of them would be solved by simply removing the assumption that there are not going to be more writes to resource after it's eof.

I noticed you have added a test where you solved this issue by adding Delay. That's not ideal. Now try to modify test case also like following, which should reflect my real use case more. I cannot assume same amount of read/writes will be called during piping and while loop. I might not even know the size of source since it might not be a file.

--- test/ResourceStreamTest.php	(revision aa2ebfe17be166e197375020483fb78f9d497fa6)
+++ test/ResourceStreamTest.php	(date 1545781598000)
@@ -260,7 +260,7 @@
 
             \Amp\ByteStream\pipe(
                 new ResourceInputStream(\fopen(__FILE__, 'rb')),
-                new ResourceOutputStream(\fopen($middle, 'wb'))
+                new ResourceOutputStream(\fopen($middle, 'wb'), 1)
             );
 
             $middleReadStream = new ResourceInputStream(\fopen($middle, 'rb'));

from byte-stream.

ostrolucky avatar ostrolucky commented on June 8, 2024

ping @kelunik

from byte-stream.

kelunik avatar kelunik commented on June 8, 2024

@ostrolucky We can't simply remove that assumption, it's how we defined InputStream to work. I'd advise you to work with another API or write your own stream, that never returns null in the first place.

Not sure whether you know it, but file resources (e.g. opened with fopen) will always do blocking reads, never non-blocking reads, because async I/O isn't supported for file handles.

I'm fine with a scheduling change that sometimes does a Loop::delay(0) / Loop::defer to make the scheduling fairer.

from byte-stream.

ostrolucky avatar ostrolucky commented on June 8, 2024

file resources (e.g. opened with fopen) will always do blocking reads

Really? http://php.net/manual/en/function.stream-set-blocking.php says

This function works for any stream that supports non-blocking mode (currently, regular files and socket streams).

Anyway what I intend to do when solving this issue is to get rid of amphp/file and just use byte-stream, since according my testing it results in considerate boost of read/write speed

from byte-stream.

kelunik avatar kelunik commented on June 8, 2024

@ostrolucky Yes, really. While the read might return EAGAIN then instead of blocking, the actual problem is that polling doesn't work, it always returns immediately, see https://www.remlab.net/op/nonblock.shtml

Have you tried installing ext-eio or ext-uv?

from byte-stream.

bwoebi avatar bwoebi commented on June 8, 2024

@ostrolucky If you want raw speed, then yes, don't use amphp/file. If you want low overall latency, then by all means go with amphp/file.
amphp/file is slower by design in that it defers work to another thread/process. But while the other thread/process does work on executing your disk I/O, your current process can still do other work.

amphp/file is built to scale, not to be extremely fast, basically.

from byte-stream.

kelunik avatar kelunik commented on June 8, 2024

@trowski What's your thought on the scheduling?

from byte-stream.

trowski avatar trowski commented on June 8, 2024

I think the scheduling should remain as-is. There's a few things wrong here that will cause issues no matter what kind of scheduling we do.

The code originally posted is performing reads with no condition. In synchronous PHP code, it's effectively doing this:

$file = fopen('middle', 'r');
while (true) {
    echo fread($file, 8192);
}

while (true) with no escape condition should result in an infinite loop. Adding delays/defers just hides the problem IMO.

ResourceInputStream and ResourceOutputStream were never meant for files (we should probably put this as a warning box on the front page of the docs). The correct way to do this is with amphp/file. Install ext-eio or ext-uv if you're concerned about performance. I would also advise against using pipe if you want to do something else with the data read from the input stream.

from byte-stream.

ostrolucky avatar ostrolucky commented on June 8, 2024

ext-eio or ext-uv

  • I'm building simple tool, not a server running 24/7. I don't want to make consumers of my tool install non standard php extension. Making it work with good performance should be as simple as doing composer require and executing it. And don't get me even started on cross-platform compatibility issues
  • I did try them, they have various bugs making them unusable for me and since I don't really want to use them anyway because of first point, I don't want to make investment in creating detailed bug reports

@bwoebi That's exactly it. I don't need to scale much and hence am interested more in raw speed. I'm fine to block the script for few ms while fread is being called, since there is usually only 2 tasks running concurrently anyway.

We can't simply remove that assumption, it's how we defined InputStream to work

I don't understand this justification. Wouldn't InputStream continue to work for all current uses cases when this assumption is removed? It's obviously wrong assumption as I demonstrated. Resource being eof does not mean it will be eof indefinitely.

All I need is to have a way to skip execution of this line

return new Success; // Resolve with null on closed stream.
If you don't want to make that default, can you at least provide extension point so I can disable its execution?

from byte-stream.

trowski avatar trowski commented on June 8, 2024

InputStream returns null from read() to indicate EOF. This is no longer something can or should be changed. I and many others have written libraries and apps using this API without issue. ResourceInputStream was designed to work with pipes and sockets, not files. If your script was allowed to continue to the fread() and subsequent watcher activation, it would result in 100% CPU as the readable watcher would immediately report the stream as readable (again, because loop watchers do not work properly for file streams). If you were using anything other than NativeDriver for a loop, I believe even creating the watcher would fail.

As I said before, you shouldn't use pipe() if you want to do something with the data read from the source. Against my better judgement, here's some code showing what I meant using file streams with Resource*Stream objects:

Loop::run(function () {
    $source = new ResourceInputStream(\fopen('source/file/path', 'rb'));
    $destination = new ResourceOutputStream(\fopen('destination/file/path', 'wb'));

    while (($chunk = yield $source->read()) !== null) {
        echo $chunk; // Or whatever else you wish to do with $chunk.
        yield $destination->write($chunk);
    }
});

I'm certain you could have written that on your own, so I'm confused at your refusal to do so.


I would recommend a different approach to concurrency. Use amphp/parallel and Amp\Parallel\Worker\Task to download each file within a worker. You can even use amphp/file, which uses blocking operations when within a worker. If worker is too simple or opinionated of an API, you can use Amp\Parallel\Context\Process to write your own process while being provided a communication channel back to the parent. Initial file downloads/uploads will add some overhead of starting another process, but file operations within the process will not be encumbered by IPC.

from byte-stream.

ostrolucky avatar ostrolucky commented on June 8, 2024

My example code is simplest reproducer. There is a reason there are 3 resources there - I don't just echo in real use case, but write to resource 4 which doesn't exist at the time piping between resource 1 and 2 is initiated

Use amphp/parallel and Amp\Parallel\Worker\Task to download each file within a worker

Resource 1 in my case is stdin and resource 4 socket. I believe this makes this not offloadable to different thread/process

You can even use amphp/file

I do, it's way slower. That's why I want to use ResourceInputStream instead and submitted this issue. Would be much more performant in my use case to get rid of amp/parallel overhead and just not use 100% non-blocking code.

Feel free to experiment yourself with https://github.com/ostrolucky/stdinho and commands head -c 1G </dev/urandom | bin/stdinho 127.0.0.1:1337 -v and curl 127.0.0.1:1337 > /dev/null. Then try to replace amphp/file with ResourceInputStream inside \Ostrolucky\Stdinho\Responder class.

it would result in 100% CPU as the readable watcher would immediately report the stream as readable

It's nearly 100% even with amphp/file

I don't understand why is this logic necessary there, each consumer has its own logic to determine when reading is finished (most often with while chunk is not null) and after that it doesn't attempt to call read() at all. ResourceInputStream is purposely ignoring my requests to read the resource for God knows what reason. It is consumers responsibility to stop calling read() after file is EOF. If nothing else, this makes it confusing API. Method shouldn't be named read() if it doesn't even attempt to read the resource.

from byte-stream.

kelunik avatar kelunik commented on June 8, 2024

As you changed the title and as I already explained that topic: This is by design and won't change. If you don't care about blocking, I'd advise you to use fread directly over using ResourceInputStream.

We can, however, talk about the scheduling of immediate reads if a lot of them happen in a tight loop, as that might result in a blocking situation that isn't expected that way. I've opened #49 for that.

from byte-stream.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.