
Comments (11)

Ralith avatar Ralith commented on June 27, 2024 1

Thanks for the report! This indicates that Connection::space_can_send determined that non-ACK frames could be sent, but Connection::populate_packet failed to write any non-ACK frames. In other words, we predicted that packet assembly would send some data, but it didn't. Likely some logic in can_send_1rtt is inconsistent with the corresponding check in populate_packet or the helpers it calls.

Are you transmitting large application datagrams on the endpoint where this panic occurs? If so, #1837 is a likely culprit. In particular I wonder if 003b1d5 or ad4919b might be responsible. I can think of at least one likely broken pattern:

  1. A near-maximum-size application datagram is queued
  2. space_can_send determines that the datagram may be sent, which implies that an ACK-only packet may not be sent.
  3. populate_packet writes an ACK frame to the next outgoing packet
  4. populate_packet determines that insufficient space remains to include the queued application datagram
  5. The completed packet remains ACK-only, triggering the debug_assert.
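The pattern above can be modeled in a few lines of plain Rust. This is a toy sketch, not quinn's actual code: the constants, function names, and frame sizes are invented for illustration, but the control flow mirrors steps 1–5.

```rust
const MTU: usize = 1200;            // hypothetical packet payload budget
const ACK_FRAME_SIZE: usize = 25;   // hypothetical encoded ACK frame size
const DATAGRAM_OVERHEAD: usize = 4; // hypothetical DATAGRAM frame header bytes

// Step 2: approves sending because a queued datagram would fit on its own.
fn space_can_send(datagram_len: usize) -> bool {
    datagram_len + DATAGRAM_OVERHEAD <= MTU
}

// Steps 3-4: writes the ACK first, then checks whether the datagram still fits.
// Returns (wrote_ack, wrote_datagram).
fn populate_packet(datagram_len: usize, acks_pending: bool) -> (bool, bool) {
    let mut remaining = MTU;
    let wrote_ack = if acks_pending {
        remaining -= ACK_FRAME_SIZE;
        true
    } else {
        false
    };
    let wrote_datagram = datagram_len + DATAGRAM_OVERHEAD <= remaining;
    (wrote_ack, wrote_datagram)
}

fn main() {
    let near_max = MTU - DATAGRAM_OVERHEAD; // a near-maximum-size datagram
    assert!(space_can_send(near_max)); // the send check says "go ahead"
    let (wrote_ack, wrote_datagram) = populate_packet(near_max, true);
    // Step 5: the ACK consumed the headroom, the datagram no longer fits,
    // and the completed packet is ACK-only -- the debug_assert would fire.
    assert!(wrote_ack && !wrote_datagram);
}
```

With a smaller datagram both frames fit and the assert scenario never arises, which is consistent with the workaround discussed later in the thread.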

In short, I think the actual behavior here is correct, and we should arrange for the assert not to fire, either by relaxing the assert somehow or by allowing can_send.acks to be set according to PendingAcks::can_send on the packet space in question even when application data is queued. The application datagram will be sent in the following packet regardless, where it is guaranteed to fit because there are no longer any new ACKs to send.

I'll try to draft a fix for at least that case this week, though more info is needed from you to provide confidence that this is specifically what you're hitting. Do you have a consistent repro?

from quinn.

djc avatar djc commented on June 27, 2024 1

> I confirm this problem disappeared with the git HEAD. Thank you.

Note that you can just upgrade your dependencies with cargo update and this should be fixed, since we released quinn-proto 0.11.1 last night.


Ralith avatar Ralith commented on June 27, 2024 1

> Concerning the current solution, if I understand correctly, then applications are better off sending smaller datagrams than max_datagram_size() because that way they can achieve higher performance by not sending small, non-piggybacked ACK packets. Am I right?

The intention is that these voluntary ACKs are sent infrequently enough for this not to be a significant factor. If you observe it imposing meaningful costs in practice, we have a lot of room to dial the rate further, too.

> If so, wouldn't it make sense to have a next_datagram_size() that takes into account control-like QUIC frames in the next packet?

I don't think so. Most packets do not include extra control frames, and packet scheduling and assembly is only loosely bound to application call patterns. The only case where a call like this would give you accurate information is when you're only sending one packet at a time, without any pipelining or queuing, which is exactly when overhead matters the least. Otherwise, it's difficult to be confident that the datagrams you queue will, in fact, be in the next packet sent.

> (We segment a byte-stream into datagrams, which might not be a good idea after all.)

Real-time streaming media is a complex case. Using a stream gives the QUIC implementation much more flexibility in packet assembly, but is subject to head-of-line blocking, which I presume you wish to avoid.

A good middle ground might be to use many independent streams and take care that your receiver is willing to read them concurrently/out-of-order, where each stream is a data unit that's not useful unless received in full. You could even reset such streams when they become too old, though that may not be worth the effort.
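As a sketch of that middle ground on the receiving side (plain Rust, no quinn types; the stream-id keys and per-unit length bookkeeping are assumptions for the example), the receiver buffers independent streams concurrently and delivers each data unit only once it is complete:

```rust
use std::collections::HashMap;

struct Unit {
    expected: usize, // total size of this data unit, assumed known up front
    buf: Vec<u8>,
}

#[derive(Default)]
struct Receiver {
    partial: HashMap<u64, Unit>, // stream id -> partially received unit
}

impl Receiver {
    /// Feed a chunk for stream `id`; returns the whole unit once all
    /// `expected` bytes have arrived, since a unit is not useful unless
    /// received in full.
    fn on_chunk(&mut self, id: u64, expected: usize, chunk: &[u8]) -> Option<Vec<u8>> {
        let unit = self
            .partial
            .entry(id)
            .or_insert_with(|| Unit { expected, buf: Vec::new() });
        unit.buf.extend_from_slice(chunk);
        if unit.buf.len() >= unit.expected {
            Some(self.partial.remove(&id).unwrap().buf)
        } else {
            None
        }
    }

    /// Discard a unit that became too old (mirroring the sender resetting
    /// its stream).
    fn on_reset(&mut self, id: u64) {
        self.partial.remove(&id);
    }
}

fn main() {
    let mut rx = Receiver::default();
    // Chunks for two units arrive interleaved on streams 0 and 4.
    assert!(rx.on_chunk(0, 4, b"ab").is_none());
    assert_eq!(rx.on_chunk(4, 2, b"XY"), Some(b"XY".to_vec())); // unit 4 done first
    assert_eq!(rx.on_chunk(0, 4, b"cd"), Some(b"abcd".to_vec()));
}
```

The key property is that a slow or lost unit on one stream never blocks delivery of a complete unit on another, which is exactly the head-of-line-blocking relief this pattern buys.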


Ralith avatar Ralith commented on June 27, 2024

On second thought, I suspect the issue I described above predates #1837, though if that's not what you're experiencing then it's still a plausible culprit.


Ralith avatar Ralith commented on June 27, 2024

Drafted some related cleanup in #1851. Testing indicates that the hypothesis I presented above is incorrect; we already handled that case properly, so I remain unsure what the root cause might be. I don't expect that PR will fix your issue, though it makes some of the code involved easier to follow. Let us know if you have a repro, and I'll look around a little more for other possible causes.


nemethf avatar nemethf commented on June 27, 2024

> Are you transmitting large application datagrams on the endpoint where this panic occurs?

Yes, I am. (Independently of this issue, it seems send_datagram puts the datagram into a queue. What happens when max_datagram_size decreases before the datagram is sent out from the queue?)

> If so, #1837 is a likely culprit.

We usually run our tests with all the offloading features disabled with ethtool, since otherwise Wireshark is not able to dissect QUIC packets. Now I've run a video transmission test with offloading enabled, and it seems there is a regression, but this might be the result of some modifications to our local application. I'll create a separate issue after a detailed investigation. Either way, I think this bug is probably not GSO related.

Unfortunately, #1851 does not fix the issue. The attached patch deterministically triggers the bug for me. If the server wants to send just 10 datagrams instead of the current 100, then the panic does not occur.

0001-Modify-example-to-demostrate-bug-1850.patch.txt

(I think it would be helpful if the repository contained a datagram example. It took me a long time to figure out a way to keep the connection alive until an ApplicationClosed arrives. I'm happy to contribute, but I cannot come up with a good example. Maybe a simple file transfer with datagrams?)


Ralith avatar Ralith commented on June 27, 2024

Thanks for the repro! I see the same behavior. I'll investigate.

As a temporary workaround, it looks like sending smaller datagrams avoids the assert, probably by leaving enough room for whatever other frames are competing for space in the packet. 1KiB would be a very safe threshold.
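A sketch of that workaround (not quinn API; the 1 KiB threshold comes from the suggestion above, while the header layout is an assumption for the example): split each media frame into sub-1 KiB datagrams tagged with (frame id, index, count) so the unordered datagrams can be reassembled, leaving headroom in every packet for ACKs and other frames.

```rust
const SAFE_CHUNK: usize = 1024; // conservative per-datagram payload limit

/// Split `payload` into datagrams of at most SAFE_CHUNK bytes, each
/// prefixed with a 6-byte header: frame id, chunk index, chunk count
/// (all u16, big-endian). Layout is illustrative only.
fn chunk_frame(frame_id: u16, payload: &[u8]) -> Vec<Vec<u8>> {
    let count = ((payload.len() + SAFE_CHUNK - 1) / SAFE_CHUNK) as u16;
    payload
        .chunks(SAFE_CHUNK)
        .enumerate()
        .map(|(i, chunk)| {
            let mut d = Vec::with_capacity(6 + chunk.len());
            d.extend_from_slice(&frame_id.to_be_bytes());
            d.extend_from_slice(&(i as u16).to_be_bytes());
            d.extend_from_slice(&count.to_be_bytes());
            d.extend_from_slice(chunk);
            d
        })
        .collect()
}

fn main() {
    // A 2500-byte media frame becomes three datagrams: 1024 + 1024 + 452.
    let datagrams = chunk_frame(7, &[0u8; 2500]);
    assert_eq!(datagrams.len(), 3);
    assert_eq!(datagrams[0].len(), 6 + 1024);
    assert_eq!(datagrams[2].len(), 6 + 452);
}
```

Each resulting datagram stays well under max_datagram_size(), so an ACK sharing the packet never evicts it; the cost is the small per-chunk header and the need for reassembly on the receiver.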

> It took me a long time to figure out a way to keep the connection alive until an ApplicationClosed arrives.

Did you see Connection::closed? Typically you'd just wait for your actual workload (in this case the task calling send_datagram) to complete. It's not obvious to me how this differs from stream-oriented patterns.


Ralith avatar Ralith commented on June 27, 2024

This will be fixed by #1854.


nemethf avatar nemethf commented on June 27, 2024

> This will be fixed by #1854.

I confirm this problem disappeared with the git HEAD. Thank you.

> It took me a long time to figure out a way to keep the connection alive until an ApplicationClosed arrives.

> Did you see Connection::closed?

I don't know what I was thinking; I didn't check the documentation :( It is clear and easy to understand.


Concerning the current solution, if I understand correctly, then applications are better off sending smaller datagrams than max_datagram_size() because that way they can achieve higher performance by not sending small, non-piggybacked ACK packets. Am I right? If so, wouldn't it make sense to have a next_datagram_size() that takes into account control-like QUIC frames in the next packet? (We segment a byte-stream into datagrams, which might not be a good idea after all.)


djc avatar djc commented on June 27, 2024

> (We segment a byte-stream into datagrams, which might not be a good idea after all.)

Right, that works against the grain of the protocol. What were you trying to achieve with using datagrams in this design?


nemethf avatar nemethf commented on June 27, 2024

> > (We segment a byte-stream into datagrams, which might not be a good idea after all.)
>
> Right, that works against the grain of the protocol. What were you trying to achieve with using datagrams in this design?

We are trying to optimize media transmission in different network environments. Our baseline is RTP/UDP. It would be good to see how QUIC performs in comparison to that baseline when both use unreliable transmission. We also measure performance when the media is transmitted in QUIC streams.

