Comments (28)
@nipierre If you'd like to collaborate on this (e.g. video chat, pairing, etc.) I could find some time in my schedule; I just don't have the remaining cognitive capacity to see this through to resolution on my own. If the problem is reproducible locally, it should be feasible to characterize it in an automated test.
from srt-rs.
It's drop_too_late_packets that appears to be broken. This line of code is not the problem. Calling drop_too_late_packets should only drop packets if there are late packets. Could you show me how I can reproduce this behavior locally? I can also schedule some time to review the drop_too_late_packets tests and implementation together. It could be helpful to have an extra pair of eyes scrutinizing the test cases and implementation.
@nipierre could you run your tests against PR #209?
Done
I don't have much time to look at this right now, but here's a few thoughts:
- It looks like the statistics interval for the sender was set to 3ms?? 1s should really be sufficient; I can imagine such an extremely high-frequency interval could impact timing and performance.
- The reference implementation receiver probably turns off late packet drop if the TLPKTDROP flag is not set by the sender during handshake. I don't think srt-rs currently sets that by default, but I could be wrong.
- There could be a bug in the `drop_too_late_packets` function on `ReceiveBuffer`?
- The timestamp drift recovery has not been well tested; this would affect the relative time of packets in the receive buffer.
- I'm curious if the srt-rs sender is dropping the packets first? We didn't test RTO extensively; retransmissions could be timing out? `max_flow_size` could be the setting to adjust. Hitting this limit on the sender will stall transmission and delay packets.
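To illustrate the last bullet, here is a minimal sketch of how a flow-window limit can stall a sender. This is a hypothetical toy model, not srt-rs code; the field name `max_flow_size` just mirrors the setting named above.

```rust
// Toy sender model: transmission is blocked once the number of
// unacknowledged (in-flight) packets reaches the flow window.
struct Sender {
    max_flow_size: u32, // max packets allowed in flight (assumed semantics)
    next_seq: u32,      // next sequence number to send
    last_acked: u32,    // first sequence number not yet ACKed
}

impl Sender {
    fn in_flight(&self) -> u32 {
        self.next_seq - self.last_acked
    }

    /// A new packet may only be sent while the window has room; hitting
    /// the limit stalls transmission (and so delays packets) until an
    /// ACK advances `last_acked`.
    fn send(&mut self) -> bool {
        if self.in_flight() < self.max_flow_size {
            self.next_seq += 1;
            true
        } else {
            false
        }
    }

    fn ack(&mut self, up_to: u32) {
        self.last_acked = self.last_acked.max(up_to);
    }
}

fn main() {
    let mut s = Sender { max_flow_size: 2, next_seq: 0, last_acked: 0 };
    assert!(s.send());
    assert!(s.send());
    assert!(!s.send()); // window full: sending stalls here
    s.ack(1);
    assert!(s.send()); // an ACK frees the window again
}
```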
Thank you for the answer! I increased the statistics interval to 3 seconds. And now it seems like there are no issues with the packet drops. RTT-95ms, receive/send latency is 1 second. I'll keep testing with different settings and put feedback here.
@Lighty0410 could we possibly close this one out too?
I would love to close this issue. However, I didn't come up with the proper settings after a week of testing. I can write a unit test that covers the fixes required for this issue, but since I lack expertise in the SRT protocol itself, can you elaborate on what should be unit-tested in order to clarify things?
Thanks in advance!
So you're still seeing packet loss issues, in spite of the change to the statistics interval?
Yeah. That's why I crossed out the text. I tried different settings according to the SRT RFCs/documentation, with no result whatsoever.
@Lighty0410 look at the rr/rdrop branch I pushed. I'm wondering if you adjust the latency tolerance, if this would help. The value is hard coded right now, not configurable, so you'll have to work with a local build. Maybe experiment with a large tolerance like 100ms?
Thanks a lot! I'm gonna test it ASAP.
Pretty interesting. I've done some testing, and here are the results.
The way I tested:
- Changed this function for easier debugging:

```rust
pub fn next_data(&mut self, now: Instant) -> Option<(Instant, Bytes)> {
    match self.receiver.arq.pop_next_message(now) {
        Ok(Some(data)) => {
            self.debug(now, "output", &data);
            Some(data)
        }
        Err(error) => {
            // self.warn(now, "output", &error);
            warn!(
                "delay in millis: {:?}",
                Duration::from_micros(error.delay.as_micros() as u64)
            );
            let dropped = error.too_late_packets.end - error.too_late_packets.start;
            self.stats.rx_dropped_data += dropped as u64;
            None
        }
        _ => None,
    }
}
```
- Changed the latency window to different values (5/20/50/100/200). Here are the results with the different values: https://gist.github.com/Lighty0410/925325a86b7f7a4b7cb5e1cf5a93c066

So as you may see, it doesn't matter what the `latency_window` value is: there's always a delay. Can you verify/deny my assumption: `MessageError.delay` = `latency_window` + real delay? If so, the real delay is `MessageError.delay` - `latency_window`. For example, real_delay(3.185ms) = MessageError.delay(103.185ms) - latency_window(100ms). If so, on average I get packets which are 1-10ms later than they should be, and they probably should be tolerated. Which leads me to yet another assumption: there's probably something wrong with the tolerance calculation/handling, because no matter which value I set, the messages always get dropped. Any tips on where to search next?
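The arithmetic behind this assumption can be sketched directly. All names below are illustrative, not the srt-rs API; it just restates "real delay = reported delay minus latency window" and the idea that a packet only a few ms past the window should be tolerated.

```rust
use std::time::Duration;

/// Under the assumption above, the reported delay includes the latency
/// window, so the real lateness is the difference (floored at zero).
fn real_delay(reported: Duration, latency_window: Duration) -> Duration {
    reported.saturating_sub(latency_window)
}

/// A packet is tolerable if its real lateness is within some tolerance.
fn should_tolerate(reported: Duration, latency_window: Duration, tolerance: Duration) -> bool {
    real_delay(reported, latency_window) <= tolerance
}

fn main() {
    let window = Duration::from_millis(100);
    // The example from the gist: reported 103.185ms -> real delay 3.185ms.
    let reported = Duration::from_micros(103_185);
    assert_eq!(real_delay(reported, window), Duration::from_micros(3_185));
    // A 3.185ms real delay should be tolerated at a 10ms tolerance.
    assert!(should_tolerate(reported, window, Duration::from_millis(10)));
}
```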
Hi, bumping in here.
We are experiencing the same problem as @Lighty0410: whatever latency value we set (from 120 ms to 2 s), we end up with dropped packets, even though we are on the same machine (and thus one would naïvely expect everything to be OK).
We're probably gonna investigate on our side as it is a huge hurdle on our critical path...
I'll try to wrap my head around it first and then come back to you with, I hope, a clearer view of the problem :-)
Hi @robertream!
I think I have a much clearer view of the problem now. IMO there are three:
- Firstly, after ~30 min of streaming, we see too-late packets appearing for no apparent reason. The setup is one sender and one receiver on the same machine, sending text data in binary format to each other. For this problem I cannot see why it happens.
- Secondly, I saw that you implemented some options of the SRT socket, namely `too_late_packet_drop` for the `Receiver`. However, its effect on `drop_too_late_packets` in the buffer is not implemented. That's something that I'd like to implement to bypass the first point: I tested on my setup, removing this drop by commenting it out, and it was working fine. It might also indicate where it's going wrong?
- Lastly, I read the SRT spec (https://haivision.github.io/srt-rfc/draft-sharabayko-srt.html#name-too-late-packet-drop) and I saw that the threshold for too-late packet drop should be 1.25 times the latency (if I understand correctly). Does the implementation correspond to this?

If I implement the second bullet, is that OK for you? It will unblock us for the time being :-)
See nipierre@999af6f.
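For reference, the 1.25x reading of the spec in the last bullet can be sketched as a plain function. This is only an illustration of that interpretation (using exact integer math, latency × 5/4), not the srt-rs implementation.

```rust
use std::time::Duration;

/// Drop threshold under the "1.25 x latency" reading of the SRT draft:
/// computed as latency * 5 / 4 to stay exact in integer arithmetic.
fn drop_threshold(latency: Duration) -> Duration {
    latency * 5 / 4
}

/// A packet would be considered too late only once its delay exceeds
/// the threshold, i.e. there is a 25%-of-latency slack past the window.
fn is_too_late(packet_delay: Duration, latency: Duration) -> bool {
    packet_delay > drop_threshold(latency)
}

fn main() {
    let latency = Duration::from_millis(120);
    // 120ms latency -> 150ms threshold under this reading.
    assert_eq!(drop_threshold(latency), Duration::from_millis(150));
    assert!(!is_too_late(Duration::from_millis(130), latency));
    assert!(is_too_late(Duration::from_millis(200), latency));
}
```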
Yes, please do implement the second bullet. I added all the relevant options from the reference implementation and recall there may have been more than one that weren't wired up to actual functionality.
To fix the "too late packet drop" behavior, we should start with an isolation test for the expected threshold and packet-drop behavior. I'm pretty sure there are tests that are supposed to cover this scenario, but they are likely not written appropriately.
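As a sketch of such an isolation test, here is the core invariant stated against a toy buffer rather than the real `ReceiveBuffer` (whose API may differ): packets still within the latency window must never be dropped, and only genuinely late ones may be.

```rust
use std::time::{Duration, Instant};

/// Toy receive buffer: each packet gets a delivery deadline of
/// send time + latency (illustrative model, not srt-rs code).
struct ToyBuffer {
    latency: Duration,
    packets: Vec<(Instant, u32)>, // (delivery deadline, sequence number)
}

impl ToyBuffer {
    fn push(&mut self, sent: Instant, seq: u32) {
        self.packets.push((sent + self.latency, seq));
    }

    /// Drops and returns the sequence numbers whose deadline has passed;
    /// packets still inside the window are retained.
    fn drop_too_late(&mut self, now: Instant) -> Vec<u32> {
        let late: Vec<u32> = self
            .packets
            .iter()
            .filter(|(deadline, _)| *deadline < now)
            .map(|(_, seq)| *seq)
            .collect();
        self.packets.retain(|(deadline, _)| *deadline >= now);
        late
    }
}

fn main() {
    let start = Instant::now();
    let mut buf = ToyBuffer { latency: Duration::from_millis(100), packets: vec![] };
    buf.push(start, 1);
    buf.push(start + Duration::from_millis(90), 2);

    // 50ms in: nothing is past its deadline, so nothing may be dropped.
    assert!(buf.drop_too_late(start + Duration::from_millis(50)).is_empty());

    // 150ms in: packet 1's deadline (100ms) has passed, packet 2's (190ms) has not.
    assert_eq!(buf.drop_too_late(start + Duration::from_millis(150)), vec![1]);
    assert_eq!(buf.packets.len(), 1);
}
```

An isolation test of the real function would assert the same two properties around its actual threshold.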
I've done the implementation in #208.
I'll try to think of an isolation test.
For my comprehension: does one really want to drop packets in this case?
In my tests, this is where it fails after some time (30-ish minutes), but it seems to me that it's dropping parts of the message it's supposed to buffer.
To illustrate:
```
2023-09-17 22:25:45,093 INFO [srt_protocol::protocol::receiver::buffer] FIRST (DATA PACKET): {DATA sn=1348615918 loc=PacketLocation(FIRST) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=1316, start=b"<?xml ve"]}
2023-09-17 22:25:45,093 INFO [srt_protocol::protocol::receiver::buffer] COUNT NOT DONE: 1
2023-09-17 22:25:45,093 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,093 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615918 loc=PacketLocation(FIRST) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=1316, start=b"<?xml ve"]}
2023-09-17 22:25:45,093 INFO [srt_protocol::protocol::receiver::buffer] COUNT NOT DONE: 2
2023-09-17 22:25:45,093 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,094 WARN [srt_protocol::connection] -35:45.604621|SRT#7ABA91E3|output - MessageError { too_late_packets: SeqNumber(1348615918)..SeqNumber(1348615919), delay: -00:00.118540 }
2023-09-17 22:25:45,094 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615919 loc=PacketLocation(0x0) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=1316, start=b"egere as"]}
2023-09-17 22:25:45,094 INFO [srt_protocol::protocol::receiver::buffer] ACCUMULATE: COUNT 0 AND NOT FIRST
2023-09-17 22:25:45,094 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,094 WARN [srt_protocol::connection] -35:45.604153|SRT#7ABA91E3|output - MessageError { too_late_packets: SeqNumber(1348615919)..SeqNumber(1348615920), delay: -00:00.118072 }
2023-09-17 22:25:45,094 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615920 loc=PacketLocation(0x0) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=1316, start=b"lectat, "]}
2023-09-17 22:25:45,094 INFO [srt_protocol::protocol::receiver::buffer] ACCUMULATE: ACCUMULATE: COUNT 0 AND NOT FIRST
2023-09-17 22:25:45,094 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,094 WARN [srt_protocol::connection] -35:45.603496|SRT#7ABA91E3|output - MessageError { too_late_packets: SeqNumber(1348615920)..SeqNumber(1348615921), delay: -00:00.117415 }
2023-09-17 22:25:45,095 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615921 loc=PacketLocation(0x0) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=1316, start=b"feriorem"]}
2023-09-17 22:25:45,095 INFO [srt_protocol::protocol::receiver::buffer] ACCUMULATE: ACCUMULATE: COUNT 0 AND NOT FIRST
2023-09-17 22:25:45,095 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,095 WARN [srt_protocol::connection] -35:45.603208|SRT#7ABA91E3|output - MessageError { too_late_packets: SeqNumber(1348615921)..SeqNumber(1348615922), delay: -00:00.117128 }
2023-09-17 22:25:45,095 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615922 loc=PacketLocation(0x0) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=1316, start=b"spexit i"]}
2023-09-17 22:25:45,095 INFO [srt_protocol::protocol::receiver::buffer] ACCUMULATE: ACCUMULATE: COUNT 0 AND NOT FIRST
2023-09-17 22:25:45,095 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,095 WARN [srt_protocol::connection] -35:45.602953|SRT#7ABA91E3|output - MessageError { too_late_packets: SeqNumber(1348615922)..SeqNumber(1348615923), delay: -00:00.116873 }
2023-09-17 22:25:45,102 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615923 loc=PacketLocation(LAST) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=768, start=b" vacuita"]}
2023-09-17 22:25:45,102 INFO [srt_protocol::protocol::receiver::buffer] ACCUMULATE: ACCUMULATE: COUNT 0 AND NOT FIRST
2023-09-17 22:25:45,102 INFO [srt_protocol::protocol::receiver::buffer] NEXT_MESSAGE_PACKET_COUNT: None
2023-09-17 22:25:45,113 INFO [srt_protocol::protocol::receiver::buffer] FIRST: {DATA sn=1348615923 loc=PacketLocation(LAST) enc=None re=false msgno=428 ts=35:49.342232 dst=SRT#7ABA91E3 payload=[len=768, start=b" vacuita"]}
```
I created this repo (https://github.com/nipierre/srt-rs-testing) with a sender and a receiver.
Launch a `cargo run --bin receiver -- -p 7777 -l -v` and a `cargo run --bin sender -- -p 7777 -v`, and wait 30+ minutes to see packets drop.
@robertream Seems OK to me!
After the merge, would it be possible to tag a version so that we can incorporate the fix?
@Lighty0410 would you have some time for testing this too?
@russelltg what needs to be done in order to publish this release to crates.io?
Did you end up force-pushing the tag? It seems there's an Actions job to publish the crates automatically, so if so, it'll be the first one that got published...
Anyways, 0.4.2 is definitely on crates.io. If something missed that original tag, we should make a 0.4.3.
Oh, whoops. Do you have time to make a v0.4.3? I'm busy now.
@nipierre I can close this issue if your testing demonstrates it has been fixed.
@robertream Sorry, missed your message, ofc you can close it :)