Comments (27)
Reading over my code there could be a race condition and 2 webrtcservers get created at startup. In this situation only 1 would be used for creating the all the transports.
Do you literally mean that you are creating WebRtcServers in different Workers and assigning them the same listening ports?
This is not possible. worker.createWebRtcServer()
will throw if given UDP or TCP port is already in use by other process:
mediasoup:WARN:Channel request failed [method:WORKER_CREATE_WEBRTCSERVER, id:1]: uv_listen() failed [transport:tcp, ip:'0.0.0.0', port:44444]: address already in use [method:worker.createWebRtcServer]
mediasoup:WARN:Channel request failed [method:WORKER_CREATE_WEBRTCSERVER, id:1]: uv_udp_bind() failed [transport:udp, ip:'0.0.0.0', port:44444]: address already in use [method:worker.createWebRtcServer]
from mediasoup.
We have 1 worker with 1 webrtcserver with 4 ports 40001+40002 for internal udp+tcp, 40003+40004 for public announced ips. Nothing is shared.
I found this out yesterday. We were running a lot (95%) of traffic through TCP b/c of a config error. Since fixing the config error we see fewer errors.
from mediasoup.
@GEverding do you know which devices connect to your server? They obviously implement ICE renomination which is the reason of the crash (see PR) and AFAIK Chrome and libwebrtc doesn't support it. Well it does but I'm not aware that it's enabled by default.
from mediasoup.
@GEverding I'm merging PR #1182 since there was an obvious bug in the code and that PR fixes the vulnerability in the code. 3.12.16 released. But still I'd appreciate if you could validate whether it fixes your scenario. Thanks a lot.
from mediasoup.
2023-10-15T20:43:50.557Z mediasoup:ERROR:PayloadChannel Producer PayloadChannel error: Error: read ECONNRESET
2023-10-15T20:43:50.558Z mediasoup:ERROR:Channel Producer Channel error: Error: read ECONNRESET
2023-10-15T20:43:50.559Z mediasoup:ERROR:Worker worker process died unexpectedly [pid:20, code:null, signal:SIGABRT]
Also seen it with this error as well.
from mediasoup.
@GEverding, there is basically no description about the issue. Please elaborate the scenario in which the problem happens, and wether it reproduces always, etc.
from mediasoup.
My apologies I was dealing with another issue. I'm seeing this in normal use of mediasoup in production. It's a fairly rare issues. My guess is it happens when a client disconnects half way through setup because I see this more on startup of a new instance. I know you really need core dumps for this and I'm working on that.
from mediasoup.
The error refers to the UnixSocket between Node and C++ worker. It's not related to client connections.
from mediasoup.
Do you want a core dump to debug this or do you need something else. I did see this on v20 but I'm seeing it more on v21-nightly (there's a node ssl crash in v20-v18)
from mediasoup.
Coredump please. If you figure out reliable steps to reproduce it would be great but I think that will be hard.
from mediasoup.
Still working on the core dump. but saw this in the log and might be the issue:
2023-10-16T19:42:22.374Z mediasoup:ERROR:Worker (stderr) (ABORT) RTC::IceServer::HandleTuple() | failed assertion `this->tuples.empty()': state is 'disconnected' but there are 1 tuples
from mediasoup.
Interesting. I will check tomorrow. If you can get the core dump it would be better but maybe I'm able to figure out the issue without it. Thanks.
from mediasoup.
(gdb) bt full
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
set = {__val = {0, 167607312, 0, 5, 0, 281470681743361, 281473119789568, 281473931202416, 281473119637420, 187650778675576, 281473114525712, 281473114527504, 187651764764656,
281473931202384, 281473115126340, 1}}
pid = <optimized out>
tid = <optimized out>
ret = <optimized out>
#1 0x0000ffff910a7aa0 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {281473116505920, 2676586395008836901, 2676586395008836901, 2334397743342383717,
7813874495109036581, 684901, 2317476195752624424, 0, 0, 18446744073692774400, 0, 0, 1, 4616194021469978624, 0, 4616194021471028225}}, sa_flags = 1074791425,
sa_restorer = 0xaaaada013978 <[email protected]>}
sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x0000aaaad9a4b00c in RTC::IceServer::HandleTuple(RTC::TransportTuple*, bool, bool, unsigned int) ()
No symbol table info available.
#3 0x0000aaaad9af7b5c in non-virtual thunk to RTC::WebRtcServer::OnTcpConnectionPacketReceived(RTC::TcpConnection*, unsigned char const*, unsigned long) ()
No symbol table info available.
#4 0x0000aaaad9ad3948 in RTC::TcpConnection::UserOnTcpConnectionRead() ()
No symbol table info available.
#5 0x0000aaaad9d91794 in uv.read ()
No symbol table info available.
#6 0x0000aaaad9d920f4 in uv.stream_io ()
No symbol table info available.
#7 0x0000aaaad9d98d54 in uv.io_poll ()
No symbol table info available.
#8 0x0000aaaad9d8c734 in uv_run ()
No symbol table info available.
#9 0x0000aaaad99d8784 in DepLibUV::RunLoop() ()
No symbol table info available.
#10 0x0000aaaad99e2060 in Worker::Worker(Channel::ChannelSocket*, PayloadChannel::PayloadChannelSocket*) ()
No symbol table info available.
#11 0x0000aaaad99d72a0 in mediasoup_worker_run ()
No symbol table info available.
#12 0x0000aaaad99d5ea4 in main ()
No symbol table info available.
Is this enough of a coredump? I have not been able to repo in testing.
from mediasoup.
Yes, it's enough. Thanks. Will work on it finally tomorrow. BTW do you have some special scenario such as clients connecting to mediasoup through TCP instead of UDP? Are you using separate WebRtcServer to make all WebRtcTransports listen in same port(s) or not?
from mediasoup.
Yes we have some clients using TCP b/c of network restrictions. We run UDP and TCP on different ports.
We have 1 worker and it gets a WebRTCServer. Reading over my code there could be a race condition and 2 webrtcservers get created at startup. In this situation only 1 would be used for creating the all the transports.
from mediasoup.
Reading over my code there could be a race condition and 2 webrtcservers get created at startup. In this situation only 1 would be used for creating the all the transports.
Do you literally mean that you are creating WebRtcServers in different Workers and assigning them the same listening ports?
from mediasoup.
So the thing is that iceServer->HandleTuple()
private method was called while in 'disconnected' mode.
- ' IceServer' entered in 'disconnected' mode before because
iceServer->RemoveTuple()
was called and it removed the last/only tuple, soIceServer
changed its state to "disconnected". That's the only way to switch to 'disconnected' state. - But somehow
iceServer->HandleTuple()
was called later and it added a tuple, but it didn't leave the 'disconnected' state. It doesn't make sense. - So later, while in 'disconnected' state and with a tuple in the list, somehow
iceServer->HandleTuple()
was called again and and here the assertion happens ("state is 'disconnected' but there are 1 tuples").
from mediasoup.
So the bug is here, in HandleTuple()
method (for both 'NEW' and 'DISCONNECTED' states (maybe more):
void IceServer::HandleTuple(
RTC::TransportTuple* tuple, bool hasUseCandidate, bool hasNomination, uint32_t nomination)
{
MS_TRACE();
switch (this->state)
{
case IceState::NEW:
{
// There should be no tuples.
MS_ASSERT(
this->tuples.empty(), "state is 'new' but there are %zu tuples", this->tuples.size());
// There shouldn't be a selected tuple.
MS_ASSERT(!this->selectedTuple, "state is 'new' but there is selected tuple");
if (!hasUseCandidate && !hasNomination)
{
MS_DEBUG_TAG(
ice,
"transition from state 'new' to 'connected' [hasUseCandidate:%s, hasNomination:%s, nomination:%" PRIu32
"]",
hasUseCandidate ? "true" : "false",
hasNomination ? "true" : "false",
nomination);
// Store the tuple.
auto* storedTuple = AddTuple(tuple);
// Mark it as selected tuple.
SetSelectedTuple(storedTuple);
// Update state.
this->state = IceState::CONNECTED;
// Notify the listener.
this->listener->OnIceServerConnected(this);
}
else
{
// Store the tuple.
auto* storedTuple = AddTuple(tuple);
if ((hasNomination && nomination > this->remoteNomination) || !hasNomination)
{
MS_DEBUG_TAG(
ice,
"transition from state 'new' to 'completed' [hasUseCandidate:%s, hasNomination:%s, nomination:%" PRIu32
"]",
hasUseCandidate ? "true" : "false",
hasNomination ? "true" : "false",
nomination);
// Mark it as selected tuple.
SetSelectedTuple(storedTuple);
// Update state.
this->state = IceState::COMPLETED;
// Update nomination.
if (hasNomination && nomination > this->remoteNomination)
this->remoteNomination = nomination;
// Notify the listener.
this->listener->OnIceServerCompleted(this);
}
}
break;
}
Notice that there are cases in which we store the tuple but we do NOT change the state
, so there we end with tuples in the list and still in 'DISCONNECTED' state:
from mediasoup.
So basically after this PR #756 we made it possible for IceServer
to have tuples in the list while none of them is selected and state remains 'new' or 'disconnected'.
Hi @penguinol, problems here.
from mediasoup.
PR done here: #1182
Will merge and release new version tomorrow after getting some eyes on it.
from mediasoup.
We have Chrome (electron), iOS and Android all connecting to our servers
from mediasoup.
Well, we only use udp, so we have never met this crash. I'll check the PR.
from mediasoup.
@GEverding it should be fixed in #1182, but I'm extremely busy working on N things at the same time. Is it possible for you to try the branch in that PR in your setup with TCP enabled and see if it works as expected? I've done all tests I could but I've not been able to reproduce the bug using v3 branch.
from mediasoup.
Yeah I can build it and put it out in production. I'll know by tmrw likely if I don't see any crashes.
from mediasoup.
Thanks a lot. I'm 95% sure it will work fine.
from mediasoup.
Hey its been out for 48hrs now and its looking good 👍
from mediasoup.
Great!
from mediasoup.
Related Issues (20)
- Not able to run, getting error as Unexpected token '??=' in Router.js HOT 1
- RTCP parsing needs a refactor
- RTCP CompoundPacket: ensure there are no more than 1 extended report of the same type in the XR packet HOT 2
- Avoid that mediasoup-worker-prebuild CI task runs for every PR HOT 19
- Inconsistency between fbs and worker HOT 3
- Add stream NTP timestamp access to the Node API HOT 4
- npm install fails with compilation error HOT 13
- flatc warning: You should add the boolean check kwarg to the run_command call HOT 1
- bug d entrer sur npm alors qu'il est installer nvm, npm et nodejs aussi
- Worker panic in Rust in WebRtcMessage::new HOT 28
- Rust docs for 0.13 failed to build HOT 4
- Validation functions at `ortc.ts` file are modifying input data HOT 10
- io-uring not enabled in Linux mediasoup-worker prebuilt since it uses kernel 5.15 HOT 10
- Need kernel 6 capable Linux CI host to produce worker prebuilt with io-uring support HOT 1
- Windows CI jobs failing due to meson HOT 4
- [Rust] Unable to build mediasoup > 0.12 with Docker HOT 3
- Issues in Node tests HOT 3
- Doubt with test HOT 6
- Worker hits ASSERT condition in production HOT 14
- About v3.13.15, io_uring, and liburing HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mediasoup.