
Comments (9)

cgutman commented on July 19, 2024

> First of all, this commit entirely broke Nintendo Switch support:
> After LiStartConnection() is called, the streamed picture and sound appear, but input gets stuck immediately. After a few seconds, audio and video also get stuck and the app becomes unresponsive.

If you put a Limelog() call before and after enet_socket_wait() can you tell if it's hanging inside that function call? If it's not hanging, can you print the value of condition and the return value of enet_socket_wait() to see if it's failing or something?

It looks like poll() is enabled by default in https://github.com/cgutman/enet/blob/8d69c5abe4b699e7077395e01927bd102b3ba597/unix.c#L107

If the Switch implementation of poll() is broken, you can try to comment out #define HAS_POLL 1 and see if the problem goes away. You could also try setting #define NO_MSGAPI 1 and see if that helps things.

> At first, one of them (audio, video, or input) gets stuck. After a few seconds it starts working again, but something else gets stuck instead, and so on: something is always stuck until the entire app freezes. If I try to close the connection once something has started to freeze, the app crashes. These symptoms are similar to the behaviour without my fix, so they may be related.

It's suspicious that you get simultaneous audio, video, and input hangs. Audio is completely decoupled from input and video. It runs on completely separate threads with separate queues, locks, and other data structures. The only possible way I could see audio failing is if no thread can successfully call enet_host_service() to keep the ENet control connection alive (which doesn't directly break audio, but causes the host to terminate the connection). That could be the case if a thread hangs while holding enetMutex.

My suggestion would be to add calls to Limelog() in various places to determine at least what code is executing when it hangs.
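
A minimal sketch of that kind of tracing, assuming stand-in functions so it compiles on its own: demo_log() and demo_socket_wait() are hypothetical placeholders for Limelog() and enet_socket_wait(), and the stub only mirrors the general shape of the ENet call. If only the "before" line ever shows up in the log, the call is hanging; if both appear, the printed return value and condition show whether it is failing.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Counts emitted log lines so the sketch can be checked without parsing output. */
static int log_line_count = 0;

/* Hypothetical stand-in for Limelog(): printf-style logging to stderr. */
static void demo_log(const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    log_line_count++;
}

/* Hypothetical stub shaped like enet_socket_wait(): it clears the condition
 * bits (nothing ready) and returns 0, as if the wait simply timed out. */
static int demo_socket_wait(int sock, unsigned int *condition, unsigned int timeout_ms) {
    (void)sock;
    (void)timeout_ms;
    *condition = 0;
    return 0;
}

/* Logs before and after the wait so a hang inside the call is visible. */
static int traced_socket_wait(int sock, unsigned int *condition, unsigned int timeout_ms) {
    assert(condition != NULL);
    demo_log("before wait: condition=%u timeout=%u\n", *condition, timeout_ms);
    int ret = demo_socket_wait(sock, condition, timeout_ms);
    demo_log("after wait: ret=%d condition=%u\n", ret, *condition);
    return ret;
}
```

In the real code the same two log calls would go directly around the existing enet_socket_wait() call rather than around a stub.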

from moonlight-common-c.

XITRIX commented on July 19, 2024

Thanks for your reply!

  1. No, it's not stuck in enet_socket_wait().
  2. condition == 0, and sometimes it's == 2 when I send input data. The return value of enet_socket_wait() is always 0.
  3. Disabling #define HAS_POLL 1 didn't help. I double-checked by printing something when it's defined.
  4. #define NO_MSGAPI 1 didn't help either.
  5. I found that sometimes the app doesn't freeze entirely. Sometimes only video gets stuck (input and sound work), but then sound can freeze while video and input start working again, or video and audio work but input stops. Either way, the app still freezes entirely after about 30 seconds. I think replacing enet_socket_wait with the old solution just delays the problem, and a bad Wi-Fi connection just forces it to show up.

Also, I've never seen my Switch app get disconnected from the host because of a bad connection, the way the PC client does. But if another client connects to the host, the Switch successfully disconnects itself from the host.

Update:
After the freeze, any input starts logging "Input queue reached maximum size".

Here is video footage showing what I'm talking about: Video

cgutman commented on July 19, 2024

> Update:
> After the freeze, any input starts logging "Input queue reached maximum size".

OK, that definitely sounds like a hang inside the ENet code. The input send thread is responsible for pulling items from that queue and sending them through the ENet control stream connection to the host. Since only one thread can call into ENet at a time, we have a mutex to synchronize the threads that need to send requests on that connection.

If a thread hangs inside enet_host_service(), then enetMutex will never be released and several related threads will all hang (input send thread, control recv thread, loss stats thread, request IDR frame thread) while waiting on that mutex. Since the host doesn't receive any periodic control stream traffic from us for a while, it will terminate the connection. However, the hung threads can't exit either because they're waiting on that mutex, so you have to force terminate Moonlight to stop the stream.
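
A minimal sketch of that failure mode using POSIX threads, not Moonlight's actual threading code: demo_enet_mutex, try_send_thread(), and probe_while_holder_is_stuck() are hypothetical names. The caller holds the mutex the way a thread stuck inside enet_host_service() would, and a second thread finds the lock unavailable. The sketch uses pthread_mutex_trylock() so it can report EBUSY and return; a plain blocking lock, which is what the real threads would be waiting in, would never return here.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the mutex that serializes calls into ENet. */
static pthread_mutex_t demo_enet_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Plays the role of the input send thread: tries to take the mutex
 * without blocking and records what happened. */
static void *try_send_thread(void *arg) {
    int *result = arg;
    *result = pthread_mutex_trylock(&demo_enet_mutex);
    if (*result == 0) {
        pthread_mutex_unlock(&demo_enet_mutex);
    }
    return NULL;
}

/* The caller plays the hung thread: it keeps the mutex locked while a
 * second thread probes it. Returns the probe's trylock result. */
static int probe_while_holder_is_stuck(void) {
    int result = -1;
    pthread_t worker;

    pthread_mutex_lock(&demo_enet_mutex);
    int rc = pthread_create(&worker, NULL, try_send_thread, &result);
    assert(rc == 0);
    pthread_join(worker, NULL);
    pthread_mutex_unlock(&demo_enet_mutex);

    return result;
}
```

Every thread that needs the mutex ends up in the same position as the probe thread, which would explain why input, control, and loss-stats traffic all stop together.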

To debug this, we'll need to figure out why and where it's hanging. By far the easiest way would be to use a debugger like GDB and dump the thread stacks. That will immediately show the precise location of the hang. If you have a debugger but are not sure about the syntax for the commands, I might be able to help.

If there's no debugger for the Switch, you can probably start by adding some debug prints before and after these calls here:

enet_host_service(client, NULL, 0);

err = serviceEnetHost(client, &event, 0);

XITRIX commented on July 19, 2024

I think the only debugger for the Switch is twili, but for some reason it only works on Linux, and my main machine runs macOS. Let's try to find the problem by logging first; if that doesn't help, I'll try to run twili.

I've wrapped them like this:
image
image

And here are logs that app produced:
log.log
log1.log
log2.log

XITRIX commented on July 19, 2024

OK, I think I've set up gdb, but I have no idea how to use it.

cgutman commented on July 19, 2024

Once Moonlight gets stuck in that hung state, break into gdb (Ctrl+C should do it) then run thread apply all bt in the debugger prompt. Hopefully that will show all the info we need, or at least enough to narrow down the investigation.

XITRIX commented on July 19, 2024

Hi, I finally got gdb working. Here are some logs from it:

switch.log
switch2.log
switch3.log

cgutman commented on July 19, 2024

OK, I think I see a pattern in the deadlocks. They all seem to originate from sessionmgrAttachClient().

Thread 5 (Thread 707 (?)):
#0  0x0000002bffd4ceb4 in svcWaitProcessWideKeyAtomic ()
#1  0x0000002bffd4dea8 in condvarWaitTimeout ()
#2  0x0000002bffd5819c in sessionmgrAttachClient ()
#3  0x0000002bffd51bf0 in bsdRecvMMsg ()
#4  0x0000002bffd48a2c in recvmmsg ()
#5  0x0000002bffd48b10 in recvmsg ()
#6  0x0000002bff2e691c in enet_socket_receive ()
#7  0x0000002bff2e4e14 in enet_host_service ()
#8  0x0000002bff2d738c in sendMessageEnet ()
#9  0x0000002bff2d7fd8 in lossStatsThreadFunc ()
#10 0x0000002bff2dbb88 in ThreadProc ()
#11 0x0000002bffd571b8 in __thread_entry ()
#12 0x0000002bffd38b2c in _EntryWrap ()
#13 0x0000000000000000 in ?? ()

All of the deadlocked threads are waiting in libnx here: https://github.com/switchbrew/libnx/blob/c5a9a909a91657a9818a3b7e18c9b91ff0cbb6e3/nx/source/sf/sessionmgr.c#L47

From glancing at the code, it looks like only a certain number of "sessions" (concurrent BSD sockets calls) are allowed in libnx at a time. Moonlight has several of these calls in flight at any given time (generally at least 2 blocking recvfrom() calls and a poll()).

According to the default BSD socket session configuration, it allocates a maximum of 3 sessions. Moonlight can easily exceed this and it may be the cause of the deadlocks. This explains why 97216e1 made the problem worse, since it is putting yet another thread into a blocking BSD sockets call (poll()) rather than the old way which waited in a non-socket-related function usleep() that didn't count as a BSD sockets session.
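
A simplified model of that limit, as a sketch only: SessionPool, session_try_attach(), and session_detach() are hypothetical names and this is not libnx's implementation. In libnx the caller that finds no free session waits on a condition variable (the condvarWaitTimeout() frame in the stack above); here the attach just returns false so the exhaustion is easy to see.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a fixed-size pool of BSD socket sessions. */
typedef struct {
    int max_sessions; /* 3 in the default configuration described above */
    int in_use;       /* socket calls currently in flight */
} SessionPool;

/* Claims a session for one socket call. Returns false when every session
 * is busy, which is the point where a real caller would start waiting. */
static bool session_try_attach(SessionPool *pool) {
    if (pool->in_use >= pool->max_sessions) {
        return false;
    }
    pool->in_use++;
    return true;
}

/* Releases a session once the socket call has returned. */
static void session_detach(SessionPool *pool) {
    assert(pool->in_use > 0);
    pool->in_use--;
}
```

With three sessions, two blocking receives plus one poll() already use the whole pool, so the next socket call from any other thread has to wait until one of them returns.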

If I'm reading the code properly, it looks like you just need to switch from socketInitializeDefault() to manually initializing sockets with additional BSD sessions like:

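// Start from the default socket configuration, then raise the session limit.
SocketInitConfig cfg = *(socketGetDefaultInitConfig());
cfg.num_bsd_sessions = 8; // the default allows 3
socketInitialize(&cfg);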

EDIT: It looks like there may also be a bug in the wakeup logic of sessionmgrAttachClient() and sessionmgrDetachClient() that may be the root cause of the deadlocks when we reach the 3 session limit. I filed switchbrew/libnx#556

XITRIX commented on July 19, 2024

Thanks a lot! While waiting for libnx to release a patched version, your workaround with num_bsd_sessions = 8 works like a charm! You are the best!
