Comments (65)

puddly avatar puddly commented on August 17, 2024 1

You can:

@combine_concurrent_calls
async def _discover_route(self, nwk: t.NWK) -> None:
    """
    Instructs the coordinator to re-discover routes to the provided NWK.
    Runs concurrently and at most once per NWK, even if called multiple times.
    """
    # Route discovery with Z-Stack 1.2 and Z-Stack 3.0.2 on the CC2531 doesn't
    # appear to work very well (Z2M#2901)
    if self._znp.version < 3.30:
        return
    await self._znp.request(
        c.ZDO.ExtRouteDisc.Req(
            Dst=nwk,
            Options=c.zdo.RouteDiscoveryOptions.UNICAST,
            Radius=30,
        ),
    )
    await asyncio.sleep(0.1 * 13)

Remove the @combine_concurrent_calls decorator to make it send one every time you call the function.

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024 1

Not if you disable it. The loggers intentionally use string formatting syntax to avoid unnecessary work if the log level is disabled.
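
To illustrate the lazy formatting point, here is a minimal sketch using only the standard logging module. The Expensive class and the logger name are made up for the demo and are not part of zigpy-znp; the sketch only counts how often the argument is rendered to a string while DEBUG is disabled.

```python
import logging


class Expensive:
    """Hypothetical object that counts how often it is rendered to a string."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "expensive repr"


logger = logging.getLogger("lazy_logging_demo")
logger.addHandler(logging.NullHandler())
logger.propagate = False
logger.setLevel(logging.INFO)  # DEBUG is disabled for this logger

payload = Expensive()

# Lazy: the level is checked first, so the message is never formatted
logger.debug("Received frame: %s", payload)
renders_after_lazy = payload.renders

# Eager: the f-string is built before debug() is even called
logger.debug(f"Received frame: {payload}")
renders_after_eager = payload.renders
```

With the %s style the argument is never converted, so a disabled debug call costs little more than a level check.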

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024 1

Yeah. It fully reloads ZHA when adjusting configuration to be safe but we can probably make it less intrusive, eventually.

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024 1

Almost nothing in asyncio is threadsafe so I wouldn't rely on it.
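
For reference, the usual way to hand something from a plain thread to the event loop is loop.call_soon_threadsafe. A minimal sketch with the standard library only (the worker function is made up for the demo and has nothing to do with pyserial):

```python
import asyncio
import threading


async def main():
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    def worker():
        # Runs in a plain thread. Calling done.set_result() directly from here
        # would not be safe, so the call is scheduled on the loop instead.
        loop.call_soon_threadsafe(done.set_result, "hello from thread")

    thread = threading.Thread(target=worker)
    thread.start()
    value = await done
    thread.join()
    return value


result = asyncio.run(main())
```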

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024 1

I ordered a CC1352P7. It seems to be pretty much the same as the P2 except that it has more memory.

On a side note: yesterday I tried creating a backup from ZNP and restoring it on my Conbee II, and it worked. This is ridiculously genius! (I reverted because ZNP seemed to work much more reliably.)

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024 1

Got the board, made a P7 firmware with some changes from koenkk's. Is up and running :-) for now..

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024 1

@puddly is zigpy-znp purposely suppressing route discovery mechanisms?

tx_options = c.af.TransmitOptions.SUPPRESS_ROUTE_DISC_NETWORK

If my quick search was correct the other libraries seem to not use a similar flag?

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024 1

> If my quick search was correct the other libraries seem to not use a similar flag?

If I remember correctly, it was used by other libraries in the past, though incorrectly named: https://github.com/Koenkk/zigbee-herdsman/search?q=DISCV_ROUTE

MTORR are broadcast periodically by the coordinator (check with a Zigbee sniffer), in addition to being explicitly requested by zigpy-znp when a device is unreachable. I believe the original reasoning was to reduce unnecessary runtime network traffic.

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

Could you email me (or upload here) a complete log of it crashing? I've been running that firmware for a few days and have experienced no problems with it. zigpy-znp should recover even from complete coordinator lock-ups (i.e. where the serial port is still alive) and try to reconnect.

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

The serial port is not alive anymore when this happens. The only thing that helps is power cycling the coordinator. Do you need debug turned on? Or should warnings be enough?

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

I see. So the lockup is detected, but the coordinator itself will never reset and zigpy-znp endlessly tries to reconnect? Are you using a TCP coordinator or is this a USB one?

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

It's a Cc1352P2 launchpad connected via USB.

Yes, if I remember correctly it seemed like zigpy recognized the connection issue and tried to reconnect, but everything seemed to result in TimeoutErrors (off the top of my head).

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Turned on debug logging

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

Thanks. I feel like this is still a firmware problem: reading too slowly from a serial port shouldn't cause an unrecoverable crash, even if the other side of the serial connection is doing something to cause it.

That being said, moving pyserial-asyncio to another thread like you did may be necessary. We already do this with bellows (https://github.com/zigpy/bellows/blob/3c3ee0296d35eb43d0493eb9b2160bc4484e892c/bellows/uart.py#L386).
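
A rough sketch of the general "dedicated thread" idea, assuming a private event loop running in a background thread and coroutines submitted to it with asyncio.run_coroutine_threadsafe. This is not the bellows or zigpy-znp code; LoopThread and fake_serial_read are hypothetical names used only for illustration.

```python
import asyncio
import threading


class LoopThread:
    """Hypothetical helper: runs a private event loop in a background thread."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self.loop.run_forever, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # stop() must be scheduled on the worker loop, not called directly
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()

    async def run(self, coro):
        """Run `coro` on the worker loop and await its result from another loop."""
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return await asyncio.wrap_future(future)


async def fake_serial_read():
    # Stand-in for serial I/O: reports which thread it ran on
    await asyncio.sleep(0.01)
    return threading.current_thread().name


async def main():
    worker = LoopThread()
    worker.start()
    try:
        return await worker.run(fake_serial_read())
    finally:
        worker.stop()


io_thread_name = asyncio.run(main())
main_thread_name = threading.current_thread().name
```

The point of the design is that a busy main loop can no longer delay reads, because the I/O coroutine is serviced by its own loop.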

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

I'm with you. No matter what, the controller should stay up IMHO. It just got me thinking that z2m users don't seem to have the same issues.

I am having a good experience with the dedicated thread. But I still get the seemingly random lock-ups (e.g. at night).

Thanks for the link to bellows, maybe I can get some ideas for my znp PoC.

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

@dumpfheimer Can you replicate the crash with https://github.com/puddly/serialpy?

You'll have to slightly patch zigpy-znp to use it but I'm wondering if pyserial itself may be the cause:

diff --git a/zigpy_znp/uart.py b/zigpy_znp/uart.py
index 5571e60..eb662e1 100644
--- a/zigpy_znp/uart.py
+++ b/zigpy_znp/uart.py
@@ -21,6 +21,10 @@ with warnings.catch_warnings():
     import serial_asyncio  # noqa: E402


+import serialpy as serial
+import serialpy as serial_asyncio
+
+
 LOGGER = logging.getLogger(__name__)

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

I think I have it running. Can I somehow verify that it's using your serialpy impl?

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

If you add serialpy.serial: debug to your HA logging config, you will see lines like this on startup:

2022-08-25 15:38:47.774 ubuntu zigpy_znp.api DEBUG Toggling RTS/DTR pins to skip bootloader or reset chip
2022-08-25 15:38:47.775 ubuntu zigpy_znp.uart DEBUG Setting serial pin states: DTR=False, RTS=False
2022-08-25 15:38:47.775 ubuntu serialpy.serial DEBUG Clearing modem bits: 0x00000002
2022-08-25 15:38:47.775 ubuntu serialpy.serial DEBUG Clearing modem bits: 0x00000004
2022-08-25 15:38:47.926 ubuntu zigpy_znp.uart DEBUG Setting serial pin states: DTR=False, RTS=True
2022-08-25 15:38:47.926 ubuntu serialpy.serial DEBUG Clearing modem bits: 0x00000002
2022-08-25 15:38:47.927 ubuntu serialpy.serial DEBUG Setting modem bits: 0x00000004
2022-08-25 15:38:48.079 ubuntu zigpy_znp.uart DEBUG Setting serial pin states: DTR=False, RTS=False
2022-08-25 15:38:48.079 ubuntu serialpy.serial DEBUG Clearing modem bits: 0x00000002
2022-08-25 15:38:48.079 ubuntu serialpy.serial DEBUG Clearing modem bits: 0x00000004

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Can you make sense of this error?

2022-08-25 22:07:00.143 DEBUG (MainThread) [zigpy_znp.uart] Connecting to /dev/serial/by-id/usb-Texas_Instruments_XDS110__03.00.00.20__Embed_with_CMSIS-DAP_L4300230-if00 at 115200 baud
2022-08-25 22:07:00.147 DEBUG (MainThread) [zigpy_znp.api] Connection to /dev/serial/by-id/usb-Texas_Instruments_XDS110__03.00.00.20__Embed_with_CMSIS-DAP_L4300230-if00 failed, cleaning up
2022-08-25 22:07:00.147 ERROR (MainThread) [zigpy.application] Couldn't start application
Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/application.py", line 106, in startup
    await self.connect()
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 111, in connect
    await znp.connect()
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 688, in connect
    self._uart = await uart.connect(self._config[conf.CONF_DEVICE], self)
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/uart.py", line 186, in connect
    _, protocol = await serial_asyncio.create_serial_connection(
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/__init__.py", line 67, in create_serial_connection
    transport = transport_factory(
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/__init__.py", line 30, in __init__
    self._serial = Serial(
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/serial.py", line 111, in __init__
    self._file = os.fdopen(self._fileno, "rb+")
  File "/usr/lib/python3.10/os.py", line 1029, in fdopen
    return io.open(fd, mode, buffering, encoding, *args, **kwargs)
io.UnsupportedOperation: File or stream is not seekable.
2022-08-25 22:07:00.153 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback ZnpMtProtocol.connection_made(<serialpy.Ser...x7eff2c21ef00>):   File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/application.py", line 126, in new
    await app.startup(auto_form=auto_form)
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/application.py", line 106, in startup
    await self.connect()
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 111, in connect
    await znp.connect()
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 688, in connect
    self._uart = await uart.connect(self._config[conf.CONF_DEVICE], self)
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/uart.py", line 186, in connect
    _, protocol = await serial_asyncio.create_serial_connection(
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/__init__.py", line 67, in create_serial_connection
    transport = transport_factory(
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/__init__.py", line 29, in __init__
    super().__init__(loop, protocol, path, waiter, extra)
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/descriptor_transport.py", line 50, in __init__
    self._loop.call_soon(self._protocol.connection_made, self)
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/uart.py", line 66, in connection_made
    LOGGER.debug("Opened %s serial port", transport.serial.name)
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/__init__.py", line 44, in serial
    return self._serial
AttributeError: 'SerialTransport' object has no attribute '_serial'
2022-08-25 22:07:00.155 ERROR (MainThread) [homeassistant] Error doing job: Exception in callback BaseSelectorEventLoop.add_reader(44, <bound method...7eff2c21ef00>>):   File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/application.py", line 126, in new
    await app.startup(auto_form=auto_form)
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/application.py", line 106, in startup
    await self.connect()
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 111, in connect
    await znp.connect()
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 688, in connect
    self._uart = await uart.connect(self._config[conf.CONF_DEVICE], self)
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/uart.py", line 186, in connect
    _, protocol = await serial_asyncio.create_serial_connection(
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/__init__.py", line 67, in create_serial_connection
    transport = transport_factory(
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/__init__.py", line 29, in __init__
    super().__init__(loop, protocol, path, waiter, extra)
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/descriptor_transport.py", line 51, in __init__
    self._loop.call_soon(self._loop.add_reader, self._fileno, self._read_ready)
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 256, in _add_reader
    key = self._selector.get_key(fd)
  File "/usr/lib/python3.10/selectors.py", line 193, in get_key
    raise KeyError("{!r} is not registered".format(fileobj)) from None
KeyError: '44 is not registered'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 331, in add_reader
    self._add_reader(fd, callback, *args)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 258, in _add_reader
    self._selector.register(fd, selectors.EVENT_READ,
  File "/usr/lib/python3.10/selectors.py", line 360, in register
    self._selector.register(key.fd, poller_events)
OSError: [Errno 9] Bad file descriptor
2022-08-25 22:07:00.178 ERROR (MainThread) [homeassistant.components.zha.core.gateway] Couldn't start ZNP = Texas Instruments Z-Stack ZNP protocol: CC253x, CC26x2, CC13x2 coordinator
Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/components/zha/core/gateway.py", line 170, in async_initialize
    self.application_controller = await app_controller_cls.new(
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/application.py", line 126, in new
    await app.startup(auto_form=auto_form)
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/application.py", line 106, in startup
    await self.connect()
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 111, in connect
    await znp.connect()
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 688, in connect
    self._uart = await uart.connect(self._config[conf.CONF_DEVICE], self)
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/uart.py", line 186, in connect
    _, protocol = await serial_asyncio.create_serial_connection(
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/__init__.py", line 67, in create_serial_connection
    transport = transport_factory(
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/__init__.py", line 30, in __init__
    self._serial = Serial(
  File "/srv/homeassistant/lib/python3.10/site-packages/serialpy/serial.py", line 111, in __init__
    self._file = os.fdopen(self._fileno, "rb+")
  File "/usr/lib/python3.10/os.py", line 1029, in fdopen
    return io.open(fd, mode, buffering, encoding, *args, **kwargs)
io.UnsupportedOperation: File or stream is not seekable.

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Maybe buffering=0? open("/dev/tty", "r+b", buffering=0)

From https://bugs.python.org/issue20074
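
A small way to reproduce this without serial hardware, assuming a POSIX system: a socket pair stands in for the serial device here, since it is also readable, writable and not seekable. open_read_write is a made-up helper, not the serialpy code.

```python
import io
import os
import socket


def open_read_write(fd, unbuffered):
    """Wrap `fd` in a read/write binary file object."""
    if unbuffered:
        # buffering=0 returns a raw FileIO, which does not need to seek
        return os.fdopen(fd, "rb+", buffering=0)
    # Default buffering builds a BufferedRandom, which requires a seekable stream
    return os.fdopen(fd, "rb+")


left, right = socket.socketpair()
fd = left.detach()  # take ownership of the raw descriptor

try:
    # Tried on a duplicate, since a failed open may close the descriptor it was given
    open_read_write(os.dup(fd), unbuffered=False)
    buffered_error = None
except io.UnsupportedOperation as exc:
    buffered_error = str(exc)

port = open_read_write(fd, unbuffered=True)
port.write(b"ping")
received = right.recv(4)
right.sendall(b"pong")
reply = port.read(4)

port.close()
right.close()
```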

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Seems like that did the trick

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

Nice find, thanks!

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Okay, got your latest serialpy (starts fine now, thanks) and flashed the 20220726 firmware again. Also, I disabled my "threaded zigpy-znp stuff" just to be sure.

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Just curious, was there a reason you started your own serial implementation?

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

Pyserial has an entirely synchronous API and pyserial-asyncio as a shim doesn't actually delegate file descriptor reads to the event loop: it still uses select() internally, after the event loop notifies it of new data.
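
What "delegating file descriptor reads to the event loop" can look like, as a minimal sketch: the descriptor is registered with loop.add_reader and read only when the loop reports it readable. This assumes a Unix selector event loop and uses a pipe as a stand-in for a serial port; read_once is a hypothetical helper and not the actual serialpy implementation.

```python
import asyncio
import os


async def read_once(fd, max_bytes=256):
    """Wait until `fd` is readable, then read from it without blocking the loop."""
    loop = asyncio.get_running_loop()
    data = loop.create_future()

    def on_readable():
        # Called by the event loop itself once the descriptor has data
        loop.remove_reader(fd)
        data.set_result(os.read(fd, max_bytes))

    loop.add_reader(fd, on_readable)
    return await data


async def main():
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    try:
        loop = asyncio.get_running_loop()
        # Simulate the device sending some bytes a little later
        loop.call_later(0.01, os.write, write_fd, b"hello")
        return await read_once(read_fd)
    finally:
        os.close(read_fd)
        os.close(write_fd)


frame = asyncio.run(main())
```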

This is more of a test than anything else. Maintaining an entire serial library that is more than just POSIX-compliant is a little out of scope for zigpy but if this alternative implementation doesn't have the same bugs, it may help narrow down where to look.

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Very interesting! And a bit over my head :-D

My controller locked up again over night. These are the last lines:

2022-08-26 03:15:05.741 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.Ping.Req()
2022-08-26 03:15:05.742 DEBUG (MainThread) [homeassistant.components.zha.core.channels.base] [0xE087:1:0x0b04]: failed to get attributes '['active_power', 'rms_current', 'rms_voltage']' on 'electrical_measurement' cluster:
2022-08-26 03:15:05.812 DEBUG (MainThread) [homeassistant.components.zha.core.channels.base] [0xE087:1:0x0702]: failed to get attributes '['current_summ_delivered', 'status']' on 'smartenergy_metering' cluster:
2022-08-26 03:15:10.742 ERROR (MainThread) [zigpy_znp.zigbee.application] Watchdog check failed
Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 984, in request
    response = await response_future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 709, in _watchdog_loop
    await self._znp.request(c.SYS.Ping.Req())
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 982, in request
    async with async_timeout.timeout(self._znp_config[conf.CONF_SREQ_TIMEOUT]):
  File "/srv/homeassistant/lib/python3.10/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/srv/homeassistant/lib/python3.10/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
2022-08-26 03:15:10.743 DEBUG (MainThread) [zigpy_znp.zigbee.application] Connection lost:
2022-08-26 03:15:10.744 DEBUG (MainThread) [zigpy_znp.uart] Closing serial port
2022-08-26 03:15:10.745 DEBUG (MainThread) [zigpy_znp.zigbee.application] Restarting background reconnection task

I will now try again with the dedicated thread in zigpy znp

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

I am not sure if restarting HA a couple of times would make things better or worse, but so far I have had no issues with serialpy + dedicated ZNP thread.

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Still no crash 🙂

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Still no crash, I think you're on to something with the serialpy 👏

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

Was it crashing before, using your serial thread but with pyserial instead?

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Yes, this is the first setup that seems stable with a SimpleLink SDK 6 firmware

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

My controller froze over night 😟 will try to find something in the logs

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

One thing that just came to my mind: I have quite a few smart plugs with energy measurement (TS011F).

When I started using them in December they did not report some attributes on their own, and I think they still don't. I believe someone built a timer into zigpy / quirks that polls them now (if I remember correctly, every 30s).

One thing that MIGHT make a difference is if one of these plugs is unavailable - which was not the case the night before, but was the case tonight.

Maybe route discovery has a memory leak or something similar?
Maybe the timer is messing with IO?

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

This might be a clue! I went to the device page of an unavailable device (that plug), opened the electrical measurement cluster and requested the current power a few dozen times. My controller crashed within a few hours. Until then the log was full of route discovery requests and errors.

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Does someone know if/how the attribute polling could affect the controller/serial loop?

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

Hmmm. Maybe try dropping the request concurrency? It should be 16 for you. Maybe try 4 or 8? https://github.com/zigpy/zigpy-znp#configuration
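
For anyone unfamiliar with what a request concurrency limit does in practice, here is a generic sketch with asyncio.Semaphore. It is only an illustration of the concept, not how zigpy-znp implements its setting; send_all and the sleep standing in for a radio round trip are made up.

```python
import asyncio


async def send_all(count, max_concurrent):
    """Send `count` fake requests, never more than `max_concurrent` at once."""
    limiter = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def send(index):
        nonlocal in_flight, peak
        async with limiter:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.001)  # stand-in for the radio round trip
            in_flight -= 1

    await asyncio.gather(*(send(i) for i in range(count)))
    return peak


peak_with_4 = asyncio.run(send_all(32, 4))
```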

I think we should just try to get a proper debug report from one of the TI dev kits of the state of the microcontroller when it crashes and submit a bug report to Texas Instruments.

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

I'll try that!

How could I do the dump? I have a headless server. Is there an open source tool in the Ubuntu repositories that could do that?

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Is it possible to initiate a route discovery manually?
I would like to try to automatically issue a bunch of route discovery requests to try and see if that might be the issue.

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Correct me if I'm wrong, but should the function/function name not be part of the key?

key = tuple(bound.arguments.items())

If I read the code correctly the key would be the same for both functions using @combine_concurrent_calls:

async def _discover_route(self, nwk: t.NWK) -> None:

async def _get_or_discover_device(self, nwk: t.NWK) -> zigpy.device.Device:

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

tasks is created every time combine_concurrent_calls is called (once for every decorated function) and the returned replacement function is a closure that references that same object:

def decorator(func):
    cache = {}

    def replacement(*args, **kwargs):
        print("Current cache is", cache)
        cache[func.__name__] = True

        return func(*args, **kwargs), cache

    return replacement


@decorator
def func1():
    return 1


@decorator
def func2():
    return 2


if __name__ == "__main__":
    assert func1() == (1, {"func1": True})
    assert func2() == (2, {"func2": True})
    assert func2() == (2, {"func2": True})
    assert func1() == (1, {"func1": True})
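
The same closure behaviour applied to an async "combine concurrent calls" decorator could look roughly like this. It is a simplified sketch and not the zigpy-znp implementation; only the key expression is taken from the thread, and combine_calls, discover and lookup are hypothetical names.

```python
import asyncio
import functools
import inspect


def combine_calls(func):
    """Share one in-flight task among concurrent calls with equal arguments."""
    signature = inspect.signature(func)
    tasks = {}  # created once per decorated function, captured by the closure

    @functools.wraps(func)
    async def replacement(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        key = tuple(bound.arguments.items())

        if key in tasks:
            # Another caller is already running this exact call: just wait for it
            return await tasks[key]

        tasks[key] = asyncio.create_task(func(*args, **kwargs))
        try:
            return await tasks[key]
        finally:
            del tasks[key]

    return replacement


calls = []


@combine_calls
async def discover(nwk):
    calls.append(nwk)
    await asyncio.sleep(0.01)
    return f"route to {nwk:#06x}"


@combine_calls
async def lookup(nwk):
    calls.append(("lookup", nwk))
    await asyncio.sleep(0.01)
    return nwk


async def main():
    return await asyncio.gather(discover(0x1234), discover(0x1234), lookup(0x1234))


results = asyncio.run(main())
```

Both decorated functions produce the same key for the same NWK, but each has its own tasks dict, so discover runs once and lookup still runs separately.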

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Ah, thanks, got it =)

Another question:
I don't know if it's even possible, but IF the task in combine_concurrent_calls returns early (for whatever reason) and the assertion fails...
Would that mean that the task is never removed from the dict?

assert tasks[key].done()
del tasks[key]

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

had no success with excessive route discovery requests btw =/

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

It would but I don't think it's possible to not be awaiting the task and for it to not either have an exception, a result, or be cancelled. Any of the three are "done". You'd see a logged error if that happened, which I've not seen so far.
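
A quick standard-library check of the three "done" states, as a sketch (the coroutine names are made up):

```python
import asyncio


async def returns_value():
    return "ok"


async def raises_error():
    raise RuntimeError("boom")


async def never_finishes():
    await asyncio.sleep(3600)


async def main():
    tasks = {
        "result": asyncio.create_task(returns_value()),
        "exception": asyncio.create_task(raises_error()),
        "cancelled": asyncio.create_task(never_finishes()),
    }
    await asyncio.sleep(0)  # let every task run its first step

    done_while_running = tasks["cancelled"].done()
    tasks["cancelled"].cancel()

    # return_exceptions=True collects the error and the cancellation as values
    await asyncio.gather(*tasks.values(), return_exceptions=True)
    return done_while_running, {name: task.done() for name, task in tasks.items()}


done_while_running, states = asyncio.run(main())
```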

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Thanks!

I'm a bit out of ideas in the meantime. I can only wait for crashes and see if I can find a pattern.

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

I could try to intercept the serial communication and log that..?

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Could the logger be causing a delay?
It's the only thing (IO) left in the thread that is reading from the serial port

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

I removed all logs from the zigpy thread.. I'll see..
Is your serial port also configured with "UART unknown"?

# setserial /dev/serial/by-id/usb-Texas_Instruments_XDS110__03.00.00.20__Embed_with_CMSIS-DAP_L4300230-if03
/dev/serial/by-id/usb-Texas_Instruments_XDS110__03.00.00.20__Embed_with_CMSIS-DAP_L4300230-if03, UART: unknown, Port: 0x0000, IRQ: 0

There seems to be a Low latency setting for serial ports in Linux.. maybe that could help

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

I did not retry this, but I previously set the max concurrent requests to 64 and the coordinator crashed and became unresponsive. Maybe you can reproduce the lock-up this way?

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

does the zigpy backup honor the maximum concurrent requests?
I think I just had a freeze triggered by a backup

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

Backups don't send requests so that limit doesn't apply. Z-Stack also supports only a single in-flight request at a time, which zigpy-znp honors:

zigpy-znp/zigpy_znp/api.py, lines 959 to 961 in 824c2b2:

# We should only be sending one SREQ at a time, according to the spec
async with self._sync_request_lock:
    LOGGER.debug("Sending request: %s", request)
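
The effect of such a lock can be seen in a toy example. FakeRadio is hypothetical and only borrows the lock name from the snippet above; the sleep stands in for waiting on the response.

```python
import asyncio


class FakeRadio:
    """Hypothetical radio that allows only one request in flight at a time."""

    def __init__(self):
        self._sync_request_lock = asyncio.Lock()
        self.log = []

    async def request(self, name):
        async with self._sync_request_lock:
            self.log.append(f"send {name}")
            await asyncio.sleep(0.001)  # stand-in for waiting on the response
            self.log.append(f"recv {name}")
            return name


async def main():
    radio = FakeRadio()
    await asyncio.gather(
        radio.request("backup"),
        radio.request("ping"),
        radio.request("read"),
    )
    return radio.log


log = asyncio.run(main())
```

Every "send" is immediately followed by its own "recv", so callers queue up on the lock instead of interleaving on the wire.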

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Thanks again for your fast reply.
Could it be that in the background there are incoming messages (e.g. attribute push messages) piling up and consuming all the memory?

I think I'll give up for now. I flashed a V5 firmware again. Maybe I can use a spare controller to make a test setup.

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Hey @puddly, I just thought my coordinator had crashed, but it seems like it was a backup that blocked all requests and stalled them. Is the complete backup done in one sequence without time for requests in between?

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

It's done asynchronously, both happen at the same time and whichever one enqueues a request first goes first.

That being said, only a partial backup is taken at runtime after the complete one is taken on radio startup, and only once every 24 hours. It takes 0.02s for one to run for me.

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Is the serial port supposed to be closed just before the backup?

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

No, backups just send commands and receive responses, the serial port is never closed at runtime (unless the watchdog closes it after 30s of unresponsiveness).
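
As a generic illustration of a watchdog of this kind, here is a sketch that pings periodically and reports a lost connection when a ping does not complete in time. It is not the zigpy-znp watchdog; the function names, interval and timeout are all assumptions chosen so the demo finishes quickly.

```python
import asyncio


async def watchdog(ping, on_failure, interval, timeout):
    """Ping every `interval` seconds; report failure if a ping times out."""
    while True:
        await asyncio.sleep(interval)
        try:
            await asyncio.wait_for(ping(), timeout)
        except asyncio.TimeoutError:
            on_failure()
            return


async def main():
    pings = 0
    failures = []

    async def ping():
        nonlocal pings
        pings += 1
        if pings >= 3:
            await asyncio.sleep(3600)  # simulated coordinator lock-up

    await watchdog(
        ping,
        lambda: failures.append("connection lost"),
        interval=0.001,
        timeout=0.01,
    )
    return pings, failures


pings, failures = asyncio.run(main())
```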

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Would a backup be made immediately after a reconnect? I seem to have a "closing serial port" message within a few seconds before every Backup

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

A backup is taken the moment the radio starts up so if the serial port loses connection and zigpy-znp reconnects, a new backup will be taken every time.

I've modified my local setup to take a complete backup over and over in the background, with a 0 second delay between each one. I experience only a tiny delay sending requests but otherwise no noticeable impact so far in the past 10 minutes. This is with the same beta firmware, on the same TI CC1352P dev kit with no flow control enabled.

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Then I need to find out why my device is seemingly randomly disconnecting

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Any idea what could cause this? Last log lines before close (Did not shut down HA)

2022-08-30 20:06:34.467 DEBUG (MainThread) [homeassistant.components.zha.core.gateway] Shutting down ZHA ControllerApplication
2022-08-30 20:06:34.472 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x2462](LCT003): last_seen is 11571.180652618408 seconds ago and ping attempts have been exhausted, marking the device unavailable
2022-08-30 20:06:34.472 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x2462](LCT003): Update device availability -  device available: False - new availability: False - changed: False
2022-08-30 20:06:34.489 DEBUG (MainThread) [zigpy_znp.api] Sending request: SYS.ResetReq.Req(Type=<ResetType.Soft: 1>)
2022-08-30 20:06:34.490 DEBUG (MainThread) [zigpy_znp.api] Request has no response, not waiting for one.
2022-08-30 20:06:34.491 DEBUG (MainThread) [zigpy_znp.uart] Closing serial port

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Aah, it's the "Update configuration" button in the UI (Integration page).
Is this to be expected?

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Is the pyserial .write method thread/concurrency safe?

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

I now seem to have an extremely well performing network in comparison to before with:

  • the CC1352P7
  • firmware changes
  • setting this line to NONE rather than SUPPRESS_ROUTE_DISC_NETWORK
  • 1M baud instead of 115200
  • zigpy znp dedicated thread

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

Just for completeness, here are the definitions from zigpy, z2m and z-stack:

zigpy:

    SUPPRESS_ROUTE_DISC_NETWORK = 0x20 		# dec 32
    SKIP_ROUTING = 0x80				# dec 128

z2m

DISCV_ROUTE: 32,
SKIP_ROUTING: 128

Z-Stack Stack/af/af.h

#define AF_SUPRESS_ROUTE_DISC_NETWORK      0x20   // Supress Route Discovery for intermediate routes
                                                  // (route discovery preformed for initiating device)
#define AF_SKIP_ROUTING                    0x80	#dec 128

It seems like the search does not find any usages of the option.
Could of course be in another project, though.

It seems to me from the comment in af.h that AF_SUPRESS_ROUTE_DISC_NETWORK should be used during joining only?

from zigpy-znp.

puddly avatar puddly commented on August 17, 2024

> It seems to me from the comment in af.h that AF_SUPRESS_ROUTE_DISC_NETWORK should be used during joining only?

The only documentation is that single comment and from what I recall, these flags are processed by the closed-source portions of Z-Stack. My understanding is that it disables unnecessary unicast route discovery requests, since Z-Stack will be doing its own route discovery broadcasts.

There are discussions about the different approaches to routing and their use cases within the Z-Stack developer guide: Z-Stack 3.0 Developer's Guide.pdf

from zigpy-znp.

dumpfheimer avatar dumpfheimer commented on August 17, 2024

from Stack/af/af.c:

  if ( options & AF_SUPRESS_ROUTE_DISC_NETWORK )
  {
    req.discoverRoute = DISC_ROUTE_INITIATE;
  }
  else
  {
    req.discoverRoute = AF_DataRequestDiscoverRoute;
  }

from Stack/nwk/nl_mede.h:

// Route Discovery Options
#define DISC_ROUTE_NONE     0x00  // Don't discover route
#define DISC_ROUTE_NETWORK  0x01  // If a route is needed, the device (also
                                  // intermediate router) will issue  a route
                                  // disc request.
#define DISC_ROUTE_INITIATE 0x04  // Only the source router initiates route req.

Also:
AF_DataRequestDiscoverRoute seems to always be DISC_ROUTE_NETWORK

So, I would read it this way:
If the flag is SET: Only the source router initiates route req.
If the flag is NOT SET: If a route is needed, the device (also intermediate router) will issue a route disc request.
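
The same reading written out as a small Python sketch. The constant values are the ones quoted above from af.h and nl_mede.h; treating AF_DataRequestDiscoverRoute as DISC_ROUTE_NETWORK is only the observation from this comment, and discover_route_mode is a made-up helper name.

```python
# Values quoted in this thread (identifier spelling as in af.h)
AF_SUPRESS_ROUTE_DISC_NETWORK = 0x20
AF_SKIP_ROUTING = 0x80

DISC_ROUTE_NONE = 0x00
DISC_ROUTE_NETWORK = 0x01
DISC_ROUTE_INITIATE = 0x04

# Assumption: "seems to always be DISC_ROUTE_NETWORK"
AF_DATA_REQUEST_DISCOVER_ROUTE = DISC_ROUTE_NETWORK


def discover_route_mode(options: int) -> int:
    """Mirror the if/else from af.c for a given TX options bitmask."""
    if options & AF_SUPRESS_ROUTE_DISC_NETWORK:
        # Flag set: only the source router initiates the route request
        return DISC_ROUTE_INITIATE
    # Flag not set: intermediate routers may also issue route discovery
    return AF_DATA_REQUEST_DISCOVER_ROUTE


mode_with_flag = discover_route_mode(AF_SUPRESS_ROUTE_DISC_NETWORK)
mode_without_flag = discover_route_mode(0x00)
```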

Not sure what to do with this information, though 😂

from zigpy-znp.
