
Comments (31)

puddly commented on August 17, 2024

Retrying when encountering BUFFER_FULL (and similar errors) has been implemented in 031c5cb, which should reduce the number of failed requests.

If you want to test these changes out, you can install the latest commit from the dev branch by following the instructions in the README: https://github.com/zigpy/zigpy-znp#home-assistant

from zigpy-znp.

puddly commented on August 17, 2024

@deisi you can clone your network onto the second stick with an NVRAM backup/restore: https://github.com/zha-ng/zigpy-znp#nvram-backup-and-restore

Afterwards make sure to clear the first stick and not run them concurrently (to migrate back you'll need to perform the same procedure but with the paths swapped):

$ python -m zigpy_znp.tools.nvram_reset /dev/serial/by-id/old-radio

puddly commented on August 17, 2024
  • BUFFER_FULL means that the CC2652R's transmit buffer is full and your request will not be sent.
  • Lights being triggered but HA's state not being updated usually means that it takes more than 10-15 seconds for the light to send back a response (for whatever reason), which exceeds zigpy's internal timeout so the request is assumed to have failed.
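To illustrate the second point, here is a sketch of how a reply that arrives after the deadline is reported as a failure, even though the light did receive the command. The function names are hypothetical, and the 10 s `REQUEST_TIMEOUT` is an assumed stand-in for zigpy's internal timeout, not its actual value.

```python
import asyncio

# Hypothetical value; zigpy's internal timeout is roughly 10-15 s per the comment above.
REQUEST_TIMEOUT = 10.0


async def request(reply: asyncio.Future, timeout: float = REQUEST_TIMEOUT) -> str:
    """Wait for a device's reply, treating a late reply as a failed request."""
    try:
        return await asyncio.wait_for(reply, timeout)
    except asyncio.TimeoutError:
        # The light may still act on the command; only its answer arrived too late,
        # so HA's state is not updated.
        return "failed"


def _answer(reply: asyncio.Future) -> None:
    # The request may already have timed out, which cancels the future.
    if not reply.done():
        reply.set_result("ok")


async def simulate(reply_delay: float, timeout: float) -> str:
    """Simulate a light that answers after `reply_delay` seconds."""
    loop = asyncio.get_running_loop()
    reply = loop.create_future()
    loop.call_later(reply_delay, _answer, reply)
    return await request(reply, timeout)


if __name__ == "__main__":
    print(asyncio.run(simulate(reply_delay=0.05, timeout=1.0)))  # ok
    print(asyncio.run(simulate(reply_delay=1.0, timeout=0.05)))  # failed
```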

Are you triggering a lot of individual lights simultaneously? If you enable debug logging for ZHA (it'll be quite verbose), do you see any warnings like Received an unhandled command? If you post the full debug log that contains both the buffer error and HA not updating the light state (after stripping out any sensitive info), it'd be quite useful.

As a potential stopgap solution, you may want to try to decrease the request concurrency from 16 to 8:

zha:
  zigpy_config:
    znp_config:
      # default is "auto", which is 16 for the CC2652R
      max_concurrent_requests: 8
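What capping concurrency does can be sketched with an `asyncio.Semaphore`: no more than `max_concurrent_requests` requests are in flight at once, so fewer frames pile up in the coordinator's transmit buffer. This models the idea under that assumption and is not zigpy-znp's implementation; `send_all` and `peak_in_flight` are hypothetical helpers.

```python
import asyncio

# Stand-in for max_concurrent_requests (8 in the config above).
MAX_CONCURRENT_REQUESTS = 8


async def send_all(addresses, send, limit=MAX_CONCURRENT_REQUESTS):
    """Send one request per device with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def send_limited(addr):
        async with semaphore:  # waits while `limit` requests are outstanding
            return await send(addr)

    return await asyncio.gather(*(send_limited(a) for a in addresses))


async def peak_in_flight(n_lights=12, limit=MAX_CONCURRENT_REQUESTS):
    """Toggle `n_lights` fake lights and report the most requests ever in flight."""
    in_flight = peak = 0

    async def fake_send(addr):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # pretend the radio takes a while to confirm
        in_flight -= 1
        return addr

    await send_all(range(n_lights), fake_send, limit)
    return peak


if __name__ == "__main__":
    print(asyncio.run(peak_in_flight(12, 8)))  # 8
    print(asyncio.run(peak_in_flight(12, 4)))  # 4
```

The trade-off is latency: with 12 lights and a limit of 8, the last 4 requests wait for earlier ones to be confirmed instead of competing for buffer space.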

deisi commented on August 17, 2024

So I have 12 lamps, and they are all part of the same lamp group. It is this one group that I am turning on or off. I tried collecting a debug log with the mentioned error, but I have had no success so far. However, even without the error showing up in the logs, I was seeing individual lamps not getting updated.

I found some unhandled commands, though not very many:

2020-10-22 14:28:12 WARNING (MainThread) [zigpy_znp.api] Received an unhandled command: AF.DataConfirm.Callback(Status=<Status.NWK_NO_ROUTE: 205>, Endpoint=1, TSN=95),

deisi commented on August 17, 2024

log.txt is the full log, in which I turned the lights on, changed the brightness, and turned them off. In particular, some lights were not updated correctly during the turning-off step at 14:28:49.

I really tried to trigger the error again with debugging on, but for some reason it's not happening any more.

puddly commented on August 17, 2024

Looking through your log, all of the errors are being caused by the following two lights (0xA4F0 and 0x2B90):

2020-10-22 14:28:51 DEBUG (MainThread) [zigpy_znp.api] Sending request: AF.DataRequestExt.Req(DstAddrModeAddress=AddrModeAddress(mode=<AddrMode.NWK: 2>, address=0x2B90), DstEndpoint=3, DstPanId=0x0000, SrcEndpoint=1, ClusterId=8, TSN=136, Options=<TransmitOptions.RouteDiscovery|APSAck: 48>, Radius=30, Data=b'\x00\x88\x00\x00\x00'),
2020-10-22 14:28:51 DEBUG (MainThread) [zigpy_znp.api] Received command: AF.DataConfirm.Callback(Status=<Status.MAC_NO_ACK: 233>, Endpoint=1, TSN=136),

2020-10-22 14:28:54 DEBUG (MainThread) [zigpy_znp.api] Sending request: AF.DataRequestExt.Req(DstAddrModeAddress=AddrModeAddress(mode=<AddrMode.NWK: 2>, address=0xA4F0), DstEndpoint=3, DstPanId=0x0000, SrcEndpoint=1, ClusterId=8, TSN=149, Options=<TransmitOptions.RouteDiscovery|APSAck: 48>, Radius=30, Data=b'\x00\x95\x00\x00\x00'),
2020-10-22 14:28:55 DEBUG (MainThread) [zigpy_znp.api] Received command: AF.DataConfirm.Callback(Status=<Status.MAC_NO_ACK: 233>, Endpoint=1, TSN=149),

2020-10-22 14:28:12 DEBUG (MainThread) [zigpy_znp.api] Received command: AF.DataConfirm.Callback(Status=<Status.NWK_NO_ROUTE: 205>, Endpoint=1, TSN=98),
2020-10-22 14:28:05 DEBUG (MainThread) [zigpy_znp.api] Sending request: AF.DataRequestExt.Req(DstAddrModeAddress=AddrModeAddress(mode=<AddrMode.NWK: 2>, address=0x2B90), DstEndpoint=3, DstPanId=0x0000, SrcEndpoint=1, ClusterId=8, TSN=98, Options=<TransmitOptions.RouteDiscovery|APSAck: 48>, Radius=30, Data=b'\x00\x62\x00\x00\x00'),

2020-10-22 14:28:57 DEBUG (MainThread) [zigpy_znp.api] Received command: AF.DataConfirm.Callback(Status=<Status.NWK_NO_ROUTE: 205>, Endpoint=1, TSN=118),
2020-10-22 14:28:49 DEBUG (MainThread) [zigpy_znp.api] Sending request: AF.DataRequestExt.Req(DstAddrModeAddress=AddrModeAddress(mode=<AddrMode.NWK: 2>, address=0xA4F0), DstEndpoint=3, DstPanId=0x0000, SrcEndpoint=1, ClusterId=6, TSN=118, Options=<TransmitOptions.RouteDiscovery|APSAck: 48>, Radius=30, Data=b'\x00\x76\x00\x00\x00'),
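Failures like these are traced to a device by pairing each AF.DataRequestExt.Req with its AF.DataConfirm.Callback via the TSN. The sketch below does that with two regular expressions written against the lines quoted above. It assumes successful confirms are logged as `Status.SUCCESS`, and `failed_sends` is a hypothetical helper, not part of zigpy-znp.

```python
import re

# Patterns for the two zigpy_znp.api DEBUG lines quoted above.
REQUEST_RE = re.compile(r"AF\.DataRequestExt\.Req\(.*?address=(0x[0-9A-F]{4}).*?TSN=(\d+)")
CONFIRM_RE = re.compile(r"AF\.DataConfirm\.Callback\(Status=<Status\.(\w+): \d+>.*?TSN=(\d+)")


def failed_sends(log_lines):
    """Group non-SUCCESS confirm statuses by the NWK address of the matching request.

    Requests and confirms are paired by TSN. TSNs wrap around after 255, so this
    is only trustworthy on a short excerpt like the one above.
    """
    tsn_to_addr = {}
    confirms = []
    for line in log_lines:
        if m := REQUEST_RE.search(line):
            tsn_to_addr[int(m.group(2))] = m.group(1)
        elif m := CONFIRM_RE.search(line):
            confirms.append((int(m.group(2)), m.group(1)))

    failures = {}
    # Pair up after reading everything: an excerpt may list a confirm before its request.
    for tsn, status in confirms:
        if status != "SUCCESS":
            addr = tsn_to_addr.get(tsn, "unknown")
            failures.setdefault(addr, []).append(status)
    return failures


# Usage, assuming a Home Assistant log with ZHA debug logging enabled:
# with open("home-assistant.log") as f:
#     print(failed_sends(f))
```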

Here is the average link quality of the received packets from each of your devices (LQI increases linearly with the RSSI):

Node      LQIs        Mean LQI
0x036A    0           0
0x126F    96, 99      97
0x161A    54, 57      55
0x2B90    0, 3        1
0x3676    51, 54      52
0x6752    51          51
0x7437    12, 15      13
0x80BB    33          33
0xF052    147, 144    145
0xC2C0    6, 9        7
0xD0DF    12, 9       10
0xFB18    24, 27      25
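The Mean LQI column can be reproduced with an integer mean rounded down. The sketch below also flags weak links using a hypothetical threshold of 20, chosen only because it singles out the same five nodes as the next paragraph; it is not a standard Zigbee or Z-Stack value.

```python
# Per-node LQI samples, copied from the table above.
SAMPLES = {
    "0x036A": [0],
    "0x126F": [96, 99],
    "0x161A": [54, 57],
    "0x2B90": [0, 3],
    "0x3676": [51, 54],
    "0x6752": [51],
    "0x7437": [12, 15],
    "0x80BB": [33],
    "0xF052": [147, 144],
    "0xC2C0": [6, 9],
    "0xD0DF": [12, 9],
    "0xFB18": [24, 27],
}

# Hypothetical cutoff for a "very weak" link; not a Zigbee or Z-Stack constant.
WEAK_LQI = 20


def mean_lqi(samples):
    """Integer mean, rounded down (this reproduces the Mean LQI column)."""
    return sum(samples) // len(samples)


def weak_nodes(table, threshold=WEAK_LQI):
    """Nodes whose mean LQI falls below `threshold`, sorted by address."""
    return sorted(node for node, s in table.items() if mean_lqi(s) < threshold)


if __name__ == "__main__":
    for node, s in SAMPLES.items():
        print(node, mean_lqi(s))
    print(weak_nodes(SAMPLES))  # ['0x036A', '0x2B90', '0x7437', '0xC2C0', '0xD0DF']
```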

0x2B90, 0x036A, 0x7437, 0xC2C0, and 0xD0DF have very weak signals and do not seem to be routing through closer bulbs (i.e. 0xF052 and 0x126F).

0xA4F0 is not on this list because it cannot directly communicate with the coordinator and has to send packets through other devices, like 0x036A, 0x2B90, and 0xF052. As you can see, your link quality is extremely low. Your devices are not routing packets effectively for some reason.

Have you tried updating the firmware of your bulbs? If they're one of the supported brands, it'll happen automatically over-the-air after you enable the correct settings and notify the devices: https://old.reddit.com/r/homeassistant/comments/fak430/how_to_update_your_ikea_or_ledevance_firmware/

deisi commented on August 17, 2024

Looking through your log, all of the errors are being caused by the following two lights (0xA4F0 and 0x2B90)

Yes, but I think this is only true for the particular event I sent you. The lights are all relatively close: the furthest one is maybe 6 m from the Zigbee stick and there is no solid wall in between. It's all in one big room, lights and Zigbee stick. The stick is also on a 1 m USB extension cable and there is not much 2.4 GHz Wi-Fi interference. The lights that don't report correctly vary: sometimes it's this one, sometimes that one. However, there are a few bulbs that always work, so silicon lottery could be part of it.

Have you tried updating the firmware of your bulbs?

They're Lightify bulbs. I can check whether new firmware is available, but I have updated them in the past. However, I think Lightify bulbs are of lower quality.

Could you point me to the tool you used to get the LQI values?

What makes me doubt that it is all about link quality is that manual calls to the on_off property of the OnOff cluster update the light states 100% of the time.

deisi commented on August 17, 2024

I managed to catch the BUFFER_FULL error while debugging was running.
The full log file is 46 MB, so I don't think I should post it here. I've copied a generous selection around the error, though; tell me if you need more.

ha.log

MattWestb commented on August 17, 2024

The easiest way to monitor link quality and the network is to install zha-map and ZigZag.
It looks nice, shows you which links in the network are good and which are weak, and is nicely integrated into HA as a panel (recommended mode).

PS: Take a look with a Wi-Fi scanner app to see whether nearby Wi-Fi networks are interfering, and change the Wi-Fi / Zigbee channel if needed.

deisi commented on August 17, 2024

@MattWestb

THX, I took two images of my network. Is each lamp really a sibling of every other lamp? I always thought Zigbee tries to build a star topology and only routes through other devices if needed. Everybody connecting to everybody doesn't seem like a good idea to me. At least that is how I understood it: https://www.zigbee2mqtt.io/information/zigbee_network.html#zigbee-network

Screenshot from 2020-10-23 12-39-59

Screenshot from 2020-10-23 12-40-15

MattWestb commented on August 17, 2024

To me it looks like the mesh doesn't like talking to other routers and only uses the NCP for communication :-(((

My test network, with Xiaomi sensors and IKEA outlets as base routers (preparing for Christmas lights), builds parent links between all possible routers and uses 2 hops to reach the last router (Vorzimmer).
ZigZag01

One thing: I'm using an EZSP (IKEA module) as the NCP at the moment, and I have the CC2531 deactivated because it's too weak. The latest quirk with the EZSP is that it doesn't like direct children (end devices) and kicks them off the NCP, so they connect through routers. I like that, but it's a bug in the firmware.

One more thing: I had 2 "HOMA" dimmers / LED drivers ("Chinese ZB3" = old Zigbee PRO), but I moved them to deCONZ because they were very bad routers and only made things worse (no parents and only red lines for LQ). They are based on TI's CC2530 and have bad antennas / RF parts.

You can also try installing the zha-network-visualization-card. It shows more info, such as LQ and device info, but it's too large for the screen and not so easy to install.

MattWestb commented on August 17, 2024

You have at least 3 router devices acting as parents to the NCP, but all the others are only siblings (children) connected as end devices, I think (not certain).
Are the 3 parent devices real Zigbee 3 or Zigbee PRO (new HA or LL) devices, and are the others older HA / LL devices (old Zigbee, not PRO) working in backward-compatibility mode (routers connecting as end devices)?

All my IKEA outlets are real Zigbee 3, and the bulbs are LL but a newer version (Zigbee PRO), so they connect as parents.

I'm interested in what @Adminiuga thinks of this scenario.

Edit: One of my IKEA bulbs (Opal 1000lm = new LL) has no parent link to the NCP but does have children, so it is a router in the mesh.

Edit 2: My 2 Philips SML001 (BW Wohnzimmer X) are connected to LL routers, so perhaps they don't like IKEA's ZB3 routers.

deisi commented on August 17, 2024

Is LQI a reliable number? Given that all the devices supposedly have a bad connection, I'd rather think it's the CC2652 that is causing the issue.

Adminiuga commented on August 17, 2024

In zha-map, the LQI is reported as seen by the devices. The interpretation of LQI is often up to the vendor, but the same vendor usually reports consistently across its devices.

MattWestb commented on August 17, 2024

I think (without knowing for sure) it's normally relevant, but it depends on the chip manufacturer and how it's implemented in the device. TI has a "normal" version, but the CC253X (like my HOMAs and most OSRAM devices) is known for low LQI and, in real life, not working so well. Silabs (EZSP) uses a non-standard method, and ZHA recalculates it for presentation. I think EZSP values are normally too high, but it usually works well. I have my IKEA GW 20 cm from my Wi-Fi router and it works well and reaches devices at a long distance.

deisi commented on August 17, 2024

Just to check, I also added a Philips LWB004 light bulb and positioned it around 1 m away from the stick. In the beginning it showed an LQI of around 30, but now it's at 141, so maybe it really is the bulbs themselves.

MattWestb commented on August 17, 2024

It's interesting to see that the other routers are attaching to it and getting a better LQI through it than from the coordinator.
Routers update their status over time, and end devices do too, but most end devices don't change their parent easily and keep using the one they're connected to until they're re-paired (not 100% true).

puddly commented on August 17, 2024

What CC2652R coordinator hardware are you using?

deisi commented on August 17, 2024

https://electrolama.com/projects/zig-a-zig-ah/

puddly commented on August 17, 2024

If lights in the same room do not have a high (>100) LQI, there is a small possibility that you have a defective ZZH stick (I'm unsure of the specifics but I think there was a bad batch?). You may want to get in touch with @omerk via email to verify if this is the case.

deisi commented on August 17, 2024

I have a second one :) so I'll give that one a try first. But my statement that the furthest one is 6 m away was an underestimate: looking more closely, it's more like 12 m, but all within one room.

deisi commented on August 17, 2024

ah thx

puddly commented on August 17, 2024

It'll take a bit for the network to stabilize (I believe because the existing routes are not preserved) so I'd power cycle the lights afterwards or just let it sit for a bit before testing. Otherwise you'll get a few routing errors.

deisi commented on August 17, 2024

@puddly, but have you seen that I got a log file, with debugging on, of the buffer overflow error that started this whole thread?

Of course I'm very grateful for all your help with my connection issues, but in the end this is not a support forum, so I hope the log helps with debugging the bug.

puddly commented on August 17, 2024

The transmit buffer getting full in the CC2652R's firmware isn't really a bug with zigpy-znp, it's expected behavior if you trigger a lot of lights concurrently and they don't respond fast enough due to TX issues for the buffer to clear out completed requests. For comparison, I'm able to rapidly toggle close to 30 lights individually on my network once a second and do not receive a single error.

Have you tried decreasing the request concurrency?

zha:
  zigpy_config:
    znp_config:
      # default is "auto", which is 16 for the CC2652R
      max_concurrent_requests: 8  # maybe even try 4?

Retrying requests will essentially have the same effect but I think I can include an internal retry for the BUFFER_FULL status (I believe zigbee-herdsman does this?) since this isn't really a Zigbee error.

deisi commented on August 17, 2024

Ah, so the two issues are indeed connected. I have not tested your suggestion yet, as I was (a) trying to reproduce the error and (b) trying out the other suggestions given here first, but I have not forgotten about it.

In a desperate attempt to improve link quality, I flashed a CC2531 with the router firmware and added it to the network. If I position it very close to the coordinator it has an LQ of > 120; if I move it to the other end of the room, roughly 10 m away, it has an LQ of 3.

So maybe we have very dense air in the room here ^^. Is LQ sensitive to interference?

Ah, and swapping in the other zig-a-zig-ah stick didn't change much.

puddly commented on August 17, 2024

You can read about how the LQI is calculated here: http://software-dl.ti.com/simplelink/esd/plugins/simplelink_zigbee_sdk_plugin/1.60.00.14/docs/zigbee_user_guide/html/zigbee/developing_zigbee_applications/z_stack_developers_guide/z-stack-overview.html#id10. It's the RSSI remapped to 0-255, where 0 is the minimum and 255 is the maximum observed value by the radio.
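A sketch of that remapping, assuming a plain linear scale between two RSSI limits. The limits of -90 and -20 dBm are hypothetical and picked only for illustration; the TI guide linked above is the authority on the actual calculation.

```python
# Hypothetical RSSI bounds in dBm; the real limits depend on the radio and firmware.
RSSI_MIN_DBM = -90
RSSI_MAX_DBM = -20


def rssi_to_lqi(rssi_dbm, rssi_min=RSSI_MIN_DBM, rssi_max=RSSI_MAX_DBM):
    """Linearly remap an RSSI reading onto 0-255, clamping readings outside the range."""
    clamped = min(max(rssi_dbm, rssi_min), rssi_max)
    return round((clamped - rssi_min) * 255 / (rssi_max - rssi_min))


if __name__ == "__main__":
    for rssi in (-95, -90, -76, -20):
        print(rssi, rssi_to_lqi(rssi))
```

Because LQI is just a rescaled RSSI, readings of 3 and 120 from the same room are not two different metrics: the stick is seeing a much weaker signal from the distant device.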

If you got both zzh sticks at the same time it could be that they're from the same batch?

Using the LAUNCHXL-CC26X2R1 with a trace antenna (whose LQI values are about 30% less on average than the zzh's when I compared the two) I get an LQI of 36 between the coordinator and an outdoor bulb about 15m away, where signals have to pass through an interior wall, closet doors, an outdoor wall, and a metal light fixture. If you're getting an LQI of 3 in the same room, I think something is going on with your hardware.

dumpfheimer commented on August 17, 2024

Hi!
I believe I might have the same problem.

TL;DR: at the end of post

Here some general information:
I am currently running ~80 devices on a ConBee II + Phoscon, pretty much flawlessly. Those devices are ~40 lights and ~40 sensors. The lights are mainly TRADFRI, but there are also some Hue bulbs and a lightstrip. The sensors are mainly door/window sensors and smoke sensors. There are a few remotes (Tradfri/Hue) too.

Besides that, I have a zig-a-zig-ah! stick lying around (because I just had to buy one).

I started using Home Assistant about a year ago. Back then I had a Tradfri gateway and a Philips Hue gateway, which I merged onto a CC2531 + z2m (because sometimes lights did not respond). It worked fine at first, but the larger the network got, the more frames were lost. I then tried ZHA instead of z2m but found the same behaviour. I then purchased a ConBee II and retried. Still, I had issues with frames being dropped and lights being reported falsely. I wrote to deCONZ support and they urged me to switch to Phoscon, which I did. I must say I was quite surprised how well things went. I seldom have connectivity issues, so everything is pretty much fine.

And now comes yesterday.

I am a fan of "keeping stuff simple", which in this case meant getting rid of a (theoretically) unnecessary container and piece of software: replacing Phoscon with ZHA. A report posted at jcallaghan/home-assistant-config#167 gave me hope that everything would work out fine this time. For the sake of an easy rollback, I used the zzh! stick and added the ZHA integration. After a few failed attempts (it looks like it had remembered the network I set up months ago), I was able to restart with a fresh network (Phoscon on channel 15, zzh! on channel 25). I kicked 6 lamps and 2 remotes in the office off the old network and joined them to the new one. Unfortunately, I quickly saw the same behaviour I had seen earlier: lights often did not respond, and states were often not reported. I tried 3 different antennas (from small to huge) but was not able to make things better. I also saw the BUFFER_FULL messages and tried out your development branch. It did in fact fix the BUFFER_FULL message, but the lights behaved the same. To rule out faulty hardware, I then dumped all of it and reconfigured ZHA with my ConBee II. The lights rejoined and popped up in HA, but the experience was sadly the same: lights were not being set and states were not being reported correctly.

So finally, I had to give up and go back to Phoscon again.

Ok, sorry for the long post.

I hope to help you guys with your problem, I believe we might have the same issue.
My life's Zigbee history is probably a bit much for here, but if you have an idea where I could post it to get help and get things sorted out, please let me know.

TL;DR
The dev branch seems to fix the BUFFER_FULL issue, but it did not change the lights' behaviour or the false states in HA.

And thanks for all work you are putting into this.

puddly commented on August 17, 2024

@dumpfheimer Thanks for the info and for testing out the latest codebase.

Can you enable debug logging and upload your complete home-assistant.log file (with your PII redacted, if you wish)? The log file size doesn't really matter too much so let HA run for an hour and make this happen a few times, it'll really help clarify what the underlying issue is.

dumpfheimer commented on August 17, 2024

Hi!

I just sent you an email with the logs from my last experiment.

I think there is a tweak, that might be escalating this behaviour:
When a group of lights is toggled in HA, it seems like it only sends the on/off command to those lights which it believes to be in the opposite state.
What I mean:

Make a group of lights A, B and C.
They are all off, and shown as off, too. Then toggle the group. Let's say all 3 lights actually turn on, but the state of C is reported falsely and HA shows it as off. Now toggle the group again. It seems like the "off" command is only sent to A and B, because HA thinks C is off anyway. This causes C to stay on, even though it could have received the message correctly.
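The scenario can be modelled in a few lines. This sketch assumes the group toggle consults a cached per-light state and that every command sent is delivered; it is not Home Assistant's actual group logic, and both function names are hypothetical.

```python
# Hypothetical model of the behaviour described above, not Home Assistant's code.
# Assumes every command that is sent is actually delivered.


def toggle_off_using_cache(actual_on, believed_on):
    """Send "off" only to lights the cache believes are on; return lights left on."""
    for name, on in believed_on.items():
        if on:
            actual_on[name] = False  # the light receives the command
            believed_on[name] = False
    return sorted(name for name, on in actual_on.items() if on)


def toggle_off_all(actual_on, believed_on):
    """Send "off" to every group member regardless of the cached state."""
    for name in actual_on:
        actual_on[name] = False
        believed_on[name] = False
    return sorted(name for name, on in actual_on.items() if on)


if __name__ == "__main__":
    # All three lights are really on, but C's state report was lost.
    actual = {"A": True, "B": True, "C": True}
    believed = {"A": True, "B": True, "C": False}
    print(toggle_off_using_cache(dict(actual), dict(believed)))  # ['C']
    print(toggle_off_all(dict(actual), dict(believed)))  # []
```

Skipping the cache check costs one extra frame per member that is already off, but it stops a single lost state report from being carried into the next command.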

Is there a possibility to disable this "internal state check"?

BR
CK

PS: I hope you can make sense of the log; unfortunately I lack the time to reproduce everything at the moment.

puddly commented on August 17, 2024

I believe this issue has been fixed or significantly mitigated in the last few releases.
