🌐 AI搜索 & 代理 主页
Skip to content

Conversation

@puddly
Copy link
Collaborator

@puddly puddly commented Aug 3, 2025

#1633 is a more "proper" solution but I think this one is more robust.

Essentially, the coordinator and the device maintain separate sequence numbers for outgoing requests. If these happen to overlap at some point, it's possible that a device sends an attribute report with the same sequence number as a response to a request.

To avoid this, we avoid getting too close to the device when generating new TSNs while avoiding our own pending requests.

@puddly
Copy link
Collaborator Author

puddly commented Aug 3, 2025

The new function will be much slower but it's still reasonably fast. This is an average case comparison, with the RX TSN randomly varying within the range:

In [17]: %timeit device.get_sequence()
626 ns ± 4.37 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [19]: %timeit device.old_get_sequence()
41 ns ± 2.47 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Caching pending TSNs will only speed things up by 50ns.

@puddly puddly marked this pull request as ready for review August 3, 2025 16:47
Copilot AI review requested due to automatic review settings August 3, 2025 16:47
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR prevents sequence number collisions between the coordinator and devices by implementing a rotation threshold mechanism. When generating new transaction sequence numbers (TSNs), the coordinator now avoids numbers that are too close to recently received sequence numbers from devices.

  • Replaces simple sequential TSN generation with a collision-aware algorithm
  • Tracks the last received sequence number from each device
  • Implements a 20-number threshold to maintain distance between coordinator and device sequence numbers

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
zigpy/device.py Implements new TSN generation logic with collision avoidance and device sequence tracking
tests/test_zdo.py Updates test expectations to reflect new TSN generation starting at threshold value
tests/test_device.py Adjusts duplicate request tests for new TSN collision behavior and adds device sequence setup

@MattWestb
Copy link
Contributor

I have looking little and Zigbee is using sequence counters in all 4 layers: 15.2, Zigbee Network, Zigbee Application and ZCL.
I think you is looking only on ZCL layer but i think it can being important also looking at the application layer for some scenario then some times you can using both for pinpointing the unique transaction.
One example for compare: IKEA Dirigera is Zigbee PRO then have all updated firmware so if one Rodret dimmer is sending one ON its only to the coordinator and the coordinator is doing one broadcast of the ON with different ZCL Seq (no group or device bindings, all in the coordinator).
In ZHA Rodret dimmer with group binding is sending 2 ON one to 0x000 (with EP) and one to the group (without EP) and both is with the same ZCL Seq but different Seq in Zigbee application layer (unicat all "jumps" is heaving all the same and all the braodcast is having one other).

That is making me thinking you must do the same for group Seq so they is not colliding with unicast Seq but is depends if you have locked down to the device / endpoint / cluster or is fully open then the sender is the same of the command (you knowing me and reading cod. . .).

I think you is on one very good approach that can solve some strange problems in some systems if getting colliding Seq and messing things up !!

2 interesting reflations:
IKEA Inspelning is one real network spammer then reporting the power attributes (i like it then its not as some Matter devices that reporting every 5 minutes) but if having more of them its little messy then sniffing and if sending commands to them its sooner or later using the same Seq for command sent and one attribute reporting = can getting problems in the system.

And IKEA Dirigera with all updated (Ziogbe 3) devices is one pure Zigbe PRO of lat version (likely 2023) then broadcasts is not being repeated then all routers is using passive broadcast ack (the broadcasting router is listening that all nigglers is repeating OK and not rebroadcast 3 times as old Zogbee is doing).

@puddly
Copy link
Collaborator Author

puddly commented Aug 5, 2025

Thank you for the research!

That is making me thinking you must do the same for group Seq so they is not colliding with unicast Seq but is depends if you have locked down to the device / endpoint / cluster or is fully open then the sender is the same of the command (you knowing me and reading cod. . .).

Right now we only filter duplicate packets in a 10s window if they have the same source, destination (including address mode), endpoints, cluster, and profile. We ignore the APS sequence number. I think this is working and would not be affected by the issue you mentioned above where a device is reusing a TSN.

I haven't really looked too much at APS TSNs, to be honest, and I think we should look at how we do retrying: all retries of a request should use the same APS and ZCL sequence numbers, I believe.

@MattWestb
Copy link
Contributor

By the layer as long doing ZCL all shall being OK only using TSN for that.
But we also do things in the app layer (Zigbee app layer that is under ZCL and abow Zigbee network layer) outside ZCL like manufacture cluster and "extended" commands. If we handling them the same was ZCL is doing it shall working OK. If not must implanting it for there function and the same for this functions (i think implanting touch link is being all outside but i can "living its own life").
It was some time ago i was diving in it but is one very deep rabbit hole but also interesting !!

Great work !!!!!

@codecov
Copy link

codecov bot commented Oct 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.29%. Comparing base (a78dd9f) to head (6e3fcf3).

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1636      +/-   ##
==========================================
- Coverage   99.31%   99.29%   -0.02%     
==========================================
  Files          63       63              
  Lines       12184    12198      +14     
==========================================
+ Hits        12100    12112      +12     
- Misses         84       86       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants