09-30-2021 02:28 PM
I have 3 Ruckus APs (720, 710, 510) connected to a switch, which is connected to a router/DHCP server/DNS server.
When connecting a client device to the WiFi network, there is a short period of time (1 to 2 seconds) immediately after the WiFi connects where the client is unable to reach the network.
The issue only happens immediately after connecting - after those 1-2 seconds, the WiFi on all my devices works great, is rock solid, and blazing fast. And - it only happens on 5G, or at least, this delay is much longer on 5G than 2G. It happens on all 3 APs.
Why does a delay of a few measly seconds matter? It matters because Android uses neighbor unreachability detection (NUD) to determine whether it's connected to a usable network. It does so by sending ARP requests to the default gateway. If NUD finds the gateway unreachable (i.e. no responses to these ARP requests arrive), Android terminates the WiFi connection. Sometimes, NUD triggers before this weird problem fixes itself.
As a result, my Android phone sometimes takes 2, 3, or 4 attempts before it successfully connects to my WiFi. This is particularly annoying during roaming, as the WiFi drops when walking around the house.
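To illustrate why the failure is only intermittent, here is a minimal sketch of the race between the post-connect blackout and a NUD-style probe sequence. The probe count and interval are made-up illustrative values, not Android's actual parameters, and `nud_would_fail` is a hypothetical helper name.

```python
def nud_would_fail(blackout_s, probe_interval_s=0.75, max_probes=3):
    """Return True if every probe is sent while traffic is still being dropped.

    blackout_s: how long after association replies fail to reach the client.
    probe_interval_s, max_probes: ILLUSTRATIVE values, not Android's real ones.
    """
    # Assumption: the first probe goes out at t=0, right after association.
    probe_times = [i * probe_interval_s for i in range(max_probes)]
    # A probe sent during the blackout never gets an answer.
    return all(t < blackout_s for t in probe_times)


for blackout in (1.0, 2.0):
    verdict = "disconnect" if nud_would_fail(blackout) else "stays connected"
    print(f"blackout {blackout:.1f}s -> {verdict}")
```

With these assumed numbers, a 1-second blackout survives (the third probe lands after it ends) and a 2-second blackout does not, which matches the "sometimes it takes several attempts" pattern.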
Is anyone else observing this behavior? (The issue might also be on the client side, or with the switch, or with the router - but the fact that it is limited to 5G makes me very, very suspicious of the AP.) EDIT: Yep, the culprit is the AP/Unleashed. I'm really peeved to find bugs like that in a networking device with a $1200 MSRP.
Thanks,
- Dave.
01-22-2022 06:51 PM
I spotted this little gem in the "resolved issues" section of the release notes for 200.12.10.5.234:
Issue ER-10340 The UE loses network connectivity when roaming between radios in the same AP, in WPA2 WLAN
So I hit "upgrade" and held my breath...
And, sure enough, it looks like the issue is fixed, or at least the behavior has massively improved compared to before!
Someone at Ruckus listened - THANK YOU!!!
10-05-2021 10:21 AM
I have not seen this behavior; however, it is worth checking why it is happening.
Could you run the client troubleshooting utility and track a client connection while reproducing the problem? Check exactly where this delay is happening.
Taking a packet capture would also help here.
10-05-2021 12:26 PM
Thanks Syamantak!
I've noticed 2 odd things in the client debugging tool:
1. The AP sends bogus "inactivity timeout" disconnects.
In the screenshot below, I connected to AP A (on 2.4G, at 11:26:06), then roamed to AP B (on 5G, 11:27:50). 16 seconds after roaming to AP B, and merely 2 minutes after initially connecting to the network, the device gets booted due to inactivity. My inactivity timeout is 500 minutes, and "force DHCP" is disabled, so this most certainly looks like a bug in Unleashed.
2. There are multiple DHCP requests from the client on 5G.
After initially connecting to the WiFi, the client sends 3-4 DHCP requests. Per the screenshot below, the AP sees DHCP ACKs, but the ACKs get dropped either on the client, or between AP and client. This only happens on 5G; on 2.4G the DHCP request isn't repeated. So this is consistent with what I initially observed with ARP requests.
10-05-2021 12:35 PM
@david_fuchs if we don't see any delay in the WiFi connection (EAPOL), then DHCP could be the culprit.
Please take a capture on the AP's eth interface and on the DHCP server.
Compare the send and receive delays between the DHCP server and the AP.
As for the inactivity timeout, it can also trigger if the client is not responding to block ack frames.
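One way to do that comparison, sketched in Python: pair each DHCP REQUEST with its ACK by transaction ID in each capture, then compare the two request-to-ACK times. Because each delay is computed within a single capture, a clock offset between the two capture points cancels out. The function name, the tuple layout, and the sample values are all assumptions for illustration, not data from this thread.

```python
def request_to_ack_delays(events):
    """Map each DHCP transaction ID to seconds from first REQUEST to first ACK.

    events: iterable of (timestamp_s, xid, msg_type) tuples, e.g. typed in by
    hand from a capture. msg_type is "REQUEST" or "ACK".
    """
    first_request = {}
    delays = {}
    for ts, xid, msg_type in sorted(events):
        if msg_type == "REQUEST":
            first_request.setdefault(xid, ts)
        elif msg_type == "ACK" and xid in first_request and xid not in delays:
            delays[xid] = ts - first_request[xid]
    return delays


# Hypothetical sample data, NOT taken from the captures in this thread.
server_capture = [(10.000, 0x1A2B, "REQUEST"), (10.002, 0x1A2B, "ACK")]
ap_capture = [(9.999, 0x1A2B, "REQUEST"), (10.004, 0x1A2B, "ACK")]

server = request_to_ack_delays(server_capture)
ap = request_to_ack_delays(ap_capture)
for xid in sorted(server.keys() & ap.keys()):
    extra_ms = (ap[xid] - server[xid]) * 1000
    print(f"xid {xid:#06x}: server {server[xid] * 1000:.1f} ms, "
          f"AP {ap[xid] * 1000:.1f} ms, path adds {extra_ms:.1f} ms")
```

If the AP-side delay is only slightly larger than the server-side delay, the wired path and the DHCP server are responding promptly, and the remaining suspects are the AP's radio side and the client.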
10-05-2021 01:27 PM
The DHCP server is not the culprit.
It's a bit hard to see as the forum mangles and downscales my screenshots a bit aggressively, but here's the relevant exchange, as seen by the AP:
Note that the client sends 3 separate requests - at 11:19:15, 11:19:16, and 11:19:18 - and the DHCP server immediately responds to all 3. The AP sees these DHCP ACKs - the client does not. This indicates that they get lost, either on the client or between AP and client. And it's not just DHCP - ARP behaves the same way.
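As a quick sanity check on the three request times quoted above, this small Python sketch computes the gaps between them. The helper name `gaps_seconds` is just for illustration.

```python
from datetime import datetime

# Request times as read from the AP-side view described above.
request_times = ["11:19:15", "11:19:16", "11:19:18"]


def gaps_seconds(times, fmt="%H:%M:%S"):
    """Return the gap in seconds between each consecutive pair of timestamps."""
    parsed = [datetime.strptime(t, fmt) for t in times]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


print(gaps_seconds(request_times))  # [1.0, 2.0]
```

The gaps widen from 1 to 2 seconds, which would be consistent with a client retrying because it never saw the ACK, though with only three data points that reading is an assumption.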
So, the bug clearly lies with either the client or the AP. To narrow it down, I tested with a different client (a Lenovo X1 with an Intel WiFi chipset, running Linux). And guess what? I'm seeing the exact same behavior. So unless there is a client bug that affects both Intel and Qualcomm, across different operating systems, and only on 5G... this is a bug in Ruckus access points.
So I guess the next question is - is there some particular setting that triggers this bug? Any settings worth trying to work around it?