Looking for a little help or some ideas here.
Setup:
Just deployed a ZD3025 (FW: 9.12.1.0 build 148) with 16 APs: a mix of R300s, T300s, and T301s.
4 of 5 VLANs currently active
4 SSIDs currently being broadcast campus-wide, each with its own VLAN/subnet for segregation
All WLANs are set to use WPA2-AES-PSK
NETGEAR smart switches to facilitate the VLANs (M4100s, GS110TPs, GS510TPs, and a few others)
Linux ISC DHCPd server running on CentOS 7 minimal, using NetworkManager for the VNIC/VLAN to service all subnets
All VLANs trunk to the DHCP server, and it services all requests in a timely manner without issue, with the exception of the problem I'm posting about now.
I've been getting reports that some users are failing to get an IP address from the server, but it's sporadic from what I've been able to determine.
I turned on "Disconnect if fails to get DHCP" with a 30-second timeout, and started getting the following message in the ZD3000 monitoring logs:
User[] disassociated from WLAN[] at AP[] due to force DHCP timeout. User IP [0.0.0.0], VLAN[3], DHCP-assigned-IP [0.0.0.0], DHCP lease time [0].
It happens on renewals and initial requests. (Renewals obviously list the current User IP)
I've checked the syslog (messages) on the CentOS box and see plenty of DHCPDISCOVER and DHCPOFFER entries, but fewer DHCPREQUEST and DHCPACK entries than I'd expect. Many are still logged, though, and in general clients are getting IPs on all VLANs.
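A rough sketch of how that log comparison could be scripted per client, assuming the stock ISC dhcpd syslog line format (for example `DHCPOFFER on <ip> to <mac> via <iface>`). The function names `tally`, `totals`, and `stalled` are made up for illustration, and the regex may need adjusting if the log format differs:

```python
import re
from collections import Counter, defaultdict

# Assumed ISC dhcpd syslog format: a message keyword followed, somewhere
# later on the same line, by the client MAC address.
LINE_RE = re.compile(
    r"(DHCPDISCOVER|DHCPOFFER|DHCPREQUEST|DHCPACK|DHCPNAK)\b.*?"
    r"\b((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b",
    re.IGNORECASE,
)


def tally(lines):
    """Count DHCP message types per client MAC from an iterable of log lines."""
    per_mac = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            msg_type, mac = match.group(1).upper(), match.group(2).lower()
            per_mac[mac][msg_type] += 1
    return per_mac


def totals(per_mac):
    """Overall count of each message type across all clients."""
    return sum(per_mac.values(), Counter())


def stalled(per_mac):
    """MACs that were sent at least one OFFER but never received an ACK."""
    return sorted(
        mac for mac, counts in per_mac.items()
        if counts["DHCPOFFER"] and not counts["DHCPACK"]
    )
```

Feeding it the open log file, e.g. `stalled(tally(open("/var/log/messages", errors="replace")))`, would list the clients whose exchange stops after the offer, which can then be matched against the MACs in the ZoneDirector timeout events.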
I've also noticed that my SonicWall is seeing "malformed or unhandled" UDP packets on ports 67/68 on the 'X' interfaces that service the corresponding VLANs/subnets. I have yet to set up Wireshark or another packet capture to find out what's actually malformed or broken in the packets, but as far as I can tell there's nothing in the path that should be breaking them. My best guess so far is the proprietary conversion Ruckus does at the AP level between broadcast/multicast/unicast to cut down on traffic.
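Once a capture exists, the UDP payloads on ports 67/68 could be sanity-checked with something like the sketch below. It only checks the basics from the BOOTP/DHCP layout in RFC 2131/2132 (236-byte fixed header, magic cookie, option list ending in option 255) and reports the broadcast flag, since that flag decides whether the server's reply goes out as broadcast or unicast. `inspect_dhcp` is a hypothetical helper name, and it says nothing about what the SonicWall itself considers malformed:

```python
import struct

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options
MSG_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
             5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}


def inspect_dhcp(payload):
    """Return (info, problems) for a raw BOOTP/DHCP UDP payload."""
    info, problems = {}, []
    if len(payload) < 240:
        return info, ["shorter than fixed header plus magic cookie (240 bytes)"]

    # First 12 bytes: op, htype, hlen, hops, xid, secs, flags.
    op, _htype, hlen, _hops, xid, _secs, flags = struct.unpack(
        "!BBBBIHH", payload[:12])
    info["xid"] = xid
    info["broadcast_flag"] = bool(flags & 0x8000)
    # chaddr starts at offset 28 and is 16 bytes wide; hlen says how many count.
    mac = payload[28:28 + min(hlen, 16)]
    info["client_mac"] = ":".join(f"{b:02x}" for b in mac)

    if op not in (1, 2):
        problems.append(f"unexpected op {op} (expected 1 or 2)")
    if payload[236:240] != MAGIC_COOKIE:
        problems.append("missing DHCP magic cookie")
        return info, problems

    # Walk the options: code, length, value. 0 is padding, 255 ends the list.
    i, saw_end = 240, False
    while i < len(payload):
        code = payload[i]
        if code == 0:
            i += 1
            continue
        if code == 255:
            saw_end = True
            break
        if i + 1 >= len(payload):
            problems.append(f"option {code} has no length byte")
            break
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if len(value) < length:
            problems.append(f"option {code} is truncated")
            break
        if code == 53 and length == 1:
            info["message_type"] = MSG_TYPES.get(value[0], f"unknown({value[0]})")
        i += 2 + length

    if not saw_end:
        problems.append("no END option (255)")
    if "message_type" not in info:
        problems.append("no message type option (53)")
    return info, problems
```

Running the same check on packets captured on the wired side and on the Ruckus side would show whether anything in the payload actually differs between the two paths.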
The issue doesn't seem to happen on any wired clients, or on the Ubiquiti UniFi APs that are being phased out. It's only being reported by people connecting in the areas serviced by the Ruckus APs, which, granted, are currently the heavier-traffic areas with the most users.
The only real change I've made to the AP settings is to set the group override for the R300s to a max of 220 users (110 per radio, using WPA2-AES, per Ruckus documentation).
So the bulk of users get DHCP just fine and function just fine, but a small number of users with varied devices fail to get DHCP via the Ruckus APs. It doesn't make a whole lot of sense to me why most would work while some, seemingly at random, don't. Once they get an IP address, they also function fine.
I haven't set up any kind of DHCP relay, since the DHCP server listens on and services each VLAN/subnet directly.
It doesn't appear to be related to a weak signal problem either.
Any help or ideas would be appreciated. Thanks!