Re: Malformed DHCP sporadically coming from client...

Anonymous · 03-22-2017

Looking for a little help or some ideas here.

Setup:
Just deployed a ZD3025 (FW: 9.12.1.0 build 148) with 16 APs; a mix of R300s, T300s, and T301s.
4 of 5 VLANs currently active
4 SSIDs currently being broadcast campus-wide, each with its own VLAN/subnet for segregation
All WLANs are set to use WPA2-AES-PSK
NetGear smart switches to facilitate the VLANs (M4100s, GS110TPs, GS510TPs, a few others)
Linux ISC DHCPd server running on CentOS 7 minimal, using NetworkManager for the VNIC/VLAN to service all subnets
All VLANs trunk to the DHCP server, and the DHCP server services all requests in a timely manner without issue, with the exception of the problem I'm posting about now.

I've been getting reports that some users are failing to get an IP address from the server, but it's sporadic from what I've been able to determine.

I turned on "Disconnect if fails to get DHCP" with a 30-second timeout, and started getting the following message in the ZD3000 monitoring logs:
User[] disassociated from WLAN[] at AP[] due to force DHCP timeout. User IP [0.0.0.0], VLAN[3], DHCP-assigned-IP [0.0.0.0], DHCP lease time [0].

It happens on renewals and initial requests. (Renewals obviously list the current User IP)

I've checked the syslog (messages) in CentOS and see plenty of DHCPDISCOVER and DHCPOFFER entries, but fewer DHCPREQUEST and DHCPACK messages than expected. Many are still logged, though, and clients in general are getting IPs on all VLANs.
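To put numbers on the DISCOVER/OFFER vs. REQUEST/ACK gap, something like the following could tally the dhcpd log lines per client MAC and list the clients that were offered a lease but never completed the exchange. This is only a rough sketch: it assumes the usual ISC dhcpd log wording ("DHCPDISCOVER from <mac> via <if>", "DHCPOFFER on <ip> to <mac> via <if>", and so on), and the sample lines, addresses, and interface name are made up for illustration.

```python
import re
from collections import defaultdict

# Message type plus client MAC, as they usually appear in ISC dhcpd log lines:
#   DHCPDISCOVER from aa:bb:cc:dd:ee:01 via eth0.3
#   DHCPOFFER on 10.0.3.50 to aa:bb:cc:dd:ee:01 via eth0.3
#   DHCPREQUEST for 10.0.3.50 (10.0.3.1) from aa:bb:cc:dd:ee:01 via eth0.3
#   DHCPACK on 10.0.3.50 to aa:bb:cc:dd:ee:01 via eth0.3
LINE_RE = re.compile(
    r"\b(DHCPDISCOVER|DHCPOFFER|DHCPREQUEST|DHCPACK|DHCPNAK)\b"
    r".*?\b(?:from|to)\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b",
    re.IGNORECASE,
)


def tally_by_mac(lines):
    """Return {mac: {message_type: count}} for every DHCP log line found."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group(2).lower()][match.group(1).upper()] += 1
    return counts


def stalled_clients(counts):
    """MACs that were sent an OFFER but never logged a REQUEST or an ACK."""
    return sorted(
        mac
        for mac, c in counts.items()
        if c["DHCPOFFER"] > 0 and c["DHCPREQUEST"] == 0 and c["DHCPACK"] == 0
    )


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, feed in the lines of the syslog file.
    sample = [
        "dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:01 via eth0.3",
        "dhcpd: DHCPOFFER on 10.0.3.50 to aa:bb:cc:dd:ee:01 via eth0.3",
        "dhcpd: DHCPREQUEST for 10.0.3.50 (10.0.3.1) from aa:bb:cc:dd:ee:01 via eth0.3",
        "dhcpd: DHCPACK on 10.0.3.50 to aa:bb:cc:dd:ee:01 via eth0.3",
        "dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:02 via eth0.3",
        "dhcpd: DHCPOFFER on 10.0.3.51 to aa:bb:cc:dd:ee:02 via eth0.3",
    ]
    print(stalled_clients(tally_by_mac(sample)))
```

If the stalled MACs turn out to be the same clients that show up in the ZoneDirector timeout events, that would suggest the OFFER is leaving the server but the client's REQUEST never makes it back.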

I've also noted that my SonicWall is seeing "malformed or unhandled" UDP packets on ports 67/68 on the 'X' interfaces that service the corresponding VLAN/subnet. I have yet to set up Wireshark or another packet capture to find out what's actually malformed or broken in the packets, but there's really nothing that should be breaking them. My best guess so far is the proprietary thing Ruckus does at the AP level, converting between broadcast/multicast/unicast to cut down on traffic.
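For reference, here is roughly what "malformed" can mean for a DHCP packet once a capture is available. This is a minimal sketch that checks a raw UDP payload against the basic BOOTP/DHCP layout from RFC 2131 (236-byte fixed header, magic cookie, then type-length-value options closed by an end option). It is not what the SonicWall actually checks, which I don't know; the function names and the sample packet are hypothetical.

```python
import struct
from typing import List

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"  # RFC 2131 magic cookie
BOOTP_FIXED_LEN = 236                    # op through file, before the cookie


def check_dhcp_payload(payload: bytes) -> List[str]:
    """Return a list of structural problems (empty list = looks sane)."""
    if len(payload) < BOOTP_FIXED_LEN + 4:
        return [f"too short: {len(payload)} bytes"]

    problems: List[str] = []
    op, _htype, hlen, _hops = struct.unpack_from("!BBBB", payload, 0)
    if op not in (1, 2):  # 1 = BOOTREQUEST, 2 = BOOTREPLY
        problems.append(f"unexpected op {op}")
    if hlen > 16:         # chaddr field is only 16 bytes wide
        problems.append(f"hardware address length {hlen} too large")
    if payload[BOOTP_FIXED_LEN:BOOTP_FIXED_LEN + 4] != DHCP_MAGIC_COOKIE:
        problems.append("missing magic cookie")
        return problems

    # Walk the options: code, length, value ... terminated by code 255.
    i = BOOTP_FIXED_LEN + 4
    msg_type = None
    saw_end = False
    truncated = False
    while i < len(payload):
        code = payload[i]
        if code == 0:        # pad option, single byte
            i += 1
            continue
        if code == 255:      # end option
            saw_end = True
            break
        if i + 1 >= len(payload) or i + 2 + payload[i + 1] > len(payload):
            problems.append(f"option {code} runs past end of packet")
            truncated = True
            break
        length = payload[i + 1]
        if code == 53 and length == 1:  # DHCP message type
            msg_type = payload[i + 2]
        i += 2 + length

    if not saw_end and not truncated:
        problems.append("no end option (255)")
    if msg_type is None:
        problems.append("no DHCP message type (option 53)")
    return problems


def sample_discover() -> bytes:
    """Build a minimal, made-up DHCPDISCOVER payload for demonstration."""
    fixed = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1, 1, 6, 0,                     # op, htype (Ethernet), hlen, hops
        0x12345678, 0, 0x8000,          # xid, secs, flags (broadcast bit)
        b"", b"", b"", b"",             # ciaddr, yiaddr, siaddr, giaddr
        bytes.fromhex("aabbccddee01"),  # chaddr (null-padded to 16 bytes)
        b"", b"",                       # sname, file
    )
    return fixed + DHCP_MAGIC_COOKIE + bytes([53, 1, 1, 255])


if __name__ == "__main__":
    packet = sample_discover()
    print(check_dhcp_payload(packet))        # well-formed packet
    print(check_dhcp_payload(packet[:-1]))   # end option chopped off
```

Running the payloads from a capture through a check like this would at least show whether the flagged packets are structurally broken or are valid DHCP that the firewall simply doesn't handle.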

The issue doesn't seem to happen on any wired clients, or on the Ubiquiti UniFi APs that are being phased out. It's only being reported by people connecting in the areas serviced by the Ruckus APs, which, granted, are currently the heavier-traffic areas with the most users.

The only real change I've made to the AP settings is to set the override in the group for the R300s to a max of 220 users (110 per radio, using WPA2-AES, per Ruckus documentation).

So the bulk of users get DHCP just fine and function just fine, but a small number of users with varied devices fail to get DHCP via the Ruckus APs. It doesn't make a whole lot of sense to me why most would work but some don't, seemingly at random. Once they get an IP address, they also function fine.

I haven't set up any kind of DHCP relay, since the DHCP server listens on and services each VLAN/subnet directly.

It doesn't appear to be related to a weak signal problem either.

Any help or ideas would be appreciated. Thanks!

Anonymous · 03-23-2017

Hi

See the posts below as a starting point. My hunch is that the directedDHCP feature is causing the issue.

https://danielkuchenski.wordpress.com/2014/09/22/ruckus-dhcp-filteringmanipulation/

https://forums.ruckuswireless.com/ruckuswireless/topics/devices-behind-wireless-bridge-do-not-gettin...

The posts may not be 100% relevant, but the solution should help in your case.

Best of luck

Anonymous · 03-23-2017

Thank you for the reply. I ended up trying that yesterday and it doesn't seem to have helped at all.

ruckus(debug)# remote_ap_cli -A "set qos directedDHCP disable"
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command 'rkscli -c "set qos directedDHCP disable "' executed at
Direcet DHCP is Disabled
OK
---- Command Execution Summary:
         success: 16
         failure: 0
           total: 16

Still getting failed DHCP on random clients going through the Ruckus APs, all models.

Just as information, I have no storm control currently set up, and the switches see DHCP traffic passing, as I turned on "DHCP Relay" (basically the IP helper for it) to sniff the traffic. Obviously it's not actually relaying anything, just seeing the packets go through. I think I really need to get Wireshark set up for this one, maybe PRTG with MIBs, to try and find out what the heck is going on.

Anonymous · 08-17-2017

Hi guys,

Today we are facing the same issue on our network. Please suggest how to resolve it.

michael_brado · 08-17-2017

Ruckus' Directed-Multicast feature that you wondered about, which converts traffic for the first 5 IGMP group members from UDP to TCP, would have no effect on DHCP.

The overall volume of DHCP requests the server needs to handle across all VLANs is where my mind is drifting, and it would almost take Wireshark to definitively determine who is or isn't sending or answering DHCP requests/renewals.

Do all of your APs typically have over 100 clients on them? Can you tell if the problem reports come from APs that might have more clients than other APs?
Are the intermediate switches able to switch the Ethernet traffic fast enough, or is there any buffering or overflow? How about the DHCP server switch port?
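
One way to see whether the failures cluster on particular APs or VLANs would be to tally the "force DHCP timeout" events from the ZoneDirector log. A rough sketch follows, assuming the events look like the line quoted earlier in this thread with the bracketed fields filled in; the AP names, MACs, and SSIDs in the sample are made up.

```python
import re
from collections import Counter

# Modeled on the event quoted earlier in the thread:
#   User[] disassociated from WLAN[] at AP[] due to force DHCP timeout.
#   User IP [0.0.0.0], VLAN[3], DHCP-assigned-IP [0.0.0.0], DHCP lease time [0].
EVENT_RE = re.compile(
    r"User\[(?P<user>[^\]]*)\] disassociated from WLAN\[(?P<wlan>[^\]]*)\] "
    r"at AP\[(?P<ap>[^\]]*)\] due to force DHCP timeout\. "
    r"User IP \[(?P<ip>[^\]]*)\], VLAN\[(?P<vlan>[^\]]*)\]"
)


def tally_timeouts(lines, field="ap"):
    """Count force-DHCP-timeout events grouped by one field (ap, wlan, vlan, user, ip)."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(field) or "(blank)"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical events; real ones would come from the exported ZD log.
    tail = "DHCP-assigned-IP [0.0.0.0], DHCP lease time [0]."
    events = [
        "User[aa:bb:cc:dd:ee:01] disassociated from WLAN[Staff] at AP[AP-Lobby] "
        "due to force DHCP timeout. User IP [0.0.0.0], VLAN[3], " + tail,
        "User[aa:bb:cc:dd:ee:02] disassociated from WLAN[Guest] at AP[AP-Lobby] "
        "due to force DHCP timeout. User IP [0.0.0.0], VLAN[4], " + tail,
        "User[aa:bb:cc:dd:ee:03] disassociated from WLAN[Staff] at AP[AP-Gym] "
        "due to force DHCP timeout. User IP [0.0.0.0], VLAN[3], " + tail,
    ]
    for ap, count in tally_timeouts(events).most_common():
        print(ap, count)
```

Comparing those per-AP counts against the typical client count on each AP would help answer whether the busier APs account for most of the reports.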

Malformed DHCP sporadically coming from clients connected via Ruckus AP, REQUEST/ACK