Afternoon team, really struggling and need some solid advice.
We are randomly experiencing near enough complete network outages because of end user equipment. In most cases it’s a copier or printer. It’s a long story but I’ll keep it short.
The network is as below:
Third Floor (core switch)
ISP Internet Router (gateway device for all 30 VLANs)
Stack 1: 5 x ICX 7150 PoE, single 10 Gb link to the first floor and ground floor stack
6 x ICX 7150, linked using a 1 Gb daisy chain. Switch 6 has a 10 Gb link to the third floor stack.
Stack 2: 5 x ICX 7150 PoE. Switch 5 has a 10 Gb link to the third floor stack.
Each client device port is set up as an admin edge port with BPDU enabled.
RSTP is configured for all VLANs.
The Internet router is on stack one, port 1/1/1. There are 30 access points.
This is a multi-dwelling office for shared clients. Each is a small office with a maximum of 30 devices, comprising wireless and floor ports. It is secured using VLANs, with DPSK for wireless.
We do not have control of the floor ports, and we cannot offer any MAC address registration.
Randomly, the network grinds to a halt, and we can see this is caused by flooding of multicast packets. For example, last night a device (an HP printer) sent billions of packets, which in turn brought the LAN down. After removing the printer, the network returned to normal after around 20 minutes.
When we ran the command “de”, we could see the na_learn counter jumping up by hundreds.
Show int e x/x/x shows 1549600 multicast packets, and the count is constantly growing.
So to sum up, a HP MFC printer in one VLAN brought every vlan down!
What can we do to protect against this or minimise the chance of it occurring again? Please note we have no control over what these offices can do. We also cannot disable any LAN ports. I’m sure there must be something we can enable on these ICX switches? Please shout if you need any more info.
Cheers in advance! But I’d love to know what others do to stop multicasting doing this?
I have a couple of questions for you. By any chance, do you see high CPU while you have the multicast issue? If NA_LEARN is growing, have you checked whether you have a duplicated MAC address? When did you start to have this issue? Have there been any recent changes in the network, such as a new device added or an upgrade on any of the involved switches?
I think that in your particular scenario, it would be better to open a support case with us so we can troubleshoot and find the root cause.
A little more info: we went on site Sunday at 02:00 to carry out tests and confirmed that the printer is sending billions of requests that cause the entire switch system to grind to a halt. During the outage, with the printer's network cable plugged into the network, we can see the LEDs on all the switches flickering at exactly the same time. We can also confirm that random switches are showing high CPU. We have also checked with Ruckus support that there is no duplicated MAC address.
When we remove the network cable from the printer, the network settles down after around 15 minutes. We set up a port mirror and, using Wireshark, we saw the billions of multicast packets being sent (capture attached).
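For anyone wanting to confirm the offending device from a capture like this, here is a rough Python sketch of the idea. It assumes the capture has already been exported to a list of (source MAC, destination MAC) pairs; the function names and the sample frames are made up for illustration and are not from our actual capture. It relies only on the fact that a multicast (or broadcast) Ethernet destination has the lowest bit of its first octet set.

```python
from collections import Counter

def is_multicast(mac: str) -> bool:
    """Return True if the group bit (lowest bit of the first octet) is set.

    Note: the broadcast address ff:ff:ff:ff:ff:ff also has this bit set.
    """
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)

def top_multicast_talkers(frames, n=3):
    """Count multicast frames per source MAC.

    frames: iterable of (src_mac, dst_mac) pairs exported from a capture.
    Returns the n busiest sources as (src_mac, frame_count) tuples.
    """
    counts = Counter(src for src, dst in frames if is_multicast(dst))
    return counts.most_common(n)

# Hypothetical sample data, not taken from the real capture.
sample_frames = [
    ("00:11:22:33:44:55", "01:00:5e:00:00:fb"),  # multicast destination
    ("00:11:22:33:44:55", "01:00:5e:00:00:fb"),  # multicast destination
    ("00:aa:bb:cc:dd:ee", "00:11:22:33:44:55"),  # unicast, ignored
    ("00:aa:bb:cc:dd:ee", "01:00:5e:7f:ff:fa"),  # multicast destination
]

for src, count in top_multicast_talkers(sample_frames):
    print(src, count)
```

A source MAC that dominates this list by a wide margin is the one to chase, which is what we saw with the printer.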
We have also confirmed that enabling IGMP snooping on the VLAN where the printer is connected resolves the issue. Now we need to know how and why it resolves it.
What I'm thinking could be the issue is that the printer is probably using the Bonjour protocol. Bonjour uses Multicast DNS (mDNS) to perform DNS-like operations on the local link in the absence of any conventional unicast DNS server. In my experience, when we have devices that use mDNS, we see a lot of multicast issues.

This could also explain why IGMP snooping fixes the issue. As you know, IGMP snooping provides multicast containment by forwarding traffic only to the ports that have IGMP receivers for a specific multicast group (destination address). A device maintains the IGMP group membership information by processing the IGMP report and leave messages, so traffic is forwarded only to ports that have sent IGMP reports. Without snooping, the switch floods multicast to every port in the VLAN.

In summary, the printer was probably sending a lot of mDNS packets, and IGMP snooping helped contain them. You can verify this with the dm raw commands: when the switch is experiencing high CPU, the multicast traffic should be seen hitting the CPU as well. However, this is only an educated guess based on the symptoms and my personal experience. To provide an accurate root cause, you need to open a support case with us.
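To illustrate the containment idea, here is a minimal Python sketch of how a snooping switch might choose egress ports. It is a toy model under stated assumptions, not how the ICX implements it: the class `SnoopingVlan`, its methods, the port numbers, and the group 239.1.1.1 are all hypothetical, and it ignores router ports, queriers, timers, and any special handling a real switch may apply to reserved link-local groups. The `group_mac` helper shows the standard mapping of an IPv4 multicast group to its Ethernet address (01:00:5e plus the low 23 bits of the group).

```python
import ipaddress

def group_mac(group_ip: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(group_ip)
    if not addr.is_multicast:
        raise ValueError(f"{group_ip} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF  # only the low 23 bits are carried over
    return "01:00:5e:%02x:%02x:%02x" % (
        (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF)

class SnoopingVlan:
    """Toy model of one VLAN with or without IGMP snooping (hypothetical)."""

    def __init__(self, ports, snooping):
        self.ports = set(ports)
        self.snooping = snooping
        self.members = {}  # group address -> set of ports that sent a report

    def report(self, port, group):
        """Record an IGMP membership report seen on a port."""
        self.members.setdefault(group, set()).add(port)

    def leave(self, port, group):
        """Record an IGMP leave seen on a port."""
        self.members.get(group, set()).discard(port)

    def egress_ports(self, ingress, group):
        """Ports a multicast frame for `group` is sent out of."""
        if not self.snooping:
            # No snooping: multicast is flooded like broadcast.
            return self.ports - {ingress}
        # Snooping: only ports with a known receiver for this group.
        return self.members.get(group, set()) - {ingress}

group = "239.1.1.1"  # hypothetical group used only for this example
flooding = SnoopingVlan(ports=range(1, 9), snooping=False)
snooped = SnoopingVlan(ports=range(1, 9), snooping=True)
snooped.report(5, group)  # one receiver, on port 5

print(group_mac("224.0.0.251"))                  # MAC for the mDNS group
print(sorted(flooding.egress_ports(1, group)))   # every port except ingress
print(sorted(snooped.egress_ports(1, group)))    # only the receiver's port
```

In this model, a talker on port 1 reaches seven ports without snooping and only one port with it, which is the containment effect described above.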
Ayleth Alvarez Sr Technical Support Engineer | L2 TAC Wired