09-08-2023 03:38 PM
Last night we experienced several symptoms on our three-member ICX7450 stack:
Everything pointed to high CPU usage.
We failed to get a proper serial interface in place, so we power cycled unit #1 and then took the opportunity to upgrade from 09.0.10c to 09.0.10f. Unfortunately, a few hours later we lost SSH and SNMP-based graphing and saw ICMP ping failures, though we haven't seen any BGP sessions bouncing.
Via the serial interface we could see that CPU 0 was high, and these are the processes that were high:
2564 root -20 1 678.8m 381.2m 8.0 19.1 48:00.38 R `- bcmINTR
2703 root 26 6 678.8m 381.2m 36.0 19.1 44:34.67 S `- bcmCNTR.0
2742 root 21 1 678.8m 381.2m 12.0 19.1 41:20.12 R `- bcmRX
2807 root -18 6 678.8m 381.2m 28.6 19.1 131:01.63 R `- ZMQbg/1
2988 root 26 6 678.8m 381.2m 16.0 19.1 89:58.03 S `- os_pkt_intx_tx
Best guess is that something is hitting the CPU pretty hard.
Is there a way to see what packets are hitting the CPU?
09-11-2023 01:26 AM - edited 09-11-2023 01:41 AM
Hi frnkblk,
Thank you for posting your query!
I understand that you were experiencing multiple issues on your ICX7450 stack that point towards high CPU, and that you would like to know what packets are hitting the CPU.
Below are the commands that show what packets are hitting the CPU.
- dm raw mode brief
- dm raw filter none
- dm raw max 10
- dm raw
However, I would suggest running the above under TAC supervision during a session.
Moving forward, if this issue is not resolved, please log a ticket using the link below so that we can help you further:
https://support.ruckuswireless.com/contact-us
I hope this information helps.
Please feel free to leave us a message if you have any concerns.
09-11-2023 07:08 PM
Thanks for that information.
Based on the output, I set the arp-age timeout for the VLAN used for the router's inband management to "0" (which is infinite) and immediately the CPU dropped.
Any idea why?
09-12-2023 01:00 AM
Hi Frnkblk,
Thank you for the update.
I see that the CPU dropped after you changed the arp-age timeout for the VLAN. I suspect the ARP cache was retaining learned entries that were no longer valid, and that this was triggering the high CPU.
Moving forward, if this issue is not resolved, please log a ticket using the link below so that we can help you further:
https://support.ruckuswireless.com/contact-us
I hope this information helps.
Please feel free to leave us a message if you have any concerns.
Thanks
01-14-2025 07:05 AM
I believe the issue is finally resolved. Using the "dm raw" command I looked at packets hitting the CPU. About 10 to 15% were from several different Brazilian IPs in the same /22. I blocked that traffic using a simple ACL on the uplink and instantly the CPU dropped. Looking more carefully at the packet output, I noticed they were all hitting port 443 of some of the router's L3 interfaces -- that suggested that some kind of bot was likely trying to connect to the router's web interface.
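For anyone wanting to do the same kind of tally, here is a minimal Python sketch of how source addresses could be grouped by /22 to see which block dominates. It assumes you have already pulled the source IPs out of the packet output by hand or with your own parsing; the function name and the sample addresses (documentation ranges, not the real offenders) are hypothetical.

```python
import ipaddress
from collections import Counter


def share_by_prefix(source_ips, prefix_len=22):
    """Return {network: fraction of packets} for each /prefix_len block.

    source_ips is a list of IPv4 address strings, one per captured packet.
    """
    counts = Counter(
        # strict=False masks off the host bits, giving the enclosing network
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in source_ips
    )
    total = sum(counts.values())
    # most_common() orders the busiest blocks first
    return {str(net): n / total for net, n in counts.most_common()}


# Hypothetical sample: one source address per packet seen hitting the CPU.
sample = ["198.51.100.7", "198.51.101.9", "192.0.2.1", "203.0.113.5"]
for network, share in share_by_prefix(sample).items():
    print(f"{network}: {share:.0%}")
```

A block that accounts for a disproportionate share of CPU-bound packets is a reasonable candidate for a temporary ACL while the root cause is investigated.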
I used my web browser to connect to the router's web interface, and while the web page struggled to load, the router's CPU went right up.
Turns out that even though "web" is not listed in the "management access src-ip" command, the web interface is still accessible by default! I then explicitly disabled the web interface:
no web-management http
no web-management https
web-management disable
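One way to confirm from a management host that the web interface has really stopped answering is a plain TCP connect test against ports 80 and 443. This is only a sketch: the function name is hypothetical, and the address you test would be one of the router's own L3 interface IPs.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


# Example use (replace the placeholder with a router L3 interface address):
#   for port in (80, 443):
#       print(port, tcp_port_open("<router-ip>", port))
```

If both ports report closed after the change, the web interface is no longer reachable from that host. Note that a False result can also mean something in the path is filtering the traffic, so it is best run while the uplink ACL is out of the picture.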
Afterwards I removed the ACL on the uplink.
We've been CPU stable for over a week.