09-08-2023 03:38 PM
Last night we experienced several symptoms on our three-member ICX7450 stack.
Everything pointed to high CPU usage.
We couldn't get a proper serial interface in place, so we power cycled unit #1 and then took the opportunity to upgrade from 09.0.10c to 09.0.10f. Unfortunately, a few hours later we lost SSH and SNMP-based graphing and saw ICMP ping failures, though we haven't seen any BGP sessions bouncing.
Via the serial interface we could see CPU 0 was high, and these are the processes that were high:
2564 root -20 1 678.8m 381.2m 8.0 19.1 48:00.38 R `- bcmINTR
2703 root 26 6 678.8m 381.2m 36.0 19.1 44:34.67 S `- bcmCNTR.0
2742 root 21 1 678.8m 381.2m 12.0 19.1 41:20.12 R `- bcmRX
2807 root -18 6 678.8m 381.2m 28.6 19.1 131:01.63 R `- ZMQbg/1
2988 root 26 6 678.8m 381.2m 16.0 19.1 89:58.03 S `- os_pkt_intx_tx
Best guess is that something is hitting the CPU pretty hard.
Is there a way to see what packets are hitting the CPU?
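For anyone who wants to rank the threads in a listing like the one above, here is a rough Python sketch. It assumes the columns follow a top-style layout (PID, USER, PR, NI, VIRT, RES, %CPU, %MEM, TIME+, S, COMMAND); that ordering is inferred from the values and is not confirmed by the switch documentation. The names `ThreadSample`, `parse_line`, and `rank_by_cpu` are made up for illustration.

```python
from dataclasses import dataclass

# Thread lines copied from the serial console output quoted above.
SAMPLE = """\
2564 root -20 1 678.8m 381.2m 8.0 19.1 48:00.38 R `- bcmINTR
2703 root 26 6 678.8m 381.2m 36.0 19.1 44:34.67 S `- bcmCNTR.0
2742 root 21 1 678.8m 381.2m 12.0 19.1 41:20.12 R `- bcmRX
2807 root -18 6 678.8m 381.2m 28.6 19.1 131:01.63 R `- ZMQbg/1
2988 root 26 6 678.8m 381.2m 16.0 19.1 89:58.03 S `- os_pkt_intx_tx
"""


@dataclass
class ThreadSample:
    pid: int
    cpu_percent: float
    cpu_time: str
    state: str
    name: str


def parse_line(line: str) -> ThreadSample:
    """Parse one thread line.

    ASSUMPTION: fields are PID USER PR NI VIRT RES %CPU %MEM TIME+ S `- COMMAND,
    so %CPU is the seventh whitespace-separated field.
    """
    fields = line.split()
    if len(fields) < 12:
        raise ValueError(f"unexpected line format: {line!r}")
    return ThreadSample(
        pid=int(fields[0]),
        cpu_percent=float(fields[6]),
        cpu_time=fields[8],
        state=fields[9],
        name=" ".join(fields[11:]),  # skip the "`-" tree marker in fields[10]
    )


def rank_by_cpu(text: str) -> list[ThreadSample]:
    """Return the parsed threads sorted by %CPU, busiest first."""
    samples = [parse_line(l) for l in text.splitlines() if l.strip()]
    return sorted(samples, key=lambda s: s.cpu_percent, reverse=True)


if __name__ == "__main__":
    for s in rank_by_cpu(SAMPLE):
        print(f"{s.cpu_percent:5.1f}%  {s.name}  (pid {s.pid}, state {s.state})")
```

Under that column assumption, bcmCNTR.0 and ZMQbg/1 are the two busiest threads in this snapshot. A single sample only shows which threads were busy at that moment, not what traffic caused the load.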
09-11-2023 01:26 AM - edited 09-11-2023 01:41 AM
Hi frnkblk,
Thank you for posting your query!
I understand that you were experiencing multiple issues on your ICX7450 stack that point to high CPU,
and that you would like to know what packets are hitting the CPU.
The commands below show which packets are hitting the CPU.
- Dm raw mode brief
- Dm raw filter none
- Dm raw max 10
- Dm raw
However, I would suggest running the above under TAC supervision during a support session.
If this issue is not resolved, please log a ticket using the link below so that we can help you further:
https://support.ruckuswireless.com/contact-us
I hope this information helps.
Please feel free to leave us a message if you have any concerns.
09-11-2023 07:08 PM
Thanks for that information.
Based on the output, I set the arp-age timeout for the VLAN used for the router's inband management to "0" (which is infinite) and immediately the CPU dropped.
Any idea why?
09-12-2023 01:00 AM
Hi Frnkblk,
Thank you for the update.
I can see that the CPU dropped after you changed the arp-age timeout for the VLAN. I suspect the ARP cache was retaining learned entries that are no longer valid, and that this was triggering the high CPU.
If this issue is not resolved, please log a ticket using the link below so that we can help you further:
https://support.ruckuswireless.com/contact-us
I hope this information helps.
Please feel free to leave us a message if you have any concerns.
Thanks