
ICX 6610 high cpu usage

tadesxz
New Contributor II
Hi guys,

We have a pair of stacked ICX 6610 switches, both running version 08.0.30tT7f3 (the latest).

We are seeing very strange behavior on these boxes. Pings to any IP configured on the stack intermittently return high latency, even from the local network.

Below you can find two examples of the problem that we are facing.

##########################
marceloaraujo@CT-REDES-02[19:55][~]: ping 191.252.191.1
PING 191.252.191.1 (191.252.191.1) 56(84) bytes of data.
64 bytes from 191.252.191.1: icmp_seq=1 ttl=60 time=90.6 ms
64 bytes from 191.252.191.1: icmp_seq=2 ttl=60 time=230 ms
64 bytes from 191.252.191.1: icmp_seq=3 ttl=60 time=1.69 ms
64 bytes from 191.252.191.1: icmp_seq=4 ttl=60 time=3.59 ms
64 bytes from 191.252.191.1: icmp_seq=5 ttl=60 time=0.753 ms

PING 191.252.203.1 (191.252.203.1) 56(84) bytes of data.
64 bytes from 191.252.203.1: icmp_seq=1 ttl=60 time=1.00 ms
64 bytes from 191.252.203.1: icmp_seq=2 ttl=60 time=1.49 ms
64 bytes from 191.252.203.1: icmp_seq=3 ttl=60 time=121 ms
64 bytes from 191.252.203.1: icmp_seq=4 ttl=60 time=106 ms

Another strange thing is the CPU usage. The 1-second statistic shows spikes; maybe this is the cause of the latency.

###########################
65 percent busy, from 1 sec ago
1   sec avg: 65 percent busy
5   sec avg:  1 percent busy
60  sec avg:  1 percent busy
300 sec avg:  1 percent busy

spcrdc2ita001#sh cpu-utilization 
Less than a second from the last call, abort
1   sec avg:  1 percent busy
5   sec avg:  1 percent busy
60  sec avg:  1 percent busy
300 sec avg:  1 percent busy

spcrdc2ita001#sh cpu-utilization 
1 percent busy, from 1 sec ago
1   sec avg:  1 percent busy
5   sec avg:  1 percent busy
60  sec avg:  1 percent busy
300 sec avg:  1 percent busy

spcrdc2ita001#sh cpu-utilization 
1 percent busy, from 39 sec ago
1   sec avg: 73 percent busy
5   sec avg:  1 percent busy
60  sec avg:  1 percent busy
300 sec avg:  1 percent busy

spcrdc2ita001#sh cpu-utilization 
Less than a second from the last call, abort
1   sec avg:  7 percent busy
5   sec avg:  3 percent busy
60  sec avg:  1 percent busy
300 sec avg:  1 percent busy
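Because the spikes only show up in the 1-second average, they are easy to miss with manual polling. A minimal Python sketch (assuming you capture the CLI output yourself, e.g. from an SSH session log; the threshold values are illustrative) that parses the averages and flags a transient spike:

```python
import re

def parse_cpu_averages(output: str) -> dict:
    """Parse the N-sec averages from `show cpu-utilization` output.

    Returns a mapping like {1: 65, 5: 1, 60: 1, 300: 1}.
    """
    averages = {}
    for match in re.finditer(r"(\d+)\s+sec avg:\s+(\d+) percent busy", output):
        averages[int(match.group(1))] = int(match.group(2))
    return averages

# Sample output as pasted above.
sample = """\
65 percent busy, from 1 sec ago
1   sec avg: 65 percent busy
5   sec avg:  1 percent busy
60  sec avg:  1 percent busy
300 sec avg:  1 percent busy
"""

avgs = parse_cpu_averages(sample)
# A transient spike: high 1-sec average while the long averages stay low.
spike = avgs[1] >= 50 and avgs[300] <= 5
print(avgs, "spike" if spike else "ok")
```

Running this in a loop every few seconds and logging the result makes it much easier to correlate the spikes with the latency events.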

When we run a "show cpu tasks", the numbers seem to be good, as below.

spcrdc2ita001#show cpu tasks       
... Usage average for all tasks in the last 1 second ...
==========================================================
Name                 %

idle                 9
con                  0
mon                  0
flash                0
dbg                  0
boot                 0
main                 0
stkKeepAlive         0
keygen               0
itc                  0
poeFwdfsm            0
tmr                  0
scp                  0
appl                91
snms                 0
rtm                  0
rtm6                 0
rip                  0
bgp                  0
bgp_io               0
ospf                 0
ospf_r_calc          0
openflow_ofm         0
openflow_opm         0
mcast_fwd            0
mcast                0
msdp                 0
ripng                0
ospf6                0
ospf6_rt             0
mcast6               0
ipsec                0
dhcp6                0
snmp                 0
rmon                 0
web                  0
acl                  0
flexauth             0
ntp                  0
rconsole             0
console              0
ospf_msg_task        0
ssh_0                0

All the interfaces look fine (bandwidth consumption is low) and we have no problems with the packets-per-second (PPS) rate.

You can find an image from our monitoring tool showing CPU usage, memory usage, and response time.

Has anyone experienced this? Do you have any suggestions for troubleshooting or fixing it?


Thanks,
Marcelo Tadeu



8 REPLIES

ala_ma
New Contributor II
I will open a new case after the Christmas vacation. I saw that we lost communication with many interfaces when the CPU reached 100%.

Likewise, if it happens again I'll open a ticket as well. Since we jumped from v7.3 code to the 8.0.30 code, I ended up getting a maintenance window to pull the power and do a cold boot. Still no crypto key or SSH... so maybe I won't experience it again. SSH isn't that important on our LAN at this time anyway; telnet will work.

ala_ma
New Contributor II
Hello!
I replaced the ICX 6610 and it did not solve the problem. We still constantly see 100% CPU on the stack. I have SSH and a crypto key on all 10 of my other routers and there is no problem with them. I found some documentation that could help and will take a look at it. Here's the link:
https://support.ruckuswireless.com/articles/000007306

If you find a solution please share.
Thank you!

tadesxz
New Contributor II
Hi guys,

Just an update: in our case the problem was caused by a mismatched configuration on the server side. We opened a TAC case and found that NIC teaming on the server side without LACP (802.3ad) causes this issue. Our servers run XCP (the open-source Citrix XenServer), and if they are not running vSwitch (802.3ad is only possible with vSwitch), we get a mess.

The solution we found was to switch the NIC teaming to active-passive mode. That way, the switches learn the server's MAC address on only one port at a time.

If the same MAC address arrives from two different ports at the same time, the CPU spikes and causes all of these problems.
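To illustrate the failure mode, here is a minimal Python sketch (the MAC addresses and port names are made up) that flags a MAC learned on more than one port; that constant MAC move is exactly the churn that drives the switch CPU up:

```python
from collections import defaultdict

def find_flapping_macs(events):
    """Given (mac, port) learn events, return MACs seen on more than one port.

    A MAC that keeps moving between ports forces the switch to rewrite its
    forwarding table on every move; done fast enough, that churn is handled
    in software and pushes the CPU up.
    """
    ports_by_mac = defaultdict(set)
    for mac, port in events:
        ports_by_mac[mac].add(port)
    return {mac for mac, ports in ports_by_mac.items() if len(ports) > 1}

# Hypothetical learn events: the teamed server NICs announce the same
# MAC on two switch ports because the team is not using LACP.
events = [
    ("00:11:22:33:44:55", "1/1/1"),
    ("00:11:22:33:44:55", "1/1/2"),  # same MAC, different port -> flap
    ("aa:bb:cc:dd:ee:ff", "1/1/3"),
]
print(find_flapping_macs(events))
```

With active-passive teaming only one port ever sources the MAC, so the set of ports per MAC stays at one and no flap is detected.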

Hope this helps.

Regards,