
ICX 6610 high cpu usage

tadesxz
New Contributor II
Hi guys,

We have a pair of stacked ICX 6610s, both running version 08.0.30tT7f3 (latest).

We are seeing very strange behavior on these boxes: pings to any IP configured on them return high latency, even from directly connected hosts.

Below are two examples of the problem we are facing.

##########################
marceloaraujo@CT-REDES-02[19:55][~]: ping 191.252.191.1
PING 191.252.191.1 (191.252.191.1) 56(84) bytes of data.
64 bytes from 191.252.191.1: icmp_seq=1 ttl=60 time=90.6 ms
64 bytes from 191.252.191.1: icmp_seq=2 ttl=60 time=230 ms
64 bytes from 191.252.191.1: icmp_seq=3 ttl=60 time=1.69 ms
64 bytes from 191.252.191.1: icmp_seq=4 ttl=60 time=3.59 ms
64 bytes from 191.252.191.1: icmp_seq=5 ttl=60 time=0.753 ms

PING 191.252.203.1 (191.252.203.1) 56(84) bytes of data.
64 bytes from 191.252.203.1: icmp_seq=1 ttl=60 time=1.00 ms
64 bytes from 191.252.203.1: icmp_seq=2 ttl=60 time=1.49 ms
64 bytes from 191.252.203.1: icmp_seq=3 ttl=60 time=121 ms
64 bytes from 191.252.203.1: icmp_seq=4 ttl=60 time=106 ms

Another strange thing is the CPU usage. The 1-second statistic shows spikes; maybe this is the cause of the latency.

###########################
65 percent busy, from 1 sec ago
1   sec avg: 65 percent busy
5   sec avg:  1 percent busy
60  sec avg:  1 percent busy
300 sec avg:  1 percent busy

spcrdc2ita001#sh cpu-utilization 
Less than a second from the last call, abort
1   sec avg:  1 percent busy
5   sec avg:  1 percent busy
60  sec avg:  1 percent busy
300 sec avg:  1 percent busy

spcrdc2ita001#sh cpu-utilization 
1 percent busy, from 1 sec ago
1   sec avg:  1 percent busy
5   sec avg:  1 percent busy
60  sec avg:  1 percent busy
300 sec avg:  1 percent busy

spcrdc2ita001#sh cpu-utilization 
1 percent busy, from 39 sec ago
1   sec avg: 73 percent busy
5   sec avg:  1 percent busy
60  sec avg:  1 percent busy
300 sec avg:  1 percent busy

spcrdc2ita001#sh cpu-utilization 
Less than a second from the last call, abort
1   sec avg:  7 percent busy
5   sec avg:  3 percent busy
60  sec avg:  1 percent busy
300 sec avg:  1 percent busy
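Because the spike only shows up in the 1-second average, it is easy to miss when polling by hand. A minimal sketch (assuming the exact output format pasted above) that pulls the 1-second averages out of repeatedly captured `show cpu-utilization` output and flags spikes:

```python
import re

def parse_one_sec_avg(output):
    """Extract every '1 sec avg' percentage from pasted
    'show cpu-utilization' output (format as in this thread)."""
    return [int(v) for v in re.findall(
        r"^\s*1\s+sec avg:\s*(\d+)\s+percent busy", output, re.MULTILINE)]

def spikes(samples, threshold=50):
    """Return only the samples above the threshold."""
    return [s for s in samples if s > threshold]
```

Feeding it several captures at once gives a quick history of how often the 1-second average crosses the threshold.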

When we run "show cpu tasks", the numbers seem to be good, as below.

spcrdc2ita001#show cpu tasks       
... Usage average for all tasks in the last 1 second ...
==========================================================
Name %

idle                9               
con                  0               
mon                  0               
flash                0               
dbg                  0               
boot                0               
main                0               
stkKeepAlive        0               
keygen              0               
itc                  0               
poeFwdfsm            0               
tmr                  0               
scp                  0               
appl                91              
snms                0               
rtm                  0               
rtm6                0               
rip                  0               
bgp                  0               
bgp_io               0               
ospf                0               
ospf_r_calc          0               
openflow_ofm        0               
openflow_opm        0               
mcast_fwd            0               
mcast                0               
msdp                0               
ripng                0               
ospf6                0               
ospf6_rt            0               
mcast6              0               
ipsec                0               
dhcp6                0               
snmp                0               
rmon                0               
web                  0               
acl                  0               
flexauth            0               
ntp                  0               
rconsole            0               
console              0               
ospf_msg_task        0               
ssh_0                0               

All the interfaces are fine (bandwidth consumption is low), and we have no problems with the PPS (packets per second) rate.
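As a side note, the jitter in those pings can be quantified instead of eyeballed. A small sketch (assuming standard Linux `ping` output like the captures above) that extracts the round-trip times and counts spikes over a chosen threshold:

```python
import re
import statistics

def ping_times(output):
    """Pull the time=... values (in ms) out of Linux ping output."""
    return [float(t) for t in re.findall(r"time=([\d.]+) ms", output)]

def jitter_report(times, spike_ms=50.0):
    """Summarise latency samples and count spikes above spike_ms."""
    return {
        "min": min(times),
        "max": max(times),
        "avg": round(statistics.mean(times), 2),
        "spikes": sum(1 for t in times if t > spike_ms),
    }
```

Run against the first capture above, it reports 2 samples over 50 ms out of 5, which matches what the graphs show.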

Attached are images from our monitoring tool showing CPU usage, memory usage, and response time.

Did anyone experience this? Do you have any suggestions for troubleshooting/fixing it?


Thanks,
Marcelo Tadeu



8 REPLIES

BenBeck
Moderator
I wouldn't test latency by pinging the ICX itself, as we do not prioritize replying to pings. Do you still see latency spikes when pinging through the ICX? That said, if you are seeing CPU spikes, that could certainly lead to issues if the ICX is momentarily overwhelmed. I would look for any kind of control traffic that could be hitting the CPU, or any logs that might show some kind of trigger at the moment of a CPU spike. Your best bet may be to open a support case and have our TAC team try to help track it down. 
Ben Beck, RCNA, Principal Technical Support Engineer
support.ruckuswireless.com/contact-us

tadesxz
New Contributor II
Yes, latency increases when we pass through the box. It's not as bad as when we ping the ICX itself, but it affects the machines behind it.

This is an MTR from a machine behind the ICX to one of our customer IPs (inside -> outside).


From outside to inside: (screenshots attached)
We'll open a TAC case to check it out and will post the fix here.

Thanks for your attention.

Regards,
Marcelo Tadeu



ala_ma
New Contributor II
Same problem with an ICX 6610.  The console is very slow over SSH.

sam_abbott
New Contributor
Same issue with a stack of two ICX 6610s.  It happened after running crypto key generate for SSH use...  Before reloading, "show cpu tasks" showed appl and tmr as the only tasks causing the high CPU.  Basically all network traffic was dropping... Uptime on the stack was less than 14 days.

Since the only change I had made was generating the crypto keys for SSH, I zeroized them...  Under similar network load testing it's not happening now. Not sure what the deal is. Anyone else have this issue or have a resolution?
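For anyone trying the same workaround, the FastIron commands look roughly like this (from memory of the FastIron CLI; check the command reference for your release, as the modulus keyword and key-type defaults vary by version):

```
device# configure terminal
device(config)# crypto key zeroize
device(config)# crypto key generate rsa modulus 2048
```

Zeroizing removes the existing host key pair (and disables SSH until a new one is generated), so plan console access before trying this remotely.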

Copyright (c) 1996-2016 Brocade Communications Systems, Inc. All rights reserved.

    UNIT 1: compiled on Feb 13 2019 at 18:30:50 labeled as FCXR08030t

(10545807 bytes) from Primary FCXR08030t.bin

        SW: Version 08.0.30tT7f3 

    UNIT 2: compiled on Feb 13 2019 at 18:30:50 labeled as FCXR08030t

  (10545807 bytes) from Primary FCXR08030t.bin

        SW: Version 08.0.30tT7f3 

  Boot-Monitor Image size = 370695, Version:10.1.00T7f5 (grz10100)