How to Troubleshoot High CPU utilization on the Ruckus vSZ-D virtual machine

scott_crace
New Contributor II
We are currently running our environment in two virtual machines: Ruckus vSZ-E and vSZ-D.
According to the web UI, we're running the following:
Controller Version 3.5.1.0.296
Control Plane Software Version 3.5.1.0.205
AP Firmware Version 3.5.1.0.419

We have a total of 10 x R720 and R610 APs in 1 zone with 3 AP Groups. Not a large deployment. Our subject matter expert left the company recently and I'm trying to walk through the training/documentation to become familiar with Ruckus.

Recently our vSZ-D instance has started running at 98% CPU for long periods of time. I'm at a loss on where to begin troubleshooting this. I am able to SSH to the device and log in/enable.

I've tried a few things from various posts, plus the tried-and-true graceful shutdown/restart. Could someone point me to a KB article or troubleshooting steps to start digging deeper?
11 REPLIES

scott_crace
New Contributor II
As a follow-up: from the CLI on the vSZ-D virtual machine, running show stats while VMware is showing 98% only shows around 75%. The syslogs aren't showing anything out of the ordinary either.

I have opened a case and provided some initial information. I'll post more once I've worked with support a little more. However, I suspect the old 'upgrade the version' approach will be the recommended step. That may indeed solve the issue, but I want to find out why it suddenly started, especially since the solution hasn't changed much since the initial deployment.

I'm interested in hearing whether support has a solution to this issue. I've been told it is normal for the CPU to be at ~100%; that is how the Intel DPDK-based vSZ-D is supposed to work. I've been wondering what the risks are if you try to limit CPU resources in VMware. Especially in small, low-user-density networks, it feels like a bit of a waste of CPU resources.

I got the same sort of response so you're not alone. Posting an overall reply as well.

scott_crace
New Contributor II
The final answer from support is that the VM is behaving as anticipated and the CPU is expected to be near 100% based on how the Intel DPDK poll mode driver performs.
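To illustrate why a poll mode driver keeps a core busy even when no traffic is flowing, here is a minimal sketch in Python. It is not DPDK or vSZ-D code; it is only an analogy, and the names `busy_poll`, `blocking_wait`, and `rx_queue` are hypothetical. It compares a loop that checks an empty queue continuously (polling) against one that sleeps until data arrives (interrupt-style), and measures the CPU time each one consumes.

```python
import queue
import time


def busy_poll(rx_queue, duration_s):
    """Spin on the queue without ever sleeping, in the style of a poll mode driver."""
    cpu_start = time.process_time()
    deadline = time.monotonic() + duration_s
    packets = 0
    while time.monotonic() < deadline:
        try:
            rx_queue.get_nowait()  # returns immediately, even when the queue is empty
            packets += 1
        except queue.Empty:
            pass  # nothing to receive: loop again right away
    return packets, time.process_time() - cpu_start


def blocking_wait(rx_queue, duration_s):
    """Sleep until an item arrives or the time runs out, in the style of an interrupt-driven driver."""
    cpu_start = time.process_time()
    deadline = time.monotonic() + duration_s
    packets = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            rx_queue.get(timeout=remaining)  # the thread sleeps while waiting
            packets += 1
        except queue.Empty:
            break
    return packets, time.process_time() - cpu_start


if __name__ == "__main__":
    wall_s = 0.5
    idle_queue = queue.Queue()  # no traffic at all
    for name, fn in (("busy_poll", busy_poll), ("blocking_wait", blocking_wait)):
        packets, cpu_s = fn(idle_queue, wall_s)
        print(f"{name}: {packets} packets, CPU share {cpu_s / wall_s:.0%}")
```

On an otherwise idle machine the polling loop should report a CPU share close to 100% and the blocking loop close to 0%, even though both handled zero packets. That would be consistent with a hypervisor reporting a pegged vCPU regardless of how little traffic the data plane is forwarding.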

It doesn't necessarily explain why VMware wasn't complaining about this before, or what might suddenly cause it to start alarming if the VM had been running this way all along. We did observe that the VM stopped alarming for several hours after we migrated it to a different host due to maintenance activities on the original host. A few hours later, on the new host, it started alarming again.

I plan on dropping the priority on this in my workload but will keep fiddling and possibly escalate to VMware support. I expect their answer will be that it must be the lack of VMware Tools, or to consult Ruckus.

The code in 6.0 does the same: when I add a DPDK card, it consumes 100% of the CPU. But if I remove the passthrough data plane card and fall back to a virtio card, the problem is no better; the whole guest consumes a steady 64% across 3 processors.

It is just bad code. Most programmers have not learned computational complexity or algorithm design and analysis, or they are too busy with deadlines. Many closed libraries ship very bad code too. I have seen so much linear search, even in telecom code!