In this self-help discussion, I will explain the precautions that need to be taken before planning an upgrade.
An upgrade can fail for many reasons; for example, the cause may be a network-related issue or a hardware-related issue.
First, we need to verify that the issue is not related to the network.
How to check the throughput between two nodes
It is very important to check the throughput between the nodes when planning an upgrade.
Reasons to check the throughput:
1) If the throughput between nodes in different data centers is too low, the upgrade may fail and can result in a network-down situation.
In most of these cases, we have observed the following error during the upgrade:
The configuration backup fails with the message "Unable to back up the configuration. Reason: Fail to backup configuration, reason: Fail on sync configuration backup file to the nodes in cluster"
According to engineering, the link speed between the nodes should be at least 15 to 20 Mbps for normal operation.
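This article does not name a specific tool for measuring throughput. As a rough illustration only, the following Python sketch measures single-stream TCP throughput: one node runs the receiver, the other streams a fixed payload to it, and the receiver reports megabits per second and compares the result against 15 Mbps, the lower end of the range quoted above. It is not a Ruckus tool; the function names, the payload size, and the use of plain TCP sockets are assumptions made for the example.

```python
import socket
import threading
import time

# Hypothetical threshold: lower end of the 15-20 Mbps minimum quoted above.
MIN_LINK_MBPS = 15.0
BLOCK = b"\0" * 65536  # 64 KiB chunk of zero bytes used by the sender


def to_mbps(num_bytes: int, seconds: float) -> float:
    """Convert bytes transferred in `seconds` to megabits per second."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return num_bytes * 8 / (seconds * 1_000_000)


def meets_minimum(mbps: float, minimum: float = MIN_LINK_MBPS) -> bool:
    """True if the measured throughput is at or above the minimum."""
    return mbps >= minimum


def receive_and_measure(server: socket.socket) -> float:
    """Accept one connection, read until the sender closes it, return Mbps."""
    conn, _ = server.accept()
    total = 0
    with conn:
        start = time.perf_counter()
        while chunk := conn.recv(65536):
            total += len(chunk)
        elapsed = time.perf_counter() - start
    return to_mbps(total, elapsed)


def send_payload(host: str, port: int, num_bytes: int) -> None:
    """Connect to the receiver, stream num_bytes of zeros, then close."""
    with socket.create_connection((host, port)) as sock:
        remaining = num_bytes
        while remaining > 0:
            n = min(remaining, len(BLOCK))
            sock.sendall(BLOCK[:n])
            remaining -= n


def loopback_demo(num_bytes: int = 20 * 1024 * 1024) -> float:
    """Run receiver and sender on 127.0.0.1 to show the flow end to end."""
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        result = {}
        receiver = threading.Thread(
            target=lambda: result.update(mbps=receive_and_measure(server)))
        receiver.start()
        send_payload("127.0.0.1", port, num_bytes)
        receiver.join()
    return result["mbps"]


if __name__ == "__main__":
    mbps = loopback_demo()
    print(f"{mbps:.1f} Mbps, meets minimum: {meets_minimum(mbps)}")
```

The loopback demo only exercises the code path and says nothing about a real inter-datacenter link. Between two real nodes, you would listen with receive_and_measure() on one node and call send_payload() from the other, with the chosen port allowed through any firewall in between. Keep in mind that a single TCP stream over a high-latency link can report less than the link's full capacity.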
vSZ throws the error OPERATION_BLOCK_AFTER_UPGRADE_FAIL_EXCEPTION
In this case, the logs revealed that an Elasticsearch KSP was applied while the cluster was being upgraded. This was the root cause of the Upgrade_Fail_Exception. We strongly recommend not performing such operations during an upgrade.
Steps to recover:
Log in to the CLI on both nodes.
Execute the "reload now" command on both nodes at the same time.
Wait until both nodes have rebooted successfully and all applications are in service.
The customer should then perform a "cluster backup" manually.
The customer can then upgrade the cluster again (make sure the cluster backup and the cluster upgrade are done separately).
Wait for the upgrade to complete.
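When planning the maintenance window, it can help to confirm ahead of time that no other cluster operation (such as a KSP installation or a cluster backup) overlaps the upgrade. The following Python sketch is not a Ruckus tool; the Operation type, the operation labels, and the times are hypothetical values entered by whoever plans the window. It simply flags every pair of planned operations whose time windows overlap.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Operation:
    name: str        # free-form, hypothetical label, e.g. "cluster upgrade"
    start: datetime
    end: datetime


def overlapping_pairs(ops: list[Operation]) -> list[tuple[str, str]]:
    """Return (name, name) pairs of operations whose windows overlap.

    Windows are treated as half-open [start, end): two windows overlap
    when each one starts before the other ends.
    """
    clashes = []
    for i, a in enumerate(ops):
        for b in ops[i + 1:]:
            if a.start < b.end and b.start < a.end:
                clashes.append((a.name, b.name))
    return clashes


if __name__ == "__main__":
    day = datetime(2024, 1, 1)
    plan = [
        Operation("cluster backup", day.replace(hour=0), day.replace(hour=0, minute=30)),
        Operation("cluster upgrade", day.replace(hour=1), day.replace(hour=3)),
        Operation("KSP apply", day.replace(hour=2), day.replace(hour=2, minute=15)),
    ]
    for first, second in overlapping_pairs(plan):
        print(f"Conflict: {first} overlaps {second}")
```

With this example plan, only the upgrade and the KSP installation are reported as a conflict. Treating windows as half-open means a backup that ends exactly when the upgrade starts is not flagged, which matches running the two operations one after the other.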
For Virtual SmartZone (vSZ), the instance must be allocated resources as specified in the user guide. Ruckus offers a benchmark tool to validate the resource allocation before an upgrade.
How to install the tool (this procedure must be done on each controller where the tool will be executed):
If the controller is already in service, the tool can be installed from the WebUI:
Go to Diagnostics -> Scripts
Click Browse and select the KSP file from your local computer
Click Upload to install it on the controller
The script should then appear in the 'System Uploaded Scripts' section.
If the controller is not yet set up, the tool can be installed from the CLI:
Place the file on an FTP server reachable from the controller
In controller CLI execute the following commands to upload it:
How to execute the tool (this procedure must be done on each controller separately):
In the controller CLI, go to the menu from which the tool is executed:
LAB-SZ-01# debug
LAB-SZ-01(debug)# diagnostic
Execute it with this command, and you will see 5 possible options:
1.) System Performance Qualification(CPU/IO)
2.) CPU benchmark
3.) IO benchmark
Optional (WARNING: The option to stop all services can help for checking system capability more precisely, but this action will impact overall controller operations and new client connections. Please remember to start services again when finished.)
4.) Stop all services with maintenance mode
5.) Start all services with clear maintenance mode
Select Option (1-5):
Option 1 runs an overall qualification of both CPU and I/O, including a summary of whether the tests pass or fail.
Option 2 runs CPU performance testing only and provides more detailed results. Run it if requested by Ruckus TAC.
Option 3 runs I/O performance testing only and provides more detailed results. Run it if requested by Ruckus TAC.
Option 4 stops all services on this controller, in case the benchmark needs to be run without any load on the controller.
Option 5 resumes all services on this controller.
Note: This KSP can be installed and executed on vSZ controllers running software 3.4.2, 3.6.2 or 5.1.1. In later releases, a similar tool is already built into the software.