09-30-2021 02:36 AM - last edited on 09-21-2022 04:31 AM by Anusha_Vemula
In this self-help discussion, I explain the precautions to take before planning an upgrade.
An upgrade can fail for many reasons; the cause may be network-related or hardware-related.
First, we need to confirm that the issue is not related to the network.
How to check throughput between two nodes
It is very important to check the throughput between nodes when planning an upgrade.
Reasons to check the throughput:
1) If the throughput between nodes in different data centers is insufficient, the upgrade may fail and result in a network-down situation.
In most such cases, we have seen the following error during the upgrade:
The configuration backup fails with the message "Unable to back up the configuration. Reason: Fail to backup configuration, reason: Fail on sync configuration backup file to the nodes in cluster"
According to engineering, the link between nodes needs a minimum speed of 15 to 20 Mbps for normal operation.
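As a rough illustration of the 15 to 20 Mbps guidance, the Python sketch below converts a measured transfer between two nodes into Mbps and classifies the link. It assumes you have already measured the bytes transferred and the elapsed time with a tool of your choice; the `throughput_mbps` and `classify_link` helpers, and the "marginal" band between 15 and 20 Mbps, are hypothetical and only illustrate the arithmetic.

```python
# Sketch: convert a measured node-to-node transfer into Mbps and compare it
# with the 15-20 Mbps guidance. How the bytes and seconds are measured is an
# assumption (any transfer test between the two nodes).

MIN_MBPS = 15.0          # lower end of the guidance
RECOMMENDED_MBPS = 20.0  # upper end of the guidance


def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """Return throughput in megabits per second (1 Mbps = 1,000,000 bit/s)."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return bytes_transferred * 8 / seconds / 1_000_000


def classify_link(mbps: float) -> str:
    """Hypothetical bands: below 15 is insufficient, 15 up to 20 is marginal."""
    if mbps < MIN_MBPS:
        return "insufficient"
    if mbps < RECOMMENDED_MBPS:
        return "marginal"
    return "ok"


if __name__ == "__main__":
    rate = throughput_mbps(50_000_000, 20.0)  # 50 MB copied in 20 s
    print(f"{rate:.1f} Mbps -> {classify_link(rate)}")  # prints "20.0 Mbps -> ok"
```

For example, 50 MB copied in 20 seconds works out to 50,000,000 × 8 / 20 = 20,000,000 bit/s, or 20 Mbps, which sits at the upper end of the guidance.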
vSZ throws the error OPERATION_BLOCK_AFTER_UPGRADE_FAIL_EXCEPTION
In this case, the logs revealed that an Elasticsearch KSP was applied while the cluster was upgrading, which was the root cause of the exception. We strongly recommend not performing such operations during an upgrade.
Steps to recover:
1) Log in to the CLI on both nodes.
2) Run the "reload now" command on both nodes at the same time.
3) After both nodes reboot successfully and all applications are in service, the customer should perform a "cluster backup" manually.
4) The customer can then upgrade the cluster again (make sure the cluster backup and the cluster upgrade are performed separately).
5) Wait for the upgrade to complete.
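The key lesson from this case is that maintenance operations must not overlap. The hypothetical Python sketch below models that rule with a simple lock: a backup, an upgrade, or a KSP installation can start only when no other operation is running. `MaintenanceLock` is not a SmartZone API; it only illustrates running the backup and the upgrade one after the other.

```python
# Hypothetical sketch: serialize cluster maintenance operations so that a
# backup, an upgrade, and a KSP installation never overlap.
# MaintenanceLock is NOT a SmartZone API; it only illustrates the rule.

import threading
from contextlib import contextmanager
from typing import Iterator, Optional


class MaintenanceLock:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current: Optional[str] = None  # name of the running operation

    @contextmanager
    def operation(self, name: str) -> Iterator[None]:
        """Run `name` only if no other maintenance operation is in progress."""
        if not self._lock.acquire(blocking=False):
            raise RuntimeError(f"cannot start {name!r}: {self.current!r} is in progress")
        self.current = name
        try:
            yield
        finally:
            self.current = None
            self._lock.release()


if __name__ == "__main__":
    lock = MaintenanceLock()
    with lock.operation("cluster backup"):
        print("running cluster backup")  # wait for the backup to finish
    with lock.operation("cluster upgrade"):
        print("running cluster upgrade")
        try:
            with lock.operation("install KSP"):  # refused during the upgrade
                pass
        except RuntimeError as err:
            print(err)
```

A non-blocking acquire is used on purpose: a second operation is rejected immediately instead of queuing silently behind the upgrade.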
For Virtual SmartZone (vSZ), resources must be allocated to the instance as specified in the user guide. Ruckus offers a benchmark tool, delivered as a KSP, to validate the resource allocation before an upgrade. To install it, upload the KSP to the controller from an FTP server:
LAB-SZ-01# debug
LAB-SZ-01(debug)# diagnostic
LAB-SZ-01(debug-diagnostic)# upload ftp://<username>:<password>@<ftp-host>/<file-path>
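If the FTP password contains reserved characters such as `@` or `:`, the URL in the upload command becomes ambiguous. The sketch below percent-encodes the credentials per RFC 3986 using Python's `urllib.parse.quote`. Whether the SmartZone CLI decodes percent-encoded credentials is an assumption that is not confirmed here, so credentials without reserved characters remain the safer choice; the `ksp_ftp_url` helper, host, and file name are hypothetical.

```python
# Sketch: build the ftp:// URL for the `upload` command with percent-encoded
# credentials (RFC 3986). Whether the SmartZone CLI decodes percent-encoding
# is an assumption; the host and file name in the example are hypothetical.

from urllib.parse import quote


def ksp_ftp_url(username: str, password: str, host: str, file_path: str) -> str:
    """Return ftp://<username>:<password>@<ftp-host>/<file-path> with escaping."""
    user = quote(username, safe="")                # '@' and ':' must be escaped
    pwd = quote(password, safe="")
    path = quote(file_path.lstrip("/"), safe="/")  # keep directory separators
    return f"ftp://{user}:{pwd}@{host}/{path}"


if __name__ == "__main__":
    print(ksp_ftp_url("ftpuser", "p@ss:word", "192.0.2.10", "ksp/benchmark.ksp"))
    # prints ftp://ftpuser:p%40ss%3Aword@192.0.2.10/ksp/benchmark.ksp
```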
Then enter diagnostic mode again and run the benchmark; the tool presents the following menu:
LAB-SZ-01# debug
LAB-SZ-01(debug)# diagnostic
1.) System Performance Qualification(CPU/IO)
2.) CPU benchmark
3.) IO benchmark
Optional
(WARNING: The option to stop all services can help for checking system capability
more precisely, but this action will impact overall controller operations and
new client connections. Please remember to start services again when finished.)
4.) Stop all services with maintenance mode
5.) Start all services with clear maintenance mode
Select Option (1-5):
Option 1 runs an overall qualification of both CPU and I/O and includes a summary of whether the tests passed or failed.
Option 2 runs CPU performance testing only and provides more detailed results. Run it only if requested by Ruckus TAC.
Option 3 runs I/O performance testing only and provides more detailed results. Run it only if requested by Ruckus TAC.
Option 4 stops all services on this controller, in case the benchmark needs to run without any load on the controller.
Option 5 resumes all services on this controller.
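The menu guidance can be summarized as a small, hypothetical plan checker: run option 1 by default, run options 2 and 3 only at Ruckus TAC's request, and always follow option 4 with option 5. The `review_plan` function below does not talk to the controller; it only reviews a planned sequence of menu selections.

```python
# Hypothetical sketch: review a planned sequence of benchmark menu selections
# against the guidance above. It does not connect to the controller.

from typing import List

OPTIONS = {
    1: "System Performance Qualification (CPU/IO)",
    2: "CPU benchmark",
    3: "IO benchmark",
    4: "Stop all services with maintenance mode",
    5: "Start all services with clear maintenance mode",
}
TAC_ONLY = {2, 3}  # detailed tests, run only when Ruckus TAC asks


def review_plan(plan: List[int], tac_requested: bool = False) -> List[str]:
    """Return warnings for a planned sequence of menu selections."""
    warnings: List[str] = []
    services_stopped = False
    for choice in plan:
        if choice not in OPTIONS:
            warnings.append(f"option {choice} is not on the menu (1-5)")
            continue
        if choice in TAC_ONLY and not tac_requested:
            warnings.append(f"option {choice} ({OPTIONS[choice]}) is meant for Ruckus TAC requests")
        if choice == 4:
            services_stopped = True   # controller is now in maintenance mode
        elif choice == 5:
            services_stopped = False  # services resumed
    if services_stopped:
        warnings.append("services are left stopped; finish with option 5")
    return warnings


if __name__ == "__main__":
    print(review_plan([4, 1, 5]))  # prints []
    for warning in review_plan([4, 2]):
        print(warning)
```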
Note: This KSP can be installed and run on vSZ controllers running software 3.4.2, 3.6.2, or 5.1.1. Later releases already include a similar tool.
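The version note can be expressed as a small decision helper. This sketch assumes that "later releases" means anything newer than 5.1.1 and treats other versions (for example, 3.5.x) as not covered by the note; both readings are assumptions, and `benchmark_tool` is a hypothetical helper.

```python
# Hypothetical sketch of the version note: install the benchmark KSP on the
# listed releases, use the built-in tool on later ones. Treating "later" as
# "newer than 5.1.1" is an assumption.

from typing import Tuple

KSP_VERSIONS = {(3, 4, 2), (3, 6, 2), (5, 1, 1)}


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep the first three numeric fields; extra build fields are ignored."""
    parts = [int(p) for p in text.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad '6.1' to (6, 1, 0)
    return tuple(parts)


def benchmark_tool(version: str) -> str:
    v = parse_version(version)
    if v in KSP_VERSIONS:
        return "install benchmark KSP"
    if v > max(KSP_VERSIONS):  # tuple comparison: newer than 5.1.1
        return "use built-in tool"
    return "not covered by this note"


if __name__ == "__main__":
    for version in ("3.6.2", "6.1.0", "3.5.1"):
        print(version, "->", benchmark_tool(version))
```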