Re: SZ Cluster node rename - Page 2

Greg_WiGuy · ‎01-13-2020

I'm planning some overdue architectural updates to our SZ124 deployment

1-Migrating our current only SZ124 from 1G to 10G

Requires new Management IP
Would like to update the hostname (sz01.locationA.mycompany.com)

2-Deployment of a new SZ124 with 10G uplink for failover

Will be added to the same cluster as current SZ
Will be hosted at a second location for geo-redundancy (sz01.locationB.mycompany.com)

My concern, since I don't have much experience with clustering, is the hostnames and the possibility of a conflict. As far as I can tell, only the hostname (the first label, before the first dot) is used as the node name, not the full domain name.

If I went with our current deployment standards, I'd end up with two sz01 hostnames. Should I be concerned about this? Or should I try deploying unique hostnames like sz01a.locationA and sz01b.locationB...
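To make the concern concrete, here is a minimal Python sketch. It assumes (as it appears from the UI, but is not confirmed) that the node name is just the first label of the FQDN. The helper names `short_name` and `find_short_name_conflicts` are hypothetical and only for illustration; this is not how the SZ itself derives node names.

```python
from collections import defaultdict


def short_name(fqdn: str) -> str:
    """Return the first DNS label (the text before the first dot), lowercased."""
    return fqdn.strip().split(".", 1)[0].lower()


def find_short_name_conflicts(fqdns):
    """Group FQDNs by short name and keep only the names used more than once."""
    groups = defaultdict(list)
    for fqdn in fqdns:
        groups[short_name(fqdn)].append(fqdn)
    return {name: hosts for name, hosts in groups.items() if len(hosts) > 1}


# Hostnames from the plan above, following the current naming standard.
planned = ["sz01.locationA.mycompany.com", "sz01.locationB.mycompany.com"]
# Alternative with unique short names.
alternative = ["sz01a.locationA.mycompany.com", "sz01b.locationB.mycompany.com"]

print(find_short_name_conflicts(planned))      # both hosts share "sz01"
print(find_short_name_conflicts(alternative))  # no shared short names
```

If the first-label assumption holds, the current standard would give both nodes the same name, while the sz01a/sz01b variant would not.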

Also wondering if there's a way to rename a cluster? It was deployed before my time using Cisco's "wlc" as the naming convention, which kinda irks me 😉

albert_pierson · ‎01-15-2020

Hi 80211WiGuy,

Just for your better understanding: when you cluster 2 SZ nodes together, they are both active and both will be managing Access Points. The Access Points are configured with a list of available controllers in a pseudo-random order, so they should be spread out between all nodes.

There is no concept of a primary node in this configuration; both nodes are considered peers. One node will be elected leader, primarily to act as a single point of time synchronization and also to prevent database conflicts. The leader node may change if the present one stops responding. So labeling one as "primary" and the other "backup" could be misleading.

To be correct this is not 1+1 redundancy (where one device backs up another) but better described as Active/Active or peer clustering.

While supported, you need to be careful with geographically separated nodes in a single cluster. The database is shared between the two nodes, and communication between the nodes needs to be impeccable, with minimal latency, to prevent introducing errors into the database. Typically the latency should be <50 ms for newer versions.

It is certainly up to you to name your nodes any way that makes sense to you, but I wanted to describe the operation for your information.

I hope this information is helpful.

Cheers,

Albert

Greg_WiGuy · ‎01-15-2020

Thanks Albert, I appreciate your detailed response.
Our SE informed us that the SZ cluster functions in an N+1 manner.

The way I understood it is that in a 2 node cluster, one node takes on the full AP load with the other in sync but not controlling any APs unless the first node fails. In a 3 node cluster, there is still 1 node in sync but acting as a standby unit while the other two spread the AP load. Am I mistaken?

Regarding latency, we have dedicated fiber between the two sites with very low latency (<1ms).

Thanks,
Greg

albert_pierson · ‎01-15-2020

Hi Greg,

Yes, I believe the SE's misuse N+1 to describe the SZ cluster feature. I have been supporting SZ since it was first envisioned, and it has always operated as an Active/Active peer cluster where the AP's are spread out between the nodes automatically.

The SZ will send a list of available controllers to the AP's, and the AP's try to connect to the first node in the list. If that SZ does not respond, then the AP's will (within about 60 seconds) automatically try to connect to the next node in the list. This list is sent in a pseudo-random order to try to spread out the load. When a node is added to or deleted from the cluster, the AP's are re-configured with a new C-list (controller list) of available nodes.
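The behavior described above can be sketched as a small Python simulation. This is only a toy model under stated assumptions: the node names, the 1000-AP count, the fixed random seed, and the helpers `build_controller_list` and `pick_controller` are all hypothetical, and the roughly 60 second failover timer is not modeled. It is not the actual SZ or AP implementation.

```python
import random
from collections import Counter


def build_controller_list(nodes, rng):
    """Return the cluster nodes in a pseudo-random order (toy stand-in for the C-list)."""
    clist = list(nodes)
    rng.shuffle(clist)
    return clist


def pick_controller(clist, is_reachable):
    """Walk the list in order and return the first node that responds, else None."""
    for node in clist:
        if is_reachable(node):
            return node
    return None


nodes = ["sz-node-1", "sz-node-2"]   # hypothetical node names
rng = random.Random(42)              # fixed seed so the run is repeatable
aps = {f"ap-{i}": build_controller_list(nodes, rng) for i in range(1000)}

# Both nodes up: each AP joins the first node in its own list.
up = set(nodes)
normal = {ap: pick_controller(cl, lambda n: n in up) for ap, cl in aps.items()}
print(Counter(normal.values()))      # roughly even split across both nodes

# One node stops responding: every AP falls through to the remaining node.
up.discard("sz-node-1")
failover = {ap: pick_controller(cl, lambda n: n in up) for ap, cl in aps.items()}
print(Counter(failover.values()))
```

The point of the sketch is that both nodes carry AP's in normal operation, and the surviving node has to absorb the full load after a failure, which matters for capacity planning.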

In a 3 node cluster, all 3 nodes are active and the AP's should be spread out between the nodes. Any of the nodes can be the leader, which has no special operational activities beyond providing synchronization (it acts as an NTP client to the network and as an NTP server for the other nodes and the AP's, so all devices are running the same time).

The SZ cluster system is designed to survive a single node failure. There are two copies of all data, held on two different nodes, so if one node fails the data is still available. In the case of a 3 or 4 node cluster, it is important to delete a failed node as soon as it is detected so the database can be reconfigured with 2 copies of each piece of data in case another node fails. If 2 nodes fail at the same time, there will be serious database issues which may require restoring a backup.
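A toy Python model can illustrate why two copies survive one failure but not two. The round-robin placement, the shard names, and the helpers `place_replicas` and `lost_items` are assumptions made purely for illustration; how the SZ database actually places its copies is not described here.

```python
from itertools import combinations


def place_replicas(items, nodes, copies=2):
    """Assign each item to `copies` distinct nodes, round-robin (assumed placement)."""
    placement = {}
    for i, item in enumerate(items):
        placement[item] = {nodes[(i + k) % len(nodes)] for k in range(copies)}
    return placement


def lost_items(placement, failed):
    """Return the items whose every copy sits on a failed node."""
    failed = set(failed)
    return sorted(item for item, holders in placement.items() if holders <= failed)


nodes = ["n1", "n2", "n3"]                   # hypothetical 3 node cluster
items = [f"shard-{i}" for i in range(6)]     # hypothetical units of data
placement = place_replicas(items, nodes)

# Any single node failure: every item still has one surviving copy.
single = {n: lost_items(placement, [n]) for n in nodes}
print(single)

# Any two simultaneous failures: some items lose both copies.
double = {pair: lost_items(placement, pair) for pair in combinations(nodes, 2)}
print(double)
```

In this model, removing the failed node and re-running `place_replicas` on the survivors corresponds to restoring two copies of everything before a second failure can occur.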

In vSZ-H or SZ-300 there is an AP balance feature that will re-balance AP's across all nodes if, over time or due to an outage, they have become unbalanced. There is also a node affinity feature (I think also in vSZ-H or SZ-300) that allows you to tie AP Zones to a particular node. This is useful if your nodes are geographically separated.

I hope this information clears up your doubts,

Cheers

Albert

Greg_WiGuy · ‎01-15-2020

Thanks Albert,
This has some serious implications to our design and I'm glad you brought them up before we had a chance to deploy. I'll have to go back and spend some time thinking on this.

Is there a way to design a SZ implementation with geo-redundant failover? 95% of our deployments are sporadic hotspots which use the GRE dataplane feature. We don't mind if the clients undergo re-association+DHCP on a new subnet in the case of major failure which is how we were going to originally do this.

albert_pierson · ‎01-15-2020

Hi Greg,

Ruckus has had a Geo-Redundancy feature since version 3.6, but as far as I can tell it still only supports vSZ-H and SZ-300 and not SZ-100 or vSZ-E. The idea is to have a separate cluster that can act as a Standby Cluster in case of a catastrophic failure of a NOC where an entire cluster becomes unavailable. The configuration is automatically synced from Active to Standby. As of SZ 5.1 this also supports Active::Active redundancy and many-to-one standby modes.

I have been searching Ruckus internal documents and as far as I can tell this feature has not yet been implemented in SZ-100 or vSZ-E controllers.

If I find differently I will update this forum.

Thanks

Albert