11-16-2023 12:54 PM
We recently noticed a recurring issue in which numerous Access Points (APs) randomly disconnected from the control interface of the vSZ-H and then hopped between cluster nodes. Refer to the checklist below to understand the issue and find a workaround.
Issue noticed:
Numerous APs were disconnecting from the control planes and bouncing between cluster nodes. The node changes were frequent but irregular: the same APs were sometimes observed hopping between nodes every minute, while at other times the issue recurred only after several hours.
Issue observed in vSZ-H version 6.1.1.0.959.
Checklist:
Observed Impact:
The impact can range from minimal to severe, depending on the total number of APs or on the number of APs associated with the affected cluster node. A change of cluster node triggers a restart of the SSH tunnels, after which the APs download the updated configuration from the zone. This AP bounce and configuration download can disrupt Wi-Fi operation, and the impact grows significantly as the number of affected APs increases.
Logs to confirm:
Look for the following logs in the AP support info files:
Sep 14 23:14:05 xxxxxxx-AP daemon.err idm: httpRecv receive fail
Sep 14 23:14:05 xxxxxxx-AP daemon.info rsmd_func[9098]: SSH Tunnel Stopped
Sep 14 23:14:05 xxxxxxx-AP daemon.notice rsmd[45]: sshclient ....... [stopped] (0.686)
Sep 14 23:14:05 xxxxxxx-AP daemon.err rsmd_func[9115]: SSHtunnel: Cannot start SSH-Tunnel. rsm_ip6_sgetSettingsWrapper() function failed to execute. RSM API Return value = 35 : unknown err code 35
Sep 9 00:08:55 xxxxxxx-AP user.err syslog: dbclient - Restarting SSH tunnel due to dbclient restarting itself
Sep 9 00:08:55 xxxxxxx-AP user.info syslog: /usr/bin/dbclient: Connection to sshtunnel@172.25.207.192:22 exited: No auth methods could be used.
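When many AP support info files need checking, the signatures above can be counted with a short script. This is only a minimal sketch: the substrings are copied from the log excerpts above, while the function name and the labels are hypothetical and not part of any Ruckus tooling.

```python
from collections import Counter

# Substrings copied from the AP log excerpts above.
# The labels on the left are hypothetical names chosen for this sketch.
AP_SIGNATURES = {
    "idm_http_recv_fail": "httpRecv receive fail",
    "tunnel_stopped": "SSH Tunnel Stopped",
    "tunnel_start_failed": "Cannot start SSH-Tunnel",
    "dbclient_restart": "Restarting SSH tunnel due to dbclient restarting itself",
    "ssh_auth_failed": "No auth methods could be used",
}


def count_ap_signatures(lines):
    """Count how many log lines contain each known signature."""
    counts = Counter()
    for line in lines:
        for label, needle in AP_SIGNATURES.items():
            if needle in line:
                counts[label] += 1
    return counts
```

To use it, pass any iterable of log lines, for example an open file handle for the extracted AP log. A non-zero count for ssh_auth_failed together with tunnel_stopped would match the pattern described in this article.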
In the SZ logs, we observed an AuthorizedKeysCommand error, and the connection was closed by the AP.
2023-09-27T18:22:18+00:00 vszxxxxx sshd[12236]: error: AuthorizedKeysCommand /usr/bin/sshtunnel_auth_key.sh sshtunnel ssh-rsa
2023-09-27T18:22:18+00:00 vszxxxxxx sshd[12236]: Connection closed by 10.x.x.x.x port 49404
2023-09-27T18:29:54+00:00 vszxxxxxx sshtunnel_auth_key[29977]: Can not find key ssh-rsa
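Because the issue is intermittent, it can help to see how the key errors are spread over time. The sketch below groups the SZ-side error lines by hour. It assumes each line starts with an ISO 8601 timestamp, as in the excerpts above; the function name and the bucket format are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Substrings copied from the SZ log excerpts above.
SZ_NEEDLES = ("Can not find key", "error: AuthorizedKeysCommand")


def key_errors_per_hour(lines):
    """Count SSH tunnel key errors per hour, keyed by 'YYYY-MM-DD HH:00'."""
    buckets = Counter()
    for line in lines:
        if not any(needle in line for needle in SZ_NEEDLES):
            continue
        # Assumption: the timestamp is the first space-separated field.
        stamp = line.split(" ", 1)[0]
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines that do not start with a parsable timestamp
        buckets[when.strftime("%Y-%m-%d %H:00")] += 1
    return buckets
```

Hours with a high count can then be compared against the times at which APs were seen moving between nodes.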
Based on the logs and conditions above, a support engineer can narrow down the cause of the "Can not find key" error by testing various combinations. In our specific case, the analysis indicated that a potential trigger is a situation where the timeout of the Command Line Interface (CLI) call to keycached is shorter than the timeout duration set for keycached itself.
Direction to resolve:
If you observe similar behavior, follow the checklist below and gather the following information: