We have multiple ICX 7750 and 7450 switches in our DCs. At random, SSH stops working after roughly 2 to 4 weeks. To fix it we have to go to the DC, console onto the device, and run the crypto zeroise and then crypto generate CLI commands. This is disruptive and time-consuming. Any ideas why this is happening? The code version is 8.0.40, and ip ssh idle-time has also been set to 10 minutes, but the issue still occurs.
It is probably a bug. We have NOT encountered this, though years ago there was a bug where we could not connect to a switch that hadn't been connected to in a long time. When you opened an SSH session it would hang for a while and then fail, but a subsequent attempt worked fine because the process had started up on the switch.
I do not remember what version that was; presumably some ancient 08.0.30 code from years ago. Honestly, everything has been rock solid for us on all our 64xx, 6610, 7150 and 7450 switches.
We are running 08.0.80ca on everything that supports it, and it has been bug-free for us, as has 08.0.30sa on everything that doesn't support 08.0.80ca.
Here is how we do our SSH configuration. I am not saying you are doing yours wrong; collectively, these are the configuration options we use that have something to do with SSH.
Then we make a list where device management can come from... edit to suit your taste:
ip access-list standard 99
 permit host 10.1.2.3
 permit 10.1.0.0 0.0.255.255
!
exit
Without Radius we use this block:
aaa authentication web-server default local
aaa authentication enable default local
aaa authentication login default local
aaa authentication login privilege-mode
enable aaa console
console timeout 30
no telnet server
no web-management http
web-management https
!
ssh access-group 99
web access-group 99
!
ip ssh authentication-retries 2
ip ssh timeout 30
ip ssh idle-time 30
ip ssh scp disable
ip ssh encryption disable-aes-cbc
I would probably try a code update and consider adding or changing some of your configuration options to include some of the above arguments... certainly there is quite a bit more than SSH going on here, but I included related material.
We have most of this configured. For a bit more detail: SSH works fine for a period of time and then just stops, so we cannot use SSH to connect. Attending a remote DC to connect on console and fix it is time-consuming. We suspected that a security monitoring system we have, which runs multiple connections to each switch, might be causing the problem, so we dropped ip ssh idle-time to 10 minutes in the hope of not exhausting the number of SSH sessions on a device. The really confusing part is that another network of ours, with the same setup and the same security monitoring tool, and with ICXs used for its OoB network running the same version of code, never has the issue.
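One way to narrow down when SSH actually stops answering, as a rough sketch rather than a vendor tool: a small Python probe that opens TCP port 22 on each switch and checks whether the server sends an SSH identification string. The function names (probe_ssh, log_line, probe_all) and any host addresses passed in are hypothetical, and it assumes the switch sends its banner as the first line. It does not log in, so each probe is only a short-lived TCP connection.

```python
import socket
from datetime import datetime, timezone


def probe_ssh(host, port=22, timeout=5.0):
    """Return (ok, detail); ok is True if the server sent an SSH banner."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            banner = sock.recv(256)  # e.g. b"SSH-2.0-...\r\n"
    except OSError as exc:  # refused, unreachable, or timed out
        return False, f"connect/read failed: {exc}"
    text = banner.decode("ascii", errors="replace").strip()
    if text.startswith("SSH-"):
        return True, text
    return False, f"unexpected banner: {text!r}"


def log_line(host, ok, detail):
    """Format one timestamped (UTC) result line for a log file."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    state = "UP" if ok else "DOWN"
    return f"{stamp} {host} {state} {detail}"


def probe_all(hosts, port=22, timeout=5.0):
    """Probe each host once and return the formatted log lines."""
    lines = []
    for host in hosts:
        ok, detail = probe_ssh(host, port, timeout)
        lines.append(log_line(host, ok, detail))
    return lines
```

Calling probe_all with the management addresses every few minutes from a scheduler and appending the lines to a file would give a timestamp for the first DOWN result, which can then be compared against the monitoring tool's connection logs on both the affected and the unaffected network.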