So here are my latest findings on this issue.
I recently migrated from a single-node vSZ to a two-node cluster running 5.1.1.0.598. While working with support to get the licenses migrated, we chatted about this issue. Two things came out of the conversation, both of which I am in the process of implementing.
First, we switched from proxy (controller-based) RADIUS to AP-based RADIUS. I had opted against this in our original deployment years ago because with AP-based auth each AP sends its own RADIUS requests, and I did not want to create ~200 RADIUS clients on each RADIUS server. Turns out, if all your APs are on the same subnet, you can specify that subnet's CIDR as the client IP and cover all APs with a single entry. Then you configure your WLANs to use an AP-based RADIUS auth method.
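If you want to sanity-check that one CIDR entry really covers every AP before switching over, here's a rough Python sketch using the standard `ipaddress` module. The AP addresses and subnets are hypothetical placeholders, and the function names are my own, not anything from SmartZone or the RADIUS server.

```python
import ipaddress


def uncovered_aps(ap_ips, client_cidr):
    """Return the AP addresses that a single RADIUS client entry for
    client_cidr would NOT match (empty list means full coverage)."""
    # strict=True raises ValueError if the CIDR has host bits set,
    # which catches typos like 10.20.0.5/24.
    network = ipaddress.ip_network(client_cidr, strict=True)
    return [ip for ip in ap_ips if ipaddress.ip_address(ip) not in network]


def smallest_covering_network(ap_ips):
    """Return the smallest single CIDR block containing every AP address.
    Assumes all addresses are the same IP version."""
    addrs = sorted(ipaddress.ip_address(ip) for ip in ap_ips)
    lowest, highest = addrs[0], addrs[-1]
    # Shorten the prefix until one network holds both the lowest and highest
    # address; everything in between is then covered as well.
    for prefix in range(lowest.max_prefixlen, -1, -1):
        net = ipaddress.ip_network(f"{lowest}/{prefix}", strict=False)
        if highest in net:
            return net


# Hypothetical AP addresses for illustration only.
aps = ["10.20.0.11", "10.20.0.87", "10.20.1.5"]
print(uncovered_aps(aps, "10.20.0.0/24"))  # the AP outside a /24
print(smallest_covering_network(aps))      # tightest single entry
```

One thing to keep in mind: the wider the CIDR, the more hosts the RADIUS server will accept requests from (as long as they present the shared secret), so it's worth keeping the entry as tight as the AP subnet allows.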
Second, 5.1.1.0.598 apparently has a known issue in which the RADIUS service restarts unexpectedly. This is supposedly resolved in 5.1.2. I am planning a maintenance window to perform that upgrade after I let the AP auth changes settle.