Problem solved. For me, it was a router problem. Normally, my infrastructure has reserved IP addresses and specific host name configurations. After a few router issues, I’d been swapping routers around and got lazy, so my infrastructure had been pulling IP addresses from the DHCP pool with no special configuration. I reserved IP addresses for my two problematic APs and configured their host names as I usually do. I haven’t had a heartbeat problem since.
I don’t set a preference on which AP serves as the master, primarily for failover purposes. What I notice in this configuration is that they masquerade as each other. For example, if I try to connect to the IP address of the non-master, it will connect me to the master. Looking at my router, I noticed two things:
- Around the time the APs would lose their heartbeat, they would also disappear from the DHCP table.
- When the APs were present in the DHCP table (in the pool), they had the same host name.
My theory is that because the two APs masquerade as each other, at least from the router’s perspective, the router got confused. When the APs attempted a heartbeat, the packets would occasionally go to the wrong AP and the heartbeat would fail.
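If anyone wants to check their own router for the same symptom (two devices in the pool sharing one host name), here is a rough Python sketch. It assumes the lease table can be exported as text in a dnsmasq-style layout (expiry, MAC, IP, host name, client ID per line); my router may not match that, and the function name and sample data below are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_hostnames(lease_text):
    """Return {hostname: [(mac, ip), ...]} for names claimed by more than one MAC.

    Assumes dnsmasq-style lease lines: expiry, MAC, IP, hostname, client ID.
    """
    by_name = defaultdict(set)
    for line in lease_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _expiry, mac, ip, hostname = fields[:4]
        if hostname == "*":
            continue  # "*" means the client sent no host name
        by_name[hostname.lower()].add((mac.lower(), ip))
    # Keep only host names that appear under more than one MAC address.
    return {
        name: sorted(entries)
        for name, entries in by_name.items()
        if len({mac for mac, _ in entries}) > 1
    }


# Hypothetical sample: two APs that both registered as "ap".
SAMPLE = """\
1700000000 aa:bb:cc:00:00:01 192.168.1.50 ap 01:aa:bb:cc:00:00:01
1700000100 aa:bb:cc:00:00:02 192.168.1.51 ap 01:aa:bb:cc:00:00:02
1700000200 aa:bb:cc:00:00:03 192.168.1.60 laptop *
"""

if __name__ == "__main__":
    for name, entries in find_duplicate_hostnames(SAMPLE).items():
        print(name, entries)
```

Any host name it reports is a candidate for a reserved IP address and a unique host name, which is what fixed it for me.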
So far so good. I’m still running 200.7.