03-26-2025 02:24 PM - edited 03-26-2025 02:26 PM
I recently joined a company that uses ICX7450s as core and distribution switches.
The distribution switch is connected by trunks to six third-party field switches.
Additionally, the field switches are interconnected in pairs:
1 and 2 are connected to each other
3 and 4 are connected to each other
5 and 6 are connected to each other
All field switches are Layer 2 only, so unfortunately routing is not an option.
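As a quick check on the topology above: treating the distribution switch and the six field switches as nodes, and the six trunks plus the three pair links as edges, the cycle rank (edges - nodes + connected components) gives the number of independent loops that STP has to break. This is a minimal Python sketch; the switch names are hypothetical labels, and the core switch is left out because its link to the distribution switch isn't described.

```python
# Hypothetical labels: "dist" is the distribution ICX7450, "fs1".."fs6" the field switches.
TRUNKS = [("dist", f"fs{i}") for i in range(1, 7)]
PAIR_LINKS = [("fs1", "fs2"), ("fs3", "fs4"), ("fs5", "fs6")]


def count_components(nodes, edges):
    """Count connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


def independent_loops(edges):
    """Cycle rank of an undirected graph: E - V + C."""
    nodes = {n for edge in edges for n in edge}
    return len(edges) - len(nodes) + count_components(nodes, edges)


print(independent_loops(TRUNKS + PAIR_LINKS))  # 3: one loop per field-switch pair
print(independent_loops(TRUNKS))               # 0: the trunks alone form a tree
```

So three ports (one in each loop) have to end up blocked, which is what shutting down one link per pair does by hand.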
The distribution switch runs per-VLAN 802-1w; all field switches run standard (single-instance) RSTP.
The core switch has STP completely disabled (no idea why).
The distribution switch has its bridge priority set to 0 on all VLANs.
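For reference, STP and RSTP elect the root by comparing bridge IDs (bridge priority first, then MAC address), and the lowest value wins. This minimal Python sketch of that comparison assumes the field switches keep the IEEE default priority of 32768. The switch names and MAC addresses are made up. In per-VLAN mode the distribution switch runs this election separately in each VLAN's instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int  # bridge priority; 32768 is the IEEE default
    mac: str       # base MAC address, e.g. "00:11:22:33:44:01" (hypothetical)

    def bridge_id(self) -> tuple[int, int]:
        # Compare priority first, then the MAC as a 48-bit integer; lower wins.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges: list[Bridge]) -> Bridge:
    """Return the bridge with the lowest bridge ID."""
    return min(bridges, key=Bridge.bridge_id)


# Hypothetical inventory: distribution switch at priority 0, field switches at default.
bridges = [Bridge("dist", 0, "02:00:00:00:00:10")] + [
    Bridge(f"fs{i}", 32768, f"00:11:22:33:44:0{i}") for i in range(1, 7)
]
print(elect_root(bridges).name)   # dist: priority 0 beats any bridge at 32768

# With the default priority everywhere, the lowest MAC wins instead.
defaults = [Bridge(b.name, 32768, b.mac) for b in bridges]
print(elect_root(defaults).name)  # fs1
```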
The problem we have:
On the distribution switch we always have to keep one port shut down in each pair: the port to either switch 1 or switch 2, either 3 or 4, and either 5 or 6. When we enable both links and create the loop, the behavior is unpredictable. Sometimes a broadcast storm starts right away and the distribution 7450's CPU spikes to 20%, but surprisingly there were many times when everything worked fine for hours.
In one such case I logged in to the field switches and saw that they had correctly selected an alternate path and put one port in the discarding state, with no CPU spike on the distribution switch at all. I went home, only to wake up to hundreds of emails from SolarWinds and a completely dead network. It turned out that, for whatever reason, one field switch had moved its discarding port to forwarding.
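The "one port discarding" state described above is RSTP's alternate port role. This simplified Python sketch shows how that role is chosen on a pair link (for example between switch 1 and switch 2) when both switches have a direct trunk to the root. Each end advertises its root path cost onto the link. The end with the better (cost, bridge ID) becomes designated, and the other end becomes alternate. The sketch assumes the distribution switch is the root and uses equal hypothetical path costs of 20000 (the IEEE 802.1D-2004 recommended value for 1 Gbps). It also ignores port-ID tie-breaking, and it is not any vendor's implementation.

```python
BridgeId = tuple[int, int]  # (priority, MAC as integer); lower is better


def pair_link_roles(a_id: BridgeId, b_id: BridgeId,
                    a_uplink: int, b_uplink: int, link_cost: int) -> tuple[str, str]:
    """Roles of A's and B's ports on the A-B link, with A and B both trunked to the root."""
    # A switch uses the pair link as its root port only if going through the
    # neighbour is strictly cheaper; on a cost tie the BPDU sent by the root
    # itself wins, because the root has the lowest bridge ID.
    a_via_link = b_uplink + link_cost < a_uplink
    b_via_link = a_uplink + link_cost < b_uplink
    if a_via_link:
        return ("root", "designated")
    if b_via_link:
        return ("designated", "root")
    # Neither end needs the link to reach the root: both advertise their own root
    # path cost onto the link, the better (cost, bridge ID) end is designated and
    # the other end is alternate (discarding).
    if (a_uplink, a_id) < (b_uplink, b_id):
        return ("designated", "alternate")
    return ("alternate", "designated")


fs1 = (32768, 0x001122334401)  # hypothetical bridge IDs
fs2 = (32768, 0x001122334402)
print(pair_link_roles(fs1, fs2, 20000, 20000, 20000))  # ('designated', 'alternate')
```

The alternate role only holds while the port keeps receiving the designated neighbour's BPDUs. In 802.1w, received information is aged out after three missed hello intervals (6 seconds with the default 2-second hello), after which the port can take the designated role and move toward forwarding.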
Suspecting that classic RSTP on the field switches could not interoperate with per-VLAN RSTP on the 7450, I enabled single-instance 802-1w on the distribution switch, so in theory everything except the core now runs classic RSTP. This made the problem even worse: a broadcast storm starts as soon as we create the loop.
I have already spent two weeks fighting this and I'm pulling my hair out. Why is it happening?
Oh, I should add that our distribution switch is running SPR (router) code because we have VE interfaces on it. Does SPR not work with STP?
Anyway, please help me identify the problem.