Reboot of RSTP master causes continued inaccessibility of RSTP "slave" (2 XMG1915-10Es)

cbeckstein Posts: 17  Freshman Member
10 Comments
edited March 24 in Switch

Both XMG1915-10Es run the latest firmware, V4.80(ACGO.1)20240305 (03/05/2024).

Both switches are set up identically behind a Fritz!Box in standalone mode:

IPv4: static IP addresses (VLAN 1).

IPv6: DHCPv6 (VLAN 1), with the Fritz!Box as DHCPv6 server assigning DNS, IA-PD and IA-NA; the Fritz!Box also assigns ULAs under fd00::

IPv6 client setup on the XMG1915-10Es: IA-NA, Rapid Commit, DNS and Domain List set.

IPv6 router discovery configured for VLAN 1 with both the M and O flags set.
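For reference, RFC 4861 defines M ("Managed address configuration") as indicating that addresses are available via DHCPv6, and O ("Other configuration") as indicating that other configuration information, such as DNS, is available via DHCPv6. Both are bits in the byte following Cur Hop Limit in the Router Advertisement header. The minimal Python sketch below decodes them from a raw ICMPv6 Router Advertisement (IPv6 header already stripped, e.g. bytes exported from a capture); the function name `ra_flags` and this input format are assumptions for illustration, not part of any Zyxel or Fritz!Box tooling.

```python
import struct

RA_TYPE = 134        # ICMPv6 Router Advertisement (RFC 4861)
FLAG_MANAGED = 0x80  # M: addresses are available via DHCPv6
FLAG_OTHER = 0x40    # O: other configuration (e.g. DNS) is available via DHCPv6


def ra_flags(icmpv6: bytes) -> dict:
    """Decode the M and O flags from an ICMPv6 Router Advertisement.

    `icmpv6` is the ICMPv6 message only (no IPv6 header).
    """
    if len(icmpv6) < 16:  # the fixed RA header is 16 bytes
        raise ValueError("too short for a Router Advertisement")
    msg_type, _code, _checksum, hop_limit, flags, lifetime = struct.unpack(
        "!BBHBBH", icmpv6[:8]
    )
    if msg_type != RA_TYPE:
        raise ValueError(f"not a Router Advertisement (type {msg_type})")
    return {
        "managed": bool(flags & FLAG_MANAGED),
        "other": bool(flags & FLAG_OTHER),
        "cur_hop_limit": hop_limit,
        "router_lifetime_s": lifetime,
    }


if __name__ == "__main__":
    # Hypothetical RA: hop limit 64, M and O set, router lifetime 1800 s
    sample = bytes([RA_TYPE, 0, 0, 0, 64, FLAG_MANAGED | FLAG_OTHER])
    sample += (1800).to_bytes(2, "big") + bytes(8)  # lifetime + reachable/retrans timers
    print(ra_flags(sample))
```

With both flags set, as in this setup, hosts are told to use DHCPv6 for addresses as well as for DNS and other options.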

The RSTP master (let us call it AZ) is connected to the slave (let us call it WZ) via their SFP+ ports (media type DAC10G); the network topology is Fritz!Box <-> AZ <-> WZ.

Left alone, both switches and the network run fine and do their job.

Now, when I manually reboot the RSTP master AZ via its web interface (Maintenance -> Reboot System), AZ is back in normal operation within a short time,

BUT even after AZ is back to normal, the RSTP slave WZ remains inaccessible (no login, no ping):

WZ must be rebooted (or power-cycled) by hand to become accessible again.

That does not make sense for an RSTP setup…

100% reproducible.

Should not happen, or does it?

Accepted Solution

  • cbeckstein
    Answer ✓

    No, I did not disconnect all devices from the two switches when I ran the tests described in my previous message…

    I cannot do that at the moment because they are in heavy use. Maybe during the coming weekend, but not during the work week…


All Replies

  • Zyxel_Judy
    Zyxel_Judy Posts: 907  Zyxel Employee
    First Anniversary 10 Comments Friend Collector First Answer

    Hi @cbeckstein,

    > RSTP slave WZ remains inaccessible for login or pings

    Could you let us know if this issue occurs with IPv6 only, or with both IPv4 and IPv6?

    If the issue is isolated to IPv6, please reproduce it, then log in to the RSTP slave WZ using IPv4. Afterward, download the tech-support file and share it with us here, or send it privately via my account > Message.

    Judy

  • cbeckstein
    edited March 27

    I do not really understand your question whether it happens with IPv4 or IPv6.

    Both IPv4 and IPv6 are active on the switches, and when switch WZ becomes inaccessible, I can reach it neither via IPv4 nor via IPv6.

    I will send you the tech-support downloads for both switches privately right away.

    They may also be relevant to my three other posts from yesterday in this community, which also address strange XMG1915-10E behaviour.

    Maybe you, as an expert, can make sense of them right away.

  • Zyxel_Judy
    edited March 28

    Hi @cbeckstein,

    Based on the topology and configuration details you provided, we attempted to replicate the issue with our SFP+ DAC cable but could not reproduce it. After rebooting the AZ, the laptop connected to the AZ was able to ping/access the WZ. To further diagnose the problem, please perform the following tests:

    Case 1: It appears you were using an SFP+ transceiver (FS vendor with Part Number SFP-10G-T100) instead of an SFP+ DAC. If you have no SFP+ DAC cable, we recommend connecting your AZ and WZ devices via an RJ45 port. After making this connection, restart the AZ device and let it complete its boot process. Then verify whether the PC connected to the AZ can successfully ping or access the WZ using IPv4 and IPv6.

    Case 2: Disable IPv6 on both XMG1915 devices, then restart the AZ. Wait for the AZ to fully boot up and verify if the PC connected to the AZ can successfully ping/access the WZ via IPv4.

    Case 3: Turn off the cluster management on the AZ, restart the AZ, and wait for it to boot up. Check if the PC connected to the AZ can ping/access the WZ via IPv4 and IPv6.

    Please conduct these tests and share the results with us to assist in resolving the issue.

    Judy

  • cbeckstein
    edited March 28

    Hi Judy,

    This is driving me nuts… I replaced my two older XS1250-12 Zyxel switches with the XMG1915-10Es because I needed a more flexible but nevertheless reliable network solution…

    Regarding Case 1: I probably was not precise enough.

    Both switches have an SFP+ 10GBASE-T RJ-45 (copper) transceiver in port 10 and are connected via these ports with a regular Cat 7 copper Ethernet cable (SFP+ ports configured for media type DAC10G, as support suggested for such a setup).

    The specs of the two transceivers can be read here: https://www.fs.com/de/products/154925.html

    As the specs show, these are very low-power transceivers (no more than 1.65 W). They are expensive, but I chose them because both switches are mounted in hard-to-reach, poorly ventilated places where I do not want excess heat.

    As long as the switches are not rebooted, these transceivers work like a charm, maintaining full 10G speed between the switches while getting no more than hand-warm.

    Connecting the switches via regular, non-SFP+ RJ-45 ports is not an alternative because those ports are limited to 2.5G. This is not enough for my applications (regular multi-TB backups from devices at the slave to a NAS at the master in parallel with several video streams).

    After reading your latest message, I ran further experiments with my setup and simply could not find a pattern:

    This morning, for example, I first rebooted the RSTP master (Switch_AZ) via its web maintenance page: no problem, the RSTP slave (Switch_WZ) was accessible once the master was back up. I did this repeatedly, no problem.

    Then I rebooted the RSTP "slave" Switch_WZ: it was inaccessible afterward; power-cycling it brought things back to normal.

    Then I rebooted the slave again. At first sight, all seemed fine: Switch_WZ received data (a WLAN radio stream) from its master Switch_AZ, but it was no longer pingable or accessible for login. A closer look at the cluster management status page on the RSTP master (Switch_AZ) showed WZ as Offline (with its cluster member number greyed out), and it was also greyed out in the master's neighbor table. Another reboot of the slave brought things back to normal…

    A complete mess… suitable for publication in the Journal of Irreproducible Results… :-(

    I suspect the switches' DHCPv6 implementation might contribute to this unnerving behavior; see also my other DHCPv6-related post ("Strange IPV6 related behavior shown by a XMG1915-10E configured for DHCPV6" from March 24th) in this community.

    But unfortunately, I depend on IPv6 working in my network…

    And I have to stay with the RSTP setup due to a SONOS multi-room WLAN radio installation where several (but not all) of the SONOS devices are connected to the two switches via LAN.

    Hoping you can still make sense of this, I will send you another copy of the tech-support data of both switches in a few minutes, downloaded after the experiments above. Maybe it also helps clarify the observations from my other DHCPv6-related post mentioned above…

  • cbeckstein
    edited March 28

    One more idea:

    Could it make a difference in my setup whether the slave Switch_WZ is power-cycled or just rebooted via its web interface? Does a plain reboot reinitialize anything differently, for example its SFP+ ports (configured for media type DAC10G)?

  • Zyxel_Judy

    Hi @cbeckstein,

    > This morning, e.g., I first rebooted the RSTP-master (Switch_AZ) via its Web maintenance page: no problem, the RSTP-slave (Switch_WZ) was accessible once the master was back on; did this repeatedly, no problem
    >
    > Then I rebooted the RSTP-"slave" Switch_WZ: it was inaccessible afterward; power cycling it brought things back to normal;

    It seems the issue has shifted: now it is rebooting the "slave" WZ that leaves it inaccessible. Could you please confirm whether any devices in the network topology were moved between the two test sessions?

    > But unfortunately, I depend on IPV6 working in my network…

    To diagnose further, I suggest disabling IPv6 on the WZ device, then restarting it. After it has fully booted, please check whether you can ping or access the WZ.

    Disabling IPv6 will only affect the IPv6 management interface and not the IPv6 traffic forwarding, so there’s no need to worry about disrupting IPv6 traffic.

    > And I have to stay with the RSTP setup due to a SONOS multi-room WLAN radio installation where several (but not all) of the SONOS devices are connected to the two switches via LAN.

    In our experience, it is not necessary to enable RSTP on both the SONOS devices and the switch at the same time. Please check whether RSTP is enabled on your SONOS devices and, if so, disable it.

    Moreover, could you list the devices connected to the WZ? As a diagnostic step, try disconnecting all devices from the WZ, then connecting just one PC or laptop and attempting to replicate the issue. This will help determine whether the problem occurs in the most basic network setup.

    Judy

  • cbeckstein
    edited April 1

    OK:

    1. No change in network topology between the two testing periods, no device moved…
    2. Disabled IPv6 on both AZ and WZ (by disabling the IPv6 interface; all other IPv6-related settings unchanged, as in the tech-support files I sent you a few days ago)
    3. Switched off STP on both switches for the ports wired directly to SONOS devices (SONOS supports and uses only STP, not RSTP like the XMG1915-10Es; see e.g. https://en.community.sonos.com/advanced-setups-229000/sonos-unifi-vlans-and-rstp-clarification-6850552)
    4. Devices on my RSTP "slave" WZ:
       Port 1: Fritz SmartGateway (Zigbee and DECT ULE hub)
       Port 2: Nvidia Shield TV
       Port 3: Panasonic Blu-ray player
       Port 4: Fritz WLAN Repeater as AP
       Port 5: Samsung smart TV
       Port 6: GigaBlue Linux SAT receiver/recorder (a.k.a. Dreambox)
       Port 7: Denon AVR
       Port 8: Sonos Connect
       Port 9: nothing
       Port 10: XMG1915-10E (the RSTP master) via SFP+ 10GBASE-T RJ-45 transceiver configured for DAC10G
    5. Devices on my RSTP master AZ:
       Port 1: raspberrypi
       Port 2: Epson printer
       Port 3: Sonos One
       Port 4: Sonos One
       Port 5: QNAP NAS
       Port 6: laptop
       Port 7: Internet router (Fritz!Box)
       Port 8: nothing
       Port 9: nothing
       Port 10: XMG1915-10E (the RSTP "slave") via SFP+ 10GBASE-T RJ-45 transceiver configured for DAC10G

    (for details of the two SFP+ transceivers used to connect the switches, see my previous post)
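    Whether a given port speaks STP or RSTP (item 3 above) can be checked from a packet capture: both protocols send BPDUs to the bridge group address 01:80:C2:00:00:00 behind an LLC header (DSAP/SSAP 0x42), and the Protocol Version Identifier byte is 0 for STP, 2 for RSTP and 3 for MSTP. The minimal Python sketch below classifies a single raw Ethernet frame. It assumes untagged frames (no 802.1Q tag) already extracted from a capture, and the function name `classify_bpdu` is hypothetical.

```python
from typing import Optional

STP_MULTICAST = bytes.fromhex("0180c2000000")  # IEEE 802.1D bridge group address
LLC_BPDU = b"\x42\x42\x03"  # DSAP 0x42, SSAP 0x42, control 0x03 (UI)

VERSIONS = {0: "STP", 2: "RSTP", 3: "MSTP"}  # Protocol Version Identifier
BPDU_TYPES = {0x00: "Configuration", 0x80: "TCN", 0x02: "RST/MST"}


def classify_bpdu(frame: bytes) -> Optional[str]:
    """Return e.g. "RSTP (RST/MST)" for an untagged 802.3/LLC BPDU, else None."""
    if len(frame) < 21 or frame[:6] != STP_MULTICAST:
        return None
    if int.from_bytes(frame[12:14], "big") > 1500:
        return None  # EtherType field, so not an 802.3 length-framed BPDU
    if frame[14:17] != LLC_BPDU or frame[17:19] != b"\x00\x00":
        return None  # wrong LLC header or non-zero Protocol Identifier
    version, bpdu_type = frame[19], frame[20]
    name = VERSIONS.get(version, f"unknown version {version}")
    return f"{name} ({BPDU_TYPES.get(bpdu_type, hex(bpdu_type))})"


if __name__ == "__main__":
    # Hypothetical RSTP BPDU (source MAC zeroed, BPDU body padded with zeros)
    frame = STP_MULTICAST + bytes(6) + (39).to_bytes(2, "big") + LLC_BPDU
    frame += b"\x00\x00\x02\x02" + bytes(35)
    print(classify_bpdu(frame))  # RSTP (RST/MST)
```

    Frames that are not BPDUs return None, so the function can be run over every frame of a capture and the results grouped by source MAC (bytes 6 to 11) to see which device sends which BPDU version.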

    The problem still persists:

    Sooner or later, when I reboot one of the two switches, the RSTP "slave" WZ becomes inaccessible (no ping, no login).

    WZ must be power-cycled to get things straight again…

    AND, very interestingly, something I had not noticed before:

    even after power-cycling WZ, I still have to reboot the RSTP master AZ for the RSTP slave WZ to become accessible again and for the network as a whole to work; as long as AZ is not rebooted afterwards, it does not see the freshly power-cycled WZ.

  • Zyxel_Judy

    Hi @cbeckstein,

    Thank you for the detailed information.

    Could you explain why you needed to reboot both the AZ and WZ? Is a daily reboot necessary?

    Please note that we are unable to test with various devices at our location, so we can only replicate the issue using a basic network setup as described below.

    In our setup, we were able to access the WZ after rebooting both AZ and WZ. We would appreciate it if you could conduct the following test on your end.

    > Port 10: XMG1915-10E (the RSTP-master) via SFP+ 10GBASE-T RJ-45 transceiver configured for DAC10G

    We suggest changing the media type of port 10 on both XMG1915-10E units to SFP+ to see whether that resolves the issue.

    As an additional troubleshooting step, we recommend disconnecting all devices from the WZ, then connecting only a single PC or laptop to see if the problem persists. This will help identify if the issue is present in a basic network configuration.

    If it is not feasible to disconnect all devices from the WZ, could you try disconnecting devices of the same type to pinpoint the source of the problem?

    Judy

  • cbeckstein
    edited April 2

    I usually do not reboot the two XMG1915-10Es… there is no reason to, because…

    as long as I leave them alone, they run like a charm (apart from the odd strange behaviour; see my other post "Strange IPV6 related behaviour shown by a XMG1915-10E configured for DHCPV6" from March 24th in this community)…

    I only noticed the strange reboot behaviour some two weeks ago, after powering down the whole network for a few days over the holidays. That made me curious why the XMG1915-10Es behave that way and whether it might be (another) firmware problem.

    Following your latest message, I switched the media type of port 10 (the SFP+ port holding the 10GBASE-T RJ-45 transceiver, connected via copper cable) on both switches from DAC10G back to the default, SFP+.

    That made no change (network and testing setup like in your diagram):

    1. rebooted AZ, everything fine, WZ still accessible
    2. rebooted WZ: WZ no longer accessible; the WZ entry (port 10) in AZ's neighbourhood table still shows the data from before the WZ reboot, but is now greyed out instead of green
    3. rebooted AZ and everything is back to normal, WZ accessible again

    It looks as if, after WZ reboots, AZ somehow does not recognize the freshly restarted WZ the way it (successfully) does when AZ itself is rebooted.

  • Zyxel_Judy
    edited April 8

    Hi @cbeckstein,

    > That made no change (network and testing setup like in your diagram):

    Are you saying that you disconnected all devices from the two switches to replicate the issue? So your network topology should resemble the one depicted (ignore the port numbers), correct?

    To assist you more effectively, we would like to obtain a packet capture and attempt to reproduce the issue. Please use the topology above (with no other devices connected to the switches) and use Wireshark to capture packets for about 10 minutes. The provided image illustrates how to configure port mirroring for the packet capture:

    Port 5 on Switch AZ (connected to the Fritz!Box router) and Port 10 on Switch AZ (connected to Switch WZ) are mirrored to Port 1 (connected to the laptop). Then start capturing packets, reboot the WZ switch, and try to access its interface.

    Note: The packet file might be too large for transferring directly via Community; please upload it to Google Drive or another cloud storage service and share the download link with us.

    Judy