[NEBULA] Switch port in blocking state

flottmedia  Posts: 56  Ally Member
edited April 2021 in Nebula

We have a client installation with two Hyper-V servers on a 1920-24HPv2, each of them connected with two NICs using LACP (dynamic, based on the Windows Server teaming setting). After working fine for weeks, one of the servers now continuously loses its LACP state and gets completely disconnected by the switch without any obvious reason. The LACP port goes into "blocking" state without any further entries in the event logs, and there is no visible way to reset that state manually via Nebula. The only way to bring the server back up is to disable one of the team NICs. Afterwards the server is immediately reachable again, and LACP also works fine for a while after re-adding the second NIC to the team, although the port stays in blocking state. The second (absolutely identically configured) server does not have or cause this issue.

Now, the questions are:

  1. How can we find out what causes the blocking (event logs, debug, ...)?
  2. How can we manually unblock the port group via Nebula without restarting the switch?
  3. Can the issue have something to do with the different MAC addresses coming from the Hyper-V host and the VMs over the teamed NIC?
  4. Why don't teamed NICs appear correctly in the client list(s)? When using the NICs without the Windows Server teaming mode, the MAC addresses of each NIC appear correctly. After enabling teaming on the two NICs (with a different MAC), there is no longer an entry in the client list for the servers, neither for the Hyper-V host or one of its NICs, nor for the VMs "behind" it (see the sketch below).
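
To illustrate questions 3 and 4: a minimal sketch of how one could check which source MAC addresses actually leave the teamed interface (the host's team MAC vs. the VM MACs), assuming Python 3 with Scapy and a capture driver such as Npcap on the Hyper-V host; the interface name is only a placeholder:

```python
# Minimal sketch: list the unique source MACs seen on the teamed / vSwitch interface,
# to compare against what the Nebula client list shows for the host and the VMs.
# Assumptions: Python 3, Scapy, and a capture driver (e.g. Npcap) on the Hyper-V host;
# the interface name below is a placeholder.
from scapy.all import Ether, sniff

IFACE = "vEthernet (Team)"   # placeholder for the Hyper-V vSwitch / team interface
seen = set()

def note(pkt):
    if Ether in pkt:
        src = pkt[Ether].src
        if src not in seen:
            seen.add(src)
            print(f"New source MAC: {src}")

# Watch the interface for a minute and report every distinct source MAC
sniff(iface=IFACE, prn=note, store=False, timeout=60)
print(f"{len(seen)} distinct source MAC(s) observed")
```

The MACs printed there are the ones the switch should be learning, so they can be compared directly against what the Nebula client list shows.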

All Replies

  • Zyxel_Jason  Posts: 394  Master Member

    Hi @flottmedia ,

    For your questions,

    Currently, we don't have event logs for when a port changes its state from forwarding to blocking. I will post a new topic for this in the Idea section.

    The change to blocking state is most likely because the port no longer receives LACP packets; it will recover to forwarding state once it receives LACP packets again.

    Since you mentioned that you can bring it back by disabling the NICs, I think the problem may be on the NICs rather than on the switch. You can verify that by mirroring/capturing the LACP ports' packets with Wireshark to see whether Windows Server keeps sending LACP packets to the switch.

    When switch ports are running LACP, the switch needs LACP packets to maintain the port state. Perhaps you can use static mode instead of LACP mode to work around the problem; in static mode, the switch doesn't need any protocol to maintain the link.
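
    If it is easier to check from a script on the server than with the Wireshark GUI, here is a minimal sketch along those lines, assuming Python 3 with Scapy and a capture driver such as Npcap (the interface name "Ethernet" is only a placeholder):

```python
# Minimal sketch: check whether LACPDUs are seen on a single NIC.
# Assumptions: Python 3, Scapy, and a capture driver (e.g. Npcap);
# the interface name below is a placeholder.
from scapy.all import Ether, sniff

IFACE = "Ethernet"   # placeholder, use the physical NIC name shown by Wireshark/Scapy

def report(pkt):
    payload = bytes(pkt[Ether].payload)
    # Slow Protocols subtype 1 = LACP
    if payload and payload[0] == 1:
        print(f"LACPDU from {pkt[Ether].src}")

# BPF filter for the Slow Protocols EtherType (0x8809) that LACP uses
sniff(iface=IFACE, filter="ether proto 0x8809", prn=report, store=False)
```

    If LACPDUs show up roughly every second (fast rate) or every 30 seconds (slow rate), that side is still transmitting.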


    Hope it helps.

    Jason
  • flottmedia  Posts: 56  Ally Member

    Thanks for the detailed answer, @Nebula_Jason!

    Regarding LACP packets, I assume the Nebula switch does try to keep the port up when only one of the server NICs in the team is still up? Otherwise the redundancy simply wouldn't make sense, right? In our case one of the NICs on the Windows Server got (somehow) disconnected during the copy of a large file (> 1 TB) from the other, identically configured server, which also uses teaming and which stayed up. The other NIC on the disconnected server still appeared to be connected at the Windows OS level (team configuration). So the team connection was still alive for Windows, but all traffic seemed to be completely blocked by the NSW. Given your statement above about the requirement for LACP packets, I would assume that a Windows team NIC with one disconnected member should still send LACP packets, shouldn't it?

    We are currently trying to reproduce the issue, but I couldn't find a way to capture any LACP packets with Wireshark.

    We have the following NICs (screenshot).

    The two physical HP NICs are teamed with LACP (screenshot).

    In Wireshark the interfaces appear as shown in the screenshot.

    "Behind" the vSwitch there are a few more VMs on the server (Hyper-V host).

    With this configuration, which NIC(s) would you capture on, and with which Wireshark filters, in order to see whether LACP packets are sent or not?

  • flottmedia  Posts: 56  Ally Member
    edited July 2019

    Maybe one further addition: we already tried the capture/display filter(s) from https://wiki.wireshark.org/LinkAggregationControlProtocol on ALL NICs simultaneously, without a single LACP packet being captured by Wireshark on either server.
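
    In case it helps to reproduce: an equivalent scripted check we could run on all NICs at once, as a minimal sketch assuming Python 3 with Scapy and a capture driver such as Npcap (the interface names are placeholders):

```python
# Minimal sketch: watch several NICs at once and count LACP frames per interface.
# Assumptions: Python 3, Scapy, and a capture driver (e.g. Npcap); the interface
# names below are placeholders.
import time
from collections import Counter
from scapy.all import AsyncSniffer, Ether

IFACES = ["Ethernet", "Ethernet 2", "vEthernet (Team)"]   # placeholders
counts = Counter()

def make_handler(iface):
    def handler(pkt):
        counts[iface] += 1
        print(f"LACP frame on {iface} from {pkt[Ether].src}")
    return handler

# One background sniffer per NIC, filtered on the Slow Protocols EtherType (0x8809)
sniffers = [
    AsyncSniffer(iface=i, filter="ether proto 0x8809", prn=make_handler(i), store=False)
    for i in IFACES
]
for s in sniffers:
    s.start()

time.sleep(60)   # capture window; LACPDUs are normally sent every 1 or 30 seconds

for s in sniffers:
    s.stop()
print(dict(counts) if counts else "No LACP frames seen on any monitored NIC")
```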

    @Nebula_Jason: Could you perhaps try that with your Windows Server from the screenshot above (Server 2019 dynamic LACP-Team, Hyper-V vSwitch enabled on the Team-NIC with Shared parent partition for Host access)?

  • Zyxel_Jason  Posts: 394  Master Member
    edited August 2019

    Hi @flottmedia ,

    In my local test, even when the port is in blocking state, the switch still sends LACP packets to the peer side (in your case, the Windows server). So if you don't see any LACP packets, maybe you chose the wrong NIC to monitor in Wireshark.

    If you are not sure which NIC to choose on your Windows Server, you can connect another PC to the switch and mirror the packets to it from the switch side.

    You may follow the steps below:

    1. Log in to Nebula CC and access your site.

    2. Go to "SWITCH > Configure > Switch configuration > Port mirroring".

    3. Choose your switch's MAC address, then set the destination port to the one connected to your PC and the source port to the one connected to your server.

    4. Run Wireshark on your PC and choose your PC's Ethernet NIC.

    5. You should be able to see LACP packets in Wireshark (see the sketch below).
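
    Once the mirror is in place, grouping the captured LACPDUs by source MAC tells you whether the switch, the server, or both are still transmitting. A minimal sketch, again assuming Python 3 with Scapy and a capture driver on the monitoring PC (the interface name is a placeholder):

```python
# Minimal sketch: on the mirror PC, group LACP frames by source MAC so you can see
# whether the switch, the server, or both are still sending LACPDUs.
# Assumptions: Python 3, Scapy, and a capture driver (e.g. Npcap); the interface
# name below is a placeholder.
from collections import Counter
from scapy.all import Ether, sniff

senders = Counter()

def tally(pkt):
    senders[pkt[Ether].src] += 1

# Capture mirrored traffic for 60 seconds, keeping only Slow Protocols frames (EtherType 0x8809)
sniff(iface="Ethernet", filter="ether proto 0x8809", prn=tally, store=False, timeout=60)

for mac, count in senders.items():
    # one MAC should belong to the switch port, the other to the server NIC
    print(f"{mac}: {count} LACPDU(s)")
```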


    BTW, you may also connect your PC directly to your server when the issue happens, to see if your PC can capture LACP packets from the server.

    Hope it helps.

    Jason
