XGS3700 LACP link aggregation issue

bzzt Posts: 6
edited August 2022 in Switch
The issue: LACP LAG loses connectivity when one specific link of the group goes down

We have the following setup:

Two XGS3700-24 switches in a stack (the top one in the diagram is the master) are connected to a Cisco WS-C3750E-24TD via an LACP LAG. All devices are in management VLAN 1; the stack address is 10.1.253.7 and the admin PC address is 10.1.253.10.

Network topology: (image not preserved)

LACP status on XGS3700: (image not preserved)

LACP config on XGS3700:

lacp
trunk T1
trunk T1 lacp
trunk T1 interface 1/4
trunk T1 interface 2/4

LACP config on Cisco:

interface Port-channel3
 switchport trunk encapsulation dot1q
 switchport mode trunk
end

interface GigabitEthernet1/0/3
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-protocol lacp
 channel-group 3 mode active
end

interface GigabitEthernet1/0/4
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-protocol lacp
 channel-group 3 mode active
end

The experiment:
The PC continuously pings the XGS3700 stack IP address to monitor connectivity.
From the Cisco switch CLI we first put port 1/4 into the down state; the connection to the switch stack persists.
We bring port 1/4 back up.
From the Cisco switch CLI we then put port 1/3 into the down state; the connection to the switch stack fails.
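
For reference, the port toggling in the steps above can be sketched as the IOS session below. This is only an illustration: it assumes that "1/4" and "1/3" refer to GigabitEthernet1/0/4 and GigabitEthernet1/0/3 from the Cisco config above, which the steps do not spell out. The comments repeat the results reported in this post.

```
! Assumption: "1/4" = GigabitEthernet1/0/4, "1/3" = GigabitEthernet1/0/3
configure terminal
 interface GigabitEthernet1/0/4
  shutdown
  ! reported result: ping to 10.1.253.7 keeps answering
  no shutdown
 interface GigabitEthernet1/0/3
  shutdown
  ! reported result: ping to 10.1.253.7 stops answering
 end
! Show which member ports are still bundled in the port-channel
show etherchannel summary
```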

We have tried this experiment between two Cisco switches and it worked just fine (the connection between the switches persists as long as at least one link in the LACP LAG is up).


All Replies

  • Zyxel小編 Lucious Posts: 278  Zyxel Employee
    Answer ✓
    Hi @bzzt

    This is a known issue: LACP works incorrectly in stacking operation.
    Please use the attached datecode to resolve the problem.

    Zyxel_Lucious
  • imaohw Posts: 123  Ally Member
    @Zyxel_Lucious - I worked on a similar XGS3700 LACP LAG issue (between a USG1100 and an XGS3700 in a four-switch stack) with several Zyxel staff, and no one ever offered a datecode fix. The issue remains unresolved. Is this a recent datecode firmware? Can you explain exactly which known LAG issue this datecode fixes?

    I am currently using V4.30(AAGC.2)_2020131 (AAGF.2 on the XGS3700-48HP), which resolves other issues. I have three XGS3700-24 and one XGS3700-48HP in the stack.

  • imaohw Posts: 123  Ally Member
    @bzzt - when both of your links were up, did you verify that traffic was flowing on both links (in both directions) successfully?

    What I was seeing on my stacked XGS3700s LACP links was that most of the traffic was only on one link and that the second link was not always working in both directions.  This was causing intermittent network issues even when both links were up.

    With only the first link up everything worked fine.  With only the second link up nothing worked. 
  • Zyxel小編 Lucious Posts: 278  Zyxel Employee
    edited September 2020
    @imaohw

    The datecode 430AAGC2C0_20200306 is based on V4.30(AAGC.2)_2020131.
    It contains new bug fixes for a system crash issue, but none for LACP.

    The LACP issue I mentioned is just as @bzzt described: the connection fails when one of the LACP links is intentionally disconnected.
    This issue was already fixed in an earlier datecode in 2019, so the code you have in hand should include the fix.

    After checking your discussion (PM) with my colleague @Zyxel_Derrick, the issue you have is probably caused by the USG rather than the switch, as both our test results and yours indicate.

    Zyxel_Lucious
  • bzzt Posts: 6
    Zyxel_Lucious said:
    Hi @bzzt

    This is a known issue that LACP working incorrectly in stacking operation.
    Please use attached datecode to resolve the problem.

    Zyxel_Lucious

    Works like a charm, the LACP issue is resolved. Why isn't this firmware available in the download section for the switch?

    imaohw said:
    @bzzt - when both of your links were up, did you verify that traffic was flowing on both links (in both directions) successfully?

    What I was seeing on my stacked XGS3700s LACP links was that most of the traffic was only on one link and that the second link was not always working in both directions.  This was causing intermittent network issues even when both links were up.

    With only the first link up everything worked fine.  With only the second link up nothing worked. 
    Same here: most of the traffic was flowing through a single link, the one that takes down the entire LACP group when you disable it. I didn't see any network issues with both links up, likely because it was a test network segment that didn't have any workload on it.
  • imaohw Posts: 123  Ally Member
    @Zyxel_Lucious - thanks for the info.  I was hopeful that there might be a solution that I had missed. For the time being I am not using the LAGs as I need 100% uptime and can’t experiment with the USG1100 or the XGS3700s.

    When the pandemic is in the past I will get back to trying to resolve the LAG issue.
  • Zyxel小編 Lucious Posts: 278  Zyxel Employee
    edited September 2020
    @bzzt

    Glad to hear it worked.
    As for your question:
    "Why isn't this firmware available in download section for the switch?"
    Because it is only a datecode build for issues in particular cases. All of the bug fixes will be included in the next official patch firmware for the XGS3700 series.

    Zyxel_Lucious
  • Zyxel_Derrick Posts: 126  Zyxel Employee
    edited September 2020
    Hi @imaohw

    Good day.
    I tried again to reproduce the issue with the config you provided, and it seems I can reproduce it locally.
    I have already informed our USG team and they are investigating it.
    They will help you resolve the issue.
    Thanks.

    Best regards,
    Zyxel_Derrick
  • imaohw Posts: 123  Ally Member
    @Zyxel_Derrick - great news.  Thank you for following up.