LACP LAG between USG1100 and XGS3700 not working properly


All Replies

  • imaohw
    imaohw Posts: 124  Ally Member
    @Zyxel_Emily - thanks, I will give that a try.

    Just to be clear: for a LAG whose member ports are eth3 and eth4, can I down both the eth3 and eth4 interfaces and have the LAG continue to function using both physical ports?
  • Zyxel_Emily
    Zyxel_Emily Posts: 1,396  Zyxel Employee
    Since Ethernet interfaces P4-Home and P5-Home2 have already joined the LAG, you can inactivate both of them.
    Inactivating them only disables the IP settings of the Ethernet interfaces.

    Inactivate interfaces P4-Home and P5-Home2, then reboot the USG1100.
    Whether ge4 or ge5 is disconnected, all traffic keeps working.


  • imaohw
    imaohw Posts: 124  Ally Member
    @Zyxel_Emily - I had a chance to test the configuration you suggested, with the LAG configured using P4-Home and P5-Home. I inactivated P4-Home and P5-Home and rebooted the USG1100.

    As soon as the USG1100 came back up, I started to see network issues on vlan2 (the vlan which is on the LAG): intermittent communication on some devices, failure to resolve DNS on others, and failure to get an IP address on others. If I "downed" the second port of the LAG on the XGS3700, everything started to work normally.

    I will do some more testing over the weekend. However, it does not appear that your workaround resolves the issue.
  • imaohw
    imaohw Posts: 124  Ally Member
    edited October 2020
    @Zyxel_Emily - additional testing results:

    As reported above, the recommended configuration did not work. With the recommended configuration and only GE4 or only GE5 connected ("up"), traffic seemed to flow normally. As soon as GE4 and GE5 were both connected (both ports of the LAG in use), intermittent traffic issues started to occur.

    To determine whether Device HA was causing the issue, I removed the "Backup" USG from the network and disabled Device HA on the "Primary" USG. I rebooted the USG and started testing. All additional testing was done with no Device HA on the USG.

    I started a continuous ping to the USG from a PC on vlan2 (the vlan on the LAG). With either just GE4 or just GE5 connected, the ping worked. As soon as I connected both GE4 and GE5, the ping started timing out. If I disconnected either GE4 or GE5, the ping would start working again. I did not see network issues on every device on vlan2, but I did on many of them.
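
    A continuous ping test like this one can be scripted so that every change between "reply" and "timeout" is timestamped, which makes it easier to line the results up with the moments a port was connected or disconnected. The following is only a minimal sketch, assuming Python 3 on a Linux PC with the iputils `ping` command (`-c` and `-W` flags). The function names and the address 192.168.2.1 are hypothetical placeholders and do not come from this thread.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=1):
    """Send one ICMP echo; return True if a reply arrived.

    Assumes a Linux iputils-style ping (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def transitions(samples):
    """Reduce (timestamp, ok) samples to the points where ok changes.

    The first sample is always kept, since it sets the initial state.
    """
    changes = []
    previous = None
    for stamp, ok in samples:
        if ok != previous:
            changes.append((stamp, ok))
            previous = ok
    return changes


def monitor(host, interval_s=1.0):
    """Ping host forever and print a line only when the state changes."""
    previous = None
    while True:
        ok = ping_once(host)
        if ok != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host} {'reply' if ok else 'timeout'}")
            previous = ok
        time.sleep(interval_s)
```

    Calling `monitor("192.168.2.1")` (placeholder address) prints one line per state change until interrupted with Ctrl+C. Noting the time each cable is plugged in or each capture is started then gives a simple timeline to compare against.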

    To gather additional information for you, I decided to do a packet capture on the USG of ports GE4 and GE5. My goal was to capture while both GE4 and GE5 were up and the traffic flow issues were occurring. Before I started the packet capture, the ping of the USG from the PC on vlan2 was timing out.

    As soon as I pressed the Capture button on the USG to start the packet capture, the ping on the PC started working again. As soon as I pressed the Stop button on the USG to stop the packet capture, the ping on the PC started timing out again.

    This behavior was reproducible every time I started or stopped the packet capture, regardless of whether I was capturing on both GE4 and GE5, on just GE4, or on just GE5.

    I have no idea how starting a packet capture would change how network traffic flows on the USG, but that is what is happening.

    I also tried changing the type of LAG (on both the USG and the XGS3700) from 802.3ad (LACP) to balance-alb (Static). After changing the LAG configuration, I repeated all of the tests above. The results with a Static LAG were exactly the same as the results with an LACP LAG.

    I think my additional testing has ruled out the theory that Device HA was causing the issue.

    Let me know what additional information you would like me to gather for you.
