50% packet loss after cable modem upgrade???

itxnc Posts: 98  Ally Member

This is a REALLY weird one. So our cable company came out and upgraded our old modem. Nothing wrong with the old one - they're just slowly migrating to new hardware. Within 5-10 minutes after the upgrade, we start losing voice path on VoIP (one direction) and the network gets really shaky. I start up a few pings to Google and Cloudflare, and I'm losing every other ping - without fail.

I reboot the Zyxel (ATP200) and cable modem (Hitron EN2251). Works fine, then 5-10 minutes later it starts again. If I disconnect the WAN network cable between the Zyxel and the cable modem, even for just a second, it starts working, then 5-10 minutes later the problem starts again.

I called our cable company and they sent the tech back out. He replaced the new modem with another (it's in bridge mode like it should be). SAME PROBLEM

This is what PingPlotter shows:

The packet loss definitely seems to be happening between the Zyxel and the EN2251. I'm at a loss as to how to fix it. We never had ANY trouble before they switched the modem. It's clearly not the hardware - we've had two brand new modems do the exact same thing. I plugged my laptop into the modem directly for 10-15 minutes and couldn't reproduce it.

These modems have 2.5 Gigabit ports. Is that somehow making the Zyxel flip out? Has anyone else seen anything like this??

Accepted Solution

  • PeterUK
    PeterUK Posts: 2,730  Guru Member
    edited March 2023 Answer ✓

    Maybe another thing to test is setting the port to 1000Mbps-Full Duplex/Auto Negotiate instead of Auto Negotiate.


All Replies

  • smb_corp_user
    smb_corp_user Posts: 161  Master Member

    Sorry for not having any real answers, but logic suggests that if you lose half the speed, it could indicate that part of your hardware is incapable of that speed and wants to go back down to 1 Gbps, or that there is something wrong in the config that needs a firmware upgrade to handle it.

  • itxnc
    itxnc Posts: 98  Ally Member
    edited March 2023

    Well we're not losing half the speed - we're losing half the packets on and off (the red bars in the graph above). The ATP200 shows the WAN port negotiated at 1000M/Full like it should be.

    Just for kicks I'm about to throw a mini gigabit switch in between them to see what happens. (UPDATE - Didn't help. Worked OK for 20 minutes then started acting up again)

    Oh, and here's another really weird thing we've started to see. We're getting an IP conflict on the WAN side, reported with a 00:00:00:00:00:00 MAC address and the modem's gateway IP. Not the modem's own IP, but the next-hop gateway (xx.xx.xx.1). I can't correlate the logs with the start of the problem, though.

    This message happens when the WAN link comes up. I should mention we use a failover trunk with an LTE modem on WAN2. I don't know if that matters. I've tried pulling the LTE modem completely and connected the cable modem to WAN1 and WAN2. Get the same behavior.

    I just reproduced it again. Swapped the cable modem back to WAN1 (it had been on WAN2 over the weekend) at 9:52. The only logs besides the weird IP conflict are the normal policy route trunk logs. The problem starts again at 9:55:09 and there's NOTHING in the logs around that time - the next log is a DHCP assignment at 9:59.

    At this point I'm about to fail over to the LTE backup (we already maxed out our charges for the month so it's free now LOL) and use a dedicated laptop streaming YouTube or something, plus PingPlotter, to see if the modem does this on its own.

  • PeterUK
    PeterUK Posts: 2,730  Guru Member
    edited March 2023

    It's likely a problem because the modem has a 2.5Gb port, and either the ATP or the modem is at fault.

    Put a switch between them and it will likely start working. Try with a 2.5Gb port switch.

    Also test with a PC connected to the modem port.

  • itxnc
    itxnc Posts: 98  Ally Member

    Switch in between made no difference. We're running a ping test with a dedicated laptop now, with the office on LTE backup.

  • PeterUK
    PeterUK Posts: 2,730  Guru Member

    But connect the laptop to the modem for testing.

    Does the WAN have a ping check to somewhere? If so, disable it.

  • itxnc
    itxnc Posts: 98  Ally Member
    edited March 2023

    That's what we're doing. Laptop is fine so far. Clearly an issue between this model of modem and our ATP200. I'm going to try to log into it to see if there's anything we might be able to set in the modem to avoid the issue. It used to be that you could get to Spectrum/Charter modems at 192.168.100.1 even in bridged mode but… Not this one. Sigh.

    The WAN has to have a ping check for failover.

  • PeterUK
    PeterUK Posts: 2,730  Guru Member
    edited March 2023

    Maybe the ping is failing and doing failover?

    As for 192.168.100.1, you might be able to get that working by adding a virtual interface on the WAN with 192.168.100.254/24. The ATP should then ARP for 192.168.100.1 directly.
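
The reasoning behind the virtual-interface suggestion is that a router only ARPs directly for addresses inside one of its own subnets. A minimal sketch of that check, using only the two addresses mentioned in the post (the function name `on_link` is made up for illustration, and this says nothing about whether this particular modem answers on that address):

```python
import ipaddress

# Addresses from the suggestion above: the proposed virtual interface
# on the WAN port and the usual cable modem management address.
virtual_if = ipaddress.ip_interface("192.168.100.254/24")
modem_mgmt = ipaddress.ip_address("192.168.100.1")


def on_link(interface, target):
    """Return True if target is inside the interface's subnet (and is not
    the interface's own address), i.e. the router would ARP for it directly
    instead of sending the traffic to its default gateway."""
    return target in interface.network and target != interface.ip


if __name__ == "__main__":
    print("network:", virtual_if.network)
    print("on-link:", on_link(virtual_if, modem_mgmt))
```

Without an interface in 192.168.100.0/24, traffic for 192.168.100.1 would follow the default route to the ISP gateway and never reach the modem's own management address.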

  • itxnc
    itxnc Posts: 98  Ally Member
    edited March 2023

    No indication of failover on the Zyxel. Because SOME pings are making it out (the whole on then off then back on cycle we've seen) it never exceeds the timeout threshold that triggers a failover. But we've pulled the WAN2 connection to prevent any attempt to failover and we still see the issue happening. I think the next thing we're going to try is disable the failover trunk and use the native WAN1, but the lack of ANY logs or activity on the Zyxel makes me doubt any configuration change is going to affect it.

    We're also going to try a Gigabit GBIC in port P1 just for kicks to see what happens. (UPDATE: Using the P1/opt interface as the primary WAN connection worked for about 5 minutes and… tripped out again.)

    One interesting thing - it seems like active VoIP calls trigger it faster. But I can't say with any certainty. We're on the phone a lot. But it seems like we're often on active calls and it'll happen earlier than if we aren't. Suddenly clients can't hear us, but we can hear them, and the packet loss starts up.
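
Why intermittent loss might never trip the failover can be illustrated with a small simulation. This is only a sketch under an assumption: that the connectivity check fails over after a fixed number of consecutive missed replies. The function name and the threshold value are hypothetical, and the ATP200's actual check logic may differ.

```python
def failover_triggered(results, fail_tolerance):
    """results: sequence of ping-check outcomes (True = reply received).
    Return True if fail_tolerance consecutive checks were missed at any
    point, which is the assumed condition for a failover."""
    misses = 0
    for ok in results:
        # Any successful reply resets the consecutive-miss counter.
        misses = 0 if ok else misses + 1
        if misses >= fail_tolerance:
            return True
    return False


# Every other check lost, as in the original report.
alternating = [True, False] * 30
# A real outage: the link works, then stays down.
hard_down = [True] * 5 + [False] * 5

if __name__ == "__main__":
    print("alternating loss:", failover_triggered(alternating, fail_tolerance=3))
    print("hard outage:     ", failover_triggered(hard_down, fail_tolerance=3))
```

Under that assumption, 50% loss spread across alternating checks never reaches the threshold, even though it is enough to break VoIP.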

  • lalaland
    lalaland Posts: 90  Ally Member
    edited March 2023

    It sounds like there may be a compatibility issue between the Zyxel ATP200 and the Hitron EN2251 cable modem.
    You can update to the latest firmware first; it will resolve the IP conflict warning message.
    https://community.zyxel.com/en/discussion/15957/zld-v5-35wk06-firmware-release#latest

    Does the Hitron EN2251 have a packet capture tool or CLI to trace where the packets are dropped?

  • itxnc
    itxnc Posts: 98  Ally Member
    edited March 2023

    That's what I was leaning towards. Late yesterday we left the Zyxel on LTE backup (WAN2) and connected a laptop directly to the EN2251 with PingPlotter going. Now, clearly not an apples-to-apples test, since the laptop just had a YouTube HD test playing, not everything else. But still… Came in this morning - no packet loss. So you figure it's Zyxel + EN2251.

    But on a whim, I had tossed a Ubiquiti EdgeRouter X in between the laptop and the EN2251, just to see if another router would trigger the issue. That's the setup we ran overnight. No packet loss.

    The EN2251 is locked in bridge mode. The usual 192.168.100.1 bypass access is not available. It's circled back to the old school days with a simple cable modem without all the wifi/router stuff you have to disable (Excellent!). So, of course, nothing to really login to and poke around.

    ANYway - this morning, I plugged the WAN1 port of the ATP200 into the EdgeRouter. Figured we'd see what happened in a double NAT scenario where the 'visible' router to the EN2251 is the EdgeRouter. I just knew it would work fine - nope. Packet loss kicked in a few minutes later. More importantly - both the Zyxel and the test laptop (which are peers on the EdgeRouter) were seeing the packet loss.

    So whatever it is, I'm 99% sure it's not hardware related (which I suspected when we tried the GBIC in P1). Something from our networks and/or the Zyxel itself is triggering the modem into whatever mode it's in where we see cyclic packet loss (we're not seeing random packet loss). And before you say throw everything over to the EdgeRouter - we have over 10 networks and a number of firewall rules plus VPN, so not an easy trick right now.

    It cycles between 0% and 100% packet loss every 5 seconds. This is a 60 second ping plot:

    I'm applying the WK06 firmware now. We'll see what happens. My backup LTE just maxed out its data use for the month so we're throttled. A bit more urgent now! 😬

    UPDATE: Even more interesting - when we applied the WK06 firmware update - as soon as the Zyxel went offline to update, the laptop stopped seeing the cyclic packet loss. That was unexpected.

    We're 30 minutes in post WK06 upgrade (with Zyxel connected directly to the EN2251 again) and the packet loss hasn't kicked in yet so, fingers crossed, it may have fixed our problem.

    HAHAHAHA Nope! I literally clicked submit on the edit above and…

    I really hate vendor hot potato, but SOMEbody isn't happy with what someone else is doing.

    Next step (though I'm open to all ideas that we haven't tried yet!): I'm going to switch back to the double NAT scenario and set up a mirror port and a Wireshark capture.
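
The distinction drawn above between cyclic and random packet loss can be checked mechanically on a ping log. The following is a rough sketch, not a PingPlotter or Wireshark feature: it assumes one ping per second recorded as True (reply) or False (timeout), and the `min_run` cutoff is an arbitrary illustrative value.

```python
from itertools import groupby


def run_lengths(results):
    """Collapse ping outcomes (True = reply) into (outcome, length) runs."""
    return [(ok, sum(1 for _ in grp)) for ok, grp in groupby(results)]


def looks_cyclic(results, min_run=3):
    """Heuristic: call the loss cyclic if there are at least two loss
    bursts and every interior run, good or bad, lasts at least min_run
    samples. The first and last runs are ignored because the sampling
    window may have cut them short."""
    runs = run_lengths(results)
    interior = runs[1:-1]
    loss_bursts = [n for ok, n in runs if not ok]
    return len(loss_bursts) >= 2 and all(n >= min_run for _, n in interior)


# 60 seconds at one ping per second: 5 s of replies, 5 s of loss, repeated.
five_on_five_off = ([True] * 5 + [False] * 5) * 6
# Scattered single drops, the kind a noisy line would produce.
scattered = [True, True, False, True, False, True, True, True, False, True]

if __name__ == "__main__":
    print(run_lengths(five_on_five_off)[:4])
    print("cyclic pattern:", looks_cyclic(five_on_five_off))
    print("scattered loss:", looks_cyclic(scattered))
```

Long, regular runs of total loss point at something switching state on a timer rather than at a marginal physical link, which is consistent with the conclusion above that the hardware itself is probably not at fault.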
