USG60: Very slow web interface

Pavel · February 2020

SSH works fine. No internet, no clients, 1 admin connected (Firefox). All other intranet services work fine. 1 Gbps connection.

Firmware V4.35(AAKY.2)ITS-WK01-r91300

Zyxel_Jerry · February 2020

Hi @Pavel

We haven't heard of this before, and the symptom sounds strange. Can you send us your configuration by private message so we can check further?

Pavel · February 2020

I reset the device to factory defaults and restored the configuration; no problem detected since.

warwickt · March 2020

Hi Pavel, yep, that's the way. Works great.

I just saw this whilst looking for other stuff.

I had updated https://businessforum.zyxel.com/discussion/4005/dns-resolution#latest with the same issue.

Zyxel_Jerry, this seems to be reproducible... worth a look by your lads at Zyxel in Taipei.

FYI: we've had this issue with USG40s, USG20VPNs and USG60s since firmware V4.32 came out. The instability was annoying.

It only seems to affect routers that have been in production for a few years.

No issue with a new USG20VPNW deployed in February 2020. However, its VTI partner, a 2016 USG20WVPN, regularly locked up, so that remote SMB file server requests over the VTI session from the other side would fail with DNS errors.

Thanks for your confirmation, mate!

warwick

Hong Kong

Zyxel_Jerry · March 2020

Hi @warwickt

Can you describe more details about the symptoms?

Does the issue always happen on the 2016 USG20W-VPN, or does it only happen after upgrading to the latest firmware?

warwickt · March 2020

Hi Zyxel_Jerry, yes, the consistent symptoms come from 2 x USG20WVPN, 1 x USG60 and 1 x USG40.

Symptoms

The first obvious symptom is that DNS requests time out, whether for LOCAL DNS or upstream. This affects requests made from:
1. local hosts on any LAN on the router: all time out
  1. e.g. host zyxel.com; nslookup zyxel.com; web name service requests, etc.
  2. e.g. host/nslookup/dig local-host01.mylocallan1.mylab1
2. the USGxxx appliance's own ZYOS CLI (nslookup <host>), even against its own DNS
3. DNS requests to the DNS forwarders lock up.
4. subsequent CLI interaction takes 3-120 seconds to respond.
5. eventually the CLI is usable, but extremely slow.
  1. a restart (reboot) can be performed from this CLI; eventually the unit will restart.
  2. however (refer 3.d), the latency and then the DNS lockup return.
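To put numbers on the DNS timeouts instead of relying on host/nslookup output, a small probe can send one raw DNS query straight to the router and time the reply. This is a minimal Python sketch, assuming the USG answers DNS on UDP port 53 of its LAN interface; the function names are hypothetical, and it only times the first reply whose ID matches the query (an error reply such as SERVFAIL still counts as a reply).

```python
import random
import socket
import struct
import time
from typing import Optional


def build_query(name: str, qid: int, qtype: int = 1) -> bytes:
    """Build a minimal RFC 1035 DNS query for `name` (qtype 1 = A record)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length byte, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE and QCLASS (1 = IN)
    return header + qname + struct.pack(">HH", qtype, 1)


def probe(server: str, name: str, timeout: float = 5.0, port: int = 53) -> Optional[float]:
    """Return the reply round-trip time in seconds, or None on timeout/socket error."""
    qid = random.randint(0, 0xFFFF)
    query = build_query(name, qid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # applies to each recvfrom call, not the whole probe
        start = time.monotonic()
        try:
            sock.sendto(query, (server, port))
            while True:
                data, _ = sock.recvfrom(512)
                # Skip stray packets whose ID does not match our query
                if len(data) >= 12 and struct.unpack(">H", data[:2])[0] == qid:
                    return time.monotonic() - start
        except OSError:  # socket.timeout is a subclass of OSError
            return None
```

For example, probe("192.168.1.1", "zyxel.com"), with 192.168.1.1 standing in for your router's LAN address, returns the round-trip time in seconds, or None when nothing comes back within the timeout, which is what the lockup described above would look like. Running it against both an external name and a local name such as local-host01.mylocallan1.mylab1 separates forwarding problems from local DNS problems.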
USG appliance WEB UI (https://) always times out.
CLI SSH ZYOS access: extremely slow responses from the USG appliance
1. a CLI password response can take 1-2 minutes, but it eventually responds
2. login to the USG over HTTPS sometimes takes 1-2 minutes
3. once the login is accepted, the default EXPERT mode page with dynamic components languishes because the updates never complete: spinning icons, etc.
4. it eventually fails with a CLI timeout error or similar.
5. basically, in this state the WEB UI cannot be used to restart or further configure the router.
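The web UI delay can be timed from a LAN host as well. A minimal Python sketch, assuming the appliance serves its login page over HTTPS with a self-signed certificate (so certificate verification is off by default here); the URL and the function name are placeholders, not anything Zyxel documents.

```python
import http.client
import ssl
import time
import urllib.request
from typing import Optional


def time_web_ui(url: str, timeout: float = 10.0, verify_tls: bool = False) -> Optional[float]:
    """Return seconds until the full response body is read, or None on error/timeout."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Assumption: the appliance presents a self-signed certificate
        ctx.check_hostname = False  # must be disabled before CERT_NONE is allowed
        ctx.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            resp.read()  # include body transfer time, not just the headers
    except (OSError, http.client.HTTPException):
        # URLError, HTTP error statuses, TLS errors and socket timeouts are all OSError subclasses
        return None
    return time.monotonic() - start
```

Calling time_web_ui("https://192.168.1.1/") (placeholder address) once a minute and logging the result would show whether the 1-2 minute login delays build up gradually or appear all at once.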
After the restart, VTI tunnels can't be established due to DNS lookup failures.
1. we can't get them going again even after a series of CLI restarts.
Becomes slow again 60-90 seconds after restart
1. after restarting using 2.c.i above, the USG appliance resumes its slow state within 60-90 seconds and is basically unusable again.
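To pin down the 60-90 second relapse after a restart, one option is to poll a health check from the moment the unit comes back up and record when it first fails. A minimal Python sketch: check is any hypothetical callable that returns True while the router is healthy, for example one that sends a DNS query to the router with a short timeout. The clock and sleep parameters exist only so the loop can be tested without real waiting.

```python
import time
from typing import Callable, Optional


def time_to_degrade(
    check: Callable[[], bool],
    interval: float = 5.0,
    max_duration: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `check` every `interval` seconds; return elapsed seconds at the first
    failure, or None if it never fails within `max_duration` seconds."""
    start = clock()
    while clock() - start <= max_duration:
        if not check():
            return clock() - start
        sleep(interval)
    return None
```

With a 5 second interval, the measured time is accurate to within one interval, which is enough to tell a consistent 60-90 second relapse apart from a random one.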

WHAT still works in this slowdown mode:

Actual traffic OK:
- LAN_SUBNET to LAN_SUBNET using IP address (no DNS)
- upstream access using IP address (no DNS)
- VTI traffic works OK if the tunnel was established prior to 5.a above
- L2TP VPN access OK, but no DNS
- port forwarding to specific hosts on the LANs works OK.

Other Commonality

The affected USG routers have been in constant 24/7 service since around 2016.
1. these routers are 2 x USG20WVPN, 1 x USG40 and 1 x USG60
2. recently deployed routers (2018+) don't seem to be affected
We first noticed this problem back at firmware V4.32.
1. symptoms initially at 3-5 month intervals
2. becoming more frequent at V4.33
3. lately, at firmware V4.34 and V4.35, weekly
  1. on one 2016 USG20WVPN, DNS locked up every 3-4 days
    1. a restart would bring it back OK
    2. Router(config)# ip dns server cache-flush does nothing.
  2. one USG40 deployed in 2016 locked up immediately after firmware V4.35 was applied
    1. we began a fallback to V4.34 and then started investigating.
  3. this one has 4 VTI tunnels, a very large local DNS and DNS forwarding, and is very busy with file transfers and L2TP requests; however, it has been solid at V4.35.
  4. our team is sceptical that it will stay like this... however, let's see.
To reiterate, we have not experienced this with USG appliances deployed from late 2018 to now.
Of further significance: ALL these routers have 1 or more (up to 5) VTI office-to-office tunnels with OSPF enabled, and use up to 4 x forwarded DNS servers.
1. one exception, however, is a 2016 USG60 that has not yet failed as above. Strange.

Workaround

The usual workaround was to RESTART (reboot) the affected router
- however, in one case (a USG20VPN deployed in 2016) we had to do this 3 times in succession before DNS would process requests, locally or otherwise.
As stated above, this restart workaround eventually stopped resolving the issue.

Diagnostics

Presently we have no diagnostics available for you, since we were able to work around this issue by RESETTING the routers and restoring from a known-good saved .conf.

This worked consistently: the routers above are fully operational with no issues on V4.35 after a RESET and RESTORATION.

However, I'm confident that this will show up again in some of our clients' older USG appliances.
The following JIT diagnostics showed nothing abnormal:
- debug system ps; debug system show cpu all; debug system show cpu status
  - during the slowdowns, the routers were essentially idle (5% CPU at most).
We were unable to determine any stale state caches used by the units. Is there a disk facility for this?
File systems were less than 20% used (dir ...),
- mostly by previously saved .conf files going back to 2016 and some packet .cap files.

Conclusion:

Our team crudely concluded that, because a RESET and RESTORATION fixed the longer-deployed (pre-2018) USGxxx units, the RESET action may have cleared out some old state caches(?).
1. A somewhat laxative effect?
Is there a progressive slowdown of OSPF due to stale caches? We don't know and can't measure it; maybe others on this forum know how to measure it.
The symptoms are not caused by excessive CPU load.

We do have a client with a USG20VPNW deployed in 2017 that we are watching closely.

Should this happen again, we will get you a diag-info collect from the affected USG appliance.

Zyxel_Jerry, please advise if you would like any specific information.

No need to DM; best to let fellow forum members see what's in progress.

HTH

warwick

Hong Kong

Zyxel_Vic · March 2020

Hi @warwickt

Thanks for the detailed examination and the findings. Regarding the problematic device, the "2016 deployed USG20VPN": have you ever tried replacing it with a same-model USG20VPN "deployed in late 2018" running exactly the same configuration file? Do things get better after swapping in a newly manufactured unit?

If so, can you try a firmware recovery and a DB recovery on your device by following the procedure below (remember to save your configuration file locally before doing this)? The "FW recovery" and "DB recovery" procedures are the same; the only difference is the uploaded file (firmware file: xxx.bin, DB file: xxx.db).