Dual ATP800 HA configuration keeps freezing

Omnia
Omnia Posts: 39  Freshman Member
Hi!

I have two ATP800s in an HA configuration that keep freezing. We noticed that it always happens when the nightly backup starts (we monitor the backup VLAN with PRTG).

I noticed that one core goes up to 100% and then the interface becomes unresponsive, but the CPU graph doesn't show the CPU saturation.


When the device crashes, HA switches the devices over and the passive unit remains frozen.

I tried to log into it via SSH, but PuTTY shows the warning "Router> ERROR: Not connected to ZySH daemon."

I tried upgrading to firmware 5.35, but the issue is still present.
We also have another pair of ATP800s in HA in our datacenter, and they're having issues as well, such as both becoming active at the same time. When this occurs, they both freeze and we have to restart them. Sometimes it also happens when nothing is overloading the CPUs, so we don't have any clues about the cause.

In the same datacenter we also have a dual ATP500 HA system, and it's working fine even under heavy load.

Can you help me? Do you have any other similar cases?

We're connecting two serial cables to the firewalls to monitor the console output with PuTTY; maybe it will show something useful in the log file.

Thank you.

All Replies

  • Zyxel_Jerry
    Zyxel_Jerry Posts: 1,026  Zyxel Employee
    Hi @Omnia

    We need the diagnostic file to analyze the issue on the ATP800.
    Please collect it as follows:
    Go to MAINTENANCE > Diagnostics > Diagnostics > Collect and click the button “Collect Now”.
    Then download the collected file in “Files”.

    Please also run the commands below on both devices and collect the output.
    show device-ha2 device-status
    show device-ha2 trace-log
    show device-ha2 sync summary


    Jerry

  • Omnia
    Omnia Posts: 39  Freshman Member
    Hi, we sent you everything by private message, including the output of the commands you sent us, the debug logs on USB, and the console output at verbosity 8. We have a PC with two serial ports connected, which is collecting data. The problem may occur again on Sunday night (Rome time zone).
    Please try to investigate quickly.
  • Omnia
    Omnia Posts: 39  Freshman Member
    Any news?
  • Zyxel_Kevin
    Zyxel_Kevin Posts: 741  Zyxel Employee
    Hi @Omnia
    We're investigating the issue and will give feedback ASAP.
    Thank you
    Kevin
  • Omnia
    Omnia Posts: 39  Freshman Member
    Thanks @Zyxel_Kevin
  • Zyxel_Kevin
    Zyxel_Kevin Posts: 741  Zyxel Employee
    Answer ✓
    Hi @Omnia
    Thanks for contacting us by private message. The issue has been resolved after installing the firmware.
    We will merge the fix into the next FCS.
    Thank you
    Kevin
  • Omnia
    Omnia Posts: 39  Freshman Member
    Thanks @Zyxel_Kevin, when is the firmware release expected?

    In the datacenter we can't install beta releases...

    Thanks for the support
  • Zyxel_Kevin
    Zyxel_Kevin Posts: 741  Zyxel Employee
    Hi @Omnia
    The next FCS will be released in mid-March.
    Thank you
    Kevin
