Advanced Threat Protection Tab hung/CPU spike

itxnc Posts: 98  Ally Member
edited April 2021 in Security
We're planning on swapping out most of our clients' USGs for FLEX equivalents later this year, so we're testing the process/equipment out on our own at the office and at home. But we ran into a strange issue that's causing real problems.

Converted a fairly complex USG40 config over to a FLEX 100 using the tool. It converted and loaded fine (OMG, App Patrol configuration is BRUTAL on FLEX, but that's for another post). So it's up and humming along - UTM throughput jumped from the high 70s w/ the USG40 to well over 225 Mbps on the FLEX 100 (which was the max of my Internet service, not the router). The ATP dashboard was showing plenty of scans. Even got SSL Inspection going (we use it in the office already on a USG110, so had an exclude list ready to go).

Next day I VPNed in to see how it was running and ... I could not get past the login screen. I couldn't even SSH into it. Well, more specifically, it timed out, said it was unable to connect to the ZySH daemon, and left me at a useless Router> prompt.

When I got home, I power cycled the FLEX and everything came back. But when I went to log in, it went back to the last place I had been - the ATP dashboard - and the 'Loading' widget just spun forever. Never let me in. Finally figured out that if I opened an Incognito window (no cookies) it would throw me to the main dashboard. Logged in fine. Click ATP - locks up. Every time.

Worse, CPU usage was pegged at 90%+. SSH into the router and user-space CPU is VERY high, most of it being used by utm_dashboard_daemon.



The first time it happened, the CPU usage of that one process showed 20% (I didn't grab a screenshot). In this one it's around 3%, which is the highest in the ps dump but doesn't account for the total usage. Nothing else is using much, yet the CPU usage is sky high.
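
For anyone wanting to put a number on that mismatch, here is a minimal Python sketch that sums the per-process figures from a ps-style dump and reports how much of the total stays unexplained. The three-column input format, the 90% total, and every process name other than utm_dashboard_daemon are assumptions made up for illustration; this is not real output from the FLEX.

```python
from typing import NamedTuple


class ProcSample(NamedTuple):
    pid: int
    cpu_percent: float
    command: str


def parse_ps_lines(text: str) -> list[ProcSample]:
    """Parse lines of the hypothetical form '<pid> <cpu%> <command>'."""
    samples = []
    for line in text.strip().splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip the header row and malformed lines
        samples.append(ProcSample(int(parts[0]), float(parts[1]), parts[2]))
    return samples


def unaccounted_cpu(total_percent: float, samples: list[ProcSample]) -> float:
    """CPU that the per-process list does not explain (never negative)."""
    return max(0.0, total_percent - sum(s.cpu_percent for s in samples))


# Hypothetical sample data, NOT captured from a real device.
SAMPLE = """
PID CPU% COMMAND
101 3.0 utm_dashboard_daemon
102 0.5 other_proc_a
103 0.2 other_proc_b
"""

procs = parse_ps_lines(SAMPLE)
top = max(procs, key=lambda s: s.cpu_percent)
gap = unaccounted_cpu(90.0, procs)  # 90.0 = assumed overall CPU reading
print(f"top process: {top.command} at {top.cpu_percent}%")
print(f"unaccounted: {gap:.1f}%")
```

A large unaccounted figure like this suggests the load is not visible in a single ps snapshot, for example because it comes from short-lived processes or kernel time, so it is worth sampling more than once.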


So I reset the router again and went to sleep. It was happily idle. I logged in to the web GUI again, and here is the result:

It just stays there, even when the GUI is closed. Once you reboot the router, the CPU stays low as long as you don't log in to the GUI:

Router humming along currently - post reboot with traffic but no GUI login and CPU usage is normal...
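
To pin down when the CPU gets stuck (for example, right after a GUI login), one rough approach is to record the overall CPU reading at a fixed interval and look for the first point where it stays high. The sketch below assumes a plain list of percentage readings; the threshold, run length, and sample values are all hypothetical.

```python
from typing import Optional, Sequence


def first_sustained_high(
    samples: Sequence[float], threshold: float = 90.0, min_run: int = 3
) -> Optional[int]:
    """Return the index where CPU first stays >= threshold for at least
    min_run consecutive samples, or None if that never happens."""
    run_start = None
    for i, value in enumerate(samples):
        if value >= threshold:
            if run_start is None:
                run_start = i  # a new high run begins here
            if i - run_start + 1 >= min_run:
                return run_start
        else:
            run_start = None  # run broken, start over
    return None


# Hypothetical readings, one per interval; NOT measured on a real device.
readings = [5.0, 7.0, 6.0, 92.0, 95.0, 93.0, 94.0]
idx = first_sustained_high(readings)
print(f"CPU stuck high starting at sample index: {idx}")
```

Matching that index against the time of the GUI login would show whether the spike really starts at login or slightly later.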


Not really sure how to fix this. Also not really sure what triggered it. I was able to use the ATP tab after the bulk of the configuration migration. I made a couple of tweaks here and there, but can't really recall a specific thing that was changed. I *think* I ran into the issue before activating SSL Inspection, but can't recall 100%. Happy to gather logs, debug info, etc. if it'll help. We're ready to move our USG110 to an ATP200 at the office. We'll see if the same thing happens, as the configurations are pretty similar.

Best Answers

  • Zyxel_Tobias
    Zyxel_Tobias Posts: 200  Zyxel Employee
    Answer ✓
    Hi @itxnc,

    Thank you for your long and detailed post.

    Today we encountered an issue with one signature that, under rare conditions, can cause abnormal behavior on your ATP firewall.

    You were one of the lucky few who seem to have run into this issue today.

    We have already rolled out a fix via a cloud signature update, but let me share the steps with you.

    -> Reboot the device if you can't log in anymore
    -> Go to Configure -> Licensing -> Signature Update
    -> Update the App Patrol signature from the 2021 version to the 2020 version (just execute)
    -> Reboot the device after the signature sync has finished

    The issue will then be gone, and the CPU will go back to working smoothly as expected.

    We are very sorry that you ran into this issue today. We have already adjusted our QA to cover the mentioned "UTM Dashboard" process more deeply in signature release testing.

    Have a good weekend.

    Kind Regards,

    Tobias
  • itxnc
    itxnc Posts: 98  Ally Member
    Answer ✓
    AWESOME - thank y'all for such a quick response. Fixed the CPU hog issue AND I can get back into the ATP tab. Excellent!

All Replies

  • AlexRiviera
    Had the same issue. But in my case the dashboard_daemon ran up to 99% CPU. Switching to the standby firmware and updating to the new signature solved the issue.
  • Zyxel_Tobias
    Zyxel_Tobias Posts: 200  Zyxel Employee
    Hi @AlexRiviera

    Welcome to our Forum!

    Nice to hear that it works for you.

    Have a great day.

    Kind Regards,

    Tobias
