XGS4600 - 100% CPU Usage every 3 Minutes

Bash
edited August 26 in Switch
We are seeing 100% CPU usage on our XGS4600 stack running 4.40(ABBI.2) as shown below.

stack-1# show cpu-utilization 
  CPU usage status:  16.35%
   baseline 7903907 ticks
   sec   ticks   util sec   ticks   util sec   ticks   util sec   ticks   util
   --- ------- ------ --- ------- ------ --- ------- ------ --- ------- ------
     0 1292905  16.35   1 1287120  16.28   2 1092512  13.82   3 1363933  17.25
     4 1384941  17.52   5 4911451  62.13   6 7903907 100.00   7 7903907 100.00
     8 7903907 100.00   9 7903907 100.00  10 7903907 100.00  11 4285640  54.22
    12 1239354  15.68  13 1218278  15.41  14 1327014  16.78  15 1198134  15.15
    16 1176452  14.88  17 1335672  16.89  18 1166890  14.76  19 1241865  15.71
    20 1333383  16.86  21 1274148  16.12  22 1322128  16.72  23 1240243  15.69
    24 1277536  16.16  25 1320061  16.70  26 1541875  19.50  27 1211225  15.32
    28 1223255  15.47  29 1381283  17.47  30 1194776  15.11  31 1107645  14.01
    32 1381248  17.47  33 1267580  16.03  34 1187810  15.02  35 1333827  16.87
    36 1363198  17.24  37 1162439  14.70  38 1256766  15.90  39 1159174  14.66
    40 1143666  14.46  41 1334200  16.88  42 1247239  15.78  43 1139834  14.42
    44 1356287  17.15  45 1325621  16.77  46 1062607  13.44  47 1388991  17.57
    48 1294098  16.37  49 1166652  14.76  50 1323464  16.74  51 1329905  16.82
    52 1206596  15.26  53 1279128  16.18  54 1077675  13.63  55 1217872  15.40
    56 1296931  16.40  57 1299900  16.44  58 1220372  15.44  59 1237258  15.65
    60 1248616  15.79  61 1049844  13.28  62 1267867  16.04
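The table above can also be scanned programmatically instead of by eye. The following is a minimal Python sketch, not a Zyxel tool: the function names are hypothetical, and it assumes that each `util` value is `ticks` divided by the `baseline`, truncated to two decimals, which is consistent with the rows shown but is not confirmed by any documentation in this thread.

```python
import re

# A few rows copied from the `show cpu-utilization` output above.
SAMPLE = """\
     0 1292905  16.35   1 1287120  16.28   2 1092512  13.82   3 1363933  17.25
     4 1384941  17.52   5 4911451  62.13   6 7903907 100.00   7 7903907 100.00
     8 7903907 100.00   9 7903907 100.00  10 7903907 100.00  11 4285640  54.22
    12 1239354  15.68  13 1218278  15.41  14 1327014  16.78  15 1198134  15.15
"""

BASELINE_TICKS = 7903907  # from the "baseline 7903907 ticks" line

# Each sample is a (sec, ticks, util) triple; the CLI prints four per row.
TRIPLE = re.compile(r"(\d+)\s+(\d+)\s+(\d+\.\d+)")


def parse_samples(text):
    """Return a list of (sec, ticks, util) tuples from the table rows."""
    return [(int(s), int(t), float(u)) for s, t, u in TRIPLE.findall(text)]


def util_from_ticks(ticks, baseline=BASELINE_TICKS):
    """Assumed relationship: util = ticks / baseline, truncated to 2 decimals."""
    return (ticks * 10000 // baseline) / 100


def saturated_seconds(samples, threshold=99.0):
    """Return the `sec` indices whose utilization is at or above threshold."""
    return [sec for sec, _ticks, util in samples if util >= threshold]


samples = parse_samples(SAMPLE)
print(saturated_seconds(samples))
```

On the sampled rows this reports the `sec` indices 6 through 10, a run of five saturated seconds, which lines up with the "keep 5 seconds" wording in the log messages.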

Time :     2:53:55 ========== show logging                   ================= msclock :10435670

     1 Feb 19 13:35:08 IN authentication: Telnet user admin login [IP address = 192.168.50.253]
     2 Feb 19 13:33:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 127.
     3 Feb 19 13:30:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 206.
     4 Feb 19 13:29:32 IN authentication: Telnet user admin login [IP address = 192.168.50.253]
     5 Feb 19 13:27:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 183.
     6 Feb 19 13:24:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 226.
     7 Feb 19 13:21:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 279.
     8 Feb 19 13:18:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 146.
     9 Feb 19 13:17:38 IN authentication: Telnet user admin logout [IP address = 192.168.50.253]
    10 Feb 19 13:15:49 IN authentication: Telnet user admin login [IP address = 192.168.50.253]
    11 Feb 19 13:15:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 111.
    12 Feb 19 13:12:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 206.
    13 Feb 19 13:09:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 197.
    14 Feb 19 13:06:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 228.
    15 Feb 19 13:03:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 236.
    16 Feb 19 13:00:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 165.
    17 Feb 19 12:59:19 IN authentication: Telnet user admin login [IP address = 192.168.50.253]
    18 Feb 19 12:57:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 216.
    19 Feb 19 12:54:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 252.
    20 Feb 19 12:51:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 205.
    21 Feb 19 12:48:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 155.
    22 Feb 19 12:45:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 152.
    23 Feb 19 12:42:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 172.
    24 Feb 19 12:39:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 151.
    25 Feb 19 12:36:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 412.
    26 Feb 19 12:33:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 132.
    27 Feb 19 12:30:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 121.
    28 Feb 19 12:27:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 228.
    29 Feb 19 12:24:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 249.
    30 Feb 19 12:21:14 IN authentication: SSH user admin logout [IP address = 192.168.50.51]
    31 Feb 19 12:21:11 IN authentication: SSH user admin login [IP address = 192.168.50.51]
    32 Feb 19 12:21:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 233.
    33 Feb 19 12:18:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 215.
    34 Feb 19 12:15:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 169.
    35 Feb 19 12:12:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 147.
    36 Feb 19 12:09:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 305.
    37 Feb 19 12:06:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 187.
    38 Feb 19 12:03:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 174.
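To confirm the three-minute pattern without reading every entry, the intervals between the CPU messages can be computed from the log text. This is a rough Python sketch with hypothetical function names; it assumes the log line layout shown above (index, month, day, time) and English month abbreviations, and it uses a placeholder year only because the log does not print one.

```python
from datetime import datetime

# A few entries copied from the `show logging` output above.
SAMPLE_LOG = """\
     2 Feb 19 13:33:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 127.
     3 Feb 19 13:30:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 206.
     4 Feb 19 13:29:32 IN authentication: Telnet user admin login [IP address = 192.168.50.253]
     5 Feb 19 13:27:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 183.
     6 Feb 19 13:24:03 IN system: CPU utilization is over 100 and keep 5 seconds, driver count = 226.
"""

PLACEHOLDER_YEAR = "2000"  # assumption: the log omits the year, any fixed year works


def cpu_event_times(log_text):
    """Return the timestamps of the CPU utilization messages only."""
    times = []
    for line in log_text.splitlines():
        if "CPU utilization is over" not in line:
            continue  # skip login/logout and other entries
        fields = line.split()
        # fields[0] is the entry index; fields[1:4] are month, day, HH:MM:SS
        stamp = " ".join([PLACEHOLDER_YEAR] + fields[1:4])
        times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Return the gaps between consecutive events, oldest first."""
    ordered = sorted(times)
    return [int((b - a).total_seconds()) for a, b in zip(ordered, ordered[1:])]


print(gaps_in_seconds(cpu_event_times(SAMPLE_LOG)))
```

For the sampled entries every gap is 180 seconds, so the spikes follow a fixed schedule rather than arriving at random. That points toward something periodic, such as a timer or a poller, as a candidate cause worth checking.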

This is affecting monitoring of our switch, as it does not respond to SNMP polls while the CPU is under high load.
How can we find out what process is causing these CPU spikes?

All Replies

  • Bash
    Hi @Zyxel_Lucious

    Many thanks for the answer. Going forward, are there any other commands we can run besides "show cpu-utilization" and looking through the logs?

    It would be handy to know exactly which process is causing the high load so we can do a bit more self-diagnosis.

    Thanks,
    Ben
  • telta

    Hi,

    We see the same syslog output as described above, on version 4.60, for example:

    2019-06-26T13:59:31+02:00 xyz.telta info -- system: CPU utilization is over 100 and keep 5 seconds, driver count = 33.

    2019-06-26T13:59:31+02:00 xyz.telta info -- system: CPU utilization is over 100 and keep 5 seconds, driver count = 36.

    2019-06-26T13:58:31+02:00 xyz.telta info -- system: CPU utilization is over 100 and keep 5 seconds, driver count = 12.

    2019-06-26T13:58:31+02:00 xyz.telta info -- system: CPU utilization is over 100 and keep 5 seconds, driver count = 13.

    2019-06-26T13:58:31+02:00 xyz.telta info -- system: CPU utilization is over 100 and keep 5 seconds, driver count = 8.

    2019-06-26T13:58:31+02:00 xyz.telta info -- system: CPU utilization is over 100 and keep 5 seconds, driver count = 75.

    The version is:

     Current ZyNOS version : V4.60(ABBI.0) | 11/26/2018

     Image 1 ZyNOS version : V4.60(ABBI.0) | 11/26/2018

     Image 2 ZyNOS version : V4.50(ABBI.1) | 09/11/2017

    By the way, an instance of Observium is polling the switch.

  • Zyxel_Derrick
     Zyxel Employee

    Hi @telta


    Welcome to the Zyxel community.

    Do you have many transceivers installed in the switch that either have no fiber cable connected or whose link is down?

    Also, does the switch's CPU rise to 100% when Observium polls it?

    And how often does Observium poll the switch?

    Thanks


    Best regards,

    Zyxel_Derrick