USG310 self reboot problem

Hi all,

My USG310 has had a self-reboot problem since around April 2021, after I upgraded it to firmware version V4.62(AAPJ.0). The self-reboot happens every 6 to 20 days at random but, interestingly, always at the scheduled time when it starts generating the daily report. I have two USG310 units configured as an HA pair, and strangely only the primary unit has the self-reboot problem. The secondary unit has never self-rebooted, even though it runs the same firmware version and generates the same e-mail report at the same scheduled time every day.

I then upgraded both units to firmware version V4.65(AAPJ.0) in August, but the reboot problem persisted at a similar interval. About two weeks ago I replaced the primary unit with a spare USG310 running the same V4.65(AAPJ.0) firmware, but the new unit also self-rebooted just a few days ago, again at the scheduled report generation time.

Any suggestions?

Thanks.

All Replies

  • USG_User
    USG_User Posts: 369  Master Member
    Did you configure a scheduled reboot under MAINTENANCE > SHUTDOWN/REBOOT?
  • mMontana
    mMontana Posts: 1,298  Guru Member
    Is your device connected to a UPS?
  • Hi USG_USER,

    If it were a scheduled reboot, it should occur at a regular interval, but in my case the reboot date is random. Anyway, I have double-checked the "MAINTENANCE" -> "SHUTDOWN/REBOOT" setting and confirmed that no scheduled reboot is enabled.
  • mMontana said:
    Is your device connected to a UPS?

    Hi mMontana,

    Yes, the device is connected to a UPS. In fact, it is installed in a data center that provides redundant power to all the equipment. I have also tried swapping the power cords, and thus the power connection points, between the two USG310 units in the HA pair. It is still the same primary unit that reboots, while neither the secondary unit nor any other equipment in the rack has had a similar reboot event.
  • mMontana
    mMontana Posts: 1,298  Guru Member
    First Anniversary 10 Comments Friend Collector First Answer
    Hi @raymondf, thanks a lot for explaining all the steps you have taken and the care needed to rule out a power problem on the device.
    The details of the diligence you applied in starting to troubleshoot the issue can be really valuable for people who lack experience in managing this kind of device :-)

    Moreover...
    The presence of a secondary unit in HA tells me that the rack temperatures are also compatible with running a USG device without issues, which crosses temperature off the list as well.

    I am no electronics expert, but my personal suspicion is that one of the components on the board (a capacitor, the PWM or PFM regulator, a RAM chip) has an issue. Another possibility is the configuration, which seems... odd, given that the HA partner should share almost 100% of the configuration with the self-rebooting unit and yet does not have the issue.

    So, a slightly... headache-inducing test: are you in a position to swap the devices and their configurations so that the secondary becomes the primary? Incidentally, promoting the secondary to primary should avoid the problems you are having when the daily report is produced.
  • CHS
    CHS Posts: 177  Master Member
    You can connect a serial cable to collect the console messages and share them with us.
    In most cases the system will dump a message right before an unexpected reboot.
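    A minimal sketch of such a capture, assuming Python with the pyserial package, a USB console adapter that shows up as /dev/ttyUSB0, and the usual 115200 baud 8N1 console settings (all of these may differ on your setup):

        # serial_console_logger.py - timestamp every console line into a file,
        # so the last output before an unexpected reboot is preserved.
        # Assumptions: pip install pyserial; adjust PORT/BAUD to your adapter.
        import datetime
        import serial

        PORT = "/dev/ttyUSB0"   # hypothetical device name; check dmesg for yours
        BAUD = 115200           # assumed console speed, verify against the manual

        def main():
            with serial.Serial(PORT, BAUD, timeout=1) as console, \
                 open("usg_console.log", "a", buffering=1) as log:
                while True:
                    raw = console.readline()   # b"" when the 1 s timeout expires
                    if not raw:
                        continue
                    stamp = datetime.datetime.now().isoformat(timespec="seconds")
                    log.write(f"{stamp} {raw.decode(errors='replace')}")

        if __name__ == "__main__":
            main()

    Left running on a laptop or small single-board computer attached to the console port, this should capture the same kind of panic output that appears later in this thread.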
  • To CHS,

    Thanks for the suggestion; I will arrange that and post the output later.

    As further information, I changed the daily report generation time on both units to 30 minutes earlier, starting last Friday. Then, just last weekend, the primary unit rebooted again at exactly the new report generation time. This shows the reboot problem is closely tied to the report generation task. I just don't understand why the secondary unit doesn't have this problem.
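    As a cross-check of that timing correlation that does not depend on the e-mail report arriving, a small reachability monitor on a separate host can record exactly when the unit drops off the network. A minimal Python sketch, where 192.0.2.1 is only a placeholder for the unit's management address and Linux ping syntax is assumed:

        # reboot_watch.py - log UP/DOWN transitions of the firewall's management
        # address so the reboot moment can be compared with the report schedule.
        import datetime
        import subprocess
        import time

        TARGET = "192.0.2.1"   # placeholder management IP of the primary USG310
        INTERVAL = 10          # seconds between probes

        def reachable(host: str) -> bool:
            # One ICMP echo request with a 2-second deadline (Linux ping options).
            result = subprocess.run(
                ["ping", "-c", "1", "-W", "2", host],
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            return result.returncode == 0

        was_up = True
        while True:
            up = reachable(TARGET)
            if up != was_up:
                stamp = datetime.datetime.now().isoformat(timespec="seconds")
                print(f"{stamp} {'UP' if up else 'DOWN'}", flush=True)
                was_up = up
            time.sleep(INTERVAL)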
  • (Quoting a message from CHS that appears to have been removed:)

    "hmm....that could be the reason. But how do you change daily report generation time? (It seems fixed at 00:00. so I always received daily report before 00:05AM…"

    Yes, I can configure the daily report generation time under "Configuration" -> "System" -> "Notification" -> "Mail Server" -> "Schedule".



  • CHS
    CHS Posts: 177  Master Member
    edited November 2021
    In my previous message I wanted to ask how to configure the daily report schedule,
    but I already found it myself, so I removed the message.     lolololol......
    Thanks for your hint again.  :)
    Did your device reboot again while the system was generating the daily report?
  • Hi all,

    The device rebooted again last Saturday, and the following console log was recorded:
    Oops[#1]:
    CPU: 4 PID: 0 Comm: swapper/4 Tainted: P         C O 3.10.87-rt80-Cavium-Octeon #2
    task: 800000010f4c6040 ti: 800000010f4d4000 task.ti: 800000010f4d4000
    $ 0   : 0000000000000000 0000000000000010 0000000000000000 ffffffffc04634c0
    $ 4   : ffffffffc04634d0 0000000000000000 ffffffffc04634d0 0000000000000000
    $ 8   : ffffffffc04631e0 ffffffffc04634d0 c00000000f0b41aa 0000000000000000
    $12   : 0000000000000020 ffffffff80a452ec ffffffffc07304f8 ffffffffc04a0000
    $16   : ffffffffc04634d0 0000000000000000 ffffffffc04634d0 c00000000efc3ea0
    $20   : 000000001202b906 0000000000000000 0000000000000000 80000000fce5dc18
    $24   : ffffffffc04a0000 ffffffff8093f8c0
    $28   : 800000010f4d4000 800000010f4d6fa0 ffffffffc0460000 ffffffff80a4d324
    Hi    : 0000000000045600
    Lo    : 0000000000001c01
    epc   : ffffffff80a4d208 __list_add_debug+0x18/0xb8
        Tainted: P         C O
    ra    : ffffffff80a4d324 __list_add+0x24/0x58
    Status: 10009ce3        KX SX UX KERNEL EXL IE
    Cause : 00800008
    BadVA : 0000000000000000
    PrId  : 000d9202 (Cavium Octeon II)
    Modules linked in: fastpath_prearray(PO) adt7463(O) option cdc_acm huawei_cdc_ncm cdc_mbim qmi_wwan cdc_wdm cdc_ncm rndis_host cdc_ether sierra usb_wwan usbserial cls_user(O) kbwm(PO) bonding kuser_info(PO) zy_mss(O) xt_zy_TCPMSS(O) zld_wdt(O) nf_nat_sip(O) nf_conntrack_sip(O) conntrack_flush(O) ZyAntiVirus(O) zld_fileidentify(PO) zld_utm_action(PO) zld_cloud_query(O) as_kmodule(O) qsearch_bm(O) qsearch_skeleton(O) qsearch(O) zld_vti(PO) zld_ioctl(PO) fqdn_object(PO) zld_ftps_alg_helper(O) nf_nat_ftp(O) nf_conntrack_ftp(O) conn_check(O) quicksec(PO) ilb_llf(PO) ilb_wrr(PO) ilb_dns(O) broadweb_turnkey(PO) IDP(PO) zld_av_module_wbl(PO) zld_adp(PO) ADP(PO) broadweb_turnkey_debug(PO) zyav_statistics(PO) app_statistics(PO) idp_statistics(PO) xt_dns(O) xt_ZYRELOGIN(O) arpt_proxy(O) xt_zydns_passthrough(O) iptable_zynac(PO) iptable_nat_over_ipsec(O) vpn_concentrator6(O) vpn_concentrator(O) ip6table_vpnid(O) iptable_vpnid(O) zy_ipsec_conn(O) xt_TUNNELID(O) xt_tunnelid(O) xt_zysession_limit(O) xt_zysession_login(O) ta_block(O) xt_zyzone(PO) ip6table_zyfilter6(O) iptable_zyfilter(O) xt_ZYDROP(O) xt_ZYACCEPT(O) xt_ZYFIRE(O) xt_SECURE_POLICY(O) xt_zyislocal(O) xt_asymmetrical_route(O) xt_zyfromlocal(O) policy_reset(O) routing_alive(PO) doll_netdev(PO) klink_updown(O) cfilter_kmodule(PO) zld_sslinsp(PO) cryptosoft(O) zld_aead_ciphers(O) zld_utm_manager(PO) configfs_utm(PO) configfs(PO) zld_oom_guard(PO) ZyParser_POP3(PO) ZyParser_SMTP(PO) ZyParser_FTP(PO) ZyParser_HTTP(PO) ZyParser(PO) zypktorder(PO) zyinetpkt(O) zld_arp_seal(PO) zld_arp(PO) ipmacbinding(PO) xt_zysso_nonhttptarget(PO) xt_zysso_httptarget(PO) xt_ZYSSO6(PO) xt_ZYSSO(PO) zy_reset(O) hook_zyfrag_ipv6(O) hook_zydefrag_ipv6(O) ip6table_zymark(O) iptable_zymark(O) xt_MARKBWM(O) usb_reset(O) hook_zyping(O) iptable_zyssu(O) xt_zyvpnid_check(PO) xt_zysession_status_update(PO) xt_zydev(PO) xt_BUILTIN_SERVICE(O) zld_forward_hook(O) zld_route_multipath(PO) ipt_ZYDNAT(O) ipt_ZYNETMAP(O) ipt_ZYNOLSNAT(O) xt_nat_loopback(O) xt_set(O) ip_set_zyport(O) ip_set_zyip(O) ip_set_list_set(O) ip_set_hash_netportnet(O) ip_set_hash_netport(O) ip_set_hash_netnet(O) ip_set_hash_netiface(O) ip_set_hash_net(O) ip_set_hash_ipportnet(O) ip_set_hash_ipportip(O) ip_set_hash_ipport(O) ip_set_hash_ipmark(O) ip_set_hash_ip(O) ip_set_bitmap_port(O) ip_set_bitmap_ipmac(O) ip_set_bitmap_ip(O) ip_set(O) xt_geoip(O) sslvpn(O) zld_devinet(O) nf_traffic_detect(O) nf_report(O) xt_traffic_flow(O) fastpath_kmodule(PO) zld_pkt_manager(PO) zyiface_lib_module(O) zy_bridge_iface(PO) cryptocteon(PO) zld_alg_sip_log(PO) zld_disklog(O) zld_conntrack_data(O) zyklog_kmodule(PO) geoip_database(PO) zld_mrd(PO) sw_cn60xx(PO) switchdev_char(PO) switchdev(PO) platform_support(PO)
    Process swapper/4 (pid: 0, threadinfo=800000010f4d4000, task=800000010f4c6040, tls=0000000000000000)
    Stack : ffffffffc0460000 ffffffff80a4d324 ffffffffc0460000 80000000bf650380
              c00000000f0b41aa ffffffffc0468090 000000006197ee94 00000000263b81db
              0000000000000000 0000000000008020 000000006197ee94 00000000263b81db
    Call Trace:
    [<ffffffff80a4d208>] __list_add_debug+0x18/0xb8
    [<ffffffff80a4d324>] __list_add+0x24/0x58
    [<ffffffffc0468090>] app_statistics_update+0x6d8/0x8b0 [app_statistics]
    [<ffffffffc072f520>] action_handler+0x310/0x12e8 [broadweb_turnkey]
    [<ffffffffc0735bd8>] FH_bw_turnkey_forward_hook+0x10a0/0x1b38 [broadweb_turnkey]
    [<ffffffffc00af4c4>] rtcompl_hook+0x1cc/0x498 [fastpath_kmodule]
    [<ffffffff80bc697c>] nf_iterate+0xf4/0x4f8
    [<ffffffff80bc6e0c>] nf_hook_slow+0x8c/0x200
    [<ffffffff80bec724>] ip_forward+0x45c/0x488
    [<ffffffffc00af628>] rtcompl_hook+0x330/0x498 [fastpath_kmodule]
    [<ffffffff80bc697c>] nf_iterate+0xf4/0x4f8
    [<ffffffff80bc6e0c>] nf_hook_slow+0x8c/0x200
    [<ffffffff80beaba8>] ip_rcv+0x2f8/0x410
    [<ffffffff80b8cd68>] __netif_receive_skb_core+0x500/0x618
    [<ffffffff80b908f0>] process_backlog+0xa8/0x190
    [<ffffffff80b91184>] net_rx_action+0x14c/0x228
    [<ffffffff808a3768>] __do_softirq+0x1c8/0x210
    [<ffffffff808a3878>] do_softirq+0x68/0x70
    [<ffffffff808a3ec8>] irq_exit+0x68/0x78
    [<ffffffff8080a4b4>] plat_irq_dispatch+0x3c/0xb8

    Code: dcc20008  1445000d  00a0382d <dc480000> 14c8001c  7086282a  0080182d  7044202a  00a42025
    ---[ end trace de16b73d1b669eef ]---
    Kernel panic - not syncing: Fatal exception in interrupt
    panic_notify_sys Log Date: 1637346965
    Firmware Version:   4.65(AAPJ.0)|2021-07-04 01:16:57
    Kernel Info Collector: detect system crashed, store information in disk.
    Mem-Info:
    DMA32 per-cpu:
    CPU    4: hi:  186, btch:  31 usd: 163
    Normal per-cpu:
    CPU    4: hi:  186, btch:  31 usd: 111
    active_anon:33971 inactive_anon:15934 isolated_anon:0
     active_file:18235 inactive_file:37848 isolated_file:0
     unevictable:12 dirty:0 writeback:0 unstable:0
     free:717552 slab_reclaimable:4638 slab_unreclaimable:78002
     mapped:10925 shmem:17715 pagetables:1205 bounce:0
     free_cma:0
    DMA32 free:2866940kB min:17860kB low:22324kB high:26788kB active_anon:109260kB inactive_anon:15268kB active_file:38548kB inactive_file:104604kB unevictable:48kB isolated(anon):0kB isolated(file):0kB present:3658728kB managed:3516596kB mlocked:48kB dirty:0kB writeback:0kB mapped:30508kB shmem:15876kB slab_reclaimable:11780kB slab_unreclaimable:169916kB kernel_stack:5088kB pagetables:3776kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
    lowmem_reserve[]: 0 502 502
    Normal free:3268kB min:2616kB low:3268kB high:3924kB active_anon:26624kB inactive_anon:48468kB active_file:34392kB inactive_file:46788kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:515064kB managed:515064kB mlocked:0kB dirty:0kB writeback:0kB mapped:13192kB shmem:54984kB slab_reclaimable:6772kB slab_unreclaimable:142092kB kernel_stack:2976kB pagetables:1044kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
    lowmem_reserve[]: 0 0 0
    DMA32: 881*4kB (UEM) 589*8kB (UEM) 1587*16kB (UEM) 387*32kB (UEM) 475*64kB (UEM) 240*128kB (UEM) 58*256kB (UEM) 35*512kB (UEM) 17*1024kB (UM) 15*2048kB (UM) 654*4096kB (UMR) = 2866812kB
    Normal: 501*4kB (UEM) 150*8kB (UE) 4*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3268kB
    73811 total pagecache pages
    0 pages in swap cache
    Swap cache stats: add 0, delete 0, find 0/0
    Free swap  = 0kB
    Total swap = 0kB
    1045760 pages RAM
    49578 pages reserved
    63317 pages shared
    213537 pages non-shared
    writeLogtoDisk [534]: Failed to read CB sector: ret = 4
    writeLogtoDisk [534]: Failed to read CB sector: ret = 4
    writeLogtoDisk [534]: Failed to read CB sector: ret = 4
    writeLogtoDisk [534]: Failed to read CB sector: ret = 4
    writeLogtoDisk [534]: Failed to read CB sector: ret = 4
    writeLogtoDisk [534]: Failed to read CB sector: ret = 4
    writeLogtoDisk [534]: Failed to read CB sector: ret = 4
    Rebooting in 5 seconds..

    U-Boot 2011.03 (Development build, svnversion: u-boot:438:439M, exec:exported) (Build time: Jun 16 2014 - 17:35:14)

    BootModule Version: V1.10 | Jun 16 2014 17:35:14
    DRAM: Size = 4096 Mbytes

    Press any key to enter debug mode within 3 seconds.
    ............................................................
    Start to check file system...
    /dev/sda6: 512/20480 files (0.8% non-contiguous), 73378/81920 blocks
    /dev/sda7: 212/23040 files (12.7% non-contiguous), 64857/92160 blocks
    Done
    Kernel Version: V3.10.87 | 2021-07-04 00:46:02
    ZLD  Version: V4.65(AAPJ.0) | 2021-07-04 01:16:57

    INIT: version 2.86 booting
    Initializing Debug Account Authentication Seed (DAAS)... done.
    Setting the System Clock using the Hardware Clock as reference...System Clock set. Local time: Fri Nov 19 18:36:59 UTC 2021

    INIT: Entering runlevel: 3
    Insmod ZYKLOG Module. Starting zylog daemon: zylogd  zylog starts.
    Starting syslog-ng secu-reporter.
    Starting ZLD Wrapper Daemon....
    Starting uam daemon.
    Starting myzyxel daemon.
    Starting periodic command scheduler: cron.
    Start ZyWALL system daemon....
    Starting link_updown daemon.
    Check signature package
    Check av signature package
    ADP version 3.12 loaded
    Cloud Query Daemon Start!

    Any suggestions? It looks like a typical kernel panic to me. Since I still received the daily report for that day, I suppose the crash occurred just after the report was generated.
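    For correlating the crash with the schedule, the "panic_notify_sys Log Date: 1637346965" line appears to be a Unix timestamp. A small Python sketch, using only the value copied from the log above:

        # panic_time.py - convert the epoch value printed by panic_notify_sys
        # into a readable UTC time.
        import datetime

        epoch = 1637346965  # value taken from the console log above
        panic_utc = datetime.datetime.fromtimestamp(epoch, tz=datetime.timezone.utc)
        print(panic_utc.isoformat())   # 2021-11-19T18:36:05+00:00

    That works out to 2021-11-19 18:36:05 UTC, 54 seconds before the "Local time: Fri Nov 19 18:36:59 UTC 2021" line printed after the restart, which is consistent with the 5-second reboot delay plus the file-system check.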

    Thanks.



