APs randomly resetting despite config "up to date"

Markus_H Posts: 8
edited April 2021 in Nebula
My NWA1123-AC PRO APs seem to randomly reset every few hours for no reason (no config changes, config up to date).

I see lots of "ZON", "System" and "Cloud-auth" events like the following:

No response from NCAS over 15 seconds: NCAS disconnected
NCAS connected: 204 Server is alive

Netconf will do the DHCP renew for recovery.
Netconf connection is disconnected. Relevant to system network issue.

ZDP: initial port 1 control block

Radios are reset/restarted, clients are disconnected and reconnected, initial DCS takes place, and so on.

Just annoying at the moment.
Happening on 2 NWA1123-AC PRO APs with the same config and latest firmware.

Any ideas?

All Replies

  • Zyxel_Dean
    Zyxel_Dean Posts: 237
     Zyxel Employee
    Hi Markus,

    Could you provide the event logs for the last 12 hours (with no filters applied)? You can either export them or, if you don't have Pro Pack, copy them into an Excel file.

    Aside from the 2 APs you mentioned, are there other APs at your site that are working normally?

  • Logs attached as an .xlsx file, no Pro Pack.
    Unfortunately no other APs in my organisation.
    I also need to emphasize that this was definitely not the case before the FW upgrade applied to the APs yesterday at about 11:30.
  • Zyxel_CSO
    Zyxel_CSO Posts: 337
     Zyxel Employee
    Hey Markus,

    Do you have smart mesh enabled?
    I noticed DFS logs showing detected radar signals. I think you should try avoiding DFS channels first, because regulations require Wi-Fi to stop transmitting on a channel where radar is detected so that it does not cause interference on that frequency.
    Could you configure the APs to avoid DFS channels on the radio settings page?
  • Hi!
    1) No, smart mesh has never been enabled.
    2) Yes, sometimes one of the APs detects DFS radar signals, but this has never caused that strange behaviour before. 

    I have seen that another FW upgrade has been released meanwhile, so for the moment I have updated to the latest FW in order to see if this solves the issue. 
    Setting the APs to avoid DFS channels will be the next step. I will update the thread as soon as I have any further information.

    Is there a changelog for the latest FW version (released after 27 Sept) available?
  • Zyxel_Dean
    Zyxel_Dean Posts: 237
     Zyxel Employee
    edited October 2018
    Hi Markus,

    The cloud-auth logs (the NCAS-related ones) indicate the state of the connection to the authentication server. If you are not using cloud-auth features, you can think of them as a connectivity check.

    As for the system-related ones, the DHCP recovery you mentioned is the AP triggering a DHCP renewal and trying to connect back to NCC. It's a recovery feature that is triggered if the AP cannot establish a connection to NCC for 3 min.
    Usually this relates to internet connectivity issues that leave the AP unable to establish a session back to NCC. But in your case, since I saw some mesh-related logs, you should check your LAN connectivity. The AP tries to find an uplink through mesh if the LAN connection to the gateway is unavailable for 3 min, which means there are occasional LAN issues in the environment.

    My suggestion is to check the LAN first and make sure the uplink to the gateway is valid and stable. We recently had a customer with similar logs and symptoms, which led him to find a faulty cable in the uplink.

    Lastly, the ZON logs are irrelevant; you can just ignore them.
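
To test the LAN side against the 3-minute threshold described above, one option is to collect your own gateway reachability samples (for example from a periodic ping) and look for gaps of that length. The following is only a minimal sketch in Python: the sample format, the function name `find_outages`, and the default threshold are illustrative assumptions, not anything provided by Nebula or the AP firmware.

```python
from datetime import datetime, timedelta


def find_outages(samples, threshold=timedelta(minutes=3)):
    """Return (start, end) spans where the gateway stayed unreachable
    for at least `threshold`.

    `samples` is a time-ordered list of (datetime, reachable) tuples,
    where `reachable` is a bool. This format is an assumption made for
    the example. An outage still ongoing at the last sample is not
    reported, because its end time is unknown.
    """
    outages = []
    down_since = None
    for ts, reachable in samples:
        if not reachable and down_since is None:
            # First failed probe of a new outage.
            down_since = ts
        elif reachable and down_since is not None:
            # Outage ended; keep it only if it was long enough.
            if ts - down_since >= threshold:
                outages.append((down_since, ts))
            down_since = None
    return outages
```

If this finds no gaps of 3 minutes or more during a period in which the AP logged a recovery, that would point away from the LAN and towards the path to NCC.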
  • Re the system-related ones: I have the same issues on two different APs, as you can see from the logs. These are connected to different parts of my LAN and share no cabling segment between themselves and the router. Re internet connectivity: there is definitely no issue at all; I NEVER EVER have any issues connecting to anywhere, including audio and video streaming services etc.

    Apart from that, the issues only appeared after the firmware / Nebula upgrade 2 weeks ago, I have never had any issues like those before. 

    What I can see is that the issues happen rather often for quite a while and then disappear for hours. No other connectivity issues during both periods of time, with or without these recovery issues.

    To be honest, to me this looks a bit like an availability problem of the Nebula / NCC servers ...
  • Zyxel_Dean
    Zyxel_Dean Posts: 237
     Zyxel Employee
    Well, let's exclude things one at a time.
    Let's try turning on the "avoid DFS channels" setting to see whether clients still get disconnected afterwards.

    Some additional questions: what are you using as uplink? Do you have anything that blocks ARP ping to the gateway?
  • Avoiding DFS channels has not changed anything. 
    No ARP ping blocking in place, everything is allowed in my internal network. 
    About one hour ago connection was lost again and recovered about 15 mins afterwards according to my logs. 
    Traffic in my wired network was not affected. 
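
Outage durations like the one described above can be tallied from the event log by pairing each disconnect with the next reconnect. The following is a rough sketch in Python, assuming the events have already been loaded as time-ordered (timestamp, message) pairs and that the message text matches the "NCAS disconnected" / "NCAS connected" examples quoted in the first post. The function name `ncas_outages` is hypothetical, and the column layout of the exported file is not shown in this thread, so reading the .xlsx itself is left out.

```python
from datetime import datetime

# Substrings taken from the log messages quoted in the first post.
DISCONNECTED = "NCAS disconnected"
CONNECTED = "NCAS connected"


def ncas_outages(events):
    """Pair each 'NCAS disconnected' event with the next 'NCAS connected'.

    `events` is a time-ordered iterable of (datetime, message) tuples
    (an assumed format). Returns a list of
    (disconnect_time, reconnect_time, duration) tuples.
    """
    outages = []
    down_at = None
    for ts, message in events:
        if DISCONNECTED in message:
            # Keep the earliest disconnect if several are logged in a row.
            if down_at is None:
                down_at = ts
        elif CONNECTED in message and down_at is not None:
            outages.append((down_at, ts, ts - down_at))
            down_at = None
    return outages
```

Comparing these intervals with what the wired network was doing at the same time makes it easier to see whether the gaps line up with any local problem.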
  • Zyxel_Dean
    Zyxel_Dean Posts: 237
     Zyxel Employee
    The disconnection was probably caused by something on our server at that time; my test kit also got interrupted. It seems to have recovered now.

    I suggest giving the APs static IPs, just to resolve the DHCP logs we're seeing. Maybe this will stabilize management of the APs.

    BTW, do you happen to have a Fritzbox as the uplink/gateway?
  • I think there are some stability / availability issues with your servers, which is what I mentioned above.
    The uplink router is an openSUSE Leap 15 box which runs like a charm, so I do not suspect it as the cause.
    Basically I have not seen any other resets over the past few days.

    The only remaining issue now is the one handled in another thread (not created by myself) regarding "Station: xxx has deauth by STA Leave(L2UPFrame)". All my clients do so from time to time, all types mixed, i.e. Android phones, iPhones, Windows 10 PCs, smart home devices with unknown OS ... I am almost convinced that this has something to do with the current AP firmware, as before I only saw it occasionally when clients roamed from one AP to another. At the moment I see it almost constantly for different clients, even the ones not moving at all ...
    Any ideas on that?
