IPSec with IKEv2: Connection drops when rekey IKE

thom_ko
thom_ko Posts: 16  Freshman Member
First Comment Friend Collector Second Anniversary
edited April 2021 in Security

I configured a VPN connection using this article

As client I'm using the native VPN-Client on the Mac. I'm able to open the VPN tunnel on the Mac and use the connection without any issues. Extactly 8 Minutes (480s) after establishing the connection the Mac request a rekey. This causes the connection to drop.

I'm having the same issue when I connect from my iPhone instead from my Mac. As mentioned in the following link I used the same proposals as for the Mac:

I tried setting the SA-Lifetimes of both Phase 1 as well as Phase 2 to 480s without success; The connection still drops after 8 minutes.

I'm expecting the connection to be established until I disconnect.

I can confirm that I haven't had this issue before. Unfortunately I can't tell wether it got introduced with an update of MacOS/iOS or the router firmware.

--

Router: USG20W-VPN

Firmware-Version: 4.35(ABAR.3)

Client: Mac OS Catalina 10.15.3 or iOS 13.3

«1

All Replies

  • Zyxel_Jerry
    Zyxel_Jerry Posts: 1,324  Zyxel Employee
    Zyxel Certified Network Administrator - Security Zyxel Certified Sales Associate 50 Answers 1000 Comments

    Hi @thom_ko

    Can you help to create a user and private message your certificate for us?

    I would like to build up a tunnel with the device and have a check on it. 

  • warwickt
    warwickt Posts: 111  Ally Member
    5 Answers First Comment Friend Collector Third Anniversary
    edited March 2020

    Hi @thom_ko assuming you are using the native  MAcOS client for IKEv2 , you might consider diagnosis from the  MacOS 10.15 client first by gathering and examining the MacOS logs in detail when the failure.

    Secondly, also setting debug login for the VPN server in the USG20VPNW appliance.

    1)  Mac OS IKEv2 client logging:

    These logs have been buried in the unified log data base since  MAcOS 10.12....

    The logs are numerous and have a great deal of detail if the --info and --debug options are used to get the specific range of logs.

    This may be of assistance.

    Example to get  MacOS IKEV2 nesessionmananger client logs (not racoon !!...)
    • For demonstration purposes, the IKEV2 connection drops out at local time 2020-03-25 at 17:50 +/- for example.
    • you want to example all the MacOS logs before and after this event.
    • open a Terminal.app session and use this format of the MacOS log command
    lab-macpro-14:~ warwick$ log show --start '2020-03-25 17:49:00' --end '2020-03-25 17:51:00' --info --predicate 'senderImagePath contains[cd] "NetworkExtension"'
    

    Optionally redirect it to a .log file...

    lab-macpro-14:~ warwick$ log show --start '2020-03-25 17:49:00' --end '2020-03-25 17:51:00' --info --predicate 'senderImagePath contains[cd] "NetworkExtension"' >> ~/Desktop/ikev2_failure.log
    

    Additionally for even more detail add the --debug operand to the log command for the most verbose detail. Here's an example: .. the parameters are not positional...

    log show --start '2020-03-25 17:49:00' --end '2020-03-25 17:51:00' --info --predicate 'senderImagePath contains[cd] "NetworkExtension"' --debug
    

    2) Gather/Examine ZYxel USG20VPN IKE logs system-logging

    Temporarily set debugging logs in the usg20W-vpn.. IKE DEBUG logs in the system logs.

    You can use the WEB UI Configuration/ Log/ to enable debugging for IKE or better .. use a CLI via ssh ....

    Router# logging system-log category ike level all 
    

    Assuming this error is reproducible, then prepare to gather the logs from the USG20VPNW router when this happens . Care as there is a limit of internals logs and these are purges/over written..

    • ?TIP: You may consider using an external syslog server to catch all router logs..

    When the error occurs use the cli to gather the IKE logs & IKE DEBUG then

    Router# show logging entries category ike 
    

    then gather the IKE DEBUG logs if there any .

    Router# show logging debug entries category ike
    

    You'll need to be quick about it!

    Get these logs on your mac and examine them at and around the time of the issue.

    Look for nebulous issues that may indicate the CERTIFICATE is crook .. sadly these come in all distracting shapes however PHASE 1 is where to look first.

    Lastly:

    1. Remember to set reset the USG20VPNW logging back to normal of completely disable it.
    2. We have noticed that the LOGGING may consume noticeable resources when all enabled a the router is very busy- especially if there are a few tunnels in use AND reasonable data transfers.

    FWIW, our  iOS 13.3.1 device that use IKEv2 User/Password and Certificate maintain a solid connection with several USGXX appliances.

    No data on  MacOS 10.15 Catalina ... (the later internally may be a crock. .)

    I would be interested in your resolution - please post

    HTH

    Warwick

    Hong Kong

  • thom_ko
    thom_ko Posts: 16  Freshman Member
    First Comment Friend Collector Second Anniversary

    Thanks @warwickt for your help in order to debug it.

    I was looking at the logs on the Mac and found the log-entry that causes the disconnection:

    2020-03-28 19:53:05.328339+0100 0x8838  Error   0x0         1003 0  NEIKEv2Provider: (NetworkExtension) [com.apple.networkextension:] IKEv2Session[1, F620146EAF9E4FEB-9D6E364F8B0511ED] last request still outstanding

    But I don't really understand why this happens.

    I attached the full log showing the rekey process as well as the disconnection. Please note that I replaced the Zyxel's URL to myown.domain.tld and the public IP-address to 12.34.56.78.

    Who can assist?

  • warwickt
    warwickt Posts: 111  Ally Member
    5 Answers First Comment Friend Collector Third Anniversary

    Hi thom_ko just a heads up.. thanks for the reply. I am looking at your log from  MacOS's NEIKEv2Provider . Interestingly I have this rekeying also from any MacOS VPN IKEv2 client that happens as you note EVERY Eight (8) Minutes!

    I had not noticed it in our own USG routers nor those of our clients. (oops!) . ?

    for example in the USG appliance this is from a  MAcOS 10.12.5 client (interesting!)

    The rekeying is actually at 8 minute interval rekey is actually deliberately implemented by Apple in the MacOS IKEv2 client end by the looks.

    Mar 30 16:28:06 usg40 src="21X.XX2.14.X0: 4500" dst="XX4.0xx.xx6.x33:44251" msg="Tunnel: Client02_IKEv2_LAB_04_CONNECTI(#9) rekeying" note="IKE_LOG" user="unknown" devID="1c740dfec31c" cat="IKE"
    Mar 30 16:28:06 usg40 src="21X.XX2.14.X0: 4500" dst="0.0.0.0:44251" msg="[Rekey][In]Tunnel[Client02_IKEv2_LAB_04_CONNECTI](#9), lifetime; 28820, lifebytes: 4294967295" note="IPSec" user="unknown" devID="1c740dfec31c" cat="IPSec"
    Mar 30 16:28:06 usg40 src="0.0.0.0: 0" dst="0.0.0.0:0" msg="IPsec SA [Responder, rekey, NAT-T, tunnel] negotiation completed:" note="IKE_LOG" user="unknown" devID="1c740dfec31c" cat="IKE"
    Mar 30 16:28:06 usg40 src="21X.XX2.14.X0: 4500" dst="0.0.0.0:44251" msg="[Rekey][In]Tunnel[Client02_IKEv2_LAB_04_CONNECTI](#9), lifetime: 28820, time:1585556886, hard_expire: 1585585706" note="IPSec" user="unknown" devID="1c740dfec31c" cat="IPSec"
    Mar 30 16:28:06 usg40 src="0.0.0.0: 0" dst="0.0.0.0:0" msg="This rule is a rekey (new: #9, old: #9)" note="IPSec" user="unknown" devID="1c740dfec31c" cat="IPSec"
    Mar 30 16:28:06 usg40 src="21X.XX2.14.X0: 4500" dst="XX4.0xx.xx6.x33:4500" msg="Dynamic Tunnel [Client02_IKEv2_LAB_04_GATEWAY:Client02_IKEv2_LAB_04_CONNECTI:0x0ee4acbf] rekeyed successfully" note="IKE_LOG" user="unknown" devID="1c740dfec31c" cat="IKE"
    

    Further checking on the credible Apple resources points to a code "flaw" or some such in  MacOS since 10.11 .... who knows.. however it looks like "by design"..


    the MacOS VPN clients have/had/do have this symptom

    The Cause?: the problem could be a mis-configuration at phase2 IKEV2 in USG that MacOS VPN clients and their VPN client brethren utilise incorrectly.


    Your DH14 (mod2048) is correct as MacOS and iOS offer this in the client.

    What's your

    Phase2 (crypto-map connection) using for PFS ?... just curious (refer below)

    Your Log: I'm not an expert and may have incorrectly interpreted the following ..

    1) however I noticed this reply that found an existing (security Association id) SAID (7) that replies the Security Association is known... the id (SPI) of this is x'05F2622A' for SAID 7 which matches SA I understand that this should be sufficient to rekey on the existing ID. to continue normally ..

    see next line of the  MacOS debug log below

    2020-03-28 19:53:05.327272+0100 0x8838  Info    0x0         1003 0  NEIKEv2Provider: (NetworkExtension) [com.apple.networkextension:Large] <NEIPSecDB 0x7fae68410f80 [0x7fff8ed20690]> {UniqueIndex = 1} KernelSASession[1, IKEv2 Session Database] received notification 0x2200  
    


    2) Here is the existing Child SA (7) for this that are interested...

     Internal SAID = 7
      SPI = 05F2622A
      Direction = Inbound
      Mode = Tunnel
      Protocol = ESP
      Local = 192.168.8.100:4500
      Remote = 12.34.56.78:4500
      Bound Interface = en6
      Tunnel Interface = ipsec0
      Replay Window Size = 4
      Lifetime Seconds = 600
      Encryption Algorithm = 5
      Encryption Key Length = 32
      Authentication Algorithm = 3
      Authentication Key Length = 32
      NAT Traversal Enabled = YES
      NAT Detected on Peer = NO
      NAT Keepalive Enabled = NO
      NAT Keepalive Offload Enabled = NO
      NAT Keepalive Interval Seconds = 0
      NAT Keepalive Offload Interval Seconds = 0
      NAT Traversal Port = 4500
      NAT Traversal Source Port = 4500
      Sequence Per Traffic Class = NO
    

    3) However the next lot of lines request ANOTHER child SA by EXPIRING the current SAID 7 then it fails for what looks like a time out (as you point out)

    2020-03-28 19:53:05.327466+0100 0x8838  Default  0x0         1003 0  NEIKEv2Provider: (NetworkExtension) [com.apple.networkextension:Large] IKEv2Session[1, F620146EAF9E4FEB-9D6E364F8B0511ED] KernelSASession[1, IKEv2 Session Database] expire SA 
    

    Interesting list SAID7 EXPIRE request contains Sequence Per Traffic Class = NO childSA ChildSA[1, 05F2622A-CB4205AD]

    Again unsure, one may postulate the "NO childSA .." that the proposal for renewal did not match or it found the existing SAID 7 05F2622A corrupt?? (dunno)

    4) then it goes to get a replacement .. ... and timeout.. and boom..!


    What to do? - get some more diags..

    Would you be able to rerun this scenario and post the following for the error?

    • the new  MacOS client logs as you did above for the test
    • your Zyxel USG appliance's "IKE" logs for the event for the test.
      • enable VPN debug in the USG system logs ...
      • Router> show logging entries category ike
    • your VPN's ike GATEWAY and its crypto-map CONNECTION for your IKEv2 VPN connection your are using:
      • example:
        • Router> show ike policy Client02_IKEv2_LAB_04_GATEWAY
        • Router> show crypto map Client02_IKEv2_LAB_04_CONNECTI

    I am curious what cases this...

    Heres an example of a successful rekey from the DEBUG VPN logs in the USG router.

    Mar 30 20:39:46 myrouter src="0.0.0.0: 0" dst="0.0.0.0:0" msg="IKEv2 SA [Responder rekey, NAT-T] negotiation completed:" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    Mar 30 20:39:46 myrouter src="0.0.0.0: 0" dst="0.0.0.0:0" msg=" Local Authentication Method : RSA signature" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    Mar 30 20:39:46 myrouter src="0.0.0.0: 0" dst="0.0.0.0:0" msg=" Remote Authentication Method : (null)" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    Mar 30 20:39:46 myrouter src="0.0.0.0: 0" dst="0.0.0.0:0" msg=" IKE algorithms : aes256-cbc, hmac-sha256, hmac-sha256-128" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    Mar 30 20:39:46 myrouter src="0.0.0.0: 0" dst="0.0.0.0:0" msg=" Diffie-Hellman : group 14 (2048 bits)" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    Mar 30 20:39:46 myrouter src="0.0.0.0: 0" dst="0.0.0.0:0" msg=" Local IKE peer 21X.XX2.14.X0:4500 ID lab_04_somewhere.ddns.int (fqdn)" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    Mar 30 20:39:46 myrouter src="0.0.0.0: 0" dst="0.0.0.0:0" msg=" Remote IKE peer x1x.xxx.156.xx3:44738 ID 10.183.185.72 (ipv4)" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    Mar 30 20:39:46 myrouter src="0.0.0.0: 0" dst="0.0.0.0:0" msg=" Authenticated EAP identity warwick (unknown)" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    Mar 30 20:39:46 myrouter src="0.0.0.0: 0" dst="0.0.0.0:0" msg=" Initiator SPI 4fa097f9 bf9fa5ee Responder SPI bc51d12e ff0fa9f7" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    Mar 30 20:39:46 myrouter src="0.0.0.0: 0" dst="0.0.0.0:0" msg=" Local Lifetime: 86400 seconds" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    Mar 30 20:39:46 myrouter src="0.0.0.0: 0" dst="0.0.0.0:0" msg="IKE SA negotiations: 18 done, 17 successful, 1 failed" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    Mar 30 20:39:46 myrouter src="21X.XX2.14.X0: 4500" dst="x1x.xxx.156.xx3:4500" msg="The cookie pair is : 0xa563edf4189a87a7 / 0x68e94ca915d884c0" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    Mar 30 20:39:46 myrouter src="21X.XX2.14.X0: 4500" dst="x1x.xxx.156.xx3:4500" msg="[REKEY] Send:[SAr2][NONCE][KE][NOTIFY]" note="IKE_LOG" user="unknown" devID="1c74xxxcxx1c" cat="IKE"
    

    Maybe the lads at Zyxel can help.

    HTH

    Warwick

    Hong Kong

  • thom_ko
    thom_ko Posts: 16  Freshman Member
    First Comment Friend Collector Second Anniversary

    Hi Warwick,

    thanks for your hints.

    Update

    When I was creating the initial certificate for the USG 20, I added the following information to the certificate:

    • Host Domain Name (which acts as CN and Subject Alternativ Name)
    • Town
    • State
    • Country

    This leaded the connection to be dropped during the first rekey - process, which happens after 8 Mins (480s).

    I then recreated the certificate, by just providing the follwing information:

    • Host Domain Name (which acts as CN and Subject Alternativ Name)

    Meaning I was leaving all optional information aside.

    Now the first time the rekey is happening after 8 mins seems to work, but the second time (after 16mins) the connection get's dropped again.

    This means I now can provide logs with a successful rekey process and a failed one. I think that should help finding the mistake.

    Logs

    Timestamps
    • Connection establishment: 05.04.2020 17:34:43 (t0)
    • Successful rekey: 05.04.2020 17:42:43 (t1)
    • Failed rekey: 05.04.2020 17:50:43 (t2)
    Remarks

    I modified the logs and changed the following information:

    • Public Dyndns URL: myown.domain.tld
    • Public IP: 12.34.56.78
    • Username of VPN-User: myvpnuser (in group: vpn-users)
    Logfiles

    VPN-Gateway / VPN Connection configuration

    MacOS Log

    Zyxel USG20-VPN VPN-Log

    @Zyxel_Jerry Could you please have a look at these logs and let me know if you need more information?

    I'm interested in having this solved....

  • warwickt
    warwickt Posts: 111  Ally Member
    5 Answers First Comment Friend Collector Third Anniversary

    Hi thom_ko just a few things from your logs and your USG ike and crypto configs:

    Log Analysis:

    From  MacOS logs at :
    2020-04-05 17:42:47.818353+0200 0x716f  Error   0x0         925  0  NEIKEv2Provider: (NetworkExtension) [com.apple.networkextension:] IKEv2Session[1, 7AC75DFD4F70F186-D92C130D731B2A3D] Failed to process Delete packet
    
    From USG IKE/IPSEC logs zyxelusg-vpn.rtf:
    ## Tunnel is SUCCESSFULLY rekeyed! 
    2020-04-05 17:42:47,12.34.56.78:4500,213.55.221.76:4500,info,ike ,IKE_LOG ,,,,Dynamic Tunnel [IKEv2:IKEv2:0x0298769d] rekeyed successfully
    
    ## then   ........
    
    2020-04-05 17:42:59,12.34.56.78:4500 ,213.55.221.76:20628 ,debug ,ike,IKE_LOG ,, ,, Received inbound invalidated event for tunnel[IKEv2(#3):IKEv2(#5)], SPI[0x9a7007d8], trd_idx: 16777238(0x01000016)
    
    Interestingly:

    The REYKEY is successful according to your logs on the USG - see above...

    .The MacOS IKE2 agent seems to be failed and tried to issue yet another SA with the usual credentials including IP addresses) and is gets whacked by the router..


    Maybe Worth a Look: (try it anyway) - USG Connection change...

    I noticed your the local policy for your Connection "crypto map IKEv2" is "LAN1_SUBNET", (grep -i "local policy" configuration.txt)

    ##Connection : "crypto map IKEv2"
    local policy: LAN1_SUBNET
    

    Because this is a client-to-site scenario , You might consider trying an "any IPV4" scenario Object such as: "all_ipaddresses for the Local Policy 0.0.0.0/24

    • subnet = 0.0.0.0/24
    • or range 0.0.0.0.-255.255.255.255
    Router> show address-object all_ipaddresses
    Object name           Type        Address             
    Note      Ref.
    ===============================================================================
    all_ipaddresses         SUBNET       0.0.0.0/24
            0
    Router>
    

    I have attached the section in your USG logs and coloured them where I found this. (unzip at your leisure..)


    Let me know how you go as I'm interested in see these resolutions as their problems are most abundant presently.

    HTH

    Warwick

    Hong Kong

  • thom_ko
    thom_ko Posts: 16  Freshman Member
    First Comment Friend Collector Second Anniversary

    Hi Warwick,

    thanks for your analizey. Altough I'm not sure if you're analyzed the right part of the log. Like I mentioned there's a successful rekey - procedure in the log, which can be found at 17:42:43.

    The problem happened 8 mins later at around 17:50:43 during the second rekey - procedure where the connection got dropped.

    USG-Connection Change

    You're right: LAN1_SUBNET is mapped to the following range 192.168.11.0 - 192.168.11.255.

    Like you suggested: I changed the Local-Policy in order to have all traffic routed through the VPN - without success. The problem remains the same. I didn't collect any logs this time tough.....

  • Zyxel_Jerry
    Zyxel_Jerry Posts: 1,324  Zyxel Employee
    Zyxel Certified Network Administrator - Security Zyxel Certified Sales Associate 50 Answers 1000 Comments

    Hi @thom_ko

    We saw the same symptom when building IKEv2 connection from Apple devices and found that it could be related to Apple IKEv2 behavior. Here it is the explanation for your reference.

    Or you may consider to implement L2TP instead of IKEv2 on Apple device?

    We’ll try to find a solution to get over the behavior on APPLE.

  • thom_ko
    thom_ko Posts: 16  Freshman Member
    First Comment Friend Collector Second Anniversary

    Hi Jerry,

    thanks for getting back to me. I looked at the link you provided and I'm pretty sure that this isn't the problem. If you're looking at the configuration file I posted on the 5. April you'll see that PFS is disabled in phase 2. Exactly like the response from Apple in the link posted suggested.

    I initially had an IKEv2 - Setup (with PSK, instead of Certificate combined with Username/Password which I'm using now) working from iOS and Mac OS to the USG20 for many years without problems. So it used to work! I started to have the connection drops after 8 mins after updating MacOS/iOS and/or the USG20.

    I went through my logs which I posted on the 5. april again:

    Mac OS Log: Successful rekey:

    2020-04-05 17:42:46.149099 MacOS is sending CREATE_CHILD_SA with Rekey SPI = 7AC75DFD4F70F186 (Message 7)
    2020-04-05 17:42:47.151237 MacOS retries sending the above Message 7
    
    2020-04-05 17:42:47.623052 MacOS is receiving reply to Message 7 from ZyxelUSG
    
    2020-04-05 17:42:47.647979 MacOS is receiving retransmitted reply to Message 7 from Zyxel USG
    

    As MacOS is not getting a reply from the USG it decides to resend the CREATE_CHILD_SA message after 1 second.

    About 500ms later the reply is received and processed. About 20ms later MacOS is also receiving the retransmitted reply.

    As we see the retransmission of the request as well as the retransmission of the reply we must assume the the USG is receiving both of the requests and that it takes about 1.5s to reply to the request. Let's keep that in mind for the failed case.

    Mac OS Log: Failed rekey:

    2020-04-05 17:50:47.720484 MacOS is sending CREATE_CHILD_SA with Rekey SPI = 4F1F5955088E4C12 (Message 3)
    2020-04-05 17:50:48.206174 Received PFKey Message associated with DB...
    2020-04-05 17:50:48.xxxxxx Multiple messages about "IKEv2 Session Database" 
    2020-04-05 17:50:48.212862 MacOS reports "last request still outstanding"
    --> The connection get's dropped!!!!!!!
    

    We can see that the order of dthe debug output is definitely different than it was in the successful case. The messages about PFKey and IKEv2 Session Database were also present in the successful case. But they appeared after the reply for the CREAE_CHILD_SA message had been received. I'm asking myself if there's a problem with the message ordering. By knowing that it takes approx 1.5secons for the USG to reply to the CREATE_CHILD_SA message, we notice that in this case 500ms after sending the CREATE_CHILD_SA message we get the report about last request still outstanding. We don't see any retries or anything.

    In the USG's log file I can see that the CREATE_CHILD_SA-request has been received....

    I'm not a pro when it comes to debugging VPN's. But I hope that somebody from Zyxel can look deeper in it by using this analysis.....

  • Zyxel_Jerry
    Zyxel_Jerry Posts: 1,324  Zyxel Employee
    Zyxel Certified Network Administrator - Security Zyxel Certified Sales Associate 50 Answers 1000 Comments

    Hi @thom_ko

    Yes, I’ve checked your configuration file on Apr.5 .

    I have setup the same configuration as yours and set PFS to none on phase2,

    Just setup Host Domain Name on certificate, however after 8 minutes the connection drop.

    As your previous mentioned, it works fine before IOS/MAC OS upgrade,

    I think their might be some behavior changed after the upgrade on IOS/MAC OS.

    We’ll look into this case and try to find a workaround on this.