NAS326 Clean CLI Shutdown

Duracell
Duracell Posts: 13  Freshman Member
edited April 2019 in Personal Cloud Storage
I have NAS326 Firmware V5.21 (AAZF.3).  On this I run a small job from a USB stick that monitors an IP address range and, if no boxes are present for 30 minutes, tries to close the server down.

This worked fine on my NAS325v2 using the "poweroff" command. On the NAS326, executing poweroff leaves the unit idle and inaccessible with all LEDs lit. On restart it does a full disc resync of a 6 TB RAID! So the box isn't closing cleanly.

Does anybody know which command(s) to use to close the box cleanly?

I've found the halt command, the halt_wrapper script (which calls halt) and the /etc/init.d/rc.shutdown script (but does this perform the shutdown, or just the preparatory work?).

Intriguingly, I note from the web interface scripts that there is a reference to a shutdown command, "/ck6fup6/system.main/shutdown", that is part of the zysh command set. BUT /bin/zysh is a link to /sbin/zyshclient, which does not appear to exist. I did a su && find . -name zyshclient but it found nowt :-(

Any suggestions would be welcome. The problem for me is that a 6 TB resync takes 12 hours! So "playing" can be a tad tiresome.


Accepted Solution

  • Duracell
    Duracell Posts: 13  Freshman Member
    Answer ✓
    Mijzelf
    OK, tested your code and you won't be surprised that on the 326 it worked super. Still miffed, as my code should have worked, so the 326 has a sticky USB dongle! I'll keep digging :-)

All Replies

  • Mijzelf
    Mijzelf Posts: 2,815  Guru Member
    250 Answers 2500 Comments Friend Collector Seventh Anniversary
    Does the resync problem not pop up if you shut down the box from the web interface?

    Both poweroff and halt are supposed to close the box cleanly. They both trigger /sbin/init to shut down, which executes /etc/init.d/rc.shutdown and, when that has completed, tells the kernel to halt or power off. (On the 325 that boils down to the same action: powering off the box. I don't know for sure for the 326.)
    If your RAID array gets damaged by that action, then I think the power is cut before the disks have flushed their caches.
    In that case it could help to add a sleep to the end of rc.shutdown. If you have serial access, you can check whether rc.shutdown spits out any errors. If it fails to unmount the RAID array, the array will not be flushed. Such a failure can be caused by open files, and open files can be caused by processes which for some reason resist being killed.
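
A hedged sketch of how the "open files" theory could be checked before shutting down. The helper below (the name holders is made up for this example) walks /proc and prints every process that has a file open, or its working directory, under a given mount point. It assumes a shell with readlink and a kernel that provides /proc/PID/comm; where lsof or fuser are installed, they do the same job.

```shell
#!/bin/sh
# holders DIR: print "PID COMMAND PATH" for every open file or working
# directory located under DIR. Uses only /proc and readlink.
holders() {
    dir=${1%/}
    for link in /proc/[0-9]*/fd/* /proc/[0-9]*/cwd; do
        # Entries can vanish while we iterate; skip anything unreadable.
        target=$(readlink "$link" 2>/dev/null) || continue
        case "$target" in
            "$dir"|"$dir"/*)
                pid=${link#/proc/}
                pid=${pid%%/*}
                name=$(cat "/proc/$pid/comm" 2>/dev/null)
                echo "$pid $name $target"
                ;;
        esac
    done
}
```

Run as root, for example as holders /i-data/0f532478 (substitute your own volume path). Anything it lists would have to exit or be killed before umount can succeed.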

  • Duracell
    Duracell Posts: 13  Freshman Member
    edited April 2019
    Mijzelf: Thanks for your comments; most interesting.
    Yup, the web interface doesn't appear to cause a resync (or hasn't yet). So I think that the poweroff is certainly not closing cleanly, and it's something to do with my code or the USB stick inserted.
    What I've got installed is a usb_key_func.sh script that starts a pseudo daemon. Simply a loop checking (ping) an IP address range. If none of the pings return true then, after 15 minutes (so the box should be stable), it issues a poweroff command.
    The usb_key_func.sh script simply calls a second script that writes the pseudo daemon script to /tmp. This is then executed in the background while the full fs mount takes place. The script is running as root.
    It's all based around the Entware-ng-stick idea so I thought I'd be on solid ground :-) And it works fine on a NAS325v2. But not the NAS326.
    I think the sleep idea sounds interesting. Certainly, giving the box more time to clear down seems sensible. And since it is an inactivity shutdown, giving it 2 minutes to do its business is no loss to me!
    I also noted, looking through the Zyxel scripts, that they call halt, so maybe I'll try that as well.

    Cheers again

  • Duracell
    Duracell Posts: 13  Freshman Member
    OK, so I changed my "daemon" script to call "halt &" instead of "poweroff" and then exited the script with "exit 0". I also added a "sleep 60" to rc.shutdown.
    The result was that, after the 15 minutes of inactivity being checked for, the script executed "halt &" and "exit 0" and left the box idle with all LEDs steady. On reboot it started a resync!!!
    Prior to my edits I tested the box WITHOUT the USB stick containing usb_key_func.sh, executing an "ssh root@192.168.11.10 halt" command, and this closed the box down without issue.

    Driving me potty this one!
  • Mijzelf
    Mijzelf Posts: 2,815  Guru Member
    "with all LEDs steady."

    I overlooked that gem in your first post. If the LEDs are still on, apparently the box fails to power off. Are these LEDs all on when you start the shutdown? Could it be that the box is simply hanging in rc.shutdown?

    On my 520, rc.shutdown basically stops the packages, kills system daemons (smb, ftp, ...), unmounts everything, and sets the NICs to 10 Mbit.

    The unmounting is done in the order in which /proc/mounts lists the mounts. Is your USB stick listed before the RAID array? In that case the array is still mounted if unmounting the stick fails for some reason.
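
A small sketch for answering the ordering question, assuming a shell with awk. The helper name mount_order is invented for this example; it prints the line number of every disk-backed entry (sd, md and device-mapper nodes) so the relative position of stick and array is easy to compare.

```shell
#!/bin/sh
# mount_order [FILE]: print "LINE: DEVICE -> MOUNTPOINT" for each entry of
# FILE (default /proc/mounts) whose device is an sd, md or mapper node.
mount_order() {
    awk '$1 ~ /^\/dev\/(sd|md|mapper\/)/ { print NR ": " $1 " -> " $2 }' \
        "${1:-/proc/mounts}"
}
```

Calling mount_order without arguments on the box reads the live /proc/mounts; a lower line number means the entry comes earlier in the list.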

    Can you unmount the stick manually? Can you share the script in usb_key_func.sh?

    "BUT /bin/zysh is a link to /sbin/zyshclient which does not appear to exist?"

    Are you sure about that? On my 520 it is available. But it's one of the files which at boot is copied from /ram_bin/sbin/ to /sbin/. (And /ram_bin/ is the mountpoint of a big file which is located on md0.) This happens after usb_key_func.sh has run, so theoretically that script can prevent this from happening.


  • Duracell
    Duracell Posts: 13  Freshman Member
    edited April 2019
    Mijzelf
    Thanks again. Sensible comments again.
    Yup, all lights on and steady, but I'm unable to access the box as the services are closed (ssh, HTTP, telnet!). Not even ping responds, which means I have to use the power button. So yup, it's not a clean close.
    I did some tests and if I initiate a halt via a telnet script from my PC, "(echo telnet, echo user, echo pass, echo su, echo pass, echo halt ) | telnet", it closes fine!
    This leads me to believe that, like you say, that pesky USB stick is causing this?
    I checked for the zyshclient again. No usb inserted and it's not there! Curious as the symbolic link from /bin/zysh IS still present? Couldn't find the file anywhere? But it IS present on my NAS325v2!
    I checked my /proc/mounts:
    /proc $ cat mounts
    rootfs / rootfs rw 0 0
    /proc /proc proc rw,relatime 0 0
    /sys /sys sysfs rw,relatime 0 0
    devpts /dev/pts devpts rw,relatime,mode=600 0 0
    ubi6:ubi_rootfs2 /firmware/mnt/nand ubifs ro,relatime 0 0
    /dev/md0 /firmware/mnt/sysdisk ext4 ro,relatime,data=ordered 0 0
    /dev/loop0 /ram_bin ext2 ro,relatime 0 0
    /dev/loop0 /usr ext2 ro,relatime 0 0
    /dev/loop0 /lib/security ext2 ro,relatime 0 0
    /dev/loop0 /lib/modules ext2 ro,relatime 0 0
    /dev/loop0 /lib/locale ext2 ro,relatime 0 0
    /dev/ram0 /tmp/tmpfs tmpfs rw,relatime,size=5120k 0 0
    /dev/ram0 /usr/local/etc tmpfs rw,relatime,size=5120k 0 0
    ubi2:ubi_config /etc/zyxel ubifs rw,relatime 0 0
    /dev/mapper/vg_29611905-lv_0f532478 /i-data/0f532478 ext4 rw,noatime,quota,usrquota,stripe=16,data=ordered 0 0
    /dev/mapper/vg_29611905-vg_info_area /mnt/vg_info_area/vg_29611905 ext4 rw,relatime,stripe=16,data=ordered 0 0
    /dev/mapper/vg_29611905-lv_0f532478 /usr/local/apache/htdocs/desktop,/pkg ext4 rw,noatime,quota,usrquota,stripe=16,data=ordered 0 0
    /dev/mapper/vg_29611905-lv_0f532478 /usr/local/mysql ext4 rw,noatime,quota,usrquota,stripe=16,data=ordered 0 0
    /dev/sdb1 /e-data/5224b721e353f0dc5c4cb66897165fc4 vfat rw,relatime,uid=99,fmask=0000,dmask=0000,allow_utime=0022,codepage=437,iocharset=utf8,shortname=mixed,errors=continue 0 0
    configfs /sys/kernel/config configfs rw,relatime 0 0
    There is one difference with the USB stick in: the /dev/sdb1 entry, second from bottom. I would have thought the umount of i-data would have removed the RAID partitions?

    And just to be complete our rc.shutdowns sound similar:
    /etc/init.d # cat rc.shutdown
    #!/bin/sh
    . /etc/profile
    echo -e "\033[033m- `basename $0` start -\033[0m"
    # set system LED to fast blink green
    /sbin/setLED SYS WHITE FAST_BLINK
    # shutdown all zypkgs
    /etc/init.d/zypkg_controller.sh stop
    /etc/init.d/zypkg_controller.sh release_env
    # try to umount FS without block waiting
    /bin/umount -f -a -t nfs,smbfs,cifs &
    sleep 1
    # pwr_resume
    if [ ! -e /etc/zyxel/storage/pwron.status ]; then
            /sbin/i2cset -y 0x0 0x0a 0x0a 0x0007 w
    #       pwr_resume disable
    fi
    # clear the AF flag of rtc
    #/sbin/rtcAccess clearAF        # no this option in STG-328
    # stop service
    #not to run/mount again
    /bin/killall -9 app_wd myhotplug 2>/dev/null
    #ftp,samba
    /bin/killall pure-ftpd smbd nmbd zylogger 2>/dev/null
    #kill
    other=`ps|grep -Ev "\[.*\]"|grep -v init|grep -v "\-sh"|grep -v ps|grep -v PID|grep -v ${0##/*/}|grep -v grep`
    echo "start kill." > /dev/console
    echo "${other}" > /dev/console
    kill `echo "${other}"|awk '{print $1}'`
    sleep 5
    #kill -9
    other=`ps|grep -Ev "\[.*\]"|grep -v init|grep -v "\-sh"|grep -v ps|grep -v PID|grep -v ${0##/*/}|grep -v grep`
    echo -e "\n ${other}" > /dev/console
    kill -9 `echo "${other}"|awk '{print $1}'` 2>/dev/null
    sleep 5
    #backup startup-config.conf
    #cp /etc/zyxel/conf/startup-config.conf /etc/zyxel/conf/startup-config.conf.bk
    #for iscsi
    [ -e "$(which target.init)" ] && target.init stop
    # swapoff
    /sbin/swapoff `cat /proc/swaps|awk '{print $1}'|grep -v Filename`
    # umount
    /bin/umount `cat /proc/mounts|grep /dev/md|awk '{print $2}'` 2>/dev/null
    /bin/umount `cat /proc/mounts|grep /dev/sd|awk '{print $2}'` 2>/dev/null
    /bin/umount `cat /proc/mounts | grep "/dev/mapper/vg_" | awk '{print $2}'` 2>/dev/null
    /usr/sbin/vgchange -an
    /bin/umount /etc/zyxel
    mdadm -Ss
    # debug info
    cat /proc/mounts > /dev/console
    # umount config/rootfs partitions
    /bin/umount `cat /proc/mounts|grep ubi|awk '{print $2}'` 2>/dev/null
    ubidetach -m ${CONFIG_MTD_NUM}
    #umount /firmware/mnt/nand
    ubidetach -m `${INFO_PRINTENV} sysimg_mtd_1 | awk -F= '{print $2}'`
    ubidetach -m `${INFO_PRINTENV} sysimg_mtd_2 | awk -F= '{print $2}'`
    # set PHY speed to 10M to reduce power consuming
    ethtool -s egiga0 autoneg off speed 10 duplex half
    #ethtool -s egiga1 autoneg off speed 10 duplex half             # no egiga1 in STG-328
    ethtool egiga0
    #ethtool egiga1         # no egiga1 in STG-328
    echo -e "\033[033m- `basename $0` end -\033[0m"

    Finally, even with the stick installed a BUTTON shutdown is fine ... so the script to close is the issue?
  • Duracell
    Duracell Posts: 13  Freshman Member
    Mijzelf
    I'll be candid and post the script that the usb_key_func.sh starts:
    #!/bin/sh
    #
    # This is the USER executeable script for the universal_usb_key_func
    # autorun script facility on Zyxel NAS server.
    # It is called by usb_key_func.sh and starts user Sleep On Idle Daemon.
    #
    #
    SCRIPTVERSION="20190225"
    #
    ########################################################################
    # Functions
    payload()
    {
        # This is the work load section
        # It will be called by stage 2 from the copy of this script running in background
        # It is this function that starts SOID
        touch /tmp/usbscript.sh.log
        echo $(date) usbscript.sh--STARTS.IN.TMP > /tmp/usbscript.sh.log
        /tmp/soid &
        echo $(date) usbscript.sh--ENDS.IN.TMP >> /tmp/usbscript.sh.log
    }
    poll()
    {
        # Wait for the harddisk to be mounted
        while [ 1 ]; do
            if cat /proc/mounts | grep /dev/md0 ; then
                return 0
            fi
            sleep 5
        done
    }
    createsoid()
    {
        # This function sets up the SOID daemon in the tmp directory
        touch /tmp/soid
        chmod +x /tmp/soid
        cat > /tmp/soid << EOF
    #!/bin/sh
    # Note on initial startup the clocks have yet to be synchronised and
    # carry the wrong time. So sleep 5 minutes for clocks to correct
    sleep 300
    touch /tmp/soid.log
    echo \$(date) SOID_RUNNING > /tmp/soid.log
    # Variables
    STARTTIME=\$(date +%s)
    ENDTIME=\$(date +%s)
    DIFF=0
    NETIP="192.168.11."
    FULLIP="0"
    # Just loop while running
    while true; do
        DEVICEON=0
        HOSTADDR=30
    # Loop through ips       
        while [ \$HOSTADDR -lt 60 ]; do
            FULLIP=\$NETIP\$HOSTADDR
            if [ "\` ping -c 2 \$FULLIP | grep "2 packets received" \`" ]; then
                echo \$(date) SOID_FOUND: \$FULLIP >> /tmp/soid.log
                DEVICEON=1
                STARTTIME=\$(date +%s)
                ENDTIME=\$(date +%s)
            fi
            HOSTADDR=\$((HOSTADDR+1))
        done
    # Test if anything there
        if [ \$DEVICEON -eq 0 ]; then
            echo \$(date) SOID_NODEVICE >> /tmp/soid.log
            ENDTIME=\$(date +%s)
        fi
        DIFF=\$((\$ENDTIME-\$STARTTIME))
        if [ \$DIFF -gt 900 ]; then
            halt &
            exit 0
        fi
    # 5 minutes wait to loop   
        sleep 300
    done
    EOF
    }
    StickMain()
    {
        # This is the main function
        # Stage 1 creates the SOID Daemon and makes a copy of this script so OS loading can be continued
        # The copy will be started in background and will initiate the SOID daemon
        local command=$1
        shift
       
        case "$command" in
        stage1)
            # leave a trace
            echo usb_key_func_RUNS > /mnt/parnerkey/usb_key_func.log
            # Create soid
            createsoid
            # Copy script to /tmp
            cp $0 /tmp/usbscript.sh
            # Start it in background
            /tmp/usbscript.sh stage2 >/dev/null 2>&1 &
            # continue /etc/init.d/rcS
            exit 1
            ;;
        stage2)
            poll
            payload
            ;;
        *)
            echo "This script is for internal use of the SOID daemon"
            ;;   
        esac       
    }
    StickMain "$@"

    A telnet session halt, an ssh root@Nas halt, a (echo blah blah) | telnet and a button push all close OK. HOLD ON .... just did a button close with the USB stick in and it's hung!!! But no resync after switching the power off! So the i-data volumes were down by then.
    This is getting weird. I might try another USB stick to see if that does the same?
  • Mijzelf
    Mijzelf Posts: 2,815  Guru Member
    "Not even ping responds"

    I *think* this means it's not a hanging umount. Ping is handled at kernel level, or even in the NIC itself, depending on hardware capabilities. So it's not dependent on a daemon. A kernel panic?

    "This is getting weird - I might try another USB stick to see if that does the same?"

    It won't hurt.

    I don't see anything in your script which can cause this strange behaviour. I see no open files left on the stick.

    Your stick is /dev/sdb. Isn't that strange? I assume you have 2 disks, so those are sda and sdc? Normally a stick in a 2-disk system is either sda or sdc, depending on which bus is scanned first.

    Are both disks healthy, according to smartctl?





  • Duracell
    Duracell Posts: 13  Freshman Member
    Mijzelf
    Checked the SMART info and there are no reported problems. Pretty standard, with both stated as good.
    I didn't see the sdb1 for the USB stick (getting old!). So I did a little test and put the stick in after boot ... it came up as sdc1? There is only one stick in at a time. I tried all three USB slots but it was always sdc1. Odd.
    Give me a day or two and I'll try another stick to see what happens.

  • Mijzelf
    Mijzelf Posts: 2,815  Guru Member
    "I tried all three usb slots but always sdc1. Odd."

    No. It's to be expected. By default a new SCSI device gets the first available free sd node. (And all SATA disks and USB mass storage devices use some SCSI compatibility layer.)

    If you remove a stick the node is freed, and when you plug it in again the node is reused. It doesn't matter which port you connect it to.

    I was surprised that the two hard disks didn't have contiguous nodes, as I would expect from the timing at boot.
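
To see which sd node belongs to which kind of device, one option is to look at where the /sys/block entries point. This is only a sketch: the function name sd_bus is made up here, and it assumes a kernel where /sys/block/sdX is a symlink into /sys/devices whose path contains a usbN component for USB mass storage.

```shell
#!/bin/sh
# sd_bus [DIR]: for each sd* entry in DIR (default /sys/block), print the
# node name followed by "usb" if its device path runs through a USB bus,
# otherwise "other" (for example a SATA port).
sd_bus() {
    for link in "${1:-/sys/block}"/sd*; do
        # Skip the unexpanded pattern when no sd* entry exists.
        [ -e "$link" ] || [ -L "$link" ] || continue
        case "$(readlink "$link")" in
            */usb*) echo "${link##*/} usb" ;;
            *)      echo "${link##*/} other" ;;
        esac
    done
}
```

Running sd_bus once with the stick inserted at boot and once with it inserted afterwards would show which node each disk and the stick received in both cases.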




  • Duracell
    Duracell Posts: 13  Freshman Member
    edited April 2019
    Well, I'm stumped. Tried another USB stick and got the same result. The box was left with all LEDs lit but no way (telnet, http, etc.) to access it. Obviously the services had closed but the box did not fully shut down.
    So it was closed by holding the power button until it died. Then, of course, on reboot the discs started to resync.
    I can still close the box successfully via a telnet session calling "halt"? So something isn't unmounting. But as the box rewrites its file store, I can't think how to get logs saved so I can see what is happening.
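
One possible way to keep a trace across the hang, offered as an untested sketch rather than a known fix. The /proc/mounts listing earlier in the thread shows /etc/zyxel as a read-write flash partition that rc.shutdown unmounts late, so markers written there should survive a forced power-off. The function name trace and the file name shutdown-trace.log are invented for this example.

```shell
#!/bin/sh
# Hypothetical log location; /etc/zyxel is assumed to be persistent flash.
TRACE_LOG=${TRACE_LOG:-/etc/zyxel/shutdown-trace.log}

# trace MESSAGE: append a timestamped marker and the current mount table,
# then flush buffers so the entry survives a hard power-off.
trace() {
    {
        echo "$(date) $*"
        cat /proc/mounts
        echo "----"
    } >> "$TRACE_LOG"
    sync
}
```

Calls such as trace "before umount" could be placed between the steps of rc.shutdown, ahead of the line that unmounts /etc/zyxel; entries written after that point would presumably land on the RAM root filesystem and be lost. The last marker present after the reboot indicates how far the script got.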
