Dead Zyxel NSA221 after deleting partition - is it possible to boot it from a USB rescue stick?

OrlandoScarlet
OrlandoScarlet Posts: 9  Freshman Member
edited July 2019 in Personal Cloud Storage
Trying to recover from a moment of madness...

Had a working NSA221 that I was able to access via the UI.

Inserted two new drives that I'd previously used in a Linux Mint system to check for bad blocks. Was expecting to create a RAID 1 via the Web UI on the NAS but somehow the UI was only offering to create a JBOD disk from Disk 1 (Drive 2 was greyed out).

I opened the Telnet backdoor and accessed the Zyxel via PuTTY, then examined fdisk output. Comparing /dev/sda and /dev/sdb there seemed an unexpected partition on sdb that I used fdisk to delete.

Since rebooting the NAS I am no longer able to access the Web UI, so I presume there must have been some small flash drive used for boot whose partition I have mistakenly deleted.

Is there any way to boot from a USB stick and perhaps attempt to recreate the deleted partition or re-install the Zyxel firmware from it? Or any alternative NAS software I can install or run from a USB stick to make use of the unit?

I seem to be able to find the archive of Zyxel firmware for the NSA221 and other resources such as the archive of zyxel.nas-central.org, and I am fairly Unix literate, but I am not really finding steps on creating a rescue disk for the NSA221.

Is anyone able to give me some pointers so I can recover?

Cheers,
Orlando Scarlet


#NAS_Jul_2019

All Replies

  • Mijzelf
    Mijzelf Posts: 2,618  Guru Member
    First Anniversary 10 Comments Friend Collector First Answer
    The NAS can't boot from a USB stick. But the firmware can run a script from a USB stick before it accesses the disks. This makes it possible to start a telnet daemon.
    Such a stick can be found here, one of the universal_usb_key_func zipfiles.

  • OrlandoScarlet
    OrlandoScarlet Posts: 9  Freshman Member
    Hi Mijzelf,

    Thanks so much for the pointer.

    I haven't had much luck so far, so I'm wondering if I'm following the README correctly.
    Here's what I've done:
    • Downloaded universal_usb_key_func-2013-03-21.zip and expanded it to a "usb_key_func-2013-03-21" directory on my hard drive
    • Formatted a 32 GB thumb drive to FAT32
    • Copied all files from "usb_key_func-2013-03-21" to the root directory of the thumb drive
    • Copied usb_key_func.sh.network_telnet_stop within the USB drive to become usb_key_func.sh.2
    • Modified the file to change the IP address of the ifconfig call from 192.168.0.33 to 192.168.31.150 to match my network range (and to set the IP that the NAS was using before).
    • Made the changes in TextPad ensuring that the UNIX file mode was preserved (and using od to confirm):
      $ od -a usb_key_func.sh.2
      0000000   #   !   /   b   i   n   /   s   h  nl  nl   /   s   b   i   n
      0000020   /   i   f   c   o   n   f   i   g  sp   e   g   i   g   a   0
      0000040   :   1  sp   1   9   2   .   1   6   8   .   3   1   .   1   5
      0000060   0  sp   n   e   t   m   a   s   k  sp   2   5   5   .   2   5
      0000100   5   .   2   5   5   .   0  sp   u   p  nl  nl   t   e   l   n
      0000120   e   t   d  sp   -   l  sp   /   b   i   n   /   s   h  nl   /
      0000140   b   i   n   /   s   h  nl   e   x   i   t  sp   1  nl
      0000156
      
    • Inserted into the USB slot at the back of the NSA221 and powered on the unit
    • The activity LED flashes constantly
    • The USB LED lights up
    • The LED on the USB flash drive flashes six times in total (three times, an interval of a couple of seconds, then three times again).
    • The Activity LED on the RJ45 connector at the back of the unit flashes (the corresponding column of lights for the switch on my desk also flashes)
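    For anyone preparing the stick from a Linux shell rather than TextPad, the copy, IP change and line-ending check from the list above can be sketched as follows. This is only a sketch under assumptions: it works on a scratch directory instead of a real stick, and the template content is reconstructed from the od dump above with the original 192.168.0.33 address put back.

```shell
#!/bin/sh
# Sketch only: uses a scratch directory as a stand-in for the USB stick.
WORK=$(mktemp -d)

# Template as described in the post (reconstructed from the od dump,
# with the original IP address 192.168.0.33).
cat > "$WORK/usb_key_func.sh.network_telnet_stop" <<'EOF'
#!/bin/sh

/sbin/ifconfig egiga0:1 192.168.0.33 netmask 255.255.255.0 up

telnetd -l /bin/sh
/bin/sh
exit 1
EOF

# Copy the template to usb_key_func.sh.2 while swapping in the new address.
sed 's/192\.168\.0\.33/192.168.31.150/' \
    "$WORK/usb_key_func.sh.network_telnet_stop" > "$WORK/usb_key_func.sh.2"

# Count carriage-return bytes; 0 means the file kept Unix line endings.
CR_BYTES=$(tr -cd '\r' < "$WORK/usb_key_func.sh.2" | wc -c | tr -d ' ')
NEW_IP_LINE=$(grep ifconfig "$WORK/usb_key_func.sh.2")

echo "carriage returns: $CR_BYTES"
echo "ifconfig line:    $NEW_IP_LINE"
rm -rf "$WORK"
```

    On a real stick the two paths would point at the mounted FAT32 volume instead of the scratch directory.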

    However, even allowing several minutes, I can't access the NAS from a PuTTY Telnet session or ping the 192.168.31.150 IP address. I also can't see the device in the list of units connected to the network.

    I think I've followed the instructions properly, so am thinking my earlier action of deleting the wrong partition seems to have disrupted the boot process before the point where it looks for the script on the USB.

    I'm not sure what the boot process does but looking at "Bootlog_NSA-221" there seems a lot of activity before it gets to the first reference to usb_key_func.sh.2.

    I thought I'd enabled logging for PuTTY sessions so I could go back and see exactly what I've done. I know it was only to delete one partition via fdisk, but as time passes I'm not 100% sure whether that was /dev/sdb1 (as I originally thought) or /dev/sda1 -- is there something critical to the boot process for one but not the other of those disks?

    I'm wondering if there's a way to get serial output from the NSA221 console to see more about what's going on?

    Any other thoughts would be very welcome!

    Cheers,
    Orlando
  • Mijzelf
    Mijzelf Posts: 2,618  Guru Member
    edited July 2019
    I suggest you try the 2015 zipfile. The main difference is that I added NAS5xx and NSA326, but I also remember there was some checksum error for some box. I just can't remember which one, and the forum where it was reported is down.
    There is a known timing issue with the usb_key_func sticks, but in that case it wouldn't have accessed the stick at all, I think. A 221 indeed checks the stick twice, as you can read here, so your observation is right.

    <quote>I'm wondering if there's a way to get serial output from the NSA221 console to see more about what's going on?</quote>

    AFAIK yes. I never saw the mainboard of a 221 or a picture of it, but all ZyXEL devices I looked at had the same serial port (including 3 different modem/routers). For the 325 it's documented here. If you can find those 3space1 pins on your 221, you can assume it's a 3.3V TTL serial port.

    <quote> is there something critical to the boot process for one but not other of those disks?</quote>

    Not that I'm aware of. The box boots from flash, and accesses the disks equally. But you can simply exchange the disks; the sequence doesn't matter. The disks are recognized by the GUID of the internal RAID array, not by their physical position.

  • OrlandoScarlet
    OrlandoScarlet Posts: 9  Freshman Member
    Hi Mijzelf,

    Thank you for your continued support, very much appreciated.

    I tried with the 2015 zipfile, following the same approach I outlined earlier, but unfortunately got exactly the same results I reported in my last update. I explored the links you provided and did some additional experimentation, but sadly to no avail.

    I think it will prove interesting (and hopefully enlightening!) to go down the path of connecting a serial cable to see the console output and I see the following connectors on the board that I believe align with the 3space1 connection you mentioned for the 325:

    The dedicated FTDI cables to a SIL seem quite expensive so I'm hoping that the following is suitable: JANSANE PL2303TA USB to TTL Serial Cable. The cable is for a Raspberry Pi, which I believe has 3.3V TTL pins on its header. The details for the cable say "this usb debug cable can be configured for either v5 or v3.3 power output. Built-in PL2303 chipset has an on-board DC-DC converter."

    Per comments on using it for 3.3V TTL: "The wiring is designed for 3.3V TTL Serial connection at RXD and TXD. The Wiring colors as follows: GREEN = TXD, WHITE = RXD, BLACK = signal Ground. The RED is VUSB(+5V) which IS NOT needed for Serial Connection."

    Based on the above I intend to connect the TX, RX and GND leads from the cable to the respective pins but leave the 3.3V pin disconnected.

    Let me know if you think I'm getting ahead of myself and need to go with a more dedicated 3.3V cable.

    Cheers,
    Orlando







  • Mijzelf
    Mijzelf Posts: 2,618  Guru Member
    That cable should be fine. And it's never a good idea to connect the Vcc, unless you have to power the box through the cable.
    BTW, if you have an RPi, you can also use its serial port.

  • OrlandoScarlet
    OrlandoScarlet Posts: 9  Freshman Member
    Hi Mijzelf,

    Thanks again - I've ordered the cable and will let you know what the console output tells us once it arrives.

    Cheers,
    Orlando

  • OrlandoScarlet
    OrlandoScarlet Posts: 9  Freshman Member
    Hi Mijzelf,

    For some reason, the first set of cables didn't work for me, so I ended up getting the following: CP2102 USB 2.0 to TTL UART 6Pin Serial Converter with Cables

    I struggled with that for a while, even resorting to getting my multi-meter out to confirm the expected pin-outs on the NAS before resolving it the way I had it wired originally (the connectors seemed very loose and I'm unsure the cables crimped a firm enough connection).

    Anyway, I have this working now and thought I would post the connections for the benefit of others:

    [Images: wiring at CP2102; wiring at NAS]

    The following is how the CP2102 shows up in Device Manager (after the correct drivers were installed):



    The following are the settings in PuTTY (COM port will vary, depending on which USB port you connect to):



    Now that I have access to the console, I'll do a little research and capture a couple of logs to share.

    Cheers,
    Orlando
  • OrlandoScarlet
    OrlandoScarlet Posts: 9  Freshman Member
    Hi Mijzelf,

    I've done a quick initial review of the logs, which I have attached, and am a little puzzled by what I see.

    I've tried booting the NAS two ways:
    1. Without the original drives mounted (log file: zyxel_console_log_no-disks.txt)
    2. With the original drives (unmodified) re-inserted (log file: zyxel_console_log4_disks.txt)

    I've done a quick review against: Some_information_from_slash_proc_(NSA-221)

    In comparison to those log messages, here's where things seem to come off the rails when booting with no disks:
    ...
    sd 2:0:0:0: [sda] Attached SCSI removable disk
    sd 2:0:0:0: Attached scsi generic sg0 type 0
    
    umount: can't umount /zyxel/mnt/NAND: Invalid argument						<====
    bsname}: no internal disk available
     Flag_HD_Exists = 1
    WARNING: No valid partition on HDD or no HDD plugged!
    WARNING: No valid partition on HDD or no HDD plugged
    Booting from ramdisk
    gzip: /zyxel/mnt/NAND/sysdisk.img.gz: No such file or directory
    mount: mounting /dev/loop0 on /ram_bin failed: Invalid argument
    *** ERROR: Can not mount system image, file is invalid
    killall: udhcpc: no process killed
    mount: mounting /ram_bin/usr on /usr failed: No such file or directory
    mount: mounting /ram_bin/sbin on /sbin failed: No such file or directory
    mount: mounting /ram_bin/bin on /bin failed: No such file or directory
    mount: mounting /ram_bin/lib on /lib failed: No such file or directory
    tar: can't open '/ram_bin/tmp.tar.gz': No such file or directory
    cp: can't stat '/ram_bin/var/*': No such file or directory
    cp: can't stat '/ram_bin/home/*': No such file or directory
    cp: can't stat '/ram_bin/mnt/*': No such file or directory
    cp: can't stat '/ram_bin/etc/*': No such file or directory
    cp: can't stat '/bin/makedev.sh': No such file or directory
    /etc/init.d/rcS.221: line 307: ./makedev.sh: not found
    /etc/init.d/rcS.221: line 309: /etc/init.d/rcS2: not found
    
    Please press Enter to activate this console. sd 3:0:0:0: [sdb] 60555264 512-byte hardware sectors (31004 MB)
    sd 3:0:0:0: [sdb] Write Protect is off
    sd 3:0:0:0: [sdb] Assuming drive cache: write through
    sd 3:0:0:0: [sdb] 60555264 512-byte hardware sectors (31004 MB)
    sd 3:0:0:0: [sdb] Write Protect is off
    sd 3:0:0:0: [sdb] Assuming drive cache: write through
     sdb: sdb1
    sd 3:0:0:0: [sdb] Attached SCSI removable disk
    sd 3:0:0:0: Attached scsi generic sg1 type 0
    ------------------
    --- HANGS HERE ---
    ------------------
    
    My expectation is that this should still bring up the UI to inspect in administration mode (correct me if I'm getting ahead of myself...)

    When booting with the original drives installed I see the NAS trying to boot from disk first:
    ...
    OS type: Linux
    Block size=1024 (log=0)
    Fragment size=1024 (log=0)
    124928 inodes, 498688 blocks
    0 blocks (0%) reserved for the super user
    First data block=1
    Maximum filesystem blocks=524288
    61 block groups
    8192 blocks per group, 8192 fragments per group
    2048 inodes per group
    Superblock backups stored on blocks:
            8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
    /dev/sda1 /zyxel/mnt/sysdisk ext2 ro 0 0
     Flag_HD_Exists = 0
    Boot from disk
    System disk image does NOT exist on HDD! Extract new firmware from NAND flash ...
    bsname}: skip changing partition name because parted command not available yet
    Filesystem label=
    OS type: Linux
    Block size=1024 (log=0)
    Fragment size=1024 (log=0)
    124928 inodes, 498688 blocks
    0 blocks (0%) reserved for the super user
    First data block=1
    Maximum filesystem blocks=524288
    61 block groups
    8192 blocks per group, 8192 fragments per group
    2048 inodes per group
    Superblock backups stored on blocks:
            8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
    /dev/sda1 /zyxel/mnt/sysdisk ext2 rw 0 0
    gzip: /zyxel/mnt/NAND/sysdisk.img.gz: No such file or directory
    Checksum of sysdisk.img : d41d8cd98f00b204e9800998ecf8427e
    Checksum from INFO  : a54c703439224f1ab395b24004edc395
    Checksum of sysdisk.img does NOT match!
    WARNING: No valid partition on HDD or no HDD plugged
    Booting from ramdisk
    ...
    

    Within the above, I see a warning that the checksum for sysdisk.img is not as expected, which I also confirm from the following:
    / # cat /zyxel/mnt/info/image_checksum
    a54c703439224f1ab395b24004edc395 sysdisk.img
    
    / # md5sum /zyxel/mnt/sysdisk/sysdisk.img
    d41d8cd98f00b204e9800998ecf8427e  /zyxel/mnt/sysdisk/sysdisk.img
    

    It fails to boot from disk due to "WARNING: No valid partition on HDD or no HDD plugged", which does not match my recollection of the state of the disks (I had taken one disk out to insert into a desktop, mount there to Linux and take a backup copy, for safety, to a further disk).

    Q: Is the difference in chksum on sysdisk.img against /zyxel/mnt/info/image_checksum enough to prevent it from continuing the boot against the disk?
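    A side note on the two checksums: d41d8cd98f00b204e9800998ecf8427e is the MD5 of zero bytes of input, so the sysdisk.img on the disk hashes like an empty file rather than a damaged image. That would be consistent with the "gzip: /zyxel/mnt/NAND/sysdisk.img.gz: No such file or directory" line just before the check, although the exact cause is an assumption here. A minimal sketch to confirm the value on any Linux box:

```shell
#!/bin/sh
# MD5 of empty input, computed locally.
EMPTY_MD5=$(printf '' | md5sum | awk '{print $1}')

# "Checksum of sysdisk.img" as reported in the boot log above.
LOGGED_MD5="d41d8cd98f00b204e9800998ecf8427e"

if [ "$EMPTY_MD5" = "$LOGGED_MD5" ]; then
    echo "sysdisk.img hashes like a zero-byte file"
else
    echo "sysdisk.img is not empty, checksum differs for another reason"
fi
```

    Running ls -l on /zyxel/mnt/sysdisk/sysdisk.img at the NAS prompt should show the size directly.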

    After the above it tries to boot from the ramdisk and fails with the same result and messages as when there are no disks present.

    I will start by inserting the disks into a Linux desktop to inspect status to see if/how either differs from my recollection. It seems that if the original disk contents still exist then the NAS should be able to boot without going to RAMDISK (which might get the system back usable enough for me to undo whatever madness I created previously with fdisk).

    As I can now reach a prompt I will inspect things a little better, so will later post a further update of findings when I've had a chance to explore further.

    I'm currently trying to locate the script that contains the checksum test on sysdisk.img to better understand the logic there, to see if that's why it no longer boots from the original disks.

    One other thing I've tried exploring quickly was getting 'fdisk -l' output, to see whether I can quickly redefine the partition I believe I deleted and whether that helps. In that direction I've hit an immediate problem, as I'm getting the error "fdisk: can't open '/dev/null': No such file or directory":

    / # fdisk -l
    fdisk: can't open '/dev/null': No such file or directory
    
    / # ls -l /dev
    brw-r--r--    1 0        0           7,   0 Apr  8 01:29 loop0
    

    Any quick pointers on any of my above ramblings or any better strategy on recovering things would be very welcome!
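    On the fdisk error above: with only loop0 present in /dev, the device nodes could in principle be created by hand. The sketch below is a dry run that only prints the mknod commands. The major/minor numbers are the standard Linux ones (1,3 for /dev/null; major 8 for sd disks, with minor 16 per disk plus the partition number). Whether the NAS's busybox includes mknod, and which sdX names the disks get on this box, are assumptions to check first. node_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Dry run: prints the mknod commands instead of executing them.
# Review the list, then type the commands by hand at the NAS prompt.
DEV_DIR="/dev"

# Hypothetical helper: node_cmd NAME TYPE MAJOR MINOR
node_cmd() {
    echo "mknod $DEV_DIR/$1 $2 $3 $4"
}

PLAN=$(
    node_cmd null c 1 3     # character device 1,3 is /dev/null on Linux
    node_cmd sda  b 8 0     # first sd disk
    node_cmd sda1 b 8 1     # its first partition
    node_cmd sdb  b 8 16    # second sd disk
    node_cmd sdb1 b 8 17    # its first partition
)

echo "$PLAN"
```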

    Cheers,
    Orlando
  • Mijzelf
    Mijzelf Posts: 2,618  Guru Member
    Right. I think I know what is going on. A ZyXEL NAS (with the exception of the 220) has a part of the firmware (mainly the web interface) compressed in flash. When you install a disk, a small partition (512MB?) is created, a filesystem is created on it, and that compressed part is extracted to that filesystem as sysdisk.img. That file is actually an ext2 filesystem, which is loop-mounted somewhere, and using some bind mounts it's added to the root filesystem.
    When no disk is available, the compressed file is extracted to a ramdisk, to be able to use the web interface.

    On boot the firmware checks whether the checksum of sysdisk.img is equal to the known checksum of the compressed flash file. If not, a fresh one is extracted. I attached the script which does this (/etc/init.d/rcS.221).
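
    The check described above can be sketched roughly as follows. This is a simplified illustration, not the real rcS.221: all paths are scratch stand-ins for the NAND, sysdisk and info mount points, and the "image" is a one-line text file.

```shell
#!/bin/sh
# Simplified sketch of the boot-time checksum check; NOT the real rcS.221.
WORK=$(mktemp -d)
NAND="$WORK/NAND"; SYSDISK="$WORK/sysdisk"; INFO="$WORK/info"
mkdir -p "$NAND" "$SYSDISK" "$INFO"

# Stand-in for the compressed firmware part in flash and its known checksum.
printf 'pretend web interface image\n' > "$WORK/sysdisk.img"
md5sum "$WORK/sysdisk.img" | awk '{print $1}' > "$INFO/image_checksum"
gzip -c "$WORK/sysdisk.img" > "$NAND/sysdisk.img.gz"

# Stand-in for a bad (empty) copy on the hard disk.
: > "$SYSDISK/sysdisk.img"

want=$(cat "$INFO/image_checksum")
have=$(md5sum "$SYSDISK/sysdisk.img" | awk '{print $1}')

if [ "$have" != "$want" ]; then
    echo "Checksum of sysdisk.img does NOT match, extracting a fresh copy"
    if [ -f "$NAND/sysdisk.img.gz" ]; then
        gzip -dc "$NAND/sysdisk.img.gz" > "$SYSDISK/sysdisk.img"
    else
        # The situation in this thread: nothing left to extract from.
        echo "no sysdisk.img.gz available, cannot recover"
    fi
fi

RESULT=$(md5sum "$SYSDISK/sysdisk.img" | awk '{print $1}')
echo "checksum after check: $RESULT"
rm -rf "$WORK"
```

    Deleting the stand-in sysdisk.img.gz before the check reproduces the dead end seen in the console log: the mismatch is detected but there is nothing to extract.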

    For some reason your compressed flash file (sysdisk.img.gz) is gone, or corrupted. That was no problem, until you deleted sysdisk.img on disk, which made the box unresponsive, as there is no copy of the webinterface anymore.

    The flash which contains that file is on a 221 an internal usb disk (in contrast with all other ZyXEL NASses, where it's some raw NAND flash). Don't know if that is a recognizable disk, or if it's soldered on the PCB. I've never seen a 221.

    To get the box running, you'll have to put a valid sysdisk.img.gz on that usb disk. Maybe you can simply put it on an external usb thumb disk, the bootscript seems to loop through all available usb disks. I've extracted that file from fw 4.41, and put it here.



  • OrlandoScarlet
    OrlandoScarlet Posts: 9  Freshman Member
    Hi Mijzelf,

    Thanks for the continued help.

    I reviewed the script you provided which has really helped me relate to the log messages I was seeing during boot.

    I see the following block:
    ### Check USB key
    USB_CHECK_TIMEOUT=10
    check_time=0
    echo -n "INITRD: Trying to mount NAND flash as Root FS"
    while sg_map -x -i | grep "${NAND_DISK}" > /dev/null 2>&1
    	[ $? -ne 0 ] && [ $check_time -lt $USB_CHECK_TIMEOUT ]
    do
    	echo -n "."
    	check_time=$(($check_time+1))
    	sleep 1
    done
    
    which I believe relates to the following log fragment:
    ...scsi 2:0:0:0: Direct-Access     ZyXEL    USB DISK 2.0     PMAP PQ: 0 ANSI: 0 CCS
    The three dots mean that "sg_map -x -i" is discovering the list of SCSI drives before the 10 second retry limit.

    Then we enter the following code block:
    ### check upgrade key
    any_usb=`sg_map -x -i|grep -v " 0 0 0 0"|grep -v " 1 0 0 0"|grep -v "${NAND_DISK}"|awk '{print $7}'`
    echo "${any_usb}"
    if [ -n "${any_usb}" ]; then
    	/bin/mkdir /mnt/parnerkey
    	for usb in ${any_usb}
    	do
    		echo "mount upgrade key"
    		mount "${usb}"1 /mnt/parnerkey
    		ls -la /mnt/parnerkey | grep "NSA221_fw"
    		FW=$?
    		ls -al /mnt/parnerkey | grep "NSA221_pwr_func_check"
    		PWR=$?
    		if [ $FW == 0 ] || [ $PWR == 0 ] ; then
    			/sbin/check_key /mnt/parnerkey/NSA221_check_file
    			if [ $? == 0 ] ; then
    				echo "========  Start USB Upgrade Key  ========"
    				/mnt/parnerkey/usb_key_func.sh
    				test $? -eq 0 && exit 0
    			fi
    			umount /mnt/parnerkey
    			exit 1
    		else
    			umount /mnt/parnerkey
    		fi
    	done
    	rmdir /mnt/parnerkey
    fi
    
    which maps to the remaining output following the three dots:
    scsi 2:0:0:0: Direct-Access     ZyXEL    USB DISK 2.0     PMAP PQ: 0 ANSI: 0 CCS
    scsi 3:0:0:0: Direct-Access              USB DISK 2.0     PMAP PQ: 0 ANSI: 6
    sd 2:0:0:0: [sda] 247808 512-byte hardware sectors (127 MB)
    sd 2:0:0:0: [sda] Write Protect is off
    sd 2:0:0:0: [sda] Assuming drive cache: write through
    sd 2:0:0:0: [sda] 247808 512-byte hardware sectors (127 MB)
    sd 2:0:0:0: [sda] Write Protect is off
    sd 2:0:0:0: [sda] Assuming drive cache: write through
     sda:
    sd 2:0:0:0: [sda] Attached SCSI removable disk
    sd 2:0:0:0: Attached scsi generic sg0 type 0
    
    umount: can't umount /zyxel/mnt/NAND: Invalid argument
    bsname}: no internal disk available
    

    The lines containing "scsi X:0:0:0: ..." seem to be the output assigned to the "any_usb" variable.

    I believe:
    • "scsi 2:0:0:0" is the INTERNAL USB you identified (I assume the "Zyxel" within the line is the label assigned to the disk?)
    • "scsi 3:0:0:0" is the USB I inserted (which contains files from 2015 zipfile)
    I'm not 100% sure of the last bullet since the label on my thumb drive is "USB DISK" which should show up in the line (I'll change the label to something more distinct so it's easier to tell if I'm right).

    However, it also strikes me that the format of the output in "any_usb" is different from what I expected -- I thought that, by picking off a single column via the "awk '{print $7}'", it would just be a single device value, like "/dev/sda"?
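
    To illustrate what the any_usb pipeline should yield, here is a sketch that feeds it made-up sample text instead of real sg_map output. The sample lines follow the "sg_map -x" layout (sg device, host, channel, id, lun, type, then the mapped device) but are invented for illustration, and NAND_DISK="ZyXEL" is an assumed value, not taken from the script. If the real output has this shape, the "scsi X:0:0:0" lines in the log are more likely kernel messages from device detection than the echoed variable.

```shell
#!/bin/sh
# Assumed value; the real NAND_DISK is set elsewhere in rcS.221.
NAND_DISK="ZyXEL"

# Hypothetical stand-in for `sg_map -x -i` (invented sample lines).
sample_sg_map() {
cat <<'EOF'
/dev/sg0  2 0 0 0  0  /dev/sda  ZyXEL     USB DISK 2.0      PMAP
/dev/sg1  3 0 0 0  0  /dev/sdb            USB DISK 2.0      PMAP
EOF
}

# Same filter chain as the script: drop hosts 0 and 1, drop the internal
# flash disk, then keep column 7 (the mapped /dev/sdX name).
any_usb=$(sample_sg_map | grep -v " 0 0 0 0" | grep -v " 1 0 0 0" \
    | grep -v "${NAND_DISK}" | awk '{print $7}')

echo "${any_usb}"
```

    With this sample input only the external stick survives the filters, so the loop would try to mount "${usb}"1, that is /dev/sdb1.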

    I can escape the hang at the end of the failed boot to get to the busybox prompt, so I'll run the "sg_map" command to see what the output should look like.

    The other thing that bothers me is that the log output never shows the phrase "mount upgrade key", which should be seen once for each iteration of the loop in the above code block.

    Then the last but one line logged (the umount failure) seems to come from the code block following the one discussed above:
                    ...
    		if [ -f ${NAND_PATH}/sysdisk.img.gz ]; then
    			echo "Find compressed sys image NAND"
    			break
    		else
    			umount ${NAND_PATH}    <====
    		fi
    
    The worrying thing is that neither of the log lines in that block ("There is new sys image" or "Find compressed sys image NAND") is seen, suggesting execution doesn't enter that block, though if that's the case it shouldn't get to the umount call either.

    That might suggest that none of the USB drives are mounted which is consistent with what I have been seeing so far from the busybox prompt (which would be worrying as it would make accessing a fresh copy of "sysdisk.img" from a USB stick impossible).

    Again, now that I have the script for reference about device paths and exact syntax on mount commands, I'll do some additional exploration to see what more I can figure out and let you know.

    One final thought...

    If all else fails, I was wondering if the procedure documented here could be used to replace everything that is missing: ftp://ftp.zyxel.it/guide/nas/nsa220_recovery_firmware.pdf via tftp?

    The challenge there is that unless I can mount a USB on the NAS to make 400AFM4CO.bin available, I don't have an environment where I can run "bin2ram" or "fw_unpack" to get the ~12 DATA_ files I'd need to stage on the tftp server.

    Anyway, one step at a time -- your help has given me a good direction to follow to see if I can find a good way to locally restore the sysdisk.img and I will let you know how things go.

    Again, huge thanks for your invaluable assistance.

    Cheers,
    Orlando

