NAS542 stuck in a boot loop

footstep Posts: 6
edited March 2021 in Personal Cloud Storage
Our little daughter fell in love with pressing the power button on the NAS542. I don't know how many times she did this, but since then the NAS no longer boots properly. After some time I hear the hard drives spin up, and then they suddenly stop. This repeats indefinitely until I cut the power.

After I contacted Zyxel Support, they sent me the files to create a recovery stick. Unfortunately this didn't work, so I created another stick to gain telnet access to the NAS. As far as I can tell, the script fails on the following command: CURR_BOOTFROM=`${FW_PATH}/sbin/info_printenv curr_bootfrom | awk -F"=" '{print $2}'`

When running this command manually, I receive the following error:
/firmware/sbin/info_printenv curr_bootfrom
envfs: wrong magic on /dev/mtd2
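
To see what the failing line is meant to produce, the same awk call can be tried on a sample line. This is only a minimal sketch: `extract_value` is a hypothetical helper name, and the sample input `curr_bootfrom=2` is taken from the healthy NAS540 environment posted further down in this thread. On the broken box info_printenv prints the "wrong magic" error instead, so the variable presumably ends up empty.

```shell
#!/bin/sh
# extract_value is a hypothetical helper: it reads "name=value" lines on
# stdin and prints the value, like the awk call in the boot script.
extract_value() {
    awk -F"=" '{print $2}'
}

# Sample line as a healthy info_printenv would print it.
CURR_BOOTFROM=$(printf 'curr_bootfrom=2\n' | extract_value)
echo "CURR_BOOTFROM=${CURR_BOOTFROM}"

# With no usable input the variable stays empty.
EMPTY=$(printf '' | extract_value)
echo "EMPTY=[${EMPTY}]"
```

An empty CURR_BOOTFROM would explain why later steps that build device or volume names from it cannot work.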

Zyxel Support told me to send the NAS in for repair, but I'm wondering if there's another way to fix this issue. Does anyone have a clue?




All Replies

  • Mijzelf
    Mijzelf Posts: 2,764  Guru Member
    Hm. I think I know what went wrong. The NAS has a mechanism to protect against bad flashes, called 'double flash'. Every flash partition that is updated by a firmware flash exists twice. You flash the half which is not 'current', and when that succeeds the 'curr_bootfrom' variable is updated, so that the box boots from the other half on the next boot. When the flashing fails this flag is not updated.
    But there is more. The boot script also detects whether the last boot failed; if so, it switches that variable and reboots, falling back on the previous firmware to protect you against bad firmware.
    Now I think your daughter triggered the 'last boot failed' safety and cut the power while the 'curr_bootfrom' variable was being updated, corrupting the u-boot environment.
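
The slot switching described above can be modelled in a few lines of shell. This is an illustrative model only, not the vendor's boot script: `other_half`, `next_boot_slot` and the event names are hypothetical, chosen just to restate the rules in this post.

```shell
#!/bin/sh
# Illustrative model only; not the vendor's boot script.
# other_half prints the slot that is not the given one (1 <-> 2).
other_half() {
    case "$1" in
        1) echo 2 ;;
        2) echo 1 ;;
        *) echo "unknown slot: $1" >&2; return 1 ;;
    esac
}

# next_boot_slot CURRENT EVENT
# EVENT is one of: flash_ok, flash_failed, boot_failed, none.
# Prints the slot the box would boot from next under the rules above.
next_boot_slot() {
    current=$1
    event=$2
    case "$event" in
        flash_ok|boot_failed) other_half "$current" ;;   # switch halves
        flash_failed|none)    echo "$current" ;;         # stay where we are
        *) echo "unknown event: $event" >&2; return 1 ;;
    esac
}

next_boot_slot 2 flash_ok      # successful update: boot the freshly flashed half
next_boot_slot 2 flash_failed  # failed update: flag untouched
next_boot_slot 2 boot_failed   # failed boot: fall back to the other half
```

In this model the only moment the stored value changes is in the first and third case, which is where a power cut would be dangerous.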

    I *think* you can repair that magic by simply updating some variable, for instance that 'curr_bootfrom'. On my NAS540 the value is 2:

    /firmware/sbin/info_setenv curr_bootfrom 2

    It is possible that it will drop you in the previous firmware, of course. It is also possible that there will be some severe damage in the u-boot environment, keeping it from booting.
    The environment on my 540 is:

    /firmware/sbin$ info_printenv
    ip=dhcp
    eth0.serverip=192.168.1.70
    kernel_loc=nand
    rootfs_loc=nand
    uloaderimage=microloader-c2kevm.bin
    bareboximage=barebox-c2kevm.bin
    mfg_kernel_img=uImage_MFG
    mfg_rootfs_img=rootfs_ubi.img_MFG
    rootfs_type=ubifs
    rootfsimage=root.$rootfs_type-128k
    kernelimage_type=uimage
    kernelimage=uImage
    spi_parts=256k(uloader)ro,512k(barebox)ro,256k(env)
    spi_device=spi0.0
    nand_device=comcertonand
    nand_parts=10M(config),10M(kernel1),110M(rootfs1),10M(kernel2),110M(rootfs2),-(reserved)
    rootfs_mtdblock_nand=2
    autoboot_timeout=3
    usb3_internal_clk=yes
    bootargs=console=ttyS0,115200n8, init=/etc/preinit pcie_gen1_only=yes
    bootargs=$bootargs mac_addr=$eth0.ethaddr,$eth1.ethaddr,$eth2.ethaddr
    next_bootfrom=2
    curr_bootfrom=2
    kernel_mtd_1=4
    sysimg_mtd_1=5
    kernel_mtd_2=6
    sysimg_mtd_2=7
    MODEL_ID=B103
    fwversion_1=V5.04(AATB.0)
    fwversion_2=V5.11(AATB.2)
    revision_1=46843
    revision_2=49397
    modelid_1=B103
    modelid_2=B103
    core_checksum_1=32768bcdcd9677274d4af1c02f41dda6
    core_checksum_2=0eaa12517d117ff7dd2f68502b7f961d
    zld_checksum_1=dbdacfd6dd97dad4787d514f7cdaa496
    zld_checksum_2=44485b00ede541d4f27db02f0da490f9
    romfile_checksum_1=8D7D
    romfile_checksum_2=28C8
    img_checksum_1=2dbaf250ef4e9574d28a0340379f831a
    img_checksum_2=83d14a443096a8284b07e3f3a91b1673
    serial_number=S140Z45007917
    ethaddr=5C:F4:AB:5C:58:FC
    eth2addr=5C:F4:AB:5C:58:FD
    change_boot_part=0

    As you can see, it is not possible to know all the values, as the environment contains md5sums of the installed firmware blobs. I don't know what happens if these don't match. (Well, I do know for img_checksum_X: on each boot the box will then pull a fresh copy of the on-disk installed firmware from flash.) It also contains the MAC addresses. Not all variables are important, but at least 'nand_parts' and 'spi_parts' are. Without them the box can't boot.
    Would ZyXEL repair this under warranty?
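
The two partition strings in the dump above appear to determine the /dev/mtdN numbering. The sketch below assumes that the SPI partitions are enumerated first and the NAND partitions follow. That ordering is an assumption, although it agrees with kernel_mtd_1=4, sysimg_mtd_1=5, kernel_mtd_2=6 and sysimg_mtd_2=7 in the same dump. `list_mtd` is a hypothetical helper name.

```shell
#!/bin/sh
# Values copied from the NAS540 environment dump in this thread.
spi_parts='256k(uloader)ro,512k(barebox)ro,256k(env)'
nand_parts='10M(config),10M(kernel1),110M(rootfs1),10M(kernel2),110M(rootfs2),-(reserved)'

# list_mtd SPI_PARTS NAND_PARTS  (hypothetical helper)
# Joins both lists, puts one partition per line and numbers them from 0.
# Assumption: SPI partitions come first, NAND partitions after them.
list_mtd() {
    printf '%s,%s\n' "$1" "$2" \
        | tr ',' '\n' \
        | awk -F'[()]' '{ printf "mtd%d %s\n", NR - 1, $2 }'
}

list_mtd "$spi_parts" "$nand_parts"
```

Under that assumption mtd2 comes out as the env partition, mtd4 as kernel1 and mtd5 as rootfs1.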

  • footstep
    footstep Posts: 6
    Thanks for your answer. Based on your other helpful entries in this forum, I think I've reached the right person :-)

    Unfortunately I cannot set any values, because the command gives me the same error:

    / $ /firmware/sbin/info_setenv cur_bootfrom 2
    envfs: wrong magic on /dev/mtd2

    According to the recovery script, /dev/mtd2 is the barebox env partition. The script also includes a section to rewrite this partition, but support was unable to provide me with the required barebox_env file. I also don't know whether that could do more harm than good.

    As I bought the NAS back in 2016, I don't think that it's still under warranty.

  • footstep
    footstep Posts: 6
    edited March 2021
    At least I've found a workaround to boot the NAS (without any disks at the moment):
    1. Boot it with your universal_usb_key_func-2015-10-12 (network and telnet)
    2. Connect by telnet
    3. Change to root using su
    4. Use vi to change the following lines in /etc/init.d/rcS
      #ubiattach -m ${IMG_MTD} -d ${IMG_MTD}
      #mount -t ${NAND_FS_TYPE} -o ro ubi${IMG_MTD}:ubi_rootfs${CURR_BOOTFROM} ${NAND_PATH}
      ubiattach -m 5 -d 5
      mount -t ubifs -o ro ubi5:ubirootfs1 /firmware/mnt/nand     
    5. Remove the USB stick
    6. Run /etc/init.d/rcS
    I think the missing part is to fix the barebox env.
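
Step 4 of the workaround above could be scripted instead of edited by hand in vi. The following is a rough sketch, assuming sed is available on the box. `patch_rcs` is a hypothetical helper; the replacement lines are copied verbatim from step 4, so the hard-coded MTD number, volume name and mount point should be checked against your own rcS before use. The function writes the patched script to stdout and leaves the original file untouched.

```shell
#!/bin/sh
# patch_rcs FILE  (hypothetical helper)
# Prints FILE with the two variable-based lines replaced by the hard-coded
# ones from step 4 of the workaround. FILE itself is not modified.
patch_rcs() {
    sed \
        -e 's|^\([[:space:]]*\)ubiattach -m \${IMG_MTD} -d \${IMG_MTD}[[:space:]]*$|\1ubiattach -m 5 -d 5|' \
        -e 's|^\([[:space:]]*\)mount -t \${NAND_FS_TYPE} -o ro ubi\${IMG_MTD}:ubi_rootfs\${CURR_BOOTFROM} \${NAND_PATH}[[:space:]]*$|\1mount -t ubifs -o ro ubi5:ubirootfs1 /firmware/mnt/nand|' \
        "$1"
}

# Possible use on the NAS (not executed here):
#   patch_rcs /etc/init.d/rcS > /tmp/rcS.patched
#   sh /tmp/rcS.patched
```

Writing to a separate file keeps the original rcS available for comparison if the patched copy misbehaves.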
  • Mijzelf
    Mijzelf Posts: 2,764  Guru Member
    I pm'd you a download link to a dump of my nand partition (or at least I think I did, the forum software is confusing),
    which I created with
    nanddump /dev/mtd2 | gzip >nas540.mtd2.gz
    I think you should be able to write it with
    cat nas540.mtd2.gz | gzip -d | nandwrite /dev/mtd2
    If that fails, things get worse, as without any environment the box won't boot at all. So if you hesitate, I think it should be possible to automate your workaround.
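
Since a bad write can make things worse, it may be worth checking the downloaded dump before writing it. The sketch below is an assumption-laden sanity check, not part of the original procedure: `dump_looks_sane` is a hypothetical helper, and the expected size of 262144 bytes is derived from the 256k env entry in spi_parts, assuming the dump is a raw image without out-of-band data.

```shell
#!/bin/sh
# dump_looks_sane FILE [EXPECTED_BYTES]  (hypothetical helper)
# Succeeds only if FILE is a valid gzip stream that unpacks to the expected
# size. 262144 bytes = 256k, the env size given in spi_parts (assumption:
# the dump is a raw image without out-of-band data).
dump_looks_sane() {
    file=$1
    expected=${2:-262144}
    if ! gzip -t "$file" 2>/dev/null; then
        echo "not a valid gzip file: $file" >&2
        return 1
    fi
    # Arithmetic expansion strips any padding wc may add around the number.
    size=$(($(gzip -dc "$file" | wc -c)))
    if [ "$size" -ne "$expected" ]; then
        echo "unexpected size: $size bytes (wanted $expected)" >&2
        return 1
    fi
    echo "ok: $file unpacks to $size bytes"
}

# Possible use before writing (not executed here):
#   dump_looks_sane nas540.mtd2.gz && echo "dump looks plausible"
```

A size mismatch does not prove the dump is unusable, but it is a reason to stop and look before touching the flash.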

    If you flash this and the box boots with it, you'll have to change modelid_1 and modelid_2 (the 542 has B403) and perform an update to get the checksums right. It would also be neat to change the MAC addresses to what they should be, but that's not necessary. Odds are low that your NAS will ever be in the same LAN as mine.

    But re-reading your comment I think you mean that the barebox_env file could be part of an update blob. If that is true, I can extract it. Where did you read that, and in which script? It seems a bit strange to me as the MAC addresses are also stored in the barebox env, but possibly they are backed up before overwriting.

  • footstep
    footstep Posts: 6
    I took the risk and wrote your mtd2 dump to my NAS. It didn't fix the wrong magic error, but at least the NAS is still booting using the USB stick.
  • footstep
    footstep Posts: 6
    Answer ✓
    I had to erase the flash first using: /sbin/flash_erase /dev/mtd2 0 0. Then I rewrote it using the file you provided, and now /firmware/sbin/info_printenv is finally returning readable output.

    I'm now looking into updating the firmware to correct the checksums in the env partition.

    Thank you very much for your help.
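
The order that worked here (erase, then write, then check) can be put into one small script. This is a sketch only: `restore_env` and the `DRY_RUN` switch are hypothetical conventions of this example, while the commands themselves are the ones quoted in this thread. By default it only prints what it would run.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 on
# the NAS to really run them. Both are conventions of this sketch.
DRY_RUN=${DRY_RUN:-1}

# restore_env DEVICE DUMP_GZ  (hypothetical helper)
restore_env() {
    dev=$1
    dump=$2
    if [ "$DRY_RUN" = 1 ]; then
        echo "/sbin/flash_erase $dev 0 0"
        echo "gzip -dc $dump | nandwrite $dev"
        echo "/firmware/sbin/info_printenv"
        return 0
    fi
    # Erase first: writing without erasing left the "wrong magic" error.
    /sbin/flash_erase "$dev" 0 0 || return 1
    gzip -dc "$dump" | nandwrite "$dev" || return 1
    # Readable output here means the environment can be parsed again.
    /firmware/sbin/info_printenv
}

restore_env /dev/mtd2 nas540.mtd2.gz
```

Keeping the dry run as the default makes it harder to erase the partition by accident while copying the script around.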
  • ilbirs
    ilbirs Posts: 6
    edited April 2021
    Good afternoon. At the moment I have the same problem. Can you post the mtd2 dump?
  • Mijzelf
    Mijzelf Posts: 2,764  Guru Member
    I pm'ed you a download link.
  • ilbirs
    ilbirs Posts: 6
    Dear Mijzelf, many thanks for the file. Now your script has started working, but it is not clear to me what is meant here about the model. I'm running it on a NAS542.
    It currently hangs on this message:
    + file_model=B403
    + echo -n 'board_model=(B103), file_model=(B403) ... '
    board_model=(B103), file_model=(B403) ... + '[' xB103 == xB403 ']'
    + echo 'NOT equal! /firmware/sbin/mrd_model -s B403'
    NOT equal! /firmware/sbin/mrd_model -s B403
    + error_exit

  • Mijzelf
    Mijzelf Posts: 2,764  Guru Member
    See above. The u-boot environment contains the board_model. My nand dump is from a NAS540, which has B103. Apparently you have a 542, board_model B403. So to flash 542 firmware you have to change the u-boot environment.
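
The comparison visible in the trace above can be reproduced in isolation. This is a minimal sketch of that check, not the actual update script: `check_model` is a hypothetical helper, and the "equal" message is invented for the example, since the trace only shows the failing branch.

```shell
#!/bin/sh
# check_model BOARD_MODEL FILE_MODEL  (hypothetical helper)
# Mirrors the comparison visible in the trace: the update only proceeds when
# the model ID stored on the box equals the one in the firmware file.
check_model() {
    board_model=$1
    file_model=$2
    printf 'board_model=(%s), file_model=(%s) ... ' "$board_model" "$file_model"
    if [ "x$board_model" = "x$file_model" ]; then
        echo "equal"
    else
        echo "NOT equal!"
        return 1
    fi
}

# Environment restored from a NAS540 dump, firmware file for a NAS542:
check_model B103 B403 || echo "model IDs differ, update refused"
# After the stored model ID has been changed to B403:
check_model B403 B403
```

The trace itself prints /firmware/sbin/mrd_model -s B403 as a hint; whether that command alone is sufficient is not confirmed in this thread.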
