NAS540 Bootloop, then Upgradekey, now Nothing

Nas540U2022 Posts: 16  Freshman Member
Hello everyone,
I hope you can help me.
My NAS540 started rebooting, and my disks got really hot.
I turned everything off, removed the disks and turned the NAS on again: it was still bootlooping.
I checked the forums a little and found this magic UpgradeKey from Mijzelf. I put it on a stick without changing anything and plugged it in. It copied files around for a while until the box turned off.
Since it didn't turn on by itself anymore, I pressed the ON button.

Now the box is kind of dead. I connected via serial and got the following:
uloader 2011.06.0 (May 20 2014 - 16:36:41)

Board: Mindspeed C2000
c2k_spi_probe

Copying Barebox from SPI Flash(bootopt=0)
BB Copying Done
## Starting Barebox at 0x01000000 ...


barebox 2011.06.0-svn44305-dirty6 (Aug 28 2014 - 22:25:22)

Board: Mindspeed C2000
c2k_spi_probe
c2k_otp_probe.
cbus_baseaddr: 9c000000, ddr_baseaddr: 03800000, ddr_phys_baseaddr: 03800000
class init complete
tmu init complete
bmu1 init: done
bmu2 init: done
util init complete
GPI1 init complete
GPI2 init complete
HGPI init complete
HIF rx desc: base_va: 03e80000, base_pa: 03e80000
HIF tx desc: base_va: 03e80400, base_pa: 03e80400
HIF init complete
bmu1 enabled
bmu2 enabled
pfe_hw_init: done
pfe_firmware_init
pfe_load_elf
pfe_load_elf no of sections: 10
pfe_firmware_init: class firmware loaded
pfe_load_elf
pfe_load_elf no of sections: 10
pfe_firmware_init: tmu firmware loaded
pfe_load_elf
pfe_load_elf no of sections: 14
pfe_firmware_init: util firmware loaded
eth_port: 0
NAS540_phy_reg_setting[eth_port].phyaddr: 0x4
miidev_restart_aneg for PHY4
eth_port: 1
NAS540_phy_reg_setting[eth_port].phyaddr: 0x6
miidev_restart_aneg for PHY6
cfi_probe: cfi_flash base: 0xc0000000 size: 0x04000000
## Unknown FLASH on Bank at 0xc0000000 - Size = 0x00000000 = 0 MB
bootopt = 0x0
Using ENV from SPI Flash.
nand_probe: comcerto_nand base: 0xc8300000 size: 0x256 MB
NAND device: Manufacturer ID: 0x01, Chip ID: 0xda (AMD NAND 256MiB 3,3V 8-bit), page size: 2048, OOB size: 64
Using default values
Bad block table not found for chip 0
Bad block table not found for chip 0
Scanning device for bad blocks
Bad block table written to 0x0ffe0000, version 0x01
Bad block table written to 0x0ffc0000, version 0x01
Malloc space: 0x00600000 -> 0x01000000 (size 10 MB)
Stack space : 0x005f8000 -> 0x00600000 (size 32 kB)
running /env/bin/init...
Unknown command 'export' - try 'help'
Disabling eee function of phy 4 ...
Disabling eee function of phy 6 ...

Hit any key to stop autoboot:  1
booting kernel of type uimage from /dev/nand0.kernel1.bb
Bad Header Checksum
 Failed.
booting kernel of type uimage from /dev/nand0.kernel2.bb
Bad Header Checksum
warning: No MAC address set. Using random address 4E:7E:63:E0:6E:7E
T DHCP client bound to address 192.168.188.92
DHCP client bound to address 192.168.188.92

How can I fix this? USB sticks are not recognized anymore, and neither the reset button on the back nor the buttons on the front do anything.

I tried flashing a new uImage (521) like this guy, without success.
It looks as if my NAND is not working anymore.

Looking forward to your comments.

Regards

Best Answers

  • Mijzelf
    Mijzelf Posts: 2,607  Guru Member
    edited January 2023 Answer ✓
    Can I conclude that you managed to reflash the uImage? In that case a small how-to would be appreciated.
    The 'reboot permanently on scheduled reboot' is an old bug, which I assumed had been squashed by now. But apparently not. Of course it can be turned off via the command line, but unfortunately the only way I know is to simply delete the whole configuration. The configuration is stored in /etc/zyxel, on which a flash partition is mounted:
    ubiattach -m 3 -d 3
    mount -t ubifs ubi3:ubi_config /etc/zyxel

    It contains several config files in different formats (sqlite, xml, some python format, ...) and I don't know which file contains this specific setting. Deleting the complete content of this partition equals a factory reset.
    If you are not able to get a shell in Linux, you can use a 'Universal usb_key_func stick' to get telnet access. (Use the network_telnet_stop or telnet_stop function.)
    Of course it should also be possible to erase that partition from barebox, but I don't know how.

  • Nas540U2022
    Answer ✓
    Mijzelf said:
    Can I conclude that you managed to reflash the uImage? In that case a small how-to would be appreciated.
    Thank you again @Mijzelf for your help; I would not be at this point if it weren't for you. My NAS is up and running again, and my disks were recognized right away. The only things I had to do were turn on link aggregation, add my users and edit the share permissions that had been deleted or disabled.

    Here is how I did it:
    1. Download any TFTP server onto your PC/notebook; I used the server from this website.
    2. Download the uImage for your box; mine is uImage.521, from the link @Mijzelf provided.
    3. Connect to the box via the serial port and turn it on.
    4. Cancel the autoboot to get into the barebox shell.
    5. Open the file /env/config with the editor "edit" (edit /env/config).
    6. Change line 11: eth0.serverip="192.168.188.100" <- this is my notebook, running the TFTP server.
    7. Change line 31: mfg_kernel_img=uImage.521 <- I have this file in the main folder of the TFTP server.
    8. Exit with Ctrl+D to save the config file.
    9. Type "saveenv" so the changes survive the next boot.
    10. Now either power cycle or type "boot". It downloads the uImage from your TFTP server and writes it directly to the NAND twice, once for each kernel.
    11. This gives you a basic Linux installation on top of barebox.
    12. Now download the Rescue Stick Image and put it on a USB stick. I used a 16GB Toshiba USB 3.0 stick formatted as FAT32. If it doesn't work, try another stick; some simply don't work, for unknown reasons.
    13. Replace ras.bin with the firmware image matching the uImage you downloaded before.
    14. Update "ras.bin.md5sum" accordingly. Since I'm on Windows, I used Notepad++ for editing.
    15. To create the MD5 hash you can use PowerShell (see attached image) or a good old cmd droplet (see attached md5.rar). The droplet creates a text file named after the dropped file plus the extension .txt.
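    For anyone doing steps 14 and 15 on Linux (or WSL) instead of Windows, here is a minimal shell sketch. It assumes ras.bin.md5sum holds nothing but the 32-character hex hash. That format is an assumption, so compare with the file that came with the rescue stick before overwriting it. Both function names are hypothetical.

```sh
# Hypothetical helpers; assumption: ras.bin.md5sum contains only the bare hash.
write_md5sum() {
    file=$1
    # Read from stdin so md5sum prints "<hash>  -"; keep only the hash field.
    md5sum < "$file" | cut -d ' ' -f 1 > "$file.md5sum"
}

check_md5sum() {
    file=$1
    # Strip spaces and CR/LF so a file edited on Windows still compares cleanly.
    expected=$(tr -d ' \r\n' < "$file.md5sum")
    actual=$(md5sum < "$file" | cut -d ' ' -f 1)
    [ "$expected" = "$actual" ]
}
```

    Usage: run `write_md5sum ras.bin` next to the firmware file, then `check_md5sum ras.bin && echo OK` before copying both files to the stick.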
    I made the mistake of using the newest firmware available for my unit, only to realize that its checksums don't match the Linux kernel, so the box thinks the kernel is bad. As long as the TFTP server is running, though, it will download and flash the uImage again and again and again ;)

    What was weird in my case, and I still can't explain it, was that barebox kept rebooting after a couple of minutes, so I had to do the editing of the config file quickly.


    To reset the NAS to a state where it wouldn't reboot anymore, I did this:
    1. I flashed the Linux image as described above.
    2. Then I downloaded universal_usb_key_func-2015-10-12.zip and put it on the same stick as before, after deleting everything else on it.
    3. I think I renamed usb_key_func.sh.network_telnet_stop to usb_key_func.sh.2, as described in the readme file.
    4. I did not use telnet, as the Linux command line was sufficient for the following commands:
    • ubiattach -m 3 -d 3
    • mount -t ubifs ubi3:ubi_config /etc/zyxel
    • cd /etc/zyxel
    • rm -r *   <- CAREFUL! This deletes everything in /etc/zyxel, leaving the box at factory settings!
    • exit
    I did this because I couldn't find a way to transfer files to a stick or network location, and no sqlite was installed to open the .db files.
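    One detail about step 4: in a POSIX shell, `rm -r *` does not match names starting with a dot. Any hidden files in /etc/zyxel would therefore survive it. Below is a minimal sketch of a helper that also removes them and refuses an empty argument or `/`. The function name is hypothetical. Like the original command, it irreversibly wipes whatever directory it is given.

```sh
# Hypothetical helper: delete everything inside DIR, dotfiles included.
clear_dir_contents() {
    dir=$1
    # Refuse obviously dangerous or invalid targets.
    if [ -z "$dir" ] || [ "$dir" = / ] || [ ! -d "$dir" ]; then
        echo "refusing to clear '$dir'" >&2
        return 1
    fi
    # "*" skips dotfiles; ".[!.]*" and "..?*" match them without matching "." or "..".
    # Patterns that match nothing stay literal, and rm -f silently ignores them.
    rm -rf -- "$dir"/* "$dir"/.[!.]* "$dir"/..?*
}
```

    Usage: run `clear_dir_contents /etc/zyxel` after the ubiattach and mount commands, in place of `cd /etc/zyxel` and `rm -r *`.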


All Replies

  • Mijzelf
    When you compare your boot log to this one, nothing suggests that the NAND is bad, except for the bad checksums of the kernels.
    I tried flashing a new uImage (521) like this guy, without success.
    That guy didn't flash a kernel, at least not in that thread, but booted a kernel he provided over TFTP.
    Where did your attempt fail?
  • Nas540U2022
    Hi Mijzelf,

    thank you for looking into this.
    After the whole boot process ran through, it rebooted and, well, went back to square one.
    I provided a log file of the boot process. I see a lot of errors and warnings in there, but I don't know whether they are normal.

    I thought the NAND might be dead when I saw the message "NAND Flash Corrupt" in line 533 of the file. I have a stick connected with the latest firmware on it (modified UpgradeKey), but the box doesn't boot from the stick.

    Regards

  • Mijzelf
    Right. I get a feeling of déjà vu. The direct problem is not the NAND but the kernel command line. That might be caused by bad flash, but also by a bad boot command. I just don't know barebox well enough to tell.
    For this kernel, the flash layout has to be provided on the kernel command line. It should be
    console=ttyS0,115200n8, init=/etc/preinit pcie_gen1_only=yes  mac_addr=,, ip=dhcp root=ubi0:rootfs ubi.mtd=2,2048 rootfstype=ubifs rw noinitrd mtdparts=spi0.0:256k(uloader)ro,512k(barebox)ro,256k(env);comcertonand:10M(config),10M(kernel1),110M(rootfs1),10M(kernel2),110M(rootfs2),-(reserved) usb3_internal_clk=yes
    (That is a single line.) In your case it is
    console=ttyS0,115200n8, init=/etc/preinit pcie_gen1_only=yes  mac_addr=,,
    so the precious mtdparts is missing. The result is that the kernel doesn't know about the flash partitions, and /dev/mtd7 (rootfs2) is not available. That makes the boot script conclude that the flash is corrupt.
    On a normal boot, the command line is created by a script which runs before the kernel is actually booted. I don't know if that script is supposed to run on 'bootm'. If it is, the environment could be corrupt, causing the script to fail. (The environment is in SPI flash, as opposed to the 'comcertonand' flash, which contains the kernels and rootfses.) If it isn't, another preparation step before bootm is needed, or maybe another boot* command.
    Sorry that I don't have a readymade answer on your problem.
    Your initial problem looks strange. I can imagine the box getting into a bootloop due to some bad setting or hardware problem, but why would the disks get hot? They are not really stressed by the boot procedure. Can you read their SMART values?
     
  • Nas540U2022
    Is there a way to stop barebox from rebooting all the time? It feels like it's on a 3-minute timeout.
    I can type in commands, but at some point it just reboots, so I have to type fast...

    I could read the SMART values on another PC, but I don't want to hook up the disks right now.
    They got hot because they were spun up every minute or so: 4 WD Red drives of 8 TB each. I'm sure they are fine, though.

    Back to the problem at hand.
    To make it boot properly, I would somehow have to swap out the console command line, which I probably can't do without the source of the uImage file...

    Man, I just don't want to give up here, because it seems to be nothing more than a software issue.


  • Mijzelf
    Is there a way to stop barebox from rebooting all the time?

    ? AFAIK it shouldn't. Could those 3 minutes be the same interval at which the firmware originally rebooted? Does the SoC run hot? If for some reason the box always reboots after 3 minutes, the UpgradeKey might not have had enough time to flash the firmware properly, causing the checksum error.

    The SoC has a watchdog, as you can see in the kernel log. Maybe it stays active after a watchdog-initiated reboot? Does it also reboot after a power cycle (with the power off long enough to reset all electronics)?

    How about the power supply? Is it stable?

  • Nas540U2022
    Well, if I don't press anything, it boots and sits there after calling DHCP. It doesn't turn off or reset after that point. Only if I press a key to enter the barebox shell does it reset after those 3 minutes.
    The SoC is warm, but I can keep my finger on the heatsink, so I don't think it's overheating. The case is completely removed right now; the bare frame is sitting on my desk.

    If I miss the autoboot prompt, there's nothing I can do to reboot the box other than a power cycle, so I'm already doing that constantly.

    The power supply seems stable; I could check whether I can find another one. I just touched it and it's totally cool, so no overheating there either.

    I checked the /env/config file and it has these two lines in it:
    bootargs="console=ttyS0,115200n8, init=/etc/preinit pcie_gen1_only=yes "
    bootargs="$bootargs mac_addr=$eth0.ethaddr,$eth1.ethaddr,$eth2.ethaddr"

    The documentation of barebox says the following:

    The simple method to pass bootargs to the kernel is with CONFIG_FLEXIBLE_BOOTARGS disabled: in this case the bootm command takes the bootargs from the bootargs environment variable.

    With CONFIG_FLEXIBLE_BOOTARGS enabled, the bootargs are composed from different global device variables. All variables beginning with global.linux.bootargs. will be concatenated to the bootargs
    and further:
    Additionally all variables starting with global.linux.mtdparts. are concatenated to a mtdparts= parameter to the kernel. This makes it possible to consistently partition devices with the addpart - add a partition description to a device command and pass the same string as used with addpart to the Kernel:
    Now, my barebox doesn't have the global keyword, but the two lines in the config exactly match the command line I got.

    Can I just edit that file and add everything I want?
  • Nas540U2022
    edited January 2023
    I think the original interval was shorter, but I could be wrong; I didn't time it.
    The SoC is warm; I can put my finger on the heatsink and keep it there.
    That could be a possibility; maybe it left the system with only half the data. However, it didn't reboot after running the upgrade but turned off completely. I turned it back on manually; I guess that's not normal either.

    It does reboot after a power cycle if I key into the shell. If I let it boot, it calls DHCP and sits there afterwards without ever rebooting again. I think it crashes there, because no input of any kind other than power cycling is possible after that point.

    Now, my barebox doesn't have the global keyword, but the two lines in the config exactly match the command line I got; they could also just be the standard config.

    Can I just edit that file and add everything I want?
  • tonygibbs16
    tonygibbs16 Posts: 842  Guru Member
    edited January 2023
    Hello,

    I don't usually comment on NAS posts, because @Mijzelf knows these devices much better than I do.

    But from the log it seems that the Linux system is running its shutdown script /etc/init.d/rc.shutdown.

    /etc/init.d/rc.shutdown: line 18: /sbin/i2cset: not found
    start kill.
      700 root      2512 S <  /sbin/watchdog -t 8 -T 15 /dev/comcerto_wdt

    https://linux.die.net/man/8/watchdog has some information about /sbin/watchdog 

    The page says that /sbin/watchdog checks a number of things:
    "The watchdog daemon can be stopped without causing a reboot if the device /dev/watchdog is closed correctly, unless your kernel is compiled with the CONFIG_WATCHDOG_NOWAYOUT option enabled.

    Tests

    The watchdog daemon does several tests to check the system status:

    • Is the process table full?
    • Is there enough free memory?
    • Are some files accessible?
    • Have some files changed within a given interval?
    • Is the average work load too high?
    • Has a file table overflow occurred?
    • Is a process still running? The process is specified by a pid file.
    • Do some IP addresses answer to ping?
    • Do network interfaces receive traffic?
    • Is the temperature too high? (Temperature data not always available.)
    • Execute a user defined command to do arbitrary tests.
    • Execute one or more test/repair commands found in /etc/watchdog.d. These commands are called with the argument test or repair.

    If any of these checks fail watchdog will cause a shutdown. Should any of these tests except the user defined binary last longer than one minute the machine will be rebooted, too."

    also

    "

    Function

    After watchdog starts, it puts itself into the background and then tries all checks specified in its configuration file in turn. Between each two tests it will write to the kernel device to prevent a reset. After finishing all tests watchdog goes to sleep for some time. The kernel drivers expects a write to the watchdog device every minute. Otherwise the system will be reset. As a default watchdog will sleep for only 10 seconds so it triggers the device early enough."


    I hope that this is helpful.

    Kind regards,
         Tony
  • Mijzelf
    Can I just edit that file and add everything I want?

    I don't see why not. The environment files can be edited and, if I remember correctly, saveenv writes them to flash. There is a built-in editor, whose name could be 'edit'.
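    As an illustration only, here is one way to add the missing parameters. It assumes the two bootargs lines quoted earlier in this thread and adds a third line that appends the rest of the full command line quoted above, mtdparts included. The placement and the exact set of parameters are assumptions. The original poster later reported that this change did not seem to be necessary for the TFTP route.

```sh
# /env/config in barebox: the two existing lines
bootargs="console=ttyS0,115200n8, init=/etc/preinit pcie_gen1_only=yes "
bootargs="$bootargs mac_addr=$eth0.ethaddr,$eth1.ethaddr,$eth2.ethaddr"
# Hypothetical extra line: the remaining parameters of the working command line
bootargs="$bootargs ip=dhcp root=ubi0:rootfs ubi.mtd=2,2048 rootfstype=ubifs rw noinitrd mtdparts=spi0.0:256k(uloader)ro,512k(barebox)ro,256k(env);comcertonand:10M(config),10M(kernel1),110M(rootfs1),10M(kernel2),110M(rootfs2),-(reserved) usb3_internal_clk=yes"
```

    After editing, leave the editor with Ctrl+D and run saveenv so the change is written to flash.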

    About the barebox documentation, keep in mind that your barebox is old: it's version 2011.06.0, while the initial release (according to Wikipedia) was in 2009. The barebox documentation doesn't mention in which version a given feature was added.

    tonygibbs16 quotes another manual. I learned something new: I never knew there was an 'official' watchdog daemon, or that /dev/watchdog has a documented API. But I looked, and the watchdog binary on the NAS is a busybox applet, which doesn't do that much. If you look at the source (which is of course much newer than the binary on the box), it registers some signal handlers which disable the watchdog, and then loops forever in

    while (1) {
            write(3, "", 1); /* write zero byte */
            msleep(stimer_duration);
    }

    So apart from telling the kernel that the daemon is still alive, it does nothing.
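    To make that loop concrete, here is a rough shell rendition. It assumes the options seen in tonygibbs16's log excerpt (`-t 8 -T 15`), which per busybox's usage text mean: write every 8 seconds, reboot if no write arrives within 15. The function name and its count parameter are hypothetical. The real applet is written in C and never stops looping.

```sh
# Hypothetical shell version of busybox's watchdog main loop.
#   $1 = watchdog device (the NAS540 log shows /dev/comcerto_wdt)
#   $2 = number of keep-alive writes (the real daemon loops forever)
#   $3 = seconds between writes (busybox -t; 8 on this box)
feed_watchdog() {
    dev=$1 count=$2 interval=$3
    i=0
    # Open the device once for the whole loop on fd 3, as busybox does.
    {
        while [ "$i" -lt "$count" ]; do
            printf '\0' >&3   # write a zero byte: this is the keep-alive ping
            sleep "$interval"
            i=$((i + 1))
        done
    } 3>"$dev"
    # What closing the device does is driver-dependent (magic close 'V',
    # CONFIG_WATCHDOG_NOWAYOUT), so this sketch makes no assumption about it.
}
```

    The design point is the same as in the C loop: the device stays open, and the only thing the daemon proves is that userspace is still scheduled often enough to write one byte.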

    In case anyone is interested,

    /etc/init.d/rc.shutdown: line 18: /sbin/i2cset: not found

    i2cset is not found because the boot didn't complete. A major part of the rootfs is inside a blob containing an ext2 filesystem, which is loop-mounted on /ram_bin. The content of /ram_bin/sbin/ is copied to /sbin/ (which is in the embedded initrd), and the binary i2cset is one of these files.


  • Nas540U2022
    edited January 2023
    Okay. I was now able to boot the kernel and actually perform the USB recovery operation, only for it to fail again. Something seems to be off with the UpgradeKey.

    Well, now I'm back to my original bootloop...

    What I did to get back:
    I had to edit /env/config with "edit":
    Line 11: eth0.serverip="192.168.188.100" <- this is my notebook, running the TFTP server from this website.
    Line 31: mfg_kernel_img=uImage.521 <- I have this file in the main folder of the TFTP server.

    I also changed the bootargs to the line @Mijzelf supplied, but commented it out later, because it doesn't seem to be necessary.

    To save the file, you have to exit with Ctrl+D, followed by a "saveenv" command.
    Now either power cycle or type "boot", and it downloads the uImage from my TFTP server directly to the NAND.

    If I use the recovery key with the current image, I'm back to my unresponsive box.

    I have attached the original /env/config file again, so you can see which lines were edited.
    Also attached is boot_and_loop.txt, which shows my current problem.

    The interesting part is this:
    / # ipc_send:54: send IPC event OK
    sh: can't create /i-data/sysvol/.system/my_timezone.info: nonexistent directory
    [get_dst_info2]: year=2023 [get_dst_info2]: fp is NULL errno = 2 [sch_controller.c]: target sec(1675652400) vs Mon Feb  6 04:00:00 2023

    sh: can't create /i-data/sysvol/.system/my_timezone.info: nonexistent directory
    [get_dst_info2]: year=2023 [get_dst_info2]: fp is NULL errno = 2 [sch_controller.c]: target sec(1675652400) vs Mon Feb  6 04:00:00 2023

    sch_queue_del:206 found item 0x28388
    sch_queue_del:218: free 0x28388
    Jan  7 21:27:26 NAS540 linuxrc: starting pid 3887, tty '': '/etc/init.d/rc.shutdown'


    What is this target "Mon Feb 6 04:00:00 2023"?
    It looks like the box has been behaving this way since it hit this date...
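    The number in that sch_controller line is a Unix timestamp, so it can be decoded to check the printed date. Below is a small sketch that does this. It assumes a date(1) that accepts `-d @SECONDS` (GNU coreutils does; busybox builds may differ). It also uses the POSIX TZ string CET-1 for a UTC+1 zone, which is an assumption about the box's time zone. The function name is hypothetical.

```sh
# Hypothetical helper: print a Unix timestamp in UTC and in a fixed UTC+1 zone.
show_epoch() {
    secs=$1
    # LC_ALL=C keeps day/month names in English; %e pads the day with a space.
    LC_ALL=C date -u -d "@$secs" '+UTC:   %a %b %e %H:%M:%S %Y'
    # "CET-1" is a POSIX TZ string (UTC+1, no DST rule), so no tz database is needed.
    LC_ALL=C TZ=CET-1 date -d "@$secs" '+UTC+1: %a %b %e %H:%M:%S %Y'
}
```

    `show_epoch 1675652400` prints `UTC:   Mon Feb  6 03:00:00 2023` and `UTC+1: Mon Feb  6 04:00:00 2023`. The target second and the printed date therefore agree for a box running at UTC+1.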

