NAS542: How do I establish serial connection?

All Replies

  • bugblatterbeast
    bugblatterbeast Posts: 30  Freshman Member
    How would I back up the flash partitions, and does it make sense to do so in the current state?

    I was hoping that I could get the stock OS running again first, then maybe upgrade the firmware in a safe environment, and after that try to install Debian once more.

    If I boot without an SD card, it does nothing but beep. My router says that the device got an IP address, but it's still unreachable. I can log on via serial, but it's no use. I can't even think of editing a config file because this message keeps flashing whatever I do (even in the vi editor).

    ipc_send:54: send IPC event OK
    Syntax error on line 24 of /etc/service_conf/httpd_zld.conf:
    Port must be specified
    Syntax error on line 24 of /etc/service_conf/httpd_zld.conf:
    Port must be specified

  • Mijzelf
    Mijzelf Posts: 2,790  Guru Member
    How would I back up the flash partitions, and does it make sense to do so in the current state?
    If it boots now, it makes sense. As it is the opposite of not booting. The way I know is to boot the stock kernel using tftpboot (in this forum there is a thread which describes how to do that), combined with a Universal usb_key_func stick which can stop the boot and give you shell access. Then in /firmware/bin/ or something like that there is a nand_read program. Run it without parameters to get instructions.

    But your failing stock OS might tell that there is something wrong with the bootargs. Can you post (or attach) a bootlog?
  • bugblatterbeast
    bugblatterbeast Posts: 30  Freshman Member
    Mijzelf said:

    The way I know is to boot the stock kernel using tftpboot (in this forum there is a thread which describes how to do that), combined with a Universal usb_key_func stick which can stop the boot and give you shell access.

    tftpboot doesn't seem to be available in my barebox version (see the attached file containing the output of help).

    It wasn't easy to find a way to get the TFTP download working, especially because my device reboots automatically after a certain amount of time in barebox mode. I only have a few moments to try things out.

    I was using this file: uImage.520ABAG1C0.bin but renaming it to uImage.bin for convenience.

    Barebox-C2K >/ mkdir /mnt # creating a RAM disk doesn't seem to make a difference
    Barebox-C2K >/ mount none ramfs /mnt
    Barebox-C2K >/ cd /mnt
    # same result when starting from here
    Barebox-C2K >/mnt dhcp
    warning: No MAC address set. Using random address EA:26:53:25:B0:2A
    DHCP client bound to address 192.168.1.84
    Barebox-C2K >/mnt eth0.serverip=192.168.1.55
    Barebox-C2K >/mnt tftp uImage.bin
    TFTP from server 192.168.1.55 ('uImage.bin' -> 'uImage.bin')
            #################################################################
            #################################################################
            ####...
    With exactly this start I've tried three different approaches:
    1st:
    Barebox-C2K >/mnt bootm uImage.bin
    ERROR: out of memory
    
    Same result with or without the -n option (there might be some major difference in the barebox-versions because bootm requires a file and not an address as parameter on my system).

    2nd:
    Barebox-C2K >/mnt bootz uImage.bin
    invalid magic 0x2e332d78
    

    3rd:
    Barebox-C2K >/mnt memcpy -s uImage.bin 0 0x21f00000 7330152
    Barebox-C2K >/mnt bootu 0x21f00000
    commandline: console=ttyS0,115200n8, init=/etc/preinit pcie_gen1_only=yes  mac_addr=,,
    arch_number: 1094
    
    
    This seems a little better than before, but nothing happens after that until the device reboots again.
  • bugblatterbeast
    bugblatterbeast Posts: 30  Freshman Member
    Mijzelf said:
    Then in /firmware/bin/ or something like that there is a nand_read program. Run it without parameters to get instructions.
    Since I can't boot the stock kernel using tftpboot, I've removed my sd card and booted the device. Then with constant beeping I've poked around a bit.

    The only command I've found that seemed to match the description was nanddump.
    # /sbin/nanddump
    Usage: nanddump [OPTIONS] MTD-device
    Dumps the contents of a nand mtd partition.

    -h         --help               Display this help and exit
               --version            Output version information and exit
               --bb=METHOD          Choose bad block handling method (see below).
    -a         --forcebinary        Force printing of binary data to tty
    -c         --canonicalprint     Print canonical Hex+ASCII dump
    -f file    --file=file          Dump to file
    -l length  --length=length      Length
    -n         --noecc              Read without error correction
               --omitoob            Omit OOB data (default)
    -o         --oob                Dump OOB data
    -p         --prettyprint        Print nice (hexdump)
    -q         --quiet              Don't display progress and status messages
    -s addr    --startaddress=addr  Start address

    --bb=METHOD, where METHOD can be `padbad', `dumpbad', or `skipbad':
        padbad:  dump flash data, substituting 0xFF for any bad blocks
        dumpbad: dump flash data, including any bad blocks
        skipbad: dump good data, completely skipping any bad blocks (default)

    After more searching I managed to find this information:

    # /firmware/sbin/info_printenv
    ip=dhcp
    eth0.serverip=192.168.1.70
    kernel_loc=nand
    rootfs_loc=nand
    uloaderimage=microloader-c2kevm.bin
    bareboximage=barebox-c2kevm.bin
    mfg_kernel_img=uImage_MFG
    mfg_rootfs_img=rootfs_ubi.img_MFG
    rootfs_type=ubifs
    rootfsimage=root.$rootfs_type-128k
    kernelimage_type=uimage
    kernelimage=uImage
    spi_parts=256k(uloader)ro,512k(barebox)ro,256k(env)
    spi_device=spi0.0
    nand_device=comcertonand
    nand_parts=10M(config),10M(kernel1),110M(rootfs1),10M(kernel2),110M(rootfs2),-(reserved)
    rootfs_mtdblock_nand=2
    autoboot_timeout=3
    usb3_internal_clk=yes
    bootargs=console=ttyS0,115200n8, init=/etc/preinit pcie_gen1_only=yes
    bootargs=$bootargs mac_addr=$eth0.ethaddr,$eth1.ethaddr,$eth2.ethaddr
    next_bootfrom=1
    curr_bootfrom=1
    kernel_mtd_1=4
    sysimg_mtd_1=5
    kernel_mtd_2=6
    sysimg_mtd_2=7
    MODEL_ID=B403
    fwversion_1=V5.21(ABAG.5)
    revision_1=51181
    modelid_1=B403
    core_checksum_1=8656509367ee12a59840ba4883163966
    zld_checksum_1=7fb4eade4aaac0bb78d61939f4c8ae7a
    romfile_checksum_1=FB1A
    img_checksum_1=cac0e147f330acfced6fe7db4c3d7425
    fwversion_2=V5.21(ABAG.5)
    revision_2=51181
    modelid_2=B403
    core_checksum_2=8656509367ee12a59840ba4883163966
    zld_checksum_2=7fb4eade4aaac0bb78d61939f4c8ae7a
    romfile_checksum_2=FB1A
    img_checksum_2=cac0e147f330acfced6fe7db4c3d7425
    ethaddr=08:26:97:78:15:78
    eth2addr=08:26:97:78:15:79
    serial_number=S200Z30000343

    With this information I was able to dump four images to my USB stick.

    # nanddump -f kernel_mtd_1.bin /dev/mtd4
    ECC failed: 0
    ECC corrected: 0
    Number of bad blocks: 0
    Number of bbt blocks: 0
    Block size 131072, page size 2048, OOB size 64
    Dumping data starting at 0x00000000 and ending at 0x00a00000...
    # nanddump -f sysimg_mtd_1.bin /dev/mtd5
    ECC failed: 0
    ECC corrected: 0
    Number of bad blocks: 0
    Number of bbt blocks: 0
    Block size 131072, page size 2048, OOB size 64
    Dumping data starting at 0x00000000 and ending at 0x06e00000...
    # nanddump -f kernel_mtd_2.bin /dev/mtd6
    ECC failed: 0
    ECC corrected: 0
    Number of bad blocks: 0
    Number of bbt blocks: 0
    Block size 131072, page size 2048, OOB size 64
    Dumping data starting at 0x00000000 and ending at 0x00a00000...
    # nanddump -f sysimg_mtd_2.bin /dev/mtd7
    ECC failed: 0
    ECC corrected: 0
    Number of bad blocks: 2
    Number of bbt blocks: 0
    Block size 131072, page size 2048, OOB size 64
    Dumping data starting at 0x00000000 and ending at 0x06e00000...
    # ls *.bin -lh
    -rwxrwxrwx 1 nobody root  10.0M Dec 22 12:48 kernel_mtd_1.bin
    -rwxrwxrwx 1 nobody root  10.0M Dec 22 12:51 kernel_mtd_2.bin
    -rwxrwxrwx 1 nobody root 110.0M Dec 22 12:50 sysimg_mtd_1.bin
    -rwxrwxrwx 1 nobody root 109.8M Dec 22 12:53 sysimg_mtd_2.bin

    Are those the flash partitions you were talking about?
    Is that enough for a backup or did I miss something?
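
The step from the environment variables to the dump commands can be sketched as below. This is only a dry-run illustration: the helper name `mtd_dump_cmds` is made up, it assumes the environment text has the `kernel_mtd_N=` / `sysimg_mtd_N=` entries shown in this post, and it prints the `nanddump` commands instead of running them.

```shell
# Hypothetical helper: read environment text on stdin and print one
# nanddump command per kernel_mtd_N / sysimg_mtd_N entry (dry run only).
mtd_dump_cmds() {
    # Split space-separated entries onto their own lines, then keep
    # only the keys that name a kernel or system image partition.
    tr ' ' '\n' | awk -F= '
        /^(kernel|sysimg)_mtd_[0-9]+=[0-9]+$/ {
            printf "nanddump -f %s.bin /dev/mtd%s\n", $1, $2
        }'
}

# Sample input copied from the environment listing in this post.
printf '%s\n' 'curr_bootfrom=1 kernel_mtd_1=4 sysimg_mtd_1=5 kernel_mtd_2=6 sysimg_mtd_2=7 MODEL_ID=B403' | mtd_dump_cmds
```

On the device itself the input would presumably come from `/firmware/sbin/info_printenv | mtd_dump_cmds`, with the printed commands reviewed before running any of them.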

    Best regards
  • bugblatterbeast
    bugblatterbeast Posts: 30  Freshman Member
    Damn, the format went wrong. But after what happened the last time I don't want to edit a post anymore.
  • bugblatterbeast
    bugblatterbeast Posts: 30  Freshman Member
    Mijzelf said:
    But your failing stock OS might tell that there is something wrong with the bootargs. Can you post (or attach) a bootlog?
    Attached is a complete log of booting without an SD card. After the boot was complete I used the console to fix the file that caused most of the error messages, but it didn't make much of a difference. The machine kept on beeping and was not reachable from the network. I've removed repeated error messages from the log.
  • Mijzelf
    Mijzelf Posts: 2,790  Guru Member
    About the beeping: Did you disconnect the fan? AFAIK there are 2 reasons why the firmware would start the buzzer, a degraded raid array or a not running fan, and it's not a degraded array here. Anyway, here is the command to stop the buzzer.
    Then tftpboot, in barebox the command is a bit different, here you can find it. But according to your bootlog the box is running the stock kernel, so it's no issue here.
    About the dumps, from your bootlog I see this list of partitions:
    [    3.331725] 0x000000000000-0x000000040000 : "uloader"
    [    3.343188] 0x000000040000-0x0000000c0000 : "barebox"
    [    3.359381] 0x0000000c0000-0x000000100000 : "env"
    [    3.544045] 0x000000000000-0x000000a00000 : "config"
    [    3.555619] 0x000000a00000-0x000001400000 : "kernel1"
    [    3.571391] 0x000001400000-0x000008200000 : "rootfs1"
    [    3.593877] 0x000008200000-0x000008c00000 : "kernel2"
    [    3.610057] 0x000008c00000-0x00000fa00000 : "rootfs2"
    [    3.621849] 0x00000fa00000-0x000010000000 : "reserved"
    If one of the first 2 gets damaged, the box might be beyond repair. But you never know: 'uloader' is big enough to be a complete u-boot, so maybe if barebox can't boot you will be thrown back to a u-boot prompt. The first 3 partitions are on SPI flash. That is a tiny 6 pin chip on the motherboard, and it seems to be possible to reprogram it with an Arduino without desoldering it. So why not back up the first two (/dev/mtd0 and /dev/mtd1)? The 3rd (env) is important. It contains the barebox environment. Without it the box won't boot, but barebox will start and give you a prompt. It also contains the flash partition table (as a command line argument for the Linux boot). Several people here on the forum have used a backup of my env partition to get their box running again.
    'config' isn't important. It contains the user settings, and is erased on a factory reset.
    As the whole thing is about 256MB, I'd simply back up mtd0 - mtd8. Better safe than sorry.
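
The "about 256MB" figure can be checked from the address ranges in the bootlog listing above. The sketch below is plain shell arithmetic; `part_kib` is a made-up helper name, and the addresses are copied from the listing.

```shell
# Hypothetical helper: size in KiB of a partition given its start and
# end address (hex, as printed in the bootlog).
part_kib() {
    echo $(( ($2 - $1) / 1024 ))
}

spi_kib=$(part_kib 0x0 0x100000)       # uloader + barebox + env
nand_kib=$(part_kib 0x0 0x10000000)    # config .. reserved

echo "SPI flash:  $spi_kib KiB"
echo "NAND flash: $((nand_kib / 1024)) MiB"
echo "rootfs1:    $(( $(part_kib 0x1400000 0x8200000) / 1024 )) MiB"
echo "reserved:   $(( $(part_kib 0xfa00000 0x10000000) / 1024 )) MiB"
```

This gives 1024 KiB for the SPI chip and 256 MiB for the NAND chip, which agrees with the `spi_parts` and `nand_parts` values in the barebox environment (the `-(reserved)` remainder works out to 6 MiB).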

    Your failing boot is strange. The webserver is complaining about a missing port in the config, and indeed it says 'Listen None' here. And the firmware keeps restarting the webinterface, just for fun.
    You should be able to solve the problem (temporarily) by editing the file:
    sed -i 's/Listen None/Listen 8888/' /etc/service_conf/httpd_zld.conf
    But that's only temporary. This file is generated dynamically on boot. So somehow the generation fails. I think it's a good idea to start with a factory reset, to remove all (damaged) user config. A factory reset can be performed from the commandline by executing
    /usr/local/btn/reset_and_reboot.sh
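
A slightly more careful variant of the temporary fix is sketched below: it applies the same `sed` expression only when the broken `Listen None` line is really there. The function name `fix_listen` and the sample file contents are made up for illustration; the default path and port are the ones used above. It assumes a `sed` that supports `-i` (GNU or busybox).

```shell
# Hypothetical helper: replace "Listen None" with a real port, but only
# if the broken line is actually present in the config file.
fix_listen() {
    conf="${1:-/etc/service_conf/httpd_zld.conf}"
    port="${2:-8888}"
    if grep -q 'Listen None' "$conf"; then
        sed -i "s/Listen None/Listen $port/" "$conf"
        echo "patched $conf"
    else
        echo "no 'Listen None' in $conf, nothing changed"
    fi
}

# Demo on a throwaway copy (sample contents, not the real config).
tmp=$(mktemp)
printf 'ServerName nas\nListen None\n' > "$tmp"
fix_listen "$tmp" 8888
grep Listen "$tmp"
rm -f "$tmp"
```

As with the plain `sed` command, the change would not survive a reboot, because the file is regenerated on boot.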

    A little word about the boot process. In the listing above you can see that the flash address starts over at 0 for 'config'. The reason is that the SoC cannot boot from the cheap 256MiB NAND chip, so ZyXEL added a 1MiB SPI flash chip. The CPU automagically reads and executes the first x bytes from the SPI chip, that is 'uloader'. uloader knows it has to boot barebox, and where to find it. Barebox contains drivers for the NAND flash. It reads its environment 'env', which contains the boot script, which tells it to load the Linux kernel from a certain flash partition ('kernel1' or 'kernel2') and boot it with a certain command line. Linux boots and reads its command line (which you can find in the bootlog), from which it extracts the flash partition table. It detects and initializes all hardware for which it has drivers built in, and finally it executes the init executable, which is responsible for all further configuration/startup. The init keeps running. It's pid 1, and if you kill it, the box shuts down.

    You now have the dumps of the 2 kernel partitions. Using 'file' you can see what is inside. It should be a 'legacy uImage', but what is interesting is the specified kernel name and timestamp. They will tell you if you flashed a custom kernel.
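
If 'file' is not at hand, the same two fields can be read straight from the header. The sketch below assumes the standard legacy uImage header layout (64 bytes, magic 0x27051956 at offset 0, big-endian build timestamp at offset 8, 32-byte image name at offset 32); the function name `uimage_info` is made up for illustration.

```shell
# Hypothetical helper: print image name and build timestamp from a
# legacy uImage header, or fail if the magic number does not match.
uimage_info() {
    img="$1"
    # First 4 bytes as hex, e.g. "27051956" for a legacy uImage.
    magic=$(od -An -tx1 -N4 "$img" | tr -d ' \n')
    if [ "$magic" != "27051956" ]; then
        echo "not a legacy uImage (magic $magic)"
        return 1
    fi
    # Bytes 8..11: build time, big-endian seconds since the epoch.
    ts_hex=$(od -An -tx1 -j8 -N4 "$img" | tr -d ' \n')
    # Bytes 32..63: NUL-padded image name.
    name=$(dd if="$img" bs=1 skip=32 count=32 2>/dev/null | tr -d '\000')
    echo "name: $name"
    echo "timestamp: $((0x$ts_hex)) (seconds since 1970-01-01 UTC)"
}

# Usage on one of the dumps from this thread:
#   uimage_info kernel_mtd_1.bin
```

Comparing the name and timestamp of `kernel_mtd_1.bin` and `kernel_mtd_2.bin` would show whether both slots hold the same kernel build.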

  • bugblatterbeast
    bugblatterbeast Posts: 30  Freshman Member
    Excellent! The stock OS is working fine now!!!
    /usr/local/btn/reset_and_reboot.sh
    Did the trick!
    Mijzelf said:
    About the beeping: Did you disconnect the fan?
    I did. Thanks so much for this info. It was very unnerving to operate like that.

    Now I've dumped all NAND partitions: once in the initial state, then again with the working stock OS, and finally after I upgraded the firmware to the newest version.

    Thanks also for the very valuable information about the boot process. I was indeed wondering, why the partitions overlap. I am still wondering why the 2nd rootfs is slightly smaller than the 1st even though I've dumped exactly the same length.

    I'm attaching one of the nand-extraction logs and the fw-upgrade+automatic-restart log.


    Can I assume that it's now as safe as it can be to retry installing Debian and flashing the kernel again? Because that would be the next thing I'd do.

    Kind regards!
  • Mijzelf
    Mijzelf Posts: 2,790  Guru Member
    I am still wondering why the 2nd rootfs is slightly smaller than the 1st even though I've dumped exactly the same length.
    The reason can be found in the bootlog:
    [    3.481228] Bad eraseblock 1536 at 0x00000c000000
    [    3.491399] Bad eraseblock 1537 at 0x00000c020000
    [    3.610057] 0x000008c00000-0x00000fa00000 : "rootfs2"
    There are two bad eraseblocks, both in the range of rootfs2. A bad eraseblock is not usable, and the nand driver skips it transparently, so the partition is actually smaller. As the address between 1536 and 1537 changes by 0x20000, I suppose one block is 128KiB, so your rootfs2 dump is 256KiB smaller.
    The mtdblocks 9 and 10 are UBI partitions, which are created on 'config' and the active 'rootfs' partition. That is the reason you couldn't dump them: they are mounted. The rootfs partition is mounted on /firmware/mnt/nand, and when you use 'df' you can see that one or two bad eraseblocks are no problem at all, as only 60% of the partition is in use.
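
The arithmetic behind this can be written out as a short sketch. All addresses come from the bootlog lines above; the variable names are only for illustration.

```shell
# Distance between bad eraseblocks 1536 and 1537 gives the block size.
block=$((0x0c020000 - 0x0c000000))
# Number of bad eraseblocks that fall inside rootfs2.
bad=2
# Nominal size of rootfs2 from its address range.
part=$((0x0fa00000 - 0x08c00000))

echo "eraseblock size: $((block / 1024)) KiB"
echo "rootfs2 size:    $((part / 1024)) KiB"
echo "expected dump:   $(( (part - bad * block) / 1024 )) KiB"
```

The result is 112384 KiB, i.e. 109.75 MiB, which matches the 109.8M that `ls -lh` reported for sysimg_mtd_2.bin, and the 131072-byte block size matches the "Block size 131072" line printed by nanddump.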
    (BTW, that partition is mounted ro, but nobody stops you if you remount it rw, and store some data on it. It has a normal filesystem (ubifs), and the data will last until you upgrade the firmware twice)
    Can I assume that it's now as safe as it can be to retry installing debian and flashing the kernel again? Because that would be the next thing I'd do.
    I can't think of anything you have forgotten.
  • bugblatterbeast
    bugblatterbeast Posts: 30  Freshman Member
    Mijzelf said:
    The reason can be found in the bootlog:
    [    3.481228] Bad eraseblock 1536 at 0x00000c000000
    [    3.491399] Bad eraseblock 1537 at 0x00000c020000
    [    3.610057] 0x000008c00000-0x00000fa00000 : "rootfs2"
    There are two bad eraseblocks, both in the range of rootfs2. A bad eraseblock is not usable, and the nand driver skips it transparently, so the partition is actually smaller. As the address between 1536 and 1537 changes by 0x20000, I suppose one block is 128KiB, so your rootfs2 dump is 256KiB smaller.
    Ah ofc, that figures. Thanks.

    I have a suspicion what might have caused the issue:

    I've just installed Debian and updated the kernel. Everything went fine so far. But after I changed the hostname, the domain and the workgroup in the openmediavault configuration and rebooted, I couldn't connect anymore. I suddenly remembered that last time, those changes were exactly the last thing I did before it all went wrong. Luckily I had the serial connection this time.

    # ifconfig
    lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 16436
            inet 127.0.0.1  netmask 255.0.0.0
            loop  txqueuelen 0  (Lokale Schleife)
            RX packets 448  bytes 88413 (86.3 KiB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 448  bytes 88413 (86.3 KiB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    #

    Then I've noticed that openmediavault had "cleaned" my /etc/network/interfaces file.
    # This file is auto-generated by openmediavault (https://www.openmediavault.org)
    # WARNING: Do not edit this file, your changes will get lost.
    
    # interfaces(5) file used by ifup(8) and ifdown(8)
    # Better use netplan.io or systemd-networkd to configure additional interface stanzas.
    
    # Include files from /etc/network/interfaces.d:
    source-directory /etc/network/interfaces.d

    All network interfaces were inactive:
    # lshw -C network
      *-network:0 DISABLED
           description: Ethernet interface
           physical id: 4
           logical name: egiga1
           serial: 08:26:97:78:15:79
           capabilities: ethernet physical
           configuration: broadcast=yes driver=c2000-geth driverversion=1.0 firmware=N/A link=no multicast=yes
      *-network:1 DISABLED
           description: Ethernet interface
           physical id: 5
           logical name: ethip0
           serial: 22:32:93:76:bc:80
           capabilities: ethernet physical
           configuration: broadcast=yes multicast=yes point-to-point=yes
      *-network:2 DISABLED
           description: Ethernet interface
           physical id: 6
           logical name: egiga0
           serial: 08:26:97:78:15:78
           capabilities: ethernet physical
           configuration: broadcast=yes driver=c2000-geth driverversion=1.0 firmware=N/A link=no multicast=yes

    Creating configuration files /etc/network/interfaces.d/egiga0
    # The primary network interface
    allow-hotplug egiga0
    iface egiga0 inet dhcp
    auto egiga0
    and /etc/network/interfaces.d/egiga1
    # The secondary network interface
    allow-hotplug egiga1
    iface egiga1 inet dhcp
    auto egiga1
    seems to have solved the issue for good. I have rebooted several times now and the system always comes up as it should. The last time this happened I tried reinstalling Debian and flashing the kernel again, and that made things even worse.


    There is one last tiny thing that is bothering me a little. About two minutes after every reboot there is a long beep with this message on the serial console:
    [  123.046271] bz time = 1
    [  123.048729] bz status = 1
    [  123.051354] bz_timer_status = 0
    [  123.054564] start buzzer
    I can't find any event that might be causing it.

    Best regards
