NAS542: How do I establish serial connection?

bugblatterbeast · December 2021

How would I back up the flash partitions, and does it make sense to do so in the current state?

I was hoping that I could get the stock OS running again first, then maybe upgrade the firmware in a safe environment and after that try to install Debian once more.

If I boot without any SD card it does nothing but beep. My router says that the device got an IP address, but it's still unreachable. I can log on via serial but it's no use. I can't even think of editing a config file because this message keeps flashing whatever I do (even in the vi editor).

ipc_send:54: send IPC event OK
Syntax error on line 24 of /etc/service_conf/httpd_zld.conf:
Port must be specified
Syntax error on line 24 of /etc/service_conf/httpd_zld.conf:
Port must be specified

Mijzelf · December 2021

bugblatterbeast said:

How would I back up the flash partitions, and does it make sense to do so in the current state?

If it boots now, it makes sense. As it is the opposite of not booting. The way I know is to boot the stock kernel using tftpboot (in this forum there is a thread which describes how to do that), combined with a Universal usb_key_func stick which can stop the boot and give you shell access. Then in /firmware/bin/ or something like that there is a nand_read program. Run it without parameters to get instructions.

But your failing stock OS might tell that there is something wrong with the bootargs. Can you post (or attach) a bootlog?

bugblatterbeast · December 2021

Mijzelf said:

The way I know is to boot the stock kernel using tftpboot (in this forum there is a thread which describes how to do that), combined with a Universal usb_key_func stick which can stop the boot and give you shell access.

tftpboot doesn't seem to be available in my barebox version (see attached file containing the output of help).

It wasn't easy to find a way to get the tftp download working, especially because my device reboots automatically after a certain amount of time in barebox mode. I only have a few moments to try things out.

I used the file uImage.520ABAG1C0.bin, renamed to uImage.bin for convenience.

Barebox-C2K >/ mkdir /mnt # creating a RAM disk doesn't seem to make a difference
Barebox-C2K >/ mount none ramfs /mnt
Barebox-C2K >/ cd /mnt
# same result when starting from here
Barebox-C2K >/mnt dhcp
warning: No MAC address set. Using random address EA:26:53:25:B0:2A
DHCP client bound to address 192.168.1.84
Barebox-C2K >/mnt eth0.serverip=192.168.1.55
Barebox-C2K >/mnt tftp uImage.bin
TFTP from server 192.168.1.55 ('uImage.bin' -> 'uImage.bin')
        #################################################################
        #################################################################
        ####...

Starting from exactly this point, I've tried three different approaches.
1st:

Barebox-C2K >/mnt bootm uImage.bin
ERROR: out of memory

Same result with or without the -n option (there might be some major difference between the barebox versions, because on my system bootm requires a file and not an address as its parameter).

2nd:

Barebox-C2K >/mnt bootz uImage.bin
invalid magic 0x2e332d78

3rd:

Barebox-C2K >/mnt memcpy -s uImage.bin 0 0x21f00000 7330152
Barebox-C2K >/mnt bootu 0x21f00000
commandline: console=ttyS0,115200n8, init=/etc/preinit pcie_gen1_only=yes  mac_addr=,,
arch_number: 1094

This seems a little better than before, but nothing happens after that until the device reboots again.

bugblatterbeast · December 2021

Mijzelf said:
Then in /firmware/bin/ or something like that there is a nand_read program. Run it without parameters to get instructions.

Since I can't boot the stock kernel using tftpboot, I've removed my sd card and booted the device. Then with constant beeping I've poked around a bit.

The only command I've found that seemed to match the description was nanddump.

# /sbin/nanddump
Usage: nanddump [OPTIONS] MTD-device
Dumps the contents of a nand mtd partition.

-h         --help               Display this help and exit
           --version            Output version information and exit
           --bb=METHOD          Choose bad block handling method (see below).
-a         --forcebinary        Force printing of binary data to tty
-c         --canonicalprint     Print canonical Hex+ASCII dump
-f file    --file=file          Dump to file
-l length  --length=length      Length
-n         --noecc              Read without error correction
           --omitoob            Omit OOB data (default)
-o         --oob                Dump OOB data
-p         --prettyprint        Print nice (hexdump)
-q         --quiet              Don't display progress and status messages
-s addr    --startaddress=addr  Start address

--bb=METHOD, where METHOD can be `padbad', `dumpbad', or `skipbad':
    padbad:  dump flash data, substituting 0xFF for any bad blocks
    dumpbad: dump flash data, including any bad blocks
    skipbad: dump good data, completely skipping any bad blocks (default)

After more searching I managed to find this information:

# /firmware/sbin/info_printenv
ip=dhcp
eth0.serverip=192.168.1.70
kernel_loc=nand
rootfs_loc=nand
uloaderimage=microloader-c2kevm.bin
bareboximage=barebox-c2kevm.bin
mfg_kernel_img=uImage_MFG
mfg_rootfs_img=rootfs_ubi.img_MFG
rootfs_type=ubifs
rootfsimage=root.$rootfs_type-128k
kernelimage_type=uimage
kernelimage=uImage
spi_parts=256k(uloader)ro,512k(barebox)ro,256k(env)
spi_device=spi0.0
nand_device=comcertonand
nand_parts=10M(config),10M(kernel1),110M(rootfs1),10M(kernel2),110M(rootfs2),-(reserved)
rootfs_mtdblock_nand=2
autoboot_timeout=3
usb3_internal_clk=yes
bootargs=console=ttyS0,115200n8, init=/etc/preinit pcie_gen1_only=yes 
bootargs=$bootargs mac_addr=$eth0.ethaddr,$eth1.ethaddr,$eth2.ethaddr
next_bootfrom=1
curr_bootfrom=1
kernel_mtd_1=4
sysimg_mtd_1=5
kernel_mtd_2=6
sysimg_mtd_2=7
MODEL_ID=B403
fwversion_1=V5.21(ABAG.5)
revision_1=51181
modelid_1=B403
core_checksum_1=8656509367ee12a59840ba4883163966
zld_checksum_1=7fb4eade4aaac0bb78d61939f4c8ae7a
romfile_checksum_1=FB1A
img_checksum_1=cac0e147f330acfced6fe7db4c3d7425
fwversion_2=V5.21(ABAG.5)
revision_2=51181
modelid_2=B403
core_checksum_2=8656509367ee12a59840ba4883163966
zld_checksum_2=7fb4eade4aaac0bb78d61939f4c8ae7a
romfile_checksum_2=FB1A
img_checksum_2=cac0e147f330acfced6fe7db4c3d7425
ethaddr=08:26:97:78:15:78
eth2addr=08:26:97:78:15:79
serial_number=S200Z30000343

With this information I was able to dump four images to my USB stick.

# nanddump -f kernel_mtd_1.bin /dev/mtd4
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x00a00000...

# nanddump -f sysimg_mtd_1.bin /dev/mtd5
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x06e00000...

# nanddump -f kernel_mtd_2.bin /dev/mtd6
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x00a00000...

# nanddump -f sysimg_mtd_2.bin /dev/mtd7
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 2
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x06e00000...

# ls *.bin -lh
-rwxrwxrwx    1 nobody   root       10.0M Dec 22 12:48 kernel_mtd_1.bin
-rwxrwxrwx    1 nobody   root       10.0M Dec 22 12:51 kernel_mtd_2.bin
-rwxrwxrwx    1 nobody   root      110.0M Dec 22 12:50 sysimg_mtd_1.bin
-rwxrwxrwx    1 nobody   root      109.8M Dec 22 12:53 sysimg_mtd_2.bin

Are those the flash partitions you were talking about?
Is that enough for a backup or did I miss something?

Best regards

bugblatterbeast · December 2021

Damn, the format went wrong. But after what happened the last time I don't want to edit a post anymore.

bugblatterbeast · December 2021

Mijzelf said:
But your failing stock OS might tell that there is something wrong with the bootargs. Can you post (or attach) a bootlog?

Attached is a complete log of booting without an SD card. After the boot was complete I used the console to fix the file that caused most of the error messages, but it didn't make much of a difference. The machine kept on beeping and was not reachable from the network. I've removed repeated error messages from the log.

Mijzelf · December 2021

About the beeping: Did you disconnect the fan? AFAIK there are 2 reasons why the firmware would start the buzzer, a degraded raid array or a not running fan, and it's not a degraded array here. Anyway, here is the command to stop the buzzer.

Then tftpboot, in barebox the command is a bit different, here you can find it. But according to your bootlog the box is running the stock kernel, so it's no issue here.

About the dumps, from your bootlog I see this list of partitions:

[    3.331725] 0x000000000000-0x000000040000 : "uloader"
[    3.343188] 0x000000040000-0x0000000c0000 : "barebox"
[    3.359381] 0x0000000c0000-0x000000100000 : "env"
[    3.544045] 0x000000000000-0x000000a00000 : "config"
[    3.555619] 0x000000a00000-0x000001400000 : "kernel1"
[    3.571391] 0x000001400000-0x000008200000 : "rootfs1"
[    3.593877] 0x000008200000-0x000008c00000 : "kernel2"
[    3.610057] 0x000008c00000-0x00000fa00000 : "rootfs2"
[    3.621849] 0x00000fa00000-0x000010000000 : "reserved"

If one of the first 2 gets damaged, the box might be beyond repair. But you never know: 'uloader' is big enough to be a complete u-boot, so maybe if barebox can't boot you will be thrown back to a u-boot prompt. The first 3 partitions are on SPI flash. That is a tiny 6-pin chip on the motherboard, and it seems to be possible to reprogram it with an Arduino without desoldering it. So why not back up the first two (/dev/mtd0 and /dev/mtd1)? The 3rd (env) is important. It contains the barebox environment. Without it the box won't boot, but barebox will start and give you a prompt. It also contains the flash partition table (as a command line argument for the Linux boot). Several people here on the forum have used a backup of my env partition to get their box running again.

'config' isn't important. It contains the user settings, and is erased on a factory reset.

As the whole thing is about 256MB, I'd simply back up mtd0 - mtd8. Better safe than sorry.
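A rough sketch of how such a full backup could be scripted. It only prints the commands (a dry run), so they can be reviewed before running them by hand. The function name plan_dumps and the output file naming are made up for this example. It also assumes that nanddump accepts the SPI partitions; if it refuses them, reading the device with cat /dev/mtdN > file is a common alternative.

```shell
#!/bin/sh
# Print one nanddump command per partition listed in a /proc/mtd-style table.
# Usage: plan_dumps [TABLE] [OUT_DIR]   (defaults: /proc/mtd and the current directory)
plan_dumps() {
    mtd_table="${1:-/proc/mtd}"
    out_dir="${2:-.}"
    # Skip the header line; every other line looks like:
    #   mtd4: 00a00000 00020000 "kernel1"
    sed 1d "$mtd_table" | while read -r dev size erasesize name; do
        dev="${dev%:}"          # "mtd4:" -> "mtd4"
        name="${name#\"}"       # strip the surrounding quotes
        name="${name%\"}"
        echo "nanddump -f $out_dir/${dev}_${name}.bin /dev/$dev"
    done
}
```

Calling plan_dumps /proc/mtd /mnt/usb (with /mnt/usb standing in for wherever the USB stick is mounted) lists the commands; partitions that are currently mounted may still refuse to be dumped.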

Your failing boot is strange. The webserver is complaining about a missing port in the config, and indeed it says 'Listen None' here. And the firmware keeps restarting the webinterface, just for fun.

You should be able to solve the problem (temporarily) by editing the file:

sed -i 's/Listen None/Listen 8888/' /etc/service_conf/httpd_zld.conf

But that's only temporary. This file is generated dynamically on boot, so somehow the generation fails. I think it's a good idea to start with a factory reset, to remove all (damaged) user config. A factory reset can be performed from the command line by executing:

/usr/local/btn/reset_and_reboot.sh

A few words about the boot process. In the listing above you can see that the flash address starts over at 0 for 'config'. The reason is that the SoC cannot boot from the cheap 256MiB NAND chip, so ZyXEL added a 1MiB SPI flash chip. The CPU automagically reads and executes the first x bytes from the SPI chip, that is 'uloader'. uloader knows it has to boot barebox, and where to find it. Barebox contains drivers for the NAND flash. It reads its environment 'env', which contains the boot script, which tells it to load the Linux kernel from a certain flash partition ('kernel1' or 'kernel2') and boot it with a certain command line. Linux boots and reads its command line (which you can find in the bootlog), from which it extracts the flash partition table. It detects and initializes all hardware for which it has drivers built in, and finally it executes the init executable, which is responsible for all further configuration and startup. The init keeps running. It's pid 1, and if you kill it, the box shuts down.
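To illustrate how a partition table falls out of such a command line argument, here is a small sketch that turns a parts string like nand_parts or spi_parts from the info_printenv output into address ranges. The function name layout_from_parts and the total-size argument are made up for this example. It only understands sizes ending in k or M and a '-' for "the rest of the chip", and it is not the parser the kernel actually uses.

```shell
#!/bin/sh
# Usage: layout_from_parts PARTS TOTAL_KIB
#   PARTS     e.g. "10M(config),10M(kernel1),-(reserved)"
#   TOTAL_KIB size of the whole chip in KiB (needed for the "-" entry)
layout_from_parts() {
    parts="$1"
    total_kib="$2"
    offset=0                       # running start offset in KiB
    old_ifs=$IFS; IFS=,
    set -- $parts                  # split the string on commas
    IFS=$old_ifs
    for p in "$@"; do
        size="${p%%(*}"            # text before "(", e.g. "110M"
        name="${p#*(}"             # text after "(" ...
        name="${name%%)*}"         # ... up to ")", dropping suffixes like "ro"
        case "$size" in
            -)  size_kib=$((total_kib - offset)) ;;
            *M) size_kib=$((${size%M} * 1024)) ;;
            *k) size_kib="${size%k}" ;;
            *)  echo "unsupported size: $size" >&2; return 1 ;;
        esac
        printf '0x%08x-0x%08x : %s\n' \
            $((offset * 1024)) $(((offset + size_kib) * 1024)) "$name"
        offset=$((offset + size_kib))
    done
}
```

Running it with the nand_parts value and 262144 (256MiB in KiB) should give the same ranges as the bootlog listing, and the spi_parts value with 1024 should give the three SPI partitions.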

You now have the dumps of the 2 kernel partitions. Using 'file' you can see what is inside. It should be a 'legacy uImage', but what is interesting is the specified kernel name and timestamp. They will tell you whether you flashed a custom kernel.
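If 'file' is not at hand, the same two fields can be read straight from the dump. The sketch below assumes the standard 64-byte legacy uImage header: magic 0x27051956 at offset 0, a big-endian build timestamp at offset 8 and a 32-byte image name at offset 32. The function name uimage_info is made up for this example.

```shell
#!/bin/sh
# Usage: uimage_info FILE
# Prints the image name and build timestamp (seconds since the epoch).
uimage_info() {
    img="$1"
    # First four bytes as hex digits, e.g. "27051956"
    magic=$(dd if="$img" bs=1 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$magic" != "27051956" ]; then
        echo "not a legacy uImage (magic $magic)"
        return 1
    fi
    # Big-endian 32-bit timestamp at offset 8
    ts_hex=$(dd if="$img" bs=1 skip=8 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')
    # NUL-padded image name at offset 32
    name=$(dd if="$img" bs=1 skip=32 count=32 2>/dev/null | tr -d '\000')
    echo "name: $name"
    echo "timestamp: $((0x$ts_hex))"
}
```

For example, uimage_info kernel_mtd_1.bin on one of the dumps above. Comparing the output for both kernel partitions shows whether they hold the same build.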

bugblatterbeast · December 2021

Excellent! The stock OS is working fine now!!!

/usr/local/btn/reset_and_reboot.sh

Did the trick!

Mijzelf said:

About the beeping: Did you disconnect the fan?

I did. Thanks so much for this info. It was very unnerving to operate like that.

Now I've dumped all NAND partitions: once in the initial state, then again with the working stock OS, and finally after upgrading the firmware to the newest version.

Thanks also for the very valuable information about the boot process. I was indeed wondering why the partitions overlap. I am still wondering why the 2nd rootfs is slightly smaller than the 1st even though I've dumped exactly the same length.

I'm attaching one of the nand-extraction logs and the fw-upgrade+automatic-restart log.

Can I assume that it's now as safe as it can be to retry installing Debian and flashing the kernel again? Because that would be the next thing I'd do.

Kind regards!

Mijzelf · December 2021

bugblatterbeast said:

I am still wondering why the 2nd rootfs is slightly smaller than the 1st even though I've dumped exactly the same length.

The reason can be found in the bootlog:

[    3.481228] Bad eraseblock 1536 at 0x00000c000000
[    3.491399] Bad eraseblock 1537 at 0x00000c020000
[    3.610057] 0x000008c00000-0x00000fa00000 : "rootfs2"

There are two bad eraseblocks, both in the range of rootfs2. A bad eraseblock is not usable, and the NAND driver skips it transparently, so the partition is actually smaller. As the address changes by 0x20000 between blocks 1536 and 1537, I suppose one block is 128KiB, so your rootfs2 dump is 256KiB smaller.
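The arithmetic can be checked in the shell. This is only a sketch with a made-up function name, expected_dump_size, using the rootfs2 range from the bootlog and the "Block size 131072" that nanddump reported:

```shell
#!/bin/sh
# Usage: expected_dump_size START END BAD_BLOCKS BLOCK_SIZE
# Size in bytes of a dump that skips the bad blocks.
expected_dump_size() {
    echo $(( ($2 - $1) - $3 * $4 ))
}

# rootfs2 spans 0x08c00000-0x0fa00000 and has 2 bad blocks of 131072 bytes
expected_dump_size 0x08c00000 0x0fa00000 2 131072   # prints 115081216
```

115081216 bytes is 109.75MiB, which matches the 109.8M that ls shows for sysimg_mtd_2.bin.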

The mtdblocks 9 and 10 are UBI partitions, which are created on 'config' and the active 'rootfs' partition. That is the reason you couldn't dump them: they are mounted. The rootfs partition is mounted on /firmware/mnt/nand, and when you use 'df' you can see that one or two bad eraseblocks are no problem at all, as only 60% of the partition is in use.

(BTW, that partition is mounted ro, but nobody stops you if you remount it rw, and store some data on it. It has a normal filesystem (ubifs), and the data will last until you upgrade the firmware twice)

bugblatterbeast said:

Can I assume that it's now as safe as it can be to retry installing Debian and flashing the kernel again? Because that would be the next thing I'd do.

I can't think of anything you have forgotten.

bugblatterbeast · December 2021

Mijzelf said:
The reason can be found in the bootlog:
[    3.481228] Bad eraseblock 1536 at 0x00000c000000
[    3.491399] Bad eraseblock 1537 at 0x00000c020000
[    3.610057] 0x000008c00000-0x00000fa00000 : "rootfs2"

There are two bad eraseblocks, both in the range of rootfs2. A bad eraseblock is not usable, and the NAND driver skips it transparently, so the partition is actually smaller. As the address changes by 0x20000 between blocks 1536 and 1537, I suppose one block is 128KiB, so your rootfs2 dump is 256KiB smaller.

Ah ofc, that figures. Thanks.

I have a suspicion what might have caused the issue:

I've just installed Debian and updated the kernel. Everything went fine at first. But after I changed the hostname, the domain and the workgroup in the openmediavault configuration and rebooted, I couldn't connect anymore. I suddenly remembered that last time, those changes were exactly the last thing I did before it all went wrong. Luckily I had the serial connection this time.

# ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 16436
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 0  (Lokale Schleife)
        RX packets 448  bytes 88413 (86.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 448  bytes 88413 (86.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
#

Then I noticed that openmediavault had "cleaned" my /etc/network/interfaces file.

# This file is auto-generated by openmediavault (https://www.openmediavault.org)
# WARNING: Do not edit this file, your changes will get lost.

# interfaces(5) file used by ifup(8) and ifdown(8)
# Better use netplan.io or systemd-networkd to configure additional interface stanzas.

# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d

All network interfaces were inactive:

# lshw -C network
  *-network:0 DISABLED
       description: Ethernet interface
       physical id: 4
       logical name: egiga1
       serial: 08:26:97:78:15:79
       capabilities: ethernet physical
       configuration: broadcast=yes driver=c2000-geth driverversion=1.0 firmware=N/A link=no multicast=yes
  *-network:1 DISABLED
       description: Ethernet interface
       physical id: 5
       logical name: ethip0
       serial: 22:32:93:76:bc:80
       capabilities: ethernet physical
       configuration: broadcast=yes multicast=yes point-to-point=yes
  *-network:2 DISABLED
       description: Ethernet interface
       physical id: 6
       logical name: egiga0
       serial: 08:26:97:78:15:78
       capabilities: ethernet physical
       configuration: broadcast=yes driver=c2000-geth driverversion=1.0 firmware=N/A link=no multicast=yes

Creating configuration files /etc/network/interfaces.d/egiga0

# The primary network interface
allow-hotplug egiga0
iface egiga0 inet dhcp
auto egiga0

and /etc/network/interfaces.d/egiga1

# The secondary network interface
allow-hotplug egiga1
iface egiga1 inet dhcp
auto egiga1

seems to have solved the issue for good. I have rebooted several times now and the system always comes up as it should. Last time this happened I tried reinstalling Debian and flashing the kernel again, and that made things even worse.
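For anyone who wants to recreate the two files from a serial console, a minimal sketch follows. The function name write_dhcp_stanzas is made up for this example, and the stanza is simply the one shown above (without the comment line).

```shell
#!/bin/sh
# Usage: write_dhcp_stanzas DIR IFACE...
# Writes one DHCP stanza file per interface into DIR.
write_dhcp_stanzas() {
    dir="$1"
    shift
    mkdir -p "$dir" || return 1
    for iface in "$@"; do
        printf 'allow-hotplug %s\niface %s inet dhcp\nauto %s\n' \
            "$iface" "$iface" "$iface" > "$dir/$iface"
    done
}
```

Run as root, write_dhcp_stanzas /etc/network/interfaces.d egiga0 egiga1 would produce both files; a reboot or ifup on each interface then brings the network up.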

There is one last tiny thing that is bothering me a little. About two minutes after every reboot there is a long beep with this message on the serial console:

[  123.046271] bz time = 1
[  123.048729] bz status = 1
[  123.051354] bz_timer_status = 0
[  123.054564] start buzzer

I can't find any event that might be causing it.

Best regards
