NAS540 broken after power outage

Fever
Fever Posts: 7
edited April 2022 in Personal Cloud Storage
I had a power outage, and after this incident my NAS540 kept beeping.
I then tried to fix the NAS using Mijzelf's RescueSticks (thank you at this point for all the amazing work you are doing here). After that it didn't boot at all.
Debugging over a serial connection showed a "Bad header checksum".
I was able to fix that with TFTP and get the bootloader right again.
Now when I try to boot the NAS without HDDs, it seems to be working.
But if I boot with the drives in, it won't start fully.
The logs seem to indicate broken hard drives ("failed command: READ FPDMA QUEUED"), but it doesn't make sense to me that all 4 drives would fail at once and the system wouldn't even be able to start.
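
A quick way to tell whether a drive throwing "failed command: READ FPDMA QUEUED" is really dying, rather than suffering from a cable or backplane problem, is to read its SMART data. This is only a sketch and assumes the disk can be attached to a Linux machine with smartmontools installed (the stock NAS firmware does not necessarily ship smartctl); /dev/sdX stands in for whatever name the disk gets there:

~ # smartctl -H /dev/sdX   # overall health self-assessment
~ # smartctl -A /dev/sdX   # attributes: watch Reallocated_Sector_Ct, Current_Pending_Sector and UDMA_CRC_Error_Count

A rising UDMA_CRC_Error_Count with otherwise clean attributes points more towards cabling or the backplane than towards the disk itself.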




All Replies

  • Fever
    Fever Posts: 7
    More logs:
    This one is a normal boot without hard drives:
    https://pastebin.com/KDwccPvt

    This one with the rescue stick, without hard drives:
    https://pastebin.com/WQyKMjYp

    This one with the rescue stick, with hard drives:
    https://pastebin.com/wmwYfnSL


  • Mijzelf
    Mijzelf Posts: 2,607  Guru Member
    I agree that the odds of all disks dying together are low. In your log 'only' ata2 and ata4 show this problem. When you exchange disks 1 & 2 and disks 3 & 4, does the problem follow the disks? In that case two disks are indeed bad.
    If it doesn't follow the disks, does the box boot when you remove disks 2 & 4?
  • CaspersonC
    This works really well for us, thank you! Facing same issue here. Help is appreciated.

  • Fever
    Fever Posts: 7
    edited October 2022
    Mijzelf said:
    I agree that the odds of all disks dying together are low. In your log 'only' ata2 and ata4 show this problem. When you exchange disks 1 & 2 and disks 3 & 4, does the problem follow the disks? In that case two disks are indeed bad.
    If it doesn't follow the disks, does the box boot when you remove disks 2 & 4?

    Mijzelf, I am so sorry for not replying to your answer sooner. I let the summer get away from me and left the NAS sitting in the corner.
    Your observation seems to be on point. After swapping disks 1 & 2 and 3 & 4, I get the same errors, but now on ata1 and ata3. This is no good for my data, but I assume it means two of my disks broke at once. Very unlucky for me as I really wanted to have RAID5 to avoid this exact scenario.
    I attached the log in case you want to check again.

  • Mijzelf
    Mijzelf Posts: 2,607  Guru Member
    Answer ✓
    Very unlucky for me as I really wanted to have RAID5 to avoid this exact scenario.

    Does that mean you don't have a backup? The real purpose of raid5 is availability, but as you left the box for 6 months I suppose that was not the goal.

    Don't know what is wrong with those disks, but if you manage to make a bitwise copy of only one of them, using ddrescue or something like that, you might be able to save your data.

    I wrote more about ddrescue in this thread.
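
    For reference, a minimal ddrescue sketch for such a bitwise copy, assuming the failing disk shows up as /dev/sdb, a new disk of at least the same size as /dev/sde, and a map file stored somewhere that survives a reboot; all of these names are placeholders and must be checked against your own system first:

    ~ # ddrescue -f -n /dev/sdb /dev/sde rescue.map    # first pass: copy the easy areas, skip the problematic ones
    ~ # ddrescue -f -r3 /dev/sdb /dev/sde rescue.map   # second pass: retry the remaining bad areas up to 3 times

    The map file lets ddrescue resume and refine later runs; the source disk is only read, never written. Afterwards the copy can take the failing disk's place.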

  • Fever
    Fever Posts: 7
    Mijzelf said:
    Very unlucky for me as I really wanted to have RAID5 to avoid this exact scenario.

    Does that mean you don't have a backup? The real purpose of raid5 is availability, but as you left the box for 6 months I suppose that was not the goal.

    Don't know what is wrong with those disks, but if you manage to make a bitwise copy of only one of them, using ddrescue or something like that, you might be able to save your data.

    I wrote more about ddrescue in this thread.


    Yes, I thought that RAID5 would be backup enough for my use case. I did use the NAS a lot when it was working, but did not bother when it was not running.
    I would surely use it more if the firmware were better, for which you offer a lot of solutions.
    I will try to read more into the logs I got.
    The confusing part here for me is that the web interface will not load if I have HDD 1 in slot 1.
    If I put, for example, the order 2134 into the NAS, it boots but shows no drives etc., like in the post with ddsecure.

    And I ordered a new drive and will check if ddsecure can help me somehow.
    If you have any other tips or tests I can do, I would be so happy :-)
  • Mijzelf
    Mijzelf Posts: 2,607  Guru Member
    The confusing part here for me is that the web interface will not load if I have HDD 1 in slot 1. If I put, for example, the order 2134 into the NAS, it boots but shows no drives etc., like in the post with ddsecure.
    I think it has something to do with timing. The web interface runs from hard disk. At install time a firmware partition is created, onto which a compressed filesystem blob stored in flash is extracted. When no hard disk is available, a ramdisk is created instead and the blob is extracted onto that.
    When the broken disk is in slot 1, the firmware does not see quickly enough that there is a problem, and so doesn't create that ramdisk. When the broken disk is in slot 2, the RAID header on the disk in slot 1 already shows there is a problem, and so there is time for the ramdisk.
    Or something like that. The firmware is not really bulletproof when it comes to errors.
    BTW, it's ddrescue, and not ddsecure.
  • Fever
    Fever Posts: 7
    Mijzelf said:
    The confusing part here for me is that the web interface will not load if I have HDD 1 in slot 1. If I put, for example, the order 2134 into the NAS, it boots but shows no drives etc., like in the post with ddsecure.
    I think it has something to do with timing. The web interface runs from hard disk. At install time a firmware partition is created, onto which a compressed filesystem blob stored in flash is extracted. When no hard disk is available, a ramdisk is created instead and the blob is extracted onto that.
    When the broken disk is in slot 1, the firmware does not see quickly enough that there is a problem, and so doesn't create that ramdisk. When the broken disk is in slot 2, the RAID header on the disk in slot 1 already shows there is a problem, and so there is time for the ramdisk.
    Or something like that. The firmware is not really bulletproof when it comes to errors.
    BTW, it's ddrescue, and not ddsecure.
    If I understand that right, I need to ask you for the compiled ddrescue file?
    Can you PM me the link you mention in the other threads?
  • Fever
    Fever Posts: 7
    edited October 2022
    After reading a bit more, I also checked with mdadm:
    If I understand that correctly, my 4th drive seems to be broken but the rest is OK. I could try to recreate the RAID with /dev/sdd3 missing, right?


    ~ # mdadm --examine /dev/sd[abcd]3
    /dev/sda3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 6c9f75bc:413133d9:db34176b:a5f25aae
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Sat Mar 12 15:10:28 2016
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
         Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
      Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 9b3cc732:bf5ce9e1:985834fb:039acca0

        Update Time : Fri Oct 21 00:18:39 2022
           Checksum : 9ec3787e - correct
             Events : 31189

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 0
       Array State : A.A. ('A' == active, '.' == missing)
    /dev/sdb3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 6c9f75bc:413133d9:db34176b:a5f25aae
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Sat Mar 12 15:10:28 2016
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
         Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
      Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 1a92a04e:9edf8c1b:5ca9cebe:9bb47bcb

        Update Time : Fri Oct 21 00:17:57 2022
           Checksum : e28cd5e4 - correct
             Events : 31188

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 1
       Array State : AAA. ('A' == active, '.' == missing)
    /dev/sdc3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 6c9f75bc:413133d9:db34176b:a5f25aae
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Sat Mar 12 15:10:28 2016
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
         Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
      Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : b1c807f7:c9aa530f:644f2e15:f7e1fec0

        Update Time : Fri Oct 21 00:18:39 2022
           Checksum : ca9a915f - correct
             Events : 31189

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 2
       Array State : A.A. ('A' == active, '.' == missing)
    mdadm: No md superblock detected on /dev/sdd3.


    I tried to recreate the array but got:

    ~ # cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
    md2 : active raid5 sda3[0] sdc3[2] sdb3[1](F)
          17569172928 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/2] [U_U_]

    md1 : active raid1 sda2[0] sdc2[2] sdb2[1]
          1998784 blocks super 1.2 [4/3] [UUU_]

    md0 : active raid1 sda1[4] sdc1[2]
          1997760 blocks super 1.2 [4/2] [U_U_]

    unused devices: <none>
    ~ # mdadm --stop /dev/md2
    mdadm: Cannot get exclusive access to /dev/md2:Perhaps a running process, mounted filesystem or active volume group?
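
    About the "Cannot get exclusive access" error: something still holds md2 open. A plausible sequence before stopping the array, assuming (as the error message itself suggests) a mounted filesystem and/or an LVM volume group sits on top of it, would be the following; the mount point is made up and has to be taken from your own 'mount' output:

    ~ # mount                        # see what is mounted from md2 or from an LVM volume on top of it
    ~ # umount /mnt/myvolume         # hypothetical mount point, use whatever 'mount' shows
    ~ # vgchange -an                 # deactivate any LVM volume groups on top of md2
    ~ # mdadm --stop /dev/md2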


  • Mijzelf
    Mijzelf Posts: 2,607  Guru Member
    I PM'd you a link to a compatible ddrescue. About the re-creation of the array: I suppose you mean re-assembling? Your headers show a creation time in 2016, which would have changed if you had re-created them.
    As you can see, sdb3 shows an array state of AAA., while both sda3 and sdc3 show A.A. . So sda and sdc say that sdb was dropped. Sdb doesn't 'know' about that, because it wasn't updated after being dropped. Mdadm won't put sdb back in the array, because sda and sdc disagree.
    Maybe there is a way to force sdb back into the existing array, but I'm not aware of one. The only way I know is to create a new array around the existing payload.
    Having said that, sdb3 was updated last night, so it looks like it *was* back in the array at Fri Oct 21 00:17:57 2022 and was dropped again at Fri Oct 21 00:18:39 2022.
    How did you manage that?
    Anyway, if it was dropped within a minute, that disk has serious problems, and so it's not a good candidate for a healthy array. A copy is needed.
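
    If it comes to creating a new array around the existing payload (once the failing disk has been copied to a healthy one and the copy has taken sdb's place), a minimal sketch could look like the lines below. It assumes the parameters reported by --examine above (metadata 1.2, RAID5, 4 devices, 64K chunk, left-symmetric layout, data offset 262144 sectors, device order per the 'Device Role' fields) are reproduced exactly, and that the mdadm build accepts --data-offset (given here in sectors via the 's' suffix); --assume-clean keeps mdadm from starting a resync over the payload, and 'missing' stands in for the absent 4th member (the old sdd3):

    ~ # mdadm --stop /dev/md2
    ~ # mdadm --create /dev/md2 --assume-clean --metadata=1.2 --level=5 --raid-devices=4 \
          --chunk=64 --layout=left-symmetric --data-offset=262144s \
          /dev/sda3 /dev/sdb3 /dev/sdc3 missing

    Afterwards, check everything read-only (compare a fresh mdadm --examine against the old values, mount with -o ro) before anything is allowed to write to the array.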

