NAS540 broken after power outage
Then I wanted to try to fix the NAS using Mijzelf's RescueSticks (thank you at this point for all the amazing work you are doing here). After that it didn't boot at all.
While debugging over a serial connection I could see that I got a "Bad header checksum" error.
I was able to fix that via TFTP by getting the bootloader right again.
Now when I boot the NAS without HDDs, it seems to work.
But if I boot with the drives installed, it won't start fully.
The log seems to indicate broken hard drives, but it doesn't make sense to me that 4 drives would break at once ("failed command: READ FPDMA QUEUED") and that the system would not even be able to start.
Accepted Solution
-
Very unlucky for me as I really wanted to have RAID5 to avoid this exact scenario.
Does that mean you don't have a backup? The real purpose of raid5 is availability, but as you left the box for 6 months I suppose that was not the goal.
Don't know what is wrong with those disks, but if you manage to make a bitwise copy of only one of them, using ddrescue or something like that, you might be able to save your data.
I wrote more about ddrescue in this thread.
All Replies
-
More logs:
This one is a normal boot without hard drives:
https://pastebin.com/KDwccPvt
This one with the rescue stick without hard drives:
https://pastebin.com/WQyKMjYp
This one with the rescue stick with hard drives:
https://pastebin.com/wmwYfnSL
-
I agree that the odds that all disks together died are low. In your log 'only' ata2 and ata4 show this problem. When you exchange disk 1&2 and disk 3&4, does the problem follow the disks? In that case indeed 2 disks are bad.
If it doesn't follow the disks, does the box boot when you remove disk 2 & 4?
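The swap test above can be written down as a small check. This is only a sketch in Python: the function name `follows_disks` is made up for illustration, and it assumes that port ataN in the kernel log corresponds to drive slot N, which the thread does not confirm.

```python
def follows_disks(errors_before, errors_after, swaps):
    """Return True if every failing port moved to its swap partner."""
    # Where would the errors show up if they travelled with the disks?
    expected = {swaps.get(port, port) for port in errors_before}
    return expected == set(errors_after)


# Slots 1<->2 and 3<->4 are exchanged (assumption: ataN == slot N).
SWAPS = {1: 2, 2: 1, 3: 4, 4: 3}

# Errors were on ata2 and ata4 before the swap; two possible outcomes:
print(follows_disks({2, 4}, {1, 3}, SWAPS))  # errors moved with the disks
print(follows_disks({2, 4}, {2, 4}, SWAPS))  # errors stayed on the ports
```

If the errors move with the disks, the disks themselves are suspect. If they stay on the same ports, the slots, cabling or controller are the more likely cause.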
-
This works really well for us, thank you! Facing the same issue here. Help is appreciated.
-
Mijzelf said: I agree that the odds that all disks together died are low. In your log 'only' ata2 and ata4 show this problem. When you exchange disk 1&2 and disk 3&4, does the problem follow the disks? In that case indeed 2 disks are bad. If it doesn't follow the disks, does the box boot when you remove disk 2 & 4?
Mijzelf, I am so sorry for not replying to your answer. I let the summer get the better of me and left the NAS sitting in the corner.
Your observation seems to be on point. After swapping discs 1&2 and 3&4, I seem to get the same errors, but on ata1 and ata3. This is no good for my data, but I assume that means I got two broken discs at once. Very unlucky for me as I really wanted to have RAID5 to avoid this exact scenario.
I attached the log in case you want to check again.
-
Mijzelf said: Very unlucky for me as I really wanted to have RAID5 to avoid this exact scenario.
Does that mean you don't have a backup? The real purpose of raid5 is availability, but as you left the box for 6 months I suppose that was not the goal.
Don't know what is wrong with those disks, but if you manage to make a bitwise copy of only one of them, using ddrescue or something like that, you might be able to save your data.
I wrote more about ddrescue in this thread.
Yes, I thought that RAID5 was backup enough for my use case. I used the NAS a lot when it was working, but did not bother with it when it was not running.
Surely I would use it more if the firmware were better, for which you offer a lot of solutions.
I will try to read more into the logs I got.
The confusing part for me is that the web interface will not load if I have HDD 1 in slot 1.
If I put the drives in the order 2134, for example, the NAS boots but shows no drives etc., like in the post about ddsecure.
I also ordered a new drive and will check whether ddsecure can help me somehow.
If you have any other tips or tests I can do, I would be so happy :-)
-
The confusing part for me is that the web interface will not load if I have HDD 1 in slot 1. If I put the drives in the order 2134, for example, the NAS boots but shows no drives etc., like in the post about ddsecure.
I think it has something to do with timing. The web interface runs from the hard disk. At install a firmware partition is created, onto which a compressed filesystem blob stored in flash is extracted. When no hard disk is available, a RAM drive is created and the blob is extracted onto that instead.
When the broken disk is in slot 1, the firmware does not see quickly enough that there is a problem, and so doesn't create that RAM drive. When the disk is in slot 2, the RAID header in slot 1 tells it there is a problem, and so there is time to create the RAM drive.
Or something like that. The firmware is not really bulletproof when it comes to errors.
BTW, it's ddrescue, and not ddsecure.
-
Mijzelf said: The confusing part here for me is that the web interface will not load if i have HDD 1 in slot 1. If i put for example 2134 into the NAS it boots but shows no drives ect like in the post with ddsecure.
I think it has something to do with timing. The webinterface runs from harddisk. At install a firmware partition is created, on which a compressed filesystem blob stored in flash is extracted. When no harddisk is available, a ramdrive is created on which the blob is extracted.
When the broken disk is in slot 1, the firmware does not see quick enough there is a problem, and so doesn't create that ramdrive. When the disk is in slot 2, the raid header in slot 1 tells there is a problem, and so there is time for the ramdisk.
Or something like that. The firmware is not really bulletproof when it comes to errors.
BTW, it's ddrescue, and not ddsecure.
Can you PM me that link you mention in the other threads?
-
After reading a bit more I have now also checked with mdadm:
If I understand this correctly, my 4th drive seems to be broken but the rest is OK. I could try to recreate the RAID with /dev/sdd3 missing, right?
~ # mdadm --examine /dev/sd[abcd]3
/dev/sda3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6c9f75bc:413133d9:db34176b:a5f25aae
           Name : NAS540:2 (local to host NAS540)
  Creation Time : Sat Mar 12 15:10:28 2016
     Raid Level : raid5
   Raid Devices : 4
 Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
     Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
  Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 9b3cc732:bf5ce9e1:985834fb:039acca0
    Update Time : Fri Oct 21 00:18:39 2022
       Checksum : 9ec3787e - correct
         Events : 31189
         Layout : left-symmetric
     Chunk Size : 64K
    Device Role : Active device 0
    Array State : A.A. ('A' == active, '.' == missing)
/dev/sdb3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6c9f75bc:413133d9:db34176b:a5f25aae
           Name : NAS540:2 (local to host NAS540)
  Creation Time : Sat Mar 12 15:10:28 2016
     Raid Level : raid5
   Raid Devices : 4
 Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
     Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
  Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 1a92a04e:9edf8c1b:5ca9cebe:9bb47bcb
    Update Time : Fri Oct 21 00:17:57 2022
       Checksum : e28cd5e4 - correct
         Events : 31188
         Layout : left-symmetric
     Chunk Size : 64K
    Device Role : Active device 1
    Array State : AAA. ('A' == active, '.' == missing)
/dev/sdc3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6c9f75bc:413133d9:db34176b:a5f25aae
           Name : NAS540:2 (local to host NAS540)
  Creation Time : Sat Mar 12 15:10:28 2016
     Raid Level : raid5
   Raid Devices : 4
 Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
     Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
  Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : b1c807f7:c9aa530f:644f2e15:f7e1fec0
    Update Time : Fri Oct 21 00:18:39 2022
       Checksum : ca9a915f - correct
         Events : 31189
         Layout : left-symmetric
     Chunk Size : 64K
    Device Role : Active device 2
    Array State : A.A. ('A' == active, '.' == missing)
mdadm: No md superblock detected on /dev/sdd3.
I tried to recreate the array but got:
~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda3[0] sdc3[2] sdb3[1](F)
      17569172928 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/2] [U_U_]
md1 : active raid1 sda2[0] sdc2[2] sdb2[1]
      1998784 blocks super 1.2 [4/3] [UUU_]
md0 : active raid1 sda1[4] sdc1[2]
      1997760 blocks super 1.2 [4/2] [U_U_]
unused devices: <none>
~ # mdadm --stop /dev/md2
mdadm: Cannot get exclusive access to /dev/md2:Perhaps a running process, mounted filesystem or active volume group?
-
I PM'd you a link to a compatible ddrescue.
About the re-creation of the array, I suppose you mean re-assembling? Your headers show a creation time in 2016, which would have changed if you had re-created the array.
As you can see, sdb3 shows an array state of AAA., while both sda3 and sdc3 show A.A. . So sda and sdc say that sdb was dropped. Sdb doesn't 'know' about that, because it wasn't updated after being dropped. Mdadm won't put sdb back in the array, because sda and sdc disagree.
Maybe there is a way to force sdb back into the existing array, but I'm not aware of one. The only way I know is to create a new array around the existing payload.
Having said that, sdb3 was updated last night, so it looks like it *was* back in the array at Fri Oct 21 00:17:57 2022, and was dropped again at Fri Oct 21 00:18:39 2022. How did you manage to do so?
Anyway, if it was dropped within a minute, that disk has serious problems, and so it's not a good candidate for a healthy array. A copy is needed.
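The comparison of the superblocks described above can also be done mechanically. The following is a minimal Python sketch, not part of mdadm or the firmware: the helper names `parse_examine` and `stale_members` are made up for illustration, and the sample text is shortened to the three fields that matter here, with values taken from the `mdadm --examine` output in this thread.

```python
import re


def parse_examine(text):
    """Split `mdadm --examine` output into {device: {field: value}}."""
    members = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.fullmatch(r"(/dev/\S+):", line)
        if header:
            # A line like "/dev/sda3:" starts a new member section.
            current = members.setdefault(header.group(1), {})
        elif current is not None and " : " in line:
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
    return members


def stale_members(members):
    """Return devices whose event counter is behind the newest one."""
    newest = max(int(m["Events"]) for m in members.values())
    return sorted(d for d, m in members.items() if int(m["Events"]) < newest)


# Shortened sample: only Events, Device Role and Array State are kept.
SAMPLE = """\
/dev/sda3:
         Events : 31189
    Device Role : Active device 0
    Array State : A.A. ('A' == active, '.' == missing)
/dev/sdb3:
         Events : 31188
    Device Role : Active device 1
    Array State : AAA. ('A' == active, '.' == missing)
/dev/sdc3:
         Events : 31189
    Device Role : Active device 2
    Array State : A.A. ('A' == active, '.' == missing)
"""

members = parse_examine(SAMPLE)
for dev, fields in sorted(members.items()):
    # Print device, event counter and the compact array state string.
    print(dev, fields["Events"], fields["Array State"].split()[0])
print("stale:", stale_members(members))
```

A member with a lower event counter and a more optimistic array state than its peers was dropped before the last superblock update, which is what happened to sdb3 here.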