NAS542 Raid5 Degraded

Hello,
My RAID is degraded. I replaced disk 2 and followed the steps to repair it, but after a few minutes of loading it took me back to the repair page. I checked, and this is my status:

~ # mdadm --examine /dev/sd[abcd]3
/dev/sda3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ba107b29:c0145772:da954bd1:e8ed408c
           Name : NasProductie:2  (local to host NasProductie)
  Creation Time : Wed May  2 12:23:26 2018
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
     Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
  Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 25519dae:6ffb4334:78ac2f19:040a1bc1

    Update Time : Fri Jan 28 20:47:48 2022
       Checksum : bbe1e0e2 - correct
         Events : 20244

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdb3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ba107b29:c0145772:da954bd1:e8ed408c
           Name : NasProductie:2  (local to host NasProductie)
  Creation Time : Wed May  2 12:23:26 2018
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
     Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
  Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 98cda793:3772a83f:396cc157:68bfdcbb

    Update Time : Sat Jan 29 11:05:35 2022
       Checksum : e5a7135e - correct
         Events : 20517

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : spare
   Array State : ..AA ('A' == active, '.' == missing)
/dev/sdc3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ba107b29:c0145772:da954bd1:e8ed408c
           Name : NasProductie:2  (local to host NasProductie)
  Creation Time : Wed May  2 12:23:26 2018
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
     Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
  Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : c9f527ed:fe6bbd25:c086f5a3:c58af25c

    Update Time : Sat Jan 29 11:05:35 2022
       Checksum : 12858769 - correct
         Events : 20517

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : ..AA ('A' == active, '.' == missing)
/dev/sdd3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ba107b29:c0145772:da954bd1:e8ed408c
           Name : NasProductie:2  (local to host NasProductie)
  Creation Time : Wed May  2 12:23:26 2018
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
     Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
  Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 882a0bb0:cc7ccefa:29f17a31:15af39f6

    Update Time : Sat Jan 29 11:05:35 2022
       Checksum : d1465bb0 - correct
         Events : 20517

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : ..AA ('A' == active, '.' == missing)
~ #

Some advice, please?

All Replies

  • Mijzelf
    Answer ✓
    You have got a problem. Either you exchanged the wrong disk, or 2 disks were already dropped when you exchanged the disk.

    /dev/sda3:
        Update Time : Fri Jan 28 20:47:48 2022
       Array State : AAAA ('A' == active, '.' == missing)
    /dev/sdb3:
        Update Time : Sat Jan 29 11:05:35 2022
       Array State : ..AA ('A' == active, '.' == missing)
    /dev/sdc3:
        Update Time : Sat Jan 29 11:05:35 2022
       Array State : ..AA ('A' == active, '.' == missing)
    /dev/sdd3:
        Update Time : Sat Jan 29 11:05:35 2022
       Array State : ..AA ('A' == active, '.' == missing)

    Disk 1 was last updated on Jan 28 at 20:47, and at that moment the array was healthy. Disks 2, 3 and 4 were updated on Jan 29 at 11:05, and by then the array had only 2 members left. So disk 2 was added as a spare, because 2 disks are not enough to add an active member.
    So disk 1 failed first, as its 'Array State' was never updated. Maybe disk 2 also failed, maybe not. Was the array degraded or down when you exchanged the disk?
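
    For reference, the relevant fields can be pulled out of the same --examine output with a quick filter, and /proc/mdstat shows what the kernel currently thinks of the array:

    mdadm --examine /dev/sd[abcd]3 | grep -E '^/dev|Update Time|Events|Array State|Device Role'
    cat /proc/mdstat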

  • ValentinIstoc
    On the 27th I changed disk 2. I read some articles here on the forum, and on the 28th I proceeded to repair the RAID from the web interface. After about 30 minutes I started to hear the beeps again; it threw me out of the web interface, and in the storage manager I again had the option to repair the RAID. I left it like that overnight, did some more searching on the 29th, and started running some SSH commands to see the RAID status. Is it possible that disk 1 also crashed while the RAID was being repaired? What can I do in this situation? I'm thinking of reinserting disk 2 (the old one), so maybe I have a chance to repair the RAID with disk 1. What would be the solution? I have important data there.
  • Mijzelf
    edited January 2022
    So the rebuild for disk 2 started on the 27th and completed. Then disk 2 was dropped again, and while rebuilding, disk 1 was dropped after Fri Jan 28 20:47 (UTC, I think), leaving you in your current situation.
    It is important to know that the disks are not crashed, they just have one or more unreadable sectors. The raid manager drops a member as soon as an I/O error occurs, which is in many cases an unreadable sector.
    It is possible to recreate your array using the original 4 disks. Problem is that the unreadable sector is still unreadable, so sooner or later this will hit you again.
    The solution is to create a bit-by-bit copy on a new disk. The unreadable sector cannot be copied, so it will be filled with zeros on the copy. Whether that is a problem depends on the function of that sector.
    You have got 5 disks: A, B1, B2, C and D, where A failed during the 2nd rebuild, B1 was dropped first, and B2 was dropped soon after the 1st rebuild. C and D are healthy, as far as we know.
    It's a bit strange that B2 was dropped so soon after the 1st rebuild. Is that a new disk? Have you looked at its SMART values?
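
    If the smartctl binary is present on the firmware (otherwise the web GUI's S.M.A.R.T. page shows the same counters), a quick check on B2, assuming it is still /dev/sdb as in the --examine output, could look like this; the kernel log also shows whether the drop was caused by an I/O error:

    smartctl -A /dev/sdb | grep -iE 'Reallocated|Pending|Uncorrectable'
    dmesg | grep -i error
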
    Anyway, I think B1 is most out of sync. It was dropped on the 27th, and all changes to the filesystem after that are not on B1. When A was dropped, the array was down, so A should be up to date.
    I think you should try to create a bit-by-bit backup of A, and then create a degraded array of A, C and D. Then you can add a 4th disk to get redundancy back.

    The procedure to create the bit-by-bit copy:
    Remove all disks except A and plug a new disk in. Then execute
    cat /proc/partitions
    or
    mdadm --examine /dev/sd[ab]3
    to make sure disk A is still /dev/sda and the new disk is /dev/sdb.
    Download these 3 files, put them on a USB stick, and plug it in.
    Execute

    cd /e-data/<some-hex-code>/
    ./fix-ld-linux.sh
    ./screen
    <enter>
    ./ddrescue /dev/sda /dev/sdb ./logfile

    This will copy disk /dev/sda to /dev/sdb, skipping unreadable sectors. Make sure sda is disk A, and sdb is the new disk. While the copy is running, you can close your ssh session. Later you can get your session back with
    cd /e-data/<some-hex-code>/
    ./screen -r
    (That is the function of screen). When copying is done,
    mdadm --examine /dev/sd[ab]3
    should show 2 identical headers.

    When that is completed, let's talk about recreating the array.

    I have important data there.
    By now it's clear that you should have a backup. And raid is not a backup.

  • ValentinIstoc
    edited February 2022
    Thanks. There is a lot of information, and I am trying to understand the steps. What I've done in the meantime: yesterday I connected the NAS to a network where I had space to back up data, turned the NAS on, and let it work. I will wait until it is fully initialized and re-read the status. If I can't access the data, I will reinsert disk 2 (the one that was replaced) and initialize it, check the status, and try to copy all the data I can access (or at least the critical data). If I still do not have a solution, I will use the steps you presented.
    Do you think it's okay to continue? Are there any risks that I do not anticipate at this time, due to my lack of experience with such issues? Or do you think I should go straight to your steps?
    Thank you very much for your time and information.
    I'll be back with a status.

    PS: Regarding the backup, now I realize it. I relied on the redundancy of one disk.
  • Mijzelf
    Answer ✓
    and initialize it,

    What do you mean by that?

    What you are proposing is pretty harmless, as long as you don't delete and/or recreate volumes using the web interface. I don't think you will be able to get any data from the NAS, as the array is down and won't automagically come up again. The raid headers tell the raid manager that the disks don't belong to the same array anymore.

    From the command line it is possible to bring the array up (by re-creating it without touching the content), but as long as the original unreadable sector is there, the array will go down as soon as it is accessed. Very inconvenient when you are trying to back up.
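
    For completeness, a minimal sketch of what such a re-create could look like, assuming the array is /dev/md2 (as the 'NasProductie:2' name suggests) and using the roles and parameters from the --examine output above (sda3 = device 0, device 1 missing, sdc3 = device 2, sdd3 = device 3; RAID5, 4 devices, 64K chunk, left-symmetric, metadata 1.2). It is a sketch only, not something to run before the bit-by-bit copy is safely made and the parameters are double-checked:

    # Sketch only - verify device order, chunk size, layout and metadata version
    # against 'mdadm --examine' before running anything like this.
    mdadm --stop /dev/md2
    mdadm --create /dev/md2 --assume-clean --level=5 --raid-devices=4 \
          --metadata=1.2 --chunk=64 --layout=left-symmetric \
          /dev/sda3 missing /dev/sdc3 /dev/sdd3
    # 'missing' keeps slot 1 empty so the array comes up degraded; --assume-clean
    # tells mdadm not to rebuild anything. Afterwards check that the new Data
    # Offset matches the old one (262144 sectors); newer mdadm versions accept
    # --data-offset if it does not. A 4th disk can then be re-added with:
    # mdadm /dev/md2 --add /dev/sdb3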
