Problem with recovery of RAID 5 on NAS 542

Hello,

I have a problem with RAID 5 recovery on a NAS 542. I had a RAID 5 array consisting of four 2TB disks. First I replaced one 2TB drive with a 4TB one and waited for the data to resynchronize and the NAS to work again. Then I replaced a second 2TB drive with another 4TB drive. (I had already formatted the first 2TB disk, which was a mistake.) I waited a few days, but the NAS never returned to a working state.
The NAS then raised an alert that the RAID is degraded, and it fails to recover.

The original set was 4x 2TB disks. Now I have 2x 2TB and 2x 4TB. The data on the NAS is corrupted and cannot be read.

When I try to repair the RAID, I see that it is working with the disks in positions 1, 2 and 4, and when I add the third disk, the repair fails and the RAID falls back into a degraded state.

Note: What seems strange to me is that I exchanged the disks in positions 4 and 3, yet when I look at the logs, I see a DISK 2 I/O error there.

Is there a way to recover RAID?

Thanks in advance.

Answers

  • Mijzelf
    Mijzelf Posts: 1,799  Guru Member
    The data on the NAS is corrupted.
    What do you mean by that?
    And when I look at the logs, I see a DISK 2 IO error there.
    A disk I/O error while recovering a degraded array is always fatal for the recovery. The raid manager stops because there is no way to continue.
    After you exchanged the first disk (which one?), was the array healthy? I suppose yes, else you wouldn't have exchanged the 2nd one.

    Anyway, can you login over ssh, and post the output of
    su
    mdadm --examine /dev/sd[abcd]3
    cat /proc/mdstat
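    If a disk really has unreadable sectors, its SMART attributes usually show it. A quick check (assuming smartmontools is present on the box, which may not be the case on stock firmware, and assuming disk 2 is /dev/sdb):

    ```shell
    # The pending/reallocated/uncorrectable sector counters are the
    # interesting ones; non-zero values point at failing media.
    smartctl -a /dev/sdb | egrep -i 'pending|realloc|uncorrect'
    ```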


  • xstor
    xstor Posts: 3
    Mijzelf said:
    The data on the NAS is corrupted.
    What do you mean by that?
    And when I look at the logs, I see a DISK 2 IO error there.
    A disk I/O error while recovering a degraded array is always fatal for the recovery. The raid manager stops because there is no way to continue.
    After you exchanged the first disk (which one?), was the array healthy? I suppose yes, else you wouldn't have exchanged the 2nd one.

    Anyway, can you login over ssh, and post the output of
    su
    mdadm --examine /dev/sd[abcd]3
    cat /proc/mdstat


    By "the NAS is corrupted" I mean that almost all the files I have on the NAS are damaged; I can't open them, they have a reduced size, etc.

    First I changed the disk in position 4, let it synchronize, and the RAID was healthy. So I formatted the old disk and continued with the disk in position 3. After I replaced the disk in position 3, I let it synchronize and formatted the old disk 3 (mistake :( ). And after some time the synchronization finished with an error.

    So I tried to repair it, but it always fails. I checked the log and saw: "Detected Disk2 I/O error"


    Output of the commands over SSH:

    ~ # mdadm --examine /dev/sd[abcd]3
    /dev/sda3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : b3e06031:85e1e9bb:53ade68f:efaf9298
               Name : NAS542:2
      Creation Time : Fri Sep 22 16:08:07 2017
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 3898767360 (1859.08 GiB 1996.17 GB)
         Array Size : 5848151040 (5577.23 GiB 5988.51 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 802bdbf8:d7919097:4f90f1db:4cd1d776

        Update Time : Fri Nov 26 14:35:20 2021
           Checksum : 95a03e9b - correct
             Events : 82655

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 0
       Array State : A..A ('A' == active, '.' == missing)
    /dev/sdb3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : b3e06031:85e1e9bb:53ade68f:efaf9298
               Name : NAS542:2
      Creation Time : Fri Sep 22 16:08:07 2017
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 3898767360 (1859.08 GiB 1996.17 GB)
         Array Size : 5848151040 (5577.23 GiB 5988.51 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : active
        Device UUID : 2bead52a:476c845b:52b6e5e1:1d0787e4

        Update Time : Fri Nov 26 14:31:09 2021
           Checksum : ff35320c - correct
             Events : 82525

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 1
       Array State : AA.A ('A' == active, '.' == missing)
    mdadm: cannot open /dev/sdc3: No such device or address
    /dev/sdd3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : b3e06031:85e1e9bb:53ade68f:efaf9298
               Name : NAS542:2
      Creation Time : Fri Sep 22 16:08:07 2017
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 5848151040 (5577.23 GiB 5988.51 GB)
      Used Dev Size : 3898767360 (1859.08 GiB 1996.17 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : bca2c55b:38f565ac:e95b58ff:8812835b

        Update Time : Fri Nov 26 14:35:20 2021
           Checksum : fe525612 - correct
             Events : 82655

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 3
       Array State : A..A ('A' == active, '.' == missing)
    ~ # cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
    md2 : active raid5 sda3[0] sdd3[4] sdb3[1](F)
          5848151040 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/2] [U__U]

    md1 : active raid1 sda2[6] sdd2[4] sdb2[5]
          1998784 blocks super 1.2 [4/3] [UU_U]

    md0 : active raid1 sda1[6] sdd1[4] sdb1[5]
          1997760 blocks super 1.2 [4/3] [UU_U]

    unused devices: <none>
    ~ #


    Thanks for your time. 


  • Mijzelf
    Mijzelf Posts: 1,799  Guru Member
    That doesn't look nice. It seems your disk 2 (sdb) developed an I/O problem after you exchanged the first disk. When an I/O error occurs, the disk is dropped from the array. When the array was already degraded, it goes down. And yours is down: only 2 disks are left in the array.
    With a trick it is possible to add disk 2 again; the problem is that it will be dropped again as soon as the I/O error reoccurs. So adding a 4th disk is not possible.
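    The "trick" is usually a forced assembly, which tells mdadm to ignore the stale event count on the dropped member. A minimal sketch using the device names from the mdstat output above (verify them first; --force can do harm if pointed at the wrong partitions):

    ```shell
    # Stop the broken array (it is listed as active but is effectively down)
    mdadm --stop /dev/md2

    # Reassemble from the three data-carrying members, forcing sdb3 back in
    # despite its stale event count; --run starts it even while degraded.
    mdadm --assemble --force --run /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdd3
    ```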
    The clean solution is to make a bit-by-bit copy of disk 2 to a new disk, using something like ddrescue. The copy will contain soft errors, as at least one sector of disk 2 is not readable, but no longer an I/O error. This disk can then be re-inserted in the array using some command-line magic, after which the 4th disk can be added to regain redundancy.
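    For the record, a typical ddrescue invocation for such a copy looks like this (the device names are hypothetical; be very sure which disk is the failing source and which is the empty target, since the target is overwritten):

    ```shell
    # Pass 1: copy everything that reads cleanly, skip bad areas (-n),
    # and keep a map file so the run can be resumed or refined later.
    ddrescue -f -n /dev/sdb /dev/sde rescue.map

    # Pass 2: go back and retry the bad areas up to 3 times (-r3).
    ddrescue -f -r3 /dev/sdb /dev/sde rescue.map
    ```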

    However, there is something I don't understand. You write the filesystem is corrupted, but as the array is down, there is no volume, and so no filesystem. If the corruption showed up before the I/O error, the disk failed silently, as in without telling upstream it couldn't read its sector anymore, which is very bad. If the corruption showed up after the I/O error, you were either looking at some local cache on your client, or I misinterpreted the data I have got.

    Do I understand correctly that you formatted both the original disks 3 and 4, and that only the new disk 4 completed its rebuild successfully?
  • xstor
    xstor Posts: 3
    Do I understand correctly that you formatted both the original disks 3 and 4, and that only the new disk 4 completed its rebuild successfully?
      Yes.


    However, there is something I don't understand. You write the filesystem is corrupted, but as the array is down, there is no volume, and so no filesystem. If the corruption showed up before the I/O error, the disk failed silently, as in without telling upstream it couldn't read its sector anymore, which is very bad. If the corruption showed up after the I/O error, you were either looking at some local cache on your client, or I misinterpreted the data I have got.


    I see the filesystems and the files inside them (I covered the file names with a white block). I can also download the data, but I cannot open most of the files, because they are corrupted.



    Here is the system log from the NAS:




    With a trick it is possible to add disk 2 again; the problem is that it will be dropped again as soon as the I/O error reoccurs. So adding a 4th disk is not possible.
    The clean solution is to make a bit-by-bit copy of disk 2 to a new disk, using something like ddrescue. The copy will contain soft errors, as at least one sector of disk 2 is not readable, but no longer an I/O error. This disk can then be re-inserted in the array using some command-line magic, after which the 4th disk can be added to regain redundancy.

    I will try to do a bit-by-bit copy. I need to recover the photos; they are the most important to me. I've already bought new disks to replace the remaining old ones, so if I can recover the data, I will make a backup, replace the disks, and create a new RAID.

    Thank you for your time.

    I'll let you know when the bit-by-bit copy is ready.