Problem with recovery of RAID 5 on NAS 542

Hello,

I have a problem with RAID 5 recovery on a NAS 542. I had a RAID 5 array consisting of four 2TB disks. First I replaced one 2TB drive with a 4TB one and waited for the data to resynchronize and the NAS to work again. Then I replaced a second 2TB drive with another 4TB drive. (I had already formatted the first 2TB disk, which was a mistake.) I waited a few days, but the NAS never returned to a working state.
The NAS then raised an alert that the RAID is degraded, and it fails to recover.

The original set was 4x 2TB disks. Now I have 2x 2TB and 2x 4TB. The data on the NAS is corrupted and cannot be read.

When I try to repair the RAID, I see that it is working with the disks in positions 1, 2 and 4, and when I add the third disk, the repair fails and the RAID falls back into a degraded state.

Note: What seems strange to me is that I exchanged the disks in positions 4 and 3, yet when I look at the logs, I see a DISK 2 I/O error there.

Is there a way to recover RAID?

Thanks in advance.

Answers

  • Mijzelf
    Mijzelf Posts: 1,799  Guru Member
    The data on the NAS is corrupted.
    What do you mean by that?
    And when I look at the logs, I see a DISK 2 IO error there.
    A disk I/O error while recovering a degraded array is always fatal for the recovery. The raid manager stops because there is no way to continue.
    After you exchanged the first disk (which one?), was the array healthy? I suppose yes, else you wouldn't have exchanged the 2nd one.

    Anyway, can you login over ssh, and post the output of
    su
    mdadm --examine /dev/sd[abcd]3
    cat /proc/mdstat
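    If a disk really has unreadable sectors, its SMART attributes usually show it. A quick check (assuming smartmontools is present on the box, which may not be the case on stock firmware, and assuming disk 2 is /dev/sdb):

    ```shell
    # The pending/reallocated/uncorrectable sector counters are the
    # interesting ones; non-zero values point at failing media.
    smartctl -a /dev/sdb | egrep -i 'pending|realloc|uncorrect'
    ```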


  • xstor
    xstor Posts: 3
    Mijzelf said:
    The data on the NAS is corrupted.
    What do you mean by that?
    And when I look at the logs, I see a DISK 2 IO error there.
    A disk I/O error while recovering a degraded array is always fatal for the recovery. The raid manager stops because there is no way to continue.
    After you exchanged the first disk (which one?), was the array healthy? I suppose yes, else you wouldn't have exchanged the 2nd one.

    Anyway, can you login over ssh, and post the output of
    su
    mdadm --examine /dev/sd[abcd]3
    cat /proc/mdstat


    By "the NAS is corrupted" I mean that almost all the files I have on the NAS are damaged; I can't open them, they have a reduced size, etc.

    First I changed the disk in position 4, let it synchronize, and the RAID was healthy. So I formatted the old disk and continued with the disk in position 3. After I replaced the disk in position 3, I let it synchronize and formatted the old disk 3 (mistake :( ). And after some time the synchronization finished with an error.

    So I tried to repair it, but it always fails. I checked the log and saw: "Detected Disk2 I/O error"


    Output of the commands over SSH:

    ~ # mdadm --examine /dev/sd[abcd]3
    /dev/sda3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : b3e06031:85e1e9bb:53ade68f:efaf9298
               Name : NAS542:2
      Creation Time : Fri Sep 22 16:08:07 2017
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 3898767360 (1859.08 GiB 1996.17 GB)
         Array Size : 5848151040 (5577.23 GiB 5988.51 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 802bdbf8:d7919097:4f90f1db:4cd1d776

        Update Time : Fri Nov 26 14:35:20 2021
           Checksum : 95a03e9b - correct
             Events : 82655

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 0
       Array State : A..A ('A' == active, '.' == missing)
    /dev/sdb3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : b3e06031:85e1e9bb:53ade68f:efaf9298
               Name : NAS542:2
      Creation Time : Fri Sep 22 16:08:07 2017
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 3898767360 (1859.08 GiB 1996.17 GB)
         Array Size : 5848151040 (5577.23 GiB 5988.51 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : active
        Device UUID : 2bead52a:476c845b:52b6e5e1:1d0787e4

        Update Time : Fri Nov 26 14:31:09 2021
           Checksum : ff35320c - correct
             Events : 82525

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 1
       Array State : AA.A ('A' == active, '.' == missing)
    mdadm: cannot open /dev/sdc3: No such device or address
    /dev/sdd3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : b3e06031:85e1e9bb:53ade68f:efaf9298
               Name : NAS542:2
      Creation Time : Fri Sep 22 16:08:07 2017
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 5848151040 (5577.23 GiB 5988.51 GB)
      Used Dev Size : 3898767360 (1859.08 GiB 1996.17 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : bca2c55b:38f565ac:e95b58ff:8812835b

        Update Time : Fri Nov 26 14:35:20 2021
           Checksum : fe525612 - correct
             Events : 82655

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 3
       Array State : A..A ('A' == active, '.' == missing)
    ~ # cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
    md2 : active raid5 sda3[0] sdd3[4] sdb3[1](F)
          5848151040 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/2] [U__U]

    md1 : active raid1 sda2[6] sdd2[4] sdb2[5]
          1998784 blocks super 1.2 [4/3] [UU_U]

    md0 : active raid1 sda1[6] sdd1[4] sdb1[5]
          1997760 blocks super 1.2 [4/3] [UU_U]

    unused devices: <none>
    ~ #


    Thanks for your time. 


  • Mijzelf
    Mijzelf Posts: 1,799  Guru Member
    That doesn't look nice. It seems your disk 2 (sdb) developed an I/O problem after you exchanged the first disk. When an I/O error occurs, the disk is dropped from the array. When the array was already degraded, it goes down. And yours is down: only 2 disks are left in the array.
    With a trick it is possible to add disk 2 again; the problem is that it will be dropped again as soon as the I/O error reoccurs. So adding a 4th disk is not possible.
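    The "trick" is usually a forced assembly, which tells mdadm to ignore the stale event count on the dropped member. A minimal sketch using the device names from the mdstat output above (verify them first; --force can do harm if pointed at the wrong partitions):

    ```shell
    # Stop the broken array (it is listed as active but is effectively down)
    mdadm --stop /dev/md2

    # Reassemble from the three data-carrying members, forcing sdb3 back in
    # despite its stale event count; --run starts it even while degraded.
    mdadm --assemble --force --run /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdd3
    ```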
    The clean solution is to make a bit-by-bit copy of disk 2 to a new disk, using something like ddrescue. The copy will contain soft errors, as at least one sector of disk 2 is not readable, but no longer an I/O error. This disk can then be re-inserted in the array using some command-line magic, after which the 4th disk can be added to regain redundancy.
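    For the record, a typical ddrescue invocation for such a copy looks like this (the device names are hypothetical; be very sure which disk is the failing source and which is the empty target, since the target is overwritten):

    ```shell
    # Pass 1: copy everything that reads cleanly, skip bad areas (-n),
    # and keep a map file so the run can be resumed or refined later.
    ddrescue -f -n /dev/sdb /dev/sde rescue.map

    # Pass 2: go back and retry the bad areas up to 3 times (-r3).
    ddrescue -f -r3 /dev/sdb /dev/sde rescue.map
    ```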

    However, there is something I don't understand. You write the filesystem is corrupted, but as the array is down, there is no volume, and so no filesystem. If the corruption showed up before the I/O error, the disk failed silently, as in without telling upstream it couldn't read its sector anymore, which is very bad. If the corruption showed up after the I/O error, you were either looking at some local cache on your client, or I misinterpreted the data I have got.

    Do I understand correctly that you formatted both the original disks 3 and 4, and that only the new disk 4 completed its rebuild successfully?
  • xstor
    xstor Posts: 3
    Do I understand correctly that you formatted both the original disks 3 and 4, and that only the new disk 4 completed its rebuild successfully?
      Yes.


    However, there is something I don't understand. You write the filesystem is corrupted, but as the array is down, there is no volume, and so no filesystem. If the corruption showed up before the I/O error, the disk failed silently, as in without telling upstream it couldn't read its sector anymore, which is very bad. If the corruption showed up after the I/O error, you were either looking at some local cache on your client, or I misinterpreted the data I have got.


    I see the filesystems and the files inside them (I covered the file names with a white block). I can also download the data, but I cannot open most of the files, because they are corrupted.



    Here is the system log from the NAS:




    With a trick it is possible to add disk 2 again; the problem is that it will be dropped again as soon as the I/O error reoccurs. So adding a 4th disk is not possible.
    The clean solution is to make a bit-by-bit copy of disk 2 to a new disk, using something like ddrescue. The copy will contain soft errors, as at least one sector of disk 2 is not readable, but no longer an I/O error. This disk can then be re-inserted in the array using some command-line magic, after which the 4th disk can be added to regain redundancy.

    I will try to do a bit-by-bit copy. I need to recover the photos; they are the most important to me. I've already bought new disks to replace the remaining old ones, so if I can recover the data, I will make a backup, replace the disks, and create a new RAID.

    Thank you for your time.

    I'll let you know when the bit-by-bit copy is ready.