NAS542 Raid5 Degraded

Hello,
My RAID is degraded. I replaced disk 2 and followed the steps to repair it, but after a few minutes of loading it took me back to the repair page. I checked, and this is my status:

~ # mdadm --examine /dev/sd[abcd]3
/dev/sda3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ba107b29:c0145772:da954bd1:e8ed408c
           Name : NasProductie:2  (local to host NasProductie)
  Creation Time : Wed May  2 12:23:26 2018
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
     Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
  Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 25519dae:6ffb4334:78ac2f19:040a1bc1

    Update Time : Fri Jan 28 20:47:48 2022
       Checksum : bbe1e0e2 - correct
         Events : 20244

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdb3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ba107b29:c0145772:da954bd1:e8ed408c
           Name : NasProductie:2  (local to host NasProductie)
  Creation Time : Wed May  2 12:23:26 2018
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
     Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
  Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 98cda793:3772a83f:396cc157:68bfdcbb

    Update Time : Sat Jan 29 11:05:35 2022
       Checksum : e5a7135e - correct
         Events : 20517

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : spare
   Array State : ..AA ('A' == active, '.' == missing)
/dev/sdc3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ba107b29:c0145772:da954bd1:e8ed408c
           Name : NasProductie:2  (local to host NasProductie)
  Creation Time : Wed May  2 12:23:26 2018
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
     Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
  Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : c9f527ed:fe6bbd25:c086f5a3:c58af25c

    Update Time : Sat Jan 29 11:05:35 2022
       Checksum : 12858769 - correct
         Events : 20517

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : ..AA ('A' == active, '.' == missing)
/dev/sdd3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ba107b29:c0145772:da954bd1:e8ed408c
           Name : NasProductie:2  (local to host NasProductie)
  Creation Time : Wed May  2 12:23:26 2018
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
     Array Size : 17569172928 (16755.27 GiB 17990.83 GB)
  Used Dev Size : 11712781952 (5585.09 GiB 5996.94 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 882a0bb0:cc7ccefa:29f17a31:15af39f6

    Update Time : Sat Jan 29 11:05:35 2022
       Checksum : d1465bb0 - correct
         Events : 20517

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : ..AA ('A' == active, '.' == missing)
~ #

Some advice, please?

All Replies

  • Mijzelf
    Answer ✓
    You have got a problem. Either you exchanged the wrong disk, or 2 disks were already dropped when you exchanged the disk.

    /dev/sda3:
        Update Time : Fri Jan 28 20:47:48 2022
       Array State : AAAA ('A' == active, '.' == missing)
    /dev/sdb3:
        Update Time : Sat Jan 29 11:05:35 2022
       Array State : ..AA ('A' == active, '.' == missing)
    /dev/sdc3:
        Update Time : Sat Jan 29 11:05:35 2022
       Array State : ..AA ('A' == active, '.' == missing)
    /dev/sdd3:
        Update Time : Sat Jan 29 11:05:35 2022
       Array State : ..AA ('A' == active, '.' == missing)

    Disk 1 was last updated on Jan 28 at 20:47, and at that moment the array was healthy. Disks 2, 3 and 4 were updated on Jan 29 at 11:05, and by then the array had only 2 members left. So disk 2 was added as a spare, because 2 disks are not enough to add an active member.
    So disk 1 failed first, as its 'Array State' was never updated. Maybe disk 2 also failed, maybe not. Was the array degraded or down when you exchanged the disk?
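
    For reference, the relevant fields can be pulled out of the same --examine output with a quick filter, and /proc/mdstat shows what the kernel currently thinks of the array:

    mdadm --examine /dev/sd[abcd]3 | grep -E '^/dev|Update Time|Events|Array State|Device Role'
    cat /proc/mdstat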

  • ValentinIstoc
    On the 27th I changed disk 2. I read some articles here on the forum, and on the 28th I proceeded to repair the RAID from the web interface. After about 30 minutes I started to hear the beeps again; it threw me out of the web interface, and in the storage manager I again had the option to repair the RAID. I left it like that overnight, did some more searching on the 29th, and started running some SSH commands to see the RAID status. Is it possible that disk 1 also crashed while the RAID was being repaired? What can I do in this situation? I'm thinking of reinserting disk 2 (the old one), so maybe I have a chance to repair the RAID with disk 1. What would be the solution? I have important data there.
  • Mijzelf
    edited January 2022
    So the rebuild for disk 2 started on the 27th and completed. Then disk 2 was dropped again, and while rebuilding, disk 1 was dropped after Fri Jan 28 20:47 (UTC, I think), leaving you in your current situation.
    It is important to know that the disks are not crashed, they just have one or more unreadable sectors. The raid manager drops a member as soon as an I/O error occurs, which is in many cases an unreadable sector.
    It is possible to recreate your array using the original 4 disks. Problem is that the unreadable sector is still unreadable, so sooner or later this will hit you again.
    The solution is to create a bit-by-bit copy on a new disk. The unreadable sector cannot be copied, so it will be filled with zeros on the copy. Whether that is a problem depends on the function of that sector.
    You have got 5 disks: A, B1, B2, C and D, where A failed during the 2nd rebuild, B1 was dropped first, and B2 was dropped soon after the 1st rebuild. C and D are healthy, as far as we know.
    It's a bit strange that B2 was dropped so soon after the 1st rebuild. Is that a new disk? Have you looked at its SMART values?
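
    If the smartctl binary is present on the firmware (otherwise the web GUI's S.M.A.R.T. page shows the same counters), a quick check on B2, assuming it is still /dev/sdb as in the --examine output, could look like this; the kernel log also shows whether the drop was caused by an I/O error:

    smartctl -A /dev/sdb | grep -iE 'Reallocated|Pending|Uncorrectable'
    dmesg | grep -i error
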
    Anyway, I think B1 is most out of sync. It was dropped on the 27th, and all changes to the filesystem after that are not on B1. When A was dropped, the array was down, so A should be up to date.
    I think you should try to create a bit-by-bit backup of A, and then create a degraded array of A, C and D. Then you can add a 4th disk to get redundancy back.

    The procedure to create the bit-by-bit copy:
    Remove all disks except A and plug a new disk in. Then execute
    cat /proc/partitions
    or
    mdadm --examine /dev/sd[ab]3
    to make sure disk A is still /dev/sda and the new disk is /dev/sdb.
    Download these 3 files, put them on a USB stick, and plug it in.
    Execute

    cd /e-data/<some-hex-code>/
    ./fix-ld-linux.sh
    ./screen
    <enter>
    ./ddrescue /dev/sda /dev/sdb ./logfile

    This will copy disk /dev/sda to /dev/sdb, skipping unreadable sectors. Make sure sda is disk A, and sdb is the new disk. While the copy is running, you can close your ssh session. Later you can get your session back with
    cd /e-data/<some-hex-code>/
    ./screen -r
    (That is the function of screen). When copying is done,
    mdadm --examine /dev/sd[ab]3
    should show 2 identical headers.

    When that is completed, let's talk about recreating the array.

    I have important data there.
    By now it's clear that you should have a backup. And raid is not a backup.

  • ValentinIstoc
    edited February 2022
    Thanks. There is a lot of information, and I am trying to understand the steps. What I've done in the meantime: yesterday I connected the NAS to a network where I had space to back up data, turned the NAS on, and let it work. I will wait until it is fully initialized and re-read the status. If I can't access the data, I will reinsert disk 2 (the one that was replaced) and initialize it, check the status, and try to copy all the data I can access (or at least the critical data). If I still do not have a solution, I will use the steps you presented.
    Do you think it's okay to continue? Are there any risks that I do not anticipate at this time, due to my lack of experience with such issues? Or do you think I should go straight to your steps?
    Thank you very much for your time and information.
    I'll be back with a status.

    PS: Regarding the backup, now I realize it. I relied on the redundancy of one disk.
  • Mijzelf
    Answer ✓
    and initialize it,

    What do you mean by that?

    What you are proposing is pretty harmless, as long as you don't delete and/or recreate volumes using the web interface. I don't think you will be able to get any data from the NAS, as the array is down and won't automagically come up again. The raid headers tell the raid manager that the disks don't belong to the same array anymore.

    From the command line it is possible to bring the array up (by re-creating it without touching the content), but as long as the original unreadable sector is there, the array will go down as soon as it is accessed. Very inconvenient when you are trying to back up.
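
    For completeness, a minimal sketch of what such a re-create could look like, assuming the array is /dev/md2 (as the 'NasProductie:2' name suggests) and using the roles and parameters from the --examine output above (sda3 = device 0, device 1 missing, sdc3 = device 2, sdd3 = device 3; RAID5, 4 devices, 64K chunk, left-symmetric, metadata 1.2). It is a sketch only, not something to run before the bit-by-bit copy is safely made and the parameters are double-checked:

    # Sketch only - verify device order, chunk size, layout and metadata version
    # against 'mdadm --examine' before running anything like this.
    mdadm --stop /dev/md2
    mdadm --create /dev/md2 --assume-clean --level=5 --raid-devices=4 \
          --metadata=1.2 --chunk=64 --layout=left-symmetric \
          /dev/sda3 missing /dev/sdc3 /dev/sdd3
    # 'missing' keeps slot 1 empty so the array comes up degraded; --assume-clean
    # tells mdadm not to rebuild anything. Afterwards check that the new Data
    # Offset matches the old one (262144 sectors); newer mdadm versions accept
    # --data-offset if it does not. A 4th disk can then be re-added with:
    # mdadm /dev/md2 --add /dev/sdb3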
