NAS542 - Out of space on 3 disk RAID5 array - now stuck

UrbanSlayer
edited October 2021 in Personal Cloud Storage
As above, I have a 3-disk RAID5 array that has run out of space; this is the main array in the unit.  I later added a separate disk, which was not set up as a spare and hosts other data.

Now that the RAID5 array has run out of space, the entire unit has become 'stuck'.  Initially, after a reboot, I was able to log in but couldn't delete any files, either from the Samba mount points or through the web interface.  After I had waited on this for some time, it froze again and locked me out, so I rebooted it again; since then I can no longer log in via the web frontend.  I have since logged in via SSH, and when I try to delete a file from the main array nothing happens; the command just sits there.  Creating and deleting files on the other disk from the terminal works fine.

Also, on the deletion - you cannot kill the command either.  Ctrl-C does nothing, and a kill -9 also does nothing immediately, and has been sitting for at least 30 minutes.  It will not go, and I have su'd up to root, so I assume it is waiting on something.

From what I can see from mdadm, the array itself is fine and is not rebuilding, and there is very little CPU activity.

/ # mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Wed Jun 29 18:05:51 2016
     Raid Level : raid5
     Array Size : 5852270208 (5581.16 GiB 5992.72 GB)
  Used Dev Size : 2926135104 (2790.58 GiB 2996.36 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Mon Oct 18 19:03:38 2021
          State : clean 
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : NAS542:2
           UUID : 268740ab:f319bdaf:2d5dfdcb:d15089de
         Events : 28

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
/ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
md3 : active raid1 sdd3[0]
      9762304832 blocks super 1.2 [1/1] [U]
      
md2 : active raid5 sda3[0] sdc3[2] sdb3[1]
      5852270208 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
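
For reference, the member map at the end of each "blocks" line in /proc/mdstat ([UUU] and [U] above) is the quickest health indicator: a "_" in place of a "U" marks a missing or failed member. A minimal sketch that summarises this, assuming a POSIX shell with awk (the BusyBox tools on the NAS should be enough); md_status is a hypothetical helper name, not something shipped with the NAS:

```shell
# Print one line per md array: name, member map, and "ok" or "DEGRADED".
# Usage: md_status [file]   (reads /proc/mdstat when no file is given)
md_status() {
    awk '
        # Remember the array name from lines such as "md2 : active raid5 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The member map, e.g. [UUU], is the last field of the "blocks" line
        /\[[U_]+\]/ && name != "" {
            map = $NF
            state = (map ~ /_/) ? "DEGRADED" : "ok"
            print name, map, state
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling md_status with no arguments reads the live /proc/mdstat. A clean result only says the RAID layer is healthy; it says nothing about how full the filesystem on top of it is, which is what df -h reports.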

Apart from pulling these 3 drives, mounting them in a separate enclosure and trying to recover them on another box, is there anything else I can do?

All Replies

  • Mijzelf
    UrbanSlayer wrote:
    "Also, on the deletion - you cannot kill the command either.  Ctrl-C does nothing, and a kill -9 also does nothing immediately, and has been sitting for at least 30 minutes.  It will not go, and I have su'd up to root, so I assume it is waiting on something."

    In my experience this is some pending I/O operation that is blocking. Running top in another shell might show which process is stuck.
    The output of dmesg (the kernel log) can also be interesting.
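
A process that ignores kill -9 is typically in uninterruptible sleep (state D), where signals are only acted on once the pending I/O returns. A rough sketch for listing such processes straight from /proc, assuming the usual Linux /proc/PID/status layout; list_blocked is a hypothetical helper name:

```shell
# Print "PID NAME" for every process currently in state D
# (uninterruptible sleep, usually waiting on disk I/O).
# Usage: list_blocked [proc_root]   (defaults to /proc)
list_blocked() {
    proc_root="${1:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        [ -r "$status" ] || continue      # process may have exited already
        pid=${status%/status}
        pid=${pid##*/}
        awk -v pid="$pid" '
            /^Name:/  { name = $2 }
            /^State:/ { state = $2 }      # single letter: R, S, D, Z, ...
            END { if (state == "D") print pid, name }
        ' "$status" 2>/dev/null
    done
}
```

Run list_blocked a few times in a second SSH session while the delete is hanging; a PID that shows up every time is the one waiting on the array.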

    If you are familiar with the command line, you can use the Universal usb_key_func thumb drive to get early-in-boot telnet access, then assemble and mount the array manually so that you can free up some space.
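
The manual assemble-and-mount step could look roughly like the sketch below. The device names are the ones from the mdadm output earlier in the thread; the mount point /mnt/rescue and the run and rescue_mount helpers are assumptions made for illustration. It only prints the commands unless RUN=1 is set, so it can be read through before anything touches the disks:

```shell
# Dry run by default: print each command instead of executing it.
# Set RUN=1 to execute for real (needs root).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

rescue_mount() {
    md=/dev/md2
    mnt=/mnt/rescue      # hypothetical mount point, pick any empty directory
    run mdadm --assemble "$md" /dev/sda3 /dev/sdb3 /dev/sdc3
    run mkdir -p "$mnt"
    run mount "$md" "$mnt"
    run df -h "$mnt"     # shows how full the filesystem is
}
```

Calling rescue_mount prints the four commands; RUN=1 rescue_mount executes them. Once the array is mounted, files can be deleted under the mount point before the normal firmware services start writing to it.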

    Another action which might help is a factory reset. (Keep the reset button pressed until it has beeped three times.) Among other things, this disables all installed packages, and I *think* some package is trying to write a logfile, or something like that, and is blocking the whole filesystem.
  • That usb key looks very handy, thank you, that should avoid having to get another enclosure and build the array outside!  I have currently forced a check of the entire array, so will wait for that to finish before rebooting again.

    There was nothing helpful in dmesg at all across multiple reboots: no errors, nothing.  I would have preferred an error, as it would have pointed to a more obvious problem!

    The only package I have installed is the myZyXelCloud Agent, as I just use this as a storage medium.

    I have also rebooted every machine on the network that mounted this volume and removed it from auto mounting, so in theory there should be no connections to the array.  As it happens, now I have rebooted it again to get the array check running, I can access the web console.  It seems as though the moment I try a write operation to the array, it locks everything up.
  • Running the check/resync over the array appears to have fixed it.  I am now able to delete things via SSH, so I am freeing up quite a bit of space.
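
The thread does not say how the check was forced. One common way on Linux md arrays is to write "check" to the array's sync_action file in sysfs, sketched below under that assumption; md_check is a hypothetical helper name, and its second argument exists only so the sysfs root can be overridden:

```shell
# Request a scrub of an md array through sysfs (needs root).
# Usage: md_check [array] [sysfs_root]   (defaults: md2, /sys/block)
md_check() {
    md="${1:-md2}"
    sys="${2:-/sys/block}"
    action="$sys/$md/md/sync_action"
    current=$(cat "$action") || return 1
    # Only start a check when the array is not already syncing
    if [ "$current" != "idle" ]; then
        echo "$md is busy: $current" >&2
        return 1
    fi
    echo check > "$action"
    echo "check requested on $md"
}
```

Progress then shows up in /proc/mdstat, and sync_action reads "idle" again once the pass has finished.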

    Thanks for the assistance!
