Zyxel NSA325 v2 shows status degraded after HDD replacement

lintux · October 2021

Hi,

As one of my two 2TB HDDs (working in RAID1 mode) died a week ago, I replaced it with another 2TB HDD.
Initially after the replacement all went well: the system started the repair and after a while the volume status switched to "Healthy".

Now it shows "Degraded" again, even though both disks now show up as "Healthy" in the S.M.A.R.T. menu.

Rebooting the NSA makes the volume status show "Recovering" for a short while with an increasing percentage (the last was 71.3%), but it switches to "Degraded" after a few minutes.

Selecting "Repair Volume" in the Volume menu leads to an error message in the bottom status line: "Disk capacity must be equal to or greater than the smallest disk in the RAID."

The 2 Disks currently in use are:
Seagate ST2000VN000 (old)

Seagate ST2000VN004 (new)

Both show a capacity of 1.82 TB and a Healthy status.

Thanks in advance for any help or useful hints.

BR
lintux

Mijzelf · October 2021

Can you log in over SSH or over the Telnet backdoor (as root, using the admin password) and post the output of

cat /proc/mdstat

cat /proc/partitions

mdadm --examine /dev/sd[ab]2

lintux · October 2021

Sure, thanks for the reply ..

cat /proc/mdstat:

Personalities : [linear] [raid0] [raid1]
md0 : active raid1 sda2[0] sdb2[2]
      1952996792 blocks super 1.2 [2/1] [U_]

unused devices: <none>

cat /proc/partitions:

major minor #blocks name

   7        0     143360 loop0
   8        0 1953514584 sda
   8        1     514048 sda1
   8        2 1952997952 sda2
   8       16 1953514584 sdb
   8       17     514048 sdb1
   8       18 1952997952 sdb2
  31        0       1024 mtdblock0
  31        1        512 mtdblock1
  31        2        512 mtdblock2
  31        3        512 mtdblock3
  31        4      10240 mtdblock4
  31        5      10240 mtdblock5
  31        6      48896 mtdblock6
  31        7      10240 mtdblock7
  31        8      48896 mtdblock8
   9        0 1952996792 md0

mdadm --examine /dev/sd[ab]2:

/dev/sda2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : dfcd2b59:6c86c85e:7ca3a57f:6afb15af
           Name : NSA325-v2:0
  Creation Time : Thu Feb 19 19:54:55 2015
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 1952996928 (1862.52 GiB 1999.87 GB)
     Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
  Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : cf9b8394:f0f82f74:969d3c47:6caf059e

    Update Time : Fri Oct 8 15:57:58 2021
       Checksum : af584a48 - correct
         Events : 221724

   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing)
/dev/sdb2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x2
     Array UUID : dfcd2b59:6c86c85e:7ca3a57f:6afb15af
           Name : NSA325-v2:0
  Creation Time : Thu Feb 19 19:54:55 2015
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 1952996928 (1862.52 GiB 1999.87 GB)
     Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
  Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
Recovery Offset : 2830471168 sectors
          State : clean
    Device UUID : 85b632ed:e8a6d661:cb6e0426:0a1ddc0e

    Update Time : Fri Oct 8 15:57:58 2021
       Checksum : ee01e1cd - correct
         Events : 221724

   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing)

To me it looks not really bad?!?

BR
lintux

Mijzelf · October 2021

lintux said:

To me it looks not really bad?!?

Well, the bad thing is that we shouldn't be seeing this. Both RAID members agree that they are in an array, healthy, and both were last updated at the same time, today around 4 PM. That will be UTC, so that is around the time you posted. Yet the RAID manager in the kernel says the array is degraded.

I don't think you need to worry about the weird size message from the firmware. The firmware has the same info, and doesn't know what is happening either.

Maybe the kernel log (dmesg) has some interesting info, or the array itself (mdadm --detail /dev/md0).
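
The kernel's view referred to here is visible in the `[2/1] [U_]` field of the /proc/mdstat output above: two configured members, only one of them active. As a minimal sketch (the function name `degraded_arrays` is made up for illustration and is not part of any tool on the NAS), such arrays could be flagged like this:

```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print the
# name of every md array that has fewer active members than configured.
# "[2/1]" in the status line means 2 configured members, 1 active.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            # Strip the brackets and split "configured/active".
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name
        }
    '
}
```

On the NAS this would be called as `degraded_arrays < /proc/mdstat`; for the output posted above it prints md0.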

mMontana · October 2021

Clarkson Mode ON.

Some says... the firmware cannot understand mdadm?

Mijzelf · October 2021

Clarkson?

All ZyXEL NASses understand mdadm, even the single disk ones.

lintux · October 2021

dmesg shows an over and over repeating message:

--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sda2
disk 1, wo:1, o:1, dev:sdb2

mdadm --detail /dev/md0

/dev/md0:
        Version : 1.2
  Creation Time : Thu Feb 19 19:54:55 2015
     Raid Level : raid1
     Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
  Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sat Oct 9 10:05:12 2021
          State : clean, degraded
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

           Name : NSA325-v2:0
           UUID : dfcd2b59:6c86c85e:7ca3a57f:6afb15af
         Events : 222350

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       2       8       18        1      spare rebuilding   /dev/sdb2

I won't attempt an interpretation this time ..

mMontana · October 2021

Mijzelf said:

Clarkson?

Jeremy Clarkson, Mister "High Gear no-more". A re-interpretation of the catchphrase about the Stig:
"Some say..." "All we know is, he's called the Stig."

Mijzelf · October 2021

lintux said:

dmesg shows an over and over repeating message:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sda2
disk 1, wo:1, o:1, dev:sdb2

Your system hit a bug, somewhere. This should be in the log only once per state change of the array, AFAIK.

I *think* you have a 'pending read error' on your source disk. This is a (physically healthy) sector whose checksum doesn't match (some bit toggled), so it cannot be read. It is not considered a hardware fault, as you can write the sector normally, after which it can be read again. So SMART doesn't fail the disk (although the pending read errors can be found in the details), but the rebuild of the array stops there, and as the sector is never written, the problem doesn't resolve itself over time.
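
Two quick checks could help test this theory. The sketch below is not from the thread: `pending_sectors` and `recovery_percent` are hypothetical helper names. It assumes that smartctl is available on the NAS (it may not be), that attribute 197 (Current_Pending_Sector) appears in the usual `smartctl -A` table with the raw value in the tenth column, that the Recovery Offset printed by mdadm --examine is in 512-byte sectors, and that Used Dev Size is in KiB, as the GiB figure next to it suggests.

```shell
# Hypothetical helper: print the raw Current_Pending_Sector count
# (SMART attribute 197) from `smartctl -A` output read on stdin.
pending_sectors() {
    awk '$1 == 197 { print $10 }'
}

# Hypothetical helper: percentage of a RAID member already rebuilt.
#   $1 = Recovery Offset in 512-byte sectors (assumed unit)
#   $2 = Used Dev Size in KiB (assumed unit, so 2 sectors per KiB)
recovery_percent() {
    awk -v off="$1" -v kib="$2" \
        'BEGIN { printf "%.1f\n", 100 * off / (kib * 2) }'
}
```

Usage would be `smartctl -A /dev/sda | pending_sectors` for the source disk. With the values posted earlier, `recovery_percent 2830471168 1952996792` prints 72.5, close to the 71.3% last seen in the web interface, which would fit a rebuild that keeps stopping at roughly the same place.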

Don't know what would be a good strategy now. An option would be to back up everything and create a new volume, if you have enough external storage for that.

Complicating factor is that ZyXEL in their wisdom decided to switch off the package server for the EOL NASses, so you can't reinstall any package.

Another option is to remove the new disk and fill up the volume to the rim, in the hope that the pending read error is in a sector which is not in use by the filesystem.

This can be done with

dd if=/dev/zero of=/i-data/md0/admin/bigfile bs=16M

After that, remove the 'bigfile' and insert the new disk again to let it rebuild. Possibly the pending read error sector is overwritten. A problem with this approach is that you are stressing the disk, while its twin brother already died. So what are the odds you kill this remaining disk?
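
The fill-and-delete step could be wrapped as follows. This is only a sketch: the function name `fill_free_space` and the `sync` call are additions that do not appear in the original command, and any extra arguments are handed to dd so that a small trial run (for example with `count=2`) is possible before filling the whole volume.

```shell
# Hypothetical helper (name not from the thread): fill the free space under a
# directory with zeros, flush to disk, then delete the filler file again.
# Any further arguments are passed to dd, e.g. "count=2" for a small trial.
fill_free_space() {
    dir=$1
    shift
    filler="$dir/bigfile"
    # When the volume is full, dd stops with "No space left on device" and a
    # non-zero status. That is the expected end of the fill, so it is ignored.
    dd if=/dev/zero of="$filler" bs=16M "$@" || true
    sync                # assumption: flush cached writes before deleting
    rm -f "$filler"
}
```

With the new disk removed, the call matching the command above would be `fill_free_space /i-data/md0/admin`.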

@mMontana

That is outside my cultural luggage, I'm afraid. Google was able to tell me you are speaking about some British television show I never heard of.

lintux · October 2021

Thanks for the help. I'll try to back up all data, replace the old HDD with a new model (which might be wise anyway, so as not to have a pair of disks so different in age), and build the volumes from scratch (an approach I thought I could avoid with RAID 1 in place).

BR lintux

plk · July 2022

[In]Genius. Thank you.
