Zyxel NSA325 v2 shows status degraded after HDD replacement
lintux
Posts: 4
Hi,
As one of my two 2TB HDDs (working in RAID1 mode) died a week ago, I replaced it with another 2TB HDD.
Initially after the replacement all went well: the system started the repair and after a while the volume status switched to "Healthy".
Now it shows "Degraded" again, even though both disks now show up as "Healthy" in the S.M.A.R.T menu.
After rebooting the NSA, the volume status shows "Recovering" for a short while with an increasing percentage (the last was 71.3%), but it switches to "Degraded" after a few minutes.
Selecting "Repair Volume" in the Volume menu leads to an error message in the bottom status line: "Disk capacity must be equal to or greater than the smallest disk in the RAID."
The 2 Disks currently in use are:
Seagate ST2000VN000 (old)
Seagate ST2000VN004 (new)
Both showing 1.82 Capacity and a Healthy status.
Thanks in advance for any help or useful hints.
BR
lintux
All Replies
-
Can you log in over ssh or over the Telnet backdoor (as root, using the admin password) and post the output of
cat /proc/mdstat
cat /proc/partitions
mdadm --examine /dev/sd[ab]2
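If it helps, the three commands can be collected into one file that is easy to paste into a reply. This is only a minimal sketch: the helper name collect_raid_info and the output path are made up for illustration, and it assumes a POSIX shell on the NAS.

```shell
#!/bin/sh
# collect_raid_info OUTFILE  (hypothetical helper, not part of the firmware)
# Writes each diagnostic command and its output to OUTFILE.
collect_raid_info() {
    out=$1
    : > "$out"    # start with an empty file
    for cmd in "cat /proc/mdstat" "cat /proc/partitions" \
               "mdadm --examine /dev/sda2 /dev/sdb2"; do
        echo "== $cmd ==" >> "$out"
        # Word splitting of $cmd is intentional; error messages are kept too.
        $cmd >> "$out" 2>&1 || echo "(command failed)" >> "$out"
    done
}

# Possible use on the NAS (the path is an assumption):
#   collect_raid_info /tmp/raid-info.txt
```

A command that is missing or fails is still listed under its header, so the file shows what was attempted.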
-
Sure, thanks for the reply ..
cat /proc/mdstat:
Personalities : [linear] [raid0] [raid1]
md0 : active raid1 sda2[0] sdb2[2]
1952996792 blocks super 1.2 [2/1] [U_]
unused devices: <none>
cat /proc/partitions:
major minor #blocks name
7 0 143360 loop0
8 0 1953514584 sda
8 1 514048 sda1
8 2 1952997952 sda2
8 16 1953514584 sdb
8 17 514048 sdb1
8 18 1952997952 sdb2
31 0 1024 mtdblock0
31 1 512 mtdblock1
31 2 512 mtdblock2
31 3 512 mtdblock3
31 4 10240 mtdblock4
31 5 10240 mtdblock5
31 6 48896 mtdblock6
31 7 10240 mtdblock7
31 8 48896 mtdblock8
9 0 1952996792 md0
mdadm --examine /dev/sd[ab]2:
/dev/sda2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : dfcd2b59:6c86c85e:7ca3a57f:6afb15af
Name : NSA325-v2:0
Creation Time : Thu Feb 19 19:54:55 2015
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 1952996928 (1862.52 GiB 1999.87 GB)
Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : cf9b8394:f0f82f74:969d3c47:6caf059e
Update Time : Fri Oct 8 15:57:58 2021
Checksum : af584a48 - correct
Events : 221724
Device Role : Active device 0
Array State : AA ('A' == active, '.' == missing)
/dev/sdb2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x2
Array UUID : dfcd2b59:6c86c85e:7ca3a57f:6afb15af
Name : NSA325-v2:0
Creation Time : Thu Feb 19 19:54:55 2015
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 1952996928 (1862.52 GiB 1999.87 GB)
Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Recovery Offset : 2830471168 sectors
State : clean
Device UUID : 85b632ed:e8a6d661:cb6e0426:0a1ddc0e
Update Time : Fri Oct 8 15:57:58 2021
Checksum : ee01e1cd - correct
Events : 221724
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing)
To me it looks not really bad?!?
BR
lintux
-
lintux said:
To me it looks not really bad?!?
Well, the bad thing is that we shouldn't be seeing this. Both raid members agree that they are in an array, healthy, and both were last updated at the same time, today around 4 PM. That will be UTC, so that is around the time you posted. Yet the raid manager in the kernel says the array is degraded.
I don't think you need to worry about the weird size message from the firmware. The firmware has the same info, and doesn't know what is happening either.
Maybe the kernel log (dmesg) has some interesting info, or the array itself (mdadm --detail /dev/md0).
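One number in the mdadm --examine output above may be worth a second look: sdb2 reports a Recovery Offset of 2830471168 sectors. As a rough sketch, assuming that offset is counted in 512-byte sectors (as its label says) and that the array size from /proc/mdstat is in 1 KiB blocks, the position can be converted to a percentage of the array. The helper name recovery_percent is made up for illustration.

```shell
#!/bin/sh
# recovery_percent OFFSET_SECTORS ARRAY_KIB  (hypothetical helper)
# Prints how far into the array the recovery offset lies, in percent.
# Assumes 512-byte sectors and 1 KiB blocks, i.e. two sectors per block.
recovery_percent() {
    awk -v off="$1" -v size="$2" \
        'BEGIN { printf "%.1f\n", (off / 2) / size * 100 }'
}

# Values copied from the outputs posted above.
recovery_percent 2830471168 1952996792   # prints 72.5
```

Under those assumptions the recorded offset sits at about 72.5% of the array, not far from the 71.3% the web interface last showed before it fell back to "Degraded". That would fit a rebuild that stops at roughly the same spot each time, but it does not prove it.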
-
Clarkson Mode ON.
Some say... the firmware cannot understand mdadm?
-
Clarkson?
All ZyXEL NASses understand mdadm, even the single disk ones.
-
dmesg shows an over and over repeating message:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sda2
disk 1, wo:1, o:1, dev:sdb2
mdadm --detail /dev/md0:
/dev/md0:
Version : 1.2
Creation Time : Thu Feb 19 19:54:55 2015
Raid Level : raid1
Array Size : 1952996792 (1862.52 GiB 1999.87 GB)
Used Dev Size : 1952996792 (1862.52 GiB 1999.87 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Sat Oct 9 10:05:12 2021
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Name : NSA325-v2:0
UUID : dfcd2b59:6c86c85e:7ca3a57f:6afb15af
Events : 222350
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
2 8 18 1 spare rebuilding /dev/sdb2
I won't attempt an interpretation this time ..
-
Accepted Solution
lintux said:
dmesg shows an over and over repeating message:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sda2
disk 1, wo:1, o:1, dev:sdb2
Your system hit a bug, somewhere. This should be in the log only once per state change of the array, AFAIK.
I *think* you have a 'pending read error' on your source disk. This is a (healthy) sector where the checksum doesn't match (some bit toggled), and so it cannot be read. It is not considered a hardware defect, as you can write the sector normally, after which it can be read again. So S.M.A.R.T. doesn't flag the disk as bad (although the pending read errors can be found in the details), but the rebuilding of the array stops there, and as the sector is never written, it doesn't resolve itself over time.
Don't know what would be a good strategy now. An option would be to back up everything and create a new volume, if you have enough external storage for that.
A complicating factor is that ZyXEL in their wisdom decided to switch off the package server for the EOL NASses, so you can't reinstall any package.
Another option is to remove the new disk and fill up the volume to the rim, in the hope that the pending read error is in a sector which is not in use by the filesystem.
This can be done with
dd if=/dev/zero of=/i-data/md0/admin/bigfile bs=16M
After that, remove the 'bigfile' and insert the new disk again, to let it rebuild again. Possibly the pending read error sector is overwritten. A problem with this approach is that you are stressing the disk, while its twin brother already died. So what are the odds you kill this remaining disk?
That is outside my cultural luggage, I'm afraid. Google was able to tell me you are speaking about some British television show I never heard of.
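The pending-sector check and the fill-up workaround from the reply above could be scripted roughly like this. It is a sketch under assumptions, not a tested procedure for this firmware: it assumes smartctl (smartmontools) is available on the NAS, which may not be the case, and the helper names pending_sectors and fill_free_space are made up for illustration. The optional COUNT argument is only there for a small dry run.

```shell
#!/bin/sh
# pending_sectors  (hypothetical helper)
# Reads `smartctl -A` output on stdin and prints the raw value of
# attribute 197 (Current_Pending_Sector), if the disk reports it.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# fill_free_space FILE [BLOCK_SIZE] [COUNT]  (hypothetical helper)
# Writes zeros to FILE until the filesystem is full (or COUNT blocks have
# been written), flushes them to disk, then deletes FILE again.
fill_free_space() {
    file=$1
    bs=${2:-16M}
    count=$3
    # dd ends with "No space left on device" once the volume is full.
    # That is the expected outcome here, so its exit status is ignored.
    if [ -n "$count" ]; then
        dd if=/dev/zero of="$file" bs="$bs" count="$count" 2>/dev/null || true
    else
        dd if=/dev/zero of="$file" bs="$bs" 2>/dev/null || true
    fi
    sync
    rm -f "$file"
}

# Possible use on the NAS, with the new disk removed:
#   smartctl -A /dev/sda | pending_sectors
#   fill_free_space /i-data/md0/admin/bigfile
#   smartctl -A /dev/sda | pending_sectors
```

If the count printed by the second check is lower than the first, the overwrite may have cleared the unreadable sector. If it is unchanged, the sector is probably in use by a file, and the backup-and-recreate route remains.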
Thanks for the help. I'll try to back up all data, replace the old HDD with a new model (which might be wise anyway, so as not to have a pair of disks so different in age) and build the volumes from scratch (an approach I thought I could avoid with a RAID 1 in place).
BR
lintux
-
[In]Genius. Thank you.