NAS 542 raid 5 volume down
All Replies
-
Ok, thanks for your input - quite a bit to take in!

You wrote:
If you have pulled all disks except the 'ddrescued one', you can run 'mdadm --examine /dev/sd[abcd]3' on it, to see if it is indeed the problem one.

I removed all disks except the ddrescued one (Disk 3):

~ # mdadm --examine /dev/sd[abcd]3
/dev/sdc3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 555ccb7e:e9b29adc:2b39eea0:9329542f
Name : NAS542:2 (local to host NAS542)
Creation Time : Wed Oct 5 13:17:25 2022
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 5852270592 (2790.58 GiB 2996.36 GB)
Array Size : 8778405888 (8371.74 GiB 8989.09 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3e046eeb:3bac7e38:2c4e2408:d05d7a1d
Update Time : Thu Nov 10 16:51:14 2022
Checksum : 6084a10 - correct
Events : 1148
Layout : left-symmetric
Chunk Size : 64K
Device Role : Active device 3
Array State : .A.A ('A' == active, '.' == missing)
~ #

But it says "Device Role : Active device 3".
You wrote:
In that case its 'Device Role' is 'Active device 2'. In that case this shouldn't have happened.
So now I don't know if that disk is the problem or not?
-
If all the info you have given me is correct, the 'ddrescued one' is not the problem:

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2022.11.10 16:46:58 =~=~=~=~=~=~=~=~=~=~=~=
<snip>
~ # mdadm --examine /dev/sd[abcd]3
<snip>
/dev/sdc3:
<snip>
Array UUID : 555ccb7e:e9b29adc:2b39eea0:9329542f
<snip>
Device Role : Active device 2
Array State : AAAA ('A' == active, '.' == missing)

mdadm reads the RAID header from the disk here. Each header is unique, so there is only one disk in the world with that 'Array UUID' and that 'Device Role'. This disk is the one which was dropped; the 'Array State' tells that. It still shows a healthy array, because this header was not updated after the drop, and dmesg told the same.

The dump in your post is another disk. It's also called sdc, but unfortunately (in this case) that doesn't say much. On boot every disk gets a device name: the first one found gets sda, the second sdb, and so on. So the sequence of discovery generates the device name, and that can change. For that reason you have to look at the content (or the serial number) to identify a disk.

In this case I'm surprised the only disk in the box showed up as sdc, as I expected sda. Probably you have 2 USB sticks and/or SD cards? Or you hotpulled the disks?

For this reason it's important to look at the 'Device Role' of all disks before executing 'mdadm --create', as that command uses the volatile device names and generates new headers, overwriting the old roles.
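A quick way to do that check is to dump only the identifying fields of each header. This is a hypothetical helper, not from the thread; the 'sample' variable stands in for real 'mdadm --examine' output so the parsing can be shown without a disk attached:

```shell
# Sketch (hypothetical): print only the Array UUID and Device Role of a
# member partition, so a disk is identified by its header content instead
# of its volatile sdX name.
# 'sample' stands in for real output; on the NAS you would run e.g.:
#   for dev in /dev/sd[abcd]3; do mdadm --examine "$dev"; done
sample='/dev/sdc3:
     Array UUID : 555ccb7e:e9b29adc:2b39eea0:9329542f
    Device Role : Active device 3'

# Keep only the value part of the two identifying lines.
ids=$(echo "$sample" | awk -F' : ' '/Array UUID|Device Role/ {print $2}')
echo "$ids"
```

Run this per boot (or per disk) and you get a stable, content-based identity even when the kernel hands out different sdX letters.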
-
Hmm... I hotpulled the disks - maybe not the smartest way? Sorry!

After rebooting the NAS (with only the ddrescued Disk 3 inserted) I get:

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2022.11.11 11:46:11 =~=~=~=~=~=~=~=~=~=~=~=
login as: admin
admin@192.168.1.157's password:
BusyBox v1.19.4 (2022-08-11 15:13:21 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.
~ $ su
Password:
BusyBox v1.19.4 (2022-08-11 15:13:21 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.
~ # mdadm --examine /dev/sd[abcd]3
/dev/sda3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 555ccb7e:e9b29adc:2b39eea0:9329542f
Name : NAS542:2 (local to host NAS542)
Creation Time : Wed Oct 5 13:17:25 2022
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 5852270592 (2790.58 GiB 2996.36 GB)
Array Size : 8778405888 (8371.74 GiB 8989.09 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3e046eeb:3bac7e38:2c4e2408:d05d7a1d
Update Time : Thu Nov 10 16:51:14 2022
Checksum : 6084a10 - correct
Events : 1148
Layout : left-symmetric
Chunk Size : 64K
Device Role : Active device 3
Array State : .A.A ('A' == active, '.' == missing)
mdadm: cannot open /dev/sdb3: No such device or address
mdadm: cannot open /dev/sdc3: No such device or address
mdadm: cannot open /dev/sdd3: No such device or address
~ #

Rebooting again with only Disk 2 inserted:

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2022.11.11 11:54:50 =~=~=~=~=~=~=~=~=~=~=~=
login as: admin
admin@192.168.1.157's password:
BusyBox v1.19.4 (2022-08-11 15:13:21 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.
~ $ su
Password:
BusyBox v1.19.4 (2022-08-11 15:13:21 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.
~ # mdadm --examine /dev/sd[abcd]3
/dev/sda3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 555ccb7e:e9b29adc:2b39eea0:9329542f
Name : NAS542:2 (local to host NAS542)
Creation Time : Wed Oct 5 13:17:25 2022
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 5852270592 (2790.58 GiB 2996.36 GB)
Array Size : 8778405888 (8371.74 GiB 8989.09 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f7c083fd:fe37f383:55424937:52ec4bd2
Update Time : Thu Nov 10 16:51:14 2022
Checksum : 4788153a - correct
Events : 1148
Layout : left-symmetric
Chunk Size : 64K
Device Role : Active device 1
Array State : .AAA ('A' == active, '.' == missing)
mdadm: cannot open /dev/sdb3: No such device or address
mdadm: cannot open /dev/sdc3: No such device or address
mdadm: cannot open /dev/sdd3: No such device or address
~ #

Rebooting again with only Disk 4 inserted:

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2022.11.11 12:01:35 =~=~=~=~=~=~=~=~=~=~=~=
login as: admin
admin@192.168.1.157's password:
BusyBox v1.19.4 (2022-08-11 15:13:21 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.
~ $ su
Password:
BusyBox v1.19.4 (2022-08-11 15:13:21 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.
~ # mdadm --examine /dev/sd[abcd]3
/dev/sda3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 555ccb7e:e9b29adc:2b39eea0:9329542f
Name : NAS542:2 (local to host NAS542)
Creation Time : Wed Oct 5 13:17:25 2022
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 5852270592 (2790.58 GiB 2996.36 GB)
Array Size : 8778405888 (8371.74 GiB 8989.09 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 803b4d20:57570076:2d7c62d1:e4bf567a
Update Time : Thu Nov 10 09:48:48 2022
Checksum : 9e815985 - correct
Events : 1148
Layout : left-symmetric
Chunk Size : 64K
Device Role : Active device 2
Array State : AAAA ('A' == active, '.' == missing)
mdadm: cannot open /dev/sdb3: No such device or address
mdadm: cannot open /dev/sdc3: No such device or address
mdadm: cannot open /dev/sdd3: No such device or address
~ #

Disk 2 is device 1
Disk 3 is device 3
Disk 4 is device 2
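From that mapping, the device order for a later 'mdadm --create' can be derived mechanically. A hypothetical sketch (the Disk names are the labels used above; role 0 is absent, so its slot becomes the literal word 'missing'):

```shell
# Hypothetical sketch: sort the disks found above into RAID-role order.
# Role 0 (the original first disk) is gone, so it is written as the
# literal word 'missing', which is what mdadm --create expects.
roles='Disk2 1
Disk4 2
Disk3 3'

order=''
for r in 0 1 2 3; do
    d=$(printf '%s\n' "$roles" | awk -v r="$r" '$2 == r {print $1}')
    if [ -z "$d" ]; then d='missing'; fi
    order="$order $d"
done
echo "create order:$order"
```

This prints the disks in the order 'missing Disk2 Disk4 Disk3'; remember that the actual /dev/sdX3 names for those disks must still be looked up at run time.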
-
Hmm... I hotpulled the disks - maybe not the smartest way? Sorry!
No problem. Both the disks and the NAS support hotplugging and -pulling. It's just that the device node names get a bit unpredictable.
Disk 4 is device 2

So that is the one which was dropped. The big question now is: is that disk also dying, or was this a sector which had not been accessed for a long time (maybe ever) and happened to have an invalid checksum? The 'Current_Pending_Sector' value of SMART can tell whether the second option is possible:

smartctl -a /dev/sda

(If you didn't change the disks meanwhile.)

-
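If you only want that one attribute, the relevant line can be filtered out of the smartctl output. A hypothetical sketch ('sample' stands in for the real 'smartctl -a /dev/sda' output so it is self-contained):

```shell
# Sketch: extract the raw Current_Pending_Sector count. A non-zero value
# means the drive has sectors it could not read and is waiting to remap.
# On the NAS the same filter would be:
#   smartctl -a /dev/sda | awk '/Current_Pending_Sector/ {print $NF}'
sample='197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       24'

pending=$(echo "$sample" | awk '/Current_Pending_Sector/ {print $NF}')
echo "pending sectors: $pending"
```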
Ok, here's the result:

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2022.11.12 15:24:32 =~=~=~=~=~=~=~=~=~=~=~=
login as: admin
admin@192.168.1.157's password:
BusyBox v1.19.4 (2022-08-11 15:13:21 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.
~ $ su
Password:
BusyBox v1.19.4 (2022-08-11 15:13:21 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.
~ # smartctl -a /dev/sda
smartctl 6.3 2014-07-26 r3976 [armv7l-linux-3.2.54] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST3000DM001-1CH166
Serial Number: Z1F4GQ1T
LU WWN Device Id: 5 000c50 065c152e8
Firmware Version: CC27
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sat Nov 12 15:26:32 2022 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 584) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 339) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 110 099 006 Pre-fail Always - 188790814
3 Spin_Up_Time 0x0003 094 094 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 091 091 020 Old_age Always - 9763
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 073 060 030 Pre-fail Always - 23202616
9 Power_On_Hours 0x0032 030 030 000 Old_age Always - 61924
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 123
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 064 064 000 Old_age Always - 36
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 0 0
189 High_Fly_Writes 0x003a 099 099 000 Old_age Always - 1
190 Airflow_Temperature_Cel 0x0022 071 043 045 Old_age Always In_the_past 29 (Min/Max 26/47 #276)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 73
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 246965
194 Temperature_Celsius 0x0022 029 057 000 Old_age Always - 29 (0 10 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 24
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 24
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 7390h+46m+24.741s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 4025112595
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 21008283939
SMART Error Log Version: 1
ATA Error Count: 36 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 36 occurred at disk power-on lifetime: 61890 hours (2578 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 78 ff ff ff 4f 00 23:13:11.442 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 23:13:11.440 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 23:13:11.440 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 23:13:11.439 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 23:13:11.439 READ FPDMA QUEUED
Error 35 occurred at disk power-on lifetime: 61890 hours (2578 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 ff ff ff 4f 00 23:13:07.654 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:13:07.654 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:13:07.639 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:13:07.638 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:13:07.638 READ FPDMA QUEUED
Error 34 occurred at disk power-on lifetime: 61890 hours (2578 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 ff ff ff 4f 00 23:13:03.937 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:13:03.936 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:13:03.919 READ FPDMA QUEUED
60 00 70 ff ff ff 4f 00 23:13:03.918 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:13:03.918 READ FPDMA QUEUED
Error 33 occurred at disk power-on lifetime: 61890 hours (2578 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 23:13:00.189 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:13:00.189 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:13:00.189 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:13:00.188 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:13:00.188 READ FPDMA QUEUED
Error 32 occurred at disk power-on lifetime: 61890 hours (2578 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 38 ff ff ff 4f 00 23:12:56.453 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:12:56.441 READ FPDMA QUEUED
60 00 70 ff ff ff 4f 00 23:12:56.441 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 23:12:56.441 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 23:12:56.441 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
~ #
-
I wouldn't classify myself as a SMART expert, but this disk looks healthy to me. Not a single Reallocated_Sector_Ct, which is good. 24 Current_Pending_Sector, which could have caused the drop. The Raw_Read_Error_Rate and Seek_Error_Rate seem a bit high, but to be honest I don't know what would be sane numbers for a 7-year-old disk with 21008283939 Total_LBAs_Read.

You can re-create the array as you did before. As 'Active device 0' isn't there, the command is

mdadm --create <arguments> /dev/md2 missing Disk2_3 Disk4_3 Disk3_3

where you have to find the actual device names by reading the output of 'mdadm --examine /dev/sd[abcd]3'. The <arguments> are in the beginning of this thread.

After that, first run e2fsck on /dev/md2 before rebooting or mounting the array.

When the array is up again, you can't add the 4th disk directly: the same Current_Pending_Sector will stop the rebuild. You have the option to create a backup first, and then fill all unused sectors with zeros, as described in the thread I pointed to, to reset the Current_Pending_Sector count. It took 2 hours for 500GB, so it will take around 20 hours for your 4.5TB of free space. You can run it in screen, of course.

After that, the Current_Pending_Sector should be zero (check the other original disk too). If not, there are one or more sectors in the data which are not readable, and I think that will stop the rebuild again.
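As a dry-run sketch only, not a command to paste blindly: the fixed parameters below are read from the 'mdadm --examine' dumps earlier in the thread (RAID 5, 4 devices, 64K chunk, left-symmetric, metadata 1.2), but the exact argument list given at the start of this thread takes precedence, and /dev/sdX3, /dev/sdY3, /dev/sdZ3 are placeholders for the node names you must re-check immediately beforehand:

```shell
# Dry-run sketch: build and print the create command instead of running it.
# The node names below are placeholders (they can change on every boot);
# the level/chunk/layout/metadata values come from the --examine output
# shown earlier in this thread.
DISK2=/dev/sdX3; DISK4=/dev/sdY3; DISK3=/dev/sdZ3   # placeholders!

cmd="mdadm --create /dev/md2 --level=5 --raid-devices=4 \
--metadata=1.2 --chunk=64K --layout=left-symmetric \
missing $DISK2 $DISK4 $DISK3"
echo "$cmd"    # inspect first; only run once the names are verified
```

Once the printed command looks right (correct roles in the slots, 'missing' holding slot 0), run it for real, and then e2fsck /dev/md2 before mounting.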
-
Hmm, a bit weird perhaps, but when I inserted the other two disks and rebooted the NAS, the volume was again identified as 'degraded', and the box started beeping. When I opened the Storage Manager, the web interface showed a degraded volume, just as it did last time, before I tried to repair the RAID with my latest disk.

So this time I've started a backup job of my degraded volume. It's currently being backed up to an external 5 TB USB drive connected to the front USB port.

Let's see if that goes well. If all necessary data is successfully backed up, my plan is to create a new volume from scratch instead of trying to repair the degraded one, and then restore the data back to the NAS from my backup drive.

Would that work, or is there something I'm missing here?
-
The backup job was completed last night at 2022-11-15 04:40. When browsing my external USB disk I find 589 .dar files and a single .lst file.