NAS540 Shows Healthy but RAID degraded.

jahmon · December 2022

There's a lot of info here. You are correct, my config is sda and sdd. I may be missing something, but this looks like it passes the self-test. Your help is appreciated!

Here are the logs from one of the disks, as both together would be too long.

~ # smartctl -a /dev/sdd

smartctl 6.3 2014-07-26 r3976 [armv7l-linux-3.2.54] (local build)

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Ultrastar A7K2000
Device Model:     Hitachi HUA722020ALA331
Serial Number:    YBK0JV2F
LU WWN Device Id: 5 000cca 221ea85bb
Firmware Version: JKAOA3NH
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Dec 8 15:36:06 2022 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (22624) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 377) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   083   083   016    Pre-fail  Always       -       131506
  2 Throughput_Performance  0x0005   130   130   054    Pre-fail  Offline      -       112
  3 Spin_Up_Time            0x0007   116   116   024    Pre-fail  Always       -       620 (Average 620)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       2108
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1058
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   123   123   020    Pre-fail  Offline      -       34
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       6202
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       39
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       2110
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       2110
194 Temperature_Celsius     0x0002   120   120   000    Old_age   Always       -       50 (Min/Max 21/61)
196 Reallocated_Event_Count 0x0032   048   048   000    Old_age   Always       -       1127
197 Current_Pending_Sector  0x0022   044   044   000    Old_age   Always       -       1201
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 19 (device log contains only the most recent five errors)
  CR = Command Register [HEX]
  FR = Features Register [HEX]
  SC = Sector Count Register [HEX]
  SN = Sector Number Register [HEX]
  CL = Cylinder Low Register [HEX]
  CH = Cylinder High Register [HEX]
  DH = Device/Head Register [HEX]
  DC = Device Command Register [HEX]
  ER = Error register [HEX]
  ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 19 occurred at disk power-on lifetime: 5921 hours (246 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 2c 8c bc 68 03  Error: UNC 44 sectors at LBA = 0x0368bc8c = 57195660

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 38 80 bc 68 e0 08  14d+16:29:55.130  READ DMA EXT
  27 00 00 00 00 00 e0 08  14d+16:29:55.084  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08  14d+16:29:55.076  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  14d+16:29:55.075  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 08  14d+16:29:55.074  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 18 occurred at disk power-on lifetime: 5921 hours (246 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 38 80 bc 68 03  Error: UNC 56 sectors at LBA = 0x0368bc80 = 57195648

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 38 80 bc 68 e0 08  14d+16:29:38.602  READ DMA EXT
  27 00 00 00 00 00 e0 08  14d+16:29:38.555  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08  14d+16:29:38.547  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  14d+16:29:38.546  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 08  14d+16:29:38.545  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 17 occurred at disk power-on lifetime: 5921 hours (246 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 38 80 bc 68 03  Error: UNC 56 sectors at LBA = 0x0368bc80 = 57195648

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 38 80 bc 68 e0 08  14d+16:29:22.061  READ DMA EXT
  27 00 00 00 00 00 e0 08  14d+16:29:21.261  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08  14d+16:29:21.253  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  14d+16:29:21.252  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 08  14d+16:29:21.251  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 16 occurred at disk power-on lifetime: 5921 hours (246 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 3f c1 bc 68 03  Error: UNC 63 sectors at LBA = 0x0368bcc1 = 57195713

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 48 b8 bc 68 e0 08  14d+16:28:37.109  READ DMA EXT
  27 00 00 00 00 00 e0 08  14d+16:28:37.063  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08  14d+16:28:37.055  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  14d+16:28:37.054  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 08  14d+16:28:37.053  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 15 occurred at disk power-on lifetime: 5921 hours (246 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 45 bb bc 68 03  Error: UNC 69 sectors at LBA = 0x0368bcbb = 57195707

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 48 b8 bc 68 e0 08  14d+16:28:12.391  READ DMA EXT
  27 00 00 00 00 00 e0 08  14d+16:28:12.344  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08  14d+16:28:12.336  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  14d+16:28:12.335  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 08  14d+16:28:12.334  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     58737         -
# 2  Short offline       Completed without error       00%     57416         -
# 3  Short offline       Completed without error       00%     57414         -
# 4  Short offline       Completed without error       00%     53020         -
# 5  Short offline       Completed without error       00%     53017         -
# 6  Short offline       Completed without error       00%     53013         -
# 7  Short offline       Completed without error       00%     53011         -
# 8  Short offline       Completed without error       00%     53009         -
# 9  Short offline       Completed without error       00%     45977         -
#10  Short offline       Completed without error       00%     40631         -
#11  Short offline       Completed without error       00%     38232         -
#12  Short offline       Completed without error       00%     38228         -
#13  Short offline       Completed without error       00%     38226         -
#14  Short offline       Completed without error       00%     37252         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Mijzelf · December 2022

Well, that is clear:

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: FAILED!

Drive failure expected in less than 24 hours. SAVE ALL DATA.

<snip>

5 Reallocated_Sector_Ct 0x0033 001 001 005 Pre-fail Always FAILING_NOW 1058

So the problem is the Reallocated_Sector_Ct: its normalized value (001) has dropped below the threshold (005). I have no idea why that looks different in the GUI. The disk's error log holds its five most recent errors (of 19 in total), all at 5921 power-on hours (almost 300 hours ago), at four distinct sectors: 57195660, 57195648, 57195713 and 57195707. So that "24 hours" is a bit exaggerated. The disk reports 512-byte sectors, so those sectors are around 29GB (27GiB) from the start of the disk, which is still inside the data partition.
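The arithmetic behind those figures can be sketched in a few lines of Python, as an illustration only. ATA error-log LBAs count logical sectors, and this drive reports "Sector Size: 512 bytes logical/physical", so the sketch multiplies by 512; with 4 KiB sectors the same LBAs would land near 218 GiB instead. The helper name `lba_to_offset` is hypothetical.

```python
# Hypothetical helper: turn LBAs from the smartctl error log into byte offsets.
# Assumption: LBAs count logical sectors, and this drive reports
# "Sector Size: 512 bytes logical/physical".
LOGICAL_SECTOR = 512
GIB = 2**30

# Distinct LBAs from errors 15-19 (errors 17 and 18 hit the same LBA)
ERROR_LBAS = [0x0368BC8C, 0x0368BC80, 0x0368BCC1, 0x0368BCBB]


def lba_to_offset(lba: int, sector_size: int = LOGICAL_SECTOR) -> int:
    """Byte offset of logical block `lba` from the start of the disk."""
    return lba * sector_size


if __name__ == "__main__":
    for lba in sorted(ERROR_LBAS):
        off = lba_to_offset(lba)
        print(f"LBA {lba}: {off / GIB:.2f} GiB "
              f"(with 4 KiB sectors: {lba_to_offset(lba, 4096) / GIB:.2f} GiB)")
    # Current Power_On_Hours (6202) minus the power-on hour of the errors (5921)
    print("hours since the errors:", 6202 - 5921)
```

The hour difference it prints (281) is where "almost 300 hours ago" comes from.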

jahmon · December 2022

So I looked through it more carefully, right in the beginning of the "SMART Data Section" there is a line "SMART overall-health self-assessment test result:" Drive A shows "Passed", Drive D shows "Failed". As D is the RAID 5 parity drive, this explains why it's degraded, but I can still access the data and the rebuild fails while processing. Have I got it?
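For reference, the two checks discussed here can be sketched in Python: reading the overall-health verdict, and flagging attributes whose normalized VALUE has reached THRESH, which, as far as I know, is the rule behind smartctl's FAILING_NOW marker. The overall verdict itself is reported by the drive, not computed from the table. The parser below is hypothetical and only handles the column layout shown in the log above.

```python
import re
from typing import List, Optional

# Hypothetical parser for `smartctl -a` text; assumes the attribute-table
# layout shown in the log (ID, name, flag, VALUE, WORST, THRESH, ...).
ATTR_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]{4}\s+"
    r"(?P<value>\d{3})\s+(?P<worst>\d{3})\s+(?P<thresh>\d{3})\s+"
    r"\S+\s+\S+\s+\S+\s+(?P<raw>.*)$"
)
HEALTH_RE = re.compile(r"overall-health self-assessment test result:\s*(\w+)")


def overall_health(smart_text: str) -> Optional[str]:
    """Return the drive's own verdict, e.g. 'PASSED' or 'FAILED'."""
    m = HEALTH_RE.search(smart_text)
    return m.group(1) if m else None


def failing_attributes(smart_text: str) -> List[str]:
    """Names of attributes whose normalized VALUE is at or below THRESH."""
    failing = []
    for line in smart_text.splitlines():
        m = ATTR_RE.match(line)
        if m is None:
            continue
        value, thresh = int(m["value"]), int(m["thresh"])
        # A threshold of 000 can never be reached, so such attributes are skipped.
        if thresh > 0 and value <= thresh:
            failing.append(m["name"])
    return failing


SAMPLE = """\
SMART overall-health self-assessment test result: FAILED!
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   083   083   016    Pre-fail  Always       -       131506
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1058
197 Current_Pending_Sector  0x0022   044   044   000    Old_age   Always       -       1201
"""

if __name__ == "__main__":
    print(overall_health(SAMPLE))      # FAILED
    print(failing_attributes(SAMPLE))  # ['Reallocated_Sector_Ct']
```

Note that Current_Pending_Sector (raw 1201) is not flagged by this rule, because its threshold is 000, even though a large raw count is itself a warning sign.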

Mijzelf · December 2022

jahmon said:

So I looked through it more carefully, right in the beginning of the "SMART Data Section" there is a line "SMART overall-health self-assessment test result:" Drive A shows "Passed", Drive D shows "Failed". As D is the RAID 5 parity drive, this explains why it's degraded, but I can still access the data and the rebuild fails while processing. Have I got it?

More or less. There is no parity drive in RAID5; the parity blocks are distributed evenly over all disks. This is done to maximize read speed (on a healthy array the parity blocks are not read, so dedicating a whole disk and its bandwidth to parity would waste them for reading) and to minimize the penalty when an arbitrary disk fails.
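To illustrate distributed parity, here is a small Python sketch of a 4-disk RAID5 stripe map, where "P" marks the parity block and "D" a data block. The rotation rule in the hypothetical `parity_disk` function is an assumption for illustration; as far as I know it matches where Linux md's default left-symmetric layout puts parity, but the NAS firmware's actual layout is not shown in this thread.

```python
from typing import List

# Illustrative RAID5 layout: parity rotates across member disks, so no single
# disk is "the parity drive". The rotation rule is an assumption.


def parity_disk(stripe: int, n_disks: int) -> int:
    """Index of the disk holding parity for a given stripe."""
    return (n_disks - 1) - (stripe % n_disks)


def stripe_map(n_stripes: int, n_disks: int) -> List[List[str]]:
    """One row per stripe: 'P' marks the parity block, 'D' a data block."""
    return [
        ["P" if disk == parity_disk(s, n_disks) else "D" for disk in range(n_disks)]
        for s in range(n_stripes)
    ]


if __name__ == "__main__":
    for s, row in enumerate(stripe_map(4, 4)):
        print(f"stripe {s}: " + " ".join(row))
```

Over any four consecutive stripes, each disk holds exactly one parity block, so losing any one disk costs the same.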

The raid manager is pretty dumb. When rebuilding the array, it simply calculates the content of the 'new' disk from the total surface of the 3 others (the raid manager doesn't know about filesystems, so it doesn't know whether a particular sector is used or not), and writes that to the new disk. When a write error occurs, the new disk is dropped and the rebuild fails. Worse, if a read error occurs, the relevant disk is dropped, bringing the array down.
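A simplified model of that rebuild, in Python: each block of the replaced disk is the XOR of the corresponding blocks on the surviving disks, every stripe is processed whether or not it holds file data, and a single unreadable block on a survivor aborts the whole rebuild. The names (`rebuild_disk`, `ReadError`) are hypothetical, and real md error handling is more involved than this.

```python
from functools import reduce
from typing import List, Optional


class ReadError(Exception):
    """Stand-in for an unreadable sector on a surviving disk."""


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(survivors: List[Optional[bytes]]) -> bytes:
    """Recompute the missing disk's block for one stripe.

    `None` marks a block that could not be read from a surviving disk.
    """
    if any(b is None for b in survivors):
        raise ReadError("a surviving disk returned a read error; rebuild aborted")
    return xor_blocks(survivors)


def rebuild_disk(stripes: List[List[Optional[bytes]]]) -> List[bytes]:
    """Rebuild every stripe in order, regardless of whether it holds file data."""
    return [rebuild_block(s) for s in stripes]


if __name__ == "__main__":
    d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
    parity = xor_blocks([d0, d1, d2])
    # The disk holding d1 was replaced: recompute it from the three survivors.
    print(rebuild_disk([[d0, d2, parity]]) == [d1])
    try:
        rebuild_disk([[d0, d2, parity], [d0, None, parity]])
    except ReadError as exc:
        print("rebuild failed:", exc)
```

Because XOR is its own inverse, it does not matter which position in the stripe held parity; the missing block is always the XOR of the others.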

jahmon · December 2022

Thank you for the top notch support and patience!
