NAS540 degraded disk, tried repair, now volume down

Kuno Posts: 24  Freshman Member
edited December 2021 in Personal Cloud Storage
One disk in my RAID 5 array went offline with a lot of "command timeouts". After a reboot of the NAS it came back online, and the NAS suggested that I repair the volume. I did that, and now the volume is down! There have not been any more "command timeouts" on the problem disk.
I'm hoping someone has an idea how, if at all possible, I can get the array up and running again.
Here is the output from:
mdadm --examine /dev/sd[abcde]3

mdadm: cannot open /dev/sda3: No such device or address
/dev/sdb3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : f0577b65:283a8be2:235c4865:251ac2c8
           Name : NAS540:2  (local to host NAS540)
  Creation Time : Wed Apr  7 08:01:04 2021
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
     Array Size : 11708660736 (11166.25 GiB 11989.67 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 14739b23:eaeee6f1:5d90fda3:d4c43395

    Update Time : Wed Dec  8 00:13:41 2021
       Checksum : 39e6164e - correct
         Events : 524

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AA.. ('A' == active, '.' == missing)
/dev/sdc3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : f0577b65:283a8be2:235c4865:251ac2c8
           Name : NAS540:2  (local to host NAS540)
  Creation Time : Wed Apr  7 08:01:04 2021
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
     Array Size : 11708660736 (11166.25 GiB 11989.67 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 8700f0d6:21358560:6ab88d29:7556bb54

    Update Time : Wed Dec  8 00:13:41 2021
       Checksum : a0f0a3a6 - correct
         Events : 524

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : AA.. ('A' == active, '.' == missing)
/dev/sdd3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : f0577b65:283a8be2:235c4865:251ac2c8
           Name : NAS540:2  (local to host NAS540)
  Creation Time : Wed Apr  7 08:01:04 2021
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
     Array Size : 11708660736 (11166.25 GiB 11989.67 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : c0749a62:d560b5a8:0fcb26d4:46c23d65

    Update Time : Wed Dec  8 00:13:41 2021
       Checksum : 2fe6c20d - correct
         Events : 524

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : spare
   Array State : AA.. ('A' == active, '.' == missing)
/dev/sde3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : f0577b65:283a8be2:235c4865:251ac2c8
           Name : NAS540:2  (local to host NAS540)
  Creation Time : Wed Apr  7 08:01:04 2021
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
     Array Size : 11708660160 (11166.25 GiB 11989.67 GB)
  Used Dev Size : 7805773440 (3722.08 GiB 3996.56 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : c48a0e1b:48ca0d7c:72cd77a4:03fe7e41

    Update Time : Tue Dec  7 20:46:47 2021
       Checksum : 68494d74 - correct
         Events : 474

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing)
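
When comparing the four superblocks above, the fields that differ are the most telling: sdb3, sdc3 and sdd3 are at 524 events, sde3 stopped at 474, and sdd3 now has the role "spare" instead of an active slot. A small sketch that condenses `mdadm --examine` output to one line per member makes that easier to see. The helper name `summarize_examine` is made up for illustration, and the sample input is cut down from the output above.

```shell
#!/bin/sh
# summarize_examine: hypothetical helper, not part of mdadm.
# Reads `mdadm --examine` output on stdin and prints, per member,
# the device name, its event counter and its device role.
summarize_examine() {
    awk '
        /^\/dev\//      { dev = $1; sub(/:$/, "", dev) }      # "/dev/sdb3:" header line
        /Events :/      { events[dev] = $NF }                 # event counter
        /Device Role :/ { role = $0
                          sub(/.*Device Role : */, "", role)  # keep text after the colon
                          print dev, "events=" events[dev], "role=" role }
    '
}

# Demo on lines copied from the output above.
summarize_examine <<'EOF'
/dev/sdd3:
         Events : 524
   Device Role : spare
/dev/sde3:
         Events : 474
   Device Role : Active device 3
EOF
# prints:
# /dev/sdd3 events=524 role=spare
# /dev/sde3 events=474 role=Active device 3
```

On the NAS itself this would presumably be used as `mdadm --examine /dev/sd[abcde]3 2>/dev/null | summarize_examine`.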

Best Answers

  • Kuno
    Kuno Posts: 24  Freshman Member
    Answer ✓
    I'm sorry to be a pain in the butt, Mijzelf, but I can't find a share menu. There is only the share section in the control panel, where the shares are marked "lost", and I can't do anything with them there. Is there another place where I can access a shares menu?
  • Mijzelf
    Mijzelf Posts: 2,002  Guru Member
    Answer ✓
    In 'Control Panel'->'Privilege and Sharing'->'Shared Folders' all 'flat directories' in the volumes are listed as 'disabled'. You can enable them via the 'Edit share' icon.
    But now I see that the given volume name is not the hex code, but some user-supplied name. As that name is stored on the filesystem, it is possible that the 'lost' shares hide the 'disabled' shares, as they are on different volumes which nevertheless have the same name. Confusing.

    To solve this you can change the user-supplied volume name:

    cd /i-data/sysvol/.system/
    echo "Your own volume name" >name_label

    reboot

All Replies

  • Mijzelf
    Mijzelf Posts: 2,002  Guru Member
    Can you also post the SMART values?
    smartctl -a /dev/sda
    smartctl -a /dev/sdb
    ...
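
For anyone who finds the full `smartctl -a` dumps hard to scan, here is a rough sketch of a filter that keeps only a handful of attributes that are often checked first (5, 187, 188, 197, 198, 199) and prints them when their raw value is non-zero. The helper name `smart_flags` and the choice of IDs are assumptions made for illustration, not something smartctl provides. The demo input is copied from the /dev/sde output further down in this thread.

```shell
#!/bin/sh
# smart_flags: hypothetical helper, not part of smartmontools.
# Reads `smartctl -a` (or -A) output on stdin and prints ID, name and
# RAW_VALUE (10th column) for selected attributes with a non-zero raw value.
smart_flags() {
    awk '$1 ~ /^(5|187|188|197|198|199)$/ && $10 + 0 != 0 { print $1, $2, $10 }'
}

# Demo on attribute lines taken from the /dev/sde output in this thread.
smart_flags <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   028   028   000    Old_age   Always       -       72
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
EOF
# prints:
# 187 Reported_Uncorrect 72
# 197 Current_Pending_Sector 8
# 198 Offline_Uncorrectable 8
```

On the box it could be wrapped in a loop such as `for d in /dev/sd[a-e]; do echo "== $d"; smartctl -a "$d" | smart_flags; done`. The full output is still worth posting, since the filter hides everything else.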

    Two disks have been dropped from the array, so it is down now. It might be possible to rebuild it degraded from 3 disks, depending on how badly the disk is damaged. "Command timeouts" can point to a serious problem, so it's better not to include that disk in the array.
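
To make "degraded from 3 disks" a bit more concrete: one commonly used approach is to re-create the array over the old members with `--assume-clean`, using exactly the parameters shown by `mdadm --examine` and the word `missing` for the slot that is left out. The sketch below only prints such a command and runs nothing. It is not necessarily the procedure that fits this case. The level, device count, metadata version, chunk size and layout are taken from the output in the first post. `/dev/md2` is an assumption based on the array name `NAS540:2`, and putting `missing` in slot 2 is an inference from sdb3, sdc3 and sde3 holding roles 0, 1 and 3 while sdd3 is listed as spare.

```shell
#!/bin/sh
# build_create_cmd: hypothetical helper that only PRINTS an mdadm command.
# Arguments: the four member slots in role order (0..3); pass the literal
# word "missing" for a slot that should be left out of the array.
build_create_cmd() {
    opts="--create --assume-clean --level=5 --raid-devices=4"
    opts="$opts --metadata=1.2 --chunk=64 --layout=left-symmetric"
    # /dev/md2 is an assumed device name, see the text above.
    printf 'mdadm %s /dev/md2 %s %s %s %s\n' "$opts" "$1" "$2" "$3" "$4"
}

# Dry run: roles 0, 1 and 3 as reported by --examine, slot 2 left out.
build_create_cmd /dev/sdb3 /dev/sdc3 missing /dev/sde3
```

Two caveats apply before anything like this is run for real. A re-created array also has to end up with the same data offset (262144 sectors here), which depends on the mdadm version and may need `--data-offset`. And sde3 is 50 events behind the other members, so whatever was written after it dropped out will not be consistent on it.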

  • Kuno
    Kuno Posts: 24  Freshman Member
    Hi Mijzelf.
    It looks like /dev/sda is a USB disk that I have attached!


    smartctl -a /dev/sda:
    Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
    Device Model:     WDC WD30EZRX-00MMMB0
    Serial Number:    WD-WCAWZ1741961
    LU WWN Device Id: 5 0014ee 2b11bd91b
    Firmware Version: 80.00A80
    User Capacity:    3,000,592,982,016 bytes [3.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA8-ACS (minor revision not indicated)
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)
    Local Time is:    Thu Dec  9 09:43:53 2021 GMT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART Status command failed: scsi error medium or hardware error (serious)
    SMART overall-health self-assessment test result: PASSED
    Warning: This result is based on an Attribute check.

    General SMART Values:
    Offline data collection status:  (0x82) Offline data collection activity
                                            was completed without error.
                                            Auto Offline Data Collection: Enabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                (49200) seconds.
    Offline data collection
    capabilities:                    (0x7b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   2) minutes.
    Extended self-test routine
    recommended polling time:        ( 473) minutes.
    Conveyance self-test routine
    recommended polling time:        (   5) minutes.
    SCT capabilities:              (0x3035) SCT Status supported.
                                            SCT Feature Control supported.
                                            SCT Data Table supported.

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
      3 Spin_Up_Time            0x0027   150   139   021    Pre-fail  Always       -       9475
      4 Start_Stop_Count        0x0032   093   093   000    Old_age   Always       -       7663
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       21356
     10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
     11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   096   096   000    Old_age   Always       -       4503
    192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       428
    193 Load_Cycle_Count        0x0032   184   184   000    Old_age   Always       -       48029
    194 Temperature_Celsius     0x0022   123   093   000    Old_age   Always       -       29
    196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
    200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]

    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
     
  • Kuno
    Kuno Posts: 24  Freshman Member
    smartctl -a /dev/sdb:

    Model Family:     Seagate NAS HDD
    Device Model:     ST4000VN000-1H4168
    Serial Number:    Z304CYC1
    LU WWN Device Id: 5 000c50 07ba1cae3
    Firmware Version: SC46
    User Capacity:    4,000,787,030,016 bytes [4.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5900 rpm
    Form Factor:      3.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
    SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
    Local Time is:    Thu Dec  9 09:45:36 2021 GMT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status:  (0x00) Offline data collection activity
                                            was never started.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                (  107) seconds.
    Offline data collection
    capabilities:                    (0x73) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            No Offline surface scan supported.
                                            Self-test supported.
                                            Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   1) minutes.
    Extended self-test routine
    recommended polling time:        ( 510) minutes.
    Conveyance self-test routine
    recommended polling time:        (   2) minutes.
    SCT capabilities:              (0x10bd) SCT Status supported.
                                            SCT Error Recovery Control supported.
                                            SCT Feature Control supported.
                                            SCT Data Table supported.

    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   106   099   006    Pre-fail  Always       -       10927672
      3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   086   086   020    Old_age   Always       -       15081
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   079   060   030    Pre-fail  Always       -       87843884
      9 Power_On_Hours          0x0032   041   041   000    Old_age   Always       -       52164
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       82
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
    189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0022   071   046   045    Old_age   Always       -       29 (Min/Max 22/31)
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       46
    193 Load_Cycle_Count        0x0032   093   093   000    Old_age   Always       -       15081
    194 Temperature_Celsius     0x0022   029   054   000    Old_age   Always       -       29 (0 18 0 0 0)
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]

    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
  • Kuno
    Kuno Posts: 24  Freshman Member

    smartctl -a /dev/sdc:

    Device Model:     ST4000DM004-2CV104
    Serial Number:    ZTT0AK1S
    LU WWN Device Id: 5 000c50 0c7f6500a
    Firmware Version: 0001
    User Capacity:    4,000,787,030,016 bytes [4.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5425 rpm
    Form Factor:      3.5 inches
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   ACS-3 (unknown minor revision code: 0x006d)
    SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
    Local Time is:    Thu Dec  9 09:47:58 2021 GMT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status:  (0x00) Offline data collection activity
                                            was never started.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                (    0) seconds.
    Offline data collection
    capabilities:                    (0x73) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            No Offline surface scan supported.
                                            Self-test supported.
                                            Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   1) minutes.
    Extended self-test routine
    recommended polling time:        ( 497) minutes.
    Conveyance self-test routine
    recommended polling time:        (   2) minutes.
    SCT capabilities:              (0x30a5) SCT Status supported.
                                            SCT Data Table supported.

    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   078   064   006    Pre-fail  Always       -       60417614
      3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2651
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   075   060   045    Pre-fail  Always       -       34839332
      9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5971 (82 189 0)
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       14
    183 Runtime_Bad_Block       0x0032   086   086   000    Old_age   Always       -       14
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
    189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0022   070   053   040    Old_age   Always       -       30 (Min/Max 25/33)
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       11
    193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       5484
    194 Temperature_Celsius     0x0022   030   047   000    Old_age   Always       -       30 (0 19 0 0 0)
    195 Hardware_ECC_Recovered  0x001a   078   064   000    Old_age   Always       -       60417614
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1910 (135 191 0)
    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       18980119668
    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       10442242758

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]

    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
  • Kuno
    Kuno Posts: 24  Freshman Member
    smartctl -a /dev/sdd:

    Device Model:     ST4000DM004-2CV104
    Serial Number:    ZFN386QQ
    LU WWN Device Id: 5 000c50 0c58ef733
    Firmware Version: 0001
    User Capacity:    4,000,787,030,016 bytes [4.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5425 rpm
    Form Factor:      3.5 inches
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   ACS-3 (unknown minor revision code: 0x006d)
    SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
    Local Time is:    Thu Dec  9 09:51:23 2021 GMT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status:  (0x00) Offline data collection activity
                                            was never started.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                (    0) seconds.
    Offline data collection
    capabilities:                    (0x73) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            No Offline surface scan supported.
                                            Self-test supported.
                                            Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   1) minutes.
    Extended self-test routine
    recommended polling time:        ( 506) minutes.
    Conveyance self-test routine
    recommended polling time:        (   2) minutes.
    SCT capabilities:              (0x30a5) SCT Status supported.
                                            SCT Data Table supported.

    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   076   064   006    Pre-fail  Always       -       37671238
      3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2843
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   076   060   045    Pre-fail  Always       -       38944553
      9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       6055 (160 134 0)
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       18
    183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       4295032833
    189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0022   069   052   040    Old_age   Always       -       31 (Min/Max 25/34)
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       18
    193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       5822
    194 Temperature_Celsius     0x0022   031   048   000    Old_age   Always       -       31 (0 20 0 0 0)
    195 Hardware_ECC_Recovered  0x001a   076   064   000    Old_age   Always       -       37671238
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2058 (152 164 0)
    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       13829940399
    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       16497836234

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]

    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
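
A note on the odd-looking Command_Timeout raw value 4295032833 in the output above: this drive is reported as "Not in smartctl database", so the raw value is shown as one 48-bit number. On Seagate drives this attribute is commonly read as three packed 16-bit counters, although the exact meaning of each word is not something this thread confirms. A minimal sketch of the split, with a made-up helper name and assuming the shell does 64-bit arithmetic:

```shell
#!/bin/sh
# split_raw48: hypothetical helper, not part of smartmontools.
# Splits a 48-bit SMART raw value into three 16-bit words, high word first.
# Needs a shell with 64-bit arithmetic (an assumption; some small
# embedded shells only have 32 bits).
split_raw48() {
    raw=$1
    echo "$(( (raw >> 32) & 65535 )) $(( (raw >> 16) & 65535 )) $(( raw & 65535 ))"
}

split_raw48 4295032833
# prints: 1 1 1   (4295032833 = 0x000100010001)
```

Read that way, the value is three small counters of 1 each, not four billion timeouts.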

  • Kuno
    Kuno Posts: 24  Freshman Member
    smartctl -a /dev/sde:

    Model Family:     Seagate NAS HDD
    Device Model:     ST4000VN000-1H4168
    Serial Number:    Z304TRAN
    LU WWN Device Id: 5 000c50 086d991d0
    Firmware Version: SC46
    User Capacity:    4,000,787,030,016 bytes [4.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5900 rpm
    Form Factor:      3.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
    SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
    Local Time is:    Thu Dec  9 09:53:36 2021 GMT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    See vendor-specific Attribute list for marginal Attributes.

    General SMART Values:
    Offline data collection status:  (0x00) Offline data collection activity
                                            was never started.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                (   97) seconds.
    Offline data collection
    capabilities:                    (0x73) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            No Offline surface scan supported.
                                            Self-test supported.
                                            Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   1) minutes.
    Extended self-test routine
    recommended polling time:        ( 496) minutes.
    Conveyance self-test routine
    recommended polling time:        (   2) minutes.
    SCT capabilities:              (0x10bd) SCT Status supported.
                                            SCT Error Recovery Control supported.
                                            SCT Feature Control supported.
                                            SCT Data Table supported.

    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   100   099   006    Pre-fail  Always       -       24677966
      3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   087   087   020    Old_age   Always       -       14159
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   079   060   030    Pre-fail  Always       -       87649296
      9 Power_On_Hours          0x0032   041   041   000    Old_age   Always       -       52066
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       85
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   028   028   000    Old_age   Always       -       72
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
    189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0022   069   041   045    Old_age   Always   In_the_past 31 (2 66 35 22 0)
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       50
    193 Load_Cycle_Count        0x0032   093   093   000    Old_age   Always       -       14159
    194 Temperature_Celsius     0x0022   031   059   000    Old_age   Always       -       31 (0 20 0 0 0)
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

    SMART Error Log Version: 1
    ATA Error Count: 72 (device log contains only the most recent five errors)
            CR = Command Register [HEX]
            FR = Features Register [HEX]
            SC = Sector Count Register [HEX]
            SN = Sector Number Register [HEX]
            CL = Cylinder Low Register [HEX]
            CH = Cylinder High Register [HEX]
            DH = Device/Head Register [HEX]
            DC = Device Command Register [HEX]
            ER = Error register [HEX]
            ST = Status register [HEX]
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
    SS=sec, and sss=millisec. It "wraps" after 49.710 days.

    Error 72 occurred at disk power-on lifetime: 52043 hours (2168 days + 11 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 00 08 00 00 00 40 00   1d+03:30:32.423  READ FPDMA QUEUED
      60 00 f8 ff ff ff 4f 00   1d+03:30:32.396  READ FPDMA QUEUED
      60 00 00 ff ff ff 4f 00   1d+03:30:32.396  READ FPDMA QUEUED
      60 00 00 ff ff ff 4f 00   1d+03:30:32.395  READ FPDMA QUEUED
      60 00 00 ff ff ff 4f 00   1d+03:30:32.395  READ FPDMA QUEUED

    Error 71 occurred at disk power-on lifetime: 52043 hours (2168 days + 11 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 00 00 ff ff ff 4f 00   1d+03:30:28.014  READ FPDMA QUEUED
      60 00 00 ff ff ff 4f 00   1d+03:30:28.013  READ FPDMA QUEUED
      60 00 00 ff ff ff 4f 00   1d+03:30:28.013  READ FPDMA QUEUED
      60 00 08 ff ff ff 4f 00   1d+03:30:27.931  READ FPDMA QUEUED
      60 00 08 ff ff ff 4f 00   1d+03:30:27.931  READ FPDMA QUEUED

    Error 70 occurred at disk power-on lifetime: 52043 hours (2168 days + 11 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 00 00 ff ff ff 4f 00   1d+03:30:23.473  READ FPDMA QUEUED
      60 00 00 ff ff ff 4f 00   1d+03:30:23.472  READ FPDMA QUEUED
      60 00 00 ff ff ff 4f 00   1d+03:30:23.472  READ FPDMA QUEUED
      60 00 00 ff ff ff 4f 00   1d+03:30:23.471  READ FPDMA QUEUED
      60 00 00 ff ff ff 4f 00   1d+03:30:23.471  READ FPDMA QUEUED

    Error 69 occurred at disk power-on lifetime: 52043 hours (2168 days + 11 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 00 08 08 10 3d 40 00   1d+03:30:19.050  READ FPDMA QUEUED
      60 00 08 00 10 3d 40 00   1d+03:30:19.049  READ FPDMA QUEUED
      60 00 08 70 0f 7a 40 00   1d+03:30:19.049  READ FPDMA QUEUED
      60 00 08 00 0f 7a 40 00   1d+03:30:19.049  READ FPDMA QUEUED
      60 00 08 00 10 3d 40 00   1d+03:30:19.045  READ FPDMA QUEUED

    Error 68 occurred at disk power-on lifetime: 52043 hours (2168 days + 11 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 00 08 ff ff ff 4f 00   1d+03:30:14.556  READ FPDMA QUEUED
      60 00 08 ff ff ff 4f 00   1d+03:30:14.535  READ FPDMA QUEUED
      60 00 08 ff ff ff 4f 00   1d+03:30:14.535  READ FPDMA QUEUED
      60 00 08 ff ff ff 4f 00   1d+03:30:14.535  READ FPDMA QUEUED
      60 00 08 ff ff ff 4f 00   1d+03:30:14.535  READ FPDMA QUEUED

    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]

    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
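
    For anyone skimming a dump like this: the attributes usually worth checking first for read problems are 5, 187, 197 and 198. The following is only a rough sketch, assuming the usual smartctl attribute table layout with the raw value in the tenth column; the function name smart_raw is made up for illustration.

```shell
# Hypothetical helper: reads "smartctl -a" style text on stdin and prints
# the raw values of the attributes that usually indicate unreadable sectors.
smart_raw() {
    awk '
        # Attribute rows have a numeric ID in column 1 and a 0x.... flag in
        # column 3; the flag check skips the selective self-test table,
        # whose rows also start with a number.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x[0-9a-f]+$/ &&
        ($1 == 5 || $1 == 187 || $1 == 197 || $1 == 198) {
            printf "%s %s raw=%s\n", $1, $2, $10
        }
    '
}
```

    It could be used as smartctl -a /dev/sdd | smart_raw, where /dev/sdd is an assumption about which disk is being inspected. Non-zero raw values for 187, 197 or 198 suggest the disk has sectors it cannot read.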
  • Kuno
    Kuno Posts: 24  Freshman Member
    USB disk is removed now. Here is a new output of

    mdadm --examine /dev/sd[abcd]3

    /dev/sda3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : f0577b65:283a8be2:235c4865:251ac2c8
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Wed Apr  7 08:01:04 2021
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660736 (11166.25 GiB 11989.67 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 14739b23:eaeee6f1:5d90fda3:d4c43395

        Update Time : Wed Dec  8 00:13:41 2021
           Checksum : 39e6164e - correct
             Events : 524

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 0
       Array State : AA.. ('A' == active, '.' == missing)
    /dev/sdb3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : f0577b65:283a8be2:235c4865:251ac2c8
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Wed Apr  7 08:01:04 2021
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660736 (11166.25 GiB 11989.67 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 8700f0d6:21358560:6ab88d29:7556bb54

        Update Time : Wed Dec  8 00:13:41 2021
           Checksum : a0f0a3a6 - correct
             Events : 524

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 1
       Array State : AA.. ('A' == active, '.' == missing)
    /dev/sdc3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : f0577b65:283a8be2:235c4865:251ac2c8
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Wed Apr  7 08:01:04 2021
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660736 (11166.25 GiB 11989.67 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : c0749a62:d560b5a8:0fcb26d4:46c23d65

        Update Time : Wed Dec  8 00:13:41 2021
           Checksum : 2fe6c20d - correct
             Events : 524

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : spare
       Array State : AA.. ('A' == active, '.' == missing)
    /dev/sdd3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : f0577b65:283a8be2:235c4865:251ac2c8
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Wed Apr  7 08:01:04 2021
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660160 (11166.25 GiB 11989.67 GB)
      Used Dev Size : 7805773440 (3722.08 GiB 3996.56 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : c48a0e1b:48ca0d7c:72cd77a4:03fe7e41

        Update Time : Tue Dec  7 20:46:47 2021
           Checksum : 68494d74 - correct
             Events : 474

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 3
       Array State : AAAA ('A' == active, '.' == missing)

  • Mijzelf
    Mijzelf Posts: 2,002  Guru Member
    OK, it looks like your last disk was causing the timeout errors: it has a lot of READ FPDMA QUEUED errors, and it was dropped from the array first.
    So let's see if the array can be rebuilt without it. The commands to create a new array around the existing filesystem are
    mdadm --stop /dev/md2
    mdadm --create --assume-clean --level=5  --raid-devices=4 --metadata=1.2 --chunk=64K  --layout=left-symmetric /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 missing
    Those are two lines, each starting with mdadm. The first command may fail with an error; it is not clear to me whether the array is up or not.
    The USB disk should not be connected when you run these commands. (It's not actually a problem in itself, but the disk device names can be different.)
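
    The "dropped first" conclusion comes from the Events counters and Device Role lines in the --examine output: the member with the lowest event count stopped being updated earliest. Here is a minimal sketch for condensing that output to one line per member, assuming the field labels look like those posted in this thread; summarize_examine is a made-up name.

```shell
# Hypothetical helper: reads "mdadm --examine" text on stdin and prints
# one line per member with its event counter and role.
summarize_examine() {
    awk '
        # A member section starts with a line like "/dev/sda3:".
        $1 ~ /^\/dev\/.*:$/ { dev = $1; sub(/:$/, "", dev) }
        /Events :/          { events[dev] = $NF }
        /Device Role :/     {
            sub(/.*Device Role : */, "")   # keep only the role text
            role[dev] = $0
            order[++n] = dev
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s events=%s role=%s\n", order[i], events[order[i]], role[order[i]]
        }
    '
}
```

    Usage would be mdadm --examine /dev/sd[abcd]3 | summarize_examine. It is also worth saving the full --examine output to a file before running --create, since that command overwrites the old superblocks.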


  • Kuno
    Kuno Posts: 24  Freshman Member
    edited December 2021
    It said that there is an existing array and asked whether it should create a new one. I answered "y", and now I have a Volume 1 with disks 1, 2 and 3. It is marked orange; Zyxel says it's degraded and that I should repair it, but there is no repair button. Should I have said no to creating a new array? Disk 4 is marked with status red and as hot spare in Zyxel's overview.
    New output:
    mdadm --examine /dev/sd[abcd]3
    /dev/sda3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 8beaf982:073ce48c:da730e41:7c327a55
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Fri Dec 10 17:18:14 2021
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660160 (11166.25 GiB 11989.67 GB)
      Used Dev Size : 7805773440 (3722.08 GiB 3996.56 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : a131151d:27b790bf:ef167a59:a6501257

        Update Time : Fri Dec 10 17:31:40 2021
           Checksum : aa032c24 - correct
             Events : 78

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 0
       Array State : AAA. ('A' == active, '.' == missing)
    /dev/sdb3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 8beaf982:073ce48c:da730e41:7c327a55
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Fri Dec 10 17:18:14 2021
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660160 (11166.25 GiB 11989.67 GB)
      Used Dev Size : 7805773440 (3722.08 GiB 3996.56 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 5f079e22:03e7208e:ed360eef:76c7bfa7

        Update Time : Fri Dec 10 17:31:40 2021
           Checksum : 645dc88e - correct
             Events : 78

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 1
       Array State : AAA. ('A' == active, '.' == missing)
    /dev/sdc3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 8beaf982:073ce48c:da730e41:7c327a55
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Fri Dec 10 17:18:14 2021
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660160 (11166.25 GiB 11989.67 GB)
      Used Dev Size : 7805773440 (3722.08 GiB 3996.56 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 9d97bd47:8000d692:ccef5a0f:15e162b6

        Update Time : Fri Dec 10 17:31:40 2021
           Checksum : bd2244c7 - correct
             Events : 78

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 2
       Array State : AAA. ('A' == active, '.' == missing)
    /dev/sdd3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : f0577b65:283a8be2:235c4865:251ac2c8
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Wed Apr  7 08:01:04 2021
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 7805773824 (3722.08 GiB 3996.56 GB)
         Array Size : 11708660160 (11166.25 GiB 11989.67 GB)
      Used Dev Size : 7805773440 (3722.08 GiB 3996.56 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : c48a0e1b:48ca0d7c:72cd77a4:03fe7e41

        Update Time : Tue Dec  7 20:46:47 2021
           Checksum : 68494d74 - correct
             Events : 474

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 3
       Array State : AAAA ('A' == active, '.' == missing)

  • Mijzelf
    Mijzelf Posts: 2,002  Guru Member
    Can you access your data? It is expected behavior that the array is degraded: it's a 4-disk array built from 3 disks. The 4th disk was dropped for a reason; it (or the slot it is in) has severe hardware errors.
    I think you have no repair button because you have no empty disks. The one which seems eligible (sdd) also seems to be a member of another array.
    sdd has to be replaced, I think, but if this is your only copy of the data I think you should back up first. It's not clear why sdc was dropped, causing your initial problem. Maybe it was caused by some interference with sdd when you tried to re-add sdd to the array, but sdc might also have an error not listed by SMART, which will blow up your array again when you add a 4th disk.
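
    To confirm from the command line that the array is running degraded as expected, you can look at the slot map in /proc/mdstat, where U is a working member and _ a missing one. This is just a sketch, assuming the usual mdstat layout with the status line following the array's header line; md_slots is a made-up name.

```shell
# Hypothetical helper: $1 is the md device name (e.g. md2); reads
# /proc/mdstat style text on stdin and prints the slot map, e.g. UUU_.
md_slots() {
    awk -v md="$1" '
        # Header line of the requested array, e.g. "md2 : active raid5 ...".
        $1 == md && $2 == ":" { found = 1; next }
        # First following line containing a [UU_...] slot map.
        found && match($0, /\[[U_]+\]/) {
            print substr($0, RSTART + 1, RLENGTH - 2)
            exit
        }
    '
}
```

    Run it as md_slots md2 < /proc/mdstat. For a 4-disk array built from 3 disks, one _ in the result is what you would expect; more than one means the array cannot run.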

