NAS540 Shows Healthy but RAID degraded.

When I look at my disks in Storage Manager they are all green. When I check them with SMART, two show up as BAD. So in theory I have Disks 1 and 4 BAD in RAID 5. Yet I can still access all my data, no problem.

I ran 'repair' three times and the logs say it was successful, but I keep getting the RAID degraded message. My GUESS is that those two drives have a small number of reallocated sectors, so SMART is flagging them as bad, and that is triggering the degraded notification?

I've done restarts twice, same results.  

TIA

Accepted Solution

  • Mijzelf
    Mijzelf Posts: 2,216
    100 Answers 1000 Comments Friend Collector Fifth Anniversary
     Guru Member
    edited December 2022 Answer ✓
    jahmon said:
    So I looked through it more carefully, right in the beginning of the "SMART Data Section" there is a line "SMART overall-health self-assessment test result:"  Drive A shows "Passed", Drive D shows "Failed".  As D is the RAID 5 parity drive, this explains why it's degraded, but I can still access the data and the rebuild fails while processing.  Have I got it?  
    More or less. There is no dedicated parity drive in RAID5; the parity blocks are distributed evenly over all disks. This maximizes read speed (on a healthy array the parity blocks are not used for reading, so it would be a waste to leave a whole disk and its bandwidth unused) and minimizes the penalty when a random disk fails.
    The raid manager is pretty dumb. When rebuilding the array, it simply calculates the content of the 'new' disk from the total surface of the 3 others (the raid manager doesn't know about filesystems, so it doesn't know whether a particular sector is in use) and writes that to the disk. When a write error occurs, the new disk is dropped and the rebuild fails. Worse, if a read error occurs, the relevant disk is dropped, bringing the array down.
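
    As a rough illustration of the rebuild calculation described above, here is a toy sketch with one stripe of a 4-disk RAID5, assuming plain XOR parity. The byte values and the helper name xor3 are made up for the example; a real array does this for every block on the disk surface.

    ```shell
    # XOR three block values; the same operation computes parity
    # and recomputes a missing block.
    xor3() { echo $(( $1 ^ $2 ^ $3 )); }

    # Made-up data bytes stored on three of the four disks.
    d1=90; d2=60; d3=240

    # Parity block for this stripe, stored on the fourth disk.
    parity=$(xor3 "$d1" "$d2" "$d3")

    # The disk holding d2 is replaced: its block is recalculated
    # from the blocks on the three remaining disks.
    rebuilt=$(xor3 "$d1" "$d3" "$parity")

    echo "parity=$parity rebuilt=$rebuilt original=$d2"
    ```

    The rebuilt value matches the original d2, but only because all three other blocks could be read. That is why a single read error on another disk during a rebuild is fatal for the array.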

All Replies

  • Mijzelf
    Mijzelf Posts: 2,216
     Guru Member
    The raid manager will drop a disk as soon as it throws an I/O error on a read or write action. It doesn't keep track of bad sectors itself, but leaves that to the disks.
    A reallocated sector is, well, reallocated, and so it's healthy. When a sector is found bad, the disk replaces it with a spare sector, and it won't throw a 'bad' error a second time. Yet it is possible that there is a Current_Pending_Sector, which is a sector that is physically healthy yet has a wrong checksum, so it can't be read. It has to be written first. That kind of sector will throw a read error on each attempt to read it, until it is written once.
    SMART won't call a disk bad for Current_Pending_Sectors.

    Having said that, when SMART says a disk is bad, it has to be replaced. If you ever have an I/O error on a degraded raid array, that disk will be dropped leaving you with a 'down' array. In that case there is no easy way to restore your array.
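
    To see the two sector counters mentioned above, a small filter over the attribute table can help. This is only a sketch: it assumes the usual `smartctl -A` column layout for ATA disks (attribute name in column 2, raw value in column 10), that awk is available on the NAS, and the function name sector_counts is made up for the example.

    ```shell
    # Print name and raw count for the reallocated and pending sector
    # attributes. Reads `smartctl -A` style text on stdin.
    sector_counts() {
        awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" { print $2, $10 }'
    }

    # Usage on the NAS (as root), assuming the disk is /dev/sda:
    #   smartctl -A /dev/sda | sector_counts
    ```

    A nonzero Current_Pending_Sector count on a member of a degraded array is worth taking seriously, since reading such a sector is exactly the kind of I/O error that gets a disk dropped.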

  • jahmon
    jahmon Posts: 12
    First Comment
    Thanks.
  • jahmon
    jahmon Posts: 12
    First Comment
    A follow-up here. To repeat, I have 'healthy' showing everywhere on my NAS540 except for the 'RAID Degraded' and the SMART 'BAD' indications. I looked more carefully, and both of the disks that are 'BAD' are Hitachi Ultrastar A7K2000 drives, model HUA722020ALA331. I can write and read data with no problem. Is this a compatibility error? The compatibility list shows 'HDS' and my drives are 'HUA'. TIA!
  • Mijzelf
    Mijzelf Posts: 2,216
     Guru Member
    Is this a compatibility error?

    Probably not. There is no 'compatibility list', only a 'verified hard disk list'. SATA is SATA, so all disks are supposed to work, but of course not all models have actually been tested.

    Does SMART give any information about why the disk is bad?


  • jahmon
    jahmon Posts: 12
    First Comment
    Yes - see attached. These don't make any sense to me as written. If this is accurate, almost every parameter is over threshold, and some thresholds are nonsensical, for example temperature and operating hours.
  • jahmon
    jahmon Posts: 12
    First Comment
    ...and it still shows all disks 'green'. Note that 1 and 4 are the Hitachis flagged as 'BAD' by SMART.
  • Mijzelf
    Mijzelf Posts: 2,216
     Guru Member
    Indeed, the SMART info in the web interface is hardly usable. You'd better look at the output of the smartctl tool.
    Log in to the NAS over ssh, and execute
    su
    smartctl -a /dev/sda
    smartctl -a /dev/sdd
    (I chose sda and sdd as I suppose those will be the device nodes of both your Hitachis. That can be different when you have a USB disk or SD card connected, or if the NAS just acts weird. Have a look with 'cat /proc/partitions' to see all device nodes in use.)
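
    The full `smartctl -a` output is long, so as a first pass it can help to pull out only the overall verdict line for each disk. A minimal sketch follows; the function name health_line is made up for the example, and sda through sdd are an assumption about the device nodes.

    ```shell
    # Keep only the overall verdict line from `smartctl -a` or
    # `smartctl -H` text on stdin.
    health_line() {
        grep 'overall-health self-assessment test result'
    }

    # Usage on the NAS (as root); adjust the device names to what
    # /proc/partitions shows:
    #   for d in sda sdb sdc sdd; do
    #       echo "$d: $(smartctl -H /dev/$d | health_line)"
    #   done
    ```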
  • jahmon
    jahmon Posts: 12
    First Comment
    I tried to do this with PuTTY and WinSCP, but can't get it configured correctly. Do you have a guide on establishing an SSH session? Thanks.
  • Mijzelf
    Mijzelf Posts: 2,216
     Guru Member
    Did you enable the ssh server in config->network->terminal?
  • jahmon
    jahmon Posts: 12
    First Comment
    That worked, thanks.  Let me get the results and review.  Sorry for the slow response, I'm sort of doing this between other priorities.  
