NAS542: HDD issues, but RAID status "healthy"
I'm having some issues with my RAID on NAS542, 4x8TB (ST8000DM004).
The problem: Disk 2 is likely degraded, but still running. I would like to ask for some assistance resolving it.
I noticed that when I log in to mycloud.zyxel.me, the NAS status shows "RAID status: Warning", and the used disk space on one of the two logical partitions is displayed incorrectly (it shows 350 GB used, while the data that I can access via WebDAV or the web interface is surely over 1 TB). However, when I log in to the NAS web interface, it only shows that the RAID is "healthy" (no warning). (I am not sure whether the readout of the used space on the other partition is correct, but it is probably at least in the right ballpark.)
I searched the forum and the web for similar problems. This is what I have found so far:
-When I open tweaks and view the disk log, I get lots of errors similar to this:
"[ 605.917511] EXT4-fs error (device dm-2): ext4_lookup:1047: inode #52101255: comm zyxel_file_moni: deleted inode referenced: 62128248"
-I logged in over ssh as root and ran "cat /proc/mdstat", which shows that the arrays are resyncing. However, after I observed it for several hours/days, the resync never finishes: it may reach about 10 % and then "crash" and start over. (It shows 10-13000 minutes to finish. A few hours ago I saw it at 11+ %, and now it shows 0.3 %, so it must have started over, probably after running into some errors.)
-I ran "smartctl -a /dev/sdd" on the problematic disk, which gave me errors like this:
"Error 55 occurred at disk power-on lifetime: 31491 hours (1312 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
25 00 00 ff ff ff ef 00 3d+05:23:11.286 READ DMA EXT
ef 10 02 00 00 00 a0 00 3d+05:23:11.268 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 3d+05:23:11.241 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 00 3d+05:23:11.238 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 3d+05:23:11.226 SET FEATURES [Set transfer mode]"
The SMART check, however, displays
"SMART overall-health self-assessment test result: PASSED"
-All of the attributes in the SMART table are listed with the type "Old_age" or "Pre-fail".
-I should probably try to run e2fsck (or so I found), but I don't really know how (the drive/device needs to be unmounted and I don't know if I can do that when the RAID is trying to sync). I can access (I think) all of the data on the logical partition for now, but I'm afraid that the drive might fail soon.
-I tried to start a short self-test with "smartctl -t short" on the problematic drive, but I am not sure whether it did anything, how to view the result, or whether it can even repair any errors. (I rebooted the NAS after starting the test, hoping it would run on startup.)
-Once before, I had to replace a failed drive, and the arrays then got rebuilt (I suppose so, since I was able to access the NAS again with all the data after about 10 days of rebuilding). Thus, while sd[cdf] all have ~32k hours of running time, sde has "only" 18k hours.
-Probably unrelated issues:
-I commonly have trouble logging in as admin to the web interface: I get an error for wrong credentials. Rebooting as root over ssh "resets" this issue, and sometimes it helps just to log in to the web interface as a regular user and log out again.
-I started having trouble copying files over WebDAV: I get various errors in the Windows file manager, and no files are copied. I can, however, connect an external drive to the NAS USB ports and transfer the files using the cp command from PowerShell. This problem doesn't always occur; it typically occurs when large files are involved (100 MB+).
-One of the LAN connections shows up as only 100 Mbit/s instead of 1 Gbit/s. The other port works faster.
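To confirm that the resync reported by "cat /proc/mdstat" really restarts, the progress could be logged over time instead of being checked by hand. The following is only a sketch under assumptions: the function name mdstat_progress is made up for illustration, it assumes that awk is available in the NAS shell, and it assumes the usual Linux md progress line of the form "resync = 11.3% (...) finish=...min".

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints
# "<array> <percent>" for every array with a resync/recovery/check in progress.
mdstat_progress() {
    awk '
        # Remember the array name from lines such as "md2 : active raid5 ..."
        /^md[0-9]+ :/ { dev = $1 }
        # Progress lines contain e.g. "resync = 11.3% (...) finish=...min"
        /(resync|recovery|check) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print dev, $i
        }
    '
}

# Example use on the NAS (commented out so that sourcing this file does nothing):
# while true; do
#     echo "$(date) $(mdstat_progress < /proc/mdstat)"
#     sleep 600
# done
```

If the logged percentage drops back to a low value, the timestamp of that drop can be compared with the kernel and disk log entries from the same time to see which errors coincide with the restart.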
What would be the best course of action now? I am not experienced with Linux commands at all, so if the guidance could be as step-by-step as possible, it would be much appreciated.