NAS540 hard disk failure: abnormal RAID reconstruction


All Replies

  • Mijzelf (Guru Member)
    OK. You have to pull all disks except sdd and the new disk you want to copy to.

    Download the dd_rescue package here: https://zyxel.diskstation.eu/Users/Mijzelf/Tools/ and put it on the NAS, in /bin/. You can use WinSCP for that.
    Then open a shell on the NAS and execute

    cd /bin/
    tar xf *.tgz

    If you run dd_rescue now, you should get a warning that you have to specify the input and output.
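    If you don't have WinSCP at hand, plain scp from any Linux or Mac box should work as well. The IP address, username, and archive name below are assumptions; adjust them to your setup:

    scp dd_rescue.tgz admin@192.168.1.100:/bin/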

    Then run
    mdadm --examine /dev/sd[ab]3

    This should show the new name of sdd; it's the one with 'Device Role : Active device 2'. The other one is the new disk.
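    If the output is long, you can filter for the interesting lines; grep should be available in the BusyBox shell:

    mdadm --examine /dev/sd[ab]3 | grep -E '/dev/|Device Role'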

    Let's assume the old sdd now is sda, and the new disk is sdb, then the command is

    dd_rescue /dev/sda /dev/sdb

    This will take several hours, maybe days, depending on the condition of sdd. You'll have to keep the terminal open all that time.
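    If you can't guarantee the SSH session stays alive that long, a possible workaround is nohup, assuming your BusyBox build includes it (the log path is just an example):

    nohup dd_rescue /dev/sda /dev/sdb > /tmp/dd_rescue.log 2>&1 &
    tail -f /tmp/dd_rescue.log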

    After that, remove sdd, put the other two original disks back, and post the output of

    mdadm --examine /dev/sd[abcd]3
    cat /proc/mdstat

  • RiceC (Freshman Member)
    Dear Sir,
    I ran dd_rescue a few days ago. My current situation: with the original two disks and the bit-copied disk inserted, I can see the RAID when I boot, but as soon as I insert a new disk to let it rebuild into a normal four-disk RAID5, the error from the beginning occurs again. The disk I copied with dd_rescue still carries the bad-sector error.
    Is there a command to skip the error and let the new disk rebuild the RAID?
    Thank you.

    Just like you said above:
    "So it is possible that if you re-create this (degraded) array from the command line, using --assume-clean, that you can copy away all your files, without triggering this error again."
    
  • Mijzelf (Guru Member)
    OK. According to your post on 21 November, your array status went from [_UUU] when the rebuild started to [_U_U] when the hardware failure occurred. So the array has to be rebuilt from 'Active devices' 1..3, as Active device 0 was never completely synced.
    According to your post on 5 December, the 'Active devices' 1..3 are the partitions sdb3, sdd3 and sdc3.

    The command to recreate the array with these 3 members on these roles is

    mdadm --stop /dev/md2
    mdadm --create --assume-clean --level=5  --raid-devices=4 --metadata=1.2 --chunk=64K  --layout=left-symmetric /dev/md2 missing /dev/sdb3 /dev/sdd3 /dev/sdc3

    Those are two lines, both starting with mdadm.
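    After the create, you can sanity-check the result before mounting anything:

    cat /proc/mdstat
    mdadm --detail /dev/md2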

  • RiceC (Freshman Member), edited December 2019
    Dear Sir,

    The following messages appeared when I ran the commands. What should I do? Thank you.

    ~ $ mdadm --stop /dev/md2
    mdadm: must be super-user to perform this action
    ~ $ sudo mdadm --stop /dev/md2
    -sh: sudo: not found
    ~ $ su root
    Password:


    BusyBox v1.19.4 (2019-09-04 14:33:19 CST) built-in shell (ash)
    Enter 'help' for a list of built-in commands.

    ~ #
    ~ # mdadm --stop /dev/md2
    mdadm: Cannot get exclusive access to /dev/md2: Perhaps a running process, mounted filesystem or active volume group?
    ~ #
    /dev/mapper # pvdisplay
      --- Physical volume ---
      PV Name               /dev/md2
      VG Name               vg_28524431
      PV Size               16.36 TiB / not usable 3.81 MiB
      Allocatable           yes (but full)
      PE Size               4.00 MiB
      Total PE              4289348
      Free PE               0
      Allocated PE          4289348
      PV UUID               2L3zxx-baO6-JlSj-Y88b-Jr5I-hBo3-i20If6

  • Mijzelf (Guru Member)
    Does the NAS support some eastern language? Amazing.

    Anyway, the command to deactivate the logical volume is

    vgchange -an

    which has to be executed before the mdadm --stop.
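    So the full sequence becomes:

    vgchange -an
    mdadm --stop /dev/md2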
  • RiceC (Freshman Member), edited December 2019
    Dear Sir,
    There are still some errors; can you please help?
    ~ # vgchange -an
      Logical volume vg_28524431/vg_info_area contains a filesystem in use.
      Can't deactivate volume group "vg_28524431" with 3 open logical volume(s)
    ~ #

    ~ # df -h
    Filesystem                Size      Used Available Use% Mounted on
    ubi7:ubi_rootfs2         90.7M     48.7M     37.4M  57% /firmware/mnt/nand
    /dev/md0                  1.9G    178.9M      1.6G  10% /firmware/mnt/sysdisk
    /dev/loop0              139.5M    123.1M     16.4M  88% /ram_bin
    /dev/loop0              139.5M    123.1M     16.4M  88% /usr
    /dev/loop0              139.5M    123.1M     16.4M  88% /lib/security
    /dev/loop0              139.5M    123.1M     16.4M  88% /lib/modules
    /dev/loop0              139.5M    123.1M     16.4M  88% /lib/locale
    /dev/ram0                 5.0M      4.0K      5.0M   0% /tmp/tmpfs
    /dev/ram0                 5.0M      4.0K      5.0M   0% /usr/local/etc
    ubi3:ubi_config           2.4M    160.0K      2.0M   7% /etc/zyxel
    /dev/mapper/vg_28524431-lv_7f8dcf8b    366.3G    194.5M    366.1G   0% /i-data/7f8dcf8b
    /dev/mapper/vg_28524431-lv_7ec47419     15.9T     10.3T      5.5T  65% /i-data/7ec47419
    /dev/mapper/vg_28524431-vg_info_area    96.9M      4.1M     92.8M   4% /mnt/vg_info_area/vg_28524431
    /dev/mapper/vg_28524431-lv_7ec47419     15.9T     10.3T      5.5T  65% /usr/local/apache/htdocs/desktop,/pkg
    /dev/mapper/vg_28524431-lv_7ec47419     15.9T     10.3T      5.5T  65% /usr/local/mysql
    ~ #

    ~ # umount /i-data/7f8dcf8b
    ~ # umount /mnt/vg_info_area/vg_28524431
    ~ # umount /usr/local/apache/htdocs/desktop,/pkg
    ~ # umount /usr/local/mysql
    ~ # umount /i-data/7ec47419
    umount: /i-data/7ec47419: target is busy
            (In some cases useful info about processes that
             use the device is found by lsof(8) or fuser(1).)

    ~ # vgchange -an
      Logical volume vg_28524431/lv_7ec47419 contains a filesystem in use.
      Can't deactivate volume group "vg_28524431" with 1 open logical volume(s)
    ~ #


    ~ # vgdisplay
      --- Volume group ---
      VG Name               vg_28524431
      System ID
      Format                lvm2
      Metadata Areas        1
      Metadata Sequence No  5
      VG Access             read/write
      VG Status             resizable
      MAX LV                0
      Cur LV                3
      Open LV               1
      Max PV                0
      Cur PV                1
      Act PV                1
      VG Size               16.36 TiB
      PE Size               4.00 MiB
      Total PE              4289348
      Alloc PE / Size       4289348 / 16.36 TiB
      Free  PE / Size       0 / 0
      VG UUID               B2vAgC-DwH6-jxC5-aHqz-sUmH-PZEv-ekuduP
     
    ~ # pvdisplay
      --- Physical volume ---
      PV Name               /dev/md2
      VG Name               vg_28524431
      PV Size               16.36 TiB / not usable 3.81 MiB
      Allocatable           yes (but full)
      PE Size               4.00 MiB
      Total PE              4289348
      Free PE               0
      Allocated PE          4289348
      PV UUID               2L3zxx-baO6-JlSj-Y88b-Jr5I-hBo3-i20If6
     

    ~ # lvdisplay
      --- Logical volume ---
      LV Path                /dev/vg_28524431/vg_info_area
      LV Name                vg_info_area
      VG Name                vg_28524431
      LV UUID                Mxq7Cr-NnAg-fKTi-WwWT-j0zh-ltOU-MKPtoX
      LV Write Access        read/write
      LV Creation host, time NAS540, 2015-11-10 15:29:21 +0800
      LV Status              available
      # open                 0
      LV Size                100.00 MiB
      Current LE             25
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     3072
      Block device           253:0
     
      --- Logical volume ---
      LV Path                /dev/vg_28524431/lv_7ec47419
      LV Name                lv_7ec47419
      VG Name                vg_28524431
      LV UUID                uLjVxB-x3NU-l70j-tdT1-Gi26-JgLx-5jNzMc
      LV Write Access        read/write
      LV Creation host, time NAS540, 2015-11-10 15:29:22 +0800
      LV Status              available
      # open                 1
      LV Size                16.00 TiB
      Current LE             4194048
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     3072
      Block device           253:1
     
      --- Logical volume ---
      LV Path                /dev/vg_28524431/lv_7f8dcf8b
      LV Name                lv_7f8dcf8b
      VG Name                vg_28524431
      LV UUID                c0VVFV-imCI-5qLH-BFmV-oxPR-Xfpe-pMCvcJ
      LV Write Access        read/write
      LV Creation host, time NAS540, 2016-01-24 15:18:37 +0800
      LV Status              available
      # open                 0
      LV Size                372.17 GiB
      Current LE             95275
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     3072
      Block device           253:2
  • Mijzelf (Guru Member)
    We are drifting away.

    • There was a RAID5 array ABDC.
    • Disk A failed, degrading the array,
    • and was replaced by a new disk A'.
    • On resync, disk D appeared to have a hardware error,
    • causing the sync to stop
    • and D to be dropped from the array, which is now down.

    Two possible solutions:
    1. Recreate the (degraded) array _BDC, so that the content can be copied off, hoping the hardware error is in slack space and won't be triggered.
    2. Make a bitwise copy of D to a new disk D', recreate the array _BD'C, and resync to A'BD'C (the final resync step is sketched below).
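    For completeness, a sketch of that final resync step, assuming the recreated array is /dev/md2 and the new disk A' shows up as sda (the device name is an assumption; verify with mdadm --examine first):

    mdadm --manage /dev/md2 --add /dev/sda3
    cat /proc/mdstat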

    Now you are trying to apply solution 1, but that fails because the array cannot be stopped: it contains a logical volume which is mounted.
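    If you want to see what keeps the volume busy, fuser should help, assuming it's compiled into your BusyBox (lsof would do as well, if present):

    fuser -m /i-data/7ec47419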

    If that is the case, the array is not down, but contains a valid filesystem which, according to your df output, holds 10.3 TB of data. So you can simply copy your data off. Re-creating the array will change nothing.
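    A minimal sketch of copying the data to an external USB disk (the device name /dev/sde1 and the mount point are assumptions; check dmesg after plugging the disk in to find the real name):

    mkdir -p /mnt/backup
    mount /dev/sde1 /mnt/backup
    cp -a /i-data/7ec47419/. /mnt/backup/
    umount /mnt/backup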
