NAS540 hard disk failure: RAID rebuild abnormality

RiceC
RiceC Posts: 10  Freshman Member
edited November 2019 in Personal Cloud Storage
Dear Sir,
After I replaced the failed hard disk, the resync progress has been stuck at 0.2%. What could be the problem?


#NAS_Nov_2019

All Replies

  • RiceC
    RiceC Posts: 10  Freshman Member
    ~ $ cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
    md2 : active raid5 sdb3[4] sda3[5] sdc3[6] sdd3[2]
          17569173504 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
          [>....................]  recovery =  0.2% (16664192/5856391168) finish=16375.7min speed=5943K/sec

    md1 : active raid1 sda2[6] sdb2[5] sdc2[4] sdd2[7]
          1998784 blocks super 1.2 [4/4] [UUUU]

    md0 : active raid1 sda1[6] sdb1[5] sdc1[7] sdd1[4]
          1997760 blocks super 1.2 [4/4] [UUUU]

    unused devices: <none>

    ~ $ cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
    md2 : active raid5 sdb3[4] sda3[5](S) sdc3[6] sdd3[2](F)
          17569173504 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/2] [_U_U]

    md1 : active raid1 sda2[6] sdb2[5] sdc2[4] sdd2[7]
          1998784 blocks super 1.2 [4/4] [UUUU]

    md0 : active raid1 sda1[6] sdb1[5] sdc1[7] sdd1[4]
          1997760 blocks super 1.2 [4/4] [UUUU]

    unused devices: <none>

  • Mijzelf
    Mijzelf Posts: 2,618  Guru Member
    edited November 2019
    md2 : active raid5 sdb3[4] sda3[5](S) sdc3[6] sdd3[2](F)
    Disk sdd (probably disk 4) failed while rebuilding the array. Now the array is down, as two disks are not enough to run a 4-disk RAID5.
    Does SMART say anything about disk 4?
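    If you have shell access, smartmontools is one way to check; a minimal sketch, assuming smartctl is available and the suspect disk really is /dev/sdd (the device name is only an example):

    # full SMART report for the suspect disk
    smartctl -a /dev/sdd
    # run a short self-test, then read the result a few minutes later
    smartctl -t short /dev/sdd
    smartctl -l selftest /dev/sdd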

  • RiceC
    RiceC Posts: 10  Freshman Member
    Dear Sir,
    The hard drive's SMART status is normal.
    I suspect there is a problem with a particular block (a bad sector) on the hard disk. Is there a solution?
    Thanks a lot.
  • Mijzelf
    Mijzelf Posts: 2,618  Guru Member
    Your disk sdd has a hardware failure: sector 41578064 cannot be read. That is around 20GB into the disk, or around 16GB from the start of the data partition. As you can read in the log, the array starts re-syncing at 78 seconds and the failure pops up at 1200 seconds, which means the array was re-syncing at about 16GB / 1122 seconds = 14.2MB/sec. That is low, so I think there are more problems with this disk. Strange that SMART is OK.
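    To confirm that the sector is really unreadable, you could try reading it directly; a minimal sketch, using the sector number from the log (note it is an offset on the whole disk /dev/sdd, not on the partition):

    # try to read the single 512-byte sector the kernel complained about
    dd if=/dev/sdd of=/dev/null bs=512 skip=41578064 count=1
    # an I/O error here confirms the bad sector; a clean read suggests the problem was transient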

    It is possible that sector 41578064 is not in use. The raid manager cannot know, as it works below the filesystem level, and so it syncs everything, whether in use or not.
    So it is possible that if you re-create this (degraded) array from the command line, using --assume-clean, you can copy all your files off without triggering this error again.
    However, as the slow sync speed suggests that there is more wrong with that disk, it is possible that the disk will die during the copy.
    If your data is valuable, I think the only sane way to handle this is to make a bitwise copy from disk sdd to a new disk, using dd_rescue or a similar tool. Then re-create the degraded array manually with the new disk, using --assume-clean.
    And then you can add a new 4th disk.

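    A minimal sketch of the copy step, assuming the failing disk shows up as /dev/sdd and the new disk as /dev/sde (check with lsblk first, these names are only examples), using GNU ddrescue (dd_rescue has a slightly different syntax):

    # first pass: copy everything that reads cleanly, skip the bad areas
    ddrescue -f -n /dev/sdd /dev/sde rescue.map
    # second pass: go back and retry only the sectors that failed
    ddrescue -f -r3 /dev/sdd /dev/sde rescue.map

    The map file is what lets the second run retry only the failed areas instead of starting from scratch.
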
    BTW, your array seems to be 16.3TiB in size. Are you running firmware <5.10? Since firmware 5.10 a volume can't exceed 16TiB.

  • RiceC
    RiceC Posts: 10  Freshman Member
    Dear Sir,
    Thank you very much for your reply. I will try the dd_rescue tool for a bitwise copy, followed by re-creating the degraded array with --assume-clean and then adding a new disk. I will report back with the result. The firmware currently in use is V5.21(AATB.3).

  • RiceC
    RiceC Posts: 10  Freshman Member
    Dear Sir,

    I am not sure which commands to use to restore the RAID mechanism without losing data. Can you please help?
    e.g. mdadm --create --assume-clean ... ?
  • Mijzelf
    Mijzelf Posts: 2,618  Guru Member
    Can you post the output of

    mdadm --examine /dev/sd[abcd]3
  • RiceC
    RiceC Posts: 10  Freshman Member
    Dear Sir, 

    ~ # mdadm --examine /dev/sd[abcd]3
    /dev/sda3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x2
         Array UUID : 28524431:c959c258:2ab11b6d:2bb4adc1
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Tue Nov 10 15:29:20 2015
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
         Array Size : 17569173504 (16755.27 GiB 17990.83 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
    Recovery Offset : 33356152 sectors
              State : clean
        Device UUID : a40e16eb:f6263576:1bef532d:551ba599

        Update Time : Thu Dec  5 11:34:31 2019
           Checksum : 1c01ab93 - correct
             Events : 234561

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 0
       Array State : AA.A ('A' == active, '.' == missing)
    /dev/sdb3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 28524431:c959c258:2ab11b6d:2bb4adc1
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Tue Nov 10 15:29:20 2015
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
         Array Size : 17569173504 (16755.27 GiB 17990.83 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 99f23022:bdaedd7f:c125470f:ef1827d9

        Update Time : Thu Dec  5 11:34:31 2019
           Checksum : 7c3d1e8b - correct
             Events : 234561

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 1
       Array State : AA.A ('A' == active, '.' == missing)
    /dev/sdc3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 28524431:c959c258:2ab11b6d:2bb4adc1
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Tue Nov 10 15:29:20 2015
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
         Array Size : 17569173504 (16755.27 GiB 17990.83 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : d12d0b15:04babef0:d036cc64:dbc69dcb

        Update Time : Thu Dec  5 11:34:31 2019
           Checksum : 27f45190 - correct
             Events : 234561

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 3
       Array State : AA.A ('A' == active, '.' == missing)
    /dev/sdd3:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 28524431:c959c258:2ab11b6d:2bb4adc1
               Name : NAS540:2  (local to host NAS540)
      Creation Time : Tue Nov 10 15:29:20 2015
         Raid Level : raid5
       Raid Devices : 4

     Avail Dev Size : 11712782336 (5585.09 GiB 5996.94 GB)
         Array Size : 17569173504 (16755.27 GiB 17990.83 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 77d95206:b3029d75:0ee7a4e3:1c5b8cd8

        Update Time : Thu Dec  5 11:28:22 2019
           Checksum : 29e08eee - correct
             Events : 234556

             Layout : left-symmetric
         Chunk Size : 64K

       Device Role : Active device 2
       Array State : AAAA ('A' == active, '.' == missing)
    ~ #

  • Mijzelf
    Mijzelf Posts: 2,618  Guru Member
    This is hard to interpret. According to this dump, the array is up, yet degraded.

    The volume was created on Tue Nov 10 15:29 2015. Today at 11:28 (local time?) disk sdd was dropped, and the rest of the disks were last updated at 11:34. Those disks agree that they're up with 3 members.

    So according to this dump it makes no sense to recreate the array, as it's up. Don't know what to say.
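    Before changing anything, it would be worth checking what the kernel thinks right now; a small sketch, read-only:

    cat /proc/mdstat
    # --detail shows whether md2 is running, degraded, and which slot is missing
    mdadm --detail /dev/md2
    # keep a copy of the current metadata in case the array has to be re-created later
    mdadm --examine /dev/sd[abcd]3 > /tmp/md2-examine.txt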
  • RiceC
    RiceC Posts: 10  Freshman Member
    Dear Sir,

    My situation is the same as described above: sector 41578064 cannot be read. What complete commands should I use so that this error is not triggered again? I am making a bitwise copy from disk sdd to a new disk, using dd_rescue or a similar tool. I would like to know which commands re-create the degraded array manually with the new disk, using --assume-clean.
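    A sketch of what the re-create could look like, based only on the --examine output above (RAID5, 4 devices, metadata 1.2, 64K chunk, left-symmetric layout) and on the Device Role lines (0 = sda3, 1 = sdb3, 2 = sdd3, 3 = sdc3). It assumes the ddrescue copy of the old sdd sits in the same bay and shows up as /dev/sdd again, and that the half-rebuilt replacement in slot 0 is left out as "missing". Verify the device names on your box first and only run this after the copy is finished, because a wrong parameter here can destroy the data:

    # stop the degraded array first
    mdadm --stop /dev/md2

    # re-create it in place with the same geometry reported by --examine;
    # slot order: 0 = missing (half-rebuilt disk), 1 = sdb3, 2 = copy of sdd3, 3 = sdc3.
    # --assume-clean prevents mdadm from starting a new resync.
    mdadm --create /dev/md2 --assume-clean --level=5 --raid-devices=4 \
          --metadata=1.2 --chunk=64 --layout=left-symmetric \
          missing /dev/sdb3 /dev/sdd3 /dev/sdc3

    # check the result and that the data is readable before going any further
    cat /proc/mdstat
    mdadm --examine /dev/sd[bcd]3

    If the new superblocks show a different Data Offset than the 262144 sectors in the dump above, newer mdadm versions can pin it with --data-offset (262144 sectors = 128M). Only after the data has been verified, or copied off, should the fourth disk be added back with mdadm --add so the array can rebuild.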
