NAS326 problem


All Replies

  • Janikovo
    Janikovo Posts: 24  Freshman Member
    edited June 2020
    @Mijzelf

    First I disassembled the NAS and removed both disks, then checked one by one whether they were working (platters spinning)... they both seem to be working as they should.

    Then I tried again
    cat /proc/mdstat

     /proc/mdstat shows the status of the 'multi device' devices (raid arrays) - well, there is a change... now it shows three of them

    cat /proc/partitions



    cat /proc/mounts


    mdadm --examine /dev/sda3


    a big change compared with the result of this command two days ago

    mdadm --examine /dev/sdb3






    Well, now, could anybody please explain whether my NAS326 is OK?


    Because this one, md2, seems to be inactive...

    Should I somehow activate it? And how?

    Thanks in advance
  • Janikovo
    Janikovo Posts: 24  Freshman Member
    Well,

    I am really confused, so I tried this



    and then mdadm --examine /dev/sda1


    mdadm --examine /dev/sda2


    mdadm --examine /dev/sda3



    What am I missing ?
    I am not an expert, but sda1 is alive as Homesan:0, sda2 is alive as Homesan:1 and sda3 is alive as Homesan:2.

    But I have only two HDDs there.

    Could anybody help me understand this and try to rebuild the SAN with all the data that seems to be on both drives?

    Thanks
  • Mijzelf
    Mijzelf Posts: 2,607  Guru Member
    What am I missing ?
    I am not an expert, but sda1 is alive as Homesan:0, sda2 is alive as Homesan:1 and sda3 is alive as Homesan:2.
    A firmware 5 ZyXEL NAS creates two raid1 arrays for its own use: one for the system, and one for swap. Because they are raid1, the NAS won't go down when a disk fails. So the two small partitions are for internal use; the big partition is for data.

    Anyway, it seems the raid headers are suddenly readable again after reinserting the disks. Did you exchange them? In your first post sdb3 was 'Active device 1', and now it's 'Active device 0'. That data is written in the raid header, while the name of the disk is assigned by the disk slot. If you didn't exchange them, the sequence of detection has changed, which might be caused by a failing sata port, or something like that.

    If you did exchange them, it's okay; the array manager can handle that. It's not clear to me why md2 is inactive. What is the output of

    mdadm --detail /dev/md2

    BTW, you can copy text from PuTTY just by selecting it.
  • Janikovo
    Janikovo Posts: 24  Freshman Member
    edited June 2020
    This:

    ~ # mdadm --detail /dev/md2
    mdadm: md device /dev/md2 does not appear to be active.

    ...and...

    ~ # cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4]
    md2 : inactive sda3[1](S)
          1949514744 blocks super 1.2

    md1 : active raid1 sdb2[2](F) sda2[1]
          1998784 blocks super 1.2 [2/1] [_U]

    md0 : active raid1 sdb1[2](F) sda1[1]
          1997760 blocks super 1.2 [2/1] [_U]
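
    As a side note, the inactive array can be picked out of such a /proc/mdstat listing mechanically. A minimal sketch, using the output above as canned input (on the NAS itself you would read /proc/mdstat directly):

```shell
# Print the names of md arrays reported as inactive.
# Canned sample of the /proc/mdstat lines quoted above; on the NAS
# you would run: awk '/ : inactive/ { print $1 }' /proc/mdstat
mdstat='md2 : inactive sda3[1](S)
md1 : active raid1 sdb2[2](F) sda2[1]
md0 : active raid1 sdb1[2](F) sda1[1]'

printf '%s\n' "$mdstat" | awk '/ : inactive/ { print $1 }'
# -> md2
```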



    And system says again:



    And what I found is also this:


    So what now please ?
    e2fsck?  Running this command assumes that those disks are unmounted...?

    In fact, how can I tell which one is Disk2?
    Is it the one placed in Slot number 2?



    And thank you, Mijzelf, for taking care of this. Many, many thanks.
  • Janikovo
    Janikovo Posts: 24  Freshman Member
    @Mijzelf

    Well, now I am totally lost...
    I found this in dmesg:
    ...
    md/raid1:md1: Disk failure on sdb2, disabling device.
    md/raid1:md1: Operation continuing on 1 devices.
    ...

    and then this with fdisk -l:

    Disk /dev/loop0: 143 MiB, 149946368 bytes, 292864 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock0: 2 MiB, 2097152 bytes, 4096 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock1: 2 MiB, 2097152 bytes, 4096 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock2: 10 MiB, 10485760 bytes, 20480 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock3: 15 MiB, 15728640 bytes, 30720 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock4: 106 MiB, 111149056 bytes, 217088 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock5: 15 MiB, 15728640 bytes, 30720 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/mtdblock6: 106 MiB, 111149056 bytes, 217088 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disklabel type: gpt
    Disk identifier: BA8CDAD3-A2FC-4169-84F4-2DF43580FC17

    Device       Start        End    Sectors  Size Type
    /dev/sda1     2048    3999743    3997696  1.9G Linux RAID
    /dev/sda2  3999744    7999487    3999744  1.9G Linux RAID
    /dev/sda3  7999488 3907028991 3899029504  1.8T Linux RAID

    Disk /dev/md0: 1.9 GiB, 2045706240 bytes, 3995520 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disk /dev/md1: 1.9 GiB, 2046754816 bytes, 3997568 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes

    tried this:
    ~ # e2fsck /dev/sdb1
    e2fsck 1.42.12 (29-Aug-2014)
    /dev/sdb1 is in use.
    e2fsck: Cannot continue, aborting.


    ~ # e2fsck /dev/sdb2
    e2fsck 1.42.12 (29-Aug-2014)
    /dev/sdb2 is in use.
    e2fsck: Cannot continue, aborting.


    ~ # e2fsck /dev/sda3
    e2fsck 1.42.12 (29-Aug-2014)
    /dev/sda3 is in use.
    e2fsck: Cannot continue, aborting.



    and I cannot see how to unmount them, or why they are "in use".

    Any advice, please?


  • Mijzelf
    Mijzelf Posts: 2,607  Guru Member
    The disk which is now sdb is dying, and seems to be intermittently readable: first the raid headers were gone, and later they were readable.

    You should not try to repair the filesystem on this disk, as each write can add further damage. The 'classic' way to approach this kind of problem is to create a byte-by-byte copy of the disk, and to do the filesystem repair on the copy.

    In Linux there is a tool for that, ddrescue. The syntax is

    ddrescue /dev/sda /dev/sdb [ /path/to/logfile ]

    where sda is the source disk, sdb is the target, and logfile is a file in which ddrescue keeps track of the copied sectors. Using that logfile you can do several passes, where each time ddrescue tries to fill in the sectors which failed last time. Without a logfile it just tries to read every sector, repeatedly if it fails, and if it ultimately fails the target sector is filled with zeros.
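
    A sketch of that multi-pass workflow (the device names here are examples only; double-check yours with fdisk -l first, and note that newer ddrescue versions call the logfile a "mapfile"):

```shell
# WARNING: example device names -- the target disk is overwritten!
# Pass 1: copy everything that reads cleanly, skip bad areas quickly.
ddrescue -n /dev/sdb /dev/sdc rescue.map

# Pass 2: go back and retry only the sectors that failed, 3 times each.
# The mapfile lets ddrescue resume exactly where it left off.
ddrescue -r3 /dev/sdb /dev/sdc rescue.map
```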

    Maybe there are tools like this for other OSes, but I'm not aware of any.

    So, if your data is valuable, get a new disk of at least the same size, connect it together with the failing disk to a Linux box, and run ddrescue. Make sure, and double-check, that you have the target disk right: ddrescue will happily overwrite everything you offer as the target. If that is your failing disk, your data is lost forever.

    ddrescue is in most cases not installed by default, so you'll have to install it using the package manager of your Linux flavour.

    The copying can take days, as ddrescue will try hard to read each sector, so a lot of unreadable sectors will slow down the process. On a healthy disk you can expect something like 100MB/sec.

    In fact, how can I tell which one is Disk2?
    Is it the one placed in Slot number 2?
    I think so. But it's easy to check: just pull that disk, boot the NAS and run

    mdadm --examine /dev/sda3

    If it shows 'Device Role : Active device 1', the healthy disk is still in the NAS.

    and I cannot see how to unmount them, or why they are "in use".
    They are in use because they are part of a raid array. If you want to check them, you'll have to check the md (multi-disk) device. That cannot be done for sdb2, as it's swap, without a filesystem, nor for sdb3, as that array is inactive due to disk errors.
  • Janikovo
    Janikovo Posts: 24  Freshman Member
    @Mijzelf

    Thank you, I have one more question.
    There were two HDDs in RAID1, which as I understand it means Disk1 has the same data as Disk2.
    They are a "copy": RAID1 consists of an exact copy (or mirror) of a set of data on two disks.

    Should I understand that all the data is still alive on the healthy HDD?

    You mentioned that I could lose all the data... my question is: when I buy a second healthy disk, isn't the NAS326 capable of repairing itself?

    The second question: why did I lose "the volume"? As I understood it, when disk1 is failing it gets disconnected by the NAS software itself, and the NAS should keep working on the one disk with healthy status and all the data.

    I was planning to buy a new 2TB disk and simply put it into the NAS326, create a "Disk group", and the volume would repair itself.

    Or via the mdadm --assemble command.

    As I don't have any Linux PC, I have to buy something like this

    https://www.alza.de/premiumcord-usb-2-0-konverter-gt-40-44-ide-oder-sata-2-5-und-3-5-hdd-netzteil-d66597.htm?o=7

    or
    https://www.alza.de/premiumcord-converter-usb-3-0-gt-sata-2-5-und-3-5-gerate-netzteil-d191813.htm?o=1

    or 

    https://www.alza.de/premiumcord-usb-3-0-sata-iii-d251654.htm?o=2

     
    Then try some "livecd" distribution; I don't know which is best for this.
    SystemRescueCD? Arch Linux? Or Kali Linux?

    And then I can try the steps you mentioned and run ddrescue.

    But again: if I buy a new Western Digital Blue 2TB and put it into the NAS326's second slot, next to the healthy HDD... isn't the NAS326 able to repair itself automatically?

    Thanks,
    Jan


  • Mijzelf
    Mijzelf Posts: 2,607  Guru Member
    There were two HDDs in RAID1, which as I understand it means Disk1 has the same data as Disk2.
    They are a "copy": RAID1 consists of an exact copy (or mirror) of a set of data on two disks.
    You didn't have a raid1 array. The examine output shows the raid level is 'linear'. That level doesn't have redundancy; your data volume had the size of the two disks totaled, instead of the size of a single disk.
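
    The /proc/mdstat output earlier in the thread backs this up: sda3 alone contributes 1949514744 1K-blocks, and a linear array concatenates its members, so two such members add up to roughly double that:

```shell
# 1K-blocks reported for the single member sda3 in /proc/mdstat.
# A 'linear' md array concatenates members, so with two equal disks
# the volume is the sum of both -- no mirror, no redundancy.
per_member_kib=1949514744
total_kib=$((per_member_kib * 2))
echo "linear array size: $total_kib KiB (about 3.6 TiB)"
```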

    If you had had a raid1 array, then yes, the NAS would have repaired it when you simply inserted a new, blank disk. Although it's not actually repairing, but re-adding redundancy: a degraded raid1 array isn't 'broken', it has just lost its redundancy. But that's nitpicking.

    As I don't have any Linux PC, I have to buy something like this
    That doesn't have to do with Linux, but with not having a PC with available sata ports, I think? Otherwise I don't know what you mean.
    And yes, a USB-sata converter is a suitable way to connect the disk. But I don't think a 3.5" disk can be bus-powered over USB, although I'm not sure about USB3. If you don't power the disk externally, check the specifications of the USB3 port used; it should be able to supply at least 15W, or something like that.
    SystemRescueCD ?, Arch Linux ? or Kali Linux ?
    Not Kali. That is a distro for vulnerability testing, and I wouldn't know whether it has ddrescue in its repo. Arch is bleeding edge; it will do, but it wouldn't be my choice when I need reliable functioning out of the box. SystemRescueCD sounds good: according to its website it has ddrescue out of the box, which is actually the only tool you need.
  • Janikovo
    Janikovo Posts: 24  Freshman Member
    edited June 2020
    @Mijzelf and all others,

    the SAN was unexpectedly working for one hour. I tried to back up, but it stopped at 5% of the data. Then it stopped again - "No Volume"

    What I want to ask is: how can I run e2fsck on that faulty disk?

    sd(a,b) 1,2,3 : not mounted

    running e2fsck always says:
    /dev/sd(a,b)1,2,3 is in use.
    e2fsck: Cannot continue, aborting.

    Thanks,
    Jan




  • Mijzelf
    Mijzelf Posts: 2,607  Guru Member
    the SAN was unexpectedly working for one hour.

    That was after a reboot, I guess?

    What I want to ask is: how can I run e2fsck on that faulty disk?
    The filesystem is not on the raw partitions. Each partition contains a raid container, and you have to assemble the array to be able to run fsck on it.
    The array can be assembled (hence the SAN unexpectedly working for an hour), but it's not stable: on the first I/O error the array becomes inactive. So you can't feasibly run fsck on a non-redundant md array which contains a disk with hardware errors.
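
    If the failing disk is first cloned with ddrescue as suggested earlier in the thread, the check can then be run against the clone. A hedged sketch of that step (the device names are assumptions; /dev/sdc stands for the ddrescue clone of the failing disk):

```shell
# Hypothetical device names -- verify yours with fdisk -l before running.
# Stop the half-assembled array, re-assemble md2 from the healthy
# member plus the clone, then check the filesystem without writing
# anything (-n forces a read-only check).
mdadm --stop /dev/md2
mdadm --assemble --run /dev/md2 /dev/sda3 /dev/sdc3
e2fsck -n /dev/md2
```

    A byte-by-byte copy carries the raid superblock with it, which is why mdadm can accept the clone as an array member.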

