NSA320 process to replace a bad disk in RAID 1

All Replies

  • Mijzelf
    edited November 2021
    The white screen is normal; there is no further confirmation. Now you can use a telnet client (PuTTY will do; make sure you select Telnet as the protocol) to get shell access to your NAS.
    You may have to re-open the Telnet backdoor first; I don't know whether it closes after a period of inactivity.
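    For example, instead of PuTTY you can also use a command-line telnet client (the address 192.168.1.100 is just a placeholder; use the actual IP address of your NAS):

    telnet 192.168.1.100

    Then log in as 'admin' with your admin password.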

  • Thanks for the help - I believe I've connected with telnet. (Sorry, all this is very unfamiliar to me.)

    Here's the entire session with the commands you listed; after "su" I typed in the admin password. Thanks in advance for your suggestions.


    NSA320 login: admin
    Password:


    BusyBox v1.17.2 (2016-03-11 16:40:37 CST) built-in shell (ash)
    Enter 'help' for a list of built-in commands.

    ~ $ cat /proc/partitions
    major minor  #blocks  name

       7        0     140288 loop0
       8        0 2930266584 sda
       8        1     498688 sda1
       8        2 2929766400 sda2
       8       16 2930266584 sdb
       8       17     498688 sdb1
       8       18 2929766400 sdb2
      31        0       1024 mtdblock0
      31        1        512 mtdblock1
      31        2        512 mtdblock2
      31        3        512 mtdblock3
      31        4      10240 mtdblock4
      31        5      10240 mtdblock5
      31        6      48896 mtdblock6
      31        7      10240 mtdblock7
      31        8      48896 mtdblock8
       9        0 2929766264 md0
    ~ $ cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1]
    md0 : active raid1 sda2[2] sdb2[1]
          2929766264 blocks super 1.0 [2/1] [_U]

    unused devices: <none>
    ~ $ su
    Password:


    BusyBox v1.17.2 (2016-03-11 16:40:37 CST) built-in shell (ash)
    Enter 'help' for a list of built-in commands.

    ~ # mdadm --examine /dev/sd[ab]2
    /dev/sda2:
              Magic : a92b4efc
            Version : 1.0
        Feature Map : 0x2
         Array UUID : 630ad246:af25be64:c3565d38:6dd85ddb
               Name : 0
      Creation Time : Sun Sep 22 01:50:31 2013
         Raid Level : raid1
       Raid Devices : 2

     Avail Dev Size : 2929766264 (2794.04 GiB 3000.08 GB)
         Array Size : 2929766264 (2794.04 GiB 3000.08 GB)
       Super Offset : 5859532784 sectors
    Recovery Offset : 5859532528 sectors
              State : clean
        Device UUID : 7346cb76:5a16c5d3:93ba274d:a613f4e0

        Update Time : Tue Nov 30 09:33:52 2021
           Checksum : a43bb60 - correct
             Events : 3509251


       Device Role : Active device 0
       Array State : AA ('A' == active, '.' == missing)
    /dev/sdb2:
              Magic : a92b4efc
            Version : 1.0
        Feature Map : 0x0
         Array UUID : 630ad246:af25be64:c3565d38:6dd85ddb
               Name : 0
      Creation Time : Sun Sep 22 01:50:31 2013
         Raid Level : raid1
       Raid Devices : 2

     Avail Dev Size : 2929766264 (2794.04 GiB 3000.08 GB)
         Array Size : 2929766264 (2794.04 GiB 3000.08 GB)
       Super Offset : 5859532784 sectors
              State : clean
        Device UUID : 9d3c2c12:358ea491:6104fcc1:4296fc9d

        Update Time : Tue Nov 30 09:33:52 2021
           Checksum : 381f96da - correct
             Events : 3509251


       Device Role : Active device 1
       Array State : AA ('A' == active, '.' == missing)
    ~ #
    ~ #



  • Mijzelf
    OK, according to your /proc/partitions both disks are exactly the same size, and so are the data partitions. Your /proc/mdstat ("[2/1] [_U]") also shows that only one of the two array members is currently up.
    I'm not sure what is happening here, but at least this is strange:
       Super Offset : 5859532784 sectors
    Recovery Offset : 5859532528 sectors
    This RAID metadata version (1.0) has its header at the end of the partition, so 'Super Offset' points to the end of the usable space. The recovery stopped 256 sectors before the end of the array. That is a suspicious number, as it is a power of 2. Moreover, if this really happened in 4 hours, it was recovering at more than 200 MB/sec, which seems impossible to me.
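    For reference, the gap between those two offsets is easy to check in the shell (pure arithmetic, nothing is written to disk):

    echo $((5859532784 - 5859532528))   # prints 256; 256 sectors * 512 bytes = 128 KiB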
    Let's try to continue the recovery and see what the kernel log gives.
    su
    dmesg -c >/dev/null
    echo "recovery" >/sys/block/md0/md/sync_action<br>

    wait some time

    dmesg

    'su' is an elevation command; the 'admin' account by itself doesn't have sufficient rights. The first 'dmesg' command clears the kernel log (the '>/dev/null' just discards its output). The 'echo' line is supposed to continue the recovery. The second 'dmesg' shows the new log lines.
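    While you wait, you can also check whether the rebuild is actually running:

    cat /proc/mdstat

    A running recovery shows a progress line (percentage, estimated finish time and speed) below the md0 entry; if that line is absent, the array is not resyncing.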
  • ~ # su


    BusyBox v1.17.2 (2016-03-11 16:40:37 CST) built-in shell (ash)
    Enter 'help' for a list of built-in commands.

    ~ # dmesg -c >/dev/null
    ~ # echo "recovery" > /sys/block/md0/md/sync_action<br>
    sh: syntax error: unexpected newline
    ~ # echo "recovery" >/sys/block/md0/md/sync_action<br>
    sh: syntax error: unexpected newline
    ~ #

    I'm afraid I don't know where the syntax error is. I tried adding a space between the '>' and '/', to no avail. I also tried removing the <br>, which gave no error, but no other message either.

    I tried logging in as NsaRescueAngel also, but same results.

  • Mijzelf
    This forum software is really ... . That <br> shouldn't be there, and I didn't type it. So retry without the <br>. If you copy & paste, paste into Notepad first and copy it from there; that way no invisible markup gets pasted along.
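    Typed by hand, the line should be exactly this (whitespace around the '>' makes no difference to the shell):

    echo "recovery" >/sys/block/md0/md/sync_action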
  • Got it - done and done.

    PuTTY went back to a command prompt; nothing else happened.

    I waited 10 minutes, then

    ~ # dmesg
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2

    etc

    After 20 minutes I got this:


    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    Uncached vma dbc6d758 (addr 40b3c000 flags 080000ff phy 1c370000) from pid 18342
    Uncached vma dbc6d650 (addr 40b3f000 flags 080000ff phy 1c370000) from pid 18342
    Uncached vma dbc6d650 (addr 406fe000 flags 080000ff phy 1c370000) from pid 1341
    Uncached vma dbc6d650 (addr 406fe000 flags 080000ff phy 1c370000) from pid 1341
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2
    ~ #
    ~ # RAID1 conf printout:
    sh: RAID1: not found
    ~ #  --- wd:1 rd:2
    sh: ---: not found
    ~ #  disk 0, wo:1, o:1, dev:sda2
    sh: disk: not found
    ~ #  disk 1, wo:0, o:1, dev:sdb2
    sh: disk: not found
    ~ # Uncached vma dbc6d758 (addr 40b3c000 flags 080000ff phy 1c370000) from pid 1
    8342
    sh: syntax error: unexpected "("
    ~ # Uncached vma dbc6d650 (addr 40b3f000 flags 080000ff phy 1c370000) from pid 1
    8342
    sh: syntax error: unexpected "("
    ~ # Uncached vma dbc6d650 (addr 406fe000 flags 080000ff phy 1c370000) from pid 1
    341
    sh: syntax error: unexpected "("
    ~ # Uncached vma dbc6d650 (addr 406fe000 flags 080000ff phy 1c370000) from pid 1
    341
    sh: syntax error: unexpected "("
    ~ # RAID1 conf printout:
    sh: RAID1: not found
    ~ #  --- wd:1 rd:2
    sh: ---: not found
    ~ #  disk 0, wo:1, o:1, dev:sda2
    sh: disk: not found
    ~ #  disk 1, wo:0, o:1, dev:sdb2
    sh: disk: not found
    ~ # RAID1 conf printout:
    sh: RAID1: not found


  • ngibson
    edited December 2021

  • Mijzelf
    Another déjà vu:

    RAID1 conf printout:
     --- wd:1 rd:2
     disk 0, wo:1, o:1, dev:sda2
     disk 1, wo:0, o:1, dev:sdb2

    This time I could find it again: Link. That thread describes your problem, with the difference that there the recovery stops at ~70%, while yours stops at ~99.999%.
    I don't know what to suggest now. In theory it's possible to read the last 256 sectors of the source disk and write them back, to re-initialize the broken sector. But still, 256 sectors is a suspicious number. It is probably less error-prone to remove the source disk, create a new volume on the new disk, put the source disk back, copy the data over, and finally add the old disk to the new volume.
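    For completeness, a minimal sketch of that re-init route (the sector numbers are derived from the mdadm output above, and the file name /tmp/tail.bin is just an example; double-check the offsets before writing anything, because a wrong seek value overwrites good data):

    # read the suspect last 256 sectors of the usable area from the healthy disk
    dd if=/dev/sdb2 of=/tmp/tail.bin bs=512 skip=5859532528 count=256
    # write the same data back in place, so the drive can remap a weak sector
    dd if=/tmp/tail.bin of=/dev/sdb2 bs=512 seek=5859532528 count=256

    If the first dd aborts on an unreadable sector, adding 'conv=noerror,sync' makes it continue and pad the bad part with zeros, at the cost of losing whatever was stored there.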
  • Thanks for the suggestion.
    • What is considered the "source disk"? I have the original two disks ("disk1", which it seems was the bad one, and "disk2") plus the third "NewDisk". Currently the NAS has the NewDisk in slot 1 and disk2 in slot 2.
    • Will creating a new volume on any disk delete the current contents? Do I need to back up the whole volume onto yet another disk?
    • Is there a method or specific steps to "copy the data over, and finally add the old disk to the new volume"?
