NAS326 bootup very slow ~ one hour

pdpbjorn · September 2020

Hi all
My NAS326 takes about an hour to boot - from I push the button (or Wake On LAN) and until the single beep declares the server ready.
I hoped that inspecting the logs would shed lights on what may be going on - or not...
But the log information found in the Control Panel, Maintenance section is rather scarce:

Image: https://us.v-cdn.net/6029482/uploads/editor/m5/6ixgjenmtc02.png

The one-hour gap between item 8 and item 7 reflect the waiting time
Does anybody know where to find more detailed logs? Or maybe someone would have an idear what is going on?

Mijzelf · September 2020

When it is booted, does it then have it's normal speed?

When you login over ssh, you can run 'dmesg' to see the kernel log. I might tell you when you have hardware errors.

pdpbjorn · October 2020

Hi Mijzelf
Thanks for taking the time to look into this!
Not sure I entirely understand your first question. But the NAS seems to function OK when at first it is ready, mainly delivering media through the Plex media server - I used your port. I think the 326 is actually under-configured for this task - but it works.

I dumped the log with dmesg - attached; Cannot say that I am an expert, but I dont really see much suspicious except for maybe the very last lines?

IPv6: ADDRCONF(NETDEV_CHANGE): egiga0: link becomes ready

EXT4-fs (md2): warning: mounting fs with errors, running e2fsck is recommended

EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: usrquota,data=ordered,barrier=1

EXT4-fs (md2): error count: 17

EXT4-fs (md2): initial error at 1589788703: ext4_mb_generate_buddy:755

EXT4-fs (md2): last error at 1597895714: ext4_mb_generate_buddy:755

Rounding down aligned max_sectors from 4294967295 to 4294967288

Cheers
/per

Mijzelf · October 2020

I don't see any screaming hardware errors in the log. It looks pretty normal to me. Unfortunately there are no timestamps, so we can't find the gap. There is some small filesystem error, as you had already seen. I don't think it can cause a one hour delay, yet it's good to repair it. The volume repair button in the webinterface should do.

I *think* the 'system ready' beep comes after the packages are started. So you can disable all packages to see if that solves the problem, if yes, re-enable them one by one to see which one seems to take a long time to start. It might also be an idea to try to ssh in before the system is ready. If that works, you can use 'top' to see which processes are active.

pdpbjorn · October 2020

Alright!
I can not find the volume repair button, but:
disabling Plex, MetaRepository and Tweahs from the App Center certainly made a difference - restart time is now back to normal.
Before disabling packages, I was actually able to log in via SSH during the wait;
TOP tells me that the problem may be Plex-related - however, I dont understand how a command such as chown would persistently use 1.3% of the cpu for an hour

PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND

2900 2885 root D 2972 0.5 0 1.3 chown -R nobody:nobody /i-data/b7d01d58/.PKG/PlexMediaServer/share/PlexMediaServer

3116 2 root SW 0 0.0 0 1.1 [kworker/u2:0]

1664 1 root S 122m 24.4 0 0.9 python /usr/local/apache/web_framework/main_wsgi.pyc

1822 2 root SW 0 0.0 0 0.1 [jbd2/md2-8]

396 2 root SW 0 0.0 0 0.1 [kswapd0]

2893 2308 nobody S 11452 2.2 0 0.0 /i-data/.system/zy-pkgs/pkg_httpd -f /etc/pkg_service_conf/httpd2.conf

2894 2308 nobody S 11452 2.2 0 0.0 /i-data/.system/zy-pkgs/pkg_httpd -f /etc/pkg_service_conf/httpd2.conf

2308 1 root S 9884 1.9 0 0.0 /i-data/.system/zy-pkgs/pkg_httpd -f /etc/pkg_service_conf/httpd2.conf

1626 1 root S 6000 1.1 0 0.0 /usr/sbin/uamd

Mijzelf · October 2020

That 'STAT = D' means 'uninterruptible sleep', which in most cases means the process is waiting for the kernel to finish some action. If chown is (semi-) permanently in that state, I'd assume either an unreachable filesystem (a stale nfs mount, or something like that), or a filesystem error.

I can not find the volume repair button

Here I wrote about injection telnetd in /etc/init.d/rc.shutdown, to be able to run e2fsck on the data volume. Read the thread to see if you can do it. This &%^@ forum software has mutilated the instructions over time, but if you skip the html tags it's still usable.

pdpbjorn · October 2020

Wow, that looks scary...
Just to be sure, in my case - after logging in with telnet - I should only execute umount /i-data/sysvol and e2fsck -f /dev/md2, not resize2fs /dev/md2 , right?

Mijzelf · October 2020

Wow, that looks scary...

You can hardly do anything wrong. The file is on ramdisk, so it will be reverted after a reboot, and umount and e2fsck are harmless, when used wrong. They'll just throw an error and do nothing.

Indeed, it's not intended to run resize2fs. (Although it shouldn't hurt. It should just say there's nothing to resize and exit. But why taunt fate?)

pdpbjorn · October 2020

That went well - but...
Following your instructions, e2fsck actually found something to work with:

e2fsck 1.42.12 (29-Aug-2014)
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 14024861 has zero dtime.  Fix<y>? yes
Deleted inode 96474457 has zero dtime.  Fix<y>? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(53025902--53025904) -(385979665--385979667)
Fix<y>? yes
Free blocks count wrong for group #1618 (1605, counted=1608).
Fix<y>? yes
Free blocks count wrong for group #11779 (17356, counted=17359).
Fix<y>? yes
Free blocks count wrong (254339881, counted=254339887).
Fix<y>? yes
Inode bitmap differences:  -14024861 -96474457
Fix<y>? yes
Free inodes count wrong for group #1712 (5211, counted=5212).
Fix<y>? yes
Free inodes count wrong for group #11776 (4934, counted=4935).
Fix<y>? yes
Free inodes count wrong (243019743, counted=243019745).
Fix<y>? yes

/dev/md2: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md2: 913439/243933184 files (4.6% non-contiguous), 721381793/975721680 blocks

and the box could restart fine (I had disabled Plex before doing the repair)

But when I went on to enabling the Plex server from the App Center it still hangs for a long time (not finished yet). TOP reports the same chown process hanging with STAT=R or - mainly - STAT=D

PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
6084  6072 root     D     2900  0.5   0  1.8 chown -R nobody:nobody /i-data/b7d01d58/.PKG/PlexMediaServer/share/PlexMediaServer

Any other suggestions?

Mijzelf · October 2020

I assume the filesystem error warning in dmesg is gone now.

Are you using the 'Install packages on USB stick' Tweak? If not, the only problem I can imagine is that the directory contains a massive amount of files. You can count then using

find /i-data/sysvol/.PKG/PlexMediaServer/share/PlexMediaServer | wc -l

If that is a really large number, I suppose PlexMediaServer is leaking files, somehow. You should navigate the tree if you can find some obvious suspect. Midnight commander is a good tool to navigate the tree.

pdpbjorn · October 2020

/ $ find /i-data/sysvol/.PKG/PlexMediaServer/share/PlexMediaServer | wc -l
485990

- that's an insane number - it took 30min to retrieve

Were you suspecting this?
Is this a known (mis-)behavior?

NAS326 bootup very slow ~ one hour

All Replies

Categories

Consumer Product Help Center