NAS326 bootup very slow ~ one hour

pdpbjorn
pdpbjorn Posts: 7  Freshman Member
Hi all
My NAS326 takes about an hour to boot - from I push the button (or Wake On LAN) and until the single beep declares the server ready.
I hoped that inspecting the logs would shed lights on what may be going on - or not...
But the log information found in the Control Panel, Maintenance section is rather scarce: 


The one-hour gap between item 8 and item 7 reflect the waiting time
Does anybody know where to find more detailed logs? Or maybe someone would have an idear what is going on?
«1

All Replies

  • Mijzelf
    Mijzelf Posts: 1,579  Guru Member
    When it is booted, does it then have it's normal speed?

    When you login over ssh, you can run 'dmesg' to see the kernel log. I might tell you when you have hardware errors.
  • pdpbjorn
    pdpbjorn Posts: 7  Freshman Member
    Hi Mijzelf
    Thanks for taking the time to look into this!
    Not sure I entirely understand your first question. But the NAS seems to function OK when at first it is ready, mainly delivering media through the Plex media server  - I used your port. I think the 326 is actually under-configured for this task - but it works.

    I dumped the log with dmesg - attached; Cannot say that I am an expert, but I dont really see much suspicious except for maybe the very last lines?

    IPv6: ADDRCONF(NETDEV_CHANGE): egiga0: link becomes ready
    EXT4-fs (md2): warning: mounting fs with errors, running e2fsck is recommended
    EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: usrquota,data=ordered,barrier=1
    EXT4-fs (md2): error count: 17
    EXT4-fs (md2): initial error at 1589788703: ext4_mb_generate_buddy:755
    EXT4-fs (md2): last error at 1597895714: ext4_mb_generate_buddy:755
    Rounding down aligned max_sectors from 4294967295 to 4294967288

    Cheers
    /per


  • Mijzelf
    Mijzelf Posts: 1,579  Guru Member
    I don't see any screaming hardware errors in the log. It looks pretty normal to me. Unfortunately there are no timestamps, so we can't find the gap. There is some small filesystem error, as you had already seen. I don't think it can cause a one hour delay, yet it's good to repair it. The volume repair button in the webinterface should do.

    I *think* the 'system ready' beep comes after the packages are started. So you can disable all packages to see if that solves the problem, if yes, re-enable them one by one to see which one seems to take a long time to start. It might also be an idea to try to ssh in before the system is ready. If that works, you can use 'top' to see which processes are active.

  • pdpbjorn
    pdpbjorn Posts: 7  Freshman Member
    Alright!
    I can not find the volume repair button, but:
    disabling Plex, MetaRepository and Tweahs from the App Center certainly made a difference - restart time is now back to normal.
    Before disabling packages, I was actually able to log in via SSH during the wait; 
    TOP tells me that the problem may be Plex-related - however, I dont understand how a command such as chown would persistently use 1.3% of the cpu for an hour

      PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
     2900  2885 root     D     2972  0.5   0  1.3 chown -R nobody:nobody /i-data/b7d01d58/.PKG/PlexMediaServer/share/PlexMediaServer
     3116     2 root     SW       0  0.0   0  1.1 [kworker/u2:0]
     1664     1 root     S     122m 24.4   0  0.9 python /usr/local/apache/web_framework/main_wsgi.pyc
     1822     2 root     SW       0  0.0   0  0.1 [jbd2/md2-8]
      396     2 root     SW       0  0.0   0  0.1 [kswapd0]
     2893  2308 nobody   S    11452  2.2   0  0.0 /i-data/.system/zy-pkgs/pkg_httpd -f /etc/pkg_service_conf/httpd2.conf
     2894  2308 nobody   S    11452  2.2   0  0.0 /i-data/.system/zy-pkgs/pkg_httpd -f /etc/pkg_service_conf/httpd2.conf
     2308     1 root     S     9884  1.9   0  0.0 /i-data/.system/zy-pkgs/pkg_httpd -f /etc/pkg_service_conf/httpd2.conf
     1626     1 root     S     6000  1.1   0  0.0 /usr/sbin/uamd
  • Mijzelf
    Mijzelf Posts: 1,579  Guru Member
    That 'STAT = D' means 'uninterruptible sleep', which in most cases means the process is waiting for the kernel to finish some action. If chown is (semi-) permanently in that state, I'd assume either an unreachable filesystem (a stale nfs mount, or something like that), or a filesystem error.
    I can not find the volume repair button
    Here I wrote about injection telnetd in /etc/init.d/rc.shutdown, to be able to run e2fsck on the data volume. Read the thread to see if you can do it. This &%^@ forum software has mutilated the instructions over time, but if you skip the html tags it's still usable.
  • pdpbjorn
    pdpbjorn Posts: 7  Freshman Member
    Wow, that looks scary...
    Just to be sure, in my case - after logging in with telnet - I should only execute umount /i-data/sysvol and e2fsck -f /dev/md2, not resize2fs /dev/md2 , right?
  • Mijzelf
    Mijzelf Posts: 1,579  Guru Member
    Wow, that looks scary...

    You can hardly do anything wrong. The file is on ramdisk, so it will be reverted after a reboot, and umount and e2fsck are harmless, when used wrong. They'll just throw an error and do nothing.

    Indeed, it's not intended to run resize2fs. (Although it shouldn't hurt. It should just say there's nothing to resize and exit. But why taunt fate?)



  • pdpbjorn
    pdpbjorn Posts: 7  Freshman Member
    That went well - but...
    Following your instructions, e2fsck actually found something to work with:
    e2fsck 1.42.12 (29-Aug-2014)
    Pass 1: Checking inodes, blocks, and sizes
    Deleted inode 14024861 has zero dtime.  Fix<y>? yes
    Deleted inode 96474457 has zero dtime.  Fix<y>? yes
    
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Block bitmap differences:  -(53025902--53025904) -(385979665--385979667)
    Fix<y>? yes
    Free blocks count wrong for group #1618 (1605, counted=1608).
    Fix<y>? yes
    Free blocks count wrong for group #11779 (17356, counted=17359).
    Fix<y>? yes
    Free blocks count wrong (254339881, counted=254339887).
    Fix<y>? yes
    Inode bitmap differences:  -14024861 -96474457
    Fix<y>? yes
    Free inodes count wrong for group #1712 (5211, counted=5212).
    Fix<y>? yes
    Free inodes count wrong for group #11776 (4934, counted=4935).
    Fix<y>? yes
    Free inodes count wrong (243019743, counted=243019745).
    Fix<y>? yes
    
    /dev/md2: ***** FILE SYSTEM WAS MODIFIED *****
    /dev/md2: 913439/243933184 files (4.6% non-contiguous), 721381793/975721680 blocks

    and the box could restart fine (I had disabled Plex before doing the repair)

    But when I went on to enabling the Plex server from the App Center it still hangs for a long time (not finished yet). TOP reports the same chown process hanging with STAT=R or - mainly - STAT=D

    PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    6084  6072 root     D     2900  0.5   0  1.8 chown -R nobody:nobody /i-data/b7d01d58/.PKG/PlexMediaServer/share/PlexMediaServer
    
    Any other suggestions? 

  • Mijzelf
    Mijzelf Posts: 1,579  Guru Member
    I assume the filesystem error warning in dmesg is gone now.

    Are you using the 'Install packages on USB stick' Tweak? If not, the only problem I can imagine is that the directory contains a massive amount of files. You can count then using

    find /i-data/sysvol/.PKG/PlexMediaServer/share/PlexMediaServer | wc -l

    If that is a really large number, I suppose PlexMediaServer is leaking files, somehow. You should navigate the tree if you can find some obvious suspect. Midnight commander is a good tool to navigate the tree.
  • pdpbjorn
    pdpbjorn Posts: 7  Freshman Member
    edited October 2020
    / $ find /i-data/sysvol/.PKG/PlexMediaServer/share/PlexMediaServer | wc -l
    485990
    
    - that's an insane number - it took 30min to retrieve 
    Were you suspecting this?
    Is this a known (mis-)behavior?

Sign In to comment.

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click on this button!