NAS326 smbd RAM usage nearly 100% during copy, NAS gets unresponsive

All Replies

  • Mijzelf
    Mijzelf Posts: 2,613  Guru Member

Why is the RAM not freed after the copy task is finished?

    It might not be able to. Processes normally use malloc() to allocate memory and free() to release it. But there is a complex mechanism below that. When a large block of memory is requested, the request is passed to the kernel, and on free it is given back to the kernel. For smaller blocks, however, an internal heap is used, created inside a larger block requested from the kernel. That larger block can only be returned to the kernel once all micro-allocations inside it are free()'d. So it's perfectly possible that the process allocates small blocks of memory to buffer the in-streaming file, and some other blocks for internal use. The file buffer is free()'d when the file is written, but the other blocks are not. The larger block around the small blocks still in use cannot be returned to the kernel, although it's mostly unused. The only way to clean that up is to kill the process and start a new one.
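
    By the way, you can watch this from the outside. /proc shows per process how much is mapped versus how much is actually resident or swapped out (assuming your busybox has pidof and grep -E):

    for p in $(pidof smbd); do echo "pid $p:"; grep -E 'VmSize|VmRSS|VmSwap' /proc/$p/status; done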

    But generally there is no need for that. When a process isn't using its allocated memory and the system needs some physical memory, the pages which haven't been touched for 'a long time' are swapped out. And nobody cares.

    On the other hand, your 'top' shows smbd using 712M, while the box only has 512M. As I expect the major part of smbd's memory usage to be file buffering, that means the OS had to swap out a large part of that file buffer to be able to fulfill the request for more memory. That is a problem. The file buffer became that big because the data was streaming in faster than it could be written to disk. And now the OS steps in and uses the disk to access the swapfile, causing smbd to need even more file buffer.
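
    You can check whether that is happening while a copy runs (both commands should be in busybox):

    free
    cat /proc/swaps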

    Mechanical disks are notoriously bad at multitasking. A modern disk can read/write at 100-150MB/sec, outperforming the GigE network port, but moving the heads to another place, to access another file, typically takes 10 msec (random access time). That is at most 100 accesses per second, so in the worst case (a single 512 byte sector per access) the throughput collapses to 100*512 bytes/sec = 50kB/sec. (That is why SSDs are so fast. Not their sustained throughput, but their access time of a few microseconds.)

    Maybe the drives are slow because they are 81% filled (8.8TB out of 10.8TB)

    Maybe. The filesystem tries to keep files as unfragmented as possible. (Each fragment change costs 10msec extra to access.) But in practice that means the free space becomes fragmented, and so new files bigger than the largest free block have to be fragmented. There is a command 'e2freefrag' for that, but I don't know if it's available on your NAS. You can try 'e2freefrag /dev/mapper/vg_cae2ed9e-lv_b3f36ee7'. It shows a table of the sizes of the free blocks, so you can easily calculate how many fragments your 16GB file will end up in.
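
    If you can get the tool onto the box (it is part of e2fsprogs), asking for a histogram in 16MiB chunks makes the table easy to read:

    e2freefrag -c 16384 /dev/mapper/vg_cae2ed9e-lv_b3f36ee7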

    Have you changed your backup program lately? There are two ways to copy a file. One is to open a file on the target location and keep writing until the file is copied. The other is to open a file with a given size on the target location and fill it. The difference is that in the first case the filesystem doesn't know how big the file is going to be, so it has to allocate disk space on the fly. It needs to search for free space and mark it allocated, swinging the heads around and wasting time, and it cannot choose the best-fitting free space fragments.
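
    If your box has 'fallocate' and 'filefrag' you can see the difference yourself. Preallocating in a single call lets the filesystem pick the space up front (the path is just an example):

    fallocate -l 16G /i-data/sysvol/admin/testfile
    filefrag /i-data/sysvol/admin/testfile # shows in how many fragments the allocation ended up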

  • Daimonion666
    Daimonion666 Posts: 15  Freshman Member

    @Mijzelf I completely agree with you. I didn't change the backup program; I just use Windows Explorer to copy the file myself.

    e2freefrag is not available on the NAS326, but e2fsck reported something like 1-2% of fragmented data after cleaning the drive.

    I do have another NAS326 at a customer's site. Maybe I can take a look at that system to see if it behaves the same way.

  • Daimonion666
    Daimonion666 Posts: 15  Freshman Member

    Hello Again.

    I managed to take a look at the other NAS326, and it behaves completely normally. The difference from my NAS is that it does not make a JBOD out of its two hard drives. Also its drive usage is around 50%, and the write performance over the network is ~100MB/s.

    My NAS, however, never had a write performance greater than ~50MB/s, even when the drives were empty.

    So I think there are several problems here, but the main question in my opinion is: why does smbd not free its memory after hours of not using it?

    Would it be a temporary solution to kill the smbd process which uses that high amount of RAM? Not directly after copying, but some hours later?
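
    I was thinking of something like this, reading the memory use from /proc since I don't know what the busybox ps supports (the pid in the kill is just an example):

    for p in $(pidof smbd); do echo "pid $p: $(grep VmRSS /proc/$p/status)"; done
    kill 1234 # the pid of the bloated one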

  • Mijzelf
    Mijzelf Posts: 2,613  Guru Member

    the main question in my opinion is: why does smbd not free its memory after hours of not using it?

    I think the question is: why does smbd need that much memory in the first place? Smbd keeps a memory buffer to bridge the difference between network throughput and disk throughput. Network throughput is quite steady, given a powerful enough client, while disk throughput varies heavily depending on file size and other processes using the same disk. There is not much smbd can do to influence disk throughput, other than buffering, so it can read/write bigger chunks and keep the disk from idling. Network throughput can be controlled: the server can tell the client to hold its horses and lower the upload rate. But everything which is uploaded has to be written to disk, or buffered, when the disk throughput isn't high enough. You can't ask the client to upload a second time.

    So either smbd is tricked into expecting a higher disk throughput, or a lower network throughput, than is actually attained, or the client doesn't listen to upload steering feedback. The latter is unlikely, and the disks are supposed to be healthy, so maybe there is something wrong with the network? You have a wired connection, I suppose? And if not, does wiring help? If you execute 'ifconfig', does the NIC show errors?
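
    Something like this; the interface name may differ on the NAS326, plain 'ifconfig' lists them all:

    ifconfig eth0 | grep -iE 'errors|dropped|collisions'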

  • Daimonion666
    Daimonion666 Posts: 15  Freshman Member

    The wiring should be okay. Until last month I had a wired connection through a FritzBox and a dedicated switch, and for the past month I have had a complete Ubiquiti setup with two USW-Pro switches, a fiber backbone inside my house, and so on. Of course I have a wired connection.

    I agree with what you wrote: smbd's memory rises because it expects a higher disk throughput or a lower network throughput.

    Nevertheless, it should free the allocated memory once it is no longer in use.

    I did a test: I copied a 20 GB file, and the memory rose to 650M (125% of physical RAM, or something like that). During this copy the smbd process took CPU time.

    The moment the copy task finished, the corresponding smbd process went idle. After that I did a kill -9 on this process. On my Windows client everything was fine, and the file on the NAS is completely okay (I copied it back and compared the SHA-1 hash with the original).
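
    For the record, the comparison can also be done on the NAS itself, if sha1sum is in the busybox build (the paths are just examples):

    sha1sum /i-data/sysvol/admin/original.bin /i-data/sysvol/admin/copy.bin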

    Is there a way to track the log of the smbd process and see why it doesn't free the allocated memory?

  • Mijzelf
    Mijzelf Posts: 2,613  Guru Member

    Is there a way to track the log of the smbd process and see why it doesn't free the allocated memory?

    I suppose you can edit /etc/samba/smb.conf and restart samba (/etc/init.d/samba.sh restart). Be aware that any change to smb.conf won't survive a reboot or a webinterface-initiated samba configuration change.
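
    For the logging itself, something like this in the [global] section should do it (the log file location is my guess, pick any writable path):

    [global]
        log level = 3
        log file = /tmp/smbd.%m.log
        max log size = 1000

    Log level 3 already shows the individual SMB requests; 10 is full debugging, which will slow the box down further.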

    The wiring should be okay.

    Have you tried pinging for a while to see if there is packet loss? And does ifconfig on the box show no errors? That you never exceeded 50MB/sec is suspicious. You can measure the disk throughput:

    cd # should put you in the admin home directory on the data volume

    dd if=/dev/zero of=./bigfile bs=16M count=64

    This will write a 1GiB file in the admin share and tell you how long it took.
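
    Note that dd by default ends in the page cache, so for the real disk number add a flush. You can measure reading back as well, but drop the cache first, or you'll be measuring your RAM:

    dd if=/dev/zero of=./bigfile bs=16M count=64 conv=fsync
    echo 3 > /proc/sys/vm/drop_caches
    dd if=./bigfile of=/dev/null bs=16M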
