What is the reason to disable mmap by default on v1.82? #1318
Replies: 2 comments
-
Non-mmap loading used to be much slower, but nowadays it's pretty close. By not using mmap by default, I'll be able to add model unloading in the future.
-
To add to that: mmap on Windows usually works great. Other operating systems, however, had issues with it, incorrectly treating the mapped model as reserved RAM even when all layers were on the GPU. The people for whom mmap is a benefit are primarily RAM-limited Windows users and people who want to run multiple CPU instances of the same model (an extremely rare use case); everyone else is better off without it. And yes, when done properly, mmap avoids filling swap, which is great. When done improperly by the OS, the results can range from the process being task-killed (as happened to me on Linux GPU rental providers) to other things being pushed to swap.
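To make the read-vs-mmap distinction above concrete, here is a minimal Python sketch. It is not koboldcpp's actual loader (that lives in C/C++); the function names and the dummy file are hypothetical. Reading a file copies it into process-private memory, which the OS can only page out to swap. Memory-mapping it instead gives pages backed by the file itself, which the OS can drop and re-read from disk. How those mapped pages show up in "used RAM" depends on the OS's accounting, which is exactly where the reply above says non-Windows systems misbehaved.

```python
import mmap
import os
import tempfile


def load_with_read(path: str) -> bytes:
    """Copy the whole file into process-private (anonymous) memory.

    Under memory pressure the OS can only move these pages to swap;
    they are freed as soon as the returned object is released.
    """
    with open(path, "rb") as f:
        return f.read()


def load_with_mmap(path: str) -> mmap.mmap:
    """Map the file read-only instead of copying it.

    Pages are loaded lazily on first access and are backed by the file,
    so clean pages can be dropped and re-read later instead of swapped.
    """
    with open(path, "rb") as f:
        # length 0 maps the whole file; the mapping stays valid after f is closed
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


if __name__ == "__main__":
    # Hypothetical stand-in for a model file: 4 MiB of dummy "weights".
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * 1024 * 1024))
        path = tmp.name
    try:
        copied = load_with_read(path)
        mapped = load_with_mmap(path)
        assert copied == mapped[:]  # same bytes, different memory semantics
        mapped.close()  # unmap before deleting the file (required on Windows)
        del copied  # "unloading" the copied buffer just releases it
    finally:
        os.remove(path)
```

The copy-based path is the simpler one to free explicitly, which is one plausible reason a non-mmap default makes unloading easier to add, though the thread itself doesn't spell out the implementation.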
-
The question is in the title. This is not a complaint, I'm just curious. With previous versions, if I disabled mmap I would instantly get an OOM message with models larger than 15B. I only have 16 GiB RAM + 6 GiB VRAM (+ 30 GiB swap) to play with.
I could be misremembering, but the swap space never filled up. I thought it would if everything could not fit in VRAM + RAM. But I definitely remember that without mmap, koboldcpp would fail with larger models.
But now it can load EVA-Qwen2.5-32B-v0.2 just fine. It's slow, but quantkv helps. (A bit of a shame that context shift is disabled, though.) I don't know if I set something up wrong before 🤔.