feat: support mmap for model loading #1059

wbruna · 2025-12-06T19:43:54Z

Introduces a new --use-mmap flag that replaces model loading I/O operations with mmap + memcpy.

In my tests, this helps model loading speed slightly, though the gain was never higher than half a second. Its primary benefit right now is validation of the mmap backend implementation. Later, I plan to extend this to allow the mapped file to serve directly as weight storage for backends that use main memory.
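For readers unfamiliar with the pattern, here is a minimal sketch of what "mmap + memcpy" loading can look like on POSIX systems. It is not the code from this PR: the helper name `read_tensor_via_mmap` and its parameters are hypothetical, and it maps the whole file for every call purely to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not the PR's implementation): copy `size` bytes
// starting at `offset` of the file at `path` into `dst`, going through a
// read-only file mapping instead of read()/fread().
bool read_tensor_via_mmap(const std::string& path, size_t offset, size_t size, void* dst) {
    assert(dst != nullptr || size == 0);

    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) {
        return false;
    }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return false;  // stat failed, or empty file (mmap rejects length 0)
    }

    const size_t file_size = static_cast<size_t>(st.st_size);
    if (offset > file_size || size > file_size - offset) {
        close(fd);
        return false;  // requested range is outside the file
    }

    void* base = mmap(nullptr, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (base == MAP_FAILED) {
        return false;
    }

    // The "memcpy" half: pages are faulted in from the page cache on demand.
    std::memcpy(dst, static_cast<const char*>(base) + offset, size);

    munmap(base, file_size);
    return true;
}
```

A real loader would presumably map the file once and reuse the mapping for every tensor; remapping per call is only a simplification here.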

I used a non-default flag to be extra safe, but we could arguably follow llama.cpp's approach and enable mmap by default, with a --no-mmap flag to disable it instead.

I was only able to test (and build...) it under Linux, so additional testing is very welcome 🙂

Green-Sky · 2025-12-07T11:19:47Z

How much value would it be if llama.cpp exported the mmap stuff as a library?

wbruna · 2025-12-09T01:27:42Z

> How much value would it be if llama.cpp exported the mmap stuff as a library?

I don't think it'd help that much right now. The mmap part itself is more-or-less straightforward; replacing the current alloc+memcpy code with a buffer managed externally will be much trickier.

valkarias · 2025-12-10T10:15:51Z

Have you experimented with mmapping and then copying to the GPU?
In my experience, I've had to restrict mmap to CPU inference and loading only. mmap -> copy to GPU became a bottleneck for some reason (possibly related to page size?).

wbruna · 2025-12-10T12:19:38Z

> Have you experimented with mmapping and then copying to the GPU? In my experience, I've had to restrict mmap to CPU inference and loading only. mmap -> copy to GPU became a bottleneck for some reason (possibly related to page size?).

Not yet. Right now I'm just reusing the I/O buffer; adding a separate code path to deliver the mapped area directly to the backend just to avoid a memcpy sounded like too much change for too little potential gain.

That behavior you describe sounds... odd. At least on Linux, large dynamically allocated memory areas use mmap as their backend anyway, so they should behave the same. Maybe it's a difference between file-backed and anonymous mappings.
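To make the distinction concrete, here is a small sketch of the two kinds of mapping on Linux/POSIX. The function names `map_anonymous` and `map_file` are hypothetical and only illustrate the difference; whether that difference explains the GPU copy slowdown is untested.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Anonymous mapping: not tied to any file. Pages start zero-filled and are
// materialized on first touch. This is the kind of mapping an allocator can
// use internally for large allocations.
void* map_anonymous(size_t size) {
    assert(size > 0);
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// File-backed mapping: pages are filled from the page cache (or from disk)
// on first touch, so the first read of each page may involve file I/O.
void* map_file(const char* path, size_t size) {
    assert(size > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return nullptr;
    }
    void* p = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping outlives the descriptor
    return p == MAP_FAILED ? nullptr : p;
}
```

Both return ordinary pointers, so a later copy sees the same kind of address either way; what differs is where a page's contents come from the first time it is touched.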

wbruna added 2 commits December 11, 2025 17:42

feat: support mmap for model loading

89cc0ab

delay allocation until model loading time

73da9fb

wbruna force-pushed the sd_mmap_io branch from db1592e to 73da9fb Compare December 12, 2025 10:43
