Please add Devstral-Small-2-24B-Instruct-2512 too

#4
by Reverger - opened

Hi!
Nice work.
Since llama.cpp PR17889 was merged, can you please make these quants (IQ4_KSS) for it?

I wasn't sure how much demand there is for this smaller one, but IQ4_KSS would probably be a good size for the 24GB VRAM fam. I'll see if I have some time coming up during the holidays.

There seem to be some EXL3 and vLLM-type quants available too, which might hold you over. Hmm, surprised no one has released an ik quant for it yet though, especially with the new -sm graph features ik introduced recently.

@Reverger

Happy holidays 🎁 I'm about to upload a 12.069 GiB (4.398 BPW) Devstral Small 2 24B Instruct 2512 GGUF to huggingface. I did my usual imatrix treatment, but didn't measure perplexities etc.
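For anyone wanting to reproduce a similar quant themselves, the general imatrix-then-quantize flow with the ik_llama.cpp tools looks roughly like this. This is a sketch: the file names and the calibration corpus are placeholders, not the exact recipe used for this upload.

```shell
# 1. Compute an importance matrix from a calibration text corpus
#    (calibration.txt is a placeholder for whatever corpus you use):
./build/bin/llama-imatrix \
    -m Devstral-Small-2-24B-Instruct-2512-BF16.gguf \
    -f calibration.txt \
    -o imatrix.dat

# 2. Quantize to IQ4_KSS, guided by that imatrix:
./build/bin/llama-quantize --imatrix imatrix.dat \
    Devstral-Small-2-24B-Instruct-2512-BF16.gguf \
    Devstral-Small-2-24B-Instruct-2512-IQ4_KSS.gguf \
    IQ4_KSS
```

The imatrix step matters most for low-bit quants like IQ4_KSS, where per-tensor importance weighting noticeably reduces quality loss.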

fwiw it seems to be working fine for simple assistant chats, but something seems wrong with tool calling when testing with my pydantic-ai framework test code (which works fine with, say, GLM-4.7).

If you want tool calling / MCP / agent stuff working, you might need to look into some more PRs.
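One way to narrow down whether the tool-calling problem is in the model/server or in the agent framework is to bypass pydantic-ai entirely and hit the server's OpenAI-compatible `/v1/chat/completions` endpoint with a hand-built request, then inspect the raw response for a `tool_calls` entry. A minimal sketch (the served model name and the `get_weather` tool are made up for illustration):

```python
import json

def build_tool_call_request(prompt: str) -> dict:
    """Build an OpenAI-style chat completion request advertising one tool."""
    return {
        "model": "devstral-small-2",  # hypothetical served model name
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Return the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

req = build_tool_call_request("What's the weather in Oslo?")
print(json.dumps(req, indent=2))
# POST this to http://localhost:8080/v1/chat/completions (e.g. with curl or
# requests) and check whether choices[0].message contains "tool_calls" --
# if the raw server response is fine, the issue is likely in the framework's
# parsing of the chat template / tool-call format, not in the quant itself.
```

If the server never emits `tool_calls` even for a prompt that obviously needs the tool, the chat template or tool-call grammar on the server side is the likelier culprit.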

Good luck and lemme know what you find out!

@Reverger

Okay, uploaded! Have fun and would love to hear if you figure out tool calling! https://huggingface.co/ubergarm/Devstral-Small-2-24B-Instruct-2512-GGUF

Thank you for the nice gift :)

Looks like there are no pre-compiled binaries for ik_llama.cpp, so I'm looking into how to get started with the NVIDIA toolchain now.
P.S. Merry Christmas!

If you're on windows you can check out these precompiled binaries: https://github.com/Thireus/ik_llama.cpp/releases/tag/main-b4471-4c5c685

Ideally, yeah, you're on Linux and can get the dependencies installed. Holler if you get stuck! Cheers and merry xmas!
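On Linux the CUDA build is roughly the standard cmake flow, assuming the CUDA toolkit and a recent cmake are already installed (the exact cmake flag may vary with the fork's vintage; `-DGGML_CUDA=ON` follows current upstream convention):

```shell
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp

# Configure with CUDA support, then build in Release mode:
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j "$(nproc)"

# Binaries land in build/bin/ (llama-server, llama-cli, llama-quantize, ...)
```

If cmake can't find the toolkit, make sure `nvcc` is on PATH (typically via `/usr/local/cuda/bin`).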
