Please add Devstral-Small-2-24B-Instruct-2512 too
Hi!
Nice work.
Since llama.cpp PR17889 was merged, can you please make these quants (IQ4_KSS) for it?
I wasn't sure how much demand there'd be for this smaller one, but IQ4_KSS would probably be a good size for the 24GB VRAM fam. I'll see if I have some time coming up during the holidays.
There seem to be some EXL3 and vLLM-type quants available already which might hold you over. Hmm, surprised no one has released an ik quant for it yet though, especially with the new -sm graph features ik introduced recently.
Happy holidays! I'm about to upload a 12.069 GiB (4.398 BPW) Devstral Small 2 24B Instruct 2512 GGUF to huggingface. I did my usual imatrix treatment, but didn't measure perplexities etc.
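For anyone curious about the general shape of an imatrix + quantize workflow with the llama.cpp/ik_llama.cpp tools, it's roughly like this — note the file names and calibration corpus below are placeholders, not my exact recipe:

```shell
# 1. Compute an importance matrix from a calibration corpus
#    (placeholder paths; use your own full-precision GGUF and text file).
./build/bin/llama-imatrix \
    -m Devstral-Small-2-24B-Instruct-2512-BF16.gguf \
    -f calibration-corpus.txt \
    -o imatrix.dat

# 2. Quantize to IQ4_KSS (an ik_llama.cpp quant type) using that imatrix.
./build/bin/llama-quantize \
    --imatrix imatrix.dat \
    Devstral-Small-2-24B-Instruct-2512-BF16.gguf \
    Devstral-Small-2-24B-Instruct-2512-IQ4_KSS.gguf \
    IQ4_KSS
```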
FWIW it seems to work fine for simple assistant chats, but something seems wrong with tool calling when testing with my pydantic-ai framework test code (which works fine with, say, GLM-4.7).
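If you want to poke at the tool-calling path yourself, here's a minimal sketch of the kind of OpenAI-style request an agent framework sends to llama-server's /v1/chat/completions endpoint — the tool name and schema here (`get_weather`) are made up for illustration, not from my actual test code:

```python
import json

def build_tool_call_payload(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completions request with one tool defined.

    This is the request shape frameworks like pydantic-ai send; if tool
    calling is broken server-side, the response to a payload like this is
    where it shows up.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # made-up example tool
                    "description": "Get the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",
    }

payload = build_tool_call_payload(
    "Devstral-Small-2-24B-Instruct-2512",
    "What's the weather in Paris?",
)
print(json.dumps(payload, indent=2))
```

POST that to the server and check whether the response contains a proper `tool_calls` entry rather than the call leaking into plain `content` text.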
If you want tool calling / MCP / agent stuff working, you might need to look into some more PRs, e.g.:
- https://github.com/ggml-org/llama.cpp/compare/master...bartowski1182:llama.cpp:master (something bartowski mentioned a while back)
- there might be another PR on mainline regarding Devstral tool calling
Good luck and lemme know what you find out!
Okay, uploaded! Have fun and would love to hear if you figure out tool calling! https://huggingface.co/ubergarm/Devstral-Small-2-24B-Instruct-2512-GGUF
Thank you for the nice gift :)
Looks like there are no pre-compiled binaries for ik_llama.cpp, so I'm looking into how to get started with the NVIDIA toolchain now.
P.S. Merry Christmas!
If you're on windows you can check out these precompiled binaries: https://github.com/Thireus/ik_llama.cpp/releases/tag/main-b4471-4c5c685
Ideally yeah you're on Linux and can get the dependencies installed — holler if you get stuck! Cheers and merry xmas!
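On Linux the build is usually just a couple of cmake invocations. A rough sketch, assuming the CUDA toolkit and a recent cmake are already installed (the `-DGGML_CUDA=ON` flag and paths below are my assumptions — check the repo README if it differs):

```shell
# Clone and build ik_llama.cpp with CUDA support.
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j "$(nproc)"

# Then point the server at the quant, offloading layers to the GPU.
./build/bin/llama-server \
    -m Devstral-Small-2-24B-Instruct-2512-IQ4_KSS.gguf \
    -ngl 99
```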