So fast, great job!

#1
by ubergarm - opened

I just confirmed the Q8_0 runs well on the CPU-only backend, tested with both the latest mainline llama.cpp and ik_llama.cpp!

Thanks to you and the llama.cpp team for PR17889, which paved the way on this one!

I was able to run mistralai_Devstral-Small-2-24B-Instruct-2512-Q5_K_M.gguf using llama.cpp (git pull + rebuild on 15 December 2025). I use Open WebUI and I am testing the web search and knowledge features. So far it behaves well and it is fast on my RTX 3090. Very good model, thanks!
