So fast, great job!
#1 opened by ubergarm
I just confirmed the Q8_0 is running well on the CPU-only backend, tested with both latest mainline llama.cpp and ik_llama.cpp!
Thanks to you and the llama.cpp team; PR17889 paved the way on this one!
I was able to run mistralai_Devstral-Small-2-24B-Instruct-2512-Q5_K_M.gguf using llama.cpp (git pull + rebuild on 15 December 2025). I use Open WebUI and am testing the web search and knowledge features. So far it behaves well and is fast on my RTX 3090. Very good model, thanks!