MonsterMMORPG 
posted an update 3 days ago
Compared Quality and Speed Difference (with CUDA 13 & Sage Attention) of BF16 vs GGUF Q8 vs FP8 Scaled vs NVFP4 for Z Image Turbo, FLUX Dev, FLUX SRPO, FLUX Kontext, FLUX 2 - Full 4K step by step tutorial also published

Full 4K tutorial : https://youtu.be/XDzspWgnzxI

Watch the full 4K tutorial above to learn more and to see the uncompressed, original-quality images.

It has long been an open question how much quality and speed difference exists between the BF16, GGUF, FP8 Scaled, and NVFP4 precisions. In this tutorial I compare all of these precision and quantization variants for both speed and quality, and the results are quite surprising. We have also developed and published an NVFP4 model quant generator app and an FP8 Scaled quant generator app; the links to the apps are below if you want to use them. Furthermore, upgrading ComfyUI to CUDA 13 with properly compiled libraries is now strongly recommended: we observed noticeable performance gains with CUDA 13, so for both SwarmUI users and standalone ComfyUI users, a CUDA 13 ComfyUI build is the recommended setup.
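For readers curious what "scaled" quantization means in practice, here is a minimal pure-Python sketch of the core idea behind NVFP4: store tiny low-precision values together with a higher-precision scale chosen so the largest weight maps to the format's maximum representable magnitude (6 for FP4 E2M1, 448 for FP8 E4M3). This is my own illustration under simplified assumptions, not the code of the published quant generator apps; in particular, real NVFP4 uses per-block FP8 scales rather than the single per-tensor scale shown here.

```python
# Magnitudes representable by a 4-bit E2M1 float (the NVFP4 element format).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def round_e2m1(x: float) -> float:
    """Round x to the nearest representable E2M1 value, preserving sign."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return mag if x >= 0 else -mag

def quantize_nvfp4(weights: list[float]) -> tuple[list[float], float]:
    """Per-tensor scaled FP4 quantization (simplified: real NVFP4 keeps
    one FP8 scale per small block of elements, which reduces error)."""
    amax = max(abs(w) for w in weights) or 1.0
    scale = amax / 6.0  # map the largest |weight| onto the E2M1 maximum
    return [round_e2m1(w / scale) for w in weights], scale

def dequantize(q: list[float], scale: float) -> list[float]:
    """Recover approximate weights by multiplying back by the scale."""
    return [v * scale for v in q]

w = [0.03, -1.2, 0.7, 2.4]
q, s = quantize_nvfp4(w)
print(q, s)             # small grid values plus one shared scale
print(dequantize(q, s)) # close to the original weights, with 4-bit error
```

The same scale-then-cast pattern underlies FP8 Scaled checkpoints, just with a much denser 8-bit grid, which is why the quality gap between FP8 and 4-bit formats in the comparison comes down to rounding error on this grid.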



Incredible


Thanks a lot for the comment!

While there's a clear quality loss, it's not that bad considering it's 4-bit. Too bad it's Blackwell-only.

Maybe next gen we get NVFP2 and then NVBitNet 1.58bit for the subsequent one.


There is also NVFP8.

If it brings the same speed-up, I think it will be the best option.

But I haven't seen any models yet, and as far as I know ComfyUI doesn't support it yet.