Voxtral Mini 4B Realtime - 8-bit MLX

This is an 8-bit quantized MLX version of Voxtral Mini 4B Realtime by Mistral AI, converted using voxmlx.

This version was created for use with Supervoxtral, enabling low-latency realtime transcription on macOS.

Model Details

Description

Voxtral Mini is a speech-to-text model that supports 13+ languages with sub-500ms latency. This version has been quantized to 8-bit precision for efficient inference on Apple Silicon using the MLX framework.
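For intuition about what "quantized to 8-bit precision" means here, below is a minimal NumPy sketch of group-wise affine 8-bit quantization, the general scheme MLX's quantized layers follow (per-group scale and offset). The group size and function names are illustrative only, not the actual voxmlx conversion code.

```python
import numpy as np

def quantize_8bit(w, group_size=64):
    """Group-wise affine 8-bit quantization: w ≈ q * scale + lo per group.
    Illustrative sketch; not the voxmlx implementation."""
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)          # per-group offset
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / 255.0                       # map group range onto 0..255
    q = np.round((groups - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_8bit(q, scale, lo, shape):
    """Reconstruct approximate float weights from the 8-bit codes."""
    return (q.astype(np.float32) * scale + lo).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float32)
q, scale, lo = quantize_8bit(w)
w_hat = dequantize_8bit(q, scale, lo, w.shape)
max_err = float(np.abs(w - w_hat).max())            # bounded by ~scale / 2
```

Each weight is stored as one byte plus a small per-group scale/offset, which is why 8-bit checkpoints run faster and use roughly a quarter of the memory of float32 while keeping the reconstruction error within half a quantization step.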

Credits

- Original model: Voxtral Mini 4B Realtime by Mistral AI
- 8-bit MLX conversion: ellamind/Voxtral-Mini-4B-Realtime-8bit-mlx