---
language:
- en
- zh
- de
- es
- ru
- ko
- fr
- ja
- pt
- tr
- pl
- ca
- nl
- ar
- sv
- it
- id
- hi
- fi
- vi
- he
- uk
- el
- ms
- cs
- ro
- da
- hu
- ta
- 'no'
- th
- ur
- hr
- bg
- lt
- la
- mi
- ml
- cy
- sk
- te
- fa
- lv
- bn
- sr
- az
- sl
- kn
- et
- mk
- br
- eu
- is
- hy
- ne
- mn
- bs
- kk
- sq
- sw
- gl
- mr
- pa
- si
- km
- sn
- yo
- so
- af
- oc
- ka
- be
- tg
- sd
- gu
- am
- yi
- lo
- uz
- fo
- ht
- ps
- tk
- nn
- mt
- sa
- lb
- my
- bo
- tl
- mg
- as
- tt
- haw
- ln
- ha
- ba
- jw
- su
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
- open4bits
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
model-index:
- name: whisper-base
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: LibriSpeech (clean)
      type: librispeech_asr
      config: clean
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 5.008769117619326
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: LibriSpeech (other)
      type: librispeech_asr
      config: other
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 12.84936273212057
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      config: hi
      split: test
      args:
        language: hi
    metrics:
    - name: Test WER
      type: wer
      value: 131
pipeline_tag: automatic-speech-recognition
license: apache-2.0
base_model:
- openai/whisper-base
---

# Open4bits / Whisper Base FP16

This repository provides the **Whisper Base model converted to FP16 (float16) precision**, published by Open4bits to enable more efficient inference while maintaining transcription quality. The underlying Whisper model and architecture are **owned by OpenAI**.
This repository contains only a precision-converted version of the original model weights. The model is designed for multilingual speech-to-text tasks and can be used in research, experimentation, and production ASR pipelines.

---

## Model Overview

Whisper is a sequence-to-sequence transformer model developed by OpenAI for automatic speech recognition and speech translation. This release uses the **Base** variant and preserves the original architecture while reducing memory usage through FP16 precision.

---

## Model Details

- **Architecture:** Whisper Base
- **Parameters:** ~74 million
- **Precision:** float16 (FP16)
- **Task:** Automatic Speech Recognition (ASR)
- **Languages:** Multilingual
- **Weight tying:** Preserved
- **Compatibility:** Hugging Face Transformers, PyTorch

This conversion roughly halves the weight memory footprint relative to FP32 and can speed up inference on GPUs with native FP16 support, making the model suitable for deployment on consumer and server-grade GPUs.

---

## Intended Use

This model is intended for:

- Speech-to-text transcription
- Multilingual ASR applications
- Research and benchmarking
- Efficient inference in low-memory environments

---

## Limitations

* Performance depends on audio quality, language, and accent
* Inherits known limitations of the Whisper Base architecture
* Not fine-tuned for domain-specific or highly noisy audio

---

## License

This model is released under the **Apache License 2.0**. The original Whisper model and associated intellectual property are owned by OpenAI.

---

## Support

If you find this model useful, please consider supporting the project. Your support helps us continue releasing and maintaining high-quality open models.

Support us with a heart.
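---

## Usage

A minimal transcription sketch using the Hugging Face Transformers `pipeline` API. The model id below is a stand-in (`openai/whisper-base`, the base model listed above), since this repository's exact id is not restated here; point it at the Open4bits FP16 checkpoint to load the half-precision weights directly.

```python
import torch
from transformers import pipeline

# Use FP16 on GPU; fall back to FP32 on CPU, where half precision is
# typically slow or unsupported.
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base",  # stand-in id; replace with this repo's id
    torch_dtype=dtype,
    device=0 if torch.cuda.is_available() else -1,
)

# Transcribe one of the LibriSpeech samples from the widget above.
result = asr("https://cdn-media.huggingface.co/speech_samples/sample1.flac")
print(result["text"])
```

On GPU, loading in FP16 halves the VRAM needed to hold the weights; on CPU the snippet falls back to FP32.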
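As a quick sanity check on the FP16 savings noted under Model Details, the weight footprint can be estimated with simple arithmetic (a sketch; the ~74 M parameter count is the figure quoted above, and real checkpoints add small overheads this ignores):

```python
# Back-of-the-envelope weight-memory estimate for Whisper Base.
NUM_PARAMS = 74_000_000         # ~74 million parameters (approximate)
fp32_mb = NUM_PARAMS * 4 / 1e6  # float32: 4 bytes per parameter
fp16_mb = NUM_PARAMS * 2 / 1e6  # float16: 2 bytes per parameter

print(f"FP32 weights: ~{fp32_mb:.0f} MB")  # ~296 MB
print(f"FP16 weights: ~{fp16_mb:.0f} MB")  # ~148 MB
```

The halved footprint is what makes the model practical in the low-memory environments listed under Intended Use.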