Mistral 7B AWQ 4-bit (RAG-Optimized)

Research-grade AWQ 4-bit quantization of Mistral 7B, calibrated on a custom RAG-formatted dataset.

Model Details

  • Base Model: mistralai/Mistral-7B-v0.1
  • Quantization Method: AWQ (Activation-aware Weight Quantization)
  • Precision: 4-bit
  • Calibration: Custom RAG-formatted dataset (128 samples)
  • Quantization Time: 51.75 minutes
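As a rough back-of-envelope illustration of what 4-bit precision buys (the figures below are approximate arithmetic, not measured on-disk sizes; group scales, zero points, and embeddings add overhead):

```python
# Approximate weight-memory estimate for a ~7B-parameter model.
params = 7.24e9                    # Mistral 7B parameter count (approx.)
fp16_gb = params * 2 / 1024**3     # FP16: 2 bytes per weight
awq4_gb = params * 0.5 / 1024**3   # 4-bit: 0.5 bytes per weight
print(f"FP16: ~{fp16_gb:.1f} GB, AWQ 4-bit: ~{awq4_gb:.1f} GB")
```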

Calibration Dataset

The model was quantized using a custom RAG-specific calibration dataset with the following distribution:

  • Short context (32 samples, 256-512 tokens): Single document retrieval
  • Medium context (40 samples, 1024-2048 tokens): 2-3 document comparison
  • Long context (32 samples, 3072-4096 tokens): 5-7 document synthesis
  • Multi-hop (24 samples): 8-10 documents requiring complex reasoning

Data sources: SQuAD v2, HotpotQA, Wikipedia
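The mix above can be restated as a small configuration sketch (illustrative only; the dictionary layout and bucket names are not an official config format from this repository):

```python
# Calibration-mix summary: samples per bucket, token budget, documents per prompt.
CALIBRATION_MIX = {
    "short":     {"samples": 32, "tokens": (256, 512),   "docs": (1, 1)},
    "medium":    {"samples": 40, "tokens": (1024, 2048), "docs": (2, 3)},
    "long":      {"samples": 32, "tokens": (3072, 4096), "docs": (5, 7)},
    "multi_hop": {"samples": 24, "tokens": None,         "docs": (8, 10)},
}

total = sum(bucket["samples"] for bucket in CALIBRATION_MIX.values())
print(total)  # 128 samples in total
```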

Usage

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Load the quantized model and tokenizer; fuse_layers enables fused kernels for faster inference.
model = AutoAWQForCausalLM.from_quantized("zahraase1im/mistral-7b-awq-4bit-rag", fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained("zahraase1im/mistral-7b-awq-4bit-rag")

# RAG-formatted prompt
prompt = """[QUERY]: What is the capital of France?

[RETRIEVED DOCUMENTS]:
Document 1: France is a country in Western Europe. Its capital is Paris...

[ANSWER]:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
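For multi-document prompts, a small helper can build the [QUERY]/[RETRIEVED DOCUMENTS]/[ANSWER] template shown above (this helper is illustrative and not part of the released package):

```python
def format_rag_prompt(query, documents):
    """Format a query and retrieved passages into the RAG prompt template."""
    docs = "\n".join(f"Document {i+1}: {d}" for i, d in enumerate(documents))
    return f"[QUERY]: {query}\n\n[RETRIEVED DOCUMENTS]:\n{docs}\n\n[ANSWER]:"

prompt = format_rag_prompt(
    "What is the capital of France?",
    ["France is a country in Western Europe. Its capital is Paris..."],
)
print(prompt)
```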

Research Context

This model is part of a comprehensive study on quantization methods for RAG systems. The calibration dataset is specifically designed to represent realistic RAG workloads with varying context lengths and complexity.

For more details, see the thesis: "Optimizing Large Language Model Quantization for Retrieval-Augmented Generation"

Citation

If you use this model in your research, please cite:

@misc{mistral7b-awq-rag,
  author = {zahraase1im},
  title = {Mistral 7B AWQ 4-bit RAG-Optimized},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/zahraase1im/mistral-7b-awq-4bit-rag}}
}