# Llama-3-8B-CLARA-MeD-QLoRA 🏥🇪🇸

- Developed by: Jordiett
- Task: Biomedical Text Simplification (Spanish)
- Base Model: unsloth/llama-3-8b-Instruct-bnb-4bit
- Dataset: CLARA-MeD (3,800 pairs of parallel medical texts)
## Model Description
This model is a fine-tuned version of Llama-3-8B-Instruct, optimized for simplifying complex medical texts into plain Spanish that patients can understand.
It was trained with Unsloth and QLoRA (4-bit quantization) on the CLARA-MeD dataset. The model outperforms baseline translation models (such as NLLB) and previous-generation LLMs (Llama-2) on simplification metrics, most notably achieving a high SARI score.
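QLoRA freezes the quantized base weights and trains only a pair of small low-rank matrices per adapted layer; their scaled product is added to the frozen weight at inference time. A minimal pure-Python sketch of that low-rank update (toy 4x4 matrices and illustrative values, no quantization):

```python
# Toy illustration of a LoRA update: W_eff = W + (alpha / r) * B @ A
# Real QLoRA applies this on top of 4-bit frozen weights; here W is a plain matrix.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_effective_weight(W, A, B, alpha=16, r=2):
    """Frozen weight W plus the scaled low-rank update B @ A (B: d x r, A: r x d)."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 4x4 frozen weight, rank-2 adapters: 2 * d * r = 16 trainable values
# instead of the d * d = 16-value full matrix (the savings grow with d).
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
B = [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0], [0.0, 0.0]]  # d x r
A = [[0.0, 0.0, 0.0, 0.1], [0.0, 0.0, 0.1, 0.0]]      # r x d

W_eff = lora_effective_weight(W, A, B)
```

Only `A` and `B` receive gradients during fine-tuning; the 4-bit base weights stay untouched, which is what makes training an 8B model feasible on a single consumer GPU.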
## Key Features
- Domain: Clinical/Biomedical.
- Language: Spanish.
- Method: QLoRA (Quantized Low-Rank Adaptation) + Unsloth.
- Optimization: Inference parameters selected via Grid Search (Best: Greedy Decoding, Temp=0.0).
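The grid search over inference parameters can be sketched as follows. Note that `score_fn`, the candidate grid, and the scoring values are hypothetical stand-ins for "generate on a dev set with these parameters and compute SARI":

```python
import itertools

# Hypothetical stand-in for: generate dev-set outputs with these decoding
# parameters and return the SARI score. Toy function that peaks at greedy decoding.
def score_fn(temperature, top_p):
    return 40.0 - 10.0 * temperature - 2.0 * (1.0 - top_p)

# Candidate decoding configurations (illustrative values).
grid = {
    "temperature": [0.0, 0.3, 0.7, 1.0],
    "top_p": [0.9, 0.95, 1.0],
}

best_params, best_score = None, float("-inf")
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    s = score_fn(**params)
    if s > best_score:
        best_params, best_score = params, s

print(best_params)  # {'temperature': 0.0, 'top_p': 1.0}
```

With a real model in the loop, each grid point costs a full dev-set generation pass, which is why the grid is kept small.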
## Performance 📊
The model was evaluated on the CLARA-MeD test set (10% split).
| Metric | Score | Description |
|---|---|---|
| SARI | 39.92 | Main simplification metric (Keep/Add/Del). |
| BLEU | 22.97 | N-gram precision against reference. |
| COMET | ~0.76 | Semantic similarity. |
| ROUGE-L | 0.44 | Recall-based metric (Longest Common Subsequence). |
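ROUGE-L in the table above is based on the longest common subsequence (LCS) between prediction and reference. A minimal token-level sketch, using the qualitative example from this card (the reported scores were computed with standard evaluation tooling, not this snippet):

```python
def lcs_length(a, b):
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def rouge_l_f1(prediction, reference):
    """ROUGE-L F1 over whitespace-separated tokens."""
    p, r = prediction.split(), reference.split()
    lcs = lcs_length(p, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(p), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

pred = "infección de la uña del dedo del pie"
ref = "infección por hongos de la uña del pie"
score = rouge_l_f1(pred, ref)  # LCS = 6 tokens out of 8 on each side
```

SARI additionally compares against the *source* sentence to reward kept, added, and deleted n-grams separately, which is why it is the preferred headline metric for simplification.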
## Qualitative Example
The model handles medical terminology accurately, avoiding the hallucinations common in zero-shot baselines.
| Type | Text |
|---|---|
| Original (Input) | "onicosimicotica y perionixis" |
| Reference (Gold) | "infección por hongos de la uña del pie" |
| Model Prediction | "infección de la uña del dedo del pie" |
## How to Use 💻
To use this model you need `unsloth`, which makes inference roughly 2x faster and uses about 60% less memory.
```python
# 1. Install Unsloth
# !pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
# !pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

from unsloth import FastLanguageModel
import torch

# 2. Load Model & Tokenizer
model_name = "Jordiett/llama3-8b-claramed-qlora"
max_seq_length = 512
dtype = None          # auto-detect (float16 / bfloat16)
load_in_4bit = True   # 4-bit quantization (QLoRA)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)  # switch to optimized inference mode

# 3. Define the Prompt (Alpaca Style)
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Actúa como un doctor experto. Simplifica el siguiente texto médico técnico al español claro para un paciente.

### Input:
{}

### Response:
"""

# 4. Run Inference
text_to_simplify = "El paciente presenta cefalea tensional crónica y odinofagia."  # Example

inputs = tokenizer(
    [alpaca_prompt.format(text_to_simplify)],
    return_tensors = "pt",
).to("cuda")

# Greedy decoding (the best configuration found in the grid search)
outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True, do_sample = False)
result = tokenizer.batch_decode(outputs, skip_special_tokens = True)
print(result[0].split("### Response:")[-1].strip())
# Expected output: "El paciente tiene dolor de cabeza constante y dolor al tragar."
```