# gemma-4-E4B: Opus Reasoning + Claude Code LoRA
LoRA adapters trained on Claude Opus 4.6 reasoning traces and Claude Code tool-use patterns, applied on top of `deadbydawn101/gemma-4-E4B-mlx-4bit` to give Gemma 4 a reasoning-heavy, structured assistant style.
What this means: these adapters teach the model to think before answering, using `<think>` tags for chain-of-thought, multi-step reasoning, and tool-invocation patterns extracted from real Claude Code sessions.
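Since completions carry their reasoning inside `<think>` tags, downstream code usually wants to separate that reasoning from the final answer. A minimal sketch (the helper name is ours, not part of the adapter):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completion into its <think> reasoning and the final answer."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer

raw = "<think>2 + 2 = 4, double-check: yes.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```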
## What's in this LoRA
| Source | Examples | Description |
|---|---|---|
| Crownelius/Opus-4.6-Reasoning-2100x-formatted | 2,054 | Claude Opus 4.6 reasoning traces formatted with `<think>` tags |
| Claude Code tool-use patterns | 140 files | Real Claude Code agentic patterns: file read/write, bash, search loops |
| Total | 2,163 | SFT dataset: assistant completions only (`--train-on-completions`) |
Training on completions only means the model learns the response style without memorizing specific facts; it generalizes to new prompts.
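Completion-only training is typically implemented by masking prompt tokens out of the loss. A toy illustration of that masking (the `-100` ignore index is a common convention, not a confirmed detail of MLX's `--train-on-completions` internals):

```python
IGNORE_INDEX = -100  # label value that conventionally contributes no loss

def build_labels(prompt_ids, completion_ids):
    """Concatenate prompt + completion, masking the prompt out of the loss."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels

# Prompt tokens get -100, so gradients flow only through the completion
input_ids, labels = build_labels([101, 2003, 42], [7, 8, 9])
print(labels)  # [-100, -100, -100, 7, 8, 9]
```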
## Adapter Details
| Property | Value |
|---|---|
| Base model | deadbydawn101/gemma-4-E4B-mlx-4bit |
| Adapter type | LoRA (MLX SFT) |
| File size | 658.8 MB |
| Rank | 8 |
| Alpha | 16.0 |
| Dropout | 0.0 |
| Trainable params | 325M / 7,993M total (4.07%) |
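For context, a LoRA adapter leaves the base weight W frozen and learns a low-rank update scaled by alpha/rank, so the effective weight is W + (alpha/rank)·B·A. A toy pure-Python illustration using the rank and alpha from the table (the 2×2 matrices and rank-1 factors are made up for readability):

```python
RANK, ALPHA = 8, 16.0   # values from the adapter config above
SCALE = ALPHA / RANK    # 2.0: LoRA scales the learned update by alpha/rank

def matmul(X, Y):
    """Plain-Python matrix multiply for the toy example."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

# Toy 2x2 base weight with a rank-1 update (real Gemma layers are far larger)
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
A = [[0.1, 0.2]]               # down-projection factor (trainable)
B = [[0.0], [0.5]]             # up-projection factor (trainable)

delta = [[SCALE * v for v in row] for row in matmul(B, A)]
W_eff = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta)]
print(W_eff)  # [[1.0, 0.0], [0.1, 1.2]]
```

Only A and B are trained, which is why just 325M of the 7,993M parameters (4.07%) are trainable.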
## Training Config
| Setting | Value |
|---|---|
| Iterations | 1,000 |
| Batch size | 2, grad accum ×4 (effective batch 8) |
| Learning rate | 1e-5 |
| Max seq length | 2,048 |
| Peak GPU memory | 7.876 GB |
| Hardware | Apple M4 Max 128GB |
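From these numbers you can back out roughly how much data the run saw: 1,000 iterations at an effective batch of 8 is 8,000 examples, about 3.7 passes over the 2,163-example dataset. A quick check:

```python
iterations = 1_000
effective_batch = 2 * 4     # batch size x gradient accumulation
dataset_size = 2_163        # total SFT examples from the table above

examples_seen = iterations * effective_batch
epochs = examples_seen / dataset_size
print(examples_seen, round(epochs, 1))  # 8000 3.7
```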
## Training Curve
Loss collapsed fast; the reasoning patterns absorbed cleanly:

```
Iter   10 → 2.277
Iter   20 → 0.097      ← rapid style acquisition
Iter   50 → 0.00063
Iter  100 → 0.0000398
Iter  200 → 0.0000067  (checkpoint saved)
Iter 1000 → ~3.5e-7    (final)
```
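A quick sanity check on the "collapsed fast" claim, using the logged values above: between the first two logged points the loss already fell by more than 20×.

```python
# Logged loss values from the training curve above
curve = {10: 2.277, 20: 0.097, 50: 0.00063, 100: 0.0000398, 200: 0.0000067}

first_drop = curve[10] / curve[20]
print(round(first_drop, 1))  # 23.5
```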
## Quickstart (MLX)

Install the runtime, then load the base model with the adapters:

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load base model with LoRA adapters
model, tokenizer = load(
    "deadbydawn101/gemma-4-E4B-mlx-4bit",
    adapter_path="deadbydawn101/gemma-4-E4B-opus-reasoning-claude-code-lora",
)

messages = [{"role": "user", "content": "Solve this step by step: A train leaves Chicago at 60mph..."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
response = generate(model, tokenizer, prompt=prompt, max_tokens=1024, verbose=True)
```

### CLI

```shell
mlx_lm.generate \
  --model deadbydawn101/gemma-4-E4B-mlx-4bit \
  --adapter-path deadbydawn101/gemma-4-E4B-opus-reasoning-claude-code-lora \
  --prompt "Write a Python function to find prime numbers and explain your reasoning." \
  --max-tokens 1024
```
## Intended Use
Best for prompts where you want the model to:
- Think step by step before responding
- Handle multi-step problems (math, logic, code debugging)
- Follow agentic tool-use patterns (read → reason → act → verify)
- Produce well-structured, deliberate completions
Not ideal for:
- Short creative tasks (adds reasoning overhead)
- Casual chitchat
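The read → reason → act → verify loop above can be sketched as a minimal agent step. Here `run_model` is a hypothetical stub standing in for a real call to the adapted model:

```python
def run_model(prompt: str) -> str:
    """Hypothetical stub standing in for a call to the adapted model."""
    return "<think>File is empty, so write the header first.</think>ACTION: write header"

def agent_step(observation: str) -> str:
    """One read -> reason -> act step: feed an observation, extract the action."""
    reply = run_model(f"Observation: {observation}")
    # Act on the text that follows the reasoning block
    return reply.split("</think>", 1)[-1].strip()

print(agent_step("read README.md -> (empty)"))  # ACTION: write header
```

A real harness would execute the returned action (file edit, bash command) and feed the result back in as the next observation, closing the verify loop.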
## Files

| File | Description |
|---|---|
| `adapters.safetensors` | LoRA weights (658.8 MB) |
| `adapter_config.json` | Config: rank=8, alpha=16, dropout=0.0 |
## TurboQuant-MLX Compatibility

Works alongside TurboQuant-MLX: combine LoRA fine-tuning with 4.6x KV cache compression for long-context reasoning with Claude-style behavior.
## Related Models
| Model | Size | Description |
|---|---|---|
| deadbydawn101/gemma-4-E4B-mlx-4bit | 4.86 GB | Base model (load this first) |
| deadbydawn101/gemma-4-E2B-Heretic-Uncensored-mlx-4bit | 3.34 GB | 2B uncensored abliterated variant |
Trained and released by deadbydawn101 · RavenX AI
Upstream base model: google/gemma-4-E4B-it