# gemma-4-E4B: Opus Reasoning + Claude Code LoRA
LoRA adapters trained on Claude Opus 4.6 reasoning traces and Claude Code tool-use patterns, applied on top of `deadbydawn101/gemma-4-E4B-mlx-4bit` to give Gemma 4 a reasoning-heavy, structured assistant style.
What this means: these adapters teach the model to think before answering, using `<think>` tags for chain-of-thought, multi-step reasoning, and tool-invocation patterns extracted from real Claude Code sessions.
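Since completions carry their reasoning inside `<think>` tags, downstream code usually wants to separate that reasoning from the final answer. A minimal sketch (the helper name is ours, not part of the adapter):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completion into its <think> reasoning and the final answer."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer

raw = "<think>2 + 2 = 4, double-check: yes.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```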
## What's in this LoRA
| Source | Examples | Description |
|---|---|---|
| Crownelius/Opus-4.6-Reasoning-2100x-formatted | 2,054 | Claude Opus 4.6 reasoning traces formatted with `<think>` tags |
| Claude Code tool-use patterns | 140 files | Real Claude Code agentic patterns: file read/write, bash, search loops |
| Total | 2,163 | SFT dataset: assistant completions only (`--train-on-completions`) |
Training on completions only means the model learns the response style without memorizing specific facts; it generalizes to new prompts.
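Completion-only training is typically implemented by masking prompt tokens out of the loss. A toy illustration of that masking (the `-100` ignore index is a common convention, not a confirmed detail of MLX's `--train-on-completions` internals):

```python
IGNORE_INDEX = -100  # label value that conventionally contributes no loss

def build_labels(prompt_ids, completion_ids):
    """Concatenate prompt + completion, masking the prompt out of the loss."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels

# Prompt tokens get -100, so gradients flow only through the completion
input_ids, labels = build_labels([101, 2003, 42], [7, 8, 9])
print(labels)  # [-100, -100, -100, 7, 8, 9]
```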
## Adapter Details
| Property | Value |
|---|---|
| Base model | deadbydawn101/gemma-4-E4B-mlx-4bit |
| Adapter type | LoRA (MLX SFT) |
| File size | 658.8 MB |
| Rank | 8 |
| Alpha | 16.0 |
| Dropout | 0.0 |
| Trainable params | 325M / 7,993M total (4.07%) |
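For context, a LoRA adapter leaves the base weight W frozen and learns a low-rank update scaled by alpha/rank, so the effective weight is W + (alpha/rank)·B·A. A toy pure-Python illustration using the rank and alpha from the table (the 2×2 matrices and rank-1 factors are made up for readability):

```python
RANK, ALPHA = 8, 16.0   # values from the adapter config above
SCALE = ALPHA / RANK    # 2.0: LoRA scales the learned update by alpha/rank

def matmul(X, Y):
    """Plain-Python matrix multiply for the toy example."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

# Toy 2x2 base weight with a rank-1 update (real Gemma layers are far larger)
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
A = [[0.1, 0.2]]               # down-projection factor (trainable)
B = [[0.0], [0.5]]             # up-projection factor (trainable)

delta = [[SCALE * v for v in row] for row in matmul(B, A)]
W_eff = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta)]
print(W_eff)  # [[1.0, 0.0], [0.1, 1.2]]
```

Only A and B are trained, which is why just 325M of the 7,993M parameters (4.07%) are trainable.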
## Training Config
| Setting | Value |
|---|---|
| Iterations | 1,000 |
| Batch size | 2, grad accum ×4 (effective batch 8) |
| Learning rate | 1e-5 |
| Max seq length | 2,048 |
| Peak GPU memory | 7.876 GB |
| Hardware | Apple M4 Max 128GB |
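From these numbers you can back out roughly how much data the run saw: 1,000 iterations at an effective batch of 8 is 8,000 examples, about 3.7 passes over the 2,163-example dataset. A quick check:

```python
iterations = 1_000
effective_batch = 2 * 4     # batch size x gradient accumulation
dataset_size = 2_163        # total SFT examples from the table above

examples_seen = iterations * effective_batch
epochs = examples_seen / dataset_size
print(examples_seen, round(epochs, 1))  # 8000 3.7
```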
## Training Curve
Loss collapsed fast; the reasoning patterns absorbed cleanly:

```
Iter   10 → 2.277
Iter   20 → 0.097      ← rapid style acquisition
Iter   50 → 0.00063
Iter  100 → 0.0000398
Iter  200 → 0.0000067  (checkpoint saved)
Iter 1000 → ~3.5e-7    (final)
```
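A quick sanity check on the "collapsed fast" claim, using the logged values above: between the first two logged points the loss already fell by more than 20×.

```python
# Logged loss values from the training curve above
curve = {10: 2.277, 20: 0.097, 50: 0.00063, 100: 0.0000398, 200: 0.0000067}

first_drop = curve[10] / curve[20]
print(round(first_drop, 1))  # 23.5
```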
## Quickstart (MLX)

Install the runtime, then load the base model with the adapters:

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load base model with LoRA adapters
model, tokenizer = load(
    "deadbydawn101/gemma-4-E4B-mlx-4bit",
    adapter_path="deadbydawn101/gemma-4-E4B-opus-reasoning-claude-code-lora",
)

messages = [{"role": "user", "content": "Solve this step by step: A train leaves Chicago at 60mph..."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
response = generate(model, tokenizer, prompt=prompt, max_tokens=1024, verbose=True)
```

### CLI

```shell
mlx_lm.generate \
  --model deadbydawn101/gemma-4-E4B-mlx-4bit \
  --adapter-path deadbydawn101/gemma-4-E4B-opus-reasoning-claude-code-lora \
  --prompt "Write a Python function to find prime numbers and explain your reasoning." \
  --max-tokens 1024
```
## Intended Use
Best for prompts where you want the model to:
- Think step by step before responding
- Handle multi-step problems (math, logic, code debugging)
- Follow agentic tool-use patterns (read → reason → act → verify)
- Produce well-structured, deliberate completions
Not ideal for:
- Short creative tasks (adds reasoning overhead)
- Casual chitchat
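The read → reason → act → verify loop above can be sketched as a minimal agent step. Here `run_model` is a hypothetical stub standing in for a real call to the adapted model:

```python
def run_model(prompt: str) -> str:
    """Hypothetical stub standing in for a call to the adapted model."""
    return "<think>File is empty, so write the header first.</think>ACTION: write header"

def agent_step(observation: str) -> str:
    """One read -> reason -> act step: feed an observation, extract the action."""
    reply = run_model(f"Observation: {observation}")
    # Act on the text that follows the reasoning block
    return reply.split("</think>", 1)[-1].strip()

print(agent_step("read README.md -> (empty)"))  # ACTION: write header
```

A real harness would execute the returned action (file edit, bash command) and feed the result back in as the next observation, closing the verify loop.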
## Files

| File | Description |
|---|---|
| `adapters.safetensors` | LoRA weights (658.8 MB) |
| `adapter_config.json` | Config: rank=8, alpha=16, dropout=0.0 |
## TurboQuant-MLX Compatibility

Works alongside TurboQuant-MLX: combine LoRA fine-tuning with 4.6x KV cache compression for long-context reasoning with Claude-style behavior.
## Related Models
| Model | Size | Description |
|---|---|---|
| deadbydawn101/gemma-4-E4B-mlx-4bit | 4.86 GB | Base model (load this first) |
| deadbydawn101/gemma-4-E2B-Heretic-Uncensored-mlx-4bit | 3.34 GB | 2B uncensored abliterated variant |
Trained and released by deadbydawn101 · RavenX AI
Upstream base model: google/gemma-4-E4B-it