gemma-4-E4B: Opus Reasoning + Claude Code LoRA

🧠 Opus Reasoning + Claude Code LoRA

LoRA adapters trained on Claude Opus 4.6 reasoning traces and Claude Code tool-use patterns, applied on top of deadbydawn101/gemma-4-E4B-mlx-4bit to give Gemma 4 a reasoning-heavy, structured assistant style.

What this means: these adapters teach the model to think before answering, using <think> tags for chain-of-thought, multi-step reasoning, and tool-invocation patterns extracted from real Claude Code sessions.

What's in this LoRA

Source                                        | Examples  | Description
Crownelius/Opus-4.6-Reasoning-2100x-formatted | 2,054     | Claude Opus 4.6 reasoning traces formatted with <think> tags
Claude Code tool-use patterns                 | 140 files | Real Claude Code agentic patterns: file read/write, bash, search loops
Total                                         | 2,163     | SFT dataset: assistant completions only (--train-on-completions)

Training on completions only means the loss is computed on assistant responses rather than prompts: the model learns the response style without memorizing specific prompt content, so it generalizes to new prompts.
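Mechanically, completion-only training masks the loss over prompt tokens so gradients flow only through assistant tokens. A minimal sketch of the masking idea (illustrative only, not the MLX implementation):

```python
def completion_loss_mask(prompt_len: int, total_len: int) -> list[int]:
    """1 where loss is computed (assistant tokens), 0 where it is masked."""
    return [0] * prompt_len + [1] * (total_len - prompt_len)

# A 5-token prompt followed by a 3-token assistant completion:
mask = completion_loss_mask(prompt_len=5, total_len=8)
print(mask)  # [0, 0, 0, 0, 0, 1, 1, 1]
```

Only the three trailing positions contribute to the loss, which is what `--train-on-completions` arranges for every example.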

Adapter Details

Property         | Value
Base model       | deadbydawn101/gemma-4-E4B-mlx-4bit
Adapter type     | LoRA (MLX SFT)
File size        | 658.8 MB
Rank             | 8
Alpha            | 16.0
Dropout          | 0.0
Trainable params | 325M of 7,993M total (4.07%)
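The trainable fraction in the table follows directly from the parameter counts:

```python
trainable = 325e6   # LoRA adapter parameters
total = 7_993e6     # total model parameters
pct = 100 * trainable / total
print(f"{pct:.2f}%")  # 4.07%
```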

Training Config

Setting         | Value
Iterations      | 1,000
Batch size      | 2, with gradient accumulation ×4 (effective batch 8)
Learning rate   | 1e-5
Max seq length  | 2,048
Peak GPU memory | 7.876 GB
Hardware        | Apple M4 Max, 128 GB
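The effective batch size of 8 comes from accumulating gradients over 4 micro-batches of 2 before each optimizer step. A toy sketch of that schedule (illustrative only, not the actual MLX training script):

```python
MICRO_BATCH = 2
ACCUM_STEPS = 4

def train(examples):
    """Count optimizer updates under micro-batch 2 + 4x accumulation."""
    accumulated = 0.0
    updates = 0
    batches = [examples[i:i + MICRO_BATCH]
               for i in range(0, len(examples), MICRO_BATCH)]
    for step, batch in enumerate(batches, start=1):
        grad = sum(batch) / len(batch)      # stand-in for a backward pass
        accumulated += grad / ACCUM_STEPS   # scale so the update averages over 8
        if step % ACCUM_STEPS == 0:
            updates += 1                    # optimizer step would happen here
            accumulated = 0.0
    return updates

print(train(list(range(16))))  # 16 examples -> 2 optimizer updates
```

Each weight update therefore averages gradients over 8 examples while only ever holding 2 in memory at once, which is what keeps peak GPU memory under 8 GB.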

Training Curve

Loss collapsed fast; the reasoning patterns were absorbed cleanly:

Iter 10    →  2.277
Iter 20    →  0.097       ← rapid style acquisition
Iter 50    →  0.00063
Iter 100   →  0.0000398
Iter 200   →  0.0000067   (checkpoint saved)
Iter 1000  →  ~3.5e-7     (final)

Quickstart (MLX)

Install the library:

pip install mlx-lm

Then load the base model with the adapters applied:

from mlx_lm import load, generate

# Load base model with LoRA adapters
model, tokenizer = load(
    "deadbydawn101/gemma-4-E4B-mlx-4bit",
    adapter_path="deadbydawn101/gemma-4-E4B-opus-reasoning-claude-code-lora",
)

messages = [{"role": "user", "content": "Solve this step by step: A train leaves Chicago at 60mph..."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
response = generate(model, tokenizer, prompt=prompt, max_tokens=1024, verbose=True)
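Since completions are formatted with <think> tags, you may want to separate the reasoning trace from the final answer before displaying it. A minimal sketch (the exact tag syntax is assumed from the dataset description above):

```python
import re

def split_reasoning(text: str):
    """Split <think>...</think> reasoning from the remaining answer."""
    thoughts = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thoughts, answer

raw = "<think>Relative speed is 60mph, so...</think>The trains meet after 2 hours."
thoughts, answer = split_reasoning(raw)
print(answer)  # The trains meet after 2 hours.
```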

CLI

mlx_lm.generate \
  --model deadbydawn101/gemma-4-E4B-mlx-4bit \
  --adapter-path deadbydawn101/gemma-4-E4B-opus-reasoning-claude-code-lora \
  --prompt "Write a Python function to find prime numbers and explain your reasoning." \
  --max-tokens 1024

Intended Use

Best for prompts where you want the model to:

  • Think step by step before responding
  • Handle multi-step problems (math, logic, code debugging)
  • Follow agentic tool-use patterns (read → reason → act → verify)
  • Produce well-structured, deliberate completions

Not ideal for:

  • Short creative tasks (adds reasoning overhead)
  • Casual chitchat
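The read → reason → act → verify pattern above can be sketched as a simple dispatch loop (a toy illustration; the actual Claude Code tooling is not part of this repo, and the tool names here are hypothetical):

```python
def run_agent(task, tools, max_steps=8):
    """Toy agentic loop: read context, reason, act via a tool, verify."""
    context = tools["read"](task)              # read: gather relevant state
    for _ in range(max_steps):
        plan = tools["reason"](task, context)  # reason: decide the next action
        result = tools["act"](plan)            # act: execute the chosen tool
        if tools["verify"](task, result):      # verify: check the outcome
            return result
        context = result                       # feed the result back and retry
    return None                                # give up after max_steps

# Toy tools: count up toward a target number.
tools = {
    "read": lambda task: 0,
    "reason": lambda task, ctx: ctx + 1,
    "act": lambda plan: plan,
    "verify": lambda task, result: result == task,
}
print(run_agent(3, tools))  # 3
```

The adapters bias the model toward emitting this kind of loop in its completions; the loop itself is driven by whatever tool harness you run the model inside.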

Files

File                 | Description
adapters.safetensors | LoRA weights (658.8 MB)
adapter_config.json  | Config: rank=8, alpha=16, dropout=0.0

⚡ TurboQuant-MLX Compatibility

Works alongside TurboQuant-MLX: combine LoRA fine-tuning with 4.6x KV cache compression for long-context reasoning with Claude-style behavior.

→ TurboQuant-MLX on GitHub

Related Models

Model                                                 | Size    | Description
deadbydawn101/gemma-4-E4B-mlx-4bit                    | 4.86 GB | Base model (load this first)
deadbydawn101/gemma-4-E2B-Heretic-Uncensored-mlx-4bit | 3.34 GB | 2B uncensored abliterated variant

Trained and released by deadbydawn101 · RavenX AI
