# VulnHunter: AI Security Agent

An AI agent trained with GRPO to detect and fix web application security vulnerabilities.

This model was trained 2x faster with Unsloth and Hugging Face's TRL library.
## Model Description
VulnHunter is a fine-tuned Qwen2.5-Coder-7B model specialized for security vulnerability detection and patching. It was trained using GRPO (Group Relative Policy Optimization) with a custom security reward function.
## Capabilities
- ✅ SQL Injection Detection - Identifies unsanitized SQL queries
- ✅ XSS Detection - Finds unescaped user input in HTML
- ✅ Path Traversal Detection - Detects unchecked file paths
- ✅ Automatic Fix Generation - Suggests secure code patches
## Quick Start

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "gateremark/vulnhunter-agent"
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

# Analyze vulnerable code
prompt = """Analyze this code for security vulnerabilities:
query = f"SELECT * FROM users WHERE id = {user_id}"
cursor.execute(query)
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details

### Base Model
- Model: Qwen2.5-Coder-7B-Instruct
- Quantization: 4-bit (BitsAndBytes)
- Framework: Unsloth + TRL
### Why Qwen2.5-Coder?
- Pre-trained on code - understands Python, SQL, security patterns
- Instruct variant - follows instructions out-of-the-box
- 7B size - sweet spot between capability and cost
- Unsloth support - 2x faster training
### Training Configuration
| Parameter | Value |
|---|---|
| Method | GRPO (Group Relative Policy Optimization) |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training Time | ~90 minutes |
| Steps | 200 |
| LoRA Rank | 32 |
| Learning Rate | 2e-5 |
| Batch Size | 1 (4 gradient-accumulation steps, effective batch size 4) |
| Group Size | 4 generations |
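The table above maps roughly onto TRL's `GRPOConfig` (an illustrative fragment, not the actual training script; the LoRA rank is applied separately via Unsloth's `get_peft_model`):

```python
from trl import GRPOConfig

# Illustrative mapping of the training table onto TRL's GRPOConfig;
# argument names follow TRL's public API, but this is a sketch, not
# the exact configuration used for VulnHunter.
training_args = GRPOConfig(
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,   # effective batch size 4
    num_generations=4,               # GRPO group size
    max_steps=200,
    output_dir="outputs",
)
```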
### Why GRPO?

| Method | Memory | Verdict |
|---|---|---|
| SFT | Low | Too passive |
| PPO | High (needs critic) | Memory-prohibitive |
| DPO | Medium | Needs preference pairs |
| GRPO | Low | ✅ Perfect for rewards |
GRPO eliminates the critic model by comparing responses within a group of sampled completions, giving PPO-style learning signal without roughly doubling memory for a separate value network.
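The group-relative trick can be sketched in a few lines (illustrative, not TRL's internal implementation): each reward is normalized against its own group's statistics, which stands in for the critic's value baseline.

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and standard
    deviation, replacing the learned value baseline a PPO critic
    would provide (sketch only, not TRL's internals)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std if std > 0 else 1.0) for r in rewards]

# A group of 4 generations scored by the reward function:
advs = group_relative_advantages([1.5, 0.3, -0.2, 0.5])
```

The best response in the group gets a positive advantage and the worst a negative one, so the policy is pushed toward its own above-average completions without any extra model in memory.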
### Reward Function
| Event | Reward |
|---|---|
| Identify vulnerability type | +0.3 |
| Generate valid patch | +0.2 |
| Patch blocks exploit | +1.0 |
| Syntax error in patch | -0.2 |
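A reward along these lines can be sketched as follows. The `extract_patch` helper and the `PATCH:` response format are hypothetical stand-ins, not the actual training code; only the reward values come from the table above.

```python
import ast

def extract_patch(response: str) -> str:
    """Hypothetical helper: assume the model emits its fix after a
    'PATCH:' marker (the real extraction format may differ)."""
    _, sep, patch = response.partition("PATCH:")
    return patch.strip() if sep else ""

def security_reward(response: str, expected_vuln: str,
                    exploit_blocked: bool) -> float:
    """Sketch of the reward schedule in the table above."""
    reward = 0.0
    if expected_vuln.lower() in response.lower():
        reward += 0.3                 # identified the vulnerability type
    patch = extract_patch(response)
    if patch:
        try:
            ast.parse(patch)
            reward += 0.2             # patch is syntactically valid Python
            if exploit_blocked:
                reward += 1.0         # patch actually blocks the exploit
        except SyntaxError:
            reward -= 0.2             # syntax error in patch
    return reward
```

The large +1.0 for blocking the exploit dominates the shaping terms, so the policy is rewarded most for patches that actually work, not just for naming the vulnerability.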
## Evaluation Results

### Test Cases

**SQL Injection:**

```python
# Input
query = f"SELECT * FROM users WHERE username = '{username}'"

# VulnHunter Output
# "SQL injection vulnerability. Use parameterized queries:
#  query = 'SELECT * FROM users WHERE username = %s'
#  cursor.execute(query, (username,))"
```

**XSS:**

```python
# Input
return f"<h1>Results for: {query}</h1>"

# VulnHunter Output
# "XSS vulnerability. Escape user input:
#  from html import escape
#  return f'<h1>Results for: {escape(query)}</h1>'"
```

**Path Traversal:**
```python
# Input
return send_file(f"/uploads/{filename}")

# VulnHunter Output
# "Path traversal vulnerability. Use safe_join:
#  from werkzeug.utils import safe_join
#  return send_file(safe_join('/uploads', filename))"
```
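To sanity-check the SQL injection fix above, here is a standalone demo (not from the model card) using Python's built-in `sqlite3`, whose placeholder style is `?` rather than the `%s` shown for other database drivers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

payload = "' OR '1'='1"  # classic injection payload

# Vulnerable: string interpolation lets the payload rewrite the query,
# so the WHERE clause becomes a tautology and matches every row.
vulnerable = conn.execute(
    f"SELECT * FROM users WHERE username = '{payload}'"
).fetchall()

# Patched: the payload is bound as a literal value and matches nothing.
patched = conn.execute(
    "SELECT * FROM users WHERE username = ?", (payload,)
).fetchall()

print(len(vulnerable), len(patched))  # 1 0
```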
## OpenEnv Environment

VulnHunter comes with an OpenEnv-compatible RL environment:

```bash
git clone https://github.com/gateremark/vulnhunter
cd vulnhunter
uvicorn vulnhunter.env_server.server:app --port 8000
```
## Green Agent (A2A Protocol)

The model includes an A2A-compatible agent wrapper:

```bash
cd vulnhunter/green_agent
python server.py
# Agent served at http://localhost:9009
```

### Agent Card

```json
{
  "name": "VulnHunter",
  "skills": [{"id": "analyze_code", "name": "Analyze Code"}]
}
```
## Links

- **GitHub**: [github.com/gateremark/vulnhunter](https://github.com/gateremark/vulnhunter)
- **W&B Training**: [wandb.ai/gatere-ai/huggingface/runs/v0dge86p](https://wandb.ai/gatere-ai/huggingface/runs/v0dge86p)
- **OpenEnv**: [github.com/meta-pytorch/OpenEnv](https://github.com/meta-pytorch/OpenEnv)
## Citation

```bibtex
@misc{vulnhunter2026,
  author = {gateremark},
  title = {VulnHunter: AI Security Agent with GRPO},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/gateremark/vulnhunter-agent}
}
```
## Acknowledgments
Built for the AgentBeats OpenEnv Challenge sponsored by PyTorch, Hugging Face, and Unsloth.
Built with ❤️ by gateremark