BrickGPT-DPO
Model Details
Model Description
BrickGPT-DPO is a specialized language model fine-tuned from AvaLovelace/BrickGPT (which is based on Llama-3.2-1B-Instruct).
This model has been aligned using Direct Preference Optimization (DPO) to explicitly prioritize physical stability in generated LEGO®-style brick structures.
Key Feature: The base model relies on rejection sampling and an external physics solver during inference. BrickGPT-DPO instead learns a preference for stable structures in its weights, which is intended to reduce, though not necessarily eliminate, the need for that inference-time correction.
Training Method: Trained on a preference dataset in which physically stable continuations were "chosen" and unstable continuations that lead to collapse were "rejected," using a novel Prompt Extension strategy to handle partially built valid structures.
Developed by: Kshitij, Sreeharsha, Yitian, Carnegie Mellon University (CMU)
Model Type: Causal Language Model (LLM) using LoRA (Low-Rank Adaptation)
Language(s): English (Instructions), Custom Brick Syntax
License: Llama 3.2 Community License
Finetuned from: AvaLovelace/BrickGPT
Uses
Direct Use
The model's primary function is to generate sequences of brick commands (<dimensions> <coordinates>) from natural language descriptions.
- Input Example: "A high-backed chair with red cushions."
- Output Example: A list of valid bricks (1x1, 2x4, etc.) that together form a physically stable structure.
The output text is intended to be parsed within the BrickGPT ecosystem for rendering in 3D voxel grids or for robotic assembly.
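The parsing step mentioned above can be sketched as follows. This is a minimal illustration, not the BrickGPT parser: it assumes one brick per line written as `HxW (x,y,z)` (for example `2x4 (0,0,0)`), whereas this card only specifies `<dimensions> <coordinates>`. The names `Brick` and `parse_bricks` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Brick(NamedTuple):
    """One brick: footprint (h x w studs) and the voxel position of its origin corner."""
    h: int
    w: int
    x: int
    y: int
    z: int


# Assumed line format: "<h>x<w> (<x>,<y>,<z>)", e.g. "2x4 (0,0,0)".
_BRICK_RE = re.compile(r"^\s*(\d+)x(\d+)\s+\((\d+),\s*(\d+),\s*(\d+)\)\s*$")


def parse_bricks(text: str) -> List[Brick]:
    """Parse model output into bricks, skipping blank lines.

    Raises ValueError on any non-blank line that does not match the assumed format.
    """
    bricks = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        match = _BRICK_RE.match(line)
        if match is None:
            raise ValueError(f"line {line_no}: not a brick command: {line!r}")
        bricks.append(Brick(*map(int, match.groups())))
    return bricks
```

Failing loudly on malformed lines, rather than skipping them, keeps a truncated or off-format generation from silently producing a partial structure.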
Downstream Use
- Robotic Assembly: Stable sequences can be fed directly into motion planners for robotic arms (e.g., Yaskawa GP4) to build the object in the real world without collapsing.
- Structural Engineering Design: Generating initial, stable candidates for voxel-based CAD.
Out-of-Scope Use
- General Chat: The model is heavily specialized for brick syntax, so its general conversation capabilities are likely degraded.
- Non-Brick 3D Generation: The output is strictly discrete brick tokens, not meshes or point clouds.
Bias, Risks, and Limitations
- Geometric Hallucinations: The model may generate structures that are stable but do not perfectly match the semantic description (e.g., a "car" that looks like a "box").
- Grid Constraint: Limited to the $20\times20\times20$ voxel world defined during training.
- Physics Simplification: The stability preference is based on a static equilibrium solver; it does not account for dynamic forces (like shaking) or complex material properties beyond standard ABS plastic friction.
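The grid constraint above can be checked cheaply before any physics test. The sketch below is an assumption-laden illustration: bricks are given as `(h, w, x, y, z)` tuples, `h` is taken to extend along x and `w` along y, and every brick is treated as one voxel tall. The names `occupied_cells` and `check_grid` are hypothetical. It does not test stability, only bounds and overlap.

```python
from typing import Iterable, Set, Tuple

GRID = 20  # side length of the voxel world stated in this card

# (h, w, x, y, z): footprint and origin corner; one voxel tall by assumption.
BrickTuple = Tuple[int, int, int, int, int]


def occupied_cells(brick: BrickTuple) -> Set[Tuple[int, int, int]]:
    """Return the set of voxels covered by a brick."""
    h, w, x, y, z = brick
    return {(x + dx, y + dy, z) for dx in range(h) for dy in range(w)}


def check_grid(bricks: Iterable[BrickTuple], grid: int = GRID) -> bool:
    """Return True if every brick lies inside the grid and no two bricks overlap."""
    seen: Set[Tuple[int, int, int]] = set()
    for brick in bricks:
        cells = occupied_cells(brick)
        # Any coordinate outside [0, grid) means the brick leaves the voxel world.
        if any(not (0 <= c < grid) for cell in cells for c in cell):
            return False
        # A shared voxel means two bricks collide.
        if cells & seen:
            return False
        seen |= cells
    return True
```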
Training Details
Training Data
Trained on a Prompt-Extended DPO Dataset derived from the StableText2Brick dataset.
- Data Generation: The SFT policy AvaLovelace/BrickGPT generated structures, which were validated by a physics solver.
- Preference Pairs:
  - Prompt ($x$): Original User Instruction.
  - Chosen ($y_w$): A continuation leading to a fully stable structure.
  - Rejected ($y_l$): An unstable continuation generated by the model before a physics-aware rollback.
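To make the roles of $x$, $y_w$, and $y_l$ concrete, the sketch below shows one preference record using the `prompt` / `chosen` / `rejected` keys that TRL's preference format uses, followed by the standard per-pair DPO loss, $-\log\sigma\big(\beta[(\log\pi_\theta(y_w|x) - \log\pi_{\text{ref}}(y_w|x)) - (\log\pi_\theta(y_l|x) - \log\pi_{\text{ref}}(y_l|x))]\big)$. The brick strings are illustrative and not taken from the dataset, and `dpo_loss` is a hypothetical helper operating on summed sequence log-probabilities, not the trainer's implementation.

```python
import math

# One illustrative preference record (brick strings are made up for this example).
pair = {
    "prompt": "A high-backed chair with red cushions.",
    "chosen": "2x4 (0,0,0)\n2x4 (0,0,1)\n",
    "rejected": "2x4 (0,0,0)\n2x4 (10,10,5)\n",
}


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss from sequence log-probabilities under policy and reference."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy equals the reference, the margin is zero and the loss is $\log 2$; the loss falls as the policy assigns relatively more probability to the stable continuation than the reference does.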
Training Procedure
Trained using the TRL library's DPOTrainer with PEFT (LoRA).
| Hyperparameter | Value |
|---|---|
| Method | Direct Preference Optimization (DPO) |
| Precision | bfloat16 with 4-bit Quantization (QLoRA/NF4) |
| Optimizer | paged_adamw_8bit |
| Learning Rate | 5e-5 with Cosine Scheduler |
| Warmup Ratio | 0.1 |
| Batch Size | 1 per device (Effective batch size 16 via 16 Gradient Accumulation Steps) |
| Beta | 0.1 |
| Max Sequence Length | 2048 |
| Max Prompt Length | 256 |
| Epochs | 3 |
| LoRA Config | $r=32$, $\alpha=16$, Dropout=0.05, Targets: q_proj, v_proj |
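The table above can be written down as plain keyword arguments. This is a sketch rather than the authors' training script: the argument names follow `trl.DPOConfig` and `peft.LoraConfig` as I understand them for the library versions listed under Technical Specifications and should be re-checked against the installed versions. `output_dir` and `num_devices` are assumptions not stated in the card.

```python
# Keyword arguments mirroring the hyperparameter table (not the original script).
dpo_kwargs = dict(
    output_dir="brickgpt-dpo-out",  # hypothetical path
    beta=0.1,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=3,
    max_length=2048,
    max_prompt_length=256,
    optim="paged_adamw_8bit",
    bf16=True,
)

lora_kwargs = dict(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

num_devices = 1  # assumption: a single GPU, as suggested by the hardware entry

# Sequences contributing to each optimizer step.
effective_batch_size = (
    dpo_kwargs["per_device_train_batch_size"]
    * dpo_kwargs["gradient_accumulation_steps"]
    * num_devices
)

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

print(effective_batch_size, lora_scaling)
```

With both libraries installed, these dictionaries could be passed as `DPOConfig(**dpo_kwargs)` and `LoraConfig(**lora_kwargs)`. Note that $\alpha=16$ with $r=32$ gives a scaling factor of 0.5, so the adapter update is damped relative to the common $\alpha = r$ choice.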
Evaluation
Metrics
- Stability Rate: The percentage of generated structures that pass the static equilibrium test without needing external rollback/correction.
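As a worked example of the metric, the sketch below computes the stability rate from per-structure pass/fail results. The function name `stability_rate` is hypothetical, and the booleans are assumed to come from the static equilibrium test applied to raw generations, before any rollback or correction.

```python
from typing import Sequence


def stability_rate(passed: Sequence[bool]) -> float:
    """Percentage of generated structures that passed the stability test."""
    if not passed:
        raise ValueError("no structures were evaluated")
    return 100.0 * sum(passed) / len(passed)


# Three of four structures pass, so the rate is 75%.
print(stability_rate([True, True, False, True]))
```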
Technical Specifications
| Component | Detail |
|---|---|
| Base Model | AvaLovelace/BrickGPT |
| Library Name | peft |
| License | llama3.2 |
| Datasets | dpo_dataset.parquet |
| Language | en |
| Pipeline Tag | text-generation |
| Hardware | RTX 4080 |
| Software | PyTorch 2.4.0, Transformers 4.45.0, TRL 0.11.0, PEFT 0.15.2, BitsAndBytes |
Model repository: kshitij-hf/brickgpt-dpo-2048