BrickGPT-DPO


Model Details

Model Description

BrickGPT-DPO is a specialized language model fine-tuned from AvaLovelace/BrickGPT (which is based on Llama-3.2-1B-Instruct).

This model has been aligned using Direct Preference Optimization (DPO) to explicitly prioritize physical stability in generated LEGO®-style brick structures.

  • Key Feature: Unlike the base model, which relies on rejection sampling and external physics solvers during inference, BrickGPT-DPO incorporates stability constraints directly into its weights.

  • Training Method: Trained on a preference dataset where physically stable continuations were "chosen" and unstable collapses were "rejected," using a novel Prompt Extension strategy to handle partial valid structures.

  • Developed by: Kshitij, Sreeharsha, Yitian, Carnegie Mellon University (CMU)

  • Model Type: Causal Language Model (LLM) using LoRA (Low-Rank Adaptation)

  • Language(s): English (Instructions), Custom Brick Syntax

  • License: Llama 3.2 Community License

  • Finetuned from: AvaLovelace/BrickGPT


Uses

Direct Use

The model's primary function is to generate sequences of brick commands (<dimensions> <coordinates>) from natural language descriptions.

  • Input Example: "A high-backed chair with red cushions."
  • Output Example: A list of valid 1x1, 2x4, etc., bricks that form a physically stable structure.

The output text is intended to be parsed within the BrickGPT ecosystem for rendering in 3D voxel grids or for robotic assembly.
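As a minimal sketch of that downstream parsing step, the snippet below extracts brick commands from generated text. The exact token format is assumed here to be `<height>x<width> (<x>,<y>,<z>)`, one brick per line; the real BrickGPT syntax may differ in detail.

```python
import re

# Assumed brick-command format: "2x4 (0,0,1)" i.e.
# "<height>x<width> (<x>,<y>,<z>)", one brick per line.
# This illustrates downstream parsing only, not the model's tokenizer.
BRICK_RE = re.compile(r"(\d+)x(\d+)\s*\((\d+),(\d+),(\d+)\)")

def parse_bricks(text):
    """Return a list of ((h, w), (x, y, z)) tuples from generated text."""
    bricks = []
    for match in BRICK_RE.finditer(text):
        h, w, x, y, z = map(int, match.groups())
        bricks.append(((h, w), (x, y, z)))
    return bricks

output = "1x2 (0,0,0)\n2x4 (0,0,1)"
print(parse_bricks(output))  # [((1, 2), (0, 0, 0)), ((2, 4), (0, 0, 1))]
```

A renderer or motion planner in the BrickGPT ecosystem would consume these tuples rather than the raw text.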

Downstream Use

  • Robotic Assembly: Stable sequences can be fed directly into motion planners for robotic arms (e.g., Yaskawa GP4) to build the object in the real world without collapsing.
  • Structural Engineering Design: Generating initial, stable candidates for voxel-based CAD.

Out-of-Scope Use

  • General Chat: The model is heavily specialized for brick syntax; its general conversational capabilities are likely degraded.
  • Non-Brick 3D Generation: The output is strictly discrete brick tokens, not meshes or point clouds.

Bias, Risks, and Limitations

  • Geometric Hallucinations: The model may generate structures that are stable but do not perfectly match the semantic description (e.g., a "car" that looks like a "box").
  • Grid Constraint: Limited to the $20\times20\times20$ voxel world defined during training.
  • Physics Simplification: The stability preference is based on a static equilibrium solver; it does not account for dynamic forces (like shaking) or complex material properties beyond standard ABS plastic friction.
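To make the physics simplification concrete, here is a toy proxy for the kind of static check involved: each brick above the ground must overlap in footprint with a brick directly beneath it. This connectivity test is only an illustration; the actual solver computes full static equilibrium (forces and torques), and the unit-height assumption below is ours.

```python
# Toy support check, NOT the solver used in training: a brick at z > 0
# must share at least one footprint cell with a brick one layer below.
# Assumes unit-height bricks; real equilibrium analysis is far stricter.
def footprint(brick):
    (h, w), (x, y, z) = brick
    return {(x + i, y + j) for i in range(h) for j in range(w)}, z

def is_supported(structure):
    for brick in structure:
        cells, z = footprint(brick)
        if z == 0:
            continue  # resting on the ground plate
        below = set()
        for other in structure:
            o_cells, o_z = footprint(other)
            if o_z == z - 1:
                below |= o_cells
        if not cells & below:
            return False  # floating brick: no support underneath
    return True

stack = [((2, 4), (0, 0, 0)), ((2, 4), (0, 2, 1))]
print(is_supported(stack))  # True
```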

Training Details

Training Data

Trained on a Prompt-Extended DPO Dataset derived from the StableText2Brick dataset.

  • Data Generation: The SFT policy AvaLovelace/BrickGPT generated structures, which were validated by a physics solver.
  • Preference Pairs:
    • Prompt ($x$): Original User Instruction.
    • Chosen ($y_w$): A continuation leading to a fully stable structure.
    • Rejected ($y_l$): An unstable continuation generated by the model before a physics-aware rollback.
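The row construction above can be sketched as follows. Under the Prompt Extension strategy, the stable partial structure is appended to the instruction so that chosen and rejected continuations share the same extended prompt. Field names follow TRL's DPO convention (`prompt`/`chosen`/`rejected`); the actual dataset schema is an assumption.

```python
# Sketch of one preference row under the Prompt Extension strategy.
# The stable prefix becomes part of the prompt; y_w and y_l differ
# only in the continuation. Schema names follow the TRL convention.
def make_dpo_row(instruction, stable_prefix, stable_cont, unstable_cont):
    return {
        "prompt": instruction + "\n" + "\n".join(stable_prefix),
        "chosen": "\n".join(stable_cont),      # y_w: leads to a stable build
        "rejected": "\n".join(unstable_cont),  # y_l: collapses in the solver
    }

row = make_dpo_row(
    "A high-backed chair with red cushions.",
    stable_prefix=["2x4 (0,0,0)"],
    stable_cont=["2x4 (0,0,1)"],
    unstable_cont=["1x1 (9,9,5)"],
)
print(row["prompt"])
```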

Training Procedure

Trained using the TRL library's DPOTrainer with PEFT (LoRA).

  • Method: Direct Preference Optimization (DPO)
  • Precision: bfloat16 with 4-bit quantization (QLoRA/NF4)
  • Optimizer: paged_adamw_8bit
  • Learning rate: 5e-5 with cosine scheduler
  • Warmup ratio: 0.1
  • Batch size: 1 per device (effective batch size 16 via 16 gradient accumulation steps)
  • DPO beta ($\beta$): 0.1
  • Max sequence length: 2048
  • Max prompt length: 256
  • Epochs: 3
  • LoRA config: $r=32$, $\alpha=16$, dropout 0.05, target modules: q_proj, v_proj
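A sketch of this trainer setup, using TRL's `DPOTrainer` with a QLoRA adapter, is shown below. Argument names follow the TRL 0.11 and PEFT APIs listed in this card; `model`, `tokenizer`, and `dataset` are placeholders for the 4-bit quantized base model and the preference dataset, so this is a configuration outline rather than a runnable script.

```python
# Configuration sketch matching the hyperparameters above.
# model / tokenizer / dataset are placeholders (not defined here).
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

peft_config = LoraConfig(
    r=32, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    beta=0.1,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # effective batch size 16
    num_train_epochs=3,
    max_length=2048,
    max_prompt_length=256,
    optim="paged_adamw_8bit",
    bf16=True,
    output_dir="brickgpt-dpo",
)

trainer = DPOTrainer(
    model=model,            # 4-bit quantized AvaLovelace/BrickGPT
    args=training_args,
    train_dataset=dataset,  # prompt-extended preference dataset
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```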

Evaluation

Metrics

  • Stability Rate: The percentage of generated structures that pass the static equilibrium test without needing external rollback/correction.
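The metric can be sketched in a few lines. The per-sample record fields (`stable`, `rolled_back`) are assumptions for illustration; only structures that pass the solver with no rollback count as successes.

```python
# Minimal sketch of the Stability Rate metric: fraction of generated
# structures judged stable with no external rollback/correction.
# The record fields below are assumed, not the project's actual schema.
def stability_rate(results):
    stable = sum(1 for r in results if r["stable"] and not r["rolled_back"])
    return stable / len(results)

results = [
    {"stable": True,  "rolled_back": False},
    {"stable": True,  "rolled_back": True},   # needed correction: not counted
    {"stable": False, "rolled_back": False},
    {"stable": True,  "rolled_back": False},
]
print(stability_rate(results))  # 0.5
```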

Technical Specifications

  • Base model: AvaLovelace/BrickGPT
  • Library: peft
  • License: llama3.2
  • Datasets: dpo_dataset.parquet
  • Language: en
  • Pipeline tag: text-generation
  • Hardware: NVIDIA RTX 4080
  • Software: PyTorch 2.4.0, Transformers 4.45.0, TRL 0.11.0, PEFT 0.15.2, BitsAndBytes
