library_name: lerobot
license: mit
tags:
  - robotics
  - groot
  - manipulation
  - condiment-handover
  - asgard-robot
base_model: nvidia/GR00T-N1.5-3B
datasets:
  - asgard-robot/asgard_training_data_condiment
embodiment_tag: asgard_so101
model-index:
  - name: GROOT Condiment Handover Model
    results:
      - task:
          type: manipulation
          name: condiment-handover
        metrics:
          - name: training_loss
            type: loss
            value: ~0.006
          - name: loss_reduction_percent
            type: percentage
            value: ~99

GROOT Condiment Handover Model - Step 2000

Model Card Summary

  • Checkpoint: Step 2000 (Final checkpoint)
  • Base Model: nvidia/GR00T-N1.5-3B
  • Task: Condiment handover on the ASGARD so101_follower robot
  • Training Status: Completed successfully
  • Final Training Loss: ~0.006

Model Details

Model Architecture

This model is NVIDIA GR00T N1.5-3B fine-tuned for the condiment handover task.

  • Model Type: GR00T (Generalist Robot 00 Technology)
  • Policy Type: GR00T N1.5-3B
  • Robot Embodiment: asgard_so101 (single arm, 6 degrees of freedom: 5 joints + gripper)
  • Action Dimensions: 6 (5 joint positions + gripper position)
  • Observation: Two RGB cameras (640×480×3 each) plus the 6D joint state

Training Components

Frozen (Not Trained):

  • ❌ LLM (tune_llm=false) - Language model kept frozen
  • ❌ Vision Encoder (tune_visual=false) - Visual features frozen

Trainable Components:

  • ✅ Diffusion Transformer (tune_diffusion_model=true) - Action generation
  • ✅ Projector (tune_projector=true) - Vision-language to action mapping
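The four tune_* flags above decide which parts of the network receive gradient updates. The sketch below is a hypothetical helper (not GR00T or LeRobot code; TuningFlags and split_components are illustrative names) that mirrors this run's flag settings and reports which components are trained and which stay frozen. In a PyTorch implementation, freezing typically amounts to calling requires_grad_(False) on the frozen submodules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningFlags:
    """Hypothetical mirror of the tune_* flags used for this run."""
    tune_llm: bool = False
    tune_visual: bool = False
    tune_projector: bool = True
    tune_diffusion_model: bool = True


# Flag name -> component label as listed in this model card (illustrative only).
COMPONENTS = {
    "tune_llm": "LLM",
    "tune_visual": "Vision Encoder",
    "tune_projector": "Projector",
    "tune_diffusion_model": "Diffusion Transformer",
}


def split_components(flags: TuningFlags) -> tuple[list[str], list[str]]:
    """Return (trainable, frozen) component labels for the given flags."""
    trainable, frozen = [], []
    for flag_name, component in COMPONENTS.items():
        # A True flag means the component's weights are updated during training.
        (trainable if getattr(flags, flag_name) else frozen).append(component)
    return trainable, frozen


trainable, frozen = split_components(TuningFlags())
print("trainable:", trainable)  # ['Projector', 'Diffusion Transformer']
print("frozen:", frozen)        # ['LLM', 'Vision Encoder']
```

Only the action-generation path (projector and diffusion transformer) is updated; the pretrained vision-language backbone is reused as-is.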

Training Strategy

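  • Approach: Direct fine-tuning (no LoRA) of the trainable components above; the LLM and vision encoder stay frozen
  • Rationale: 4× H100 GPUs with 320GB total VRAM leave enough memory for full-weight updates of the trainable components, so low-rank adapters are not needed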
  • Precision: bf16 (mixed precision training)

Training Details

Dataset Information

| Parameter | Value | Description |
|---|---|---|
| Dataset Repository | asgard-robot/asgard_training_data_condiment | Hugging Face dataset |
| Dataset Version | v3.0 | LeRobot format tag |
| Total Episodes | 40 | Number of demonstrations |
| Total Frames | 31,522 | Total training samples |
| Avg Frames/Episode | ~788 | Average trajectory length |
| Episode Duration | ~26 seconds | At 30 FPS |
| Robot Type | so101_follower | Single-arm 6 DOF |
| Task | Condiment handover | Primary objective |
| Format | LeRobot v3.0 | Parquet + MP4 videos (AV1 codec) |
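The averaged rows of the table follow directly from the totals. A quick arithmetic check using only the numbers listed above:

```python
TOTAL_EPISODES = 40
TOTAL_FRAMES = 31_522
FPS = 30  # recording rate stated in the table

avg_frames_per_episode = TOTAL_FRAMES / TOTAL_EPISODES  # 788.05
avg_episode_seconds = avg_frames_per_episode / FPS      # ~26.27 s

print(f"avg frames/episode: {avg_frames_per_episode:.2f}")
print(f"avg episode duration: {avg_episode_seconds:.2f} s")
```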

Training Hyperparameters

| Parameter | Value | Justification |
|---|---|---|
| Total Training Steps | 2,000 | Full training cycle |
| Number of Epochs | ~32 | 2,000 steps × 512 samples ÷ 31,522 frames ≈ 32.5 passes over the data |
| Checkpoints Saved | 5 | Steps: 400, 800, 1200, 1600, 2000 |
| Learning Rate | 1e-4 | GR00T recommended value |
| Weight Decay | 1e-5 | L2 regularization |
| Gradient Clip Norm | 1.0 | Training stability |
| Warmup Ratio | 0.05 | Gradual learning rate ramp over the first 100 steps (5% of 2,000) |
| Batch Size (per GPU) | 128 | Chosen to use most of the available VRAM |
| Effective Batch Size | 512 | 128 × 4 GPUs |
| Num Workers | 16 | Parallel DataLoader workers |
| Video Backend | torchcodec | Decodes the AV1-encoded videos |
| Mixed Precision | bf16 | Memory-efficient training |
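The derived settings can be recomputed from the raw ones: an epoch count is the number of samples seen (steps × effective batch) divided by the number of frames, and the warmup ratio becomes a number of warmup steps. The sketch below does this arithmetic. This card does not state what the learning rate does after warmup, so the cosine decay in the hypothetical lr_at helper is an assumption, shown only as a common choice.

```python
import math

TOTAL_FRAMES = 31_522
PER_GPU_BATCH = 128
NUM_GPUS = 4
TOTAL_STEPS = 2_000
WARMUP_RATIO = 0.05
PEAK_LR = 1e-4
SAVE_EVERY = 400  # inferred from the listed checkpoint steps

effective_batch = PER_GPU_BATCH * NUM_GPUS             # 512 samples per step
steps_per_epoch = TOTAL_FRAMES / effective_batch       # ~61.6 steps per pass
epochs = TOTAL_STEPS * effective_batch / TOTAL_FRAMES  # ~32.5 passes over the data
warmup_steps = round(WARMUP_RATIO * TOTAL_STEPS)       # 100
checkpoint_steps = list(range(SAVE_EVERY, TOTAL_STEPS + 1, SAVE_EVERY))


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to 0 (decay shape is assumed)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


print(f"effective batch: {effective_batch}")
print(f"steps/epoch: {steps_per_epoch:.1f}, epochs: {epochs:.1f}")
print(f"warmup steps: {warmup_steps}, checkpoints: {checkpoint_steps}")
```

Note that 31,522 ÷ 512 gives steps per epoch (~61.6), not the epoch count; the epoch count comes from multiplying through by the total number of steps.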

Hardware Configuration

| Component | Specification | Utilization |
|---|---|---|
| GPUs | 4× NVIDIA H100 PCIe | All 4 GPUs used |
| VRAM per GPU | 80GB | ~79.65GB usable |
| Total VRAM | 320GB | Peak usage: ~60-70GB per GPU |
| CPUs | 124 (AMD EPYC 9554 64-Core) | Data loading |
| System RAM | 708GB | Adequate for data loading |
| Storage | 1.5TB ephemeral | Checkpoint storage |
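The utilization figures show how much memory headroom remained at a per-GPU batch size of 128. A small arithmetic sketch using the table's numbers (headroom_report is a hypothetical helper name):

```python
NUM_GPUS = 4
USABLE_GB_PER_GPU = 79.65  # usable VRAM per H100 PCIe, from the table


def headroom_report(usable_gb: float, peak_gb: float, num_gpus: int):
    """Return (headroom_gb_per_gpu, fraction_used, total_peak_gb)."""
    return usable_gb - peak_gb, peak_gb / usable_gb, peak_gb * num_gpus


# Reported peak range per GPU: ~60-70GB.
for peak in (60.0, 70.0):
    headroom, used, total = headroom_report(USABLE_GB_PER_GPU, peak, NUM_GPUS)
    print(f"peak {peak:.0f}GB: {headroom:.2f}GB free per GPU ({used:.0%} used), "
          f"{total:.0f}GB across {NUM_GPUS} GPUs")
```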

Usage

Load Model

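# Import path assumes a LeRobot release with GR00T support;
# the ASGARD teleop control branch may differ.
from lerobot.policies.groot.modeling_groot import GrootPolicy

policy = GrootPolicy.from_pretrained("asgard-robot/groot-condiment-handover")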

Run Inference

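import torch

# `observation`: dict of batched torch tensors with keys
# - observation.images.wrist1: RGB camera (640×480×3)
# - observation.images.realsense: RGB camera (640×480×3)
# - observation.state: 6D joint positions (5 joints + gripper)
policy.eval()
policy.reset()  # clear any queued action chunk at the start of an episode
with torch.no_grad():
    action = policy.select_action(observation)
# Returns: 6D action (5 joint position targets + gripper position)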
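Before calling the policy it can help to check that an observation matches the layout described above. The sketch below is a hypothetical, framework-agnostic helper (EXPECTED_SHAPES and check_observation_shapes are illustrative names, not read from the model config). It assumes each image is described as (height, width, channels) = (480, 640, 3), matching the 640×480×3 above, and the state as a flat 6-vector; the tensors actually fed to the policy may be batched and channel-first, so adjust EXPECTED_SHAPES to your pipeline.

```python
# Hypothetical spec derived from the comments above.
EXPECTED_SHAPES = {
    "observation.images.wrist1": (480, 640, 3),     # (height, width, channels)
    "observation.images.realsense": (480, 640, 3),  # (height, width, channels)
    "observation.state": (6,),                      # 5 joint positions + gripper
}


def check_observation_shapes(shapes: dict[str, tuple[int, ...]]) -> list[str]:
    """Return human-readable problems; an empty list means the layout matches."""
    problems = []
    for key, expected in EXPECTED_SHAPES.items():
        if key not in shapes:
            problems.append(f"missing key: {key}")
        elif tuple(shapes[key]) != expected:
            problems.append(f"{key}: expected {expected}, got {tuple(shapes[key])}")
    return problems


good = dict(EXPECTED_SHAPES)
bad = {"observation.images.wrist1": (3, 480, 640), "observation.state": (6,)}
print(check_observation_shapes(good))  # []
print(check_observation_shapes(bad))
```

With NumPy arrays or torch tensors, pass `{k: v.shape for k, v in observation.items()}`; both expose a `.shape` that converts cleanly to a tuple.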

Action Space

The model outputs position targets for 6 degrees of freedom (5 arm joints + gripper):

  1. shoulder_pan.pos
  2. shoulder_lift.pos
  3. elbow_flex.pos
  4. wrist_flex.pos
  5. wrist_roll.pos
  6. gripper.pos
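Robot interfaces often expect an action keyed by motor name rather than a bare vector. The sketch below labels a flat 6D action with the names listed above, assuming the policy emits values in that order; ACTION_NAMES and action_to_dict are hypothetical names, and the example values are illustrative only (real units depend on the robot's calibration and normalization).

```python
from typing import Sequence

# Joint order as listed in this model card (assumed to match the action vector).
ACTION_NAMES = (
    "shoulder_pan.pos",
    "shoulder_lift.pos",
    "elbow_flex.pos",
    "wrist_flex.pos",
    "wrist_roll.pos",
    "gripper.pos",
)


def action_to_dict(action: Sequence[float]) -> dict[str, float]:
    """Label a flat 6D action with joint names, rejecting wrong lengths."""
    if len(action) != len(ACTION_NAMES):
        raise ValueError(f"expected {len(ACTION_NAMES)} values, got {len(action)}")
    return {name: float(value) for name, value in zip(ACTION_NAMES, action)}


print(action_to_dict([0.0, -10.0, 25.0, 40.0, 0.0, 5.0]))
```

For a batched tensor of shape (1, 6), call `action.squeeze(0).tolist()` first so the length check sees the 6 values.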

Citation

@software{groot_condiment_model_2025,
  author = {ASGARD Team},
  title = {GROOT Condiment Handover Model - Step 2000},
  model = {asgard-robot/groot-condiment-handover},
  year = {2025},
  month = {October},
  checkpoint = {2000},
  base_model = {nvidia/GR00T-N1.5-3B},
  dataset = {asgard-robot/asgard_training_data_condiment},
  training_hardware = {4× NVIDIA H100 PCIe GPUs}
}

Acknowledgments

  • Base Model: NVIDIA GR00T N1.5-3B
  • Framework: LeRobot (ASGARD teleop control branch)
  • Dataset: ASGARD Robot Datasets
  • Hardware: Shadeform H100 Multi-GPU Cluster