library_name: lerobot
license: mit
tags:
  - robotics
  - groot
  - manipulation
  - condiment-handover
  - asgard-robot
base_model: nvidia/GR00T-N1.5-3B
datasets:
  - asgard-robot/asgard_training_data_condiment
embodiment_tag: asgard_so101
model-index:
  - name: GROOT Condiment Handover Model
    results:
      - task:
          type: manipulation
          name: condiment-handover
        metrics:
          - name: training_loss
            type: loss
            value: ~0.006
          - name: loss_reduction_percent
            type: percentage
            value: ~99

GROOT Condiment Handover Model - Step 2000

Model Card Summary

Checkpoint: Step 2000 (Final checkpoint)
Base Model: nvidia/GR00T-N1.5-3B
Task: Condiment handover on ASGARD so101_follower robot
Training Status: Completed successfully
Final Loss: ~0.006

Model Details

Model Architecture

This is a fine-tuned NVIDIA GR00T N1.5-3B model specifically trained for condiment handover tasks.

Model Type: GR00T (Generalist Robot 00 Technology)
Policy Type: GR00T N1.5-3B
Robot Embodiment: asgard_so101 (single arm, 6 degrees of freedom including the gripper)
Action Dimensions: 6 (5 joint positions + gripper)
Observation: Dual camera RGB (640×480×3 each)

Training Components

Frozen (Not Trained):

❌ LLM (tune_llm=false) - Language model kept frozen
❌ Vision Encoder (tune_visual=false) - Visual features frozen

Trainable Components:

✅ Diffusion Transformer (tune_diffusion_model=true) - Action generation
✅ Projector (tune_projector=true) - Vision-language to action mapping
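
As a minimal sketch of the split above, the four flags can be written as a small config object and turned into trainable/frozen lists. The flag names and values come from this card; the `TuneFlags` class and `split_components` helper are hypothetical and are not part of LeRobot or GR00T.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TuneFlags:
    """Fine-tuning switches as listed in this model card (hypothetical container)."""

    tune_llm: bool = False
    tune_visual: bool = False
    tune_diffusion_model: bool = True
    tune_projector: bool = True


def split_components(flags: TuneFlags) -> tuple[list[str], list[str]]:
    """Return (trainable, frozen) component names derived from the flags."""
    trainable: list[str] = []
    frozen: list[str] = []
    for name, enabled in asdict(flags).items():
        component = name.removeprefix("tune_")  # e.g. "tune_llm" -> "llm"
        (trainable if enabled else frozen).append(component)
    return trainable, frozen


trainable, frozen = split_components(TuneFlags())
print("trainable:", trainable)
print("frozen:", frozen)
```

In an actual training run these flags would be passed to the training configuration; how that is done depends on the LeRobot branch in use.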

Training Strategy

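Approach: Full-parameter fine-tuning of the trainable components (no LoRA); the LLM and vision encoder stay frozen
Rationale: 4× H100 GPUs with 320GB total VRAM allow full-parameter updates without adapter methods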
Precision: bf16 (mixed precision training)

Training Details

Dataset Information

Parameter	Value	Description
Dataset Repository	asgard-robot/asgard_training_data_condiment	Hugging Face dataset
Dataset Version	v3.0	LeRobot format tag
Total Episodes	40	Number of demonstrations
Total Frames	31,522	Total training samples
Avg Frames/Episode	~788	Average trajectory length
Episode Duration	~26 seconds	At 30 FPS
Robot Type	so101_follower	Single-arm 6 DOF
Task	Condiment handover	Primary objective
Format	LeRobot v3.0	Parquet + MP4 videos (AV1 codec)
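
The derived rows in the table follow from the episode and frame counts. A short check of that arithmetic, using only the numbers stated above (the variable names are illustrative):

```python
TOTAL_FRAMES = 31_522
TOTAL_EPISODES = 40
FPS = 30

# Average trajectory length in frames and in seconds at the recording rate.
avg_frames = TOTAL_FRAMES / TOTAL_EPISODES
avg_seconds = avg_frames / FPS

# Total amount of demonstration data, in minutes.
total_minutes = TOTAL_FRAMES / FPS / 60

print(f"avg frames/episode: {avg_frames:.1f}")
print(f"avg episode length: {avg_seconds:.1f} s")
print(f"total demonstration time: {total_minutes:.1f} min")
```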

Training Hyperparameters

Parameter	Value	Justification
Total Training Steps	2,000	Full training cycle
Number of Epochs	~32	Effective epochs: 2,000 steps × 512 samples ÷ 31,522 frames ≈ 32.5
Checkpoints Saved	5	Steps: 400, 800, 1200, 1600, 2000
Learning Rate	1e-4	GROOT recommended value
Weight Decay	1e-5	L2 regularization
Gradient Clip Norm	1.0	Training stability
Warmup Ratio	0.05	Gradual learning rate ramp
Batch Size (per GPU)	128	Maximum VRAM utilization
Effective Batch Size	512	128 × 4 GPUs
Num Workers	16	DataLoader parallel loading
Video Backend	torchcodec	AV1 codec decoder
Mixed Precision	bf16	Memory efficient training
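
The derived values in this table (effective batch size, epoch count, checkpoint steps) and the warmup length implied by the warmup ratio can be reproduced from the listed settings. The sketch below is illustrative only: the linear warmup shape and the constant rate after warmup are assumptions, since the card does not state which scheduler or decay was used.

```python
TOTAL_FRAMES = 31_522
TOTAL_STEPS = 2_000
PER_GPU_BATCH = 128
NUM_GPUS = 4
WARMUP_RATIO = 0.05
BASE_LR = 1e-4
NUM_CHECKPOINTS = 5

effective_batch = PER_GPU_BATCH * NUM_GPUS        # samples consumed per optimizer step
steps_per_epoch = TOTAL_FRAMES / effective_batch  # optimizer steps for one pass over the data
epochs = TOTAL_STEPS / steps_per_epoch            # effective passes over the dataset
warmup_steps = round(WARMUP_RATIO * TOTAL_STEPS)  # steps spent ramping the learning rate

# Evenly spaced checkpoints, ending at the final step.
checkpoint_steps = [
    (i + 1) * TOTAL_STEPS // NUM_CHECKPOINTS for i in range(NUM_CHECKPOINTS)
]


def warmup_lr(step: int) -> float:
    """Linear warmup to BASE_LR, then constant (assumed; the decay is not documented)."""
    if step < warmup_steps:
        return BASE_LR * (step + 1) / warmup_steps
    return BASE_LR


print(f"effective batch: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch:.1f}")
print(f"effective epochs: {epochs:.1f}")
print(f"warmup steps: {warmup_steps}")
print(f"checkpoints at: {checkpoint_steps}")
```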

Hardware Configuration

Component	Specification	Utilization
GPUs	4× NVIDIA H100 PCIe	All 4 GPUs used
VRAM per GPU	80GB	~79.65GB usable
Total VRAM	320GB	Peak usage: ~60-70GB per GPU
CPUs	124 vCPUs, AMD EPYC 9554 (64-core)	Data loading
System RAM	708GB	Adequate for data loading
Storage	1.5TB ephemeral	Checkpoint storage

Usage

Load Model

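# The import path depends on the LeRobot version or branch in use.
# Assumption: recent upstream LeRobot exposes the GR00T policy as GrootPolicy;
# the ASGARD teleop control branch may use a different module path.
from lerobot.policies.groot.modeling_groot import GrootPolicy

policy = GrootPolicy.from_pretrained("asgard-robot/groot-condiment-handover")
policy.eval()  # inference mode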

Run Inference

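import torch

# The model expects an observation dict with:
# - observation.images.wrist1: RGB camera (640×480×3)
# - observation.images.realsense: RGB camera (640×480×3)
# - observation.state: 6D state (5 joint positions + gripper)

# select_action is LeRobot's inference entry point;
# calling policy(observation) directly runs the training forward pass instead.
with torch.no_grad():
    action = policy.select_action(observation)
# Returns: 6D action (5 joint positions + gripper)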

Action Space

The model outputs actions for 6 degrees of freedom:

shoulder_pan.pos
shoulder_lift.pos
elbow_flex.pos
wrist_flex.pos
wrist_roll.pos
gripper.pos
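
A minimal sketch for turning a 6-element action vector into named joint targets. It assumes the vector follows the order listed above; the `name_action` helper and the example values are hypothetical.

```python
from typing import Sequence

# Joint names in the order listed in this card (ordering of the output vector is assumed).
ACTION_NAMES = (
    "shoulder_pan.pos",
    "shoulder_lift.pos",
    "elbow_flex.pos",
    "wrist_flex.pos",
    "wrist_roll.pos",
    "gripper.pos",
)


def name_action(action: Sequence[float]) -> dict[str, float]:
    """Map a flat 6-element action vector to a {joint_name: value} dict."""
    if len(action) != len(ACTION_NAMES):
        raise ValueError(
            f"expected {len(ACTION_NAMES)} values, got {len(action)}"
        )
    return {name: float(value) for name, value in zip(ACTION_NAMES, action)}


# Illustrative values only, not real model output.
example = name_action([0.0, -0.5, 0.8, 0.1, 0.0, 0.3])
for joint, value in example.items():
    print(f"{joint}: {value:+.2f}")
```

If the policy returns a batched tensor, it would first need to be reduced to a flat list of six numbers (for a PyTorch tensor, for example, `action.squeeze(0).tolist()`).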

Citation

@software{groot_condiment_model_2024,
  author = {ASGARD Team},
  title = {GROOT Condiment Handover Model - Step 2000},
  model = {asgard-robot/groot-condiment-handover},
  year = {2025},
  month = {October},
  checkpoint = {2000},
  base_model = {nvidia/GR00T-N1.5-3B},
  dataset = {asgard-robot/asgard_training_data_condiment},
  training_hardware = {4× NVIDIA H100 PCIe GPUs}
}

Acknowledgments

Base Model: NVIDIA GR00T N1.5-3B
Framework: LeRobot (ASGARD teleop control branch)
Dataset: ASGARD Robot Datasets
Hardware: Shadeform H100 Multi-GPU Cluster