---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- LoRA
- video
- video generation
base_model:
- Wan-AI/Wan2.2-I2V-A14B
- Wan-AI/Wan2.2-TI2V-5B
- Wan-AI/Wan2.1-I2V-14B-720P
pipeline_tags:
- image-to-video
- text-to-video
library_name: diffusers
---

# 🎨 LightVAE

## ⚡ Efficient Video Autoencoder (VAE) Model Collection

*From the official models to LightX2V's distilled, optimized versions - balancing quality, speed, and memory*

![img_lightx2v](https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/tTnp8-ARpj3wGxfo5P55c.png)

---

[![🤗 HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/lightx2v)
[![GitHub](https://img.shields.io/badge/GitHub-LightX2V-blue?logo=github)](https://github.com/ModelTC/LightX2V)
[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE)

---

The LightX2V team has applied a series of deep optimizations to the VAE stage, deriving two model families, **LightVAE** and **LightTAE**, which significantly reduce memory consumption and speed up inference while maintaining high quality.

## 💡 Core Advantages
### 📊 Official VAE

**Features**: Highest Quality ⭐⭐⭐⭐⭐

- ✅ Best reconstruction accuracy
- ✅ Complete detail preservation
- ❌ Large memory usage (~8-12 GB)
- ❌ Slow inference speed

### 🚀 Open Source TAE Series

**Features**: Fastest Speed ⚡⚡⚡⚡⚡

- ✅ Minimal memory usage (~0.4 GB)
- ✅ Extremely fast inference
- ❌ Average quality ⭐⭐⭐
- ❌ Potential detail loss
### 🎯 **LightVAE Series** (Our Optimization)

**Features**: Best Balanced Solution ⚖️

- ✅ Uses **Causal 3D Conv** (same as the official VAE)
- ✅ **Quality close to official** ⭐⭐⭐⭐
- ✅ Memory reduced by **~50%** (~4-5 GB)
- ✅ Speed increased by **2-3x**
- ✅ Balances quality, speed, and memory 🏆

### ⚡ **LightTAE Series** (Our Optimization)

**Features**: Fast Speed + Good Quality 🏆

- ✅ Minimal memory usage (~0.4 GB)
- ✅ Extremely fast inference
- ✅ **Quality close to official** ⭐⭐⭐⭐
- ✅ **Significantly surpasses open source TAE**
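The "Causal 3D Conv" mentioned above refers to 3D convolutions whose temporal padding is one-sided, so the output for frame *t* never depends on frames after *t*. As a rough, hedged illustration of the general technique (not the actual WanVAE/LightVAE block, whose internals this card does not specify), a minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F
from torch import nn

# Minimal sketch of a causal 3D convolution: all temporal padding goes to the
# "past" side, so frame t sees only frames <= t. Illustrative only; the real
# WanVAE/LightVAE blocks are more involved.
class CausalConv3d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        self.pad_t = kernel - 1  # full temporal receptive field padded on the left
        # Spatial dims keep symmetric "same" padding; temporal padding is manual.
        self.conv = nn.Conv3d(in_ch, out_ch, kernel, padding=(0, kernel // 2, kernel // 2))

    def forward(self, x):  # x: (B, C, T, H, W)
        # F.pad order for 5D input: (W_left, W_right, H_left, H_right, T_left, T_right)
        x = F.pad(x, (0, 0, 0, 0, self.pad_t, 0))  # pad past frames only
        return self.conv(x)

x = torch.randn(1, 3, 81, 64, 64)       # an 81-frame clip, as in the benchmarks below
y = CausalConv3d(3, 16)(x)
print(y.shape)                            # torch.Size([1, 16, 81, 64, 64])
```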
---

## 📦 Available Models

### 🎯 Wan2.1 Series VAE

| Model Name | Type | Architecture | Description |
|:--------|:-----|:-----|:-----|
| `Wan2.1_VAE` | Official VAE | Causal Conv3D | Wan2.1 official video VAE model<br>**Highest quality, large memory, slow speed** |
| `taew2_1` | Open Source Small AE | Conv2D | Open source model based on [taeHV](https://github.com/madebyollin/taeHV)<br>**Small memory, fast speed, average quality** |
| **`lighttaew2_1`** | **LightTAE Series** | Conv2D | **Our distilled, optimized version based on `taew2_1`**<br>**Small memory, fast speed, quality close to official** ✨ |
| **`lightvaew2_1`** | **LightVAE Series** | Causal Conv3D | **The WanVAE2.1 architecture pruned by 75%, then trained and distilled by us**<br>**Best balance: high quality + low memory + fast speed** 🏆 |

### 🎯 Wan2.2 Series VAE

| Model Name | Type | Architecture | Description |
|:--------|:-----|:-----|:-----|
| `Wan2.2_VAE` | Official VAE | Causal Conv3D | Wan2.2 official video VAE model<br>**Highest quality, large memory, slow speed** |
| `taew2_2` | Open Source Small AE | Conv2D | Open source model based on [taeHV](https://github.com/madebyollin/taeHV)<br>**Small memory, fast speed, average quality** |
| **`lighttaew2_2`** | **LightTAE Series** | Conv2D | **Our distilled, optimized version based on `taew2_2`**<br>**Small memory, fast speed, quality close to official** ✨ |

---

## 📊 Wan2.1 Series Performance Comparison

- **Precision**: BF16
- **Test Hardware**: NVIDIA H100

### Video Reconstruction (5 s, 81-frame video)

| Speed | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
|:-----|:--------------|:------------|:---------------------|:-------------|
| **Encode Speed** | 4.1721 s | 0.3956 s | 0.3956 s | 1.5014 s |
| **Decode Speed** | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697 s |

| GPU Memory | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
|:-----|:--------------|:------------|:---------------------|:-------------|
| **Encode Memory** | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
| **Decode Memory** | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |

### Video Generation

Task: s2v (speech-to-video)

Model: seko-talk
*(Sample videos: Wan2.1_VAE · taew2_1 · lighttaew2_1 · lightvaew2_1)*
## 📊 Wan2.2 Series Performance Comparison

- **Precision**: BF16
- **Test Hardware**: NVIDIA H100

### Video Reconstruction

| Speed | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
|:-----|:--------------|:------------|:---------------------|
| **Encode Speed** | 1.1369 s | 0.3499 s | 0.3499 s |
| **Decode Speed** | 3.1268 s | 0.0891 s | 0.0891 s |

| GPU Memory | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
|:-----|:--------------|:------------|:---------------------|
| **Encode Memory** | 6.1991 GB | 0.0064 GB | 0.0064 GB |
| **Decode Memory** | 12.3487 GB | 0.4120 GB | 0.4120 GB |

### Video Generation

Task: t2v (text-to-video)

Model: [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
*(Sample videos: Wan2.2_VAE · taew2_2 · lighttaew2_2)*
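For reference, measurements of the kind shown in the two comparisons above can be made with a simple CUDA-event timing loop. Below is a minimal, self-contained sketch: `DummyVAE`, the clip resolution, and the `encode`/`decode` interface are placeholder assumptions (the card does not pin them down), so substitute a real VAE wrapper from LightX2V.

```python
# Sketch of how per-call encode/decode latency and peak GPU memory can be
# measured. DummyVAE and the input size are stand-ins, not LightX2V's API.
import torch
import torch.nn.functional as F


class DummyVAE(torch.nn.Module):
    """Stand-in with the rough shape behavior of a video autoencoder."""

    def encode(self, x):                  # (B, C, T, H, W) -> downsampled latent
        return F.avg_pool3d(x, kernel_size=2)

    def decode(self, z):                  # latent -> upsampled video
        return F.interpolate(z, scale_factor=2)


def measure(fn, x):
    """Time one call of fn(x) with CUDA events; report seconds and peak GB."""
    with torch.no_grad():
        fn(x)                             # warm-up (allocator, kernel launch)
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        out = fn(x)
        end.record()
        torch.cuda.synchronize()
    return out, start.elapsed_time(end) / 1e3, torch.cuda.max_memory_allocated() / 1024**3


vae = DummyVAE().cuda().to(torch.bfloat16)
# 81-frame clip as in the tables; 480x832 is an arbitrary example resolution.
video = torch.randn(1, 3, 81, 480, 832, device="cuda", dtype=torch.bfloat16)
latent, enc_t, enc_mem = measure(vae.encode, video)
_, dec_t, dec_mem = measure(vae.decode, latent)
print(f"encode {enc_t:.4f} s / {enc_mem:.4f} GB | decode {dec_t:.4f} s / {dec_mem:.4f} GB")
```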
## 🎯 Model Selection Recommendations

### Selection by Use Case
#### 🏆 Pursuing Best Quality

**Recommended**: `Wan2.1_VAE` / `Wan2.2_VAE`

- ✅ Official model, the quality ceiling
- ✅ Highest reconstruction accuracy
- ✅ Suitable for final product output
- ⚠️ **Large memory usage** (~8-12 GB)
- ⚠️ **Slow inference speed**

#### ⚖️ **Best Balance** 🏆

**Recommended**: **`lightvaew2_1`**

- ✅ **Uses Causal 3D Conv** (same as the official VAE)
- ✅ **Excellent quality**, close to official ⭐⭐⭐⭐
- ✅ Memory reduced by **~50%** (~4-5 GB)
- ✅ Speed increased by **2-3x**

**Use Cases**: Daily production; strongly recommended ⭐

#### ⚡ **Speed + Quality Balance** ✨

**Recommended**: **`lighttaew2_1`** / **`lighttaew2_2`**

- ✅ Extremely low memory usage (~0.4 GB)
- ✅ Extremely fast inference
- ✅ **Quality significantly surpasses open source TAE**
- ✅ **Close to official quality** ⭐⭐⭐⭐

**Use Cases**: Development testing, rapid iteration
### 🔥 Our Optimization Results Comparison

| Comparison | Open Source TAE | **LightTAE (Ours)** | Official VAE | **LightVAE (Ours)** |
|:------|:--------|:---------------------|:---------|:---------------------|
| **Architecture** | Conv2D | Conv2D | Causal Conv3D | Causal Conv3D |
| **Memory Usage** | Minimal (~0.4 GB) | Minimal (~0.4 GB) | Large (~8-12 GB) | Medium (~4-5 GB) |
| **Inference Speed** | Extremely Fast ⚡⚡⚡⚡⚡ | Extremely Fast ⚡⚡⚡⚡⚡ | Slow ⚡⚡ | Fast ⚡⚡⚡⚡ |
| **Generation Quality** | Average ⭐⭐⭐ | **Close to Official** ⭐⭐⭐⭐ | Highest ⭐⭐⭐⭐⭐ | **Close to Official** ⭐⭐⭐⭐ |

## 📑 Todo List

- [x] LightX2V integration
- [x] ComfyUI integration
- [ ] Training & distillation code

## 🚀 Usage

### Download VAE Models

```bash
# Download all VAE models in this collection
huggingface-cli download lightx2v/Autoencoders \
    --local-dir ./models/vae/
```

### 🧪 Video Reconstruction Test

We provide a standalone script, `vid_recon.py`, for testing VAE models independently. It reads a video, encodes it with the VAE, then decodes it back so you can inspect reconstruction quality.

**Script Location**: `LightX2V/lightx2v/models/video_encoders/hf/vid_recon.py`

```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
```

**1. Test Official VAE (Wan2.1)**

```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/Wan2.1_VAE.pth \
    --model_type vaew2_1 \
    --device cuda \
    --dtype bfloat16
```

**2. Test Official VAE (Wan2.2)**

```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/Wan2.2_VAE.pth \
    --model_type vaew2_2 \
    --device cuda \
    --dtype bfloat16
```

**3. Test LightTAE (Wan2.1)**

```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lighttaew2_1.pth \
    --model_type taew2_1 \
    --device cuda \
    --dtype bfloat16
```

**4. Test LightTAE (Wan2.2)**

```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lighttaew2_2.pth \
    --model_type taew2_2 \
    --device cuda \
    --dtype bfloat16
```

**5. Test LightVAE (Wan2.1)**

```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lightvaew2_1.pth \
    --model_type vaew2_1 \
    --device cuda \
    --dtype bfloat16 \
    --use_lightvae
```

**6. Test TAE (Wan2.1)**

```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/taew2_1.pth \
    --model_type taew2_1 \
    --device cuda \
    --dtype bfloat16
```

**7. Test TAE (Wan2.2)**

```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/taew2_2.pth \
    --model_type taew2_2 \
    --device cuda \
    --dtype bfloat16
```

### Use in LightX2V

Specify the VAE path in the configuration file.

**Using the Official VAE series:**

```json
{
    "vae_path": "./models/vae/Wan2.1_VAE.pth"
}
```

**Using the LightVAE series:**

```json
{
    "use_lightvae": true,
    "vae_path": "./models/vae/lightvaew2_1.pth"
}
```

**Using the LightTAE series:**

```json
{
    "use_tae": true,
    "need_scaled": true,
    "tae_path": "./models/vae/lighttaew2_1.pth"
}
```

**Using the TAE series:**

```json
{
    "use_tae": true,
    "tae_path": "./models/vae/taew2_1.pth"
}
```

Then run the inference script:

```bash
cd LightX2V/scripts
bash wan/run_wan_i2v.sh  # or another inference script
```

### Use in ComfyUI

Please refer to https://github.com/ModelTC/ComfyUI-LightVAE
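Whichever frontend you use, the checkpoints can also be fetched from Python instead of `huggingface-cli`. A minimal sketch using `huggingface_hub`; the `*.pth` filter is an assumption based on the file names used throughout this card:

```python
# Sketch: programmatic alternative to the huggingface-cli download shown above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lightx2v/Autoencoders",  # repo referenced in the Usage section
    local_dir="./models/vae",
    allow_patterns=["*.pth"],         # assumed extension; drop to mirror the whole repo
)
```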
## ⚠️ Important Notes

### 1. Compatibility

- Wan2.1 series VAEs only work with Wan2.1 backbone models
- Wan2.2 series VAEs only work with Wan2.2 backbone models
- Do not mix VAEs and backbone models from different versions

## 📚 Related Resources

### Documentation Links

- **LightX2V Quick Start**: [Quick Start Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/quickstart.html)
- **Model Structure Description**: [Model Structure Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/model_structure.html)
- **taeHV Project**: [GitHub - madebyollin/taeHV](https://github.com/madebyollin/taeHV)

### Related Models

- **Wan2.1 Backbone Models**: [Wan-AI Model Collection](https://huggingface.co/Wan-AI)
- **Wan2.2 Backbone Models**: [Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
- **LightX2V Optimized Models**: [lightx2v Model Collection](https://huggingface.co/lightx2v)

---

## 🤝 Community & Support

- **GitHub Issues**: https://github.com/ModelTC/LightX2V/issues
- **HuggingFace**: https://huggingface.co/lightx2v
- **LightX2V Homepage**: https://github.com/ModelTC/LightX2V

If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)!