|
|
--- |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- diffusion-single-file |
|
|
- comfyui |
|
|
- distillation |
|
|
- LoRA |
|
|
- video |
|
|
- video generation
|
|
base_model: |
|
|
- Wan-AI/Wan2.2-I2V-A14B |
|
|
- Wan-AI/Wan2.2-TI2V-5B |
|
|
- Wan-AI/Wan2.1-I2V-14B-720P |
|
|
pipeline_tag: image-to-video
|
|
library_name: diffusers |
|
|
--- |
|
|
# 🎨 LightVAE
|
|
|
|
|
## ⚡ Efficient Video Autoencoder (VAE) Model Collection
|
|
|
|
|
*From official models to LightX2V distilled and optimized versions: balancing quality, speed, and memory*
|
|
 |
|
|
|
|
|
--- |
|
|
|
|
|
[](https://huggingface.co/lightx2v) |
|
|
[](https://github.com/ModelTC/LightX2V) |
|
|
[](LICENSE) |
|
|
|
|
|
--- |
|
|
|
|
|
For the VAE, the LightX2V team has performed a series of deep optimizations, producing two model families, **LightVAE** and **LightTAE**, which significantly reduce memory consumption and speed up inference while maintaining high quality.
|
|
|
|
|
## 💡 Core Advantages
|
|
|
|
|
<table> |
|
|
<tr> |
|
|
<td width="50%"> |
|
|
|
|
|
### 🏆 Official VAE
|
|
**Features**: Highest Quality ⭐⭐⭐⭐⭐
|
|
|
|
|
✅ Best reconstruction accuracy
Best reconstruction accuracy |
|
|
✅ Complete detail preservation
Complete detail preservation |
|
|
❌ Large memory usage (~8-12 GB)
|
|
❌ Slow inference speed
|
|
|
|
|
</td> |
|
|
<td width="50%"> |
|
|
|
|
|
### 🚀 Open Source TAE Series
|
|
**Features**: Fastest Speed ⚡⚡⚡⚡⚡
|
|
|
|
|
✅ Minimal memory usage (~0.4 GB)
Minimal memory usage (~0.4 GB) |
|
|
✅ Extremely fast inference
Extremely fast inference |
|
|
❌ Average quality ⭐⭐⭐
|
|
❌ Potential detail loss
|
|
|
|
|
</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td width="50%"> |
|
|
|
|
|
### 🎯 **LightVAE Series** (Our Optimization)
|
|
**Features**: Best Balanced Solution ⚖️
|
|
|
|
|
✅ Uses **Causal 3D Conv** (same as official)
Uses **Causal 3D Conv** (same as official) |
|
|
✅ **Quality close to official** ⭐⭐⭐⭐
**Quality close to official** ββββ |
|
|
✅ Memory reduced by **~50%** (~4-5 GB)
Memory reduced by **~50%** (~4-5 GB) |
|
|
✅ Speed increased by **2-3x**
Speed increased by **2-3x** |
|
|
✅ Balances quality, speed, and memory 🎉
Balances quality, speed, and memory π |
|
|
|
|
|
</td> |
|
|
<td width="50%"> |
|
|
|
|
|
### ⚡ **LightTAE Series** (Our Optimization)
|
|
**Features**: Fast Speed + Good Quality 🚀
|
|
|
|
|
✅ Minimal memory usage (~0.4 GB)
Minimal memory usage (~0.4 GB) |
|
|
✅ Extremely fast inference
Extremely fast inference |
|
|
✅ **Quality close to official** ⭐⭐⭐⭐
**Quality close to official** ββββ |
|
|
✅ **Significantly surpasses open source TAE**
**Significantly surpasses open source TAE** |
|
|
|
|
|
</td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
--- |
|
|
|
|
|
## 📦 Available Models
|
|
|
|
|
### 🎯 Wan2.1 Series VAE
|
|
|
|
|
| Model Name | Type | Architecture | Description |
|:--------|:-----|:-----|:-----|
| `Wan2.1_VAE` | Official VAE | Causal Conv3D | Wan2.1 official video VAE model<br>**Highest quality, large memory, slow speed** |
| `taew2_1` | Open Source Small AE | Conv2D | Open source model based on [taeHV](https://github.com/madebyollin/taeHV)<br>**Small memory, fast speed, average quality** |
| **`lighttaew2_1`** | **LightTAE Series** | Conv2D | **Our distillation-optimized version of `taew2_1`**<br>**Small memory, fast speed, quality close to official** ✨ |
| **`lightvaew2_1`** | **LightVAE Series** | Causal Conv3D | **Wan2.1 VAE architecture pruned by 75%, then retrained and distilled by us**<br>**Best balance: high quality + low memory + fast speed** 🚀 |
|
|
|
|
|
### 🎯 Wan2.2 Series VAE
|
|
|
|
|
| Model Name | Type | Architecture | Description |
|:--------|:-----|:-----|:-----|
| `Wan2.2_VAE` | Official VAE | Causal Conv3D | Wan2.2 official video VAE model<br>**Highest quality, large memory, slow speed** |
| `taew2_2` | Open Source Small AE | Conv2D | Open source model based on [taeHV](https://github.com/madebyollin/taeHV)<br>**Small memory, fast speed, average quality** |
| **`lighttaew2_2`** | **LightTAE Series** | Conv2D | **Our distillation-optimized version of `taew2_2`**<br>**Small memory, fast speed, quality close to official** ✨ |
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
## 📊 Wan2.1 Series Performance Comparison
|
|
- **Precision**: BF16 |
|
|
- **Test Hardware**: NVIDIA H100 |
|
|
|
|
|
### Video Reconstruction (5-second, 81-frame video)
|
|
|
|
|
| Speed | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
|:-----|:--------------|:------------|:---------------------|:-------------|
| **Encode Speed** | 4.1721 s | 0.3956 s | 0.3956 s | 1.5014 s |
| **Decode Speed** | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697 s |
|
|
|
|
|
| GPU Memory | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
|:-----|:--------------|:------------|:---------------------|:-------------|
| **Encode Memory** | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
| **Decode Memory** | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |
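
For reference, the per-pass timings and peak-memory figures above can be reproduced with a short PyTorch harness along these lines (a minimal sketch; the `vae.encode` / `vae.decode` method names are assumptions for illustration, not the exact LightX2V benchmark code):

```python
import time

import torch


@torch.no_grad()
def profile_vae(vae, video):
    """Time one encode and one decode pass and record peak GPU memory.

    `video` is assumed to be a [B, C, T, H, W] tensor already on the GPU
    (e.g. in BF16); `vae.encode` / `vae.decode` are illustrative names.
    """
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    latent = vae.encode(video)
    torch.cuda.synchronize()
    encode_s = time.perf_counter() - start
    encode_gb = torch.cuda.max_memory_allocated() / 1024**3

    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    vae.decode(latent)
    torch.cuda.synchronize()
    decode_s = time.perf_counter() - start
    decode_gb = torch.cuda.max_memory_allocated() / 1024**3

    return encode_s, encode_gb, decode_s, decode_gb
```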
|
|
|
|
|
### Video Generation |
|
|
|
|
|
Task: s2v (speech-to-video)
|
|
Model: seko-talk |
|
|
|
|
|
<table> |
|
|
<tr> |
|
|
<td width="25%" align="center"> |
|
|
<strong>Wan2.1_VAE</strong><br> |
|
|
<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/6l-P-3Hr9JKL3xgUyJXWJ.mp4"></video> |
|
|
</td> |
|
|
<td width="25%" align="center"> |
|
|
<strong>taew2_1</strong><br> |
|
|
<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/rcVHrCKB4nRAs2VSjJd2d.mp4"></video> |
|
|
</td> |
|
|
<td width="25%" align="center"> |
|
|
<strong>lighttaew2_1</strong><br> |
|
|
<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/Wq9p9Z7NDYwaKw4SqVbYT.mp4"></video> |
|
|
</td> |
|
|
<td width="25%" align="center"> |
|
|
<strong>lightvaew2_1</strong><br> |
|
|
<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/NpKOzFcvsHzSFfFACzUKP.mp4"></video> |
|
|
</td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
## 📊 Wan2.2 Series Performance Comparison
|
|
- **Precision**: BF16 |
|
|
- **Test Hardware**: NVIDIA H100 |
|
|
|
|
|
### Video Reconstruction |
|
|
| Speed | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
|:-----|:--------------|:------------|:---------------------|
| **Encode Speed** | 1.1369 s | 0.3499 s | 0.3499 s |
| **Decode Speed** | 3.1268 s | 0.0891 s | 0.0891 s |
|
|
|
|
|
| GPU Memory | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
|:-----|:--------------|:------------|:---------------------|
| **Encode Memory** | 6.1991 GB | 0.0064 GB | 0.0064 GB |
| **Decode Memory** | 12.3487 GB | 0.4120 GB | 0.4120 GB |
|
|
|
|
|
|
|
|
### Video Generation |
|
|
|
|
|
Task: t2v (text-to-video)
|
|
Model: [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B) |
|
|
|
|
|
<table> |
|
|
<tr> |
|
|
<td width="33%" align="center"> |
|
|
<strong>Wan2.2_VAE</strong><br> |
|
|
<video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/KUY7Ifz9gFJqDjWga6A53.mp4"></video> |
|
|
</td> |
|
|
<td width="33%" align="center"> |
|
|
<strong>taew2_2</strong><br> |
|
|
<video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/OYA8VfNlCv_hBkj_n_OMl.mp4"></video> |
|
|
</td> |
|
|
<td width="33%" align="center"> |
|
|
<strong>lighttaew2_2</strong><br> |
|
|
<video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/gaHRr6uuAF0NlH4YlMbHO.mp4"></video> |
|
|
</td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
|
|
|
## 🎯 Model Selection Recommendations
|
|
|
|
|
### Selection by Use Case |
|
|
|
|
|
<table> |
|
|
<tr> |
|
|
<td width="33%"> |
|
|
|
|
|
#### 🏆 Pursuing Best Quality
|
|
**Recommended**: `Wan2.1_VAE` / `Wan2.2_VAE` |
|
|
|
|
|
- ✅ Official model, quality ceiling
Official model, quality ceiling |
|
|
- ✅ Highest reconstruction accuracy
Highest reconstruction accuracy |
|
|
- ✅ Suitable for final product output
Suitable for final product output |
|
|
- ⚠️ **Large memory usage** (~8-12 GB)
|
|
- ⚠️ **Slow inference speed**
|
|
|
|
|
</td> |
|
|
<td width="33%"> |
|
|
|
|
|
#### ⚖️ **Best Balance** 🚀
|
|
**Recommended**: **`lightvaew2_1`** |
|
|
|
|
|
- ✅ **Uses Causal 3D Conv** (same as official)
**Uses Causal 3D Conv** (same as official) |
|
|
- ✅ **Excellent quality**, close to official ⭐⭐⭐⭐
**Excellent quality**, close to official |
|
|
- ✅ Memory reduced by **~50%** (~4-5 GB)
Memory reduced by **~50%** (~4-5 GB) |
|
|
- ✅ Speed increased by **2-3x**
Speed increased by **2-3x** |
|
|
**Close to official quality** ββββ |
|
|
|
|
|
**Use Cases**: Daily production, strongly recommended ⭐
|
|
|
|
|
</td> |
|
|
<td width="33%"> |
|
|
|
|
|
#### ⚡ **Speed + Quality Balance** ✨
|
|
**Recommended**: **`lighttaew2_1`** / **`lighttaew2_2`** |
|
|
|
|
|
- ✅ Extremely low memory usage (~0.4 GB)
Extremely low memory usage (~0.4 GB) |
|
|
- ✅ Extremely fast inference
Extremely fast inference |
|
|
- ✅ **Quality significantly surpasses open source TAE**
**Quality significantly surpasses open source TAE** |
|
|
- ✅ **Close to official quality** ⭐⭐⭐⭐
**Close to official quality** ββββ |
|
|
|
|
|
**Use Cases**: Development testing, rapid iteration |
|
|
|
|
|
</td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
### 🔥 Our Optimization Results Comparison
|
|
|
|
|
| Comparison | Open Source TAE | **LightTAE (Ours)** | Official VAE | **LightVAE (Ours)** |
|:------|:--------|:---------------------|:---------|:---------------------|
| **Architecture** | Conv2D | Conv2D | Causal Conv3D | Causal Conv3D |
| **Memory Usage** | Minimal (~0.4 GB) | Minimal (~0.4 GB) | Large (~8-12 GB) | Medium (~4-5 GB) |
| **Inference Speed** | Extremely Fast ⚡⚡⚡⚡⚡ | Extremely Fast ⚡⚡⚡⚡⚡ | Slow ⚡⚡ | Fast ⚡⚡⚡⚡ |
| **Generation Quality** | Average ⭐⭐⭐ | **Close to Official** ⭐⭐⭐⭐ | Highest ⭐⭐⭐⭐⭐ | **Close to Official** ⭐⭐⭐⭐ |
|
|
|
|
|
## 📋 Todo List
|
|
- [x] LightX2V integration |
|
|
- [x] ComfyUI integration |
|
|
- [ ] Training & Distillation Code |
|
|
|
|
|
## 🚀 Usage
|
|
|
|
|
### Download VAE Models |
|
|
|
|
|
```bash
# Download all VAE checkpoints from the lightx2v/Autoencoders repository
huggingface-cli download lightx2v/Autoencoders \
    --local-dir ./models/vae/
```
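
The same files can also be fetched programmatically with `huggingface_hub`, equivalent to the CLI command above:

```python
from huggingface_hub import snapshot_download

# Fetch every VAE checkpoint in the lightx2v/Autoencoders repository
snapshot_download(repo_id="lightx2v/Autoencoders", local_dir="./models/vae/")
```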
|
|
|
|
|
### 🧪 Video Reconstruction Test
|
|
|
|
|
We provide a standalone script, `vid_recon.py`, for testing VAE models independently. The script reads a video, encodes it with the VAE, and then decodes it back so that reconstruction quality can be verified.
|
|
|
|
|
**Script Location**: `LightX2V/lightx2v/models/video_encoders/hf/vid_recon.py` |
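
Conceptually, the script performs the following round trip (a simplified sketch with a generic `vae` object and illustrative method names; see `vid_recon.py` for the actual model loading and CLI handling):

```python
import torch


@torch.no_grad()
def reconstruct(vae, video):
    """Encode a video to latents, decode it back, and report the error.

    `video` is assumed to be a [B, C, T, H, W] tensor scaled to [-1, 1];
    the `encode` / `decode` method names are illustrative, not the exact
    lightx2v API.
    """
    latent = vae.encode(video)  # compress to the latent space
    recon = vae.decode(latent)  # reconstruct back to pixel space
    mse = torch.mean((recon - video) ** 2).item()
    print(f"Reconstruction MSE: {mse:.6f}")
    return recon
```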
|
|
|
|
|
```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
```
|
|
|
|
|
**1. Test Official VAE (Wan2.1)** |
|
|
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/Wan2.1_VAE.pth \
    --model_type vaew2_1 \
    --device cuda \
    --dtype bfloat16
```
|
|
|
|
|
**2. Test Official VAE (Wan2.2)** |
|
|
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/Wan2.2_VAE.pth \
    --model_type vaew2_2 \
    --device cuda \
    --dtype bfloat16
```
|
|
|
|
|
**3. Test LightTAE (Wan2.1)** |
|
|
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lighttaew2_1.pth \
    --model_type taew2_1 \
    --device cuda \
    --dtype bfloat16
```
|
|
|
|
|
**4. Test LightTAE (Wan2.2)** |
|
|
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lighttaew2_2.pth \
    --model_type taew2_2 \
    --device cuda \
    --dtype bfloat16
```
|
|
|
|
|
**5. Test LightVAE (Wan2.1)** |
|
|
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lightvaew2_1.pth \
    --model_type vaew2_1 \
    --device cuda \
    --dtype bfloat16 \
    --use_lightvae
```
|
|
|
|
|
|
|
|
**6. Test TAE (Wan2.1)** |
|
|
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/taew2_1.pth \
    --model_type taew2_1 \
    --device cuda \
    --dtype bfloat16
```
|
|
|
|
|
**7. Test TAE (Wan2.2)** |
|
|
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/taew2_2.pth \
    --model_type taew2_2 \
    --device cuda \
    --dtype bfloat16
```
|
|
|
|
|
### Use in LightX2V |
|
|
|
|
|
Specify the VAE path in the configuration file: |
|
|
|
|
|
|
|
|
**Using Official VAE Series:** |
|
|
```json
{
    "vae_path": "./models/vae/Wan2.1_VAE.pth"
}
```
|
|
|
|
|
**Using LightVAE Series:** |
|
|
```json
{
    "use_lightvae": true,
    "vae_path": "./models/vae/lightvaew2_1.pth"
}
```
|
|
|
|
|
|
|
|
**Using LightTAE Series:** |
|
|
```json
{
    "use_tae": true,
    "need_scaled": true,
    "tae_path": "./models/vae/lighttaew2_1.pth"
}
```
|
|
|
|
|
|
|
|
**Using TAE Series:** |
|
|
```json
{
    "use_tae": true,
    "tae_path": "./models/vae/taew2_1.pth"
}
```
|
|
|
|
|
Then run the inference script: |
|
|
|
|
|
```bash
cd LightX2V/scripts
bash wan/run_wan_i2v.sh  # or other inference scripts
```
|
|
|
|
|
### Use in ComfyUI |
|
|
|
|
|
Please refer to [ModelTC/ComfyUI-LightVAE](https://github.com/ModelTC/ComfyUI-LightVAE).
|
|
|
|
|
## ⚠️ Important Notes
|
|
|
|
|
### Compatibility
|
|
- Wan2.1 series VAE only works with Wan2.1 backbone models |
|
|
- Wan2.2 series VAE only works with Wan2.2 backbone models |
|
|
- Do not mix different versions of VAE and backbone models |
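
As a quick sanity check in your own pipeline, these pairings can be written down as a simple lookup; the helper below is illustrative and not part of LightX2V:

```python
# Valid VAE checkpoints per backbone generation (from the tables above)
COMPATIBLE_VAES = {
    "wan2.1": {"Wan2.1_VAE", "taew2_1", "lighttaew2_1", "lightvaew2_1"},
    "wan2.2": {"Wan2.2_VAE", "taew2_2", "lighttaew2_2"},
}


def check_vae(backbone: str, vae_name: str) -> None:
    """Raise if `vae_name` cannot be paired with the given backbone."""
    if vae_name not in COMPATIBLE_VAES[backbone]:
        raise ValueError(f"{vae_name} is not compatible with {backbone} backbones")
```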
|
|
|
|
|
## 🔗 Related Resources
|
|
|
|
|
### Documentation Links |
|
|
- **LightX2V Quick Start**: [Quick Start Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/quickstart.html) |
|
|
- **Model Structure Description**: [Model Structure Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/model_structure.html) |
|
|
- **taeHV Project**: [GitHub - madebyollin/taeHV](https://github.com/madebyollin/taeHV) |
|
|
|
|
|
### Related Models |
|
|
- **Wan2.1 Backbone Models**: [Wan-AI Model Collection](https://huggingface.co/Wan-AI) |
|
|
- **Wan2.2 Backbone Models**: [Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B) |
|
|
- **LightX2V Optimized Models**: [lightx2v Model Collection](https://huggingface.co/lightx2v) |
|
|
|
|
|
--- |
|
|
|
|
|
## 🤝 Community & Support
|
|
|
|
|
- **GitHub Issues**: https://github.com/ModelTC/LightX2V/issues |
|
|
- **HuggingFace**: https://huggingface.co/lightx2v |
|
|
- **LightX2V Homepage**: https://github.com/ModelTC/LightX2V |
|
|
|
|
|
If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)!