Paper: PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation (arXiv:2403.04692)
This model is a LoRA fine-tune of PixArt-alpha/PixArt-Sigma-XL-2-1024-MS, trained on the lambdalabs/naruto-blip-captions dataset to generate anime-style images.
from diffusers import PixArtSigmaPipeline
import torch

# Load the base pipeline in half precision and move it to the GPU
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    torch_dtype=torch.float16
).to("cuda")

# Load LoRA weights (requires a diffusers version whose
# PixArtSigmaPipeline provides load_lora_weights)
pipe.load_lora_weights("matthew816/pixart-lora-anime")

# Generate an image from text
prompt = "anime style, a cat sitting on a chair"
image = pipe(
    prompt=prompt,
    num_inference_steps=20,
    guidance_scale=4.5
).images[0]

# Save the result to disk
image.save("generated_anime_image.png")
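To generate several images reproducibly, the call above can be wrapped in a small helper. This is a minimal sketch, not part of the model's own tooling: `build_prompt` and `generate_batch` are hypothetical names, and prefixing prompts with "anime style" is an assumption taken from the example prompt in the snippet above. The `pipe` argument is expected to be the pipeline loaded there.

```python
from typing import List


def build_prompt(subject: str, style_prefix: str = "anime style") -> str:
    """Prepend the style prefix unless the prompt already starts with it.

    Hypothetical helper: the prefix mirrors the example prompt in this card.
    """
    subject = subject.strip()
    if subject.lower().startswith(style_prefix.lower()):
        return subject
    return f"{style_prefix}, {subject}"


def generate_batch(
    pipe,
    subjects: List[str],
    seed: int = 0,
    num_inference_steps: int = 20,
    guidance_scale: float = 4.5,
):
    """Generate one image per subject, each with its own fixed seed.

    Hypothetical helper: `pipe` is assumed to be the loaded PixArtSigmaPipeline.
    """
    # Imported lazily so build_prompt can be used without torch installed.
    import torch

    images = []
    for offset, subject in enumerate(subjects):
        # A seeded generator makes each image repeatable across runs.
        generator = torch.Generator("cpu").manual_seed(seed + offset)
        result = pipe(
            prompt=build_prompt(subject),
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            generator=generator,
        )
        images.append(result.images[0])
    return images
```

For example, `generate_batch(pipe, ["a cat sitting on a chair"], seed=42)` would return a list with one image, which can be saved with `.save(...)` as in the snippet above.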
This model generates anime-style images from text descriptions.
Example prompt (the one used in the snippet above): "anime style, a cat sitting on a chair"
If you use this model, please cite the original PixArt-Σ model and the dataset.
@article{chen2024pixart,
  title={PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation},
  author={Chen, Junsong and others},
  journal={arXiv preprint arXiv:2403.04692},
  year={2024}
}
Base model: PixArt-alpha/PixArt-Sigma-XL-2-1024-MS