LongCat-Image

Introduction

We introduce LongCat-Image, a pioneering open-source and bilingual (Chinese-English) foundation model for image generation, designed to address core challenges in multilingual text rendering, photorealism, deployment efficiency, and developer accessibility prevalent in current leading models.

LongCat-Image Generation Examples

Key Features

  • 🌟 Exceptional Efficiency and Performance: With only 6B parameters, LongCat-Image surpasses numerous open-source models several times its size across multiple benchmarks, demonstrating the potential of efficient model design.
  • 🌟 Powerful Chinese Text Rendering: LongCat-Image renders common Chinese characters with higher accuracy and stability than existing SOTA open-source models, and achieves industry-leading coverage of the Chinese dictionary.
  • 🌟 Remarkable Photorealism: Through an innovative data strategy and training framework, LongCat-Image achieves remarkable photorealism in generated images.

🎨 Showcase

LongCat-Image Generation Examples

Quick Start

Installation

Clone the repo:

git clone --single-branch --branch main https://github.com/meituan-longcat/LongCat-Image
cd LongCat-Image

Install dependencies:

# create conda environment
conda create -n longcat-image python=3.10
conda activate longcat-image

# install other requirements
pip install -r requirements.txt
python setup.py develop

Run Text-to-Image Generation

Leveraging a stronger LLM for prompt refinement can further enhance image generation quality. Please refer to inference_t2i.py for detailed usage instructions.
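As a rough illustration of what LLM-based prompt refinement might look like (purely hypothetical; `inference_t2i.py` in the repository is the authoritative reference), one could wrap the user prompt in a rewriting instruction before sending it to an external LLM:

```python
# Hypothetical rewriting template; the real pipeline's prompt-rewrite logic may differ.
REWRITE_TEMPLATE = (
    "Rewrite the following image-generation prompt to be more detailed and "
    "descriptive, keeping any double-quoted text verbatim:\n\n{prompt}"
)

def build_rewrite_request(user_prompt: str) -> str:
    """Format a rewriting request for a user prompt, to be sent to an LLM."""
    return REWRITE_TEMPLATE.format(prompt=user_prompt)

print(build_rewrite_request('A cat sitting on a "WELCOME" doormat'))
```

Note that the built-in rewriter can be enabled directly via `enable_prompt_rewrite=True` in the pipeline call, which reuses the text encoder instead of an external LLM.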

Special Handling for Text Rendering

For both Text-to-Image and Image Editing tasks that involve rendering text, you must enclose the target text in double quotes ("").

Reason: the tokenizer applies character-level encoding only to content inside quotation marks. Omitting the quotes significantly degrades text rendering quality.
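For example (the prompts below are hypothetical and only illustrate the quoting convention):

```python
import re

# Text that should appear verbatim in the generated image goes inside double quotes.
prompt_with_quotes = 'A vintage cafe poster with the headline "Fresh Brews Daily" in bold serif type.'
# The same prompt without quotes: the headline will likely render poorly.
prompt_without_quotes = 'A vintage cafe poster with the headline Fresh Brews Daily in bold serif type.'

def has_quoted_text(prompt: str) -> bool:
    """Return True if the prompt contains at least one double-quoted span."""
    return re.search(r'"[^"]+"', prompt) is not None

print(has_quoted_text(prompt_with_quotes))    # True
print(has_quoted_text(prompt_without_quotes)) # False
```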

import torch
from transformers import AutoProcessor
from longcat_image.models import LongCatImageTransformer2DModel
from longcat_image.pipelines import LongCatImagePipeline

device = torch.device('cuda')
checkpoint_dir = './weights/LongCat-Image'

text_processor = AutoProcessor.from_pretrained(checkpoint_dir, subfolder='tokenizer')
transformer = LongCatImageTransformer2DModel.from_pretrained(
    checkpoint_dir,
    subfolder='transformer',
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
).to(device)

pipe = LongCatImagePipeline.from_pretrained(
    checkpoint_dir,
    transformer=transformer,
    text_processor=text_processor
)
# pipe.to(device, torch.bfloat16)  # Uncomment for high VRAM devices (Faster inference)
pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (Required ~17 GB); slower but prevents OOM

# Chinese prompt (roughly: "A young Asian woman in a yellow knit top with a white
# necklace, hands resting on her knees, a calm expression. The background is a rough
# brick wall bathed in warm afternoon sunlight, creating a quiet, cozy atmosphere.
# A medium-distance shot highlights her expression and clothing details; soft light
# on her face emphasizes her features and the texture of the jewelry, adding depth
# and warmth. The composition is clean; the brick texture and sunlight complement
# each other, underscoring her elegance and poise.")
prompt = '一个年轻的亚裔女性，身穿黄色针织衫，搭配白色项链。她的双手放在膝盖上，表情娴静。背景是一堵粗糙的砖墙，午后的阳光温暖地洒在她身上，营造出一种宁静而温馨的氛围。镜头采用中距离视角，突出她的神态和服饰的细节。光线柔和地打在她的脸上，强调她的五官和饰品的质感，增加画面的层次感与亲和力。整个画面构图简洁，砖墙的纹理与阳光的光影效果相得益彰，突显出人物的优雅与从容。'

image = pipe(
    prompt,
    height=768,
    width=1344,
    guidance_scale=4.5,
    num_inference_steps=50,
    num_images_per_prompt=1,
    generator=torch.Generator("cpu").manual_seed(43),
    enable_cfg_renorm=True,
    enable_prompt_rewrite=True # Reusing the text encoder as a built-in prompt rewriter
).images[0]
image.save('./t2i_example.png')
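When experimenting with other aspect ratios, diffusion transformers typically require `height` and `width` to be multiples of a fixed patching factor; 16 is assumed in this sketch (check the pipeline's actual constraint before relying on it). A small helper can pick valid dimensions near a target pixel budget:

```python
import math

def snap_down(value: int, multiple: int = 16) -> int:
    """Round a pixel dimension down to the nearest multiple (16 is an assumption)."""
    return max(multiple, (value // multiple) * multiple)

def resolution_for_aspect(aspect: float, target_pixels: int = 768 * 1344,
                          multiple: int = 16) -> tuple[int, int]:
    """Pick a (height, width) pair near target_pixels with the given aspect ratio."""
    height = math.sqrt(target_pixels / aspect)
    width = height * aspect
    return snap_down(int(height), multiple), snap_down(int(width), multiple)

print(resolution_for_aspect(1344 / 768))  # (768, 1344), the resolution used above
```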