Configuration Parsing Warning: In adapter_config.json: "peft.base_model_name_or_path" must be a string
Configuration Parsing Warning: In adapter_config.json: "peft.task_type" must be a string

Stable Diffusion v1.5 - LoRA Fine-tuned on Dog Images

A fine-tuned Stable Diffusion v1.5 model using LoRA (Low-Rank Adaptation) technique, trained on custom dog images. This model can generate personalized images using the trigger word "sks dog".

Model Details

Model Description

This model is a LoRA adapter fine-tuned on top of Stable Diffusion v1.5, specifically trained to generate images of a particular dog subject. The model uses LoRA technique to efficiently adapt the base model with minimal parameter updates (~3MB adapter vs ~4GB full model), enabling fast training and easy sharing while maintaining high quality outputs.

The model was trained on 5 dog images for 300 epochs, achieving excellent convergence and the ability to generate the subject in various styles, poses, and scenarios.

  • Developed by: Eray Erdoğan (erdoganeray)
  • Model type: Text-to-Image Diffusion Model with LoRA Adapter
  • Language(s): English (prompts)
  • License: MIT
  • Finetuned from model: stable-diffusion-v1-5/stable-diffusion-v1-5

Model Sources

Uses

Direct Use

This model is designed for generating personalized images of a specific dog subject in various contexts, styles, and scenarios. Users can create:

  • Artistic renditions (oil painting, watercolor, pixel art)
  • Themed scenes (beach, snow, space, underwater)
  • Costume/accessory variations (sunglasses, crown, superhero cape)
  • Different artistic styles (Van Gogh, Renaissance, cyberpunk)

Important: Use the trigger word "sks dog" in your prompts for best results.

Downstream Use

This model can be integrated into:

  • Image generation applications
  • Creative tools and workflows
  • Custom Stable Diffusion pipelines
  • Multi-LoRA compositions for complex scene generation
  • Educational demonstrations of LoRA fine-tuning

Out-of-Scope Use

This model should not be used for:

  • Generating images of people or other animals not in the training set
  • Creating deepfakes or misleading content
  • Any applications that violate ethical guidelines or laws
  • Commercial use without proper attribution

Bias, Risks, and Limitations

  • Limited Subject Range: The model is trained on a single dog subject and may not generalize well to other subjects
  • Training Data Size: Only 5 images were used for training, which limits variation
  • Trigger Word Dependency: Best results require using "sks dog" in prompts
  • Inherited Biases: Inherits any biases present in the base Stable Diffusion v1.5 model
  • Quality Variation: Output quality depends heavily on prompt engineering

Recommendations

Users should:

  • Always use the trigger word "sks dog" for optimal results
  • Experiment with different prompts and parameters for best outputs
  • Be aware that this is a demonstration model trained on limited data
  • Use negative prompts to avoid unwanted artifacts (e.g., "blurry, bad quality, distorted")
  • Consider the ethical implications when generating and sharing images

How to Get Started with the Model

Installation

pip install diffusers transformers torch peft

Basic Usage

from diffusers import DiffusionPipeline
import torch

# Load base model
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Load LoRA weights
pipe.load_lora_weights("erdoganeray/finetune-demo")

# Generate image
prompt = "a photo of sks dog wearing sunglasses"
negative_prompt = "blurry, bad quality, distorted"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    guidance_scale=7.5
).images[0]

image.save("output.png")

Example Prompts

# Artistic styles
"an oil painting of sks dog in the style of Van Gogh, vibrant colors"
"sks dog as a watercolor painting, soft colors, artistic"
"sks dog as a pixel art character, 8-bit style, retro gaming"

# Themed scenes
"sks dog on a tropical beach at sunset, palm trees, ocean waves"
"sks dog playing in snow, winter wonderland, snowflakes falling"
"sks dog in an underwater scene with colorful coral and fish"

# Costume variations
"sks dog wearing a golden crown, sitting on a throne, royal"
"sks dog dressed as a superhero with a cape, heroic pose"
"sks dog as a cyberpunk character, neon lights, futuristic city"

Training Details

Training Data

The model was trained on a custom dataset consisting of 5 high-quality images of a specific dog subject. The dataset was preprocessed and augmented to maximize learning from limited data.

  • Dataset Size: 5 images
  • Image Resolution: 512x512 pixels
  • Subject: Single dog subject
  • Trigger Word: "sks dog"

Training Procedure

Training Hyperparameters

  • Training regime: Mixed precision (fp16)
  • Base Model: stable-diffusion-v1-5/stable-diffusion-v1-5
  • Epochs: 300
  • Learning Rate: 5e-5 (with Cosine Annealing scheduler)
  • Learning Rate Decay: Min LR: 5e-6
  • Optimizer: AdamW
  • LoRA Rank: 32
  • LoRA Alpha: 64
  • LoRA Target Modules: to_k, to_q, to_v, to_out.0
  • Inference Steps: 50
  • Guidance Scale: 7.5
  • Noise Scheduler: DDPM (Denoising Diffusion Probabilistic Model)
  • Gradient Checkpointing: Enabled
  • Mixed Precision: fp16

Training Loss

  • Final Loss: 0.1804
  • Average Loss: 0.0876
  • Loss Trend: Steady decrease with occasional spikes, indicating good convergence

Checkpoints

Checkpoints saved at:

  • checkpoint-100 (epoch 100)
  • checkpoint-200 (epoch 200)
  • checkpoint-300 (epoch 300, final)

Speeds, Sizes, Times

  • Training Device: NVIDIA GPU (CUDA)
  • Total Training Time: ~300 epochs
  • Model Size: ~3MB (LoRA adapter only)
  • Base Model Size: ~4GB (not included in adapter)
  • Checkpoint Frequency: Every 100 epochs

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on 20 diverse text prompts covering:

  • Realistic scenes (beach, garden, living room)
  • Artistic styles (Van Gogh, watercolor, Renaissance, pixel art)
  • Costume variations (sunglasses, crown, superhero cape, wizard hat)
  • Themed scenarios (space, underwater, cyberpunk, fantasy)

Evaluation Approach

  • Qualitative Assessment: Visual inspection of generated images
  • Prompt Adherence: How well outputs match prompt descriptions
  • Subject Fidelity: Consistency in reproducing the trained dog subject
  • Style Versatility: Ability to generate diverse artistic styles

Metrics

  • Loss Convergence: Final training loss of 0.1804
  • Visual Quality: Subjective assessment of image clarity and coherence
  • Prompt Following: Successful incorporation of scene elements and styles
  • Subject Recognition: Clear identification of the trained dog subject

Results

The model demonstrates:

  • Strong subject fidelity - consistently recognizes and reproduces the dog subject
  • Style versatility - successfully generates diverse artistic styles
  • Good prompt adherence - incorporates scene elements accurately
  • Stable outputs - minimal artifacts or distortions

Sample outputs show the model can handle:

  • Various environmental settings (indoor/outdoor)
  • Different artistic interpretations
  • Multiple costume and accessory variations
  • Complex scene compositions

Environmental Impact

Training this model has environmental costs that should be considered.

  • Hardware Type: NVIDIA GPU (CUDA-enabled)
  • Training Duration: 300 epochs
  • Compute Efficiency: LoRA technique reduces training time and memory by ~60-70% compared to full fine-tuning
  • Model Size: 3MB adapter (vs 4GB for full model) - significantly reduces storage and transfer costs

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Technical Specifications

Model Architecture and Objective

  • Base Architecture: U-Net based diffusion model (Stable Diffusion v1.5)
  • Adaptation Method: LoRA (Low-Rank Adaptation)
  • LoRA Configuration:
    • Rank: 32
    • Alpha: 64
    • Target modules: Attention layers (to_k, to_q, to_v, to_out.0)
  • Text Encoder: CLIP ViT-L/14
  • VAE: Variational Autoencoder for latent space encoding/decoding
  • Objective: Denoising diffusion probabilistic model training

Compute Infrastructure

Hardware

  • GPU: NVIDIA CUDA-enabled GPU
  • Memory: Sufficient VRAM for fp16 mixed precision training (recommended: 8GB+)
  • Storage: Minimal requirements (~3MB for adapter)

Software

  • Framework: PyTorch 2.0+
  • Libraries:
    • diffusers >= 0.25.0
    • transformers >= 4.35.0
    • peft >= 0.10.0
    • accelerate >= 0.25.0
  • Python Version: 3.8+
  • CUDA: Required for GPU acceleration

Citation

If you use this model in your work, please cite:

BibTeX:

@misc{erdogan2026sdlora,
  author = {Erdoğan, Eray},
  title = {Stable Diffusion v1.5 LoRA Fine-tuned on Dog Images},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/erdoganeray/finetune-demo}
}

APA:

Erdoğan, E. (2026). Stable Diffusion v1.5 LoRA Fine-tuned on Dog Images [Computer software]. Hugging Face. https://huggingface.co/erdoganeray/finetune-demo

More Information

For more details about the training process, implementation, and examples:

  • GitHub Repository: stable_diffusion_lora_finetuning
  • Training Notebook: Available in the repository under notebook/
  • Example Code: See examples/ directory for inference scripts
  • Generated Samples: 20 example outputs available in outputs/generated_images/

Model Card Authors

Eray Erdoğan (@erdoganeray)

Model Card Contact

For questions, issues, or feedback:

Framework Versions

  • PEFT: 0.10.0+
  • Diffusers: 0.25.0+
  • Transformers: 4.35.0+
  • PyTorch: 2.0.0+
  • Accelerate: 0.25.0+
Downloads last month
19
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for erdoganeray/finetune-demo

Adapter
(592)
this model

Paper for erdoganeray/finetune-demo