Configuration Parsing Warning: In adapter_config.json: "peft.base_model_name_or_path" must be a string

Configuration Parsing Warning: In adapter_config.json: "peft.task_type" must be a string

Stable Diffusion v1.5 - LoRA Fine-tuned on Dog Images

A fine-tuned Stable Diffusion v1.5 model using LoRA (Low-Rank Adaptation) technique, trained on custom dog images. This model can generate personalized images using the trigger word "sks dog".

Model Details

Model Description

This model is a LoRA adapter fine-tuned on top of Stable Diffusion v1.5, specifically trained to generate images of a particular dog subject. The model uses LoRA technique to efficiently adapt the base model with minimal parameter updates (~3MB adapter vs ~4GB full model), enabling fast training and easy sharing while maintaining high quality outputs.

The model was trained on 5 dog images for 300 epochs, achieving excellent convergence and the ability to generate the subject in various styles, poses, and scenarios.

Developed by: Eray Erdoğan (erdoganeray)
Model type: Text-to-Image Diffusion Model with LoRA Adapter
Language(s): English (prompts)
License: MIT
Finetuned from model: stable-diffusion-v1-5/stable-diffusion-v1-5

Model Sources

Repository: stable_diffusion_lora_finetuning
Hugging Face Model: erdoganeray/finetune-demo

Uses

Direct Use

This model is designed for generating personalized images of a specific dog subject in various contexts, styles, and scenarios. Users can create:

Artistic renditions (oil painting, watercolor, pixel art)
Themed scenes (beach, snow, space, underwater)
Costume/accessory variations (sunglasses, crown, superhero cape)
Different artistic styles (Van Gogh, Renaissance, cyberpunk)

Important: Use the trigger word "sks dog" in your prompts for best results.

Downstream Use

This model can be integrated into:

Image generation applications
Creative tools and workflows
Custom Stable Diffusion pipelines
Multi-LoRA compositions for complex scene generation
Educational demonstrations of LoRA fine-tuning

Out-of-Scope Use

This model should not be used for:

Generating images of people or other animals not in the training set
Creating deepfakes or misleading content
Any applications that violate ethical guidelines or laws
Commercial use without proper attribution

Bias, Risks, and Limitations

Limited Subject Range: The model is trained on a single dog subject and may not generalize well to other subjects
Training Data Size: Only 5 images were used for training, which limits variation
Trigger Word Dependency: Best results require using "sks dog" in prompts
Inherited Biases: Inherits any biases present in the base Stable Diffusion v1.5 model
Quality Variation: Output quality depends heavily on prompt engineering

Recommendations

Users should:

Always use the trigger word "sks dog" for optimal results
Experiment with different prompts and parameters for best outputs
Be aware that this is a demonstration model trained on limited data
Use negative prompts to avoid unwanted artifacts (e.g., "blurry, bad quality, distorted")
Consider the ethical implications when generating and sharing images

How to Get Started with the Model

Installation

pip install diffusers transformers torch peft

Basic Usage

from diffusers import DiffusionPipeline
import torch

# Load base model
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Load LoRA weights
pipe.load_lora_weights("erdoganeray/finetune-demo")

# Generate image
prompt = "a photo of sks dog wearing sunglasses"
negative_prompt = "blurry, bad quality, distorted"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    guidance_scale=7.5
).images[0]

image.save("output.png")

Example Prompts

# Artistic styles
"an oil painting of sks dog in the style of Van Gogh, vibrant colors"
"sks dog as a watercolor painting, soft colors, artistic"
"sks dog as a pixel art character, 8-bit style, retro gaming"

# Themed scenes
"sks dog on a tropical beach at sunset, palm trees, ocean waves"
"sks dog playing in snow, winter wonderland, snowflakes falling"
"sks dog in an underwater scene with colorful coral and fish"

# Costume variations
"sks dog wearing a golden crown, sitting on a throne, royal"
"sks dog dressed as a superhero with a cape, heroic pose"
"sks dog as a cyberpunk character, neon lights, futuristic city"

Training Details

Training Data

The model was trained on a custom dataset consisting of 5 high-quality images of a specific dog subject. The dataset was preprocessed and augmented to maximize learning from limited data.

Dataset Size: 5 images
Image Resolution: 512x512 pixels
Subject: Single dog subject
Trigger Word: "sks dog"

Training Procedure

Training Hyperparameters

Training regime: Mixed precision (fp16)
Base Model: stable-diffusion-v1-5/stable-diffusion-v1-5
Epochs: 300
Learning Rate: 5e-5 (with Cosine Annealing scheduler)
Learning Rate Decay: Min LR: 5e-6
Optimizer: AdamW
LoRA Rank: 32
LoRA Alpha: 64
LoRA Target Modules: to_k, to_q, to_v, to_out.0
Inference Steps: 50
Guidance Scale: 7.5
Noise Scheduler: DDPM (Denoising Diffusion Probabilistic Model)
Gradient Checkpointing: Enabled
Mixed Precision: fp16

Training Loss

Final Loss: 0.1804
Average Loss: 0.0876
Loss Trend: Steady decrease with occasional spikes, indicating good convergence

Checkpoints

Checkpoints saved at:

checkpoint-100 (epoch 100)
checkpoint-200 (epoch 200)
checkpoint-300 (epoch 300, final)

Speeds, Sizes, Times

Training Device: NVIDIA GPU (CUDA)
Total Training Time: ~300 epochs
Model Size: ~3MB (LoRA adapter only)
Base Model Size: ~4GB (not included in adapter)
Checkpoint Frequency: Every 100 epochs

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on 20 diverse text prompts covering:

Realistic scenes (beach, garden, living room)
Artistic styles (Van Gogh, watercolor, Renaissance, pixel art)
Costume variations (sunglasses, crown, superhero cape, wizard hat)
Themed scenarios (space, underwater, cyberpunk, fantasy)

Evaluation Approach

Qualitative Assessment: Visual inspection of generated images
Prompt Adherence: How well outputs match prompt descriptions
Subject Fidelity: Consistency in reproducing the trained dog subject
Style Versatility: Ability to generate diverse artistic styles

Metrics

Loss Convergence: Final training loss of 0.1804
Visual Quality: Subjective assessment of image clarity and coherence
Prompt Following: Successful incorporation of scene elements and styles
Subject Recognition: Clear identification of the trained dog subject

Results

The model demonstrates:

Strong subject fidelity - consistently recognizes and reproduces the dog subject
Style versatility - successfully generates diverse artistic styles
Good prompt adherence - incorporates scene elements accurately
Stable outputs - minimal artifacts or distortions

Sample outputs show the model can handle:

Various environmental settings (indoor/outdoor)
Different artistic interpretations
Multiple costume and accessory variations
Complex scene compositions

Environmental Impact

Training this model has environmental costs that should be considered.

Hardware Type: NVIDIA GPU (CUDA-enabled)
Training Duration: 300 epochs
Compute Efficiency: LoRA technique reduces training time and memory by ~60-70% compared to full fine-tuning
Model Size: 3MB adapter (vs 4GB for full model) - significantly reduces storage and transfer costs

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Technical Specifications

Model Architecture and Objective

Base Architecture: U-Net based diffusion model (Stable Diffusion v1.5)
Adaptation Method: LoRA (Low-Rank Adaptation)
LoRA Configuration:
- Rank: 32
- Alpha: 64
- Target modules: Attention layers (to_k, to_q, to_v, to_out.0)
Text Encoder: CLIP ViT-L/14
VAE: Variational Autoencoder for latent space encoding/decoding
Objective: Denoising diffusion probabilistic model training

Compute Infrastructure

Hardware

GPU: NVIDIA CUDA-enabled GPU
Memory: Sufficient VRAM for fp16 mixed precision training (recommended: 8GB+)
Storage: Minimal requirements (~3MB for adapter)

Software

Framework: PyTorch 2.0+
Libraries:
- diffusers >= 0.25.0
- transformers >= 4.35.0
- peft >= 0.10.0
- accelerate >= 0.25.0
Python Version: 3.8+
CUDA: Required for GPU acceleration

Citation

If you use this model in your work, please cite:

BibTeX:

@misc{erdogan2026sdlora,
  author = {Erdoğan, Eray},
  title = {Stable Diffusion v1.5 LoRA Fine-tuned on Dog Images},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/erdoganeray/finetune-demo}
}

APA:

Erdoğan, E. (2026). Stable Diffusion v1.5 LoRA Fine-tuned on Dog Images [Computer software]. Hugging Face. https://huggingface.co/erdoganeray/finetune-demo

More Information

For more details about the training process, implementation, and examples:

GitHub Repository: stable_diffusion_lora_finetuning
Training Notebook: Available in the repository under notebook/
Example Code: See examples/ directory for inference scripts
Generated Samples: 20 example outputs available in outputs/generated_images/