Stable Diffusion v1.5 - LoRA Fine-tuned on Dog Images
A Stable Diffusion v1.5 model fine-tuned with the LoRA (Low-Rank Adaptation) technique on custom dog images. The model generates personalized images when prompted with the trigger word "sks dog".
Model Details
Model Description
This model is a LoRA adapter fine-tuned on top of Stable Diffusion v1.5, trained to generate images of a particular dog subject. LoRA adapts the base model with minimal parameter updates (~3 MB adapter vs. ~4 GB full model), enabling fast training and easy sharing while maintaining high-quality outputs.
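The size savings follow from the low-rank factorization: for a weight matrix of shape (d_out, d_in), LoRA learns two factors B (d_out × r) and A (r × d_in), adding only r·(d_in + d_out) trainable parameters instead of d_in·d_out. A minimal sketch of the arithmetic, using an illustrative 768×768 attention projection (the dimensions are assumptions for illustration, not the exact SD v1.5 layer shapes):

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by a LoRA adapter on one (d_out x d_in) weight:
    B contributes d_out*rank entries, A contributes rank*d_in entries."""
    return rank * (d_in + d_out)

# Illustrative 768x768 projection at rank 32 (assumed shape):
full = 768 * 768                        # 589,824 params in the frozen weight
lora = lora_param_count(768, 768, 32)   # 49,152 params in the adapter
# The adapter is roughly 8% of this one layer's size; summed over the
# targeted attention layers, this is why the whole adapter fits in ~3 MB.
```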
The model was trained on 5 dog images for 300 epochs, converging well and learning to render the subject in a range of styles, poses, and scenarios.
- Developed by: Eray Erdoğan (erdoganeray)
- Model type: Text-to-Image Diffusion Model with LoRA Adapter
- Language(s): English (prompts)
- License: MIT
- Finetuned from model: stable-diffusion-v1-5/stable-diffusion-v1-5
Model Sources
- Repository: stable_diffusion_lora_finetuning
- Hugging Face Model: erdoganeray/finetune-demo
Uses
Direct Use
This model is designed for generating personalized images of a specific dog subject in various contexts, styles, and scenarios. Users can create:
- Artistic renditions (oil painting, watercolor, pixel art)
- Themed scenes (beach, snow, space, underwater)
- Costume/accessory variations (sunglasses, crown, superhero cape)
- Different artistic styles (Van Gogh, Renaissance, cyberpunk)
Important: Use the trigger word "sks dog" in your prompts for best results.
Downstream Use
This model can be integrated into:
- Image generation applications
- Creative tools and workflows
- Custom Stable Diffusion pipelines
- Multi-LoRA compositions for complex scene generation
- Educational demonstrations of LoRA fine-tuning
Out-of-Scope Use
This model should not be used for:
- Generating images of people or other animals not in the training set
- Creating deepfakes or misleading content
- Any applications that violate ethical guidelines or laws
- Commercial use without proper attribution
Bias, Risks, and Limitations
- Limited Subject Range: The model is trained on a single dog subject and may not generalize well to other subjects
- Training Data Size: Only 5 images were used for training, which limits variation
- Trigger Word Dependency: Best results require using "sks dog" in prompts
- Inherited Biases: Inherits any biases present in the base Stable Diffusion v1.5 model
- Quality Variation: Output quality depends heavily on prompt engineering
Recommendations
Users should:
- Always use the trigger word "sks dog" for optimal results
- Experiment with different prompts and parameters for best outputs
- Be aware that this is a demonstration model trained on limited data
- Use negative prompts to avoid unwanted artifacts (e.g., "blurry, bad quality, distorted")
- Consider the ethical implications when generating and sharing images
How to Get Started with the Model
Installation
```bash
pip install diffusers transformers torch peft
```
Basic Usage
```python
from diffusers import DiffusionPipeline
import torch

# Load base model
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Load LoRA weights
pipe.load_lora_weights("erdoganeray/finetune-demo")

# Generate image
prompt = "a photo of sks dog wearing sunglasses"
negative_prompt = "blurry, bad quality, distorted"
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("output.png")
```
Example Prompts
```text
# Artistic styles
"an oil painting of sks dog in the style of Van Gogh, vibrant colors"
"sks dog as a watercolor painting, soft colors, artistic"
"sks dog as a pixel art character, 8-bit style, retro gaming"

# Themed scenes
"sks dog on a tropical beach at sunset, palm trees, ocean waves"
"sks dog playing in snow, winter wonderland, snowflakes falling"
"sks dog in an underwater scene with colorful coral and fish"

# Costume variations
"sks dog wearing a golden crown, sitting on a throne, royal"
"sks dog dressed as a superhero with a cape, heroic pose"
"sks dog as a cyberpunk character, neon lights, futuristic city"
```
Training Details
Training Data
The model was trained on a custom dataset consisting of 5 high-quality images of a specific dog subject. The dataset was preprocessed and augmented to maximize learning from limited data.
- Dataset Size: 5 images
- Image Resolution: 512x512 pixels
- Subject: Single dog subject
- Trigger Word: "sks dog"
Training Procedure
Training Hyperparameters
- Training regime: Mixed precision (fp16)
- Base Model: stable-diffusion-v1-5/stable-diffusion-v1-5
- Epochs: 300
- Learning Rate: 5e-5 with a cosine annealing scheduler
- Minimum Learning Rate: 5e-6
- Optimizer: AdamW
- LoRA Rank: 32
- LoRA Alpha: 64
- LoRA Target Modules: to_k, to_q, to_v, to_out.0
- Inference Steps: 50
- Guidance Scale: 7.5
- Noise Scheduler: DDPM (Denoising Diffusion Probabilistic Model)
- Gradient Checkpointing: Enabled
- Mixed Precision: fp16
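The cosine annealing schedule above decays the learning rate from 5e-5 down to the 5e-6 floor over the 300 epochs. A minimal sketch of that curve (the training code itself likely used a library scheduler such as PyTorch's `CosineAnnealingLR`, which follows the same formula):

```python
import math

def cosine_annealed_lr(epoch: int, total_epochs: int = 300,
                       max_lr: float = 5e-5, min_lr: float = 5e-6) -> float:
    """Cosine annealing: max_lr at epoch 0, decaying to min_lr at the end."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + cos)

cosine_annealed_lr(0)    # ~5e-05  (start)
cosine_annealed_lr(150)  # ~2.75e-05 (midpoint)
cosine_annealed_lr(300)  # ~5e-06  (floor)
```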
Training Loss
- Final Loss: 0.1804
- Average Loss: 0.0876
- Loss Trend: Steady decrease with occasional spikes, indicating good convergence
Checkpoints
Checkpoints saved at:
- checkpoint-100 (epoch 100)
- checkpoint-200 (epoch 200)
- checkpoint-300 (epoch 300, final)
Speeds, Sizes, Times
- Training Device: NVIDIA GPU (CUDA)
- Total Training: 300 epochs (wall-clock time not recorded)
- Model Size: ~3MB (LoRA adapter only)
- Base Model Size: ~4GB (not included in adapter)
- Checkpoint Frequency: Every 100 epochs
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on 20 diverse text prompts covering:
- Realistic scenes (beach, garden, living room)
- Artistic styles (Van Gogh, watercolor, Renaissance, pixel art)
- Costume variations (sunglasses, crown, superhero cape, wizard hat)
- Themed scenarios (space, underwater, cyberpunk, fantasy)
Evaluation Approach
- Qualitative Assessment: Visual inspection of generated images
- Prompt Adherence: How well outputs match prompt descriptions
- Subject Fidelity: Consistency in reproducing the trained dog subject
- Style Versatility: Ability to generate diverse artistic styles
Metrics
- Loss Convergence: Final training loss of 0.1804
- Visual Quality: Subjective assessment of image clarity and coherence
- Prompt Following: Successful incorporation of scene elements and styles
- Subject Recognition: Clear identification of the trained dog subject
Results
The model demonstrates:
- Strong subject fidelity - consistently recognizes and reproduces the dog subject
- Style versatility - successfully generates diverse artistic styles
- Good prompt adherence - incorporates scene elements accurately
- Stable outputs - minimal artifacts or distortions
Sample outputs show the model can handle:
- Various environmental settings (indoor/outdoor)
- Different artistic interpretations
- Multiple costume and accessory variations
- Complex scene compositions
Environmental Impact
Training this model has environmental costs that should be considered.
- Hardware Type: NVIDIA GPU (CUDA-enabled)
- Training Duration: 300 epochs
- Compute Efficiency: LoRA technique reduces training time and memory by ~60-70% compared to full fine-tuning
- Model Size: 3MB adapter (vs 4GB for full model) - significantly reduces storage and transfer costs
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
Technical Specifications
Model Architecture and Objective
- Base Architecture: U-Net based diffusion model (Stable Diffusion v1.5)
- Adaptation Method: LoRA (Low-Rank Adaptation)
- LoRA Configuration:
- Rank: 32
- Alpha: 64
- Target modules: Attention layers (to_k, to_q, to_v, to_out.0)
- Text Encoder: CLIP ViT-L/14
- VAE: Variational Autoencoder for latent space encoding/decoding
- Objective: Denoising diffusion probabilistic model training
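The DDPM objective above trains the U-Net to predict the noise added at a randomly sampled timestep, with a mean-squared-error loss between the true and predicted noise. A minimal scalar sketch of the forward-noising step and the loss (the real training loop operates on VAE latents and tensors, using the scheduler's ᾱ schedule):

```python
import math

def add_noise(x0: float, eps: float, alpha_bar: float) -> float:
    """Forward diffusion on one value: x_t = sqrt(a)*x0 + sqrt(1-a)*eps,
    where a is the cumulative noise-schedule product (alpha-bar) at step t."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1 - alpha_bar) * eps

def ddpm_loss(eps_true, eps_pred):
    """MSE between true and predicted noise over a batch of values."""
    return sum((t - p) ** 2 for t, p in zip(eps_true, eps_pred)) / len(eps_true)

# alpha_bar = 1 keeps the signal; alpha_bar = 0 is pure noise:
add_noise(2.0, 3.0, 1.0)   # ~2.0
add_noise(2.0, 3.0, 0.0)   # ~3.0
# A perfect noise predictor drives the training loss to zero:
ddpm_loss([1.0, 2.0], [1.0, 2.0])  # 0.0
```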
Compute Infrastructure
Hardware
- GPU: NVIDIA CUDA-enabled GPU
- Memory: Sufficient VRAM for fp16 mixed precision training (recommended: 8GB+)
- Storage: Minimal requirements (~3MB for adapter)
Software
- Framework: PyTorch 2.0+
- Libraries:
  - diffusers >= 0.25.0
  - transformers >= 4.35.0
  - peft >= 0.10.0
  - accelerate >= 0.25.0
- Python Version: 3.8+
- CUDA: Required for GPU acceleration
Citation
If you use this model in your work, please cite:
BibTeX:
```bibtex
@misc{erdogan2026sdlora,
  author = {Erdoğan, Eray},
  title = {Stable Diffusion v1.5 LoRA Fine-tuned on Dog Images},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/erdoganeray/finetune-demo}
}
```
APA:
Erdoğan, E. (2026). Stable Diffusion v1.5 LoRA Fine-tuned on Dog Images [Computer software]. Hugging Face. https://huggingface.co/erdoganeray/finetune-demo
More Information
For more details about the training process, implementation, and examples:
- GitHub Repository: stable_diffusion_lora_finetuning
- Training Notebook: available in the repository under `notebook/`
- Example Code: see the `examples/` directory for inference scripts
- Generated Samples: 20 example outputs available in `outputs/generated_images/`
Model Card Authors
Eray Erdoğan (@erdoganeray)
Model Card Contact
For questions, issues, or feedback:
- GitHub Issues: stable_diffusion_lora_finetuning/issues
- Hugging Face: @erdoganeray
Framework Versions
- PEFT: 0.10.0+
- Diffusers: 0.25.0+
- Transformers: 4.35.0+
- PyTorch: 2.0.0+
- Accelerate: 0.25.0+