---
library_name: transformers
license: apache-2.0
base_model: google/vit-base-patch16-224
tags:
  - image-classification
  - cifar10
  - computer-vision
  - vision-transformer
  - transfer-learning
metrics:
  - accuracy
model-index:
  - name: vit-base-cifar10-augmented
    results:
      - task:
          type: image-classification
          name: Image Classification
        dataset:
          name: CIFAR-10
          type: cifar10
        metrics:
          - type: accuracy
            value: 0.9554
---

# vit-base-cifar10-augmented

This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) on the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html), trained with data augmentation. It achieves the following results on the evaluation set:

- **Loss:** 0.0445
- **Accuracy:** 95.54%

## 🧠 Model Description

The base model is a Vision Transformer (ViT) pre-trained on ImageNet-21k and subsequently fine-tuned on ImageNet-1k. This version has been fine-tuned on CIFAR-10, a standard image classification benchmark, using PyTorch and Hugging Face Transformers.

Training used extensive **data augmentation** (random crops, flips, rotations, and color jitter) to improve generalization on CIFAR-10's small input images (32×32, resized to 224×224).

## ✅ Intended Uses & Limitations

### Intended uses

- Educational and research use on small image classification tasks
- Benchmarking transfer learning for ViT on CIFAR-10
- Demonstrating the impact of data augmentation on fine-tuning performance

### Limitations

- Not optimized for real-time inference
- Fine-tuned only on CIFAR-10; not suitable for general-purpose image classification
- Inputs must be resized to 224×224

## 📦 Training and Evaluation Data

- **Dataset**: [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)
- **Size**: 60,000 images (10 classes)
- **Split**: 75% training / 25% test (a custom re-split of all 60,000 images; the standard CIFAR-10 split is 50,000 train / 10,000 test)

All images were resized to 224×224 and normalized with the mean/std values of the original ViT checkpoint.

## ⚙️ Training Procedure

### Hyperparameters

- Learning rate: `1e-4`
- Optimizer: `Adam`
- Batch size: `8`
- Epochs: `10`
- Scheduler: `ReduceLROnPlateau`

### Data Augmentation Used

- `RandomResizedCrop(224)`
- `RandomHorizontalFlip()`
- `RandomRotation(10)`
- `ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)`

A torchvision sketch of this pipeline appears at the end of this card.

### Training Results

| Epoch | Training Loss | Test Accuracy |
|-------|---------------|---------------|
| 1     | 0.1969        | 94.62%        |
| 2     | 0.1189        | 95.05%        |
| 3     | 0.0899        | **95.54%**    |
| 4     | 0.0720        | 94.68%        |
| 5     | 0.0650        | 94.84%        |
| 6     | 0.0576        | 94.76%        |
| 7     | 0.0560        | 95.33%        |
| 8     | 0.0488        | 94.31%        |
| 9     | 0.0499        | 95.42%        |
| 10    | 0.0445        | 94.33%        |

Test accuracy peaked at **95.54%** in epoch 3, the headline result reported above; later epochs continued to reduce training loss without improving test accuracy.

## 🧪 Framework Versions

- `transformers`: 4.50.0
- `torch`: 2.6.0+cu124
- `datasets`: 3.4.1
- `tokenizers`: 0.21.1
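
## 🚀 Example Usage

A minimal inference sketch using the standard Transformers Auto classes. The repo id below is a placeholder (the actual Hub path of this model is not stated on this card), and `example.png` stands in for any image you want to classify.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Placeholder repo id; replace with the actual Hub path of this model.
model_id = "your-username/vit-base-cifar10-augmented"

processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)
model.eval()

# Any RGB image works; the processor resizes it to 224x224 and normalizes it.
image = Image.open("example.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(-1).item()
# CIFAR-10 class name, assuming id2label was saved with the fine-tuned config.
print(model.config.id2label[pred])
```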
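
## 🔁 Augmentation Pipeline Sketch

For reference, a torchvision version of the augmentation pipeline listed in the training procedure above. The transforms and their parameters are taken directly from this card; the 0.5 per-channel mean/std values are an assumption based on the original ViT image processor, and the plain-resize evaluation pipeline is likewise a reasonable reconstruction rather than the author's exact code.

```python
from torchvision import transforms

# Normalization stats of the original ViT checkpoint (assumed 0.5 per channel,
# matching the card's note about "ViT's original mean/std values").
MEAN = [0.5, 0.5, 0.5]
STD = [0.5, 0.5, 0.5]

# Training-time pipeline, mirroring the transforms listed on this card.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=MEAN, std=STD),
])

# Evaluation-time pipeline: plain resize to 224x224, no augmentation.
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=MEAN, std=STD),
])
```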