---
library_name: transformers
license: apache-2.0
base_model: google/vit-base-patch16-224
tags:
- image-classification
- chihiro
- studio-ghibli
- custom-dataset
metrics:
- accuracy
- precision
- recall
model-index:
- name: chihiro-classifier-vit
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: Custom Ghibli Dataset
      type: imagefolder
    metrics:
    - name: Test Accuracy
      type: accuracy
      value: 0.9333
    - name: Zero-shot CLIP Accuracy
      type: accuracy
      value: 0.8667
    - name: Zero-shot Precision
      type: precision
      value: 0.8909
    - name: Zero-shot Recall
      type: recall
      value: 0.8667
---

# chihiro-classifier-vit

This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) trained on a small, custom binary classification dataset of images labeled either "chihiro" or "not chihiro" (from Studio Ghibli films). It was trained with PyTorch using transfer learning on approximately 148 images.

## Model description

The model classifies images into one of two categories: **Chihiro** or **Not Chihiro**. It uses a Vision Transformer (ViT) backbone with a custom classification head for binary output.

Data augmentation was used during training to improve generalization. Techniques included random horizontal flip, rotation (30°), color jitter, and random resized crop; a sketch of a comparable augmentation pipeline appears in the code sketches at the end of this card.

## Intended uses & limitations

**Intended Uses:**
- Student computer vision project

**Limitations:**
- Small dataset may limit real-world performance
- Not robust to domain shift or artistic variation
- Not intended for production deployment

## Training and evaluation data

- Custom image dataset of Chihiro vs. non-Chihiro characters
- Loaded using Hugging Face's `imagefolder` format (see the loading sketch at the end of this card)
- Split: 80% train, 10% validation, 10% test
- Augmentation applied during training; deterministic preprocessing during evaluation

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam
- num_epochs: 12

### Training results

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|:-----:|:----------:|:---------:|:--------:|:-------:|
| 1     | 0.8325     | 58.47%    | 0.7285   | 46.67%  |
| 2     | 0.6038     | 55.08%    | 0.6931   | 60.00%  |
| 3     | 0.6047     | 67.80%    | 0.6170   | 66.67%  |
| 4     | 0.4854     | 77.97%    | 0.7272   | 66.67%  |
| 5     | 0.3989     | 79.66%    | 0.5494   | 66.67%  |
| 6     | 0.3091     | 88.14%    | 0.4649   | 86.67%  |
| 7     | 0.2651     | 88.98%    | 0.5736   | 73.33%  |
| 8     | 0.2043     | 94.07%    | 0.5335   | 73.33%  |
| 9     | 0.2668     | 87.29%    | 0.5765   | 80.00%  |
| 10    | 0.2408     | 87.29%    | 0.5346   | 73.33%  |
| 11    | 0.1047     | 95.76%    | 0.4125   | 73.33%  |
| 12    | 0.1297     | 94.07%    | 0.4084   | 86.67%  |

### Final Test Evaluation

- `Test Loss`: 0.3677
- `Test Accuracy`: 0.7333

## 🧪 Zero-Shot CLIP Comparison

Evaluated using `openai/clip-vit-base-patch32` with no fine-tuning:

- `Zero-shot Accuracy`: 86.67%
- `Precision`: 0.8909
- `Recall`: 0.8667

## Framework versions

- Transformers: not used (custom PyTorch)
- PyTorch: 2.x
- Datasets: 2.x
- Tokenizers: N/A
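
## Code sketches

### Loading and splitting the dataset

The card loads data in Hugging Face's `imagefolder` format and uses an 80/10/10 split with seed 42. A minimal sketch is shown below; the directory name `data/` and the use of two chained `train_test_split` calls are assumptions, not details from the original training code.

```python
from datasets import load_dataset

# Assumed layout: data/chihiro/*.jpg and data/not_chihiro/*.jpg
ds = load_dataset("imagefolder", data_dir="data")["train"]

# 80% train / 20% holdout, then split the holdout in half for validation and test.
split = ds.train_test_split(test_size=0.2, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)

train_ds, val_ds, test_ds = split["train"], holdout["train"], holdout["test"]
print(len(train_ds), len(val_ds), len(test_ds))  # roughly 118 / 15 / 15 for ~148 images
```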
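
### Augmentation and preprocessing

The augmentations listed in the model description could be expressed with `torchvision.transforms` roughly as follows. The color-jitter strengths, crop scale, and normalization statistics are assumptions (mean/std of 0.5 matches the default `google/vit-base-patch16-224` image processor, but the card does not confirm the author's choice).

```python
from torchvision import transforms

MEAN, STD = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)  # assumed; matches the ViT checkpoint's processor defaults

# Training-time augmentation: random resized crop, horizontal flip, 30° rotation, color jitter.
train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(30),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(MEAN, STD),
])

# Deterministic preprocessing for validation and test.
eval_tfms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(MEAN, STD),
])
```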
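
### Fine-tuning setup

A sketch of a training loop matching the listed hyperparameters (Adam, learning rate 1e-4, batch size 32, 12 epochs). The card states the loop was custom PyTorch; loading the backbone through `transformers.ViTForImageClassification` and the `collate` helper below are illustrative assumptions, and `train_ds` / `train_tfms` come from the sketches above.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from transformers import ViTForImageClassification

device = "cuda" if torch.cuda.is_available() else "cpu"

# Replace the 1000-class ImageNet head with a 2-class head (chihiro / not chihiro).
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=2,
    ignore_mismatched_sizes=True,
).to(device)

def collate(batch):
    # imagefolder examples carry a PIL "image" and an integer "label".
    pixels = torch.stack([train_tfms(ex["image"].convert("RGB")) for ex in batch])
    labels = torch.tensor([ex["label"] for ex in batch])
    return pixels, labels

loader = DataLoader(train_ds, batch_size=32, shuffle=True, collate_fn=collate)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(12):
    model.train()
    for pixels, labels in loader:
        pixels, labels = pixels.to(device), labels.to(device)
        logits = model(pixel_values=pixels).logits
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```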
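
### Zero-shot CLIP baseline

The zero-shot comparison uses `openai/clip-vit-base-patch32` with no fine-tuning. The sketch below assumes a pair of text prompts and a prompt-to-label ordering (alphabetical `imagefolder` classes: chihiro = 0) that the card does not specify.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Assumed prompts, ordered to match the assumed label order (0 = chihiro, 1 = not chihiro).
prompts = [
    "a picture of Chihiro from Spirited Away",
    "a picture of a Studio Ghibli character who is not Chihiro",
]

correct = 0
for ex in test_ds:  # test_ds from the data-loading sketch above
    inputs = processor(text=prompts, images=ex["image"].convert("RGB"),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = clip(**inputs).logits_per_image  # shape (1, 2): image-text similarity
    pred = logits.argmax(dim=-1).item()
    correct += int(pred == ex["label"])

print(f"zero-shot accuracy: {correct / len(test_ds):.4f}")
```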