---
license: mit
base_model: MCG-NJU/videomae-base
tags:
- video-classification
- crime-detection
- violence-detection
- videomae
- computer-vision
- security
- surveillance
- generated_from_trainer
language:
- en
datasets:
- jinmang2/ucf_crime
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: video-classification
model-index:
- name: test-upload-model
  results:
  - task:
      name: Violence Detection
      type: video-classification
    dataset:
      name: UCF Crime Dataset (Subset)
      type: jinmang2/ucf_crime
      args: violence_detection
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5000
    - name: Precision
      type: precision
      value: 0.2500
    - name: Recall
      type: recall
      value: 0.5000
    - name: F1
      type: f1
      value: 0.3333
---
|
|
|
|
|
# Nikeytas/Test Upload Model |
|
|
|
|
|
This model is a fine-tuned version of [MCG-NJU/videomae-base](https://huggingface.co/MCG-NJU/videomae-base) on the UCF Crime dataset with **event-based binary classification**. It achieves the following results on the evaluation set: |
|
|
|
|
|
- **Loss**: 0.5847 |
|
|
- **Accuracy**: 0.5000 |
|
|
- **Precision**: 0.2500 |
|
|
- **Recall**: 0.5000 |
|
|
- **F1 Score**: 0.3333 |
|
|
|
|
|
## Model Overview
|
|
|
|
|
This VideoMAE model has been fine-tuned for **binary violence detection** in video content. The model classifies videos into two categories: |
|
|
- **Violent Crime** (1): Videos containing violent criminal activities |
|
|
- **Non-Violent Incident** (0): Videos with non-violent or normal activities |
|
|
|
|
|
The model is based on the **VideoMAE architecture** and has been specifically trained on a curated subset of the UCF Crime dataset with event-based categorization for realistic crime detection scenarios. |
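The 0/1 indices above should match the repository's `id2label` config. Before relying on class indices, it can be worth confirming the mapping directly (a minimal sketch; the exact label strings are whatever the uploaded config defines):

```python
from transformers import AutoConfig

# Fetch only the model config and print the label mapping; verify the
# 0 = non-violent / 1 = violent convention rather than assuming it.
config = AutoConfig.from_pretrained("Nikeytas/test-upload-model")
print(config.id2label)
print(config.label2id)
```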
|
|
|
|
|
## Dataset & Training
|
|
|
|
|
### Dataset Composition |
|
|
|
|
|
**Total Videos**: 20 |
|
|
- **Violent Crime Videos**: 10 |
|
|
- **Non-Violent Incident Videos**: 10 |
|
|
|
|
|
**Class Balance**: 50.0% violent crimes |
|
|
|
|
|
**Event Distribution**: |
|
|
- **Arrest**: 10 videos


- **Arson**: 10 videos
|
|
|
|
|
**Data Splits** (a reconstruction sketch follows this list):
|
|
- **Training**: 12 videos |
|
|
- **Validation**: 4 videos |
|
|
- **Test**: 4 videos |
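For reference, here is a hedged sketch of how a stratified 12/4/4 split over 20 clips could be reconstructed. The file names, labels, and `random_state` are placeholders, not the original split:

```python
from sklearn.model_selection import train_test_split

# Placeholder paths and labels: 10 violent (1) and 10 non-violent (0) clips.
video_paths = [f"clip_{i:02d}.mp4" for i in range(20)]
labels = [1] * 10 + [0] * 10

# Hold out 8 clips, then divide them evenly into validation and test.
train_paths, held_paths, train_labels, held_labels = train_test_split(
    video_paths, labels, test_size=8, stratify=labels, random_state=0
)
val_paths, test_paths, val_labels, test_labels = train_test_split(
    held_paths, held_labels, test_size=4, stratify=held_labels, random_state=0
)
print(len(train_paths), len(val_paths), len(test_paths))  # 12 4 4
```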
|
|
|
|
|
## Performance
|
|
|
|
|
### Performance Metrics |
|
|
|
|
|
**Validation Performance**: |
|
|
- **eval_loss**: 0.5847 |
|
|
- **eval_accuracy**: 0.5000 |
|
|
- **eval_precision**: 0.2500 |
|
|
- **eval_recall**: 0.5000 |
|
|
- **eval_f1**: 0.3333 |
|
|
- **eval_runtime**: 0.6636 |
|
|
- **eval_samples_per_second**: 6.0270 |
|
|
- **eval_steps_per_second**: 3.0140 |
|
|
- **epoch**: 1.0000 |
|
|
|
|
|
**Test Performance**: |
|
|
- **eval_loss**: 0.6700 |
|
|
- **eval_accuracy**: 0.5000 |
|
|
- **eval_precision**: 0.2500 |
|
|
- **eval_recall**: 0.5000 |
|
|
- **eval_f1**: 0.3333 |
|
|
- **eval_runtime**: 0.4271 |
|
|
- **eval_samples_per_second**: 9.3660 |
|
|
- **eval_steps_per_second**: 4.6830 |
|
|
- **epoch**: 1.0000 |
|
|
|
|
|
**Training Information**: |
|
|
- **Training Time**: 0.1 minutes |
|
|
- **Best Accuracy Achieved**: 0.5000 |
|
|
- **Model Architecture**: VideoMAE Base (fine-tuned) |
|
|
- **Fine-tuning Approach**: Event-based binary classification |
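The accuracy, precision, recall, and F1 values reported above can be produced with a standard `compute_metrics` hook for the HF `Trainer`. A minimal sketch follows; macro averaging is an assumption (it is consistent with the reported numbers, but the original run may have configured this differently):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # The Trainer passes (logits, labels) for the evaluation set.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # zero_division=0 avoids warnings when a class is never predicted.
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```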
|
|
|
|
|
## Training Procedure
|
|
|
|
|
### Training Hyperparameters |
|
|
|
|
|
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
|
|
- **Learning Rate**: 5e-05 |
|
|
- **Train Batch Size**: 2 |
|
|
- **Eval Batch Size**: 2 |
|
|
- **Optimizer**: AdamW with betas=(0.9,0.999) and epsilon=1e-08 |
|
|
- **LR Scheduler Type**: Linear |
|
|
- **Training Epochs**: 1 |
|
|
- **Weight Decay**: 0.01 |
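Expressed as HF `TrainingArguments`, the configuration above looks roughly like this; `output_dir` and any setting not listed in the card are illustrative assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./videomae-crime-detection",  # assumed, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=1,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    adam_beta1=0.9,    # AdamW settings as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```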
|
|
|
|
|
### Training Results |
|
|
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Accuracy | |
|
|
|---------------|-------|------|-----------------|----------| |
|
|
| 0.5 | 1.00 | N/A | 0.5847 | 0.5000 | |
|
|
|
|
|
### Framework Versions |
|
|
|
|
|
- **Transformers**: 4.30.2+ |
|
|
- **PyTorch**: 2.0.1+ |
|
|
- **Datasets**: latest release at training time
|
|
- **Device**: Apple Silicon MPS / CUDA / CPU (Auto-detected) |
|
|
|
|
|
## Quick Start
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash |
|
|
pip install transformers torch torchvision opencv-python pillow |
|
|
``` |
|
|
|
|
|
### Basic Usage |
|
|
|
|
|
```python |
|
|
import torch |
|
|
from transformers import AutoModelForVideoClassification, AutoProcessor |
|
|
import cv2 |
|
|
import numpy as np |
|
|
|
|
|
# Load model and processor |
|
|
model = AutoModelForVideoClassification.from_pretrained("Nikeytas/test-upload-model") |
|
|
processor = AutoProcessor.from_pretrained("Nikeytas/test-upload-model") |
|
|
|
|
|
# Process video |
|
|
def classify_video(video_path, num_frames=16): |
|
|
# Extract frames |
|
|
cap = cv2.VideoCapture(video_path) |
|
|
frames = [] |
|
|
|
|
|
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) |
|
|
indices = np.linspace(0, total_frames - 1, num_frames, dtype=int) |
|
|
|
|
|
for idx in indices: |
|
|
cap.set(cv2.CAP_PROP_POS_FRAMES, idx) |
|
|
ret, frame = cap.read() |
|
|
if ret: |
|
|
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) |
|
|
frames.append(frame_rgb) |
|
|
|
|
|
    cap.release()

    # Pad with the last frame if decoding dropped any frames, so the clip
    # always contains exactly num_frames frames for the processor.
    while frames and len(frames) < num_frames:
        frames.append(frames[-1])
|
|
|
|
|
# Process with model |
|
|
inputs = processor(frames, return_tensors="pt") |
|
|
|
|
|
with torch.no_grad(): |
|
|
outputs = model(**inputs) |
|
|
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1) |
|
|
predicted_class = torch.argmax(predictions, dim=-1).item() |
|
|
confidence = predictions[0][predicted_class].item() |
|
|
|
|
|
label = "Violent Crime" if predicted_class == 1 else "Non-Violent" |
|
|
return label, confidence |
|
|
|
|
|
# Example usage |
|
|
video_path = "path/to/your/video.mp4" |
|
|
prediction, confidence = classify_video(video_path) |
|
|
print(f"Prediction: {prediction} (Confidence: {confidence:.3f})") |
|
|
``` |
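If `decord` is available for video decoding, the high-level `pipeline` API is a shorter alternative to the manual frame loop above (a sketch; the pipeline's default frame sampling may differ from the 16-frame logic shown here):

```python
from transformers import pipeline

# Builds a video-classification pipeline around the same checkpoint.
classifier = pipeline("video-classification", model="Nikeytas/test-upload-model")

# Returns the top class labels with confidence scores for the clip.
print(classifier("path/to/your/video.mp4"))
```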
|
|
|
|
|
### Batch Processing |
|
|
|
|
|
```python |
|
|
import os |
|
|
from pathlib import Path |
|
|
|
|
|
def process_video_directory(video_dir, output_file="results.txt"): |
|
|
results = [] |
|
|
|
|
|
for video_file in Path(video_dir).glob("*.mp4"): |
|
|
try: |
|
|
prediction, confidence = classify_video(str(video_file)) |
|
|
results.append({ |
|
|
"file": video_file.name, |
|
|
"prediction": prediction, |
|
|
"confidence": confidence |
|
|
}) |
|
|
print(f"β
{video_file.name}: {prediction} ({confidence:.3f})") |
|
|
except Exception as e: |
|
|
print(f"β Error processing {video_file.name}: {e}") |
|
|
|
|
|
# Save results |
|
|
with open(output_file, "w") as f: |
|
|
for result in results: |
|
|
f.write(f"{result['file']}: {result['prediction']} ({result['confidence']:.3f})\n") |
|
|
|
|
|
return results |
|
|
|
|
|
# Process all videos in a directory |
|
|
results = process_video_directory("./videos/") |
|
|
``` |
|
|
|
|
|
## Technical Specifications
|
|
|
|
|
- **Base Model**: MCG-NJU/videomae-base |
|
|
- **Architecture**: Vision Transformer (ViT) adapted for video |
|
|
- **Input Resolution**: 224x224 pixels per frame |
|
|
- **Temporal Resolution**: 16 frames per video clip (see the shape-check sketch after this list)
|
|
- **Output Classes**: 2 (Binary classification) |
|
|
- **Training Framework**: HuggingFace Transformers |
|
|
- **Optimization**: AdamW optimizer with learning rate 5e-5 |
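A quick sanity check of the expected input layout (a minimal sketch with dummy frames; VideoMAE's `pixel_values` are shaped `(batch, frames, channels, height, width)`):

```python
import numpy as np
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Nikeytas/test-upload-model")

# 16 dummy RGB frames at 224x224, matching the spec above.
dummy_video = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(16)]
inputs = processor(dummy_video, return_tensors="pt")
print(inputs["pixel_values"].shape)  # expected: torch.Size([1, 16, 3, 224, 224])
```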
|
|
|
|
|
## Limitations
|
|
|
|
|
1. **Dataset Scope**: Trained on a subset of UCF Crime dataset - may not generalize to all types of violence |
|
|
2. **Temporal Context**: Uses 16-frame clips which may miss context in longer sequences |
|
|
3. **Environmental Bias**: Performance may vary with different lighting, camera angles, and video quality |
|
|
4. **False Positives**: May misclassify intense but non-violent activities (sports, action movies) |
|
|
5. **Real-time Performance**: Processing time depends on hardware capabilities |
|
|
|
|
|
## Ethical Considerations
|
|
|
|
|
### Intended Use |
|
|
- **Primary**: Research and development in video analysis |
|
|
- **Secondary**: Security system enhancement with human oversight |
|
|
- **Educational**: Computer vision and AI safety research |
|
|
|
|
|
### Prohibited Uses |
|
|
- **Surveillance without consent**: Do not use for unauthorized monitoring |
|
|
- **Discriminatory profiling**: Avoid bias against specific groups or communities |
|
|
- **Automated punishment**: Never use for automated legal or disciplinary actions |
|
|
- **Privacy violation**: Respect privacy laws and individual rights |
|
|
|
|
|
### Bias and Fairness |
|
|
- The model was trained on a small, curated subset of UCF Crime that may not represent all populations
|
|
- Regular evaluation needed for bias detection and mitigation |
|
|
- Human oversight required for critical applications |
|
|
- Consider demographic representation in deployment scenarios |
|
|
|
|
|
## Model Card Information
|
|
|
|
|
- **Developed by**: Research Team |
|
|
- **Model Type**: Video Classification (Binary) |
|
|
- **Training Data**: UCF Crime Dataset (Subset) |
|
|
- **Training Date**: 2025-06-08 15:19:08 UTC |
|
|
- **Evaluation Metrics**: Accuracy, Precision, Recall, F1-Score |
|
|
- **Intended Users**: Researchers, Security Professionals, Developers |
|
|
|
|
|
## Citation
|
|
|
|
|
If you use this model in your research, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{Nikeytas_test_upload_model, |
|
|
title={VideoMAE Fine-tuned for Crime Detection}, |
|
|
author={Research Team}, |
|
|
  year={2025},
|
|
publisher={Hugging Face}, |
|
|
url={https://huggingface.co/Nikeytas/test-upload-model} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Contributing
|
|
|
|
|
We welcome contributions to improve the model! Please: |
|
|
1. Report issues with specific examples |
|
|
2. Suggest improvements for bias reduction |
|
|
3. Share evaluation results on new datasets |
|
|
4. Contribute to documentation and examples |
|
|
|
|
|
## Contact
|
|
|
|
|
For questions, issues, or collaboration opportunities, please open an issue in the model repository or contact the development team. |
|
|
|
|
|
--- |
|
|
|
|
|
*Last updated: 2025-06-08 15:19:08 UTC* |
|
|
*Model version: 1.0* |
|
|
*Framework: HuggingFace Transformers* |
|
|
|