Quick Start
This repository contains only a LoRA adapter, not full model weights, so it cannot be loaded directly with AutoModel. Load the base model first, then attach the adapter:
from transformers import Qwen2VLForConditionalGeneration
from peft import PeftModel
# Load base model
base_model = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "Amirhossein75/qwen2-vl-2b-mmhs150k-lora")
- multimodal
- vision-language
- hate-speech
Qwen2-VL LoRA adapter for MMHS150K hateful content classification
This repository contains a LoRA adapter fine-tuned on MMHS150K (Multi-Modal Hate Speech) for multi-label hateful content detection from paired text + image inputs.
The approach follows the project at https://github.com/amirhossein-yousefi/text_image_multi_modal_vlm: instead of training a classification head, the model is prompted to generate a strict JSON array of labels, which is then parsed and scored as multi-label predictions.
Model Details
- Developed by: Amirhossein Yousefi
- Model type: LoRA adapter (PEFT) for Qwen2-VL
- Base model: Qwen/Qwen2-VL-2B-Instruct
- Task: Multi-label classification via JSON generation (text + image → label list)
- Labels: racist, sexist, homophobe, religion, otherhate
- Repository (training code + methodology): https://github.com/amirhossein-yousefi/text_image_multi_modal_vlm
Intended Use
Direct use
- Hateful content classification for research/experimentation on MMHS150K-like data.
- Produces a JSON array of zero or more labels from the fixed label set above.
Out-of-scope use
- Moderation decisions without human review.
- Domains/languages far from MMHS150K without further validation.
Bias, Risks, and Limitations
- This model is trained on hate-speech related data; outputs can be sensitive and may reflect dataset/model biases.
- Generative classification can fail to follow formatting (non-JSON, extra text); downstream code should do robust parsing.
- The label set is fixed; forcing predictions outside this taxonomy is unsupported.
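Because the generated text may not be clean JSON, downstream code needs a tolerant parser. The following is a minimal sketch of one possible approach, not the parser used in the upstream repository: it looks for the first bracketed span, tries to read it as JSON, and otherwise falls back to scanning the raw text for label names. The function name `parse_labels` and the fallback strategy are assumptions made for illustration.

```python
import json
import re

LABELS = ("racist", "sexist", "homophobe", "religion", "otherhate")


def parse_labels(generated, allowed=LABELS):
    """Return the known labels found in `generated`, in the fixed label order.

    Hypothetical helper: tolerates extra prose or code fences around the JSON.
    """
    # Look for the first [...] span, since the model may wrap the array in text.
    match = re.search(r"\[.*?\]", generated, flags=re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list):
            # Normalize case/whitespace and drop anything outside the label set.
            found = {str(item).strip().lower() for item in parsed}
            return [label for label in allowed if label in found]
    # Fallback (assumption): no valid JSON array, so match label names as words.
    lowered = generated.lower()
    return [
        label for label in allowed
        if re.search(rf"\b{re.escape(label)}\b", lowered)
    ]


print(parse_labels('Here you go: ["Sexist", "unknown", "racist"]'))
```

Labels outside the fixed set are silently dropped here; a stricter pipeline might prefer to log or count such cases as formatting failures.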
How to Use
Load the adapter (PEFT)
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

base_id = "Qwen/Qwen2-VL-2B-Instruct"
adapter_id = "Amirhossein75/qwen2-vl-2b-mmhs150k-lora"  # this repo

# Load the processor and base model, then attach the LoRA adapter.
processor = AutoProcessor.from_pretrained(base_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

image = Image.open("path/to/image.jpg").convert("RGB")
text = "Some text to analyze"
labels = ["racist", "sexist", "homophobe", "religion", "otherhate"]

# Build the instruction prompt listing the allowed labels.
system = "Return JSON only."
user = (
    "Given the image and text, return a JSON array containing zero or more of these labels: "
    + ", ".join(f'"{label}"' for label in labels)
)
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": [
        {"type": "text", "text": user + "\n\nText: " + text},
        {"type": "image", "image": image},
    ]},
]

prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt", padding=True).to(model.device)

# Generate, then decode only the newly generated tokens so the prompt is not echoed back.
out = model.generate(**inputs, max_new_tokens=64)
generated = out[0][inputs["input_ids"].shape[1]:]
print(processor.decode(generated, skip_special_tokens=True))
Training Data
- Dataset: MMHS150K (Multi-Modal Hate Speech)
- Expected format (from the associated code repo): CSV with columns text, image_path, labels and an images/ directory.
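As an illustration of the CSV layout described above, here is a small sketch that reads rows with the three columns into Python dictionaries. The encoding of the labels cell is not specified here, so the comma-separated format, the `label_sep` and `data_root` parameters, and the function name `load_rows` are all assumptions; check the upstream repository for the actual loader.

```python
import csv
import io
from pathlib import Path


def load_rows(csv_file, data_root=".", label_sep=","):
    """Read text/image_path/labels rows from an open CSV file.

    Hypothetical helper: assumes the labels cell is a `label_sep`-separated
    string and that image_path is relative to `data_root`.
    """
    rows = []
    for record in csv.DictReader(csv_file):
        raw = record["labels"] or ""
        labels = [part.strip() for part in raw.split(label_sep) if part.strip()]
        rows.append({
            "text": record["text"],
            "image_path": Path(data_root) / record["image_path"],
            "labels": labels,
        })
    return rows


# In-memory sample standing in for a real CSV file (made-up rows).
sample = io.StringIO(
    "text,image_path,labels\n"
    '"some tweet text",images/0001.jpg,"racist,sexist"\n'
    '"another tweet",images/0002.jpg,\n'
)
rows = load_rows(sample)
for row in rows:
    print(row["image_path"].name, row["labels"])
```

An empty labels cell maps to an empty list, which matches the "zero or more labels" output the model is prompted to produce.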
Training Procedure
- Method: LoRA (PEFT)
- LoRA config (from adapter config): rank r=4, lora_alpha=32, lora_dropout=0.05; target modules q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Objective: Causal LM with instruction prompting; classification is obtained by constrained JSON generation.
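The LoRA hyperparameters listed above can be written down as plain keyword arguments, which also makes the effective scaling easy to see. This is a sketch rather than the training script: it assumes standard LoRA scaling (`lora_alpha / r`, not the rank-stabilized variant), and the helper `lora_param_count` and the 1024x1024 layer shape are purely illustrative.

```python
# Values taken from the list above; the dict itself is just a convenient container.
lora_kwargs = {
    "r": 4,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# With standard LoRA (assumption), the low-rank update B @ A is scaled by alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


print("scaling:", scaling)
# Illustrative layer shape only, not the real Qwen2-VL dimensions.
print("params for a 1024x1024 layer:", lora_param_count(1024, 1024, lora_kwargs["r"]))

# With PEFT installed, the same values could be passed on, for example:
#   from peft import LoraConfig
#   config = LoraConfig(task_type="CAUSAL_LM", **lora_kwargs)  # task_type is an assumption
```

A small rank with a comparatively large alpha means each adapted layer adds few parameters while the update is still weighted by a factor of 8.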
Hardware Used
As reported in the associated training/evaluation repository (see link above), the Qwen2-VL + LoRA/QLoRA runs were trained on:
- GPU: NVIDIA GeForce RTX 3080 Laptop GPU (16GB)
- Platform: Local Windows
- Notes: NVIDIA driver 581.57, CUDA 13.0 (per nvidia-smi)
Evaluation
Metrics follow the associated code repo: multi-label scores computed from generated JSON labels.
- Validation (this adapter): micro F1 0.6172, macro F1 0.5077, subset accuracy 0.4366, Hamming loss 0.14276
- Test (this adapter): micro F1 0.6110, macro F1 0.4992
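For readers unfamiliar with these metrics, the sketch below shows how micro F1, macro F1, subset accuracy, and Hamming loss can be computed from lists of predicted and reference labels. It is a dependency-free illustration using standard definitions, not the upstream evaluation code; the function names and the convention of scoring an empty class as F1 = 0.0 are assumptions, and the sample data is made up.

```python
LABELS = ["racist", "sexist", "homophobe", "religion", "otherhate"]


def to_multi_hot(labels, label_set=LABELS):
    """Turn a list of label names into a 0/1 vector over the fixed label set."""
    return [1 if name in labels else 0 for name in label_set]


def f1(tp, fp, fn):
    """F1 from counts; returns 0.0 when there is nothing to score (assumption)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_scores(y_true, y_pred, label_set=LABELS):
    """Compute multi-label metrics from parallel lists of label lists."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    k = len(label_set)
    tp, fp, fn = [0] * k, [0] * k, [0] * k
    exact = 0       # samples whose full label vector matches
    mismatches = 0  # individual label slots that differ
    for true_labels, pred_labels in zip(y_true, y_pred):
        t = to_multi_hot(true_labels, label_set)
        p = to_multi_hot(pred_labels, label_set)
        exact += int(t == p)
        for j in range(k):
            if t[j] and p[j]:
                tp[j] += 1
            elif p[j]:
                fp[j] += 1
            elif t[j]:
                fn[j] += 1
            mismatches += int(t[j] != p[j])
    n = len(y_true)
    return {
        "micro_f1": f1(sum(tp), sum(fp), sum(fn)),
        "macro_f1": sum(f1(tp[j], fp[j], fn[j]) for j in range(k)) / k,
        "subset_accuracy": exact / n,
        "hamming_loss": mismatches / (n * k),
    }


# Made-up example: three samples with reference and predicted label lists.
y_true = [["racist"], [], ["sexist", "religion"]]
y_pred = [["racist"], ["sexist"], ["sexist"]]
scores = multilabel_scores(y_true, y_pred)
print(scores)
```

Macro F1 averages per-label scores, so rare labels with poor recall pull it below micro F1, which is consistent with the gap between the two figures reported above.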
License
- Training/inference code referenced above is released under MIT in the upstream repository.
- This repository contains an adapter trained from a base model; please follow the base model’s license/terms (Qwen/Qwen2-VL-2B-Instruct) when using the weights.