
CheXficient

CheXficient is a vision-language foundation model for chest X-ray (CXR) interpretation, designed to be both data- and computation-efficient. It learns joint image-text representations and supports prompt-based zero-shot classification.

This repository provides a Hugging Face-compatible implementation for seamless integration into research workflows.


Model Overview

  • Architecture: Vision-language dual encoder
  • Model size: ~0.2B parameters (float32, Safetensors)
  • Input: Chest X-ray image + text prompts
  • Output: Image-text similarity logits and embeddings
  • Framework: PyTorch + Hugging Face Transformers
  • Intended Use: Research in medical AI and multimodal learning

Installation

pip install torch torchvision transformers pillow

Load the Model

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, AutoImageProcessor

repo_id = "StanfordAIMI/CheXficient"
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True is required: the model class is defined in the repo itself
model = AutoModel.from_pretrained(
    repo_id,
    trust_remote_code=True,
).to(device)

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(repo_id, trust_remote_code=True)

model.eval()

Zero-Shot Classification Example

# Load a CXR image and define one text prompt per candidate class
image = Image.open("./CXR/images/5AF3BB6C1BCC83C.png").convert("RGB")
text = ["Pneumonia", "no Pneumonia"]

image_inputs = image_processor(images=image, return_tensors="pt").to(device)
text_inputs = tokenizer(text, padding=True, return_tensors="pt").to(device)

# The forward pass returns image-text similarity logits (and embeddings)
with torch.no_grad():
    outputs = model(
        pixel_values=image_inputs["pixel_values"],
        text_tokens=text_inputs,
    )

print(outputs)
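
The Model Overview notes that the model exposes embeddings alongside the similarity logits. A minimal sketch of retrieving them, assuming the output dict contains keys named image_embeds and text_embeds (an assumption; check the repo's remote modeling code for the exact names):

# Assumed output keys; verify against the repo's modeling file
image_embeds = outputs.get("image_embeds")  # expected shape: (1, embed_dim)
text_embeds = outputs.get("text_embeds")    # expected shape: (num_prompts, embed_dim)

if image_embeds is not None and text_embeds is not None:
    # Cosine similarity between the image and each prompt embedding
    sims = torch.nn.functional.cosine_similarity(image_embeds, text_embeds)
    print(sims)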

Optional probability conversion:

import torch.nn.functional as F

# Softmax over the candidate prompts turns logits into class probabilities
logits = outputs["logits_per_image"]
probs = F.softmax(logits, dim=-1)
print(probs)
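
The probabilities align with the order of the prompts in text, so the predicted label can be read off with argmax. A small usage sketch building on the example above:

# Each column of probs corresponds to one prompt in `text`
pred_idx = probs[0].argmax().item()
print(f"Predicted: {text[pred_idx]} (p={probs[0][pred_idx]:.3f})")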

Intended Use

  • Zero-shot CXR findings classification
  • Prompt-based disease detection (a multi-finding sketch follows below)
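
The zero-shot example above extends naturally to several findings, scoring each with a positive/negative prompt pair. This sketch reuses the model, tokenizer, image_inputs, and F defined earlier; the finding list and prompt phrasing are illustrative, not a validated prompt set:

# Illustrative findings; swap in the labels relevant to your study
findings = ["Pneumonia", "Atelectasis", "Cardiomegaly", "Pleural Effusion"]

for finding in findings:
    prompts = [finding, f"no {finding}"]  # positive/negative prompt pair
    text_inputs = tokenizer(prompts, padding=True, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(
            pixel_values=image_inputs["pixel_values"],
            text_tokens=text_inputs,
        )
    # Probability assigned to the positive prompt
    probs = F.softmax(outputs["logits_per_image"], dim=-1)
    print(f"{finding}: p={probs[0][0].item():.3f}")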

Citation

@article{chexficient2024,
  title={CheXficient: Efficient Vision-Language Learning for Chest X-ray Understanding},
  author={...},
  journal={...},
  year={2024}
}