title: ragbench-rag-eval
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
RAGBench RAG Evaluation Project
This project evaluates a RAG system on the RAGBench dataset across 5 domains: Biomedical, General Knowledge, Legal, Customer Support, and Finance.
1. Setup (local, no Docker)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
Copy .env.example to .env and fill in:
- HF_TOKEN (if using Hugging Face models)
- GROQ_API_KEY (if using Groq)
- RAGBENCH_LLM_PROVIDER (either groq or hf)
- RAGBENCH_GEN_MODEL
- RAGBENCH_JUDGE_MODEL
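A quick sanity check of these settings can catch a missing key before a long run starts. The sketch below is an illustration only: the variable names come from the list above, but the `check_env` helper and its rules (which key each provider needs) are assumptions, not part of this repo. It inspects whatever mapping it is given, so it only sees `.env` values once they have been loaded into the environment.

```python
import os

# Variable names are from this README; the validation rules are assumptions.
REQUIRED_ALWAYS = (
    "RAGBENCH_LLM_PROVIDER",
    "RAGBENCH_GEN_MODEL",
    "RAGBENCH_JUDGE_MODEL",
)
# Assumed mapping: which credential each provider value needs.
PROVIDER_KEYS = {"groq": "GROQ_API_KEY", "hf": "HF_TOKEN"}


def check_env(env):
    """Return a list of human-readable problems found in the mapping `env`."""
    problems = [f"{name} is not set" for name in REQUIRED_ALWAYS if not env.get(name)]
    provider = env.get("RAGBENCH_LLM_PROVIDER", "").strip().lower()
    if provider and provider not in PROVIDER_KEYS:
        problems.append(
            f"RAGBENCH_LLM_PROVIDER must be 'groq' or 'hf', got {provider!r}"
        )
    key = PROVIDER_KEYS.get(provider)
    if key and not env.get(key):
        problems.append(f"{key} is required when the provider is {provider!r}")
    return problems


# Report problems for the current process environment (prints nothing if all is well).
for problem in check_env(os.environ):
    print("config problem:", problem)
```

An empty list means every variable the selected provider needs is present.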
Also open prompts/ragbench_judge_prompt.txt and paste the official JSON
annotation prompt from the RAGBench paper (Appendix 9.4), with placeholders:
{documents}, {question}, {answer}.
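Because the judge prompt asks for JSON output, the template itself probably contains literal braces, which would make `str.format` fail. The sketch below shows one way to fill only the three placeholders. The `fill_judge_prompt` helper is hypothetical and is not necessarily how this repo does the substitution.

```python
import re

# Matches only the three placeholders named in this README.
_PLACEHOLDER = re.compile(r"\{(documents|question|answer)\}")


def fill_judge_prompt(template, documents, question, answer):
    """Substitute the three placeholders, leaving all other braces untouched."""
    values = {"documents": documents, "question": question, "answer": answer}
    missing = [name for name in values if "{" + name + "}" not in template]
    if missing:
        raise ValueError(f"template is missing placeholders: {missing}")
    # A single regex pass means placeholder-like text inside a value
    # (for example a document containing "{question}") is not expanded again.
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


example = fill_judge_prompt(
    'Documents:\n{documents}\nQuestion: {question}\nAnswer: {answer}\n'
    'Respond with JSON such as {"relevance_explanation": "..."}',
    documents="0a. Aspirin inhibits COX enzymes.",
    question="How does aspirin work?",
    answer="It inhibits COX enzymes.",
)
print(example)
```

In practice the template string would be read from prompts/ragbench_judge_prompt.txt. The inline template and the `relevance_explanation` key here are placeholders for illustration only.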
Run an experiment from CLI
python -m scripts.run_experiment --domain biomedical --k 3 --max_examples 10
2. Run FastAPI locally (no Docker)
uvicorn app.main:app --host 0.0.0.0 --port 7860
Then open:
- http://localhost:7860/health
- http://localhost:7860/docs (Swagger UI)
- POST /run_domain with JSON:
{
"domain": "biomedical",
"k": 3,
"max_examples": 10,
"split": "test"
}
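The same request can be sent from a script instead of Swagger UI. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and it assumes the endpoint returns a JSON body, which this README does not specify.

```python
import json
import urllib.request


def build_run_domain_request(base_url, domain, k=3, max_examples=10, split="test"):
    """Build a POST request for /run_domain with the JSON body shown above."""
    payload = {"domain": domain, "k": k, "max_examples": max_examples, "split": split}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/run_domain",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_domain(base_url, domain, **kwargs):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_run_domain_request(base_url, domain, **kwargs)
    # Evaluation runs can be slow, so allow a generous timeout (an assumption).
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from this section running, `run_domain("http://localhost:7860", "biomedical")` would post the example payload.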
3. Run with Docker (local laptop)
Build and run:
docker compose build
docker compose up
The API will be available at http://localhost:8000.
4. Deploy to Hugging Face Space (Docker)
Create a new Space with SDK = Docker.
Push this repo to the Space Git URL.
On the Space settings, add variables/secrets:
- HF_TOKEN
- GROQ_API_KEY
- RAGBENCH_LLM_PROVIDER
- RAGBENCH_GEN_MODEL
- RAGBENCH_JUDGE_MODEL
Once the Space builds successfully, open /docs on the Space URL to run
/run_domain for each domain via Swagger UI.