---
title: ragbench-rag-eval
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

RAGBench RAG Evaluation Project

This project evaluates a RAG system on the RAGBench dataset across 5 domains: Biomedical, General Knowledge, Legal, Customer Support, and Finance.

1. Setup (local, no Docker)

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt

Copy .env.example to .env and fill in:

  • HF_TOKEN (if using Hugging Face models)
  • GROQ_API_KEY (if using Groq)
  • RAGBENCH_LLM_PROVIDER: groq or hf
  • RAGBENCH_GEN_MODEL: the model that generates answers
  • RAGBENCH_JUDGE_MODEL: the model that judges the generated answers
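Before running anything, it can help to confirm that these variables are actually set. The sketch below is a hypothetical helper (`check_config` is not part of this repo). It assumes the variables are already in the process environment (exported in the shell, or loaded by whatever .env loader the project uses), and it assumes `groq` needs `GROQ_API_KEY` and `hf` needs `HF_TOKEN`, which may be stricter than the project itself, since public Hugging Face models can work without a token.

```python
import os
from typing import Mapping

# Hypothetical helper, not part of this repo; mirrors the variables listed above.
REQUIRED_KEYS = ("RAGBENCH_LLM_PROVIDER", "RAGBENCH_GEN_MODEL", "RAGBENCH_JUDGE_MODEL")
# Assumption: each provider needs the credential named here.
PROVIDER_TOKEN = {"groq": "GROQ_API_KEY", "hf": "HF_TOKEN"}


def check_config(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = [f"{key} is not set" for key in REQUIRED_KEYS if not env.get(key)]
    provider = env.get("RAGBENCH_LLM_PROVIDER", "").strip().lower()
    if provider and provider not in PROVIDER_TOKEN:
        problems.append(f"RAGBENCH_LLM_PROVIDER must be 'groq' or 'hf', got {provider!r}")
    token_key = PROVIDER_TOKEN.get(provider)
    if token_key and not env.get(token_key):
        problems.append(f"{token_key} is required when RAGBENCH_LLM_PROVIDER={provider}")
    return problems


if __name__ == "__main__":
    for problem in check_config():
        print("config:", problem)
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching the real environment.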

Also open prompts/ragbench_judge_prompt.txt and paste in the official JSON annotation prompt from the RAGBench paper (Appendix 9.4), making sure it contains the placeholders {documents}, {question}, and {answer}.
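Because the annotation prompt asks for JSON output, it is likely to contain literal `{` and `}` characters, so filling it with Python's `str.format()` would raise errors. A minimal sketch, assuming hypothetical helpers `load_judge_prompt` and `fill_judge_prompt` and that retrieved documents are simply joined with blank lines (the project may format them differently), substitutes only the three named placeholders in a single pass:

```python
import re
from pathlib import Path

# Matches only the three placeholders named in this README; other braces are left alone.
_PLACEHOLDER_RE = re.compile(r"\{(documents|question|answer)\}")


def load_judge_prompt(path: str = "prompts/ragbench_judge_prompt.txt") -> str:
    """Read the prompt template pasted from the RAGBench paper."""
    return Path(path).read_text(encoding="utf-8")


def fill_judge_prompt(template: str, documents: list[str], question: str, answer: str) -> str:
    """Fill {documents}, {question} and {answer} in one pass.

    A single re.sub pass means text inserted for one placeholder is never
    re-scanned, so an answer that happens to contain "{question}" stays intact.
    """
    missing = [name for name in ("documents", "question", "answer")
               if "{" + name + "}" not in template]
    if missing:
        raise ValueError(f"judge prompt is missing placeholders: {missing}")
    # Assumption: documents are separated by blank lines; adjust to the project's format.
    values = {"documents": "\n\n".join(documents), "question": question, "answer": answer}
    return _PLACEHOLDER_RE.sub(lambda match: values[match.group(1)], template)
```

Failing fast on a missing placeholder catches a prompt that was pasted without them, which would otherwise send the judge an incomplete request.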

Run an experiment from the CLI

python -m scripts.run_experiment --domain biomedical --k 3 --max_examples 10
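To repeat the command above for several domains, a small driver can build the same arguments and invoke the module once per domain. This is a sketch with hypothetical helper names; only `biomedical` is confirmed as a `--domain` value in this README, so the other domain names need to be checked against `scripts.run_experiment`.

```python
import subprocess
import sys


def experiment_command(domain: str, k: int = 3, max_examples: int = 10) -> list[str]:
    """Build the argv for one run of scripts.run_experiment, using the flags shown above."""
    return [
        sys.executable, "-m", "scripts.run_experiment",
        "--domain", domain,
        "--k", str(k),
        "--max_examples", str(max_examples),
    ]


def run_domains(domains: list[str], k: int = 3, max_examples: int = 10) -> dict[str, int]:
    """Run the experiment once per domain and collect each process's exit code."""
    results = {}
    for domain in domains:
        # check=False: a failing domain is recorded instead of aborting the loop.
        completed = subprocess.run(experiment_command(domain, k, max_examples), check=False)
        results[domain] = completed.returncode
    return results
```

Run it from the repository root so that `scripts.run_experiment` is importable, for example `run_domains(["biomedical"], k=3, max_examples=10)`. Using `sys.executable` keeps the runs inside the active virtual environment.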

2. Run the FastAPI app locally (no Docker)

uvicorn app.main:app --host 0.0.0.0 --port 7860

Then use:

  • http://localhost:7860/health
  • http://localhost:7860/docs (Swagger UI)
  • POST /run_domain with a JSON body such as:
{
  "domain": "biomedical",
  "k": 3,
  "max_examples": 10,
  "split": "test"
}
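The same request can also be sent from Python instead of Swagger UI. This sketch uses only the standard library; `build_run_domain_request` and `run_domain` are illustrative names, the 600-second timeout is an assumption (evaluation runs may be slow), and the response is returned as parsed JSON without further interpretation because its schema is not documented here.

```python
import json
import urllib.request


def build_run_domain_request(base_url: str, domain: str, k: int = 3,
                             max_examples: int = 10, split: str = "test") -> urllib.request.Request:
    """Build a POST /run_domain request carrying the JSON body shown above."""
    payload = {"domain": domain, "k": k, "max_examples": max_examples, "split": split}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/run_domain",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_domain(base_url: str, domain: str, timeout: float = 600.0, **kwargs) -> dict:
    """Send the request and decode the JSON response (its schema is not documented here)."""
    request = build_run_domain_request(base_url, domain, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `run_domain("http://localhost:7860", "biomedical", k=3, max_examples=10)`. Non-2xx responses surface as `urllib.error.HTTPError`, which carries the server's error body.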

3. Run with Docker (local laptop)

Build and run:

docker compose build
docker compose up

The API will be available at http://localhost:8000.
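Containers can take a while to start, so scripted calls may fail if they fire immediately after `docker compose up`. A minimal polling sketch, assuming the `/health` endpoint listed earlier returns HTTP 200 once the app is ready and that the compose setup exposes it on port 8000 (`wait_for_health` is a hypothetical helper):

```python
import time
import urllib.request


def wait_for_health(base_url: str = "http://localhost:8000", timeout: float = 60.0,
                    interval: float = 2.0) -> bool:
    """Poll GET /health until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    url = base_url.rstrip("/") + "/health"
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused/reset connections and socket timeouts
            # are all OSError subclasses: the container is probably still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call `wait_for_health("http://localhost:8000", timeout=120)` before sending `/run_domain` requests.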

4. Deploy to Hugging Face Space (Docker)

  1. Create a new Space with SDK = Docker.

  2. Push this repo to the Space Git URL.

  3. In the Space settings, add the following variables/secrets:

    • HF_TOKEN
    • GROQ_API_KEY
    • RAGBENCH_LLM_PROVIDER
    • RAGBENCH_GEN_MODEL
    • RAGBENCH_JUDGE_MODEL
  4. Once the Space builds successfully, open /docs on the Space URL to run /run_domain for each domain via Swagger UI.