title: ragbench-rag-eval
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false

RAGBench RAG Evaluation Project

This project evaluates a RAG system on the RAGBench dataset across 5 domains: Biomedical, General Knowledge, Legal, Customer Support, and Finance.

1. Setup (local, no Docker)

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt

Copy .env.example to .env and fill in:

HF_TOKEN (if using Hugging Face models)
GROQ_API_KEY (if using Groq)
RAGBENCH_LLM_PROVIDER (set to groq or hf)
RAGBENCH_GEN_MODEL
RAGBENCH_JUDGE_MODEL
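
Before running anything, it can help to check that the settings listed above are consistent. The following is a minimal sketch, not part of this repo: the function name `missing_settings` is hypothetical, and it assumes that GROQ_API_KEY is only needed when the provider is groq and HF_TOKEN only when it is hf. It reads the process environment and does not load the .env file itself.

```python
import os

# Assumption: each provider needs exactly one credential, as the list above suggests.
REQUIRED_KEY_BY_PROVIDER = {"groq": "GROQ_API_KEY", "hf": "HF_TOKEN"}
MODEL_VARS = ("RAGBENCH_GEN_MODEL", "RAGBENCH_JUDGE_MODEL")


def missing_settings(env):
    """Return a list of human-readable problems found in an environment mapping."""
    problems = []
    provider = env.get("RAGBENCH_LLM_PROVIDER", "").strip().lower()
    if provider not in REQUIRED_KEY_BY_PROVIDER:
        problems.append("RAGBENCH_LLM_PROVIDER must be 'groq' or 'hf'")
    else:
        key = REQUIRED_KEY_BY_PROVIDER[provider]
        if not env.get(key):
            problems.append(f"{key} is required when the provider is '{provider}'")
    for name in MODEL_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")
    return problems


if __name__ == "__main__":
    # Print one line per problem; print nothing when the settings look complete.
    for problem in missing_settings(os.environ):
        print("config problem:", problem)
```

An empty list means every variable the chosen provider needs is present.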

Also open prompts/ragbench_judge_prompt.txt and paste in the official JSON annotation prompt from the RAGBench paper (Appendix 9.4). Make sure the pasted text contains the placeholders {documents}, {question}, and {answer}.
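
One way the placeholders could be filled is sketched below. This is an illustration, not the repo's actual code, and `fill_judge_prompt` is a hypothetical name. Because a JSON annotation prompt is likely to contain literal braces, the sketch avoids `str.format` and substitutes only the three named placeholders in a single pass.

```python
import re

# Matches only the three placeholders named in this README.
_PLACEHOLDER = re.compile(r"\{(documents|question|answer)\}")


def fill_judge_prompt(template, documents, question, answer):
    """Substitute the three placeholders, leaving any other braces untouched."""
    values = {"documents": documents, "question": question, "answer": answer}
    missing = [name for name in values if "{" + name + "}" not in template]
    if missing:
        raise ValueError(f"template is missing placeholders: {missing}")
    # A single regex pass means text inserted for one placeholder is never
    # rescanned, so a document containing "{question}" stays literal.
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```

For example, the template text could be read with `pathlib.Path("prompts/ragbench_judge_prompt.txt").read_text(encoding="utf-8")` and passed in as `template`.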

Run an experiment from the CLI:

python -m scripts.run_experiment --domain biomedical --k 3 --max_examples 10

2. Run FastAPI locally (no Docker)

uvicorn app.main:app --host 0.0.0.0 --port 7860

Then open:

http://localhost:7860/health
http://localhost:7860/docs (Swagger UI)

To run an evaluation, send POST /run_domain with a JSON body:

{
  "domain": "biomedical",
  "k": 3,
  "max_examples": 10,
  "split": "test"
}
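
The same request can be sent from a script instead of Swagger UI. The sketch below uses only the Python standard library. The helper names are hypothetical, and it assumes the endpoint replies with JSON; the response schema is not documented here.

```python
import json
import urllib.request


def build_run_domain_request(base_url, domain, k=3, max_examples=10, split="test"):
    """Build (but do not send) a POST /run_domain request with the body shown above."""
    payload = {"domain": domain, "k": k, "max_examples": max_examples, "split": split}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/run_domain",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_domain(base_url, domain, **options):
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_run_domain_request(base_url, domain, **options)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from this section running, `run_domain("http://localhost:7860", "biomedical")` would post the example body.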

3. Run with Docker (local laptop)

Build and run:

docker compose build
docker compose up

The API will be available at http://localhost:8000.

4. Deploy to Hugging Face Space (Docker)

Create a new Space with SDK = Docker.
Push this repo to the Space Git URL.
In the Space settings, add these variables/secrets:
- HF_TOKEN
- GROQ_API_KEY
- RAGBENCH_LLM_PROVIDER
- RAGBENCH_GEN_MODEL
- RAGBENCH_JUDGE_MODEL
Once the Space builds successfully, open /docs on the Space URL to run /run_domain for each domain via Swagger UI.
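
To avoid retyping the request body for each domain in Swagger UI, the bodies can be generated up front. This is a small sketch with a hypothetical helper name. Only the "biomedical" identifier appears in this README, so the other four identifiers in the usage note are guesses that must be checked against the app.

```python
import json


def request_bodies(domains, k=3, max_examples=10, split="test"):
    """Return a mapping from each domain to a pretty-printed /run_domain JSON body."""
    return {
        domain: json.dumps(
            {"domain": domain, "k": k, "max_examples": max_examples, "split": split},
            indent=2,
        )
        for domain in domains
    }
```

A call such as `request_bodies(["biomedical", "general_knowledge", "legal", "customer_support", "finance"])` would yield five bodies to paste in turn, assuming those identifiers match what the API accepts.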