---
title: ragbench-rag-eval
emoji: "📊"
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

# RAGBench RAG Evaluation Project

This project evaluates a RAG system on the RAGBench dataset across 5 domains: Biomedical, General Knowledge, Legal, Customer Support, and Finance.

## 1. Setup (local, no Docker)

```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
```

Copy `.env.example` to `.env` and fill in:

- HF_TOKEN (if using Hugging Face models)
- GROQ_API_KEY (if using Groq)
- RAGBENCH_LLM_PROVIDER = groq or hf
- RAGBENCH_GEN_MODEL
- RAGBENCH_JUDGE_MODEL

Also open `prompts/ragbench_judge_prompt.txt` and paste the official JSON annotation prompt from the RAGBench paper (Appendix 9.4), with placeholders: `{documents}`, `{question}`, `{answer}`.

### Run an experiment from CLI

```bash
python -m scripts.run_experiment --domain biomedical --k 3 --max_examples 10
```

## 2. Run FastAPI locally (no Docker)

```bash
uvicorn app.main:app --host 0.0.0.0 --port 7860
```

Then open:

- `http://localhost:7860/health`
- `http://localhost:7860/docs` (Swagger UI)
- POST `/run_domain` with JSON:

```json
{
  "domain": "biomedical",
  "k": 3,
  "max_examples": 10,
  "split": "test"
}
```

## 3. Run with Docker (local laptop)

Build and run:

```bash
docker compose build
docker compose up
```

The API will be available at `http://localhost:8000`.

## 4. Deploy to Hugging Face Space (Docker)

1. Create a new Space with SDK = Docker.
2. Push this repo to the Space Git URL.
3. On the Space settings, add variables/secrets:
   - HF_TOKEN
   - GROQ_API_KEY
   - RAGBENCH_LLM_PROVIDER
   - RAGBENCH_GEN_MODEL
   - RAGBENCH_JUDGE_MODEL
4. Once the Space builds successfully, open `/docs` on the Space URL to run `/run_domain` for each domain via Swagger UI.
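
If clicking through Swagger UI for every domain gets tedious, the same `/run_domain` calls can be scripted. The following is only a minimal sketch, not part of the repository: the file name `run_all_domains.py` and the helper names are hypothetical, and it assumes the endpoint accepts the JSON body shown in this README and replies with JSON. Only the `biomedical` identifier appears in this README, so the identifiers for the other domains are left for you to pass on the command line as the project defines them.

```python
import json
import sys
import urllib.request


def build_payload(domain, k=3, max_examples=10, split="test"):
    """Return the JSON body for POST /run_domain (fields taken from this README)."""
    return {"domain": domain, "k": k, "max_examples": max_examples, "split": split}


def build_request(base_url, payload):
    """Build a POST request for the /run_domain endpoint under base_url."""
    url = base_url.rstrip("/") + "/run_domain"
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_domain(base_url, domain, timeout=600):
    """Send one request and return the decoded response.

    Assumption: the API replies with a JSON document; its shape is not assumed.
    """
    request = build_request(base_url, build_payload(domain))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: python run_all_domains.py <base_url> <domain> [<domain> ...]")
    else:
        base_url = sys.argv[1]
        # Run the domains one after another and print each raw result.
        for domain in sys.argv[2:]:
            print(domain, run_domain(base_url, domain))
```

For a local run this could be invoked as `python run_all_domains.py http://localhost:7860 biomedical`; for a deployed Space, pass the Space URL as the first argument instead. The generous default timeout is a guess, since an evaluation run that calls an LLM judge may take a while.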