---
title: ragbench-rag-eval
emoji: "📊"
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# RAGBench RAG Evaluation Project
This project evaluates a RAG system on the RAGBench dataset across 5 domains:
Biomedical, General Knowledge, Legal, Customer Support, and Finance.
## 1. Setup (local, no Docker)
```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
```
Copy `.env.example` to `.env` and fill in:
- `HF_TOKEN` (if using Hugging Face models)
- `GROQ_API_KEY` (if using Groq)
- `RAGBENCH_LLM_PROVIDER` (either `groq` or `hf`)
- `RAGBENCH_GEN_MODEL`
- `RAGBENCH_JUDGE_MODEL`
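
A quick way to catch a missing variable before starting a run is a small check like the one below. This is only a sketch: the `check_env` helper is hypothetical and not part of this repo, and the pairing of each provider with its key (`groq` with `GROQ_API_KEY`, `hf` with `HF_TOKEN`) is an assumption based on the list above. It reads the process environment, so the values from `.env` must already be exported or loaded.

```python
import os

# Assumption: each provider needs exactly one credential, paired as below.
REQUIRED_KEY_BY_PROVIDER = {"groq": "GROQ_API_KEY", "hf": "HF_TOKEN"}
MODEL_VARS = ("RAGBENCH_GEN_MODEL", "RAGBENCH_JUDGE_MODEL")


def check_env(env=None):
    """Return a list of problems; an empty list means the config looks complete."""
    env = os.environ if env is None else env
    problems = []

    provider = env.get("RAGBENCH_LLM_PROVIDER", "").strip().lower()
    if provider not in REQUIRED_KEY_BY_PROVIDER:
        problems.append("RAGBENCH_LLM_PROVIDER must be 'groq' or 'hf'")
    else:
        key = REQUIRED_KEY_BY_PROVIDER[provider]
        if not env.get(key):
            problems.append(f"{key} is required when RAGBENCH_LLM_PROVIDER={provider}")

    # Both model names are needed regardless of the provider.
    for name in MODEL_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")
    return problems
```

Calling `check_env()` with no arguments inspects `os.environ`; passing a dict makes the check easy to try without touching the real environment.
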
Also open `prompts/ragbench_judge_prompt.txt` and paste the official JSON
annotation prompt from the RAGBench paper (Appendix 9.4), with placeholders:
`{documents}`, `{question}`, `{answer}`.
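
Because the judge prompt asks for JSON output, the template will probably contain literal braces, which `str.format` would try to interpret. The sketch below shows one way to substitute only the three named placeholders in a single pass. The `fill_judge_prompt` helper is hypothetical; how this repo actually fills the template is not shown here.

```python
import re

# Matches only the three placeholders named above; other braces are left alone.
_PLACEHOLDER = re.compile(r"\{(documents|question|answer)\}")


def fill_judge_prompt(template: str, documents: str, question: str, answer: str) -> str:
    """Substitute {documents}, {question} and {answer} without touching JSON braces."""
    values = {"documents": documents, "question": question, "answer": answer}
    # A function replacement avoids backslash processing in the inserted text,
    # and a single pass means placeholder-like text inside a value is not re-expanded.
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```

For example, a template ending in `{"relevance": []}` keeps that fragment intact while the three placeholders are replaced.
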
### Run an experiment from CLI
```bash
python -m scripts.run_experiment --domain biomedical --k 3 --max_examples 10
```
## 2. Run FastAPI locally (no Docker)
```bash
uvicorn app.main:app --host 0.0.0.0 --port 7860
```
Then open:
- `http://localhost:7860/health`
- `http://localhost:7860/docs` (Swagger UI)

To run an evaluation, send a POST request to `/run_domain` with a JSON body:
```json
{
"domain": "biomedical",
"k": 3,
"max_examples": 10,
"split": "test"
}
```
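
The same request can be sent from a script instead of Swagger UI. The sketch below uses only the standard library. The helper names are hypothetical, and the response is assumed to be a JSON document, since its schema is not described in this README.

```python
import json
import urllib.request


def build_run_domain_request(base_url, domain, k=3, max_examples=10, split="test"):
    """Build a POST request for /run_domain with the JSON body shown above."""
    payload = {"domain": domain, "k": k, "max_examples": max_examples, "split": split}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/run_domain",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_domain(base_url, domain, timeout=600, **kwargs):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_run_domain_request(base_url, domain, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from this section running, `run_domain("http://localhost:7860", "biomedical")` would submit the example body. The generous timeout is a guess, on the assumption that an evaluation run may take a while.
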
## 3. Run with Docker (local laptop)
Build and run:
```bash
docker compose build
docker compose up
```
The API will be available at `http://localhost:8000`.
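
The container may need a moment before it accepts requests. A minimal sketch for waiting on the `/health` endpoint is shown below. It assumes the endpoint returns HTTP 200 once the app is ready, and the helper names and retry defaults are illustrative only.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code of a GET, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_for_health(base_url, attempts=30, delay=2.0, get_status=http_status, sleep=time.sleep):
    """Poll <base_url>/health until it returns 200 or the attempts run out."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        if get_status(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For instance, `wait_for_health("http://localhost:8000")` returns `True` as soon as the health check succeeds and `False` after 30 failed attempts.
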
## 4. Deploy to Hugging Face Space (Docker)
1. Create a new Space with SDK = Docker.
2. Push this repo to the Space Git URL.
3. In the Space settings, add the following variables/secrets:
   - `HF_TOKEN`
   - `GROQ_API_KEY`
   - `RAGBENCH_LLM_PROVIDER`
   - `RAGBENCH_GEN_MODEL`
   - `RAGBENCH_JUDGE_MODEL`
4. Once the Space builds successfully, open `/docs` on the Space URL to run
`/run_domain` for each domain via Swagger UI.