---
title: RagBenchCapstone10
emoji: π
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.16.0
app_file: app.py
pinned: false
short_description: RagBench Dataset development by Saiteja
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# RAG Benchmark Evaluation System

## Overview

This project implements a Retrieval-Augmented Generation (RAG) system for evaluating different language models and reranking strategies. It provides a user-friendly interface for querying documents and analyzing the performance of various models.
## Features

- Multiple LLM support (LLaMA 3.3, Mistral 7B)
- Various reranking models:
  - MS MARCO MiniLM
  - MS MARCO TinyBERT
  - MonoT5 Base
  - MonoT5 Small
  - MonoT5 3B
- Vector similarity search using Milvus (see the retrieval sketch after this list)
- Automatic document chunking and retrieval
- Performance metrics calculation
- Interactive Gradio interface
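
The retrieval path pairs an embedding model with a Milvus vector index. The sketch below shows that flow under stated assumptions: a hypothetical `rag_chunks` collection persisted in a local Milvus Lite file and an off-the-shelf `sentence-transformers` embedder. The app's actual collection schema and models may differ.

```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# Hypothetical names: "rag_chunks" collection, local Milvus Lite file.
client = MilvusClient("milvus_demo.db")
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(query: str, top_k: int = 5):
    # Embed the query and search the vector index for the nearest chunks.
    query_vec = embedder.encode(query).tolist()
    hits = client.search(
        collection_name="rag_chunks",
        data=[query_vec],
        limit=top_k,
        output_fields=["text"],
    )
    # Each hit carries its stored chunk text and a similarity score.
    return [(h["entity"]["text"], h["distance"]) for h in hits[0]]
```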
## Prerequisites

- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/rag-benchmark.git
   cd rag-benchmark
   ```

2. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure the models:

   - Create a `models` directory and add your language model files.
   - Create a `rerankers` directory and add your reranking model files.

4. Run the application:

   ```bash
   python app.py
   ```
## Usage

1. Start the application:

   ```bash
   python app.py
   ```

2. Access the web interface at `http://localhost:7860`.
3. Enter your question and select:
   - LLM Model (LLaMA 3.3 or Mistral 7B)
   - Reranking Model (MS MARCO or MonoT5 variants)
4. Click "Evaluate Model" to get the results.
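
For orientation, the sketch below shows one way such an interface can be wired in Gradio. The `evaluate` function and the dropdown choices are illustrative placeholders, not the project's actual code.

```python
import gradio as gr

# Hypothetical stand-in for the app's evaluation pipeline.
def evaluate(question: str, llm: str, reranker: str) -> str:
    return f"Answer for {question!r} using {llm} + {reranker}"

demo = gr.Interface(
    fn=evaluate,
    inputs=[
        gr.Textbox(label="Question"),
        gr.Dropdown(["LLaMA 3.3", "Mistral 7B"], label="LLM Model"),
        gr.Dropdown(["MS MARCO MiniLM", "MonoT5 Base"], label="Reranking Model"),
    ],
    outputs=gr.Textbox(label="Result"),
)

demo.launch(server_port=7860)  # serves http://localhost:7860
```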
## Metrics

The system calculates several performance metrics:

- RMSE for Context Relevance
- RMSE for Context Utilization
- AUCROC for Adherence
- Processing Time
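
The first three metrics typically compare the system's predicted scores against RAGBench's ground-truth annotations. A minimal sketch using `numpy` and `scikit-learn`, with illustrative data:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, roc_auc_score

# Illustrative data: ground-truth annotations vs. model-predicted scores.
true_relevance = np.array([0.9, 0.2, 0.7, 0.5])
pred_relevance = np.array([0.8, 0.3, 0.6, 0.5])

# RMSE for the continuous relevance/utilization scores.
rmse = np.sqrt(mean_squared_error(true_relevance, pred_relevance))

# AUCROC for the binary adherence (hallucination) labels.
true_adherence = np.array([1, 0, 1, 1])
pred_adherence = np.array([0.95, 0.40, 0.70, 0.85])
auroc = roc_auc_score(true_adherence, pred_adherence)

print(f"RMSE: {rmse:.3f}, AUCROC: {auroc:.3f}")
```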
## Reranking Models Comparison

### MS MARCO Models

- **MiniLM**: Fast and efficient, with good general performance
- **TinyBERT**: Lightweight; slightly lower accuracy but faster

### MonoT5 Models

- **Small**: Compact and fast, suitable for limited resources
- **Base**: Balanced accuracy and speed
- **3B**: Highest accuracy, but requires the most computational resources
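
The MS MARCO rerankers are cross-encoders: each (query, document) pair is scored jointly rather than through separate embeddings. A minimal sketch with `sentence-transformers`, using one publicly available MiniLM checkpoint (not necessarily the exact weights this project bundles):

```python
from sentence_transformers import CrossEncoder

# One publicly available MS MARCO MiniLM cross-encoder checkpoint.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What metrics does RAGBench report?"
docs = [
    "RAGBench annotates context relevance, utilization, and adherence.",
    "Milvus is an open-source vector database.",
]

# Score every (query, document) pair, then sort documents by score.
scores = reranker.predict([(query, d) for d in docs])
ranked = sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```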
## Error Handling

- Automatic fallback to fewer documents when token limits are exceeded (see the sketch below)
- Graceful handling of API timeouts
- Comprehensive error logging
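
One way to implement the token-limit fallback is to count tokens with `tiktoken` and drop the lowest-ranked documents until the prompt fits. The budget, encoding, and function name below are illustrative assumptions, not the app's actual values:

```python
from typing import List

import tiktoken

# Illustrative budget; the real limit depends on the target model.
MAX_PROMPT_TOKENS = 4096
enc = tiktoken.get_encoding("cl100k_base")

def fit_to_budget(question: str, docs: List[str]) -> List[str]:
    """Keep the top-ranked documents that fit within the token budget."""
    used = len(enc.encode(question))
    kept = []
    for doc in docs:  # docs assumed sorted best-first by the reranker
        n = len(enc.encode(doc))
        if used + n > MAX_PROMPT_TOKENS:
            break
        kept.append(doc)
        used += n
    return kept
```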
## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## Dependencies

- gradio
- torch
- transformers
- sentence-transformers
- pymilvus
- numpy
- pandas
- scikit-learn
- tiktoken
- groq
- huggingface_hub
## License

[Your License Here]
## Acknowledgments

- RAGBench dataset
- Hugging Face Transformers
- Milvus Vector Database
- Groq API