arahrooh committed
Commit 3c837c8 · 1 Parent(s): c5842f1

Add RAG chatbot functionality with OAuth authentication

- Modified app.py to use OAuth token pattern from template
- Added bot.py with RAG functionality
- Added requirements.txt with all dependencies
- Added chroma_db vector database
- Updated README.md with full description and usage instructions

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ *.sqlite3 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
- title: Cgt 3
3
- emoji: 💬
4
  colorFrom: yellow
5
  colorTo: purple
6
  sdk: gradio
@@ -13,4 +13,58 @@ hf_oauth_scopes:
13
  license: mit
14
  ---
15
 
16
- An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
 
 
1
  ---
2
+ title: CGT-LLM-Beta RAG Chatbot
3
+ emoji: 🧬
4
  colorFrom: yellow
5
  colorTo: purple
6
  sdk: gradio
 
13
  license: mit
14
  ---
15
 
16
+ # 🧬 CGT-LLM-Beta: Genetic Counseling RAG Chatbot
17
+
18
+ A Retrieval-Augmented Generation (RAG) chatbot for genetic counseling, cascade genetic testing, hereditary cancer syndromes, and related topics.
19
+
20
+ ## Features
21
+
22
+ - **RAG System**: Provides evidence-based answers from medical literature using vector database retrieval
23
+ - **Multiple Models**: Choose from various LLM models (Llama, Mistral, MediPhi, etc.)
24
+ - **Education Level Adaptation**: Answers are tailored to different education levels (Middle School, High School, College, Doctoral)
25
+ - **Source Citations**: View retrieved document chunks with similarity scores
26
+ - **Readability Scoring**: Flesch-Kincaid grade level scores for each answer
27
+ - **OAuth Authentication**: Secure access using Hugging Face OAuth tokens
28
+
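
The readability score reported with each answer is computed with `textstat`; below is a minimal sketch of that scoring step (the sample sentence is illustrative).

```python
# Flesch-Kincaid grade-level scoring, as reported alongside each answer.
# textstat is listed in requirements.txt; the sample text below is illustrative.
import textstat

sample = "BRCA1 and BRCA2 are genes that normally help repair damaged DNA."
grade = textstat.flesch_kincaid_grade(sample)
print(f"Flesch-Kincaid grade level: {grade:.1f}")
```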
29
+ ## How to Use
30
+
31
+ 1. **Log in**: Click the "Login" button in the sidebar to authenticate with your Hugging Face account
32
+ 2. **Ask a question**: Enter your question about genetic counseling, hereditary cancer, or related topics
33
+ 3. **Select options**:
34
+ - Choose your preferred LLM model
35
+ - Select your education level for personalized answers
36
+ - Adjust advanced settings (retrieval count, temperature, max tokens)
37
+ 4. **View results**: See the answer, readability score, source documents, and similarity scores
38
+
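
Behind the login button, Gradio hands the app an OAuth token that is forwarded to the Hugging Face Inference API. The snippet below is a minimal sketch of that pattern (it mirrors the approach in `app.py`; the model name is one of this Space's defaults):

```python
# Minimal sketch of the OAuth -> Inference API flow used by this Space.
# Gradio injects the logged-in user's token via the gr.OAuthToken type hint.
import gradio as gr
from huggingface_hub import InferenceClient

def answer(question: str, hf_token: gr.OAuthToken) -> str:
    client = InferenceClient(token=hf_token.token)
    return client.text_generation(
        question,
        model="meta-llama/Llama-3.2-3B-Instruct",  # default entry in MODEL_MAP
        max_new_tokens=256,
        return_full_text=False,
    )

with gr.Blocks() as demo:
    with gr.Sidebar():
        gr.LoginButton()
    question_box = gr.Textbox(label="Question")
    answer_box = gr.Textbox(label="Answer")
    question_box.submit(answer, inputs=question_box, outputs=answer_box)

if __name__ == "__main__":
    demo.launch()
```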
39
+ ## Example Questions
40
+
41
+ The chatbot includes 100+ example questions covering topics like:
42
+ - BRCA1/BRCA2 mutations and cancer risk
43
+ - Lynch Syndrome (MLH1, MSH2, MSH6, PMS2, EPCAM)
44
+ - Genetic testing recommendations
45
+ - Family communication about genetic results
46
+ - Insurance and legal considerations (GINA)
47
+ - Screening and prevention strategies
48
+
49
+ ## Technical Details
50
+
51
+ - **Vector Database**: ChromaDB for fast semantic search
52
+ - **Embeddings**: Sentence-transformers (all-MiniLM-L6-v2)
53
+ - **Inference**: Hugging Face Inference API (via OAuth)
54
+ - **Interface**: Gradio 5.42.0
55
+
56
+ ## Important Notes
57
+
58
+ ⚠️ **Medical Disclaimer**: This chatbot provides informational answers based on medical literature. It is not a substitute for professional medical advice, diagnosis, or treatment. Always consult with qualified healthcare providers for medical decisions.
59
+
60
+ ## Resources
61
+
62
+ The chatbot's knowledge base includes:
63
+ - NCCN Guidelines
64
+ - Medical literature on hereditary cancer syndromes
65
+ - Genetic counseling resources
66
+ - Patient education materials
67
+
68
+ ---
69
+
70
+ Built with [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
app.py CHANGED
@@ -1,70 +1,922 @@
 
 
1
  import gradio as gr
2
- from huggingface_hub import InferenceClient
 
 
3
 
 
4
 
5
- def respond(
6
- message,
7
- history: list[dict[str, str]],
8
- system_message,
9
- max_tokens,
10
- temperature,
11
- top_p,
12
- hf_token: gr.OAuthToken,
13
- ):
14
- """
15
- For more information on `huggingface_hub` Inference API support, please check the docs: https://huggingface.co/docs/huggingface_hub/v0.22.2/en/guides/inference
16
- """
17
- client = InferenceClient(token=hf_token.token, model="openai/gpt-oss-20b")
18
 
19
- messages = [{"role": "system", "content": system_message}]
 
 
20
 
21
- messages.extend(history)
 
 
22
 
23
- messages.append({"role": "user", "content": message})
 
 
24
 
25
- response = ""
26
 
27
- for message in client.chat_completion(
28
- messages,
29
- max_tokens=max_tokens,
30
- stream=True,
 
 
31
  temperature=temperature,
32
  top_p=top_p,
33
- ):
34
- choices = message.choices
35
- token = ""
36
- if len(choices) and choices[0].delta.content:
37
- token = choices[0].delta.content
 
 
38
 
39
- response += token
40
- yield response
41
 
 
42
 
43
  """
44
- For information on how to customize the ChatInterface, peruse the gradio docs: https://www.gradio.app/docs/chatinterface
45
- """
46
- chatbot = gr.ChatInterface(
47
- respond,
48
- type="messages",
49
- additional_inputs=[
50
- gr.Textbox(value="You are a friendly Chatbot.", label="System message"),
51
- gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="Max new tokens"),
52
- gr.Slider(minimum=0.1, maximum=4.0, value=0.7, step=0.1, label="Temperature"),
53
- gr.Slider(
 
 
54
  minimum=0.1,
55
  maximum=1.0,
56
- value=0.95,
57
- step=0.05,
58
- label="Top-p (nucleus sampling)",
59
- ),
60
- ],
 
 
61
  )
62
 
63
- with gr.Blocks() as demo:
64
- with gr.Sidebar():
65
- gr.LoginButton()
66
- chatbot.render()
 
 
67
 
 
 
68
 
 
69
  if __name__ == "__main__":
 
 
70
  demo.launch()
 
1
+ """
2
+ Gradio Chatbot Interface for CGT-LLM-Beta RAG System
3
+
4
+ This application provides a web interface for the RAG chatbot with OAuth authentication.
5
+ It uses Hugging Face Inference API with OAuth tokens for authentication.
6
+ """
7
+
8
  import gradio as gr
9
+ import argparse
10
+ import sys
11
+ import os
12
+ from typing import Tuple, Optional, List
13
+ import logging
14
+ import textstat
15
+ import torch
16
+
17
+ # Set up logging first (before any logger usage)
18
+ logging.basicConfig(level=logging.INFO)
19
+ logger = logging.getLogger(__name__)
20
 
21
+ # Import from bot.py - wrap in try/except to handle import errors gracefully
22
+ try:
23
+ from bot import RAGBot, parse_args, Chunk
24
+ BOT_AVAILABLE = True
25
+ except ImportError as e:
26
+ logger.error(f"Failed to import bot module: {e}")
27
+ BOT_AVAILABLE = False
28
+ # Create dummy classes so the module can still load
29
+ class RAGBot:
30
+ pass
31
+ class Chunk:
32
+ pass
33
+ def parse_args():
34
+ return None
35
 
36
+ # For Hugging Face Inference API
37
+ try:
38
+ from huggingface_hub import InferenceClient
39
+ HF_INFERENCE_AVAILABLE = True
40
+ except ImportError:
41
+ HF_INFERENCE_AVAILABLE = False
42
+ logger.warning("huggingface_hub not available, InferenceClient will not work")
 
 
43
 
44
+ # Model mapping: short name -> full HuggingFace path
45
+ MODEL_MAP = {
46
+ "Llama-3.2-3B-Instruct": "meta-llama/Llama-3.2-3B-Instruct",
47
+ "Mistral-7B-Instruct-v0.2": "mistralai/Mistral-7B-Instruct-v0.2",
48
+ "Llama-4-Scout-17B-16E-Instruct": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
49
+ "MediPhi-Instruct": "microsoft/MediPhi-Instruct",
50
+ "MediPhi": "microsoft/MediPhi",
51
+ "Phi-4-reasoning": "microsoft/Phi-4-reasoning",
52
+ }
53
 
54
+ # Education level mapping
55
+ EDUCATION_LEVELS = {
56
+ "Middle School": "middle_school",
57
+ "High School": "high_school",
58
+ "College": "college",
59
+ "Doctoral": "doctoral"
60
+ }
61
 
62
+ # Example questions from the results CSV (hardcoded for easy access)
63
+ EXAMPLE_QUESTIONS = [
64
+ "Can a BRCA2 variant skip a generation?",
65
+ "Can a PMS2 variant skip a generation?",
66
+ "Can an EPCAM/MSH2 variant skip a generation?",
67
+ "Can an MLH1 variant skip a generation?",
68
+ "Can an MSH2 variant skip a generation?",
69
+ "Can an MSH6 variant skip a generation?",
70
+ "Can I pass this MSH2 variant to my kids?",
71
+ "Can only women carry a BRCA inherited mutation?",
72
+ "Does GINA cover life or disability insurance?",
73
+ "Does having a BRCA1 mutation mean I will definitely have cancer?",
74
+ "Does having a BRCA2 mutation mean I will definitely have cancer?",
75
+ "Does having a PMS2 mutation mean I will definitely have cancer?",
76
+ "Does having an EPCAM/MSH2 mutation mean I will definitely have cancer?",
77
+ "Does having an MLH1 mutation mean I will definitely have cancer?",
78
+ "Does having an MSH2 mutation mean I will definitely have cancer?",
79
+ "Does having an MSH6 mutation mean I will definitely have cancer?",
80
+ "Does this BRCA1 genetic variant affect my cancer treatment?",
81
+ "Does this BRCA2 genetic variant affect my cancer treatment?",
82
+ "Does this EPCAM/MSH2 genetic variant affect my cancer treatment?",
83
+ "Does this MLH1 genetic variant affect my cancer treatment?",
84
+ "Does this MSH2 genetic variant affect my cancer treatment?",
85
+ "Does this MSH6 genetic variant affect my cancer treatment?",
86
+ "Does this PMS2 genetic variant affect my cancer treatment?",
87
+ "How can I cope with this diagnosis?",
88
+ "How can I get my kids tested?",
89
+ "How can I help others with my condition?",
90
+ "How might my genetic test results change over time?",
91
+ "I don't talk to my family/parents/sister/brother. How can I share this with them?",
92
+ "I have a BRCA pathogenic variant and I want to have children, what are my options?",
93
+ "Is genetic testing for my family members covered by insurance?",
94
+ "Is new research being done on my condition?",
95
+ "Is this BRCA1 variant something I inherited?",
96
+ "Is this BRCA2 variant something I inherited?",
97
+ "Is this EPCAM/MSH2 variant something I inherited?",
98
+ "Is this MLH1 variant something I inherited?",
99
+ "Is this MSH2 variant something I inherited?",
100
+ "Is this MSH6 variant something I inherited?",
101
+ "Is this PMS2 variant something I inherited?",
102
+ "My relative doesn't have insurance. What should they do?",
103
+ "People who test positive for a genetic mutation are they at risk of losing their health insurance?",
104
+ "Should I contact my male and female relatives?",
105
+ "Should my family members get tested?",
106
+ "What are the Risks and Benefits of Risk-Reducing Surgeries for Lynch Syndrome?",
107
+ "What are the recommendations for my family members if I have a BRCA1 mutation?",
108
+ "What are the recommendations for my family members if I have a BRCA2 mutation?",
109
+ "What are the recommendations for my family members if I have a PMS2 mutation?",
110
+ "What are the recommendations for my family members if I have an EPCAM/MSH2 mutation?",
111
+ "What are the recommendations for my family members if I have an MLH1 mutation?",
112
+ "What are the recommendations for my family members if I have an MSH2 mutation?",
113
+ "What are the recommendations for my family members if I have an MSH6 mutation?",
114
+ "What are the surveillance and preventions I can take to reduce my risk of cancer or detecting cancer early if I have a BRCA mutation?",
115
+ "What are the surveillance and preventions I can take to reduce my risk of cancer or detecting cancer early if I have an EPCAM/MSH2 mutation?",
116
+ "What are the surveillance and preventions I can take to reduce my risk of cancer or detecting cancer early if I have an MSH2 mutation?",
117
+ "What does a BRCA1 genetic variant mean for me?",
118
+ "What does a BRCA2 genetic variant mean for me?",
119
+ "What does a PMS2 genetic variant mean for me?",
120
+ "What does an EPCAM/MSH2 genetic variant mean for me?",
121
+ "What does an MLH1 genetic variant mean for me?",
122
+ "What does an MSH2 genetic variant mean for me?",
123
+ "What does an MSH6 genetic variant mean for me?",
124
+ "What if I feel overwhelmed?",
125
+ "What if I want to have children and have a hereditary cancer gene? What are my reproductive options?",
126
+ "What if a family member doesn't want to get tested?",
127
+ "What is Lynch Syndrome?",
128
+ "What is my cancer risk if I have BRCA1 Hereditary Breast and Ovarian Cancer syndrome?",
129
+ "What is my cancer risk if I have BRCA2 Hereditary Breast and Ovarian Cancer syndrome?",
130
+ "What is my cancer risk if I have MLH1 Lynch syndrome?",
131
+ "What is my cancer risk if I have MSH2 or EPCAM-associated Lynch syndrome?",
132
+ "What is my cancer risk if I have MSH6 Lynch syndrome?",
133
+ "What is my cancer risk if I have PMS2 Lynch syndrome?",
134
+ "What other resources are available to help me?",
135
+ "What screening tests do you recommend for BRCA1 carriers?",
136
+ "What screening tests do you recommend for BRCA2 carriers?",
137
+ "What screening tests do you recommend for EPCAM/MSH2 carriers?",
138
+ "What screening tests do you recommend for MLH1 carriers?",
139
+ "What screening tests do you recommend for MSH2 carriers?",
140
+ "What screening tests do you recommend for MSH6 carriers?",
141
+ "What screening tests do you recommend for PMS2 carriers?",
142
+ "What steps can I take to manage my cancer risk if I have Lynch syndrome?",
143
+ "What types of cancers am I at risk for with a BRCA1 mutation?",
144
+ "What types of cancers am I at risk for with a BRCA2 mutation?",
145
+ "What types of cancers am I at risk for with a PMS2 mutation?",
146
+ "What types of cancers am I at risk for with an EPCAM/MSH2 mutation?",
147
+ "What types of cancers am I at risk for with an MLH1 mutation?",
148
+ "What types of cancers am I at risk for with an MSH2 mutation?",
149
+ "What types of cancers am I at risk for with an MSH6 mutation?",
150
+ "Where can I find a genetic counselor?",
151
+ "Which of my relatives are at risk?",
152
+ "Who are my first-degree relatives?",
153
+ "Who do my family members call to have genetic testing?",
154
+ "Why do some families with Lynch syndrome have more cases of cancer than others?",
155
+ "Why should I share my BRCA1 genetic results with family?",
156
+ "Why should I share my BRCA2 genetic results with family?",
157
+ "Why should I share my EPCAM/MSH2 genetic results with family?",
158
+ "Why should I share my MLH1 genetic results with family?",
159
+ "Why should I share my MSH2 genetic results with family?",
160
+ "Why should I share my MSH6 genetic results with family?",
161
+ "Why should I share my PMS2 genetic results with family?",
162
+ "Why would my relatives want to know if they have this? What can they do about it?",
163
+ "Will my insurance cover testing for my parents/brother/sister?",
164
+ "Will this affect my health insurance?",
165
+ ]
166
 
 
167
 
168
+ class InferenceAPIBot:
169
+ """Wrapper that uses Hugging Face Inference API with OAuth token"""
170
+
171
+ def __init__(self, bot: RAGBot):
172
+ """Initialize with a RAGBot (for vector DB)"""
173
+ self.bot = bot # Use bot for vector DB and formatting
174
+ self.current_model = bot.args.model
175
+ logger.info(f"InferenceAPIBot initialized with model: {self.current_model}")
176
+
177
+ def _get_client(self, hf_token: Optional[str] = None) -> InferenceClient:
178
+ """Create InferenceClient with token (can be None for public models)"""
179
+ if hf_token:
180
+ return InferenceClient(token=hf_token)
181
+ else:
182
+ # Try without token (works for public models)
183
+ return InferenceClient()
184
+
185
+ @property
186
+ def args(self):
187
+ """Access args from the wrapped bot"""
188
+ return self.bot.args
189
+
190
+ def generate_answer(self, prompt: str, hf_token: Optional[str] = None, **kwargs) -> str:
191
+ """Generate answer using Inference API"""
192
+ try:
193
+ max_tokens = kwargs.get('max_new_tokens', 512)
194
+ temperature = kwargs.get('temperature', 0.2)
195
+ top_p = kwargs.get('top_p', 0.9)
196
+
197
+ # Create client with token
198
+ client = self._get_client(hf_token)
199
+
200
+ # Use text_generation API directly
201
+ logger.info(f"Calling Inference API for model: {self.current_model}")
202
+ response = client.text_generation(
203
+ prompt,
204
+ model=self.current_model,
205
+ max_new_tokens=max_tokens,
206
  temperature=temperature,
207
  top_p=top_p,
208
+ return_full_text=False,
209
+ )
210
+ logger.info(f"Inference API response received (length: {len(response) if response else 0})")
211
+ return response
212
+ except Exception as e:
213
+ logger.error(f"Error calling Inference API: {e}", exc_info=True)
214
+ import traceback
215
+ logger.error(f"Traceback: {traceback.format_exc()}")
216
+ return f"Error generating answer: {str(e)}. Please check the logs for details."
217
+
218
+ def enhance_readability(self, answer: str, target_level: str = "middle_school", hf_token: Optional[str] = None) -> Tuple[str, float]:
219
+ """Enhance readability using Inference API"""
220
+ try:
221
+ # Define prompts for different reading levels
222
+ if target_level == "middle_school":
223
+ level_description = "middle school reading level (ages 12-14, 6th-8th grade)"
224
+ instructions = """
225
+ - Use simpler medical terms or explain them
226
+ - Medium-length sentences
227
+ - Clear, structured explanations
228
+ - Keep important medical information accessible"""
229
+ elif target_level == "high_school":
230
+ level_description = "high school reading level (ages 15-18, 9th-12th grade)"
231
+ instructions = """
232
+ - Use appropriate medical terminology with context
233
+ - Varied sentence length
234
+ - Comprehensive yet accessible explanations
235
+ - Maintain technical accuracy while ensuring clarity"""
236
+ elif target_level == "college":
237
+ level_description = "college reading level (undergraduate level, ages 18-22)"
238
+ instructions = """
239
+ - Use standard medical terminology with brief explanations
240
+ - Professional and clear writing style
241
+ - Include relevant clinical context
242
+ - Maintain scientific accuracy and precision
243
+ - Appropriate for undergraduate students in health sciences"""
244
+ elif target_level == "doctoral":
245
+ level_description = "doctoral/professional reading level (graduate level, medical professionals)"
246
+ instructions = """
247
+ - Use advanced medical and scientific terminology
248
+ - Include detailed clinical and research context
249
+ - Reference specific mechanisms, pathways, and evidence
250
+ - Provide comprehensive technical explanations
251
+ - Appropriate for medical professionals, researchers, and graduate students
252
+ - Include nuanced discussions of clinical implications and research findings"""
253
+ else:
254
+ raise ValueError(f"Unknown target_level: {target_level}")
255
+
256
+ system_message = f"""You are a helpful medical assistant who specializes in explaining complex medical information at appropriate reading levels. Rewrite the following medical answer for {level_description}:
257
+ {instructions}
258
+ - Keep the same important information but adapt the complexity
259
+ - Provide context for technical terms
260
+ - Ensure the answer is informative yet understandable"""
261
+
262
+ user_message = f"Please rewrite this medical answer for {level_description}:\n\n{answer}"
263
+
264
+ # Combine system and user messages for text generation
265
+ combined_prompt = f"{system_message}\n\n{user_message}"
266
+ logger.info(f"Enhancing readability for {target_level} level")
267
+
268
+ # Create client with token
269
+ client = self._get_client(hf_token)
270
+
271
+ max_tokens = 512 if target_level in ["college", "doctoral"] else 384
272
+ temperature = 0.4 if target_level in ["college", "doctoral"] else 0.3
273
+
274
+ enhanced_answer = client.text_generation(
275
+ combined_prompt,
276
+ model=self.current_model,
277
+ max_new_tokens=max_tokens,
278
+ temperature=temperature,
279
+ return_full_text=False,
280
+ )
281
+ # Clean the answer (same as bot.py)
282
+ cleaned = self.bot._clean_readability_answer(enhanced_answer, target_level)
283
+
284
+ # Calculate Flesch score
285
+ try:
286
+ flesch_score = textstat.flesch_kincaid_grade(cleaned)
287
+ except Exception:
288
+ flesch_score = 0.0
289
+
290
+ return cleaned, flesch_score
291
+ except Exception as e:
292
+ logger.error(f"Error enhancing readability: {e}", exc_info=True)
293
+ return answer, 0.0
294
+
295
+ # Delegate other methods to bot
296
+ def format_prompt(self, context_chunks: List[Chunk], question: str) -> str:
297
+ return self.bot.format_prompt(context_chunks, question)
298
+
299
+ def retrieve_with_scores(self, query: str, k: int) -> Tuple[List[Chunk], List[float]]:
300
+ return self.bot.retrieve_with_scores(query, k)
301
+
302
+ def _categorize_question(self, question: str) -> str:
303
+ return self.bot._categorize_question(question)
304
+
305
+ @property
306
+ def vector_retriever(self):
307
+ return self.bot.vector_retriever
308
 
 
 
309
 
310
+ class GradioRAGInterface:
311
+ """Wrapper class to integrate RAGBot with Gradio using OAuth"""
312
+
313
+ def __init__(self, initial_bot: RAGBot):
314
+ # Always use Inference API on Spaces
315
+ if HF_INFERENCE_AVAILABLE:
316
+ self.bot = InferenceAPIBot(initial_bot)
317
+ self.use_inference_api = True
318
+ logger.info("Using Hugging Face Inference API with OAuth")
319
+ else:
320
+ self.bot = initial_bot
321
+ self.use_inference_api = False
322
+ logger.warning("Inference API not available, falling back to local model")
323
+
324
+ # Get current model from bot args
325
+ self.current_model = self.bot.args.model if hasattr(self.bot, 'args') else getattr(self.bot, 'current_model', None)
326
+ if self.current_model is None and hasattr(self.bot, 'bot'):
327
+ self.current_model = self.bot.bot.args.model
328
+ self.data_dir = initial_bot.args.data_dir
329
+ logger.info("GradioRAGInterface initialized")
330
+
331
+ def _find_file_path(self, filename: str) -> str:
332
+ """Find the full file path for a given filename"""
333
+ from pathlib import Path
334
+ data_path = Path(self.data_dir)
335
+
336
+ if not data_path.exists():
337
+ return ""
338
+
339
+ # Search for the file recursively
340
+ for file_path in data_path.rglob(filename):
341
+ return str(file_path)
342
+
343
+ return ""
344
+
345
+ def reload_model(self, model_short_name: str) -> str:
346
+ """Reload the model when user selects a different one"""
347
+ if model_short_name not in MODEL_MAP:
348
+ return f"Error: Unknown model '{model_short_name}'"
349
+
350
+ new_model_path = MODEL_MAP[model_short_name]
351
+
352
+ # If same model, no need to reload
353
+ if new_model_path == self.current_model:
354
+ return f"Model already loaded: {model_short_name}"
355
+
356
+ try:
357
+ logger.info(f"Switching model from {self.current_model} to {new_model_path}")
358
+
359
+ if self.use_inference_api:
360
+ # For Inference API, just update the model name
361
+ self.bot.current_model = new_model_path
362
+ self.current_model = new_model_path
363
+ return f"✓ Model switched to: {model_short_name} (using Inference API)"
364
+ else:
365
+ # For local model, reload it
366
+ self.bot.args.model = new_model_path
367
+
368
+ # Clear old model from memory
369
+ if hasattr(self.bot, 'model') and self.bot.model is not None:
370
+ del self.bot.model
371
+ del self.bot.tokenizer
372
+ if torch.cuda.is_available(): torch.cuda.empty_cache()
373
+
374
+ # Load new model
375
+ self.bot._load_model()
376
+ self.current_model = new_model_path
377
+
378
+ return f"✓ Model loaded: {model_short_name}"
379
+ except Exception as e:
380
+ logger.error(f"Error reloading model: {e}", exc_info=True)
381
+ return f"✗ Error loading model: {str(e)}"
382
+
383
+ def process_question(
384
+ self,
385
+ question: str,
386
+ model_name: str,
387
+ education_level: str,
388
+ k: int,
389
+ temperature: float,
390
+ max_tokens: int,
391
+ hf_token: Optional[str] = None
392
+ ) -> Tuple[str, str, str, str, str]:
393
+ """
394
+ Process a single question and return formatted results
395
+
396
+ Returns:
397
+ Tuple of (answer, flesch_score, sources, similarity_scores, question_category)
398
+ """
399
+ import time
400
+
401
+ if not question or not question.strip():
402
+ return "Please enter a question.", "N/A", "", "", ""
403
+
404
+ # Check if token is provided when using Inference API
405
+ if self.use_inference_api and not hf_token:
406
+ return (
407
+ "⚠️ **Authentication Required**\n\nPlease log in using the Hugging Face login button in the sidebar to use the Inference API.",
408
+ "N/A",
409
+ "",
410
+ "",
411
+ "Error"
412
+ )
413
+
414
+ try:
415
+ start_time = time.time()
416
+ logger.info(f"Processing question: {question[:50]}...")
417
+
418
+ # Reload model if changed
419
+ if model_name in MODEL_MAP:
420
+ model_path = MODEL_MAP[model_name]
421
+ if model_path != self.current_model:
422
+ logger.info(f"Model changed, reloading from {self.current_model} to {model_path}")
423
+ reload_status = self.reload_model(model_name)
424
+ if reload_status.startswith("✗"):
425
+ return f"Error: {reload_status}", "N/A", "", "", ""
426
+ logger.info(f"Model reloaded in {time.time() - start_time:.1f}s")
427
+
428
+ # Update bot args for this query
429
+ self.bot.args.k = k
430
+ self.bot.args.temperature = temperature
431
+ self.bot.args.max_new_tokens = min(max_tokens, 512) # Cap at 512 for faster responses
432
+
433
+ # Categorize question
434
+ logger.info("Categorizing question...")
435
+ question_group = self.bot._categorize_question(question)
436
+
437
+ # Retrieve relevant chunks with similarity scores
438
+ logger.info("Retrieving relevant documents...")
439
+ retrieve_start = time.time()
440
+ context_chunks, similarity_scores = self.bot.retrieve_with_scores(question, k)
441
+ logger.info(f"Retrieved {len(context_chunks)} chunks in {time.time() - retrieve_start:.2f}s")
442
+
443
+ if not context_chunks:
444
+ return (
445
+ "I don't have enough information to answer this question. Please try rephrasing or asking about a different topic.",
446
+ "N/A",
447
+ "No sources found",
448
+ "No matches found",
449
+ question_group
450
+ )
451
+
452
+ # Format similarity scores
453
+ similarity_scores_str = ", ".join([f"{score:.3f}" for score in similarity_scores])
454
+
455
+ # Format sources with chunk text and file paths
456
+ sources_list = []
457
+ for i, (chunk, score) in enumerate(zip(context_chunks, similarity_scores)):
458
+ file_path = self._find_file_path(chunk.filename)
459
+
460
+ source_info = f"""
461
+ {'='*80}
462
+ SOURCE {i+1} | Similarity: {score:.3f}
463
+ {'='*80}
464
+ 📄 File: {chunk.filename}
465
+ 📍 Path: {file_path if file_path else 'File path not found (search in Data Resources directory)'}
466
+ 📊 Chunk: {chunk.chunk_id + 1}/{chunk.total_chunks} (Position: {chunk.start_pos}-{chunk.end_pos})
467
+
468
+ 📝 Full Chunk Text:
469
+ {chunk.text}
470
 
471
  """
472
+ sources_list.append(source_info)
473
+
474
+ sources = "\n".join(sources_list)
475
+
476
+ # Generation kwargs
477
+ gen_kwargs = {
478
+ 'max_new_tokens': min(max_tokens, 512),
479
+ 'temperature': temperature,
480
+ 'top_p': self.bot.args.top_p,
481
+ 'repetition_penalty': self.bot.args.repetition_penalty
482
+ }
483
+
484
+ # Generate answer based on education level
485
+ answer = ""
486
+ flesch_score = 0.0
487
+
488
+ # Generate original answer first
489
+ logger.info("Generating original answer...")
490
+ gen_start = time.time()
491
+ prompt = self.bot.format_prompt(context_chunks, question)
492
+ original_answer = self.bot.generate_answer(prompt, hf_token=hf_token, **gen_kwargs)
493
+ logger.info(f"Original answer generated in {time.time() - gen_start:.1f}s")
494
+
495
+ # Enhance based on education level
496
+ logger.info(f"Enhancing answer for {education_level} level...")
497
+ enhance_start = time.time()
498
+ if education_level == "middle_school":
499
+ answer, flesch_score = self.bot.enhance_readability(original_answer, target_level="middle_school", hf_token=hf_token)
500
+ elif education_level == "high_school":
501
+ answer, flesch_score = self.bot.enhance_readability(original_answer, target_level="high_school", hf_token=hf_token)
502
+ elif education_level == "college":
503
+ answer, flesch_score = self.bot.enhance_readability(original_answer, target_level="college", hf_token=hf_token)
504
+ elif education_level == "doctoral":
505
+ answer, flesch_score = self.bot.enhance_readability(original_answer, target_level="doctoral", hf_token=hf_token)
506
+ else:
507
+ answer = "Invalid education level selected."
508
+ flesch_score = 0.0
509
+
510
+ logger.info(f"Answer enhanced in {time.time() - enhance_start:.1f}s")
511
+ total_time = time.time() - start_time
512
+ logger.info(f"Total processing time: {total_time:.1f}s")
513
+
514
+ # Clean the answer - remove special tokens and formatting
515
+ import re
516
+ cleaned_answer = answer
517
+
518
+ # Remove special tokens (case-insensitive)
519
+ special_tokens = [
520
+ "<|end|>",
521
+ "<|endoftext|>",
522
+ "<|end_of_text|>",
523
+ "<|eot_id|>",
524
+ "<|start_header_id|>",
525
+ "<|end_header_id|>",
526
+ "<|assistant|>",
527
+ "<|endoftext|>",
528
+ "<|end_of_text|>",
529
+ ]
530
+ for token in special_tokens:
531
+ cleaned_answer = re.sub(re.escape(token), '', cleaned_answer, flags=re.IGNORECASE)
532
+
533
+ # Remove any remaining special token patterns
534
+ cleaned_answer = re.sub(r'<\|[^|]+\|>', '', cleaned_answer)
535
+ cleaned_answer = re.sub(r'^\*\*.*?\*\*.*?\n', '', cleaned_answer, flags=re.MULTILINE)
536
+ cleaned_answer = re.sub(r'\n\s*\n\s*\n+', '\n\n', cleaned_answer)
537
+ cleaned_answer = re.sub(r'^\s+|\s+$', '', cleaned_answer, flags=re.MULTILINE)
538
+ cleaned_answer = cleaned_answer.strip()
539
+
540
+ return (
541
+ cleaned_answer,
542
+ f"{flesch_score:.1f}",
543
+ sources,
544
+ similarity_scores_str,
545
+ question_group
546
+ )
547
+
548
+ except Exception as e:
549
+ logger.error(f"Error processing question: {e}", exc_info=True)
550
+ return (
551
+ f"An error occurred while processing your question: {str(e)}",
552
+ "N/A",
553
+ "",
554
+ "",
555
+ "Error"
556
+ )
557
+
558
+
559
+ def create_interface(initial_bot: RAGBot) -> gr.Blocks:
560
+ """Create and configure the Gradio interface with OAuth"""
561
+
562
+ try:
563
+ interface = GradioRAGInterface(initial_bot)
564
+ except Exception as e:
565
+ logger.error(f"Failed to create GradioRAGInterface: {e}")
566
+ with gr.Blocks(title="CGT-LLM-Beta RAG Chatbot") as demo:
567
+ gr.Markdown(f"""
568
+ # ⚠️ Initialization Error
569
+
570
+ Failed to initialize the chatbot interface.
571
+
572
+ **Error:** {str(e)}
573
+
574
+ Please check the logs for more details.
575
+ """)
576
+ return demo
577
+
578
+ # Get initial model name from bot
579
+ initial_model_short = None
580
+ for short_name, full_path in MODEL_MAP.items():
581
+ if full_path == initial_bot.args.model:
582
+ initial_model_short = short_name
583
+ break
584
+ if initial_model_short is None:
585
+ initial_model_short = list(MODEL_MAP.keys())[0]
586
+
587
+ # Create the Gradio interface
588
+ try:
589
+ with gr.Blocks(title="CGT-LLM-Beta RAG Chatbot") as demo:
590
+ with gr.Sidebar():
591
+ gr.LoginButton()
592
+ gr.Markdown("### 🔐 Authentication")
593
+ gr.Markdown("Please log in with your Hugging Face account to use the Inference API.")
594
+
595
+ gr.Markdown("""
596
+ # 🧬 CGT-LLM-Beta: Genetic Counseling RAG Chatbot
597
+
598
+ Ask questions about genetic counseling, cascade genetic testing, hereditary cancer syndromes, and related topics.
599
+
600
+ The chatbot uses a Retrieval-Augmented Generation (RAG) system to provide evidence-based answers from medical literature.
601
+ """)
602
+
603
+ with gr.Row():
604
+ with gr.Column(scale=2):
605
+ question_input = gr.Textbox(
606
+ label="Your Question",
607
+ placeholder="e.g., What is Lynch Syndrome? What screening is recommended for BRCA1 carriers?",
608
+ lines=3
609
+ )
610
+
611
+ with gr.Row():
612
+ model_dropdown = gr.Dropdown(
613
+ choices=list(MODEL_MAP.keys()),
614
+ value=initial_model_short,
615
+ label="Select Model",
616
+ info="Choose which LLM model to use for generating answers"
617
+ )
618
+
619
+ education_dropdown = gr.Dropdown(
620
+ choices=list(EDUCATION_LEVELS.keys()),
621
+ value=list(EDUCATION_LEVELS.keys())[0],
622
+ label="Education Level",
623
+ info="Select your education level for personalized answers"
624
+ )
625
+
626
+ with gr.Accordion("Advanced Settings", open=False):
627
+ k_slider = gr.Slider(
628
+ minimum=1,
629
+ maximum=10,
630
+ value=5,
631
+ step=1,
632
+ label="Number of document chunks to retrieve (k)"
633
+ )
634
+ temperature_slider = gr.Slider(
635
  minimum=0.1,
636
  maximum=1.0,
637
+ value=0.2,
638
+ step=0.1,
639
+ label="Temperature (lower = more focused)"
640
+ )
641
+ max_tokens_slider = gr.Slider(
642
+ minimum=128,
643
+ maximum=1024,
644
+ value=512,
645
+ step=128,
646
+ label="Max Tokens (lower = faster responses)"
647
+ )
648
+
649
+ submit_btn = gr.Button("Ask Question", variant="primary", size="lg")
650
+
651
+ with gr.Column(scale=3):
652
+ answer_output = gr.Textbox(
653
+ label="Answer",
654
+ lines=20,
655
+ interactive=False,
656
+ elem_classes=["answer-box"]
657
+ )
658
+
659
+ with gr.Row():
660
+ flesch_output = gr.Textbox(
661
+ label="Flesch-Kincaid Grade Level",
662
+ value="N/A",
663
+ interactive=False,
664
+ scale=1
665
+ )
666
+
667
+ similarity_output = gr.Textbox(
668
+ label="Similarity Scores",
669
+ value="",
670
+ interactive=False,
671
+ scale=1
672
+ )
673
+
674
+ category_output = gr.Textbox(
675
+ label="Question Category",
676
+ value="",
677
+ interactive=False,
678
+ scale=1
679
+ )
680
+
681
+ sources_output = gr.Textbox(
682
+ label="Source Documents (with Chunk Text)",
683
+ lines=15,
684
+ interactive=False,
685
+ info="Shows the retrieved document chunks with full text. File paths are shown for easy access."
686
+ )
687
+
688
+ # Example questions
689
+ gr.Markdown("### 💡 Example Questions")
690
+ gr.Markdown(f"Select a question below to use it in the chatbot ({len(EXAMPLE_QUESTIONS)} questions - scrollable dropdown):")
691
+
692
+ example_questions_dropdown = gr.Dropdown(
693
+ choices=EXAMPLE_QUESTIONS,
694
+ label="Example Questions",
695
+ value=None,
696
+ info="Open the dropdown and scroll through all questions. Select one to use it.",
697
+ interactive=True,
698
+ container=True,
699
+ scale=1
700
+ )
701
+
702
+ def update_question_from_dropdown(selected_question):
703
+ return selected_question if selected_question else ""
704
+
705
+ example_questions_dropdown.change(
706
+ fn=update_question_from_dropdown,
707
+ inputs=example_questions_dropdown,
708
+ outputs=question_input
709
+ )
710
+
711
+ # Footer
712
+ gr.Markdown("""
713
+ ---
714
+ **Note:** This chatbot provides informational answers based on medical literature.
715
+ It is not a substitute for professional medical advice, diagnosis, or treatment.
716
+ Always consult with qualified healthcare providers for medical decisions.
717
+ """)
718
+
719
+ # Connect the submit button with OAuth token
720
+ def process_with_education_level(question, model, education, k, temp, max_tok, hf_token: gr.OAuthToken):
721
+ education_key = EDUCATION_LEVELS[education]
722
+ token = hf_token.token if hf_token else None
723
+ return interface.process_question(question, model, education_key, k, temp, max_tok, hf_token=token)
724
+
725
+ submit_btn.click(
726
+ fn=process_with_education_level,
727
+ inputs=[
728
+ question_input,
729
+ model_dropdown,
730
+ education_dropdown,
731
+ k_slider,
732
+ temperature_slider,
733
+ max_tokens_slider,
734
+ gr.OAuthToken()
735
+ ],
736
+ outputs=[
737
+ answer_output,
738
+ flesch_output,
739
+ sources_output,
740
+ similarity_output,
741
+ category_output
742
+ ]
743
+ )
744
+
745
+ # Also allow Enter key to submit
746
+ question_input.submit(
747
+ fn=process_with_education_level,
748
+ inputs=[
749
+ question_input,
750
+ model_dropdown,
751
+ education_dropdown,
752
+ k_slider,
753
+ temperature_slider,
754
+ max_tokens_slider,
755
+ gr.OAuthToken()
756
+ ],
757
+ outputs=[
758
+ answer_output,
759
+ flesch_output,
760
+ sources_output,
761
+ similarity_output,
762
+ category_output
763
+ ]
764
+ )
765
+ except Exception as interface_error:
766
+ logger.error(f"Error setting up Gradio interface components: {interface_error}", exc_info=True)
767
+ import traceback
768
+ error_trace = traceback.format_exc()
769
+ with gr.Blocks(title="CGT-LLM-Beta RAG Chatbot") as demo:
770
+ gr.Markdown(f"""
771
+ # ⚠️ Interface Setup Error
772
+
773
+ An error occurred while setting up the interface components.
774
+
775
+ **Error:** {str(interface_error)}
776
+
777
+ **Traceback:**
778
+ ```
779
+ {error_trace[:1000]}...
780
+ ```
781
+
782
+ Please check the logs for more details.
783
+ """)
784
+ return demo
785
+
786
+ logger.info("Gradio interface created successfully")
787
+ return demo
788
+
789
+
790
+ # Check if we're on Spaces
791
+ IS_SPACES = (
792
+ os.getenv("SPACE_ID") is not None or
793
+ os.getenv("SYSTEM") == "spaces" or
794
+ os.getenv("HF_SPACE_ID") is not None
795
  )
796
 
797
+ # Initialize demo variable
798
+ demo = None
799
+
800
+ def _create_demo():
801
+ """Create the demo - separated into function for better error handling"""
802
+ try:
803
+ logger.info("=" * 80)
804
+ logger.info("Starting demo creation...")
805
+ logger.info(f"IS_SPACES: {IS_SPACES}")
806
+ logger.info(f"BOT_AVAILABLE: {BOT_AVAILABLE}")
807
+
808
+ if not BOT_AVAILABLE:
809
+ raise ImportError("bot module is not available - cannot create demo")
810
+
811
+ # Initialize with default args
812
+ parser = argparse.ArgumentParser()
813
+ parser.add_argument('--model', type=str, default='meta-llama/Llama-3.2-3B-Instruct')
814
+ parser.add_argument('--vector-db-dir', default='./chroma_db')
815
+ parser.add_argument('--data-dir', default='./Data Resources')
816
+ parser.add_argument('--max-new-tokens', type=int, default=1024)
817
+ parser.add_argument('--temperature', type=float, default=0.2)
818
+ parser.add_argument('--top-p', type=float, default=0.9)
819
+ parser.add_argument('--repetition-penalty', type=float, default=1.1)
820
+ parser.add_argument('--k', type=int, default=5)
821
+ parser.add_argument('--skip-indexing', action='store_true', default=True)
822
+ parser.add_argument('--verbose', action='store_true', default=False)
823
+ parser.add_argument('--seed', type=int, default=42)
824
+
825
+ args = parser.parse_args([]) # Empty args
826
+ args.skip_model_loading = IS_SPACES # Skip model loading on Spaces, use Inference API
827
+
828
+ logger.info("Creating RAGBot...")
829
+ bot = RAGBot(args)
830
+
831
+ if bot.vector_retriever is None:
832
+ raise Exception("Vector database not available")
833
+
834
+ # Check if vector database has documents
835
+ collection_stats = bot.vector_retriever.get_collection_stats()
836
+ if collection_stats.get('total_chunks', 0) == 0:
837
+ logger.warning("Vector database is empty. The chatbot may not find relevant documents.")
838
+
839
+ logger.info("Creating interface...")
840
+ demo = create_interface(bot)
841
+ logger.info(f"Demo created successfully: {type(demo)}")
842
+ return demo
843
+
844
+ except Exception as bot_error:
845
+ logger.error(f"Error initializing: {bot_error}", exc_info=True)
846
+ import traceback
847
+ error_trace = traceback.format_exc()
848
+ logger.error(f"Full traceback: {error_trace}")
849
+
850
+ with gr.Blocks(title="CGT-LLM-Beta RAG Chatbot") as error_demo:
851
+ gr.Markdown(f"""
852
+ # ⚠️ Initialization Error
853
+
854
+ The chatbot encountered an error during initialization:
855
+
856
+ **Error:** {str(bot_error)}
857
+
858
+ **Possible causes:**
859
+ - Missing vector database (chroma_db directory)
860
+ - Missing dependencies
861
+ - Configuration issues
862
+
863
+ **Error Details:**
864
+ ```
865
+ {error_trace[:1000]}...
866
+ ```
867
+ """)
868
+ logger.info(f"Error demo created: {type(error_demo)}")
869
+ return error_demo
870
+
871
+ # Create demo at module level
872
+ try:
873
+ if IS_SPACES:
874
+ logger.info("Creating demo directly at module level for Spaces...")
875
+ else:
876
+ logger.info("Creating demo for local execution...")
877
+
878
+ demo = _create_demo()
879
+
880
+ if demo is None or not isinstance(demo, (gr.Blocks, gr.Interface)):
881
+ raise ValueError(f"Demo creation returned invalid result: {type(demo)}")
882
+
883
+ logger.info("Demo creation completed successfully")
884
+ except Exception as e:
885
+ logger.error(f"CRITICAL: Error creating demo: {e}", exc_info=True)
886
+ import traceback
887
+ error_trace = traceback.format_exc()
888
+ logger.error(f"Full traceback: {error_trace}")
889
+ with gr.Blocks(title="CGT-LLM-Beta RAG Chatbot") as demo:
890
+ gr.Markdown(f"""
891
+ # Error Initializing Chatbot
892
+
893
+ A critical error occurred while initializing the chatbot.
894
+
895
+ **Error:** {str(e)}
896
+
897
+ **Traceback:**
898
+ ```
899
+ {error_trace[:1500]}...
900
+ ```
901
+
902
+ Please check the logs for more details.
903
+ """)
904
+ logger.info(f"Fallback error demo created: {type(demo)}")
905
 
906
+ # Final verification
907
+ if demo is None:
908
+ logger.error("CRITICAL: Demo variable is None! Creating fallback demo.")
909
+ with gr.Blocks(title="CGT-LLM-Beta RAG Chatbot") as demo:
910
+ gr.Markdown("# Error: Demo was not created properly\n\nPlease check the logs for details.")
911
+ elif not isinstance(demo, (gr.Blocks, gr.Interface)):
912
+ logger.error(f"CRITICAL: Demo is not a valid Gradio object: {type(demo)}")
913
+ with gr.Blocks(title="CGT-LLM-Beta RAG Chatbot") as demo:
914
+ gr.Markdown(f"# Error: Invalid demo type\n\nDemo type: {type(demo)}\n\nPlease check the logs for details.")
915
+ else:
916
+ logger.info(f"✅ Final demo check passed: demo type={type(demo)}")
917
 
918
+ # For local execution only (not on Spaces)
919
  if __name__ == "__main__":
920
+ if not IS_SPACES:
921
+ # For local use, we can launch it
922
  demo.launch()
bot.py ADDED
@@ -0,0 +1,1777 @@
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ RAG Chatbot Implementation for CGT-LLM-Beta with Vector Database
4
+ Production-ready local RAG system with ChromaDB and MPS acceleration for Apple Silicon
5
+ """
6
+
7
+ import argparse
8
+ import csv
9
+ import json
10
+ import logging
11
+ import os
12
+ import re
13
+ import sys
14
+ import time
15
+ import hashlib
16
+ from pathlib import Path
17
+ from typing import List, Tuple, Dict, Any, Optional, Union
18
+ from dataclasses import dataclass
19
+ from collections import defaultdict
20
+
21
+ import textstat
22
+
23
+ import torch
24
+ import numpy as np
25
+ import pandas as pd
26
+ from tqdm import tqdm
27
+
28
+ # Optional imports with graceful fallbacks
29
+ try:
30
+ import chromadb
31
+ from chromadb.config import Settings
32
+ CHROMADB_AVAILABLE = True
33
+ except ImportError:
34
+ CHROMADB_AVAILABLE = False
35
+ print("Warning: chromadb not available. Install with: pip install chromadb")
36
+
37
+ try:
38
+ from sentence_transformers import SentenceTransformer
39
+ SENTENCE_TRANSFORMERS_AVAILABLE = True
40
+ except ImportError:
41
+ SENTENCE_TRANSFORMERS_AVAILABLE = False
42
+ print("Warning: sentence-transformers not available. Install with: pip install sentence-transformers")
43
+
44
+ try:
45
+ import pypdf
46
+ PDF_AVAILABLE = True
47
+ except ImportError:
48
+ PDF_AVAILABLE = False
49
+ print("Warning: pypdf not available. PDF files will be skipped.")
50
+
51
+ try:
52
+ from docx import Document
53
+ DOCX_AVAILABLE = True
54
+ except ImportError:
55
+ DOCX_AVAILABLE = False
56
+ print("Warning: python-docx not available. DOCX files will be skipped.")
57
+
58
+ try:
59
+ from rank_bm25 import BM25Okapi
60
+ BM25_AVAILABLE = True
61
+ except ImportError:
62
+ BM25_AVAILABLE = False
63
+ print("Warning: rank-bm25 not available. BM25 retrieval disabled.")
64
+
65
+ # Configure logging
66
+ logging.basicConfig(
67
+ level=logging.INFO,
68
+ format='%(asctime)s - %(levelname)s - %(message)s',
69
+ handlers=[
70
+ logging.StreamHandler(),
71
+ logging.FileHandler('rag_bot.log')
72
+ ]
73
+ )
74
+ logger = logging.getLogger(__name__)
75
+
76
+
77
+ @dataclass
78
+ class Document:
79
+ """Represents a document with metadata"""
80
+ filename: str
81
+ content: str
82
+ filepath: str
83
+ file_type: str
84
+ chunk_count: int = 0
85
+ file_hash: str = ""
86
+
87
+
88
+ @dataclass
89
+ class Chunk:
90
+ """Represents a text chunk with metadata"""
91
+ text: str
92
+ filename: str
93
+ chunk_id: int
94
+ total_chunks: int
95
+ start_pos: int
96
+ end_pos: int
97
+ metadata: Dict[str, Any]
98
+ chunk_hash: str = ""
99
+
100
+
101
+ class VectorRetriever:
102
+ """ChromaDB-based vector retrieval"""
103
+
104
+ def __init__(self, collection_name: str = "cgt_documents", persist_directory: str = "./chroma_db"):
105
+ if not CHROMADB_AVAILABLE:
106
+ raise ImportError("ChromaDB is required for vector retrieval")
107
+
108
+ self.collection_name = collection_name
109
+ self.persist_directory = persist_directory
110
+
111
+ # Initialize ChromaDB client
112
+ self.client = chromadb.PersistentClient(path=persist_directory)
113
+
114
+ # Get or create collection
115
+ try:
116
+ self.collection = self.client.get_collection(name=collection_name)
117
+ logger.info(f"Loaded existing collection '{collection_name}' with {self.collection.count()} documents")
118
+ except Exception:
119
+ self.collection = self.client.create_collection(
120
+ name=collection_name,
121
+ metadata={"description": "CGT-LLM-Beta document collection"}
122
+ )
123
+ logger.info(f"Created new collection '{collection_name}'")
124
+
125
+ # Initialize embedding model
126
+ if SENTENCE_TRANSFORMERS_AVAILABLE:
127
+ self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
128
+ logger.info("Loaded sentence-transformers embedding model")
129
+ else:
130
+ self.embedding_model = None
131
+ logger.warning("Sentence-transformers not available, using ChromaDB default embeddings")
132
+
133
+ def add_documents(self, chunks: List[Chunk]) -> None:
134
+ """Add document chunks to the vector database"""
135
+ if not chunks:
136
+ return
137
+
138
+ logger.info(f"Adding {len(chunks)} chunks to vector database...")
139
+
140
+ # Prepare data for ChromaDB
141
+ documents = []
142
+ metadatas = []
143
+ ids = []
144
+
145
+ for chunk in chunks:
146
+ chunk_id = f"{chunk.filename}_{chunk.chunk_id}"
147
+ documents.append(chunk.text)
148
+
149
+ metadata = {
150
+ "filename": chunk.filename,
151
+ "chunk_id": chunk.chunk_id,
152
+ "total_chunks": chunk.total_chunks,
153
+ "start_pos": chunk.start_pos,
154
+ "end_pos": chunk.end_pos,
155
+ "chunk_hash": chunk.chunk_hash,
156
+ **chunk.metadata
157
+ }
158
+ metadatas.append(metadata)
159
+ ids.append(chunk_id)
160
+
161
+ # Add to collection
162
+ try:
163
+ self.collection.add(
164
+ documents=documents,
165
+ metadatas=metadatas,
166
+ ids=ids
167
+ )
168
+ logger.info(f"Successfully added {len(chunks)} chunks to vector database")
169
+ except Exception as e:
170
+ logger.error(f"Error adding documents to vector database: {e}")
171
+
172
+ def search(self, query: str, k: int = 5) -> List[Tuple[Chunk, float]]:
173
+ """Search for similar chunks using vector similarity"""
174
+ try:
175
+ # Perform vector search
176
+ results = self.collection.query(
177
+ query_texts=[query],
178
+ n_results=k
179
+ )
180
+
181
+ chunks_with_scores = []
182
+ if results['documents'] and results['documents'][0]:
183
+ for i, (doc, metadata, distance) in enumerate(zip(
184
+ results['documents'][0],
185
+ results['metadatas'][0],
186
+ results['distances'][0]
187
+ )):
188
+ # Convert distance to similarity score (ChromaDB uses cosine distance)
189
+ similarity_score = 1 - distance
190
+
191
+ chunk = Chunk(
192
+ text=doc,
193
+ filename=metadata['filename'],
194
+ chunk_id=metadata['chunk_id'],
195
+ total_chunks=metadata['total_chunks'],
196
+ start_pos=metadata['start_pos'],
197
+ end_pos=metadata['end_pos'],
198
+ metadata={k: v for k, v in metadata.items()
199
+ if k not in ['filename', 'chunk_id', 'total_chunks', 'start_pos', 'end_pos', 'chunk_hash']},
200
+ chunk_hash=metadata.get('chunk_hash', '')
201
+ )
202
+ chunks_with_scores.append((chunk, similarity_score))
203
+
204
+ return chunks_with_scores
205
+
206
+ except Exception as e:
207
+ logger.error(f"Error searching vector database: {e}")
208
+ return []
209
+
210
+ def get_collection_stats(self) -> Dict[str, Any]:
211
+ """Get statistics about the collection"""
212
+ try:
213
+ count = self.collection.count()
214
+ return {
215
+ "total_chunks": count,
216
+ "collection_name": self.collection_name,
217
+ "persist_directory": self.persist_directory
218
+ }
219
+ except Exception as e:
220
+ logger.error(f"Error getting collection stats: {e}")
221
+ return {}
222
+
223
+
224
+ class RAGBot:
225
+ """Main RAG chatbot class with vector database"""
226
+
227
+ def __init__(self, args):
228
+ self.args = args
229
+ self.device = self._setup_device()
230
+ self.model = None
231
+ self.tokenizer = None
232
+ self.vector_retriever = None
233
+
234
+ # Load model (unless skipping for Inference API)
235
+ if not hasattr(args, 'skip_model_loading') or not args.skip_model_loading:
236
+ self._load_model()
237
+
238
+ # Initialize vector retriever
239
+ self._setup_vector_retriever()
240
+
241
+ def _setup_device(self) -> str:
242
+ """Setup device with MPS support for Apple Silicon"""
243
+ if torch.backends.mps.is_available():
244
+ device = "mps"
245
+ logger.info("Using device: mps (Apple Silicon)")
246
+ elif torch.cuda.is_available():
247
+ device = "cuda"
248
+ logger.info("Using device: cuda")
249
+ else:
250
+ device = "cpu"
251
+ logger.info("Using device: cpu")
252
+
253
+ return device
254
+
255
+ def _load_model(self):
256
+ """Load the specified LLM model and tokenizer"""
257
+ try:
258
+ model_name = self.args.model
259
+ logger.info(f"Loading model: {model_name}...")
260
+ from transformers import AutoTokenizer, AutoModelForCausalLM
261
+
262
+ # Get Hugging Face token from environment (for gated models)
263
+ hf_token = os.getenv("HF_TOKEN") or os.getenv("HUGGING_FACE_HUB_TOKEN")
264
+
265
+ # Load tokenizer
266
+ tokenizer_kwargs = {
267
+ "trust_remote_code": True
268
+ }
269
+ if hf_token:
270
+ tokenizer_kwargs["token"] = hf_token
271
+ logger.info("Using HF_TOKEN for authentication")
272
+
273
+ self.tokenizer = AutoTokenizer.from_pretrained(
274
+ model_name,
275
+ **tokenizer_kwargs
276
+ )
277
+
278
+ # Determine appropriate torch dtype based on device and model
279
+ # Use float16 for MPS/CUDA, float32 for CPU
280
+ # Some models work better with bfloat16
281
+ if self.device == "mps":
282
+ torch_dtype = torch.float16
283
+ elif self.device == "cuda":
284
+ torch_dtype = torch.float16
285
+ else:
286
+ torch_dtype = torch.float32
287
+
288
+ # Load model with appropriate settings
289
+ model_kwargs = {
290
+ "torch_dtype": torch_dtype,
291
+ "trust_remote_code": True,
292
+ }
293
+
294
+ # Add token if available (for gated models)
295
+ if hf_token:
296
+ model_kwargs["token"] = hf_token
297
+
298
+ # Use 8-bit quantization on CPU to reduce memory usage
299
+ # This reduces memory by ~50% with minimal quality loss
300
+ if self.device == "cpu":
301
+ try:
302
+ from transformers import BitsAndBytesConfig
303
+ # Use 8-bit quantization for CPU (reduces memory significantly)
304
+ model_kwargs["load_in_8bit"] = False # 8-bit not available on CPU
305
+ # Instead, use float16 even on CPU to save memory
306
+ model_kwargs["torch_dtype"] = torch.float16
307
+ logger.info("Using float16 on CPU to reduce memory usage")
308
+ except ImportError:
309
+ # Fallback: use float16 anyway
310
+ model_kwargs["torch_dtype"] = torch.float16
311
+ logger.info("Using float16 on CPU to reduce memory usage (fallback)")
312
+
313
+ # For MPS, use device_map; for CUDA, let it auto-detect
314
+ if self.device == "mps":
315
+ model_kwargs["device_map"] = self.device
316
+ elif self.device == "cuda":
317
+ model_kwargs["device_map"] = "auto"
318
+ # For CPU, don't specify device_map
319
+
320
+ self.model = AutoModelForCausalLM.from_pretrained(
321
+ model_name,
322
+ **model_kwargs
323
+ )
324
+
325
+ # Move to device if not using device_map
326
+ if self.device == "cpu":
327
+ self.model = self.model.to(self.device)
328
+
329
+ # Set pad token if not already set
330
+ if self.tokenizer.pad_token is None:
331
+ if self.tokenizer.eos_token is not None:
332
+ self.tokenizer.pad_token = self.tokenizer.eos_token
333
+ else:
334
+ # Some models might need a different approach
335
+ self.tokenizer.add_special_tokens({'pad_token': '[PAD]'})
336
+
337
+ logger.info(f"Model {model_name} loaded successfully on {self.device}")
338
+
339
+ except Exception as e:
340
+ logger.error(f"Failed to load model {self.args.model}: {e}")
341
+ logger.error("Make sure the model name is correct and you have access to it on HuggingFace")
342
+ logger.error("For gated models (like Llama), you need to:")
343
+ logger.error(" 1. Request access at: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct")
344
+ logger.error(" 2. Add HF_TOKEN as a secret in your Hugging Face Space settings")
345
+ logger.error(" 3. Get your token from: https://huggingface.co/settings/tokens")
346
+ logger.error("For local use, ensure you're logged in: huggingface-cli login")
347
+ sys.exit(2)
348
+
349
+ def _setup_vector_retriever(self):
350
+ """Setup the vector retriever"""
351
+ try:
352
+ self.vector_retriever = VectorRetriever(
353
+ collection_name="cgt_documents",
354
+ persist_directory=self.args.vector_db_dir
355
+ )
356
+ logger.info("Vector retriever initialized successfully")
357
+ except Exception as e:
358
+ logger.error(f"Failed to setup vector retriever: {e}")
359
+ sys.exit(2)
360
+
361
+ def _calculate_file_hash(self, filepath: str) -> str:
362
+ """Calculate hash of file for change detection"""
363
+ try:
364
+ with open(filepath, 'rb') as f:
365
+ return hashlib.md5(f.read()).hexdigest()
366
+ except:
367
+ return ""
368
+
369
+ def _calculate_chunk_hash(self, text: str) -> str:
370
+ """Calculate hash of chunk text"""
371
+ return hashlib.md5(text.encode('utf-8')).hexdigest()
372
+
373
+ def load_corpus(self, data_dir: str) -> List[Document]:
374
+ """Load all documents from the data directory"""
375
+ logger.info(f"Loading corpus from {data_dir}")
376
+ documents = []
377
+ data_path = Path(data_dir)
378
+
379
+ if not data_path.exists():
380
+ logger.error(f"Data directory {data_dir} does not exist")
381
+ sys.exit(1)
382
+
383
+ # Supported file extensions
384
+ supported_extensions = {'.txt', '.md', '.json', '.csv'}
385
+ if PDF_AVAILABLE:
386
+ supported_extensions.add('.pdf')
387
+ if DOCX_AVAILABLE:
388
+ supported_extensions.add('.docx')
389
+ supported_extensions.add('.doc')
390
+
391
+ # Find all files recursively
392
+ files = []
393
+ for ext in supported_extensions:
394
+ files.extend(data_path.rglob(f"*{ext}"))
395
+
396
+ logger.info(f"Found {len(files)} files to process")
397
+
398
+ # Process files with progress bar
399
+ for file_path in tqdm(files, desc="Loading documents"):
400
+ try:
401
+ content = self._read_file(file_path)
402
+ if content.strip(): # Only add non-empty documents
403
+ file_hash = self._calculate_file_hash(file_path)
404
+ doc = Document(
405
+ filename=file_path.name,
406
+ content=content,
407
+ filepath=str(file_path),
408
+ file_type=file_path.suffix.lower(),
409
+ file_hash=file_hash
410
+ )
411
+ documents.append(doc)
412
+ logger.debug(f"Loaded {file_path.name} ({len(content)} chars)")
413
+ else:
414
+ logger.warning(f"Skipping empty file: {file_path.name}")
415
+
416
+ except Exception as e:
417
+ logger.error(f"Failed to load {file_path.name}: {e}")
418
+ continue
419
+
420
+ logger.info(f"Successfully loaded {len(documents)} documents")
421
+ return documents
422
+
423
+ def _read_file(self, file_path: Path) -> str:
424
+ """Read content from various file types"""
425
+ suffix = file_path.suffix.lower()
426
+
427
+ try:
428
+ if suffix == '.txt':
429
+ return file_path.read_text(encoding='utf-8')
430
+
431
+ elif suffix == '.md':
432
+ return file_path.read_text(encoding='utf-8')
433
+
434
+ elif suffix == '.json':
435
+ with open(file_path, 'r', encoding='utf-8') as f:
436
+ data = json.load(f)
437
+ if isinstance(data, dict):
438
+ return json.dumps(data, indent=2)
439
+ else:
440
+ return str(data)
441
+
442
+ elif suffix == '.csv':
443
+ df = pd.read_csv(file_path)
444
+ return df.to_string()
445
+
446
+ elif suffix == '.pdf' and PDF_AVAILABLE:
447
+ text = ""
448
+ with open(file_path, 'rb') as f:
449
+ pdf_reader = pypdf.PdfReader(f)
450
+ for page in pdf_reader.pages:
451
+ text += page.extract_text() + "\n"
452
+ return text
453
+
454
+ elif suffix in ['.docx', '.doc'] and DOCX_AVAILABLE:
455
+ # Assumes python-docx is imported at module top (e.g. `import docx`); the local
+ # Document dataclass has no .paragraphs attribute, so it cannot be used here
+ doc = docx.Document(file_path)
456
+ text = ""
457
+ for paragraph in doc.paragraphs:
458
+ text += paragraph.text + "\n"
459
+ return text
460
+
461
+ else:
462
+ logger.warning(f"Unsupported file type: {suffix}")
463
+ return ""
464
+
465
+ except Exception as e:
466
+ logger.error(f"Error reading {file_path}: {e}")
467
+ return ""
468
+
469
+ def chunk_documents(self, docs: List[Document], chunk_size: int, overlap: int) -> List[Chunk]:
470
+ """Chunk documents into smaller pieces"""
471
+ logger.info(f"Chunking {len(docs)} documents (size={chunk_size}, overlap={overlap})")
472
+ chunks = []
473
+
474
+ for doc in docs:
475
+ doc_chunks = self._chunk_text(
476
+ doc.content,
477
+ doc.filename,
478
+ chunk_size,
479
+ overlap
480
+ )
481
+ chunks.extend(doc_chunks)
482
+
483
+ # Update document metadata
484
+ doc.chunk_count = len(doc_chunks)
485
+
486
+ logger.info(f"Created {len(chunks)} chunks from {len(docs)} documents")
487
+ return chunks
488
+
489
+ def _chunk_text(self, text: str, filename: str, chunk_size: int, overlap: int) -> List[Chunk]:
490
+ """Split text into overlapping chunks"""
491
+ # Clean text
492
+ text = re.sub(r'\s+', ' ', text.strip())
493
+
494
+ # Simple token-based chunking (approximate)
495
+ words = text.split()
496
+ chunks = []
497
+
498
+ for i in range(0, len(words), chunk_size - overlap):
499
+ chunk_words = words[i:i + chunk_size]
500
+ chunk_text = ' '.join(chunk_words)
501
+
502
+ if chunk_text.strip():
503
+ chunk_hash = self._calculate_chunk_hash(chunk_text)
504
+ chunk = Chunk(
505
+ text=chunk_text,
506
+ filename=filename,
507
+ chunk_id=len(chunks),
508
+ total_chunks=0, # Will be updated later
509
+ start_pos=i,
510
+ end_pos=i + len(chunk_words),
511
+ metadata={
512
+ 'word_count': len(chunk_words),
513
+ 'char_count': len(chunk_text)
514
+ },
515
+ chunk_hash=chunk_hash
516
+ )
517
+ chunks.append(chunk)
518
+
519
+ # Update total_chunks for each chunk
520
+ for chunk in chunks:
521
+ chunk.total_chunks = len(chunks)
522
+
523
+ return chunks
524
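+ # Worked example (illustrative): with the default chunk_size=500 and overlap=200 the
+ # stride is 300 words, so a 1,000-word document yields chunks over words [0:500),
+ # [300:800), [600:1000), and [900:1000), with consecutive chunks sharing ~200 words.
+ # The loop assumes chunk_size > overlap; otherwise range() receives a non-positive
+ # step and raises ValueError.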
+
525
+ def build_or_update_index(self, chunks: List[Chunk], force_rebuild: bool = False) -> None:
526
+ """Build or update the vector index"""
527
+ if not chunks:
528
+ logger.warning("No chunks provided for indexing")
529
+ return
530
+
531
+ # Check if we need to rebuild
532
+ collection_stats = self.vector_retriever.get_collection_stats()
533
+ existing_count = collection_stats.get('total_chunks', 0)
534
+
535
+ if existing_count > 0 and not force_rebuild:
536
+ logger.info(f"Vector database already contains {existing_count} chunks. Use --force-rebuild to rebuild.")
537
+ return
538
+
539
+ if force_rebuild and existing_count > 0:
540
+ logger.info("Force rebuild requested. Clearing existing collection...")
541
+ try:
542
+ # RAGBot has no ChromaDB client of its own; go through the retriever
+ # (assumes VectorRetriever keeps its client on a `client` attribute)
+ self.vector_retriever.client.delete_collection(self.vector_retriever.collection_name)
543
+ self.vector_retriever.collection = self.vector_retriever.client.create_collection(
544
+ name=self.vector_retriever.collection_name,
545
+ metadata={"description": "CGT-LLM-Beta document collection"}
546
+ )
547
+ except Exception as e:
548
+ logger.error(f"Error clearing collection: {e}")
549
+
550
+ # Add chunks to vector database
551
+ self.vector_retriever.add_documents(chunks)
552
+
553
+ logger.info("Vector index built successfully")
554
+
555
+ def retrieve(self, query: str, k: int) -> List[Chunk]:
556
+ """Retrieve relevant chunks for a query using vector search"""
557
+ results = self.vector_retriever.search(query, k)
558
+ chunks = [chunk for chunk, score in results]
559
+
560
+ if self.args.verbose:
561
+ logger.info(f"Retrieved {len(chunks)} chunks for query: {query[:50]}...")
562
+ for i, (chunk, score) in enumerate(results):
563
+ logger.info(f" {i+1}. {chunk.filename} (score: {score:.3f})")
564
+
565
+ return chunks
566
+
567
+ def retrieve_with_scores(self, query: str, k: int) -> Tuple[List[Chunk], List[float]]:
568
+ """Retrieve relevant chunks with similarity scores
569
+
570
+ Returns:
571
+ Tuple of (chunks, scores) where scores are similarity scores for each chunk
572
+ """
573
+ results = self.vector_retriever.search(query, k)
574
+ chunks = [chunk for chunk, score in results]
575
+ scores = [score for chunk, score in results]
576
+
577
+ if self.args.verbose:
578
+ logger.info(f"Retrieved {len(chunks)} chunks for query: {query[:50]}...")
579
+ for i, (chunk, score) in enumerate(results):
580
+ logger.info(f" {i+1}. {chunk.filename} (score: {score:.3f})")
581
+
582
+ return chunks, scores
583
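+ # Minimal usage sketch (illustrative; assumes `bot` is an initialized RAGBot):
+ #   chunks, scores = bot.retrieve_with_scores("What is Lynch Syndrome?", 5)
+ #   for chunk, score in zip(chunks, scores):
+ #       print(f"{chunk.filename}: {score:.3f}")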
+
584
+ def format_prompt(self, context_chunks: List[Chunk], question: str) -> str:
585
+ """Format the prompt with context and question, ensuring it fits within token limits"""
586
+ context_parts = []
587
+ for chunk in context_chunks:
588
+ context_parts.append(f"{chunk.text}")
589
+
590
+ context = "\n".join(context_parts)
591
+
592
+ # Try to use the tokenizer's chat template if available
593
+ if hasattr(self.tokenizer, 'apply_chat_template') and self.tokenizer.chat_template is not None:
594
+ try:
595
+ messages = [
596
+ {"role": "system", "content": "You are a helpful medical assistant. Answer questions based on the provided context. Be specific and informative."},
597
+ {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}
598
+ ]
599
+ base_prompt = self.tokenizer.apply_chat_template(
600
+ messages,
601
+ tokenize=False,
602
+ add_generation_prompt=True
603
+ )
604
+ except Exception as e:
605
+ logger.warning(f"Failed to use chat template, falling back to manual format: {e}")
606
+ base_prompt = self._format_prompt_manual(context, question)
607
+ else:
608
+ # Fall back to manual formatting (for Llama models)
609
+ base_prompt = self._format_prompt_manual(context, question)
610
+
611
+ # Check if prompt is too long and truncate context if needed
612
+ max_context_tokens = 1200 # Leave room for generation
613
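+ # Budget note: this 1200-token cap on the assembled prompt keeps it under the
+ # 1500-token truncation applied in generate_answer(), so context is trimmed here
+ # deliberately rather than cut off silently at generation time.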
+ try:
614
+ tokenized = self.tokenizer(base_prompt, return_tensors="pt")
615
+ current_tokens = tokenized['input_ids'].shape[1]
616
+ except Exception as e:
617
+ logger.warning(f"Tokenization error, using base prompt as-is: {e}")
618
+ return base_prompt
619
+
620
+ if current_tokens > max_context_tokens:
621
+ # Truncate context to fit within limits
622
+ try:
623
+ context_tokens = self.tokenizer(context, return_tensors="pt")['input_ids'].shape[1]
624
+ available_tokens = max_context_tokens - (current_tokens - context_tokens)
625
+
626
+ if available_tokens > 0:
627
+ # Truncate context to fit
628
+ truncated_context = self.tokenizer.decode(
629
+ self.tokenizer(context, return_tensors="pt", truncation=True, max_length=available_tokens)['input_ids'][0],
630
+ skip_special_tokens=True
631
+ )
632
+
633
+ # Reformat with truncated context
634
+ if hasattr(self.tokenizer, 'apply_chat_template') and self.tokenizer.chat_template is not None:
635
+ try:
636
+ messages = [
637
+ {"role": "system", "content": "You are a helpful medical assistant. Answer questions based on the provided context. Be specific and informative."},
638
+ {"role": "user", "content": f"Context: {truncated_context}\n\nQuestion: {question}"}
639
+ ]
640
+ prompt = self.tokenizer.apply_chat_template(
641
+ messages,
642
+ tokenize=False,
643
+ add_generation_prompt=True
644
+ )
645
+ except:
646
+ prompt = self._format_prompt_manual(truncated_context, question)
647
+ else:
648
+ prompt = self._format_prompt_manual(truncated_context, question)
649
+ else:
650
+ # If even basic prompt is too long, use minimal format
651
+ prompt = self._format_prompt_manual(context[:500] + "...", question)
652
+ except Exception as e:
653
+ logger.warning(f"Error truncating context: {e}, using base prompt")
654
+ prompt = base_prompt
655
+ else:
656
+ prompt = base_prompt
657
+
658
+ return prompt
659
+
660
+ def _format_prompt_manual(self, context: str, question: str) -> str:
661
+ """Manual prompt formatting for models without chat templates (e.g., Llama)"""
662
+ return f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
663
+
664
+ You are a helpful medical assistant. Answer questions based on the provided context. Be specific and informative.<|eot_id|><|start_header_id|>user<|end_header_id|>
665
+
666
+ Context: {context}
667
+
668
+ Question: {question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
669
+
670
+ """
671
+
672
+ def format_improved_prompt(self, context_chunks: List[Chunk], question: str) -> Tuple[str, str]:
673
+ """Format an improved prompt with better tone, structure, and medical appropriateness
674
+
675
+ Returns:
676
+ Tuple of (prompt, prompt_text) where prompt_text is the system prompt instructions
677
+ """
678
+ context_parts = []
679
+ for chunk in context_chunks:
680
+ context_parts.append(f"{chunk.text}")
681
+
682
+ context = "\n".join(context_parts)
683
+
684
+ # Improved prompt with all the feedback incorporated
685
+ improved_prompt_text = """Provide a concise, neutral, and informative answer based on the provided medical context.
686
+
687
+ CRITICAL GUIDELINES:
688
+ - Format your response as clear, well-structured sentences and paragraphs
689
+ - Be concise and direct - focus on answering the specific question asked
690
+ - Use neutral, factual language - do NOT tell the questioner how to feel (avoid phrases like 'don't worry', 'the good news is', etc.)
691
+ - Do NOT use leading or coercive language - present information neutrally to preserve patient autonomy
692
+ - Do NOT make specific medical recommendations - instead state that management decisions should be made with a healthcare provider
693
+ - Use third-person voice only - never claim to be a medical professional or assistant
694
+ - Use consistent terminology: use 'children' (not 'offspring') consistently
695
+ - Do NOT include hypothetical examples with specific names (e.g., avoid 'Aunt Jenna' or similar)
696
+ - Include important distinctions when relevant (e.g., somatic vs. germline variants, reproductive risks)
697
+ - When citing sources, be consistent - always specify which guidelines or sources when mentioned
698
+ - Remove any formatting markers like asterisks (*) or bold markers
699
+ - Do NOT include phrases like 'Here's a rewritten version' - just provide the answer directly
700
+
701
+ If the question asks about medical management, screening, or interventions, conclude with: 'Management recommendations are individualized and should be discussed with a healthcare provider or genetic counselor.'"""
702
+
703
+ # Try to use the tokenizer's chat template if available
704
+ if hasattr(self.tokenizer, 'apply_chat_template') and self.tokenizer.chat_template is not None:
705
+ try:
706
+ messages = [
707
+ {"role": "system", "content": improved_prompt_text},
708
+ {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}
709
+ ]
710
+ base_prompt = self.tokenizer.apply_chat_template(
711
+ messages,
712
+ tokenize=False,
713
+ add_generation_prompt=True
714
+ )
715
+ except Exception as e:
716
+ logger.warning(f"Failed to use chat template for improved prompt, falling back to manual format: {e}")
717
+ base_prompt = self._format_improved_prompt_manual(context, question, improved_prompt_text)
718
+ else:
719
+ # Fall back to manual formatting (for Llama models)
720
+ base_prompt = self._format_improved_prompt_manual(context, question, improved_prompt_text)
721
+
722
+ # Check if prompt is too long and truncate context if needed
723
+ max_context_tokens = 1200 # Leave room for generation
724
+ try:
725
+ tokenized = self.tokenizer(base_prompt, return_tensors="pt")
726
+ current_tokens = tokenized['input_ids'].shape[1]
727
+ except Exception as e:
728
+ logger.warning(f"Tokenization error for improved prompt, using base prompt as-is: {e}")
729
+ return base_prompt, improved_prompt_text
730
+
731
+ if current_tokens > max_context_tokens:
732
+ # Truncate context to fit within limits
733
+ try:
734
+ context_tokens = self.tokenizer(context, return_tensors="pt")['input_ids'].shape[1]
735
+ available_tokens = max_context_tokens - (current_tokens - context_tokens)
736
+
737
+ if available_tokens > 0:
738
+ # Truncate context to fit
739
+ truncated_context = self.tokenizer.decode(
740
+ self.tokenizer(context, return_tensors="pt", truncation=True, max_length=available_tokens)['input_ids'][0],
741
+ skip_special_tokens=True
742
+ )
743
+
744
+ # Reformat with truncated context
745
+ if hasattr(self.tokenizer, 'apply_chat_template') and self.tokenizer.chat_template is not None:
746
+ try:
747
+ messages = [
748
+ {"role": "system", "content": improved_prompt_text},
749
+ {"role": "user", "content": f"Context: {truncated_context}\n\nQuestion: {question}"}
750
+ ]
751
+ prompt = self.tokenizer.apply_chat_template(
752
+ messages,
753
+ tokenize=False,
754
+ add_generation_prompt=True
755
+ )
756
+ except:
757
+ prompt = self._format_improved_prompt_manual(truncated_context, question, improved_prompt_text)
758
+ else:
759
+ prompt = self._format_improved_prompt_manual(truncated_context, question, improved_prompt_text)
760
+ else:
761
+ # If even basic prompt is too long, use minimal format
762
+ prompt = self._format_improved_prompt_manual(context[:500] + "...", question, improved_prompt_text)
763
+ except Exception as e:
764
+ logger.warning(f"Error truncating context for improved prompt: {e}, using base prompt")
765
+ prompt = base_prompt
766
+ else:
767
+ prompt = base_prompt
768
+
769
+ return prompt, improved_prompt_text
770
+
771
+ def _format_improved_prompt_manual(self, context: str, question: str, improved_prompt_text: str) -> str:
772
+ """Manual prompt formatting for improved prompts (for models without chat templates)"""
773
+ return f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
774
+
775
+ {improved_prompt_text}<|eot_id|><|start_header_id|>user<|end_header_id|>
776
+
777
+ Context: {context}
778
+
779
+ Question: {question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
780
+
781
+ """
782
+
783
+ def generate_answer(self, prompt: str, **gen_kwargs) -> str:
784
+ """Generate answer using the language model"""
785
+ try:
786
+ if self.args.verbose:
787
+ logger.info(f"Full prompt (first 500 chars): {prompt[:500]}...")
788
+
789
+ # Tokenize input with more conservative limit to leave room for generation
790
+ inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1500)
791
+ inputs = {k: v.to(self.device) for k, v in inputs.items()}
792
+
793
+ if self.args.verbose:
794
+ logger.info(f"Input tokens: {inputs['input_ids'].shape}")
795
+
796
+ # Generate
797
+ with torch.no_grad():
798
+ outputs = self.model.generate(
799
+ **inputs,
800
+ max_new_tokens=gen_kwargs.get('max_new_tokens', 512),
801
+ temperature=gen_kwargs.get('temperature', 0.7),
802
+ top_p=gen_kwargs.get('top_p', 0.95),
803
+ repetition_penalty=gen_kwargs.get('repetition_penalty', 1.05),
804
+ do_sample=True,
805
+ pad_token_id=self.tokenizer.eos_token_id,
806
+ eos_token_id=self.tokenizer.eos_token_id,
807
+ use_cache=True,
808
+ num_beams=1
809
+ )
810
+
811
+ # Decode response without skipping special tokens to preserve full length
812
+ response = self.tokenizer.decode(outputs[0], skip_special_tokens=False)
813
+
814
+ if self.args.verbose:
815
+ logger.info(f"Full response (first 1000 chars): {response[:1000]}...")
816
+ logger.info(f"Looking for 'Answer:' in response: {'Answer:' in response}")
817
+ if "Answer:" in response:
818
+ answer_part = response.split("Answer:")[-1]
819
+ logger.info(f"Answer part (first 200 chars): {answer_part[:200]}...")
820
+
821
+ # Debug: Show the full response to understand the structure
822
+ logger.info(f"Full response length: {len(response)}")
823
+ logger.info(f"Prompt length: {len(prompt)}")
824
+ logger.info(f"Response after prompt (first 500 chars): {response[len(prompt):][:500]}...")
825
+
826
+ # Extract the answer more robustly by looking for the end of the prompt
827
+ # Find the actual end of the prompt in the response
828
+ prompt_end_marker = "<|start_header_id|>assistant<|end_header_id|>\n\n"
829
+ if prompt_end_marker in response:
830
+ answer = response.split(prompt_end_marker)[-1].strip()
831
+ else:
832
+ # Fallback to character-based extraction
833
+ answer = response[len(prompt):].strip()
834
+
835
+ if self.args.verbose:
836
+ logger.info(f"Full LLM output (first 200 chars): {answer[:200]}...")
837
+ logger.info(f"Full LLM output length: {len(answer)} characters")
838
+ logger.info(f"Full LLM output (last 200 chars): ...{answer[-200:]}")
839
+
840
+ # Only do minimal cleanup to preserve the full response
841
+ # Remove special tokens that might interfere with display, but preserve content
842
+ if "<|start_header_id|>" in answer:
843
+ # Only remove if it's at the very end
844
+ if answer.endswith("<|start_header_id|>"):
845
+ answer = answer[:-len("<|start_header_id|>")].strip()
846
+ if "<|eot_id|>" in answer:
847
+ # Only remove if it's at the very end
848
+ if answer.endswith("<|eot_id|>"):
849
+ answer = answer[:-len("<|eot_id|>")].strip()
850
+ if "<|end_of_text|>" in answer:
851
+ # Only remove if it's at the very end
852
+ if answer.endswith("<|end_of_text|>"):
853
+ answer = answer[:-len("<|end_of_text|>")].strip()
854
+
855
+ # Final validation - only reject if completely empty
856
+ if not answer or len(answer) < 3:
857
+ answer = "I don't know."
858
+
859
+ if self.args.verbose:
860
+ logger.info(f"Final answer: '{answer}'")
861
+
862
+ return answer
863
+
864
+ except Exception as e:
865
+ logger.error(f"Generation error: {e}")
866
+ return "I encountered an error while generating the answer."
867
+
868
+ def process_questions(self, questions_path: str, **kwargs) -> List[Tuple[str, str, str, str, float, str, float, str, float, str, str]]:
869
+ """Process all questions and generate answers with multiple readability levels
870
+
871
+ Returns:
872
+ List of tuples: (question, answer, sources, question_group, original_flesch,
873
+ middle_school_answer, middle_school_flesch,
874
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores)
875
+ """
876
+ logger.info(f"Processing questions from {questions_path}")
877
+
878
+ # Load questions
879
+ try:
880
+ with open(questions_path, 'r', encoding='utf-8') as f:
881
+ questions = [line.strip() for line in f if line.strip()]
882
+ except Exception as e:
883
+ logger.error(f"Failed to load questions: {e}")
884
+ sys.exit(1)
885
+
886
+ logger.info(f"Found {len(questions)} questions to process")
887
+
888
+ qa_pairs = []
889
+
890
+ # Get the improved prompt text for CSV header by calling format_improved_prompt with empty chunks
891
+ # This will give us the prompt text without actually generating
892
+ _, improved_prompt_text = self.format_improved_prompt([], "")
893
+
894
+ # Initialize CSV file with headers
895
+ self.write_csv([], kwargs.get('output_file', 'results.csv'), append=False, improved_prompt_text=improved_prompt_text)
896
+
897
+ # Process each question
898
+ for i, question in enumerate(tqdm(questions, desc="Processing questions")):
899
+ logger.info(f"Question {i+1}/{len(questions)}: {question[:50]}...")
900
+
901
+ try:
902
+ # Categorize question
903
+ question_group = self._categorize_question(question)
904
+
905
+ # Retrieve relevant chunks with similarity scores
906
+ context_chunks, similarity_scores = self.retrieve_with_scores(question, self.args.k)
907
+
908
+ # Format similarity scores as a string (comma-separated, 3 decimal places)
909
+ similarity_scores_str = ", ".join([f"{score:.3f}" for score in similarity_scores]) if similarity_scores else "0.000"
910
+
911
+ if not context_chunks:
912
+ answer = "I don't know."
913
+ sources = "No sources found"
914
+ middle_school_answer = "I don't know."
915
+ high_school_answer = "I don't know."
916
+ improved_answer = "I don't know."
917
+ original_flesch = 0.0
918
+ middle_school_flesch = 0.0
919
+ high_school_flesch = 0.0
920
+ similarity_scores_str = "0.000"
921
+ else:
922
+ # Format original prompt
923
+ prompt = self.format_prompt(context_chunks, question)
924
+
925
+ # Generate original answer
926
+ start_time = time.time()
927
+ answer = self.generate_answer(prompt, **kwargs)
928
+ gen_time = time.time() - start_time
929
+
930
+ # Generate improved answer
931
+ improved_prompt, _ = self.format_improved_prompt(context_chunks, question)
932
+ improved_start = time.time()
933
+ improved_answer = self.generate_answer(improved_prompt, **kwargs)
934
+ improved_time = time.time() - improved_start
935
+
936
+ # Clean up improved answer - remove unwanted phrases and formatting
937
+ improved_answer = self._clean_improved_answer(improved_answer)
938
+ logger.info(f"Improved answer generated in {improved_time:.2f}s")
939
+
940
+ # Extract source documents
941
+ sources = self._extract_sources(context_chunks)
942
+
943
+ # Calculate original answer Flesch score
944
+ try:
945
+ original_flesch = textstat.flesch_kincaid_grade(answer)
946
+ except:
947
+ original_flesch = 0.0
948
+
949
+ # Generate middle school version
950
+ readability_start = time.time()
951
+ middle_school_answer, middle_school_flesch = self.enhance_readability(answer, "middle_school")
952
+ readability_time = time.time() - readability_start
953
+ logger.info(f"Middle school readability in {readability_time:.2f}s")
954
+
955
+ # Generate high school version
956
+ readability_start = time.time()
957
+ high_school_answer, high_school_flesch = self.enhance_readability(answer, "high_school")
958
+ readability_time = time.time() - readability_start
959
+ logger.info(f"High school readability in {readability_time:.2f}s")
960
+
961
+ logger.info(f"Generated answer in {gen_time:.2f}s")
962
+ logger.info(f"Sources: {sources}")
963
+ logger.info(f"Similarity scores: {similarity_scores_str}")
964
+ logger.info(f"Original Flesch: {original_flesch:.1f}, Middle School: {middle_school_flesch:.1f}, High School: {high_school_flesch:.1f}")
965
+
966
+ qa_pairs.append((question, answer, sources, question_group, original_flesch,
967
+ middle_school_answer, middle_school_flesch,
968
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores_str))
969
+
970
+ # Write incrementally to CSV after each question
971
+ self.write_csv([(question, answer, sources, question_group, original_flesch,
972
+ middle_school_answer, middle_school_flesch,
973
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores_str)],
974
+ kwargs.get('output_file', 'results.csv'), append=True, improved_prompt_text=improved_prompt_text)
975
+ logger.info(f"Progress saved: {i+1}/{len(questions)} questions completed")
976
+
977
+ except Exception as e:
978
+ logger.error(f"Error processing question {i+1}: {e}")
979
+ error_answer = "I encountered an error processing this question."
980
+ sources = "Error retrieving sources"
981
+ question_group = self._categorize_question(question)
982
+ original_flesch = 0.0
983
+ middle_school_answer = "I encountered an error processing this question."
984
+ high_school_answer = "I encountered an error processing this question."
985
+ improved_answer = "I encountered an error processing this question."
986
+ middle_school_flesch = 0.0
987
+ high_school_flesch = 0.0
988
+ similarity_scores_str = "0.000"
989
+ qa_pairs.append((question, error_answer, sources, question_group, original_flesch,
990
+ middle_school_answer, middle_school_flesch,
991
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores_str))
992
+
993
+ # Still write the error to CSV
994
+ self.write_csv([(question, error_answer, sources, question_group, original_flesch,
995
+ middle_school_answer, middle_school_flesch,
996
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores_str)],
997
+ kwargs.get('output_file', 'results.csv'), append=True, improved_prompt_text=improved_prompt_text)
998
+ logger.info(f"Error saved: {i+1}/{len(questions)} questions completed")
999
+
1000
+ return qa_pairs
1001
+
1002
+ def _clean_readability_answer(self, answer: str, target_level: str) -> str:
1003
+ """Clean up readability-enhanced answers to remove unwanted phrases and formatting
1004
+
1005
+ Args:
1006
+ answer: The readability-enhanced answer
1007
+ target_level: Either "middle_school" or "high_school"
1008
+ """
1009
+ cleaned = answer
1010
+
1011
+ # Remove the "Here's a rewritten version" phrases
1012
+ if target_level == "middle_school":
1013
+ unwanted_phrases = [
1014
+ "Here's a rewritten version of the text at a middle school reading level:",
1015
+ "Here's a rewritten version of the text at a middle school reading level",
1016
+ "Here is a rewritten version of the text at a middle school reading level:",
1017
+ "Here is a rewritten version of the text at a middle school reading level",
1018
+ "Here's a rewritten version at a middle school reading level:",
1019
+ "Here's a rewritten version at a middle school reading level",
1020
+ ]
1021
+ elif target_level == "high_school":
1022
+ unwanted_phrases = [
1023
+ "Here's a rewritten version of the text at a high school reading level",
1024
+ "Here's a rewritten version of the text at a high school reading level:",
1025
+ "Here is a rewritten version of the text at a high school reading level",
1026
+ "Here is a rewritten version of the text at a high school reading level:",
1027
+ "Here's a rewritten version at a high school reading level",
1028
+ "Here's a rewritten version at a high school reading level:",
1029
+ ]
1030
+ else:
1031
+ unwanted_phrases = []
1032
+
1033
+ for phrase in unwanted_phrases:
1034
+ if phrase.lower() in cleaned.lower():
1035
+ # Find and remove the phrase (case-insensitive)
1036
+ pattern = re.compile(re.escape(phrase), re.IGNORECASE)
1037
+ cleaned = pattern.sub("", cleaned).strip()
1038
+ # Remove leading colons, semicolons, or dashes
1039
+ cleaned = re.sub(r'^[:;\-]\s*', '', cleaned).strip()
1040
+
1041
+ # Remove asterisks (but preserve bullet points if they use •)
1042
+ cleaned = re.sub(r'\*\*', '', cleaned) # Remove bold markers
1043
+ cleaned = re.sub(r'\(\*\)', '', cleaned) # Remove (*)
1044
+ cleaned = re.sub(r'\*', '', cleaned) # Remove remaining asterisks
1045
+
1046
+ # Clean up extra whitespace
1047
+ cleaned = ' '.join(cleaned.split())
1048
+
1049
+ return cleaned
1050
+
1051
+ def _clean_improved_answer(self, answer: str) -> str:
1052
+ """Clean up improved answer to remove unwanted phrases and formatting"""
1053
+ # Remove phrases like "Here's a rewritten version" or similar
1054
+ unwanted_phrases = [
1055
+ "Here's a rewritten version",
1056
+ "Here's a version",
1057
+ "Here is a rewritten version",
1058
+ "Here is a version",
1059
+ "Here's the answer",
1060
+ "Here is the answer"
1061
+ ]
1062
+
1063
+ cleaned = answer
1064
+ for phrase in unwanted_phrases:
1065
+ if phrase.lower() in cleaned.lower():
1066
+ # Find and remove the phrase and any following colon/semicolon
1067
+ pattern = re.compile(re.escape(phrase), re.IGNORECASE)
1068
+ cleaned = pattern.sub("", cleaned).strip()
1069
+ # Remove leading colons, semicolons, or dashes
1070
+ cleaned = re.sub(r'^[:;\-]\s*', '', cleaned).strip()
1071
+
1072
+ # Remove formatting markers like (*) or ** but preserve bullet points
1073
+ cleaned = re.sub(r'\*\*', '', cleaned) # Remove bold markers
1074
+ cleaned = re.sub(r'\(\*\)', '', cleaned) # Remove (*)
1075
+ # Note: single asterisks are left alone here in case they carry meaning;
1076
+ # the improved prompt already instructs the model to avoid formatting markers
1077
+
1078
+ # Remove "Don't worry" and similar emotional management phrases
1079
+ emotional_phrases = [
1080
+ r"don't worry[^.]*\.\s*",
1081
+ r"Don't worry[^.]*\.\s*",
1082
+ r"the good news is[^.]*\.\s*",
1083
+ r"The good news is[^.]*\.\s*",
1084
+ ]
1085
+ for pattern in emotional_phrases:
1086
+ cleaned = re.sub(pattern, '', cleaned, flags=re.IGNORECASE)
1087
+
1088
+ # Clean up extra whitespace
1089
+ cleaned = ' '.join(cleaned.split())
1090
+
1091
+ return cleaned
1092
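+ # Example (illustrative): an output such as
+ #   "Here's a rewritten version: **Don't worry, the variant is manageable.** It is..."
+ # comes back with the preamble, bold markers, and the "Don't worry" clause removed,
+ # leaving only the factual remainder.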
+
1093
+ def diagnose_system(self, sample_questions: List[str] = None) -> Dict[str, Any]:
1094
+ """Diagnose the document loading, chunking, and retrieval system
1095
+
1096
+ Args:
1097
+ sample_questions: Optional list of questions to test retrieval
1098
+
1099
+ Returns:
1100
+ Dictionary with diagnostic information
1101
+ """
1102
+ diagnostics = {
1103
+ 'vector_db_stats': {},
1104
+ 'document_stats': {},
1105
+ 'chunk_stats': {},
1106
+ 'retrieval_tests': []
1107
+ }
1108
+
1109
+ # Check vector database
1110
+ try:
1111
+ stats = self.vector_retriever.get_collection_stats()
1112
+ diagnostics['vector_db_stats'] = {
1113
+ 'total_chunks': stats.get('total_chunks', 0),
1114
+ 'collection_name': stats.get('collection_name', 'unknown'),
1115
+ 'status': 'OK' if stats.get('total_chunks', 0) > 0 else 'EMPTY'
1116
+ }
1117
+ except Exception as e:
1118
+ diagnostics['vector_db_stats'] = {
1119
+ 'status': 'ERROR',
1120
+ 'error': str(e)
1121
+ }
1122
+
1123
+ # Test document loading (without actually loading)
1124
+ try:
1125
+ data_path = Path(self.args.data_dir)
1126
+ if data_path.exists():
1127
+ supported_extensions = {'.txt', '.md', '.json', '.csv'}
1128
+ if PDF_AVAILABLE:
1129
+ supported_extensions.add('.pdf')
1130
+ if DOCX_AVAILABLE:
1131
+ supported_extensions.add('.docx')
1132
+ supported_extensions.add('.doc')
1133
+
1134
+ files = []
1135
+ for ext in supported_extensions:
1136
+ files.extend(data_path.rglob(f"*{ext}"))
1137
+
1138
+ # Sample a few files to check content
1139
+ sample_files = files[:5] if len(files) > 5 else files
1140
+ file_samples = []
1141
+ for file_path in sample_files:
1142
+ try:
1143
+ content = self._read_file(file_path)
1144
+ file_samples.append({
1145
+ 'filename': file_path.name,
1146
+ 'size_chars': len(content),
1147
+ 'size_words': len(content.split()),
1148
+ 'readable': True
1149
+ })
1150
+ except Exception as e:
1151
+ file_samples.append({
1152
+ 'filename': file_path.name,
1153
+ 'readable': False,
1154
+ 'error': str(e)
1155
+ })
1156
+
1157
+ diagnostics['document_stats'] = {
1158
+ 'total_files_found': len(files),
1159
+ 'sample_files': file_samples,
1160
+ 'status': 'OK'
1161
+ }
1162
+ else:
1163
+ diagnostics['document_stats'] = {
1164
+ 'status': 'ERROR',
1165
+ 'error': f'Data directory {self.args.data_dir} does not exist'
1166
+ }
1167
+ except Exception as e:
1168
+ diagnostics['document_stats'] = {
1169
+ 'status': 'ERROR',
1170
+ 'error': str(e)
1171
+ }
1172
+
1173
+ # Test chunking on a sample document
1174
+ try:
1175
+ if diagnostics['document_stats'].get('status') == 'OK':
1176
+ sample_file = None
1177
+ for file_info in diagnostics['document_stats'].get('sample_files', []):
1178
+ if file_info.get('readable', False):
1179
+ # Find the actual file
1180
+ data_path = Path(self.args.data_dir)
1181
+ for ext in ['.txt', '.md', '.pdf', '.docx']:
1182
+ files = list(data_path.rglob(f"*{file_info['filename']}"))
1183
+ if files:
1184
+ sample_file = files[0]
1185
+ break
1186
+ if sample_file:
1187
+ break
1188
+
1189
+ if sample_file:
1190
+ content = self._read_file(sample_file)
1191
+ # Create a dummy document (Document is already imported at top)
1192
+ sample_doc = Document(
1193
+ filename=sample_file.name,
1194
+ content=content,
1195
+ filepath=str(sample_file),
1196
+ file_type=sample_file.suffix.lower(),
1197
+ file_hash=""
1198
+ )
1199
+
1200
+ # Test chunking
1201
+ sample_chunks = self._chunk_text(
1202
+ content,
1203
+ sample_file.name,
1204
+ self.args.chunk_size,
1205
+ self.args.chunk_overlap
1206
+ )
1207
+
1208
+ chunk_lengths = [len(chunk.text.split()) for chunk in sample_chunks]
1209
+
1210
+ diagnostics['chunk_stats'] = {
1211
+ 'sample_document': sample_file.name,
1212
+ 'total_chunks': len(sample_chunks),
1213
+ 'avg_chunk_size_words': sum(chunk_lengths) / len(chunk_lengths) if chunk_lengths else 0,
1214
+ 'min_chunk_size_words': min(chunk_lengths) if chunk_lengths else 0,
1215
+ 'max_chunk_size_words': max(chunk_lengths) if chunk_lengths else 0,
1216
+ 'chunk_size_setting': self.args.chunk_size,
1217
+ 'chunk_overlap_setting': self.args.chunk_overlap,
1218
+ 'status': 'OK'
1219
+ }
1220
+ except Exception as e:
1221
+ diagnostics['chunk_stats'] = {
1222
+ 'status': 'ERROR',
1223
+ 'error': str(e)
1224
+ }
1225
+
1226
+ # Test retrieval with sample questions
1227
+ if sample_questions and diagnostics['vector_db_stats'].get('status') == 'OK':
1228
+ for question in sample_questions:
1229
+ try:
1230
+ context_chunks = self.retrieve(question, self.args.k)
1231
+ sources = self._extract_sources(context_chunks)
1232
+
1233
+ # Get similarity scores
1234
+ results = self.vector_retriever.search(question, self.args.k)
1235
+
1236
+ # Get sample chunk text (first 200 chars of first chunk)
1237
+ sample_chunk_text = context_chunks[0].text[:200] + "..." if context_chunks else "N/A"
1238
+
1239
+ diagnostics['retrieval_tests'].append({
1240
+ 'question': question,
1241
+ 'chunks_retrieved': len(context_chunks),
1242
+ 'sources': sources,
1243
+ 'similarity_scores': [f"{score:.3f}" for _, score in results],
1244
+ 'sample_chunk_preview': sample_chunk_text,
1245
+ 'status': 'OK' if context_chunks else 'NO_RESULTS'
1246
+ })
1247
+ except Exception as e:
1248
+ diagnostics['retrieval_tests'].append({
1249
+ 'question': question,
1250
+ 'status': 'ERROR',
1251
+ 'error': str(e)
1252
+ })
1253
+
1254
+ return diagnostics
1255
+
1256
+ def print_diagnostics(self, diagnostics: Dict[str, Any]) -> None:
1257
+ """Print diagnostic information in a readable format"""
1258
+ print("\n" + "="*80)
1259
+ print("SYSTEM DIAGNOSTICS")
1260
+ print("="*80)
1261
+
1262
+ # Vector DB Stats
1263
+ print("\n📊 VECTOR DATABASE:")
1264
+ vdb = diagnostics.get('vector_db_stats', {})
1265
+ print(f" Status: {vdb.get('status', 'UNKNOWN')}")
1266
+ print(f" Total chunks: {vdb.get('total_chunks', 0)}")
1267
+ print(f" Collection: {vdb.get('collection_name', 'unknown')}")
1268
+ if 'error' in vdb:
1269
+ print(f" Error: {vdb['error']}")
1270
+
1271
+ # Document Stats
1272
+ print("\n📄 DOCUMENT LOADING:")
1273
+ doc_stats = diagnostics.get('document_stats', {})
1274
+ print(f" Status: {doc_stats.get('status', 'UNKNOWN')}")
1275
+ print(f" Total files found: {doc_stats.get('total_files_found', 0)}")
1276
+ if 'sample_files' in doc_stats:
1277
+ print(f" Sample files:")
1278
+ for file_info in doc_stats['sample_files']:
1279
+ if file_info.get('readable', False):
1280
+ print(f" ✓ {file_info['filename']}: {file_info.get('size_chars', 0):,} chars, {file_info.get('size_words', 0):,} words")
1281
+ else:
1282
+ print(f" ✗ {file_info['filename']}: {file_info.get('error', 'unreadable')}")
1283
+ if 'error' in doc_stats:
1284
+ print(f" Error: {doc_stats['error']}")
1285
+
1286
+ # Chunk Stats
1287
+ print("\n✂️ CHUNKING:")
1288
+ chunk_stats = diagnostics.get('chunk_stats', {})
1289
+ print(f" Status: {chunk_stats.get('status', 'UNKNOWN')}")
1290
+ if chunk_stats.get('status') == 'OK':
1291
+ print(f" Sample document: {chunk_stats.get('sample_document', 'N/A')}")
1292
+ print(f" Total chunks from sample: {chunk_stats.get('total_chunks', 0)}")
1293
+ print(f" Average chunk size: {chunk_stats.get('avg_chunk_size_words', 0):.1f} words")
1294
+ print(f" Chunk size range: {chunk_stats.get('min_chunk_size_words', 0)} - {chunk_stats.get('max_chunk_size_words', 0)} words")
1295
+ print(f" Settings: size={chunk_stats.get('chunk_size_setting', 0)}, overlap={chunk_stats.get('chunk_overlap_setting', 0)}")
1296
+ if 'error' in chunk_stats:
1297
+ print(f" Error: {chunk_stats['error']}")
1298
+
1299
+ # Retrieval Tests
1300
+ if diagnostics.get('retrieval_tests'):
1301
+ print("\n🔍 RETRIEVAL TESTS:")
1302
+ for test in diagnostics['retrieval_tests']:
1303
+ print(f"\n Question: {test.get('question', 'N/A')}")
1304
+ print(f" Status: {test.get('status', 'UNKNOWN')}")
1305
+ if test.get('status') == 'OK':
1306
+ print(f" Chunks retrieved: {test.get('chunks_retrieved', 0)}")
1307
+ print(f" Sources: {test.get('sources', 'N/A')}")
1308
+ scores = test.get('similarity_scores', [])
1309
+ if scores:
1310
+ print(f" Similarity scores: {', '.join(scores)}")
1311
+ # Warn if scores are low
1312
+ try:
1313
+ score_values = [float(s) for s in scores]
1314
+ if max(score_values) < 0.3:
1315
+ print(f" ⚠️ WARNING: Low similarity scores - retrieved chunks may not be very relevant")
1316
+ elif max(score_values) < 0.5:
1317
+ print(f" ⚠️ NOTE: Moderate similarity - consider increasing --k or checking chunk quality")
1318
+ except:
1319
+ pass
1320
+ if 'sample_chunk_preview' in test:
1321
+ print(f" Sample chunk preview: {test['sample_chunk_preview']}")
1322
+ elif 'error' in test:
1323
+ print(f" Error: {test['error']}")
1324
+
1325
+ print("\n" + "="*80 + "\n")
1326
+
1327
+ def _extract_sources(self, context_chunks: List[Chunk]) -> str:
1328
+ """Extract source document names from context chunks"""
1329
+ sources = []
1330
+ for chunk in context_chunks:
1331
+ # Debug: Print chunk filename if verbose
1332
+ if self.args.verbose:
1333
+ logger.info(f"Chunk filename: {chunk.filename}")
1334
+
1335
+ # Extract filename from chunk attribute (not metadata)
1336
+ source = chunk.filename if hasattr(chunk, 'filename') and chunk.filename else 'Unknown source'
1337
+ # Clean up the source name
1338
+ if source.endswith('.pdf'):
1339
+ source = source[:-4] # Remove .pdf extension
1340
+ elif source.endswith('.txt'):
1341
+ source = source[:-4] # Remove .txt extension
1342
+ elif source.endswith('.md'):
1343
+ source = source[:-3] # Remove .md extension
1344
+
1345
+ sources.append(source)
1346
+
1347
+ # Remove duplicates while preserving order
1348
+ unique_sources = []
1349
+ for source in sources:
1350
+ if source not in unique_sources:
1351
+ unique_sources.append(source)
1352
+
1353
+ return "; ".join(unique_sources)
1354
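+ # Example (illustrative, hypothetical filenames): chunks drawn from "NCCN_guidelines.pdf",
+ # "brca_overview.txt", and "NCCN_guidelines.pdf" again yield "NCCN_guidelines; brca_overview".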
+
1355
+ def _categorize_question(self, question: str) -> str:
1356
+ """Categorize a question into one of 5 categories"""
1357
+ question_lower = question.lower()
1358
+
1359
+ # Gene-Specific Recommendations
1360
+ if any(gene in question_lower for gene in ['msh2', 'mlh1', 'msh6', 'pms2', 'epcam', 'brca1', 'brca2']):
1361
+ if any(kw in question_lower for kw in ['screening', 'surveillance', 'prevention', 'recommendation', 'risk', 'cancer risk', 'steps', 'management']):
1362
+ return "Gene-Specific Recommendations"
1363
+
1364
+ # Inheritance Patterns
1365
+ if any(kw in question_lower for kw in ['inherit', 'inherited', 'pass', 'skip a generation', 'generation', 'can i pass']):
1366
+ return "Inheritance Patterns"
1367
+
1368
+ # Family Risk Assessment
1369
+ if any(kw in question_lower for kw in ['family member', 'relative', 'first-degree', 'family risk', 'which relative', 'should my family']):
1370
+ return "Family Risk Assessment"
1371
+
1372
+ # Genetic Variant Interpretation
1373
+ if any(kw in question_lower for kw in ['what does', 'genetic variant mean', 'variant mean', 'mutation mean', 'genetic result']):
1374
+ return "Genetic Variant Interpretation"
1375
+
1376
+ # Support and Resources
1377
+ if any(kw in question_lower for kw in ['cope', 'overwhelmed', 'resource', 'genetic counselor', 'support', 'research', 'help', 'insurance', 'gina']):
1378
+ return "Support and Resources"
1379
+
1380
+ # Default to Genetic Variant Interpretation if unclear
1381
+ return "Genetic Variant Interpretation"
1382
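+ # Example (illustrative): "What screening is recommended for MSH2 carriers?" matches a
+ # gene name plus the 'screening' keyword, so it is labeled "Gene-Specific Recommendations";
+ # "Can I pass this on to my children?" contains 'pass' and falls into "Inheritance Patterns".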
+
1383
+ def enhance_readability(self, answer: str, target_level: str = "middle_school") -> Tuple[str, float]:
1384
+ """Enhance answer readability to different levels and calculate Flesch-Kincaid Grade Level
1385
+
1386
+ Args:
1387
+ answer: The original answer to simplify or enhance
1388
+ target_level: One of "middle_school", "high_school", "college", or "doctoral"
1389
+
1390
+ Returns:
1391
+ Tuple of (enhanced_answer, grade_level)
1392
+ """
1393
+ try:
1394
+ # Define prompts for different reading levels
1395
+ if target_level == "middle_school":
1396
+ level_description = "middle school reading level (ages 12-14, 6th-8th grade)"
1397
+ instructions = """
1398
+ - Use simpler medical terms or explain them
1399
+ - Medium-length sentences
1400
+ - Clear, structured explanations
1401
+ - Keep important medical information accessible"""
1402
+ elif target_level == "high_school":
1403
+ level_description = "high school reading level (ages 15-18, 9th-12th grade)"
1404
+ instructions = """
1405
+ - Use appropriate medical terminology with context
1406
+ - Varied sentence length
1407
+ - Comprehensive yet accessible explanations
1408
+ - Maintain technical accuracy while ensuring clarity"""
1409
+ elif target_level == "college":
1410
+ level_description = "college reading level (undergraduate level, ages 18-22)"
1411
+ instructions = """
1412
+ - Use standard medical terminology with brief explanations
1413
+ - Professional and clear writing style
1414
+ - Include relevant clinical context
1415
+ - Maintain scientific accuracy and precision
1416
+ - Appropriate for undergraduate students in health sciences"""
1417
+ elif target_level == "doctoral":
1418
+ level_description = "doctoral/professional reading level (graduate level, medical professionals)"
1419
+ instructions = """
1420
+ - Use advanced medical and scientific terminology
1421
+ - Include detailed clinical and research context
1422
+ - Reference specific mechanisms, pathways, and evidence
1423
+ - Provide comprehensive technical explanations
1424
+ - Appropriate for medical professionals, researchers, and graduate students
1425
+ - Include nuanced discussions of clinical implications and research findings"""
1426
+ else:
1427
+ raise ValueError(f"Unknown target_level: {target_level}. Must be one of: middle_school, high_school, college, doctoral")
1428
+
1429
+ # Create a prompt to enhance the medical answer for the target level
1430
+ # Try to use chat template if available, otherwise use manual format
1431
+ system_message = f"""You are a helpful medical assistant who specializes in explaining complex medical information at appropriate reading levels. Rewrite the following medical answer for {level_description}:
1432
+ {instructions}
1433
+ - Keep the same important information but adapt the complexity
1434
+ - Provide context for technical terms
1435
+ - Ensure the answer is informative yet understandable"""
1436
+
1437
+ user_message = f"Please rewrite this medical answer for {level_description}:\n\n{answer}"
1438
+
1439
+ # Try to use chat template if available
1440
+ if hasattr(self.tokenizer, 'apply_chat_template') and self.tokenizer.chat_template is not None:
1441
+ try:
1442
+ messages = [
1443
+ {"role": "system", "content": system_message},
1444
+ {"role": "user", "content": user_message}
1445
+ ]
1446
+ readability_prompt = self.tokenizer.apply_chat_template(
1447
+ messages,
1448
+ tokenize=False,
1449
+ add_generation_prompt=True
1450
+ )
1451
+ except Exception as e:
1452
+ logger.warning(f"Failed to use chat template for readability, falling back to manual format: {e}")
1453
+ readability_prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
1454
+
1455
+ {system_message}
1456
+
1457
+ <|eot_id|><|start_header_id|>user<|end_header_id|>
1458
+
1459
+ {user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
1460
+
1461
+ """
1462
+ else:
1463
+ # Fall back to manual formatting (for Llama models)
1464
+ readability_prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
1465
+
1466
+ {system_message}
1467
+
1468
+ <|eot_id|><|start_header_id|>user<|end_header_id|>
1469
+
1470
+ {user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
1471
+
1472
+ """
1473
+
1474
+ # Generate simplified answer
1475
+ inputs = self.tokenizer(readability_prompt, return_tensors="pt", truncation=True, max_length=2048)
1476
+ if self.device == "mps":
1477
+ inputs = {k: v.to(self.device) for k, v in inputs.items()}
1478
+
1479
+ # Adjust generation parameters based on target level
1480
+ if target_level in ["college", "doctoral"]:
1481
+ max_tokens = 512 # Reduced from 1024 for faster responses
1482
+ temp = 0.4 # Slightly higher temperature for more natural flow
1483
+ else:
1484
+ max_tokens = 384 # Reduced from 512 for faster responses
1485
+ temp = 0.3 # Lower temperature for more consistent simplification
1486
+
1487
+ with torch.no_grad():
1488
+ outputs = self.model.generate(
1489
+ **inputs,
1490
+ max_new_tokens=max_tokens,
1491
+ temperature=temp,
1492
+ top_p=0.9,
1493
+ repetition_penalty=1.05,
1494
+ do_sample=True,
1495
+ pad_token_id=self.tokenizer.eos_token_id,
1496
+ eos_token_id=self.tokenizer.eos_token_id,
1497
+ use_cache=True,
1498
+ num_beams=1
1499
+ )
1500
+
1501
+ # Decode response
1502
+ response = self.tokenizer.decode(outputs[0], skip_special_tokens=False)
1503
+
1504
+ # Extract enhanced answer
1505
+ # Try to find the assistant response marker
1506
+ prompt_end_marker = "<|start_header_id|>assistant<|end_header_id|>\n\n"
1507
+ if prompt_end_marker in response:
1508
+ simplified_answer = response.split(prompt_end_marker)[-1].strip()
1509
+ elif "<|assistant|>" in response:
1510
+ # Some chat templates use <|assistant|>
1511
+ simplified_answer = response.split("<|assistant|>")[-1].strip()
1512
+ else:
1513
+ # Fallback: extract everything after the prompt
1514
+ simplified_answer = response[len(readability_prompt):].strip()
1515
+
1516
+ # Clean up special tokens
1517
+ if "<|eot_id|>" in simplified_answer:
1518
+ if simplified_answer.endswith("<|eot_id|>"):
1519
+ simplified_answer = simplified_answer[:-len("<|eot_id|>")].strip()
1520
+ if "<|end_of_text|>" in simplified_answer:
1521
+ if simplified_answer.endswith("<|end_of_text|>"):
1522
+ simplified_answer = simplified_answer[:-len("<|end_of_text|>")].strip()
1523
+
1524
+ # Clean up unwanted phrases and formatting
1525
+ simplified_answer = self._clean_readability_answer(simplified_answer, target_level)
1526
+
1527
+ # Calculate Flesch-Kincaid Grade Level
1528
+ try:
1529
+ grade_level = textstat.flesch_kincaid_grade(simplified_answer)
1530
+ except:
1531
+ grade_level = 0.0
1532
+
1533
+ if self.args.verbose:
1534
+ logger.info(f"Simplified answer length: {len(simplified_answer)} characters")
1535
+ logger.info(f"Flesch-Kincaid Grade Level: {grade_level:.1f}")
1536
+
1537
+ return simplified_answer, grade_level
1538
+
1539
+ except Exception as e:
1540
+ logger.error(f"Error enhancing readability: {e}")
1541
+ # Fallback: return original answer with estimated grade level
1542
+ try:
1543
+ grade_level = textstat.flesch_kincaid_grade(answer)
1544
+ except:
1545
+ grade_level = 12.0 # Default to high school level
1546
+ return answer, grade_level
1547
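+ # Minimal usage sketch (illustrative; assumes `bot` is a loaded RAGBot and `answer` is a string):
+ #   college_answer, grade = bot.enhance_readability(answer, target_level="college")
+ #   logger.info(f"College-level rewrite scored at grade {grade:.1f}")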
+
1548
+ def write_csv(self, qa_pairs: List[Tuple[str, str, str, str, float, str, float, str, float, str, str]], output_path: str, append: bool = False, improved_prompt_text: str = "") -> None:
1549
+ """Write Q&A pairs to CSV file in results folder
1550
+
1551
+ Expected tuple format: (question, answer, sources, question_group, original_flesch,
1552
+ middle_school_answer, middle_school_flesch,
1553
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores)
1554
+ """
1555
+ # Ensure results directory exists
1556
+ os.makedirs('results', exist_ok=True)
1557
+
1558
+ # If output_path doesn't already have results/ prefix, add it
1559
+ if not output_path.startswith('results/'):
1560
+ output_path = f'results/{output_path}'
1561
+
1562
+ if append:
1563
+ logger.info(f"Appending results to {output_path}")
1564
+ else:
1565
+ logger.info(f"Writing results to {output_path}")
1566
+
1567
+ # Create output directory if needed
1568
+ output_path = Path(output_path)
1569
+ output_path.parent.mkdir(parents=True, exist_ok=True)
1570
+
1571
+ try:
1572
+ # Check if file exists and if we're appending
1573
+ file_exists = output_path.exists()
1574
+ write_mode = 'a' if append and file_exists else 'w'
1575
+
1576
+ with open(output_path, write_mode, newline='', encoding='utf-8') as f:
1577
+ writer = csv.writer(f)
1578
+
1579
+ # Write header only if creating new file or first append
1580
+ if not append or not file_exists:
1581
+ # Create improved answer header with prompt text
1582
+ improved_header = f'improved_answer (PROMPT: {improved_prompt_text})'
1583
+ writer.writerow(['question', 'question_group', 'answer', 'original_flesch', 'sources',
1584
+ 'similarity_scores', 'middle_school_answer', 'middle_school_flesch',
1585
+ 'high_school_answer', 'high_school_flesch', improved_header])
1586
+
1587
+ for data in qa_pairs:
1588
+ # Unpack the data tuple
1589
+ (question, answer, sources, question_group, original_flesch,
1590
+ middle_school_answer, middle_school_flesch,
1591
+ high_school_answer, high_school_flesch, improved_answer, similarity_scores) = data
1592
+
1593
+ # Clean and escape the answers for CSV
1594
+ def clean_text(text):
1595
+ # Replace newlines with spaces and clean up formatting
1596
+ cleaned = text.replace('\n', ' ').replace('\r', ' ')
1597
+ # Remove extra whitespace but preserve the full content
1598
+ cleaned = ' '.join(cleaned.split())
1599
+ # Quote escaping is left to csv.writer, which already quotes fields as needed;
1600
+ # doubling quotes here as well would double-escape them in the output
1601
+ return cleaned
1602
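+ # Example (illustrative): 'Line one\nLine   two' becomes 'Line one Line two';
+ # newlines and runs of whitespace collapse to single spaces, and csv.writer
+ # handles any quoting when the row is written out.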
+
1603
+ clean_question = clean_text(question)
1604
+ clean_answer = clean_text(answer)
1605
+ clean_sources = clean_text(sources)
1606
+ clean_middle_school = clean_text(middle_school_answer)
1607
+ clean_high_school = clean_text(high_school_answer)
1608
+ clean_improved = clean_text(improved_answer)
1609
+
1610
+ # Log the full answer length for debugging
1611
+ if self.args.verbose:
1612
+ logger.info(f"Writing answer length: {len(clean_answer)} characters")
1613
+ logger.info(f"Middle school answer length: {len(clean_middle_school)} characters")
1614
+ logger.info(f"High school answer length: {len(clean_high_school)} characters")
1615
+ logger.info(f"Improved answer length: {len(clean_improved)} characters")
1616
+ logger.info(f"Question group: {question_group}")
1617
+
1618
+ # Use proper CSV quoting - let csv.writer handle the quoting
1619
+ writer.writerow([
1620
+ clean_question,
1621
+ question_group,
1622
+ clean_answer,
1623
+ f"{original_flesch:.1f}",
1624
+ clean_sources,
1625
+ similarity_scores, # Similarity scores (comma-separated)
1626
+ clean_middle_school,
1627
+ f"{middle_school_flesch:.1f}",
1628
+ clean_high_school,
1629
+ f"{high_school_flesch:.1f}",
1630
+ clean_improved
1631
+ ])
1632
+
1633
+ if append:
1634
+ logger.info(f"Appended {len(qa_pairs)} Q&A pairs to {output_path}")
1635
+ else:
1636
+ logger.info(f"Successfully wrote {len(qa_pairs)} Q&A pairs to {output_path}")
1637
+
1638
+ except Exception as e:
1639
+ logger.error(f"Failed to write CSV: {e}")
1640
+ sys.exit(4)
1641
+
1642
+
1643
+ def parse_args():
+     """Parse command line arguments"""
+     parser = argparse.ArgumentParser(description="RAG Chatbot for CGT-LLM-Beta with Vector Database")
+
+     # File paths
+     parser.add_argument('--data-dir', default='./Data Resources',
+                         help='Directory containing documents to index')
+     parser.add_argument('--questions', default='./questions.txt',
+                         help='File containing questions (one per line)')
+     parser.add_argument('--out', default='./answers.csv',
+                         help='Output CSV file for answers')
+     parser.add_argument('--vector-db-dir', default='./chroma_db',
+                         help='Directory for ChromaDB persistence')
+
+     # Retrieval parameters
+     parser.add_argument('--k', type=int, default=5,
+                         help='Number of chunks to retrieve per question')
+
+     # Chunking parameters
+     parser.add_argument('--chunk-size', type=int, default=500,
+                         help='Size of text chunks in tokens')
+     parser.add_argument('--chunk-overlap', type=int, default=200,
+                         help='Overlap between chunks in tokens')
+
+     # Model selection
+     parser.add_argument('--model', type=str, default='meta-llama/Llama-3.2-3B-Instruct',
+                         help='HuggingFace model name to use (e.g., meta-llama/Llama-3.2-3B-Instruct, mistralai/Mistral-7B-Instruct-v0.2)')
+
+     # Generation parameters
+     parser.add_argument('--max-new-tokens', type=int, default=1024,
+                         help='Maximum new tokens to generate')
+     parser.add_argument('--temperature', type=float, default=0.2,
+                         help='Generation temperature')
+     parser.add_argument('--top-p', type=float, default=0.9,
+                         help='Top-p sampling parameter')
+     parser.add_argument('--repetition-penalty', type=float, default=1.1,
+                         help='Repetition penalty')
+
+     # Database options
+     parser.add_argument('--force-rebuild', action='store_true',
+                         help='Force rebuild of vector database')
+     parser.add_argument('--skip-indexing', action='store_true',
+                         help='Skip document indexing, use existing database')
+
+     # Other options
+     parser.add_argument('--seed', type=int, default=42,
+                         help='Random seed for reproducibility')
+     parser.add_argument('--verbose', action='store_true',
+                         help='Enable verbose logging')
+     parser.add_argument('--dry-run', action='store_true',
+                         help='Build index and test retrieval without generation')
+     parser.add_argument('--diagnose', action='store_true',
+                         help='Run system diagnostics and exit')
+
+     return parser.parse_args()
+
+
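For reference, a minimal invocation sketch using the flags defined above. The file name `bot.py` is assumed from the commit description; the same arguments can be passed directly on the command line instead of via `subprocess`.

```python
# Sketch only: assumes the script above is saved as bot.py.
import subprocess
import sys

subprocess.run([
    sys.executable, "bot.py",
    "--data-dir", "./Data Resources",   # documents to index
    "--questions", "./questions.txt",   # one question per line
    "--out", "./answers.csv",           # CSV produced by the writer shown earlier
    "--model", "meta-llama/Llama-3.2-3B-Instruct",
    "--k", "5",                         # chunks retrieved per question
    "--temperature", "0.2",
    "--verbose",
], check=True)
```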
+ def main():
+     """Main function"""
+     args = parse_args()
+
+     # Set random seed
+     torch.manual_seed(args.seed)
+     np.random.seed(args.seed)
+
+     # Set logging level
+     if args.verbose:
+         logging.getLogger().setLevel(logging.DEBUG)
+
+     logger.info("Starting RAG Chatbot with Vector Database")
+     logger.info(f"Arguments: {vars(args)}")
+
+     try:
+         # Initialize bot
+         bot = RAGBot(args)
+
+         # Check if we should skip indexing
+         if not args.skip_indexing:
+             # Load and process documents
+             documents = bot.load_corpus(args.data_dir)
+             if not documents:
+                 logger.error("No documents found to process")
+                 sys.exit(3)
+
+             # Chunk documents
+             chunks = bot.chunk_documents(documents, args.chunk_size, args.chunk_overlap)
+             if not chunks:
+                 logger.error("No chunks created from documents")
+                 sys.exit(3)
+
+             # Build or update index
+             bot.build_or_update_index(chunks, args.force_rebuild)
+         else:
+             logger.info("Skipping document indexing, using existing vector database")
+
+         # Run diagnostics if requested
+         if args.diagnose:
+             sample_questions = [
+                 "What is Lynch Syndrome?",
+                 "What does a BRCA1 genetic variant mean?",
+                 "What screening tests are recommended for MSH2 carriers?"
+             ]
+             diagnostics = bot.diagnose_system(sample_questions=sample_questions)
+             bot.print_diagnostics(diagnostics)
+             return
+
+         if args.dry_run:
+             logger.info("Dry run completed successfully")
+             return
+
+         # Process questions
+         generation_kwargs = {
+             'max_new_tokens': args.max_new_tokens,
+             'temperature': args.temperature,
+             'top_p': args.top_p,
+             'repetition_penalty': args.repetition_penalty
+         }
+
+         qa_pairs = bot.process_questions(args.questions, output_file=args.out, **generation_kwargs)
+
+         logger.info("RAG Chatbot completed successfully")
+
+     except KeyboardInterrupt:
+         logger.info("Interrupted by user")
+         sys.exit(0)
+     except Exception as e:
+         logger.error(f"Unexpected error: {e}")
+         if args.verbose:
+             import traceback
+             traceback.print_exc()
+         sys.exit(1)
+
+
+ if __name__ == "__main__":
+     main()
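The readability columns written to the CSV are Flesch-Kincaid grade levels, for which textstat is pinned in requirements.txt. A minimal scoring sketch follows; the actual helper used in bot.py is not shown in this hunk, and the sample text is illustrative only.

```python
# Sketch only: textstat is listed in requirements.txt for readability scoring.
import textstat

answer = ("BRCA1 and BRCA2 are genes that normally help repair DNA. "
          "A harmful variant in either gene raises the risk of breast and ovarian cancer.")

grade = textstat.flesch_kincaid_grade(answer)   # U.S. school grade level
print(f"Flesch-Kincaid grade: {grade:.1f}")
```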
chroma_db/7eddb202-b9b0-46c1-ae4b-37838cdc5aac/data_level0.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:80fe29380be0f587de8c3d0df3bbd891219ebe35d3ab4e007721d322ca704b9f
+ size 18888520
chroma_db/7eddb202-b9b0-46c1-ae4b-37838cdc5aac/header.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:56091853c1c20a1ec97ba4a7935cb7ab95f58b91d1ca56b990bf768f7bd2df88
+ size 100
chroma_db/7eddb202-b9b0-46c1-ae4b-37838cdc5aac/index_metadata.pickle ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:754f12ddf66368443039e44c7d3625dbfa54c42604f231054e5c8ab8df162ebb
+ size 548379
chroma_db/7eddb202-b9b0-46c1-ae4b-37838cdc5aac/length.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e72c9f5fb80c8fa3f488f68172cf32cdaf226d94cb6cff09ff68990b34fbb04c
+ size 45080
chroma_db/7eddb202-b9b0-46c1-ae4b-37838cdc5aac/link_lists.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a0046b8333ff42649a27896a5da1f0fd89ee54954221fde9172dfe284d94262b
+ size 99820
chroma_db/chroma.sqlite3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:70340ab0d0dddb6b5bcf29c0e09f316b0f695f6645be0231302346d5af463700
+ size 294584320
requirements.txt ADDED
@@ -0,0 +1,56 @@
+ # =============================================================================
+ # RAG Chatbot with Vector Database - Requirements
+ # =============================================================================
+ # Production-ready dependencies for medical document analysis and Q&A
+
+ # Core ML/AI Framework
+ torch>=2.0.0                   # PyTorch for model inference
+ transformers>=4.30.0           # Hugging Face transformers
+ huggingface_hub>=0.23.0        # Hugging Face Hub API (for Inference API)
+ accelerate>=0.20.0             # Model loading optimization
+ safetensors>=0.3.0             # Safe model loading
+
+ # Vector Database & Embeddings
+ chromadb>=0.4.0                # Vector database for fast retrieval
+ sentence-transformers>=2.2.0   # Semantic embeddings (all-MiniLM-L6-v2)
+
+ # Data Processing
+ pandas>=1.3.0                  # Data manipulation and CSV handling
+ numpy>=1.21.0                  # Numerical computing
+ scikit-learn>=1.0.0            # ML utilities and TF-IDF
+
+ # Text Analysis & Readability
+ textstat>=0.7.0                # Flesch-Kincaid Grade Level calculation
+ nltk>=3.8.0                    # Natural language processing utilities
+
+ # Document Processing (Core)
+ pypdf>=3.0.0                   # PDF document parsing
+ python-docx>=0.8.11            # DOCX document parsing
+
+ # Optional Document Processing
+ rank-bm25>=0.2.2               # BM25 retrieval algorithm (alternative to TF-IDF)
+
+ # Utilities & Progress
+ tqdm>=4.65.0                   # Progress bars
+ pathlib2>=2.3.0                # Enhanced path handling (if needed)
+
+ # Web Interface
+ gradio>=4.44.1                 # Gradio web interface for chatbot (updated for Spaces compatibility)
+
+ # Development & Testing (Optional)
+ pytest>=7.0.0                  # Testing framework
+ black>=22.0.0                  # Code formatting
+ flake8>=4.0.0                  # Code linting
+
+ # Performance Monitoring (Optional)
+ psutil>=5.8.0                  # System resource monitoring
+ memory-profiler>=0.60.0        # Memory usage profiling
+
+ # =============================================================================
+ # Installation Notes:
+ # =============================================================================
+ # 1. Install with: pip install -r requirements.txt
+ # 2. For Apple Silicon: PyTorch will automatically use MPS acceleration
+ # 3. Optional packages can be installed separately if needed
+ # 4. Model files (~6GB) will be downloaded on first run
+ # =============================================================================