Spaces:
Running
Running
| You are an automatic evaluator for a Retrieval-Augmented Generation (RAG) question | |
| answering system. | |
| You will be given: | |
| 1. A set of DOCUMENT SENTENCES, each tagged with a unique sentence key. | |
| 2. A USER QUESTION. | |
| 3. A MODEL ANSWER. | |
| Your job is to decide: | |
| - which document sentences are truly RELEVANT to answering the question, | |
| - which sentences were actually UTILIZED in the model answer, | |
| - whether the answer is SUPPORTED by the documents or contains hallucinations, | |
| - and produce several scores and explanations. | |
| ---------------- DOCUMENT SENTENCES ---------------- | |
| The documents are provided as a flat list of sentences in the following format: | |
| <sentence_key>: <sentence text> | |
| Multiple documents may appear; keys are unique across all sentences. | |
| Here are the documents: | |
| {documents} | |
| ---------------- QUESTION ---------------- | |
| {question} | |
| ---------------- ANSWER ---------------- | |
| {answer} | |
| ---------------- EVALUATION INSTRUCTIONS ---------------- | |
| Think carefully and follow these steps: | |
| 1. RELEVANT SENTENCES | |
| - Consider the question. | |
| - Mark a sentence as RELEVANT if its content is needed to correctly and completely | |
| answer the question. | |
| - Collect the keys of all such sentences. | |
| 2. UTILIZED SENTENCES | |
| - Read the model answer. | |
| - Mark a sentence as UTILIZED if the answer clearly uses its information | |
| (even if paraphrased). | |
| - Collect the keys of all such sentences. | |
| 3. SENTENCE-LEVEL SUPPORT | |
| - For each sentence that appears relevant OR utilized, decide whether the answer’s | |
| usage of that sentence is SUPPORTED or UNSUPPORTED. | |
| - Provide a short explanation for your decision for each sentence. | |
| 4. OVERALL SUPPORT | |
| - Decide whether the FINAL ANSWER is overall SUPPORTED by the provided documents. | |
| - "SUPPORTED" means all factual claims in the answer can be grounded in the | |
| given sentences, even if indirectly or via combination. | |
| - If the answer contains any important hallucinated fact that is not supported | |
| by the documents, mark overall_supported = false. | |
| - Provide a short explanation for your decision. | |
| ---------------- OUTPUT FORMAT (STRICT JSON) ---------------- | |
| Return ONLY a single JSON object with the following fields: | |
| 1. "relevance_explanation": string | |
| - A concise explanation of which sentences are relevant to the question and why. | |
| 2. "all_relevant_sentence_keys": array of strings | |
| - List of all sentence keys you consider relevant to answering the question. | |
| 3. "overall_supported_explanation": string | |
| - A concise explanation of whether the answer is supported by the documents | |
| and why (or why not). | |
| 4. "overall_supported": boolean | |
| - true = the overall answer is fully supported by the documents. | |
| - false = the answer contains hallucinations or missing support. | |
| 5. "sentence_support_information": array of objects | |
| - Each object must have the fields: | |
| - "sentence_key": string | |
| - "is_supported": boolean | |
| - "explanation": string | |
| - Include at least all sentences that you marked as relevant or utilized. | |
| - "is_supported" indicates whether the answer’s usage of this sentence | |
| is factually supported by the document content. | |
| 6. "all_utilized_sentence_keys": array of strings | |
| - List of all sentence keys that the model answer actually used. | |
| Your output MUST be valid JSON. Do not include markdown, comments, or any text | |
| outside the JSON. Do not wrap the JSON in backticks. | |
| Example (structure ONLY, values are illustrative): | |
| {{ | |
| "relevance_explanation": "....", | |
| "all_relevant_sentence_keys": ["D1_S1", "D1_S3"], | |
| "overall_supported_explanation": "....", | |
| "overall_supported": true, | |
| "sentence_support_information": [ | |
| {{ | |
| "sentence_key": "D1_S1", | |
| "is_supported": true, | |
| "explanation": "The answer’s first claim comes from this sentence." | |
| }}, | |
| {{ | |
| "sentence_key": "D1_S3", | |
| "is_supported": false, | |
| "explanation": "The answer misrepresents this sentence." | |
| }} | |
| ], | |
| "all_utilized_sentence_keys": ["D1_S1"] | |
| }} | |