import generationhelper
import json


def evaluate_response_with_prompt(template, query, documents, answer, eval_model="llama-3.3-70b-specdec"):
    # Split each document into sentences and key them as '0a.', '0b.', '1a.', ...
    # (document index followed by a letter).
    formatted_documents = ""
    for doc_idx, doc_text in enumerate(documents["document"]):
        if isinstance(doc_text, list):
            doc_text = " ".join(doc_text)  # Convert a list of fragments into a single string
        sentences = doc_text.split('. ')
        formatted_documents += "\n".join(
            f"{doc_idx}{chr(97 + i)}. {sent}" for i, sent in enumerate(sentences)
        ) + "\n"

    # Key each response sentence with a bare letter ('a.', 'b.', 'c.', ...);
    # these keys are independent of the document keys.
    formatted_answer = "\n".join(
        f"{chr(97 + i)}. {sent}" for i, sent in enumerate(answer.split('. '))
    )

    prompt = template.format(documents=formatted_documents, question=query, answer=formatted_answer)

    # Call the evaluator LLM (Llama 3.3 70B via the Groq API)
    completion = generationhelper.groq_client.chat.completions.create(
        model=eval_model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=2048,
        top_p=1,
    )
    print("\nGenerated Response:\n", completion)
    return completion
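
# A quick sanity check of the keying scheme above (hypothetical inputs, purely
# to illustrate the format the evaluator prompt expects):
#
#   documents = {"document": ["Paris is the capital of France. It is on the Seine."]}
#   answer = "Paris is the capital of France. It lies on the Seine."
#
# would be formatted as:
#
#   0a. Paris is the capital of France
#   0b. It is on the Seine.
#
#   a. Paris is the capital of France
#   b. It lies on the Seine.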


def FormatAndScores(query, documents, answer, eval_model):
    template = get_template_to_calculate_scores()
    completion_results = evaluate_response_with_prompt(template, query, documents, answer, eval_model)
    print(completion_results)

    completion_results_response = completion_results.choices[0].message.content
    # Strip whitespace and any stray backtick fence the model may have added,
    # including a leading 'json' tag left over from a ```json fence
    completion_results_response = completion_results_response.strip().strip('`')
    completion_results_response = completion_results_response.removeprefix('json').strip()
    print(completion_results_response)

    # Check if the response content is empty
    if not completion_results_response.strip():
        raise ValueError("Empty response content")

    # Decode if it's a byte string
    if isinstance(completion_results_response, bytes):
        completion_results_response = completion_results_response.decode('utf-8')

    # Try to parse the JSON; re-raise on failure so we never read an unset data_json
    try:
        data_json = json.loads(completion_results_response)
        print("JSON parsed successfully:")
        print(data_json)
    except json.JSONDecodeError as e:
        print(f"Failed to parse JSON: {e}")
        print(f"Response content: {completion_results_response}")
        raise

    relevance_explanation = data_json['relevance_explanation']
    relevant_sentence_keys = data_json['all_relevant_sentence_keys']
    overall_supported_explanation = data_json['overall_supported_explanation']
    overall_supported = data_json['overall_supported']
    sentence_support_information = data_json['sentence_support_information']
    all_utilized_sentence_keys = data_json['all_utilized_sentence_keys']

    # Flatten the per-sentence support details into parallel lists
    support_keys = []
    support_level = []
    for sentence_support in sentence_support_information:
        support_keys += sentence_support['supporting_sentence_keys']
        support_level.append(sentence_support['fully_supported'])

    print(relevance_explanation)
    print(relevant_sentence_keys)
    print(overall_supported_explanation)
    print(overall_supported)
    print(sentence_support_information)
    print(all_utilized_sentence_keys)

    return completion_results_response, relevant_sentence_keys, all_utilized_sentence_keys, support_keys, support_level
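
# For reference, a parsed `data_json` for a well-formed evaluation might look
# like this (hypothetical values, shaped by the schema in the template below):
#
#   {
#       "relevance_explanation": "Document 0 directly answers the question...",
#       "all_relevant_sentence_keys": ["0a"],
#       "overall_supported_explanation": "The single claim is stated in 0a.",
#       "overall_supported": true,
#       "sentence_support_information": [
#           {
#               "response_sentence_key": "a",
#               "explanation": "Sentence a restates document sentence 0a.",
#               "supporting_sentence_keys": ["0a"],
#               "fully_supported": true
#           }
#       ],
#       "all_utilized_sentence_keys": ["0a"]
#   }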


def get_template_to_calculate_scores():
    return """
| You asked someone to answer a question based on one or more documents. | |
| Your task is to review their response and assess whether or not each sentence | |
| in that response is supported by text in the documents. And if so, which | |
| sentences in the documents provide that support. You will also tell me which | |
| of the documents contain useful information for answering the question, and | |
| which of the documents the answer was sourced from. | |
| Here are the documents, each of which is split into sentences. Alongside each | |
| sentence is an associated key, such as '0a.' or '0b.' that you can use to refer | |
| to it: | |
```
| {documents} | |
```
| The question was: | |
```
| {question} | |
```
| Here is their response, split into sentences. Alongside each sentence is | |
| an associated key, such as 'a.' or 'b.' that you can use to refer to it. Note | |
| that these keys are unique to the response, and are not related to the keys | |
| in the documents: | |
```
| {answer} | |
```
| You must respond with a JSON object matching this schema: | |
```
| {{ | |
| "relevance_explanation": string, | |
| "all_relevant_sentence_keys": [string], | |
| "overall_supported_explanation": string, | |
| "overall_supported": boolean, | |
| "sentence_support_information": [ | |
| {{ | |
| "response_sentence_key": string, | |
| "explanation": string, | |
| "supporting_sentence_keys": [string], | |
| "fully_supported": boolean | |
| }}, | |
| ], | |
| "all_utilized_sentence_keys": [string] | |
| }} | |
```
| The relevance_explanation field is a string explaining which documents | |
| contain useful information for answering the question. Provide a step-by-step | |
| breakdown of information provided in the documents and how it is useful for | |
| answering the question. | |
The all_relevant_sentence_keys field is a list of all document sentence keys
| (e.g. '0a') that are relevant to the question. Include every sentence that is | |
| useful and relevant to the question, even if it was not used in the response, | |
| or if only parts of the sentence are useful. Ignore the provided response when | |
| making this judgment and base your judgment solely on the provided documents | |
| and question. Omit sentences that, if removed from the document, would not | |
| impact someone's ability to answer the question. | |
| The overall_supported_explanation field is a string explaining why the response | |
| *as a whole* is or is not supported by the documents. In this field, provide a | |
| step-by-step breakdown of the claims made in the response and the support (or | |
| lack thereof) for those claims in the documents. Begin by assessing each claim | |
| separately, one by one; don't make any remarks about the response as a whole | |
| until you have assessed all the claims in isolation. | |
| The overall_supported field is a boolean indicating whether the response as a | |
| whole is supported by the documents. This value should reflect the conclusion | |
| you drew at the end of your step-by-step breakdown in overall_supported_explanation. | |
| In the sentence_support_information field, provide information about the support | |
| *for each sentence* in the response. | |
| The sentence_support_information field is a list of objects, one for each sentence | |
| in the response. Each object MUST have the following fields: | |
| - response_sentence_key: a string identifying the sentence in the response. | |
| This key is the same as the one used in the response above. | |
| - explanation: a string explaining why the sentence is or is not supported by the | |
| documents. | |
| - supporting_sentence_keys: keys (e.g. '0a') of sentences from the documents that | |
| support the response sentence. If the sentence is not supported, this list MUST | |
| be empty. If the sentence is supported, this list MUST contain one or more keys. | |
| In special cases where the sentence is supported, but not by any specific sentence, | |
| you can use the string "supported_without_sentence" to indicate that the sentence | |
| is generally supported by the documents. Consider cases where the sentence is | |
| expressing inability to answer the question due to lack of relevant information in | |
| the provided context as "supported_without_sentence". In cases where the | |
| sentence is making a general statement (e.g. outlining the steps to produce an answer, or | |
| summarizing previously stated sentences, or a transition sentence), use the | |
| string "general". In cases where the sentence is correctly stating a well-known fact, | |
| like a mathematical formula, use the string "well_known_fact". In cases where the | |
| sentence is performing numerical reasoning (e.g. addition, multiplication), use the | |
| string "numerical_reasoning". | |
| - fully_supported: a boolean indicating whether the sentence is fully supported by | |
| the documents. | |
| - This value should reflect the conclusion you drew at the end of your step-by-step | |
| breakdown in explanation. | |
| - If supporting_sentence_keys is an empty list, then fully_supported must be false. | |
| - Otherwise, use fully_supported to clarify whether everything in the response | |
| sentence is fully supported by the document text indicated in supporting_sentence_keys | |
| (fully_supported = true), or whether the sentence is only partially or incompletely | |
| supported by that document text (fully_supported = false). | |
The all_utilized_sentence_keys field is a list of all sentence keys (e.g. '0a') that
| were used to construct the answer. Include every sentence that either directly supported | |
| the answer, or was implicitly used to construct the answer, even if it was not used | |
| in its entirety. Omit sentences that were not used and could have been removed from | |
| the documents without affecting the answer. | |
You must respond with a valid JSON string. Use escapes for quotes, e.g. '\\"', and
newlines, e.g. '\\n'. Do not write anything before or after the JSON string. Do not
wrap the JSON string in backticks like '```' or '```json'.
| As a reminder: your task is to review the response and assess which documents contain | |
| useful information pertaining to the question, and how each sentence in the response | |
| is supported by the text in the documents. | |
| """ |