[EN] Guide for Input .jsonl Files
If you have five models to compare, upload five `.jsonl` files, one per model.
- All `.jsonl` files must have the same number of rows.
- The `model_id` field must be different for each file and unique within each file (every row in a file carries the same `model_id`).
- Each `.jsonl` file should have different `generated` and `model_id` values from the other files. `instruction` and `task` should be the same.
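The rules above can be checked before uploading. The following is a minimal sketch, not part of the tool itself: `load_jsonl` and `check_files` are hypothetical helper names, and the row-by-row comparison assumes that row *i* in every file answers the same test item, which the guide implies but does not state outright. It does not check that `generated` differs between files, since two models can legitimately give the same answer.

```python
import json


def load_jsonl(path):
    """Read one JSON object per non-empty line of a .jsonl file."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rows.append(json.loads(line))
    return rows


def check_files(files):
    """Return a list of problems; `files` maps a file name to its list of rows."""
    problems = []
    names = list(files)
    if not names:
        return problems

    # Rule 1: every file has the same number of rows.
    counts = {name: len(rows) for name, rows in files.items()}
    if len(set(counts.values())) > 1:
        problems.append(f"row counts differ: {counts}")

    # Rule 2: one model_id per file, and no two files share it.
    seen = {}
    for name, rows in files.items():
        ids = {row.get("model_id") for row in rows}
        if len(ids) != 1:
            problems.append(f"{name}: expected one model_id, found {sorted(map(str, ids))}")
            continue
        model_id = ids.pop()
        if model_id in seen:
            problems.append(f"{name}: model_id {model_id!r} already used by {seen[model_id]}")
        seen.setdefault(model_id, name)

    # Rule 3 (assumes aligned rows): instruction and task match the first file.
    ref_name = names[0]
    for name in names[1:]:
        for i, (a, b) in enumerate(zip(files[ref_name], files[name])):
            for key in ("instruction", "task"):
                if a.get(key) != b.get(key):
                    problems.append(f"{name} row {i}: {key!r} differs from {ref_name}")
    return problems
```

A typical call would be `check_files({p: load_jsonl(p) for p in ["model1.jsonl", "model2.jsonl"]})`; an empty list means none of the three rules was violated.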
Required .jsonl Fields
- Reserved Fields (Mandatory)
  - `model_id`: The name of the model being evaluated. (Recommended to be short)
  - `instruction`: The instruction given to the model. This corresponds to the test set prompt (not the evaluation prompt).
  - `generated`: The response generated by the model for the test set instruction.
  - `task`: Used to group and display overall results as a subset. Can be utilized when you want to use different evaluation prompts per row.
- Additional
  - Depending on the evaluation prompt you use, you can utilize other additional fields. You can freely add them to your `.jsonl` files, avoiding the keywords mentioned above.
    - Example: For the `translation_pair.yaml` and `translation_fortunecookie.yaml` prompts, the `source_lang` and `target_lang` fields are read from the `.jsonl` and utilized.
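As a rough illustration of the field layout described above, the sketch below builds rows with the four reserved fields plus any extra fields and serializes them as JSON Lines. The helper names `make_row`, `to_jsonl`, and `write_jsonl` are hypothetical and only serve this example; the sample values are placeholders.

```python
import json


def make_row(model_id, instruction, generated, task, **extra):
    """Build one row: the four reserved fields plus prompt-specific extras."""
    row = {
        "model_id": model_id,
        "task": task,
        "instruction": instruction,
        "generated": generated,
    }
    # Extra fields (e.g. source_lang, target_lang) sit next to the reserved ones.
    row.update(extra)
    return row


def to_jsonl(rows):
    """Serialize rows as JSON Lines text: one JSON object per line."""
    # ensure_ascii=False keeps non-ASCII text (such as Korean) readable in the file.
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


def write_jsonl(path, rows):
    """Write rows to `path` as UTF-8 encoded JSON Lines."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(rows))


if __name__ == "__main__":
    rows = [
        make_row("model1", "1+1?", "2", "math",
                 source_lang="English", target_lang="Korean"),
    ]
    write_jsonl("model1.jsonl", rows)
```

Because the reserved names are ordinary function parameters, an extra field cannot reuse one of them by accident, which matches the guide's advice to avoid those keywords.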
For example, when evaluating with the translation_pair prompt, each .jsonl file looks like this:
# model1.jsonl
{"model_id": "모델1", "task": "영한", "instruction": "어디로 가야하오", "generated": "Where should I go", "source_lang": "Korean", "target_lang": "English"}
{"model_id": "모델1", "task": "한영", "instruction": "1+1?", "generated": "1+1?", "source_lang": "English", "target_lang": "Korean"}
# model2.jsonl: same `instruction` as model1.jsonl, but `generated` and `model_id` differ!
{"model_id": "모델2", "task": "영한", "instruction": "어디로 가야하오", "generated": "글쎄다", "source_lang": "Korean", "target_lang": "English"}
{"model_id": "모델2", "task": "한영", "instruction": "1+1?", "generated": "2", "source_lang": "English", "target_lang": "Korean"}
...
..
On the other hand, when evaluating with the `llmbar` prompt, fields such as `source_lang` and `target_lang`, which the translation prompts rely on, are not used, so you don't need to add them to your `.jsonl`.
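One way to catch a missing extra field before running an evaluation is sketched below. The `EXTRA_FIELDS` table and the `missing_extra_fields` helper are assumptions made for this example; the table lists only the prompts named in this guide and is not read from the actual prompt `.yaml` files.

```python
# Extra fields per evaluation prompt, taken only from the examples in this guide.
EXTRA_FIELDS = {
    "translation_pair": ("source_lang", "target_lang"),
    "translation_fortunecookie": ("source_lang", "target_lang"),
    "llmbar": (),  # uses the reserved fields only
}


def missing_extra_fields(rows, prompt):
    """Return (row_index, field) pairs for extra fields `prompt` needs but a row lacks."""
    needed = EXTRA_FIELDS[prompt]
    return [
        (i, field)
        for i, row in enumerate(rows)
        for field in needed
        if field not in row
    ]
```

For rows that carry only the four reserved fields, `missing_extra_fields(rows, "llmbar")` returns an empty list, while the translation prompts report each absent `source_lang` or `target_lang`.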