This project compares the performance of several models across several prompts, based on their responses.
touch .env
Add the following credential to the .env file:
OPENAI_API_KEY=
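The scripts need this key at runtime. A minimal sketch of reading it from the .env file using only the standard library (an assumption for illustration; the actual scripts may use a package such as python-dotenv instead):

```python
import os

# Minimal sketch: parse KEY=VALUE pairs from a local .env file,
# skipping blank lines and comments.
def load_env(path=".env"):
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

# Example (run after creating .env):
#   env = load_env()
#   os.environ.setdefault("OPENAI_API_KEY", env.get("OPENAI_API_KEY", ""))
```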
Install dependencies:
pip install -r requirements.txt
Every prompt must explicitly instruct the model to respond only in the following format:
{
  "articles": [
    {"article_ref": "CO ART. 337"},
    {"article_ref": "OR ART. 12a Abs. 2"}
  ]
}
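Since evaluation depends on every response following this format, it helps to validate it before scoring. A minimal sketch (assuming responses arrive as JSON strings; the helper name is illustrative, not part of the repo):

```python
import json

# Parse a model response and extract the article references,
# raising if the expected structure is missing.
def parse_articles(response_text):
    data = json.loads(response_text)
    articles = data["articles"]
    if not isinstance(articles, list):
        raise ValueError("'articles' must be a list")
    return [item["article_ref"] for item in articles]

example = '{"articles": [{"article_ref": "CO ART. 337"}, {"article_ref": "OR ART. 12a Abs. 2"}]}'
# parse_articles(example) returns ["CO ART. 337", "OR ART. 12a Abs. 2"]
```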
To compute mean performance scores (the output is saved as Output_Performance.xlsx):
python evaluatePR.py START_ROW END_ROW
Example usage:
python evaluatePR.py 1 3
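This invocation evaluates rows 1 through 3. A hypothetical sketch of how the row-range arguments might be validated (the actual internals of evaluatePR.py are not shown in this README):

```python
import sys

# Hypothetical argument handling: START_ROW and END_ROW are 1-based,
# inclusive row indices into the evaluation spreadsheet.
def parse_row_range(argv):
    if len(argv) != 3:
        raise SystemExit("Usage: python evaluatePR.py START_ROW END_ROW")
    start, end = int(argv[1]), int(argv[2])
    if start < 1 or end < start:
        raise ValueError("START_ROW must be >= 1 and END_ROW >= START_ROW")
    return start, end

# parse_row_range(["evaluatePR.py", "1", "3"]) returns (1, 3)
```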