
ai-rag-chat-evaluator's Introduction

Evaluating a RAG Chat App

This repo contains scripts and tools for evaluating a chat app that uses the RAG architecture. There are many parameters that affect the quality and style of answers generated by the chat app, such as the system prompt, search parameters, and GPT model parameters.

Whenever you are making changes to a RAG chat with the goal of improving the answers, you should evaluate the results. This repository offers tools to make it easier to run evaluations, plus examples of evaluations that we've run on our sample chat app.

📺 Watch a video overview of this repo

Setting up this project

If you open this project in a Dev Container or GitHub Codespaces, it will automatically set up the environment for you. If not, then follow these steps:

  1. Install Python 3.10 or higher

  2. Create a Python virtual environment.
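
    For example, on macOS/Linux:

    python -m venv .venv
    source .venv/bin/activate

    (On Windows, activate with .venv\Scripts\activate instead.)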

  3. Inside that virtual environment, install the requirements:

    python -m pip install -r requirements.txt

Deploying a GPT-4 model

It's best to use a GPT-4 model for performing the evaluation, even if your chat app uses GPT-3.5 or another model. You can either use an Azure OpenAI instance or an openai.com instance.

Using a new Azure OpenAI instance

To use a new Azure OpenAI instance, you'll need to provision the instance and deploy a GPT-4 model to it. We've made that easy to do with the azd CLI tool.

  1. Install the Azure Developer CLI

  2. Run azd auth login to log in to your Azure account

  3. Run azd up to deploy a new GPT-4 instance

  4. Create a .env file based on the provisioned resources by running one of the following commands.

    Bash:

    azd env get-values > .env

    PowerShell:

    $output = azd env get-values; Add-Content -Path .env -Value $output;

Using an existing Azure OpenAI instance

If you already have an Azure OpenAI instance, you can use that instead of creating a new one.

  1. Create a .env file by copying .env.sample

  2. Fill in the values for your instance:

    AZURE_OPENAI_EVAL_DEPLOYMENT="<deployment-name>"
    AZURE_OPENAI_SERVICE="<service-name>"
  3. The scripts default to keyless access (via DefaultAzureCredential), but you can optionally use a key by setting AZURE_OPENAI_KEY in .env.
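
For reference, keyless access is typically wired up along these lines. This is a sketch assuming the azure-identity and openai packages; the exact setup in the scripts may differ, and the API version is only illustrative:

    import os

    from azure.identity import DefaultAzureCredential, get_bearer_token_provider
    from openai import AzureOpenAI

    # The token provider fetches Microsoft Entra tokens scoped to Cognitive Services.
    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )

    client = AzureOpenAI(
        azure_endpoint=f"https://{os.environ['AZURE_OPENAI_SERVICE']}.openai.azure.com",
        azure_ad_token_provider=token_provider,
        api_version="2024-02-15-preview",  # illustrative; use a version your service supports
    )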

Using an openai.com instance

If you have an openai.com instance, you can use that instead of an Azure OpenAI instance.

  1. Create a .env file by copying .env.sample

  2. Change OPENAI_HOST to "openai" and fill in the key for your OpenAI account. If you do not have an organization, you can leave that blank.

    OPENAI_HOST="openai"
    OPENAICOM_KEY=""
    OPENAICOM_ORGANIZATION=""

Generating ground truth data

In order to evaluate new answers, they must be compared to "ground truth" answers: the ideal answer for a particular question. See example_input/qa.jsonl for an example of the format. We recommend at least 200 QA pairs if possible.
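
Each line of that file is a standalone JSON object pairing a question with its ideal answer. A representative line (values here are illustrative, not copied from the sample data) looks like:

    {"question": "What does a product manager do?", "truth": "A product manager drives the product roadmap and priorities. [role_library.pdf#page=29]"}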

There are a few ways to get this data:

  1. Manually curate a set of questions and answers that you consider to be ideal. This is the most accurate, but also the most time-consuming. Make sure your answers include citations in the expected format. This approach requires domain expertise in the data.
  2. Use the generator script to generate a set of questions and answers. This is the fastest, but may also be the least accurate. See below for details on how to run the generator script.
  3. Use the generator script to generate a set of questions and answers, and then manually curate them, rewriting any answers that are subpar and adding missing citations. This is a good middle ground, and is what we recommend.

Additional tips for ground truth data generation

  • Generate more QA pairs than you need, then prune them down manually based on quality and overlap: remove low-quality answers, and remove questions that are too similar to other questions (see the deduplication sketch after this list).
  • Be aware of the knowledge distribution in the document set, so that you sample questions across the whole knowledge space.
  • Once your chat application is live, continually sample live user questions (in accordance with your privacy policy) to make sure you're representing the sorts of questions that users are asking.
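
As a starting point for pruning near-duplicates, here is a minimal sketch (assuming the question/truth JSONL format shown above) that flags pairs of similar questions for manual review, using only the standard library:

    import json
    from difflib import SequenceMatcher

    with open("example_input/qa.jsonl") as f:
        questions = [json.loads(line)["question"] for line in f]

    # Flag pairs whose questions are more than 80% similar; tune the threshold to taste.
    for i, q1 in enumerate(questions):
        for q2 in questions[i + 1:]:
            if SequenceMatcher(None, q1.lower(), q2.lower()).ratio() > 0.8:
                print(f"Possible duplicates:\n  {q1}\n  {q2}")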

Running the generator script

This repo includes a script for generating questions and answers from documents stored in Azure AI Search.

Important

The generator script can only generate English Q/A pairs right now, due to limitations in the azure-ai-generative SDK.

  1. Create a .env file by copying .env.sample

  2. Fill in the values for your Azure AI Search instance:

    AZURE_SEARCH_SERVICE="<service-name>"
    AZURE_SEARCH_INDEX="<index-name>"
    AZURE_SEARCH_KEY=""

    The key may not be necessary if it's configured for keyless access from your account. If providing a key, it's best to provide a query key since the script only requires that level of access.

  3. Run the generator script:

    python -m scripts generate --output=example_input/qa.jsonl --numquestions=200 --persource=5

    That script will generate 200 questions and answers and store them in example_input/qa.jsonl. We've already provided an example based on the sample documents for this app.

    To further customize the generator beyond the numquestions and persource parameters, modify scripts/generate.py.

Running an evaluation

We provide a script that loads in the current azd environment's variables, installs the requirements for the evaluation, and runs the evaluation against the local app. Run it like this:

python -m scripts evaluate --config=example_config.json

The config JSON should contain these fields at a minimum:

{
    "testdata_path": "example_input/qa.jsonl",
    "target_url": "http://localhost:50505/chat",
    "requested_metrics": ["groundedness", "relevance", "coherence", "latency", "answer_length"],
    "results_dir": "example_results/experiment<TIMESTAMP>"
}

Running against a local container

If you're running this evaluator in a container and your app is running in a container on the same system, use a URL like this for the target_url:

"target_url": "http://host.docker.internal:50505/chat"

Running against a deployed app

To run against a deployed endpoint, change the target_url to the chat endpoint of the deployed app:

"target_url": "https://app-backend-j25rgqsibtmlo.azurewebsites.net/chat"

Running on a subset of questions

It's common to run the evaluation on a subset of the questions, to get a quick sense of how the changes are affecting the answers. To do this, use the --numquestions parameter:

python -m scripts evaluate --config=example_config.json --numquestions=2

Specifying the evaluate metrics

The evaluate command will use the metrics specified in the requested_metrics field of the config JSON. Some of those metrics are built into the evaluation SDK, and the rest are custom metrics that we've added.

Built-in metrics

These metrics are calculated by sending a call to the GPT model, asking it to provide a 1-5 rating, and storing that rating.

Important

The built-in metrics are only intended for evaluating English-language answers, since they use English-language prompts internally. For non-English languages, you should use the custom prompt metrics instead.

  • gpt_coherence measures how well the language model can produce output that flows smoothly, reads naturally, and resembles human-like language.
  • gpt_relevance assesses the ability of answers to capture the key points of the context.
  • gpt_groundedness assesses the correspondence between claims in an AI-generated answer and the source context, making sure that these claims are substantiated by the context.
  • gpt_similarity measures the similarity between the ground truth answer and the response generated by the AI model.
  • gpt_fluency measures the grammatical proficiency of a generative AI's predicted answer.
  • f1_score measures the word overlap between the model generation and the ground truth answer, combining precision and recall (see the sketch after this list).
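
For intuition, a word-overlap F1 can be computed along these lines. This is a simplified sketch that skips the punctuation and tokenization normalization a real implementation might do:

    from collections import Counter

    def f1_score(generated: str, truth: str) -> float:
        gen_tokens = generated.lower().split()
        truth_tokens = truth.lower().split()
        # Count words shared between the two answers, with multiplicity.
        common = sum((Counter(gen_tokens) & Counter(truth_tokens)).values())
        if common == 0:
            return 0.0
        precision = common / len(gen_tokens)
        recall = common / len(truth_tokens)
        return 2 * precision * recall / (precision + recall)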

Custom metrics

Prompt metrics

The following metrics are implemented very similarly to the built-in metrics, but use a locally stored prompt. They're a great fit if you find that the built-in metrics are not working well for you or if you need to translate the prompt to another language.

  • mycoherence: Measures how well the language model can produce output that flows smoothly, reads naturally, and resembles human-like language. Based on scripts/evaluate_metrics/prompts/coherence.prompty.
  • myrelevance: Assesses the ability of answers to capture the key points of the context. Based on scripts/evaluate_metrics/prompts/relevance.prompty.
  • mygroundedness: Assesses the correspondence between claims in an AI-generated answer and the source context, making sure that these claims are substantiated by the context. Based on scripts/evaluate_metrics/prompts/groundedness.prompty.

Code metrics

These metrics are calculated with some local code based on the results of the chat app, and do not require a call to the GPT model.

  • latency: The time it takes for the chat app to generate an answer, in seconds.
  • length: The length of the generated answer, in characters.
  • has_citation: Whether the answer contains a correctly formatted citation to a source document, assuming citations are in square brackets.
  • citation_match: Whether the answer contains at least all of the citations that were in the ground truth answer (a rough sketch of both checks follows this list).
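
The citation checks can be implemented roughly like this sketch, assuming citations look like [info1.pdf#page=2]. The names and regex here are illustrative, not the repo's exact code:

    import re

    CITATION_PATTERN = re.compile(r"\[[^\]]+\]")

    def has_citation(answer: str) -> bool:
        # True if the answer contains at least one [bracketed] citation.
        return bool(CITATION_PATTERN.search(answer))

    def citation_match(truth: str, answer: str) -> bool:
        # True if every citation in the ground truth also appears in the answer.
        truth_citations = set(CITATION_PATTERN.findall(truth))
        answer_citations = set(CITATION_PATTERN.findall(answer))
        return truth_citations.issubset(answer_citations)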

Sending additional parameters to the app

This repo assumes that your chat app is following the Chat App Protocol, which means that all POST requests look like this:

{"messages": [{"content": "<Actual user question goes here>", "role": "user"}],
 "stream": False,
 "context": {...},
}

Any additional app parameters would be specified in the context of that JSON, such as temperature, search settings, prompt overrides, etc. To specify those parameters, add a target_parameters key to your config JSON. For example:

    "target_parameters": {
        "overrides": {
            "semantic_ranker": false,
            "prompt_template": "<READFILE>example_input/prompt_refined.txt"
        }
    }

The overrides key is the same as the overrides key in the context of the POST request. As a convenience, you can use the <READFILE> prefix to read in a file and use its contents as the value for the parameter. That way, you can store potentially long prompts separately from the config JSON file.
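
That substitution behaves roughly like this sketch (the function name and structure are illustrative, not the repo's exact code):

    from pathlib import Path

    def resolve_readfile_values(parameters: dict, working_dir: Path) -> dict:
        # Replace any "<READFILE>path" string value with that file's contents, recursively.
        resolved = {}
        for key, value in parameters.items():
            if isinstance(value, dict):
                resolved[key] = resolve_readfile_values(value, working_dir)
            elif isinstance(value, str) and value.startswith("<READFILE>"):
                resolved[key] = (working_dir / value[len("<READFILE>"):]).read_text()
            else:
                resolved[key] = value
        return resolved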

Viewing the results

The results of each evaluation are stored in a results folder (defaulting to example_results). Inside each run's folder, you'll find:

  • eval_results.jsonl: Each question and answer, along with the GPT metrics for each QA pair.
  • parameters.json: The parameters used for the run, like the overrides.
  • summary.json: The overall results, like the average GPT metrics.
  • config.json: The original config used for the run. This is useful for reproducing the run.

To make it easier to view and compare results across runs, we've built a few tools, located inside the review_tools folder.

Using the summary tool

To view a summary across all the runs, use the summary command with the path to the results folder:

python -m review_tools summary example_results

This will display an interactive table with the results for each run, like this:

[Screenshot: CLI tool with table of results]

To see the parameters used for a particular run, select the folder name. A modal will appear with the parameters, including any prompt override.

Using the compare tool

To compare the answers generated for each question across two runs, use the diff command with two result paths:

python -m review_tools diff example_results/baseline_1 example_results/baseline_2

This will display each question, one at a time, with the two generated answers in scrollable panes, and the GPT metrics below each answer.

[Screenshot: CLI tool for comparing a question with two answers]

Use the buttons at the bottom to navigate to the next question or quit the tool.

You can also filter to only show questions where the value changed for a particular metric, like this:

python -m review_tools diff example_results/baseline_1 example_results/baseline_2 --changed=has_citation

Measuring the app's ability to say "I don't know"

The evaluation flow described above focused on evaluating a model’s answers for a set of questions that could be answered by the data. But what about all those questions that can’t be answered by the data? Does your model know how to say “I don’t know?” The GPT models are trained to try and be helpful, so their tendency is to always give some sort of answer, especially for answers that were in their training data. If you want to ensure your app can say “I don’t know” when it should, you need to evaluate it on a different set of questions with a different metric.

Generating ground truth data for answer-less questions

For this evaluation, our ground truth data needs to be a set of questions that should provoke an "I don’t know" response from the app. There are several categories of such questions:

  • Unknowable: Questions that are related to the sources but not actually in them (and not public knowledge).
  • Uncitable: Questions whose answers are well known to the LLM from its training data, but are not in the sources. There are two flavors of these:
    • Related: Similar topics to sources, so LLM will be particularly tempted to think the sources know.
    • Unrelated: Completely unrelated to sources, so LLM shouldn’t be as tempted to think the sources know.
  • Nonsensical: Questions that are non-questions, that a human would scratch their head at and ask for clarification.

You can write these questions manually, but it’s also possible to generate them using a generator script in this repo, assuming you already have ground truth data with answerable questions.

python -m scripts generate-dontknows --input=example_input/qa.jsonl --output=example_input/qa_dontknows.jsonl --numquestions=45

That script sends the current questions to the configured GPT-4 model along with prompts to generate questions of each kind.

When it’s done, you should review and curate the resulting ground truth data. Pay special attention to the "unknowable" questions at the top of the file, since you may decide that some of those are actually knowable, and you may want to reword or rewrite them entirely.

Running an evaluation for answer-less questions

This repo contains a custom GPT metric called "dontknowness" that rates answers from 1-5, where 1 is "answered the question completely with no uncertainty" and 5 is "said it didn't know and attempted no answer". The goal is for all answers to be rated 4 or 5.

Here's an example configuration JSON that requests that metric, referencing the new ground truth data and a new output folder:

{
    "testdata_path": "example_input/qa_dontknows.jsonl",
    "results_dir": "example_results_dontknows/baseline",
    "requested_metrics": ["dontknowness", "answer_length", "latency", "has_citation"],
    "target_url": "http://localhost:50505/chat",
    "target_parameters": {
    }
}

We recommend a separate output folder, as you'll likely want to make multiple runs and easily compare between those runs using the review tools.

Run the evaluation like this:

python -m scripts evaluate --config=dontknows.config.json

The results will be stored in the results_dir folder, and can be reviewed using the review tools.

Improving the app's ability to say "I don't know"

If the app is not saying "I don't know" enough, you can use the diff tool to compare the answers for the "dontknows" questions across runs, and see if the answers are improving. Changes you can try:

  • Adjust the prompt to encourage the model to say "I don't know" more often. Remove anything in the prompt that might be distracting or overly encouraging it to answer.
  • Try using GPT-4 instead of GPT-3.5. The results will be slower (see the latency column) but it may be more likely to say "I don't know" when it should.
  • Adjust the temperature of the model used by your app.
  • Add an additional LLM step in your app after generating the answer, to have the LLM rate its own confidence that the answer is found in the sources. If the confidence is low, the app should say "I don't know" (see the sketch below).
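
That last idea could look roughly like this sketch inside your app; the model name, threshold, and prompt wording are all illustrative:

    def confident_answer(client, answer: str, sources: str) -> str:
        # Ask the model to rate how well the answer is supported by the sources.
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": (
                    "Rate from 1 to 5 how well this answer is supported by the "
                    f"sources below. Reply with only the number.\n\n"
                    f"Sources:\n{sources}\n\nAnswer:\n{answer}"
                ),
            }],
        )
        rating = int(response.choices[0].message.content.strip())
        return answer if rating >= 4 else "I don't know."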

ai-rag-chat-evaluator's People

Contributors

codemillmatt, dependabot[bot], diberry, microsoft-github-operations[bot], microsoftopensource, pamelafox, sofyanajridi, tonybaloney

ai-rag-chat-evaluator's Issues

Evaluate script not working

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Any log messages given by the failure

python -m scripts evaluate --config=example_config.json
15:26:09 (INFO) scripts: Running evaluation from config C:\Users\FranciscoMendes\OneDrive\OpenAI_POCs\SPMS_POC_OpenAI\ai-rag-chat-evaluator\example_config.json
15:26:09 (INFO) scripts: Replaced results_dir in config with timestamp
15:26:09 (INFO) scripts: Replaced prompt_template in config with contents of example_input/prompt_refined.txt
15:26:09 (INFO) scripts: Using Azure OpenAI Service with Azure Developer CLI Credential
15:26:09 (INFO) azure.identity._internal.decorators: AzureDeveloperCliCredential.get_token succeeded
15:26:09 (INFO) scripts: Running evaluation using data from C:\Users\FranciscoMendes\OneDrive\OpenAI_POCs\SPMS_POC_OpenAI\ai-rag-chat-evaluator\example_input\qa.jsonl
15:26:09 (INFO) scripts: Sending a test question to the target to ensure it is running...
15:26:10 (ERROR) scripts: Failed to send a test question to the target due to error:
Expecting value: line 1 column 1 (char 0)
15:26:10 (ERROR) scripts: Evaluation was terminated early due to an error ⬆

Expected/desired behavior

OS and Version?


Versions

Mention any other details that might be useful

I am trying to run the evaluate script but it is not working. I have already managed to generate the json with the questions and answers correctly and changed the config.json with my required target but I keep getting the same error. The error seems to be related to the connection with the target endpoint.



new clone today has error in postCreateCommand

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Tested on both mac and win on new/fresh clone

Open in dev container.

Any log messages given by the failure

[notice] A new release of pip is available: 23.2.1 -> 23.3.2
[notice] To update, run: pip install --upgrade pip
An error has occurred: FatalError: git failed. Is it installed, and are you in a
 Git repository directory?
Check the log at /home/vscode/.cache/pre-commit/pre-commit.log
[522193 ms] postCreateCommand failed with exit code 1. Skipping any further user-provided commands.
Done. Press any key to close the terminal.

Expected/desired behavior

No errors at end of container startup

OS and Version?


Versions

Mention any other details that might be useful



Wrong computation of GPT metrics after Update

After the updates from #45 I've rerun the evaluation and now get the lowest score for all metrics:

Before update:
(I've omitted the context)

    {
            "question": "Qual é o objetivo do Acordo de Cooperação Técnica mencionado na Resolução nº 19.423?",
            "answer": "O objetivo do Acordo de Cooperação Técnica mencionado na Resolução nº 19.423 é a melhoria da gestão das parcerias que envolvem colaboração mútua e interesse público recíproco, bem como promover e estimular ações de capacitação, comunicação e transparência.",
            "context": ...,
            "truth": "O objetivo do Acordo de Cooperação Técnica mencionado na Resolução é promover ações voltadas ao desenvolvimento do 'Projeto Sede de Aprender Nacional'.",
            "gpt_groundedness": 5,
            "gpt_relevance": 5,
            "gpt_coherence": 5
        },

After update:

        {
            "question": "Qual é o objetivo do Acordo de Cooperação Técnica mencionado na Resolução nº 19.423?",
            "truth": "O objetivo do Acordo de Cooperação Técnica mencionado na Resolução é promover ações voltadas ao desenvolvimento do 'Projeto Sede de Aprender Nacional'.",
            "latency": 1.905,
            "answer": "O objetivo do Acordo de Cooperação Técnica mencionado na Resolução nº 19.423 é a melhoria da gestão das parcerias que envolvem colaboração mútua e interesse público recíproco, bem como promover e estimular ações de capacitação, comunicação e transparência.",
            "context": "....",
            "answer_length": 256,
            "has_citation": false,
            "gpt_coherence": 1,
            "gpt_relevance": 1,
            "gpt_groundedness": 1
        },

@pamelafox any idea of what may happen?

Evaluate fails using openai.com instance

This issue is for a: (mark with an x)

- [X] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. Follow https://github.com/Azure-Samples/azure-search-openai-demo to launch a local copy of the backend app, using an openai.com deployment (not Azure OpenAI).
  2. Clone this repo, set up the dev container, and set environment variables in accordance with https://github.com/Azure-Samples/ai-rag-chat-evaluator?tab=readme-ov-file#using-an-openaicom-instance.
  3. Launch evaluate with the following config:

    {
        "api_type": "openai",
        "testdata_path": "example_input/qa.jsonl",
        "results_dir": "example_results/experiment<TIMESTAMP>",
        "requested_metrics": ["gpt_groundedness", "gpt_relevance", "gpt_coherence", "answer_length", "latency"],
        "target_url": "http://host.docker.internal:50505/chat",
        "target_parameters": {
            "overrides": {
                "semantic_ranker": false,
                "prompt_template": "<READFILE>example_input/prompt_refined.txt"
            }
        },
        "model": "gpt-3.5-turbo"
    }

Log trace

$ python -m scripts evaluate --config=example_config.json
17:44:10 (INFO) scripts: Running evaluation from config /workspaces/ai-rag-chat-evaluator/example_config.json
17:44:10 (INFO) scripts: Replaced results_dir in config with timestamp
17:44:10 (INFO) scripts: Replaced prompt_template in config with contents of example_input/prompt_refined.txt
17:44:10 (INFO) scripts: Using OpenAI Service with API Key from OPENAICOM_KEY
17:44:10 (INFO) scripts: Running evaluation using data from /workspaces/ai-rag-chat-evaluator/example_input/qa.jsonl
17:44:10 (INFO) scripts: Sending a test question to the target to ensure it is running...
17:44:17 (INFO) scripts: Successfully received response from target: "question": "What information is in your kn...", "answer": "What would you like to know?", "context": "SUBSET-026-2 v360.pdf#page=9: ..."
17:44:17 (INFO) scripts: Sending a test chat completion to the GPT deployment to ensure it is running...
17:44:19 (INFO) httpx: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
17:44:19 (INFO) scripts: Successfully received response from GPT: Hello! How can I assist you today?
17:44:19 (INFO) scripts: Starting evaluation...
Fail writing properties '{'_azureml.evaluation_run': 'azure-ai-generative-parent'}' to run history: 'FileStore' object has no attribute 'get_host_creds'
Calculating code metrics : ['answer_length', 'latency']
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 20.79it/s]
Traceback (most recent call last):
File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/generative/evaluate/_evaluate.py", line 359, in _evaluate
inbuilt_metrics_results = inbuilt_metrics_handler.calculate_metrics()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/generative/evaluate/_metric_handler.py", line 82, in calculate_metrics
deployment_id = openai_config["deployment_id"]
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'deployment_id'

Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/workspaces/ai-rag-chat-evaluator/scripts/__main__.py", line 6, in <module>
app()
File "/home/vscode/.local/lib/python3.11/site-packages/typer/main.py", line 328, in __call__
raise e
File "/home/vscode/.local/lib/python3.11/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/typer/core.py", line 783, in main
return _main(
^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/typer/core.py", line 225, in _main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/typer/main.py", line 685, in wrapper
return callback(**use_params) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/ai-rag-chat-evaluator/scripts/cli.py", line 31, in evaluate
run_evaluate_from_config(Path.cwd(), config, numquestions)
File "/workspaces/ai-rag-chat-evaluator/scripts/evaluate.py", line 196, in run_evaluate_from_config
evaluation_run_complete = run_evaluation(
^^^^^^^^^^^^^^^
File "/workspaces/ai-rag-chat-evaluator/scripts/evaluate.py", line 121, in run_evaluation
results = evaluate(
^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/ml/_telemetry/activity.py", line 291, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/generative/evaluate/_evaluate.py", line 237, in evaluate
evaluation_result = _evaluate(
^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/generative/evaluate/_evaluate.py", line 359, in _evaluate
inbuilt_metrics_results = inbuilt_metrics_handler.calculate_metrics()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/generative/evaluate/_metric_handler.py", line 82, in calculate_metrics
deployment_id = openai_config["deployment_id"]
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'deployment_id'

Expected/desired behavior

Evaluation completes just like in the video: https://www.youtube.com/watch?v=mM8pZAI2C5w

OS and Version?

macOS 14.3.1 (23D60)

Mention any other details that might be useful

I traced the line causing an error in the azure sdk (see https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-generative/azure/ai/generative/evaluate/_metric_handler.py). It seems that the code expects an instance of https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-resources/azure/ai/resources/entities/configs.py, but when I inspect it, it refers to Azure deployments only.
Am I missing something when trying to evaluate using the openai.com API?

Getting nan values in results for metrics

While running the Python snippet below, I am getting null values for some metrics, using the following input:

question="What is the capital of France?",
context="France is in Europe and it's capital is Paris.",
answer="Paris is the capital of France."
truth = "Paris"

results = evaluate(
    target=wrap_target,
    data=testdata,
    task_type="qa",
    metrics_list=["gpt_groundedness", "gpt_relevance", "gpt_coherence", "gpt_fluency", "gpt_similarity", "hate_unfairness", "sexual", "violence", "self_harm"],
    model_config=model_config,
    data_mapping={
        "question": "question",
        "context": "context",
        "answer": "answer",
    },
    tracking=False,
    output_path="./")

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

print(results.metrics_summary)
{'gpt_coherence': nan,
 'gpt_similarity': nan,
 'gpt_fluency': nan,
 'gpt_relevance': 5.0,
 'gpt_groundedness': 5.0,
 'self_harm_defect_rate': nan,
 'violence_defect_rate': nan,
 'hate_unfairness_defect_rate': nan,
 'sexual_defect_rate': nan}

Minimal steps to reproduce

Any log messages given by the failure

Fail writing properties '{'_azureml.evaluation_run': 'azure-ai-generative-parent'}' to run history: 'FileStore' object has no attribute 'get_host_creds'

Expected/desired behavior

I am getting the answer for gpt_groundedness and gpt_relevance but for other metrics getting null values.

OS and Version?

Windows 11

Versions

azure-ai-generative[evaluate]==1.0.0b8
promptflow==1.6.0

Mention any other details that might be useful



Can the generate script create the output dir/path if it doesn't exist?

2024-01-12 23:51:12 (INFO) scripts: Writing 16 questions to /workspaces/ai-rag-chat-evaluator/example_input_2/qa.jsonl
Traceback (most recent call last):

  File "<frozen runpy>", line 198, in _run_module_as_main

  File "<frozen runpy>", line 88, in _run_code

  File "/workspaces/ai-rag-chat-evaluator/scripts/__main__.py", line 6, in <module>
    app()

  File "/workspaces/ai-rag-chat-evaluator/scripts/cli.py", line 36, in generate
    generate_test_qa_data(

  File "/workspaces/ai-rag-chat-evaluator/scripts/generate.py", line 47, in generate_test_qa_data
    with open(output_file, "w") as f:
         ^^^^^^^^^^^^^^^^^^^^^^

FileNotFoundError: [Errno 2] No such file or directory: '/workspaces/ai-rag-chat-evaluator/example_input_2/qa.jsonl'

Connection error when running generator script

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Any log messages given by the failure

21:13:11 (INFO) scripts: Using Azure OpenAI Service with Azure Developer CLI Credential
21:13:11 (INFO) azure.identity._internal.decorators: AzureDeveloperCliCredential.get_token succeeded
21:13:11 (INFO) scripts: Using Azure Search Service with Azure Developer CLI Credential
21:13:11 (INFO) scripts: Generating 200 questions total, 5 per source, based on search results
21:13:11 (INFO) azure.identity._internal.decorators: AzureDeveloperCliCredential.get_token succeeded
21:13:11 (INFO) azure.core.pipeline.policies.http_logging_policy: Request URL: 'https://gptkb-tmvqzfxi3p3jc.search.windows.net/indexes('gptkbindex')/docs/search.post.search?api-version=REDACTED'
Request method: 'POST'
Request headers:
'Content-Type': 'application/json'
'Content-Length': '27'
'Accept': 'application/json'
'x-ms-client-request-id': 'fa39de84-f1fe-11ee-bbb8-002248870ddc'
'User-Agent': 'azsdk-python-search-documents/11.4.0 Python/3.11.8 (Linux-6.2.0-1019-azure-x86_64-with-glibc2.31)'
'Authorization': 'REDACTED'
A body is sent with the request
21:13:15 (INFO) azure.core.pipeline.policies.http_logging_policy: Response status: 200
Response headers:
'Transfer-Encoding': 'chunked'
'Content-Type': 'application/json; odata.metadata=minimal; odata.streaming=true; charset=utf-8'
'Content-Encoding': 'REDACTED'
'Vary': 'REDACTED'
'Server': 'Microsoft-IIS/10.0'
'Strict-Transport-Security': 'REDACTED'
'Preference-Applied': 'REDACTED'
'OData-Version': 'REDACTED'
'request-id': 'fa39de84-f1fe-11ee-bbb8-002248870ddc'
'elapsed-time': 'REDACTED'
'Date': 'Wed, 03 Apr 2024 21:13:13 GMT'
21:13:15 (INFO) scripts: Processing search document Yan Calibration Transfer and drift compensation e noses via coupled task learning 2015.pdf#page=1
21:13:15 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 0.979383 seconds
21:13:16 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 1.683232 seconds
21:13:20 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 0.990305 seconds
21:13:21 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 1.640511 seconds
21:13:27 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 0.801224 seconds
21:13:27 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 1.739024 seconds
21:13:37 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 0.890408 seconds
21:13:38 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 1.806998 seconds
21:13:56 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 0.803116 seconds
21:13:57 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 1.692248 seconds
21:14:30 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 0.972259 seconds
21:14:31 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 1.619241 seconds
21:15:37 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 0.946597 seconds
21:15:38 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 1.540704 seconds
21:17:48 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 0.822784 seconds
21:17:48 (INFO) openai._base_client: Retrying request to /deployments//chat/completions in 1.631186 seconds
Traceback (most recent call last):
File "/home/vscode/.local/lib/python3.11/site-packages/httpx/_transports/default.py", line 69, in map_httpcore_exceptions
yield
File "/home/vscode/.local/lib/python3.11/site-packages/httpx/_transports/default.py", line 233, in handle_request
resp = self._pool.handle_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py", line 216, in handle_request
raise exc from None
File "/home/vscode/.local/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py", line 196, in handle_request
response = connection.handle_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/httpcore/_sync/connection.py", line 99, in handle_request
raise exc
File "/home/vscode/.local/lib/python3.11/site-packages/httpcore/_sync/connection.py", line 76, in handle_request
stream = self._connect(request)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/httpcore/_sync/connection.py", line 122, in _connect
stream = self._network_backend.connect_tcp(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/httpcore/_backends/sync.py", line 205, in connect_tcp
with map_exceptions(exc_map):
File "/usr/local/lib/python3.11/contextlib.py", line 158, in exit
self.gen.throw(typ, value, traceback)
File "/home/vscode/.local/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.ConnectError: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/vscode/.local/lib/python3.11/site-packages/openai/_base_client.py", line 931, in _request
response = self._client.send(
^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/httpx/_client.py", line 914, in send
response = self._send_handling_auth(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/httpx/_client.py", line 942, in _send_handling_auth
response = self._send_handling_redirects(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/httpx/_client.py", line 979, in _send_handling_redirects
response = self._send_single_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/httpx/_client.py", line 1015, in _send_single_request
response = transport.handle_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/httpx/_transports/default.py", line 232, in handle_request
with map_httpcore_exceptions():
File "/usr/local/lib/python3.11/contextlib.py", line 158, in exit
self.gen.throw(typ, value, traceback)
File "/home/vscode/.local/lib/python3.11/site-packages/httpx/_transports/default.py", line 86, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.ConnectError: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/workspaces/ai-rag-chat-evaluator/scripts/__main__.py", line 6, in <module>
app()
File "/home/vscode/.local/lib/python3.11/site-packages/typer/main.py", line 326, in __call__
raise e
File "/home/vscode/.local/lib/python3.11/site-packages/typer/main.py", line 309, in __call__
return get_command(self)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/typer/core.py", line 723, in main
return _main(
^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/typer/core.py", line 193, in _main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/typer/main.py", line 688, in wrapper
return callback(**use_params)
^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/ai-rag-chat-evaluator/scripts/cli.py", line 40, in generate
generate_test_qa_data(
File "/workspaces/ai-rag-chat-evaluator/scripts/generate.py", line 40, in generate_test_qa_data
result = qa_generator.generate(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/ml/_telemetry/activity.py", line 291, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/generative/synthetic/qa.py", line 397, in generate
content, token_usage = _completion_with_retries(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/generative/synthetic/qa.py", line 69, in _completion_with_retries
response = client.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 275, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 667, in create
return self._post(
^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/openai/_base_client.py", line 1213, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/openai/_base_client.py", line 902, in request
return self._request(
^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/openai/_base_client.py", line 955, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/openai/_base_client.py", line 1026, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/openai/_base_client.py", line 955, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/openai/_base_client.py", line 1026, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/home/vscode/.local/lib/python3.11/site-packages/openai/_base_client.py", line 965, in _request
raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

Expected/desired behavior

OS and Version?

Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye

Versions

Mention any other details that might be useful



yaml file does not exist

I've taken the latest pull and am facing this error: mlflow.exceptions.MissingConfigException: Yaml file '..\mlruns\0\meta.yaml' does not exist.

I tried adding the meta.yaml file from an old pull run and everything worked fine. It looks like something is missing in the latest code with respect to this file.

Incompatible Python Version 3.11, 3.12

What version of Python should be used for this repo? It failed when installing the modules via Python 3.11 and 3.12 with:

pip install azure-ai-generative[evaluate]==1.0.0b7

Or is there anything I could have missed during the steps?

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [x] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Run with Python 3.11 or Python 3.12: pip install azure-ai-generative[evaluate]==1.0.0b7

Any log messages given by the failure

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for rtoml
Successfully built google-search-results
Failed to build rtoml
ERROR: Could not build wheels for rtoml, which is required to install pyproject.toml-based projects

Expected/desired behavior

build successfully

OS and Version?


Versions

Mention any other details that might be useful

I was able to run it with:
pip install --upgrade pip setuptools wheel
pip install --only-binary :all: rtoml
pip install azure-ai-generative



Thank you!

Is the .env.sample correct?

Perhaps the env variable that is needed in service_setup.py, OPENAI_GPT_MODEL, has a different name, OPENAI_EVALGPT_MODEL, in .env.sample?

Review tool errors "No such file or directory: 'my_results/experiment1705604697/parameters.json'"

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. Create evaluations in the my_results directory.
  2. Run python3 -m review_tools summary my_results
  3. Notice the error is looking for a different file than the one the tool created: parameters.json versus evaluate_parameters.json.

Any log messages given by the failure

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /workspaces/ai-rag-chat-evaluator/review_tools/cli.py:20 in summary                              │
│                                                                                                  │
│   17                                                                                             │
│   18 @app.command()                                                                              │
│   19 def summary(results_dir: Path = typer.Argument(exists=True, dir_okay=True, file_okay=Fal    │
│ ❱ 20 │   summary_app.main(results_dir)                                                           │
│   21                                                                                             │
│   22                                                                                             │
│   23 def cli():                                                                                  │
│                                                                                                  │
│ ╭─────────────── locals ────────────────╮                                                        │
│ │ results_dir = PosixPath('my_results') │                                                        │
│ ╰───────────────────────────────────────╯                                                        │
│                                                                                                  │
│ /workspaces/ai-rag-chat-evaluator/review_tools/summary_app.py:78 in main                         │
│                                                                                                  │
│   75                                                                                             │
│   76                                                                                             │
│   77 def main(directory: Path):                                                                  │
│ ❱ 78 │   app = TableApp(directory)                                                               │
│   79 │   app.run()                                                                               │
│   80                                                                                             │
│                                                                                                  │
│ ╭────────────── locals ───────────────╮                                                          │
│ │ directory = PosixPath('my_results') │                                                          │
│ ╰─────────────────────────────────────╯                                                          │
│                                                                                                  │
│ /workspaces/ai-rag-chat-evaluator/review_tools/summary_app.py:59 in __init__                     │
│                                                                                                  │
│   56 │   │   │   │   │   │   summary.get("answer_length", {}).get("mean", "Unknown"),            │
│   57 │   │   │   │   │   )                                                                       │
│   58 │   │   │   │   )                                                                           │
│ ❱ 59 │   │   │   with open(Path(results_dir) / folder / "parameters.json") as f:                 │
│   60 │   │   │   │   self.row_parameters[folder] = json.load(f)                                  │
│   61 │                                                                                           │
│   62 │   def compose(self) -> ComposeResult:                                                     │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │     citation = 1.0                                                                           │ │
│ │    coherence = {'mean_rating': 5.0, 'pass_count': 14, 'pass_rate': 1.0}                      │ │
│ │            f = <_io.TextIOWrapper name='my_results/experiment1705604697/summary.json'        │ │
│ │                mode='r' encoding='UTF-8'>                                                    │ │
│ │       folder = 'experiment1705604697'                                                        │ │
│ │      folders = ['experiment1705604697', 'experiment1705605065', 'experiment1705605215']      │ │
│ │ groundedness = {'mean_rating': 5.0, 'pass_count': 14, 'pass_rate': 1.0}                      │ │
│ │    relevance = {'mean_rating': 5.0, 'pass_count': 14, 'pass_rate': 1.0}                      │ │
│ │  results_dir = PosixPath('my_results')                                                       │ │
│ │         self = TableApp(title='TableApp', classes={'-dark-mode'})                            │ │
│ │      summary = {                                                                             │ │
│ │                │   'gpt_coherence': {                                                        │ │
│ │                │   │   'mean_rating': 5.0,                                                   │ │
│ │                │   │   'pass_count': 14,                                                     │ │
│ │                │   │   'pass_rate': 1.0                                                      │ │
│ │                │   },                                                                        │ │
│ │                │   'gpt_relevance': {                                                        │ │
│ │                │   │   'mean_rating': 5.0,                                                   │ │
│ │                │   │   'pass_count': 14,                                                     │ │
│ │                │   │   'pass_rate': 1.0                                                      │ │
│ │                │   },                                                                        │ │
│ │                │   'gpt_groundedness': {                                                     │ │
│ │                │   │   'mean_rating': 5.0,                                                   │ │
│ │                │   │   'pass_count': 14,                                                     │ │
│ │                │   │   'pass_rate': 1.0                                                      │ │
│ │                │   },                                                                        │ │
│ │                │   'answer_length': {                                                        │ │
│ │                │   │   'total': 22932,                                                       │ │
│ │                │   │   'mean': 1638.0,                                                       │ │
│ │                │   │   'max': 2615,                                                          │ │
│ │                │   │   'min': 705                                                            │ │
│ │                │   },                                                                        │ │
│ │                │   'answer_has_citation': {'total': 14, 'rate': 1.0}                         │ │
│ │                }                                                                             │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
FileNotFoundError: [Errno 2] No such file or directory: 'my_results/experiment1705604697/parameters.json'

Expected/desired behavior

OS and Version?


Versions

Mention any other details that might be useful



@pamelafox - love the new output for errors - much easier to figure out.

Add tests for evaluate

This repo needs tests for evaluate() and a CI workflow to run them. This will require several mocks.

AZURE_OPENAI_API_KEY mismatch

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Set up the env and launch python3 -m scripts generate --output=example_input/qa.jsonl --numquestions=200 --persource=5

Any log messages given by the failure

openai: error_code=PermissionDenied error_message='Principal does not have access to API/Operation.' error_param=None error_type=None message='OpenAI API error received' stream_error=False

Expected/desired behavior

N/A

OS and Version?

Windows 11. Linux (Github Codespaces).

Versions

main

Mention any other details that might be useful

The .env.sample provides the variable AZURE_OPENAI_KEY while service_setup.py looks for AZURE_OPENAI_API_KEY



When using the local metrics with prompt getting Key Error during evaluation - values returning as None

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Change from the built-in metrics to the custom metrics listed in the scripts\evaluate_metrics\prompts folder. Modify example_config.json to contain:

Any log messages given by the failure

raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['relevance_score'], dtype='object')] are in the [columns]"

Expected/desired behavior

The Jinja prompt templates to be used for evaluation, producing scores.

OS and Version?

Windows 11

Versions

v22H2

Mention any other details that might be useful

I am using the same test data path and the only change I am making is to use the local prompt via the Jinja2 templates. My goal is to be able to customize the metrics to my use case and add additional metrics as well but am encountering this issue whenever attempting to use a non-built-in metric.

Here is an example of the metric response received:

'latency': 9.598966, 'relevance': None, 'answer_length': 615, 'gpt_coherence': 5, 'gpt_groundedness': 5, 'gpt_groundedness_reason': '(Failed)'

"Relevance" is the metric changed to use the local prompt that where it seems to have an issue with the value, although this has happened when attempting to utilize any of the local prompts.



Generator script not working

Trying to run the generator script and it doesn't work. I am using Azure OpenAI, Azure AI Search for the index, and Codespaces for testing.
Please help if possible.

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Any log messages given by the failure

05:06:56 (INFO) scripts: Using Azure OpenAI Service with Azure Developer CLI Credential
05:06:56 (INFO) azure.identity._internal.decorators: AzureDeveloperCliCredential.get_token succeeded
05:06:56 (INFO) scripts: Using Azure Search Service with API Key from AZURE_SEARCH_KEY
05:06:56 (INFO) scripts: Generating 50 questions total, 5 per source, based on search results
05:06:56 (INFO) azure.core.pipeline.policies.http_logging_policy: Request URL: 'https://testservcie3.search.windows.net/indexes('index12')/docs/search.post.search?api-version=REDACTED'
Request method: 'POST'
Request headers:
    'Content-Type': 'application/json'
    'Content-Length': '27'
    'api-key': 'REDACTED'
    'Accept': 'application/json;odata.metadata=none'
    'x-ms-client-request-id': '822ae6a6-e5ae-11ee-8800-000d3acb1738'
    'User-Agent': 'azsdk-python-search-documents/11.4.0 Python/3.11.8 (Linux-6.2.0-1019-azure-x86_64-with-glibc2.31)'
A body is sent with the request
05:06:56 (INFO) azure.core.pipeline.policies.http_logging_policy: Response status: 200
Response headers:
    'Transfer-Encoding': 'chunked'
    'Content-Type': 'application/json; odata.metadata=none; odata.streaming=true; charset=utf-8'
    'Content-Encoding': 'REDACTED'
    'Vary': 'REDACTED'
    'Server': 'Microsoft-IIS/10.0'
    'Strict-Transport-Security': 'REDACTED'
    'Preference-Applied': 'REDACTED'
    'OData-Version': 'REDACTED'
    'request-id': '822ae6a6-e5ae-11ee-8800-000d3acb1738'
    'elapsed-time': 'REDACTED'
    'Date': 'Tue, 19 Mar 2024 05:06:56 GMT'
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspaces/ai-rag-chat-evaluator/scripts/__main__.py", line 6, in <module>
    app()
  File "/home/vscode/.local/lib/python3.11/site-packages/typer/main.py", line 328, in __call__
    raise e
  File "/home/vscode/.local/lib/python3.11/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/typer/core.py", line 778, in main
    return _main(
           ^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/ai-rag-chat-evaluator/scripts/cli.py", line 40, in generate
    generate_test_qa_data(
  File "/workspaces/ai-rag-chat-evaluator/scripts/generate.py", line 37, in generate_test_qa_data
    logger.info("Processing search document %s", doc["sourcepage"])

Expected/desired behavior

N/A

OS and Version?

Windows 10 VM



Mention any other details that might be useful

N/A
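The traceback stops at the line that reads doc["sourcepage"], which suggests the search index may not contain a sourcepage field; the generator assumes the azure-search-openai-demo index schema. A hedged sketch of a more forgiving lookup (the helper name is hypothetical):

import logging

logger = logging.getLogger("scripts")

def describe_doc(doc: dict) -> str:
    # Fall back to other common field names instead of raising KeyError
    # on indexes that don't follow the azure-search-openai-demo schema.
    return doc.get("sourcepage") or doc.get("sourcefile") or "<unknown source>"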

mismatched_organization error

When comparing a local Llama 2 model with GPT-4, I got this error:

ERROR Failed to send a test chat completion to the GPT deployment due to error: evaluate.py:110
Error code: 401 - {'error': {'message': 'OpenAI-Organization header should match organization for API key', 'type': 'invalid_request_error',
'param': None, 'code': 'mismatched_organization'}
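This error usually means the OpenAI-Organization header sent with requests does not match the organization that owns the API key. A minimal sketch, assuming the openai 1.x SDK: treat a blank OPENAICOM_ORGANIZATION as unset so the header is omitted entirely:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAICOM_KEY"],
    # An empty string would still send a (mismatched) header; None omits it.
    organization=os.environ.get("OPENAICOM_ORGANIZATION") or None,
)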

Connection is not found during evaluate


This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. generate script works fine against index
  2. using local running app with target url: http://127.0.0.1:50505/chat
  3. python -m scripts evaluate --config=example_config.json --numquestions=2

Any log messages given by the failure

Traceback (most recent call last):
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/azure/ai/generative/evaluate/_evaluate.py", line 359, in _evaluate
inbuilt_metrics_results = inbuilt_metrics_handler.calculate_metrics()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/azure/ai/generative/evaluate/_metric_handler.py", line 108, in calculate_metrics
pf_run = run_pf_flow_with_dict_list(flow_path, dict_list, flow_params={"connections": {node: connection_override for node in nodes_list}})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/azure/ai/generative/evaluate/_utils.py", line 65, in run_pf_flow_with_dict_list
return pf_client.run(
^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/_pf_client.py", line 187, in run
return self.runs.create_or_update(run=run, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/_telemetry/activity.py", line 265, in wrapper
return f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/operations/_run_operations.py", line 103, in create_or_update
created_run = RunSubmitter(client=self._client).submit(run=run, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/_submitter/run_submitter.py", line 40, in submit
self._run_bulk(run=run, stream=stream, **kwargs)
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/_submitter/run_submitter.py", line 97, in _run_bulk
self._submit_bulk_run(flow=flow, run=run, local_storage=local_storage)
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/_submitter/run_submitter.py", line 118, in _submit_bulk_run
connections = SubmitterHelper.resolve_connections(flow=flow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/_submitter/utils.py", line 276, in resolve_connections
return SubmitterHelper.resolve_connection_names(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/_submitter/utils.py", line 348, in resolve_connection_names
raise e
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/_submitter/utils.py", line 344, in resolve_connection_names
conn = client.connections.get(name=n, with_secrets=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/_telemetry/activity.py", line 265, in wrapper
return f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/operations/_connection_operations.py", line 52, in get
return self._get(name, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/operations/_connection_operations.py", line 57, in _get
orm_connection = ORMConnection.get(name, raise_error)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/_orm/retry.py", line 43, in f_retry
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/Users/hzwnfj/workspaces/ai-rag-chat-evaluator/.venv/lib/python3.12/site-packages/promptflow/_sdk/_orm/connection.py", line 52, in get
raise ConnectionNotFoundError(f"Connection {name!r} is not found.")
promptflow._sdk._errors.ConnectionNotFoundError: Connection 'Default_AzureOpenAI' is not found.

Expected/desired behavior

OS and Version?

macOS Sonoma 14.3.1

Versions

Mention any other details that might be useful

I've tried evaluating all questions (200) and a subset (2); both failed with the same connection error. The test question went fine, as I saw this log before the error:

14:59:06 (INFO) scripts: Sending a test chat completion to the GPT deployment to ensure it is running...
14:59:08 (INFO) httpx: HTTP Request: POST https://dig-openai.openai.azure.com/openai/deployments/bd-gpt4-32k/chat/completions?api-version=2023-07-01-preview "HTTP/1.1 200 OK"
14:59:08 (INFO) scripts: Successfully received response from GPT: Hello! How can I assist you today?
14:59:08 (INFO) scripts: Starting evaluation...
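The promptflow runner is looking up a local connection named 'Default_AzureOpenAI' that was never created. A hedged sketch of creating it with the promptflow SDK (the entity and field names assume the promptflow version installed with this repo; the values are placeholders):

from promptflow import PFClient
from promptflow.entities import AzureOpenAIConnection

# Create the local connection that the evaluate flow expects to find.
connection = AzureOpenAIConnection(
    name="Default_AzureOpenAI",
    api_key="<your-azure-openai-key>",
    api_base="https://<your-service>.openai.azure.com/",
    api_type="azure",
    api_version="2023-07-01-preview",
)
PFClient().connections.create_or_update(connection)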


Evaluate script fails after "Starting evaluation...": 'charmap' codec can't encode characters in position 6-10: character maps to <undefined>

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Used the azure-search-openai-demo and loaded it with the constitution of the US. App itself works fine. Then I used the generate script to generate the following q/a pairs:

{"question": "What happens if a bill is not returned by the President within ten days, excluding Sundays?", "truth": "If a bill is not returned by the President within ten days (Sundays excepted) after it has been presented to him, it shall become a law in the same manner as if he had signed it.\n[constitution.pdf#page=4]"}
{"question": "What is the exception to this rule regarding a bill becoming a law without the President's signature?", "truth": "The exception to this rule is if Congress adjourns and prevents the bill's return, in which case it shall not become a law.\n[constitution.pdf#page=4]"}
{"question": "What types of orders, resolutions, or votes require the concurrence of both the Senate and House of Representatives?", "truth": "Every order, resolution, or vote that requires the concurrence of both the Senate and House of Representatives, except on a question of adjournment, must be presented to the President of the United States.\n[constitution.pdf#page=4]"}
etc..

When I then try to run the evaluate command, it first sends a test question which goes fine:

2024-02-02 14:54:30 (INFO) scripts: Sending a test question to the target to ensure it is running...
2024-02-02 14:54:39 (INFO) scripts: Successfully received response from target: "question": "What information is in your kn...", "answer": "Our knowledge base includes in...", "context": "constitution.pdf#page=5: by th..."
2024-02-02 14:54:39 (INFO) scripts: Starting evaluation...

But then it fails and I get the following error messages:

2024-02-02 14:57:54 (WARNING) azureml.metrics.text.qa.azureml_qa_metrics: LLM related metrics need llm_params to be computed. Computing metrics for ['gpt_relevance', 'gpt_coherence', 'gpt_groundedness']
2024-02-02 14:57:54 (INFO) azureml.metrics.common._validation: QA metrics debug: {'y_test_length': 20, 'y_pred_length': 20, 'tokenizer_example_output': 'the quick brown fox jumped over the lazy dog', 'regexes_to_ignore': '', 'ignore_case': False, 'ignore_punctuation': False, 'ignore_numbers': False}
  0%|                                                                                           | 0/20 [00:00<?, ?it/s]2024-02-02 14:57:56 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-10: character maps to <undefined>
2024-02-02 14:57:56 (ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_relevance
2024-02-02 14:57:56 (ERROR) azureml.metrics.common._scoring: Class: NameError
Message: name 'NotFoundError' is not defined
  0%|                                                                                           | 0/20 [00:00<?, ?it/s]2024-02-02 14:57:57 (WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-10: character maps to <undefined>
2024-02-02 14:57:57 (ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_coherence
2024-02-02 14:57:57 (ERROR) azureml.metrics.common._scoring: Class: NameError
Message: name 'NotFoundError' is not defined
  0%|                                                                                           | 0/20 [00:00<?, ?it/s]2

eval_results.jsonl also contains the following:

{"question":"What happens if a bill is not returned by the President within ten days, excluding Sundays?","answer":"If a bill is not returned by the President within ten days, excluding Sundays, it will become a law as if the President had signed it [constitution.pdf#page=4].","context":"constitution.pdf#page=4: shall not be returned by the President within ten Days (Sundays excepted) after it shall have been presented to him, the Same shall be a Law, in like Manner as if he had signed it, unless the Congress by their Adjournment prevent its Return, in which Case it shall not be a Law Every Order, Resolution, or Vote to which the Concur- rence of the Senate and House of Representatives may be necessary (except on a question of Adjournment) shall be presented to the President of the United States; and before the Same shall take Effect, shall be approved by him, or be- ing disapproved by him, shall be repassed by two thirds of the Senate and House of Representatives, according to the Rules and Limitations prescribed in the Case of a Bill. SECTION. 8 The Congress shall have Power To lay and collect Taxes, Duties, Imposts and Excises, to pay the Debts and provide for the common Defence and general Welfare of the United States; but all Duties, Imposts and Excises shall be uniform throughout the United States; To borrow Money on the credit of the United States; To regulate Commerce with foreign Nations, and among the several States, and with the Indian Tribes; To establish an uniform Rule of \n\nconstitution.pdf#page=4: originate in the House of Representatives; but the Senate may propose or concur with Amendments as on other Bills Every Bill which shall have passed the House of Represen- tatives and the Senate, shall, before it become a Law, be presented to the President of the United States; If he ap- prove he shall sign it, but if not he shall return it, with his Objections to that House in which it shall have originated, who shall enter the Objections at large on their Journal, and proceed to reconsider it. If after such Reconsideration two thirds of that House shall agree to pass the Bill, it shall be sent, together with the Objections, to the other House, by which it shall likewise be reconsidered, and if approved by two thirds of that House, it shall become a Law. But in all such Cases the Votes of both Houses shall be determined by Yeas and Nays, and the Names of the Persons voting for and against the Bill shall be entered on the Journal of each House respectively, If any Bill shall not be returned by the President within ten Days (Sundays excepted) after it shall have been presented to him, the Same shall be a Law, in like Manner as if he had signed it, unless the Congress by their \n\nconstitution.pdf#page=18: written declaration that the President is unable to discharge the powers and duties of his office, the Vice President shall immediately assume the powers and duties of the office as Acting President. 
Thereafter, when the President transmits to the President pro tempore of the Senate and the Speaker of the House of Representatives his written declaration that no inability ex- ists, he shall resume the powers and duties of his office un- less the Vice President and a majority of either the principal officers of the executive department or of such other body as Congress may by law provide, transmit within four days to the President pro tempore of the Senate and the Speaker of the House of Representatives their written declaration that the President is unable to discharge the powers and duties of his office. Thereupon Congress shall decide the issue, assembling within forty-eight hours for that purpose if not in session. If the Congress, within twenty-one days after receipt of the latter written declaration, or, if Congress is not in session, within twenty-one days after Congress is required to assemble, determines by two-thirds vote of both Houses that the President is unable to ","truth":"If a bill is not returned by the President within ten days (Sundays excepted) after it has been presented to him, it shall become a law in the same manner as if he had signed it, unless Congress adjourns and prevents its return, in which case it shall not become a law.\n[constitution.pdf#page=4]","gpt_relevance":null,"gpt_coherence":null,"gpt_groundedness":null}
{"question":"What is the process for an order, resolution, or vote to take effect, which requires the concurrence of the Senate and House of Representatives?","answer":"The process for an order, resolution, or vote to take effect, which requires the concurrence of the Senate and House of Representatives, is as follows:\n\n1. The order, resolution, or vote is presented to the President of the United States.\n2. The President must approve the order, resolution, or vote for it to take effect.\n3. If the President disapproves of the order, resolution, or vote, it must be repassed by two-thirds of the Senate and House of Representatives.\n4. Once the order, resolution, or vote is approved by the President or repassed by two-thirds of Congress, it takes effect.\n\n[constitution.pdf#page=4][constitution.pdf#page=13]","context":"constitution.pdf#page=4: shall not be returned by the President within ten Days (Sundays excepted) after it shall have been presented to him, the Same shall be a Law, in like Manner as if he had signed it, unless the Congress by their Adjournment prevent its Return, in which Case it shall not be a Law Every Order, Resolution, or Vote to which the Concur- rence of the Senate and House of Representatives may be necessary (except on a question of Adjournment) shall be presented to the President of the United States; and before the Same shall take Effect, shall be approved by him, or be- ing disapproved by him, shall be repassed by two thirds of the Senate and House of Representatives, according to the Rules and Limitations prescribed in the Case of a Bill. SECTION. 8 The Congress shall have Power To lay and collect Taxes, Duties, Imposts and Excises, to pay the Debts and provide for the common Defence and general Welfare of the United States; but all Duties, Imposts and Excises shall be uniform throughout the United States; To borrow Money on the credit of the United States; To regulate Commerce with foreign Nations, and among the several States, and with the Indian Tribes; To establish an uniform Rule of \n\nconstitution.pdf#page=13: of the United States, directed to the President of the Senate ;- the President of the Senate shall, in the presence of the Senate and House of Represen- tatives, open all the certificates and the votes shall then be counted ;- The person having the greatest number of votes for President, shall be the President, if such number be a majority of the whole number of Electors appointed; and if no person have such majority, then from the persons having the highest numbers not exceeding three on the list of those voted for as President, the House of Representatives shall choose immediately, by ballot, the President. But in choos- ing the President, the votes shall be taken by states, the representation from each state having one vote; a quorum for this purpose shall consist of a member or members from two-thirds of the states, and a majority of all the states shall be necessary to a choice. 
[And if the House of Representa- tives shall not choose a President whenever the right of choice shall devolve upon them, before the fourth day of March next following, then the Vice-President shall act as President, as in case of the death or other constitutional disability of the President .\n\nconstitution.pdf#page=11: for the sole Purpose of receiving, opening and counting the Votes for President; and, that after he shall be chosen, the Congress, together with the President, should, without Delay, proceed to execute this<\/td><\/tr><tr><td>Constitution<\/td><\/tr><tr><td rowSpan=3><\/td><td>By the unanimous Order of the Convention<\/td><\/tr><tr><td>Go. Washington-Presidt:<\/td><\/tr><tr><td>W. JACKSON Secretary.<\/td><\/tr><\/table>  * Language in brackets has been changed by amendment. CONSTITUTION OF THE UNITED STATES THE AMENDMENTS TO THE CONSTITUTION OF THE UNITED STATES AS RATIFIED BY THE STATES Preamble to the Bill of Rights Congress of the United States begun and held at the City of New-York, on Wednesday the fourth of March, THE Conventions of a number of the States, having at the time of their adopting the Constitution, expressed a desire, in order to prevent misconstruction or abuse of its powers, that further declaratory and restrictive clauses should be added: And as extending the ground of public confidence in the Government, will best ensure the beneficent ends of its institution RESOLVED by the Senate and House of Representatives of the United States of America, in Congress assembled, two ","truth":"An order, resolution, or vote that requires the concurrence of the Senate and House of Representatives must be presented to the President of the United States. It shall take effect only if approved by the President, or if disapproved, it must be repassed by two-thirds of the Senate and House of Representatives according to the rules and limitations prescribed in the case of a bill.\n[constitution.pdf#page=4]","gpt_relevance":null,"gpt_coherence":null,"gpt_groundedness":null}

Any log messages given by the failure

See above

Expected/desired behavior

Should generate a result for the evaluation

OS and Version?

Windows 10

Versions

Mention any other details that might be useful



How to compute mean RAG latency?

Hi there,

The code uses evaluate from azure.ai.generative.evaluate to compute RAG metrics (gpt_coherence, gpt_relevance, and gpt_groundedness).

One important RAG metric is latency; the model, retrieval, and ranking technique used can all affect it.
I've checked the evaluate function and can't see latency among the metrics.

I tried to just compute the time needed to run the evaluation and divide it by the number of questions, but the evaluation also computes GPT metrics, which take a lot of time, so the latency results are far from accurate. The idea is to have a mean latency for each experiment as a metric.

Any tip on how to compute it?
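One option: a per-question latency field appears in the metric responses quoted earlier in this document, so if your eval_results.jsonl rows include it, a mean can be computed after a run instead of timing the whole evaluation (which also includes the slow GPT metric calls). A sketch, assuming each result line carries a numeric latency field (the path is a placeholder):

import json
from pathlib import Path

def mean_latency(results_path: str) -> float:
    # Average the per-question latency recorded in eval_results.jsonl.
    latencies = [
        json.loads(line)["latency"]
        for line in Path(results_path).read_text().splitlines()
        if line.strip()
    ]
    return sum(latencies) / len(latencies)

print(mean_latency("my_results/experiment1/eval_results.jsonl"))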

Use the Evaluations app when login is required in the chat app

This issue is for a:

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [X] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

I would like to use this app where the chat app has the optional login enabled, set via
azd env set AZURE_USE_AUTHENTICATION true
Which settings do I have to change to allow the Evaluations app to access the chat app when login is required?

Any log messages given by the failure

None

Expected/desired behavior

The Evaluations app should work as when the optional login is disabled.

OS and Version?

I am using Windows 11 but everything is deployed on Azure.
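The scripts POST directly to target_url, so when AZURE_USE_AUTHENTICATION is enabled the request needs a valid bearer token; the scripts may not support this out of the box. A sketch of the general approach only, with the token scope as a placeholder for your chat app's Entra ID app registration:

import requests
from azure.identity import AzureDeveloperCliCredential

# Placeholder scope: use the one exposed by your chat app's app
# registration, e.g. "api://<client-id>/.default".
token = AzureDeveloperCliCredential().get_token("api://<client-id>/.default")
response = requests.post(
    "https://<your-app>.azurewebsites.net/chat",
    headers={"Authorization": f"Bearer {token.token}"},
    json={"messages": [{"role": "user", "content": "What is in your knowledge base?"}]},
)
print(response.status_code)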

Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters

This issue is for a:

- [x] bug report 
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. git clone https://github.com/Azure-Samples/ai-rag-chat-evaluator/
  2. Run python3 -m scripts evaluate --config=example_config.json --numquestions=2

Log messages given by the failure

(INFO) azureml-metrics: [azureml-metrics] ActivityStarted: compute_metrics-qa, ActivityType: ComputeMetrics, CustomDimensions: {'app_name': 'azureml-metrics', 'task_type': 'qa', 'azureml_metrics_run_id': 'XXXXX', 'current_timestamp': 'XXXX'}
(WARNING) azureml.metrics.text.qa.azureml_qa_metrics: LLM related metrics need llm_params to be computed. Computing metrics for ['gpt_groundedness', 'gpt_coherence', 'gpt_relevance']
(INFO) azureml.metrics.common._validation: QA metrics debug: {'y_test_length': 2, 'y_pred_length': 2, 'tokenizer_example_output': 'the quick brown fox jumped over the lazy dog', 'regexes_to_ignore': '', 'ignore_case': False, 'ignore_punctuation': False, 'ignore_numbers': False}
0%| | 0/2 [00:00<?, ?it/s]
(WARNING) azureml.metrics.common.llm_connector._openai_connector: Computing gpt based metrics failed with the exception : 'charmap' codec can't encode characters in position 6-92: character maps to <undefined>
(ERROR) azureml.metrics.common._scoring: Scoring failed for QA metric gpt_groundedness
(ERROR) azureml.metrics.common._scoring: Class: NameError
Message: name 'NotFoundError' is not defined
....

Expected/desired behavior

No error; integer values for the metrics "gpt_groundedness", "gpt_coherence", and "gpt_relevance"

OS and Version?

Windows 10

Versions

azureml-metrics[generative-ai]==0.0.43
azure-ai-generative==1.0.0b2
openai==0.28.1

Other information

Results in eval_results.jsonl:
{"question":"...","answer":"...","context":"...","truth":"...","gpt_groundedness":null,"gpt_coherence":null,"gpt_relevance":null}

All relevant files are in UTF-8
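A possible workaround, assuming the failure comes from writes that use the Windows default code page instead of UTF-8, is to run Python in UTF-8 mode (PEP 540):

python -X utf8 -m scripts evaluate --config=example_config.json --numquestions=2

Setting the environment variable PYTHONUTF8=1 before launching has the same effect.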

Custom prompt for generating ground truth questions?


This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Mention any other details that might be useful

I see that the tool generates questions and answers using a pre-defined prompt template. Is there any way to modify that template so that the tool uses a customized prompt? For our use case, we need the response in a very specific section-wise format, and we would also be evaluating the response in that format.
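The generator's prompt is not currently exposed as an option, but qa.jsonl is a simple format, so one hedged workaround is to generate pairs yourself with a custom prompt and write the same file layout. A sketch, assuming the openai 1.x SDK and the env vars from .env; the prompt wording and paths are placeholders:

import json
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=f"https://{os.environ['AZURE_OPENAI_SERVICE']}.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-02-01",
)

SYSTEM_PROMPT = (
    "Based only on the provided source text, write one question and an answer "
    "in our section-wise format. Reply with JSON only, using the keys "
    "'question' and 'truth'."
)

def qa_pair(source_text: str) -> dict:
    response = client.chat.completions.create(
        model=os.environ["AZURE_OPENAI_EVAL_DEPLOYMENT"],
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": source_text},
        ],
    )
    return json.loads(response.choices[0].message.content)

with open("example_input/qa.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(qa_pair("<source document text>")) + "\n")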



Can you add a link for the overrides?


This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

I can't find a list of the overrides. I looked in azure-search-openai-demo as well. I'm adding screenshots of what I did find.


https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/customization.md



AZURE_SEARCH_API_KEY mismatch

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Set up the environment and launch python3 -m scripts generate --output=example_input/qa.jsonl --numquestions=200 --persource=5

Any log messages given by the failure

'WWW-Authenticate': 'Bearer realm="Azure Cognitive Search" error="invalid_token" error_description="Authentication token failed validation."'

Expected/desired behavior

N/A

OS and Version?

Windows 11. Linux (Github Codespaces).

Versions

main

Mention any other details that might be useful

The .env.sample provides the variable AZURE_SEARCH_KEY while service_setup.py looks for AZURE_SEARCH_API_KEY
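Until the two names are aligned, a workaround is to define both spellings in .env so that whichever one service_setup.py reads is populated:

AZURE_SEARCH_KEY="<your-search-key>"
AZURE_SEARCH_API_KEY="<your-search-key>"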



new clone today doesn't run generate - AttributeError: module 'azureml.metrics.constants' has no attribute 'QUESTION_ANSWERING'


This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. open dev container
  2. update .env with values
  3. update example_config with values
  4. run python3 -m scripts generate --output=example_input/qa.jsonl --numquestions=14 --persource=2
vscode ➜ /workspaces/ai-rag-chat-evaluator $ python3 -m scripts generate --output=example_input/qa.jsonl --numquestions=14 --persource=2
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspaces/ai-rag-chat-evaluator/scripts/__main__.py", line 3, in <module>
    from .cli import app
  File "/workspaces/ai-rag-chat-evaluator/scripts/cli.py", line 8, in <module>
    from .evaluate import run_evaluate_from_config
  File "/workspaces/ai-rag-chat-evaluator/scripts/evaluate.py", line 8, in <module>
    from azure.ai.generative.evaluate import evaluate
  File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/generative/evaluate/__init__.py", line 8, in <module>
    from ._evaluate import evaluate
  File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/generative/evaluate/_evaluate.py", line 22, in <module>
    from azure.ai.generative.evaluate._metric_handler import MetricHandler
  File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/generative/evaluate/_metric_handler.py", line 6, in <module>
    from azure.ai.generative.evaluate._constants import TYPE_TO_KWARGS_MAPPING
  File "/home/vscode/.local/lib/python3.11/site-packages/azure/ai/generative/evaluate/_constants.py", line 13, in <module>
    QA: constants.QUESTION_ANSWERING,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'azureml.metrics.constants' has no attribute 'QUESTION_ANSWERING'
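This AttributeError at import time usually points to an azureml-metrics release that is incompatible with the installed azure-ai-generative, rather than anything in this repo's scripts. One hedged approach is to reinstall the repo's pinned requirements, or to pin the combination listed in another issue in this document:

python -m pip install -r requirements.txt
python -m pip install "azureml-metrics[generative-ai]==0.0.43" "azure-ai-generative==1.0.0b2"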

Any log messages given by the failure

Expected/desired behavior

No errors.

OS and Version?

Tried both Windows and Mac, running the container.

Versions

Mention any other details that might be useful



Missing option '--numquestions'

This issue is for a:

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

run evaluation on all test data: python3 -m scripts evaluate --config=example_config.json

Any log messages given by the failure

vscode ➜ /workspaces/ai-rag-chat-evaluator (main) $ python3 -m scripts evaluate --config=example_config.json
Usage: python -m scripts evaluate [OPTIONS]
Try 'python -m scripts evaluate --help' for help.
Error --> Missing option '--numquestions'.

Expected/desired behavior

No error: the script should perform the evaluation on all records provided in qa.jsonl.

OS and Version?

Windows 10

Mention any other details that might be useful

Evaluation works if the number of questions is provided via '--numquestions='.
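The tracebacks elsewhere in this document show the CLI is built with typer, so here is a hedged sketch of how the option could be declared optional (parameter names assumed to mirror scripts/cli.py):

from pathlib import Path
from typing import Optional

import typer

app = typer.Typer()

@app.command()
def evaluate(
    config: Path = typer.Option(..., help="Path to the config JSON"),
    # A default of None means "evaluate every question in the test data".
    numquestions: Optional[int] = typer.Option(None, help="Limit the number of questions"),
):
    ...

With a None default, the script can skip the limiting step whenever the flag is absent.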

How can I create QA pairs in another language?

I use the qa_generator = QADataGenerator(model_config=openai_config) with Brazilian Portuguese texts, but the QA pairs generated are always in English. Do you know how I can fix that?
I could not find any place where I could tweak the prompt to force outputs in Portuguese.
I am using Azure OpenAI with GPT-4.

This issue is for a: (mark with an x)

- [x] documentation issue or request
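QADataGenerator's built-in prompts appear to be English-only, so one hedged workaround is to post-process the generated pairs with a translation call. A sketch, assuming the openai 1.x SDK; the model name is a placeholder:

import json

from openai import OpenAI  # or AzureOpenAI, configured as elsewhere in this repo

client = OpenAI()

def translate_pair(pair: dict, language: str = "Brazilian Portuguese") -> dict:
    # Translate the values of a generated QA pair, keeping the JSON keys
    # ("question", "truth") unchanged.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": f"Translate the JSON values into {language}. "
                "Reply with JSON only, keeping the same keys.",
            },
            {"role": "user", "content": json.dumps(pair)},
        ],
    )
    return json.loads(response.choices[0].message.content)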

script: module 'openai' has no attribute 'error'


This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. azd up
  2. Run the script python3 -m scripts generate --output=example_input/qa.jsonl --numquestions=200 --persource=5 to generate ground truth data

Any log messages given by the failure

Expected/desired behavior

OS and Version?

macOS Sonoma 14.1.2 (M1), running the dev container

Versions

Mention any other details that might be useful



I assume this is something about env vars so here they are (truncated):

AZURE_ENV_NAME="dfberry-swedencentral-evaluator"
AZURE_LOCATION="swedencentral"
AZURE_OPENAI_EVAL_DEPLOYMENT="eval"
AZURE_OPENAI_RESOURCE_GROUP="dfberry-swedencentral-evaluatorlsadmiogsgtqg-rg"
AZURE_OPENAI_SERVICE="dfberry-swedencentral-evaluatorlsadmiogsgtqg-openai"
AZURE_RESOURCE_GROUP="dfberry-swedencentral-evaluatorlsadmiogsgtqg-rg"
AZURE_SUBSCRIPTION_ID="b57b253a-e19e-4a9c-..."
AZURE_TENANT_ID="51397421-..."
OPENAI_API_KEY=""
OPENAI_GPT_MODEL="gpt-4"
OPENAI_HOST="azure"
OPENAI_ORGANIZATION=""

I assume the missing key is why it doesn't work... reading the README again; I must have missed something.
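The openai 1.x SDK removed the openai.error module (exception classes moved to the package top level), so this error usually means the installed openai version is newer than the code expects. Pinning with python -m pip install "openai<1" is one workaround; a hedged compatibility shim is another:

try:
    # openai >= 1.0: exception classes live at the package top level
    from openai import APIError, RateLimitError
except ImportError:
    # openai < 1.0: exception classes live in the openai.error module
    from openai.error import APIError, RateLimitError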

Calculating overall metrics occasionally fails with error on return int(rating) >= 4

The console command looked like:

python3 -m scripts evaluate --config=example_config.json --numquestions=14

The error looked like:

2024-01-13 00:04:04 (INFO) scripts: Running evaluation from config /workspaces/ai-rag-chat-evaluator/example_config.json
2024-01-13 00:04:04 (INFO) scripts: Replaced results_dir in config with timestamp
2024-01-13 00:04:04 (INFO) scripts: Replaced prompt_template in config with contents of example_input/prompt_refined.txt
2024-01-13 00:04:05 (INFO) azure.identity._internal.decorators: AzureDeveloperCliCredential.get_token succeeded
2024-01-13 00:04:05 (INFO) scripts: Running evaluation using data from /workspaces/ai-rag-chat-evaluator/example_input/qa-2.jsonl
2024-01-13 00:04:05 (INFO) scripts: Limiting evaluation to 14 questions
2024-01-13 00:04:05 (INFO) scripts: Sending a test question to the target to ensure it is running...
2024-01-13 00:04:28 (INFO) scripts: Starting evaluation...
Fail writing properties '{'_azureml.evaluation_run': 'azure-ai-generative-parent'}' to run history: 'FileStore' object has no attribute 'get_host_creds'
2024-01-13 00:04:28 (WARNING) azureml-metrics: Computing metrics for {'gpt_relevance', 'gpt_groundedness', 'gpt_coherence'} as y_test is None.
2024-01-13 00:04:28 (INFO) azureml-metrics: Setting max_concurrent_requests to 4 for computing GPT based question answering metrics
2024-01-13 00:04:28 (INFO) azureml-metrics: [azureml-metrics] ActivityStarted: compute_metrics-qa, ActivityType: ComputeMetrics, CustomDimensions: {'app_name': 'azureml-metrics', 'task_type': 'qa', 'azureml_metrics_run_id': 'c4410548-31bf-4409-a474-f327248fd0ca', 'current_timestamp': '2024-01-13 00:04:28'}
2024-01-13 00:04:28 (WARNING) azureml.metrics.text.qa.azureml_qa_metrics: LLM related metrics need llm_params to be computed. Computing metrics for ['gpt_relevance', 'gpt_groundedness', 'gpt_coherence']
2024-01-13 00:04:28 (INFO) azureml.metrics.common._validation: QA metrics debug: {'y_test_length': 0, 'y_pred_length': 14, 'tokenizer_example_output': 'the quick brown fox jumped over the lazy dog', 'regexes_to_ignore': '', 'ignore_case': False, 'ignore_punctuation': False, 'ignore_numbers': False}
2024-01-13 00:04:28 (WARNING) azureml.metrics.text.qa._seq2seq_qa: gpt_relevance metric is not applicable as it needs question and context for every example.
2024-01-13 00:04:28 (ERROR) azureml.metrics.rai.groundedness._groundedness_base: Contexts not provided.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:05<00:00,  2.46it/s]
/home/vscode/.local/lib/python3.11/site-packages/azureml/metrics/common/utilities.py:293: RuntimeWarning: Mean of empty slice
  metrics_result[constants.Metric.Metrics][mean_metric_name] = np.nanmean(metric_value)
/home/vscode/.local/lib/python3.11/site-packages/azureml/metrics/common/utilities.py:294: RuntimeWarning: All-NaN slice encountered
  metrics_result[constants.Metric.Metrics][median_metric_name] = np.nanmedian(metric_value)
2024-01-13 00:04:34 (INFO) azureml-metrics: [azureml-metrics] ActivityCompleted: Activity=compute_metrics-qa, HowEnded=SUCCESS, Duration=5735.52[ms]
Fail writing properties '{'_azureml.evaluate_artifacts': '[{"path": "eval_results.jsonl", "type": "table"}]'}' to run history: 'FileStore' object has no attribute 'get_host_creds'
2024-01-13 00:04:34 (INFO) scripts: Evaluation calls have completed. Calculating overall metrics now...
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspaces/ai-rag-chat-evaluator/scripts/__main__.py", line 6, in <module>
    app()
  File "/workspaces/ai-rag-chat-evaluator/scripts/cli.py", line 27, in evaluate
    run_evaluate_from_config(Path.cwd(), config, numquestions)
  File "/workspaces/ai-rag-chat-evaluator/scripts/evaluate.py", line 175, in run_evaluate_from_config
    run_evaluation(
  File "/workspaces/ai-rag-chat-evaluator/scripts/evaluate.py", line 117, in run_evaluation
    if passes_threshold(question_with_rating[metric_name]):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/ai-rag-chat-evaluator/scripts/evaluate.py", line 109, in passes_threshold
    return int(rating) >= 4
           ^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

eval_results.jsonl is

{"question":"What is the focus of the Financial Market Analysis Report 2023 prepared by Contoso Financial Analytics?","answer":"The Financial Market Analysis Report 2023 prepared by Contoso Financial Analytics provides a deep dive into the financial markets of 2023, focusing on the trends and fluctuations within stocks, cryptocurrencies, and commodities. It covers historical patterns, current market conditions, and future predictions.\n[Financial Market Analysis Report 2023.pdf#page=1]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"What does the global financial market consist of according to the Financial Market Analysis Report 2023?","answer":"According to the Financial Market Analysis Report 2023, the global financial market is a vast and intricate network of exchanges, instruments, and assets, ranging from traditional stocks and bonds to modern cryptocurrencies and commodities.[Financial Market Analysis Report 2023.pdf#page=1]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"What is the Northwind Standard plan offered by Contoso in partnership with Northwind Health and what does it provide?","answer":"The Northwind Standard plan is a health insurance plan offered by Contoso in partnership with Northwind Health. This plan provides a balance billing protection, meaning that you are protected from unexpected costs when visiting in-network providers.\n[Northwind_Standard_Benefits_Details.pdf#page=7]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"What is balance billing and how does the Northwind Standard plan protect against it? What is an in-network provider and what happens if you visit an out-of-network provider?","answer":"Balance billing is when a provider bills you for the difference between what they charge and what your insurance company is willing to pay. This difference can be hundreds of dollars and is often unexpected. The Northwind Standard plan protects you from balance billing when visiting in-network providers. In-network providers are healthcare providers, such as doctors, hospitals, and labs, that have agreed to accept the Northwind Standard plan's payment terms. Visiting an in-network provider means that you are protected from balance billing. If you visit an out-of-network provider, you may not be protected from balance billing.[Northwind_Standard_Benefits_Details.pdf#page=7]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"What is the difference between in-network and out-of-network providers in the context of Northwind Health?","answer":"In-network providers have agreed to accept the amount of payment offered by Northwind Health and you pay the portion of the cost that is not covered by the insurance plan. They may also offer additional services or discounts that are not available to out-of-network providers. Out-of-network providers, on the other hand, have not contracted with Northwind Health and are not required to accept the amount of payment offered by Northwind Health. This means you may be responsible for a greater portion of the cost and they may not offer additional services or discounts that are available to in-network providers.\n[Northwind_Standard_Benefits_Details.pdf#page=8]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"What should one consider when choosing a provider in the context of Northwind Health?","answer":"When choosing a provider, it is important to make sure that the provider is in-network. While it is possible to receive care from out-of-network providers, it is important to understand that you will be responsible for a greater portion of the costs. To make sure that you are getting the best value for your health care expenses, it is recommended that you choose an in-network provider whenever possible.[Northwind_Standard_Benefits_Details.pdf#page=8]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"What is the financial implication of selecting an in-network provider under Northwind Standard plan?","answer":"When selecting an in-network provider, you will likely only have to pay a portion of the cost for services, often referred to as a \"co-pay\" or \"co-insurance\". Different types of services may require different co-pays or co-insurance, so it's best to contact your provider to understand what the cost will be.\n[Northwind_Standard_Benefits_Details.pdf#page=10]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"What should one expect when choosing an out-of-network provider or services not covered under the Northwind Standard plan?","answer":"If you choose an out-of-network provider, you may be responsible for the entire cost of services, or a far greater portion of the cost than you would with an in-network provider. In some cases, Northwind Standard may offer coverage for out-of-network providers, but you may still have to pay more than you would for an in-network provider. Additionally, some providers may offer services that are outside of the Northwind Standard plan coverage. Depending on the services provided, you may have to pay out-of-pocket for services that are not covered.[Northwind_Standard_Benefits_Details.pdf#page=10]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"What are the limitations of the Northwind Standard plan in relation to clinical trials?","answer":"The Northwind Standard plan does not cover travel expenses associated with attending clinical trials. Additionally, any experimental treatments or services that are not part of the clinical trial are not covered.\n[Northwind_Standard_Benefits_Details.pdf#page=24]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"What should members consider before participating in a clinical trial and how can Northwind Health assist?","answer":"Before participating in a clinical trial, members should consult with their doctor to make sure it's the right choice for them, understand the risks and benefits, be aware that not all clinical trials are free and some may require a fee. They should also note that clinical trials are often available in limited locations, so they may have to travel. Northwind Health can provide information about nearby clinical trials and may be able to help with the cost of travel.[Northwind_Standard_Benefits_Details.pdf#page=24]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"What are the gender affirming care services offered by Contoso through Northwind Standard?","answer":"Contoso offers coverage through Northwind Standard for a variety of gender affirming care services related to gender transition. These services may include hormone therapy, gender affirming surgeries, and mental health care related to gender transition.\n[Northwind_Standard_Benefits_Details.pdf#page=29]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"Can you explain the different types of gender affirming treatments and services offered by Contoso?","answer":"Contoso offers three main types of gender affirming treatments and services. Hormone therapy is a treatment that involves taking hormones typically associated with a certain gender to help an individual's body better reflect their gender identity. Gender affirming surgery involves surgical procedures to help an individual transition to the gender they identify with. Mental health care related to gender transition can include counseling and therapy services to help an individual through their transition process.[Northwind_Standard_Benefits_Details.pdf#page=29]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"What is the company's stance on providing coverage for hearing care services to its employees?","answer":"Contoso is committed to providing comprehensive coverage for hearing care services to its employees through the Northwind Standard plan. This includes coverage for all types of hearing care services, such as hearing tests, hearing aids and accessories, cochlear implants, and bone-anchored hearing aids.\n[Northwind_Standard_Benefits_Details.pdf#page=30]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}
{"question":"What specific services and items are covered under the Northwind Standard plan for hearing care, and what are the limits on these coverages?","answer":"The Northwind Standard plan covers all types of hearing care services, including hearing tests, hearing aids and hearing aid accessories, as well as cochlear implants and bone-anchored hearing aids. If a hearing aid or cochlear implant is needed, the plan covers up to two hearing aids or implants per ear every three years. The plan also covers routine hearing aid maintenance and repair, as well as counseling and hearing aid evaluations. Additionally, the plan covers up to $500 per ear for hearing aid accessories, such as ear molds, batteries, and tubing.[Northwind_Standard_Benefits_Details.pdf#page=30]","gpt_relevance":null,"gpt_groundedness":null,"gpt_coherence":5}

pip install for review_tools in postCreateCommand?


This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

I tried to run the review_tools summary but I get an error: `No module named 'textual'`. I assume the dev container (or I) needs to run pip install in the review_tools dir? It isn't in the README.

If I try to install, I get

Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not find a version that satisfies the requirement thontextual (from versions: none)
ERROR: No matching distribution found for thontextual
[notice] A new release of pip is available: 23.2.1 -> 23.3.2
[notice] To update, run: pip install --upgrade pip
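Note the pip error mentions 'thontextual', which looks like a typo for the package name. Assuming the review tools are built on textual, installing it directly may unblock you until the dev container installs the review_tools dependencies:

python3 -m pip install textual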


Credential failure on evaluate script


This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. azd up
  2. Look at .env
AZURE_ENV_NAME="dfberry-py-evaluations"
AZURE_LOCATION="swedencentral"
AZURE_OPENAI_EVAL_DEPLOYMENT="eval"
AZURE_OPENAI_RESOURCE_GROUP="dfberry-py-evaluationstrvkoypjkqie6-rg"
AZURE_OPENAI_SERVICE="dfberry-py-evaluationstrvkoypjkqie6-openai"
AZURE_RESOURCE_GROUP="dfberry-py-evaluationstrvkoypjkqie6-rg"
AZURE_SUBSCRIPTION_ID="2123cb08-23fa-4a80-afec-4ec9e015433f"
AZURE_TENANT_ID="72f988bf-86f1-41af-91ab-2d7cd011db47"
OPENAI_GPT_MODEL="gpt-4"
OPENAI_HOST="azure"
OPENAI_KEY=""
OPENAI_ORGANIZATION=""
  3. Run the script: python3 -m scripts generate --output=my_input/qa.jsonl --numquestions=14 --persource=2
  4. Run the script: python3 -m scripts evaluate --config=my_config.json --numquestions=14 with the following my_config.json:
{
    "testdata_path": "my_input/qa.jsonl",
    "results_dir": "my_results/experiment_refined",
    "target_url": "https://app-backend-uchshrwcihlhg.azurewebsites.net/chat",
    "target_parameters": {
        "overrides": {
            "semantic_ranker": false,
            "prompt_template": "<READFILE>my_input/prompt_refined.txt"
        }
    }
}
  5. Get the following error:
diberry ➜ /workspaces/ai-rag-chat-evaluator (main) $ python3 -m scripts evaluate --config=my_config.json --numquestions=14
[22:43:56] INFO     Running evaluation from config                                               evaluate.py:207
                    /workspaces/ai-rag-chat-evaluator/my_config.json                                            
           INFO     Replaced prompt_template in config with contents of                          evaluate.py:201
                    my_input/prompt_refined.txt                                                                 
           INFO     Using Azure OpenAI Service with Azure Developer CLI Credential           service_setup.py:27
           INFO     Running evaluation using data from                                            evaluate.py:77
                    /workspaces/ai-rag-chat-evaluator/my_input/qa.jsonl                                         
           INFO     Limiting evaluation to 14 questions                                           evaluate.py:80
           INFO     Sending a test question to the target to ensure it is running...              evaluate.py:83
[22:44:43] INFO     Successfully received response from target for question: "What information is evaluate.py:92
                    in your knowledge base?"                                                                    
                    "answer": "Our knowledge base includes information on custome..."                           
                    "context": "role_library.pdf#page=31:  Responsibilities: · Mai..."                          
           INFO     Sending a test chat completion to the GPT deployment to ensure it is         evaluate.py:102
                    running...                                                                                  
[22:44:46] INFO     Successfully received response from GPT: "Hello! How can I assist you        evaluate.py:109
                    today?"                                                                                     
           INFO     Starting evaluation...                                                       evaluate.py:114
[22:45:00] WARNING  DefaultAzureCredential failed to retrieve a token from the included           chained.py:123
                    credentials.                                                                                
                    Attempted credentials:                                                                      
                            EnvironmentCredential: EnvironmentCredential authentication                         
                    unavailable. Environment variables are not fully configured.                                
                    Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot               
                    to troubleshoot this issue.                                                                 
                            ManagedIdentityCredential: ManagedIdentityCredential authentication                 
                    unavailable. The requested identity has not been assigned to this resource.                 
                    Error: Unexpected response "{'error': 'invalid_request', 'error_description':               
                    'Identity not found'}"                                                                      
                            SharedTokenCacheCredential: SharedTokenCacheCredential authentication               
                    unavailable. No accounts were found in the cache.                                           
                            AzureCliCredential: Azure CLI not found on path                                     
                            AzurePowerShellCredential: PowerShell is not installed                              
                            AzureDeveloperCliCredential:                                                        
                    {"type":"consoleMessage","timestamp":"2024-05-13T22:45:00.062907884Z","data":               
                    {"message":"\nERROR: fetching token: failed to authenticate:\n(invalid_scope)               
                    AADSTS70011: The provided request must include a 'scope' input parameter. The               
                    provided value for the input parameter 'scope' is not valid. The scope                      
                    https://cognitiveservices.azure.com/ openid offline_access profile is not                   
                    valid. The scope format is invalid. Scope must be in a valid URI form                       
                    \u003chttps://example/scope\u003e or a valid Guid \u003cguid/scope\u003e.                   
                    Trace ID: ba0a99de-13ba-4ae6-975c-b465e1195500 Correlation ID:                              
                    6906592c-47f3-4e5b-9d23-8e221afc8ea6 Timestamp: 2024-05-13 22:45:00Z\n\n"}}                 
                                                                                                                
                    To mitigate this issue, please refer to the troubleshooting guidelines here                 
                    at https://aka.ms/azsdk/python/identity/defaultazurecredential/troubleshoot.                
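The AADSTS70011 message shows a token request for the scope 'https://cognitiveservices.azure.com/ openid offline_access profile', which lacks the '/.default' suffix Entra ID requires for resource scopes. That malformed scope likely comes from an underlying library, so upgrading azure-identity and the azure.ai.generative packages may help. For reference, a sketch of requesting the token with a valid scope:

from azure.identity import AzureDeveloperCliCredential, get_bearer_token_provider

# Entra ID requires the "/.default" suffix when requesting a token
# for the Cognitive Services resource.
token_provider = get_bearer_token_provider(
    AzureDeveloperCliCredential(),
    "https://cognitiveservices.azure.com/.default",
)
print(token_provider()[:12], "...")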
                    [... the same DefaultAzureCredential warning repeats 13 more times between
                    22:45:01 and 22:45:20, identical except for the timestamp, Trace ID, and
                    Correlation ID ...]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspaces/ai-rag-chat-evaluator/scripts/__main__.py", line 6, in <module>
    app()
  File "/home/vscode/.local/lib/python3.11/site-packages/typer/main.py", line 326, in __call__
    raise e
  File "/home/vscode/.local/lib/python3.11/site-packages/typer/main.py", line 309, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/typer/core.py", line 723, in main
    return _main(
           ^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/typer/core.py", line 193, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/typer/main.py", line 692, in wrapper
    return callback(**use_params)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/ai-rag-chat-evaluator/scripts/cli.py", line 47, in evaluate
    run_evaluate_from_config(Path.cwd(), config, numquestions, targeturl)
  File "/workspaces/ai-rag-chat-evaluator/scripts/evaluate.py", line 214, in run_evaluate_from_config
    evaluation_run_complete = run_evaluation(
                              ^^^^^^^^^^^^^^^
  File "/workspaces/ai-rag-chat-evaluator/scripts/evaluate.py", line 153, in run_evaluation
    questions_with_ratings.append(future.result())
                                  ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/ai-rag-chat-evaluator/scripts/evaluate.py", line 135, in evaluate_row
    result = metric.evaluator_fn(openai_config=openai_config)(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/promptflow/evals/evaluators/_groundedness/_groundedness.py", line 56, in __call__
    llm_output = self._flow(answer=answer, context=context)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/promptflow/_sdk/entities/_flows/prompty.py", line 57, in __call__
    return self._core_prompty(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/promptflow/tracing/_trace.py", line 469, in wrapped
    output = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/promptflow/core/_flow.py", line 410, in __call__
    response = send_request_to_llm(api_client, self._model.api, params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/promptflow/core/_prompty_utils.py", line 164, in send_request_to_llm
    result = client.chat.completions.create(**parameters)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 277, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 590, in create
    return self._post(
           ^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/openai/_base_client.py", line 1240, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/openai/_base_client.py", line 921, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/openai/_base_client.py", line 939, in _request
    self._prepare_options(options)
  File "/home/vscode/.local/lib/python3.11/site-packages/openai/lib/azure.py", line 288, in _prepare_options
    azure_ad_token = self._get_azure_ad_token()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/openai/lib/azure.py", line 274, in _get_azure_ad_token
    token = provider()
            ^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/promptflow/core/_connection.py", line 247, in get_token
    return self._token_provider.get_token()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/promptflow/_core/token_provider.py", line 52, in get_token
    return self.credential.get_token(audience).token
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/azure/identity/_credentials/default.py", line 225, in get_token
    token = super().get_token(*scopes, claims=claims, tenant_id=tenant_id, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/azure/identity/_credentials/chained.py", line 124, in get_token
    raise ClientAuthenticationError(message=message)
azure.core.exceptions.ClientAuthenticationError: DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
        EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
        ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable. The requested identity has not been assigned to this resource. Error: Unexpected response "{'error': 'invalid_request', 'error_description': 'Identity not found'}"
        SharedTokenCacheCredential: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
        AzureCliCredential: Azure CLI not found on path
        AzurePowerShellCredential: PowerShell is not installed
        AzureDeveloperCliCredential: {"type":"consoleMessage","timestamp":"2024-05-13T22:45:02.593558324Z","data":{"message":"\nERROR: fetching token: failed to authenticate:\n(invalid_scope) AADSTS70011: The provided request must include a 'scope' input parameter. The provided value for the input parameter 'scope' is not valid. The scope https://cognitiveservices.azure.com/ openid offline_access profile is not valid. The scope format is invalid. Scope must be in a valid URI form \u003chttps://example/scope\u003e or a valid Guid \u003cguid/scope\u003e. Trace ID: 9175eb02-a11c-464e-86cb-5642c8716600 Correlation ID: 64131b49-df44-4386-8781-83bd7688020b Timestamp: 2024-05-13 22:45:02Z\n\n"}}

To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/python/identity/defaultazurecredential/troubleshoot.
  1. To fix this, I tried switching azd to Azure CLI auth by running azd config set auth.useAzCliAuth "true" and then azd auth login again, but the Azure CLI isn't installed in this repo's Dev Container.
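For anyone hitting the same wall: the AADSTS70011 error in the log comes from requesting a token with a malformed scope ("https://cognitiveservices.azure.com/ openid offline_access profile" is a space-separated list rather than a single resource scope). A quick way to check whether any credential in the chain can mint a token for Azure OpenAI, independent of the evaluation run, is a small azure-identity snippet like the one below. This is only a diagnostic sketch, not part of the repo's scripts; it assumes the azure-identity package is installed in your environment.

    # Diagnostic sketch: can DefaultAzureCredential get a token for Azure OpenAI?
    import logging

    from azure.identity import DefaultAzureCredential

    # DEBUG logging surfaces which credential in the chain succeeds or fails.
    logging.basicConfig(level=logging.DEBUG)

    credential = DefaultAzureCredential()
    # Entra ID resource scopes take the form "<resource URI>/.default";
    # the failing request in the log above did not use this form.
    token = credential.get_token("https://cognitiveservices.azure.com/.default")
    print("Token acquired; expires at epoch:", token.expires_on)

If this snippet fails with the same chain of messages, the problem is the local environment (no Azure CLI on the path, no managed identity, stale azd login) rather than anything in evaluate.py.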
