Comments (3)
The issue with the 'code evaluation' function in the 'automatic evaluation' group is due to the direct_use attribute being set to False. To fix this, change the direct_use attribute to True for the 'Code Evaluation' evaluator. Here is the corrected code snippet:
{
    "name": "Code Evaluation",
    "key": "auto_custom_code_run",
    "direct_use": True,  # Change this to True
    "settings_template": {
        "code": {
            "label": "Evaluation Code",
            "type": "code",
            "default": "from typing import Dict\n\ndef evaluate(\n    app_params: Dict[str, str],\n    inputs: Dict[str, str],\n    output: str,\n    correct_answer: str\n) -> float:\n    # ...\n    return 0.75  # Replace with your calculated score",
            "description": "Code for evaluating submissions",
            "required": True,
        }
    },
    "description": "Code Evaluation allows you to write your own evaluator in Python. You need to provide the Python code for the evaluator.",
}
Additionally, ensure that the evaluation scenarios are correctly created and updated in the backend. The agenta-backend service provides functions to handle evaluation scenarios, such as create_evaluation_scenario and update_human_evaluation_scenario. Make sure these functions are properly integrated and invoked in your workflow to manage evaluation scenarios effectively [1][2].
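For illustration, here is a minimal, hypothetical sketch of wiring those helpers into a workflow. Only the two function names come from the comment above; the module path, signatures, and payload fields shown here are assumptions, not the actual agenta-backend API:

# Hypothetical sketch: the module path, signatures, and fields below are
# assumptions for illustration; check the agenta-backend source for the
# real API of these helpers.
from agenta_backend.services import evaluation_service  # assumed path

async def record_scenario(evaluation_id: str, inputs: dict, output: str) -> None:
    # Create a scenario for this evaluation run (assumed signature).
    scenario = await evaluation_service.create_evaluation_scenario(
        evaluation_id=evaluation_id,
        inputs=inputs,
    )
    # Persist the result once the evaluator has run (assumed signature).
    await evaluation_service.update_human_evaluation_scenario(
        evaluation_scenario_id=scenario.id,
        updates={"outputs": [output]},
    )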
Moreover, ensure that the frontend components are correctly fetching and displaying the evaluation results. The fetchEvaluations function in AutomaticEvaluationResult.tsx is responsible for loading evaluations and their results. Verify that this function correctly maps and transforms the evaluation data, and that the custom_code_eval_id is properly set and used in the evaluation process [3].
The error message "execute_code_script() got an unexpected keyword argument 'data_point'" suggests that there is a mismatch between the arguments being passed to the execute_code_script function and its definition. Ensure that the function definition matches the arguments being provided. The auto_custom_code_run function should handle errors using a try-except block, as shown below:
from typing import Any, Dict

# Result, Error, and sandbox are provided by the agenta backend;
# their import paths are omitted here.

def auto_custom_code_run(
    inputs: Dict[str, Any],
    output: str,
    data_point: Dict[str, Any],
    app_params: Dict[str, Any],
    settings_values: Dict[str, Any],
    lm_providers_keys: Dict[str, Any],  # pylint: disable=unused-argument
) -> Result:
    try:
        # Run the user-supplied evaluator code inside the sandbox.
        result = sandbox.execute_code_safely(
            app_params=app_params,
            inputs=inputs,
            output=output,
            data_point=data_point,
            code=settings_values["code"],
        )
        return Result(type="number", value=result)
    except Exception as e:  # pylint: disable=broad-except
        # Surface any failure as an error Result instead of raising.
        return Result(
            type="error",
            value=None,
            error=Error(
                message="Error during Auto Custom Code Evaluation", stacktrace=str(e)
            ),
        )
Ensure that the execute_code_script function is defined to accept the data_point argument [4].
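As a concrete illustration of that last point, a signature along these lines would accept the argument the traceback complains about. This is a sketch of what the suggestion implies, not the actual agenta sandbox code:

# Sketch only: the real execute_code_script lives in agenta's sandbox
# module and may differ; this just shows a definition that accepts the
# data_point keyword. Sandboxing of the user code is omitted here.
from typing import Any, Dict

def execute_code_script(
    app_params: Dict[str, Any],
    inputs: Dict[str, Any],
    output: str,
    data_point: Dict[str, Any],  # the argument reported as unexpected
    code: str,
) -> float:
    # Execute the user-supplied evaluator code, then call its
    # evaluate() entry point with the supplied values.
    namespace: Dict[str, Any] = {}
    exec(code, namespace)
    return namespace["evaluate"](
        app_params, inputs, output, data_point.get("correct_answer", "")
    )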
To continue talking to Dosu, mention @dosu.
from agenta.
After checking your code, I think the variable "data_point" shouldn't be passed to "sandbox.execute_code_safely", as that function doesn't have a parameter called "data_point".
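A minimal sketch of the change this comment implies, reusing the variables from the snippet above: drop the data_point keyword from the call.

# Sketch of the proposed fix: omit data_point, since execute_code_safely
# does not define that parameter.
result = sandbox.execute_code_safely(
    app_params=app_params,
    inputs=inputs,
    output=output,
    code=settings_values["code"],
)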
from agenta.
Thank you @shhhhhh2001, I am looking into this now.
from agenta.
Related Issues (20)
- Enhancement: save variant tabs order
- [Bug]: 'Create a New Variant' modal input data do not get cleared HOT 1
- [AGE-321] [Bug] Not all rows in evaluation comparison cannot be expanded
- [AGE-345] [bug] Some models are not working in the playground
- [Enhancement]: Table row's delete button placement and UI HOT 3
- [AGE-348] [bug] Errors are not correctly handled in the LLM applications HOT 4
- [AGE-357] [bug] Removing a variant used in an A/B test evaluation breaks the human evaluation view
- [AGE-365] Add new status for evaluation Queued HOT 2
- [AGE-370] Improve reproducibility of AI critique outputs HOT 1
- Add Exception handling in Agenta Observability HOT 1
- [AGE-391] [bug] LLM invocation errors are not correctly shown in the evaluation detail view HOT 1
- [AGE-399] [bug] Test creation in single model eval does not work in table view
- [AGE-112] [Evaluations] start new evaluation from previous evaluation
- Add encryption to LLM provider keys before saving it
- how can Agenta be integrated with ollama LLM platform HOT 1
- [AGE-433] [bug] New exact match and JSON evaluators are created with each served variant HOT 4
- Can I deploy without using port 80 ? HOT 5
- [AGE-451] Agent deployed by yourself, how to configure it when you want to Create a Custom App? HOT 3
- After local deployment, when creating an app through the page, there is still an error message when opening it http://0.0.0.0:80 How to modify this for accessing HOT 3
- [AGE-452] [bug] create_new_evaluator_config not setting default correct_answer value