Comments (3)
The issue with the 'code evaluation' function in the 'automatic evaluation' group is due to the direct_use attribute being set to False. To fix this, change the direct_use attribute to True for the 'Code Evaluation' evaluator. Here is the corrected code snippet:
{
    "name": "Code Evaluation",
    "key": "auto_custom_code_run",
    "direct_use": True,  # Change this to True
    "settings_template": {
        "code": {
            "label": "Evaluation Code",
            "type": "code",
            "default": "from typing import Dict\n\ndef evaluate(\n    app_params: Dict[str, str],\n    inputs: Dict[str, str],\n    output: str,\n    correct_answer: str\n) -> float:\n    # ...\n    return 0.75  # Replace with your calculated score",
            "description": "Code for evaluating submissions",
            "required": True,
        }
    },
    "description": "Code Evaluation allows you to write your own evaluator in Python. You need to provide the Python code for the evaluator.",
}
Additionally, ensure that the evaluation scenarios are correctly created and updated in the backend. The agenta-backend service provides functions to handle evaluation scenarios, such as create_evaluation_scenario and update_human_evaluation_scenario. Make sure these functions are properly integrated and invoked in your workflow to manage evaluation scenarios effectively [1][2].
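For illustration, here is a minimal, hypothetical sketch of wiring those helpers into a workflow. Only the two function names come from the comment above; the module path, signatures, and payload fields shown here are assumptions, not the actual agenta-backend API:

# Hypothetical sketch: the module path, signatures, and fields below are
# assumptions for illustration; check the agenta-backend source for the
# real API of these helpers.
from agenta_backend.services import evaluation_service  # assumed path

async def record_scenario(evaluation_id: str, inputs: dict, output: str) -> None:
    # Create a scenario for this evaluation run (assumed signature).
    scenario = await evaluation_service.create_evaluation_scenario(
        evaluation_id=evaluation_id,
        inputs=inputs,
    )
    # Persist the result once the evaluator has run (assumed signature).
    await evaluation_service.update_human_evaluation_scenario(
        evaluation_scenario_id=scenario.id,
        updates={"outputs": [output]},
    )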
Moreover, ensure that the frontend components are correctly fetching and displaying the evaluation results. The fetchEvaluations function in AutomaticEvaluationResult.tsx is responsible for loading evaluations and their results. Verify that this function correctly maps and transforms the evaluation data, and that the custom_code_eval_id is properly set and used in the evaluation process [3].
The error message "execute_code_script() got an unexpected keyword argument 'data_point'" suggests that there is a mismatch between the arguments being passed to the execute_code_script function and its definition. Ensure that the function definition matches the arguments being provided. The auto_custom_code_run function should handle errors using a try-except block, as shown below:
from typing import Any, Dict

# Result, Error, and sandbox are provided by the agenta backend;
# their import paths are omitted here.

def auto_custom_code_run(
    inputs: Dict[str, Any],
    output: str,
    data_point: Dict[str, Any],
    app_params: Dict[str, Any],
    settings_values: Dict[str, Any],
    lm_providers_keys: Dict[str, Any],  # pylint: disable=unused-argument
) -> Result:
    try:
        # Run the user-supplied evaluator code inside the sandbox.
        result = sandbox.execute_code_safely(
            app_params=app_params,
            inputs=inputs,
            output=output,
            data_point=data_point,
            code=settings_values["code"],
        )
        return Result(type="number", value=result)
    except Exception as e:  # pylint: disable=broad-except
        # Surface any failure as an error Result instead of raising.
        return Result(
            type="error",
            value=None,
            error=Error(
                message="Error during Auto Custom Code Evaluation", stacktrace=str(e)
            ),
        )
Ensure that the execute_code_script function is defined to accept the data_point argument [4].
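As a concrete illustration of that last point, a signature along these lines would accept the argument the traceback complains about. This is a sketch of what the suggestion implies, not the actual agenta sandbox code:

# Sketch only: the real execute_code_script lives in agenta's sandbox
# module and may differ; this just shows a definition that accepts the
# data_point keyword. Sandboxing of the user code is omitted here.
from typing import Any, Dict

def execute_code_script(
    app_params: Dict[str, Any],
    inputs: Dict[str, Any],
    output: str,
    data_point: Dict[str, Any],  # the argument reported as unexpected
    code: str,
) -> float:
    # Execute the user-supplied evaluator code, then call its
    # evaluate() entry point with the supplied values.
    namespace: Dict[str, Any] = {}
    exec(code, namespace)
    return namespace["evaluate"](
        app_params, inputs, output, data_point.get("correct_answer", "")
    )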
To continue talking to Dosu, mention @dosu.
from agenta.
After checking your code, I think the variable "data_point" shouldn't be passed to "sandbox.execute_code_safely", as that function doesn't have a parameter called "data_point".
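A minimal sketch of the change this comment implies, reusing the variables from the snippet above: drop the data_point keyword from the call.

# Sketch of the proposed fix: omit data_point, since execute_code_safely
# does not define that parameter.
result = sandbox.execute_code_safely(
    app_params=app_params,
    inputs=inputs,
    output=output,
    code=settings_values["code"],
)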
from agenta.
Thank you @shhhhhh2001, I am looking into this now.
from agenta.
Related Issues (20)
- Enhancement: save variant tabs order
- [Bug]: 'Create a New Variant' modal input data do not get cleared HOT 1
- [AGE-321] [Bug] Not all rows in evaluation comparison cannot be expanded
- [AGE-345] [bug] Some models are not working in the playground
- [Enhancement]: Table row's delete button placement and UI HOT 3
- [AGE-348] [bug] Errors are not correctly handled in the LLM applications HOT 4
- [AGE-357] [bug] Removing a variant used in an A/B test evaluation breaks the human evaluation view
- [AGE-365] Add new status for evaluation Queued HOT 2
- [AGE-370] Improve reproducibility of AI critique outputs HOT 1
- Add Exception handling in Agenta Observability HOT 1
- [AGE-391] [bug] LLM invocation errors are not correctly shown in the evaluation detail view HOT 1
- [AGE-399] [bug] Test creation in single model eval does not work in table view
- [AGE-112] [Evaluations] start new evaluation from previous evaluation
- Add encryption to LLM provider keys before saving it
- how can Agenta be integrated with ollama LLM platform HOT 1
- [AGE-433] [bug] New exact match and JSON evaluators are created with each served variant HOT 4
- Can I deploy without using port 80 ? HOT 5
- [AGE-451] Agent deployed by yourself, how to configure it when you want to Create a Custom App? HOT 3
- After local deployment, when creating an app through the page, there is still an error message when opening it http://0.0.0.0:80 How to modify this for accessing HOT 3
- [AGE-452] [bug] create_new_evaluator_config not setting default correct_answer value