Comments (3)

dosubot commented on July 30, 2024

The 'code evaluation' function in the 'automatic evaluation' group fails because the direct_use attribute is set to False. To fix this, set direct_use to True for the 'Code Evaluation' evaluator. Here is the corrected code snippet:

{
    "name": "Code Evaluation",
    "key": "auto_custom_code_run",
    "direct_use": True,  # Change this to True
    "settings_template": {
        "code": {
            "label": "Evaluation Code",
            "type": "code",
            "default": "from typing import Dict\n\ndef evaluate(\n    app_params: Dict[str, str],\n    inputs: Dict[str, str],\n    output: str,\n    correct_answer: str\n) -> float:\n    # ...\n    return 0.75  # Replace with your calculated score",
            "description": "Code for evaluating submissions",
            "required": True,
        }
    },
    "description": "Code Evaluation allows you to write your own evaluator in Python. You need to provide the Python code for the evaluator.",
}
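
As an illustration of what the "code" setting might contain, here is a minimal sketch of an evaluator that follows the evaluate signature from the default template above. The case-insensitive exact-match scoring rule is an assumption chosen for the example; the template itself does not prescribe any particular scoring.

```python
from typing import Dict


def evaluate(
    app_params: Dict[str, str],
    inputs: Dict[str, str],
    output: str,
    correct_answer: str,
) -> float:
    # Hypothetical scoring rule: 1.0 on a case-insensitive exact match, else 0.0.
    matches = output.strip().lower() == correct_answer.strip().lower()
    return 1.0 if matches else 0.0
```

Any function with this signature that returns a float should fit the template in the same way; only the body changes.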

Additionally, ensure that the evaluation scenarios are correctly created and updated in the backend. The agenta-backend service provides functions to handle evaluation scenarios, such as create_evaluation_scenario and update_human_evaluation_scenario. Make sure these functions are properly integrated and invoked in your workflow to manage evaluation scenarios effectively [1][2].

Moreover, ensure that the frontend components are correctly fetching and displaying the evaluation results. The fetchEvaluations function in AutomaticEvaluationResult.tsx is responsible for loading evaluations and their results. Verify that this function is correctly mapping and transforming the evaluation data, and that the custom_code_eval_id is properly set and used in the evaluation process [3].

The error message "execute_code_script() got an unexpected keyword argument 'data_point'" suggests a mismatch between the arguments passed to execute_code_script and the parameters it declares. Ensure that the function definition matches the arguments being provided. The auto_custom_code_run function should handle errors using a try-except block, as shown below:

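from typing import Any, Dict

# Result, Error, and sandbox come from the agenta-backend codebase
# (their import paths are not shown in this thread).

def auto_custom_code_run(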
    inputs: Dict[str, Any],
    output: str,
    data_point: Dict[str, Any],
    app_params: Dict[str, Any],
    settings_values: Dict[str, Any],
    lm_providers_keys: Dict[str, Any],  # pylint: disable=unused-argument
) -> Result:
    try:
        result = sandbox.execute_code_safely(
            app_params=app_params,
            inputs=inputs,
            output=output,
            data_point=data_point,
            code=settings_values["code"],
        )
        return Result(type="number", value=result)
    except Exception as e:  # pylint: disable=broad-except
        return Result(
            type="error",
            value=None,
            error=Error(
                message="Error during Auto Custom Code Evaluation", stacktrace=str(e)
            ),
        )

Ensure that the execute_code_script function is defined to accept the data_point argument [4].
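
The error can be reproduced in isolation. The sketch below uses a hypothetical stand-in for execute_code_script (its real signature is not shown in this thread, so the parameter list here is an assumption) to show that Python raises a TypeError whenever a keyword argument is not among the callee's parameters.

```python
def execute_code_script(app_params, inputs, output, code):
    # Hypothetical stand-in: note that there is no data_point parameter.
    return 0.0


def call_with_data_point():
    """Call the stand-in with an extra keyword and return the error text."""
    try:
        execute_code_script(
            app_params={},
            inputs={},
            output="",
            code="",
            data_point={},  # not accepted by the stand-in above
        )
    except TypeError as exc:
        return str(exc)
    return None


print(call_with_data_point())
```

The printed message names the rejected keyword, which is the same kind of message quoted in the error above. Either the caller must stop passing data_point or the callee must declare it.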

from agenta.

shhhhhh2001 commented on July 30, 2024

After checking your code, I think the variable "data_point" shouldn't be passed to "sandbox.execute_code_safely", because that function doesn't have a parameter called "data_point".
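
One way to confirm a mismatch like this is to compare the keyword names against the callee's signature before calling it. The sketch below is a generic check built on the standard library's inspect module. The execute_code_safely defined here is a hypothetical stand-in whose parameter list is assumed for illustration, not the actual agenta sandbox function, and unsupported_kwargs is a helper name invented for this example.

```python
import inspect


def execute_code_safely(app_params, inputs, output, code):
    # Hypothetical stand-in; the real parameter list may differ.
    return 0.0


def unsupported_kwargs(func, kwargs):
    """Return the keyword names that func would reject."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing is unsupported.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    # Only these two kinds of parameter can be passed by keyword.
    by_keyword = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {name for name, p in params.items() if p.kind in by_keyword}
    return [name for name in kwargs if name not in accepted]


call_kwargs = {"app_params": {}, "inputs": {}, "output": "", "data_point": {}, "code": ""}
print(unsupported_kwargs(execute_code_safely, call_kwargs))
```

Running the same check against the real function would show whether data_point is the only argument it rejects.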

mmabrouk commented on July 30, 2024

Thank you @shhhhhh2001, I am looking into this now.
