athina-ai / athina-evals

Python SDK for running evaluations on LLM generated responses

Home Page: https://docs.athina.ai

Python 100.00%
evaluation evaluation-framework evaluation-metrics llm-eval llm-evaluation llm-evaluation-toolkit llm-ops llmops

athina-evals's Introduction

Overview

Athina is an Observability and Experimentation platform for AI teams.

This SDK is an open-source repository of 50+ preset evals. You can also use custom evals.

This SDK also serves as a companion to Athina IDE, where you can prototype pipelines, run experiments and evaluations, and compare datasets.


Quick Start

Follow this notebook for a quick start guide.

To get an Athina API key, sign up at https://app.athina.ai


Run Evals

These evals can be run programmatically, or via the UI on Athina IDE.

Compare datasets side-by-side (Docs)

Once a dataset is logged to Athina IDE, you can also compare it against another dataset.

Once you run evals using Athina, the results are visible in Athina IDE, where you can run experiments and evals and compare datasets side-by-side.


Preset Evals

athina-evals's People

Contributors

akshat-g, shivsak, vivek-athina

athina-evals's Issues

An error occurred while posting eval results

Hi,
I was following these examples and testing groundedness for a single datapoint using the following code:

response = athina.evals.Groundedness(model="gpt-4o-mini").run(context=contexts, response=response, chat_history=chat_history)

The response looks fine, but the following message appears in the console:

An error occurred while posting eval results [
   {
     "code": "invalid_type",
     "expected": "string",
     "received": "undefined",
     "path": [
       0,
       "org_id"
     ],
     "message": "Required"
   },
   {
     "code": "invalid_type",
     "expected": "string",
     "received": "undefined",
     "path": [
       0,
       "workspace_slug"
     ],
     "message": "Required"
   }
 ] (Extra Info: No Details)
(The same error message is printed three more times.)

Am I doing something wrong?
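
The console output above is a JSON list of validation issues, each with a "path" and a "message". As a rough sketch, the payload can be parsed to list which fields are reported as required but missing. The helper name `missing_fields` is hypothetical and not part of athina-evals, and the sample string is one occurrence of the payload copied from the output above. This only summarizes the message; it does not explain why the fields are absent.

```python
import json

# One occurrence of the payload from the console output above.
RAW_ERROR = """
[
  {"code": "invalid_type", "expected": "string", "received": "undefined",
   "path": [0, "org_id"], "message": "Required"},
  {"code": "invalid_type", "expected": "string", "received": "undefined",
   "path": [0, "workspace_slug"], "message": "Required"}
]
"""


def missing_fields(raw: str) -> list[str]:
    """Hypothetical helper: return the field names flagged as required but undefined."""
    issues = json.loads(raw)
    return [
        str(issue["path"][-1])  # the last path element is the field name
        for issue in issues
        if issue.get("message") == "Required"
        and issue.get("received") == "undefined"
    ]


print(missing_fields(RAW_ERROR))  # ['org_id', 'workspace_slug']
```

For the output in this report, the two missing fields are org_id and workspace_slug.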
