agentops-ai / agentops

Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen

Home Page: https://agentops.ai

License: MIT License

Python 92.95% Jupyter Notebook 7.05%

agent agentops ai evals evaluation-metrics llm anthropic autogen cost-estimation crewai

agentops's Introduction

Observability and DevTool platform for AI Agents

🐦 Twitter • 📢 Discord • 🖇️ Dashboard • 📙 Documentation

AgentOps helps developers build, evaluate, and monitor AI agents. From prototype to production.


📊 Replay Analytics and Debugging	Step-by-step agent execution graphs
💸 LLM Cost Management	Track spend with LLM foundation model providers
🧪 Agent Benchmarking	Test your agents against 1,000+ evals
🔐 Compliance and Security	Detect common prompt injection and data exfiltration exploits
🤝 Framework Integrations	Native Integrations with CrewAI, AutoGen, & LangChain

Quick Start ⌨️

pip install agentops

Session replays in 2 lines of code

Initialize the AgentOps client and automatically get analytics on all your LLM calls.

Get an API key

import agentops

# Beginning of your program (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)

...

# End of program
agentops.end_session('Success')

All your sessions can be viewed on the AgentOps dashboard

Agent Debugging

Session Replays

Summary Analytics

First class Developer Experience

Add powerful observability to your agents, tools, and functions with as little code as possible: one line at a time.
Refer to our documentation

# Automatically associate all Events with the agent that originated them
from agentops import track_agent

@track_agent(name='SomeCustomName')
class MyAgent:
  ...

# Automatically create ToolEvents for tools that agents will use
from agentops import record_tool

@record_tool('SampleToolName')
def sample_tool(...):
  ...

# Automatically create ActionEvents for other functions.
from agentops import record_action

@record_action('sample function being recorded')
def sample_function(...):
  ...

# Manually record any other Events
from agentops import record, ActionEvent

record(ActionEvent("received_user_input"))

Integrations 🦾

CrewAI 🛶

Add observability to your Crew agents with only two lines of code. Simply set an AGENTOPS_API_KEY in your environment, and your crews will get automatic monitoring on the AgentOps dashboard.

pip install 'crewai[agentops]'

AutoGen 🤖

With only two lines of code, add full observability and monitoring to Autogen agents. Set an AGENTOPS_API_KEY in your environment and call agentops.init()

Langchain 🦜🔗

AgentOps works seamlessly with applications built using Langchain. To use the handler, install Langchain as an optional dependency:

Installation

pip install agentops[langchain]

To use the handler, import and set

import os
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from agentops.partners.langchain_callback_handler import LangchainCallbackHandler

AGENTOPS_API_KEY = os.environ['AGENTOPS_API_KEY']
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
handler = LangchainCallbackHandler(api_key=AGENTOPS_API_KEY, tags=['Langchain Example'])

llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY,
                 callbacks=[handler],
                 model='gpt-3.5-turbo')

# `tools` is the list of Langchain tools your agent uses (defined elsewhere)
agent = initialize_agent(tools,
                         llm,
                         agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
                         verbose=True,
                         callbacks=[handler], # You must pass in a callback handler to record your agent
                         handle_parsing_errors=True)

Check out the Langchain Examples Notebook for more details including Async handlers.

Cohere ⌨️

First-class support for Cohere (>=5.4.0). This is a living integration; should you need any added functionality, please message us on Discord!

Installation

pip install cohere

import cohere
import agentops

# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)
co = cohere.Client()

chat = co.chat(
    message="Is it pronounced ceaux-hear or co-hehray?"
)

print(chat)

agentops.end_session('Success')

import cohere
import agentops

# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)

co = cohere.Client()

stream = co.chat_stream(
    message="Write me a haiku about the synergies between Cohere and AgentOps"
)

for event in stream:
    if event.event_type == "text-generation":
        print(event.text, end='')

agentops.end_session('Success')

Anthropic ﹨

Track agents built with the Anthropic Python SDK (>=0.32.0).

Installation

pip install anthropic

import os
import anthropic
import agentops

# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)

client = anthropic.Anthropic(
    # This is the default and can be omitted
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
)

message = client.messages.create(
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": "Tell me a cool fact about AgentOps",
            }
        ],
        model="claude-3-opus-20240229",
    )
print(message.content)

agentops.end_session('Success')

Streaming

import os
import anthropic
import agentops

# Beginning of program's code (i.e. main.py, __init__.py)
agentops.init(<INSERT YOUR API KEY HERE>)

client = anthropic.Anthropic(
    # This is the default and can be omitted
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
)

stream = client.messages.create(
    max_tokens=1024,
    model="claude-3-opus-20240229",
    messages=[
        {
            "role": "user",
            "content": "Tell me something cool about streaming agents",
        }
    ],
    stream=True,
)

response = ""
for event in stream:
    if event.type == "content_block_delta":
        response += event.delta.text
    elif event.type == "message_stop":
        print("\n")
        print(response)
        print("\n")

Async

import asyncio
import os
from anthropic import AsyncAnthropic

client = AsyncAnthropic(
    # This is the default and can be omitted
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
)


async def main() -> None:
    message = await client.messages.create(
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": "Tell me something interesting about async agents",
            }
        ],
        model="claude-3-opus-20240229",
    )
    print(message.content)


# Top-level `await` only works in notebooks; in a script, run the coroutine with asyncio
asyncio.run(main())

LiteLLM 🚅

AgentOps provides support for LiteLLM(>=1.3.1), allowing you to call 100+ LLMs using the same Input/Output Format.

Installation

pip install litellm

# Do not use LiteLLM like this
# from litellm import completion
# ...
# response = completion(model="claude-3", messages=messages)

# Use LiteLLM like this
import litellm
...
response = litellm.completion(model="claude-3", messages=messages)
# or
response = await litellm.acompletion(model="claude-3", messages=messages)

LlamaIndex 🦙

AgentOps works seamlessly with applications built using LlamaIndex, a framework for building context-augmented generative AI applications with LLMs.

Installation

pip install llama-index-instrumentation-agentops

To use the handler, import and set

from llama_index.core import set_global_handler

# NOTE: Feel free to set your AgentOps environment variables (e.g., 'AGENTOPS_API_KEY')
# as outlined in the AgentOps documentation, or pass the equivalent keyword arguments
# anticipated by AgentOps' AOClient as **eval_params in set_global_handler.

set_global_handler("agentops")

Check out the LlamaIndex docs for more details.

Time travel debugging 🔮

Try it out!

Agent Arena 🥊

(coming soon!)

Evaluations Roadmap 🧭

Platform	Dashboard	Evals
✅ Python SDK	✅ Multi-session and Cross-session metrics	✅ Custom eval metrics
🚧 Evaluation builder API	✅ Custom event tag tracking	🔜 Agent scorecards
✅ Javascript/Typescript SDK	✅ Session replays	🔜 Evaluation playground + leaderboard

Debugging Roadmap 🧭

Performance testing	Environments	LLM Testing	Reasoning and execution testing
✅ Event latency analysis	🔜 Non-stationary environment testing	🔜 LLM non-deterministic function detection	🚧 Infinite loops and recursive thought detection
✅ Agent workflow execution pricing	🔜 Multi-modal environments	🚧 Token limit overflow flags	🔜 Faulty reasoning detection
🚧 Success validators (external)	🔜 Execution containers	🔜 Context limit overflow flags	🔜 Generative code validators
🔜 Agent controllers/skill tests	✅ Honeypot and prompt injection detection (PromptArmor)	🔜 API bill tracking	🔜 Error breakpoint analysis
🔜 Information context constraint testing	🔜 Anti-agent roadblocks (i.e. Captchas)	🔜 CI/CD integration checks
🔜 Regression testing	🔜 Multi-agent framework visualization

Why AgentOps? 🤔

Without the right tools, AI agents are slow, expensive, and unreliable. Our mission is to bring your agent from prototype to production. Here's why AgentOps stands out:

Comprehensive Observability: Track your AI agents' performance, user interactions, and API usage.
Real-Time Monitoring: Get instant insights with session replays, metrics, and live monitoring tools.
Cost Control: Monitor and manage your spend on LLM and API calls.
Failure Detection: Quickly identify and respond to agent failures and multi-agent interaction issues.
Tool Usage Statistics: Understand how your agents utilize external tools with detailed analytics.
Session-Wide Metrics: Gain a holistic view of your agents' sessions with comprehensive statistics.

AgentOps is designed to make agent observability, testing, and monitoring easy.

Star History

Check out our growth in the community:

Popular projects using AgentOps

Repository	Stars
geekan / MetaGPT	42787
run-llama / llama_index	34446
crewAIInc / crewAI	18287
camel-ai / camel	5166
superagent-ai / superagent	5050
iyaja / llama-fs	4713
BasedHardware / Omi	2723
MervinPraison / PraisonAI	2007
AgentOps-AI / Jaiqu	272
strnad / CrewAI-Studio	134
alejandro-ao / exa-crewai	55
tonykipkemboi / youtube_yapper_trapper	47
sethcoast / cover-letter-builder	27
bhancockio / chatgpt4o-analysis	19
breakstring / Agentic_Story_Book_Workflow	14
MULTI-ON / multion-python	13

Generated using github-dependents-info, by Nicolas Vuillamy


agentops's Issues

"Application error: a client-side exception has occurred" on drilldown page

🐛 Bug Report

🔎 Describe the Bug
When hovering the mouse on the Session Replay graph I get "Application error: a client-side exception has occurred (see the browser console for more information)."

🔄 Reproduction Steps
After running a session, go to Session Drilldown and hover the mouse on the Session Replay graph.

🙁 Expected Behavior
Application not crashing :)

📸 Screenshots

🔍 Additional Context
From the browser console

145-b1c31d5a593e826b.js:1 TypeError: e.substring is not a function
at d (layout-daab6a9ae5cf92e0.js:1:17848)
at page-71a73c4069089a9a.js:1:6154
at Array.map (<anonymous>)
at I (page-71a73c4069089a9a.js:1:5940)
at Y (page-71a73c4069089a9a.js:1:9712)
at rk (fd9d1056-241e146bacb67727.js:1:40370)
at iB (fd9d1056-241e146bacb67727.js:1:116379)
at o4 (fd9d1056-241e146bacb67727.js:1:94632)
at fd9d1056-241e146bacb67727.js:1:94454
at o3 (fd9d1056-241e146bacb67727.js:1:94461)
at oQ (fd9d1056-241e146bacb67727.js:1:91948)
at oj (fd9d1056-241e146bacb67727.js:1:91373)
at MessagePort.w (145-b1c31d5a593e826b.js:6:29386)

Thank you for helping us improve Agentops!

Update events to have init + end timestamps

🚀 Feature Request

We are overhauling the agentops client to use init and end timestamps. This accommodates asynchronous events as well as autologging of LLM calls.
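As a rough illustration of the idea, an event that carries both timestamps might look like the sketch below. The `TimedEvent` class, its field names, and the `utc_now_iso` helper are assumptions made for this example, not the actual agentops client API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def utc_now_iso() -> str:
    """Current UTC time as an ISO 8601 string with millisecond precision."""
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds")


@dataclass
class TimedEvent:
    """Hypothetical event with separate start and end timestamps."""
    event_type: str
    init_timestamp: str = field(default_factory=utc_now_iso)
    end_timestamp: Optional[str] = None

    def end(self) -> None:
        # Set once the (possibly asynchronous) work has finished.
        self.end_timestamp = utc_now_iso()


event = TimedEvent("llm_call")
# ... the LLM call or other work would happen here ...
event.end()
```

Keeping the two timestamps separate lets an event be created when work starts and closed whenever it finishes, which is what asynchronous calls need.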

OpenAI versions <1.0.0 are not recorded

🐛 Bug Report

🔎 Describe the Bug
The handler is not correctly overriding OpenAI versions <1.0.0. This causes the BabyAGI demo to break

Replace LLM action recorders with autologger

🚀 Feature Request

Weights & Biases has a cool pattern where you can just override and patch OpenAI calls. We can use this to emit events instead of having to build special wrappers for LLM calls.

Since we are now tracking events asynchronously, we can pull this off pretty easily.
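The patching pattern described above could be sketched roughly as follows. This is only an illustration: `patch_with_logging`, `recorded_events`, and the `fake_client` object are hypothetical stand-ins, and no real OpenAI or Weights & Biases API is used.

```python
import functools
import time
from types import SimpleNamespace
from typing import Any, Callable, Dict, List

recorded_events: List[Dict[str, Any]] = []  # stand-in for the real event queue


def patch_with_logging(owner: Any, attr: str) -> None:
    """Replace owner.attr with a wrapper that records an event per call."""
    original: Callable[..., Any] = getattr(owner, attr)

    @functools.wraps(original)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.time()
        result = original(*args, **kwargs)
        recorded_events.append({
            "name": attr,
            "kwargs": kwargs,
            "init_timestamp": start,
            "end_timestamp": time.time(),
        })
        return result

    setattr(owner, attr, wrapper)


def fake_create(**kwargs: Any) -> str:
    """Hypothetical stand-in for an LLM completion call."""
    return "hello"


fake_client = SimpleNamespace(create=fake_create)
patch_with_logging(fake_client, "create")
reply = fake_client.create(model="some-model")
```

The caller keeps using `fake_client.create` unchanged, and the event is emitted as a side effect of the patched function.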

Python event timing is weird

🐛 Bug Report

🔎 Describe the Bug
The time gap between events is inconsistent. I think we may have a blocking issue with how we post data.

🔄 Reproduction Steps
Run this script:

import time


from agentops import AgentOps, Event

ao = AgentOps(api_key="1447cc5f-61b2-4967-ae1c-d30610bc4ca2")
time.sleep(1)
print(time.time())
ao.record(Event("SQIU_event", result="success"))
time.sleep(1)
print(time.time())
ao.record(Event("SQIU_event2", result="success"))
print(time.time())
ao.record(Event("SQIU_event2", result="success"))
print(time.time())
ao.record(Event("SQIU_event2", result="success"))
print(time.time())
ao.record(Event("SQIU_event2", result="success"))
print(time.time())
ao.record(Event("SQIU_event2", result="success"))
print(time.time())
ao.record(Event("SQIU_event2", result="success"))
time.sleep(1.5)
print(time.time())
ao.record(Event("SQIU_event", result="success"))
time.sleep(0.5)
print(time.time())
ao.record(Event("SQIU_event", result="success"))
time.sleep(2)
print(time.time())
ao.record(Event("SQIU_event", result="success"))
ao.end_session(end_state="fail")

The time gaps between the events don't line up with the sleep times in the script:

> python add_events.py
1692249866.7725768
1692249867.77273
b'{"events": [{"session_id": "e141b465-378b-4b79-b47c-a4386aaacdc7", "event_type": "SQIU_event", "params": null, "output": null, "result": "success", "tags": null, "timestamp": "2023-08-16T22:24:26.772Z"}]}'
1692249870.3852417
1692249870.3853083
1692249870.385325
1692249870.3853374
1692249870.3853488
1692249871.8857806
b'{"events": [{"session_id": "e141b465-378b-4b79-b47c-a4386aaacdc7", "event_type": "SQIU_event2", "params": null, "output": null, "result": "success", "tags": null, "timestamp": "2023-08-16T22:24:27.772Z"}, {"session_id": "e141b465-378b-4b79-b47c-a4386aaacdc7", "event_type": "SQIU_event2", "params": null, "output": null, "result": "success", "tags": null, "timestamp": "2023-08-16T22:24:30.385Z"}, {"session_id": "e141b465-378b-4b79-b47c-a4386aaacdc7", "event_type": "SQIU_event2", "params": null, "output": null, "result": "success", "tags": null, "timestamp": "2023-08-16T22:24:30.385Z"}, {"session_id": "e141b465-378b-4b79-b47c-a4386aaacdc7", "event_type": "SQIU_event2", "params": null, "output": null, "result": "success", "tags": null, "timestamp": "2023-08-16T22:24:30.385Z"}, {"session_id": "e141b465-378b-4b79-b47c-a4386aaacdc7", "event_type": "SQIU_event2", "params": null, "output": null, "result": "success", "tags": null, "timestamp": "2023-08-16T22:24:30.385Z"}, {"session_id": "e141b465-378b-4b79-b47c-a4386aaacdc7", "event_type": "SQIU_event2", "params": null, "output": null, "result": "success", "tags": null, "timestamp": "2023-08-16T22:24:30.385Z"}]}'
1692249872.8345242
b'{"events": [{"session_id": "e141b465-378b-4b79-b47c-a4386aaacdc7", "event_type": "SQIU_event", "params": null, "output": null, "result": "success", "tags": null, "timestamp": "2023-08-16T22:24:31.885Z"}, {"session_id": "e141b465-378b-4b79-b47c-a4386aaacdc7", "event_type": "SQIU_event", "params": null, "output": null, "result": "success", "tags": null, "timestamp": "2023-08-16T22:24:32.834Z"}]}'
1692249874.835022
b'{"events": [{"session_id": "e141b465-378b-4b79-b47c-a4386aaacdc7", "event_type": "SQIU_event", "params": null, "output": null, "result": "success", "tags": null, "timestamp": "2023-08-16T22:24:34.835Z"}]}'
b'{"session": {"session_id": "e141b465-378b-4b79-b47c-a4386aaacdc7", "init_timestamp": "2023-08-16T22:24:25.772Z", "tags": null, "end_state": "fail", "rating": null, "end_timestamp": "2023-08-16T22:24:34.835Z"}}'
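If blocking posts are indeed the cause, one possible remedy is to stamp each event when `record` is called and hand the network call to a background thread. The sketch below assumes this design; `BackgroundRecorder` and `slow_post` are hypothetical names, and `slow_post` only simulates a slow HTTP request.

```python
import queue
import threading
import time
from datetime import datetime, timezone

sent = []  # stand-in for events delivered to the server


def slow_post(event: dict) -> None:
    """Hypothetical network call that takes a while to return."""
    time.sleep(0.05)
    sent.append(event)


class BackgroundRecorder:
    def __init__(self) -> None:
        self._queue: "queue.Queue" = queue.Queue()
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def record(self, event_type: str) -> None:
        # Stamp the event now, so slow posting cannot skew its timestamp.
        stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
        self._queue.put({"event_type": event_type, "timestamp": stamp})

    def _drain(self) -> None:
        while True:
            event = self._queue.get()
            if event is None:  # sentinel: stop the worker
                break
            slow_post(event)

    def close(self) -> None:
        self._queue.put(None)
        self._thread.join()


recorder = BackgroundRecorder()
start = time.time()
for i in range(5):
    recorder.record(f"event_{i}")
elapsed = time.time() - start  # record() returns without waiting on slow_post
recorder.close()
```

With this arrangement the caller's timing depends only on the queue insert, not on how long the server takes to respond.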

Async Langchain callback handler

🚀 Feature Request

Some agents require async callback handlers. Create an implementation for this with Langchain. The API is more or less identical.

I want an API Key

thx

Save recording and screenshots for web-browser demos

🚀 Feature Request

Several agents that navigate the web take screenshots and videos. We'd like to be able to save and store these screenshots in the AgentOps dashboard.

In particular, we should make it easy to handle the format MultiON provides: https://github.com/MULTI-ON/api

💡 Got a brilliant idea?
This feature has two necessary capabilities:

Add a "screenshot" param to the Event class.
Add a "recording" param to the Session that allows you to add a video to the session before the session ends. This can either be through an update method or by directly updating the class attribute.

🎉 Describe the solution you'd like
Please refer to MultiON's API for how they save screenshot and recording data. This is all generally speaking provided by Playwright.

Add Mypy and Coverage.py

🚀 Feature Request

💡 Got a brilliant idea?
Currently, our test coverage is not being tracked. We also want to enforce Mypy type checking.

🤔 Is your feature request related to a problem?
We don't have adequate coverage reporting.

🎉 Describe the solution you'd like
Add mypy and coverage.py to our tox file.

meta_client.py-meta_client:48 - WARNING: 🖇 AgentOps: Error: [WinError 21] The device is not ready: 'G:\\' ---> In Windows

🐛 Bug Report

(venv) PS E:\Appdata\program files\python\projects\projects folder on crewAI\insta crew> python main.py
2024-04-30 10:04:06,923 - 3368 - meta_client.py-meta_client:48 - WARNING: 🖇 AgentOps: Error: [WinError 21] The device is not ready: 'G:\'
2024-04-30 10:04:07,007 - 3368 - meta_client.py-meta_client:48 - WARNING: 🖇 AgentOps: Error: [WinError 21] The device is not ready: 'G:\'
2024-04-30 10:04:07,039 - 3368 - client.py-client:242 - WARNING: 🖇 AgentOps: Cannot end session - no current session
Traceback (most recent call last):
File "E:\Appdata\program files\python\projects\projects folder on crewAI\insta crew\venv\Lib\site-packages\agentops\meta_client.py", line 46, in wrapper
return method(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Appdata\program files\python\projects\projects folder on crewAI\insta crew\venv\Lib\site-packages\agentops\client.py", line 217, in start_session
self._session = Session(inherited_session_id or uuid4(), tags or self.tags, host_env=get_host_env())
^^^^^^^^^^^^^^
File "E:\Appdata\program files\python\projects\projects folder on crewAI\insta crew\venv\Lib\site-packages\agentops\host_env.py", line 63, in get_host_env
"Disk": get_disk_details(),
^^^^^^^^^^^^^^^^^^
File "E:\Appdata\program files\python\projects\projects folder on crewAI\insta crew\venv\Lib\site-packages\agentops\host_env.py", line 46, in get_disk_details
usage = psutil.disk_usage(partition.mountpoint)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Appdata\program files\python\projects\projects folder on crewAI\insta crew\venv\Lib\site-packages\psutil\__init__.py", line 2049, in disk_usage
return _psplatform.disk_usage(path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Appdata\program files\python\projects\projects folder on crewAI\insta crew\venv\Lib\site-packages\psutil\_pswindows.py", line 291, in disk_usage
total, free = cext.disk_usage(path)
^^^^^^^^^^^^^^^^^^^^^
PermissionError: [WinError 21] The device is not ready: 'G:\'

🔎 Describe the Bug
The application crashes with a PermissionError when trying to access the 'G:' drive, which is not ready or inaccessible.

🔄 Reproduction Steps

Run the main.py script.
The application attempts to retrieve disk usage details using the psutil library.
The psutil library tries to access the 'G:' drive, which is not ready.
A PermissionError is raised, causing the application to crash.

🙁 Expected Behavior
The application should handle the situation gracefully when the 'G:' drive is not ready, without raising an error.

📸 Screenshots

🔍 Additional Context
The error occurs in the get_disk_details() function in host_env.py, which is called by the start_session() function in client.py. The get_disk_details() function uses the psutil.disk_partitions() function to retrieve a list of all mounted partitions, and then calls the psutil.disk_usage() function for each partition to retrieve usage details. If the 'G:' drive is not ready, the psutil.disk_usage() function raises a PermissionError, causing the application to crash.
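A defensive version of the per-partition loop could look something like the sketch below. It is not the agentops implementation: `collect_disk_details` is a hypothetical helper, the standard library's `shutil.disk_usage` stands in for `psutil.disk_usage` so the example has no third-party dependency, and `fake_usage` simulates the unready 'G:' drive from the report.

```python
import shutil
from collections import namedtuple
from typing import Callable, Dict, Iterable


def collect_disk_details(
    mountpoints: Iterable[str],
    usage_fn: Callable = shutil.disk_usage,
) -> Dict[str, dict]:
    """Return usage per mountpoint, skipping drives that cannot be read."""
    details: Dict[str, dict] = {}
    for mountpoint in mountpoints:
        try:
            usage = usage_fn(mountpoint)
        except OSError:
            # PermissionError is a subclass of OSError; an unready drive
            # is skipped instead of aborting session start-up.
            continue
        details[mountpoint] = {"total": usage.total, "free": usage.free}
    return details


# Simulate the failure from the report with a fake usage function.
FakeUsage = namedtuple("FakeUsage", ["total", "used", "free"])


def fake_usage(path: str) -> FakeUsage:
    if path == "G:\\":
        raise PermissionError(21, "The device is not ready", path)
    return FakeUsage(total=100, used=40, free=60)


result = collect_disk_details(["C:\\", "G:\\"], usage_fn=fake_usage)
```

Skipping the failing partition keeps the remaining host details available, which seems preferable to losing the whole session over optional metadata.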

Publish to PyPI

🚀 Feature Request

💡 Got a brilliant idea?
Currently, the repo is not pip downloadable.

🎉 Describe the solution you'd like
Add agentops to pypi

openai versions above 0.27.8 don't get patched

🐛 Bug Report

🔎 Describe the Bug
Any versions above 0.27.8 don't get correctly patched

Index out of range error

🐛 Bug Report

🔎 Describe the Bug

Seeing this error frequently. Appears to be happening a lot with CrewAI streaming.

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/agentops/llm_tracker.py", line 127, in handle_stream_chunk
    token = choices[0].delta.content
            ~~~~~~~^^^
IndexError: list index out of range

🔄 Reproduction Steps
Run a non-standard model with CrewAI

🙁 Expected Behavior

No error or crash.
Error events get more details about what went wrong
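A guard along the following lines would avoid the IndexError in the traceback above. The `extract_token` helper is hypothetical, and the `SimpleNamespace` objects only imitate the shape of a streamed chunk (`choices[0].delta.content`) as it appears in the traceback.

```python
from types import SimpleNamespace
from typing import Any, Optional


def extract_token(chunk: Any) -> Optional[str]:
    """Return the streamed text for a chunk, or None if it carries none."""
    choices = getattr(chunk, "choices", None) or []
    if not choices:
        # A chunk with an empty choices list has no token to record.
        return None
    delta = getattr(choices[0], "delta", None)
    return getattr(delta, "content", None)


normal = SimpleNamespace(
    choices=[SimpleNamespace(delta=SimpleNamespace(content="Hi"))]
)
empty = SimpleNamespace(choices=[])

token_from_normal = extract_token(normal)
token_from_empty = extract_token(empty)
```

Returning None for chunks without text lets the caller skip them, and the same spot would be a natural place to attach extra detail to an error event.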

Exits leave hanging threads

🐛 Bug Report

🔎 Describe the Bug
If a python script running agentops exits prematurely, then it will not send events to the server.

🔄 Reproduction Steps

Instantiate agentops
Crash script (fail an assert, etc.)
Program will hang

🙁 Expected Behavior
Program exits gracefully and should send an end_session state of Failure

The issue deals specifically with the worker
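One common way to keep a background worker from blocking interpreter exit is a daemon thread combined with an `atexit` hook. The sketch below assumes that approach. The `Worker` class here is a simplified, hypothetical stand-in for the real worker, and `atexit` handlers are best effort: they do not run if the process is killed by a signal or exits through `os._exit`.

```python
import atexit
import threading


class Worker:
    """Hypothetical background worker that flushes events periodically."""

    def __init__(self) -> None:
        self.stop_flag = threading.Event()
        self.end_state = None
        # daemon=True: the thread will not keep the interpreter alive.
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()
        # Best-effort cleanup when the interpreter shuts down.
        atexit.register(self.shutdown, "Fail")

    def _run(self) -> None:
        while not self.stop_flag.wait(timeout=0.01):
            pass  # a real worker would flush queued events here

    def shutdown(self, end_state: str) -> None:
        if self.stop_flag.is_set():
            return  # already shut down
        self.stop_flag.set()
        self.thread.join(timeout=1)
        self.end_state = end_state


worker = Worker()
worker.shutdown("Success")
```

If the script crashes before `shutdown("Success")` is reached, the registered hook ends the session as "Fail" instead of leaving the thread running.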

Sessions are not sent to the dashboard until after `end_session`

🐛 Bug Report

🔎 Describe the Bug
When a session begins, events are POST'ed to the database, but sessions are only saved after end_session is called.

🔄 Reproduction Steps
Create a session, send events, but don't end the session.

🙁 Expected Behavior
Post the session as soon as it is created. After end_session, update the session.

📸 Screenshots
Proposed fix, add start_session to the Worker:

    def start_session(self, session: Session) -> None:
        with self.lock:
            payload = {
                "session": session.__dict__
            }

            HttpClient.post(f'{self.config.endpoint}/sessions',
                            json.dumps(payload).encode("utf-8"),
                            self.config.api_key)

Current end session should remain the same:

    def end_session(self, session: Session) -> None:
        self.stop_flag.set()
        self.thread.join()
        self.flush_queue()

        with self.lock:
            payload = {
                "session": session.__dict__
            }

            HttpClient.post(f'{self.config.endpoint}/sessions',
                            json.dumps(payload).encode("utf-8"),
                            self.config.api_key)

Cohere support

🚀 Feature Request

Add instrumentation for Cohere models

Cannot import Langchain handler

🐛 Bug Report

🔎 Describe the Bug
There is a conflict between the agentops package version 0.0.14, which requires packaging==23.1, and the langchain-core package, which requires packaging>=23.2,<24.0. This conflict prevents pip from resolving the dependencies and installing the packages.

main cannot be identified for celery applications

🐛 Bug Report

Celery users are getting an error in agentops when checking localvars for _main. Celery uses a limitedset and can't do eq against a str.

Async events get logged as instances

🐛 Bug Report

When we log async events with record_action, they don't properly terminate. This is because the decorator doesn't properly await.

🔄 Reproduction Steps
Record an async event

🙁 Expected Behavior
The event should terminate when the async function finishes.
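A decorator can handle both cases by checking whether the wrapped function is a coroutine function and awaiting it if so. The sketch below shows the pattern. `record_action_sketch` and the `events` list are hypothetical and are not the agentops `record_action` implementation.

```python
import asyncio
import functools
import inspect
import time

events = []  # stand-in for recorded events


def record_action_sketch(name: str):
    """Hypothetical decorator that handles both sync and async functions."""
    def decorator(func):
        if inspect.iscoroutinefunction(func):
            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                start = time.time()
                result = await func(*args, **kwargs)  # wait for completion
                events.append({"name": name, "start": start,
                               "end": time.time(), "returns": result})
                return result
            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            start = time.time()
            result = func(*args, **kwargs)
            events.append({"name": name, "start": start,
                           "end": time.time(), "returns": result})
            return result
        return sync_wrapper
    return decorator


@record_action_sketch("fetch")
async def fetch() -> str:
    await asyncio.sleep(0.01)
    return "done"


value = asyncio.run(fetch())
```

Because the async wrapper awaits the call, the recorded return value is the actual result and not a pending coroutine object.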

Add LangChain handler documentation support

📚 Documentation Update

📘 Describe the current state of documentation
Sync + Async handlers are now available with AgentOps and Langchain. Provide pages in the README.md as well as the documentation page.

LLM cost tracking

🚀 Feature Request

💡 Got a brilliant idea?
Currently, we don't have a reliable way of tracking costs. We can either:
a) Plug into a 3rd party API such as Helicone
Pros:

All hard complexity abstracted from users. Simply just replace the proxy server + Bearer auth
Cons:
Need to learn API integration
Users now have a dependency on 3rd party service pricing.

b) Roll out our own version of cost tracking proxy
Pros:

All hard complexity abstracted from users. Simply just replace the proxy server + Bearer auth
Tons of OSS services out there do this already, just fork their code. Easy enough?
Cons:
Need to write this out ourselves
Need to host ourselves

c) Update the tracker methods to include params for LLMs
Pros:

Easy to just roll into existing function wrappers
Session ID remains tied with API calls, keeps session costs in line
No need to proxy calls
Possible extensibility with custom LLMs down the line (i.e. pass in a tokenizer + cost dictionary and get your estimate)
Cons:
Added complexity for tagging
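To make option (c) concrete, a cost estimate from token counts and a price dictionary could be sketched as follows. The model name and the prices in `PRICES_PER_1K` are made-up values for illustration, and `estimate_cost` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Hypothetical prices in USD per 1,000 tokens: (prompt, completion).
PRICES_PER_1K: Dict[str, Tuple[float, float]] = {
    "example-model": (0.5, 1.5),
}


def estimate_cost(
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    prices: Dict[str, Tuple[float, float]] = PRICES_PER_1K,
) -> float:
    """Estimate the USD cost of one call from its token counts."""
    prompt_price, completion_price = prices[model]
    total = prompt_tokens * prompt_price + completion_tokens * completion_price
    return total / 1000


cost = estimate_cost("example-model", prompt_tokens=2000, completion_tokens=1000)
```

Passing a custom `prices` dictionary is one way the "custom LLMs" extensibility mentioned above might work.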

🤔 Is your feature request related to a problem?
Lots of users are asking to know cost per run. This is really valuable to track.

🎉 Describe the solution you'd like
Dead simple implementation SDK side

📚 Have you considered any alternatives?
See above. Maybe 3rd party API integration via cost-tracking services (i.e. Helicone) and keep the cost management game out of our hair

Update events prompts to be sent as ChatML

🚀 Feature Request

Currently, we do not consistently send prompts to the server as either ChatML format or string. We should enforce this standard on SDK for both langchain and OpenAI
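One way to enforce a single format is to normalize every prompt before it is sent. The sketch below assumes that "ChatML" here means a list of messages with "role" and "content" keys, and that a bare string should become a single user message. Both are assumptions, and `to_chatml` is a hypothetical helper.

```python
from typing import Dict, List, Union

Message = Dict[str, str]


def to_chatml(prompt: Union[str, List[Message]]) -> List[Message]:
    """Normalize a prompt to a list of {"role", "content"} messages."""
    if isinstance(prompt, str):
        # Assumption: a bare string is treated as a single user message.
        return [{"role": "user", "content": prompt}]
    # Keep only the two standard keys so every entry has the same shape.
    return [{"role": m["role"], "content": m["content"]} for m in prompt]


from_string = to_chatml("Hello")
from_messages = to_chatml([{"role": "system", "content": "Be brief."}])
```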

Add LlamaIndex callback handler

🚀 Feature Request

💡 Got a brilliant idea?
Similar to Langchain, Llama Index also provides callback handlers. This is a potential opportunity to give built-in observability to agents built with this framework.

🤔 Is your feature request related to a problem?
A user in TokenCost might be a good initial pilot.

🎉 Describe the solution you'd like
Implement the callback handler here: https://docs.llamaindex.ai/en/stable/api_reference/callbacks.html

🔍 Additional context
Before working on this, find at least a handful of LlamaIndex agent builders and trial with them

Add DSPY instrumentor

🚀 Feature Request

💡 Got a brilliant idea?
Using either OpenInference or OpenLLMetry, add an instrumentor for DSPy. This is the first of several test features we'll use to determine which is the best instrumentation library.

Tracebacks for failed events

🚀 Feature Request

💡 Got a brilliant idea?
Failed events don't provide enough context as to why the agent failed. All we know is that the events fail (marked in red/orange), but we don't see the proximate cause.

🎉 Describe the solution you'd like
Add a code block to the replay that shows the failed event's complete traceback. This should help developers understand where in their code to make a fix.
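On the SDK side, capturing the traceback text for a failed event could be as simple as the sketch below. `run_and_capture` and the shape of the returned dictionary are hypothetical; only `traceback.format_exc()` is a real standard-library call.

```python
import traceback
from typing import Any, Callable, Dict


def run_and_capture(func: Callable[[], Any]) -> Dict[str, Any]:
    """Run func and return an event dict; on failure, include the traceback."""
    try:
        return {"result": "success", "returns": func(), "traceback": None}
    except Exception:
        # format_exc() renders the exception currently being handled.
        return {"result": "fail", "returns": None,
                "traceback": traceback.format_exc()}


def broken_tool() -> None:
    raise ValueError("bad input")


event = run_and_capture(broken_tool)
```

The resulting string could then be stored with the event and rendered in the replay.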

📚 Have you considered any alternatives?
This is akin to Sentry.io. Probably should just adopt whatever patterns they use.

Load tags from env or offer documentation recommendations for how to add different groupings

🚀 Feature Request

💡 Got a brilliant idea?
Currently, the highest level grouping of sessions is by session ID. That is, you can't group sessions together other than tags. The problem with tags is that they need to be individually specified for each session instantiation.

As an alternative, I suggest one of the two options:

Update the documentation to automatically add tags based on a present environmental variable on-load. For example AGENTOPS_TAG_GROUP=Chrome navigation test in the .env would automatically add a tag to the tags group.
Add another optional argument to the session instantiation called group. For example:

        api_key (str): API Key for AgentOps services.
        tags (Dict[str, str], optional): Tags for the sessions that can be used for grouping or sorting later (e.g. {"llm": "GPT-4"}).
        group (str, optional): An ID used to group sessions of the same type together. (e.g. Chrome navigation test).
        config (Configuration, optional): A Configuration object for AgentOps services. If not provided, a default Configuration object will be used.
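The first option, reading a tag from the environment on load, might be sketched as follows. `tags_with_env_group` is a hypothetical helper, and tags are treated as a list of strings here (as in the Langchain handler example in the README), which is an assumption.

```python
import os
from typing import List, Optional


def tags_with_env_group(
    tags: Optional[List[str]] = None,
    env_var: str = "AGENTOPS_TAG_GROUP",
) -> List[str]:
    """Return tags plus the group tag from the environment, if one is set."""
    merged = list(tags or [])
    group = os.environ.get(env_var)
    if group and group not in merged:
        merged.append(group)
    return merged


# Set in code for the example; normally this would come from a .env file.
os.environ["AGENTOPS_TAG_GROUP"] = "Chrome navigation test"
session_tags = tags_with_env_group(["llm:GPT-4"])
```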

🤔 Is your feature request related to a problem?
Currently, we only group sessions by tags in dashboards. It may make sense to have better session grouping logic.

🎉 Describe the solution you'd like
Add a way to read env to tags, or add a new optional arg to the agentops instantiation (see above)

📚 Have you considered any alternatives?
Grouping by tags alone.

Alternatively, update the documentation to show some recommended practices for using agentops.

Org keys

Create enum for session end states and event results

🚀 Feature Request

💡 Got a brilliant idea?
Currently, session end states and event results are hard coded as strings instead of typed enums. These results should be success, indeterminate, and fail

Instead, we are inconsistent with database entries. Sessions, for example:

and Events:

🤔 Is your feature request related to a problem?
N/A

🎉 Describe the solution you'd like
We'd like to create an enum for the possible result states for:

Sessions
Events
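A shared enum for both cases could look like the sketch below. The class name `ResultState` and the `parse_state` helper are hypothetical. The three values are the ones named in this request.

```python
from enum import Enum


class ResultState(str, Enum):
    """Hypothetical shared enum for session end states and event results."""
    SUCCESS = "success"
    INDETERMINATE = "indeterminate"
    FAIL = "fail"


def parse_state(value: str) -> ResultState:
    """Map loosely formatted input such as 'Success' onto the enum."""
    return ResultState(value.strip().lower())


state = parse_state("Success")
```

Anything outside the three values raises a ValueError, so inconsistent strings are rejected before they reach the database.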

Working Code Examples

🚀 Feature Request

💡 Got a brilliant idea?
Add a folder for examples using AgentOps. This can augment the documentation by offering runnable code.

Add org key to handler

🚀 Feature Request

💡 Got a brilliant idea?
Currently the Langchain handler doesn't take orgkey as an arg. It should.

Unended sessions leaving hanging programs

🐛 Bug Report

Unless Client.end_session is run, the program will hang.

🔎 Describe the Bug
Give a clear and concise description of the bug.

🔄 Reproduction Steps

Run pytest tests/test_teardown.py. Notice how the program hangs

🙁 Expected Behavior
Program should not hang.

📸 Screenshots
If applicable, add screenshots to help explain the problem.

(env) (base) ➜  agentops git:(33-decorators-are-unsafe) ✗ pytest tests/test_teardown.py -s
============================================== test session starts ==============================================
platform darwin -- Python 3.11.3, pytest-7.4.0, pluggy-1.3.0
rootdir: /Users/reibs/Projects/agentops-ai/agentops
plugins: anyio-4.0.0, asyncio-0.21.1, mock-3.11.1, requests-mock-1.11.0, httpx-0.24.0
asyncio: mode=Mode.STRICT
collected 1 item

tests/test_teardown.py .

=============================================== 1 passed in 3.21s ===============================================

Add org id to sessions

🚀 Feature Request

💡 Got a brilliant idea?
Several users are requesting the ability to collect analytics on sessions within their org. That is, individual maintainers of OSS projects want to be able to log and manage sessions created by their users. We currently don't have a simple way of aggregating user data to this degree.

🎉 Describe the solution you'd like
Suggestion:

Add an optional org_key to the AO client. Sessions created with an org key will be accessible to organization managers
Sessions with an org key but no api key will still be saved.
Sessions with an org key and an api key will be saved, and users will be able to track their personal historical traces, but not those of other users in the org.

Furthermore, sessions without an API key will now show a temporary URL that hosts the session's run. This is anonymous but technically accessible to anyone with the URL.
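
The rules in this suggestion can be sketched as a small function. This is only an illustration of the proposal; `SessionKeys`, `session_visibility`, and the field names are hypothetical and not part of the SDK.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SessionKeys:
    """Hypothetical pair of keys a client could be initialized with."""

    api_key: Optional[str] = None
    org_key: Optional[str] = None


def session_visibility(keys: SessionKeys) -> Dict[str, bool]:
    """Summarize who could see a session under the proposed rules."""
    return {
        # Org managers see every session created with their org key.
        "visible_to_org_managers": keys.org_key is not None,
        # Personal history requires the user's own API key.
        "visible_in_user_history": keys.api_key is not None,
        # Without an API key, the session is reachable via a temporary URL.
        "temporary_public_url": keys.api_key is None,
    }
```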

📚 Have you considered any alternatives?
We've considered issuing API keys to users based on org dashboards, but this is too cumbersome. For users we track during, say, hackathons, we want to minimize the time it takes for them to sign up to track their replays.

Incorrect timezone representation in get_ISO_time

Title: Incorrect Timezone Representation in get_ISO_time Function

There is a potential issue in the get_ISO_time function in the agentops/helpers.py file. The function is currently defined as follows:

def get_ISO_time():
    return datetime.fromtimestamp(time.time()).isoformat(timespec='milliseconds') + 'Z'

The function is intended to return the current time in ISO 8601 format, with the 'Z' at the end indicating that the time is in Coordinated Universal Time (UTC). However, the function is currently using datetime.fromtimestamp(time.time()), which returns the local time, not UTC.

This could lead to confusion or incorrect data interpretation, as the 'Z' at the end of the timestamp implies that the time is in UTC, when it is actually in local time.

To correct this, we should use datetime.utcfromtimestamp(time.time()) instead, which will return the current time in UTC. Here's the corrected function:

def get_ISO_time():
    return datetime.utcfromtimestamp(time.time()).isoformat(timespec='milliseconds') + 'Z'

This will ensure that the timestamp returned by get_ISO_time is indeed in UTC, as indicated by the 'Z' at the end.
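
One caveat: `datetime.utcfromtimestamp` is deprecated as of Python 3.12. A possible alternative, sketched here under the assumption that the output format should stay the same, is to build a timezone-aware datetime. The name `get_iso_time_utc` is hypothetical and not part of agentops/helpers.py.

```python
from datetime import datetime, timezone


def get_iso_time_utc() -> str:
    """Return the current UTC time as ISO 8601 with milliseconds and a 'Z' suffix."""
    now = datetime.now(timezone.utc)
    # isoformat() on an aware UTC datetime ends in '+00:00'; swap it for 'Z'.
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```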

Non-standard prompt args for LLMs

🚀 Feature Request

Frequently enough, users create wrapper methods for LLM calls. As such, they don't always include the term "prompt" in their code.

💡 Got a brilliant idea?
In the decorator, allow users to specify the prompt argument of the LLM caller function.
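
A rough sketch of how this could work, assuming a decorator that takes the prompt argument's name as a parameter. `record_llm_call`, `prompt_arg`, and `RECORDED_PROMPTS` are hypothetical names for illustration, not the SDK's API.

```python
import functools
import inspect
from typing import Any, Callable, List, Tuple

# Stand-in for wherever recorded prompts would be sent (hypothetical).
RECORDED_PROMPTS: List[Tuple[str, Any]] = []


def record_llm_call(prompt_arg: str = "prompt") -> Callable:
    """Record the argument named `prompt_arg` as the prompt of the wrapped call."""

    def decorator(func: Callable) -> Callable:
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            # Look the prompt up by name, whether passed positionally or by keyword.
            if prompt_arg in bound.arguments:
                RECORDED_PROMPTS.append((func.__name__, bound.arguments[prompt_arg]))
            return func(*args, **kwargs)

        return wrapper

    return decorator


@record_llm_call(prompt_arg="question")
def ask_model(question: str, temperature: float = 0.0) -> str:
    """Toy wrapper around an LLM call that never uses the word 'prompt'."""
    return f"answer to: {question}"
```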

Make TOS more clear on API key create

🚀 Feature Request

💡 Got a brilliant idea?
We collect agent environment metadata on-run, but some users weren't clear about this. Create a modal or other clear indication to the user that this data will be collected.

🤔 Is your feature request related to a problem?
Requested on discord

🎉 Describe the solution you'd like
On the API key creation page, add a modal or indicator showing which data will be collected with the key.

📚 Have you considered any alternatives?
Keep terms of service as is

Replace tags as a list instead of dict

🚀 Feature Request

💡 Got a brilliant idea?
Loading tags as a dictionary is problematic for the frontend. As such, @siyangqiu proposes changing to a list of strings

🎉 Describe the solution you'd like
tags (Dict[str, str], optional): Tags that can be used for grouping or sorting later. e.g. {"llm": "GPT-4"}.
to
tags (List[str], optional): Tags that can be used for grouping or sorting later. e.g. ["GPT-4"].
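
During a transition, a small helper could accept both shapes. This is a sketch only; `normalize_tags` is a hypothetical name, and keeping just the dict values is an assumption based on the `{"llm": "GPT-4"}` to `"GPT-4"` example above.

```python
from typing import Dict, List, Optional, Union


def normalize_tags(tags: Optional[Union[List[str], Dict[str, str]]]) -> List[str]:
    """Return tags as a list of strings, accepting the old dict form as well."""
    if tags is None:
        return []
    if isinstance(tags, dict):
        # Assumption: only the values are kept; the keys are dropped.
        return [str(value) for value in tags.values()]
    return [str(tag) for tag in tags]
```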

📚 Have you considered any alternatives?
Status quo. Keep as dict.

🔍 Additional context
@siyangqiu please add additional thinking around this

Add session end state reason

🚀 Feature Request

💡 Got a brilliant idea?
When sessions fail, we don't know why unless we go into a drill down. This error tracking should be handled at the session level as well.

🤔 Is your feature request related to a problem?
Error session drill downs

🎉 Describe the solution you'd like
Add a new param to sessions that records the reason for the end state.
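
As a sketch of the shape this might take, assuming an optional free-text reason stored alongside the end state. The `Session` class and the `end_state_reason` name here are hypothetical simplifications, not the SDK's actual model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Simplified, hypothetical session record."""

    end_state: Optional[str] = None
    end_state_reason: Optional[str] = None

    def end(self, end_state: str, end_state_reason: Optional[str] = None) -> None:
        """Close the session and optionally store why it ended this way."""
        self.end_state = end_state
        self.end_state_reason = end_state_reason
```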

📚 Have you considered any alternatives?
Take last failed event from a session. This seems kludgy though.

Hanging thread after exit

🐛 Bug Report

🔎 Describe the Bug
Our current atexit code is not working. Tried this with smol-scheduler and it was bad.

🔄 Reproduction Steps
Create a simple script with a single function (e.g. one that prints hello world) and decorate it with the AgentOps event recorder.

🙁 Expected Behavior
Program should exit properly.
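
One common way to avoid this kind of hang, sketched here without knowledge of the SDK's actual worker code, is to run the background sender as a daemon thread and flush it from an `atexit` handler. `EventWorker` and its methods are hypothetical.

```python
import atexit
import queue
import threading
from typing import List


class EventWorker:
    """Hypothetical background sender that cannot keep the interpreter alive."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[object]" = queue.Queue()
        self._stop = object()  # sentinel telling the thread to finish
        self.sent: List[object] = []
        # daemon=True: the interpreter does not wait for this thread at shutdown.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        # Flush whatever is still queued when the program exits normally.
        atexit.register(self.close)

    def record(self, event: object) -> None:
        self._queue.put(event)

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            if event is self._stop:
                return
            self.sent.append(event)  # a real worker would send this over the network

    def close(self, timeout: float = 2.0) -> None:
        """Drain the queue and stop the thread; safe to call more than once."""
        if self._thread.is_alive():
            self._queue.put(self._stop)
            self._thread.join(timeout)
```

The bounded `join` means a stuck network call delays exit by at most `timeout` seconds instead of hanging indefinitely.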

Data collection toggling

🚀 Feature Request

💡 Got a brilliant idea?
Some users are requesting that environment metadata not be collected. We'd like a way to toggle this.

🤔 Is your feature request related to a problem?
Discord user requested

🎉 Describe the solution you'd like
Either:
a) In SDK have toggle on init
b) In dashboard have toggle on key create
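
Option (a) might look roughly like the following. The flag name `collect_environment`, the function `build_init_payload`, and the metadata fields are all assumptions for illustration; the metadata the SDK really collects may differ.

```python
import platform
from typing import Any, Dict


def build_init_payload(collect_environment: bool = True) -> Dict[str, Any]:
    """Build the data sent at init; environment metadata is included only if enabled."""
    payload: Dict[str, Any] = {}
    if collect_environment:
        # Illustrative fields only.
        payload["environment"] = {
            "python_version": platform.python_version(),
            "os": platform.system(),
        }
    return payload
```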

📚 Have you considered any alternatives?
Status quo collects everything

Session Drill Down page does not refresh to the selected issue

🐛 Bug Report

🔎 Describe the Bug
When refreshing the drill down page, either via the refresh button or a browser refresh, the page still shows the previously viewed session rather than the latest. The session tab, however, shows the latest session as selected. The user must select another session and then reselect the latest session in the session tab in order to show it.

🔄 Reproduction Steps

Generate a new session
Go to session drill down page
Hit refresh button

🙁 Expected Behavior
After refreshing, the drill down page should show the latest session.

Params are not being sent to dashboard

🐛 Bug Report

🔎 Describe the Bug
The record action decorator does not send argument parameters correctly to the database. Only 1 arg param gets sent.

🔄 Reproduction Steps
Decorate a function with multiple arguments. Only 1 actually gets sent.

🙁 Expected Behavior
All serializable args should be sent
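
A sketch of one way to capture every argument by name, using `inspect.signature` from the standard library. `collect_params` is a hypothetical helper, and treating "serializable" as "JSON-serializable" is an assumption.

```python
import inspect
import json
from typing import Any, Callable, Dict


def collect_params(func: Callable, *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Map each argument of a call to its parameter name, keeping JSON-serializable values."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    params: Dict[str, Any] = {}
    for name, value in bound.arguments.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            continue  # skip values that cannot be serialized
        params[name] = value
    return params


def add(x: int, y: int, label: str = "sum", callback: Any = None) -> int:
    """Toy function with several arguments, one of which may be unserializable."""
    return x + y
```

For example, `collect_params(add, 1, 2, callback=print)` keeps `x`, `y`, and `label` and drops the function-valued `callback`.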

Decorators are unsafe

🐛 Bug Report

A handful of decorators crash in prod. This should never happen. In particular, this happens with the action_type type enforcement. We should get rid of this entirely, since we now track LLM calls with overrides.
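
Independent of removing the type enforcement, a general pattern for keeping instrumentation from crashing user code is to isolate the recording step in its own try/except. A minimal sketch, with `safe_record` and the `recorder` callback as hypothetical names:

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger("safe_record")


def safe_record(recorder: Callable[..., None]) -> Callable:
    """Call `recorder` after the wrapped function, but never let recording raise."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Errors from the user's own function propagate untouched.
            result = func(*args, **kwargs)
            try:
                recorder(func.__name__, args, kwargs, result)
            except Exception:
                # Instrumentation failures are logged, not raised.
                logger.warning("recording failed for %s", func.__name__, exc_info=True)
            return result

        return wrapper

    return decorator
```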

"__setitem__" method not defined on type "EventType" (Pylance)

🚀 Feature Request

💡 Got a brilliant idea?
To improve Mypy typing, we should add item setters on Event objects. This eliminates a handful of errors and makes the code safer.

🎉 Describe the solution you'd like
Add __setitem__ to Event and Session
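
A minimal sketch of what this could look like on a simplified class. The `Event` below is a hypothetical stand-in with two fields, not the SDK's real class, and restricting assignment to existing attributes is an assumption.

```python
from typing import Any, Optional


class Event:
    """Simplified, hypothetical Event supporting item-style assignment."""

    def __init__(self, event_type: str) -> None:
        self.event_type = event_type
        self.result: Optional[str] = None

    def __setitem__(self, key: str, value: Any) -> None:
        # Only allow assignment to attributes that already exist on the event.
        if not hasattr(self, key):
            raise KeyError(key)
        setattr(self, key, value)
```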

Emit events on exit

Currently we don't see the last termination event. We should emit that. So for example, Ctrl+C should show that termination event.
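
For the Ctrl+C case, one possible approach is a SIGINT handler that emits the event and then raises `KeyboardInterrupt` as usual. This is a sketch under assumptions: `emit_termination_event`, `EMITTED`, and `install_termination_handler` are hypothetical names.

```python
import signal
from typing import Any, List

# Stand-in for the place termination events would be sent (hypothetical).
EMITTED: List[str] = []


def emit_termination_event(reason: str) -> None:
    EMITTED.append(reason)


def handle_sigint(signum: int, frame: Any) -> None:
    """Emit a termination event, then preserve normal Ctrl+C behavior."""
    emit_termination_event(f"terminated by {signal.Signals(signum).name}")
    raise KeyboardInterrupt


def install_termination_handler() -> None:
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGINT, handle_sigint)
```

Calling `install_termination_handler()` once at startup would be enough; other exit paths (normal exit, SIGTERM) would need their own hooks.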

Notebooks not running end-to-end + typos

🐛 Bug Report

Several of the Jupyter notebooks don't run end-to-end with the latest version of AgentOps. Additionally, there are a handful of spelling/grammar issues.

To improve this:

Run each notebook top to bottom to ensure they work as expected
Clean up any typos or grammar issues in them

pypi auto publish

🚀 Feature Request

💡 Got a brilliant idea?
Auto publish to pypi

Create a workflow

General nit, why are the docstrings mis-aligned with the triple quotes? I'm seeing:

def foo():
    """
        stuff stuff stuff
    """

vs.

def foo():
    """
    stuff stuff stuff
    """

Originally posted by @areibman in #163 (comment)

OpenAI 1.0.0+ updates

🐛 Bug Report

OpenAI 1.0.0+ doesn't record events

Update events handler to read 3rd party library calls

🚀 Feature Request

Some libraries use 3rd party agent frameworks. We don't want to replace them but instead patch them.
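
Patching in this sense usually means wrapping an existing callable in place so the library keeps working while calls are observed. A generic sketch, where `patch_method` and the `on_call` callback are hypothetical and `fake_lib` merely stands in for a third-party module:

```python
import functools
import types
from typing import Any, Callable


def patch_method(owner: Any, name: str, on_call: Callable[..., None]) -> Callable:
    """Wrap `owner.name` so each call is reported to `on_call`; return the original."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def patched(*args: Any, **kwargs: Any) -> Any:
        result = original(*args, **kwargs)
        on_call(name, args, kwargs, result)
        return result

    setattr(owner, name, patched)
    return original  # keep this so the patch can be undone later


# Stand-in for a third-party library object.
fake_lib = types.SimpleNamespace(complete=lambda prompt: prompt.upper())
```

Restoring the original is then a single `setattr(owner, name, original)`.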

Async langchain handler doesn't take org id

🐛 Bug Report

Related #77
