
Comments (4)

mmabrouk commented on July 30, 2024

Some clarifications:

I think we should not require users to return FuncResponse in their application; it is extremely hard to use. However, I am not sure whether we should still create this FuncResponse implicitly from the SDK. It would be nice if we could keep the output of the LLM applications created with the SDK the same, just adding trace_id. However, I am not sure how the @entrypoint can fetch from the tracing object the cost and number of tokens.

If that is too convoluted, we could just remove cost/tokens from the output schema of the LLM app and use the trace_id in the playground and evaluation (when available) to show the cost and number of tokens.
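For illustration, a rough sketch of the two output shapes (field names follow the example later in this thread and are illustrative, not the exact FuncResponse schema):

# Current FuncResponse-style output: the SDK wraps the LLM output together
# with cost and token usage (illustrative field names and values).
funcresponse_style_output = {
    "message": "...",
    "cost": 0.0042,
    "usage": {"prompt_tokens": 25, "completion_tokens": 60, "total_tokens": 85},
}

# Simplified output: keep only the LLM output, plus a trace_id that the
# playground and evaluation can use to look up cost/tokens (when available).
simplified_output = {
    "message": "...",
    "trace_id": "b7c1d2e3-...",
}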


aybruhm commented on July 30, 2024

However I am not sure whether we should still create this FuncResponse implicitly from the SDK. It would be nice if we could keep the output of the LLM applications created with the SDK the same, just adding trace_id.

QUICK NOTE: your clarification only affects users who use the observability decorators (ag.span). Integrating our callback handler through litellm will resolve these concerns for them, and instrumenting OpenAI will also address the issue.
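For reference, the litellm route would look roughly like this; the handler name (ag.callbacks.litellm_handler()) is an assumption here, not the final API:

import litellm
import agenta as ag

ag.init()

# Register agenta's callback handler with litellm (handler name is assumed);
# litellm then reports cost and token usage for every completion call.
litellm.callbacks = [ag.callbacks.litellm_handler()]


async def llm_call(prompt: str) -> str:
    response = await litellm.acompletion(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content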

Regarding your concern, we could allow users to return only the output of the LLM app while the SDK handles the FuncResponse. As for tracking the cost and token usage of their LLM app, it seems reasonable to have them ingest the data themselves if they won't be using litellm or the OpenAI instrumentation (which will be available at a later date).

Here's a quick example of how they would ingest the data themselves:

import agenta as ag
from openai import AsyncOpenAI


client = AsyncOpenAI()

default_prompt = (
    "Give me 10 names for a baby from this country {country} with gender {gender}!!!!"
)

ag.init()
tracing = ag.llm_tracing()
ag.config.default(
    temperature=ag.FloatParam(0.2), prompt_template=ag.TextParam(default_prompt)
)


@ag.span(type="llm")
async def gpt_4_llm_call(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=ag.config.temperature,
    )
    tokens_usage = response.usage.dict()
    # Ingest cost and token usage into the current span
    tracing.set_span_attribute(
        "llm_cost",
        {"cost": ag.calculate_token_usage("gpt-4", tokens_usage), "tokens": tokens_usage},
    )  # <-- RIGHT HERE 👋🏾
    return response.choices[0].message.content


@ag.entrypoint
async def generate(country: str, gender: str) -> str:
    prompt = ag.config.prompt_template.format(country=country, gender=gender)
    return await gpt_4_llm_call(prompt=prompt)

However, I am not sure how the @entrypoint can fetch from the tracing object the cost and number of tokens.

The entrypoint decorator has access to the tracing object, which also has access to the method that calculates the cost and tokens.
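Roughly, the idea is something like the following; a minimal sketch, assuming hypothetical aggregation helpers on the tracing object rather than the actual SDK internals:

import functools

import agenta as ag

tracing = ag.llm_tracing()


def entrypoint(func):
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        message = await func(*args, **kwargs)
        # Assemble a FuncResponse-like payload; get_aggregated_cost/tokens are
        # hypothetical accessors on the tracing object, not existing SDK methods.
        return {
            "message": message,
            "cost": tracing.get_aggregated_cost(),
            "usage": tracing.get_aggregated_tokens(),
        }

    return wrapper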

Let me know what your thoughts are.


mmabrouk commented on July 30, 2024

@aybruhm Yep, I agree.


aybruhm commented on July 30, 2024

It would be nice if we could keep the output of the LLM applications created with the SDK the same, just adding trace_id. However, I am not sure how the @entrypoint can fetch from the tracing object the cost and number of tokens.

New development: while the SDK has access to the tracing object, it doesn't have direct access to the cost and number of tokens. This is because the LLM app needs to have run before the Tracing SDK can sum the cost and tokens across all of the trace's spans. To address this, we can return the trace_id along with the FuncResponse.

However, this approach adds complexity, particularly for the OSS version. In our cloud and enterprise versions, observability is available, so it is feasible to return the trace_id to the frontend, which can then retrieve the total cost and token usage for the LLM app run from the backend.

For the OSS version, we need to find an alternative solution. We should also consider adding documentation showing how users can track cost and token usage themselves, just like we have now:

@ag.span(type="llm")
async def llm_call(...):
    response = await client.chat.completions.create(...)
    tracing.set_span_attribute(
        "model_config", {"model": model, "temperature": temperature}
    )
    tokens_usage = response.usage.dict()  # type: ignore
    return {
        "cost": ag.calculate_token_usage(model, tokens_usage),
        "message": response.choices[0].message.content,
        "usage": tokens_usage,
    }

What are your thoughts, @mmabrouk?

