
Comments (4)

mmabrouk commented on July 30, 2024

Some clarifications:

I think we should not require users to return FuncResponse in their application; it is extremely hard to use. However, I am not sure whether we should still create this FuncResponse implicitly from the SDK. It would be nice if we could keep the output of the LLM applications created with the SDK the same, just adding trace_id. However, I am not sure how the @entrypoint can fetch from the tracing object the cost and number of tokens.

If that is too convoluted, we could just remove cost/tokens from the output schema of the LLM app and use the trace_id in the playground and evaluation (when available) to show the cost and number of tokens.
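For illustration, a rough sketch of the two output shapes (field names follow the example later in this thread and are illustrative, not the exact FuncResponse schema):

# Current FuncResponse-style output: the SDK wraps the LLM output together
# with cost and token usage (illustrative field names and values).
funcresponse_style_output = {
    "message": "...",
    "cost": 0.0042,
    "usage": {"prompt_tokens": 25, "completion_tokens": 60, "total_tokens": 85},
}

# Simplified output: keep only the LLM output, plus a trace_id that the
# playground and evaluation can use to look up cost/tokens (when available).
simplified_output = {
    "message": "...",
    "trace_id": "b7c1d2e3-...",
}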


aybruhm commented on July 30, 2024

However I am not sure whether we should still create this FuncResponse implicitly from the SDK. It would be nice if we could keep the output of the LLM applications created with the SDK the same, just adding trace_id.

QUICK NOTE: your clarification only affects users who use the observability decorators (ag.span). Integrating our callback handler through litellm will resolve these concerns for them, and instrumenting OpenAI will also address the issue.
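For reference, the litellm route would look roughly like this; the handler name (ag.callbacks.litellm_handler()) is an assumption here, not the final API:

import litellm
import agenta as ag

ag.init()

# Register agenta's callback handler with litellm (handler name is assumed);
# litellm then reports cost and token usage for every completion call.
litellm.callbacks = [ag.callbacks.litellm_handler()]


async def llm_call(prompt: str) -> str:
    response = await litellm.acompletion(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content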

Regarding your concern, we could allow users to return only the output of the LLM app while the SDK handles the FuncResponse. As for tracking the cost and token usage of their LLM app, it seems reasonable to have them ingest the data themselves if they won't be using litellm or the OpenAI instrumentation (which will be available at a later date).

Here's a quick example of how they would ingest the data themselves:

import agenta as ag
from openai import AsyncOpenAI


client = AsyncOpenAI()

default_prompt = (
    "Give me 10 names for a baby from this country {country} with gender {gender}!!!!"
)

ag.init()
tracing = ag.llm_tracing()
ag.config.default(
    temperature=ag.FloatParam(0.2), prompt_template=ag.TextParam(default_prompt)
)


@ag.span(type="llm")
async def gpt_4_llm_call(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=ag.config.temperature,
    )
    tokens_usage = response.usage.dict()
    # Ingest cost and token usage into the current span
    tracing.set_span_attribute(
        "llm_cost",
        {"cost": ag.calculate_token_usage("gpt-4", tokens_usage), "tokens": tokens_usage},
    )  # <-- RIGHT HERE 👋🏾
    return response.choices[0].message.content


@ag.entrypoint
async def generate(country: str, gender: str) -> str:
    prompt = ag.config.prompt_template.format(country=country, gender=gender)
    return await gpt_4_llm_call(prompt=prompt)

However, I am not sure how the @entrypoint can fetch from the tracing object the cost and number of tokens.

The entrypoint decorator has access to the tracing object, which also has access to the method that calculates the cost and tokens.
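Roughly, the idea is something like the following; a minimal sketch, assuming hypothetical aggregation helpers on the tracing object rather than the actual SDK internals:

import functools

import agenta as ag

tracing = ag.llm_tracing()


def entrypoint(func):
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        message = await func(*args, **kwargs)
        # Assemble a FuncResponse-like payload; get_aggregated_cost/tokens are
        # hypothetical accessors on the tracing object, not existing SDK methods.
        return {
            "message": message,
            "cost": tracing.get_aggregated_cost(),
            "usage": tracing.get_aggregated_tokens(),
        }

    return wrapper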

Let me know what your thoughts are.


mmabrouk commented on July 30, 2024

@aybruhm Yep, I agree.


aybruhm commented on July 30, 2024

It would be nice if we could keep the output of the LLM applications created with the SDK the same, just adding trace_id. However, I am not sure how the @entrypoint can fetch from the tracing object the cost and number of tokens.

New development: while the SDK has access to the tracing object, it doesn't have direct access to the cost and number of tokens. This is because the LLM app needs to have run before the Tracing SDK can sum the cost and tokens across all of the trace's spans. To address this, we can return the trace_id along with the FuncResponse.

However, this approach adds complexity, particularly for the OSS version. In our cloud and enterprise versions, observability is available, so it is feasible to return the trace_id to the frontend, which can then retrieve the total cost and token usage for the LLM app run from the backend.

For the OSS version, we need to find an alternative solution. We should also consider adding documentation showing how users can track cost and token usage themselves, just like we have now:

@ag.span(type="llm")
async def llm_call(...):
    response = await client.chat.completions.create(...)
    tracing.set_span_attribute(
        "model_config", {"model": model, "temperature": temperature}
    )
    tokens_usage = response.usage.dict()  # type: ignore
    return {
        "cost": ag.calculate_token_usage(model, tokens_usage),
        "message": response.choices[0].message.content,
        "usage": tokens_usage,
    }

What are your thoughts, @mmabrouk?

