Comments (4)
Some clarifications:
I think we should not require users to return FuncResponse from their application; it is extremely hard to use. However, I am not sure whether we should still create this FuncResponse implicitly in the SDK. It would be nice if we could keep the output of LLM applications created with the SDK the same, just adding the trace_id. However, I am not sure how the @entrypoint decorator can fetch the cost and number of tokens from the tracing object.
If that is too convoluted, we could simply remove cost/tokens from the output schema of the LLM app and use the trace_id in the playground and evaluation (when available) to show the cost and number of tokens.
> However I am not sure whether we should still create this FuncResponse implicitly from the SDK. It would be nice if we could keep the output of the LLM applications created with the SDK the same, just adding trace_id.
QUICK NOTE: your clarification will only affect users who use the observability decorators (ag.span). Integrating our callback handler through litellm will resolve these concerns for them, and instrumenting OpenAI will also fix the issue.
Regarding your concern, we could allow users to return only the output of the LLM app while the SDK handles the FuncResponse. As for tracking the cost and token usage of their LLM app, it seems reasonable to have them ingest the data themselves if they won't be using litellm or the OpenAI instrumentation (which will be available at a later date).
Here's a quick example of how they would ingest the data themselves:
```python
import openai

import agenta as ag

default_prompt = (
    "Give me 10 names for a baby from this country {country} with gender {gender}!!!!"
)

ag.init()
tracing = ag.llm_tracing()
ag.config.default(
    temperature=ag.FloatParam(0.2), prompt_template=ag.TextParam(default_prompt)
)


@ag.span(type="llm")
async def gpt_4_llm_call(prompt: str) -> str:
    response = await openai.ChatCompletion.acreate(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=ag.config.temperature,
    )
    token_usage = response.usage.dict()
    tracing.set_span_attribute(
        "llm_cost",
        {"cost": ag.calculate_token_usage("gpt-4", token_usage), "tokens": token_usage},
    )  # <-- RIGHT HERE 👋🏾
    return response.choices[0].message.content


@ag.entrypoint
async def generate(country: str, gender: str) -> str:
    prompt = ag.config.prompt_template.format(country=country, gender=gender)
    return await gpt_4_llm_call(prompt=prompt)
```
> However, I am not sure how the @entrypoint can fetch from the tracing object the cost and number of tokens.
The entrypoint decorator has access to the tracing object, which in turn exposes the method that calculates the cost and tokens.
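To illustrate the idea, here is a minimal, self-contained sketch of how an entrypoint wrapper could pick up cost/token totals from a shared tracing object after the wrapped function runs. The `Tracing` class, its attribute list, and the response field names are all assumptions for illustration, not the actual agenta SDK internals:

```python
import functools

# Hypothetical sketch (NOT the real agenta SDK): spans record their cost/token
# attributes on a shared tracing object; the entrypoint wrapper runs the user
# function first, then sums those attributes and builds the response implicitly.
class Tracing:
    def __init__(self):
        self.span_attributes = []  # each span appends {"cost": ..., "tokens": ...}

    def total(self, key):
        # Sum a numeric attribute across all recorded spans.
        return sum(attrs.get(key, 0) for attrs in self.span_attributes)


tracing = Tracing()


def entrypoint(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        message = func(*args, **kwargs)  # user app returns a plain string
        return {                         # SDK wraps it in a FuncResponse-like dict
            "message": message,
            "cost": tracing.total("cost"),
            "usage": tracing.total("tokens"),
        }
    return wrapper


@entrypoint
def generate(country: str) -> str:
    # Stand-in for an @span-decorated LLM call recording its usage.
    tracing.span_attributes.append({"cost": 0.0021, "tokens": 70})
    return f"names for {country}"
```

With this shape, the user function returns only the LLM output, while cost and usage are attached by the wrapper.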
Let me know what your thoughts are.
@aybruhm Yep, I agree.
> It would be nice if we could keep the output of the LLM applications created with the SDK the same, just adding trace_id. However, I am not sure how the @entrypoint can fetch from the tracing object the cost and number of tokens.
New development: while the SDK has access to the tracing object, it does not have direct access to the cost and number of tokens. This is because the LLM app needs to have finished running before the Tracing SDK can sum the cost and tokens across all of the trace's spans. To address this, we can return the trace_id along with the LLM app's FuncResponse.
However, this approach adds complexity, particularly for the OSS version. In our cloud and enterprise versions, observability is available, so it is feasible to return the trace_id to the frontend, which then retrieves the summed cost and token usage for the run from the backend.
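For illustration, the response could look like the following. The field names and defaults here are assumptions, not the SDK's actual FuncResponse schema; the point is that cost/usage stay optional while trace_id lets the frontend fetch the aggregated numbers later:

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical response shape: cost/usage may be absent (e.g. in OSS),
# while trace_id lets the frontend query aggregated cost/tokens afterwards.
@dataclass
class FuncResponse:
    message: str
    trace_id: Optional[str] = None
    cost: Optional[float] = None
    usage: Optional[dict] = None


response = FuncResponse(message="10 baby names...", trace_id="trace-123")
payload = asdict(response)  # what the app would serialize and return
```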
For the OSS version, we need to find an alternative solution. We should also add documentation suggesting how users can track cost and token usage themselves, just like we do now:
```python
@ag.span(type="llm")
async def llm_call(...):
    response = await client.chat.completions.create(...)
    tracing.set_span_attribute(
        "model_config", {"model": model, "temperature": temperature}
    )
    tokens_usage = response.usage.dict()  # type: ignore
    return {
        "cost": ag.calculate_token_usage(model, tokens_usage),
        "message": response.choices[0].message.content,
        "usage": tokens_usage,
    }
```
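For readers unfamiliar with how a usage dict turns into a cost figure, here is a rough sketch. The price table and helper are purely illustrative assumptions; they are not the SDK's actual calculate_token_usage implementation, and the rates are not guaranteed to match any provider's real pricing:

```python
# Illustrative only: hypothetical per-1K-token prices, NOT real OpenAI rates
# and NOT agenta's calculate_token_usage implementation.
PRICES_PER_1K = {
    "gpt-4": {"prompt_tokens": 0.03, "completion_tokens": 0.06},
}


def estimate_cost(model: str, usage: dict) -> float:
    # Multiply each token count by its per-1K price and sum.
    prices = PRICES_PER_1K[model]
    return sum(usage[kind] * rate / 1000 for kind, rate in prices.items())


usage = {"prompt_tokens": 100, "completion_tokens": 50, "total_tokens": 150}
cost = estimate_cost("gpt-4", usage)  # 100*0.03/1000 + 50*0.06/1000 = 0.006
```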
What are your thoughts, @mmabrouk?