portkey-ai / gateway
A Blazing Fast AI Gateway. Route to 200+ LLMs with 1 fast & friendly API.
Home Page: https://portkey.ai/features/ai-gateway
License: MIT License
Add support for the Anthropic messages route: https://docs.anthropic.com/claude/reference/messages_post
When the content-length header provided by the LLM provider is forwarded in a Node runtime, the hono compress middleware sends the response with a delay and stream-like behaviour. This happens because the compress middleware sends transfer-encoding: chunked, which conflicts with the content-length header present on the same response.
Solution: Delete the content-length header from the response.
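A minimal sketch of the fix as a hono middleware (the middleware name and placement are assumptions, not the repo's actual code):

import { Context, Next } from "hono";

// Rebuild the response without the upstream content-length header so the
// compress middleware can apply transfer-encoding: chunked without a
// conflicting length.
export const stripContentLength = async (c: Context, next: Next) => {
  await next();
  const headers = new Headers(c.res.headers);
  headers.delete("content-length");
  c.res = new Response(c.res.body, {
    status: c.res.status,
    statusText: c.res.statusText,
    headers,
  });
};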
Support Mistral AI: add a new provider, mistral-ai.
Mistral AI API documentation: https://docs.mistral.ai/api/
Support Perplexity AI: add a new provider, perplexity-ai.
Perplexity AI API documentation: https://docs.perplexity.ai/reference
Currently, Rubeus does not support short configs that have just a provider at the top level of the config. It does not pick up the provider details and key from such a config, and instead tries searching the headers.
Sample short config:
{
  "provider": "openai",
  "api_key": "sk****xyz"
}
Expected: Rubeus proxy calls should pick up the provider details from the config if it's passed.
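A hedged sketch of the expected resolution order (the interface and function names are hypothetical):

interface ShortConfig {
  provider?: string;
  api_key?: string;
}

// Prefer provider details from the short config; fall back to the headers
// only when the config does not carry them.
function resolveProvider(config: ShortConfig | undefined, headers: Headers) {
  return {
    provider: config?.provider ?? headers.get("x-rubeus-provider") ?? undefined,
    apiKey: config?.api_key ?? headers.get("authorization") ?? undefined,
  };
}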
Add a GitHub Action to publish the npm package on every release of this repo.
Signatures:
curl {{BASE_URL}}/v1/chat/completions \
-H "Content-Type: application/json" \
-H "x-rubeus-provider: openai" \
-H "Authorization: <openai-key>" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Hello!" }
]
}'
curl '{{BASE_URL}}/v1/completions' \
-H "Content-Type: application/json" \
-H "x-rubeus-provider: openai" \
-H "Authorization: <openai-key>" \
-d '{
"n": 2,
"model": "text-davinci-003",
"top_p": 1,
"prompt": "Write an essay about Indiaaa",
"stream": false,
"max_tokens": 10,
"temperature": 0.5
}'
curl '{{BASE_URL}}/v1/embeddings' \
-H "Content-Type: application/json" \
-H "x-rubeus-provider: openai" \
-H "Authorization: <openai-key>" \
-d '{
"input": ["Hello", "Hello"]
}'
Using OpenAI SDK
from openai import OpenAI
openai_client = OpenAI(
    api_key="<openai-key>",
    base_url="{{BASE_URL_WITH_V1}}",
    default_headers={
        "x-rubeus-provider": "openai"
    }
)
response = openai_client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    model="gpt-3.5-turbo"
)
Example
curl --request POST \
--url {{BASE_URL}}/v1/rerank \
--header 'content-type: application/json' \
--header "Authorization: Bearer $COHERE_API_KEY" \
--header 'x-rubeus-provider: cohere' \
--data '
{
"return_documents": false,
"max_chunks_per_doc": 10,
"model": "rerank-english-v2.0",
"query": "What is the capital of the United States?",
"documents": [
"Carson City is the capital city of the American state of Nevada.",
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
"Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
"Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
]
}
'
Implement a dumb proxy for OpenAI GET and DELETE routes.
This will allow users to keep the URL the same even when they are using different routes like files, moderations, etc. via the OpenAI SDK.
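A rough sketch of such a passthrough (the function name and URL handling are assumptions, not the actual implementation):

import { Context } from "hono";

// Forward GET/DELETE requests unchanged so SDK routes like /v1/files and
// /v1/models keep working through the same gateway URL.
async function passthrough(c: Context, providerBaseUrl: string): Promise<Response> {
  const target = providerBaseUrl + c.req.path; // e.g. https://api.openai.com + /v1/models
  return fetch(target, {
    method: c.req.method,
    headers: c.req.raw.headers,
  });
}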
Implement new configs structure
The new config structure will allow nested targets for each mode. This will make it more flexible and powerful, allowing users to implement very custom scenarios like load balancing between 2 targets that each have their own fallback configured. The examples look like this:
{
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***"
    },
    {
      "provider": "openai",
      "api_key": "sk-***"
    }
  ]
}
{
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***"
    },
    {
      "provider": "openai",
      "api_key": "sk-***"
    }
  ]
}
{
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***"
    },
    {
      "strategy": {
        "mode": "fallback",
        "on_status_codes": [429, 241]
      },
      "targets": [
        {
          "provider": "openai",
          "api_key": "***"
        },
        {
          "provider": "openai",
          "api_key": "***"
        }
      ]
    }
  ]
}
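As a sketch, the nested shape could be described with TypeScript types like these (type names are hypothetical; the field names come from the examples above):

// A strategy controls how a group of targets is traversed.
interface Strategy {
  mode: "loadbalance" | "fallback" | "single";
  on_status_codes?: number[];
}

// A leaf target that points at a concrete provider.
interface ProviderTarget {
  provider: string;
  api_key?: string;
}

// A target is either a leaf provider or a group with its own strategy,
// which is what allows a fallback group inside a loadbalance group.
type Target = ProviderTarget | { strategy: Strategy; targets: Target[] };

interface Config {
  strategy: Strategy;
  targets: Target[];
}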
DEPRECATIONs to be done soon (in the next few weeks):
tools, tool_choice and response_format param support for azure-openai chat completions
We aim to enhance the capabilities of our AI Gateway by integrating Google's PaLM LLM model, commonly known as "PaLM", into the existing infrastructure. This issue serves as a discussion and tracking point for the integration efforts.
Model Integration: Develop a component to integrate Google's PaLM model into the AI Gateway. This component will handle the request/response flow of the model's inference APIs.
Pre-processing and Post-processing: Implement the necessary pre-processing and post-processing steps.
API and Input/Output Design: Define an API for interacting with the PaLM model within the AI Gateway, specifying input parameters and the expected output format.
Documentation: Update the AI Gateway's documentation to include information on how to use the newly integrated PaLM model.
Currently, even for proxy calls with configs, the user needs to send the actual Azure config in the URL itself. We need a generic interface for proxy calls that just accepts the type (chat/text) and takes the provider options from the config itself.
Current proxy URL for Azure: {{host}}/v1/proxy/{{RESOURCE}}.openai.azure.com/openai/deployments/{{DEPLOYMENT_ID}}/chat/completions?api-version={{API_VERSION}}
The problem with this implementation is that when you pass a config with the API call, it overwrites whatever is written in the URL, which causes confusion.
New implementation for config enabled proxy calls:
{{host}}/v1/proxy/chat/completions
{{host}}/v1/proxy/completions
NOTE: The old implementation still applies if users are making simple proxy calls without a config.
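A minimal sketch of registering the new config-driven routes (the stub handler is a placeholder, not the actual implementation):

import { Context, Hono } from "hono";

// Placeholder handler: the real one would read the provider options from
// the attached config and forward the request accordingly.
const proxyHandler = (c: Context) => c.text("not implemented", 501);

const app = new Hono();
app.post("/v1/proxy/chat/completions", proxyHandler);
app.post("/v1/proxy/completions", proxyHandler);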
To make it easier for people, we can rename this repo to gateway.
We could then install it like:
npx @portkey-ai/gateway
We should allow Rubeus to be run locally through npx.
npx rubeus
should work
Currently, Rubeus does not return the x-portkey-last-used-option-index header in the response of config-enabled proxy calls. This results in inconsistent behaviour and also causes problems, as there is no visibility into the last used index.
Rubeus currently returns just [DONE] as the final chunk, but it should also add data: to the start to help client libraries read it. The done event should be:
data: [DONE]\n\n
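A minimal sketch of emitting that terminal event (the helper name is hypothetical):

// SSE client libraries look for the "data: " prefix on every event,
// including the final [DONE] sentinel.
function doneChunk(): Uint8Array {
  return new TextEncoder().encode("data: [DONE]\n\n");
}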
Add a new middleware (hono compress) to the app router that conditionally applies compression based on the runtime. Compression is handled automatically for the lagon and workerd runtimes, but not for other runtimes like Node. The middleware should check that the runtime is neither lagon nor workerd before applying compression, to avoid double compression.
Changes required:
Add a hono compress middleware to the API router.
Conditionally apply compression based on the runtime: compression should not happen in the workerd and lagon environments, as it is handled automatically there (see the sketch below).
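A minimal sketch using hono's getRuntimeKey helper (the exact detection approach is an assumption):

import { Hono } from "hono";
import { compress } from "hono/compress";
import { getRuntimeKey } from "hono/adapter";

const app = new Hono();

// workerd and lagon compress responses themselves; registering the
// middleware there would double-compress, so only apply it elsewhere.
const runtime = getRuntimeKey();
if (runtime !== "workerd" && runtime !== "lagon") {
  app.use("*", compress());
}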
The gateway today only supports POST requests. We want to also support GET requests, to support operations like getModels and getRuns which are available on OpenAI. We could build this generically to ensure all GET requests can be routed and handled in middleware.
Refactor if/else-if conditions to a switch statement to make it more readable.
tryPost calls are added in the fallback/loadbalance/single mode handlers. As a recursive function, it should keep travelling through targets until it reaches a leaf target (a target without any nested/child targets), so only one block in the whole function should make the tryPost call. This will make it a bit simpler and truly recursive.
Currently, the gateway uses c.env in multiple places to read environment variables. But this syntax is only supported on the workerd (Cloudflare Workers) runtime. To extend it to the Node runtime as well, the env function from hono/adapter needs to be used.
Example Usage:
import { Context } from "hono";
import { env } from "hono/adapter";

function sampleFunction(c: Context) {
  const envVarOne = env(c).ENV_VAR_ONE;
}
Documentation for hono adapter helpers: https://hono.dev/helpers/adapter#env
The newer version of the Anthropic APIs requires the header "anthropic-version": "2023-06-01" to be sent with requests. Their SDKs handle this automatically, but for direct API calls the header needs to be added. Hence this change.
Refer: Anthropic Docs
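A minimal sketch of attaching the header before forwarding a request (the helper name is hypothetical):

// Ensure every request forwarded to Anthropic carries the required
// version header, without clobbering a caller-supplied value.
function withAnthropicVersion(headers: Record<string, string>): Record<string, string> {
  return { "anthropic-version": "2023-06-01", ...headers };
}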
Add a new provider, google, that will have support for the new Gemini APIs: embedContent and generateContent.
API Documentation for Gemini: https://ai.google.dev/api
Currently, whenever a proxy call happens, we do the snake case to camel case conversion twice. Due to this, ignoreFields also get converted to camelCase the second time. We need to keep only one conversion.
The chat complete config for together-ai has a bug in it: the input param top_k currently maps to top_p. This needs to be fixed.
Current:
top_k: {
param: "top_p"
}
Fix:
top_k: {
param: "top_k"
}
Based on their commits, the API is: https://route.withmartian.com/api/openai/v1
openai/openai-python@main...withmartian:martian-python-v1:main
The PaLM APIs were missing the mappings for max_tokens and sequences, and as a result these params were not being picked up.
The chat complete config for perplexity-ai has a bug in it: the input param top_k currently maps to top_p. This needs to be fixed.
Current:
top_k: {
param: "top_p",
min: 0,
max: 2048
}
Fix:
top_k: {
param: "top_k",
min: 0,
max: 2048
}
The Anthropic messages route sends these types of events: ping, message_start, content_block_start, content_block_delta, content_block_stop, message_delta and message_stop. Of these, content_block_delta and message_delta carry the details required to build the standardized data. However, there is a bug in the replace logic used to remove the event and data tags from a chunk before JSON parsing.
Current (in src/providers/anthropic/chatComplete.ts):
chunk = chunk.replace(/^event: completion[\r\n]*/, "");
Because of this, the actual content_block_delta and message_delta events do not get sanitized, which breaks sending back the standardized stream chunk.
Solution:
chunk = chunk.replace(/^event: content_block_delta[\r\n]*/, "");
chunk = chunk.replace(/^event: message_delta[\r\n]*/, "");
This will replace the event tags for the required events and allow the rest of the chunk to be parsed successfully as JSON.
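A hedged sketch combining the two replacements with the data-tag removal before parsing (the helper name is hypothetical):

// Strip the event tag for the content-bearing events and the data prefix,
// then parse the remaining payload as JSON.
function parseAnthropicChunk(chunk: string): unknown {
  const sanitized = chunk
    .replace(/^event: content_block_delta[\r\n]*/, "")
    .replace(/^event: message_delta[\r\n]*/, "")
    .replace(/^data: /, "");
  return JSON.parse(sanitized);
}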
Rubeus currently picks up the stream param from the transformedBody. But Google does not have a stream param, as it supports streaming through a separate endpoint.
Solution: pick up stream from the Rubeus body instead of the transformed body.
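A hedged sketch of the intended lookup (names hypothetical, not the repo's actual code):

// Decide streaming from the original gateway request body; the transformed
// Google-format body never carries a stream param, since Google streams
// via a separate endpoint.
function shouldStream(originalBody: { stream?: boolean }): boolean {
  return Boolean(originalBody.stream);
}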
Code references (commit 1ccb963):
Line 2
gateway/src/handlers/proxyGetHandler.ts, Line 73
gateway/src/handlers/handlerUtils.ts, Line 246
gateway/src/handlers/proxyHandler.ts, Line 173
Docs reference: https://docs.portkey.ai/docs/api-reference/prompts/prompt-completion
Anyscale launched an OpenAI-compatible API for open-source model hosting.
https://app.endpoints.anyscale.com/
This allows us to support the following models:
Add a Dockerfile and an example docker-compose.yml file to build the official docker image. The same setup will be used to continuously update the docker image on every release.
The proxy calls are not sending the cacheKey to the downstream services. This PR addresses the issue by sending the cacheKey for processing to the downstream services (if any).
tools, tool_choice and response_format param support for anyscale chat completions
Add a docker image publish GitHub Action that pushes the latest official docker image on every release of this repo. The action should use buildx to build a multi-platform docker image, and it should publish 2 tags on each release: portkeyai/gateway:latest and portkeyai/gateway:<latest-release-version> (example: portkeyai/gateway:1.0.1).
The prettier package should be part of devDependencies, not dependencies.