portkey-ai / gateway
A Blazing Fast AI Gateway. Route to 200+ LLMs with 1 fast & friendly API.
Home Page: https://portkey.ai/features/ai-gateway
License: MIT License
Add support for the Anthropic messages route: https://docs.anthropic.com/claude/reference/messages_post
When the content-length header provided by the LLM provider is forwarded in a Node runtime, the hono compress middleware sends the response with a delay and stream-like behaviour. This happens because the compress middleware sends transfer-encoding: chunked, which conflicts with the content-length header present on the same response.
Solution: Delete the content-length header from the response.
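A minimal sketch of the fix as a hono middleware (the middleware name and placement are assumptions, not the repo's actual code):

import { Context, Next } from "hono";

// Rebuild the response without the upstream content-length header so the
// compress middleware can apply transfer-encoding: chunked without a
// conflicting length.
export const stripContentLength = async (c: Context, next: Next) => {
  await next();
  const headers = new Headers(c.res.headers);
  headers.delete("content-length");
  c.res = new Response(c.res.body, {
    status: c.res.status,
    statusText: c.res.statusText,
    headers,
  });
};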
Support Mistral AI: add a new provider, mistral-ai.
Mistral AI API documentation: https://docs.mistral.ai/api/
Support Perplexity AI: add a new provider, perplexity-ai.
Perplexity AI API documentation: https://docs.perplexity.ai/reference
Currently, Rubeus does not support short configs that have just a provider at the top level of the config. It does not pick up the provider details and key from such a config, and instead tries searching the headers.
Sample short config:
{
  "provider": "openai",
  "api_key": "sk****xyz"
}
Expected: Rubeus proxy calls should pick up the provider details from the config if it's passed.
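A hedged sketch of the expected resolution order (the interface and function names are hypothetical):

interface ShortConfig {
  provider?: string;
  api_key?: string;
}

// Prefer provider details from the short config; fall back to the headers
// only when the config does not carry them.
function resolveProvider(config: ShortConfig | undefined, headers: Headers) {
  return {
    provider: config?.provider ?? headers.get("x-rubeus-provider") ?? undefined,
    apiKey: config?.api_key ?? headers.get("authorization") ?? undefined,
  };
}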
Add a GitHub Action to publish the npm package on every release of this repo.
Signatures:
curl {{BASE_URL}}/v1/chat/completions \
-H "Content-Type: application/json" \
-H "x-rubeus-provider: openai" \
-H "Authorization: <openai-key>" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Hello!" }
]
}'
curl '{{BASE_URL}}/v1/completions' \
-H "Content-Type: application/json" \
-H "x-rubeus-provider: openai" \
-H "Authorization: <openai-key>" \
-d '{
"n": 2,
"model": "text-davinci-003",
"top_p": 1,
"prompt": "Write an essay about Indiaaa",
"stream": false,
"max_tokens": 10,
"temperature": 0.5
}'
curl '{{BASE_URL}}/v1/embeddings' \
-H "Content-Type: application/json" \
-H "x-rubeus-provider: openai" \
-H "Authorization: <openai-key>" \
-d '{
"input": ["Hello", "Hello"]
}'
Using OpenAI SDK
from openai import OpenAI
openai_client = OpenAI(
    api_key="<openai-key>",
    base_url="{{BASE_URL_WITH_V1}}",
    default_headers={
        "x-rubeus-provider": "openai"
    }
)
response = openai_client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    model="gpt-3.5-turbo"
)
Example
curl --request POST \
--url {{BASE_URL}}/v1/rerank \
--header 'content-type: application/json' \
--header "Authorization: Bearer $COHERE_API_KEY" \
--header 'x-rubeus-provider: cohere' \
--data '
{
"return_documents": false,
"max_chunks_per_doc": 10,
"model": "rerank-english-v2.0",
"query": "What is the capital of the United States?",
"documents": [
"Carson City is the capital city of the American state of Nevada.",
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
"Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
"Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
]
}
'
Implement a dumb proxy for OpenAI GET and DELETE routes.
This will allow users to keep the URL the same even when they are using different routes like files, moderations, etc. via the OpenAI SDK.
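A rough sketch of such a passthrough (the function name and URL handling are assumptions, not the actual implementation):

import { Context } from "hono";

// Forward GET/DELETE requests unchanged so SDK routes like /v1/files and
// /v1/models keep working through the same gateway URL.
async function passthrough(c: Context, providerBaseUrl: string): Promise<Response> {
  const target = providerBaseUrl + c.req.path; // e.g. https://api.openai.com + /v1/models
  return fetch(target, {
    method: c.req.method,
    headers: c.req.raw.headers,
  });
}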
Implement new configs structure
The new config structure will allow nested targets for each mode. This will make it more flexible and powerful, allowing users to implement very custom scenarios like load balancing between 2 targets that each have their own fallback configured. The examples look like this:
{
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***"
    },
    {
      "provider": "openai",
      "api_key": "sk-***"
    }
  ]
}
{
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***"
    },
    {
      "provider": "openai",
      "api_key": "sk-***"
    }
  ]
}
{
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***"
    },
    {
      "strategy": {
        "mode": "fallback",
        "on_status_codes": [429, 241]
      },
      "targets": [
        {
          "provider": "openai",
          "api_key": "***"
        },
        {
          "provider": "openai",
          "api_key": "***"
        }
      ]
    }
  ]
}
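As a sketch, the nested shape could be described with TypeScript types like these (type names are hypothetical; the field names come from the examples above):

// A strategy controls how a group of targets is traversed.
interface Strategy {
  mode: "loadbalance" | "fallback" | "single";
  on_status_codes?: number[];
}

// A leaf target that points at a concrete provider.
interface ProviderTarget {
  provider: string;
  api_key?: string;
}

// A target is either a leaf provider or a group with its own strategy,
// which is what allows a fallback group inside a loadbalance group.
type Target = ProviderTarget | { strategy: Strategy; targets: Target[] };

interface Config {
  strategy: Strategy;
  targets: Target[];
}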
DEPRECATIONs to be done soon (in the next few weeks):
tools, tool_choice and response_format param support for azure-openai chat completions
We aim to enhance the capabilities of our AI Gateway by integrating Google's PaLM LLM model, commonly known as "PaLM", into the existing infrastructure. This issue serves as a discussion and tracking point for the integration efforts.
Model Integration: Develop a component to integrate Google's PaLM model into the AI Gateway. This component will handle the request/response flow of the model's inference APIs.
Pre-processing and Post-processing: Implement the necessary pre-processing and post-processing steps.
API and Input/Output Design: Define an API for interacting with the PaLM model within the AI Gateway, specifying input parameters and the expected output format.
Documentation: Update the AI Gateway's documentation to include information on how to use the newly integrated PaLM model.
Currently, even for proxy calls with configs, the user needs to send the actual Azure config in the URL itself. We need a generic interface for proxy calls that just accepts the type (chat/text) and takes the provider options from the config itself.
Current proxy URL for Azure: {{host}}/v1/proxy/{{RESOURCE}}.openai.azure.com/openai/deployments/{{DEPLOYMENT_ID}}/chat/completions?api-version={{API_VERSION}}
The problem with this implementation is that when you pass a config with the API call, it overwrites whatever is written in the URL, which causes confusion.
New implementation for config enabled proxy calls:
{{host}}/v1/proxy/chat/completions
{{host}}/v1/proxy/completions
NOTE: The old implementation still applies if users are making simple proxy calls without a config.
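A minimal sketch of registering the new config-driven routes (the stub handler is a placeholder, not the actual implementation):

import { Context, Hono } from "hono";

// Placeholder handler: the real one would read the provider options from
// the attached config and forward the request accordingly.
const proxyHandler = (c: Context) => c.text("not implemented", 501);

const app = new Hono();
app.post("/v1/proxy/chat/completions", proxyHandler);
app.post("/v1/proxy/completions", proxyHandler);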
To make it easier for people, we can rename this repo to gateway.
We could then install it like:
npx @portkey-ai/gateway
We should allow Rubeus to be run locally through npx.
npx rubeus
should work
Currently, Rubeus does not return the x-portkey-last-used-option-index header in the response of config-enabled proxy calls. This results in inconsistent behaviour and also causes problems, as there is no visibility into the last used index.
Rubeus currently returns just [DONE] as the final chunk, but it should also add data: to the start to help client libraries read it. The done event should be:
data: [DONE]\n\n
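A minimal sketch of emitting that terminal event (the helper name is hypothetical):

// SSE client libraries look for the "data: " prefix on every event,
// including the final [DONE] sentinel.
function doneChunk(): Uint8Array {
  return new TextEncoder().encode("data: [DONE]\n\n");
}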
Add a new middleware (hono compress) to the app router that conditionally applies compression based on the runtime. Compression is handled automatically for the lagon and workerd runtimes, but not for other runtimes like Node. The middleware should check that the runtime is neither lagon nor workerd before applying compression, to avoid double compression.
Changes required:
Add a hono compress middleware to the API router.
Conditionally apply compression based on the runtime: compression should not happen in the workerd and lagon environments, as it is handled automatically there (see the sketch below).
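A minimal sketch using hono's getRuntimeKey helper (the exact detection approach is an assumption):

import { Hono } from "hono";
import { compress } from "hono/compress";
import { getRuntimeKey } from "hono/adapter";

const app = new Hono();

// workerd and lagon compress responses themselves; registering the
// middleware there would double-compress, so only apply it elsewhere.
const runtime = getRuntimeKey();
if (runtime !== "workerd" && runtime !== "lagon") {
  app.use("*", compress());
}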
The gateway today only supports POST requests. We want to also support GET requests, to support operations like getModels and getRuns which are available on OpenAI. We could build this generically to ensure all GET requests can be routed and handled in middleware.
Refactor if/else-if conditions to a switch statement to make it more readable.
tryPost calls are added in the fallback/loadbalance/single mode handlers. As a recursive function, it should keep travelling through targets until it reaches a leaf target (a target without any nested/child targets), so only one block in the whole function should make the tryPost call. This will make it a bit simpler and truly recursive.
Currently, the gateway uses c.env in multiple places to read environment variables. But this syntax is only supported on the workerd (Cloudflare Workers) runtime. To extend it to the Node runtime as well, the env function from hono/adapter needs to be used.
Example Usage:
import { Context } from "hono";
import { env } from "hono/adapter";

function sampleFunction(c: Context) {
  const envVarOne = env(c).ENV_VAR_ONE;
}
Documentation for hono adapter helpers: https://hono.dev/helpers/adapter#env
The newer version of the Anthropic APIs requires the header "anthropic-version": "2023-06-01" to be sent with requests. Their SDKs handle this automatically, but for direct API calls the header needs to be added. Hence this change.
Refer: Anthropic Docs
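A minimal sketch of attaching the header before forwarding a request (the helper name is hypothetical):

// Ensure every request forwarded to Anthropic carries the required
// version header, without clobbering a caller-supplied value.
function withAnthropicVersion(headers: Record<string, string>): Record<string, string> {
  return { "anthropic-version": "2023-06-01", ...headers };
}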
Add a new provider, google, that will have support for the new Gemini APIs: embedContent and generateContent.
API Documentation for Gemini: https://ai.google.dev/api
Currently, whenever a proxy call happens, we do the snake case to camel case conversion twice. Due to this, ignoreFields also get converted to camelCase the second time. We need to keep only one conversion.
The chat complete config for together-ai has a bug in it: the input param top_k currently maps to top_p. This needs to be fixed.
Current:
top_k: {
param: "top_p"
}
Fix:
top_k: {
param: "top_k"
}
Based on their commits, the API is: https://route.withmartian.com/api/openai/v1
openai/openai-python@main...withmartian:martian-python-v1:main
The PaLM APIs were missing the mappings for max_tokens and sequences, and as a result these params were not being picked up.
The chat complete config for perplexity-ai has a bug in it: the input param top_k currently maps to top_p. This needs to be fixed.
Current:
top_k: {
param: "top_p",
min: 0,
max: 2048
}
Fix:
top_k: {
param: "top_k",
min: 0,
max: 2048
}
The Anthropic messages route sends these types of events: ping, message_start, content_block_start, content_block_delta, content_block_stop, message_delta and message_stop. Of these, content_block_delta and message_delta carry the details required to build the standardized data. However, there is a bug in the replace logic used to remove the event and data tags from a chunk before JSON parsing.
Current (in src/providers/anthropic/chatComplete.ts):
chunk = chunk.replace(/^event: completion[\r\n]*/, "");
Because of this, the actual content_block_delta and message_delta events do not get sanitized, which breaks sending back the standardized stream chunk.
Solution:
chunk = chunk.replace(/^event: content_block_delta[\r\n]*/, "");
chunk = chunk.replace(/^event: message_delta[\r\n]*/, "");
This will replace the event tags for the required events and allow the rest of the chunk to be parsed successfully as JSON.
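A hedged sketch combining the two replacements with the data-tag removal before parsing (the helper name is hypothetical):

// Strip the event tag for the content-bearing events and the data prefix,
// then parse the remaining payload as JSON.
function parseAnthropicChunk(chunk: string): unknown {
  const sanitized = chunk
    .replace(/^event: content_block_delta[\r\n]*/, "")
    .replace(/^event: message_delta[\r\n]*/, "")
    .replace(/^data: /, "");
  return JSON.parse(sanitized);
}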
Rubeus currently picks up the stream param from the transformedBody. But Google does not have a stream param, as it supports streaming through a separate endpoint.
Solution: pick up stream from the Rubeus body instead of the transformed body.
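A hedged sketch of the intended lookup (names hypothetical, not the repo's actual code):

// Decide streaming from the original gateway request body; the transformed
// Google-format body never carries a stream param, since Google streams
// via a separate endpoint.
function shouldStream(originalBody: { stream?: boolean }): boolean {
  return Boolean(originalBody.stream);
}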
Code references (commit 1ccb963):
Line 2
gateway/src/handlers/proxyGetHandler.ts, Line 73
gateway/src/handlers/handlerUtils.ts, Line 246
gateway/src/handlers/proxyHandler.ts, Line 173
Docs reference: https://docs.portkey.ai/docs/api-reference/prompts/prompt-completion
Anyscale launched an OpenAI-compatible API for open-source model hosting.
https://app.endpoints.anyscale.com/
This allows us to support the following models:
Add a Dockerfile and an example docker-compose.yml file to build the official docker image. The same setup will be used to continuously update the docker image on every release.
The proxy calls are not sending the cacheKey to the downstream services. This PR addresses the issue by sending the cacheKey for processing to the downstream services (if any).
tools, tool_choice and response_format param support for anyscale chat completions
Add a docker image publish GitHub Action that pushes the latest official docker image on every release of this repo. The action should use buildx to build a multi-platform docker image, and it should publish 2 tags on each release: portkeyai/gateway:latest and portkeyai/gateway:<latest-release-version> (example: portkeyai/gateway:1.0.1).
The prettier package should be part of devDependencies, not dependencies.