portkey-ai / gateway

A Blazing Fast AI Gateway. Route to 100+ LLMs with 1 fast & friendly API.

Home Page: https://portkey.ai/features/ai-gateway

License: MIT License

TypeScript 99.65% JavaScript 0.20% Dockerfile 0.14%
gateway generative-ai llmops llms prompt-engineering ai-gateway langchain llama-index openai router

gateway's People

Contributors

aashsach, aravindputrevu, ayush-portkey, csgulati09, eltociear, flexchar, meronogbai, michaelyuhe, noble-varghese, roh26it, saif-shines, satvik314, sk-portkey, suraj-bhandarkar-s, visargd, vrushankportkey, ye4293


gateway's Issues

feat: Palm integration

Description

We aim to enhance the capabilities of our AI Gateway by integrating Google's PaLM LLM into the existing infrastructure. This issue serves as a discussion and tracking point for the integration efforts.

Integration Tasks:

  • Model Integration: Develop a component to integrate Google's PaLM model into the AI Gateway. This component will handle the request/response flow of the model's inference APIs.

  • Pre-processing and Post-processing: Implement necessary pre-processing and post-processing steps.

  • API and Input/Output Design: Define an API for interacting with the PaLM model within the AI Gateway, specifying input parameters and the expected output format.

  • Documentation: Update the AI Gateway's documentation to include information on how to use the newly integrated PaLM model.

Streamline cache function input

  • Cache function input arguments differ slightly between the /proxy routes and the /complete and /chatComplete routes. We need to streamline them so the input stays consistent.
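One way to streamline this is a single input shape shared by both code paths. A hypothetical sketch (the interface and field names are illustrative, not the actual gateway types):

```typescript
// Hypothetical unified cache-function input, used by both the /proxy
// and /complete//chatComplete handlers. Names are illustrative only.
interface CacheFunctionInput {
  requestBody: Record<string, unknown>; // parsed JSON request body
  url: string;                          // final provider URL
  cacheMode: "simple" | "semantic";
  maxAge?: number;                      // optional TTL in seconds
}

// With one input shape, the key derivation is identical regardless of
// which route produced the input.
function buildCacheKey(input: CacheFunctionInput): string {
  return `${input.cacheMode}:${input.url}:${JSON.stringify(input.requestBody)}`;
}
```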

Add hono compress middleware for non-workerd and non-lagon runtimes

Add a new middleware (hono compress) to the app router that conditionally applies compression based on the runtime. Compression is handled automatically for the lagon and workerd runtimes, but not for other runtimes like Node.js. The middleware should check that the runtime is neither lagon nor workerd before applying compression, to avoid double compression.

Changes required:

Add a hono compress middleware to the API router.
Conditionally apply compression based on the runtime; compression should not happen for the workerd and lagon environments, since they handle it automatically.
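The runtime check itself can be kept as a small pure predicate. A minimal sketch, assuming hono's compress middleware and getRuntimeKey helper are wired up as shown in the comments:

```typescript
// Minimal sketch. In the app, this would be combined with hono's
// helpers, roughly:
//
//   import { compress } from "hono/compress";
//   import { getRuntimeKey } from "hono/adapter";
//   if (shouldCompress(getRuntimeKey())) app.use("*", compress());

// Pure decision: compress only on runtimes that do not already
// compress responses themselves.
export const shouldCompress = (runtime: string): boolean =>
  runtime !== "workerd" && runtime !== "lagon";
```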

fix: adding cachekey in proxy APIs

Description

Proxy calls are not sending the cacheKey to the downstream services. This PR addresses the issue by sending the cacheKey for processing to the downstream services (if any).

Fix google stream field

Rubeus currently picks up the stream param from the transformedBody. But Google does not have a stream param, as it supports streaming through a different endpoint.

Solution: pick up stream from the Rubeus body instead of the transformed body.

Add docker image publish github action

Add a Docker image publish GitHub action that pushes the latest official Docker image on every release of this repo. The action should use buildx to build a multi-platform Docker image, and it should publish 2 tags on each release: portkeyai/gateway:latest and portkeyai/gateway:<latest-release-version> (for example, portkeyai/gateway:1.0.1).

Fix: compress middleware conflict for node runtime due to response content-length header

When the content-length header provided by the LLM provider is forwarded in a Node runtime, the hono compress middleware sends the response with a delay and stream-like behaviour. This happens because the compress middleware runs into a conflict when sending transfer-encoding: chunked while the response still carries a content-length header.

Solution: Delete the content-length header from the response.
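The fix can be sketched as a small helper that copies the headers minus content-length. This sketch uses a plain header map for illustration; the real code would operate on the response's Headers object:

```typescript
// Copy all headers except content-length (case-insensitive), so the
// compress middleware can freely use transfer-encoding: chunked.
function stripContentLength(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() !== "content-length") out[key] = value;
  }
  return out;
}
```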

(Before/after screenshots attached to the issue.)

Fix: perplexity-ai top_k param mapping

The chat complete config for perplexity-ai has a bug: the input param top_k currently maps to top_p. This needs to be fixed.

Current:

top_k: {
    param: "top_p",
    min: 0,
    max: 2048
  }

Fix:

top_k: {
    param: "top_k",
    min: 0,
    max: 2048
  }

Allow NPX execution

We should allow rubeus to be run locally through NPX.

npx rubeus

should work

Support GET requests on the gateway

The gateway today only supports POST requests. We also want to support GET requests, to enable operations like getModels and getRuns which are available on OpenAI.

We could build this generically to ensure all GET requests can be routed and handled in middleware.
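A generic pass-through could be as simple as keeping the incoming path intact and only swapping the host, so one rule covers /v1/models, /v1/threads, and the rest. A rough sketch (function name hypothetical):

```typescript
// Hypothetical generic GET pass-through: preserve the OpenAI-style
// path and prepend the provider base URL, instead of defining one
// route handler per endpoint.
function toProviderUrl(
  gatewayPath: string,
  providerBase: string = "https://api.openai.com"
): string {
  return providerBase + gatewayPath;
}
```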

Fix and refactor tryTargetsRecursively function

  • Convert all the if/else-if conditions to a switch statement to make it more readable.
  • Remove all tryPost calls added in the fallback/loadbalance/single mode handlers. As a recursive function, it should keep travelling through targets until it reaches a leaf target (a target without any nested/child targets), so only one block in the whole function should make the tryPost call. This will make it simpler and truly recursive.
  • Fix the issue where inherited config was not getting picked up for fallback or loadbalance nested targets. For the first iteration, inheritedConfig is passed as an empty object, so retry and override_params set at the top level do not trickle down to child targets.
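The intended shape can be sketched as follows (all names hypothetical, not the real gateway code): a switch instead of if/else-if chains, only the leaf branch reaching the tryPost call site, and inheritedConfig threaded through every level so top-level retry/override_params reach child targets.

```typescript
type Target = {
  strategy?: { mode: "fallback" | "loadbalance" | "single" };
  targets?: Target[];
  retry?: number;
  override_params?: Record<string, unknown>;
};

// Recurse through nested targets, merging the inherited config at each
// level; only leaf targets (the default branch) would call tryPost.
function collectLeafConfigs(
  target: Target,
  inheritedConfig: Partial<Target> = {}
): Partial<Target>[] {
  const merged = { ...inheritedConfig, retry: target.retry ?? inheritedConfig.retry };
  switch (target.strategy?.mode) {
    case "fallback":
    case "loadbalance":
    case "single":
      // non-leaf: keep travelling down, passing the merged config along
      return (target.targets || []).reduce<Partial<Target>[]>(
        (acc, t) => acc.concat(collectLeafConfigs(t, merged)),
        []
      );
    default:
      // leaf target: the single place where tryPost would be invoked
      return [merged];
  }
}
```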

Fix: together-ai top_k param mapping

The chat complete config for together-ai has a bug: the input param top_k currently maps to top_p. This needs to be fixed.

Current:

top_k: {
    param: "top_p"
  }

Fix:

top_k: {
    param: "top_k"
  }

Change c.env references to hono adapter env function

Currently the gateway uses c.env in multiple places to read environment variables. But this syntax is only supported on the workerd (Cloudflare Workers) runtime. To extend it to the Node runtime as well, the env function from hono/adapter needs to be used.

Example Usage:

import { env } from "hono/adapter";

function sampleFunction(c: Context) {
    const envVarOne = env(c).ENV_VAR_ONE
}

Documentation for hono adapter helpers: https://hono.dev/helpers/adapter#env

fix: adding anthropic version header

Description

The newer version of the Anthropic APIs requires an "anthropic-version": "2023-06-01" header on every request. Their SDKs add it automatically, but for direct API calls it needs to be added explicitly. Hence this change.

Refer: Anthropic Docs

Fix: anthropic messages stream event parsing

Anthropic's messages route sends these event types: ping, message_start, content_block_start, content_block_delta, content_block_stop, message_delta and message_stop. Of these, content_block_delta and message_delta carry the details needed to extract the standardized data. However, there is a bug in the replace logic used to strip the event and data tags from a chunk before JSON parsing.

Current (in src/providers/anthropic/chatComplete.ts):
chunk = chunk.replace(/^event: completion[\r\n]*/, "");
Because of this, the actual content_block_delta and message_delta events do not get sanitized, which causes issues in sending back the standardized stream chunk.

Solution:

chunk = chunk.replace(/^event: content_block_delta[\r\n]*/, "");
chunk = chunk.replace(/^event: message_delta[\r\n]*/, "");

This will replace the event tags for the required events and successfully parse the rest of the chunk as JSON.
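The same sanitization can be sketched as one small helper that strips any SSE event tag (a generalization of the per-event replaces; the function name is hypothetical):

```typescript
// Strip a leading "event: <name>" line and a leading "data: " prefix
// from an SSE chunk so the remainder can be passed to JSON.parse.
const sanitizeSseChunk = (chunk: string): string =>
  chunk
    .replace(/^event: [a-z_]+[\r\n]*/, "")
    .replace(/^data: /, "");
```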

Rename to gateway

To make it easier for people, we can rename this repo to gateway.

We could then install it like

npx @portkey-ai/gateway

Support generic /chat/completions and /completions route for azure proxy + configs call

Currently, even for proxy calls with configs, the user needs to send the actual Azure config in the URL itself. But when a config is passed with the proxy call, it overwrites whatever is written in the URL, which causes confusion. We need a generic interface for config-enabled proxy calls that just accepts the type (chat/text) and takes the provider options from the config itself.

Current proxy URL for Azure: {{host}}/v1/proxy/{{RESOURCE}}.openai.azure.com/openai/deployments/{{DEPLOYMENT_ID}}/chat/completions?api-version={{API_VERSION}}

New implementation for config enabled proxy calls:
{{host}}/v1/proxy/chat/completions
{{host}}/v1/proxy/completions

NOTE: The old implementation is still applicable if users are making simple proxy calls without config.
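Under the new scheme, the Azure URL would be assembled from config fields rather than parsed out of the request path. A minimal sketch (the type and field names are hypothetical):

```typescript
// Hypothetical config shape for a config-enabled Azure proxy call.
interface AzureConfig {
  resource: string;
  deploymentId: string;
  apiVersion: string;
}

// Build the provider URL from config, given only the generic route kind.
function azureUrl(kind: "chat/completions" | "completions", cfg: AzureConfig): string {
  return (
    `https://${cfg.resource}.openai.azure.com/openai/deployments/` +
    `${cfg.deploymentId}/${kind}?api-version=${cfg.apiVersion}`
  );
}
```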

Implement new routes and config structure

  • Introduce 3 new routes: /v1/chat/completions, /v1/completions and /v1/embeddings.
    All 3 routes will follow the OpenAI request and response structure and provide a unified interface across all supported providers. Because the API signatures are the same, a rubeus deployment can be used as a REST API, through the OpenAI SDK, or through the Portkey SDK.

Signatures:

  1. Chat completions
    Using REST API
curl {{BASE_URL}}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-rubeus-provider: openai" \
  -H "Authorization: <openai-key>" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Hello!" }
    ]
  }'
  2. Completions
curl '{{BASE_URL}}/v1/completions' \
  -H "Content-Type: application/json" \
  -H "x-rubeus-provider: openai" \
  -H "Authorization: <openai-key>" \
  -d '{
    "n": 2,
    "model": "text-davinci-003",
    "top_p": 1,
    "prompt": "Write an essay about Indiaaa",
    "stream": false,
    "max_tokens": 10,
    "temperature": 0.5
}'
  3. Embeddings
curl '{{BASE_URL}}/v1/embeddings' \
  -H "Content-Type: application/json" \
  -H "x-rubeus-provider: openai" \
  -H "Authorization: <openai-key>" \
  -d '{
    "input": ["Hello", "Hello"]
}'

Using OpenAI SDK

from openai import OpenAI

openai_client = OpenAI(
    base_url={{BASE_URL_WITH_V1}},
    default_headers={
        'x-rubeus-provider': 'openai'
    }
)
response = openai_client.chat.completions.create(
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Hello!'}],
        model='gpt-3.5-turbo'
)
  • Change /v1/proxy/* to /v1/*
    Introduce a new route handler that allows making proxy calls (without unified req/res interface). The old route /v1/proxy will be deprecated and /v1/* route will replace it.

Example

  1. Rerank
curl --request POST \
     --url {{BASE_URL}}/v1/rerank \
     --header 'content-type: application/json' \
     --header "Authorization: Bearer $COHERE_API_KEY" \
     --header 'x-rubeus-provider: cohere' \
     --data '
{
  "return_documents": false,
  "max_chunks_per_doc": 10,
  "model": "rerank-english-v2.0",
  "query": "What is the capital of the United States?",
  "documents": [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
    "Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
  ]
}
'
  • Implement dumb proxy for openAI GET and DELETE routes
    This will allow users to keep the URL the same even when they are using different routes like files, moderations, etc. through the openAI SDK.

  • Implement new configs structure
    The new config structure will allow nested targets for each mode. This makes it more flexible and powerful, allowing users to implement very custom scenarios like load balancing between 2 targets, each of which has its own fallback configured. The examples look like this:

  1. Simple loadbalance
{
  "strategy": {
      "mode": "loadbalance"
    },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***"
    },
    {
      "provider": "openai",
      "api_key": "sk-***"
    }
  ]
}
  2. Simple fallback
{
  "strategy": {
      "mode": "fallback"
    },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***"
    },
    {
      "provider": "openai",
      "api_key": "sk-***"
    }
  ]
}
  3. Loadbalance with nested fallback
{
  "strategy": {
      "mode": "loadbalance"
    },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***"
    },
    {
      "strategy": {
          "mode": "fallback",
          "on_status_codes": [429, 241]
        },
      "targets": [
        {
          "provider": "openai",
          "api_key": "***"
        },
        {
          "provider": "openai"
          "api_key": "***"
        }
      ]
    }
  ]
}

Deprecations to be done soon (in the next few weeks):

  • We will remove the following routes: /v1/chatComplete, /v1/complete, /v1/embed and /v1/proxy/*
  • We will also remove handler for the old config structure.

Remove "Rubeus" mentions

Support short config with provider in proxy calls

Currently, Rubeus does not support short configs that just have a provider at the top level of the config. It does not pick up the provider details and key from the config, and instead tries searching the headers.

Sample short config

{
    "provider": "openai",
    "api_key": "sk****xyz
}

Expected: Rubeus proxy calls should pick up the provider details from the config if it is passed.
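The expected lookup order can be sketched as a tiny resolver (names hypothetical; the x-rubeus-provider header from the examples elsewhere in this doc is used for illustration):

```typescript
// Hypothetical resolver: prefer provider/api_key from a short config,
// and only fall back to request headers when the config omits them.
interface ShortConfig {
  provider?: string;
  api_key?: string;
}

function resolveProvider(
  config: ShortConfig | undefined,
  headers: Record<string, string>
): { provider?: string; apiKey?: string } {
  return {
    provider: config?.provider ?? headers["x-rubeus-provider"],
    apiKey: config?.api_key ?? headers["authorization"],
  };
}
```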

Support together-ai chat completions route

  • Currently, Rubeus makes internal transformations for together-ai chat completions and issues a corresponding completions call. But together-ai has now introduced a chat completions route, so the transformations can be removed and rubeus can use the route directly.

Add Gemini support

  • Add a new provider named google that will have support for new gemini APIs: embedContent and generateContent.
  • Add support for generateContent route that will map to chat completions route of rubeus.
  • Add support for embedContent route that will map to embeddings route of rubeus.
  • Handle streaming for generateContent route.

API Documentation for Gemini: https://ai.google.dev/api
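For the chat-completions mapping, OpenAI-style messages would need to be transformed into Gemini "contents". A rough sketch, based on the documented generateContent body shape (role names user/model and parts[{text}] come from the Gemini API; the helper itself is hypothetical):

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Map OpenAI-style chat messages to Gemini generateContent "contents".
// System messages are filtered out here, assuming they are delivered to
// Gemini through its separate system-instruction mechanism.
function toGeminiContents(messages: ChatMessage[]) {
  return messages
    .filter((m) => m.role !== "system")
    .map((m) => ({
      role: m.role === "assistant" ? "model" : "user",
      parts: [{ text: m.content }],
    }));
}
```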

dockerize gateway

Add a Dockerfile and an example docker-compose.yml file to build the official Docker image. The same setup will be used to continuously update the Docker image on every release.
