
OpenAI-Compatible RESTful APIs for Amazon Bedrock

License: MIT No Attribution

bedrock genai openai openai-api openai-proxy proxy

bedrock-access-gateway's Introduction

中文

Bedrock Access Gateway

OpenAI-compatible RESTful APIs for Amazon Bedrock

Breaking Changes

The source code has been refactored to use the new Bedrock Converse API, which provides native support for tool calls.

If you are facing any problems, please raise an issue.

Overview

Amazon Bedrock offers a wide range of foundation models (such as Claude 3 Opus/Sonnet/Haiku, Llama 2/3, Mistral/Mixtral, etc.) and a broad set of capabilities for you to build generative AI applications. Check the Amazon Bedrock landing page for additional information.

Sometimes, you might have applications developed using OpenAI APIs or SDKs, and you want to experiment with Amazon Bedrock without modifying your codebase. Or you may simply wish to evaluate the capabilities of these foundation models in tools like AutoGen etc. Well, this repository allows you to access Amazon Bedrock models seamlessly through OpenAI APIs and SDKs, enabling you to test these models without code changes.

If you find this GitHub repository useful, please consider giving it a free star ⭐ to show your appreciation and support for the project.

Features:

  • Support streaming responses via server-sent events (SSE)
  • Support Model APIs
  • Support Chat Completion APIs
  • Support Tool Call (new)
  • Support Embedding API (new)
  • Support Multimodal API (new)

Please check Usage Guide for more details about how to use the new APIs.

Note: The legacy text completion API is not supported; please switch to the chat completion API instead.

Supported Amazon Bedrock models family:

  • Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
  • Meta Llama 2 / 3
  • Mistral / Mixtral
  • Cohere Command R / R+
  • Cohere Embedding

You can call the models API to get the full list of supported model IDs (see the example below).

Note: The default model is set to anthropic.claude-3-sonnet-20240229-v1:0 which can be changed via Lambda environment variables (DEFAULT_MODEL).
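
For example, once deployed you can list the available model IDs with a simple GET request. This is a minimal sketch; it assumes the API key and base URL environment variables described in the usage section below:

curl $OPENAI_BASE_URL/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"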

Get Started

Prerequisites

Please make sure you have met the prerequisites below:

  • Access to Amazon Bedrock foundation models.

For more information on how to request model access, please refer to the Amazon Bedrock User Guide (Set Up > Model access)

Architecture

The following diagram illustrates the reference architecture. Note that it also includes a new VPC with two public subnets only for the Application Load Balancer (ALB).

Architecture

You can also choose to use AWS Fargate behind the ALB instead of AWS Lambda; the main difference is the time to first byte for streaming responses (Fargate is lower).

Alternatively, you can use a Lambda Function URL to replace the ALB; see example.

Deployment

Please follow the steps below to deploy the Bedrock Proxy APIs into your AWS account. Deployment is only supported in regions where Amazon Bedrock is available (such as us-west-2). It will take approximately 3-5 minutes 🕒.

Step 1: Create your own custom API key (Optional)

Note: In this step, you choose any string (without spaces) you like as a custom API key (credential) that will be used to access the proxy API later. This key does not have to match an actual OpenAI key, and you don't need to have an OpenAI API key at all. It is recommended that you complete this step and keep the key safe and private.

  1. Open the AWS Management Console and navigate to the Systems Manager service.
  2. In the left-hand navigation pane, click on "Parameter Store".
  3. Click on the "Create parameter" button.
  4. In the "Create parameter" window, select the following options:
    • Name: Enter a descriptive name for your parameter (e.g., "BedrockProxyAPIKey").
    • Description: Optionally, provide a description for the parameter.
    • Tier: Select Standard.
    • Type: Select SecureString.
    • Value: Any string (without spaces).
  5. Click "Create parameter".
  6. Make a note of the parameter name you used (e.g., "BedrockProxyAPIKey"). You'll need this in the next step.
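
If you prefer the AWS CLI, the same parameter can be created with a single command. A minimal sketch, using the example name above:

# Create a SecureString parameter holding your custom API key
aws ssm put-parameter \
  --name "BedrockProxyAPIKey" \
  --type "SecureString" \
  --value "your-api-key-without-spaces"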

Step 2: Deploy the CloudFormation stack

  1. Sign in to the AWS Management Console and switch to the region where you want to deploy the CloudFormation stack.
  2. Click the following button to launch the CloudFormation Stack in that region. Choose one of the following:
    • ALB + Lambda

      Launch Stack

    • ALB + Fargate

      Launch Stack

  3. Click "Next".
  4. On the "Specify stack details" page, provide the following information:
    • Stack name: Change the stack name if needed.
    • ApiKeyParam (if you set up an API key in Step 1): Enter the parameter name you used for storing the API key (e.g., BedrockProxyAPIKey). If you did not set up an API key, leave this field blank. Click "Next".
  5. On the "Configure stack options" page, you can leave the default settings or customize them according to your needs.
  6. Click "Next".
  7. On the "Review" page, review the details of the stack you're about to create. Check the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox at the bottom.
  8. Click "Create stack".

That is it! 🎉 Once deployed, click the CloudFormation stack and go to the Outputs tab; you can find the API Base URL under APIBaseUrl. The value should look like http://xxxx.xxx.elb.amazonaws.com/api/v1.
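
You can also read the output from the AWS CLI instead of the console. A sketch; replace the stack name with the one you actually chose:

aws cloudformation describe-stacks \
  --stack-name "BedrockProxyAPI" \
  --query "Stacks[0].Outputs[?OutputKey=='APIBaseUrl'].OutputValue" \
  --output text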

SDK/API Usage

All you need is the API Key and the API Base URL. If you didn't set up your own key, then the default API Key (bedrock) will be used.

Now, you can try out the proxy APIs. Let's say you want to test the Claude 3 Sonnet model (model ID: anthropic.claude-3-sonnet-20240229-v1:0)...

Example API Usage

export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
# For older versions
# https://github.com/openai/openai-python/issues/624
export OPENAI_API_BASE=<API base url>
curl $OPENAI_BASE_URL/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "anthropic.claude-3-sonnet-20240229-v1:0",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
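
Streaming works through the same endpoint; adding "stream": true to the request body switches the response to server-sent events, following the standard OpenAI chat completion contract:

curl $OPENAI_BASE_URL/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "anthropic.claude-3-sonnet-20240229-v1:0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'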

Example SDK Usage

from openai import OpenAI

# The client reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment
client = OpenAI()
completion = client.chat.completions.create(
    model="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(completion.choices[0].message.content)
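
The SDK can consume the same SSE stream. A minimal sketch, continuing from the client above and using the standard stream=True flag of the OpenAI Python SDK:

stream = client.chat.completions.create(
    model="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta; content can be None on role/stop chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")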

Please check the Usage Guide for more details about how to use the embedding API, multimodal API, and tool calls.
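
For instance, the embedding API follows the same OpenAI shape. A minimal sketch; the Cohere model ID here is an assumption, so call the models API for the IDs actually enabled in your account:

from openai import OpenAI

client = OpenAI()
# Model ID is illustrative; Cohere embedding models are listed in the supported families above
response = client.embeddings.create(
    model="cohere.embed-multilingual-v3",
    input=["Hello world"],
)
print(len(response.data[0].embedding))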

Other Examples

AutoGen

Below is an image of setting up the model in AutoGen studio.

AutoGen Model

LangChain

Make sure you use ChatOpenAI(...) instead of OpenAI(...)

# pip install langchain-openai
import os

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(
    model="anthropic.claude-3-sonnet-20240229-v1:0",
    temperature=0,
    openai_api_key=os.environ['OPENAI_API_KEY'],
    openai_api_base=os.environ['OPENAI_BASE_URL'],
)

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)
llm_chain = LLMChain(prompt=prompt, llm=chat)

question = "What NFL team won the Super Bowl in the year Justin Beiber was born?"
response = llm_chain.invoke(question)
print(response)

FAQs

About Privacy

This application does not collect any of your data. Furthermore, it does not log any requests or responses by default.

Why not use API Gateway instead of Application Load Balancer?

The short answer is that API Gateway does not support server-sent events (SSE), which are needed for streaming responses.

Which regions are supported?

This solution only supports the regions where Amazon Bedrock is available; as of now, these are:

  • US East (N. Virginia): us-east-1
  • US West (Oregon): us-west-2
  • Asia Pacific (Singapore): ap-southeast-1
  • Asia Pacific (Sydney): ap-southeast-2
  • Asia Pacific (Tokyo): ap-northeast-1
  • Europe (Frankfurt): eu-central-1
  • Europe (Paris): eu-west-3

Generally speaking, any region that Amazon Bedrock supports should also be supported; if not, please raise an issue on GitHub.

Note that not all models are available in those regions.

Can I build and use my own ECR image?

Yes, you can clone the repo, build the container image yourself (src/Dockerfile), and push it to your own ECR repo. You can use scripts/push-to-ecr.sh for this.

Replace the repo URL in the CloudFormation template before you deploy.
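
A minimal sketch of the manual build-and-push flow; the account ID, region, and repo name are placeholders, and scripts/push-to-ecr.sh wraps similar steps:

# Authenticate Docker to your ECR registry
aws ecr get-login-password --region us-west-2 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-west-2.amazonaws.com

# Build from the repo's Dockerfile, then tag and push
docker build -t bedrock-proxy -f src/Dockerfile src
docker tag bedrock-proxy:latest 123456789012.dkr.ecr.us-west-2.amazonaws.com/bedrock-proxy:latest
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/bedrock-proxy:latest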

Can I run this locally?

Yes, you can run this locally.

The API base url should look like http://localhost:8000/api/v1.
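
One way is to run the FastAPI app directly with uvicorn. A minimal sketch; the module path and requirements file location are assumptions, so check the source tree for the actual entry point:

cd src
pip install -r requirements.txt
# Assumed app module path; the proxy then listens on port 8000
uvicorn api.app:app --host 0.0.0.0 --port 8000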

Any performance sacrifice or latency increase by using the proxy APIs?

Compared with a direct AWS SDK call, the reference architecture introduces additional response latency; you can try it and measure the difference on your own.

Also, you can use Lambda Web Adapter + Function URL (see example) to replace the ALB, or AWS Fargate to replace Lambda, to get better performance on streaming responses.

Any plan to support SageMaker models?

Currently, there is no plan to support SageMaker models. This may change if there is demand from customers.

Any plan to support Bedrock custom models?

Fine-tuned models and models with Provisioned Throughput are currently not supported. You can clone the repo and make the customization if needed.

How to upgrade?

To use the latest features, you don't need to redeploy the CloudFormation stack. You simply need to pull the latest image.

How to do so depends on which version you deployed:

  • Lambda version: Go to the AWS Lambda console, find the Lambda function, click the Deploy new image button, and click Save.
  • Fargate version: Go to the ECS console, click the ECS cluster, go to the Tasks tab, select the only running task, and click Stop selected. A new task with the latest image will start automatically.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

bedrock-access-gateway's People

Contributors

amazon-auto, daixba, dependabot[bot], didier-durand, greenjerry, jgalego


bedrock-access-gateway's Issues

Issue with concurrent requests on AWS Fargate

Describe the Bug
I am encountering an issue where concurrent requests are being processed sequentially rather than simultaneously when deployed on AWS Fargate.
I suspect the problem is that boto3 runs synchronously, and its calls are blocking.

API Details

  • API Used: /chat/completions
  • Model Used: all of them

To Reproduce
Steps to reproduce the behavior:

  1. Deploy the service on AWS Fargate following the standard setup procedures.
  2. Send multiple concurrent requests (e.g., 10 concurrent requests) to the API.
  3. Observe that the requests are processed sequentially instead of concurrently.

Expected Behavior
I expected that when sending multiple concurrent requests to the API, all of them would be handled simultaneously, or at least as many as the server can handle.
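
If blocking boto3 calls are indeed the culprit, one common mitigation is to offload them to a worker thread so the event loop can keep serving other requests. A minimal illustrative sketch with a hypothetical handler, not the repository's actual code:

import asyncio

import boto3
from fastapi import FastAPI

app = FastAPI()
client = boto3.client("bedrock-runtime")

@app.post("/chat/completions")
async def chat_completions(payload: dict):
    # Run the blocking SDK call in a thread so the event loop stays free
    return await asyncio.to_thread(
        client.converse,
        modelId=payload["model"],
        messages=[{"role": "user", "content": [{"text": "Hello"}]}],
    )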

When calling the claude-3 API through the Bedrock Access Gateway, the 'system' parameter does not take effect.

Request demo:

curl /chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer " \
  -d '{
    "system": "我是一个中文用户,你的任何回答必须使用中文回答",
    "max_tokens": 100000,
    "model": "anthropic.claude-3-haiku-20240307-v1:0",
    "messages": [
      {
        "role": "user",
        "content": "hi!"
      }
    ]
  }'

Response JSON:

{
  "id": "",
  "created": "",
  "model": "anthropic.claude-3-haiku-20240307-v1:0",
  "system_fingerprint": "fp",
  "choices": [
    {
      "index": 0,
      "finish_reason": "end_turn",
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      }
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
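
Note: the OpenAI chat completion request schema has no top-level system field, which would explain why it is ignored here; the system prompt is expected as a message with role "system". A corrected request body would look like:

{
  "model": "anthropic.claude-3-haiku-20240307-v1:0",
  "messages": [
    {"role": "system", "content": "我是一个中文用户,你的任何回答必须使用中文回答"},
    {"role": "user", "content": "hi!"}
  ]
}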

multiple API keys and quota per key?

Is your feature request related to a problem? Please describe.
How do I have multiple API keys and apply different quota and throttle limit per key?

SSL TLS Support

Hello, I noticed this project deploys only an HTTP URL; SSL/TLS should probably be the default.

Missing number of input tokens and output tokens in the response.

When an API request is sent to the Bedrock Mistral model, the usage details below are missing (returned as zero):
"prompt_tokens":0,"completion_tokens":0,"total_tokens":0

Please complete the following information:
API used: api/v1/chat/completions
model used: mistral.mistral-7b-instruct-v0:2

To Reproduce
curl http://albenpoint/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bedrock" \
  -d '{
    "model": "mistral.mistral-7b-instruct-v0:2",
    "messages": [{"role": "user", "content": "what is hyperloop"}]
  }'

Expected behavior
The response should contain the actual numbers of prompt (input) and completion (output) tokens.


Stream response contains a null tool_calls, invalid for TypeScript response validation

Describe the bug

tool_calls should be an empty array or omitted entirely, not null. The null value causes a TypeScript type validation error.

{
    "id": "chatcmpl-bc23d19d",
    "created": 1717383192,
    "model": "meta.llama3-70b-instruct-v1:0",
    "system_fingerprint": "fp",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "logprobs": null,
            "delta": {
                "role": "assistant",
                "content": "",
                "tool_calls": null
            }
        }
    ],
    "object": "chat.completion.chunk",
    "usage": null
}

In OpenAI's official SDK, tool_calls can be undefined or an empty array, but not null.

The type is defined here https://github.com/openai/openai-node/blob/fd70373450d6c39ff55d984a2ff13ea7a7df23d1/src/resources/chat/completions.ts#L434

export namespace Choice {
    /**
     * A chat completion delta generated by streamed model responses.
     */
    export interface Delta {
      /**
       * The contents of the chunk message.
       */
      content?: string | null;

      /**
       * @deprecated: Deprecated and replaced by `tool_calls`. The name and arguments of
       * a function that should be called, as generated by the model.
       */
      function_call?: Delta.FunctionCall;

      /**
       * The role of the author of this message.
       */
      role?: 'system' | 'user' | 'assistant' | 'tool';

      tool_calls?: Array<Delta.ToolCall>;
    }
}

Please complete the following information:

  • Which API you used: /chat/completions
  • Which model you used: meta.llama3-70b-instruct-v1:0


Expected behavior
I understand that Python allows both null and undefined to map to None. A more acceptable behavior would be to omit the value from the response when the model does not support tool_calls or another optional field.


CloudFormation stack Upgrade docs (Llama 3)

Is your feature request related to a problem? Please describe.
I am trying to use Llama 3 and can see that src has been updated, but the templates linked in the README do not include these changes.

Describe the feature you'd like
Please describe how I can use src to upgrade my existing CloudFormation stacks that were created from the templates listed in the README.

Hello! Should finish_reason be end_turn or stop? Your code returns end_turn, but I see that OpenAI uses stop. Please advise!

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-3.5-turbo-0125", "system_fingerprint": "fp_44709d6fcb", "choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-3.5-turbo-0125", "system_fingerprint": "fp_44709d6fcb", "choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}

....

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-3.5-turbo-0125", "system_fingerprint": "fp_44709d6fcb", "choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
The above is excerpted from OpenAI's official documentation.
