
Zero administration inference with AWS Lambda for 🤗

Note: This is not production code and is simply meant as a demo.

Hugging Face Transformers is a popular open-source project that provides pre-trained natural language processing (NLP) models for a wide variety of use cases. Customers with minimal machine learning experience can use these pre-trained models to quickly add NLP capabilities to their applications, including text classification, language translation, summarization, and question answering, to name a few.
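
For context, here is a minimal sketch of the Transformers pipeline API that the handlers in this solution are built on (assuming the transformers package and a PyTorch backend are installed):

from transformers import pipeline

# Downloads a default sentiment-analysis model on first use, then runs inference.
classifier = pipeline("sentiment-analysis")
print(classifier("This demo is great!"))
# [{'label': 'POSITIVE', 'score': 0.99...}]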

Overview

Our solution consists of an AWS Cloud Development Kit (AWS CDK) script that automatically provisions container image-based Lambda functions that perform ML inference using pre-trained Hugging Face models. This solution also includes Amazon Elastic File System (EFS) storage that is attached to the Lambda functions to cache the pre-trained models and reduce inference latency.

[Architecture diagram]

In this architectural diagram:

  1. Serverless inference is achieved by using Lambda functions that are based on container images.
  2. The container image is stored in an Amazon Elastic Container Registry (ECR) repository within your account.
  3. Pre-trained models are automatically downloaded from Hugging Face the first time the function is invoked.
  4. Pre-trained models are cached within Amazon Elastic File System storage to improve inference latency.

The solution includes Python scripts for two common NLP use cases:

  • Sentiment analysis: identifying whether a sentence indicates positive or negative sentiment. It uses a model fine-tuned on SST-2, which is a GLUE task.
  • Summarization: summarizing a body of text into a shorter, representative text. It uses a BART model that was fine-tuned on the CNN / Daily Mail dataset.

For simplicity, both of these use cases are implemented using Hugging Face pipelines.

Prerequisites

The following is required to run this example:

  • An AWS account with credentials configured for the AWS CLI and CDK
  • git
  • Python 3 with pip
  • The AWS CDK Toolkit (the cdk command)
  • Docker (the Lambda container images are built locally during deployment)

Deploying the example application

  1. Clone the project to your development environment:

git clone https://github.com/aws-samples/zero-administration-inference-with-aws-lambda-for-hugging-face.git

  2. Install the required dependencies:

pip install -r requirements.txt

  3. Bootstrap the CDK. This command provisions the initial resources needed by the CDK to perform deployments:

cdk bootstrap

  4. Deploy the CDK application to its environment. During the deployment, the toolkit outputs progress indications:

cdk deploy
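
Once the deployment completes, you can invoke a function to test it. The following is a minimal sketch (the generated function names vary per deployment; look them up in the Lambda console, and note that the first invocation is slow while the model downloads):

import json
import boto3

# The function name below is a placeholder; the CDK generates a unique name per script.
client = boto3.client('lambda')
response = client.invoke(
    FunctionName='<generated-sentiment-function-name>',
    Payload=json.dumps({'text': 'I love this product!'}),
)
print(json.load(response['Payload']))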

Understanding the code structure

The code is organized using the following structure:

├── inference
│   ├── Dockerfile
│   ├── sentiment.py
│   └── summarization.py
├── app.py
└── ...

The inference directory contains:

  • The Dockerfile used to build a custom image that can run PyTorch-based Hugging Face inference in Lambda functions
  • The Python scripts that perform the actual ML inference

The sentiment.py script shows how to use a Hugging Face Transformers model:

import json
from transformers import pipeline

nlp = pipeline("sentiment-analysis")

def handler(event, context):
    response = {
        "statusCode": 200,
        "body": nlp(event['text'])[0]
    }
    return response
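
Called directly (for example, in a local Python session with transformers and torch installed), the handler behaves as sketched below; the exact score will vary by model version:

# Hypothetical local smoke test for the handler above.
event = {"text": "I love this product!"}
print(handler(event, None))
# {'statusCode': 200, 'body': {'label': 'POSITIVE', 'score': 0.999...}}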

For each Python script in the inference directory, the CDK generates a Lambda function backed by a container image, using that script as the function's handler.

CDK script

The CDK script is named app.py in the solution's repository. The beginning of the script creates a virtual private cloud (VPC).

vpc = ec2.Vpc(self, 'Vpc', max_azs=2)

Next, it creates the EFS file system and an access point in EFS for the cached model:

fs = efs.FileSystem(self, 'FileSystem',
                    vpc=vpc,
                    removal_policy=RemovalPolicy.DESTROY)
access_point = fs.add_access_point('MLAccessPoint',
                                   create_acl=efs.Acl(
                                       owner_gid='1001', owner_uid='1001', permissions='750'),
                                   path="/export/models",
                                   posix_user=efs.PosixUser(gid="1001", uid="1001"))

It iterates through the Python files in the inference directory:

docker_folder = os.path.dirname(os.path.realpath(__file__)) + "/inference"
pathlist = Path(docker_folder).rglob('*.py')
for path in pathlist:

And then creates the Lambda function that serves the inference requests:

base = os.path.basename(path)
filename = os.path.splitext(base)[0]
# Lambda Function from docker image
function = lambda_.DockerImageFunction(
    self, filename,
    code=lambda_.DockerImageCode.from_image_asset(
        docker_folder,
        cmd=[filename + ".handler"]),
    memory_size=8096,
    timeout=Duration.seconds(600),
    vpc=vpc,
    filesystem=lambda_.FileSystem.from_efs_access_point(
        access_point, '/mnt/hf_models_cache'),
    environment={
        "TRANSFORMERS_CACHE": "/mnt/hf_models_cache"},
)
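
Pointing TRANSFORMERS_CACHE at the EFS mount is what makes the caching work: the first invocation downloads the model weights into /mnt/hf_models_cache, and later cold starts reuse them. Below is a minimal local sketch of the same mechanism (the /tmp path here is a stand-in for the EFS mount):

import os

# Must be set before transformers is imported; the Lambda functions use /mnt/hf_models_cache.
os.environ["TRANSFORMERS_CACHE"] = "/tmp/hf_models_cache"

from transformers import pipeline

# First call downloads the model into the cache directory; subsequent loads read from it.
nlp = pipeline("sentiment-analysis")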

Adding a translator

Optionally, you can add more models by adding Python scripts in the inference directory. For example, add the following code in a file called translate-en2fr.py:

import json
from transformers import pipeline

en_fr_translator = pipeline('translation_en_to_fr')

def handler(event, context):
    response = {
        "statusCode": 200,
        "body": en_fr_translator(event['text'])[0]
    }
    return response

Then run:

$ cdk synth
$ cdk deploy

This creates a new endpoint to perform English to French translation.
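
A sample test event for the new function, assuming the same {'text': ...} event shape used by the other handlers (the exact translation returned depends on the model version):

# Hypothetical test of the translate-en2fr handler.
event = {"text": "Machine learning is fun."}
# handler(event, None) returns a response of the form:
# {'statusCode': 200, 'body': {'translation_text': '...'}}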

Cleaning up

After you are finished experimenting with this project, run cdk destroy to remove all of the associated infrastructure.

License

This library is licensed under the MIT No Attribution License. See the LICENSE file.

Disclaimer: Deploying the demo applications contained in this repository will potentially cause your AWS account to be billed for services.


Issues

AttributeError: module 'aws_cdk' has no attribute 'cx_api' when running cdk bootstrap

I am following the instructions to deploy the model on AWS and get the following error when running cdk bootstrap:

 File "app.py", line 8, in <module>
    from aws_cdk import (
  File "/Users/alioskooei/opt/anaconda3/envs/nlp/lib/python3.7/site-packages/aws_cdk/__init__.py", line 22552, in <module>
    from . import aws_acmpca
  File "/Users/alioskooei/opt/anaconda3/envs/nlp/lib/python3.7/site-packages/aws_cdk/aws_acmpca/__init__.py", line 79, in <module>
    from ._jsii import *
  File "/Users/alioskooei/opt/anaconda3/envs/nlp/lib/python3.7/site-packages/aws_cdk/aws_acmpca/_jsii/__init__.py", line 11, in <module>
    import aws_cdk.core._jsii
  File "/Users/alioskooei/opt/anaconda3/envs/nlp/lib/python3.7/site-packages/aws_cdk/core/__init__.py", line 6643, in <module>
    class ConstructNode(metaclass=jsii.JSIIMeta, jsii_type="@aws-cdk/core.ConstructNode"):
  File "/Users/alioskooei/opt/anaconda3/envs/nlp/lib/python3.7/site-packages/aws_cdk/core/__init__.py", line 6694, in ConstructNode
    runtime_info: typing.Optional[aws_cdk.cx_api.RuntimeInfo] = None,
AttributeError: module 'aws_cdk' has no attribute 'cx_api'

Unfortunately, I could not find any tips online as to why I am seeing this error. I have installed the requirements according to the instructions and am using the following package versions:

CDK 2.12.0 (build c9786db)
Node v16.3.0
Python 3.7.11
aws-cli/2.4.16
npm 7.15.1

I would appreciate any tips on how to resolve this issue. Thank you.

Amazon Elastic Compute Cloud NatGateway costs

The tutorial creates costs under Amazon Elastic Compute Cloud NatGateway.

There were two unassigned Elastic IPs on my account, and I think it also builds a NAT Gateway, which isn't part of the free tier. Is there any way to run the tutorial without these costs?

Repeated Inferences with pipeline on lambda

Thanks for your response to the Q&A question in the other issue.
With regard to multiple inferences, is there any precaution to take?

I was hoping that I could just call the model repeatedly in a loop.

import json
from transformers import pipeline

question_answerer = pipeline("question-answering")

def handler(event, context):
    questionsetList = event['questionlist']
    answerlist = []
    for question in questionsetList:
        answer = question_answerer({'question': question, 'context': event['context']})
        answerlist.append(answer)
    return {"Result": answerlist}

I got the following error on a Lambda test event:
START RequestId: b06fd2cb-54df-4807-91c8-34ea7cfb614f Version: $LATEST
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
/usr/local/lib/python3.6/dist-packages/joblib/_multiprocessing_helpers.py:45: UserWarning: [Errno 38] Function not implemented. joblib will operate in serial mode
  warnings.warn('%s. joblib will operate in serial mode' % (e,))
questions before splitting by ? mark
1. Why are you troubled?~ 2.Who is the person to blame? ~3. How long are you frustrated about this?
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/function/awslambdaric/__main__.py", line 20, in <module>
    main(sys.argv)
  File "/function/awslambdaric/__main__.py", line 16, in main
    bootstrap.run(app_root, handler, lambda_runtime_api_addr)
  File "/function/awslambdaric/bootstrap.py", line 415, in run
    log_sink,
  File "/function/awslambdaric/bootstrap.py", line 171, in handle_event_request
    log_error(error_result, log_sink)
  File "/function/awslambdaric/bootstrap.py", line 122, in log_error
    log_sink.log_error(error_message_lines)
  File "/function/awslambdaric/bootstrap.py", line 306, in log_error
    sys.stdout.write(error_message)
  File "/function/awslambdaric/bootstrap.py", line 283, in write
    self.stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 79-80: ordinal not in range(128)
END RequestId: b06fd2cb-54df-4807-91c8-34ea7cfb614f
REPORT RequestId: b06fd2cb-54df-4807-91c8-34ea7cfb614f Duration: 22056.43 ms Billed Duration: 22057 ms Memory Size: 8096 MB Max Memory Used: 962 MB
RequestId: b06fd2cb-54df-4807-91c8-34ea7cfb614f Error: Runtime exited with error: exit status 1
Runtime.ExitError

It appears that I cannot call the model in a loop. In other implementations, without pipeline, I had used the model in a loop.

Please suggest whether any specific precaution, such as cleanup, is required before calling the model for a second question.

Thanks in advance.

Input Sample for Question Answering Pipeline

Hi,
This is a great project. It worked for the sentiment analysis example. However, my need is a question answering use case.

I created myquestionanswer.py as below:

import json
from transformers import pipeline

summarizer = pipeline("question-answering")

def handler(event, context):
    response = {
        "statusCode": 200,
        "body": summarizer(event['article'])[0]
    }
    return response

i.e., the only change I made is the string passed as the pipeline parameter; it is now 'question-answering'.

What is the JSON input format to be given in the Lambda test console? I tried the following; both failed:

  1. { "context": "My name is Rama. Sita is his wife", "question": " what is your name?"}
  2. {"context": questions": [ "What is the name?", "Who is his wife?"] }

I saw other Hugging Face examples, but they aren't applicable since they feed directly into the model.

Thanks in advance.
