
serverless-template's Introduction

serverless-template's People

Contributors

djt avatar erik-dunteman avatar jdagdelen avatar kylejmorris avatar outdoorblake avatar


serverless-template's Issues

git LFS

Can I use git lfs to upload my own big model?
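Large weights usually don't need to live in git at all: the template already runs `download.py` at container build time, so weights can be fetched there instead of being committed via git-lfs. A minimal sketch (the URL and path here are hypothetical placeholders, not part of the template):

```python
import os
import urllib.request

# Hypothetical location of your weights -- replace with wherever they are hosted.
WEIGHTS_URL = "https://example.com/my-model/pytorch_model.bin"
WEIGHTS_PATH = "weights/pytorch_model.bin"

def download_model():
    """Fetch large weights at container build time instead of committing
    them to git, avoiding git-lfs quotas and multi-GB clones."""
    os.makedirs(os.path.dirname(WEIGHTS_PATH), exist_ok=True)
    if not os.path.exists(WEIGHTS_PATH):
        urllib.request.urlretrieve(WEIGHTS_URL, WEIGHTS_PATH)

if __name__ == "__main__":
    download_model()
```

Because the `RUN python3 download.py` step already exists in the template's Dockerfile, the weights end up baked into the image either way.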

Slow inference - Run Whisper API without extra encoding/downloading file, and use bytes directly.

Hi,

The GPU performance is similar to my CPU's for small-to-medium videos, due to the extra I/O processing/encoding.

For my application, I use FastAPI for the majority of my core functionality. However, I need a GPU to transcribe video/audio files, and decided to use Banana, as a serverless GPU seems much cheaper than Kubernetes.

How can I pass this UploadFile = File(...) object from FastAPI (a spooled temporary file) to my Banana API, instead of sending an encoded byte string read from an mp3/mp4 file saved locally?

Old way (Faster on my CPU compared to bananaML)

Upload video on web page -> write file into temporary file -> pass to Whisper

New Way (GPU with BananaML)

Upload video on web page -> save file locally -> read bytes from file locally -> encode into JSON format -> pass to Whisper.

I get that there has to be an extra I/O operation to send the video data to the GPU, but the way recommended in the template is highly inefficient; I wish I could pass the file-like object directly, as is done with FastAPI.

Thanks,

Viraj
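One way to avoid the disk round-trip is to read the `UploadFile` into memory and base64-encode the bytes directly, since the Banana payload has to be JSON either way. A sketch (the `mp3BytesString` key name is an assumption — use whatever key your `inference()` handler parses):

```python
import base64

def to_banana_payload(audio_bytes: bytes) -> dict:
    """Wrap raw audio bytes as a JSON-safe Banana model_inputs dict.

    The base64 step cannot be skipped, because the transport is JSON;
    the point is to do it from memory rather than via a file on disk.
    """
    return {"mp3BytesString": base64.b64encode(audio_bytes).decode("utf-8")}
```

In a FastAPI route, `audio_bytes = await file.read()` yields the bytes straight from the spooled temporary file, so the save-locally/read-back steps disappear, and the resulting dict can be passed to `banana.run(api_key, model_key, model_inputs)` as usual.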

Sample Dockerfile won't build

The example Dockerfile fails to build. The line

RUN apt-get update && apt-get install -y git

gives this error:

Reading package lists...
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease' is not signed.
The command '/bin/sh -c apt-get update && apt-get install -y git' returned a non-zero code: 100
ERROR: Service 'app' failed to build : Build failed

which smells like the image has apt-get configured with an old NVIDIA key. But upgrading to a newer PyTorch base image, pytorch/pytorch:1.11.0-cuda11.3-cudnn8-devel, doesn't fix it.
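NVIDIA rotated the signing keys for its CUDA apt repositories in April 2022, which is consistent with the NO_PUBKEY error above. A commonly used workaround is to fetch the new key (or remove the stale CUDA apt source) before the first apt-get update — a sketch, assuming the ubuntu1804 repo path from the error message:

```dockerfile
# Fetch NVIDIA's rotated CUDA repo key before the first apt-get update;
# alternatively, `rm /etc/apt/sources.list.d/cuda.list` drops the repo entirely
# if nothing from it is needed at build time.
RUN apt-key adv --fetch-keys \
    https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub \
 && apt-get update && apt-get install -y git
```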

Not able to load bigscience/bloom

After the container has been built, it seems it failed to download the complete model from bigscience/bloom.
Is it possible the host doesn't have enough disk space?
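For context (a rough figure, not stated in this thread): the full bigscience/bloom checkpoint is on the order of several hundred GB, so a truncated download from insufficient disk is plausible. A quick check on the build host:

```shell
# Free space on the root filesystem (the full bloom checkpoint needs
# several hundred GB, plus temporary space while downloading):
df -h /

# Size of any partially downloaded Hugging Face cache:
du -sh ~/.cache/huggingface 2>/dev/null || true
```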

Error response from daemon: Cannot locate specified Dockerfile: Dockerfile

Hello guys,
I want to deploy from my GitHub repo, but I get this message:

Error response from daemon: Cannot locate specified Dockerfile: Dockerfile

My Dockerfile:

# Must use a Cuda version 11+
FROM pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime

ARG HUGGING_FACE_HUB_TOKEN=""
ENV HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}

WORKDIR /

COPY . .

# Install 
RUN apt-get update \
    && apt-get install -y git \
    && apt-get install -y libgl1-mesa-glx libgtk2.0-0 libsm6 libxext6 \
    && rm -rf /var/lib/apt/lists/*

# Install python packages
RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt

#Download models
RUN python3 download.py

EXPOSE 8000

CMD python3 -u server.py

Locally it builds and runs without errors.
Could you help me, please?
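A few usual suspects worth ruling out (assumptions, since the build context isn't shown here): the file must be committed at the repository root with the exact, case-sensitive name `Dockerfile` — a lowercase `dockerfile`, an accidental `Dockerfile.txt`, or a file that exists locally but was never committed all trigger this error. A quick check from the repo root:

```shell
# List every committed file whose name resembles "Dockerfile"; the output
# should contain exactly "Dockerfile" at the top level of the repository.
git ls-files | grep -i dockerfile
```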

ERROR dockerfile: file with no instructions

Hello,
I cloned your repo and ran the quickstart guide, and I get the following error, even though I can see the Dockerfile in the repo.

Build Started ERROR: Build Failed - Logs follow: error building image: parsing dockerfile: file with no instructions
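Two causes worth checking first (assumptions — the failing repo itself isn't shown): the Dockerfile that actually lands in the build context is empty, or its first bytes are a UTF-8 byte-order mark, which some Dockerfile parsers refuse to read past. Inspecting the first bytes makes either case obvious:

```shell
# The first line should start with a printable instruction such as "FROM".
# An empty file, or a leading BOM shown as 357 273 277, explains the error.
od -c Dockerfile | head -n 1
```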

0 GPU with Tensorflow

Problem:

  1. GPU not detected when using the latest TensorFlow library.

How to solve this problem?

My code:
app.py:

import tensorflow as tf

# Init is run on server startup
# Load your model to GPU as a global variable here using the variable name "model"
def init():
    pass

# Inference is run for every server call
# Reference your preloaded global model variable here.
def inference(model_inputs:dict) -> dict:
    # Parse out your arguments
    prompt = model_inputs.get('url', None)
    if prompt is None:
        return {'message': "No url provided"}
    
    # Run the model
    physical_devices = tf.config.list_physical_devices('GPU')
    result = f"Number of GPU : {len(physical_devices)}"

    # Return the results as a dictionary
    return result

download.py :

# In this file, we define download_model
# It runs during container build time to get model weights built into the container

# In this example: A Huggingface BERT model

import tensorflow as tf

def download_model():
    # do a dry run of loading the huggingface model, which will download weights
    pass

if __name__ == "__main__":
    download_model()

requirements.txt

sanic
transformers
accelerate
torch
tensorflow

test2.py :

import banana_dev as banana
import time

api_key = "api_key" # "YOUR_API_KEY"
model_key = "model_key" # "YOUR_MODEL_KEY"
model_inputs = {'url': 'Hello I am a [MASK] model.'}

startTime = time.time()
out = banana.run(api_key, model_key, model_inputs)
print(out)
endTime = time.time()
print("Time: ", endTime - startTime)

Result:

{'id': '7fb0bec8-8875-408e-99ea-12c867de3e19', 'message': '', 'created': 1658303998, 'apiVersion': '12 May 2022', 'modelOutputs': ['Number of GPU : 0']}
Time:  8.601353645324707
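One plausible cause (an assumption — the thread doesn't confirm it): the template's pytorch/pytorch base image ships CUDA through conda in a layout that the pip-installed `tensorflow` wheel doesn't find at runtime, so TensorFlow silently falls back to CPU. One workaround is to build from TensorFlow's own GPU image, which bundles the CUDA/cuDNN libraries the wheel expects — a sketch, not a supported configuration of the template:

```dockerfile
# Hypothetical alternative base image for a TensorFlow model on Banana.
FROM tensorflow/tensorflow:2.9.1-gpu

WORKDIR /
COPY . .

RUN pip3 install -r requirements.txt
RUN python3 download.py

EXPOSE 8000
CMD python3 -u server.py
```

If you go this route, drop `tensorflow` from requirements.txt so pip doesn't replace the image's pinned, GPU-matched build.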
