

License: GNU General Public License v3.0
Languages: Python 86.54%, Shell 13.46%
Topics: chatbot, docker-image, gpt-4, llava, multimodal, runpod, runpod-worker, vision-language-model


LLaVA: Large Language and Vision Assistant | RunPod Serverless Worker

This is the source code for a RunPod Serverless worker for LLaVA: Large Language and Vision Assistant.


Model

LLaVA-v1.6

| Model | Environment Variable Value | Version | LLM | Default |
|---|---|---|---|---|
| llava-v1.6-vicuna-7b | liuhaotian/llava-v1.6-vicuna-7b | LLaVA-1.6 | Vicuna-7B | no |
| llava-v1.6-vicuna-13b | liuhaotian/llava-v1.6-vicuna-13b | LLaVA-1.6 | Vicuna-13B | no |
| llava-v1.6-mistral-7b | liuhaotian/llava-v1.6-mistral-7b | LLaVA-1.6 | Mistral-7B | yes |
| llava-v1.6-34b | liuhaotian/llava-v1.6-34b | LLaVA-1.6 | Hermes-Yi-34B | no |

LLaVA-v1.5

| Model | Environment Variable Value | Version | Size | Default |
|---|---|---|---|---|
| llava-v1.5-7b | liuhaotian/llava-v1.5-7b | LLaVA-1.5 | 7B | no |
| llava-v1.5-13b | liuhaotian/llava-v1.5-13b | LLaVA-1.5 | 13B | no |
| BakLLaVA-1 | SkunkworksAI/BakLLaVA-1 | LLaVA-1.5 | 7B | no |
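
The Environment Variable Value column is the value used to select which checkpoint the worker loads. As a minimal sketch only (the variable name MODEL is an assumption; check the repository's build instructions for the actual name), the worker might read it like this, defaulting to the model marked as the default in the tables above:

```python
import os

# Assumed environment variable name; the fallback mirrors the model marked
# as the default in the tables above.
MODEL_ID = os.getenv("MODEL", "liuhaotian/llava-v1.6-mistral-7b")
print(f"Loading LLaVA checkpoint: {MODEL_ID}")
```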

Testing

  1. Local Testing
  2. RunPod Testing

Building the Docker image that will be used by the Serverless Worker

There are two options:

  1. Network Volume
  2. Standalone (without Network Volume)

RunPod API Endpoint

You can send requests to your RunPod API Endpoint using the /run or /runsync endpoints.

Requests sent to the /run endpoint are handled asynchronously and are non-blocking operations. The first response status will always be IN_QUEUE. You then need to poll the /status endpoint for further status updates, and the COMPLETED status will eventually be returned if your request is successful.

Requests sent to the /runsync endpoint are handled synchronously and are blocking operations. If a worker processes the request within 90 seconds, the result is returned directly in the response; if processing takes longer than 90 seconds, you need to poll the /status endpoint for status updates until you receive the COMPLETED status, which indicates that your request was successful.
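
For illustration only (this is a minimal sketch, not taken from the repository's own examples), a synchronous request to the /runsync endpoint could be sent like this; the endpoint ID, API key, and input fields are placeholders, and the exact input schema is defined by the worker's handler:

```python
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

# Hypothetical input payload; consult the repository's docs for the real schema.
payload = {
    "input": {
        "image": "https://example.com/image.jpg",
        "prompt": "Describe this image."
    }
}

response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
print(response.json())
```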

RunPod API Examples

Endpoint Status Codes

| Status | Description |
|---|---|
| IN_QUEUE | Request is in the queue waiting to be picked up by a worker. You can call the /status endpoint to check for status updates. |
| IN_PROGRESS | Request is currently being processed by a worker. You can call the /status endpoint to check for status updates. |
| FAILED | The request failed, most likely due to encountering an error. |
| CANCELLED | The request was cancelled. This usually happens when you call the /cancel endpoint to cancel the request. |
| TIMED_OUT | The request timed out. This usually happens when your handler throws some kind of exception that does not return a valid response. |
| COMPLETED | The request completed successfully and the output is available in the output field of the response. |
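
For the asynchronous /run flow, a minimal polling loop (again a sketch, with placeholder endpoint ID, API key, and payload) could handle the statuses listed above as follows:

```python
import time
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

# Hypothetical payload; the exact input schema is defined by the worker's handler.
payload = {"input": {"image": "https://example.com/image.jpg", "prompt": "Describe this image."}}

# Submit the job asynchronously; the first response status is expected to be IN_QUEUE.
job = requests.post(f"{BASE_URL}/run", headers=HEADERS, json=payload, timeout=30).json()
job_id = job["id"]

# Poll /status until a terminal status is reached.
while True:
    status = requests.get(f"{BASE_URL}/status/{job_id}", headers=HEADERS, timeout=30).json()
    if status["status"] == "COMPLETED":
        print(status["output"])
        break
    if status["status"] in ("FAILED", "CANCELLED", "TIMED_OUT"):
        raise RuntimeError(f"Job ended with status {status['status']}: {status}")
    time.sleep(2)  # still IN_QUEUE or IN_PROGRESS
```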

Serverless Handler

The serverless handler (rp_handler.py) is a Python script that handles the API requests to your Endpoint using the runpod Python library. It defines a function handler(event) that takes an API request (event), runs the inference using LLaVA with the input, and returns the output in the JSON response.
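
The real handler lives in rp_handler.py in this repository; the skeleton below is only a structural sketch with the LLaVA inference call stubbed out, showing how the runpod library wires a handler(event) function into the worker:

```python
import runpod


def run_llava_inference(job_input):
    # Stub standing in for the actual LLaVA inference performed in rp_handler.py.
    return {"response": f"(stub) prompt was: {job_input.get('prompt')}"}


def handler(event):
    # The request payload arrives under the "input" key of the event.
    job_input = event["input"]
    # Run inference and return the result; it is placed in the "output" field
    # of the API response.
    return run_llava_inference(job_input)


if __name__ == "__main__":
    # Start the RunPod serverless worker loop with this handler.
    runpod.serverless.start({"handler": handler})
```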

Acknowledgements

Additional Resources

Community and Contributing

Pull requests and issues on GitHub are welcome. Bug fixes and new features are encouraged.

Appreciate my work?

Buy Me A Coffee

runpod-worker-llava's People

Contributors: ashleykleynhans

runpod-worker-llava's Issues

Multiple Parallel Request Handling

Hi Ashley,

Thank you for the great repo and making deployment on runpod serverless a breeze. It looks like the server can only handle 1 request at a time. However, my GPU utilization (2x A6000) is at about 50% so I should be able to handle 2-3 requests at the same time for any given worker. Is there a way to enable this or is it strictly 1 at a time? I know that I can increase the number of workers but it would be great to saturate each of the workers before I add another one.

Also, are you using SGLang for this server or are you wrapping it some other way?

Thank you again for the generous contribution to the open-source LLM community.

Cheers,
Daniel
