Comments (14)
Hi @austinmw,
Yes, the 5 MB is a hard limit imposed by the SageMaker platform, as documented here.
Typically, exceeding the 5 MB limit is caused by:
- Sending too many small records in a single request to a live endpoint, in which case batch transform could be used instead.
- Having very large single records (e.g. videos or high-resolution images), in which case storing the file in S3, sending the S3 path, and having the container pick up the S3 object based on the path is a common workaround.
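For the first case, one client-side mitigation (a hypothetical sketch, not a SageMaker API; `chunk_records` is an illustrative helper) is to greedily split the records into sub-requests whose serialized size stays under the cap:

```python
import json

def chunk_records(records, limit=5 * 1024 * 1024):
    """Greedily group records so each JSON-serialized chunk stays under `limit` bytes."""
    chunks, current = [], []
    for rec in records:
        candidate = current + [rec]
        if current and len(json.dumps(candidate).encode("utf-8")) > limit:
            # Adding `rec` would push this chunk over the limit; start a new one.
            chunks.append(current)
            current = [rec]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Tiny demo with an artificially small limit to show the splitting behavior
batches = chunk_records([{"x": i} for i in range(10)], limit=40)
```

Each chunk would then go out as its own InvokeEndpoint request (or, as noted above, batch transform avoids the problem entirely).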
Thanks.
from amazon-sagemaker-examples.
If you're using an nginx server as part of your custom Docker image, you may need to change the value of `client_max_body_size` within your `nginx.conf` file.
I set `client_max_body_size` to 0, which allows for an unlimited body size.
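For reference, the directive lives in the `http`, `server`, or `location` context of `nginx.conf`; a minimal illustrative fragment (the port and surrounding context are assumptions, not from this thread) might look like:

```nginx
http {
    server {
        listen 8080;
        # 0 disables nginx's own request-body size check entirely
        client_max_body_size 0;
    }
}
```

Note that, as reported elsewhere in this thread, the SageMaker platform enforces its own payload limit upstream of the container, so this only removes nginx's check.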
@dorg-jmiller I tried that, but was still running into the 5 MiB limit. Have you been able to send a large payload (for example, 10 MB) by modifying `client_max_body_size`? AWS phone support told me that 5 MiB was a hard limit regardless, but maybe they were wrong.
Modifying my SavedModel to accept JSON-serialized, base64-encoded strings did significantly reduce the size of the tensors I'm sending, so this 5 MiB limit is now not as big of an issue (although still a bit of a pain). Without doing so I hit the limit with tensors larger than (5, 128, 128, 3); now I can send up to about (2500, 128, 128, 3).
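As a rough sketch of why this helps (a standalone illustration with made-up data, not the poster's exact code): packing the values as float32 bytes and base64-encoding them is far more compact than JSON-serializing the raw float list, whose decimal representations can run to ~18 characters per value.

```python
import array
import base64
import json

# Illustrative payload: 1000 float values with long decimal expansions
values = [i / 7 for i in range(1000)]

# Option 1: plain JSON list of floats (many characters per value)
as_json = json.dumps(values)

# Option 2: pack to float32 bytes, then base64-encode inside the JSON body
# (base64 costs ~4/3 of the raw byte size, so ~5.3 bytes per float32 value)
packed = array.array("f", values).tobytes()
as_b64 = json.dumps({"dtype": "float32",
                     "b64": base64.b64encode(packed).decode("ascii")})

print(len(as_json), len(as_b64))  # the base64 form is considerably smaller
```

The receiver then base64-decodes and reinterprets the bytes with the advertised dtype and shape, which is essentially the scheme discussed later in this thread.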
Gotcha, that makes sense. From the little bit I know, batch transform won't suffice when you need real-time inference.
Thanks @Harathi123. Payloads for SageMaker InvokeEndpoint requests are limited to about 5 MB, so if you're storing the pixel values as 8-byte floats, then 480 * 512 * 3 * 8 bytes will be larger than this 5 MB payload limit.
One option for doing inference on larger images might be to pass an S3 path in your InvokeEndpoint request and then write your scoring logic to fetch the image stored at that S3 path before doing inference.
There may be other ways to get around this, like compressing the image before sending and then decompressing within the container before inference, but these may be very use case specific.
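As a sketch of the compression idea (illustrative only, with a made-up, highly compressible payload; real image tensors compress less well):

```python
import gzip
import json

# Hypothetical payload: a large, repetitive JSON body (~1.7 MB of text)
payload = json.dumps([[0.5] * 512 for _ in range(480)]).encode("utf-8")

# Compress on the client before InvokeEndpoint; the container would call
# gzip.decompress() on the request body before running inference.
compressed = gzip.compress(payload)

print(len(payload), len(compressed))
```

Both sides have to agree on the scheme (e.g. via the content type), which is why this tends to be use-case specific.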
Hi @djarpin, thanks for the suggestions.
This is my transform function:
```python
def transform_fn(net, data, input_content_type, output_content_type):
    image = json.loads(data)
    nda = nd.array(image)
    prediction = net(nda)
    response_body = json.dumps(decode(prediction.asnumpy()))
    return response_body, output_content_type
```
This is how I am invoking the endpoint. I am passing a NumPy array of the image.
```python
img = cv2.imread('image.png')
img = img.reshape((1, 3, 480, 512))
img = img.astype('float32') / 255
pred = predictor.predict(img)
```
Can I pass an S3 path in to the invoke endpoint request like this?
```python
pred = predictor.predict(' .....S3 path......')
```
Thanks,
Harathi
Hi @Harathi123 ,
You could possibly pass in a dictionary, like `{ 's3_path': 's3://my-bucket/my-key' }`, and then, in your transform function, retrieve the value of `s3_path`, download that file from S3, and predict on it.
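A minimal sketch of the receiving side (hypothetical helper, not from the thread; the actual download would use an S3 client such as boto3, which is omitted here):

```python
import json
from urllib.parse import urlparse

def parse_s3_payload(data):
    """Extract (bucket, key) from a {'s3_path': 's3://...'} request body."""
    payload = json.loads(data)
    parsed = urlparse(payload["s3_path"])
    return parsed.netloc, parsed.path.lstrip("/")

bucket, key = parse_s3_payload('{"s3_path": "s3://my-bucket/my-key"}')
# Inside transform_fn you would then download the object, e.g.:
# boto3.client("s3").download_file(bucket, key, "/tmp/input.png")
```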
But it seems to me like the image you're invoking with should be small enough since you're using the `float32` dtype now. Could you tell us what the value of `img.nbytes` is before predicting with `img`, and if InvokeEndpoint still says your payload is too large, could you post the stack trace?
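For context, a back-of-the-envelope check (mine, not from the thread): a float32 tensor of that shape is comfortably under 5 MiB in raw bytes, so it's the JSON text encoding that would inflate the payload past the limit.

```python
# Raw size of a (1, 3, 480, 512) float32 tensor: 4 bytes per element
nbytes = 1 * 3 * 480 * 512 * 4
print(nbytes)  # 2949120 bytes, i.e. ~2.8 MiB, under the 5 MiB cap
```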
Thanks!
Hi @djarpin, I could really use your help if possible. Is this 5 MB a hard limit that is unaffected by how I change `client_max_body_size` in nginx.conf? What is the limit exactly, and where can I find more information about it? Is there any way to increase it? It seems very low and is causing a lot of pain and frustration in integrating the endpoint into a production pipeline. My team is currently evaluating these endpoints, and this issue is a big one for us.
@djarpin Thanks for your reply. I have a lot of high-res images to process, and pulling them from S3 seems very inefficient, especially if they aren't originally coming from S3 and I have to both upload and download each one. How do people typically handle large images in SageMaker?
@djarpin Hi, also, after testing, I believe the max payload size is 5 MiB, not 5 MB.
Ah sorry, I missed above that you had already modified this limit in `nginx.conf`. I'm working with text rather than images, so I was only running into the size limit when SageMaker would send data in 6 MB batches (the default).
Sorry again if I'm missing what was discussed above, but is the `MaxPayloadInMB` parameter when creating a batch transform job not what you want?
@dorg-jmiller I think the 5 MiB limit mentioned doesn't affect batch transform jobs, only live HTTP endpoints. I should probably experiment with ways to take advantage of batch transform jobs more often, but currently I need real-time inference from stood-up endpoints.
Going from a JSON-serialized list of NumPy arrays to JSON-serialized base64-encoded strings helped a lot. Now I'd like to try switching from RESTful TF Serving to gRPC so I don't need to JSON-serialize at all. Hopefully it won't be too big of a pain to figure out.
I've run into the same problem as you; a NumPy array of shape (3, 218, 525, 3) reaches the limit with my current serialization.
I'm really keen to know in more detail how you serialized your data.
My best try so far is the following (frames is a NumPy array with the shape above):
```python
import json
import base64

b = base64.b64encode(frames).decode('utf-8')
r = json.dumps([str(frames.dtype), b, frames.shape])
```
but it's nowhere near your results.
Thanks!
A more up-to-date answer:
Use AWS SageMaker Async Inference
https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html
Amazon SageMaker Asynchronous Inference is a new capability in SageMaker that queues incoming requests and processes them asynchronously. This option is ideal for requests with large payload sizes (up to 1GB), long processing times (up to 15 minutes), and near real-time latency requirements. Asynchronous Inference enables you to save on costs by autoscaling the instance count to zero when there are no requests to process, so you only pay when your endpoint is processing requests.
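When creating the endpoint configuration (e.g. via `create_endpoint_config`), asynchronous behavior is enabled through an `AsyncInferenceConfig` block; a minimal fragment (the bucket path is illustrative) looks like:

```json
{
  "AsyncInferenceConfig": {
    "OutputConfig": {
      "S3OutputPath": "s3://my-bucket/async-inference-output/"
    }
  }
}
```

Requests then reference an input object in S3, and results are written to the configured output path rather than returned inline, which is how the 1 GB payload ceiling is achieved.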