Comments (28)
I can reproduce it locally; you can see the mapping at 000055e1b280f000 still growing.
I think it's a glibc malloc issue: the chunks requested from Python are too small and too fragmented to be merged back into a big free block. So even after we call free, the memory is still counted as in use and can't be returned to the system.
Here are two ways to work around this:
- Run malloc_trim in the background periodically
- Use tcmalloc instead of glibc malloc
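For the first workaround, here is a minimal sketch of calling glibc's malloc_trim from a background thread in Python. The function name `start_malloc_trim_loop` and the interval are my own choices, not a BentoML API; malloc_trim only exists in glibc, so on macOS or musl this is a no-op:

```python
import ctypes
import ctypes.util
import threading
import time

def start_malloc_trim_loop(interval_seconds: float = 60.0):
    """Periodically ask glibc to return freed heap pages to the OS.

    Returns the daemon thread on glibc systems, or None where
    malloc_trim is unavailable (macOS, musl, Windows).
    """
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"))
        trim = libc.malloc_trim  # raises AttributeError if not glibc
    except (OSError, AttributeError, TypeError):
        return None

    def loop():
        while True:
            time.sleep(interval_seconds)
            trim(0)  # pad=0: trim as much as possible from the heap top

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Calling `start_malloc_trim_loop()` once at service startup is enough; the daemon thread exits with the process.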
from bentoml.
@gusghrlrl101 the PR has just been merged, will be available in the next release. Thank you for reporting this issue!
It would be great if you could help verify the fix using the main branch or with the next release 🙏
Can reproduce it locally; it's a new issue, lol.
Let me figure it out.
Assign this to me plz cc @frostming
So is there no way to keep the memory from increasing?
In most circumstances, it's not necessary to flush the page cache manually. If you want to do this, run echo 3 > /proc/sys/vm/drop_caches
How about adding an option to turn that on/off? Or is there a better way for me?
Emmmm, I'm not sure about this, cc @frostming
Can you observe the same when running locally?
> Can you observe the same when running locally?
I am not running BentoML locally, because I am using it in production now.
Even if I only change the version from 1.1 to 1.2 with the same environment, it occurs.
Do you have any idea why the memory keeps going up?
The same issue occurred when I served locally (my MacBook): 70 minutes, 1.6M requests.
![SCR-20240529-byie](https://private-user-images.githubusercontent.com/38372691/334714733-5935829d-2329-4832-942c-16d8d8a3e343.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MTY0ODUsIm5iZiI6MTcxOTQxNjE4NSwicGF0aCI6Ii8zODM3MjY5MS8zMzQ3MTQ3MzMtNTkzNTgyOWQtMjMyOS00ODMyLTk0MmMtMTZkOGQ4YTNlMzQzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDE1MzYyNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY0MjI1OThmZTQyYmU4OTVkNWQ2ZDNiMTEyYmNlNmY2MTY3NGU3ZGQ5MGQ4NTE1N2ExMWVlZTdlZTY1MmVjYWEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.qLoelKvayJDU_K6cv-7QT3rfhKV3Qma5rl9vd9jt37o)
my environment: M1 Max (Sonoma 14.4.1)
Environment variables
BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''
System information
bentoml: 1.2.16
python: 3.9.6
platform: macOS-14.4.1-arm64-arm-64bit
uid_gid: 502:20
pip_packages
aiofiles==23.2.1
aiohttp==3.9.5
aiosignal==1.3.1
alembic==1.13.1
altair==5.3.0
annotated-types==0.6.0
anyio==4.3.0
appdirs==1.4.4
asgiref==3.8.1
astroid==2.15.8
async-timeout==4.0.3
attrs==23.2.0
bentoml==1.2.16
black==22.12.0
blinker==1.8.2
boto3==1.26.115
botocore==1.29.165
build==1.2.1
catboost==1.2.1
cattrs==23.1.2
cbor2==5.4.6
certifi==2024.2.2
cffi==1.16.0
cfgv==3.4.0
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.3
click-option-group==0.5.6
cloudpickle==3.0.0
colorlog==6.7.0
contextlib2==21.6.0
contourpy==1.2.1
coverage==7.5.1
cryptography==42.0.7
cycler==0.12.1
databricks-cli==0.18.0
deepmerge==1.1.1
Deprecated==1.2.14
dill==0.3.7
distlib==0.3.8
docker==6.1.3
easyocr==1.7.1
entrypoints==0.4
exceptiongroup==1.2.1
faiss-cpu==1.8.0
fakeredis==1.9.2
fastapi==0.110.2
feature-engine==1.6.0
ffmpy==0.3.2
filelock==3.14.0
Flask==3.0.3
fonttools==4.51.0
frozenlist==1.4.1
fs==2.4.16
gitdb==4.0.11
GitPython==3.1.43
gradio==3.41.0
gradio_client==0.5.0
graphviz==0.20.1
gunicorn==21.2.0
h11==0.14.0
h3==3.7.6
httpcore==1.0.5
httpx==0.27.0
identify==2.5.36
idna==3.7
imageio==2.34.1
importlib-metadata==6.11.0
importlib_resources==6.4.0
inflection==0.5.1
iniconfig==2.0.0
isort==5.13.2
itsdangerous==2.2.0
Jinja2==3.1.4
jmespath==1.0.1
joblib==1.4.2
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
kafka-python==2.0.2
kiwisolver==1.4.5
lazy-object-proxy==1.10.0
lazy_loader==0.4
llvmlite==0.42.0
Mako==1.3.5
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.0
mccabe==0.7.0
mdurl==0.1.2
mlflow==2.9.2
moto==4.2.14
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
networkx==3.2.1
ninja==1.11.1.1
nodeenv==1.8.0
numba==0.59.1
numpy==1.26.4
nvidia-ml-py==11.525.150
oauthlib==3.2.2
opencv-python-headless==4.5.5.64
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
orjson==3.10.1
packaging==23.2
pandas==2.1.3
pathspec==0.12.1
patsy==0.5.4
pillow==10.3.0
pip-requirements-parser==32.0.1
pip-tools==7.4.1
platformdirs==4.2.2
plotly==5.18.0
pluggy==1.5.0
polars==0.19.19
pre-commit==3.6.0
prometheus_client==0.20.0
protobuf==4.25.3
psutil==5.9.8
pyarrow==14.0.2
pyclipper==1.3.0.post5
pycparser==2.22
pydantic==2.4.2
pydantic_core==2.10.1
pydub==0.25.1
Pygments==2.18.0
PyJWT==2.8.0
pylint==2.17.7
pyparsing==3.1.2
pyproject_hooks==1.1.0
pytest==7.3.2
pytest-cov==4.0.0
python-bidi==0.4.2
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
python-multipart==0.0.9
pytz==2023.4
PyYAML==6.0.1
pyzmq==26.0.3
querystring-parser==1.2.4
redis==4.3.4
referencing==0.35.0
requests==2.31.0
responses==0.25.0
rich==13.7.1
rpds-py==0.18.0
s3transfer==0.6.2
schema==0.7.7
scikit-image==0.22.0
scikit-learn==1.4.2
scipy==1.13.0
semantic-version==2.10.0
sentry-sdk==1.40.6
shapely==2.0.4
simple-di==0.1.5
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
sortedcontainers==2.4.0
SQLAlchemy==2.0.30
sqlparse==0.5.0
starlette==0.37.2
statsmodels==0.14.1
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
threadpoolctl==3.5.0
tifffile==2024.5.10
tomli==2.0.1
tomli_w==1.0.0
tomlkit==0.12.5
toolz==0.12.1
torch==2.0.1
torchvision==0.15.2
tornado==6.4
trino==0.324.0
typing_extensions==4.11.0
tzdata==2024.1
tzlocal==5.2
urllib3==1.26.18
uvicorn==0.29.0
virtualenv==20.26.2
watchfiles==0.21.0
websocket-client==1.8.0
websockets==11.0.3
Werkzeug==3.0.3
woowa_ml_sdk==0.9.6
wrapt==1.16.0
xgboost==1.6.2
xmltodict==0.13.0
yarl==1.9.4
zipp==3.18.2
bentoml containerize
bentoml build -f bentofile.yaml --containerize
docker run
docker run -it --rm -p 3000:3000 --cpus 1 --memory 2g test_service:mghyrjq5q2dlztwo
Can you reproduce it without containerizing?
Run bentoml serve to start the service.
I think it might be difficult to monitor memory without containerizing, because of other processes on the system. I will try on an empty EC2 instance.
But for now it is a bug for me, because I'm using containerized BentoML. Could you check the containerized image first?
I tried with a Docker container and tested with Locust using 100 peak concurrency and a ramp of 10. The memory usage is stable on my side; no obvious leak is seen.
Same with bentoml serve locally (on an EC2 instance).
![SCR-20240529-cqjz](https://private-user-images.githubusercontent.com/38372691/334740217-d4e59d91-a90a-4ebc-a80c-85940c161534.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MTY0ODUsIm5iZiI6MTcxOTQxNjE4NSwicGF0aCI6Ii8zODM3MjY5MS8zMzQ3NDAyMTctZDRlNTlkOTEtYTkwYS00ZWJjLWE4MGMtODU5NDBjMTYxNTM0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDE1MzYyNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTI3ZTQ4YjhmNTYzZTY0ZGFhYjE1NjNlMjYwZGY3NmQ2ZTY2NWZmNzliOGRlMDRmZTFmYTM2ZWU4YmMzYWI2MzAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Zl6pxpQ2AxsDWaTsux3es5AS1uteGGOOpjJPxjp7Z60)
![SCR-20240529-cyij](https://private-user-images.githubusercontent.com/38372691/334740243-28d5717a-979a-4db8-87e4-bafbebb43a85.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MTY0ODUsIm5iZiI6MTcxOTQxNjE4NSwicGF0aCI6Ii8zODM3MjY5MS8zMzQ3NDAyNDMtMjhkNTcxN2EtOTc5YS00ZGI4LTg3ZTQtYmFmYmViYjQzYTg1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDE1MzYyNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTAxZTgzODg2Njc5OTdjMDk5NTQ0ZTc5Y2Q3M2Y4ODA5NWRjMDg3ODQxNDA2YzhhNzc2MTZjM2MyYzA5NmE5ZWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.u7FNfkUb4fpABLd-XbK9SEGJZX-JAEvTxACKliwkrOk)
How many requests did you test? In my case, it was around 1 million.
200k requests, and the memory usage doesn't change much.
To rule out other issues, can you first upgrade Python to 3.9.18 (which I am using)?
Same..
- result
  - 200k requests
  - 500MB memory increase
- locust settings
  - users=48
  - host=http://localhost:3000
- host
  - c6i.large EC2 instance
  - cpu: 4, memory: 8GB
![SCR-20240529-dzsk](https://private-user-images.githubusercontent.com/38372691/334767437-f26a4eb1-0a77-4ad8-b5e4-b7280653b0fe.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MTY0ODUsIm5iZiI6MTcxOTQxNjE4NSwicGF0aCI6Ii8zODM3MjY5MS8zMzQ3Njc0MzctZjI2YTRlYjEtMGE3Ny00YWQ4LWI1ZTQtYjcyODA2NTNiMGZlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDE1MzYyNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTVhNDFmZTA2YTQxY2IxMjJmNDZiYzA5ZGEzYTQ3NGY2NzVlM2Q4Y2Q3ZjJmYjY3NTE3NWViZjVkOGZjMGU1YjcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.qAVP-4OyEhMDlee7fDlsHynU_MQXghKi3kZJVOgaXII)
![SCR-20240529-ectn](https://private-user-images.githubusercontent.com/38372691/334768665-a03805e5-70fc-4f2b-a5b2-ebd1d56145d4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MTY0ODUsIm5iZiI6MTcxOTQxNjE4NSwicGF0aCI6Ii8zODM3MjY5MS8zMzQ3Njg2NjUtYTAzODA1ZTUtNzBmYy00ZjJiLWE1YjItZWJkMWQ1NjE0NWQ0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDE1MzYyNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWRmOTVhNjdiZDM4OGIwOGU5MmYyZDBkY2NjMmU5NzU2NGM3OTc4N2RiMDg1NWQ4MThlMzQ5NmNiYjIwNmEwODAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.NSP-QYP1ZAYw4GMSNGQ5MTs7n0DsIhfTOn5tpWnQ6Vw)
Environment variable
BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''
System information
bentoml: 1.2.16
python: 3.9.18
platform: Linux-6.1.91-99.172.amzn2023.x86_64-x86_64-with-glibc2.34
uid_gid: 1000:1000
pip_packages
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
appdirs==1.4.4
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
bentoml==1.2.16
blinker==1.8.2
Brotli==1.1.0
build==1.2.1
cattrs==23.1.2
certifi==2024.2.2
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
ConfigArgParse==1.7
deepmerge==1.1.1
Deprecated==1.2.14
exceptiongroup==1.2.1
Flask==3.0.3
Flask-Cors==4.0.1
Flask-Login==0.6.3
frozenlist==1.4.1
fs==2.4.16
gevent==24.2.1
geventhttpclient==2.3.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
importlib-metadata==6.11.0
inflection==0.5.1
itsdangerous==2.2.0
Jinja2==3.1.4
locust==2.28.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
msgpack==1.0.8
multidict==6.0.5
numpy==1.26.4
nvidia-ml-py==11.525.150
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
packaging==24.0
pathspec==0.12.1
pip-requirements-parser==32.0.1
pip-tools==7.4.1
prometheus_client==0.20.0
psutil==5.9.8
pydantic==2.7.2
pydantic_core==2.18.3
Pygments==2.18.0
pyparsing==3.1.2
pyproject_hooks==1.1.0
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
python-multipart==0.0.9
PyYAML==6.0.1
pyzmq==26.0.3
requests==2.32.2
rich==13.7.1
schema==0.7.7
simple-di==0.1.5
six==1.16.0
sniffio==1.3.1
starlette==0.37.2
tomli==2.0.1
tomli_w==1.0.0
tornado==6.4
typing_extensions==4.12.0
urllib3==2.2.1
uvicorn==0.30.0
watchfiles==0.22.0
Werkzeug==3.0.3
wrapt==1.16.0
yarl==1.9.4
zipp==3.19.0
zope.event==5.0
zope.interface==6.4.post2
bentoml
- service.py
import bentoml

@bentoml.service
class TestService:
    @bentoml.api
    def predict(self, input: list) -> list:
        return []
- bentofile.yaml
service: "service:TestService"
- run bentoml
bentoml serve
locust
- locust.py
from locust import HttpUser, task, constant

sample_data = {"input": []}

class Predict(HttpUser):
    wait_time = constant(0.05)

    @task
    def predict(self):
        self.client.post("/predict", json=sample_data)
- run locust
locust -f locust.py
Can't reproduce it either. Can you use a memory profiler to figure it out? I recommend memray.
In the memray result, the Python process's memory is not increasing (it is only 34MB in total).
And in the screenshot I shared before, memory increased (593MB -> 1.10GB) but the process memory stayed the same (1.0% -> 1.1%).
It might be increasing outside of the Python process. Do you have any idea?
![image](https://private-user-images.githubusercontent.com/38372691/335060068-ca5c214e-c90b-4203-af73-4ddcb3ec9050.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MTY0ODUsIm5iZiI6MTcxOTQxNjE4NSwicGF0aCI6Ii8zODM3MjY5MS8zMzUwNjAwNjgtY2E1YzIxNGUtYzkwYi00MjAzLWFmNzMtNGRkY2IzZWM5MDUwLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDE1MzYyNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTMwMDI1OTc0NDMzNTE0MDM5NTAxNWRkZDIxZjM0ZTIxZmI3ZWMxOWIwZDQ1M2VlZTI1MDAxN2RmOWZlZDFmNDAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.uxe59MNNZQn0uSCjUlyiS2Xw4jmbSJ9RkgVuzuTYwQM)
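A stdlib-only cross-check of that observation (Linux only; the helper names below are mine, not from memray or BentoML) is to read the server's RSS from /proc/<pid>/status and compare it against what docker stats or top shows. If RSS stays flat while the container number grows, the growth is indeed outside the process heap:

```python
def parse_vm_rss_kb(status_text: str):
    """Extract the resident set size in kB from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # line looks like "VmRSS:   34512 kB"
    return None  # kernel threads and non-Linux files have no VmRSS line

def process_rss_kb(pid: int):
    # Linux only: /proc does not exist on macOS.
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kb(f.read())
```

Sampling `process_rss_kb(pid)` before and after the load test gives a number that excludes the kernel page cache entirely.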
encode/httpx#978 (comment)
aio-libs/aiohttp#4833
Would these be related? (It is surprising that the same issue exists in both client libraries.)
Would you mind following these steps?
- run bentoml serve on your EC2 instance
- run pmap $(pid) and record the result
- run the request test
- run pmap on the pid again and record the result
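The pmap before/after comparison in the steps above can also be scripted. A small sketch (`parse_pmap_total_kb` and `pmap_total_kb` are names I made up) that reads pmap's summary line:

```python
import subprocess

def parse_pmap_total_kb(pmap_output: str) -> int:
    """pmap's last line is a summary like ' total         123456K'."""
    last_line = pmap_output.strip().splitlines()[-1]
    return int(last_line.split()[-1].rstrip("K"))

def pmap_total_kb(pid: int) -> int:
    # Requires the `pmap` tool from procps (present on most Linux distros).
    out = subprocess.run(
        ["pmap", str(pid)], capture_output=True, text=True, check=True
    ).stdout
    return parse_pmap_total_kb(out)
```

Record `pmap_total_kb(pid)` before the load test, run the requests, then record it again; the delta is the mapped-memory growth.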
I tried using tcmalloc instead of glibc malloc.
But in my case, memory still increased locally with containerization (139MB -> 609MB).
![SCR-20240530-tkio](https://private-user-images.githubusercontent.com/38372691/335457943-d77aac9f-b444-4e77-a5c8-597e5610acb4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MTY0ODUsIm5iZiI6MTcxOTQxNjE4NSwicGF0aCI6Ii8zODM3MjY5MS8zMzU0NTc5NDMtZDc3YWFjOWYtYjQ0NC00ZTc3LWE1YzgtNTk3ZTU2MTBhY2I0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDE1MzYyNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTEzY2I5ZWJkYzE0MmI0OTU0Mjg4MzJkODUyMTM1ZDA3NmE1YWU2M2RjZTc1ZjE2MmYyMmNlMzY3YjVlOTBkM2UmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.6NqusBEVbd-QL4wp1PoLc7-ZsIzYMK8iv0_8U2dbxvQ)
![SCR-20240530-tpwa](https://private-user-images.githubusercontent.com/38372691/335457968-5dca8a2f-03a3-40f4-ad66-286932c4657d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MTY0ODUsIm5iZiI6MTcxOTQxNjE4NSwicGF0aCI6Ii8zODM3MjY5MS8zMzU0NTc5NjgtNWRjYThhMmYtMDNhMy00MGY0LWFkNjYtMjg2OTMyYzQ2NTdkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDE1MzYyNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTdlNWQ0NGMzYjE5NGRlMWI2NjIxMmI1ZDZjOWE4Mjg3NzM1NDFkM2Y1MzRiMzE1ZDQ4OTQxZTg1MmVlNjFmOTQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0._zGaEvRxY695-bicKv99FLEFapDcDQR0snFVt9bBwG8)
- Dockerfile.template
{% extends bento_base_template %}
{% block SETUP_BENTO_COMPONENTS %}
{{ super() }}
RUN apt-get update && apt-get install -y \
    google-perftools libgoogle-perftools-dev \
    && rm -rf /var/lib/apt/lists/*
ENV LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libtcmalloc.so"
{% endblock %}
- bentofile.yaml
service: "service:TestService"
include:
  - service.py
docker:
  dockerfile_template: Dockerfile.template
- containerize bento
bentoml build -f bentofile.yaml --containerize
- run docker
docker run -it --rm -p 3000:3000 test_service:4f25sna7bw6njtwo
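To confirm that the LD_PRELOAD actually took effect inside the container, you can check whether libtcmalloc appears in the server process's memory maps. This is a sketch (Linux only; the helper names are mine):

```python
def has_mapped_library(maps_text: str, needle: str = "tcmalloc") -> bool:
    """Return True if any mapped file path in the maps text contains needle."""
    return any(needle in line for line in maps_text.splitlines())

def process_uses_tcmalloc(pid: int) -> bool:
    # Linux only; run it against the BentoML worker PID inside the container.
    with open(f"/proc/{pid}/maps") as f:
        return has_mapped_library(f.read())
```

If `process_uses_tcmalloc(pid)` returns False, the preload path is wrong for the image's architecture and glibc malloc is still in use.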
Interesting. You are running the service on ARM?
The test was on my local machine (M1 Max), but it was the same on the EC2 instance (with "/usr/lib/x86_64-linux-gnu/libtcmalloc.so").
@gusghrlrl101 Try upgrading the dependencies with pip install bentoml -U --upgrade-strategy eager and run again.
After that, it was the same (a 400MB increase after 300k requests).
![image](https://private-user-images.githubusercontent.com/38372691/335941333-cbdcdcb5-9ca9-42c7-bff6-db9f2abd58c2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MTY0ODUsIm5iZiI6MTcxOTQxNjE4NSwicGF0aCI6Ii8zODM3MjY5MS8zMzU5NDEzMzMtY2JkY2RjYjUtOWNhOS00MmM3LWJmZjYtZGI5ZjJhYmQ1OGMyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDE1MzYyNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWRlMzBmYTQxMmM1MWUyYzNmNzExMDcyMDRhZWM2MTY1OThkNTI0NjkxNmQxY2UzNGU2OTI4ZGFjMjJlYWFjMTImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.xpkFIfWfMu_ZyfdzVPXDMooltnyOmkX_pfVqcHf3swY)
$ pip list
Package Version
-------------------------------------------- -----------
aiohttp 3.9.5
aiosignal 1.3.1
annotated-types 0.7.0
anyio 4.4.0
appdirs 1.4.4
asgiref 3.8.1
async-timeout 4.0.3
attrs 23.2.0
bentoml 1.2.16
build 1.2.1
cattrs 23.1.2
certifi 2024.6.2
circus 0.18.0
click 8.1.7
click-option-group 0.5.6
cloudpickle 3.0.0
deepmerge 1.1.1
Deprecated 1.2.14
exceptiongroup 1.2.1
frozenlist 1.4.1
fs 2.4.16
h11 0.14.0
httpcore 1.0.5
httpx 0.27.0
idna 3.7
importlib-metadata 6.11.0
inflection 0.5.1
Jinja2 3.1.4
markdown-it-py 3.0.0
MarkupSafe 2.1.5
mdurl 0.1.2
multidict 6.0.5
numpy 1.26.4
nvidia-ml-py 11.525.150
opentelemetry-api 1.20.0
opentelemetry-instrumentation 0.41b0
opentelemetry-instrumentation-aiohttp-client 0.41b0
opentelemetry-instrumentation-asgi 0.41b0
opentelemetry-sdk 1.20.0
opentelemetry-semantic-conventions 0.41b0
opentelemetry-util-http 0.41b0
packaging 24.0
pathspec 0.12.1
pip 24.0
pip-requirements-parser 32.0.1
pip-tools 7.4.1
prometheus_client 0.20.0
psutil 5.9.8
pydantic 2.7.2
pydantic_core 2.18.3
Pygments 2.18.0
pyparsing 3.1.2
pyproject_hooks 1.1.0
python-dateutil 2.9.0.post0
python-json-logger 2.0.7
python-multipart 0.0.9
PyYAML 6.0.1
pyzmq 26.0.3
rich 13.7.1
schema 0.7.7
setuptools 70.0.0
simple-di 0.1.5
six 1.16.0
sniffio 1.3.1
starlette 0.37.2
tomli 2.0.1
tomli_w 1.0.0
tornado 6.4
typing_extensions 4.12.1
uvicorn 0.30.1
watchfiles 0.22.0
wheel 0.43.0
wrapt 1.16.0
yarl 1.9.4
zipp 3.19.1
After debugging, @frostming and I confirmed that this bug was introduced into the codebase in #4337.
TL;DR:
In #4337, @frostming added a new feature: create a temp directory per request and use it to cache all necessary files during the request.
with tempfile.TemporaryDirectory(prefix="bentoml-request-") as temp_dir:
    dir_token = request_directory.set(temp_dir)
    try:
        yield self
    finally:
        self._request_var.reset(request_token)
        self._response_var.reset(response_token)
        request_directory.reset(dir_token)
But there is a problem: when we create a new directory, the process may trigger page cache activity in the kernel, and that cache may not be released in time, so a lot of cache accumulates. docker stats reports the container's page cache and the process memory together in the memory usage field displayed in the console.
So you will see the memory continue growing until the OS reclaims the page cache.
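Assuming this page-cache explanation, you can separate cache from process memory by parsing the cgroup's memory.stat yourself. A sketch (the function name is mine; note that newer docker stats versions already subtract part of the cache, so treat this as an approximation):

```python
def split_cache_from_usage(memory_stat_text: str, usage_bytes: int):
    """Split a container's reported memory usage into (non-cache, cache).

    memory_stat_text is the content of the cgroup's memory.stat file:
    /sys/fs/cgroup/memory/memory.stat on cgroup v1 (counter named 'cache')
    or /sys/fs/cgroup/memory.stat on cgroup v2 (counter named 'file').
    """
    counters = {}
    for line in memory_stat_text.splitlines():
        name, _, value = line.partition(" ")
        if value.strip().isdigit():
            counters[name] = int(value)
    cache = counters.get("cache", counters.get("file", 0))
    return usage_bytes - cache, cache
```

If the non-cache half stays flat while the cache half grows with request count, the growth is reclaimable page cache rather than a process-level leak.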
We can use bpftrace to verify this
tracepoint:kmem:mm_page_free_batched {
    if (pid == 4015 || pid == 4016) {
        @free_batched_count[pid] = @free_batched_count[pid] + 1;
        if (@free_batched_count[pid] % 1000 == 0) {
            printf("mm_page_free_batched, Pid=%d, Count=%d\n", pid, @free_batched_count[pid]);
        }
    }
}
tracepoint:kmem:mm_page_free {
    if (pid == 4015 || pid == 4016) {
        @free_count[pid] = @free_count[pid] + 1;
        if (@free_count[pid] % 1000 == 0) {
            printf("mm_page_free, Pid=%d, Count=%d\n", pid, @free_count[pid]);
        }
    }
}
tracepoint:kmem:mm_page_alloc {
    if (pid == 4015 || pid == 4016) {
        @alloc_count[pid] = @alloc_count[pid] + 1;
        if (@alloc_count[pid] % 1000 == 0) {
            printf("mm_page_alloc, Pid=%d, Count=%d\n", pid, @alloc_count[pid]);
        }
    }
}
The results:
mm_page_alloc, Pid=4015, Count=1000
mm_page_alloc, Pid=4016, Count=1000
mm_page_alloc, Pid=4016, Count=2000
mm_page_alloc, Pid=4016, Count=3000
mm_page_alloc, Pid=4015, Count=2000
mm_page_alloc, Pid=4015, Count=3000
mm_page_alloc, Pid=4016, Count=4000
mm_page_alloc, Pid=4015, Count=4000
mm_page_alloc, Pid=4016, Count=5000
mm_page_alloc, Pid=4015, Count=5000
mm_page_alloc, Pid=4015, Count=6000
mm_page_alloc, Pid=4016, Count=6000
mm_page_alloc, Pid=4016, Count=7000
mm_page_alloc, Pid=4015, Count=7000
mm_page_free, Pid=4016, Count=1000
mm_page_alloc, Pid=4015, Count=8000
mm_page_alloc, Pid=4016, Count=8000
mm_page_free, Pid=4015, Count=1000
mm_page_alloc, Pid=4015, Count=9000
mm_page_alloc, Pid=4016, Count=9000
mm_page_free, Pid=4016, Count=2000
mm_page_alloc, Pid=4015, Count=10000
mm_page_alloc, Pid=4016, Count=10000
mm_page_free, Pid=4015, Count=2000
mm_page_alloc, Pid=4015, Count=11000
mm_page_alloc, Pid=4016, Count=11000
mm_page_free, Pid=4016, Count=3000
mm_page_free, Pid=4015, Count=3000
mm_page_alloc, Pid=4015, Count=12000
mm_page_alloc, Pid=4016, Count=12000
mm_page_alloc, Pid=4015, Count=13000
mm_page_alloc, Pid=4016, Count=13000
mm_page_free, Pid=4016, Count=4000
mm_page_free, Pid=4015, Count=4000
mm_page_alloc, Pid=4015, Count=14000
mm_page_alloc, Pid=4016, Count=14000
mm_page_alloc, Pid=4015, Count=15000
mm_page_alloc, Pid=4016, Count=15000
mm_page_alloc, Pid=4015, Count=16000
mm_page_alloc, Pid=4016, Count=16000
mm_page_free, Pid=4016, Count=5000
mm_page_free, Pid=4015, Count=5000
mm_page_alloc, Pid=4015, Count=17000
mm_page_alloc, Pid=4016, Count=17000
mm_page_alloc, Pid=4015, Count=18000
mm_page_alloc, Pid=4016, Count=18000
mm_page_free, Pid=4016, Count=6000
mm_page_alloc, Pid=4015, Count=19000
mm_page_alloc, Pid=4016, Count=19000
mm_page_free, Pid=4015, Count=6000
mm_page_alloc, Pid=4015, Count=20000
mm_page_alloc, Pid=4016, Count=20000
mm_page_alloc, Pid=4015, Count=21000
mm_page_free, Pid=4016, Count=7000
mm_page_alloc, Pid=4016, Count=21000
mm_page_alloc, Pid=4015, Count=22000
mm_page_alloc, Pid=4016, Count=22000
mm_page_free, Pid=4015, Count=7000
mm_page_alloc, Pid=4015, Count=23000
mm_page_free, Pid=4016, Count=8000
mm_page_alloc, Pid=4016, Count=23000
mm_page_alloc, Pid=4015, Count=24000
mm_page_alloc, Pid=4016, Count=24000
mm_page_free, Pid=4015, Count=8000
mm_page_alloc, Pid=4015, Count=25000
mm_page_alloc, Pid=4016, Count=25000
mm_page_alloc, Pid=4015, Count=26000
mm_page_free, Pid=4015, Count=9000
mm_page_alloc, Pid=4016, Count=26000
mm_page_alloc, Pid=4015, Count=27000
mm_page_free, Pid=4016, Count=9000
mm_page_alloc, Pid=4016, Count=27000
mm_page_alloc, Pid=4015, Count=28000
mm_page_alloc, Pid=4015, Count=29000
mm_page_free, Pid=4015, Count=10000
mm_page_alloc, Pid=4016, Count=28000
mm_page_free, Pid=4016, Count=10000
mm_page_alloc, Pid=4015, Count=30000
mm_page_alloc, Pid=4016, Count=29000
mm_page_free, Pid=4016, Count=11000
mm_page_free, Pid=4015, Count=11000
mm_page_alloc, Pid=4015, Count=31000
mm_page_alloc, Pid=4016, Count=30000
mm_page_free, Pid=4015, Count=12000
We can see the processes allocate a lot of pages and free only a few of them.
For now, I think this is not a bug; it can be treated as normal behavior. You can set a memory limit for your container, and the page cache will be released automatically when the container's memory usage approaches your limit.
Thank you.
So is there no way to keep the memory from increasing?
I don't think it's stable to let a deployment's memory fill up in production.
How about adding an option to turn that on/off? Or is there a better way for me?
What do you think about this?
I don't think it's stable to let a deployment's memory fill up in production.
It will look like the screenshot below.
![image](https://private-user-images.githubusercontent.com/38372691/336191617-dcbf4005-0dd5-4f4e-b1c1-f50ffefe6f3b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MTY0ODUsIm5iZiI6MTcxOTQxNjE4NSwicGF0aCI6Ii8zODM3MjY5MS8zMzYxOTE2MTctZGNiZjQwMDUtMGRkNS00ZjRlLWIxYzEtZjUwZmZlZmU2ZjNiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDE1MzYyNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI5YTc2M2I4YWJkYzdiNGFkOGRmN2Q4ZmM0MjBjMzQ5NzEyOTZiNjUwNGZlNjc5YTg5NzZiZmY2NGI2NTcxZjgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.4gBhaEWkiZAVsRH7TO53bXstZZdJYnNNntSm_zMF6bk)
@frostming Hello. How is it going?
Thank you!
The memory no longer increases.
I look forward to the new version being released soon so that I can use it.