Comments (4)
Hi Jing! I'm afraid I'm not aware of a way to monitor memory usage for each component. Currently Bionic computes every flow component within the same Python process, so it's hard to separate out the memory used by one component. I don't think there's a good general technique for this.
However, in the future, we plan to support running each component in a separate process, and in that case we should be able to see how much memory a given component is using.
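In the meantime, a rough way to approximate per-component memory is to run a single function in a child process and read the child's peak RSS afterwards. This is not part of Bionic; it's a hypothetical sketch (Unix-only, and the names `compute_in_subprocess` / `_measure_in_child` are made up for illustration):

```python
import multiprocessing as mp
import resource

def _measure_in_child(func, args, queue):
    # This runs inside the child process, so getrusage(RUSAGE_SELF) reflects
    # only what this one component touched (plus interpreter overhead).
    result = func(*args)
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    queue.put((result, peak))

def compute_in_subprocess(func, *args):
    """Run func(*args) in a child process; return (result, peak_rss).

    Note: ru_maxrss is in kilobytes on Linux but bytes on macOS, and the
    result must be picklable to travel back through the queue.
    """
    ctx = mp.get_context("fork")  # fork keeps the sketch simple; Unix-only
    queue = ctx.Queue()
    proc = ctx.Process(target=_measure_in_child, args=(func, args, queue))
    proc.start()
    result, peak = queue.get()
    proc.join()
    return result, peak
```

For example, `compute_in_subprocess(sum, range(1_000_000))` would return the sum plus the child's peak RSS.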
from bionic.
Hi Janek, I wasn't able to find our email thread, so I'm putting my question here 😂. I was wondering whether it would be reasonable to add a wrapper around a flow component, assuming the flow is single-threaded.
import functools
import logging
import tracemalloc

tracemalloc.start()

def get_mem_usage(func):
    @functools.wraps(func)
    def call(*args, **kwargs):
        pre, _ = tracemalloc.get_traced_memory()
        result = func(*args, **kwargs)
        after, _ = tracemalloc.get_traced_memory()
        logging.info("Memory usage is %d bytes", after - pre)
        return result
    return call
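For what it's worth, here is how such a wrapper might be used end to end. This is a self-contained sketch, not Bionic code; `build_table` is a stand-in for a real flow component:

```python
import logging
import tracemalloc

logging.basicConfig(level=logging.INFO)
tracemalloc.start()

def get_mem_usage(func):
    # Log the net Python-level allocation delta across one call to func.
    def call(*args, **kwargs):
        pre, _ = tracemalloc.get_traced_memory()
        result = func(*args, **kwargs)
        after, _ = tracemalloc.get_traced_memory()
        logging.info("Memory usage of %s: %d bytes", func.__name__, after - pre)
        return result
    return call

@get_mem_usage
def build_table(n):
    # Allocate a list of n small lists so the delta is visibly nonzero.
    return [[i] for i in range(n)]

table = build_table(10_000)
```

Since the decorated function keeps its result alive when it returns, the logged delta roughly tracks the size of that result.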
Thanks for showing me the tracemalloc module -- I'd never encountered it before! That's pretty cool.
If I understand this code right, it measures the net memory allocated during the call to func, which should roughly equal the size of the value func returns (as long as func doesn't retain other references). This seems like it would work, although I'm not sure what happens with C++ libraries that allocate memory on their own. I tried the following script and everything gave reasonable results except for the very end:
import tracemalloc as tm
import numpy as np
import pandas as pd
import xgboost as xgb
tm.start()
# Prints 0.
print(tm.get_traced_memory()[0])
# Allocate an array with 1M elements.
xs = np.ones((1000, 1000))
# Prints about 8M bytes; seems right.
print(tm.get_traced_memory()[0])
# Allocate a Pandas dataframe of the same size.
df = pd.DataFrame(xs).copy()
# Prints about 16M bytes, or twice as much: seems right.
print(tm.get_traced_memory()[0])
# Create an XGBoost DMatrix.
dm = xgb.DMatrix(xs)
# Prints about 20M bytes; seems plausible if XGB is using single-precision
# floats. (I'm a little surprised that Python can detect XGB's allocations
# since XGB is presumably using its own allocator in C++.)
print(tm.get_traced_memory()[0])
# Delete the original Numpy array.
del xs
# Prints about 12M bytes; seems right.
print(tm.get_traced_memory()[0])
# Delete the Pandas frame.
del df
# Prints about 4M bytes; seems right.
print(tm.get_traced_memory()[0])
# Delete the DMatrix.
del dm
# Still prints about 4M bytes; I don't know why. This seems like either a
# memory leak or some kind of clever memory allocation trick inside XGBoost.
# (If you try repeatedly creating and deleting DMatrices, it does keep leaking
# more memory, but the total amount used doesn't seem linear.)
print(tm.get_traced_memory()[0])
So, overall this seems like a reasonable technique, but some C/C++ libraries might produce misleading results.
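One way to sanity-check tracemalloc against allocations it can't see is to compare its numbers with the OS-level resident set size: tracemalloc only tracks allocations routed through Python's allocator, so a C library calling its own malloc shows up in RSS but not in tracemalloc. A minimal Unix-only sketch (not Bionic code; it uses a plain bytearray rather than a C library, just to show the two counters side by side):

```python
import resource
import tracemalloc

tracemalloc.start()

def rss_kb():
    # ru_maxrss is the peak resident set size: kilobytes on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

traced_before, _ = tracemalloc.get_traced_memory()
rss_before = rss_kb()

buf = bytearray(8_000_000)  # ~8 MB allocated through the Python allocator

traced_delta = tracemalloc.get_traced_memory()[0] - traced_before
rss_delta = rss_kb() - rss_before

# traced_delta should be about 8 MB here. If a C library had allocated the
# memory with its own malloc instead, traced_delta would stay near zero
# while RSS still grew.
print(traced_delta, rss_delta)
```

A large gap between the two deltas is a hint that something is allocating outside the Python allocator.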
I don't expect Bionic to start using multiple threads anytime soon, so that part shouldn't be a problem.
Thanks for confirming, Janek! I will close the ticket for now.