Comments (4)
Hi Jing! I'm afraid I'm not aware of a way to monitor memory usage for each component. Currently Bionic computes every flow component within the same Python process, so it's hard to separate out the memory used by one component. I don't think there's a good general technique for this.
However, in the future, we plan to support running each component in a separate process, and in that case we should be able to see how much memory a given component is using.
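In the meantime, a rough way to approximate per-component memory is to run a single function in a child process and read the child's peak RSS afterwards. This is not part of Bionic; it's a hypothetical sketch (Unix-only, and the names `compute_in_subprocess` / `_measure_in_child` are made up for illustration):

```python
import multiprocessing as mp
import resource

def _measure_in_child(func, args, queue):
    # This runs inside the child process, so getrusage(RUSAGE_SELF) reflects
    # only what this one component touched (plus interpreter overhead).
    result = func(*args)
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    queue.put((result, peak))

def compute_in_subprocess(func, *args):
    """Run func(*args) in a child process; return (result, peak_rss).

    Note: ru_maxrss is in kilobytes on Linux but bytes on macOS, and the
    result must be picklable to travel back through the queue.
    """
    ctx = mp.get_context("fork")  # fork keeps the sketch simple; Unix-only
    queue = ctx.Queue()
    proc = ctx.Process(target=_measure_in_child, args=(func, args, queue))
    proc.start()
    result, peak = queue.get()
    proc.join()
    return result, peak
```

For example, `compute_in_subprocess(sum, range(1_000_000))` would return the sum plus the child's peak RSS.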
from bionic.
Hi Janek, I wasn't able to find our email thread, so I'm putting my question here 😂. I was wondering whether it would be reasonable to add a wrapper around a flow component, assuming the flow is single-threaded.
import functools
import logging
import tracemalloc

tracemalloc.start()

def get_mem_usage(func):
    @functools.wraps(func)
    def call(*args, **kwargs):
        pre, _ = tracemalloc.get_traced_memory()
        result = func(*args, **kwargs)
        after, _ = tracemalloc.get_traced_memory()
        logging.info("Memory usage is %d bytes", after - pre)
        return result
    return call
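For what it's worth, here is how such a wrapper might be used end to end. This is a self-contained sketch, not Bionic code; `build_table` is a stand-in for a real flow component:

```python
import logging
import tracemalloc

logging.basicConfig(level=logging.INFO)
tracemalloc.start()

def get_mem_usage(func):
    # Log the net Python-level allocation delta across one call to func.
    def call(*args, **kwargs):
        pre, _ = tracemalloc.get_traced_memory()
        result = func(*args, **kwargs)
        after, _ = tracemalloc.get_traced_memory()
        logging.info("Memory usage of %s: %d bytes", func.__name__, after - pre)
        return result
    return call

@get_mem_usage
def build_table(n):
    # Allocate a list of n small lists so the delta is visibly nonzero.
    return [[i] for i in range(n)]

table = build_table(10_000)
```

Since the decorated function keeps its result alive when it returns, the logged delta roughly tracks the size of that result.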
Thanks for showing me the tracemalloc module -- I'd never encountered it before! That's pretty cool.
If I understand this code right, it measures the net memory allocated during the call to func, which should roughly equal the size of the value func returns (as long as func doesn't retain other references). This seems like it would work, although I'm not sure what happens with C++ libraries that allocate memory on their own. I tried the following script and everything gave reasonable results except for the very end:
import tracemalloc as tm
import numpy as np
import pandas as pd
import xgboost as xgb
tm.start()
# Prints 0.
print(tm.get_traced_memory()[0])
# Allocate an array with 1M elements.
xs = np.ones((1000, 1000))
# Prints about 8M bytes; seems right.
print(tm.get_traced_memory()[0])
# Allocate a Pandas dataframe of the same size.
df = pd.DataFrame(xs).copy()
# Prints about 16M bytes, or twice as much: seems right.
print(tm.get_traced_memory()[0])
# Create an XGBoost DMatrix.
dm = xgb.DMatrix(xs)
# Prints about 20M bytes; seems plausible if XGB is using single-precision
# floats. (I'm a little surprised that Python can detect XGB's allocations
# since XGB is presumably using its own allocator in C++.)
print(tm.get_traced_memory()[0])
# Delete the original Numpy array.
del xs
# Prints about 12M bytes; seems right.
print(tm.get_traced_memory()[0])
# Delete the Pandas frame.
del df
# Prints about 4M bytes; seems right.
print(tm.get_traced_memory()[0])
# Delete the DMatrix.
del dm
# Still prints about 4M bytes; I don't know why. This seems like either a
# memory leak or some kind of clever memory allocation trick inside XGBoost.
# (If you try repeatedly creating and deleting DMatrices, it does keep leaking
# more memory, but the total amount used doesn't seem linear.)
print(tm.get_traced_memory()[0])
So, overall this seems like a reasonable technique, but some C/C++ libraries might produce misleading results.
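One way to sanity-check tracemalloc against allocations it can't see is to compare its numbers with the OS-level resident set size: tracemalloc only tracks allocations routed through Python's allocator, so a C library calling its own malloc shows up in RSS but not in tracemalloc. A minimal Unix-only sketch (not Bionic code; it uses a plain bytearray rather than a C library, just to show the two counters side by side):

```python
import resource
import tracemalloc

tracemalloc.start()

def rss_kb():
    # ru_maxrss is the peak resident set size: kilobytes on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

traced_before, _ = tracemalloc.get_traced_memory()
rss_before = rss_kb()

buf = bytearray(8_000_000)  # ~8 MB allocated through the Python allocator

traced_delta = tracemalloc.get_traced_memory()[0] - traced_before
rss_delta = rss_kb() - rss_before

# traced_delta should be about 8 MB here. If a C library had allocated the
# memory with its own malloc instead, traced_delta would stay near zero
# while RSS still grew.
print(traced_delta, rss_delta)
```

A large gap between the two deltas is a hint that something is allocating outside the Python allocator.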
I don't expect Bionic to start using multiple threads anytime soon, so that part shouldn't be a problem.
Thanks for confirming, Janek! I will close the ticket for now.