Comments (4)

jqmp commented on June 26, 2024

Hi Jing! I'm afraid I'm not aware of a way to monitor memory usage for each component. Currently Bionic computes every flow component within the same Python process, so it's hard to separate out the memory used by one component. I don't think there's a good general technique for this.

However, in the future, we plan to support running each component in a separate process, and in that case we should be able to see how much memory a given component is using.
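To illustrate why a separate process makes this easier, here is a minimal sketch that is not Bionic's implementation. It runs a snippet of Python in a fresh interpreter and has the child report its own peak resident set size. The helper name `peak_rss_of` and the template are hypothetical, and the `resource` module is Unix-only (`ru_maxrss` is in kilobytes on Linux and bytes on macOS).

```python
import subprocess
import sys

# Code run in the child: do the work, then print the child's own peak RSS.
# Assumption: a Unix system, since the resource module is not on Windows.
CHILD_TEMPLATE = """
import resource
{body}
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
"""


def peak_rss_of(body):
    """Run `body` (one line of Python source) in a new interpreter.

    Returns the child's peak RSS in the platform's ru_maxrss units.
    """
    out = subprocess.run(
        [sys.executable, "-c", CHILD_TEMPLATE.format(body=body)],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(out.stdout.strip().splitlines()[-1])


# Compare an idle child with one that fills roughly 100 MB.
small = peak_rss_of("data = None")
large = peak_rss_of("data = b'x' * 100_000_000")
print(small, large)
```

Because each child is its own process, the number covers everything it allocated, including memory used by C/C++ extensions.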

from bionic.

ajing commented on June 26, 2024

Hi Janek, I couldn't find our email thread, so I'm posting my question here 😂. I was wondering whether it would be reasonable to add a wrapper like this around a flow component, assuming the flow is single-threaded:

import functools
import logging
import tracemalloc

tracemalloc.start()

def get_mem_usage(func):
    """Log the net change in traced memory across one call to func."""
    @functools.wraps(func)
    def call(*args, **kwargs):
        pre, _ = tracemalloc.get_traced_memory()
        result = func(*args, **kwargs)
        after, _ = tracemalloc.get_traced_memory()
        logging.info("Memory usage is %d bytes", after - pre)
        return result
    return call
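A wrapper of this shape reports the net change in traced memory, so temporary allocations that are freed before the function returns do not show up. Below is a minimal sketch of a variant that also records the peak, assuming Python 3.9+ for `tracemalloc.reset_peak`. The names `log_mem_usage`, `build_list`, `last_net`, and `last_peak` are hypothetical.

```python
import functools
import logging
import tracemalloc


def log_mem_usage(func):
    """Log net and peak traced memory for each call of func (illustrative)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not tracemalloc.is_tracing():
            tracemalloc.start()
        tracemalloc.reset_peak()  # requires Python 3.9+
        before, _ = tracemalloc.get_traced_memory()
        result = func(*args, **kwargs)
        after, peak = tracemalloc.get_traced_memory()
        # Keep the latest figures on the wrapper so callers can inspect them.
        wrapper.last_net = after - before
        wrapper.last_peak = peak - before
        logging.info(
            "%s: net %d bytes, peak %d bytes above start",
            func.__name__, wrapper.last_net, wrapper.last_peak,
        )
        return result
    return wrapper


@log_mem_usage
def build_list(n):
    scratch = bytearray(n)  # temporary buffer, freed when the call returns
    return [0] * 1000


result = build_list(5_000_000)
print(build_list.last_net, build_list.last_peak)
```

Here the net figure only reflects the small returned list, while the peak figure also includes the 5 MB scratch buffer.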

jqmp commented on June 26, 2024

Thanks for showing me the tracemalloc module -- I'd never encountered it before! That's pretty cool.

If I understand this code right, it would estimate the total size of the value returned by func. This seems like it would work, although I'm not sure what will happen with C++ libraries that allocate memory on their own. I tried the following script and everything gave reasonable results except for the very end:

import tracemalloc as tm

import numpy as np
import pandas as pd
import xgboost as xgb

tm.start()

# Prints 0.
print(tm.get_traced_memory()[0])

# Allocate an array with 1M elements.
xs = np.ones((1000, 1000))

# Prints about 8M bytes; seems right.
print(tm.get_traced_memory()[0])

# Allocate a Pandas dataframe of the same size.
df = pd.DataFrame(xs).copy()

# Prints about 16M bytes, or twice as much: seems right.
print(tm.get_traced_memory()[0])

# Create an XGBoost DMatrix.
dm = xgb.DMatrix(xs)

# Prints about 20M bytes; seems plausible if XGB is using single-precision
# floats.  (I'm a little surprised that Python can detect XGB's allocations
# since XGB is presumably using its own allocator in C++.)
print(tm.get_traced_memory()[0])

# Delete the original Numpy array.
del xs

# Prints about 12M bytes; seems right.
print(tm.get_traced_memory()[0])

# Delete the Pandas frame.
del df

# Prints about 4M bytes; seems right.
print(tm.get_traced_memory()[0])

# Delete the DMatrix.
del dm

# Still prints about 4M bytes; I don't know why. This seems like either a
# memory leak or some kind of clever memory allocation trick inside XGBoost.
# (If you try repeatedly creating and deleting DMatrices, it does keep leaking
# more memory, but the total amount used doesn't seem linear.)
print(tm.get_traced_memory()[0])

So, overall this seems like a reasonable technique but some C/C++ libraries might produce misleading results.
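One way to see the limitation: tracemalloc only counts allocations that go through Python's memory allocator or that a library reports to it, so memory a native library obtains directly from the C allocator is invisible. The sketch below assumes a POSIX system where `ctypes.CDLL(None)` exposes the C library's `malloc` and `free`. It compares a `bytearray` with a raw `malloc` of the same size.

```python
import ctypes
import tracemalloc

SIZE = 10_000_000  # 10 MB

# Assumption: POSIX, where CDLL(None) gives access to libc's malloc/free.
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]

tracemalloc.start()

# Allocation made through Python's allocator: tracemalloc sees it.
before, _ = tracemalloc.get_traced_memory()
buf = bytearray(SIZE)
python_delta = tracemalloc.get_traced_memory()[0] - before

# Allocation made directly by the C library: tracemalloc does not see it.
before, _ = tracemalloc.get_traced_memory()
ptr = libc.malloc(SIZE)
native_delta = tracemalloc.get_traced_memory()[0] - before
libc.free(ptr)

print("bytearray delta:", python_delta)
print("malloc delta:   ", native_delta)
```

The first delta should be at least 10 MB, while the second should be close to zero even though the process held another 10 MB at that point.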

I don't expect Bionic to start using multiple threads anytime soon, so that part shouldn't be a problem.

ajing commented on June 26, 2024

Thanks for confirming, Janek! I will close the ticket for now.
