deeplite-profiler's People

Contributors

ahmed-deeplite, alexh-deeplite, goodboyanush-deeplite, honnesh-deeplite, lzrwch, matteog11, nagathodupu, olivier-dl, yasseridris


deeplite-profiler's Issues

Peak Memory Estimation

Memory allocation is a major constraint, especially for deep learning on tiny devices.

The volatile memory (usually on-chip SRAM) hosts the activations, plus a buffer used as a scratchpad for auxiliary procedures (e.g., the im2col buffer for GEMM-based convolutions).

So for each Conv layer, the standard approach requires allocating the INPUT + OUTPUT activation buffers in SRAM, releasing the INPUT only after the whole layer computation has finished.

During the feed-forward pass of a ConvNet, a sequence of operators (layers) reads/stores tensors from/to a contiguous region of the volatile memory (the memory pool). Each tensor is allocated with an offset address defining the portion of the memory pool where it is hosted during its lifetime. Since each tensor is accessed by a limited number of layers, and layers are scheduled sequentially, the memory previously assigned to a tensor can be reused over time to store other, non-conflicting tensors.

Peak Memory: the highest memory usage across all layer computations.
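Since layers are scheduled sequentially, the peak can be read as the maximum, over execution steps, of the total size of the tensors live at that step. A minimal sketch (pure Python; the tensor sizes and lifetimes are hypothetical inputs, not taken from any real model):

```python
def peak_live_memory(tensors):
    """tensors: list of (size_bytes, first_step, last_step) triples, where a
    tensor is live from the step of the layer that writes it through the step
    of the last layer that reads it. Returns the maximum total live size,
    i.e. a lower bound on the memory-pool size with perfect reuse."""
    last_step = max(last for _, _, last in tensors)
    peak = 0
    for step in range(last_step + 1):
        live = sum(size for size, first, last in tensors if first <= step <= last)
        peak = max(peak, live)
    return peak

# Three tensors with overlapping lifetimes:
print(peak_live_memory([(4, 0, 1), (2, 1, 2), (3, 2, 3)]))  # 6 (step 1: 4 + 2)
```

Any allocator that reuses freed regions can do no better than this bound; actual offset assignment on complex graphs is what makes the problem hard.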

How to estimate the Peak Memory?

For standard models (with a regular chain-like graph topology) it is easy: Ram(layer) ~= InputAct + OutAct. However, for more complex graph topologies the situation changes: complex layer connectivity and tight tensor dependencies make the memory allocation problem much harder to solve efficiently in practice.
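For the chain-like case, the Ram(layer) ~= InputAct + OutAct rule can be sketched as follows (a simplified sketch assuming fp32 activations, no im2col scratch buffer, and weights kept in FLASH):

```python
def peak_memory_bytes(activation_sizes, bytes_per_elem=4):
    """activation_sizes[i] = number of elements in the i-th tensor of a chain
    model, where tensor i is the input of layer i and tensor i+1 its output.
    Peak SRAM is the largest input+output pair across all layers."""
    peak = 0
    for i in range(len(activation_sizes) - 1):
        layer_ram = (activation_sizes[i] + activation_sizes[i + 1]) * bytes_per_elem
        peak = max(peak, layer_ram)
    return peak

# A 3-layer chain whose tensors have 1000, 500, 250, 100 elements:
print(peak_memory_bytes([1000, 500, 250, 100]))  # 6000 bytes, peak at layer 0
```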



How to reduce Peak Memory?



  • Topology scaling: affects both FLASH and SRAM. Includes width multipliers, channel pruning, and tensor decomposition.
  • Quantization: affects both FLASH and SRAM, but on MCUs it is constrained to 8-bit.
  • Sparsity: affects FLASH (SRAM only if combined with proper kernels).
  • Resolution scaling: SRAM.
  • Operator transformation: SRAM.
  • Graph transformation: SRAM (hand-crafted graph rewriting that reduces peak memory consumption by leveraging the algebraic properties of the operators).

Refs:

TypeError: 'dict' object is not callable

profiler.compute_network_status(batch_size=1, device=Device.CPU, short_print=False,

File "deeplite/profiler/profiler.py", line 200, in deeplite.profiler.profiler.Profiler.compute_network_status
File "deeplite/profiler/profiler.py", line 158, in deeplite.profiler.profiler.Profiler.compute_status
File "deeplite/profiler/profiler.py", line 362, in deeplite.profiler.profiler.ProfilerFunction.pipe_kwargs_to_call
File "deeplite/torch_profiler/torch_profiler.py", line 52, in deeplite.torch_profiler.torch_profiler.ComputeComplexity.call
File "deeplite/torch_profiler/torch_profiler.py", line 60, in deeplite.torch_profiler.torch_profiler.ComputeComplexity._compute_complexity
File "/home/ubuntu/anaconda3/envs/cerberus/lib/python3.8/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/ubuntu/anaconda3/envs/cerberus/lib/python3.8/copy.py", line 270, in _reconstruct
state = deepcopy(state, memo)
File "/home/ubuntu/anaconda3/envs/cerberus/lib/python3.8/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/home/ubuntu/anaconda3/envs/cerberus/lib/python3.8/copy.py", line 230, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/ubuntu/anaconda3/envs/cerberus/lib/python3.8/copy.py", line 153, in deepcopy
y = copier(memo)
TypeError: 'dict' object is not callable
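The bottom frame (copy.py line 153, `y = copier(memo)`) is where CPython's deepcopy finds a `__deepcopy__` attribute on an object and calls it; the error means that attribute is a plain dict rather than a method. A minimal reproduction of the mechanism (the `BrokenModule` class is hypothetical, not the actual object inside the profiled model):

```python
import copy

class BrokenModule:
    def __init__(self):
        # A plain-dict instance attribute named __deepcopy__ shadows the copy
        # protocol hook; copy.deepcopy finds it via getattr() and calls it.
        self.__deepcopy__ = {"not": "callable"}

try:
    copy.deepcopy(BrokenModule())
except TypeError as err:
    print(err)  # 'dict' object is not callable
```

If something in the profiled model carries such an attribute, deep-copying the model inside `_compute_complexity` would fail exactly like the traceback above.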

Profile models against a specific device

  • Try OpenVino Benchmark

  • Limitations: supports only .xml and .onnx formats:

  -m PATH_TO_MODEL, --path_to_model PATH_TO_MODEL
                        Required. Path to an .xml/.onnx/.prototxt file with a
                        trained model or to a .blob file with a trained
                        compiled model.
  • Check for any other open-source tools available?

Release v1.1.13

  • PR opened
  • Unit and Functional tests pass
  • PR reviewed and merged
  • version bumped in setup.py (and pushed to master)
  • python release.py stage
  • Integration tests pass (overnight)
  • Acceptance tests pass (overnight)
  • Update docs (mention changes in the comments of the release issue)
  • python release.py release

Create more examples

  • for object detection (OD) models
  • for segmentation models
  • show examples from torchvision.hub
  • show examples from tf.keras.applications

tensorflow-gpu tensor copy failure

For both Python 3.6 and Python 3.7, running examples/tf_example.py produces the following error:

Traceback (most recent call last):
  File "examples/tf_example.py", line 60, in <module>
    include_weights=True, print_mode='info')
  File "/home/ahmed/projects/deeplite-profiler/deeplite/profiler/profiler.py", line 204, in compute_network_status
    self.compute_status(sk, recompute=recompute, **kwargs)
  File "/home/ahmed/projects/deeplite-profiler/deeplite/profiler/profiler.py", line 162, in compute_status
    rval = pf_func.pipe_kwargs_to_call(self.model, self.data_splits, kwargs)
  File "/home/ahmed/projects/deeplite-profiler/deeplite/profiler/profiler.py", line 382, in pipe_kwargs_to_call
    return self(**bounded_args.arguments)
  File "/home/ahmed/projects/deeplite-profiler/deeplite/tf_profiler/tf_profiler.py", line 192, in __call__
    return self._compute_exectime(temp_model, dataloader, batch_size=batch_size)
  File "/home/ahmed/projects/deeplite-profiler/deeplite/tf_profiler/tf_profiler.py", line 203, in _compute_exectime
    _ = model(tnum, training=False)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 547, in __call__
    inputs = nest.map_structure(_convert_non_tensor, inputs)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 515, in map_structure
    structure[0], [func(*x) for x in entries],
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 515, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 544, in _convert_non_tensor
    return ops.convert_to_tensor(x)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1087, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1145, in convert_to_tensor_v2
    as_ref=False)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1224, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 305, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 246, in constant
    allow_broadcast=True)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 254, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 115, in convert_to_eager_tensor
    return ops.EagerTensor(value, handle, device, dtype)
RuntimeError: Error copying tensor to device: /job:localhost/replica:0/task:0/device:GPU:0. /job:localhost/replica:0/task:0/device:GPU:0 unknown device.
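A common workaround (an assumption, not verified against this repo) when TensorFlow references a GPU device it cannot actually use is to hide the CUDA devices so execution falls back to CPU:

```shell
# Hide CUDA devices so TensorFlow falls back to CPU. This sidesteps the
# "unknown device" copy failure but does not fix the underlying
# driver/CUDA/tensorflow-gpu version mismatch that usually causes it.
CUDA_VISIBLE_DEVICES=-1 python examples/tf_example.py
```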

Option to save profiler output as txt file

  • PR opened
  • Unit and Functional tests pass
  • PR reviewed and merged
  • version bumped in setup.py (and pushed to master)
  • python release.py stage
  • Integration tests pass (overnight)
  • Acceptance tests pass (overnight)
  • Update docs (mention changes in the comments of the release issue)
  • python release.py release
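Until the profiler grows a native flag for this, one possible sketch (the helper name `save_status_to_txt` is hypothetical, not part of the deeplite-profiler API) is to capture whatever `compute_network_status` prints to stdout and write it to a file:

```python
import contextlib
import io

def save_status_to_txt(print_status, path):
    """Run `print_status` (any zero-argument callable that prints, e.g. a
    lambda wrapping profiler.compute_network_status) and save everything it
    writes to stdout into a text file at `path`."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_status()
    with open(path, "w") as f:
        f.write(buffer.getvalue())
```

A native option would be cleaner (it could reuse the existing print formatting directly), but this works without touching the profiler's internals.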

Silent secondary eval metric after registry

Once the evaluation function is registered to a profiler instance, any secondary metric added to it afterwards is silently ignored. This happens because the binding of the profiler to a pfunction is fixed at registration time; the profiler then fails to catch up with the dynamic status key.

I would either refactor the profiler to handle dynamic status keys across the board, or block adding a secondary metric once the function is registered.

Also, I don't really like the global REGISTER at the class level of DynamicEvalMetric... if two profilers coexist with their respective sets of eval metrics, it looks like a recipe for errors.

Create a separate doc for profiler

  • Create a separate doc
  • Create the configuration file
  • Automate travis update of docs into s3
  • Add a good design for SDK documentation
  • Add more docstrings for SDK documentation
