deeplite-profiler's People

Contributors

ahmed-deeplite, alexh-deeplite, goodboyanush-deeplite, honnesh-deeplite, lzrwch, matteog11, nagathodupu, olivier-dl, yasseridris


deeplite-profiler's Issues

Peak Memory Estimation

Memory allocation is a major constraint, especially for deep learning on tiny devices.

The volatile memory (usually on-chip SRAM) hosts the activations, plus a buffer used as a scratchpad for auxiliary procedures (e.g., the im2col buffer for GEMM-based convolutions).

So for each Conv layer, the standard approach requires allocating the INPUT + OUTPUT activation buffers in SRAM, releasing the INPUT only after the whole layer computation has finished.

During the feed-forward pass of a ConvNet, a sequence of operators (layers) reads/stores tensors from/to a contiguous region of the volatile memory (the memory pool). Each tensor is allocated with an offset address defining the portion of the memory pool where it is hosted during its lifetime. Since each tensor is accessed by a limited number of layers, and layers are scheduled sequentially, the memory previously assigned to a tensor can be reused over time to store other, non-conflicting tensors.

Peak Memory: the highest memory usage across all layer computations.
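Since layers are scheduled sequentially, the peak can be read as the maximum, over execution steps, of the total size of the tensors live at that step. A minimal sketch (pure Python; the tensor sizes and lifetimes are hypothetical inputs, not taken from any real model):

```python
def peak_live_memory(tensors):
    """tensors: list of (size_bytes, first_step, last_step) triples, where a
    tensor is live from the step of the layer that writes it through the step
    of the last layer that reads it. Returns the maximum total live size,
    i.e. a lower bound on the memory-pool size with perfect reuse."""
    last_step = max(last for _, _, last in tensors)
    peak = 0
    for step in range(last_step + 1):
        live = sum(size for size, first, last in tensors if first <= step <= last)
        peak = max(peak, live)
    return peak

# Three tensors with overlapping lifetimes:
print(peak_live_memory([(4, 0, 1), (2, 1, 2), (3, 2, 3)]))  # 6 (step 1: 4 + 2)
```

Any allocator that reuses freed regions can do no better than this bound; actual offset assignment on complex graphs is what makes the problem hard.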

How to estimate the Peak Memory?

For standard models (with a regular chain-like graph topology) it is easy: Ram(layer) ~= InputAct + OutAct. However, for more complex graph topologies the situation changes: complex layer connectivity and tight tensor dependencies make the memory allocation problem much harder to solve efficiently in practice.
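For the chain-like case, the Ram(layer) ~= InputAct + OutAct rule can be sketched as follows (a simplified sketch assuming fp32 activations, no im2col scratch buffer, and weights kept in FLASH):

```python
def peak_memory_bytes(activation_sizes, bytes_per_elem=4):
    """activation_sizes[i] = number of elements in the i-th tensor of a chain
    model, where tensor i is the input of layer i and tensor i+1 its output.
    Peak SRAM is the largest input+output pair across all layers."""
    peak = 0
    for i in range(len(activation_sizes) - 1):
        layer_ram = (activation_sizes[i] + activation_sizes[i + 1]) * bytes_per_elem
        peak = max(peak, layer_ram)
    return peak

# A 3-layer chain whose tensors have 1000, 500, 250, 100 elements:
print(peak_memory_bytes([1000, 500, 250, 100]))  # 6000 bytes, peak at layer 0
```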



How to reduce Peak Memory?



  • Topology scaling: affects both FLASH and SRAM. Includes width multipliers, channel pruning, and tensor decomposition.
  • Quantization: affects both FLASH and SRAM, but on MCUs it is constrained to 8-bit.
  • Sparsity: affects FLASH (SRAM only if combined with proper kernels).
  • Resolution scaling: SRAM.
  • Operator transformation: SRAM.
  • Graph transformation: SRAM (hand-crafted graph rewriting that reduces peak memory consumption by leveraging the algebraic properties of the operators).

Refs:

TypeError: 'dict' object is not callable

profiler.compute_network_status(batch_size=1, device=Device.CPU, short_print=False,

File "deeplite/profiler/profiler.py", line 200, in deeplite.profiler.profiler.Profiler.compute_network_status
File "deeplite/profiler/profiler.py", line 158, in deeplite.profiler.profiler.Profiler.compute_status
File "deeplite/profiler/profiler.py", line 362, in deeplite.profiler.profiler.ProfilerFunction.pipe_kwargs_to_call
File "deeplite/torch_profiler/torch_profiler.py", line 52, in deeplite.torch_profiler.torch_profiler.ComputeComplexity.call
File "deeplite/torch_profiler/torch_profiler.py", line 60, in deeplite.torch_profiler.torch_profiler.ComputeComplexity._compute_complexity
File "/home/ubuntu/anaconda3/envs/cerberus/lib/python3.8/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/home/ubuntu/anaconda3/envs/cerberus/lib/python3.8/copy.py", line 270, in _reconstruct
state = deepcopy(state, memo)
File "/home/ubuntu/anaconda3/envs/cerberus/lib/python3.8/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/home/ubuntu/anaconda3/envs/cerberus/lib/python3.8/copy.py", line 230, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/ubuntu/anaconda3/envs/cerberus/lib/python3.8/copy.py", line 153, in deepcopy
y = copier(memo)
TypeError: 'dict' object is not callable
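The bottom frame (copy.py line 153, `y = copier(memo)`) is where CPython's deepcopy finds a `__deepcopy__` attribute on an object and calls it; the error means that attribute is a plain dict rather than a method. A minimal reproduction of the mechanism (the `BrokenModule` class is hypothetical, not the actual object inside the profiled model):

```python
import copy

class BrokenModule:
    def __init__(self):
        # A plain-dict instance attribute named __deepcopy__ shadows the copy
        # protocol hook; copy.deepcopy finds it via getattr() and calls it.
        self.__deepcopy__ = {"not": "callable"}

try:
    copy.deepcopy(BrokenModule())
except TypeError as err:
    print(err)  # 'dict' object is not callable
```

If something in the profiled model carries such an attribute, deep-copying the model inside `_compute_complexity` would fail exactly like the traceback above.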

Profile models against a specific device

  • Try OpenVino Benchmark

  • Limitations: supports only .xml and .onnx formats:

  -m PATH_TO_MODEL, --path_to_model PATH_TO_MODEL
                        Required. Path to an .xml/.onnx/.prototxt file with a
                        trained model or to a .blob file with a trained
                        compiled model.
  • Check for any other open-source tools available?

Release v1.1.13

  • PR opened
  • Unit and Functional tests pass
  • PR reviewed and merged
  • version bumped in setup.py (and pushed to master)
  • python release.py stage
  • Integration tests pass (overnight)
  • Acceptance tests pass (overnight)
  • Update docs (mention changes in the comments of the release issue)
  • python release.py release

Create more examples

  • for object detection (OD) models
  • for segmentation models
  • show examples from torchvision.hub
  • show examples from tf.keras.applications

tensorflow-gpu tensor copy failure

For both Python 3.6 and Python 3.7, running examples/tf_example.py produces the following error:

Traceback (most recent call last):
  File "examples/tf_example.py", line 60, in <module>
    include_weights=True, print_mode='info')
  File "/home/ahmed/projects/deeplite-profiler/deeplite/profiler/profiler.py", line 204, in compute_network_status
    self.compute_status(sk, recompute=recompute, **kwargs)
  File "/home/ahmed/projects/deeplite-profiler/deeplite/profiler/profiler.py", line 162, in compute_status
    rval = pf_func.pipe_kwargs_to_call(self.model, self.data_splits, kwargs)
  File "/home/ahmed/projects/deeplite-profiler/deeplite/profiler/profiler.py", line 382, in pipe_kwargs_to_call
    return self(**bounded_args.arguments)
  File "/home/ahmed/projects/deeplite-profiler/deeplite/tf_profiler/tf_profiler.py", line 192, in __call__
    return self._compute_exectime(temp_model, dataloader, batch_size=batch_size)
  File "/home/ahmed/projects/deeplite-profiler/deeplite/tf_profiler/tf_profiler.py", line 203, in _compute_exectime
    _ = model(tnum, training=False)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 547, in __call__
    inputs = nest.map_structure(_convert_non_tensor, inputs)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 515, in map_structure
    structure[0], [func(*x) for x in entries],
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 515, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 544, in _convert_non_tensor
    return ops.convert_to_tensor(x)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1087, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1145, in convert_to_tensor_v2
    as_ref=False)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1224, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 305, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 246, in constant
    allow_broadcast=True)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 254, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/ahmed/projects/deeplite-profiler/_env3.6/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 115, in convert_to_eager_tensor
    return ops.EagerTensor(value, handle, device, dtype)
RuntimeError: Error copying tensor to device: /job:localhost/replica:0/task:0/device:GPU:0. /job:localhost/replica:0/task:0/device:GPU:0 unknown device.
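A common workaround (an assumption, not verified against this repo) when TensorFlow references a GPU device it cannot actually use is to hide the CUDA devices so execution falls back to CPU:

```shell
# Hide CUDA devices so TensorFlow falls back to CPU. This sidesteps the
# "unknown device" copy failure but does not fix the underlying
# driver/CUDA/tensorflow-gpu version mismatch that usually causes it.
CUDA_VISIBLE_DEVICES=-1 python examples/tf_example.py
```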

Option to save profiler output as txt file

  • PR opened
  • Unit and Functional tests pass
  • PR reviewed and merged
  • version bumped in setup.py (and pushed to master)
  • python release.py stage
  • Integration tests pass (overnight)
  • Acceptance tests pass (overnight)
  • Update docs (mention changes in the comments of the release issue)
  • python release.py release
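Until the profiler grows a native flag for this, one possible sketch (the helper name `save_status_to_txt` is hypothetical, not part of the deeplite-profiler API) is to capture whatever `compute_network_status` prints to stdout and write it to a file:

```python
import contextlib
import io

def save_status_to_txt(print_status, path):
    """Run `print_status` (any zero-argument callable that prints, e.g. a
    lambda wrapping profiler.compute_network_status) and save everything it
    writes to stdout into a text file at `path`."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_status()
    with open(path, "w") as f:
        f.write(buffer.getvalue())
```

A native option would be cleaner (it could reuse the existing print formatting directly), but this works without touching the profiler's internals.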

Silent secondary eval metric after registry

Once the evaluation function is registered to a profiler instance, any secondary metric added to it afterwards is silently ignored. This happens because the binding of the profiler to a pfunction is fixed at registration time; the profiler then fails to catch up with the dynamic status key.

I would either refactor the profiler to handle dynamic status keys across the board, or block adding a secondary metric once the function is registered.

Also, I don't really like the global REGISTER at the class level of DynamicEvalMetric... if two profilers coexist with their respective sets of eval metrics, it looks like a recipe for errors.

Create a separate doc for profiler

  • Create a separate doc
  • Create the configuration file
  • Automate travis update of docs into s3
  • Add a good design for SDK documentation
  • Add more docstrings for SDK documentation
