torchprof's Introduction

torchprof

Attention! This library is deprecated due to the PyTorch 1.9 changes to the torch profiler. Please use the official profiler. Thank you!
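For readers moving to the official profiler, a per-layer breakdown can be approximated by labelling each layer with record_function. The following is a minimal sketch, assuming a PyTorch version that ships torch.profiler and a CPU-only run; the small nn.Sequential model and the layer_<name> labels are illustrative choices and are not part of torchprof.

```python
import torch
from torch import nn
from torch.profiler import ProfilerActivity, profile, record_function

# Illustrative toy model (an assumption, not taken from the README).
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.rand(2, 8)

with torch.no_grad():
    with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
        out = x
        # Run the layers one by one so each gets its own labelled range.
        for name, layer in model.named_children():
            with record_function(f"layer_{name}"):
                out = layer(out)

# Aggregate events by name and render them as a text table.
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(table)
```

The labelled ranges show up as rows next to the aten:: operators, which gives a rough equivalent of the per-module rows torchprof prints.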

A minimal dependency library for layer-by-layer profiling of PyTorch models.

All metrics are derived using the PyTorch autograd profiler.

Quickstart

pip install torchprof

import torch
import torchvision
import torchprof

model = torchvision.models.alexnet(pretrained=False).cuda()
x = torch.rand([1, 3, 224, 224]).cuda()

# `profile_memory` was added in PyTorch 1.6; a runtime warning is emitted if it is unsupported.
with torchprof.Profile(model, use_cuda=True, profile_memory=True) as prof:
    model(x)

# equivalent to `print(prof)` and `print(prof.display())`
print(prof.display(show_events=False))
Module         | Self CPU total | CPU total | Self CUDA total | CUDA total | Self CPU Mem | CPU Mem | Self CUDA Mem | CUDA Mem  | Number of Calls
---------------|----------------|-----------|-----------------|------------|--------------|---------|---------------|-----------|----------------
AlexNet        |                |           |                 |            |              |         |               |           |
├── features   |                |           |                 |            |              |         |               |           |
│├── 0         | 1.832ms        | 7.264ms   | 1.831ms         | 7.235ms    | 0 b          | 0 b     | 756.50 Kb     | 3.71 Mb   | 1
│├── 1         | 51.858us       | 76.564us  | 51.296us        | 76.896us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│├── 2         | 75.993us       | 157.855us | 77.600us        | 145.184us  | 0 b          | 0 b     | 547.00 Kb     | 1.60 Mb   | 1
│├── 3         | 263.526us      | 1.142ms   | 489.472us       | 1.918ms    | 0 b          | 0 b     | 547.00 Kb     | 2.68 Mb   | 1
│├── 4         | 28.824us       | 41.197us  | 28.672us        | 43.008us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│├── 5         | 55.264us       | 120.016us | 55.200us        | 106.400us  | 0 b          | 0 b     | 380.50 Kb     | 1.11 Mb   | 1
│├── 6         | 175.591us      | 681.011us | 212.896us       | 818.080us  | 0 b          | 0 b     | 253.50 Kb     | 8.27 Mb   | 1
│├── 7         | 27.622us       | 39.494us  | 26.848us        | 39.296us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│├── 8         | 140.204us      | 537.162us | 204.832us       | 781.280us  | 0 b          | 0 b     | 169.00 Kb     | 10.20 Mb  | 1
│├── 9         | 27.532us       | 39.364us  | 26.816us        | 39.136us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│├── 10        | 138.621us      | 530.929us | 171.008us       | 650.432us  | 0 b          | 0 b     | 169.00 Kb     | 7.08 Mb   | 1
│├── 11        | 27.712us       | 39.645us  | 27.648us        | 39.936us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│└── 12        | 54.813us       | 118.823us | 55.296us        | 107.360us  | 0 b          | 0 b     | 108.00 Kb     | 324.00 Kb | 1
├── avgpool    | 58.329us       | 116.577us | 58.368us        | 111.584us  | 0 b          | 0 b     | 36.00 Kb      | 108.00 Kb | 1
└── classifier |                |           |                 |            |              |         |               |           |
 ├── 0         | 79.169us       | 167.495us | 78.848us        | 145.408us  | 0 b          | 0 b     | 45.00 Kb      | 171.00 Kb | 1
 ├── 1         | 404.070us      | 423.755us | 793.600us       | 793.600us  | 0 b          | 0 b     | 16.00 Kb      | 32.00 Kb  | 1
 ├── 2         | 30.097us       | 43.512us  | 29.792us        | 43.904us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
 ├── 3         | 53.390us       | 121.042us | 53.248us        | 99.328us   | 0 b          | 0 b     | 20.00 Kb      | 76.00 Kb  | 1
 ├── 4         | 64.622us       | 79.902us  | 236.544us       | 236.544us  | 0 b          | 0 b     | 16.00 Kb      | 32.00 Kb  | 1
 ├── 5         | 28.854us       | 41.067us  | 28.544us        | 41.856us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
 └── 6         | 62.258us       | 77.356us  | 95.232us        | 95.232us   | 0 b          | 0 b     | 4.00 Kb       | 8.00 Kb   | 1

To see the low level operations that occur within each layer, print the contents of prof.display(show_events=True).

Module                              | Self CPU total | CPU total | Self CUDA total | CUDA total | Self CPU Mem | CPU Mem | Self CUDA Mem | CUDA Mem  | Number of Calls
------------------------------------|----------------|-----------|-----------------|------------|--------------|---------|---------------|-----------|----------------
AlexNet                             |                |           |                 |            |              |         |               |           |
├── features                        |                |           |                 |            |              |         |               |           |
│├── 0                              |                |           |                 |            |              |         |               |           |
││├── aten::conv2d                  | 15.630us       | 1.832ms   | 14.176us        | 1.831ms    | 0 b          | 0 b     | 0 b           | 756.50 Kb | 1
││├── aten::convolution             | 9.768us        | 1.816ms   | 9.056us         | 1.817ms    | 0 b          | 0 b     | 0 b           | 756.50 Kb | 1
││├── aten::_convolution            | 45.005us       | 1.807ms   | 34.432us        | 1.808ms    | 0 b          | 0 b     | 0 b           | 756.50 Kb | 1
││├── aten::contiguous              | 8.738us        | 8.738us   | 8.480us         | 8.480us    | 0 b          | 0 b     | 0 b           | 0 b       | 3
││├── aten::cudnn_convolution       | 1.647ms        | 1.683ms   | 1.745ms         | 1.750ms    | 0 b          | 0 b     | -18.00 Kb     | 756.50 Kb | 1
││├── aten::empty                   | 21.249us       | 21.249us  | 0.000us         | 0.000us    | 0 b          | 0 b     | 774.50 Kb     | 774.50 Kb | 2
││├── aten::resize_                 | 7.635us        | 7.635us   | 0.000us         | 0.000us    | 0 b          | 0 b     | 0 b           | 0 b       | 2
││├── aten::stride                  | 1.902us        | 1.902us   | 0.000us         | 0.000us    | 0 b          | 0 b     | 0 b           | 0 b       | 4
││├── aten::reshape                 | 6.081us        | 17.833us  | 2.048us         | 2.048us    | 0 b          | 0 b     | 0 b           | 0 b       | 1
││├── aten::view                    | 11.752us       | 11.752us  | 0.000us         | 0.000us    | 0 b          | 0 b     | 0 b           | 0 b       | 1
││└── aten::add_                    | 57.248us       | 57.248us  | 18.432us        | 18.432us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│├── 1                              |                |           |                 |            |              |         |               |           |
││├── aten::relu_                   | 27.152us       | 51.858us  | 25.696us        | 51.296us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
││└── aten::threshold_              | 24.706us       | 24.706us  | 25.600us        | 25.600us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│├── 2                              |                |           |                 |            |              |         |               |           |
...

The original PyTorch EventList objects can be returned by calling raw() on the profile instance.

trace, event_lists_dict = prof.raw()
print(trace[2])
# Trace(path=('AlexNet', 'features', '0'), leaf=True, module=Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2)))

print(event_lists_dict[trace[2].path][0])
---------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                       Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg       CPU Mem  Self CPU Mem      CUDA Mem  Self CUDA Mem    # of Calls
---------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
               aten::conv2d         0.85%      15.630us       100.00%       1.832ms       1.832ms      14.176us         0.77%       1.831ms       1.831ms           0 b           0 b     756.50 Kb           0 b             1
          aten::convolution         0.53%       9.768us        99.15%       1.816ms       1.816ms       9.056us         0.49%       1.817ms       1.817ms           0 b           0 b     756.50 Kb           0 b             1
         aten::_convolution         2.46%      45.005us        98.61%       1.807ms       1.807ms      34.432us         1.88%       1.808ms       1.808ms           0 b           0 b     756.50 Kb           0 b             1
           aten::contiguous         0.20%       3.707us         0.20%       3.707us       3.707us       3.680us         0.20%       3.680us       3.680us           0 b           0 b           0 b           0 b             1
    aten::cudnn_convolution        89.90%       1.647ms        91.86%       1.683ms       1.683ms       1.745ms        95.27%       1.750ms       1.750ms           0 b           0 b     756.50 Kb     -18.00 Kb             1
                aten::empty         0.66%      12.102us         0.66%      12.102us      12.102us       0.000us         0.00%       0.000us       0.000us           0 b           0 b     756.50 Kb     756.50 Kb             1
           aten::contiguous         0.15%       2.706us         0.15%       2.706us       2.706us       2.560us         0.14%       2.560us       2.560us           0 b           0 b           0 b           0 b             1
              aten::resize_         0.39%       7.164us         0.39%       7.164us       7.164us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1
           aten::contiguous         0.13%       2.325us         0.13%       2.325us       2.325us       2.240us         0.12%       2.240us       2.240us           0 b           0 b           0 b           0 b             1
              aten::resize_         0.03%       0.471us         0.03%       0.471us       0.471us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1
               aten::stride         0.06%       1.092us         0.06%       1.092us       1.092us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1
               aten::stride         0.02%       0.280us         0.02%       0.280us       0.280us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1
               aten::stride         0.01%       0.270us         0.01%       0.270us       0.270us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1
               aten::stride         0.01%       0.260us         0.01%       0.260us       0.260us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1
                aten::empty         0.50%       9.147us         0.50%       9.147us       9.147us       0.000us         0.00%       0.000us       0.000us           0 b           0 b      18.00 Kb      18.00 Kb             1
              aten::reshape         0.33%       6.081us         0.97%      17.833us      17.833us       2.048us         0.11%       2.048us       2.048us           0 b           0 b           0 b           0 b             1
                 aten::view         0.64%      11.752us         0.64%      11.752us      11.752us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1
                 aten::add_         3.12%      57.248us         3.12%      57.248us      57.248us      18.432us         1.01%      18.432us      18.432us           0 b           0 b           0 b           0 b             1
---------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 1.832ms
CUDA time total: 1.831ms

Individual layers can be selected for profiling with the optional paths kwarg. All other layers are left unprofiled.

model = torchvision.models.alexnet(pretrained=False)
x = torch.rand([1, 3, 224, 224])

# Layer does not have to be a leaf layer
paths = [("AlexNet", "features", "3"), ("AlexNet", "classifier")]

with torchprof.Profile(model, paths=paths) as prof:
    model(x)

print(prof)
Module         | Self CPU total | CPU total | Number of Calls
---------------|----------------|-----------|----------------
AlexNet        |                |           |
├── features   |                |           |
│├── 0         |                |           |
│├── 1         |                |           |
│├── 2         |                |           |
│├── 3         | 3.162ms        | 12.626ms  | 1
│├── 4         |                |           |
│├── 5         |                |           |
│├── 6         |                |           |
│├── 7         |                |           |
│├── 8         |                |           |
│├── 9         |                |           |
│├── 10        |                |           |
│├── 11        |                |           |
│└── 12        |                |           |
├── avgpool    |                |           |
└── classifier | 11.398ms       | 12.130ms  | 1
 ├── 0         |                |           |
 ├── 1         |                |           |
 ├── 2         |                |           |
 ├── 3         |                |           |
 ├── 4         |                |           |
 ├── 5         |                |           |
 └── 6         |                |           |
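Each path is a tuple that starts with the model's class name and then follows the module hierarchy. As a sketch, assuming dotted module names of the kind yielded by model.named_modules() (for example "features.3"), a hypothetical helper such as dotted_to_path below could build these tuples. The helper name is an illustration and is not part of torchprof.

```python
def dotted_to_path(root_name, dotted_name):
    """Convert a dotted module name into a torchprof-style path tuple.

    Hypothetical helper: root_name is the model's class name, and
    dotted_name is a name such as "features.3". An empty dotted_name
    refers to the root module itself.
    """
    if not dotted_name:
        return (root_name,)
    return (root_name, *dotted_name.split("."))


# Rebuild the paths used in the example above.
paths = [dotted_to_path("AlexNet", name) for name in ("features.3", "classifier")]
print(paths)
```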

Citation

If this software is useful to your research, I would greatly appreciate a citation in your work.

@misc{awwong1-torchprof,
  title        = {torchprof},
  author       = {Alexander William Wong},
  month        = 12,
  year         = 2020,
  url          = {https://github.com/awwong1/torchprof},
  note         = {https://github.com/awwong1/torchprof}
}

LICENSE

MIT

torchprof's People

Contributors

ant0nsc, awwong1, corentinj, yadamit

torchprof's Issues

AttributeError: 'EventList' object has no attribute 'populate_cpu_children'

import torch
import torchvision
import torchprof

model = torchvision.models.alexnet(pretrained=False).cuda()

x = torch.rand([1, 3, 224, 224]).cuda()

with torchprof.Profile(model, use_cuda=True) as prof:
... model(x)
...
Traceback (most recent call last):
File "", line 2, in
File "/data1/ /annaconda3/envs/condamask/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/data1/ /annaconda3/envs/condamask/lib/python3.6/site-packages/torchvision/models/alexnet.py", line 43, in forward
x = self.features(x)
File "/data1/ /annaconda3/envs/condamask/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/data1/ /annaconda3/envs/condamask/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/data1/ /annaconda3/envs/condamask/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/data1/ /annaconda3/envs/condamask/lib/python3.6/site-packages/torchprof/profile.py", line 72, in wrap_forward
event_list.populate_cpu_children()
AttributeError: 'EventList' object has no attribute 'populate_cpu_children'

Should children cpu/cuda total time be counted?

Hello,

These lines will sum the cpu total times and cuda times for all events in the current path:

sum([e.cpu_time_total for e in events]),
sum([e.cuda_time_total for e in events]),

In AlexNet, for the path ('AlexNet', 'classifier', '0'), events is

[<FunctionEvent id=0 cpu_time=39.981us cpu_start=22.671 cpu_end=62.652 cpu_children=[1] cuda_time=39.936us name=dropout thread=0 input_shapes=[]>,                                                     
 <FunctionEvent id=1 cpu_time=29.053us cpu_start=27.992 cpu_end=57.045 cpu_children=[] cuda_time=29.696us name=_fused_dropout thread=0 input_shapes=[]>]

where the function event with id=1 is the child of the function event with id=0. I may have misunderstood, but if the event with id=1 is a child of the event with id=0, isn't its total CPU/CUDA time already included in the total CPU/CUDA time of the event with id=0? If so, we should not sum over all events, but only over events that have no parent.
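The double counting described in this issue can be illustrated without PyTorch. The sketch below uses a hypothetical Event dataclass as a stand-in for the profiler's FunctionEvent, stores children as ids (mirroring how they are printed above, which is an assumption about the representation), and reuses the two CPU times quoted in the issue.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for a profiler FunctionEvent."""
    id: int
    name: str
    cpu_time_total: float  # microseconds
    cpu_children: list = field(default_factory=list)  # ids of direct children


events = [
    Event(0, "dropout", 39.981, cpu_children=[1]),
    Event(1, "_fused_dropout", 29.053),
]

# Every id that appears as someone's child already contributes to its parent.
child_ids = {cid for e in events for cid in e.cpu_children}

naive_total = sum(e.cpu_time_total for e in events)
top_level_total = sum(e.cpu_time_total for e in events if e.id not in child_ids)

print(f"sum over all events:       {naive_total:.3f}us")
print(f"sum over top-level events: {top_level_total:.3f}us")
```

Under these assumptions the top-level sum equals the parent's own total, while the naive sum counts the child's time twice.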

`show_events` dropping non-consecutive duplicates

TODO: Update display code to not drop keys.

from itertools import groupby

L = [("a", 43), ("a", 3), ("b", 32), ("b", 99), ("c", 1)]
for k, v in groupby(L, lambda a: a[0]):
    print(k, list(v))

# a [('a', 43), ('a', 3)]
# b [('b', 32), ('b', 99)]
# c [('c', 1)]

# out of order keys, (a, b)
L = [("a", 43), ("b", 32), ("c", 1), ("a", 3), ("b", 99)]
for k, v in groupby(L, lambda a: a[0]):
    print(k, list(v))

# a [('a', 43)]
# b [('b', 32)]
# c [('c', 1)]
# a [('a', 3)]
# b [('b', 99)]

Originally posted by @awwong1 in #15 (comment)
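One way to avoid dropping or splitting non-consecutive keys is to group with a dictionary instead of itertools.groupby. This is a sketch of that alternative, not the fix that was applied in torchprof; it relies on dict preserving insertion order (Python 3.7+).

```python
from collections import defaultdict

# Same out-of-order input as in the example above.
L = [("a", 43), ("b", 32), ("c", 1), ("a", 3), ("b", 99)]

grouped = defaultdict(list)
for key, value in L:
    # Append to the existing group even when the key is not adjacent.
    grouped[key].append((key, value))

for k, v in grouped.items():
    print(k, v)

# a [('a', 43), ('a', 3)]
# b [('b', 32), ('b', 99)]
# c [('c', 1)]
```

Sorting the list before calling groupby would also merge the groups, but it changes the order in which keys first appear.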

Big difference between using TorchProf versus raw autograd profiler

Hi,
I ran both TorchProf and the autograd profiler on CPU only to compare them. For TorchProf I manually summed the CPU total column. For the autograd profiler I exported a Chrome trace, opened it in Chrome, selected the whole range, and read the 'Wall Duration' from the summary.
With standard mobilenetv2 the numbers are fairly close: 76ms versus 75.5ms.
With SuperGlue (https://github.com/magicleap/SuperGluePretrainedNetwork), the numbers differ a lot: 405ms (TorchProf) versus 527ms (autograd profiler). I am wondering what the reason could be.

torchprof breaks on PyTorch 1.8.0 because a function name changed

In PyTorch 1.8.0, a method name in the EventList class changed: populate_cpu_children has been made private, now _populate_cpu_children. This change was made in #46470.
This breaks torchprof with PyTorch 1.8.0.

Happy to create a PR for that, but I would love to hear from @awwong1 whether you are happy with having torchprof tied to version 1.8.0 of PyTorch. Alternatively, I could add a check for which of the two methods is present.
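The check suggested here could look roughly like the sketch below. The two EventList classes are hypothetical stand-ins so the example runs without PyTorch, and populate_children is an illustrative helper name; only the two method names come from the issue.

```python
class OldEventList:
    """Stand-in for an EventList that exposes the public method name."""

    def populate_cpu_children(self):
        return "public"


class NewEventList:
    """Stand-in for an EventList that exposes the private method name."""

    def _populate_cpu_children(self):
        return "private"


def populate_children(event_list):
    """Call whichever of the two method names the object provides."""
    for name in ("populate_cpu_children", "_populate_cpu_children"):
        method = getattr(event_list, name, None)
        if callable(method):
            return method()
    raise AttributeError("EventList has no known populate-children method")


print(populate_children(OldEventList()))
print(populate_children(NewEventList()))
```

Checking for the attribute rather than comparing version strings keeps the code working on either side of the rename.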

Difference in total time of display=True and display=False

Thanks for this wonderful tool!
However for the same model, I see some difference in the times when using prof.display(show_events=False) and prof.display(show_events=True).
When using prof.display(show_events=False), this is a part of the output :

├── proposal_generator  |                |           |           
│├── anchor_generator   |                |           |           
││└── cell_anchors      |        0.000us |   0.000us |    0.000us
│├── rpn_head           |                |           |           
││├── conv              |      569.006us |   2.113ms |  167.045ms
││├── objectness_logits |      389.972us |   1.400ms |    6.801ms
││└── anchor_deltas     |      342.629us |   1.210ms |    7.177ms

The above shows that the conv layer under rpn_head takes 167.045 ms.
However when I use prof.display(show_events=True), the same part of the output now becomes:

│├── rpn_head                    |                |           |           
││├── conv                       |                |           |           
│││├── conv2d                    |        4.283us | 102.069us |  433.152us
│││├── convolution               |        3.701us |  97.786us |  430.080us
│││├── _convolution              |        9.287us |  94.085us |  424.960us
│││├── contiguous                |        1.963us |   1.963us |    2.048us
│││└── cudnn_convolution         |       82.835us |  82.835us |  414.720us
││├── objectness_logits          |                |           |           
│││├── conv2d                    |        3.989us |  74.004us |   87.040us
│││├── convolution               |        3.754us |  70.015us |   82.944us
│││├── _convolution              |       13.054us |  66.261us |   78.848us
│││├── contiguous                |        2.232us |   2.232us |    0.512us
│││└── cudnn_convolution         |       50.975us |  50.975us |   65.536us
││├── anchor_deltas              |                |           |           
│││├── conv2d                    |        4.255us |  69.547us |   80.896us
│││├── convolution               |        3.742us |  65.292us |   75.776us
│││├── _convolution              |        9.442us |  61.550us |   71.680us
│││├── contiguous                |        2.119us |   2.119us |    2.048us
│││└── cudnn_convolution         |       49.989us |  49.989us |   60.416us

Here, however, summing the values under the conv layer of rpn_head gives a time of only about 1.7ms.
Why is there such a huge difference between these two values? Shouldn't they be similar?

Difference between the CPU time and the GPU time?

Thanks for your helpful tool! I still have a question about the difference between the CPU time and the GPU time. I deploy CNN models on a GPU platform. If I want to measure the latency of each layer, can I directly use the CUDA total time as the result? Does the CPU total time include the CUDA total time, as explained in the linked "Self CPU Time vs CPU Time" discussion? Thank you!

Why does the error "RuntimeError: Profiler is already enabled on this thread" appear?

File "main_check.py", line 65, in
model(x)
File "/home/ubuntu/anaconda3/envs/pyg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/ubuntu/4d0449a9-3698-450e-81ea-a5ae8bf71538/SGN/twod_model_va.py", line 171, in forward
affine = self.affine_trans(x.transpose(1, 2)).transpose(1, 2)
File "/home/ubuntu/anaconda3/envs/pyg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/ubuntu/4d0449a9-3698-450e-81ea-a5ae8bf71538/SGN/tcn.py", line 63, in forward
return self.network(x)
File "/home/ubuntu/anaconda3/envs/pyg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/pyg/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/ubuntu/anaconda3/envs/pyg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/media/ubuntu/4d0449a9-3698-450e-81ea-a5ae8bf71538/SGN/tcn.py", line 43, in forward
out = self.net(x)
File "/home/ubuntu/anaconda3/envs/pyg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/pyg/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/ubuntu/anaconda3/envs/pyg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/pyg/lib/python3.8/site-packages/torchprof/profile.py", line 74, in wrap_forward
res = _forward(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/pyg/lib/python3.8/site-packages/torchprof/profile.py", line 73, in wrap_forward
with tprofiler.profile(use_cuda=self.use_cuda) as prof:
File "/home/ubuntu/anaconda3/envs/pyg/lib/python3.8/site-packages/torch/autograd/profiler.py", line 320, in enter
torch.autograd._enable_profiler(config)
RuntimeError: Profiler is already enabled on this thread
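In the last frames of this traceback, wrap_forward calls _forward, which is itself another wrap_forward that tries to open a second profiler. One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that the same forward was wrapped twice (for instance when one module instance is reachable under two names). The sketch below reproduces that mechanism with a hypothetical non-reentrant fake_profiler and a wrap helper; neither is torchprof or PyTorch code.

```python
import threading
from contextlib import contextmanager

_state = threading.local()


@contextmanager
def fake_profiler():
    """Stand-in for a profiler that refuses to nest on one thread."""
    if getattr(_state, "enabled", False):
        raise RuntimeError("Profiler is already enabled on this thread")
    _state.enabled = True
    try:
        yield
    finally:
        _state.enabled = False


def wrap(fn):
    """Run fn inside the fake profiler, like a wrapped forward would."""
    def wrapped(*args, **kwargs):
        with fake_profiler():
            return fn(*args, **kwargs)
    return wrapped


def layer(x):
    return x + 1


once = wrap(layer)         # wrapped a single time: works
twice = wrap(wrap(layer))  # wrapped twice: the inner profiler nests

print(once(1))
try:
    twice(1)
except RuntimeError as err:
    print("nested wrap failed:", err)
```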

Difference among `conv2d`, `convolution`, and `_convolution`

Thanks for sharing such a useful tool to profile pytorch model.

Module                        | Self CPU total | CPU total | CUDA total
------------------------------|----------------|-----------|-----------
AlexNet                       |                |           |
├── features                  |                |           |
│├── 0                        |                |           |
││├── conv2d                  |       15.740us |   1.956ms |    1.972ms
││├── convolution             |       12.000us |   1.940ms |    1.957ms
││├── _convolution            |       36.590us |   1.928ms |    1.946ms
││├── contiguous              |        6.600us |   6.600us |    6.464us
││└── cudnn_convolution       |        1.885ms |   1.885ms |    1.906ms
│├── 1                        |                |           |
││└── relu_                   |       68.880us |  68.880us |   69.632us
│├── 2                        |                |           |
││├── max_pool2d              |       15.330us |  85.639us |   84.992us
││└── max_pool2d_with_indices |       70.309us |  70.309us |   70.656us
│├── 3                        |                |           |

conv2d, convolution, and _convolution are at the same level in your example of AlexNet. What is the difference among conv2d, convolution, and _convolution?
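Although the display lists these rows at the same level, the numbers are consistent with a nested call chain in which conv2d calls convolution, which calls _convolution, which in turn calls contiguous and cudnn_convolution. If that reading is right, each op's self time should be roughly its total time minus the total time of the ops it calls. The sketch below checks this with the CPU values copied from the table above (converted to microseconds); the nesting itself is the assumption being tested.

```python
# (name, self CPU in us, CPU total in us), copied from the table above.
chain = [
    ("conv2d", 15.740, 1956.0),
    ("convolution", 12.000, 1940.0),
    ("_convolution", 36.590, 1928.0),
]
# Ops assumed to be called directly by _convolution (CPU total in us).
leaves = {"contiguous": 6.600, "cudnn_convolution": 1885.0}

implied_self = {}
# For conv2d and convolution, the only child is the next op in the chain.
for (name, _, total_us), (_, _, child_total_us) in zip(chain, chain[1:]):
    implied_self[name] = total_us - child_total_us
# For _convolution, the children are the two leaf ops.
implied_self["_convolution"] = chain[-1][2] - sum(leaves.values())

for name, self_us, _ in chain:
    print(f"{name}: reported self {self_us:.3f}us, implied {implied_self[name]:.3f}us")
```

The implied values agree with the reported self times to within the rounding of the millisecond figures, which is why summing the CPU total column across these rows counts the same work several times.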

Why does torchprof time differ from actual PyTorch training time?

Thanks for sharing such a useful tool for profiling PyTorch models. However, I ran into a problem when using it. I train ResNet50 with PyTorch on the ImageNet dataset (batch size = 64), and a single iteration costs 0.782ms. However, when I use torchprof to profile ResNet50, layer1 alone costs 1.539s. I am confused. The output looks like this:

Module      | Self CPU total | CPU total | CUDA total | Occurrences
------------|----------------|-----------|------------|------------
ResNet      |                |           |            |
├── conv1   |                |           |            |
├── bn1     |                |           |            |
├── relu    |                |           |            |
├── maxpool |                |           |            |
├── layer1  | 142.341ms      | 786.225ms | 1.539s     | 1

Adding a summary line?

Could you add a summary line at the end of the output? I am comparing a few models, so I don't need to know how long each op takes, but I do want to know how long the whole model takes. I can add the values up myself, but with many layers and ms/us conversions it is not quick.
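Until such a line exists, the displayed values can be summed by hand with a small parser. The sketch below assumes the us/ms/s suffixes seen in the tables above; parse_time and total_seconds are hypothetical helper names, not torchprof functions. Which rows to include is left to the reader, since parent rows may already contain their children's time.

```python
import re

# Suffixes as they appear in the displayed tables.
_UNIT_TO_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}
_TIME_RE = re.compile(r"\s*([0-9]*\.?[0-9]+)\s*(us|ms|s)\s*")


def parse_time(text):
    """Convert a string such as '1.832ms' or '51.858us' to seconds."""
    match = _TIME_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_TO_SECONDS[unit]


def total_seconds(values):
    """Sum displayed time strings, skipping empty cells."""
    return sum(parse_time(v) for v in values if v.strip())


# CPU total of the first two leaf rows in the quickstart table.
cpu_totals = ["7.264ms", "76.564us"]
print(f"{total_seconds(cpu_totals) * 1e3:.3f}ms")
```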

How to get summary time?

Another question: how do I get the total time of the whole model? Should I use the sum of (CPU total + CUDA total) * occurrences?
