This is a Caffe2 operator microbenchmark.
pip install -r requirements.txt
To compute the intensity per batch in advance and draw the Roofline plot, add a config file for your operator, modeled on the ones we provide.
Example:
cd Roofline
python3 Roofline.py --config-file Config/Config_Conv.json
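As a rough illustration of what "intensity per batch" means in the Roofline model, the sketch below estimates the FLOPs, bytes moved, and intensity (FLOPs per byte) of a convolution, then applies the Roofline bound. The function names, shape parameters, and hardware numbers are assumptions chosen for illustration; they are not taken from Roofline.py or Config_Conv.json, and the byte count assumes ideal traffic (each tensor read or written exactly once).

```python
def conv_intensity(batch, c_in, c_out, h_in, w_in, h_out, w_out, k_h, k_w,
                   bytes_per_elem=4):
    """Estimate FLOPs, bytes moved, and FLOPs/byte for a direct 2D convolution.

    Hypothetical helper: assumes every tensor is transferred exactly once.
    """
    # One multiply and one add per kernel tap, per output element.
    flops = 2 * batch * c_out * h_out * w_out * c_in * k_h * k_w

    # Ideal memory traffic: read input and weights once, write output once.
    input_elems = batch * c_in * h_in * w_in
    weight_elems = c_out * c_in * k_h * k_w
    output_elems = batch * c_out * h_out * w_out
    bytes_moved = bytes_per_elem * (input_elems + weight_elems + output_elems)

    return flops, bytes_moved, flops / bytes_moved


def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_flops, peak_bandwidth * intensity)


if __name__ == "__main__":
    # Assumed example shape: 3x3 convolution, 64 -> 64 channels, 56x56 maps.
    for batch in (1, 8, 64):
        flops, nbytes, intensity = conv_intensity(
            batch, c_in=64, c_out=64, h_in=56, w_in=56,
            h_out=54, w_out=54, k_h=3, k_w=3)
        # Assumed hardware: 10 TFLOP/s peak compute, 500 GB/s peak bandwidth.
        bound = attainable_flops(intensity, peak_flops=10e12, peak_bandwidth=500e9)
        print(f"batch={batch:3d}  intensity={intensity:8.2f} FLOP/B  "
              f"bound={bound / 1e12:6.2f} TFLOP/s")
```

Because the weights are read once regardless of batch size, the intensity grows with the batch in this model, which is why it is computed per batch.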
cd [OPERATOR]
To measure the execution time:
- Using CPU
python3 [FILENAME]
- Using GPU
python3 [FILENAME] --use-gpu
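For readers who want to see how such a measurement can be structured, here is a minimal sketch of a benchmark script that accepts the `--use-gpu` flag and reports an average execution time. It is not the code of the scripts in this repository: `parse_args`, `time_operator`, and the placeholder workload are hypothetical, and a real script would run the Caffe2 operator where the placeholder is.

```python
import argparse
import time


def parse_args(argv=None):
    """Parse the command line; --use-gpu mirrors the flag shown above."""
    parser = argparse.ArgumentParser(description="Operator microbenchmark (sketch)")
    parser.add_argument("--use-gpu", action="store_true",
                        help="run on the GPU instead of the CPU")
    return parser.parse_args(argv)


def time_operator(run_once, warmup=5, iters=20):
    """Return the average wall-clock seconds per call of run_once.

    Warm-up calls are excluded so one-time setup costs do not skew the result.
    """
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return elapsed / iters


if __name__ == "__main__":
    args = parse_args()
    device = "GPU" if args.use_gpu else "CPU"

    # Placeholder workload (assumption): stands in for running one operator.
    def run_once():
        sum(i * i for i in range(100_000))

    avg = time_operator(run_once)
    print(f"{device}: {avg * 1e3:.3f} ms per iteration")
```

GPU work is typically launched asynchronously, so on the GPU path the timed callable should wait for the device to finish before returning; otherwise only the launch overhead is measured.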
To measure the CPU-GPU data transfer time:
nsys profile --trace=cuda python3 [FILENAME] --use-gpu
nsys stats [.qdrep]  # [.qdrep] is the output file from nsys profile
- Convolution
- Fully Connected
- SparseLengthsSum