Comments (10)
from omega_h.
@joshia5 Do you see any differences from your build and run process on SCOREC? If so, please list them. In particular, which version of omegah were you running for the fast ask_revClass times on the 146k mesh reported in the emails?
OK. Would you please run the reverse_class_test as shown above with --osh-time to see what timings you get?
The output on dcs259 is below for the build/run with and without kokkos. The no-kokkos result is within 1% of what you saw on cranium (0.562172s vs 0.564636s). The kokkos run is 54% slower, which is roughly in line with the 43% difference we saw in the full version of the test (master@466e4702).
I think this pretty much confirms that something about the build of gitrm (and/or its use of omegah) is causing the problem.
The run with kokkos disabled:
faces ask_rc time: 0.560801 seconds
TOP-DOWN:
=========
ask_revClass 0.562172 8
| derive_revClass 0.56195 4
| | sort_by_high_index 0.560344 4
| | offset_scan 0.000220853 4
| | | device_free 5.2965e-05 4
| | | single host to device 3.9708e-05 4
| | | Write allocation 2.9754e-05 4
| | | | device_malloc 2.6465e-05 4
| | | device_malloc 2.5161e-05 4
| | device_free 0.000183636 12
| | Write allocation 0.000118426 16
| | | device_malloc 0.000105891 16
| offset_scan 5.4673e-05 1
| | device_free 1.2597e-05 1
| | single host to device 9.566e-06 1
| | Write allocation 7.107e-06 1
| | | device_malloc 5.767e-06 1
| | device_malloc 6.066e-06 1
| Write allocation 1.4481e-05 2
| | device_malloc 1.2431e-05 2
| device_free 8.971e-06 1
| ask_revClass 4.65e-07 1
binary::read(path, comm, mesh, strict) 0.0677998 2
| binary::read_in_comm(path, comm, mesh, version) 0.0663222 2
| | binary::read(istream, mesh, version) 0.0662304 2
| | | Write allocation 0.00269113 27
| | | | device_malloc 0.00266934 27
| | | array host to device 0.000849324 27
| | | set_ents 1.1176e-05 5
device_free 0.00213896 53
ask_adj 0.000607764 3
| derive_adj 0.000604695 1
| | transit 0.000600884 1
| | | Write allocation 6.834e-06 1
| | | | device_malloc 5.695e-06 1
| | ask_adj 9.81e-07 2
Write allocation 9.8263e-05 15
| device_malloc 8.7703e-05 15
array host to device 8.6837e-05 9
BOTTOM-UP:
==========
sort_by_high_index 0.560344 4
| derive_revClass 0.560344 4
| | ask_revClass 0.560344 4
binary::read(istream, mesh, version) 0.0626788 2
| binary::read_in_comm(path, comm, mesh, version) 0.0626788 2
| | binary::read(path, comm, mesh, strict) 0.0626788 2
device_malloc 0.00294452 71
| Write allocation 0.00291329 66
| | binary::read(istream, mesh, version) 0.00266934 27
| | | binary::read_in_comm(path, comm, mesh, version) 0.00266934 27
| | | | binary::read(path, comm, mesh, strict) 0.00266934 27
| | derive_revClass 0.000105891 16
| | | ask_revClass 0.000105891 16
| | offset_scan 3.2232e-05 5
| | | derive_revClass 2.6465e-05 4
| | | | ask_revClass 2.6465e-05 4
| | | ask_revClass 5.767e-06 1
| | ask_revClass 1.2431e-05 2
| | transit 5.695e-06 1
| | | derive_adj 5.695e-06 1
| | | | ask_adj 5.695e-06 1
| offset_scan 3.1227e-05 5
| | derive_revClass 2.5161e-05 4
| | | ask_revClass 2.5161e-05 4
| | ask_revClass 6.066e-06 1
device_free 0.00239713 71
| derive_revClass 0.000183636 12
| | ask_revClass 0.000183636 12
| offset_scan 6.5562e-05 5
| | derive_revClass 5.2965e-05 4
| | | ask_revClass 5.2965e-05 4
| | ask_revClass 1.2597e-05 1
| ask_revClass 8.971e-06 1
binary::read(path, comm, mesh, strict) 0.00147752 2
derive_revClass 0.00108324 4
| ask_revClass 0.00108324 4
array host to device 0.000936161 36
| binary::read(istream, mesh, version) 0.000849324 27
| | binary::read_in_comm(path, comm, mesh, version) 0.000849324 27
| | | binary::read(path, comm, mesh, strict) 0.000849324 27
transit 0.00059405 1
| derive_adj 0.00059405 1
| | ask_adj 0.00059405 1
ask_revClass 0.000143731 9
| ask_revClass 4.65e-07 1
offset_scan 9.2602e-05 5
| derive_revClass 7.3265e-05 4
| | ask_revClass 7.3265e-05 4
| ask_revClass 1.9337e-05 1
binary::read_in_comm(path, comm, mesh, version) 9.1794e-05 2
| binary::read(path, comm, mesh, strict) 9.1794e-05 2
Write allocation 5.2706e-05 66
| binary::read(istream, mesh, version) 2.1793e-05 27
| | binary::read_in_comm(path, comm, mesh, version) 2.1793e-05 27
| | | binary::read(path, comm, mesh, strict) 2.1793e-05 27
| derive_revClass 1.2535e-05 16
| | ask_revClass 1.2535e-05 16
| offset_scan 4.629e-06 5
| | derive_revClass 3.289e-06 4
| | | ask_revClass 3.289e-06 4
| | ask_revClass 1.34e-06 1
| ask_revClass 2.05e-06 2
| transit 1.139e-06 1
| | derive_adj 1.139e-06 1
| | | ask_adj 1.139e-06 1
single host to device 4.9274e-05 5
| offset_scan 4.9274e-05 5
| | derive_revClass 3.9708e-05 4
| | | ask_revClass 3.9708e-05 4
| | ask_revClass 9.566e-06 1
set_ents 1.1176e-05 5
| binary::read(istream, mesh, version) 1.1176e-05 5
| | binary::read_in_comm(path, comm, mesh, version) 1.1176e-05 5
| | | binary::read(path, comm, mesh, strict) 1.1176e-05 5
ask_adj 4.05e-06 5
| derive_adj 9.81e-07 2
| | ask_adj 9.81e-07 2
derive_adj 2.83e-06 1
| ask_adj 2.83e-06 1
The run with Kokkos enabled:
faces ask_rc time: 0.868226 seconds
TOP-DOWN:
=========
ask_revClass 0.870067 8
| derive_revClass 0.869768 4
| | sort_by_high_index 0.867046 4
| | Write allocation 0.000914632 16
| | offset_scan 0.000344374 4
| | | Write allocation 0.000146366 4
| | | device_free 5.5791e-05 4
| | | single host to device 3.6593e-05 4
| | | device_malloc 2.6933e-05 4
| offset_scan 8.3409e-05 1
| | Write allocation 3.5237e-05 1
| | device_free 1.2824e-05 1
| | single host to device 8.872e-06 1
| | device_malloc 6.476e-06 1
| Write allocation 6.872e-05 2
| ask_revClass 5.49e-07 1
binary::read(path, comm, mesh, strict) 0.0683914 2
| binary::read_in_comm(path, comm, mesh, version) 0.0668707 2
| | binary::read(istream, mesh, version) 0.066775 2
| | | Write allocation 0.00280287 27
| | | array host to device 0.00135131 27
| | | set_ents 1.5135e-05 5
ask_adj 0.000697382 3
| derive_adj 0.000694121 1
| | transit 0.000689666 1
| | | Write allocation 3.5156e-05 1
| | ask_adj 1.079e-06 2
Write allocation 0.000527981 15
array host to device 0.000198938 9
BOTTOM-UP:
==========
sort_by_high_index 0.867046 4
| derive_revClass 0.867046 4
| | ask_revClass 0.867046 4
binary::read(istream, mesh, version) 0.0626057 2
| binary::read_in_comm(path, comm, mesh, version) 0.0626057 2
| | binary::read(path, comm, mesh, strict) 0.0626057 2
Write allocation 0.00453096 66
| binary::read(istream, mesh, version) 0.00280287 27
| | binary::read_in_comm(path, comm, mesh, version) 0.00280287 27
| | | binary::read(path, comm, mesh, strict) 0.00280287 27
| derive_revClass 0.000914632 16
| | ask_revClass 0.000914632 16
| offset_scan 0.000181603 5
| | derive_revClass 0.000146366 4
| | | ask_revClass 0.000146366 4
| | ask_revClass 3.5237e-05 1
| ask_revClass 6.872e-05 2
| transit 3.5156e-05 1
| | derive_adj 3.5156e-05 1
| | | ask_adj 3.5156e-05 1
array host to device 0.00155025 36
| binary::read(istream, mesh, version) 0.00135131 27
| | binary::read_in_comm(path, comm, mesh, version) 0.00135131 27
| | | binary::read(path, comm, mesh, strict) 0.00135131 27
binary::read(path, comm, mesh, strict) 0.00152072 2
derive_revClass 0.00146284 4
| ask_revClass 0.00146284 4
transit 0.00065451 1
| derive_adj 0.00065451 1
| | ask_adj 0.00065451 1
ask_revClass 0.000146969 9
| ask_revClass 5.49e-07 1
offset_scan 9.8691e-05 5
| derive_revClass 7.8691e-05 4
| | ask_revClass 7.8691e-05 4
| ask_revClass 2e-05 1
binary::read_in_comm(path, comm, mesh, version) 9.5703e-05 2
| binary::read(path, comm, mesh, strict) 9.5703e-05 2
device_free 6.8615e-05 5
| offset_scan 6.8615e-05 5
| | derive_revClass 5.5791e-05 4
| | | ask_revClass 5.5791e-05 4
| | ask_revClass 1.2824e-05 1
single host to device 4.5465e-05 5
| offset_scan 4.5465e-05 5
| | derive_revClass 3.6593e-05 4
| | | ask_revClass 3.6593e-05 4
| | ask_revClass 8.872e-06 1
device_malloc 3.3409e-05 5
| offset_scan 3.3409e-05 5
| | derive_revClass 2.6933e-05 4
| | | ask_revClass 2.6933e-05 4
| | ask_revClass 6.476e-06 1
set_ents 1.5135e-05 5
| binary::read(istream, mesh, version) 1.5135e-05 5
| | binary::read_in_comm(path, comm, mesh, version) 1.5135e-05 5
| | | binary::read(path, comm, mesh, strict) 1.5135e-05 5
ask_adj 4.34e-06 5
| derive_adj 1.079e-06 2
| | ask_adj 1.079e-06 2
derive_adj 3.376e-06 1
| ask_adj 3.376e-06 1
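The percentages quoted above can be re-derived from the ask_revClass and sort_by_high_index totals in the two profiles. The following is only a small arithmetic sketch; the helper names `percent_slower` and `percent_share` are made up for illustration and are not part of omega_h.

```cpp
#include <cassert>

// Hypothetical helper (not an omega_h function): how much slower `slow` is
// than `fast`, as a percentage of `fast`.
double percent_slower(double fast, double slow) {
  assert(fast > 0.0);
  return (slow - fast) / fast * 100.0;
}

// Hypothetical helper: share of `total` spent in `part`, in percent.
double percent_share(double part, double total) {
  assert(total > 0.0);
  return part / total * 100.0;
}

// Timings in seconds, copied from the TOP-DOWN profiles in this comment.
const double ask_rc_no_kokkos = 0.562172;  // ask_revClass, kokkos disabled
const double ask_rc_kokkos    = 0.870067;  // ask_revClass, kokkos enabled
const double sort_no_kokkos   = 0.560344;  // sort_by_high_index, kokkos disabled
const double sort_kokkos      = 0.867046;  // sort_by_high_index, kokkos enabled
const double ask_rc_cranium   = 0.564636;  // time reported on cranium
```

With these numbers, `percent_slower(ask_rc_no_kokkos, ask_rc_kokkos)` comes out at about 54.8, and `percent_slower(ask_rc_no_kokkos, ask_rc_cranium)` at about 0.44. `percent_share(sort_no_kokkos, ask_rc_no_kokkos)` and `percent_share(sort_kokkos, ask_rc_kokkos)` are both about 99.7, so in both builds essentially all of the ask_revClass time is spent in sort_by_high_index.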
This appears to be a problem related to pumipic picpart creation.