
ask_revClass performance (omega_h issue, closed)

scorec avatar scorec commented on September 16, 2024
ask_revClass performance

from omega_h.

Comments (10)

joshia5 avatar joshia5 commented on September 16, 2024 1


joshia5 avatar joshia5 commented on September 16, 2024 1


cwsmith avatar cwsmith commented on September 16, 2024

@joshia5 Do you see any differences from your build and run process on SCOREC? If so, please list them. In particular, which version of omegah were you running for the fast ask_revClass times on the 146k mesh reported in the emails?


joshia5 avatar joshia5 commented on September 16, 2024


cwsmith avatar cwsmith commented on September 16, 2024

OK. Would you please run the reverse_class_test as shown above with --osh-time to see what timings you get?


joshia5 avatar joshia5 commented on September 16, 2024


joshia5 avatar joshia5 commented on September 16, 2024


joshia5 avatar joshia5 commented on September 16, 2024


cwsmith avatar cwsmith commented on September 16, 2024

The output on dcs259 is below for the build/run with and without kokkos. The no-kokkos result is within 1% of what you saw on cranium (0.562172s vs 0.564636s). The kokkos run is 54% slower, which is roughly in line with the 43% difference we saw in the full version of the test (master@466e4702).
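For anyone who wants to re-check the percentages, here is a minimal sketch of the arithmetic. The timings are copied from this comment; the variable names and the `percent_diff` helper are only illustrative and are not part of omega_h.

```python
# Timings in seconds, copied from the profiles quoted in this comment.
no_kokkos = 0.562172    # ask_revClass total, Kokkos disabled (dcs259)
with_kokkos = 0.870067  # ask_revClass total, Kokkos enabled (dcs259)
cranium = 0.564636      # earlier no-Kokkos result reported on cranium


def percent_diff(value, baseline):
    """Return how much larger `value` is than `baseline`, in percent."""
    return (value / baseline - 1.0) * 100.0


# Hypothetical names, used only for this check.
kokkos_slowdown = percent_diff(with_kokkos, no_kokkos)
machine_gap = percent_diff(cranium, no_kokkos)

print(f"Kokkos vs no-Kokkos on dcs259: {kokkos_slowdown:.1f}% slower")
print(f"cranium vs dcs259 (no Kokkos): {machine_gap:.2f}% difference")
```

The slowdown is measured relative to the no-Kokkos time, so it comes out between 54% and 55%, and the gap between the two machines stays under 1%.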

I think this pretty much confirms that something about the build of gitrm (and/or its use of omegah) is causing the problem.

The run with kokkos disabled:

faces ask_rc time: 0.560801 seconds

TOP-DOWN:
=========
ask_revClass 0.562172 8
|  derive_revClass 0.56195 4
|  |  sort_by_high_index 0.560344 4
|  |  offset_scan 0.000220853 4
|  |  |  device_free 5.2965e-05 4
|  |  |  single host to device 3.9708e-05 4
|  |  |  Write allocation 2.9754e-05 4
|  |  |  |  device_malloc 2.6465e-05 4
|  |  |  device_malloc 2.5161e-05 4
|  |  device_free 0.000183636 12
|  |  Write allocation 0.000118426 16
|  |  |  device_malloc 0.000105891 16
|  offset_scan 5.4673e-05 1
|  |  device_free 1.2597e-05 1
|  |  single host to device 9.566e-06 1
|  |  Write allocation 7.107e-06 1
|  |  |  device_malloc 5.767e-06 1
|  |  device_malloc 6.066e-06 1
|  Write allocation 1.4481e-05 2
|  |  device_malloc 1.2431e-05 2
|  device_free 8.971e-06 1
|  ask_revClass 4.65e-07 1
binary::read(path, comm, mesh, strict) 0.0677998 2
|  binary::read_in_comm(path, comm, mesh, version) 0.0663222 2
|  |  binary::read(istream, mesh, version) 0.0662304 2
|  |  |  Write allocation 0.00269113 27
|  |  |  |  device_malloc 0.00266934 27
|  |  |  array host to device 0.000849324 27
|  |  |  set_ents 1.1176e-05 5
device_free 0.00213896 53
ask_adj 0.000607764 3
|  derive_adj 0.000604695 1
|  |  transit 0.000600884 1
|  |  |  Write allocation 6.834e-06 1
|  |  |  |  device_malloc 5.695e-06 1
|  |  ask_adj 9.81e-07 2
Write allocation 9.8263e-05 15
|  device_malloc 8.7703e-05 15
array host to device 8.6837e-05 9

BOTTOM-UP:
==========
sort_by_high_index 0.560344 4
|  derive_revClass 0.560344 4
|  |  ask_revClass 0.560344 4
binary::read(istream, mesh, version) 0.0626788 2
|  binary::read_in_comm(path, comm, mesh, version) 0.0626788 2
|  |  binary::read(path, comm, mesh, strict) 0.0626788 2
device_malloc 0.00294452 71
|  Write allocation 0.00291329 66
|  |  binary::read(istream, mesh, version) 0.00266934 27
|  |  |  binary::read_in_comm(path, comm, mesh, version) 0.00266934 27
|  |  |  |  binary::read(path, comm, mesh, strict) 0.00266934 27
|  |  derive_revClass 0.000105891 16
|  |  |  ask_revClass 0.000105891 16
|  |  offset_scan 3.2232e-05 5
|  |  |  derive_revClass 2.6465e-05 4
|  |  |  |  ask_revClass 2.6465e-05 4
|  |  |  ask_revClass 5.767e-06 1
|  |  ask_revClass 1.2431e-05 2
|  |  transit 5.695e-06 1
|  |  |  derive_adj 5.695e-06 1
|  |  |  |  ask_adj 5.695e-06 1
|  offset_scan 3.1227e-05 5
|  |  derive_revClass 2.5161e-05 4
|  |  |  ask_revClass 2.5161e-05 4
|  |  ask_revClass 6.066e-06 1
device_free 0.00239713 71
|  derive_revClass 0.000183636 12
|  |  ask_revClass 0.000183636 12
|  offset_scan 6.5562e-05 5
|  |  derive_revClass 5.2965e-05 4
|  |  |  ask_revClass 5.2965e-05 4
|  |  ask_revClass 1.2597e-05 1
|  ask_revClass 8.971e-06 1
binary::read(path, comm, mesh, strict) 0.00147752 2
derive_revClass 0.00108324 4
|  ask_revClass 0.00108324 4
array host to device 0.000936161 36
|  binary::read(istream, mesh, version) 0.000849324 27
|  |  binary::read_in_comm(path, comm, mesh, version) 0.000849324 27
|  |  |  binary::read(path, comm, mesh, strict) 0.000849324 27
transit 0.00059405 1
|  derive_adj 0.00059405 1
|  |  ask_adj 0.00059405 1
ask_revClass 0.000143731 9
|  ask_revClass 4.65e-07 1
offset_scan 9.2602e-05 5
|  derive_revClass 7.3265e-05 4
|  |  ask_revClass 7.3265e-05 4
|  ask_revClass 1.9337e-05 1
binary::read_in_comm(path, comm, mesh, version) 9.1794e-05 2
|  binary::read(path, comm, mesh, strict) 9.1794e-05 2
Write allocation 5.2706e-05 66
|  binary::read(istream, mesh, version) 2.1793e-05 27
|  |  binary::read_in_comm(path, comm, mesh, version) 2.1793e-05 27
|  |  |  binary::read(path, comm, mesh, strict) 2.1793e-05 27
|  derive_revClass 1.2535e-05 16
|  |  ask_revClass 1.2535e-05 16
|  offset_scan 4.629e-06 5
|  |  derive_revClass 3.289e-06 4
|  |  |  ask_revClass 3.289e-06 4
|  |  ask_revClass 1.34e-06 1
|  ask_revClass 2.05e-06 2
|  transit 1.139e-06 1
|  |  derive_adj 1.139e-06 1
|  |  |  ask_adj 1.139e-06 1
single host to device 4.9274e-05 5
|  offset_scan 4.9274e-05 5
|  |  derive_revClass 3.9708e-05 4
|  |  |  ask_revClass 3.9708e-05 4
|  |  ask_revClass 9.566e-06 1
set_ents 1.1176e-05 5
|  binary::read(istream, mesh, version) 1.1176e-05 5
|  |  binary::read_in_comm(path, comm, mesh, version) 1.1176e-05 5
|  |  |  binary::read(path, comm, mesh, strict) 1.1176e-05 5
ask_adj 4.05e-06 5
|  derive_adj 9.81e-07 2
|  |  ask_adj 9.81e-07 2
derive_adj 2.83e-06 1
|  ask_adj 2.83e-06 1

The run with Kokkos enabled:

faces ask_rc time: 0.868226 seconds

TOP-DOWN:
=========
ask_revClass 0.870067 8
|  derive_revClass 0.869768 4
|  |  sort_by_high_index 0.867046 4
|  |  Write allocation 0.000914632 16
|  |  offset_scan 0.000344374 4
|  |  |  Write allocation 0.000146366 4
|  |  |  device_free 5.5791e-05 4
|  |  |  single host to device 3.6593e-05 4
|  |  |  device_malloc 2.6933e-05 4
|  offset_scan 8.3409e-05 1
|  |  Write allocation 3.5237e-05 1
|  |  device_free 1.2824e-05 1
|  |  single host to device 8.872e-06 1
|  |  device_malloc 6.476e-06 1
|  Write allocation 6.872e-05 2
|  ask_revClass 5.49e-07 1
binary::read(path, comm, mesh, strict) 0.0683914 2
|  binary::read_in_comm(path, comm, mesh, version) 0.0668707 2
|  |  binary::read(istream, mesh, version) 0.066775 2
|  |  |  Write allocation 0.00280287 27
|  |  |  array host to device 0.00135131 27
|  |  |  set_ents 1.5135e-05 5
ask_adj 0.000697382 3
|  derive_adj 0.000694121 1
|  |  transit 0.000689666 1
|  |  |  Write allocation 3.5156e-05 1
|  |  ask_adj 1.079e-06 2
Write allocation 0.000527981 15
array host to device 0.000198938 9

BOTTOM-UP:
==========
sort_by_high_index 0.867046 4
|  derive_revClass 0.867046 4
|  |  ask_revClass 0.867046 4
binary::read(istream, mesh, version) 0.0626057 2
|  binary::read_in_comm(path, comm, mesh, version) 0.0626057 2
|  |  binary::read(path, comm, mesh, strict) 0.0626057 2
Write allocation 0.00453096 66
|  binary::read(istream, mesh, version) 0.00280287 27
|  |  binary::read_in_comm(path, comm, mesh, version) 0.00280287 27
|  |  |  binary::read(path, comm, mesh, strict) 0.00280287 27
|  derive_revClass 0.000914632 16
|  |  ask_revClass 0.000914632 16
|  offset_scan 0.000181603 5
|  |  derive_revClass 0.000146366 4
|  |  |  ask_revClass 0.000146366 4
|  |  ask_revClass 3.5237e-05 1
|  ask_revClass 6.872e-05 2
|  transit 3.5156e-05 1
|  |  derive_adj 3.5156e-05 1
|  |  |  ask_adj 3.5156e-05 1
array host to device 0.00155025 36
|  binary::read(istream, mesh, version) 0.00135131 27
|  |  binary::read_in_comm(path, comm, mesh, version) 0.00135131 27
|  |  |  binary::read(path, comm, mesh, strict) 0.00135131 27
binary::read(path, comm, mesh, strict) 0.00152072 2
derive_revClass 0.00146284 4
|  ask_revClass 0.00146284 4
transit 0.00065451 1
|  derive_adj 0.00065451 1
|  |  ask_adj 0.00065451 1
ask_revClass 0.000146969 9
|  ask_revClass 5.49e-07 1
offset_scan 9.8691e-05 5
|  derive_revClass 7.8691e-05 4
|  |  ask_revClass 7.8691e-05 4
|  ask_revClass 2e-05 1
binary::read_in_comm(path, comm, mesh, version) 9.5703e-05 2
|  binary::read(path, comm, mesh, strict) 9.5703e-05 2
device_free 6.8615e-05 5
|  offset_scan 6.8615e-05 5
|  |  derive_revClass 5.5791e-05 4
|  |  |  ask_revClass 5.5791e-05 4
|  |  ask_revClass 1.2824e-05 1
single host to device 4.5465e-05 5
|  offset_scan 4.5465e-05 5
|  |  derive_revClass 3.6593e-05 4
|  |  |  ask_revClass 3.6593e-05 4
|  |  ask_revClass 8.872e-06 1
device_malloc 3.3409e-05 5
|  offset_scan 3.3409e-05 5
|  |  derive_revClass 2.6933e-05 4
|  |  |  ask_revClass 2.6933e-05 4
|  |  ask_revClass 6.476e-06 1
set_ents 1.5135e-05 5
|  binary::read(istream, mesh, version) 1.5135e-05 5
|  |  binary::read_in_comm(path, comm, mesh, version) 1.5135e-05 5
|  |  |  binary::read(path, comm, mesh, strict) 1.5135e-05 5
ask_adj 4.34e-06 5
|  derive_adj 1.079e-06 2
|  |  ask_adj 1.079e-06 2
derive_adj 3.376e-06 1
|  ask_adj 3.376e-06 1
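As a quick reading aid for the two profiles above, the sketch below works out how much of the `ask_revClass` time is spent inside `sort_by_high_index` in each build. The numbers are copied from the TOP-DOWN sections; the dictionary layout and the `sort_share` helper are assumptions made for this illustration only.

```python
# Seconds, copied from the TOP-DOWN sections of the two profiles above.
profiles = {
    "kokkos disabled": {"ask_revClass": 0.562172, "sort_by_high_index": 0.560344},
    "kokkos enabled": {"ask_revClass": 0.870067, "sort_by_high_index": 0.867046},
}


def sort_share(profile):
    """Fraction of ask_revClass time spent in sort_by_high_index."""
    return profile["sort_by_high_index"] / profile["ask_revClass"]


# Hypothetical helper output: share of the sort in each build.
shares = {name: sort_share(p) for name, p in profiles.items()}
for name, share in shares.items():
    print(f"{name}: sort_by_high_index is {share:.2%} of ask_revClass")

# Ratio of the sort time with Kokkos enabled to the sort time without it.
sort_ratio = (profiles["kokkos enabled"]["sort_by_high_index"]
              / profiles["kokkos disabled"]["sort_by_high_index"])
print(f"sort_by_high_index ratio (enabled / disabled): {sort_ratio:.2f}")
```

In both builds the sort accounts for more than 99% of `ask_revClass`, so the difference between the two runs is essentially the difference in `sort_by_high_index`.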


cwsmith avatar cwsmith commented on September 16, 2024

This appears to be a problem related to pumipic picpart creation.

