
Version with profiling (psycloned_nemo, OPEN)

arporter commented on July 18, 2024
Version with profiling

from psycloned_nemo.

Comments (7)

arporter commented on July 18, 2024

I've altered my script so that it permits single-line IF statements inside KERNELS regions. I've also done some experimenting and identified some files that I can now process that I couldn't previously. This makes the profile more informative if nothing else:

[Image: nemo_icestp_profiled]

Most of the remaining white space is due to either global sums (especially in stp_ctl in stpctl.f90) or the packed halo exchanges (lbc_lnk_ptr in lbclnk.f90).
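As a rough illustration of the single-line IF case mentioned above, here is a minimal text-level sketch of how a single-line IF (`IF (cond) statement`) differs from a block IF (`IF (cond) THEN`). This is not the script's actual logic, which is not shown in this thread; the function name `is_single_line_if` is hypothetical and the comment stripping is deliberately naive.

```python
import re


def is_single_line_if(line: str) -> bool:
    """Return True for `IF (cond) statement`, False for a block IF or any other line.

    Hypothetical helper for illustration only: it works on raw source text.
    """
    # Naive comment strip: ignores the possibility of '!' inside a string literal.
    text = line.split("!")[0].strip()
    match = re.match(r"if\s*\(", text, re.IGNORECASE)
    if not match:
        return False
    # Walk to the parenthesis that closes the condition, allowing nested
    # parentheses such as array references inside the condition.
    depth = 0
    for pos in range(match.end() - 1, len(text)):
        if text[pos] == "(":
            depth += 1
        elif text[pos] == ")":
            depth -= 1
            if depth == 0:
                rest = text[pos + 1:].strip()
                # A block IF has only THEN after the condition.
                return bool(rest) and rest.lower() != "then"
    return False
```

For example, `is_single_line_if("IF (zmax > 0._wp) zmax = 0._wp")` is true, while `is_single_line_if("IF (a(ji,jj) > b) THEN")` is false.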


arporter commented on July 18, 2024

An update: the support for the NVTX profiling API is now on master in PSyclone.


arporter commented on July 18, 2024

[Image: nemo_prof_kernels_inside_tracers_tranxtvvl]
I've (manually) tweaked traldf_iso and tra_nxt_vvl to put KERNELS in more sensible/performant locations. They've now disappeared from the profile :-)


arporter commented on July 18, 2024

[Image: nemo_prof_gsum]
I've (manually) optimised the global sums in stp_ctl, which were the source of the white space on the RHS of the earlier profiles. I've also introduced a heuristic that puts KERNELS inside loops over levels when they contain two or more loops. The latter is essential in a couple of the big kernels but the overall performance benefit is questionable. Still, it's only a small change to the script :-)
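The levels-loop heuristic described above can be sketched on a toy tree. This is a minimal sketch under stated assumptions, not the actual script: `Node`, `LEVEL_VARS` and `kernels_regions` are hypothetical names rather than PSyclone classes, loops over levels are assumed to use the variable `jk`, and "contain two or more loops" is read as two or more loops sitting directly in the body of the levels loop.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a node in the parsed Fortran tree."""
    kind: str                      # "routine", "loop" or "statement"
    var: str = ""                  # loop variable, e.g. "jk" for a loop over levels
    children: List["Node"] = field(default_factory=list)


# Assumption: loops over levels are identified by their loop variable.
LEVEL_VARS = {"jk"}


def kernels_regions(node: Node) -> List[List[Node]]:
    """Return the groups of nodes that would each be wrapped in one KERNELS region."""
    if node.kind == "loop":
        inner_loops = [c for c in node.children if c.kind == "loop"]
        if node.var in LEVEL_VARS and len(inner_loops) >= 2:
            # Heuristic: put the region inside the levels loop, around its body.
            return [node.children]
        # Default: put the region around the whole loop.
        return [[node]]
    # Not a loop: look for loops further down the tree.
    regions: List[List[Node]] = []
    for child in node.children:
        regions.extend(kernels_regions(child))
    return regions
```

With this reading, a `jk` loop whose body holds two separate `jj`/`ji` loop nests gets its region inside the `jk` loop, while a `jk` loop holding a single nest is enclosed whole.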


arporter commented on July 18, 2024

Realised I had a bug in the script that meant KERNELS were not being put in the lower branches of CASE statements. Also realised that PSyclone can now process icetab.f90; however, the resulting code is slower...


arporter commented on July 18, 2024

[Image: nemo_prof_icetab]


arporter commented on July 18, 2024

Have got NEMO compiling with the latest version of PSyclone and PGI 19.10. Since the profiling API has changed, I've created a new branch (profiling_new_api) in this repo. You will need to build the latest version of the NVIDIA wrapper library distributed with PSyclone. (See description of this Issue.) The resulting code is the fastest yet:

[Image: nemo_prof_220520]

