yoonbo/kokkos
This project forked from kokkos/kokkos
Kokkos C++ Performance Portability Programming EcoSystem: The Programming Model - Parallel Execution and Memory Abstraction
License: Other
Kokkos implements a programming model in C++ for writing performance portable
applications targeting all major HPC platforms. For that purpose it provides
abstractions for both parallel execution of code and data management.
Kokkos is designed to target complex node architectures with N-level memory
hierarchies and multiple types of execution resources. It currently can use
OpenMP, Pthreads and CUDA as backend programming models.

Kokkos is licensed under standard 3-clause BSD terms of use. For specifics see
the LICENSE file contained in the repository or distribution.

The core developers of Kokkos are Carter Edwards and Christian Trott
at the Computer Science Research Institute of the Sandia National
Laboratories.

The KokkosP interface and associated tools are developed by the Application
Performance Team and Kokkos core developers at Sandia National Laboratories.

To learn more about Kokkos consider watching one of our presentations:
GTC 2015:
  http://on-demand.gputechconf.com/gtc/2015/video/S5166.html
  http://on-demand.gputechconf.com/gtc/2015/presentation/S5166-H-Carter-Edwards.pdf

A programming guide can be found under doc/Kokkos_PG.pdf. This is an initial
version and feedback is greatly appreciated.

A separate repository with extensive tutorial material can be found under
https://github.com/kokkos/kokkos-tutorials.

If you have a patch to contribute please feel free to issue a pull request
against the develop branch. For major contributions it is better to contact
us first for guidance.

For questions please send an email to
[email protected]

For non-public questions send an email to
hcedwar(at)sandia.gov and crtrott(at)sandia.gov

============================================================================
====Requirements============================================================
============================================================================

Primary tested compilers on X86 are:
  GCC 4.7.2
  GCC 4.8.4
  GCC 4.9.2
  GCC 5.1.0
  GCC 5.2.0
  Intel 14.0.4
  Intel 15.0.2
  Intel 16.0.1
  Intel 17.0.098
  Intel 17.1.132
  Clang 3.5.2
  Clang 3.6.1
  Clang 3.7.1
  Clang 3.8.1
  Clang 3.9.0
  PGI 17.1

Primary tested compilers on Power 8 are:
  GCC 5.4.0 (OpenMP, Serial)
  IBM XL 13.1.3 (OpenMP, Serial) (There is a workaround in place to avoid a compiler bug)

Primary tested compilers on Intel KNL are:
  GCC 6.2.0
  Intel 16.2.181 (with gcc 4.7.2)
  Intel 17.0.098 (with gcc 4.7.2)
  Intel 17.1.132 (with gcc 4.9.3)
  Intel 17.2.174 (with gcc 4.9.3)
  Intel 18.0.061 (beta) (with gcc 4.9.3)

Secondary tested compilers are:
  CUDA 7.0 (with gcc 4.8.4)
  CUDA 7.5 (with gcc 4.8.4)
  CUDA 8.0 (with gcc 5.3.0 on X86 and gcc 5.4.0 on Power8)
  CUDA/Clang 8.0 using Clang/Trunk compiler

Other compilers working:
  X86:
    Cygwin 2.1.0 64bit with gcc 4.9.3

Limited testing of the following compilers on POWER7+ systems:
  GCC 4.8.5 (on RHEL7.1 POWER7+)

Known non-working combinations:
  Power8:
    Pthreads backend

Primary tested compilers pass in release mode with warnings as errors.
They are also tested with a comprehensive set of backend combinations
(i.e. OpenMP, Pthreads, Serial, OpenMP+Serial, ...).
We are using the following set of flags:
  GCC:   -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits
         -Wignored-qualifiers -Wempty-body -Wclobbered -Wuninitialized
  Intel: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
  Clang: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized

Secondary compilers are passing without -Werror.
Other compilers are tested occasionally, in particular when pushing from the
develop to the master branch, without -Werror and only for a select set of
backends.

============================================================================
====Getting started=========================================================
============================================================================

In the 'example/tutorial' directory you will find step by step tutorial
examples which explain many of the features of Kokkos. They work with
simple Makefiles. To build with g++ and OpenMP simply type 'make' in the
'example/tutorial' directory. This will build all examples in the
subfolders. To change the build options refer to the compilation section
of the Programming Guide.

============================================================================
====Running Unit Tests======================================================
============================================================================

To run the unit tests create a build directory and run the following commands:

  KOKKOS_PATH/generate_makefile.bash
  make build-test
  make test

Run KOKKOS_PATH/generate_makefile.bash --help for more detailed options such as
changing the device type for which to build.

============================================================================
====Install the library=====================================================
============================================================================

To install Kokkos as a library create a build directory and run the following:

  KOKKOS_PATH/generate_makefile.bash --prefix=INSTALL_PATH
  make lib
  make install

Run KOKKOS_PATH/generate_makefile.bash --help for more detailed options such as
changing the device type for which to build.

============================================================================
====CMakeFiles==============================================================
============================================================================

The CMake files contained in this repository require Tribits and are used
for integration with Trilinos. They do not currently support a standalone
CMake build.

===========================================================================
====Kokkos and CUDA UVM====================================================
===========================================================================

Kokkos supports UVM as a specific memory space called CudaUVMSpace.
Allocations made with that space are accessible from host and device.
You can tell Kokkos to use that as the default space for Cuda allocations.
In either case UVM comes with a number of restrictions:

(i) You can't access allocations on the host while a kernel is potentially
running. This will lead to segfaults. To avoid that, either call
Kokkos::Cuda::fence() (or just Kokkos::fence()) after kernels, or set the
environment variable CUDA_LAUNCH_BLOCKING=1.

(ii) In multi socket multi GPU machines, UVM defaults to using zero copy
allocations for technical reasons related to using multiple GPUs from the
same process. If an executable doesn't do that (e.g. each MPI rank of an
application uses a single GPU [can be the same GPU for multiple MPI ranks])
you can set CUDA_MANAGED_FORCE_DEVICE_ALLOC=1. This will enforce proper UVM
allocations, but can lead to errors if more than a single GPU is used by a
single process.

===========================================================================
====Contributing===========================================================
===========================================================================

Contributions to Kokkos are welcome. To contribute, please open an issue
where the feature request or bug can be discussed.
Then issue a pull request with your contribution. Pull requests must be
issued against the develop branch.

===========================================================================
====Citing Kokkos==========================================================
===========================================================================

If you publish work which mentions Kokkos, please cite the following paper:

@article{CarterEdwards20143202,
  title = "Kokkos: Enabling manycore performance portability through polymorphic memory access patterns",
  journal = "Journal of Parallel and Distributed Computing",
  volume = "74",
  number = "12",
  pages = "3202 - 3216",
  year = "2014",
  note = "Domain-Specific Languages and High-Level Frameworks for High-Performance Computing",
  issn = "0743-7315",
  doi = "https://doi.org/10.1016/j.jpdc.2014.07.003",
  url = "http://www.sciencedirect.com/science/article/pii/S0743731514001257",
  author = "H. Carter Edwards and Christian R. Trott and Daniel Sunderland"
}