
passlab / rexompiler


REX OpenMP Compiler

Home Page: https://passlab.github.io/rexompiler/

License: Other

Languages: C 71.27%, C++ 16.82%, Fortran 6.30%, Shell 1.11%, Makefile 1.00%, Perl 0.99%, CMake 0.59%, M4 0.59%, TeX 0.49%, Python 0.19%, Roff 0.13%, HTML 0.09%, Java 0.08%, Pascal 0.08%, C# 0.07%, LLVM 0.07%, Yacc 0.07%, Rich Text Format 0.03%, Ruby 0.02%, Assembly 0.01%

rexompiler's Introduction

ROSE Compiler

ROSE is an open source compiler infrastructure to build source-to-source program transformation and analysis tools for large-scale Fortran 77/95/2003, C, C++, OpenMP, and UPC applications. The intended users of ROSE could be either experienced compiler researchers or library and tool developers who may have minimal compiler experience. ROSE is particularly well suited for building custom tools for static analysis, program optimization, arbitrary program transformation, domain-specific optimizations, complex loop optimizations, performance analysis, and cyber-security.

http://www.rosecompiler.org/

Why Use ROSE

ROSE is not grep, sed, LLVM, or a Perl script. A ROSE Tool uses the ROSE compiler-based infrastructure to parse code into a complete Abstract Syntax Tree (AST). The AST contains all the syntax and semantic information in the original code, and has a rich API supporting sophisticated analysis and transformations. The ROSE Tool queries the AST and reports on and/or changes the AST, then may emit new code from the AST. All ROSE Tools can thus precisely replicate the parsing and semantic analysis behaviour of multiple compiler languages, vendors, and versions. New ROSE Tools can quickly be created by customers or by the ROSE Team. ROSE is open-source, and is portable across a large and expanding range of platforms. ROSE Tools can process large code bases, and the ROSE infrastructure and ROSE Tool collection are continuously upgraded and extended by the LLNL ROSE Team and outside contributors.

Installation Instructions

From the source tree run ./build. Then navigate to your build tree and run configure and make.

../src/configure --prefix=/path/for/ROSE/install \
                 --enable-languages=c,c++ \
                 --with-boost=/path/to/boost/install
make -j${NUM_PROCESSORS}
make install -j${NUM_PROCESSORS}
make check -j${NUM_PROCESSORS}

For Ubuntu 18.04, we have experimental support for installing pre-built ROSE binary packages using apt-get:

sudo apt-get install software-properties-common
sudo add-apt-repository ppa:rosecompiler/rose-development # Replace rose-development with rose-stable for release version
sudo apt-get install rose
sudo apt-get install rose-tools # Optional: Installs ROSE tools in addition to ROSE Core

For full install instructions go to https://github.com/rose-compiler/rose/wiki/How-to-Set-Up-ROSE

ROSE Directories

  • src: all source code for ROSE
  • tests: several subdirectories of test code for ROSE
  • tools: usable, feature-complete tools
  • projects: tools that are in development or incomplete
  • tutorial: examples of ROSE features
  • docs: files for building the documentation

Documentation

For more information about ROSE and how to use it, visit the GitHub wiki at https://github.com/rose-compiler/rose/wiki

The ROSE API documentation can be found at http://doxygen.rosecompiler.org. It can also be built locally by changing to ${ROSE_BUILD}/docs/Rose and running make doxygen_docs. The HTML pages can then be found in ${ROSE_BUILD}/docs/Rose/ROSE_WebPages and viewed by pointing your browser at that directory.


rexompiler's Issues

enums for the same OpenMP constants are defined in both ompparser and ROSETTA/SageIII

Please check this file:
https://github.com/passlab/rexompiler/blob/master/src/ROSETTA/src/node.C#L1925
For example, to initialize the IF clause, we specify a parameter with a data type and a variable name. The issue is that ROSE only supports a limited number of data types when generating those huge CXX files. Even the very common std::vector can't be used directly as a parameter.

Those supported data types are defined in this file:
https://github.com/passlab/rexompiler/blob/master/src/ROSETTA/src/buildStorageClasses.C#L1664
The function string AstNodeClass::buildSourceForIRNodeStorageClassConstructor() will encounter an assert failure if any new data type is used to declare a node parameter. Some other functions may also be involved.

Unparsing is more straightforward.
https://github.com/passlab/rexompiler/blob/master/src/backend/unparser/languageIndependenceSupport/unparseLanguageIndependentConstructs.C

Another place that I know needs to be updated is omp lowering.
https://github.com/passlab/rexompiler/blob/master/src/midend/programTransformation/ompLowering/omp_lowering.cpp
It may also perform some specific transformation based on the enum, such as the scheduling policy.

Overall, I suggest we do not replace the existing enums unless they can't meet our requirements for some reason.

Best regards

Anjia Wang

On Thu, Nov 12, 2020, at 9:38 AM, Yonghong Yan wrote:

Anjia,

Can you give a source file URL for the builder and the unparsing code that
use those enums? I want to see how much they are used and the amount of
work needed to refactor them.

On Thu, Nov 12, 2020 at 9:31 AM Anjia Wang [email protected] wrote:

Hello Dr. Yan,

Those enums are required as a parameter to create the corresponding SgOmpClause node. The ROSE builder functions only recognize the enums defined in the file Support.code, not ours.

Another reason is that the unparsing module in ROSE is also based on the enums defined in Support.code. If we use our own enum set, that module has to be refactored.

Best regards

Anjia Wang

On Wed, Nov 11, 2020, at 2:45 PM, Yonghong Yan wrote:

What is the reason for redefining those enums in
https://github.com/passlab/rexompiler/blob/master/src/ROSETTA/Grammar/Support.code#L47
and then converting those definitions in ompAstConstruction.cpp starting at
https://github.com/passlab/rexompiler/blob/master/src/frontend/SageIII/ompAstConstruction.cpp#L698?

Why can't we use the enums defined in
https://github.com/passlab/ompparser/blob/d4b3f880a7fed899a9e1261620eaedf8ca6a8245/src/OpenMPKinds.h
directly?

Adopting the cmake build system and removing autotools

Not sure whether cmake support in ROSE is ready. If we want to keep cmake, we need to migrate the ROSE cmake files to REX cmake files. Lots of things need to be removed. This is low priority.

Organize the tests for the compiler development

Nov 2020 meeting notes:

Xinyao: for test cases, we organize them as one folder per directive, containing the cases and references that test that directive.
https://docs.google.com/spreadsheets/d/1u_5snMD2u61PKOszYGJBdMeYA2x73cEFrcHgGP7rXXY/edit?usp=sharing

Xinyao/Anjia/Yonghong: Testing reorganization: one folder per directive. The folder contains test cases, a Makefile, a reference for each case, etc. Test cases and their references MUST be named consistently so we can match the two easily, e.g. parallel.c and parallel_reference.c. We need a good Makefile at the top level and in each folder so we can just list the cases and their reference files in the Makefile. Check https://github.com/OpenMPToolsInterface/llvm-project/tree/openmp5-gpu/openmp/libomptarget/omptests for how the LLVM OpenMP runtime organizes its tests. Create the Makefile structure first based on the LLVM OpenMP layout in the link, and then experiment with our test cases/references. Check the ROSE test Makefile.am to see how they use Makefiles for that. [A new testing system is implemented: https://github.com/passlab/rexomptest/tree/main/REXCompiler/omptests]

To see the code: https://github.com/passlab/rexomptest/tree/test_xinyao
To see the list of all test files (recording the process):
https://docs.google.com/spreadsheets/d/1u_5snMD2u61PKOszYGJBdMeYA2x73cEFrcHgGP7rXXY/edit?usp=sharing

For each test case, we need two references: one for parsing and unparsing, and the other for transformation. Organize the existing tests first.

ROSE/REX compiler should not allow noncontiguous data mapping

According to the OpenMP 5.0 specification, an array section used in data mapping must specify contiguous storage.

For example, given an integer array a[100][50], map(to: a[5][13:27]) is valid, but map(to: a[5:14][13:27]) is not. Currently, the REX compiler considers the latter valid and performs the transformation. Clang/LLVM 10.x reports: error: array section does not specify contiguous storage.

We may need to implement a function that checks whether an array section is valid during transformation.
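A minimal sketch of such a check, assuming the per-dimension section lengths and the full array extents have already been extracted from the AST (the function name and representation are hypothetical):

#include <cstddef>
#include <vector>

// In row-major storage, an array section is contiguous iff all dimensions
// before some pivot select a single element and all dimensions after the
// pivot cover the full extent of the corresponding array dimension.
bool isContiguousSection(const std::vector<size_t>& len,   // section length per dimension
                         const std::vector<size_t>& ext) { // full array extent per dimension
    size_t ndims = len.size();
    size_t pivot = 0;
    while (pivot < ndims && len[pivot] == 1)   // skip leading single-element dimensions
        ++pivot;
    if (pivot >= ndims)
        return true;                           // every dimension selects one element
    for (size_t i = pivot + 1; i < ndims; ++i) // all inner dimensions must be complete
        if (len[i] != ext[i])
            return false;
    return true;
}

For a[100][50], a[5][13:27] gives lengths {1, 27} and passes, while a[5:14][13:27] gives lengths {14, 27} and fails, matching the Clang/LLVM diagnostic above.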

OpenMP: Loop: Worksharing-Loop

The syntax of the worksharing-loop construct is as follows:

#pragma omp for [clause[ [,] clause] ... ] new-line
loop-nest

where loop-nest is a canonical loop nest and clause is one of the following:

private(list)
firstprivate(list)
lastprivate([lastprivate-modifier:]list)
linear(list[:linear-step])
reduction([reduction-modifier,]reduction-identifier:list)
schedule([modifier [, modifier]:]kind[, chunk_size])
collapse(n)
ordered[(n)]
nowait
allocate([allocator:]list)
order([order-modifier:]concurrent)

The XOMP functions for the worksharing-loop are currently as follows:


APIs of kmp and gomp for parallel directive

  • GOMP
  • KMP

Handling of clauses

  • private(list)
  • firstprivate(list)
  • lastprivate([lastprivate-modifier:]list)
  • linear(list[:linear-step])
  • reduction([reduction-modifier,]reduction-identifier:list)
  • schedule([modifier [, modifier]:]kind[, chunk_size])
  • collapse(n)
  • ordered[(n)]
  • nowait
  • allocate([allocator:]list)
  • order([order-modifier:]concurrent)

Compilation

Tests

Add ompparser as git submodule under frontend/SageIII/astOmpConstruction

In this way, we do not need to include ompparser as a separate library dependency of the compiler. With git clone --recurse-submodules (or git submodule update --init), the ompparser files will be pulled into that folder, and we will also need to add a Makefile/CMakeLists.txt to build it into librose.so. This is not a priority; it is recorded here so we know to fix it later.

Support for depend iterator

The grammar of the iterator modifier is shown below:
iterator([ iterator-type ] identifier=begin:end[:step][, ...])
I tried to use a storage format like std::map<SgSymbol*, std::vector<SgExpression*> >, but this format is not currently supported in buildStorageClasses.C.
[Screenshot from 2020-10-21 00-08-06]
To make it work, I use a storage format like std::map<SgSymbol*, std::vector<std::pair<SgExpression*, SgExpression*> > >, which means that one of the SgExpression* in each pair is unused.
The best way would be to support the new format in REX, but that is complex and I need some advice on where to start.
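For reference, a minimal sketch of how the begin:end:step triplet of one iterator could be packed into the currently supported container (names are illustrative; as noted above, one expression slot per pair goes unused):

#include <map>
#include <utility>
#include <vector>

class SgSymbol;       // ROSE IR symbol (forward declaration, for illustration only)
class SgExpression;   // ROSE IR expression

// Storage format currently accepted by buildStorageClasses.C.
typedef std::map<SgSymbol*, std::vector<std::pair<SgExpression*, SgExpression*> > >
        IteratorStorage;

// Pack begin:end:step for one iterator, e.g. "int a = 1:100:2".
void addIterator(IteratorStorage& storage, SgSymbol* iterVar,
                 SgExpression* begin, SgExpression* end, SgExpression* step) {
    storage[iterVar].push_back(std::make_pair(begin, end));
    storage[iterVar].push_back(std::make_pair(step, (SgExpression*)0)); // second slot unused
}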

declare mapper in REX expression parsing and transformation

declare mapper should start a new lexical scope for the mapper clause and the map clause; see the example below from the OpenMP examples document. Right now, we treat a declare mapper as declaring a function; the mapper-identifier, if provided, can be considered the function name. To reduce the possibility of name conflicts, we can rename it __omp_declare_mapper_mapper-identifier[-type]. If it is not provided, we will use a naming convention to create one. So a function should be declared as a placeholder for the future transformation, and its body can be just the rest of the declare mapper directive. By doing this, a later use of the mapper by a map clause can find the symbol for the declared mapper.

For the implementation, even if the mapper-identifier is not provided, since the mapper is more like a function declaration, a symbol table entry should be established to map the mapped type (myvec_t) to the mapper. E.g. the mapper is transformed to a function declaration called declare_mapper_myvec_t and the symbol table entry is <myvec_t, declare_mapper_myvec_t>. We may need to discuss the naming further to make sure it is not easy to create naming conflicts.

typedef struct myvec{
  size_t len;
  double *data;
} myvec_t;

#pragma omp declare mapper(myvect_t_mymapper: myvec_t v) \
       map(v, v.data[0:v.len])

int main( ... ) {

#pragma omp target map(mapper(myvect_t_mymapper), ...

}

More example: https://github.com/OpenMP/Examples/blob/main/devices/sources/target_mapper.2.c

Another example, in which the declare mapper is within a function:

int foo() {
  struct myvec{
    size_t len;
    double *data;
  };

 #pragma omp declare mapper(myvect_t_mymapper: struct myvec v) \
       map(v, v.data[0:v.len])

 ...
} 

We can use a function declaration for a mapper in the global scope. For a declare mapper within a function, we can use a struct declaration (the map clause will be transformed to a function pointer (TBD)). Right now, we will use a struct declaration for mappers in both global and function scope.
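To make the naming convention above concrete, the placeholder generated for the global-scope example might look roughly like the following (purely illustrative; as noted, the current plan may emit a struct declaration instead of a function):

/* placeholder for:
   #pragma omp declare mapper(myvect_t_mymapper: myvec_t v) map(v, v.data[0:v.len]) */
void __omp_declare_mapper_myvect_t_mymapper_myvec_t(myvec_t v) {
    /* body records the map clause of the declare mapper directive;
       a later map(mapper(myvect_t_mymapper): ...) resolves to this symbol */
}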

Add LLVM runtime flag to REX

The omp lowering test requires a specific OpenMP runtime library to test execution; otherwise, only compilation and transformation are tested. Currently, lacking this information, we don't test execution automatically.

While building REX, we could add a flag to specify the path to the LLVM OpenMP runtime library, such as --with-llvm-omp-library=${LLVM_PATH}/lib.

ROSE Pruning for REX

This is a continuing issue for pruning the original ROSE for our REX compiler. Below are the notes we have kept so far for this issue. The work on the cmake build system also includes some pruning (#2). The branch for this issue is rexpruning.

Things that are removed

roseHPCToolkitTests, HPCToolkit-related, FailSafe-related

under roseExtension:
failSafe
sqlite3x
dataStructureTraversal
highLevelGrammar
roseHPCToolkit
qtWidgets

Aterm, QTlibs, FLang, COBOL, SQLITE3/MYSQL, roseSupport/utility_functionsImpl.C, src/roseIndependentSupport, qrose, libroseSqlite3xConnection(?)

ROSE supports several programming languages (such as PHP, Python, Java, and X10) as well as binary analysis. Our first step is to trim down the ROSE source tree to include only sources for C/C++ and Fortran. We focus on the core source tree first (the src folder).

  1. Make a full build/configure/make of the source tree with C/C++/Fortran/CUDA/OpenCL language support. The log files for each of the three steps are recorded to understand the workflow of each build step and their dependencies.

  2. The pruning starts with removing the source folders that are not needed from the build process. Source files and folders are not removed at this point; we only need to modify Makefile.am. After going through the Makefile.am files in the src folder, many build targets and environment settings used to support non-C/C++/Fortran languages were removed. Targets and environment settings for several folders were removed as well, particularly those under 3rdPartyLibraries, roseAPI, roseExtensions, roseIndependentSupport, roseSupport, and uti. Settings for building the sources used by Qing Yi's work were completely removed. Build logs for both a full build and the pruned build are in the rose-building-logs folder, including:

  3. After inspecting the two make outputs, we found that at least the following binary analysis libraries are also needed to link with the others to create librose.so (found, for example, by diffing the lines matching CXXLD in the two make logs):

    • CXXLD libroseDisassemblers.la
    • CXXLD libroseBinaryFormats.la
    • CXXLD libroseBinaryLoader.la
    • CXXLD libPartitioner.la
    • CXXLD libbinaryMidend.la

So those libraries need to be put back into the build system in order to generate a correct librose.so. At this stage, we want our pruning to pass the ROSE tests, and then we will move on to the next step of removing the dependencies on those binary analysis libraries. Removing them will most likely involve code changes to at least ROSETTA/Grammar/Node.code#L1-L200, which has lots of code for SgAsm*.

Commits related to the pruning

  1. f7c367a
  2. bb013f9
  3. ac451be
  4. ad0b5c3
  5. ecdf47d
  6. 25bd734
  7. be2e65f
  8. 149de56
  9. 6e6d4b9
  10. e4561c5
  11. 023b24a
  12. c12eae2
  13. 709a3a4

DECLARE SIMD LINEAR reports undeclared variable incorrectly

#pragma omp declare simd linear(p:1)
#pragma omp declare simd uniform(p)
#pragma omp declare simd simdlen(1) notinbranch
float bar2(int *p) {
  *p = *p + 10;
  return *p;
}

The test above fails because the linear clause searches the current symbol table directly instead of looking up p in the function that follows. A few other clause types in the DECLARE SIMD directive have a similar issue.
We need to check this case while adding the new clauses. The original AST constructor in ROSE already has this check; it can be copied and modified to fit the ComplexClause class.
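A minimal sketch of the intended lookup, assuming the SgFunctionDeclaration that follows the pragma is available when the clause is built (how it is wired into the ComplexClause handling is omitted):

#include <string>
#include "rose.h"   // ROSE IR

// Look up the variable named in a clause such as linear(p:1) in the parameter
// list of the function declaration that follows the declare simd pragma,
// instead of searching the enclosing scope's symbol table.
SgInitializedName* findClauseParameter(SgFunctionDeclaration* funcDecl,
                                       const std::string& varName) {
    if (funcDecl == NULL)
        return NULL;
    SgInitializedNamePtrList& params = funcDecl->get_parameterList()->get_args();
    for (SgInitializedNamePtrList::iterator it = params.begin(); it != params.end(); ++it) {
        if ((*it)->get_name().getString() == varName)
            return *it;   // found the parameter, e.g. p in bar2(int *p)
    }
    return NULL;          // fall back to the normal lookup or report the error
}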

Array section is not supported in REX compiler

It is supported in ROSE's internal parser:

/* depend( array1[i][k], array2[p][l]), real array references in the list */
variable_exp_list
    : postfix_expr {
          if (!arraySection) /* regular array or scalar references: we add the entire array reference to the variable list */
              if (!addVarExp((SgExpression*)$1)) YYABORT;
          array_symbol = NULL; /* reset array symbol when done */
      }
    | variable_exp_list ',' postfix_expr {
          if (!arraySection)
              if (!addVarExp((SgExpression*)$3)) YYABORT;
      }
    ;

/* map (array[lower:length][lower:length]), not array references, but array section notations */
map_variable_list
    : id_expression_opt_dimension
    | map_variable_list ',' id_expression_opt_dimension
    ;

/* mapped variables may have optional dimension information */
id_expression_opt_dimension
    : ID_EXPRESSION { if (!addVar((const char*)$1)) YYABORT; } dimension_field_optseq
    ;

/* Parse optional dimension information associated with map(a[0:n][0:m]) Liao 1/22/2013 */
dimension_field_optseq
    : /* empty */
    | dimension_field_seq
    ;

/* sequence of dimension fields */
dimension_field_seq
    : dimension_field
    | dimension_field_seq dimension_field
    ;

dimension_field
    : '[' expression { lower_exp = current_exp; }
      ':' expression {
          length_exp = current_exp;
          assert(array_symbol != NULL);
          SgType* t = array_symbol->get_type();
          bool isPointer = (isSgPointerType(t) != NULL);
          bool isArray   = (isSgArrayType(t) != NULL);
          if (!isPointer && !isArray)
          {
              std::cerr << "Error. ompparser.yy expects a pointer or array type." << std::endl;
              std::cerr << "while seeing " << t->class_name() << std::endl;
          }
          ompattribute->array_dimensions[array_symbol].push_back(std::make_pair(lower_exp, length_exp));
      }
      ']'
    ;

When we pruned the internal parser down to an expression parser, this section was removed as well. We should add the code back and remove its dependency on OmpAttribute.
https://github.com/passlab/rexompiler/blob/dev/src/frontend/SageIII/expression_parser.yy

Parsing expression for referencing struct field using . or ->

We need expression parsing for struct/class field access using . or -> in OpenMP directives/clauses. Check the grammars at https://www.lysator.liu.se/c/ANSI-C-grammar-y.html and https://github.com/antlr/grammars-v4/blob/master/c/C.g4 and add the rule to

postfix_expr:primary_expr {
rule

Add a test case.

Check the OpenMP spec or other compilers to see which directives/clauses should allow expressions of the A.b or A->b kind.

Outlined functions for CPU and GPU are generated to the same file

We use the ROSE outliner API to generate the outlined functions into a new file. The kernels for CPU and GPU are supposed to be stored in separate files.

SgSourceFile* Outliner::getLibSourceFile(SgBasicBlock* target) {

If there are both omp parallel for and omp target parallel for in the input, there should be two outlined function files. However, the current code only produces one .cu file to hold all the outlined functions.

It seems that the two outliner API calls return the same file pointer, so all file operations happen on the same file. This may be caused by the outliner implementation. If so, we can implement a similar helper function within omp_lowering.cpp to create the new files, roughly as sketched below.
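One possible shape for such a helper, caching one output file per device kind so CPU kernels and CUDA kernels are no longer emitted into the same .cu file (createDeviceSourceFile stands in for whatever file-creation logic is reused from the outliner; all names here are hypothetical):

#include <map>
#include "rose.h"

enum DeviceKind { DEVICE_CPU, DEVICE_GPU };

// Hypothetical: creates e.g. a ".c" file for DEVICE_CPU and a ".cu" file for DEVICE_GPU.
SgSourceFile* createDeviceSourceFile(SgBasicBlock* target, DeviceKind kind);

// Return the outlined-function file for the given device kind, creating it on first use.
SgSourceFile* getDeviceLibSourceFile(SgBasicBlock* target, DeviceKind kind) {
    static std::map<DeviceKind, SgSourceFile*> libFiles;
    std::map<DeviceKind, SgSourceFile*>::iterator it = libFiles.find(kind);
    if (it != libFiles.end())
        return it->second;                                   // reuse the per-device file
    SgSourceFile* file = createDeviceSourceFile(target, kind);
    libFiles[kind] = file;
    return file;
}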

OpenMP: parallel Construct

The syntax of the parallel construct is as follows:

#pragma omp parallel [clause[ [,] clause] ... ] new-line
structured-block

where clause is one of the following:

if([ parallel :] scalar-expression)
num_threads(integer-expression)
default(data-sharing-attribute)
private(list)
firstprivate(list)
shared(list)
copyin(list)
reduction([reduction-modifier ,] reduction-identifier : list)
proc_bind(affinity-policy)
allocate([allocator :] list)

where affinity-policy is one of the following:

primary
master [deprecated]
close
spread

The XOMP functions for the parallel directive are currently as follows:

extern void XOMP_parallel_start (void (*func) (void *), void *data, unsigned ifClauseValue, unsigned numThreadsSpecified, char* file_name, int line_no);
extern void XOMP_parallel_end (char* file_name, int line_no);

APIs of kmp and gomp for parallel directive

  • GOMP
  • KMP

Handling of clauses

  • if([ parallel :] scalar-expression)
  • num_threads(integer-expression)
  • default(data-sharing-attribute)
  • private(list)
  • firstprivate(list)
  • shared(list)
  • copyin(list)
  • reduction([reduction-modifier ,] reduction-identifier : list)
  • proc_bind(affinity-policy)
  • allocate([allocator :] list)

Compilation

Tests

Combined clause processing in REX

Should we keep the combined directive information? Do we need a SageIII node for a combined construct, or should we just decompose it when converting from ompparser IR to Sage IR?

Alternatively, we decompose the combined directive but keep the information in the node, using either a counter or a vector of the nodes of the decomposed directives.

We can also create a Sage node for each combined construct, with no clauses. In that Sage node, keep an enum variable that identifies which combined directive the node represents. The node will be the parent of all the nodes of the decomposed directives.
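A rough sketch of that last option as a plain C++ structure, rather than actual ROSETTA/Sage code (all names are hypothetical):

#include <vector>

class SgStatement;   // Sage IR statement (forward declaration, for illustration only)

// Identifies which combined directive a node was decomposed from.
enum CombinedDirectiveKind {
    REX_COMBINED_PARALLEL_FOR,
    REX_COMBINED_TARGET_PARALLEL_FOR,
    REX_COMBINED_TARGET_TEAMS_DISTRIBUTE_PARALLEL_FOR
    // ...
};

// Parent node for a combined construct; it carries no clauses itself,
// the clauses live on the decomposed child directives.
struct CombinedConstructNode {
    CombinedDirectiveKind kind;
    std::vector<SgStatement*> components;   // decomposed directive nodes, in order
};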

REX doesn't support multiple source file for omp lowering

The current implementation of omp lowering stores all the outlined functions in separate files. It works for a single input file but not for inputs with multiple source files. In that case, the REX compiler complains that the source file can't be NULL and the compilation fails. If we store the outlined functions in the main file, everything is fine.

We use the ROSE API Outliner::getLibSourceFile to create the skeleton of the outlined-function file and then make some customizations.

Move AST constructor to a subfolder

To better organize the SageIII source code, we decided to move the source code related to the AST constructor to a new subfolder astOmpConstruction, including ompAstConstruction.cpp, ompAstConstruction.h, expression_parser.yy, expression_lexer.ll, and maybe some other files.

The file https://github.com/passlab/rexompiler/blob/dev/src/frontend/SageIII/Makefile.am needs to be updated. Inside the new subfolder astOmpConstruction, another Makefile.am may be required as well.

ROSE may incorrectly consider an array as a scalar variable

When we specify an array section in a map clause, the compiler should know the symbol is not just a pointer but an array.
For example, given map(to: x[0:n]) map(from: y[0:n]), x and y are both arrays. However, in the AXPY example below, the REX compiler considered x a scalar, while y is correctly recognized (a possible check is sketched after the example).

void axpy_ompacc(REAL* x, REAL* y, int n, REAL a) {
  int i;
#pragma omp target map(tofrom: y[0:n]) map(to: x[0:n],a,n)
#pragma omp parallel for shared(x, y, n, a) private(i)
  for (i = 0; i < n; i++)
    y[i] += a * x[i];

}
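One plausible check during lowering, reusing the per-symbol dimension information that the parser already records for array sections (types simplified; the real attribute is the array_dimensions map shown in the expression-parser excerpt above):

#include <map>
#include <utility>
#include <vector>

class SgSymbol;       // ROSE IR symbol (forward declaration, for illustration only)
class SgExpression;   // ROSE IR expression

// Simplified view of the dimension info recorded per mapped symbol;
// map(to: x[0:n]) records one (lower, length) pair for x.
typedef std::map<SgSymbol*, std::vector<std::pair<SgExpression*, SgExpression*> > >
        ArrayDimensions;

// A mapped symbol should be treated as an array (base address plus extent),
// not as a scalar, whenever its map clause carried an array section.
bool isMappedAsArray(const ArrayDimensions& dims, SgSymbol* sym) {
    ArrayDimensions::const_iterator it = dims.find(sym);
    return it != dims.end() && !it->second.empty();
}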

Register GPU offloading entries for multiple source files

We support the registration of GPU offloading entries for a single source file. A code example can be checked here.
However, when the input contains multiple source files, there are more things to consider.

1. Registration Methods

There are two approaches to registering the GPU offloading entries. One is to register them all before entering the main function; the other is to register them on demand.

1.1 Proactive registration

This approach uses the GNU function attribute __attribute__((constructor)) to register all the offloading entries at program startup, regardless of whether the offloading functions will actually be used. Each file has its own registration bootstrap code. The linker picks up the register/unregister bootstrap code from all files and executes it before/after the main function. The advantage is that the registration time isn't counted toward kernel execution. The code logic is cleaner because the registration happens in one place, and the original kernel code is not changed.

Since the device image is placed in a contiguous memory region holding all the entries, we may need to collect the entries from all files first and then register them together.

1.2 Lazy registration

Another way to register the offloading entries is on demand. Like lazy loading, we don't register the entries in advance. Instead, only when __tgt_target_teams is about to use some offloading entries for the first time does it register them, leaving the other entries untouched. This saves some memory because unused entries won't be loaded. The drawback is that the registration cost is added to the computing time. It involves I/O operations that can be slow, especially when the computing kernel itself is not complex. For example, the AXPY kernel takes about 77 ms to compute 8 million elements on Carina, and registering the kernel takes an additional 55 ms. However, if the kernel computing time is very high, this registration overhead can be ignored since it is constant and does not grow with the problem size.

For the implementation, a flag could be used to indicate whether an entry has been registered. Before executing the kernel, we need to check the flag and operate accordingly, roughly as sketched below.
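A minimal sketch of that guard (illustrative names; the actual registration call would be whatever libomptarget entry point the generated code uses):

#include <stdbool.h>

/* Hypothetical per-image bookkeeping for lazy registration. */
struct rex_offload_image {
    bool registered;            /* set once the entries of this image are registered */
    /* ... device image and offload entry table ... */
};

/* Called right before launching a kernel from this image, e.g. before __tgt_target_teams. */
static void rex_ensure_registered(struct rex_offload_image *img) {
    if (!img->registered) {
        /* register the offloading entries of this image here (I/O + runtime call) */
        img->registered = true; /* later launches skip the registration cost */
    }
}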

OpenMP GPU offloading needs two outlined functions

We generate one outlined function for the parallel for directive and use __kmpc_fork_call to call it. ROSE does a similar thing for both the parallel for and target parallel for directives.
However, to transform the target parallel for directive, there are three steps:

  1. Convert the target parallel for region to an outlined function as usual, which is a CUDA kernel.
  2. Create a wrapper function on the host to set up the environment and call __tgt_target_teams. __tgt_target_teams will call the CUDA kernel internally.
  3. Call the wrapper function in the original source code. Users do not directly use the CUDA kernel function.

According to the procedure above, unlike the transformation using __kmpc_fork_call, we have to implement extra steps for OpenMP GPU offloading to generate two new functions.
In the ongoing implementation for a simple case, there is no data mapping involved. Later, when we implement the map clause, the data mapping will happen in the wrapper function. We probably need to revise the current transformation in ROSE a lot to move the data mapping from the host code to the wrapper function.

The wrapper function cannot be stored together with the CUDA kernel file since the latter is designed exclusively for the .cubin file. We may need to create another file to store this kind of wrapper function. This wrapper file is different from the CPU outlined-function file, even though both are executed on the host.

A simple case is shown as follows.

Original function:
foo.c:

void foo() {
int i;
#pragma omp target
#pragma omp parallel for
  for (i = 0; i < 10; i++)
    printf("Test\n");
}

ROSE transformation:
rose_foo.c:

void foo() {
 int i;
/* prepare CUDA variables */
...
OUT__1__5328__main__7__<<<_num_blocks_,_threads_per_block_>>>();
}
__global__ void OUT__1__5328__main__7__() {
/* prepare variables */
...
  XOMP_static_sched_init(...);
  while(XOMP_static_sched_next(...))
    /* for loop */
} 

REX transformation:
rose_foo.c:

void foo() {
  int i;
  /* call the wrapper function */
  OUT__1__5328__main__7__(...);
}

rex_lib_cuda_driver_foo.c

void OUT__1__5328__main__7__(...) {
  /* prepare variables */
  ...
  /* call the CUDA kernel */
  __tgt_target_teams(...);
}

rex_lib_cuda_foo.cu

__global__ void OUT__1__5328__main__7__kernel() {
/* prepare variables */
...
/* computation */
} 

Use a new file to store the outlined function for the device (CPU or GPU)

According to Dr. Yan's study, to transform the target directive, we need a separate .cu file to store the CUDA kernel function. In the original file, no CUDA code is involved.

Currently, ROSE stores the outlined function in the same source file. There could be two solutions:

  1. Manually create a new file, copy the relevant AST there.
  2. ROSE may have built-in functions to store the outlined function in a new file.

These are only rough ideas. More details of ROSE source code should be explored.

Function call to CPU outlined kernel may take wrong number of parameters

The generated kernel accepts the correct parameters. However, the call site may assume the kernel does not need any parameters and pass a NULL pointer instead.

OpenMP input:

#pragma omp parallel firstprivate(i)
    assert(i == 100);

Function call to the kernel:

  void *__out_argv2__5948__[1];
  __out_argv2__5948__[0] = ((void *)(&i));
  __kmpc_fork_call(0,0,OUT__2__5948__main__8__,0);

Besides the missing parameter i, the wrapper declaration should not appear here, given that the wrapper has been disabled.
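For reference, a call that actually forwards the packed argument would look roughly like this (the exact argument packing depends on how the outliner wraps parameters; this is an assumption, not current output):

  void *__out_argv2__5948__[1];
  __out_argv2__5948__[0] = ((void *)(&i));
  /* pass the argument count (1) and the packed argument array instead of 0 and NULL */
  __kmpc_fork_call(0,1,OUT__2__5948__main__8__,__out_argv2__5948__);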

Tup build system to be removed?

Tup is a new build system that ROSE seems to be trying out. I recommend removing it because: 1) it is not yet complete; 2) we focus on the OpenMP implementation, so GNU autotools serves our purposes and we do not need to spend time migrating to or maintaining another build system; 3) having read the Tup documentation and examples, I do not like the syntax in which targets, dependencies, and commands are specified. It may bring in more features, but the rules are not easy to read, and a descriptive syntax that users mostly read should be easy to read.

Different compilers could generate transformed code in a different format

Among our CI configurations, GCC 7 on Ubuntu 18.04 and GCC 9 on Ubuntu 20.04 generate identical transformed code. However, GCC 5 on Ubuntu 16.04 can produce different code. So far, the difference has nothing to do with omp lowering; it is in general C code.

According to the failed CI test, the lowering tests for parallel.c and parallel-reduction.c showed inconsistent results between the configurations. The outputs are semantically equivalent but written differently.

  1. parallel.c

In the diffs below, the upper part is generated by GCC 5 on Ubuntu 16.04. The affected code is the assertion.

https://github.com/passlab/rexompiler/blob/master/tests/nonsmoke/functional/CompileTests/OpenMP_tests/parallel.c

<   _p_i == 100?((void )0) : __assert_fail("i == 100","parallel.c",9,__PRETTY_FUNCTION__);
<    *j == 77?((void )0) : __assert_fail("j == 77","parallel.c",10,__PRETTY_FUNCTION__);
---
>   (((void )(sizeof(((_p_i == 100?1 : 0))))) , ((
> {
>     if (_p_i == 100) 
>       ;
>      else 
>       __assert_fail("i == 100","parallel.c",9,__PRETTY_FUNCTION__);
>   })));
>   (((void )(sizeof((( *j == 77?1 : 0))))) , ((
> {
>     if ( *j == 77) 
>       ;
>      else 
>       __assert_fail("j == 77","parallel.c",10,__PRETTY_FUNCTION__);
>   })));
  2. parallel-reduction.c

https://github.com/passlab/rexompiler/blob/master/tests/nonsmoke/functional/CompileTests/OpenMP_tests/parallel-reduction.c

<   sum == i * thread_num + 100?((void )0) : __assert_fail("sum == (i*thread_num + 100)","parallel-reduction.c",18,__PRETTY_FUNCTION__);
---
>   (((void )(sizeof(((sum == i * thread_num + 100?1 : 0))))) , ((
> {
>     if (sum == i * thread_num + 100) 
>       ;
>      else 
>       __assert_fail("sum == (i*thread_num + 100)","parallel-reduction.c",18,__PRETTY_FUNCTION__);
>   })));

There are a few options:

1. Make the comparison of lowering code optional

If users test the REX compiler on the same two configurations, they can enable this option and expect all the tests to pass. Otherwise, they may end up with different lowering code, which doesn't mean the code is incorrect.
In our CI tests, the GCC 5 on Ubuntu 16.04 configuration would not enable this comparison.

2. Keep the comparison as the default option and drop support for GCC 5 on Ubuntu 16.04

We can explore how GCC 7+ works on Ubuntu 16.04.

3. Drop support for Ubuntu 16.04 completely

Official public support for Ubuntu 16.04 ended quite a while ago, so it makes sense to give up this OS. However, upstream ROSE still supports the older versions, and we may want to stay consistent with ROSE.

I recommend option 1 because users' compilers may behave differently anyway. If the comparison were required for testing, they might encounter false failures due to formatting differences.

REX doesn't support implicit data mapping

LLVM can transfer data for GPU offloading even if the variables are not specified in a map clause. REX requires that all variables used for GPU offloading be explicitly mapped.

OpenMP input:

  int x = 100;
  int j;
#pragma omp target
#pragma omp parallel for
  for (j = 0; j < 4; j++)
      printf("x = %d\n", x);

Even though x is not explicitly mapped, LLVM can compile the code and produce the correct output as expected.
REX concludes that no parameters need to be passed to the kernel since there is no map clause.
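Until implicit mapping is implemented, the same example should be written with x mapped explicitly, e.g.:

  int x = 100;
  int j;
#pragma omp target map(to: x)
#pragma omp parallel for
  for (j = 0; j < 4; j++)
      printf("x = %d\n", x);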

ROSE does not have an API to generate GNU function attribute code

To have a function execute when the main function starts and exits, we need the GNU function attributes constructor and destructor. However, ROSE does not support this syntax.
According to the official ROSE documentation, a function declaration supports a few other attribute syntaxes, such as section. However, even for that attribute, ROSE does not actually perform any transformation.

We may need to implement this kind of support from scratch in the REX compiler, including the SgNode and the corresponding transformation.
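For reference, the GNU syntax that the generated code would need to emit looks like this (illustrative names):

#include <stdio.h>

/* Runs before main: would register the GPU offloading entries. */
__attribute__((constructor))
static void __rex_register_offload_entries(void) {
    printf("register offload entries\n");
}

/* Runs after main returns: would unregister them. */
__attribute__((destructor))
static void __rex_unregister_offload_entries(void) {
    printf("unregister offload entries\n");
}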

expression parsing for ompparser integration

We need a new set of grammar files (.ll and .yy) for bison and flex to parse expressions passed by ompparser. This can be done by starting from the omplexer.ll and ompparser.yy files in the src/frontend/SageIII folder and deleting the OMP-related tokens and grammar. The implementation should then export an interface, e.g. SgExpression* parse_expression(char*), that is called by OmpAstConstruction when converting ompparser IR to Sage III IR.

Create a new branch for this integration.

Two problems left in DEPEND clause

  1. Using a list to store iterators in the DEPEND clause. This will affect the transformation later: when transforming, we need to store the variable in the symbol table, which means that, for example, for int a = 1:100:2, a should be an SgSymbol in the list rather than an SgExpression.
  2. Formats like A[i][j] should be supported in the clause list; currently they are not.

target kernel code generation

OpenMP target with nested parallel/for could be very complicated:

#pragma omp target // (1 team and 1 thread per team)
{
          printf("position 1: master team, and master thread %d/%d, %d/%d\n", omp_get_team_num(), omp_get_num_teams(), 
          omp_get_thread_num(), omp_get_num_threads());

          #pragma omp parallel num_threads(16)// (1 team and 16 threads) 
          {
                 printf("position 2: team, and thread %d/%d, %d/%d\n", omp_get_team_num(), omp_get_num_teams(), 
                 omp_get_thread_num(), omp_get_num_threads());

                #pragma omp for schedule(dynamic, 4)
                for (i=0; i<N; i++) { /* loop body */ }
          }

          ...

         #pragma omp teams num_teams (4) //(4 teams)
         #pragma omp parallel num_threads(8)
         {
                 printf("position 3: team, and thread %d/%d, %d/%d\n", omp_get_team_num(), omp_get_num_teams(), 
                 omp_get_thread_num(), omp_get_num_threads());

                #pragma omp for
                for (i=0; i<N; i++) { /* loop body */ }
          }

         ....

}

TODO:

  1. Create an OpenMP program with the above use of target and teams, and then manually convert it to use CUDA dynamic parallelism. Check the OpenMP examples document; it should have some source code to start from.
  2. Implement the CUDA device versions of omp_get_thread_num, omp_get_num_threads, omp_get_team_num, and omp_get_num_teams (a first-cut sketch follows below).
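A first-cut sketch for item 2, assuming the flat mapping of one CUDA block per OpenMP team and one CUDA thread per OpenMP thread (this mapping is an assumption and does not cover dynamic parallelism):

// Device-side stand-ins for the OpenMP runtime queries under the flat mapping
// team == CUDA block, thread == CUDA thread within the block.
__device__ int rex_omp_get_team_num(void)    { return blockIdx.x;  }
__device__ int rex_omp_get_num_teams(void)   { return gridDim.x;   }
__device__ int rex_omp_get_thread_num(void)  { return threadIdx.x; }
__device__ int rex_omp_get_num_threads(void) { return blockDim.x;  }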
