mit-crpg / openmoc
A method of characteristics code for nuclear reactor physics calculations.
Home Page: https://mit-crpg.github.io/OpenMOC/
License: Other
Add a "void alignTrackSegments()" class method to the Track class. This method will be called for all tracks following segmentation, and will allocate arrays in which all track segments can be adjacently stored in memory. Finally, the method will cast the array as a std::vector<segment*> and replace the "_segments" Track class attribute with the new array.
Add an option to enable PAPI at compile time,
e.g. a --with-papi flag passed to the build system.
Use Cython to speed up the Python plotting routines.
All the setCrossSectionByGroup type routines in Material.cpp use 0-based array indexing for the energy group. In other spots in the code, groups are indexed with 1-based indexing.
I can add -1's all over the place myself no problem, but I'm not sure if there are sample inputs (or real inputs?) that would break if we made this consistent.
The segmentation fault occurs with the default relaxation factor and mesh level. Using GDB, the segmentation fault appears to occur in the Mesh::splitCorners() method.
This may be a decent way to ramp up the BEAVRS UROP team to creating the BEAVRS 2D benchmark for OpenMOC.
Possible benchmarks include:
-How to write "input" files
-Brief description of each Solver type
-Reference OpenMOC API
-FAQ
Pin power computation was a hack a year ago to validate the C5G7 benchmark. It needs to be updated and the code made more readable.
In particular, the Solver classes should compute the power (fission rate) in each FSR after the source is converged. The Geometry::computePinPowers(...) and Geometry::computePinPowersInUniverse(...) methods should be extracted into the openmoc.process Python module.
The recursive algorithm used to collect FSR powers into pin powers can be retained, but it should be made more transparent to new developers. In addition, the routine should store the pin powers in an HDF5 file with tags specifying the lattice ID, lattice cell coordinates, etc. for easy data retrieval. Finally, the openmoc.plotter module should be extended with a routine to plot pin powers.
Write an h5py-enabled Python module to read and write performance data, including keff/iteration, residual/iteration, timer data, # segments, # tracks, # FSRs, # angles, # energy groups, etc. Add support for ASCII datafiles for users without h5py.
This bug showed up after switching from matplotlib's pcolor routine to the imshow routine for speedy plotting.
The VectorizedSolver class (and its child VectorizedPrivateSolver) are inordinately slower than the non-vectorized solvers. This class should be faster than the corresponding CPUSolver and ThreadPrivateSolver classes, but it is not. We need to use PAPI profiling to figure out the percentage of vector instructions to total instructions issued, and determine what the bottleneck on performance is.
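As a starting point, a minimal sketch using PAPI's classic high-level counter API and the preset events PAPI_VEC_DP and PAPI_TOT_INS (assuming both presets are available on the target CPU) could bracket a single transport sweep and report the vector instruction fraction:
#include <papi.h>
#include <cstdio>
void profileVectorFraction() {
  int events[2] = {PAPI_VEC_DP, PAPI_TOT_INS};
  long long counts[2];
  /* Begin counting double-precision vector instructions and total instructions */
  PAPI_start_counters(events, 2);
  /* ... run one transport sweep with the VectorizedSolver here ... */
  /* Stop counting and report the fraction of vector instructions issued */
  PAPI_stop_counters(counts, 2);
  printf("vector / total instructions = %f\n",
         (double) counts[0] / (double) counts[1]);
}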
When trying to install OpenMOC on a mac, I ran into some issues with multiple versions of python being installed on my computer. All macs come with a "system" version of python that has been built with gcc 4.2 (as of early 2013). gcc 4.2 does not support the C++ 2011 standard, which is needed to install OpenMOC, so I first downloaded gcc 4.8 with MacPorts. Subsequently, I downloaded the newest version of python with MacPorts and all the required modules (setuptools, matplotlib, numpy, etc.) and linked them with the MacPorts python. From there I ran into several issues installing OpenMOC:
ld: unknown option: -soname
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1
I followed the directions from various forums and replaced -soname with -dylib_install_name and -shared with -dynamiclib. Also, as suggested, I added the -lpython flag to the linker flags.
Fatal Python error: PyThreadState_Get: no current thread
Abort trap: 6
A good description of what is occurring is given here [http://stackoverflow.com/questions/15678153/homebrew-python-on-mac-os-x-10-8-fatal-python-error-pythreadstate-get-no-cu]. The macports version of python is being used to execute setup.py, but g++ is finding the python library associated with the system version of python while linking. This was remedied by adding a path to the macports python library file. Also, this library file was named libpython2.7.dylib, so the -lpython flag was changed to -lpython2.7.
Lesson to be learned: Compiling OpenMOC to a python module on a mac requires different compiler flags. Also, be cautious of which python installation is being used to compile OpenMOC. While one version of python might be used to call setup.py, the modules or libraries associated with another version can be included, which will result in the code erring out either during compilation or execution. If you get a fatal python error, go into OpenMOC/build/lib and enter the terminal command:
otool -L _openmoc.so
This will tell you which python binary is associated with the openmoc module. If this binary is not the same as the python binary being used to run OpenMOC, the code will error out. Also be aware that python binaries typically begin with a capital "P" while the python executable begins with a lowercase "p".
-Overall code structure
-Coding style conventions
-Configuration management and how to add new source code or compilation options
-Wrapping code with SWIG and SWIG configuration files
Figure out whether to support nested counting of PAPI events (or even if this is viable)
If not possible (or too cumbersome/tricky), add error handling so that only non-nested counting occurs.
Here's my modified version of computeFSRSources. It was originally intended to access only the nonzeros in the scattering matrix for each material but this proved to be too time consuming, oddly enough. This was the reason behind using index, n_mat and nnzs. It is now simplified to only calculate scattering below a cutoff representing the highest incoming group which is scattered. This cutoff is stored in nnzs as the code loops through all outgoing groups.
Also, four different index variables are introduced to reduce the total number of multiplications that have to be done within each loop to retrieve information for various quantities.
Let me know if you have any questions, or see any possible problems with this code. Thanks!
FP_PRECISION CPUSolver::computeFSRSourcesNew() {
int tid;
int flx_ind;
int sig_ind;
int fg_ind;
int t_ind;
int n_mat;
int nnzs;
FP_PRECISION* nu_sigma_f;
FP_PRECISION* sigma_s;
FP_PRECISION* sigma_t;
FP_PRECISION* chi;
Material* material;
FP_PRECISION scatter_source;
FP_PRECISION fission_source;
FP_PRECISION temp;
FP_PRECISION source_residual = 0.0;
FP_PRECISION inv_k_eff = (1.0 / _k_eff); // Precalculate inverse of k
std::vector< std::vector<int> > index;
index = _geometry->getEnergyBounds_fg();
FP_PRECISION* cell_res = new FP_PRECISION[_num_FSRs];
FP_PRECISION* group_res = new FP_PRECISION[_num_threads*_num_groups];
/* _scatter_sources was changed from [_num_FSRs*_num_groups] to
[_num_threads*_num_groups] since storing for each cell is unnecessary */
/* For all regions, find the source */
#pragma omp parallel for private(tid, flx_ind, sig_ind, fg_ind, \
t_ind, n_mat, nnzs, material, nu_sigma_f, chi, sigma_s, sigma_t,\
fission_source, scatter_source, temp) schedule(guided)
for (int r=0; r < _num_FSRs; r++) {
tid = omp_get_thread_num();
material = _FSR_materials[r];
nu_sigma_f = material->getNuSigmaF();
chi = material->getChi();
sigma_s = material->getSigmaS();
sigma_t = material->getSigmaT();
n_mat = material->getUid();
/* Precalculate indexes only dependent on cell, thread or material */
flx_ind = r*_num_groups; // Index to access current cell
fg_ind = n_mat*_num_groups; // Index to access material
t_ind = tid*_num_groups; // Index to access threaded _scatter_sources and group_res
/* Compute fission source */
if (!(material->isFissionable())) {
fission_source = 0.0;
}
else {
for (int g=0; g < _num_groups; g++)
_fission_sources[flx_ind + g] = _scalar_flux[flx_ind + g] * nu_sigma_f[g];
fission_source = inv_k_eff * pairwise_sum<FP_PRECISION>(&_fission_sources[flx_ind],
_num_groups);
}
/* Compute total scattering source for group g */
for (int g=0; g < _num_groups; g++) {
sig_ind = g*_num_groups; // Index to access cross sections for outgoing group g
nnzs = index[fg_ind+g].back()+1;
for (int gp = 0; gp < nnzs; gp++) {
_scatter_sources[t_ind + gp] = sigma_s[sig_ind + gp]*_scalar_flux[flx_ind + gp];
}
scatter_source = pairwise_sum<FP_PRECISION>(&_scatter_sources[t_ind],
nnzs);
/* Set the total source for region r in group G */
_source[flx_ind + g] = (chi[g]*fission_source + scatter_source) * ONE_OVER_FOUR_PI;
_reduced_source[flx_ind + g] = _source[flx_ind + g] / sigma_t[g];
/* Compute the norm of residual of the source in the region, group */
if ( fabs(_source[flx_ind + g]) > (7E-10/_num_groups) ){
temp = (_source[flx_ind + g] - _old_source[flx_ind + g])/_source[flx_ind + g];
group_res[t_ind + g] = temp*temp;
}
else {
/* Zero the entry so a stale value from a previous FSR handled by this thread
does not contaminate the residual sum */
group_res[t_ind + g] = 0.0;
}
/* Update the old source */
_old_source[flx_ind + g] = _source[flx_ind + g];
}
/* Sum all residuals stored in group_res in current thread */
cell_res[r] = pairwise_sum<FP_PRECISION>(&group_res[t_ind],_num_groups);
}
/* Sum up the residuals from each group and in each region */
source_residual = pairwise_sum<FP_PRECISION>(cell_res,_num_FSRs);
source_residual = sqrt(source_residual / _num_FSRs);
delete [] cell_res;
delete [] group_res;
log_printf(DEBUG,"ALL SOURCES AND RESIDUALS BUILT");
return source_residual;
}
Create input files for the BEAVRS benchmark geometry at the axial-midplane. Use 2-group and 8-group cross-section data generated from CASMO for the materials.
This issue will require fixing some ray tracing issues - namely, making the outermost boundary a circle (i.e., the core vessel).
Since this is a public class method, the name of the method should better describe what is going on. From the name, users have no idea that this is related to CMFD. In addition, a "find" method which is exposed to the user (as a public method) would typically return something to the user. This doesn't appear to be the case for this method.
Either this method should be made private so that users don't unexpectedly call it from Python expecting one thing but getting another, or it should be renamed to something more descriptive.
Found this issue when trying to install with CUDA on Noah's machine.
Trying to allocate memory for the A and M CMFD matrices for 10,000 cells and 361 groups caused the code to crash. Instead of allocating it all at once, memory can be allocated one cell at a time. First, the types for A and M must be changed in CMFD.h:
double** _M;
double** _A;
Then, in the constructor for CMFD, the memory allocation section can be changed to the following:
/* Memory allocated for each CMFD cell */
int size_M = _num_groups*_num_groups;
int size_A = _num_groups*(_num_groups+4);
int _num_cells = _cells_x*_cells_y;
/* Allocate pointers for each cell */
_M = new double* [_num_cells];
_A = new double* [_num_cells];
for ( int i = 0; i < _num_cells; i++ ) {
log_printf(DEBUG,"Allocating matrix memory for cell %d",i);
_M[i] = new double[size_M]; //Allocate M memory for one cell
_A[i] = new double[size_A]; //Allocate A memory for one cell
}
Then, in all the necessary functions, any calls to _M or _A must be changed accordingly. Here's an example:
In Cmfd::constructMatrices -
Before: _A[row*(_num_groups+4)+e+2]
After: _A[cell][e*(_num_groups+4)+e+2]
This fixed the issue for me.
Connect auto-generated Doxygen documentation to Sphinx using Breathe, and post to the gh-pages branch.
The CMFD module currently allows users to specify the number of groups to use in the CMFD solver. This can be anywhere from 1 to the number of groups in the MOC solver. If the number of CMFD groups differs from the number of MOC groups, the MOC group structure is cut up into coarse groups with each coarse group containing the same number of fine groups. If (num_MOC_groups % num_CMFD_groups != 0), the lowest group is assigned the extra groups and thus grows by (num_MOC_groups % num_CMFD_groups) groups. This issue is to add functionality allowing the user to specify the coarse mesh group structure and thereby achieve better spectral resolution in a coarse-group CMFD solve.
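For reference, a minimal sketch of the uniform condensation described above (a hypothetical helper, not the existing Cmfd interface):
#include <vector>
/* Return the number of MOC fine groups assigned to each CMFD coarse group,
 * with the remainder lumped into the lowest-energy (last) coarse group */
std::vector<int> uniformGroupStructure(int num_moc_groups, int num_cmfd_groups) {
  std::vector<int> fine_per_coarse(num_cmfd_groups,
                                   num_moc_groups / num_cmfd_groups);
  fine_per_coarse.back() += num_moc_groups % num_cmfd_groups;
  return fine_per_coarse;
}
The new functionality would instead accept a user-specified vector of coarse group boundaries in place of this uniform split.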
Changing the config.py for each machine leads to merge conflicts when pushing/pulling from different machines. We should make the most commonly changed options command line options to reduce this issue.
Several of the LRA sample input files end in errors of the following type:
[ ERROR ] Unable to set diffusion coefficient for group 0 for Material 2
[ ERROR ] ... which contains 6 energy groups
[ ERROR ] Unable to set diffusion coefficient for group 0 for Material 2
[ ERROR ] ... which contains 6936 energy groups
Also, we should probably consolidate all of the different LRA example input files into a single file. All of the variations are probably only useful for advanced users interested in that specific problem.
The Timer class may produce inconsistent results for runs > 5 seconds. In particular, the time reported for the transport sweep is greater than the total runtime. This seems to be due to compounded roundoff error in adding up the runtime for each transport sweep.
The following changes to how _thread_flux is implemented make it so that the memory for each thread is allocated one at a time instead of all at once.
In ThreadPrivateSolver.h, the definition of _thread_flux should be changed to:
FP_PRECISION** _thread_flux;
In ThreadPrivateSolver.cpp, the following changes should be made:
void ThreadPrivateSolver::initializeFluxArrays() {
CPUSolver::initializeFluxArrays();
/* Delete old flux arrays if they exist */
if (_thread_flux != NULL) {
for (int thread = 0; thread < _num_threads; thread++)
delete [] _thread_flux[thread];
delete [] _thread_flux;
}
int size;
/* Allocate memory for the flux and leakage arrays */
try{
size = _num_FSRs * _num_groups;
/* Allocate pointers for each thread scalar flux */
_thread_flux = new FP_PRECISION* [_num_threads];
for ( int thread = 0; thread < _num_threads; thread++ ) {
log_printf(DEBUG,"Allocating flux memory for thread %d",thread);
_thread_flux[thread] = new FP_PRECISION[size]; //Allocate fluxes for one thread
log_printf(DEBUG,"Location of allocated memory is %X",_thread_flux[thread]);
}
/* Allocate a thread local array of FSR scalar fluxes */
//size = _num_threads * _num_FSRs * _num_groups;
//_thread_flux = new FP_PRECISION[size];
}
catch(std::exception &e) {
log_printf(ERROR, "Could not allocate memory for the solver's fluxes. "
"Backtrace:%s", e.what());
}
log_printf(NORMAL,"Size of thread scalar flux is %d", _num_threads*size);
}
This means that all functions in ThreadPrivateSolver using _thread_flux must be modified too. Here is an example:
void ThreadPrivateSolver::flattenFSRFluxes(FP_PRECISION value) {
CPUSolver::flattenFSRFluxes(value);
int temp_ind;
/* Flatten the thread private flat source region scalar flux array */
#pragma omp parallel for private(temp_ind) schedule(guided)
for (int tid=0; tid < _num_threads; tid++) {
for (int r=0; r < _num_FSRs; r++) {
temp_ind = (r)*_num_groups;
for (int e=0; e < _num_groups; e++) {
_thread_flux[tid][temp_ind + e] = 0.0;
}
}
}
return;
}
The python-dev package appears to be a required dependency.
Also, here are the modifications I made to computeKeff. I created separate loops for fission and absorption so I could reuse cell_rates and group_rates without creating two separate sets, but doubling the memory may be worth it if you want it to go faster. My changes are a bit more straightforward than for computeFSRSources, but let me know if you have any questions or you see an issue.
void CPUSolver::computeKeff() {
Material* material;
FP_PRECISION* sigma_a;
FP_PRECISION* nu_sigma_f;
FP_PRECISION volume;
FP_PRECISION ScaleLeak;
FP_PRECISION tot_abs = 0.0;
FP_PRECISION tot_fission = 0.0;
int tid;
int t_ind;
int flx_ind;
FP_PRECISION* cell_rates = new FP_PRECISION[_num_FSRs];
FP_PRECISION* group_rates = new FP_PRECISION[_num_threads*_num_groups];
/* Loop over all flat source regions and compute the volume-weighted
* absorption rates */
#pragma omp parallel for private(tid, t_ind, flx_ind, volume, material, \
sigma_a, nu_sigma_f) schedule(guided)
for (int r=0; r < _num_FSRs; r++) {
tid = omp_get_thread_num();
volume = _FSR_volumes[r];
material = _FSR_materials[r];
sigma_a = material->getSigmaA();
nu_sigma_f = material->getNuSigmaF();
flx_ind = r*_num_groups;
t_ind = tid*_num_groups;
for (int e=0; e < _num_groups; e++) {
group_rates[t_ind + e] = sigma_a[e] * _scalar_flux[flx_ind + e];
}
/* Reduce absorption across all groups for current cell */
cell_rates[r] = pairwise_sum<FP_PRECISION>(&group_rates[t_ind], _num_groups) * volume;
}
/* Reduce absorption across FSRs */
tot_abs = pairwise_sum<FP_PRECISION>(cell_rates, _num_FSRs);
/* Loop over all flat source regions and compute the volume-weighted
* fission rates */
#pragma omp parallel for private(tid, t_ind, flx_ind, volume, material, \
sigma_a, nu_sigma_f) schedule(guided)
for (int r=0; r < _num_FSRs; r++) {
tid = omp_get_thread_num();
volume = _FSR_volumes[r];
material = _FSR_materials[r];
sigma_a = material->getSigmaA();
nu_sigma_f = material->getNuSigmaF();
flx_ind = r*_num_groups;
t_ind = tid*_num_groups;
for (int e=0; e < _num_groups; e++) {
group_rates[t_ind + e] = nu_sigma_f[e] * _scalar_flux[flx_ind + e];
}
/* Reduce fission across all groups for current cell */
cell_rates[r] = pairwise_sum<FP_PRECISION>(&group_rates[t_ind], _num_groups) * volume;
}
/* Reduce fission across all FSRs */
tot_fission = pairwise_sum<FP_PRECISION>(cell_rates, _num_FSRs);
/* This is my stuff, don't worry about it ;) */
if (_mgtype == 0) {
ScaleLeak = 1.0;
}
else {
ScaleLeak = _geometry->getLeakageScalingFactor();
}
/* Reduce leakage array across tracks, energy groups, polar angles */
int size = 2 * _tot_num_tracks * _polar_times_groups;
_leakage = pairwise_sum<FP_PRECISION>(_boundary_leakage, size) * 0.5;
_leakage *= ScaleLeak;
_k_eff = tot_fission / (tot_abs + _leakage);
log_printf(DEBUG, "abs = %f, fission = %f, leakage = %f, "
"k_eff = %f", tot_abs, tot_fission, _leakage, _k_eff);
delete [] group_rates;
delete [] cell_rates;
return;
}
We should examine the scaling to determine the effect of hardware prefetching. Hardware prefetch can be set at compile time on Intel and IBM BGQ machines using the following compiler flags, where n is the number of cache lines (default setting is 2):
Intel: -opt-prefetch[=n]
IBM: -qprefetch=aggressive, -qprefetch=noaggressive, -qnoprefetch
The end result would be weak and strong scaling plots using the exp intrinsic vs. the table lookup for different levels of prefetch aggressiveness (1, 2, 3, 4 cache lines). This may help us explain the superlinear weak scaling, akin to arguments made in the Simon and McGalliard paper "Multi-Core Processor Memory Contention Benchmark Analysis Case Study."
The main transport sweep loop may see a performance boost from static scheduling where the chunk size is the total number of tracks in an azimuthal halfspace divided by the number of threads.
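A sketch of what that scheduling could look like (the loop body and names here are illustrative, not the actual sweep code):
#include <omp.h>
void sweepHalfspace(int num_halfspace_tracks) {
  /* One contiguous chunk of tracks per thread */
  int chunk = num_halfspace_tracks / omp_get_max_threads();
  if (chunk == 0)
    chunk = 1;
  #pragma omp parallel for schedule(static, chunk)
  for (int t=0; t < num_halfspace_tracks; t++) {
    /* ... trace segments along track t and tally the fluxes ... */
  }
}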
This would be a mini-app for OpenMOC akin to XSBench. The mini-app would isolate the different kernels in OpenMOC and allow users to measure the performance of each one. This would be useful for measuring the power profile of OpenMOC's kernels with finer code granularity than simply measuring the power profile of the entire code.
What do we need:
Runtime Options
I ran into some issues comparing the residuals of runs using different numbers of groups. The source residual for a 361-group problem starts at 19.0, i.e. sqrt(_num_groups), so setting the tolerance to 1E-5 actually results in converging the eigenvalue effectively to 1E-5/19.0. This seemed to be resolved by changing the source residual definition in CPUSolver::computeFSRSources() as follows:
source_residual = sqrt(source_residual / (_num_FSRs * _num_groups) );
We can write a Python routine for a new "openmoc.perf" module which will set environment variables at runtime. This will allow us to set the OMP affinity and to explore different mappings across processors/sockets and to investigate NUMA and resource contention effects.
Develop a performance model of the solver based on the number of segments, groups, and flat source regions. Implement a Python module with routines to estimate the number of flops, loads, and the runtime given instances of a Geometry and a TrackGenerator object. This will be useful for comparing predicted values with measured PAPI values.
In addition, it may be useful to write some C/C++ code and wrap it as a C extension for Python to measure and report key hardware characteristics, such as cache size and latency.
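For the latency piece, a rough sketch of the kind of pointer-chasing microbenchmark that could be wrapped as a C extension (standalone C++ here; the names are illustrative):
#include <algorithm>
#include <chrono>
#include <numeric>
#include <random>
#include <vector>
/* Walk a single random pointer-chasing cycle of n entries and return the
 * average latency per access (in ns) for that working-set size */
double measureLatencyNs(size_t n, size_t num_accesses) {
  std::vector<size_t> order(n), next(n);
  std::iota(order.begin(), order.end(), 0);
  std::shuffle(order.begin(), order.end(), std::mt19937(42));
  for (size_t i=0; i < n; i++)
    next[order[i]] = order[(i + 1) % n];
  volatile size_t idx = 0;
  auto start = std::chrono::steady_clock::now();
  for (size_t i=0; i < num_accesses; i++)
    idx = next[idx];
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count()
         / num_accesses;
}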
The code breaks with older versions of the gcc, icpc, and bgxlc compilers due to the OpenMP parallel regions in Cmfd.cpp. One or more variable-length C++ containers (std::vector or std::map) are declared as private variables for these regions. The bug can be fixed by either using a fixed-length container, type casting the container as an array pointer, or initializing a variable private to each thread to point to the container from within the parallel region.
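A minimal sketch of the last fix, where the container is declared inside the parallel region so each thread constructs its own instance rather than listing a class-type object in a private(...) clause (the names here are illustrative):
#include <vector>
void tallyExample(int num_cells, int num_groups) {
  #pragma omp parallel
  {
    /* Each thread builds its own container inside the parallel region,
     * avoiding the std::vector in private(...) that trips up older compilers */
    std::vector<double> local_tally(num_groups, 0.0);
    #pragma omp for
    for (int cell=0; cell < num_cells; cell++) {
      for (int g=0; g < num_groups; g++)
        local_tally[g] += 1.0; /* ... per-thread accumulation ... */
    }
  }
}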
CPUSolver.cpp has quite a few tabs rather than spaces, which leads to unexpected indentations in different text editors. I suggest tabs be eliminated in favor of spaces.
Also, after talking to Will, it sounds like the desired code convention moving forward is to use two spaces (rather than four spaces) for indenting.
The suite should test all classes and routines, and run through a series of benchmark problems using all different solver types and parameters:
Parameters:
Benchmarks:
An interesting study for looking at our superlinear weak scaling would be to localize points of contention on shared memory systems (L2, L3 caches, FSB, etc.) by running the code in parallel with different thread-to-core mappings. This can be done using OpenMP's affinity mechanisms - Intel's KMP_AFFINITY environment variable, or IBM's BG_THREADLAYOUT environment variable for Blue Gene.
A good reference for study is R. Hood's talk titled "Performance Impact of Resource Contention in Multicore Systems."
Solver class:
Attributes:
Methods:
setPAPIevent(char* event)
getPAPIcounts(char* event, int code_section)
clearPAPIevent(char* event)
Code Sections to Profile:
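To make the proposal concrete, here is a rough sketch of how the three methods listed above might wrap PAPI; the member variables, return type, and the handling of code_section are assumptions, since the issue does not specify them:
#include <papi.h>
#include <string>
#include <vector>
class Solver {
private:
  int _papi_event_set = PAPI_NULL;
  std::vector<std::string> _papi_events;
public:
  void setPAPIevent(char* event) {
    /* Lazily initialize the PAPI library and the event set */
    if (_papi_event_set == PAPI_NULL) {
      PAPI_library_init(PAPI_VER_CURRENT);
      PAPI_create_eventset(&_papi_event_set);
    }
    PAPI_add_named_event(_papi_event_set, event);
    _papi_events.push_back(event);
  }
  long long getPAPIcounts(char* event, int code_section) {
    /* Assumes PAPI_start(_papi_event_set) was called before the profiled
     * section; per-section bookkeeping is omitted in this sketch */
    (void) code_section;
    std::vector<long long> counts(_papi_events.size(), 0);
    PAPI_read(_papi_event_set, counts.data());
    for (size_t i=0; i < _papi_events.size(); i++) {
      if (_papi_events[i] == event)
        return counts[i];
    }
    return 0;
  }
  void clearPAPIevent(char* event) {
    PAPI_remove_named_event(_papi_event_set, event);
  }
};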
Kazutomo Yoshi at ANL may help us with using an OPA API for inserting assembly-optimized atomic operations in place of the OpenMP atomics used for the scalar flux update in the CPUSolver. This may reduce the parallel scaling performance degradation that we see with this solver as compared to the ThreadPrivateSolver.
ThreadPrivateSolver and CPUSolver have optional Geometry and TrackGenerator arguments in the constructor (default to NULL), but OpenMOC seg faults if Geometry and TrackGenerator objects aren't supplied.
For spatial domain decomposition, we will need modular ray tracing for efficient communication of angular fluxes across domains. To give users the option of using cyclic ray tracing or modular ray tracing, we can subclass the TrackGenerator class into a CyclicTrackGenerator and a ModularTrackGenerator class for each application.
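A bare-bones sketch of that hierarchy (the virtual method and whether the base class becomes abstract are design assumptions, not the existing TrackGenerator interface):
class TrackGenerator {
public:
  virtual ~TrackGenerator() {}
  /* Lay tracks down across the geometry */
  virtual void generateTracks() = 0;
};
class CyclicTrackGenerator : public TrackGenerator {
public:
  /* Tracks cycle around the full geometry, as is done today */
  void generateTracks() { /* ... */ }
};
class ModularTrackGenerator : public TrackGenerator {
public:
  /* Tracks are laid down per spatial domain so angular fluxes can be
   * exchanged efficiently at domain boundaries */
  void generateTracks() { /* ... */ }
};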
Update online installation instructions using Sphinx and post to the gh-pages branch for web hosting by GitHub.
Let users know they need to install python-dev and not only the python binaries if they encounter:
$ openmoc/openmoc_wrap.cpp:149:20: fatal error: Python.h: No such file or directory