graphblas / graphblas-api-cpp Goto Github PK

View Code? Open in Web Editor NEW

10.0 10.0 1.0 841 KB

GraphBLAS C++ API Specification.

Home Page: https://graphblas.org/graphblas-api-cpp/

TeX 58.19% Makefile 41.81%

graphs

graphblas-api-cpp's People

Contributors

Stargazers

Watchers

Forkers

rscohn2

graphblas-api-cpp's Issues

Add sections on interoperability

Define precisely how a user should outfit a type to fulfill MatrixRange and MutableMatrixRange, etc. (Also do a proof-of-concept, e.g. with mdspan and nwgraph.)

Also, define precisely how a user should outfit a type to fulfill Monoid.

Descriptors vs. Views (and how to handle OUTP:REPLACE)

GBTL has shown how lightweight wrapper classes (views) can be used to capture transpose, structure and complement. Support for "replace" vs "merge" semantics is currently half baked.

Can we cleanly do away with the Descriptor class approach in C++ (I [scott] hope so).

Take a pass through spec, agreeing and normalizing on names

Inspect and normalize our specification of undefined behavior

We must ensure we distinguish between two kinds of UB:

UB where the behavior of the program is undefined
UB where the state of an object, value returned from a function are undefined.

"Signaling Accumulators"

A "signaling accumulator" would be a useful concept for many graph algorithms where you want to keep track of whether an update has occurred in each iteration.

For example, in SSSP, we can test whether the algorithm has converged by checking whether any new updates were made to the distance vector during this iteration. Traditionally, you'd check this by performing an additional subtraction, or else otherwise checking whether the new distance vector has any new, shorter distances. Alternately, you could implement a new min operator that sets a flag if there's an update. If no update occurs, the algorithm has converged, and the algorithm can end.

    bool updated_value = false;


    auto signaling_min = [&](auto&& a, auto&& b) {
                           auto min = grb::min{}(a, b);
                           if (min != a) {
                             updated_value = true;
                           return min;
                         };

    auto update = grb::multiply(grb::transpose(a), dist,
                           grb::min{}, grb::plus{},
                           grb::full_vector_mask{});

    auto dist2 = grb::ewise_union(dist, update, signaling_min);


    if (!updated_value) {
      break;
    }

For GraphBLAS C++ 2.0, I would propose implementing a "signaling op" wrapper that indicates whether an update occurs when accumulating.

    // Proposal for signal API
    bool update
    grb::signal min(grb::min{}, update);

    template <typename Fn, typename Flag>
    struct signal {

      operator()(auto&& a, auto&& b) {
        auto c = fn_(a, b);
        if (c == a) {
          flag_ = true;
        }
        return c;
      }

      Flag& flag_;
      Fn fn_;
    };

Implement `reduce` algorithm

We should implement a reduce algorithm that will reduce a grb::vector to a single value or reduce a grb::matrix to a single value, or vector by row or column.

We had a long discussion in this week's meeting about the API for reduce, and whether to

Default an initial value to T{}, as in std::reduce
Always require an initial value
Default the initial value to grb::monoid_traits<Fn, T>::identity() when the binary operator passed in is a monoid.

At some point we should implement this full algorithm, but for now I am leaving this unimplemented as it is fairly trivial to use std::reduce at the moment, as shown below.

grb::matrix<float> x = ...;
grb::values_view view(x);
float sum = std::reduce(view.begin(), view.end(), float(0), grb::plus{});

Evaluate coverage of extract and assign

Add `kronecker_product` algorithm

Add an algorithm to implement the Kronecker product of two matrices.

Create appendix with examples from C spec

Review behavior of exceptions

Code example using `transform_view`

Write a code example where we use transform_view to achieve the same result as indexUnaryOp in the C API.

Consider a semiring concept

This could simplify the definition of multiply, ewise_add, and ewise_multiply. Users could also define semirings in-place using a monoid and a binary operator.

multiply(a, b, c, {grb::plus(), grb::times()});

Implement `trace` algorithm

Implement an algorithm to compute the trace of a GraphBLAS matrix.

Dealing with all of the variants of apply

The GraphBLAS C API v. 2.0 has the following variants and signatures (pseudocode)

Standard Variants

standard vector variant: apply(Vec w, Vec mask, BinaryOp accum, UnaryOp op, Vec U, ...)
standard matrix variant: apply(Mat C, Mat Mask, BinaryOp accum, UnaryOp op, Mat A, ...)

BinaryOp bind 1st/2nd Variants

bind-1st vector variant: apply(Vec w, Vec mask, BinaryOp accum, BinaryOp op, Scalar s, Vec U, ...)
bind-2nd vector variant: apply(Vec w, Vec mask, BinaryOp accum, BinaryOp op, Vec U, Scalar s, ...)
standard matrix variant: apply(Mat C, Mat Mask, BinaryOp accum, BinaryOp op, Scalar s, Mat A, ...)
standard matrix variant: apply(Mat C, Mat Mask, BinaryOp accum, BinaryOp op, Mat A, Scalar s, ...)

IndexUnaryOp Variants

standard vector variant: apply(Vec w, Vec mask, BinaryOp accum, IndexUnaryOp op, Vec U, Scalar s, ...)
standard matrix variant: apply(Mat C, Mat Mask, BinaryOp accum, IndexUnaryOp op, Mat A, Scalar s, ...)

The C++ API way of defining the standard vector signature is as follows (matrix follows the same pattern):

    template<typename T, typename I, typename Hint, typename Allocator,
             typename MaskType,
             typename AccumulatorType,
             typename UnaryOpType,
             typename UVectorType>
    void apply(vector<T, I, Hint, Allocator>  &w,
               MaskType                 const &mask,
               AccumulatorType                 accum,
               UnaryOpType                     op,
               UVectorType              const &u,
               OutputControlEnum               outp = MERGE);

The one might define the BinaryOp+bind as follows but bind-1st and bind-second have the same templated signature and need to be dealt with with a static assert in the body as follows:

    template<typename T, typename I, typename Hint, typename Allocator,
             typename MaskT,
             typename AccumT,
             typename BinaryOpT,
             typename FirstT,
             typename SecondT>
    inline void apply(
        vector<T, I, Hint, Allocator>  &w,
        MaskT                    const &mask,
        AccumT                          accum,
        BinaryOpT                       op,
        FirstT                   const &lhs,
        SecondT                  const &rhs,
        OutputControlEnum               outp = MERGE)
    {
        // figure out if the user wants bind1st or bind2nd based on the arg types
        constexpr bool is_bind1st = is_vector_v<SecondT>;
        constexpr bool is_bind2nd = is_vector_v<FirstT>;

        // make sure only one of the types matches
        static_assert(is_bind1st ^ is_bind2nd, "apply isn't going to work");
        ...
    }

However this signature is not recommended for the C++ API was we should be using std::bind at the call site and call the standard variant as follows (for bind-2nd):

        // multiply all elements of m by damping_factor
        grb::apply(m, grb::NoMask(), grb::NoAccumulate(),
                   std::bind(grb::Times<RealT>(),
                             std::placeholders::_1,
                             damping_factor),
                   m);

Without the BinaryOp-bind variants then there is no collision with what the IndexUnaryOp variant would be, but one may still want to asser that u is a vector:

    template<typename T, typename I, typename Hint, typename Allocator,
             typename MaskType,
             typename AccumulatorType,
             typename IndexUnaryOpType,
             typename UVectorType>
    void apply(vector<T, I, Hint, Allocator>  &w,
               MaskType                 const &mask,
               AccumulatorType                 accum,
               IndexUnaryOpType                op,
               ValueType                const &s
               UVectorType              const &u,
               OutputControlEnum               outp = MERGE);

Is this approach the best? I see no way to do binding for index unary ops and call the standard variant because. While I could find a way to bind the val parameter, the element location indices need to be provided at the call site inside the implementation of apply.

Expand introduction

Expand the introduction of the spec document.

Review use of exceptions

Scott began a section on exceptions, which I have updated.

Scott also added some notes on errors in the GraphBLAS C API, which I have moved to below:

Notes on C API errors

Informational return values

SUCCESS: not necessary with exceptions
NO_VALUE: happens in extractElement() method when providing a transparent destination scalar and no stored value. Using grb::out_of_range exception. The justification for changing informational return with exception is the consistency with STL container at() method behaviour.

API errors

UNINITIALIZED_OBJECT: not necessary with RAII philosophy
NULL_POINTER: not necessary, no pointers used in C++ API
INVALID_VALUE:
- used with incorrect enum values (not necessary in C++)
- used if a dimension is defined as 0 (like in vector or matrix construction or resize, (do we need this?)
- duplicate indices during build (do we need this?).
INVALID_INDEX: This is specifically for indices that are outside the defined dimensions (for setElement(), extractElement(), removeElement()). Does this overload with NO_VALUE?
DOMAIN_MISMATCH: not applicable in C++...if it compiles there is no mismatch.
OUTPUT_NOT_EMPTY: was used when trying to build with a container that is not empty....what should C++ do?
NOT_IMPLEMENTED: ???

Execution Errors

PANIC - do we need this?
OUT_OF_MEMORY: using std::bad_alloc... why not grb::bad_alloc? Because allocators not part of GraphBLAS spec.
INSUFFICIENT_SPACE: ???
INVALID_OBJECT: The requires a discussion of exception quarantees. I.e. what is the state of an output object when an exception is thrown during an operation it was called with? This is also complicated by non-blocking mode which is deferred for now.
INDEX_OUT_OF_BOUNDS: grb::out_of_range (This is overloaded use...for two different meanings)
EMPTY_OBJECT: (only called in reduce to scalar...does C++ API side step this)

Template arguments to Vector and Matrix

ScalarT: I think we all agree that type of values stored in Matrices and Vectors should be a template parameter
IndexT: Due to past requests and specifications from other graph/sparse libraries, we should consider template parameter for the index type.
Variadic template parameters for storage hints or graph types, requires much more work and discussion (e.g. RowMajor vs. ColumnMajor, and Directed vs. Undirected, etc)
Should context types show up in the template parameter list for distributed implementations.

Reach consensus on `read_matrix`

Provide access to local tiles in distributed matrix?

Today we discussed adding execution policies to the C++ API, as well as other facilities that might be necessary for a distributed API.

One of the issues that came up is whether---and how---to provide access to a process' local tiles in a distributed matrix.

One option would be to explicitly expose a tile grid, while another would be to expose a range of submatrices, along with the particular coordinates of the submatrix in the global matrix. The second option would support any distributed matrix distributions as long as they had rectangular, non-overlapping tiles. (The tile shapes could be non-uniform.) This could even potentially support non-rectangular tiles if we find a way to represent the coordinates of non-rectangular submatrices.

Whether to expose this, and what exactly we'd expect users to do with this, is still up for discussion.

distributed_matrix<float> m = ...;


auto local_tile = ...;

// Local multiply
grb::multiply(local_tile, ...);

// Distributed multiply
grb::multiply(m, ...);

Tile grid

0 1 0 1
2 3 2 3
0 1 0 1
2 3 2 3

auto g = m.grid_shape();

// (4, 4)
fmt::print("{}\n", g);

for (int i = 0; i < m.grid_shape()[0]; i++) {
  for (int j = 0; j < m.grid_shape()[1]; j++) {
    if (m.tile({i, j}).rank() == my_rank()) {
      algorithm(m.tile({i, j}), ...);
    }
  }
}

if (m.tile({0, 2}).rank() == my_rank()) {
  ...;
}

for (auto&& [submatrix_coordinates, tile] : m.my_local_tiles()) {
  // I know where my submatrix lies based on `submatrix_coordinates`
  algorithm(tile, ...);
}

Add NumPy-like `reshape` mechanism for converting vectors to matrices

Add a NumPy-like reshape method for converting matrix -> matrix, matrix -> vector, vector -> matrix.

Document the parallel systems we need to address and define a supporting platform model that works for GraphBLAS

We need to document the parallel systems we must be able to support with the graphBLAS. This would include:

Multi-core, multi-CPU in a shared address space. Explicit management of NUMA features of a system is critical
Single GPU ... basic Host/Device model with disjoint host/device memories and Uniform Shared Memory (USM)
Multi GPU ... Host/Device model with disjoint memories and USM
Arbitrary accelerators instead of GPUs (aside: An accelerator is restricted to a fixed API, unlike a GPU which is programmable)
Shared nothing distributed systems with nodes composed of the above

We need a platform model that appropriately abstracts systems composed of the above. It must deal with the complexity of the various memory spaces and support arbitrary, dynamic partitions of the above.

Finally, we need a way to deal with nonblocking GraphBLAS operations as part of a larger execution context that supports asynchronous execution. I will add a separate issue for this topic.

Add spec support for input/output

rgri has support for mmread and mmwrite, this should be formally supported in the spec.