metagraph-dev / mlir-graphblas Goto Github PK

View Code? Open in Web Editor NEW

15.0 5.0 6.0 3 MB

MLIR tools and dialect for GraphBLAS

Home Page: https://mlir-graphblas.readthedocs.io/en/latest/

License: Apache License 2.0

Python 37.62% Shell 0.02% C++ 36.27% CMake 0.24% MLIR 22.63% Cython 3.21%

mlir-graphblas's Introduction

MLIR dialect for GraphBLAS + Python tools

Note that this code currently requires llvm-project@bd0cae6.

graphblas-opt

In order to build graphblas-opt, run python3 build.py from mlir_graphblas/src/. This will build graphblas-opt and run the tests for graphblas-opt. The built files will be stored in mlir_graphblas/src/build.

build.py does not rebuild from scratch by default. To perform a clean build, run python3 build.py -build-clean.

Linting with clang-format

Ensure that clang-format is available (install the clang-tools conda package) and run:

./run-clang-format.py -r mlir_graphblas/src/

If changes required, can make changes in place with

./run-clang-format.py -i -r mlir_graphblas/src/

Note about Transition

mlir-graphblas is transitioning away from lowering code targeting the SCF dialect and towards lowering code targeting linalg.generic. This process is happening in tandem with changes to the sparse-tensor dialect's lowering of linalg.generic with dynamically-shaped output. As a result, mlir-graphblas is temporarily pointing at the mlir-ac conda package which is built from a branch off the LLVM project code. Once these changes are merged into the main branch on LLVM, mlir-graphblas will be updated to target that. Until then, we will be out of sync with the LLVM project.

mlir-graphblas's People

Contributors

Stargazers

Watchers

Forkers

jim22k seibert paul-tqh-nguyen eriknw vishalbelsare chelini

mlir-graphblas's Issues

GraphSearch doesn't handle terminating nodes

GraphSearch assumes that the initial seeds will always find a new node each iteration. If a node has no outgoing edges (i.e. is a terminating node), the algorithm will fail.

The B matrix is being updated in-place, but should be replaced each iteration.

Going from a vector (in the case of argmin/argmax) will be much easier using sparse_tensor.new and coordinates from that vector. The "random" cases are easy because they already return a matrix.

Duplicate passes in the explorer

Although not common, sometimes we might want to run the same pass multiple times when compiling. The duplicate named pass causes the explorer in the Sequential tab to get confused. Example:

passes = [
    '--test-sparsification=lower=1',
    '--convert-linalg-to-loops',
    '--tensor-bufferize',
    '--func-bufferize',
    '--tensor-constant-bufferize',
    '--convert-linalg-to-loops',  # duplicate
    '--finalizing-bufferize',
    '--convert-scf-to-std',
    '--convert-std-to-llvm',
]

We should privately assign each stage a unique identifier in the explorer UI, even if the pass name is duplicated.

Investigate using canonicalization passes to add graphblas.convert_layout ops when appropriate

Currently, in our lowering passes, we add graphblas.convert_layout ops when we're given a CSC matrix when the "real" lowering expects a CSR matrix.

It might be cleaner to use canonicalization passes passes to do this sort of thing.

It might also be a good idea to add tensor casts to convert fixed shaped tensors to tensors using ? in the shape. This would help with #129.

For graphblas.comment ops, it's unclear whether or not these should be handled during lowering or canonicalization.

Add documentation for graphblas.matrix_reduce_to_vector

graphblas.matrix_reduce_to_vector is entirely missing from the docs.

Convert syntax-verifying FileCheck tests to behavior-verifying FileCheck tests

7fd2602 introduced the first FileCheck tests that verify the behavior of the transformed code rather than solely the syntactic output of the code transformations.

Since we care mostly about the behavior of the transformed code and because maintaining tests that are brittle against minor changes to the transformed code is a burden we don't care for, we should convert all of our current FileCheck tests to test behavior instead of syntax.

We should also consider moving some of the tests in test_mlir_builder.py to behavior-verifying FileCheck tests as well since many of them are testing the behavior of ops from the GraphBLAS dialect rather than the IR builder.

Add FileCheck tests for the lowering of graphblas.matrix_reduce_to_vector

There are several tests with names matching lower_* here. There's not one corresponding to graphblas.matrix_reduce_to_vector. We should add one.

Make the graphblas.matrix_select op correctly handle CSC inputs

As pointed out here, when given a CSC input tensor, the current implementation of graphblas.matrix_select only works correctly when the selector attribute is "gt0". The result for "triu" and "tril" will swap.

Support tensor<?x?xindex, #ENCODING> in GraphBLAS dialect

tensor<?x?xindex> is a legitimate type.

None of the GraphBLAS dialect's ops can lower a tensor of this type currently.

Make sure to update the MLIR tests as well, e.g. the one in mlir_graphblas/src/test/GraphBLAS/lower_matrix_reduce_to_scalar.mlir mentioning @matrix_reduce_to_scalar_all_types.

NOTE: Lowering these ops with this tensor type as an operand currently results in a particularly not helpful error message. I think the solution (at least for graphblas.matrix_reduce_to_scalar) is to expand the TypeSwitch statements run on the tensor's element type.

"first" and "second" multiply pieces of semirings are broken

Attempting to compile using "min_first" will break due to the implementation of "first".

It seems to be due to the fact that we are directly yielding one of the input arguments.

^bb0(%arg2: f64, %arg3: f64):
    graphblas.yield mult %arg2 : f64

If I put in a pointless operation, it works

^bb0(%arg2: f64, %arg3: f64):
    %zero = constant 0.0 : f64
    %no_change = addf %arg2, %zero : f64
    graphblas.yield mult %no_change : f64

So it's not the fact that we ignore arg3. It seems to be an issue with a stand-alone yield statement.

Here is the crash report that is generated:

 1PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
 2Stack dump:
 30.	Program arguments: /Users/jkitchen/Projects/HIVE/mlir-graphblas/mlir_graphblas/src/build/bin/graphblas-opt --mlir-print-debuginfo --graphblas-lower
 4Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
 50  graphblas-opt            0x000000010b97eb77 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 39
 61  graphblas-opt            0x000000010b97d8f8 llvm::sys::RunSignalHandlers() + 248
 72  graphblas-opt            0x000000010b97f1b0 SignalHandler(int) + 272
 83  libsystem_platform.dylib 0x00007fff20458d7d _sigtramp + 29
 94  libsystem_platform.dylib 0x00007ffee51f0c00 _sigtramp + 18446744072717172384
105  graphblas-opt            0x000000010b3994f5 computeInnerProduct(mlir::PatternRewriter&, mlir::Value, mlir::Value, mlir::Value, mlir::Value, mlir::Value, mlir::Value, mlir::Value, mlir::Value, mlir::Value, mlir::Value, mlir::Value, mlir::Type, ExtensionBlocks, mlir::Value, mlir::Value, mlir::Value) + 5125
116  graphblas-opt            0x000000010b3664c2 (anonymous namespace)::LowerMatrixMultiplyGenericRewrite::rewriteMatrixMatrixMultiplication(mlir::graphblas::MatrixMultiplyGenericOp, mlir::PatternRewriter&, ExtensionBlocks) const + 9410
127  graphblas-opt            0x000000010b363bc0 (anonymous namespace)::LowerMatrixMultiplyGenericRewrite::matchAndRewrite(mlir::graphblas::MatrixMultiplyGenericOp, mlir::PatternRewriter&) const + 640
138  graphblas-opt            0x000000010b3638ce mlir::detail::OpOrInterfaceRewritePatternBase<mlir::graphblas::MatrixMultiplyGenericOp>::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&) const + 62
149  graphblas-opt            0x000000010b7c57b8 mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref<bool (mlir::Pattern const&)>, llvm::function_ref<void (mlir::Pattern const&)>, llvm::function_ref<mlir::LogicalResult (mlir::Pattern const&)>) + 1416
1510 graphblas-opt            0x000000010b752aa3 mlir::applyPatternsAndFoldGreedily(llvm::MutableArrayRef<mlir::Region>, mlir::FrozenRewritePatternSet const&, mlir::GreedyRewriteConfig) + 2067
1611 graphblas-opt            0x000000010b32c522 mlir::applyPatternsAndFoldGreedily(mlir::Operation*, mlir::FrozenRewritePatternSet const&, mlir::GreedyRewriteConfig) + 66
1712 graphblas-opt            0x000000010b32b868 (anonymous namespace)::GraphBLASLoweringPass::runOnOperation() + 328
1813 graphblas-opt            0x000000010b7cd607 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 519
1914 graphblas-opt            0x000000010b7cdb46 mlir::detail::OpToOpPassAdaptor::runPipeline(llvm::iterator_range<llvm::pointee_iterator<std::__1::unique_ptr<mlir::Pass, std::__1::default_delete<mlir::Pass> >*, mlir::Pass> >, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 134
2015 graphblas-opt            0x000000010b7cf2b1 mlir::PassManager::run(mlir::Operation*) + 625
2116 graphblas-opt            0x000000010b60c380 performActions(llvm::raw_ostream&, bool, bool, llvm::SourceMgr&, mlir::MLIRContext*, mlir::PassPipelineCLParser const&) + 528
2217 graphblas-opt            0x000000010b609f20 processBuffer(llvm::raw_ostream&, std::__1::unique_ptr<llvm::MemoryBuffer, std::__1::default_delete<llvm::MemoryBuffer> >, bool, bool, bool, bool, mlir::PassPipelineCLParser const&, mlir::DialectRegistry&) + 608
2318 graphblas-opt            0x000000010b60aaab mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&, bool) + 2651
2419 graphblas-opt            0x000000010aa0e91a main + 122
2520 libdyld.dylib            0x00007fff2042ef5d start + 1
2621 libdyld.dylib            0x0000000000000003 start + 18446603339974906023

graphblas.matrix_multiply with a mask that has an empty row gets a seg fault when mask_complement = true

This script gets a seg fault, which is unexpected.

import mlir_graphblas
import numpy as np
from mlir_graphblas.tools.utils import sparsify_array

engine = mlir_graphblas.MlirJitEngine()

passes = [
    "--graphblas-lower",
    "--sparsification",
    "--sparse-tensor-conversion",
    "--linalg-bufferize",
    "--func-bufferize",
    "--tensor-bufferize",
    "--tensor-constant-bufferize",
    "--finalizing-bufferize",
    "--convert-linalg-to-loops",
    "--convert-scf-to-std",
    "--convert-memref-to-llvm",
    "--convert-std-to-llvm",
]
mlir_text = """
#CSR64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (i,j)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (j,i)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

module {
    func @matrix_multiply_plus_plus_mask_complement(%a: tensor<?x?xf64, #CSR64>, %b: tensor<?x?xf64, #CSR64>, %m: tensor<?x?xf64, #CSR64>) -> tensor<?x?xf64, #CSR64> {
        %b_csc = graphblas.convert_layout %b : tensor<?x?xf64, #CSR64> to tensor<?x?xf64, #CSC64>
        %answer = graphblas.matrix_multiply %a, %b_csc, %m { semiring = "plus_plus", mask_complement = true } : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>, tensor<?x?xf64, #CSR64>) to tensor<?x?xf64, #CSR64>
        return %answer : tensor<?x?xf64, #CSR64>
    }
}
"""

engine.add(mlir_text, passes)

A_dense = np.array(
    [
        [1, 0, 0, 0],
        [2, 0, 3, 4],
        [0, 0, 0, 0],
        [0, 0, 5, 6],
    ],
    dtype=np.float64
)
A = sparsify_array(A_dense, [False, True])

B_dense = np.array(
    [
        [0, 7, 0, 7],
        [0, 1, 0, 0],
        [0, 1, 7, 0],
        [0, 7, 2, 0],
    ],
    dtype=np.float64
)
B = sparsify_array(B_dense, [False, True])

mask = sparsify_array(
    np.array(
        [
            [0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
        ],
        dtype=np.float64
    ),
    [False, True]
)

engine.matrix_multiply_plus_plus_mask_complement(A, B, mask)

print("We never reach this line.")

Execution:

(mlirgraphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas$ python3 test.py
Using development graphblas-opt: /Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src/build/bin/graphblas-opt
Segmentation fault: 11
(mlirgraphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas$

Support the aggregator "count" in graphblas.matrix_reduce_to_scalar

Since we intend that count be supported for graphblas.matrix_reduce_to_vector, we should do the same for graphblas.matrix_reduce_to_scalar. It might be as simple as replacing the op with graphblas.num_vals (in addition to updating the docs, adding FileCheck tests, and adding Python tests).

Need more helpful error message than "Result element type differs from the input element types." in certain cases

The code below is incorrect because it expects a tensor result from a vector-vector multiply. It results in a confusing error message that's not always helpful.


(mlirgraphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src$ cat /tmp/example.mlir

#CSR64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (i,j)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (j,i)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

#CV64 = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed" ],
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

func @matrix_multiply_plus_times(%vecA: tensor<?xf64, #CV64>, %vecB: tensor<?xf64, #CV64>, %mask: tensor<?xf64, #CV64>) -> tensor<?xf64, #CV64> {
    %answer = graphblas.matrix_multiply %vecA, %vecB, %mask { semiring = "any_pair" } : (tensor<?xf64, #CV64>, tensor<?xf64, #CV64>, tensor<?xf64, #CV64>) to tensor<?xf64, #CV64>
    return %answer : tensor<?xf64, #CV64>
}
(mlirgraphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src$ cat /tmp/example.mlir | ./build/bin/graphblas-opt 
<stdin>:23:15: error: Result element type differs from the input element types.
    %answer = graphblas.matrix_multiply %vecA, %vecB, %mask { semiring = "any_pair" } : (tensor<?xf64, #CV64>, tensor<?xf64, #CV64>, tensor<?xf64, #CV64>) to tensor<?xf64, #CV64>
              ^
<stdin>:23:15: note: see current operation: %0 = "graphblas.matrix_multiply"(%arg0, %arg1, %arg2) {semiring = "any_pair"} : (tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ], pointerBitWidth = 64, indexBitWidth = 64 }>>, tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ], pointerBitWidth = 64, indexBitWidth = 64 }>>, tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ], pointerBitWidth = 64, indexBitWidth = 64 }>>) -> tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ], pointerBitWidth = 64, indexBitWidth = 64 }>>
(mlirgraphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src$

We should make sure there's a FileCheck test for whatever better error message we come up with.

Add tests and documentation for graphblas.print

#183 implements graphblas.print. It needs tests as well as documentation.

Create a graphblas.matrix_reduce_to_vector op

We'll need a graphblas.matrix_reduce_to_vector to do things like sum along the rows of a matrix, e.g.

%vector = graphblas.matrix_reduce_to_vector %matrix { aggregator = "sum", axis=1 } : tensor<?x?xi32, #CSR64>

The axis attribute is similar to the axis parameter in np.sum.

Misc. Notes:

In situations where we're summing rows (i.e. using axis = 1) on a CSR matrix, things run quickly. If we're doing so on a CSC matrix, we should first replace the op with a call to graphblas.transpose and then do the reduction.

TODO:

Create a graphblas.matrix_reduce_to_vector op.
Create a separate graphblas.matrix_reduce_to_vector_generic op.

Click here to see a guiding example of what it might lower to.

#CSR64 = #sparse_tensor.encoding<{ 
  dimLevelType = [ "dense", "compressed" ], 
  dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, 
  pointerBitWidth = 64, 
  indexBitWidth = 64 
}>

#SparseVec64 = #sparse_tensor.encoding<{ 
  dimLevelType = [ "compressed" ], 
  pointerBitWidth = 64, 
  indexBitWidth = 64 
}>

module  {
func private @vector_empty(tensor<?x?xi32, #CSR64>, index) -> tensor<?xi32, #SparseVec64>
func private @vector_resize_dim(tensor<?xi32, #SparseVec64>, index, index) -> ()
func private @vector_resize_pointers(tensor<?xi32, #SparseVec64>, index, index) -> ()
func private @vector_resize_index(tensor<?xi32, #SparseVec64>, index, index) -> ()
func private @vector_resize_values(tensor<?xi32, #SparseVec64>, index) -> ()

func @main_func(%matrix: tensor<?x?xi32, #CSR64>) -> tensor<?xi32, #SparseVec64> {
  %c0_i32 = constant 0 : i32

  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c2 = constant 2 : index

  %nrows = tensor.dim %matrix, %c0 : tensor<?x?xi32, #CSR64>
  
  %matrix_pointers = sparse_tensor.pointers %matrix, %c1 : tensor<?x?xi32, #CSR64> to memref<?xi64>
  %matrix_values = sparse_tensor.values %matrix : tensor<?x?xi32, #CSR64> to memref<?xi32>

  %output = call @vector_empty(%matrix, %c1) : (tensor<?x?xi32, #CSR64>, index) -> tensor<?xi32, #SparseVec64>
  call @vector_resize_dim(%output, %c0, %nrows) : (tensor<?xi32, #SparseVec64>, index, index) -> ()

  %output_nnz = scf.for %matrix_row_index = %c0 to %nrows 
      step %c1 
      iter_args(%num_non_empty_rows = %c0) -> (index) {
    %next_matrix_row_index = addi %matrix_row_index, %c1 : index
    %first_ptr_64 = memref.load %matrix_pointers[%matrix_row_index] : memref<?xi64>
    %second_ptr_64 = memref.load %matrix_pointers[%next_matrix_row_index] : memref<?xi64>
    %row_is_empty = std.cmpi "eq", %first_ptr_64, %second_ptr_64 : i64
    %updated_num_non_empty_rows = scf.if %row_is_empty -> (index) {
      scf.yield %num_non_empty_rows : index
    } else {
      %incremented_num_non_empty_rows = addi %num_non_empty_rows, %c1 : index
      scf.yield %incremented_num_non_empty_rows : index
    }
    scf.yield %updated_num_non_empty_rows : index
  }
  call @vector_resize_pointers(%output, %c0, %c2) : (tensor<?xi32, #SparseVec64>, index, index) -> ()
  call @vector_resize_index(%output, %c0, %output_nnz) : (tensor<?xi32, #SparseVec64>, index, index) -> ()
  call @vector_resize_values(%output, %output_nnz) : (tensor<?xi32, #SparseVec64>, index) -> ()

  %output_pointers = sparse_tensor.pointers %output, %c0 : tensor<?xi32, #SparseVec64> to memref<?xi64>
  %output_nnz_i64 = index_cast %output_nnz : index to i64
  memref.store %output_nnz_i64, %output_pointers[%c1] : memref<?xi64>

  %output_indices = sparse_tensor.indices %output, %c0 : tensor<?xi32, #SparseVec64> to memref<?xi64>
  %output_values = sparse_tensor.values %output : tensor<?xi32, #SparseVec64> to memref<?xi32>

  scf.for %row_index = %c0 to %nrows
      step %c1 
      iter_args(%output_values_position = %c0) -> (index) {
    %ptr_64 = memref.load %matrix_pointers[%row_index] : memref<?xi64>
    %ptr = index_cast %ptr_64 : i64 to index
    %next_row_index = addi %row_index, %c1 : index
    %next_ptr_64 = memref.load %matrix_pointers[%next_row_index] : memref<?xi64>
    %next_ptr = index_cast %next_ptr_64 : i64 to index

    %row_is_non_empty = std.cmpi "ne", %ptr_64, %next_ptr_64 : i64
    %next_output_values_position = scf.if %row_is_non_empty -> (index) {
      
      %row_sum = scf.for %current_ptr = %ptr to %next_ptr
            step %c1 
            iter_args(%current_sum = %c0_i32) -> (i32) {
        %row_value = memref.load %matrix_values[%current_ptr] : memref<?xi32>
        %updated_sum = addi %row_value, %current_sum : i32
        scf.yield %updated_sum : i32
      }
      
      memref.store %row_sum, %output_values[%output_values_position] : memref<?xi32>
      %row_index_i64 = index_cast %row_index : index to i64
      memref.store %row_index_i64, %output_indices[%output_values_position] : memref<?xi64>
      
      %updated_output_values_position = addi %output_values_position, %c1 : index
      scf.yield %updated_output_values_position : index
      
    } else {
      scf.yield %output_values_position : index
    }
    
    scf.yield %next_output_values_position : index
    
  }

  return %output : tensor<?xi32, #SparseVec64>
}
}

Automatically generate test cases for GraphBLAS MLIR dialect

All of our GraphBLAS dialect tests are currently run through FileCheck. These test cases are limited to the ones in the mlir_graphblas/src/test/GraphBLAS/*.mlir files.

Instead of only testing cases we explicitly write by hand, we'd benefit from a way of automatically generating test cases, e.g. testing that an op works with all tensor element types (to do this currently, we'd have to explicitly write a bunch of boilerplate tests that explicitly use all the possible element subtypes; it'd be nice to write one test that uses a for-loop).

We might be able to do this by changing the contents of mlir-graphblas/mlir_graphblas/src/test/lit.cfg.py.

Add `while` and `if` to `MLIRFunctionBuilder`

MLIRFunctionBuilder contains methods for writing a scf.for loop. Do the same for scf.while and scf.if.

Then update existing algorithms and replace the irb.add_statement() lines.

Sparsity safety checking with mlir_graphblas.sparse_utils.MLIRSparseTensor

Let's say we compile this MLIR code via the JIT engine:

#trait_sum_reduction = {
  indexing_maps = [
    affine_map<(i,j,k) -> (i,j,k)>,
    affine_map<(i,j,k) -> ()>
  ],
  sparse = [
    [ "S", "S", "S" ],
    [ ]
  ],
  iterator_types = ["reduction", "reduction", "reduction"],
  doc = "Sparse Tensor Sum"
}

!SparseTensor = type !llvm.ptr<i8>

func @sparse_sum(%argA: !SparseTensor) -> f32 {
  %output_storage = constant dense<0.0> : tensor<f32>
  %arga = linalg.sparse_tensor %argA : !SparseTensor to tensor<10x20x30xf32>
  %reduction = linalg.generic #trait_sum_reduction
     ins(%arga: tensor<10x20x30xf32>)
    outs(%output_storage: tensor<f32>) {
      ^bb(%a: f32, %x: f32):
        %0 = addf %x, %a : f32
        linalg.yield %0 : f32
  } -> tensor<f32>
  %answer = tensor.extract %reduction[] : tensor<f32>
  return %answer : f32
}

Let's say we generated an instance of mlir_graphblas.sparse_utils.MLIRSparseTensor with an incompatible sparsity layout, i.e. isn't sparse in all dimensions.

This will likely lead to problems down the line.

We can and should detect this when the JIT engine encodes our inputs to something callable by ctypes.

Returning the same sparse tensor multiple times in IR builder causes errors.

good.py

from mlir_graphblas import MlirJitEngine
from mlir_graphblas.mlir_builder import MLIRFunctionBuilder, GRAPHBLAS_PASSES
from mlir_graphblas.algorithms import _build_common_aliases

import numpy as np

engine = MlirJitEngine()

irb = MLIRFunctionBuilder(
    "good_func",
    input_types=[],
    return_types=["tensor<?xi64, #CV64>"],
    aliases=_build_common_aliases(),
)
 
c0 = irb.constant(0, "index")
c1 = irb.constant(1, "index")
c8 = irb.constant(8, "index")
c3_i64 = irb.constant(3, "i64")

vec = irb.util.new_sparse_tensor("tensor<?xi64, #CV64>", c8)
vec_ptr8 = irb.util.tensor_to_ptr8(vec)
irb.util.resize_sparse_index(vec_ptr8, c0, c1)
irb.util.resize_sparse_values(vec_ptr8, c1)

vec_indices = irb.sparse_tensor.indices(vec, c0)
vec_values = irb.sparse_tensor.values(vec)
irb.memref.store(c3_i64, vec_indices, c0)
irb.memref.store(c3_i64, vec_values, c0)

irb.return_vars(vec)

good_func = irb.compile(engine=engine, passes=GRAPHBLAS_PASSES)

ans = good_func()

print()
print(f"ans.pointers {repr(ans.pointers)}")
print(f"ans.indices {repr(ans.indices)}")
print(f"ans.values {repr(ans.values)}")
print(f"ans.toarray() {repr(ans.toarray())}")
print()

bad.py

from mlir_graphblas import MlirJitEngine
from mlir_graphblas.mlir_builder import MLIRFunctionBuilder, GRAPHBLAS_PASSES
from mlir_graphblas.algorithms import _build_common_aliases

import numpy as np

engine = MlirJitEngine()

irb = MLIRFunctionBuilder(
    "bad_func",
    input_types=[],
    return_types=["tensor<?xi64, #CV64>", "tensor<?xi64, #CV64>"],
    aliases=_build_common_aliases(),
)
 
c0 = irb.constant(0, "index")
c1 = irb.constant(1, "index")
c8 = irb.constant(8, "index")
c3_i64 = irb.constant(3, "i64")

vec = irb.util.new_sparse_tensor("tensor<?xi64, #CV64>", c8)
vec_ptr8 = irb.util.tensor_to_ptr8(vec)
irb.util.resize_sparse_index(vec_ptr8, c0, c1)
irb.util.resize_sparse_values(vec_ptr8, c1)

vec_indices = irb.sparse_tensor.indices(vec, c0)
vec_values = irb.sparse_tensor.values(vec)
irb.memref.store(c3_i64, vec_indices, c0)
irb.memref.store(c3_i64, vec_values, c0)

irb.return_vars(vec, vec)

bad_func = irb.compile(engine=engine, passes=GRAPHBLAS_PASSES)

first_vec, second_vec = bad_func()

print()
print(f"first_vec.pointers {repr(first_vec.pointers)}")
print(f"first_vec.indices {repr(first_vec.indices)}")
print(f"first_vec.values {repr(first_vec.values)}")
print(f"first_vec.toarray() {repr(first_vec.toarray())}")
print()
print(f"second_vec.pointers {repr(second_vec.pointers)}")
print(f"second_vec.indices {repr(second_vec.indices)}")
print(f"second_vec.values {repr(second_vec.values)}")
print(f"second_vec.toarray() {repr(second_vec.toarray())}")
print()

Results

(mlir_graphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas$ python3 /tmp/good.py 
Using development graphblas-opt: /Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src/build/bin/graphblas-opt

ans.pointers (array([0, 0], dtype=uint64),)
ans.indices (array([3], dtype=uint64),)
ans.values array([3])
ans.toarray() array([0, 0, 0, 3, 0, 0, 0, 0])

(mlir_graphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas$ python3 /tmp/bad.py 
Using development graphblas-opt: /Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src/build/bin/graphblas-opt

first_vec.pointers (array([0, 0], dtype=uint64),)
first_vec.indices (array([3], dtype=uint64),)
first_vec.values array([3])
first_vec.toarray() array([0, 0, 0, 3, 0, 0, 0, 0])

second_vec.pointers (array([0, 0], dtype=uint64),)
second_vec.indices (array([3], dtype=uint64),)
second_vec.values array([3])
second_vec.toarray() array([0, 0, 0, 3, 0, 0, 0, 0])

python3(28303,0x10dbd5e00) malloc: *** error for object 0x7f949622c3c0: pointer being freed was not allocated
python3(28303,0x10dbd5e00) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6
(mlir_graphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas$

Both of these scripts run the same MLIR code. The only difference is that bad.py returns the same sparse vector twice. This causes an error on the Python-side for some reason. Not sure if this is a bug or if I'm doing something wrong.

CC @eriknw @jim22k

JIT Engine has different behavior when compiling functions individually vs in one combined string.

I've attached a notebook demonstrating the issue. See notebook.tar.gz.

This is probably just a subtle issue where I'm writing the MLIR incorrectly, but I figured this issue would be worth mentioning.

Here's a demonstration.

import numpy as np
import mlir_graphblas
from mlir_graphblas.sparse_utils import MLIRSparseTensor

engine_good = mlir_graphblas.MlirJitEngine()
engine_bad = mlir_graphblas.MlirJitEngine()
STANDARD_PASSES = [
    "--sparsification",
    "--sparse-tensor-conversion",
    "--linalg-bufferize",
    "--func-bufferize",
    "--tensor-bufferize",
    "--tensor-constant-bufferize",
    "--finalizing-bufferize",
    "--convert-linalg-to-loops",
    "--convert-scf-to-std",
    "--convert-std-to-llvm",
]

csr_densify_mlir_text = """
#trait_densify_csr = {
  indexing_maps = [
    affine_map<(i,j) -> (i,j)>,
    affine_map<(i,j) -> (i,j)>
  ],
  iterator_types = ["parallel", "parallel"]
}

#CSR64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (i,j)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

func @csr_densify2x2(%argA: tensor<2x2xf64, #CSR64>) -> tensor<2x2xf64> {
  %output_storage = constant dense<0.0> : tensor<2x2xf64>
  %0 = linalg.generic #trait_densify_csr
    ins(%argA: tensor<2x2xf64, #CSR64>)
    outs(%output_storage: tensor<2x2xf64>) {
      ^bb(%A: f64, %x: f64):
        linalg.yield %A : f64
    } -> tensor<2x2xf64>
  return %0 : tensor<2x2xf64>
}
"""

csc_densify_mlir_text = """
#trait_densify_csc = {
  indexing_maps = [
    affine_map<(i,j) -> (j,i)>,
    affine_map<(i,j) -> (i,j)>
  ],
  iterator_types = ["parallel", "parallel"]
}

#CSC64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (j,i)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

func @csc_densify2x2(%argA: tensor<2x2xf64, #CSC64>) -> tensor<2x2xf64> {
  %output_storage = constant dense<0.0> : tensor<2x2xf64>
  %0 = linalg.generic #trait_densify_csc
    ins(%argA: tensor<2x2xf64, #CSC64>)
    outs(%output_storage: tensor<2x2xf64>) {
      ^bb(%A: f64, %x: f64):
        linalg.yield %A : f64
    } -> tensor<2x2xf64>
  return %0 : tensor<2x2xf64>
}
"""

indices = np.array(
    [
        [0, 1],
        [1, 0],
    ],
    dtype=np.uint64,
)
values = np.array([50,100], dtype=np.float64)
sizes = np.array([2, 2], dtype=np.uint64)
sparsity = np.array([False, True], dtype=np.bool8)
a = MLIRSparseTensor(indices, values, sizes, sparsity)

Working

engine_good.add(csr_densify_mlir_text, STANDARD_PASSES)

['csr_densify2x2']

engine_good.add(csc_densify_mlir_text, STANDARD_PASSES)

['csc_densify2x2']

engine_good.csr_densify2x2(a)

array([[  0.,  50.],
       [100.,   0.]])

Not Working

engine_bad.add(csr_densify_mlir_text+csc_densify_mlir_text, STANDARD_PASSES)

['csr_densify2x2', 'csc_densify2x2']

engine_bad.csr_densify2x2(a)

array([[  0., 100.],
       [ 50.,   0.]])

Turn our TableGen file into documentation?

I've noticed in recent PRs that I've had to update the docs as I've changed ops for the GraphBLAS dialect. A lot of the time, I'm simply copying and pasting from the TableGen file. I'm wondering if there's some magical way to have our docs use the TableGen file to generate documentation.

Dimension size ordering for CSC

We currently store the dimension sizes for both CSR and CSC as (nrows, ncols).
This makes it easy to get nrows. We always call memref.dim %tensor, %c0.

However, this storage scheme complicates finding nvals because the size of the pointers array depends on whether the storage fomat is CSR or CSC. For CSR, len(pointers) == nrows + 1 while for CSC, len(pointers) == ncols + 1.

Normally, this would be an irrelevant choice, but it seems that linalg requires storing the dimension sizes as (ncols, nrows) for CSC format. Attempting to densify a 3x4 CSC matrix using linalg.generic requires defining the input as tensor<4x3xf64, #CSC64> because the iteration order is (i, j) -> (j, i).

Defining the input as tensor<?x?xf64, #CSC64> avoids this issue and Aart's lowering pass does the right thing. I don't know if this means Aart ignores the dimensions in the input CSC or if he assumes they are always (nrows, ncols). But linalg definitely wants the dimensions in reversed order.

We should dig into this to see if this will be a problem going forward as we will hopefully be using linalg.generic a lot more in the future and don't want to be bitten by this after a lot of code is already written with assumptions baked in.

Matrix select random needs trimming

matrix_select_random isn't trimming the indices and values arrays properly. The tests are currently passing because densify_csr reads nnz from the pointers and then only reads that many positions from indices and `values. However, there is extra garbage left over in those arrays. To have a proper CSR format, we should trim those arrays.

So for example, the result might be a sparse CSR matrix that looks like:
pointers = [0, 1, 2] # last number is always nnz
indices = [1, 1, 0, 0, 1] # the last 3 shouldn't be here
values = [10, 20, 10, 20, 15] # the last 3 shouldn't be here

Calling resizeIndex and resizeValues at the end of the function should fix this.

Improve formatting of printed matrices

#227 adds support in graphblas.print to print dense, CSR, CSC, and CV tensors. When printing matrices, it would be nice if we could print them in such a way that all the values of a column line up vertically (similar to how NumPy prints matrices).

Question on NumRowsOp and NumColsOp

Hi All,
@seibert @paul-tqh-nguyen @jim22k @eriknw

I have a question regarding NumRowsOp and NumColsOp. To my understanding, both operations require a matrix, but NumRowOps verifier also checks that the passed tensor has the sparseEncodingAttribute. Why this behaviour? I think the operations should work even without the attribute. If the attribute is mandatory, why NumColsOp does not check it?

Thanks!

mlir-graphblas/mlir_graphblas/src/lib/GraphBLAS/GraphBLASOps.cpp

Line 126 in f387df6

static LogicalResult verify(NumRowsOp op) {

Add MLIRSparseTensor.toarray()

As pointed out here, having to include "densify" functions when using the JIT engine is brittle.

It'd be much nicer to have a method on the MLIRSparseTensor class that returned a dense NumPy array.

It'd be a good idea to update our documentation and tests that include a "densify" function as well to use this method to make them less verbose.

Update TableGen documentation for graphblas.matrix_multiply_generic

graphblas.matrix_multiply_generic currently has incomplete documentation.

def GraphBLAS_MatrixMultiplyGenericOp : GraphBLAS_Op<"matrix_multiply_generic", [NoSideEffect]> {
    let summary = "generic matrix multiply operation with an optional structural mask";
    let description = [{
        Performs a matrix multiply according to the given semiring and optional
        structural mask.  The given sparse tensors must be a matrix, i.e. have rank 2.
        The first input tensors must be CSR format, while the second input tensor must
        be CSC format.  The mask (if provided) must be CSR format.
    }];

    let arguments = (ins
      GraphBlasMatrixOrVectorOperand:$a,
      GraphBlasMatrixOrVectorOperand:$b,
      Optional<GraphBlasMatrixOrVectorOperand>:$mask,
      BoolAttr:$mask_complement);
    let results = (outs AnyType:$output);
    let regions = (region VariadicRegion<SizedRegion<1>>:$extensions);
    
    let assemblyFormat = [{
           $a `,` $b (`,` $mask^)? attr-dict `:` `(` type($a) `,` type($b)  (`,` type($mask)^)? `)` `to` type($output) $extensions
    }];

    let verifier = [{ return ::verify(*this); }];
}

We should update it and also update our overall documentation to be consistent with it.

mlir_graphblas.tools.utils.sparsify_array can't create an empty sparse array of

I have this script (named test.py):

import mlir_graphblas
import numpy as np
from mlir_graphblas.tools.utils import sparsify_array

dense_array = np.zeros(25, dtype=np.float64).reshape([5,5])

sparse_array = sparsify_array(dense_array, [False, True])

It gets this result:

(mlirgraphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas$ python3 test.py
Using development graphblas-opt: /Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src/build/bin/graphblas-opt
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    sparse_array = sparsify_array(dense_array, [False, True])
  File "/Users/pnguyen/code/mlir-graphblas/mlir_graphblas/tools/utils.py", line 24, in sparsify_array
    sparse_tensor = MLIRSparseTensor(indices, values, sizes, sparsity)
  File "mlir_graphblas/sparse_utils.pyx", line 520, in mlir_graphblas.sparse_utils.MLIRSparseTensor.__init__
    _build_sparse_tensor(self, indices, values, sizes, sparsity, pointer_type)
  File "mlir_graphblas/sparse_utils.pyx", line 725, in mlir_graphblas.sparse_utils._build_sparse_tensor
    raise ValueError(
ValueError: Second dimension of indices array must match length of sparsity and sizes arrays: 1 != 2
(mlirgraphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas$

Move mlir_graphblas.tests.jit_engine_test_utils.sparsify_array into mlir_graphblas.tools.utils

The sparsify_array function in mlir_graphblas.tests.jit_engine_test_utils is quite useful outside of testing.

We should move it into a non-testing Python module. I think mlir_graphblas.tools.utils would be a good place to put it.

Update JIT Engine Example Documentation

Since we've updated our version of MLIR, the JIT engine docs are out of date.

Support the graphblas.throw op

It'd be nice to do some runtime checks in our MLIR code.

For example, if we have a function taking in a tensor<?x?xf64, #CSR64> where the function assumes the input is square, it'd be nice to actually check that at runtime.
Another example is when we pass in an int32 tensor when the function expects a tensor<?x?xf64, #CSR64>. We'd just get a segfault and lose a lot of time debugging.

We could solve these problems with safety checks that use exceptions. Since I was unable to find any MLIR-specific exception throwing functionality, I think the solution is to implement our own graphblas.throw.

graphblas.throw would lower down to a call to an external C/C++ function that just throws some exception. The LLVM dialect supports global strings that we could use for error messages. Alternatively, we could make the error message a compile-time constant/op attribute.
Since sparse tensor support currently boils down to a bunch of function calls to external C/C++ functions, we're not doing anything less clean than what we're already doing.

Argument to open explorer to specific page

When making example notebooks, it would be handy to have optional arguments to the explore() method that would select the tab and pass to show initially. This is helpful for demo notebooks where we might want to jump to a specific part of the translation process in an embedded explorer without having to tell the user where to click.

Pagerank crash

There seems to be an intermittent segfault in the pagerank test. So far I can only reproduce it on Linux with:

pytest -v --forked -k test_pagerank

(--forked comes from the pytest-forked plugin, which allows pytest to run each test in a sub-process for better isolation.)
I get no useful output besides:

==================================================================== test session starts ====================================================================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /home/sseibert/miniconda3/envs/mgng/bin/python
cachedir: .pytest_cache
rootdir: /home/sseibert/repo/mlir-graphblas, configfile: setup.cfg, testpaths: mlir_graphblas/tests
plugins: forked-1.3.0, cov-2.12.1
collected 178 items / 177 deselected / 1 selected

mlir_graphblas/tests/test_algorithms.py::test_pagerank Fatal Python error: Segmentation fault

Current thread 0x00007f0a3f9af740 (most recent call first):
  File "/home/sseibert/repo/mlir-graphblas/mlir_graphblas/engine.py", line 980 in python_callable
  File "/home/sseibert/repo/mlir-graphblas/mlir_graphblas/algorithms.py", line 470 in pagerank
  File "/home/sseibert/repo/mlir-graphblas/mlir_graphblas/tests/test_algorithms.py", line 160 in test_pagerank
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/python.py", line 1641 in runtest
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/runner.py", line 255 in <lambda>
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/runner.py", line 311 in from_call
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/runner.py", line 254 in call_runtest_hook
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/runner.py", line 215 in call_and_report
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/runner.py", line 126 in runtestprotocol
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pytest_forked/__init__.py", line 62 in runforked
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/py/_process/forkedfunc.py", line 65 in _child
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/py/_process/forkedfunc.py", line 50 in __init__
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pytest_forked/__init__.py", line 67 in forked_run_report
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pytest_forked/__init__.py", line 46 in pytest_runtest_protocol
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/main.py", line 323 in _main
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/main.py", line 269 in wrap_session
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/config/__init__.py", line 162 in main
  File "/home/sseibert/miniconda3/envs/mgng/lib/python3.8/site-packages/_pytest/config/__init__.py", line 185 in console_main
  File "/home/sseibert/miniconda3/envs/mgng/bin/pytest", line 11 in <module>
FAILED                                                                                         [100%]

========================================================================= FAILURES ==========================================================================
_______________________________________________________________________ test_pagerank _______________________________________________________________________
:-1: running the test CRASHED with signal 11

Reformat C++ to match MLIR style

As per this comment: #121 (comment)

We can use clang-format to format our C++ in the style used by MLIR. We should do this once to bring everything up to spec, and then verify formatting in CI, just as we do with black for Python.

clang-format is in the clang-tools conda package.

Update to newer MLIR

We're currently stuck at llvm/llvm-project@3a2ff98 because of later changes to sparse tensor storage and handling of CSC matrices. Once we can update again, we can remove builtin from func and module. These changes were made as part of #149, but had to be reverted by #161 temporarily once we understood the cause of the subsequent failures more clearly.

The change that seems to trigger the problem is https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp#L233-L261 which automatically permutes the indices before calling sparseDimSize. I'm assuming there is a corresponding change to SparseUtils.cpp that we need to incorporate for this to work, because without that change, it appears that row and column sizes are backwards for CSC matrices.

Once we are ready fix this issue (likely first week of October), we can revert this commit:

04e4779

cc: @chelini

sparsify_array(np.arange(10, dtype=np.int16), [True]).values returns inconsistent results

It's unclear if this is a bug or if this is just bad practice on my part.

I expect this

    a = sparsify_array(np.arange(10, dtype=np.int16), [True])
    print(a.values)

to have the same results as this

    print(sparsify_array(np.arange(10, dtype=np.int16), [True]).values)

That's not the case as demonstrated by the script below.

Click here to see the script.

import numpy as np

from mlir_graphblas.tests.jit_engine_test_utils import sparsify_array

np.set_printoptions(linewidth=float('inf'))

print()

print("Expected:")
for _ in range(10):
  a = sparsify_array(np.arange(10, dtype=np.int16), [True])
  print(a.values)

print()

print("Unexpected:")
for _ in range(10):
  print(sparsify_array(np.arange(10, dtype=np.int16), [True]).values)

print()

Click here to see the script's result.


(mlirgraphblas) pnguyen@CONDA-0584:/$ python3 /tmp/script.py 

Expected:
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]
[1 2 3 4 5 6 7 8 9]

Unexpected:
[  2311   1023   -251     -1 -23069   8877 -18466   5755   6192]
[    0     0     0 12288     0     0     0 12288     3]
[    0     0     0 12288     0     0     0 12288     3]
[    0     0     0 12288     0     0     0 12288     3]
[    0     0     0 12288     0     0     0 12288     3]
[    0     0     0 12288     0     0     0 12288     3]
[    0     0     0 12288     0     0     0 12288     3]
[    0     0     0 12288     0     0     0 12288     3]
[    0     0     0 12288     0     0     0 12288     3]
[    0     0     0 12288     0     0     0 12288     3]

(mlirgraphblas) pnguyen@CONDA-0584:/$

On the surface, it looks like the sparse array simply gets deallocated immediately before the access to the values attribute happens in sparsify_array(np.arange(10, dtype=np.int16), [True]).values. Not sure if this is a bug or just an expected sharp edge of using Python.

CC @eriknw

explore() twice creates javascript error

If I re-run the explore() method on a DebugResult in a Jupyter cell, I see this error:

If I scroll down far enough, I do see the Panel widget still.

graphblas.matrix_reduce_to_vector is missing from functions.py

functions.py is missing a class corresponding to graphblas.matrix_reduce_to_vector. We should add one for completeness.

Make a GraphBLAS op for creating an new sparse tensor given the number of non-zero elements

It would be nice to have a GraphBLAS op for creating a new sparse matrix given the number of non-zero elements.

The shape and element type could be inferred from the op, e.g.

%new_tensor = graphbas.new_tensor %nnz : tensor<6x4xi32, #CSR64>

In cases where certain dimension sizes are not fixed, I think an approach like that taken by memref.alloc would be good (see the MLIR docs).

%c4 = constant 4 : index
%new_tensor = graphbas.new_tensor %nnz(%c4) : tensor<6x?xi32, #CSR64> // creates a 6x4 matrix

Use scipy.sparse in testing to avoid creating many matrix densifying functions

Currently, we have many functions like csr_densify5x5 and csc_densify8x8 in many of our tests.

Since many of our test files use their own JIT engine fixtures whose scope is just the test file, we have to redundantly have this same densifying code in many test files.

It'd be nicer to have a Python function that takes in a MLIRSparseTensor and returns a np.ndarray.

It might look something like this:

import numpy as np
import scipy.sparse as ss
from mlir_graphblas.sparse_utils import MLIRSparseTensor

def densify_csc(tensor_csc: MLIRSparseTensor) -> np.ndarray:
    ss_tensor_csc = ss.csc_matrix(
        (tensor_csc.values, tensor_csc.indices[1], tensor_csc.pointers[1]),
        shape=(
            tensor_csc.get_dimsize(0),
            tensor_csc.get_dimsize(1)
        )
    )
    return ss_tensor_csc.toarray()

mlir_graphblas/tests/jit_engine_test_utils.py would be an appropriate place to put it.

We'd need one for CSR as well. It might be worth investigating the same for sparse vectors.

This would allow us to get rid of all the densifying functions in our JIT engine tests. This would also allow us to have one densifying function for all dtypes.

This would require us to have scipy in our development environments, which would require us to change some of the environment yml files so that our CI jobs don't fail.

We could use this code to densify vectors:

def densify_vector(sparse_vec: MLIRSparseTensor) -> np.ndarray:
    (vec_length,) = sparse_vec.shape
    dense_vec = np.zeros(vec_length, dtype=sparse_vec.values.dtype)
    dense_vec[sparse_vec.indices] = sparse_vec.values
    return dense_vec

Make channel for mlir package explicit in environment yml file.

The contents of mlir-graphblas/continuous_integration/environment-3.8.yml is currently:

name: mg

channels:
- conda-forge
- metagraph

dependencies:
# dev environment
  - python=3.8
  - coverage
  - pytest
  - pytest-cov
  - black=19.10b0


# dependencies (so setup.py develop doesn't pip install them)
  - grblas
  - mlir
  - donfig

Regarding the mlir package, we might want to change it to metagraph::mlir so that a human reading this file might not get it confused with the mlir package on conda-forge https://anaconda.org/conda-forge/mlir

Replace "#SparseVec64" with "CV64"

In several places throughout our code base (e.g. in tersify_mlir) and documentation, we have two names for sparse vector encodings, #SparseVec64 and #CV64.

#SparseVec64 = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed" ],
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

We should be consistent and pick one. I vote for #CV64.

Building MLIR-GraphBlas

Dear All,
I am excited about the project. Do you have any documentation other than the website that illustrates the graphs blas dialect in more detail? How do you plan to evolve it? Currently, it seems a simple addition on top of the sparse dialect, already available in MLIR.

In addition, I am not able to install the software with build.py. Get the error message:

Traceback (most recent call last):
  File "build.py", line 114, in <module>
    build_graphblas_opt(args.build_clean)
  File "build.py", line 68, in build_graphblas_opt
    (PYTHONPATH,) = site.getsitepackages()
ValueError: too many values to unpack (expected 1)

Any tip on how to solve it?

Implement graphblas.diag op

We need the graphblas.diag op.

It does two things:

Takes a sparse vector and returns a sparse square matrix with the vector's elements along the diagonal.
Takes a sparse square matrix and returns a sparse vector containing the matrix's diagonal elements.

Guiding example of matrix → vector.

import numpy as np

from mlir_graphblas import MlirJitEngine
from mlir_graphblas.tests.jit_engine_test_utils import *

engine = MlirJitEngine()

mlir_text = """

#CSR64 = #sparse_tensor.encoding<{
dimLevelType = [ "dense", "compressed" ],
dimOrdering = affine_map<(i,j) -> (i,j)>,
pointerBitWidth = 64,
indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
dimLevelType = [ "dense", "compressed" ],
dimOrdering = affine_map<(i,j) -> (j,i)>,
pointerBitWidth = 64,
indexBitWidth = 64
}>

#SparseVec64 = #sparse_tensor.encoding<{
dimLevelType = [ "compressed" ],
pointerBitWidth = 64,
indexBitWidth = 64
}>

module {

func private @vector_empty(tensor<?x?xf64, #CSR64>, index) -> tensor<?xf64, #SparseVec64>
func private @vector_resize_dim(tensor<?xf64, #SparseVec64>, index, index) -> ()
func private @vector_resize_pointers(tensor<?xf64, #SparseVec64>, index, index) -> ()
func private @vector_resize_index(tensor<?xf64, #SparseVec64>, index, index) -> ()
func private @vector_resize_values(tensor<?xf64, #SparseVec64>, index) -> ()

func @mat_to_vec(%matrix: tensor<?x?xf64, #CSR64>) -> tensor<?xf64, #SparseVec64> {

  %c1_i1 = constant 1 : i1
  %c0_f64 = constant 0.0 : f64
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c2 = constant 2 : index

  %nrows = tensor.dim %matrix, %c0 : tensor<?x?xf64, #CSR64>

  %matrix_pointers = sparse_tensor.pointers %matrix, %c1 : tensor<?x?xf64, #CSR64> to memref<?xi64>
  %matrix_indices = sparse_tensor.indices %matrix, %c1 : tensor<?x?xf64, #CSR64> to memref<?xi64>
  %matrix_values = sparse_tensor.values %matrix : tensor<?x?xf64, #CSR64> to memref<?xf64>

  %output = call @vector_empty(%matrix, %c1) : (tensor<?x?xf64, #CSR64>, index) -> tensor<?xf64, #SparseVec64>
  call @vector_resize_dim(%output, %c0, %nrows) : (tensor<?xf64, #SparseVec64>, index, index) -> ()

  // We do two loops, one to find the outoput vector's nnz 
  // and one to fill up the output's indices and values.
  // We have to get the nnz first to allocate space in the
  // output vector correctly.
  // We could do one loop where we do both.
  //  1) assume that the output nnz is some arbitrary size
  //  2) resize accordingly
  //  3) on each iteration,
  //        - store the values and indices
  //        - track the correct nnz
  //        - if we reach the limit of whatever size the
  //          output vector is, resize (double it or 
  //          something like that)
  //  4) resize the output vector to the correct nnz
  // It's unclear which approach is more performant since
  // it depends on how accurate our arbitrary guesses are
  // for initial size and how much we should resize.

  %output_nnz = scf.for %matrix_row_index = %c0 to %nrows
      step %c1
      iter_args(%num_diagonal_containing_rows = %c0) -> (index) {
    %next_matrix_row_index = addi %matrix_row_index, %c1 : index

    %first_ptr_64 = memref.load %matrix_pointers[%matrix_row_index] : memref<?xi64>
    %second_ptr_64 = memref.load %matrix_pointers[%next_matrix_row_index] : memref<?xi64>

    %first_ptr = index_cast %first_ptr_64 : i64 to index
    %second_ptr = index_cast %second_ptr_64 : i64 to index

    %matrix_row_index_i64 = index_cast %matrix_row_index : index to i64
    %throw_away_ptr, %diagonal_not_found = scf.while (
                %ptr = %first_ptr,
                %diagonal_position_not_found = %c1_i1
           ) : (index, i1) -> (index, i1) {
      %more_ptrs = std.cmpi "ult", %ptr, %second_ptr : index
      %continue = std.and %more_ptrs, %diagonal_position_not_found : i1
      scf.condition(%continue) %ptr, %diagonal_position_not_found : index, i1
    } do {
      ^bb0(%current_ptr: index, %throw_away: i1):
        %element_column_index_i64 = memref.load %matrix_indices[%current_ptr] : memref<?xi64>
        %is_not_diagonal_position = std.cmpi "ne", %element_column_index_i64, %matrix_row_index_i64 : i64
        %next_ptr = addi %current_ptr, %c1 : index
        scf.yield %next_ptr, %is_not_diagonal_position : index, i1
    }

    %updated_num_diagonal_containing_rows = scf.if %diagonal_not_found -> (index) {
      scf.yield %num_diagonal_containing_rows : index
    } else {
      %incremented_num_diagonal_containing_rows = addi %num_diagonal_containing_rows, %c1 : index
      scf.yield %incremented_num_diagonal_containing_rows : index
    }

     scf.yield %updated_num_diagonal_containing_rows : index
  }

  call @vector_resize_pointers(%output, %c0, %c2) : (tensor<?xf64, #SparseVec64>, index, index) -> ()
  call @vector_resize_index(%output, %c0, %output_nnz) : (tensor<?xf64, #SparseVec64>, index, index) -> ()
  call @vector_resize_values(%output, %output_nnz) : (tensor<?xf64, #SparseVec64>, index) -> ()

  %output_pointers = sparse_tensor.pointers %output, %c0 : tensor<?xf64, #SparseVec64> to memref<?xi64>
  %output_nnz_i64 = index_cast %output_nnz : index to i64
  memref.store %output_nnz_i64, %output_pointers[%c1] : memref<?xi64>

  %output_indices = sparse_tensor.indices %output, %c0 : tensor<?xf64, #SparseVec64> to memref<?xi64>
  %output_values = sparse_tensor.values %output : tensor<?xf64, #SparseVec64> to memref<?xf64>

  scf.for %row_index = %c0 to %nrows
      step %c1
      iter_args(%output_values_position = %c0) -> (index) {
    %first_ptr_64 = memref.load %matrix_pointers[%row_index] : memref<?xi64>
    %first_ptr = index_cast %first_ptr_64 : i64 to index
    %next_row_index = addi %row_index, %c1 : index
    %second_ptr_64 = memref.load %matrix_pointers[%next_row_index] : memref<?xi64>
    %second_ptr = index_cast %second_ptr_64 : i64 to index

    // instead of having a var for whether or not a diagonal value was found 
    // and the value itself, we could just track whether or not the diagonal
    // value is zero (or whatever the missing value represents). 
    // This will cause bugs with malformed sparse tensors that have the 
    // missing value in the values array.
    %row_index_i64 = index_cast %row_index : index to i64
    %throw_away_ptr, %diagonal_not_found, %diagonal_value = scf.while (
                %ptr = %first_ptr,
                %diagonal_position_not_found = %c1_i1,
                %diagonal_value = %c0_f64 // dummy value
           ) : (index, i1, f64) -> (index, i1, f64) {
      %more_ptrs = std.cmpi "ult", %ptr, %second_ptr : index
      %continue = std.and %more_ptrs, %diagonal_position_not_found : i1
      scf.condition(%continue) %ptr, %diagonal_position_not_found, %diagonal_value : index, i1, f64
    } do {
      ^bb0(%current_ptr: index, %diagonal_position_not_found: i1, %previous_diagonal_value: f64):
        %element_column_index_i64 = memref.load %matrix_indices[%current_ptr] : memref<?xi64>
        %is_not_diagonal_position = std.cmpi "ne", %element_column_index_i64, %row_index_i64 : i64

        %updated_diagonal_value = scf.if %is_not_diagonal_position -> (f64) {
          scf.yield %c0_f64 : f64 // dummy value
        } else {
          %actual_diagonal_value = memref.load %matrix_values[%current_ptr] : memref<?xf64>
          scf.yield %actual_diagonal_value : f64
        }

        %next_ptr = addi %current_ptr, %c1 : index
        scf.yield %next_ptr, %is_not_diagonal_position, %updated_diagonal_value : index, i1, f64
    }

    %next_output_values_position = scf.if %diagonal_not_found -> (index) {
      scf.yield %output_values_position : index
    } else {
      memref.store %diagonal_value, %output_values[%output_values_position] : memref<?xf64>
      memref.store %row_index_i64, %output_indices[%output_values_position] : memref<?xi64>

      %updated_output_values_position = addi %output_values_position, %c1 : index
      scf.yield %updated_output_values_position : index
    }

    scf.yield %next_output_values_position : index

  }

  return %output : tensor<?xf64, #SparseVec64>

}

}
"""

engine.add(mlir_text, GRAPHBLAS_PASSES)

dense_input_tensor = np.array(
  [
      [1, 0, 0, 0],
      [2, 0, 3, 4],
      [0, 0, 5, 0],
      [0, 0, 6, 7],
  ],
  dtype=np.float64,
)
input_tensor = sparsify_array(dense_input_tensor, [False, True])

sparse_ans = engine.mat_to_vec(input_tensor)
dense_ans = densify_vector(sparse_ans)

print()
print(f"dense_input_tensor \n{repr(dense_input_tensor)}")
print()
print(f"input_tensor.pointers[1] {repr(input_tensor.pointers[1])}")
print(f"input_tensor.indices[1]  {repr(input_tensor.indices[1])}")
print(f"input_tensor.values      {repr(input_tensor.values)}")
print()
print(f"sparse_ans.pointers[0] {repr(sparse_ans.pointers[0])}")
print(f"sparse_ans.indices[0]  {repr(sparse_ans.indices[0])}")
print(f"sparse_ans.values      {repr(sparse_ans.values)}")
print()
print(f"dense_ans {repr(dense_ans)}")
print()

Guiding example of vector → matrix.

import numpy as np

from mlir_graphblas import MlirJitEngine
from mlir_graphblas.tests.jit_engine_test_utils import *

engine = MlirJitEngine()

mlir_text = """

#CSR64 = #sparse_tensor.encoding<{
dimLevelType = [ "dense", "compressed" ],
dimOrdering = affine_map<(i,j) -> (i,j)>,
pointerBitWidth = 64,
indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
dimLevelType = [ "dense", "compressed" ],
dimOrdering = affine_map<(i,j) -> (j,i)>,
pointerBitWidth = 64,
indexBitWidth = 64
}>

#SparseVec64 = #sparse_tensor.encoding<{
dimLevelType = [ "compressed" ],
pointerBitWidth = 64,
indexBitWidth = 64
}>

module {

func private @matrix_empty(tensor<?xf64, #SparseVec64>, index) -> tensor<?x?xf64, #CSR64>
func private @matrix_resize_dim(tensor<?x?xf64, #CSR64>, index, index) -> ()
func private @matrix_resize_values(tensor<?x?xf64, #CSR64>, index) -> ()
func private @matrix_resize_index(tensor<?x?xf64, #CSR64>, index, index) -> ()
func private @matrix_resize_pointers(tensor<?x?xf64, #CSR64>, index, index) -> ()

func @vec_to_mat(%vector: tensor<?xf64, #SparseVec64>) -> tensor<?x?xf64, #CSR64> {

  %c0_i64 = constant 0 : i64
  %c1_i64 = constant 1 : i64
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c2 = constant 2 : index

  %vector_length = tensor.dim %vector, %c0 : tensor<?xf64, #SparseVec64>
  %vector_indices = sparse_tensor.indices %vector, %c0 : tensor<?xf64, #SparseVec64> to memref<?xi64>
  %vector_values = sparse_tensor.values %vector : tensor<?xf64, #SparseVec64> to memref<?xf64>

  %output = 
    call @matrix_empty(%vector, %c2) : (tensor<?xf64, #SparseVec64>, index) -> tensor<?x?xf64, #CSR64>

  call @matrix_resize_dim(%output, %c0, %vector_length) : (tensor<?x?xf64, #CSR64>, index, index) -> ()
  call @matrix_resize_dim(%output, %c1, %vector_length) : (tensor<?x?xf64, #CSR64>, index, index) -> ()

  %vector_length_plus_one = addi %vector_length, %c1 : index
  call @matrix_resize_pointers
      (%output, %c1, %vector_length_plus_one) : (tensor<?x?xf64, #CSR64>, index, index) -> ()
  %output_nnz = graphblas.num_vals %vector : tensor<?xf64, #SparseVec64>
  call @matrix_resize_index(%output, %c1, %output_nnz) : (tensor<?x?xf64, #CSR64>, index, index) -> ()
  call @matrix_resize_values(%output, %output_nnz) : (tensor<?x?xf64, #CSR64>, index) -> ()

  %output_indices = sparse_tensor.indices %output, %c1 : tensor<?x?xf64, #CSR64> to memref<?xi64>
  %output_values = sparse_tensor.values %output : tensor<?x?xf64, #CSR64> to memref<?xf64>
  scf.for %output_position = %c0 to %output_nnz step %c1 {
    %matrix_column_index = memref.load %vector_indices[%output_position] : memref<?xi64>
    memref.store %matrix_column_index, %output_indices[%output_position] : memref<?xi64>
    
    %matrix_value = memref.load %vector_values[%output_position] : memref<?xf64>
    memref.store %matrix_value, %output_values[%output_position] : memref<?xf64>
  }

  %output_pointers = sparse_tensor.pointers %output, %c1 : tensor<?x?xf64, #CSR64> to memref<?xi64>
  %initial_vector_indices_value = memref.load %vector_indices[%c0] : memref<?xi64>
  scf.for %pointers_position = %c0 to %vector_length
      step %c1
      iter_args(
          %ptr_i64 = %c0_i64,
          %vector_indices_position = %c0,
          %vector_indices_value = %initial_vector_indices_value
      ) -> (i64, index, i64) {
    memref.store %ptr_i64, %output_pointers[%pointers_position] : memref<?xi64>

    %pointers_position_i64 = index_cast %pointers_position : index to i64
    %row_has_value = cmpi "eq", %vector_indices_value, %pointers_position_i64 : i64
    %updated_ptr_i64, %updated_vector_indices_position, %updated_vector_indices_value = 
        scf.if %row_has_value -> (i64, index, i64) {
      %next_ptr_i64 = addi %ptr_i64, %c1_i64 : i64
      %next_vector_indices_position = addi %vector_indices_position, %c1 : index
      %next_updated_vector_indices_value = 
        memref.load %vector_indices[%next_vector_indices_position] : memref<?xi64>
      scf.yield %next_ptr_i64, %next_vector_indices_position, %next_updated_vector_indices_value
        : i64, index, i64
    } else {
      scf.yield %ptr_i64, %vector_indices_position, %vector_indices_value : i64, index, i64
    }

    scf.yield %updated_ptr_i64, %updated_vector_indices_position, %updated_vector_indices_value
      : i64, index, i64
  }
  %output_nnz_i64 = index_cast %output_nnz : index to i64
  memref.store %output_nnz_i64, %output_pointers[%vector_length] : memref<?xi64>

  return %output : tensor<?x?xf64, #CSR64>

}

}

"""

engine.add(mlir_text, GRAPHBLAS_PASSES)

dense_input_tensor = np.array(
  [1, 0, 2, 3],
  dtype=np.float64,
)
input_tensor = sparsify_array(dense_input_tensor, [True])

sparse_ans = engine.vec_to_mat(input_tensor)
dense_ans = densify_csr(sparse_ans)

print()
print(f"dense_input_tensor {repr(dense_input_tensor)}")
print()
print(f"input_tensor.pointers[0] {repr(input_tensor.pointers[0])}")
print(f"input_tensor.indices[0]  {repr(input_tensor.indices[0])}")
print(f"input_tensor.values      {repr(input_tensor.values)}")
print()
print(f"sparse_ans.pointers[1]    {repr(sparse_ans.pointers[1])}")
print(f"sparse_ans.indices[1]     {repr(sparse_ans.indices[1])}")
print(f"sparse_ans.values         {repr(sparse_ans.values)}")
print(f"sparse_ans.get_dimsize(0) {repr(sparse_ans.get_dimsize(0))}")
print(f"sparse_ans.get_dimsize(1) {repr(sparse_ans.get_dimsize(1))}")
print()
print(f"dense_ans \n{repr(dense_ans)}")
print()

We should also add documentation for this op.

Document the exact MLIR commit

As we will be likely having to move from commit to commit on the MLIR main branch, we should document that commit hash in our build instructions. We can get that hash from here:

https://github.com/metagraph-dev/recipes/blob/main/mlir/meta.yaml#L4

Cannot properly represent i64 sparse tensor via mlir_graphblas.sparse_utils.MLIRSparseTensor

This script (click to expand) shows that we can't represent a sparse tensor of 64-bit integers via mlir_graphblas.sparse_utils.MLIRSparseTensor.

import numpy as np

from mlir_graphblas.tests.jit_engine_test_utils import *

dense_input_tensor = np.array(
  [0, 1, 2, 0, 0, 3, 0, 0],
  dtype=np.int64,
)
sparse_tensor = sparsify_array(dense_input_tensor, [True])

print()
print(f"sparse_tensor.pointers[0]    {repr(sparse_tensor.pointers[0])}")
print(f"sparse_tensor.indices[0]     {repr(sparse_tensor.indices[0])}")
print(f"sparse_tensor.values         {repr(sparse_tensor.values)}")
print(f"sparse_tensor.get_dimsize(0) {repr(sparse_tensor.get_dimsize(0))}")
print()

The result of the script's execution is:


(mlirgraphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas$ python3 test.py

sparse_tensor.pointers[0]    array([0, 3], dtype=uint64)
sparse_tensor.indices[0]     array([1, 2, 5], dtype=uint64)
sparse_tensor.values         array([         1, 8589934592,          2])
sparse_tensor.get_dimsize(0) 8

(mlirgraphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas$

This is making it difficult to support argmin and argmax in graphblas.reduce_to_vector and graphblas.reduce_to_scalar as those need to return tensors of i64 (since it appears that sparse tensors with elements of type index are not yet supported in MLIR).

Another example (click to expand).

import numpy as np

from mlir_graphblas import MlirJitEngine
from mlir_graphblas.tests.jit_engine_test_utils import *

engine = MlirJitEngine()

mlir_text = """

#SparseVec64 = #sparse_tensor.encoding<{
dimLevelType = [ "compressed" ],
pointerBitWidth = 64,
indexBitWidth = 64
}>

module {

builtin.func @main(%vector: tensor<?xi64, #SparseVec64>) -> () {
  %vector_values = sparse_tensor.values %vector : tensor<?xi64, #SparseVec64> to memref<?xi64>

  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c2 = constant 2 : index

  %c1_i64 = constant 1 : i64
  %c2_i64 = constant 2 : i64
  %c3_i64 = constant 3 : i64

  memref.store %c1_i64, %vector_values[%c0] : memref<?xi64>
  memref.store %c2_i64, %vector_values[%c1] : memref<?xi64>
  memref.store %c3_i64, %vector_values[%c2] : memref<?xi64>

  return
}

}

"""

engine.add(mlir_text, GRAPHBLAS_PASSES)

dense_input_tensor = np.array(
  [7, 8, 9],
  dtype=np.int64,
)
sparse_tensor = sparsify_array(dense_input_tensor, [True])

print()
print("Before:")
print()
print(f"sparse_tensor.pointers[0]    {repr(sparse_tensor.pointers[0])}")
print(f"sparse_tensor.indices[0]     {repr(sparse_tensor.indices[0])}")
print(f"sparse_tensor.values         {repr(sparse_tensor.values)}")
print(f"sparse_tensor.get_dimsize(0) {repr(sparse_tensor.get_dimsize(0))}")
print()

engine.main(sparse_tensor)
dense_ans = densify_vector(sparse_tensor)

np.set_printoptions(linewidth=float("inf"))

print()
print("After:")
print()
print(f"sparse_tensor.pointers[0]    {repr(sparse_tensor.pointers[0])}")
print(f"sparse_tensor.indices[0]     {repr(sparse_tensor.indices[0])}")
print(f"sparse_tensor.values         {repr(sparse_tensor.values)}")
print(f"sparse_tensor.get_dimsize(0) {repr(sparse_tensor.get_dimsize(0))}")
print()

Execution:


(mlirgraphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas$ python3 test.py

Before:

sparse_tensor.pointers[0]    array([0, 3], dtype=uint64)
sparse_tensor.indices[0]     array([0, 1, 2], dtype=uint64)
sparse_tensor.values         array([          7, 34359738368,           8])
sparse_tensor.get_dimsize(0) 3


After:

sparse_tensor.pointers[0]    array([0, 3], dtype=uint64)
sparse_tensor.indices[0]     array([0, 1, 2], dtype=uint64)
sparse_tensor.values         array([         1, 8589934592,          2])
sparse_tensor.get_dimsize(0) 3

(mlirgraphblas) pnguyen@CONDA-0584:/Users/pnguyen/code/mlir-graphblas$

Note that the weird values aren't arbitrary garbage values in memory. Running the above examples gets the same result every time.

CC: @eriknw @jim22k

call* functions in GraphBLASUtils.cpp cause errors when called on fixed-size tensors of different size

This (let's call it Example A) lowers through graphblas-opt --graphblas-lower with no problem:

#CSR64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (i,j)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

module {

    func @matrix_select_triu_100(%sparse_tensor: tensor<100x100xf64, #CSR64>) -> tensor<100x100xf64, #CSR64> {
        %answer = graphblas.matrix_select %sparse_tensor { selectors = ["triu"] } : tensor<100x100xf64, #CSR64> to tensor<100x100xf64, #CSR64>
        return %answer : tensor<100x100xf64, #CSR64>
    }

}

This (let's call it Example B) also lowers through graphblas-opt --graphblas-lower with no problem:

#CSR64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (i,j)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

module {

    func @matrix_select_triu_200(%sparse_tensor: tensor<200x200xf64, #CSR64>) -> tensor<200x200xf64, #CSR64> {
        %answer = graphblas.matrix_select %sparse_tensor { selectors = ["triu"] } : tensor<200x200xf64, #CSR64> to tensor<200x200xf64, #CSR64>
        return %answer : tensor<200x200xf64, #CSR64>
    }

}

This (let's call it Example C) does not:


#CSR64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (i,j)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

module {

    func @matrix_select_triu_100(%sparse_tensor: tensor<100x100xf64, #CSR64>) -> tensor<100x100xf64, #CSR64> {
        %answer = graphblas.matrix_select %sparse_tensor { selectors = ["triu"] } : tensor<100x100xf64, #CSR64> to tensor<100x100xf64, #CSR64>
        return %answer : tensor<100x100xf64, #CSR64>
    }

    func @matrix_select_triu_200(%sparse_tensor: tensor<200x200xf64, #CSR64>) -> tensor<200x200xf64, #CSR64> {
        %answer = graphblas.matrix_select %sparse_tensor { selectors = ["triu"] } : tensor<200x200xf64, #CSR64> to tensor<200x200xf64, #CSR64>
        return %answer : tensor<200x200xf64, #CSR64>
    }

}

This problem stems from the following:

Example A declares this function:
builtin.func private @dup_matrix(tensor<100x100xf64, #CSX64>) -> tensor<100x100xf64, #CSX64>
Example B declares this function:
builtin.func private @dup_matrix(tensor<200x200xf64, #CSX64>) -> tensor<200x200xf64, #CSX64>
Example C attempts to declare and use both of the above, which causes problems.

A valid (partial) solution would be to:

Always declare:
builtin.func private @dup_matrix(tensor<?x?xf64, #CSX64>) -> tensor<?x?xf64, #CSX64>
When a call* function from GraphBLASUtils.cpp is used in the lowering code, the function should insert tensor.cast before and after the MLIR assembly that calls dup_matrix (or whatever function name is relevant) where dup_matrix takes a tensor<?x?xf64, #CSX64>.

This is only a partial solution since this same problem might occur if we tried use callDupTensor to duplicate an f64 tensor and i64 tensor in the same MLIR module or if we tried to use callDupTensor to duplicate a CSR tensor and CSC tensor in the same MLIR module.

Use tersify_mlir in FileCheck tests

In FileCheck tests where we have // RUN: graphblas-opt %s | graphblas-opt --graphblas-lower | FileCheck %s, it might be worth changing it to // RUN: graphblas-opt %s | graphblas-opt --graphblas-lower | tersify_mlir | FileCheck %s since this'll make the tests less verbose.

Explorer link to download MLIR from panel

While flipping through MLIR passes, I realized it would be handy to have a link below the code panel to download the MLIR shown in the browser to be able to email, process further, etc.

Functional tests for secondi, firsti, and overlapi

The secondi, firsti, and overlapi choices for the multiplication part of a semiring were added while working on #224. This happened while we were in the middle of converting our FileCheck tests from syntax/translation-checking tests into functional tests. Thus, the task of creating functional tests for secondi, firsti, and overlapi was put on hold.

At some point, we should add functional tests for secondi, firsti, and overlapi.

These tests should:

Allow us to have the output tensor have an integer or floating point type regardless of what the input tensors' types are.
Verify behavior of all combinations of {mat,vec} x {mat,vec}.
Verify that the behavior of matrix multiply fusions with secondi, firsti, and overlapi are sane.

Additionally, it'll be nice to:

Sanity check that our docs are updated with more explanation of how secondi, firsti, and overlapi work (especially with the different output tensor types).
Add more verification to the verifier for MatrixMultiplyGenericOp (e.g. checking that each block has the correct number of arguments, argument types, etc. ; ExtensionBlocks::extractBlocks is a great place to start doing this or investigating this).

Do not assume that getsitepackages() has a single entry in build.py

From this issue:

#90 (comment)

It seems that we are assuming that the return value of site.getsitepackages() will have only one element, but it can have multiple values.