Remove dependency on bindgen (rsmpi, closed)

rsmpi commented on May 31, 2024
Remove dependency on bindgen

from rsmpi.

Comments (14)

bsteinb commented on May 31, 2024

Hi there!

I understand your concerns about having bindgen as a build dependency. There is indeed some tension between the policies regarding selection of software versions in HPC environments (where often in my experience a somewhat conservative choice is made) and having a quite recent version of Clang as a requirement. I do not think it is quite as bad as you make it seem (I am not convinced #20 or #22 are connected to bindgen), but I concede that not having bindgen as a build dependency would be less painful (for build times alone).

Your observations as to why bindgen is used in rsmpi are correct, as is your conjecture that the FFI declarations could – in principle – be pre-generated and shipped along with rsmpi. The problem with this strategy is precisely as you say:

pre-generating the FFI declaration for all the different MPI implementations and versions

and

generating the declaration for every single implementation and every single version is not going to be practical

I rarely get around to working on rsmpi and I do not think the project is at a point where I should devote my time to simplifying the installation procedure for production users. However, if you offer to make a contribution in this area, I am inclined to accept it, especially since – as it only concerns the build infrastructure – it should not influence refactorings of the library itself.

An acceptable contribution should – I think – contain at least the following:

  1. Both mechanisms (pre-generated FFI declarations and build-time bindgen) should still be available, one the default (preferably the pre-generated FFI declarations) and the other via a cargo feature.
  2. A well-researched algorithm for inspecting all relevant aspects of the current build environment (this includes finding out which aspects are relevant in the first place: MPI vendor, MPI version, host triple, rustc version, ...) and determining whether an appropriate pre-generated FFI declaration is available
  3. Pre-generated bindings for at least those MPICH and Open MPI versions that are tested against on Travis
  4. An easy way of using the build-time bindgen mechanism to add new pre-generated FFI declarations
  5. Documentation

If this list of requirements makes this task too daunting, I completely understand; I already admitted that I am also not willing to work on this (at least for now). However, I feel like anything less would only make this a brittle work-around. If you do still want to work on it, go for it!

Something that could make this task easier is this initiative by the MPICH project which aims to offer a somewhat stable ABI across certain versions of MPICH and various other MPI libraries based on it http://www.mpich.org/abi/. However, the information on that page seems to be a bit stale. Similar information for Open MPI can be found here https://www.open-mpi.org/software/ompi/versions/.

Luthaf commented on May 31, 2024

Thank you for the quick answer!

Both mechanisms (pre-generated FFI declarations and build-time bindgen) should still be available, one the default (preferably the pre-generated FFI declarations) and the other via a cargo feature.

Yes, this is how I see this implemented too.

A well-researched algorithm for inspecting all relevant aspects of the current build environment (this includes finding out which aspects are relevant in the first place: MPI vendor, MPI version, host triple, rustc version, ...) and determining whether an appropriate pre-generated FFI declaration is available

I feel like this would be the hardest part. I don't really know that much about MPI, so I guess the following would be relevant:

  • MPI vendor;
  • MPI version;
  • host-triple;

I don't see the rustc version as relevant here; the FFI will always work the same due to backward compatibility requirements. But as we are talking about C software here, maybe the C compiler used will have an influence? Or maybe some compiler flags too (like how in Fortran you can specify the default size of integer at the command line)? Or is this all abstracted by mpicc?

An easy way of using the build-time bindgen mechanism to add new pre-generated FFI declarations

This should be as easy as copying the generated file and adding some lines in build.rs to emit the corresponding cfg.
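
The selection step discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `BuildEnv` struct, the `PREGENERATED` table and its entries, and the `mpi_bindings` cfg name are all hypothetical and not part of rsmpi. The only real mechanisms used are the `cargo:rustc-cfg=...` directive that a build script can print and the `TARGET` environment variable that Cargo sets for build scripts.

```rust
// Hypothetical sketch: every name and table entry below is made up for illustration.

/// The aspects of the build environment named in the discussion.
#[derive(Debug, Clone, PartialEq, Eq)]
struct BuildEnv {
    vendor: String,      // MPI library vendor, e.g. "openmpi" or "mpich"
    lib_version: String, // MPI *library* version, not the MPI standard version
    target: String,      // target triple
}

/// (vendor, library version, target triple, name of the pre-generated bindings).
const PREGENERATED: &[(&str, &str, &str, &str)] = &[
    ("openmpi", "1.10.0", "x86_64-unknown-linux-gnu", "openmpi_1_10_0_x86_64_linux"),
    ("mpich", "3.2", "x86_64-unknown-linux-gnu", "mpich_3_2_x86_64_linux"),
];

/// Return the name of a matching pre-generated bindings file, if one is shipped.
fn select_bindings(env: &BuildEnv) -> Option<&'static str> {
    PREGENERATED
        .iter()
        .find(|entry| {
            entry.0 == env.vendor && entry.1 == env.lib_version && entry.2 == env.target
        })
        .map(|entry| entry.3)
}

/// The line a build script would print so the crate can pick the right module.
fn cargo_directive(env: &BuildEnv) -> String {
    match select_bindings(env) {
        Some(name) => format!("cargo:rustc-cfg=mpi_bindings=\"{}\"", name),
        // No match: fall back to running bindgen at build time.
        None => "cargo:rustc-cfg=mpi_bindings_bindgen".to_string(),
    }
}

fn main() {
    let env = BuildEnv {
        vendor: "openmpi".to_string(),
        lib_version: "1.10.0".to_string(),
        // Cargo sets TARGET for build scripts; fall back to a fixed value elsewhere.
        target: std::env::var("TARGET")
            .unwrap_or_else(|_| "x86_64-unknown-linux-gnu".to_string()),
    };
    println!("{}", cargo_directive(&env));
}
```

Adding a new set of pre-generated declarations would then mean copying the bindgen output into the crate and adding one row to the table. How vendor and library version are detected in the first place is left open here, as it is in the thread.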

Something that could make this task easier is this initiative by the MPICH project which aims to offer a somewhat stable ABI across certain versions of MPICH and various other MPI libraries based on it http://www.mpich.org/abi/. However, the information on that page seems to be a bit stale. Similar information for Open MPI can be found here https://www.open-mpi.org/software/ompi/versions/.

I did not know about this, this is very nice! Does this mean that all the listed implementations on the MPICH page are ABI compatible? And maybe the compatibility extends to the following compatible releases (for some definition of compatible ^^)

bsteinb commented on May 31, 2024

I feel like this would be the hardest part. I don't really know that much about MPI, so I guess the following would be relevant:

I agree this is probably the largest chunk of work. It is not really about MPI though. None of this is specified by the standard.

  • MPI vendor;
  • MPI version;
  • host-triple;

Yes, note that MPI version here means MPI library version (as in Open MPI 1.10.0) not the version of the MPI standard.

I don't see the rustc version as relevant here; the FFI will always work the same due to backward compatibility requirements.

Yeah, I am pretty sure that it is not of concern at the moment. I do think bindgen can emit things that go beyond just type declarations, like impls for Clone. One could think of a scenario where this extra stuff could in the future come to rely on rustc features that are not backwards compatible. Probably I am just being too paranoid.

But as we are talking about C software here, maybe the C compiler used will have an influence? Or maybe some compiler flags too (like how in Fortran you can specify the default size of integer at the command line)? Or is this all abstracted by mpicc?

Your guess is as good as mine. I would say if the headers of an MPI library built with two different C compilers are the same, then the compiler does not matter.

One other thing I just thought of: it might not be legal to distribute pre-generated FFI declarations that are based on header files of some of the commercial MPI libraries. E.g. the mpi.h shipped with Intel MPI contains some writing that makes me very reluctant to distribute something that is based on it.

Luthaf commented on May 31, 2024

One other thing I just thought of: it might not be legal to distribute pre-generated FFI declarations that are based on header files of some of the commercial MPI libraries. E.g. the mpi.h shipped with Intel MPI contains some writing that makes me very reluctant to distribute something that is based on it.

I had not thought of this =/ Yeah, it might be hard to distribute some of the bindings.

For Intel MPI specifically, it looks like it is based on MPICH, so we might get around the issue by using the same bindings for MPICH and Intel.

But maybe rust-lang/rust-bindgen#918 is a better solution to this problem. Bundling libclang would make most of my initial problems go away. I'll try to investigate both solutions.

bsteinb commented on May 31, 2024

There has been no movement here or over in the bindgen issue (which I have subscribed to) since November. Closing this for now.

AndrewGaspar commented on May 31, 2024

I've got an idea for a perhaps more tractable solution to this problem.

We could add some tool that generates a vendored version of mpi-sys (as a tarball or something like that). Then the user can use the [patch] directive to replace mpi-sys with their vendored version. My impression is that HPC systems tend to have a finite combination of compiler-MPI-version tuples, so you could easily pre-generate the mpi-sys for each MPI version you need once. When new MPI versions or compiler versions are added, you can just re-vendor.

The benefits of this are:

  • Eliminates dependency on libclang at build time
    • One user would still need libclang to produce the vendor crate
  • Eliminates the largest branch of the dependency tree[0]
  • Sidesteps the copyright issue
  • Sidesteps the version tuple nightmare

Downsides:

  • Pushes management of mpi-sys versions on to the user
  • Still requires libclang to be available at some point.
    • It's possible you could perform the vendoring off-system, though, if this is an issue - just copy the MPI headers off the system.

[0] Produced using cargo graph --optional-deps false (note: libffi also depends on bindgen).

Luthaf commented on May 31, 2024

This should work but I am not sure this is the best solution.

Pushes management of mpi-sys versions on to the user
Still requires libclang to be available at some point.

This is a bit of a bummer, because the idea was to completely get rid of the hard-to-install libclang. Plus, pushing management of mpi-sys onto the user is less than ideal if you want to have users who are not Rust developers.

Luthaf commented on May 31, 2024

Just thought of another possible fix for the problem here: shipping an implementation of MPI (possibly MPICH/OpenMPI) with rsmpi itself.

This would be under an optional feature, and the MPI implementation would be compiled before compiling rsmpi. This means that we can control and ship the bindgen output for this particular, blessed implementation of MPI.

I am not sure if this could work, or if I am just showing my complete lack of understanding of MPI, but it looks like one can install and use one's own MPI implementation on a cluster, without relying on the one provided with the cluster.

What do you think?

bsteinb commented on May 31, 2024

Well, shipping an open source implementation of MPI with the mpi-sys crate is possible. Although:

  1. I still am not certain that building our own MPI across different platforms will result in the same mpi.h (and thus a stable output of bindgen that we can ship with mpi-sys) every time.
  2. While I have seen people install their own version of MPI on clusters, I think this always requires at least some help from the system administrators. At least your own MPI has to know how to talk to the resource manager (SLURM, Torque, ...) that controls the system. So, on a cluster, I am not convinced that installing your own MPI is really the easier route. On a development workstation, on the other hand, it should be easy enough to get bindgen and its dependencies working.

I will try to do some experiments regarding the first point by installing Open MPI or MPICH on different platforms and seeing whether the resulting mpi.h and bindgen output are compatible.

marmistrz commented on May 31, 2024

For student cluster competitions we always build OpenMPI 3.0 from source because CentOS ships the ancient 1.x version (and the difference in performance is significant).

On production clusters you often have to build your own version of compilers, etc. because admins don't want to install the newer version, even as a module. I have rebuilt GCC and binutils myself.

Luthaf commented on May 31, 2024

I have another idea to fix the issue here. My understanding of the problem is that some MPI types use a different ABI in different implementations, meaning that it is not possible to assume all mpi.h are equivalent.

A possible way to work around this would be to provide a small shim around MPI functions where rsmpi would be in control of the ABI, which would call into the local MPI installation.

Something like this

#include <mpi.h>
#include <stdint.h>

// use types with a known size so the functions can be called from Rust
typedef int32_t RSMPI_Comm;
typedef int32_t RSMPI_Group;

int RSMPI_Comm_create(RSMPI_Comm comm, RSMPI_Group group, RSMPI_Comm *newcomm) {
    MPI_Comm new;
    int status = MPI_Comm_create((MPI_Comm)(comm), (MPI_Group)(group), &new);
    *newcomm = (RSMPI_Comm)(new);
    return status;
}

// and so on for everything used by rsmpi 

Then, this file would be compiled with the user-provided mpicc, taking care of all the specifics of the current MPI installation. We could also ship bindgen's generated extern function definitions, since we would control them.
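
On the Rust side, the fixed-width typedefs from the C shim would map to plain integer types, so the function signatures would no longer depend on the local mpi.h. The following is only a sketch: `RsmpiComm`, `RsmpiGroup`, `CommCreateFn`, `comm_create` and especially `mock_comm_create` are hypothetical names. The mock stands in for the compiled shim so the example runs without an MPI installation, and the wrapper assumes a return value of 0 means success.

```rust
use std::os::raw::c_int;

// Fixed-width handle types mirroring the typedefs in the C shim (hypothetical names).
pub type RsmpiComm = i32;
pub type RsmpiGroup = i32;

/// Signature of the shim's RSMPI_Comm_create as Rust would see it.
pub type CommCreateFn =
    unsafe extern "C" fn(RsmpiComm, RsmpiGroup, *mut RsmpiComm) -> c_int;

/// Stand-in for the compiled C shim so this sketch runs without MPI.
/// It just returns `comm + 1` as the "new" communicator.
extern "C" fn mock_comm_create(
    comm: RsmpiComm,
    _group: RsmpiGroup,
    newcomm: *mut RsmpiComm,
) -> c_int {
    if newcomm.is_null() {
        return 1; // arbitrary non-zero error code for the mock
    }
    // The null check above is the only validation the mock performs.
    unsafe { *newcomm = comm + 1 };
    0
}

/// Safe wrapper around a shim function; assumes 0 means success.
pub fn comm_create(
    f: CommCreateFn,
    comm: RsmpiComm,
    group: RsmpiGroup,
) -> Result<RsmpiComm, c_int> {
    let mut new_comm: RsmpiComm = 0;
    // `new_comm` is a valid, writable location for the duration of the call.
    let status = unsafe { f(comm, group, &mut new_comm) };
    if status == 0 {
        Ok(new_comm)
    } else {
        Err(status)
    }
}

fn main() {
    match comm_create(mock_comm_create, 41, 0) {
        Ok(handle) => println!("new communicator handle: {}", handle),
        Err(code) => println!("error code: {}", code),
    }
}
```

In a real crate the function pointer would be replaced by an extern declaration linked against the shim object compiled by mpicc.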

The main drawbacks I can see with this approach are:

  1. it is somewhat labor-intensive to generate all the wrapper functions. This might be alleviated by the reduced need for support of specific MPI implementations
  2. there is an additional function call overhead, which might be alleviated with LTO

What do you think?

AndrewGaspar commented on May 31, 2024

I think there's some merit to the idea - I've thought about doing this in the past.

A couple things:

  1. You'd almost certainly need to make the "portable" handle types 64-bit or larger. There are MPIs I've seen where the MPI handle type is a pointer.
  2. Something to keep in mind is that some routines will have a bit more overhead than just a function call. Any routine which takes a list of MPI handles may have to allocate memory to translate the lists if the actual handle type differs in size from the RSMPI handle type.
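
The extra cost mentioned in the second point can be illustrated with a small sketch. It assumes a 64-bit portable handle and a 32-bit native handle (MPICH, for example, defines its handles as int); the type aliases and the `to_native` helper are hypothetical.

```rust
use std::convert::TryFrom;

/// Portable handle type the wrapper layer would expose (hypothetical).
type PortableHandle = u64;
/// Stand-in for a native handle type that is narrower than the portable one.
type NativeHandle = i32;

/// Translate a list of portable handles into native handles.
/// This allocates a new Vec on every call, which is the overhead in question.
/// Returns None if any handle does not fit into the native type.
fn to_native(handles: &[PortableHandle]) -> Option<Vec<NativeHandle>> {
    handles
        .iter()
        .map(|&h| NativeHandle::try_from(h).ok())
        .collect()
}

fn main() {
    let portable: Vec<PortableHandle> = vec![1, 2, 3];
    match to_native(&portable) {
        Some(native) => println!("translated {} handles: {:?}", native.len(), native),
        None => println!("a handle did not fit into the native type"),
    }
}
```

When the two types have the same size and representation, the list could be passed through unchanged and the allocation avoided.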

Though a possibility could be that you compile and run a program as part of build.rs that outputs all the handle type sizes.

Luthaf commented on May 31, 2024

Though a possibility could be that you compile and run a program as part of build.rs that outputs all the handle type sizes.

This is another alternative, and how the Julia bindings to MPI do it. You may also want to get the alignment of the types right. I don't know how one would create a fully opaque type with given size and alignment in safe Rust though.
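
One possible answer to the last question, offered as a sketch rather than a settled design: a struct with a private byte array and a `#[repr(C, align(N))]` attribute has a fixed size and alignment and exposes nothing about its contents. Because the alignment in `repr(align)` has to be a literal, a build script would need to generate the source text from the measured values. The names `OpaqueComm` and `opaque_type_source` are hypothetical, and the sizes are example values, not measurements of any MPI library.

```rust
use std::mem::{align_of, size_of};

/// Hand-written instance: 8 bytes, 8-byte aligned, contents not accessible from outside.
#[repr(C, align(8))]
pub struct OpaqueComm {
    _opaque: [u8; 8],
}

/// Generate Rust source for an opaque type with the given size and alignment,
/// as a build script might do after measuring the native handle type.
/// `size` should be a multiple of `align`, otherwise the compiler pads the struct.
fn opaque_type_source(name: &str, size: usize, align: usize) -> String {
    format!(
        "#[repr(C, align({align}))]\npub struct {name} {{ _opaque: [u8; {size}] }}\n",
        align = align,
        name = name,
        size = size
    )
}

fn main() {
    println!(
        "OpaqueComm: size {} bytes, alignment {} bytes",
        size_of::<OpaqueComm>(),
        align_of::<OpaqueComm>()
    );
    print!("{}", opaque_type_source("MPI_Comm", 8, 8));
}
```

Values of such a type would still only be created and interpreted by the MPI library, so the unsafe code stays confined to the FFI calls.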

jedbrown commented on May 31, 2024

I think it's more appropriate for a wrapper layer to live outside rsmpi. This project, for example, has been around for a while, but is now nicely licensed.
https://github.com/cea-hpc/wi4mpi

At this point, I think avoiding bindgen has nontrivial maintenance costs and a specialized mpi-sys is the way to go if it's important to you. If you do create a specialized mpi-sys, we can add it to CI. I'll close this issue now, but feel free to reopen if you think that's inappropriate or you would like to put some effort toward a different strategy.
