Comments (8)
Hi, this has become a nuisance for me as well, so I dug in a bit.
Concerning the build timestamp (which also contains the R version and operating system), it leads to an additional entry in the DESCRIPTION file, and it does not seem to be present anywhere else. I could not find a reference to the operating system anywhere else either.
Concerning the source directory/library directory references, for the packages I surveyed at least, they could be found in the .so, .a, ... binary files. They are not stamps added by R, but come with the debug symbols that are added by R by default. It does not seem to be possible to remove those debug symbols with a flag (see https://stackoverflow.com/questions/9607155/make-gcc-put-relative-filenames-in-debug-information), so the best option IMO might be to invoke strip on all the binary files after they are generated by R CMD INSTALL. If debug symbols are needed, then reproducibility can probably be put aside anyway, and we can have a '--define' Bazel option to disable stripping. The strip tool ships with Xcode on Mac OS X and is part of the binutils package on Ubuntu/Debian, where it is installed by default. It should be invoked with '-S' instead of '-d' for Mac OS X/Linux compatibility.
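A minimal sketch of the stripping step proposed above, in Python. It only builds the `strip -S` command lines for the native binaries found under an installed package directory. The suffix list, the function names, and the `keep_debug` switch (standing in for the suggested '--define' option) are assumptions made for illustration, not part of rules_r.

```python
from pathlib import Path

# Suffixes treated as native binaries; this list is an assumption.
BINARY_SUFFIXES = {".so", ".a", ".dylib"}


def find_binaries(pkg_dir):
    """Return the sorted paths of native binaries under an installed package."""
    root = Path(pkg_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in BINARY_SUFFIXES
    )


def strip_commands(pkg_dir, keep_debug=False):
    """Build one `strip -S` argv per binary; return nothing if debug symbols are kept."""
    if keep_debug:
        return []
    return [["strip", "-S", str(p)] for p in find_binaries(pkg_dir)]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once R CMD INSTALL has finished. Keeping command construction separate from execution makes it possible to inspect what would be stripped without touching any file.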
I do not guarantee this will make the builds reproducible, but it should address the two issues you pointed out, without having to acquire a lock.
If this sounds good to you, I'll try to do a proof-of-concept with an additional reproducibility test (I hope it will pass!) sometime during the week.
from rules_r.
Hi Hadrien,
It's not just the compiler adding the debug symbols. When you take a checksum of all the files in the installed package, you will see that the checksums of some .rdb/.rdx files vary as well. I was able to load one of these files in R and see that it had references to the library directory. These checksums become identical when you keep the --library flag constant. The --built-timestamp flag is available to make the package completely reproducible, but that assumes the destination library is constant.
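The checksum comparison described above could be sketched as follows. This is an illustrative Python helper, not the tooling used in rules_r; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def tree_checksums(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(install_a, install_b):
    """Return sorted relative paths that are missing on one side or differ in content."""
    a = tree_checksums(install_a)
    b = tree_checksums(install_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Running `differing_files` on two installs of the same package, made into different library directories, would list the files whose contents depend on the install location.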
If after this, we still want to strip the debug symbols, we can add a default Makevars file with the appropriate flags.
Ok got it, I was wrong. Do you have this issue resolved internally?
I just looked at whether I could find any path in the output files (like, grep -R ...), and I could only find them in the debug symbols of the .so files, so I thought "problem solved!". I don't know how this info ends up in the .rdb/.rdx files, but even if I remove the debug symbols in the .so files, there are still a few bits that differ in the ELF header, for whatever reason.
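For reference, the grep-style search mentioned above amounts to a raw byte scan, sketched here in Python with a hypothetical helper name. A scan like this only finds strings that are stored verbatim. If the .rdb files hold compressed serialized objects (an assumption not verified in this thread), an embedded path would not show up as plain text, which may explain why the search missed it.

```python
from pathlib import Path


def files_containing(root, needle):
    """Return sorted relative paths of files whose raw bytes contain `needle`."""
    root = Path(root)
    pattern = needle.encode() if isinstance(needle, str) else needle
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and pattern in p.read_bytes()
    )
```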
So, to have a reproducible build for things that ultimately go into a container layer, built-timestamp, R_MAKEVARS_USER and the package path (e.g., R CMD INSTALL ) must be constant.
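One way to pin the three inputs named above is sketched below in Python. It builds an R CMD INSTALL invocation with a fixed --built-timestamp, a fixed --library (used here as the constant package path, which is an assumption about what was meant), and R_MAKEVARS_USER set in the environment. The function name and all argument values are hypothetical.

```python
import os


def install_invocation(pkg_src, library, built_timestamp, makevars_user):
    """Return (argv, env) for an R CMD INSTALL call with pinned inputs."""
    argv = [
        "R", "CMD", "INSTALL",
        f"--built-timestamp={built_timestamp}",  # fixed stamp for the Built: field
        f"--library={library}",                  # constant destination library
        pkg_src,
    ]
    # Copy the current environment and point R at one fixed Makevars file.
    env = dict(os.environ, R_MAKEVARS_USER=makevars_user)
    return argv, env
```

The pair can be handed to `subprocess.run(argv, env=env, check=True)`. Every value passed in has to be identical across builds for the outputs to have a chance of matching.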
Resolved as much as possible in 5bb812b.
See full commit message for details and caveats.
I've noticed that builds in openSUSE RPMs, and apparently in Fedora RPMs as well, are not reproducible, so the tricks used here haven't made their way into R or other build systems. I haven't checked Debian yet. I did notice that https://salsa.debian.org/reproducible-builds/diffoscope/commit/4d31312 is adding analysis of R packages, esp. the files which embed timestamps and paths.
Is there any ongoing effort to have R support reproducible builds?
It is not clear from your message whether you are building with Bazel. This project is an extension to the Bazel build system.
These rules should have reproducible builds, at least from R 3.4 onwards.
If you are building outside of Bazel, use at least R 3.6 and pass the --built-timestamp flag when building. I have not tested it, but it should get you further. For packages with native code, you will also need to set some C flags.
Hiya @siddharthab, I am referring to the general problem of R reproducible builds, where Bazel appears to be blazing the trail.
--built-timestamp helps, but I couldn't find any built-in R install mechanism to avoid the varying paths in the .rdb/.rdx files. Ideally we find a way to get your solution here merged into R core.
openSUSE/build-compare#34 takes the opposite approach to what you have done here: it ignores the specific items that change in every build, so that they don't replace the existing 'identical' build artifacts.
I thought staged installs in R 3.6 solved the problem of hard-coded paths. But I suppose the stage directory itself is not constant. R will simply need to accept a user setting as the stage directory prefix to get complete reproducibility. I suppose it can be brought up in the r-devel mailing list.