Coder Social home page Coder Social logo

Comments (25)

WardBrian avatar WardBrian commented on July 4, 2024 1

I can take a closer look tomorrow, with that ruled out my questions would be:
Are there any additional commands used in our compilation compared to CmdStan (I think not, but good to check)
Which commands invoked by make specifically are slower, assuming it isn’t just “all of them”

from bridgestan.

WardBrian avatar WardBrian commented on July 4, 2024 1

I believe @rok-cesnovar knows more about pre-compiled headers/helped set them up for CmdStan. The doc for them always has lots of scary warnings, like

Some other command-line options starting with -f, -p, or -O must be defined in the same way as when the precompiled header was generated. At present, it’s not clear which options are safe to change and which are not

(emphasis mine)

Any macros defined before the precompiled header is included must either be defined in the same way as when the precompiled header was generated, or must not affect the precompiled header, which usually means that they don’t appear in the precompiled header at all.

This seems to me like it would be a huge problem with STAN_THREADS, which affects quite a bit of stuff in the Math library. However, it seems like it is not a problem in practice for CmdStan - for now, anyway

from bridgestan.

rok-cesnovar avatar rok-cesnovar commented on July 4, 2024 1

The precompiled headers are huge because they include the entire math library plus all the header-only dependencies (boost, eigen, ...). The precomp. headers are much smaller with clang (see stan-dev/cmdstan#872), at least they were a few years back.

clang is particularly annoying with precompiled headers because it breaks for any minor or patch version bump. And its seems that macs just update clang automatically.

This seems to me like it would be a huge problem with STAN_THREADS

CmdStan handles this by having separate precompiled headers for threads. So when you compile the first model with threads it builds separate precompiled headers with _threads suffix.

For me the precompiled header takes compiling a model from 17s to 7s

Similar numbers for me. It has a huge effect. If it didn't I think we would have never made it the default in CmdStan.

from bridgestan.

aseyboldt avatar aseyboldt commented on July 4, 2024 1

If I understand this correctly, the reason that the precompiled header helps so much is because without it the compiler has to parse ~300_000 lines of code in ~3000 different files each time a model is compiled. This would also explain why reducing the optimization level helps so little with compile times. The optimization step in general is just shorter than the parsing step?

Then maybe a cleaner solution would be to reduce the number of includes that aren't actually needed in a model. The special functions code from boost for instance is already 26_000 lines of code, but I'd guess only a very small subset of models actually uses more than one or two of those.

I guess that is easier said than done though, as that would require stanc to pick and choose includes depending on which functions are used in the model, instead of pulling everything in each time?

from bridgestan.

bob-carpenter avatar bob-carpenter commented on July 4, 2024

To test compilation time, I'd put copies of the model in different places to make sure CmdStan isn't just reusing the BridgeStan compiled model. Or try to clean out the compiled versions in between. But if that's the difference, I'd expect CmdStanModel to be a split second.

In CmdStan, we only have to translate the Stan program to a C++ class. The sampler and command-line tools are compiled in separate translation units and then linked. The linking does need to happen, though, so that takes a bit of time.

Do you know if BridgeStan and CmdStanPy compile at the same optimization level? I hope CmdStanPy isn't default to O = 0, as that would be very slow. @mitzimorris should know.

from bridgestan.

WardBrian avatar WardBrian commented on July 4, 2024

I wonder if it’s slower to compile position independent code? It’s also possible some of the precompiled header logic in CmdStan didn’t make it into our makefile, though that stuff is very brittle in my experience

from bridgestan.

aseyboldt avatar aseyboldt commented on July 4, 2024

I don't think the difference is due to caching, just to make sure I gave it slightly different models, and that didn't change the timings. I'm also seeing the pretty consistently in by benchmarks with posteriordb.

In CmdStan, we only have to translate the Stan program to a C++ class. The sampler and command-line tools are compiled in separate translation units and then linked. The linking does need to happen, though, so that takes a bit of time.

Ah, that does make sense. Still doesn't explain the difference I think though...

Do you know if BridgeStan and CmdStanPy compile at the same optimization level? I hope CmdStanPy isn't default to O = 0, as that would be very slow. @mitzimorris should know.

I don't see a big difference in time for a gradient evaluation between bridgestan and cmdstan, so I don't think that's it either. A small difference might be lost in all the other things that differ, but I'm pretty sure something like O0 would be very noticeable.

I guess it could also be because we also compile the hessian? Or something in a makefile isn't right and it recompiles something it shouldn't...

from bridgestan.

bob-carpenter avatar bob-carpenter commented on July 4, 2024

Hessians will add overhead. But it shouldn't be that much because the Hessian code itself is fast and there's nothing else to compile in forward mode other than the functions being used in the code. But then that's also true for the reverse mode functions, so maybe it is just the Hessians. The only way I know how to test that would be modifying the code, but if it really is just the Hessians slowing things down, then we probably want to be able to conditionally compile w/o Hessians.

from bridgestan.

WardBrian avatar WardBrian commented on July 4, 2024

If you compile with autodiff hessians you’re also including all of stan/math/fwd, which definitely adds a lot of files for the preprocessor to handle

from bridgestan.

mitzimorris avatar mitzimorris commented on July 4, 2024

CmdStanPy doesn't change the default optimization level - it just runs the CmdStan makefile.

from bridgestan.

aseyboldt avatar aseyboldt commented on July 4, 2024

Seems the hessian code is behind a compile time flag right now, and that was off in the code above. To make sure I just deleted all references to the hessian in model_rng.cpp in the .bridgestan/.../src/ dir, and that didn't change the compile (after compiling one model once, the first time compiling any model was ~40s). Assuming I didn't miss anything, I think that should exclude the hessian as an explanation.

from bridgestan.

aseyboldt avatar aseyboldt commented on July 4, 2024

I was just running into missing symbol issues with models involving sundials, and that made be check the variables in the makefile. Things there don't seem right to me, and that might be the source of this issue as well. It looks like some setup code never ended up being executed. For instance the sundials libs aren't precompiled (no sundials shared libs anywhere in the ~.bridgestan directory), and the linker flags for those are empty. I think the ode example in the test modules doesn't seem to cover sundials either, as that is just using a runge kutta solver.

The output of make in the .bridgestan/bridgestan*/ dir is for me:

  Current configuration:
  - OS (Operating System):       Linux
  - CXX (Compiler):              g++
  - CXX_TYPE                     gcc
  - Compiler version:            12.3
  - O (Optimization Level):      3
  Library configuration:
  - EIGEN                        ./stan/lib/stan_math/lib/eigen_3.4.0
  - BOOST                        ./stan/lib/stan_math/lib/boost_1.78.0
  - SUNDIALS                     ./stan/lib/stan_math/lib/sundials_6.1.1
  - TBB                          ./stan/lib/stan_math/lib/tbb_2020.3
  - GTEST                        ./stan/lib/stan_math/lib/benchmark_1.5.1/googletest/googletest
  - STAN_THREADS                
  - STAN_OPENCL                 
  - STAN_MPI                    
  Compiler flags (each can be overriden separately):
  - CXXFLAGS_LANG                -std=c++1y
  - CXXFLAGS_WARNINGS            -Wno-sign-compare -Wno-ignored-attributes
  - CXXFLAGS_OPTIM              
  - CXXFLAGS_FLTO               
  - CXXFLAGS_BOOST              
  - CXXFLAGS_EIGEN              
  - CXXFLAGS_OS                  -pthread -D_REENTRANT
  - CXXFLAGS_GTEST              
  - CXXFLAGS_THREADS            
  - CXXFLAGS_OPENCL             
  - CXXFLAGS_TBB                 -I ./stan/lib/stan_math/lib/tbb_2020.3/include
  - CXXFLAGS_OPTIM_TBB          
  - CXXFLAGS_MPI                
  - CFLAGS_SUNDIALS             
  - CXXFLAGS_SUNDIALS            -pipe
  - CXXFLAGS_OPTIM_SUNDIALS     
  LDLIBS:
  - LDLIBS_LANG                 
  - LDLIBS_WARNINGS             
  - LDLIBS_EIGEN                
  - LDLIBS_BOOST                
  - LDLIBS_SUNDIALS             
  - LDLIBS_OS                   
  - LDLIBS_GTEST                
  - LDLIBS_OPENCL               
  - LDLIBS_TBB                   -Wl,-L,/home/adr/.bridgestan/bridgestan-2.1.1/stan/lib/stan_math/lib/tbb -Wl,-rpath,/home/adr/.bridgestan/bridgestan-2.1.1/stan/lib/stan_math/lib/tbb
  - LDLIBS_MPI                  
  LDFLAGS:
  - LDFLAGS_LANG                
  - LDFLAGS_WARNINGS            
  - LDFLAGS_EIGEN               
  - LDFLAGS_BOOST               
  - LDFLAGS_SUNDIALS            
  - LDFLAGS_OS                  
  - LDFLAGS_GTEST               
  - LDFLAGS_OPENCL              
  - LDFLAGS_TBB                  -Wl,-L,/home/adr/.bridgestan/bridgestan-2.1.1/stan/lib/stan_math/lib/tbb -Wl,-rpath,/home/adr/.bridgestan/bridgestan-2.1.1/stan/lib/stan_math/lib/tbb
  - LDFLAGS_MPI                 

from bridgestan.

WardBrian avatar WardBrian commented on July 4, 2024

I'm not able to re-create this with a warmed-up (meaning I've built at least one model previously) cmdstan and bridgestan:

CmdStan output

$ /usr/bin/time make examples/bernoulli/bernoulli

--- Translating Stan model to C++ code ---
bin/stanc  --o=examples/bernoulli/bernoulli.hpp examples/bernoulli/bernoulli.stan

--- Compiling, linking C++ code ---
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I stan/lib/stan_math/lib/tbb_2020.3/include    -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.4.0 -I stan/lib/stan_math/lib/boost_1.78.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials    -DBOOST_DISABLE_ASSERTS          -c -Wno-ignored-attributes   -x c++ -o examples/bernoulli/bernoulli.o examples/bernoulli/bernoulli.hpp
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I stan/lib/stan_math/lib/tbb_2020.3/include    -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.4.0 -I stan/lib/stan_math/lib/boost_1.78.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials    -DBOOST_DISABLE_ASSERTS               -Wl,-L,"/home/brian/Dev/cpp/cmdstan/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/brian/Dev/cpp/cmdstan/stan/lib/stan_math/lib/tbb"        examples/bernoulli/bernoulli.o src/cmdstan/main.o       -Wl,-L,"/home/brian/Dev/cpp/cmdstan/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/brian/Dev/cpp/cmdstan/stan/lib/stan_math/lib/tbb"     stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_kinsol.a  stan/lib/stan_math/lib/tbb/libtbb.so.2 -o examples/bernoulli/bernoulli
rm -f examples/bernoulli/bernoulli.o
18.67user 0.68system 0:19.37elapsed 99%CPU (0avgtext+0avgdata 1322664maxresident)k
0inputs+7608outputs (0major+440066minor)pagefaults 0swaps

BridgeStan output

$ /usr/bin/time make test_models/bernoulli/bernoulli_model.so

--- Translating Stan model to C++ code ---
./bin/stanc  --o=test_models/bernoulli/bernoulli.hpp test_models/bernoulli/bernoulli.stan

--- Compiling C++ code ---
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I ./stan/lib/stan_math/lib/tbb_2020.3/include    -O3 -I ./stan/src -I ./stan/lib/rapidjson_1.1.0/ -I ./stan/lib/stan_math/ -I ./stan/lib/stan_math/lib/eigen_3.4.0 -I ./stan/lib/stan_math/lib/boost_1.78.0 -I ./stan/lib/stan_math/lib/sundials_6.1.1/include -I ./stan/lib/stan_math/lib/sundials_6.1.1/src/sundials -fPIC    -DBOOST_DISABLE_ASSERTS          -c -x c++ -o test_models/bernoulli/bernoulli.o test_models/bernoulli/bernoulli.hpp
--- Linking C++ code ---
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I ./stan/lib/stan_math/lib/tbb_2020.3/include    -O3 -I ./stan/src -I ./stan/lib/rapidjson_1.1.0/ -I ./stan/lib/stan_math/ -I ./stan/lib/stan_math/lib/eigen_3.4.0 -I ./stan/lib/stan_math/lib/boost_1.78.0 -I ./stan/lib/stan_math/lib/sundials_6.1.1/include -I ./stan/lib/stan_math/lib/sundials_6.1.1/src/sundials -fPIC    -DBOOST_DISABLE_ASSERTS               -Wl,-L,"/home/brian/Dev/cpp/bridgestan/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/brian/Dev/cpp/bridgestan/stan/lib/stan_math/lib/tbb"        -shared -lm -o  test_models/bernoulli/bernoulli_model.so test_models/bernoulli/bernoulli.o ./src/bridgestan.o       -Wl,-L,"/home/brian/Dev/cpp/bridgestan/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/brian/Dev/cpp/bridgestan/stan/lib/stan_math/lib/tbb"     ./stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_nvecserial.a ./stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_cvodes.a ./stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_idas.a ./stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_kinsol.a  ./stan/lib/stan_math/lib/tbb/libtbb.so.2
rm -f test_models/bernoulli/bernoulli.o
18.10user 0.76system 0:18.87elapsed 99%CPU (0avgtext+0avgdata 1339500maxresident)k
0inputs+5304outputs (0major+439114minor)pagefaults 0swaps

Both roughly 18s on my machine. If I set BRIDGESTAN_AD_HESSIAN=true the time doesn't change, which I actually should have expected since the new include doesn't happen in the model file itself but in the built-once bridgestan.cpp

from bridgestan.

aseyboldt avatar aseyboldt commented on July 4, 2024

It seems the difference I'm observing is because there is a model_header.hpp.gch header cache file in the cmdstan stan directory, but not in the corresponding directory of bridgestan. If I move it, I also observed the same timings in both cases, and manually running the g++ commands shows the timing differences with and without the cache file pretty clearly. That file is ~800MB, so I guess if the compiler has to get that same information out of small individual files instead, I can see why that would take longer.

I'm not sure why the cache file was created in one case but not the other. I've been changing the g++ version quite a bit, maybe that had something to do with it (and if the cache file is not there I get deprecation warnings from boost with g++ 12 and 13).

Anyway, I don't think this really is a bridgestan issue after all, so I'm closing this

from bridgestan.

WardBrian avatar WardBrian commented on July 4, 2024

Yeah, that is the “precompiled header” file. I’m not sure why it would be one place but not the other, since the math library is the same in both

from bridgestan.

aseyboldt avatar aseyboldt commented on July 4, 2024

Looking at it again, I think the code that creates the precompiled header is only in the cmdstan repo, not in the stan or stan_math repo, so it just isn't available in bridgestan. Given that it seems to give a pretty good speedup (at least for me, no idea why @WardBrian didn't see the same thing).

The code in the makefile is here. I guess it could be included in the bridegestan makefile as well? Or maybe the stan repo would be a better place for it, so that both projects can share the makefile rules?

from bridgestan.

WardBrian avatar WardBrian commented on July 4, 2024

I think I have the precompiled header disabled in my cmdstan installation.

It's a decent speedup, but if we do support it I would make the opposite choice of CmdStan and disable it by default. It's a pretty constant source of cmdstanpy bug reports due to things like xcode updating and the old file becoming invalid

from bridgestan.

aseyboldt avatar aseyboldt commented on July 4, 2024

Ah, I see your point...

The file size of the precompiled header made me wonder what it could be that makes it so big, so I had a quick look at what headers get included when building the model. Wasn't all that that conclusive, and the prepocessor output is only(?) 18MB. I guess the most interesting part is that various parts of boost make up almost half of the 300_000 lines after the prepocessor runs.

Not sure it's useful to anyone, but just in case, here's the link: https://gist.github.com/aseyboldt/6841525395e37e8ba2fceeaa5021b547

from bridgestan.

bob-carpenter avatar bob-carpenter commented on July 4, 2024

I'd like to see how much precompiled headers save in the way of time. They are a huge pain in the context of CmdStanPy because the PCH is constantly breaking and having to be regenerated, but the suggested fix (which takes a couple minutes to run) gets lost in a huge scrolling error message we've never been able to tamp down.

from bridgestan.

WardBrian avatar WardBrian commented on July 4, 2024

For me the precompiled header takes compiling a model from 17s to 7s

from bridgestan.

WardBrian avatar WardBrian commented on July 4, 2024

Could we do something similar to the _threads trick and hack the compiler version number into the filename?

How does the compiler know to use model_header_threads.hpp.gch when the C++ file is still using #include <stan/model/model_header.hpp>?

from bridgestan.

rok-cesnovar avatar rok-cesnovar commented on July 4, 2024

Could we do something similar to the _threads trick and hack the compiler version number into the filename?

hmmmm, that seems like a great idea at first sight!

How does the compiler know to use model_header_threads.hpp.gch when the C++ file is still using #include <stan/model/model_header.hpp>?

This makefile trick: stan-dev/cmdstan#882

from bridgestan.

WardBrian avatar WardBrian commented on July 4, 2024

Not "including the world" is a project we've been talking about in the math library since I joined the project (and probably before). A fair number of things in math end up including really broad headers, which causes this problem and makes conditional testing really difficult (if you ask a test file "did any of your included headers change" the answer is "yes" with probability approaching 1, given the current state of things).

It's worthwhile to do, but it's a big undertaking and is one of those things that only starts showing real benefits when you've moved a very large fraction of things over to the different style

from bridgestan.

aseyboldt avatar aseyboldt commented on July 4, 2024

Not "including the world" is a project we've been talking about in the math library since I joined the project (and probably before).

Ah, those projects are always great :D

But even if the math library itself does this perfectly, as long as stanc always includes everything anyway (I think that's what's going on at least...) this wouldn't help. If so, then maybe fixing the includes created by stanc first would make it easier to actually see at least some incremental improvement? I get your point about this being a larger project though :-)

from bridgestan.

WardBrian avatar WardBrian commented on July 4, 2024

Yep, the Stan repo and stanc3 would both also need changes to take advantage of it.

For now I'm curious if @rok-cesnovar and I can make the PCH experience sufficiently "nice". I didn't realize we could stuff arbitrary information into the filenames for these (like we do for .o files), so I think we can avoid the worst of the version incompatibilities we currently see

from bridgestan.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.