Proper Makefile

I'm happy to work on this but I don't want to start if you're already on it.

Remove protobuf source from repository (or whatever is making cloning slow)

Protostan should be very fast to clone (without submodules). Not sure how best to fix this. It should be fixed though.

POC: cmdstan-protostan

Prototyping actual use of protostan+stan will be easier to do in C++. It's too hard to do in Python right now because protobuf3 is still in beta.

- create cmdstan-protostan repo
- get a minimal example running ('stanc')

Include path for generated (from .proto) cpp files is wrong

The way the Makefile works right now we get a stanc.pb.cc which has this line

    #include "proto/stanc.pb.h"

but our source directory is src/, so this needs to be

    #include "stan/proto/stanc.pb.h"

This project would be more useful if it used gRPC as the serialisation / message handling layer instead of the manual length-prefixed serialisation code it currently uses. The change would not be that fundamental, as gRPC is also based on Protocol Buffers.

Benefits:

- It would mean (a little) less code in this project to do the message delimiting.
- It would mean less code in the client to do the message delimiting. I think I read somewhere about a plan to make C++ wrapper functions to help with this, but gRPC is language independent, like Protocol Buffers, so this would work for more types of client. Of course the length prefixing is not really that big of a deal, but a nice interface doesn't hurt.
- It would also help with the maximum proto issue: you can use a streaming request to send multiple protobufs for a single request. Search for "client-side streaming RPC" in the gRPC C++ introduction.
- Other gRPC goodies would become possible (although they are probably not that useful for this project), e.g. SSL, load balancing (when it comes), request multiplexing...

Exhaustively test stan::lang::compile wrapper

The tests should test every possible outcome for stanc

From an email from Bob

>> Right now, there's two ways to fail.  You can either get
>> an exception in the code when there's a failed expectation
>> (such as an opening brace without the expected close brace)
>> and a parse failure when the program's ill formed for other
>> reasons.  This is just the behavior inherited from the Spirit
>> Qi parser, but we could always take a negative return and
>> throw an exception.

Tests for these conditions must be somewhere in cmdstan or stan, right?

Makefile should assume user has protoc in path

Currently the Makefile forces the user to use the included protoc which isn't ideal. The default case will be that the user already has protoc installed (via their operating system package manager).

Google C++ style

Is it alright if we standardize on Google's C++ style? https://google.github.io/styleguide/cppguide.html

I find the lack of firm conventions in C++ maddening.

Replace 'stanc' with 'compile'

'stanc' is a cmdstan-ism. Using compile everywhere (incl. filenames) is clearer.

Fix path for generated cpp proto headers

I think they should be put in src/stan/proto in keeping with the namespace.

Document binary format and usage implied by the binary protobuf writer

The pull request was fully tested, but right now the code only makes sense to me:

- add documentation to the write_delimited_pb function, including comments on how this defines a binary file/streaming format.
- add documentation to the read_delimited_pb function
- add a usage example on how to use the read_delimited_pb function to read the file/stream format.
- add comments to the tests for each writer component
- add comments on the writer itself, for each method
- add a usage example to the writer itself.

Add .proto spec. for output format

add protobuf spec to match writer API
- iteration
- errors
- diagnostic info

Simplify Makefile

There's some opportunity for code reuse in the Makefile.

@sakrejda let me know when you're done working on the Makefile for a bit and I'll work on this.

Remove model_file_name

I think the in_file_name argument is dead code. I'll double check.

(Related to #13)

Rename model_code to program_code, model_name to program_name, etc.

Have change land in Stan
Change CompileRequest etc.

The change in nomenclature is part of the Stan 3 refactor.

Expose C interface

Do we need to drop the namespaces for the shim to be usable from programming languages that only wrap C functions? Will everything need to be declared extern "C" or should the shim library be written in C?
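
One possible answer, sketched below: the namespaces can stay, because only the exported entry points need C linkage and C-compatible signatures. The shim itself can remain C++ as long as no exception crosses the boundary. `protostan_echo` and `echo_request` are hypothetical names, and the echo body is a stand-in for a real call into Stan.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

namespace stan {
namespace proto {
// Stand-in for a namespaced C++ entry point; the real shim would call
// into Stan here. Hypothetical name and behavior.
inline std::string echo_request(const std::string& request_bytes) {
  return request_bytes;
}
}  // namespace proto
}  // namespace stan

// Only the exported wrapper needs C linkage. It uses C-compatible types
// and converts exceptions into error codes.
// Returns 0 on success, 1 if the caller's buffer is too small (with the
// required size stored in *response_len), and 2 on any other failure.
extern "C" int protostan_echo(const char* request, std::size_t request_len,
                              char* response, std::size_t response_cap,
                              std::size_t* response_len) {
  try {
    const std::string out =
        stan::proto::echo_request(std::string(request, request_len));
    *response_len = out.size();
    if (out.size() > response_cap) return 1;
    std::memcpy(response, out.data(), out.size());
    return 0;
  } catch (...) {
    return 2;
  }
}
```

Passing serialized messages as byte buffers keeps the C surface small: a caller in any language with a C FFI only needs its own protobuf bindings to build the request and parse the response.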

Clean up mess from binary protobuf writer

- fix the Travis build so that make tests-all runs (that target includes cpplint; Travis currently runs only make test, which skips cpplint so that I don't hate developing on this repo)
- lint cleanup

Rudimentary documentation

What's the best way to document the proto files and the shim C++ files? It would be nice to automatically generate HTML docs for future interface writers.

Get pystan-protostan compile working and tested on travis CI

Once this is done we can move on to bigger fish.

C++11 on travis?

Travis seems to have gcc 4.6.3 which doesn't support C++11 (?) which is used in some of the newer code.

Add test for invalid model

Try compiling "invalid model code"

Example written in C (stanc-lite?)

As a demonstration it might be useful to have a program written in C which uses the interface. Maybe put it in examples/c? (https://github.com/protobuf-c/protobuf-c)

Setup cpplint CI test

cpplint: https://github.com/google/styleguide/tree/gh-pages/cpplint

I think this isn't difficult to do. Perhaps Stan has done this already?

Minimal example C++ project

We need a good, simple example of using protostan in a C++ project as a library. (pystan-protostan is opaque to people who don't know Python.) The point here is mainly to show how you can use this repo (shallow clone), the stan source code, and preinstalled libprotobuf and protoc to do something useful.

My only idea right now is a binary validate_stan_program which just reads a .stan file and returns 0 if everything is ok and 1 if things are not ok (and writes something to stderr).

This little example project would have its own Makefile and instructions.
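
A rough sketch of the core of such a binary, under stated assumptions: the `is_valid` callback is a placeholder for a call into the protostan compile wrapper, and its signature is hypothetical. Only the file handling and the 0/1 exit-code contract described above are shown.

```cpp
#include <fstream>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Reads the file at `path` and runs `is_valid` on its contents.
// Returns 0 if the program is accepted and 1 otherwise, writing a short
// message to `err` on failure. `is_valid` is a placeholder for the real
// compile call; it fills in an error message when it rejects a program.
int validate_stan_program(
    const std::string& path,
    const std::function<bool(const std::string&, std::string*)>& is_valid,
    std::ostream& err) {
  std::ifstream in(path.c_str());
  if (!in) {
    err << "cannot open " << path << "\n";
    return 1;
  }
  std::ostringstream code;
  code << in.rdbuf();  // slurp the whole file
  std::string message;
  if (!is_valid(code.str(), &message)) {
    err << path << ": " << message << "\n";
    return 1;
  }
  return 0;
}
```

The program's main would then pass its first command-line argument, the real checker, and std::cerr to this function and return the result as the exit status.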

First writer

To check on how much CPU is chewed up going from Eigen data types to Protocol Buffers in practice, do a simple writer which:

- constructs the correct protocol buffers messages
- converts to JSON
- writes messages in newline delimited JSON to a file.

This should be a drop-in replacement for stream_writer for output, so fork/hack CmdStan a little to do a comparison of this versus .csv output (which also has to convert to text).

License?

We should have a license. Since it's just a wrapper I feel like MIT or CC0 might be reasonable.

Figure out namespaces

Looks like:

stan::proto should be where the message types live, we got this right.
stan::interface_callbacks::writer is where the new writers should go, even though internally they do some protobuf stuff, they're still writers and should live with all the writers.
stan::proto::compile is in stan::proto, rather than stan::lang, even though it's a "compile"

Looks like we are exposing three types of things: 1) messages; 2) thin wrappers; and 3) callbacks. So generalizing to a namespace policy:

If it's a message it lives in stan::proto
If it takes message arguments and returns messages, it's part of the thin wrapper and should also live in stan::proto
If it's a callback it's implementing additional logic and not necessarily dealing with protobuf types as function inputs/outputs so it goes in whatever namespace that type of callback usually goes.
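
The policy above could look roughly like this in code. This is a layout sketch only: the structs are stand-ins for protoc-generated classes, `StanCompileResponse` and `protobuf_writer` are hypothetical names, and the function bodies are placeholders.

```cpp
#include <string>

namespace stan {
namespace proto {
// 1) Messages: stand-ins for the protoc-generated classes.
struct StanCompileRequest { std::string model_code; };
struct StanCompileResponse { bool ok; };

// 2) Thin wrapper: message in, message out, so it also lives here.
inline StanCompileResponse compile(const StanCompileRequest& request) {
  StanCompileResponse response;
  response.ok = !request.model_code.empty();  // placeholder logic only
  return response;
}
}  // namespace proto

namespace interface_callbacks {
namespace writer {
// 3) Callback: lives with the other writers even if it uses protobuf
// internally. Hypothetical class; it only counts the calls it receives.
class protobuf_writer {
 public:
  void operator()(const std::string& /* message */) { ++count_; }
  int count() const { return count_; }

 private:
  int count_ = 0;
};
}  // namespace writer
}  // namespace interface_callbacks
}  // namespace stan
```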

@ariddell Feel free to comment, close if you agree and I'll write it up as a Wiki thing and/or add to README.md

Binary protobuf writer

To check on how much CPU is chewed up going from Eigen data types to Protocol Buffers in practice, do a simple writer which:

Fix Makefile to allow for clang to be used

Right now g++ is hard-coded so clang doesn't get used on travis.

Remove / rename model_file_name in StanCompileRequest

So the file name argument in stan::lang::compile (in_file_name) is purely aesthetic -- it's used in the error messages, as far as I can tell.

     * @param in_file_name Name of input file to use in error
     * messages; defaults to <code>input</code>.

I think we should drop it or rename the field (to in_file_name?) in StanCompileRequest. Right now I think it would be very easy to think that it's the location of a source file where the model code is stored.

Thoughts @sakrejda?

Choose hierarchy
Refactor stream writer
Add writer that has a filename constructor (open used internally to get fd)
Modify tests for file-based writer.

sakrejda / protostan