phillipstanleymarbell / noisy-lang-compiler

Noisy language compiler

License: MIT License

Makefile 0.91% Mathematica 4.03% C 85.93% C++ 7.30% Shell 0.19% nesC 0.05% Roff 1.41% Nemerle 0.01% Python 0.17% CMake 0.01%
programming-language physical-computing sensor-fusion measurement-units dimensional-analysis iot cyber-physical-systems

noisy-lang-compiler's Introduction

Noisy and Newton

Noisy is a programming language for talking to sensors. Newton is a specification language for describing physics. Noisy is descended from the M programming language [Stanley-Marbell and Marculescu, 2006] which is in turn descended from the Limbo programming language [Dorward, Pike, Trickey, 1994], and the Alef programming language [Winterbottom, 1992].

Newton is a language for specifying assertions (invariants) about physical systems. Newton was originally intended to be a configuration language for the Noisy compiler, to encapsulate information about temporally-invariant physical properties of the hardware on which a Noisy program executes. The first implementation of Newton based on the Noisy code base, and the API for interfacing to the Newton intermediate representation, were the focus of the M.Eng. thesis of Jonathan Lim. A short ArXiv paper summarizes the concepts in Newton. Newton has evolved into a self-contained foundation for research investigating automated dimensional analysis of physical system descriptions, automated analysis for differential privacy in sensors, and automated generation of physics-constrained function approximation, among other things. In contrast to Newton, which is designed for specifying physical assertions/invariants, alternatives such as Modelica allow you to imperatively model the dynamics of physical systems.

Because the Newton compiler started out as a modification of the Noisy compiler implementation to test out ideas, the implementation of Newton borrows/shares many components from the Noisy compiler and is therefore distributed with it.


Getting started

The correct way to clone this repository to get the submodules is:

	git clone --recursive git@github.com:phillipstanleymarbell/Noisy-lang-compiler.git

To update all submodules:

	git pull --recurse-submodules
	git submodule update --remote --recursive

If you forgot to clone with --recursive and end up with empty submodule directories, you can remedy this with

	git submodule update --init

Building the Noisy compiler and debug tools depends on the libflex, Wirth-tools, and DTrace-scripts repositories. These repositories are already included as submodules:

	Libflex:		git@github.com:phillipstanleymarbell/libflex.git
	Wirth tools:		git@github.com:phillipstanleymarbell/Wirth-tools.git
	DTrace-scripts:		git@github.com:phillipstanleymarbell/DTrace-scripts.git

For linear algebra in Newton, we use the Eigen library. This is also already linked to the repository as a submodule:

	Eigen:			git@github.com:eigenteam/eigen-git-mirror.git

The build also depends on the C protobuf compiler, sloccount, and on Graphviz. On Mac OS X, the easiest way to install these is to use macports (macports.org) to install the packages protobuf-c and protobuf-cpp (on Debian, you want the package libprotobuf-c-dev and on Ubuntu you also want protobuf-c-compiler), sloccount, and graphviz-devel.

Furthermore, LLVM is a build and runtime dependency of this project. Currently, passes related to LLVM are tested with LLVM 13.

Make sure llvm-config is installed for that version. If it is named differently, e.g., llvm-config-x, you will need to create a symbolic link:

cd /location/of/llvm-config-x
ln -s llvm-config-x llvm-config

Once you have the above repositories,

  1. Create a file config.local in the root of the Noisy tree and edit it to contain
	LIBFLEXPATH     = full-path-to-libflex-repository-clone
	CONFIGPATH      = full-path-to-libflex-repository-clone
	OSTYPE		= <one of 'linux' or 'darwin'>
	MACHTYPE	= x86_64

For example,

	LIBFLEXPATH=/home/me/Noisy-lang-compiler/submodules/libflex
	CONFIGPATH=/home/me/Noisy-lang-compiler/submodules/libflex
	OSTYPE		= linux
	MACHTYPE	= x86_64

  2. Copy config.local to the libflex directory:
	$ cp config.local submodules/libflex

  3. In src/common/Makefile and src/newton/Makefile, change COMPILERVARIANT as necessary (default is clang).

  4. Build Libflex by going to the directory you cloned for Libflex and running make. The Makefile assumes the environment variables OSTYPE and MACHTYPE are set. If that is not the case, you will need to explicitly set them, for example on macOS:
	$ cd submodules/libflex
	$ make OSTYPE=darwin MACHTYPE=x86_64

  5. From the root of this top-level repository, build the Noisy and Newton compilers by running make. The Makefile assumes the environment variables OSTYPE and MACHTYPE are set. If that is not the case, you will need to explicitly set them, for example on macOS:
	make OSTYPE=darwin MACHTYPE=x86_64

The Newton compiler

You can invoke the compiler on your platform, e.g., ./newton-darwin-EN, with the flags -h or --help to see the usage:

	Newton version 0.3-alpha-756 (da767ee43c2ce361955379f0b5e2a25602ad219d) (build 07-27-2019-13:[email protected]_64).

	Usage:    newton-<uname>-EN
	                [ (--help, -h)                                               
	                | (--version, --V)                                           
	                | (--verbose <level>, -v <level>)                            
	                | (--dot <level>, -d <level>)                                
	                | (--smt <path to output file>, -S <path to output file>)    
	                | (--bytecode <output file name>, -b <output file name>)     
	                | (--optimize <level>, -O <level>)                           
	                | (--dmatrixannote, -m)                                      
	                | (--pigroups, -p)                                           
	                | (--kernelrowcanon, -c)                                     
	                | (--pigroupsort, -r)                                        
	                | (--pigroupdedup, -e)                                       
	                | (--pikernelprinter, -P)                                    
	                | (--pigrouptoast, -a)                                       
	                | (--codegen <path to output file>, -g <path to output file>)
	                | (--trace, -t)                                              
	                | (--statistics, -s) ]                                       
	                | (--latex, -x) ]                                            
                                                                             
	              <filenames>

For example, to compile a Newton description to LaTeX:

	./newton-darwin-EN ../../applications/newton/invariants/Waves-pigroups.nt -x

For example, to execute the state estimator synthesis backend and get the synthesized C source code for the Pendulum.nt input:

	./newton-darwin-EN --estimator-synthesis=<full-path-to-output-file.c> --process=pendulum_ideaL_process --measure=pendulum_measure ../../applications/newton/invariants/Pendulum.nt

The Noisy compiler

The Noisy compiler takes Noisy programs and compiles them to either Noisy Bytecode (the Noisy IR serialized via Google's Protocol Buffers), or renders the IR and symbol table using GraphViz/Dot for debugging.

You can invoke the compiler on your platform, e.g., ./noisy-darwin-EN, with the flags -h or --help to see the usage:

	Noisy version 0.1-alpha-2655d9edbe4e+ (build 11-22-2015-18:[email protected]_64), Phillip Stanley-Marbell.
	
	Usage:    noisy [ (--help, -h)                                       
	                | (--version, --V)                                   
	                | (--verbose <level>, -v <level>)                    
	                | (--dot <level>, -d <level>)                        
	                | (--bytecode <output file name>, -b <output file name>)
	                | (--optimize <level>, -O <level>)                   
	                | (--trace, -t)                                      
	                | (--statistics, -s) ]                               
	                                                                     
	              <filenames>

To compile a Noisy program and display statistics on internal routine calls:

	% ./src/noisy/noisy-darwin-EN --optimize 0 --statistics applications/noisy/helloWorld.n

To compile a Noisy program and emit its IR into dot, and render the generated dot code through dot:

	% ./src/noisy/noisy-darwin-EN --optimize 0 --dot 0 applications/noisy/helloWorld.n | dot -Tpdf -O ; open noname.gv.pdf

The dot detail levels are bit masks: 1<<0 (i.e., 1): no text, 1<<1 (i.e., 2): no nil nodes. You can ease the task of rendering the IR by using one of the helper scripts described below.

The helper scripts noisyIr2dot.sh and newtonIr2dot.sh

The scripts noisyIr2dot.sh and newtonIr2dot.sh generate renderings of the Noisy/Newton AST and symbol table. They take three arguments: a source file, a rendering format (e.g., "pdf" or "png"), and a dot detail level for the dot backend (e.g., '0'; see the section above in README.md). Each script is a simple wrapper around the corresponding compiler, which it invokes with a useful default set of flags.

For example, from the noisy build directory:

	% ./noisyIr2dot.sh ../../applications/noisy/helloWorld.n pdf 0

Implementation and the Wirth tools

The Noisy and Newton compiler implementations use the Wirth tools (https://github.com/phillipstanleymarbell/Wirth-tools) to generate various helper header files. The Wirth tools are not yet well polished, so the process is a bit messy.

First, run ffi2code on noisy.ffi to generate all the header definitions in a single file. Ignore any debugging statements that appear on stderr and focus only on the output directed via stdout to the output file as in the following example. From the noisy build directory:

 ../../submodules/Wirth-tools/ffi2code-darwin-EN noisy.ffi > noisy-ff-debug.txt

Next, manually copy the relevant parts of the result to the appropriate files:

  1. The array ASTnodeType goes into noisy.h as NoisyIrNodeType

  2. The rest of the generated code goes into noisy-ffi2code-autoGeneratedSets.c. See the comments therein for more.

For an explanation of the T_XXX tokens in older files related to ffi2code, see https://github.com/phillipstanleymarbell/Wirth-tools/blob/master/EXAMPLES/bug.0.ffi

Development

There are pre- and post-commit hooks that will build the compiler, run it against a reference input, and record statistics on number of calls made and time spent in most of the compiler's implementation routines.

The generated statistics are stored in the analysis/statistics/ subdirectory, and can be analyzed using the Mathematica notebook that resides at analysis/mathematica/AnalyzeStatistics.nb.

CGI on Mac OS X

We use a CGI interface along with any web browser to provide a poor-person's GUI interface. Installing the CGI version of the compiler lets us use a web browser and some minimal Javascript to create a cross-platform GUI and IDE.

(On Mac OS X, $kNoisyBasePath is /Library/WebServer/Documents/tmp. See config.$(OSTYPE)-$(MACHTYPE)$(COMPILERVARIANT) for other platforms.)

	% mkdir $kNoisyBasePath
	% cp icons/* $kNoisyBasePath/
	% cp noisycgi-darwin-EN /Library/WebServer/CGI-Executables/
	% chmod 777 $kNoisyBasePath
	% sudo chmod 755 $kNoisyBasePath/*.png

On older versions of Mac OS X (~10.8 and earlier), enable the web server via System Preferences --> Sharing. On Mac OS 10.10 and later, edit /etc/apache2/httpd.conf and (1) uncomment the line for LoadModule cgi_module, (2) restart apache (sudo apachectl restart), and then (3) open the CGI URL:

	% open  http://localhost/cgi-bin/noisycgi-darwin-EN?c=HelloWorld+%3A+progtype%0D%0A%7B%0D%0A++++++++init++++%3A+namegen+%28list+of+string%29%3A%28list+of+string%29%3B%0D%0A%7D%0D%0A%0D%0Ainit+%3D%0D%0A%7B%0D%0A++++++++print+%3A%3D+name2chan+string+%22system.print%22+0.0%3B%0D%0A++++++++print+%3C-%3D+%22Hello+World%21%22%3B%0D%0A%7D%0D%0A&w=980&s=0&o=0&t=0&b=compile

The above URL encodes the parameters for the backends, passes, as well as the code, and html render width. The example is for Noisy; the Newton case is similar. The text editor with syntax coloring we now use is ACE (ace.c9.io), in conjunction with the jquery-git plugin to make it work for us (see comments in cgimain.c). Retrieve the JQuery-git from http://code.jquery.com/jquery-git.js and copy it to $kNoisyBasePath (or $kNewtonBasePath in the case of Newton):

	% wget -P /tmp/ http://code.jquery.com/jquery-git.js
	% sudo cp /tmp/jquery-git.js $kNoisyBasePath/

Git clone https://github.com/ajaxorg/ace-builds.git and copy the src-noconflict subdirectory to $kNoisyBasePath (or $kNewtonBasePath in the case of Newton):

	% git clone https://github.com/ajaxorg/ace-builds.git /tmp/ace-builds
	% sudo cp -r /tmp/ace-builds/src-noconflict $kNoisyBasePath
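The query string in the example URL above carries the Noisy source in the c parameter, with spaces written as + and most punctuation written as %XX. The following is a minimal sketch of that style of form encoding, useful when constructing such a URL by hand. The function name formEncode and its buffer-based interface are assumptions made for illustration and are not taken from cgimain.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 *	Hypothetical helper: form-style percent-encoding of a NUL-terminated
 *	string. Writes a NUL-terminated result into 'out' (capacity 'outSize')
 *	and returns its length, or -1 if 'out' is too small.
 */
int
formEncode(const char *in, char *out, size_t outSize)
{
	static const char	hex[] = "0123456789ABCDEF";
	size_t			o = 0;

	assert(in != NULL && out != NULL);

	if (outSize == 0)
	{
		return -1;
	}

	for (; *in != '\0'; in++)
	{
		unsigned char	c = (unsigned char)*in;

		if (isalnum(c) || c == '-' || c == '_' || c == '.')
		{
			/* Unreserved characters are copied through unchanged. */
			if (o + 1 >= outSize)
			{
				return -1;
			}
			out[o++] = (char)c;
		}
		else if (c == ' ')
		{
			/* Spaces become '+', as in the example URL. */
			if (o + 1 >= outSize)
			{
				return -1;
			}
			out[o++] = '+';
		}
		else
		{
			/* Everything else becomes %XX with uppercase hex digits. */
			if (o + 3 >= outSize)
			{
				return -1;
			}
			out[o++] = '%';
			out[o++] = hex[c >> 4];
			out[o++] = hex[c & 0x0F];
		}
	}
	out[o] = '\0';

	return (int)o;
}
```

Under these assumptions, the Noisy statement print <-= "Hello World!"; encodes to the same text that appears near the end of the c parameter in the example URL, and a carriage return followed by a line feed encodes to %0D%0A.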

Details on command line parameters:

Dot rendering detail bitmaps for Noisy and Newton:

	typedef enum
	{
		kNoisyDotDetailLevelNoText			= (1 << 0),
		kNoisyDotDetailLevelNoNilNodes			= (1 << 1),
	} NoisyDotDetailLevel;
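Because the detail levels are bit masks, they can presumably be combined with bitwise OR before being passed as the --dot <level> argument. The sketch below repeats the enum so that it compiles on its own. The helper names dotLevelCombine and dotLevelHas are hypothetical and are not part of the Noisy sources.

```c
#include <assert.h>
#include <stdbool.h>

/* The enum from the listing above, repeated so the sketch is self-contained. */
typedef enum
{
	kNoisyDotDetailLevelNoText			= (1 << 0),
	kNoisyDotDetailLevelNoNilNodes			= (1 << 1),
} NoisyDotDetailLevel;

/* Hypothetical helper: build the integer to pass as the --dot <level> argument. */
int
dotLevelCombine(bool noText, bool noNilNodes)
{
	int	level = 0;

	if (noText)
	{
		level |= kNoisyDotDetailLevelNoText;
	}
	if (noNilNodes)
	{
		level |= kNoisyDotDetailLevelNoNilNodes;
	}

	return level;
}

/* Hypothetical helper: check whether a given flag is set in a level. */
bool
dotLevelHas(int level, NoisyDotDetailLevel flag)
{
	assert(level >= 0);

	return (level & (int)flag) != 0;
}
```

With these definitions, a level of 0 sets neither flag and a level of 3 sets both.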

noisy-lang-compiler's People

Contributors

angelospl, blackgeorge-boom, btsouts, divyakanapram, dzufferey, hyuglim, jarhodes314, kisekihirakawa, komagr, lilrabbits, lvrgustafsson, peimu, phillipstanleymarbell, shaotuanchen, siegfriedchao, vladmandric, wenyuan95, z-gu

Forkers

yangwang92

noisy-lang-compiler's Issues

Bugs in Noisy first and follow set definition (noisy.ffi)

Bug noticed by Jonathan:

The follow sets of iterStatements and matchStatement contain semicolon, but semicolon is also included in production for statement.

So, identifier and rightbrace (and not semicolon) should be in follow set of statement.

Follow set of statement should also contain rightBrace.

Followsets of statements

All of the Noisy followsets of iterStatements and matchStatements (and others) have semicolons in them, but semicolon is included in the definition of statement. identifier and rightBrace need to be in the followset of statement instead of semicolon, because they are the tokens that can immediately follow a statement. However, the followset of Noisy statement is currently just the firstset of statement; it also needs rightBrace.
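To make the proposed fix concrete, here is a minimal sketch that represents a follow set as a bit set over token kinds and encodes the follow set of statement as the issue suggests: identifier and rightBrace, without semicolon. The token enum and all function names are hypothetical. The real token kinds and sets are generated from noisy.ffi by ffi2code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical token kinds; the real ones are generated from noisy.ffi. */
typedef enum
{
	kTokenIdentifier = 0,
	kTokenSemicolon,
	kTokenRightBrace,
	kTokenMax
} Token;

/* One bit per token kind. */
typedef uint32_t TokenSet;

TokenSet
tokenSetOf(Token t)
{
	assert((int)t >= 0 && (int)t < kTokenMax);

	return (TokenSet)1 << (int)t;
}

bool
tokenSetContains(TokenSet set, Token t)
{
	return (set & tokenSetOf(t)) != 0;
}

/*
 *	Follow set of 'statement' as proposed in the issue: the tokens that
 *	can immediately follow a statement are identifier and rightBrace.
 */
TokenSet
followOfStatement(void)
{
	return tokenSetOf(kTokenIdentifier) | tokenSetOf(kTokenRightBrace);
}
```

A parser's error-recovery code could then test the lookahead token with tokenSetContains(followOfStatement(), token).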

Syntax for defining relationships between quantities (e.g., via integration etc.)

The current syntax is not self-explanatory:

vectorIntegrals {
    [displacement, velocity, acceleration];
}

scalarIntegrals {
    [distance, speed, scalar_acceleration];
}

More generally, we need a way to describe the possibility of sensor substitutions. I would propose integrating it with the law syntax as one possibility. In the following, the digram o< denotes proportionality. The following would be an example of a set of correlates that are specific to the Warp platform:

WarpPlatformCorrelates : law(a: acceleration, p: pressure, g: anglerate) =
{
       integral integral a   o<   p,
       g  o< integral a,
}

Thus the vectorIntegrals and scalarIntegrals above could be reimagined as:

VectorIntegrals : law(d: displacement, v: velocity, a: acceleration) =
{
       derivative d   o<   v,
       derivative v  o< a,
}

I'm still a bit torn about the need to separate scalars and vectors here. It makes sense in general, but we'd need more concrete arguments.

One way to encapsulate the treatment into the syntax above is, e.g., to specify the relations between scalars and vectors:

VectorScalarPairs : law(d: displacement, x: distance) =
{
       magnitude d   =   x,
}
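As an illustration of what a relation such as derivative v o< a would permit, the sketch below estimates a change in velocity by numerically integrating uniformly spaced acceleration samples with the trapezoidal rule. This is only one possible reading of how a sensor substitution might be computed. The function name and the uniform-sampling assumption are hypothetical and do not come from the Newton implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 *	Hypothetical helper: trapezoidal integral of 'count' samples taken
 *	every 'dt' seconds. For acceleration samples in m/s^2, the result is
 *	the change in velocity in m/s over the sampled interval.
 */
double
integrateTrapezoid(const double *samples, size_t count, double dt)
{
	double	sum = 0.0;

	assert(samples != NULL || count == 0);
	assert(dt > 0.0);

	for (size_t i = 1; i < count; i++)
	{
		sum += 0.5 * (samples[i - 1] + samples[i]) * dt;
	}

	return sum;
}
```

Applying the same function to the resulting velocity samples would give displacement, which corresponds to the integral integral a form in the WarpPlatformCorrelates example.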

Wirth tools make error

Makefile:3: /Users/jonathanlim/Documents/Compiler/libflex/config.-.clang: No such file or directory
Makefile:4: config.-.clang: No such file or directory
make: *** No rule to make target `config.-.clang'. Stop.

Alternative syntax for defining physical laws

The current syntax for listing laws is

law {
    velocity = distance / time;
    acceleration = velocity / time;
    force = mass * acceleration;
    work = dot(force, displacement);
}

I think physical laws have a lot of structure, and in that structure there is semantic information that we are missing out on. I think a better alternate syntax would be

SimplePendulum : law(L: distance, period: time) =
{
       period = (4*Pi*Pi*L/g)^(1/2)
}

All statements followed by commas are implicitly in a conjunction.

This allows laws/invariants to be named (e.g., SimplePendulum), and also allows other laws to be referenced / instantiated:

DetailedPendulum : law(L: distance, period: time) =
{
       SimplePendulum(L, period),
       L > 0_m,
       L < 10_m,
}
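The following sketch shows one way the two laws above could be checked against measured values, with the comma-separated statements treated as a conjunction. It assumes g = 9.81 m/s^2 and compares the squared period so that no square root is needed. The function names and the relative tolerance parameter are hypothetical and are not part of Newton.

```c
#include <assert.h>
#include <stdbool.h>

static const double	kPi = 3.14159265358979323846;
static const double	kGravity = 9.81;	/* m/s^2, assumed value for g */

/*
 *	SimplePendulum: period = (4*Pi*Pi*L/g)^(1/2). Checked in squared
 *	form, period^2 = 4*Pi*Pi*L/g, within a relative tolerance.
 */
bool
simplePendulumHolds(double L, double period, double relativeTolerance)
{
	double	expectedSquared = 4.0 * kPi * kPi * L / kGravity;
	double	difference = period * period - expectedSquared;

	assert(relativeTolerance >= 0.0);

	if (difference < 0.0)
	{
		difference = -difference;
	}

	return (period >= 0.0) && (difference <= relativeTolerance * expectedSquared);
}

/*
 *	DetailedPendulum: the conjunction of SimplePendulum(L, period),
 *	L > 0_m, and L < 10_m.
 */
bool
detailedPendulumHolds(double L, double period, double relativeTolerance)
{
	return simplePendulumHolds(L, period, relativeTolerance)
		&& (L > 0.0)
		&& (L < 10.0);
}
```

For example, a 1 m pendulum with a measured period of 2.006 s satisfies both laws at a tolerance of 0.001, while a 12 m pendulum with a period of 6.949 s satisfies SimplePendulum but violates the L < 10_m constraint of DetailedPendulum.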

ParseExpression should call Parse definition terminal

Also another question is about the following call sequence: parseExpression => parseTerm => parseFactor => parseTupleValue => parseIdentifierOrNil => noisyParseIdentifierOrNilList => noisyParseIdentifierOrNil =>noisyParseIdentifierDefinitionTerminal

but isn't parseExpression supposed to call parseIdentifierUsageTerminal?? I'm slightly confused about this. In Noisy statements, expr comes after assignOp. I understand that in LHS, it can call noisyParseIdentifierDefinitionTerminal

Support for dot product, cross product, integral, derivative operators in Noisy

This is needed if Newton wants to support vector operators and time operators.

Vectors are a commonplace mathematical way to express physical quantities. If there are multiple sensors that describe the same Physics, such as acceleration x, y, z, and we want to support operations like dot product, we need a way to support this.
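For reference, the semantics that the requested dot and cross product operators would need to implement for three-axis sensor readings can be sketched in C as follows. The Vector3 type and the function names are hypothetical and are not part of the Noisy or Newton sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-axis quantity, e.g., acceleration x, y, z. */
typedef struct
{
	double	x;
	double	y;
	double	z;
} Vector3;

/* Build a vector from three per-axis sensor readings. */
Vector3
vectorFromSamples(const double xyz[3])
{
	Vector3	v;

	assert(xyz != NULL);

	v.x = xyz[0];
	v.y = xyz[1];
	v.z = xyz[2];

	return v;
}

/* Dot product: a scalar, as in work = dot(force, displacement). */
double
vectorDot(Vector3 a, Vector3 b)
{
	return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Cross product: a vector perpendicular to both operands. */
Vector3
vectorCross(Vector3 a, Vector3 b)
{
	Vector3	c;

	c.x = a.y * b.z - a.z * b.y;
	c.y = a.z * b.x - a.x * b.z;
	c.z = a.x * b.y - a.y * b.x;

	return c;
}
```

A language-level operator would additionally have to check that both operands have compatible dimensions, which this sketch does not attempt.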

5. Develop the 12 or more example Newton descriptions.

Make sure the Newton compiler parses the 24 examples and API calls on those examples are correct. Write test driver code to test the 24 examples. This means I should pass Evaluation metric 1, 2, 3, and 4.

by end of early May

Alternative syntax for defining base dimensions / signals

Rather than

dimensionTypeNames { time = "s"; }

I think a more readable syntax would be:

time : signal =
{
        name = "second" EN-US
        symbol = "s";
        derivation = none;
}

time : signal =
{
        name = "hour" EN-US
        symbol = "hr";
        derivation = time/60;
}

distance : signal =
{
        name = "meter" EN-US
        symbol = "m";
        derivation = none;
}

mass : signal =
{
        name = "kilogram" EN-US
        symbol = "kg";
        derivation = none;
}

...

temperature : signal =
{
        name = "kelvin" EN-US
        symbol = "K";
        derivation = none;
}

temperature : signal =
{
        name = "celsius" EN-US
        symbol = "C";
        derivation = temperature-273;
}

...

pressure : signal
{
        name = "Pascal" EN-US
        units = "Pa";
        derivation = mass*(1/distance)*(1/time)*(1/time);
}

The above definition for the concept of time has the advantage that it reads more or less as "The identifier time is of type signal and is defined as having name "second" in language localization EN-US, symbol s (regardless of language) and derivation none (i.e., it is a base concept)."

Separating the definitions for the name and units provides more semantic information and clarity. More information, because we now specify that "second" is just the US-English name, which allows tools to do automatic translation. More clarity, because we separate the symbol (s) from the core concept (time as a base quantity that might be used in defining the derivation for pressure).

We can then define multiple derivations/versions of time, but only one should have derivation = none, and that should be for the base / SI unit.

For a given declaration in Noisy, we could then have variable : int temperature("celsius") to denote the type and conversion relative to base units.

Similarly, in the new syntax example above, the block for pressure at the end of the code block can be read as "The identifier pressure is a signal and is defined as having name "Pascal" in language localization EN-US and units Pa, and is derived as mass*(1/distance)*(1/time)*(1/time)".

Note that, in the above definition, I think the derivation should be in terms of the signal type names like time, mass etc, rather than in terms of their units (i.e., "s", "kg", etc.). In other words, I much prefer the above definition of pressure to, say, the following:

pressure : signal
{
        units = "Pa";
        derivation = kg*(1/m)*(1/s)*(1/s);
}

You can then use this single syntax for what is now handled separately in dimensionTypeNames and dimensionAliases...
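One way to see why derivations in terms of signal type names are convenient is to represent each signal as a vector of integer exponents over the base signals, so that multiplication adds exponents and 1/x negates them. The sketch below evaluates the pressure derivation mass*(1/distance)*(1/time)*(1/time) in that representation. The types and function names are hypothetical and do not reflect how the Newton compiler stores dimensions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical base signals; only the three needed for pressure. */
typedef enum
{
	kBaseMass = 0,
	kBaseDistance,
	kBaseTime,
	kBaseSignalCount
} BaseSignal;

/* A dimension is one integer exponent per base signal. */
typedef struct
{
	int	exponents[kBaseSignalCount];
} Dimension;

/* A base signal has exponent 1 for itself and 0 elsewhere (derivation = none). */
Dimension
baseSignal(BaseSignal which)
{
	Dimension	d = {{0}};

	assert((int)which >= 0 && (int)which < kBaseSignalCount);
	d.exponents[which] = 1;

	return d;
}

/* Multiplying two signals adds their exponents. */
Dimension
dimensionMultiply(Dimension a, Dimension b)
{
	for (int i = 0; i < kBaseSignalCount; i++)
	{
		a.exponents[i] += b.exponents[i];
	}

	return a;
}

/* The reciprocal 1/a negates every exponent. */
Dimension
dimensionInverse(Dimension a)
{
	for (int i = 0; i < kBaseSignalCount; i++)
	{
		a.exponents[i] = -a.exponents[i];
	}

	return a;
}

bool
dimensionEqual(Dimension a, Dimension b)
{
	for (int i = 0; i < kBaseSignalCount; i++)
	{
		if (a.exponents[i] != b.exponents[i])
		{
			return false;
		}
	}

	return true;
}

/* derivation = mass*(1/distance)*(1/time)*(1/time) */
Dimension
pressureDimension(void)
{
	Dimension	perDistance = dimensionInverse(baseSignal(kBaseDistance));
	Dimension	perTime = dimensionInverse(baseSignal(kBaseTime));
	Dimension	d = dimensionMultiply(baseSignal(kBaseMass), perDistance);

	d = dimensionMultiply(d, perTime);

	return dimensionMultiply(d, perTime);
}
```

In this representation pressure has exponents 1, -1, and -2 for mass, distance, and time, independent of whether the quantities are later printed in "Pa", "kg", "m", or "s".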

Implement type checker

Implement type checker (complete implementation and create a pull request to merge into master branch).

Tstring vs TstringConst

Consider changing Tstring to TstringTypeName
Tint to TintTypeName

The issue was that kNewtonIrNodeType_TstringConst is literally the string string, and kNewtonIrNodeType_Tstring just means the data type is string
