ml4ai / delphi
Framework for assembling causal probabilistic models from text and software.
Home Page: http://ml4ai.github.io/delphi
License: Apache License 2.0
If we are modifying OpenFortranParser to perform program analysis, we should keep the modified source code under version control as well.
This is my current understanding of how the system has to be run:
Creation of the model:
root@59c89406cc39:/src/delphils# ./delphi.py --create_model --indra_statements data/sample_indra_statements.pkl --adjective_data data/adjectiveData.tsv --output_cag_json /out/testDelphiDanielJSON --output_dressed_cag /out/testDelphiDanielCAG --output_variables_path /out/testDelphiDanielVar
Execution of the model:
root@59c89406cc39:/src/delphils# ./delphi.py --execute_model --input_dressed_cag /out/testDelphiDanielCAG --input_variables_path /out/testDelphiDanielVar --output_sequences /out/DelphiSequencesResult.csv
The data is contained in shape files, so we will need to figure out how to use those.
The current structure of the pgm.json file is as follows:
{
  "start": <name_of_PROGRAM_module>,
  "name": "pgm.json",
  "dateCreated": <date_of_creation>,
  "functions": [<list_of_functions>]
}
This means that the "start" key is only created when there is a PROGRAM module in the FORTRAN file. Neither PETPT.for nor PETASCE.for has a PROGRAM module; both contain only SUBROUTINEs. For these files, a "start" field is not created, and the pgm/lambdas generation script crashes.
For now, I will add an initial check: search for a PROGRAM module and, if none is found, add a dummy "start" field. Moving forward, how should we represent such FORTRAN files?
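The workaround described above could look something like the following sketch. The function name `ensure_start_field` and the placeholder value are assumptions for illustration, based on the pgm.json structure shown earlier, not the actual genPGM.py code:

```python
def ensure_start_field(pgm: dict, default_start: str = "__dummy_start__") -> dict:
    """If the Fortran source had no PROGRAM module, the generated PGM dict
    has no 'start' entry; insert a placeholder so downstream scripts
    that expect the key don't crash."""
    if "start" not in pgm:
        pgm["start"] = default_start
    return pgm

# A pgm dict generated from a SUBROUTINE-only file like PETPT.for:
pgm = {"name": "pgm.json", "functions": []}
ensure_start_field(pgm)
print(pgm["start"])  # __dummy_start__
```

A dict that already has a "start" key (i.e. the file did contain a PROGRAM module) passes through unchanged.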
It would be good to add tooltips to the edges for visualization purposes.
@adarshp : I'd like to kill the json_dev branch. Rather than merge with master (which is far ahead), I'm going to create a new branch called pa_dev which will be for program analysis. But before I go ahead and kill json_dev, I just wanted to run it by you.
Requested by @dgarijo from the ISI team.
@adarshp, I took a look at the spec for GrFN.
https://delphi.readthedocs.io/en/master/grfn_spec.html#top-level-grfn-specification
This is a great representation. Kind of like a higher level IR for translating between languages.
One thing I noticed is this note:
TODO: we think Fortran is restricted to integer values for iteration variables, which would include iteration over indexes into arrays. Need to double check this.
If the GrFN schema is going to work for multiple languages it is going to need to support iterator loops like C++, Python, Julia have.
I guess you could have int-loop and iterator loop or a per language loop construct.
I'm getting a unique constraint error on experiment.id [SQL INSERT INTO experiment (... ) ...] when running multiple experiments (different sets of interventions) one after the other in the same server session.
In order to connect Delphi to other workflows in MINT, we may need to run the model with a subset of variables of the original CAG. In those cases, it would be useful to write a function that takes in a list of variables, and removes nodes in the CAG that are not supposed to be in the CAG, prior to execution.
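Since the CAG is a networkx graph under the hood, the pruning function described above could be sketched as follows (the function name `restrict_to_variables` is an assumption for illustration, not an existing Delphi API):

```python
import networkx as nx

def restrict_to_variables(cag: nx.DiGraph, variables: list) -> nx.DiGraph:
    """Return a copy of the CAG containing only the requested variables;
    nodes not in the list are removed along with their incident edges."""
    keep = set(variables)
    pruned = cag.copy()
    pruned.remove_nodes_from([n for n in cag.nodes if n not in keep])
    return pruned

# Hypothetical example CAG:
G = nx.DiGraph([("rainfall", "crop_yield"),
                ("temperature", "crop_yield"),
                ("crop_yield", "food_security")])
H = restrict_to_variables(G, ["rainfall", "crop_yield"])
print(sorted(H.nodes))  # ['crop_yield', 'rainfall']
```

Copying before removal leaves the original CAG intact, so the full model can still be executed later with a different variable subset.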
The lastUpdated field of a CausalVariable should be the starting point/timestamp for the most recent intervention.
The program analysis code has only been tested on some fairly small programs. So it is likely that there are program constructs it has not seen before, such as arrays, strings and while loops. Handlers will need to be added into the program analysis code for these constructs.
I've already found need to refer to parts of the example DBN-JSON in the program_analysis_kickoff_demo jupyter notebook. Is it possible to add line numbers?
@pauldhein and I were discussing this a while ago - it occurred to us that since the output of the program analysis pipeline is a pickled Python object, there is no reason, in principle, why the lambda functions couldn't be pickled alongside the rest of the output. For example, the following function in PETPT_lambdas.py:
def PETPT__lambda__TD_0(TMAX, TMIN):
    TD = ((0.6*TMAX)+(0.4*TMIN))
    return TD
Could be constructed as follows:
PETPT__lambda__TD_0 = eval("lambda TMAX, TMIN: ((0.6*TMAX)+(0.4*TMIN))")
Here, the string argument to eval could be constructed in the same way that the second line of each existing lambda function in the lambdas.py files is built up from parsing the XML AST output of OFP.
Alternatively (and this seems to me to be the right way), one could take advantage of type annotations and use the more powerful def syntax for declaring functions -
exec("def PETPT__lambda__TD_0(TMAX: float, TMIN: float) -> float: return ((0.6*TMAX)+(0.4*TMIN))")
(assuming we can get these types - can we?)
and later the PETPT__lambda__TD_0 object can be used as a value in the dict object produced by genPGM.py.
Since functions are first-class objects in Python, you can actually set attributes on functions as well - perhaps this might make it easier to keep track of things like the function type (assign/lambda/condition, etc.), the reference, and so on:
PETPT__lambda__TD_0.fn_type = "lambda"
PETPT__lambda__TD_0.reference = 9
PETPT__lambda__TD_0.target = "TD"
And then if someone wants to serialize the GrFN object to a JSON file, we could define the following function:
import inspect

def to_json_serialized_dict(function):
    return {
        "name": function.__name__,
        "type": function.fn_type,
        "target": function.target,
        # inspect.signature(...) objects are not JSON-serializable, so
        # extract the parameter names as a plain list of strings instead.
        "sources": list(inspect.signature(function).parameters),
    }
Not super urgent, but I do think it might be an investment worth making to simplify things in the long run...
Right now, there is a 'delay' in the updating of nodes in a ProgramAnalysisGraph (which should be integrated better into the AnalysisGraph class) that makes it so that 'downstream' outputs like YIELD_EST are updated a couple of steps after 'upstream' outputs like RAIN (in the example crop_yield.f). This results in the DAY variable, which serves as the loop index, lagging behind by 2 compared to the FORTRAN program.
As suggested by @pauldhein , we should have a config file for Delphi where users can specify the locations of external resources.
Basically, this is the idea: an analyst should be able to 'set up' the workspace with a script before launching the visualizer/simulator. Thus, the app object should be available to import, and app.run() can be called after the setup to launch the app with the desired configuration.
Things that need to be able to be pulled into the workspace:
(Just realized I should have first created this as an issue!)
@stephensj2 : @pauldhein is making good progress getting the DBN-JSON to DBN graph wiring working. Paul identified a change in the DBN-JSON output that we'd like to ask you to make -- this is just for the loop_plate specification. Up to this point, my thought was that the "input" attribute of the loop_plate spec should list all of the variables that are referenced within the loop_plate. It turns out, it is much more useful to Paul to have this be the list of variable names that are set in the scope (container fn) that the loop_plate appears within. And in this case, we don't need actual <variable_reference>s (no need for the index info), just the <variable_name> (the base string name of the variable). I've updated the description of the Function Loop Plate Specification to reflect this (text in maroon).
Is it easy for you to make this change?
Appropriate handlers need to be created for library functions (such as read, write, etc.) in the program analysis code. It is not clear to me at this moment whether they need to preserve the actual behavior of Fortran's libraries, though. By that, I mean I don't think Delphi cares about the particulars of the call. I believe it only cares that the code is receiving input or producing output. If that is the case, rather than creating handlers to translate Fortran library calls into Python calls, it may be possible to instead replace all input calls with a single call to a function like 'input' and all output calls with a function like 'output'. The user can then define what an input and output call is.
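The idea above could be sketched with two user-overridable stubs. The names `delphi_input` and `delphi_output` are illustrative assumptions, not part of the actual codebase:

```python
def delphi_input(prompt=None):
    """Stand-in for any Fortran READ; users can override this
    to supply data from whatever source they like."""
    return None

def delphi_output(*values):
    """Stand-in for any Fortran WRITE/PRINT; users can override this
    to capture or redirect output. Here it just joins the values."""
    return " ".join(str(v) for v in values)

# A translated 'WRITE(*,*) YIELD_EST' would then become something like:
line = delphi_output("YIELD_EST =", 89.5)
print(line)  # YIELD_EST = 89.5
```

With this scheme, the translator only needs to recognize that a statement performs I/O; the semantics of the call are delegated entirely to the user-supplied implementations.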
The program analysis code does not yet handle function returns. This is in part because the handling of such returns requires a few non-trivial additions. As a note, a good resource on functions can be found here:
https://pages.mtu.edu/~shene/COURSES/cs201/NOTES/F90-Subprograms.pdf
The following additions must be made:
delphi/core.py imports the class Influence from the indra.statements module. However, the latest indra release, 1.5, which is the one that pip installs by default from PyPI, does not include this class, since the feature was introduced after that release.
I suggest the following as an interim solution:
- Modify the requirements.txt file to point at the indra git repository instead of PyPI.
- Use dependency_links in setup.py to make it install indra using git instead of PyPI.
And, for later releases:
- Pin the versions in requirements.txt and setup.py to avoid confusion and prevent problems due to backwards-incompatible changes in any of the dependencies.
@JiamingHao I'm opening this issue for us to have a space to discuss the API implementation.
TODO: Make the following function name changes in delphi/export.py:
to_json -> to_json_file
to_json_dict -> to_dict
We need to make the API documentation reasonably complete.
@dgarijo: Re: our email conversation - it is possible to set the output path of the result folder while creating the model using the --model_dir flag. However, this is probably not so clear from the help message. In any case, I'll add flags to the delphi CLI to specify the separate locations of the output model files.
Todo: write functions to read DBN JSON from AST analysis and output DBN in delphi's internal representation.
I'm having some issues with global state in genPGM.py - basically, the lambdas.py file seems to change upon multiple runs of the same function with the same inputs.
Steps to reproduce (assuming you are in the Delphi repo root directory) -
cd delphi/program_analysis/autoTranslate
./autoTranslate ../../../data/program_analysis/crop_yield.f
python
Then in the Python interpreter, do:
>>> from delphi.program_analysis.autoTranslate.scripts.genPGM import get_asts_from_files, create_pgm_dict
>>> asts = get_asts_from_files(['crop_yield.py'])
>>> pgm_dict = create_pgm_dict('lambdas.py', asts, 'pgm.json')
The lambdas.py file is unchanged by this:
def UPDATE_EST__lambda__TOTAL_RAIN_0(TOTAL_RAIN, RAIN):
    TOTAL_RAIN = (TOTAL_RAIN+RAIN)
    return TOTAL_RAIN

def UPDATE_EST__lambda__IF_1_0(TOTAL_RAIN):
    return (TOTAL_RAIN<=40)

def UPDATE_EST__lambda__YIELD_EST_0(TOTAL_RAIN):
    YIELD_EST = (-((((TOTAL_RAIN-40)**2)/16))+100)
    return YIELD_EST

def UPDATE_EST__lambda__YIELD_EST_1(TOTAL_RAIN):
    YIELD_EST = (-(TOTAL_RAIN)+140)
    return YIELD_EST

def CROP_YIELD__lambda__MAX_RAIN_0():
    MAX_RAIN = 4.0
    return MAX_RAIN

def CROP_YIELD__lambda__CONSISTENCY_0():
    CONSISTENCY = 64.0
    return CONSISTENCY

def CROP_YIELD__lambda__ABSORPTION_0():
    ABSORPTION = 0.6
    return ABSORPTION

def CROP_YIELD__lambda__YIELD_EST_0():
    YIELD_EST = 0
    return YIELD_EST

def CROP_YIELD__lambda__TOTAL_RAIN_0():
    TOTAL_RAIN = 0
    return TOTAL_RAIN

def CROP_YIELD__lambda__RAIN_0(DAY, CONSISTENCY, MAX_RAIN, ABSORPTION):
    RAIN = ((-((((DAY-16)**2)/CONSISTENCY))+MAX_RAIN)*ABSORPTION)
    return RAIN
However, upon calling this function a second time:
>>> pgm_dict = create_pgm_dict('lambdas.py', asts, 'pgm.json')
The numbers at the end of the function names in lambdas.py get incremented by one:
def UPDATE_EST__lambda__TOTAL_RAIN_1(TOTAL_RAIN, RAIN):
    TOTAL_RAIN = (TOTAL_RAIN+RAIN)
    return TOTAL_RAIN

def UPDATE_EST__lambda__IF_1_1(TOTAL_RAIN):
    return (TOTAL_RAIN<=40)

def UPDATE_EST__lambda__YIELD_EST_2(TOTAL_RAIN):
    YIELD_EST = (-((((TOTAL_RAIN-40)**2)/16))+100)
    return YIELD_EST

def UPDATE_EST__lambda__YIELD_EST_3(TOTAL_RAIN):
    YIELD_EST = (-(TOTAL_RAIN)+140)
    return YIELD_EST

def CROP_YIELD__lambda__MAX_RAIN_1():
    MAX_RAIN = 4.0
    return MAX_RAIN

def CROP_YIELD__lambda__CONSISTENCY_1():
    CONSISTENCY = 64.0
    return CONSISTENCY

def CROP_YIELD__lambda__ABSORPTION_1():
    ABSORPTION = 0.6
    return ABSORPTION

def CROP_YIELD__lambda__YIELD_EST_1():
    YIELD_EST = 0
    return YIELD_EST

def CROP_YIELD__lambda__TOTAL_RAIN_1():
    TOTAL_RAIN = 0
    return TOTAL_RAIN

def CROP_YIELD__lambda__RAIN_1(DAY, CONSISTENCY, MAX_RAIN, ABSORPTION):
    RAIN = ((-((((DAY-16)**2)/CONSISTENCY))+MAX_RAIN)*ABSORPTION)
    return RAIN
This side effect needs to be eliminated.
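One way to remove this kind of side effect is to make the name-suffix counter local to each invocation rather than module-level state. A minimal sketch of the pattern (not the actual genPGM.py internals):

```python
from collections import defaultdict

def make_name_generator():
    """Return a fresh per-call suffix counter, so each run of the PGM
    builder starts numbering lambdas from 0 instead of continuing
    a global, module-level count across runs."""
    counts = defaultdict(int)
    def next_name(base):
        name = f"{base}_{counts[base]}"
        counts[base] += 1
        return name
    return next_name

gen = make_name_generator()
print(gen("UPDATE_EST__lambda__YIELD_EST"))  # UPDATE_EST__lambda__YIELD_EST_0
print(gen("UPDATE_EST__lambda__YIELD_EST"))  # UPDATE_EST__lambda__YIELD_EST_1
gen2 = make_name_generator()  # a second run gets its own counter
print(gen2("UPDATE_EST__lambda__YIELD_EST"))  # UPDATE_EST__lambda__YIELD_EST_0
```

If create_pgm_dict constructed such a generator at the top of each call and threaded it through, repeated calls with the same inputs would produce identical lambdas.py files.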
Replying to your comment, @cl4yton - we can't place docstrings in arbitrary places for Python to automatically process them, that's true. However, we can always modify the __doc__ attribute of objects to set their docstrings, for example:
def function_name():
    pass

function_name.__doc__ = "docstring"
Pandas is listed twice as a dependency in setup.py.
I've been making up tests for various Fortran language constructs that for2py will have to handle (currently: I/O and modules; soon: multi-dimensional arrays). Right now these tests are in a couple of different places: some are in delphi/tests/data and some are in delphi/delphi/program_analysis/autoTranslate/tests/test_data/. It would be good for all of these to live in the same place. Where should I put them?
To incorporate delphi into a workflow, the inputs and outputs must be explicitly specified - right now the path to the gradable adjective data file is hard-coded into the system. @dgarijo
We should move away from requiring intermediate files for program analysis and instead be writing functions to produce and consume Python internal data structures - this will help with integrating the program analysis components with the rest of Delphi.
Right now, Delphi uses data tables stored as plain text files to parameterize its models. However, this will not scale with increasing amounts of data. Another concern is minimization of git repo bloat. For these reasons, it might be good to have an online database (hosted on vision or a SISTA server) that Delphi can query programmatically. I'm leaning towards Neo4j since that's the DB system I have the most experience with.
Right now, all experiments are being saved to the database - need to change this behavior to only save the experiments that are successfully run.
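The desired behavior could follow a simple run-then-save pattern, where persistence only happens after the run completes. This is a hedged sketch; the function names and the shape of the save callback are assumptions, not the actual Delphi database code:

```python
def run_and_save(experiment, run_fn, save_fn):
    """Run an experiment and persist it only if run_fn completes
    without raising; failed runs never reach the database."""
    try:
        result = run_fn(experiment)
    except Exception:
        return None  # failed runs are not saved
    save_fn(experiment, result)
    return result

# Hypothetical usage with an in-memory 'database':
saved = []
run_and_save("exp-1", lambda e: 42, lambda e, r: saved.append((e, r)))
print(saved)  # [('exp-1', 42)]
```

With a real ORM session, the equivalent would be to add and commit the experiment record only after the run succeeds (or to roll back the session on failure).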
While creating a fresh virtual env for delphi development using requirements, I noticed this was generated:
indra 1.7.0 has requirement networkx==1.11, but you'll have networkx 2.1 which is incompatible.
Is this a concern?
seaborn is imported and used in delphi.views but not listed as a dependency.
@adarshp : Posting as "question" for discussion, although I'm "stating" it here...
The program analysis project right now has the following components:
(1) Analyze Fortran to map to Python
(2) Analyze Python's AST to map to a CAG with functions that can be input to delphi
(3) Sensitivity analysis of delphi CAG with functions
Item (3) will be in delphi (has general use).
For now, I'd like to put parts of (2) also in the delphi project, under the directory program_analysis/ (at project root, sibling to sensitivity). For now I'll keep this in the sensitivity branch. This means probably adding Jon Stephens to the project.
Long term: This may move out depending on whether we consider the Python side of program analysis a component of delphi (which I'm currently OK with).