zhanglab / psamm
Curation and analysis of metabolic models
Home Page: https://zhanglab.github.io/psamm/
License: GNU General Public License v3.0
Currently the gapfill command will first determine all blocked compounds and then try to construct an extended model where all of the compounds are unblocked. This could be extended so that the user can specify a specific subset of the compounds to unblock.
The following model fails when running FBA. It appears that the 0.001 stoichiometric value for a in the biomass reaction is parsed as a float, while the 0.5 stoichiometric value for a in rxn_1 is parsed as a Decimal. When running FBA, these two values should be added to construct the LP equations, but a float and a Decimal cannot be added. This would be solved if the stoichiometric values for the YAML reaction (biomass in this case) were correctly parsed as Decimal.
biomass: biomass
reactions:
  - id: rxn_1
    equation: (0.5) a => (0.5) c
  - id: biomass
    equation:
      left:
        - id: a
          value: 0.001
      right:
        - id: b
          value: 0.002
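A minimal standalone reproduction of the type mismatch (the coefficient values are taken from the model above; the snippet does not use the PSAMM API):

```python
from decimal import Decimal

# The biomass coefficient is parsed as a float while the rxn_1
# coefficient is parsed as a Decimal; summing them fails.
float_coeff = 0.001
decimal_coeff = Decimal('0.5')

try:
    decimal_coeff + float_coeff
except TypeError as e:
    print('TypeError:', e)
```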
The media key in model.yaml is currently interpreted as a list of separate media (although only the first one is used, and a warning is generated if more than one is included). It has been proposed that instead the entries in the media list should be combined into one single medium. The medium definition could then be split into separate files based on a logical subdivision of the compounds.
Currently the set_offset() method is used with the Cplex solver when setting the LP problem objective. Using the set_offset() method is not strictly necessary to get the correct solution but does allow Cplex to report the correct objective value when a constant term is included in the objective. However, set_offset() is only available starting from Cplex 12.6.2, which means that PSAMM fails to work with older versions of Cplex. We can extend compatibility to older versions of Cplex by only using set_offset() when available.
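A sketch of the feature check, assuming the objective is set through an object that may or may not provide set_offset(); the function and class names below are placeholders, not the actual PSAMM or Cplex API:

```python
def set_objective_offset(objective, offset):
    # Cplex >= 12.6.2 provides set_offset(); older versions do not.
    # Skipping the call on older versions still yields a correct solution;
    # only the reported objective value omits the constant term.
    if hasattr(objective, 'set_offset'):
        objective.set_offset(offset)


class _FakeObjective(object):
    """Stand-in used here only to demonstrate both code paths."""
    def __init__(self):
        self.offset = None

    def set_offset(self, value):
        self.offset = value


obj = _FakeObjective()
set_objective_offset(obj, 5.0)       # calls set_offset()
set_objective_offset(object(), 5.0)  # silently skipped, as on older Cplex
```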
The long term plan is to port to Python 3 but currently we are blocked by some external modules that don't support Python 3 yet:
The fastgapfill command still has some hardcoded parameters: the weights of automatically generated reactions (transport/exchange), the epsilon parameter, and the model compartments. The model compartments are also needed for some other commands. It would be nice if these parameters could be specified through command line options instead of having to edit the Python code manually. This should be done by adding parser arguments in the init_parser() method and using those arguments stored in kwargs in the __call__ method.
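A hedged sketch of what the added arguments could look like; the option names, defaults and help strings are assumptions, not the actual fastgapfill interface:

```python
import argparse

def init_parser(parser):
    # Hypothetical options replacing the hardcoded parameters
    parser.add_argument('--epsilon', type=float, default=1e-5,
                        help='Threshold for a reaction flux to count as non-zero')
    parser.add_argument('--transport-weight', type=float, default=25.0,
                        help='Penalty weight of generated transport reactions')
    parser.add_argument('--exchange-weight', type=float, default=25.0,
                        help='Penalty weight of generated exchange reactions')

parser = argparse.ArgumentParser()
init_parser(parser)
args = parser.parse_args(['--epsilon', '1e-6'])
print(args.epsilon)  # 1e-06
```

In the __call__ method these values would then be read from kwargs (e.g. kwargs['epsilon']) instead of the hardcoded constants.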
Currently the compartment must be specified twice:
extracellular: e
media:
  - compartment: e
    compounds:
      - id: abc
Instead, the media should by default use the extracellular compartment for compounds.
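With the proposed default, the definition above could be reduced to the following (a sketch; assumes the compartment falls back to the extracellular key):

```yaml
extracellular: e
media:
  - compounds:
      - id: abc   # compartment defaults to the extracellular compartment, e
```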
Currently the format detection is implemented by regexp matching on the file path in multiple places in the native module. This should be refactored so that the same rules are applied consistently. In addition, we may want to allow the user to override the format detection by using a value for the format key in the same dictionary as the include key.
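For example, the override could look like this in model.yaml (a sketch; the exact key layout is a proposal, not the current format):

```yaml
reactions:
  - include: reactions/core.tsv
    format: tsv   # overrides detection based on the file extension
```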
We have an implementation of GapFind and GapFill in python as well as the previous implementation in GAMS. Currently the GAMS implementation has been tested on a number of models and we are quite confident that it is working correctly (although it has inherent limitations) but we have not yet tested the python implementation to see if it gives the same result.
The Python implementation should be tested on a couple of larger models to make sure that it gives the same result as the GAMS implementation. In addition, it would be nice to have a couple of small test cases in tests/test_gapfill.py that can be run automatically. These can be modeled on the existing tests in tests/test_fastcore.py.
Hello everybody,
when I try to import SBML files in PSAMM, the terminal does not find the psamm-import command.
(PSAMM) arturo@arturo-HP:~/Modelos/PSAMM_pruebas$ psamm-import sbml --source e_coli_core.xml --dest E_coli_yaml
psamm-import: command not found
There seems to be a performance regression in Cplex 12.6.2 compared to 12.6 with some of the MILP problems that we are using. Running FBA with thermodynamic constraints on the iJO1366 model takes more than 10 minutes with the new Cplex 12.6.2 but only ~12 seconds with 12.6. In comparison, the tFBA on iJO1366 takes ~1:30 minutes with Gurobi 6.0.4. The same performance degradation also happens in PSAMM 0.11 and 0.10.2.
We are using a customized (~8 lines of code) parser in a number of places to parse lists of reactions, or lists of (reaction, value)-tuples. In one case, we are parsing a table containing reaction IDs in the first column and lower bounds in the second column. The optional third column contains the upper bound.
Common to all of these parsers is that comments, starting with '#', are filtered out. In addition, all kinds of whitespace should be acceptable as a column separator (this is so that columns can be lined up nicely). Both of these requirements make the built-in csv module unsuitable, so we need to keep the existing solution, but we can factor out the code so we don't reimplement the same parser a number of times. The following shows one example of this type of parser, parsing the penalty weights for reactions:
for line in kwargs['penalty']:
    line, _, comment = line.partition('#')
    line = line.strip()
    if line == '':
        continue
    rxnid, weight = line.split(None, 1)
    weights[rxnid] = float(weight)
The new parser should at least be able to handle 1) files with only one column (e.g. model specification files) and 2) files with two columns. The format must be specified when the parser is called. It would be nice if it could also handle the case where there are two columns and one optional third column. The new parser can be put in the util module.
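One possible shape for the shared parser (a sketch for the util module; the name and signature are assumptions):

```python
def parse_table(f, required, optional=0):
    """Yield tuples of whitespace-separated fields from each line.

    Comments starting with '#' are stripped and blank lines are skipped.
    Each line must have at least `required` fields; up to `optional`
    additional fields are accepted. Missing optional fields yield None.
    """
    for i, line in enumerate(f):
        line, _, _ = line.partition('#')
        fields = line.split()
        if len(fields) == 0:
            continue
        if not required <= len(fields) <= required + optional:
            raise ValueError('Line {}: expected {} to {} fields, found {}'.format(
                i + 1, required, required + optional, len(fields)))
        fields += [None] * (required + optional - len(fields))
        yield tuple(fields)


# Two required columns (reaction ID, lower bound) and an optional upper bound
lines = ['rxn_1  -10  # glucose uptake', '', 'rxn_2  0  1000']
rows = list(parse_table(lines, 2, optional=1))
# rows == [('rxn_1', '-10', None), ('rxn_2', '0', '1000')]
```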
The medium table format supports dash (-) to indicate that the default value should be used. For consistency, this should be supported in the limits table too.
Currently, the code that is responsible for writing YAML model files is embedded in psamm-import. This means that API users will have to copy this code from psamm-import or reimplement it in order to write new model files or convert existing files. Since the YAML-model reading code is already in the main PSAMM package, it would make sense to also include the YAML-writing code.
Currently it is only possible to use the default value of the epsilon parameter for the gapfind and gapfill functions. Using the default value fails for some models that require very small fluxes to be viable.
Currently the gene associations are interleaved with the reaction database. This makes it hard to use shared reaction databases for different organisms. @keitht547 suggested moving the gene associations from the reaction database into the model specification (list of reactions in the model).
Currently, to fix the flux of a reaction to a specific value, it is necessary to specify a lower and upper bound of that specific value. Instead, we can allow a key fixed as a shorthand for setting both lower and upper bound to the same value. When this is implemented, psamm-import should be changed to use this format.
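In the limits list the shorthand could look like this (a sketch of the proposal; the reaction ID is arbitrary):

```yaml
limits:
  - reaction: rxn_1
    fixed: 10   # shorthand for lower: 10, upper: 10
```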
The minimal SBML parser in the sbml module recently broke because of refactoring in the code it depends on. This is bound to happen since the sbml module does not have any test cases yet. To catch the majority of regressions, write a small test case where an SBML model is loaded using StringIO instead of a file (https://docs.python.org/2/library/stringio.html). The tests should check that the methods of SBMLDatabase work as expected.
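The following standalone snippet illustrates the StringIO approach using only the standard library; a real test would hand the file-like object to SBMLDatabase instead of ElementTree, and the two-species model below is an assumption:

```python
from io import StringIO
import xml.etree.ElementTree as ET

# Minimal SBML level 2 document kept inline so no file on disk is needed
SBML_DOC = '''<?xml version="1.0"?>
<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="test_model">
    <listOfSpecies>
      <species id="M_a" name="A" compartment="c"/>
      <species id="M_b" name="B" compartment="c"/>
    </listOfSpecies>
  </model>
</sbml>'''

tree = ET.parse(StringIO(SBML_DOC))
ns = '{http://www.sbml.org/sbml/level2}'
species = tree.findall('.//{}species'.format(ns))
print(len(species))  # 2
```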
For fba, robustness, fva and similar commands there is an option called --reaction to select a different reaction to optimize than the biomass reaction specified in model.yaml. The name of this option is not very descriptive. --objective seems to be a better choice.
Other options: --maximize, --optimize, --biomass ...
Produce a warning during simulation (FBA, ...?) when a compound defined in the extracellular space does not have an exchange reaction. This warning would be silenced by adding the compound to the medium definition (possibly with bounds set to lower: 0, upper: 0 if no exchange in or out is desired).
To make a future transition to Python 3 easier, it would be nice to have our Python scripts use the new print function that was introduced in Python 3: http://legacy.python.org/dev/peps/pep-3105/ It can be enabled in Python 2 by adding from __future__ import print_function at the top of the Python file. In addition, we have some cases where the write method of sys.stderr is used directly. These should also be changed to use the new print function, since the print function can take a file object to print to.
A statement like
print 'Two numbers: {} {}'.format(a, b)
should be changed to the function call (after adding from __future__ import print_function at the top of the file)
print('Two numbers: {} {}'.format(a, b))
and a call to the write method on sys.stderr like
sys.stderr.write('Two numbers: {}, {}\n'.format(a, b))
should be changed to (notice that the explicit newline character at the end disappears)
print('Two numbers: {}, {}'.format(a, b), file=sys.stderr)
Currently the Cplex and QSoptex solvers are supported. QSoptex is a special case since it is an exact solver, so this leaves Cplex as the only normal solver and also the only solver to support MILP problems. Cplex is proprietary, and although they give out free academic licenses it would be nice to have support for a free, open source solver. GLPK may be the best option.
Currently the code to load models in the table-based format is somewhat embedded in the metabolicmodel and database modules. This should be split off into a separate module since the internal representation should not depend on the external data format.
The documentation only includes information on the YAML-based medium format.
This would be used by the gap filling commands instead of assuming that the name is e.
Is there an easy way to add the reactions reported by the gapfill or fastgapfill commands to a model? And do you need to add the artificial transporters and exchanges too?
We will probably need automated access to the KEGG reaction information some time in the near future. We are currently able to parse the KEGG reaction equation format, and we are also able to parse the file containing the information record on the compounds.
It would be nice to have a function in the kegg module that similarly parses the reaction records into ReactionEntry objects. The new function should be called parse_reaction_file and should take a file object and return an iterator over ReactionEntry objects. Since the file format is very similar, the code from the compound parser can be reused or even factored out into a common function. The ReactionEntry object should expose all properties through a general interface (like CompoundEntry.__getitem__) but can also include convenience access for other properties (like name, enzymes, formula, etc. in CompoundEntry). Specifically, the reaction pairs should be easily accessible.
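A simplified sketch of such a parser; the ReactionEntry class here is a minimal stand-in rather than the real PSAMM class, and the flat-file handling assumes KEGG's layout of 12-character field labels and '///' record separators:

```python
class ReactionEntry(object):
    """Minimal stand-in exposing properties like CompoundEntry does."""
    def __init__(self, properties):
        self._properties = properties

    def __getitem__(self, key):
        return self._properties[key]

    @property
    def name(self):
        return self._properties.get('name')


def parse_reaction_file(f):
    """Iterate over ReactionEntry objects parsed from a KEGG reaction file."""
    properties, key = {}, None
    for line in f:
        if line.startswith('///'):  # end of record
            if properties:
                yield ReactionEntry(properties)
            properties, key = {}, None
        elif line[:12].strip() != '':  # line introduces a new field label
            key = line[:12].strip().lower()
            properties[key] = line[12:].strip()
        elif key is not None:  # continuation of the previous field
            properties[key] += ' ' + line.strip()
    if properties:
        yield ReactionEntry(properties)


sample = ('ENTRY       R00001                      Reaction\n'
          'NAME        polyphosphate polyphosphohydrolase\n'
          'EQUATION    C00890 + n C00001 <=> (n+1) C02174\n'
          '///\n')
entries = list(parse_reaction_file(sample.splitlines(True)))
print(entries[0].name)  # polyphosphate polyphosphohydrolase
```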
When solving the LP10 problem in Fastcore with GLPK, the objective becomes very small so that GLPK seems to consider it equal to zero. It may be possible to solve this by reformulating LP10 to include the scaling within the problem, i.e. by multiplying all constraints on fluxes by the scaling.
NativeModel currently only receives an exact file name as the parameter. For creating test cases, this is not flexible enough. In addition, it should be able to take a string or file object to parse, or take a dict object and use it directly.
This command should delete a specific gene (or try all genes one-by-one?) and perform FBA on the resulting model.
Currently, the flux limits of the model are not exported to SBML when using the exportsbml command. These limits could be encoded using the COBRA-compatible extension or using the level 3 package fbc.
This issue was discovered when a user tried to use "no" as a compound name. The "no" results in a boolean value from the YAML parser instead of the expected "no" string. In the specific case the issue can be worked around by quoting the compound name with single quotes. There should be an error message and an explanation of the issue when a user tries to use a non-string type as an ID.
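For reference, the workaround looks like this (YAML 1.1 resolves the unquoted form to a boolean):

```yaml
compounds:
  - id: no     # parsed as the boolean false, triggering the error
  - id: 'no'   # single quotes make it the string "no"
```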
To be compatible with existing metabolic modeling software it would be useful to be able to export the format used in model_script to an SBML file. This would allow users to create a model using model_script and later export the data to use the tools from COBRA/COBRApy or to compare the results from model_script with those software packages.
Currently the CommandError exception is available to signal that a command failed. However, this exception causes the argument parser to print out usage information, which is only appropriate if the error was caused by incorrect arguments supplied by the user. In other cases, the command may wish to signal an error that does not cause usage information to be printed. Most commands where errors are possible either raise an exception or let an existing exception bubble up. This accomplishes the goal of exiting the command with an error code but does not provide a good error message to the user. Ideally, the stack trace of the exception should be logged (at "debug" level) and a good error message should be logged at "error" level.
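A sketch of the desired behavior; the run_command wrapper and the command callable are placeholders, not the actual PSAMM command machinery:

```python
import logging

logger = logging.getLogger('psamm.command')

def run_command(run):
    """Run a command callable; log failures instead of showing usage."""
    try:
        run()
    except Exception as e:
        # Full stack trace only at debug level; short message for the user
        logger.debug('Traceback:', exc_info=True)
        logger.error('Command failed: {}'.format(e))
        return 1
    return 0

def failing_command():
    raise ValueError('Unable to solve: model is infeasible')

status = run_command(failing_command)
print(status)  # 1
```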
If the gapfill command fails it may be necessary to run the command with a lower epsilon value. Currently an exception is raised by gapfind/gapfill which simply results in a stack trace being shown to the user. With #73 and #74 it should be possible to provide an error message to the user that explains that the user can try a lower epsilon value.
Currently, FVA, flux minimization and the consistency check functions in the fluxanalysis module use an instantiation of FluxBalanceProblem directly. It would be neat if these functions could accept an optional FluxBalanceProblem instance as a parameter (instead of solver) such that a FluxBalanceTDProblem could be passed and we would automatically have access to the corresponding thermodynamically constrained functions.
It was proposed that additional features should be allowed in the medium table format. Currently the medium table format consists of four columns (two required, two optional), specifying compound ID, compartment, lower bound and upper bound. With this proposal, additional user-defined properties should be allowed after the four existing columns.
The user defined properties should be parsed and be made accessible through the API. For example, with a user defined property "class":
compound  compartment  lower  upper  class
akg       e            0      400    carbon-source
glcD      e            -10    -      carbon-source
This would require that a header be added to the medium table format so that a key can be specified for each property. It would also change the format of the table file to be strictly tab-separated instead of being space-separated as it is now. A new class MediumEntry can be defined so that the additional user-defined properties can be held. The properties can be made available from the NativeModel through parse_medium(), which would iterate over MediumEntry objects instead of tuples.
The code for parsing reactions is currently embedded in the reaction module. The internal representation does not really depend on the external data format, so these two parts can be split up. This will reduce the complexity of the reaction module, especially as the number of reaction parsers can grow in the future.
In a number of places we are raising generic Exceptions when an error is encountered. This is discouraged since these exceptions cannot be explicitly caught, and this can hide errors when the exception is caught.
Go through all instances where we raise a generic Exception and replace it with an instance of a more specific Exception subclass. In some cases the built-in exceptions can be used (e.g. often it is appropriate to use ValueError, IndexError, etc.). If no built-in exception applies, we can create a specific one, e.g.
class FluxBalanceError(Exception):
    '''Raised when an error occurs in solving a flux balance problem'''