klamt-lab / autopacmen
Retrieves kcat data and adds protein allocation constraints to stoichiometric metabolic models according to the sMOMENT method
License: Apache License 2.0
Hey!
When running the function "get_initial_spreadsheets_with_sbml" (I'm calling the script directly, as I want my pipeline fully automated), there's a somewhat confusing print towards the end:
NOTE: project_name_protein_data.xlsx has as default value for the enzyme pool P 0.095 mmol/gDW.
I was under the impression that the enzyme pool is given in g/gDW, and indeed that is what is reported in the resulting Excel file, so I assume the printed unit is just a minor typo and should therefore be an easy fix :)
Cheers!
When running pip install autopacmen-Paulocracy, all dependencies are installed as well as the package metadata. However, none of the actual scripts in the package are installed. When checking the PyPI index and downloading the sources, it turns out that the sources on PyPI do not include the scripts, only the package metadata. The same problem appears when cloning this repository and using pip to install the package locally.
Hi @Paulocracy,
I would like to congratulate you on this absolutely fantastic piece of code/paper you published :D
I have, however, run into a problem with the kcat_database_combined.json file that is generated by create_combined_kcat_database.py. It creates kcat values for the species Salmonella enterica subsp. enterica serovar Typhimurium A0A0F6B484. This entry is not in the cache/ncbi_taxonomy folder, which causes an error later on when I use the function...
get_reactions_kcat_mapping(sbml_path, project_folder, project_name, organism, kcat_database_path, protein_kcat_database_path)
It runs halfway before raising the error 'OSError: [Errno 22] Invalid argument: C:/file_path/cache/ncbi_taxonomy/Salmonella enterica subsp. enterica serovar Typhimurium A0A0F6B484'.
It is possible to solve and allow complete model construction by manually deleting the 'Salmonella enterica subsp. enterica serovar Typhimurium A0A0F6B484' entries in the kcat_database_combined.json file.
Whilst it is possible to work around this, I thought it would be worth raising for anyone else who may have this problem.
Keep up the good work :D
When using get_reactions_kcat_mapping, you may not have any user-defined protein database, so requiring the argument protein_kcat_database_path to have a value is counterintuitive. From inspecting the function, I find that supplying an empty string as the value makes the function ignore the database. I therefore suggest making the empty string the default value for protein_kcat_database_path.
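The suggested change can be sketched as follows. This is only an illustration of the idea, not the actual autoPACMEN signature: the parameter names come from this report, and the function body is a stand-in for the real logic.

```python
# Hedged sketch: protein_kcat_database_path gets an empty-string default,
# which the function already treats as "no user-defined protein database".
def get_reactions_kcat_mapping_sketch(sbml_path, project_folder, project_name,
                                      organism, kcat_database_path,
                                      protein_kcat_database_path=""):
    # An empty string means: ignore the user-defined protein database.
    uses_protein_database = protein_kcat_database_path != ""
    return uses_protein_database
```

With this default, callers without a protein database can simply omit the argument.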
In line 45 of data_parse_brenda_textfile.py, there is a bug where the click options type=click.Path(exists=True, dir_okay=True) force the output file to already exist, even though the purpose of the function is to create it. Hence, trying to run the command fails with an error because the output path does not exist yet. I suggest changing these options to type=click.Path(file_okay=True, dir_okay=True, writable=True), as in data_parse_brenda_json_for_model.py.
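A minimal sketch of the suggested option, with an illustrative command and option name (not the actual CLI): writable=True lets click accept an output path that does not exist yet, whereas exists=True rejects it.

```python
import click

# Hedged sketch: the output path only needs to be creatable, not existing.
@click.command()
@click.option("--json_output_path",
              type=click.Path(file_okay=True, dir_okay=True, writable=True))
def parse_brenda_textfile_sketch(json_output_path):
    click.echo(json_output_path)
```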
When calling data_parse_brenda_textfile.py, I get an error due to a missing variable declaration in submodules/parse_brenda_textfile.py:
File "autopacmen/autopacmen/submodules/parse_brenda_textfile.py", line 170, in parse_brenda_textfile
word = word.replace("\t", "")
UnboundLocalError: local variable 'word' referenced before assignment
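The fix pattern can be illustrated as follows. This is not the actual patch to parse_brenda_textfile.py; it only demonstrates binding the variable before the loop so the trailing replace() cannot hit an unbound local.

```python
# Hedged illustration: initialize `word` up front so
# word.replace("\t", "") is safe even when the loop never assigns it.
def last_word_without_tabs(words):
    word = ""  # default for empty input, avoiding UnboundLocalError
    for current in words:
        word = current
    return word.replace("\t", "")
```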
Hi!
After successfully installing autopacmen-Paulocracy using pip, I cannot import the package.
import autopacmen-Paulocracy
produces a syntax error due to the hyphen. Attempting to import the package otherwise, such as
impo = __import__("autopacmen-Paulocracy")
produces a ModuleNotFoundError.
How does one access the module after a pip install?
This is an error related to issues #15 and #12. With the most recent versions of autoPACMEN and its dependent packages, get_initial_spreadsheets_with_sbml() produces an enzyme stoichiometry spreadsheet which looks like this (screenshot not included in this text export):
As you can see, gene annotations which consist of zero or one genes are surrounded by brackets and quotes. This causes problems in create_smoment_model_reaction_wise_with_sbml(), which does not recognize the brackets and quotes. Previously, when I used older package versions (I can't recall the exact numbers), I instead got the expected behavior, which excludes reactions without annotations and writes single-gene annotations without brackets and quotes. I assume this bug is caused by changes in either autoPACMEN or xlsxwriter, but I have not yet had time to investigate thoroughly.
Hey @Paulocracy,
I am trying to run the sMOMENT model generation and get a KeyError and a "need to pass list" warning during the call to modeling_create_smoment_model.py:
/home/miniconda3/envs/apacmen/lib/python3.8/site-packages/cobra/core/group.py:107: UserWarning: need to pass in a list
warn("need to pass in a list")
Traceback (most recent call last):
File "modeling_create_smoment_model.py", line 93, in <module>
create_smoment_model_cli()
File "/home/miniconda3/envs/apacmen/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/home/miniconda3/envs/apacmen/lib/python3.8/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/home/miniconda3/envs/apacmen/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/miniconda3/envs/apacmen/lib/python3.8/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "modeling_create_smoment_model.py", line 84, in create_smoment_model_cli
create_smoment_model_reaction_wise_with_sbml(input_sbml_path, output_sbml_name, project_folder, project_name,
File "/home/Software/autopacmen/autopacmen/submodules/create_smoment_model_reaction_wise.py", line 337, in create_smoment_model_reaction_wise_with_sbml
create_smoment_model_reaction_wise(model, output_sbml_name,
File "/home/Software/autopacmen/autopacmen/submodules/create_smoment_model_reaction_wise.py", line 267, in create_smoment_model_reaction_wise
number_units = reaction_id_gene_rules_protein_stoichiometry_mapping[
KeyError: 'A0A2K3DYJ4'
Since this is the first reaction linked to a single gene / homomeric enzyme, I think it has something to do with the parsing of the GPR rules. Thanks a lot for designing this tool and for your help!
Hi @Paulocracy,
in the updated version of "autopacmen/submodules/get_initial_spreadsheets.py", single enzyme names are stored as lists and contain brackets inside the Excel sheet.
This later produces an error when trying to read single enzyme names.
I am a bit puzzled by the method used to determine protein mass from UniProt (https://github.com/ARB-Lab/autopacmen/blob/69a158003d5bab3f597ec5da727515d250f35a43/autopacmen/submodules/get_protein_mass_mapping.py#L133). First, UniProt is queried for the amino acid sequence, and then the sequence is analyzed for molecular mass. However, UniProt can be queried directly for the mass, e.g. (https://www.uniprot.org/uniprot/?query=HXKA_YEAST%20OR%20G6PI_YEAST&format=tab&columns=id,mass). Why is this simpler approach not used in AutoPACMEN? Beware though: UniProt outputs the mass with a comma as thousands separator, so you have to write something like float(mass.replace(',', '')) to parse the result.
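The comma-separator pitfall can be shown with a small parsing example. The TSV row below is a made-up illustration, not real UniProt output:

```python
# Strip the comma thousands separator before converting to float.
row = "HXKA_YEAST\t53,738"  # hypothetical "accession<TAB>mass" row
accession, mass_text = row.split("\t")
mass_in_daltons = float(mass_text.replace(",", ""))
```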
From my understanding, UniProt's web API has changed since autoPACMEN was created, and so the UniProt ID -> protein mass mapping no longer works (https://github.com/klamt-lab/autopacmen/blob/cb828391d4cbb17e50ba9752cc974d78775d836d/autopacmen/submodules/get_protein_mass_mapping.py#L116C8-L116C105). However, one of my group members has come up with a suggested solution, namely to replace
uniprot_query_url = f"https://www.uniprot.org/uniprot/?query={query}&format=tab&columns=id,mass"
with
uniprot_query_url = f"https://rest.uniprot.org/uniprotkb/search?query=accession:{query}&format=tsv&fields=accession,mass"
I have tested both approaches in the web browser and can confirm that the old query string no longer works, whereas the newly suggested one does.
I have noticed some strange code related to the lookup of kcat values:
autopacmen/autopacmen/submodules/get_reactions_kcat_mapping.py
Lines 64 to 67 in 180e382
I believe this block of code is there to ensure that forward and backward kcat values are not mixed together. Note, however, that the same action is taken regardless of the truth value of kcat_direction == searched_direction == "forward". I suggest that the code should have been:
# Ensure that only kcat values for the same
# direction are used
if kcat_direction == searched_direction:
    max_kcats.append(max_kcat)
However, I ask you to double check this as I might be missing something.
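The suggested guard can be restated as a small runnable function. The data shape here (a list of direction/kcat tuples) is an assumption for illustration; the real code iterates database entries:

```python
# Keep a kcat only when its annotated direction matches the searched one.
def collect_max_kcats(entries, searched_direction):
    max_kcats = []
    for kcat_direction, max_kcat in entries:
        if kcat_direction == searched_direction:
            max_kcats.append(max_kcat)
    return max_kcats
```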
Hello :)
I would like to integrate proteomics data, but it seems that reactions with isoenzymes are not split correctly. I saw it with my metabolic model, but also in the model provided here, "iJO1366_sMOMENT_2019_06_25_GECKO_ANALOGON.xml". For example, the reaction PFK is split into PFK_GPRSPLIT_1 and PFK_GPRSPLIT_2. Each split reaction should have only one of the isoenzymes among its reactants, but the two reactions are identical: they contain both isoenzymes and also the protein pool.
<listOfReactants>
<speciesReference species="M_atp_c" stoichiometry="1" constant="true"/>
<speciesReference species="M_f6p_c" stoichiometry="1" constant="true"/>
<speciesReference species="M_ENZYME_b3916" stoichiometry="3.77565590812736e-06" constant="true"/>
<speciesReference species="M_ENZYME_b1723" stoichiometry="3.77565590812736e-06" constant="true"/>
<speciesReference species="M_prot_pool" stoichiometry="0.000122541206209238" constant="true"/>
</listOfReactants>
<listOfProducts>
<speciesReference species="M_adp_c" stoichiometry="1" constant="true"/>
<speciesReference species="M_fdp_c" stoichiometry="1" constant="true"/>
<speciesReference species="M_h_c" stoichiometry="1" constant="true"/>
<speciesReference species="M_armm_PFK" stoichiometry="1" constant="true"/>
</listOfProducts>
I have been trying to figure out how to fix it, and one problem I found is in the script create_smoment_model_reaction_wise.py.
The function get_model_with_separated_measured_enzyme_reactions() updates the objects reaction_id_gene_rules_mapping and reaction_id_gene_rules_protein_stoichiometry_mapping, where I assume it creates new gene rules for the split reactions.
However, later in the script, in the main loop, the string "_GPRSPLIT_" is removed from the reaction_id, and the gene rules are then retrieved for the original reaction instead of the split reactions.
Another problem seems to be that the protein_pool metabolite is always added, even if proteomics data is available. I think this is because line 304, reaction.add_metabolites(metabolites), is always run, even if the reaction already contains the individual enzyme. Would it make sense to add a condition so that this line is not run when all proteomics data is available?
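The proposed condition could look something like the sketch below. The species prefixes follow the SBML excerpt above, but the function itself is hypothetical; the real autoPACMEN code works on cobra objects rather than plain dictionaries:

```python
# Hedged sketch: only add the shared protein pool pseudo-metabolite when
# at least one enzyme of the reaction lacks a proteomics measurement.
def metabolites_with_optional_pool(reaction_metabolites, measured_enzymes,
                                   pool_stoichiometry):
    enzyme_ids = [m for m in reaction_metabolites if m.startswith("M_ENZYME_")]
    metabolites = dict(reaction_metabolites)
    if not enzyme_ids or not all(e in measured_enzymes for e in enzyme_ids):
        metabolites["M_prot_pool"] = pool_stoichiometry
    return metabolites
```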
Thanks in advance :)
There is a bug where organism names of the BRENDA database are not parsed correctly from the .txt file if they contain a tab ("\t").
For example:
organism line:
"Salmonella enterica subsp. enterica serovar Typhimurium A0A0F6B484 \tand A0A0F6B483 UniProt <138>"
result:
"Salmonella enterica subsp. enterica serovar Typhimurium A0A0F6B484 \tand".
Expected result:
"Salmonella enterica subsp. enterica serovar Typhimurium"
When running the function get_protein_mass_mapping_with_sbml, I got the following error: KeyError: 'G8ZSL3' at line 141 in get_protein_mass_mapping.py. The key error does not trigger on all queries, but it does on the query with UniProt ID G8ZSL3. After a manual query (https://www.uniprot.org/uniprot/?query=G8ZSL3&format=tab&columns=id,sequence,mass), we see that this accession ID is valid, but its sequence has two amino acid ambiguities. Hence, ProteinAnalysis yields a ValueError, which is handled by progressing to the next iteration of the loop. However, the dictionary uniprot_id_protein_mass_mapping does not get updated with the key 'G8ZSL3', which triggers the error when the entry is accessed later. I suggest you instead implement the solution I suggested in #8 (comment), because even though two amino acids are ambiguous, UniProt can still find a reasonable protein mass. Alternatively, you could remove the ambiguous amino acids from the string before feeding it into ProteinAnalysis, which will still give a good estimate, as ambiguous amino acids constitute only a small fraction of proteins.
https://github.com/ARB-Lab/autopacmen/blob/69a158003d5bab3f597ec5da727515d250f35a43/autopacmen/submodules/get_protein_mass_mapping.py#L133
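The second suggestion can be sketched as a simple filter. The ambiguity set below uses the IUPAC one-letter codes B, J, X, Z plus the nonstandard residues U and O; whether this exactly matches what ProteinAnalysis rejects is an assumption:

```python
# Drop ambiguous/nonstandard amino-acid codes before mass estimation;
# they constitute only a small fraction of most proteins.
AMBIGUOUS_AMINO_ACIDS = set("BJOUXZ")

def remove_ambiguous_amino_acids(sequence: str) -> str:
    return "".join(aa for aa in sequence if aa not in AMBIGUOUS_AMINO_ACIDS)
```

The cleaned string could then be fed into ProteinAnalysis instead of the raw sequence.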
This is an issue I think is due to updates in cobra. With cobra version 0.21.0, create_smoment_model_reaction_wise_with_sbml() runs in a reasonable amount of time. However, in an environment with cobra version 0.26.3, the same function consumed a prohibitive amount of time. After drilling into the details, I realized that the time was spent deepcopying metabolic reactions.
From my understanding, the autoPACMEN code has not changed on this point, but cobra has probably changed its procedures for copying reactions, leading to a very slow recursive process. As a side note, I have found cobrapy slow and cumbersome for modifying models and their components. There exists an alternative metabolic modeling package named reframed which I have used to resolve such problems, but due to technical debt, replacing cobra with reframed would take considerable effort.
Hi @Paulocracy,
I'd like to note that there might be a misleading description for one of the input values in the "protein_data.xlsx" excel sheet.
The description "Fraction of masses of model-included enzymes in comparison to all enzymes" (first sheet) made me think that this is the sum of all model-included protein masses divided by the total mass of all proteins of an organism.
But according to the GECKO appendix (section 2.5), it is something like the "fraction of unmeasured proteins of the model compared to all unmatched proteins" (which is also roughly the term you used in your code).
I hope I'm not confusing anything, but this got me puzzled :D
Hello! I just started using the package and am trying to apply it to my model. However, I noticed that the gene rules I get in the "...enzyme_stoichiometries.xlsx" file are not interpreted correctly (they still contain some "and"). The problem seems to be the function _gene_rule_as_list().
I tested the function on its own, and it does not work with the example provided in its docstring:
"(b0001 or b0002) and b0003" is returned as ['b0001', 'b0002 and b0003'] and not as [["b0001", "b0002"], "b0003"]
from typing import Any, Dict, List, Union

def _gene_rule_as_list(gene_rule: str) -> List[Any]:
    """Returns a given string gene rule in list form.

    I.e. (b0001 or b0002) and b0003 is returned as
    [["b0001", "b0002"], "b0003"]

    Arguments:
    *gene_rule: str ~ The gene rule which shall be converted into the list form.
    """
    # Gene rules: Only ) or (, (in blocks only and); No ) and (
    gene_rule_blocks = gene_rule.split(" ) or ( ")
    gene_rule_blocks = [x.replace("(", "").replace(")", "") for x in gene_rule_blocks]
    gene_rules_array: List[Union[str, List[str]]] = []
    for block in gene_rule_blocks:
        if " or " in block:
            block_list = block.split(" or ")
            block_list = [x.lstrip().rstrip() for x in block_list]
            gene_rules_array += block_list
        elif " and " in block:
            block_list = block.split(" and ")
            block_list = [x.lstrip().rstrip() for x in block_list]
            gene_rules_array.append(block_list)
        else:  # single enzyme
            gene_rules_array.append(block)
    return gene_rules_array

gene_rule = "(b0001 or b0002) and b0003"
_gene_rule_as_list(gene_rule)
This outputs:
['b0001', 'b0002 and b0003']
I hope I did not accidentally mess something up :) I would appreciate any help with this. Thank you!
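For comparison, a corrected parser matching the docstring's example could look like the sketch below. This is not the official patch; it assumes gene rules have no nested parentheses, as in the examples above:

```python
import re

# Hedged sketch: split on top-level " and " only (the negative lookahead
# skips any "and" that sits inside parentheses), then turn each
# parenthesized group into an OR-block list.
def gene_rule_as_list_fixed(gene_rule: str):
    parts = re.split(r"\s+and\s+(?![^()]*\))", gene_rule)
    result = []
    for part in parts:
        part = part.strip().lstrip("(").rstrip(")")
        if " or " in part:
            result.append([g.strip() for g in part.split(" or ")])
        else:
            result.append(part)
    return result
```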
I got KeyError: '3.6.1.40.5' in line 69 of create_combined_kcat_database.py. Tracing back the error, I observed that searching for EC number 3.6.1.40.5 in SABIO-RK with autopacmen does not give any results at any wildcard level. The last lines of output from create_combined_kcat_database may shed more light on the problem:
Wildcard level 3...
['3.6.*.*.*']
Performing query [{'ECNumber': '3.6.*.*.*', 'Parametertype': 'kcat', 'EnzymeType': 'wildtype'}]...
SABIO-RK API error with query: ((ECNumber:3.6.*.*.* AND Parametertype:kcat AND EnzymeType:wildtype))
Wildcard level 4...
['3.*.*.*.*']
Performing query [{'ECNumber': '3.*.*.*.*', 'Parametertype': 'kcat', 'EnzymeType': 'wildtype'}]...
SABIO-RK API error with query: ((ECNumber:3.*.*.*.* AND Parametertype:kcat AND EnzymeType:wildtype))
SABIO-RK of course has entries for these high wildcard levels, but there might just be too many of them for the API to return any results. This means that even with the wildcard search, you may expect some EC numbers for which no entry is obtainable. Consequently, the case where a SABIO-RK entry is not available must be handled when building the combined database.
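One simple way to handle a missing SABIO-RK result is a fallback lookup. The dictionary below is a toy illustration, not the real combined-database structure:

```python
# Hedged sketch: use .get() with an empty default instead of a direct
# lookup, so EC numbers with no SABIO-RK result (e.g. 3.6.1.40.5 above)
# cannot raise KeyError.
sabio_rk_kcats = {"2.7.1.11": [33.0, 110.0]}  # toy slice of the database
kcats_for_missing_ec = sabio_rk_kcats.get("3.6.1.40.5", [])
```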
Hi!
After downloading 'brenda_download.txt', saving it in the '/ecModel_2019_06_25_input' folder, and running ec_model_2019_06_25_sMOMENT_iJO_CREATION.py, I get the following error message.
Traceback (most recent call last):
File "C:/Users/cga32/OneDrive/autopacmen-master/autopacmen/ec_model_2019_06_25_sMOMENT_iJO_CREATION.py", line 46, in <module>
parse_brenda_textfile(brenda_textfile_path, bigg_metabolites_json_folder, json_output_path)
File "C:\Users\cga32\OneDrive\autopacmen-master\autopacmen\submodules\parse_brenda_textfile.py", line 84, in parse_brenda_textfile
lines = f.readlines()
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 4133: character maps to <undefined>
I figured I could fix this by adding encoding='utf-8' to the open() call in parse_brenda_textfile.py at line 83.
However, doing this leads to another error:
Traceback (most recent call last):
File "C:/Users/cga32/OneDrive/autopacmen-master/autopacmen/ec_model_2019_06_25_sMOMENT_iJO_CREATION.py", line 46, in <module>
parse_brenda_textfile(brenda_textfile_path, bigg_metabolites_json_folder, json_output_path)
File "C:\Users\cga32\OneDrive\autopacmen-master\autopacmen\submodules\parse_brenda_textfile.py", line 147, in parse_brenda_textfile
ec_number.lower().split("(transferred to ec")[1].replace(")", "").lstrip()
IndexError: list index out of range
The script should run smoothly, so does anyone have an idea what is going wrong here? Maybe the encoding should be something other than utf-8, but from what I've understood, the brenda_download.txt file is in utf-8 format.
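Both failure modes above can be sketched in isolation. The byte string and EC line here are fabricated examples, not real BRENDA content: decoding tolerantly keeps a stray byte such as 0x8d from aborting the run, and guarding the split keeps lines without the "(transferred to ec" marker from raising IndexError.

```python
# Hedged sketch of two defensive parsing steps for the BRENDA dump.
raw_line = b"ID\t\x8d 1.1.1.109 (transferred to EC 1.3.1.28)"
line = raw_line.decode("utf-8", errors="replace")  # 0x8d -> U+FFFD, no crash

transferred_to = None
parts = line.lower().split("(transferred to ec")
if len(parts) > 1:  # only index parts[1] when the marker is present
    transferred_to = parts[1].replace(")", "").strip()
```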
Hello,
This might be a bit early to report an issue; however, I am really interested in constructing an enzyme-constrained model for yeast using autopacmen. Even though I am not very familiar with the Python environment, I was able to follow your manual up to some point (thank you for the clear explanations!). Unfortunately, I've encountered several small errors that you might want to fix, since you consider autopacmen an extension of cobrapy:
(skip bullets for the main question)
- Scripts with data_parse_ in their names (except data_parse_bigg_metabolites_file) require the output file to exist in the path beforehand. They overwrite the file, but throw an error about the path if the file is not there in the first place.
- The usage example of data_parse_brenda_textfile.py references a different script, data_parse_bigg_metabolites_file. This is probably a copy/paste typo.
- The parameter type_of_kcat_selection is missing in the function get_reactions_kcat_mapping() in the script modeling_get_reactions_kcat_mapping. I was able to add the parameter and continue without a problem.
- data_create_combined_kcat_database asks for a "BRENDA path" input instead of an "output path" if you run the Python script without parameters in the terminal.
- modeling_create_smoment_model asks for an "SBML name" input a second time, instead of "excluded reactions".
This was the point where I could not continue further, because I got several errors in modeling_create_smoment_model. As you can see from the title, it throws an error for my model:
File "~/autopacmen/autopacmen/submodules/helper_general.py", line 243, in get_float_cell_value
cell_value = cell_value.replace(",", ".")
AttributeError: 'NoneType' object has no attribute 'replace'
I obtained this error using Python 3.7.5 on Linux (Ubuntu 18.04.4 LTS). I have tried to modify get_float_cell_value to bypass replace for NoneType objects; unfortunately, my solutions failed downstream (mostly in the function add_prot_pool_reaction). As I mentioned before, I am not good at Python programming, so my solutions can be considered weak. There are several metabolites and enzymes in my model for which no information is available in the databases (the retrieval scripts showed NAs and warnings for them). I believe these should not cause a problem, so I am asking for a solution.
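One possible None-tolerant variant is sketched below. The function name is taken from the traceback, but the real signature and expected return value for empty cells may differ; returning NaN is an assumption here:

```python
# Hedged sketch: return NaN for empty cells instead of calling
# .replace on None, and only normalize decimal commas for strings.
def get_float_cell_value_safe(cell_value):
    if cell_value is None:
        return float("nan")
    if isinstance(cell_value, str):
        cell_value = cell_value.replace(",", ".")
    return float(cell_value)
```

Downstream code (such as add_prot_pool_reaction) would still need to decide how to treat the resulting NaN values.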
Sorry if I am unclear or went wrong anywhere. I hope the small problems I reported above help you enhance autopacmen, and that you can provide a generalized solution for my problem soon.
Thank you in advance,
Handan