robaina / brendapyrser

17.0 2.0 4.0 719 KB

A Python parser for the BRENDA database

License: Apache License 2.0

Python 93.64% TeX 6.36%

biochemistry brenda computational-biology metabolism bioinformatics enzyme enzyme-kinetics python

brendapyrser's Introduction

Hi 👋,

I'm Semi. Here you'll find source code for packages, research scripts, and fun side projects I have worked on.

About me

💼 Principal Systems Modelling Scientist at New Atlantis Labs
❤️ Python, scientific programming, open source, machine learning, bioinformatics, computational biology, building web apps
💬 Open to collaborations!

Currently working on:

TODO Project Board

brendapyrser's People

Forkers

aiyangchunhe mistyfield omagebright harmsm

brendapyrser's Issues

Refactor code base to meet standards

Encoding problem with reading files downloaded from brenda

The .txt file downloaded from http://www.brenda-enzymes.org is encoded in UTF-8, but line 76 of parser.py opens it with:
with open(path_to_database, encoding="iso-8859-1") as file:
Maybe it should be changed to:
with open(path_to_database, encoding="utf-8") as file:
If the file is decoded as ISO-8859-1, the meta field of a reaction may become
'#4# pH 7.0, 30Â°C, recombinantwild-type enzyme <6>; #4# pH 7.0, 30Â°C, recombinant free enzyme <8>'
but if it is decoded as UTF-8, it looks normal: '#4# pH 7.0, 30°C, recombinantwild-type enzyme <6>; #4# pH 7.0, 30°C, recombinant free enzyme <8>'
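The "Â°C" artefact can be reproduced without BRENDA at all. This is a minimal sketch, assuming only what the issue states (the download is UTF-8): "°" is stored as the two bytes C2 B0 in UTF-8, and reading those bytes as ISO-8859-1 maps each byte to its own character, giving "Â°". The temporary file and the sample string are illustrative and are not taken from parser.py.

```python
import os
import tempfile

# A fragment of a BRENDA "meta" string, as it appears in the UTF-8 flat file
sample = "#4# pH 7.0, 30°C, recombinant free enzyme <8>"

# Write it to a temporary file encoded as UTF-8 (as the downloaded .txt is)
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Wrong codec: every UTF-8 byte becomes one ISO-8859-1 character
with open(path, encoding="iso-8859-1") as file:
    wrong = file.read()

# Correct codec: the one the file was actually written in
with open(path, encoding="utf-8") as file:
    right = file.read()

os.remove(path)

print(wrong)  # #4# pH 7.0, 30Â°C, recombinant free enzyme <8>
print(right)  # #4# pH 7.0, 30°C, recombinant free enzyme <8>

# Text that was already decoded the wrong way can be repaired by reversing the round trip
repaired = wrong.encode("iso-8859-1").decode("utf-8")
print(repaired == right)  # True
```

Because ISO-8859-1 assigns a character to every byte value, the wrong codec never raises an error. That is why the problem shows up silently in the parsed strings instead of as an exception.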

r.KMvalues.filter_by_organism seems to make no distinction between KKM and KM

Hello,

I have the impression that in some cases I get KKM values in addition to KM values when doing the following:

from brendapyrser import BRENDA

brenda = BRENDA("brenda_download_01mar22.txt")
r = brenda.reactions.get_by_id('1.8.1.4')
tmp = []
tmp.append(r.KMvalues.filter_by_organism('Homo sapiens'))

Part of the output of the code above is:
{'value': 4731.6, 'species': ['Homo sapiens'], 'meta': '#80# wild-type enzyme, pH 8.0, 37Â°C <135,136>', 'refs': []}, {'value': 4731.6, 'species': ['Homo sapiens'], 'meta': '#80# wild-type enzyme, pH 8.0, 37Â°C <135,136>', 'refs': []}, {'value': 3404.2, 'species': ['Homo sapiens'], 'meta': '#80# enzyme mutant P156A, pH 8.0, 37Â°C <135>', 'refs': []}, {'value': 1497.6, 'species': ['Homo sapiens'], 'meta': '#80# enzyme mutant P303A, pH 8.0, 37Â°C <135>', 'refs': []}]}

When I grep brenda_download_01mar22.txt for these numbers, e.g.
grep "4731.6" brenda_download_01mar22.txt

I get:
KKM #80# 4731.6 {NAD+} (#80# wild-type enzyme, pH 8.0, 37°C <135,136>)
KKM #80# 4731.6 {NAD+} (#80# wild-type enzyme, pH 8.0, 37°C <135,136>)
Biochim. Biophys. Acta (1973) 309, 289-295. {Pubmed:4731963} (c)
(1973) 309, 307-317. {Pubmed:4731964}
{Pubmed:4731965}
tomatoes. Biochim. Biophys. Acta (1973) 309, 363-369. {Pubmed:4731966}
(2002) 132, 935-943. {Pubmed:12473196}
Mycoplasma pneumoniae. Protein J. (2008) 27, 303-308. {Pubmed:18473156}

Similarly, only KKM lines match for
grep "3404.2" brenda_download_01mar22.txt
KKM #80# 3404.2 {NAD+} (#80# enzyme mutant P156A, pH 8.0, 37°C <135>) <135>
Biochem. Parasitol. (1999) 99, 167-181. {Pubmed:10340482} (c)

Similarly, only KKM lines match for
grep "1497.6" brenda_download_01mar22.txt

KKM #80# 1497.6 {NAD+} (#80# enzyme mutant P303A, pH 8.0, 37°C <135>) <135>
Biofuels (2019) 12, 208. {Pubmed:31497068}
Biofuels (2019) 12, 208. {Pubmed:31497068}
Chem. Commun. (Camb. ) (2004) 5, 592-3. {Pubmed:14973623}
(2013) 169, 77-87. {Pubmed:23149716}
{Pubmed:23149756}
37, 230-237. {Pubmed:14972646}
J. Antimicrob. Agents (2011) 37, 585-587. {Pubmed:21497068}
reticulum. Eur. J. Biochem. (1975) 51, 353-361. {Pubmed:1149736} (c)
Biosci. (2004) 9, 1944-1953. {Pubmed:14977600}
1016-1025. {Pubmed:11497462}
54, 175-184. {Pubmed:1149746} (c)
erythrocytes. Anal. Biochem. (1984) 141, 510-514. {Pubmed:6149706} (c)
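A sketch of how the record tag at the start of each line separates KM from KKM entries. Two side notes: grep treats "." as a regular-expression wildcard, which is why "1497.6" also matched PubMed IDs such as 31497068 (`grep -F` searches for the fixed string instead). The line layout used here (tag, whitespace, #protein IDs#, value, {substance}) is an assumption inferred from the grep output above, and `split_by_tag` is a hypothetical helper, not part of brendapyrser. The KM line in the sample is synthetic, added only for illustration.

```python
import re
from collections import defaultdict

# Matches "#80# 4731.6 {NAD+}": protein IDs, a single numeric value, substance in braces.
# Range values such as "0.1-0.5" do not match and are skipped in this sketch.
VALUE_RE = re.compile(r"#[\d,]+#\s+(-?[\d.]+)\s+\{([^}]*)\}")

def split_by_tag(lines):
    """Group (substance, value) pairs by the KM/KKM tag that starts each line."""
    groups = defaultdict(list)
    for line in lines:
        parts = line.split(maxsplit=1)
        if len(parts) < 2 or parts[0] not in {"KM", "KKM"}:
            continue  # reference lines, other fields, continuation lines
        tag, rest = parts
        match = VALUE_RE.match(rest)
        if match:
            groups[tag].append((match.group(2), float(match.group(1))))
    return dict(groups)

lines = [
    "KKM\t#80# 4731.6 {NAD+} (#80# wild-type enzyme, pH 8.0, 37°C <135,136>)",
    "KKM\t#80# 3404.2 {NAD+} (#80# enzyme mutant P156A, pH 8.0, 37°C <135>) <135>",
    "KM\t#1# 1.0 {substrate} (synthetic line for illustration)",
    "Biofuels (2019) 12, 208. {Pubmed:31497068}",
]

result = split_by_tag(lines)
print(result)
# {'KKM': [('NAD+', 4731.6), ('NAD+', 3404.2)], 'KM': [('substrate', 1.0)]}
```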

Also, I noticed that if I comment out "[(BRENDA_KMs < 1000) & (BRENDA_KMs >= 0)]" in the example from the README:

import numpy as np
import matplotlib.pyplot as plt

# Collect every KM value across all reactions (brenda is loaded as above)
BRENDA_KMs = np.array([v for r in brenda.reactions
                       for v in r.KMvalues.get_values()])
# README range filter commented out, so all values are kept
values = BRENDA_KMs  # [(BRENDA_KMs < 1000) & (BRENDA_KMs >= 0)]

plt.hist(values)
plt.title(f'Median KM value: {np.median(values)}')
plt.xlabel('KM (mM)')
plt.show()
print(f'Minimum and maximum values in database: {values.min()} mM, {values.max()} mM')

Then the minimum and maximum for the BRENDA txt file downloaded on 1 March 2022 are:
Minimum and maximum values in database: -1995000000.0 mM, 2100000000.0 mM
which don't look like plausible KM values in mM.

Am I missing something, or is this a bug?

Thank you in advance for addressing this issue

The download link provided is deprecated

The link no longer leads to an accessible file.

Filter reactions by specific substrate or product

Enable filtering reaction list by substrate or product (or both)
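One possible shape for this feature, as a self-contained sketch. `Reaction` is a hypothetical stand-in with `substrates` and `products` lists, not brendapyrser's actual reaction class, and `filter_reactions` is an illustrative name. The toy data use two well-known NAD+-dependent dehydrogenases with simplified substance names.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

@dataclass
class Reaction:
    """Hypothetical minimal reaction record used only for this sketch."""
    ec_number: str
    substrates: List[str] = field(default_factory=list)
    products: List[str] = field(default_factory=list)

def filter_reactions(reactions: Iterable[Reaction],
                     substrate: Optional[str] = None,
                     product: Optional[str] = None) -> List[Reaction]:
    """Keep reactions involving the given substrate and/or product.

    Names are compared case-insensitively; a criterion left as None is ignored,
    so passing both requires both to match.
    """
    def has(names: List[str], wanted: Optional[str]) -> bool:
        return wanted is None or wanted.lower() in (n.lower() for n in names)

    return [r for r in reactions
            if has(r.substrates, substrate) and has(r.products, product)]

reactions = [
    Reaction("1.1.1.1", ["ethanol", "NAD+"], ["acetaldehyde", "NADH"]),
    Reaction("1.1.1.27", ["lactate", "NAD+"], ["pyruvate", "NADH"]),
]

by_substrate = [r.ec_number for r in filter_reactions(reactions, substrate="NAD+")]
by_both = [r.ec_number for r in filter_reactions(reactions, substrate="ethanol",
                                                 product="NADH")]
print(by_substrate)  # ['1.1.1.1', '1.1.1.27']
print(by_both)       # ['1.1.1.1']
```

Treating unset criteria as "match anything" keeps a single function usable for substrate-only, product-only, and combined queries.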

Re-implement BRENDApyrser with new BRENDA json

BRENDA started releasing the database in JSON format a while ago, in addition to the old txt format. This greatly facilitates parsing the database, so it makes sense to update the code to parse the JSON instead of the old txt file.

species and refs are empty lists in some r.KMvalues.items() entries, although the original txt data and the online version of BRENDA list them

Hello,

Sorry to bother you again. I'm not sure, but I think I've noticed something unexpected again. Here is an example:

brenda = BRENDA(brenda_txt_file)

for r in brenda.reactions:
    if r.ec_number == "1.14.18.9":
        for met, vals in r.KMvalues.items():
            print("EC:", r.ec_number, "metab:", met, "data:", str(vals))

The output of which has empty lists in species and refs:
EC: 1.14.18.9 metab: 4alpha-methyl-5alpha-cholestan-3beta-ol data: [{'value': 0.125, 'species': [], 'meta': '#2# at pH 7.4 and37Â°C <11>', 'refs': []}]

However, both the BRENDA txt file and the online version of BRENDA show that this KM value is from Rattus norvegicus.
I haven't gone into the details of your code, but the empty species output is probably due to a slightly different line pattern in this case (and in some other cases): "PR #2# Rattus norvegicus (#2# isoform RBCK1 <1>) <1,2,11,12,13>". Most PR lines don't have a "(...)" part. Sorry if these data are omitted intentionally by your parser.

Thank you!

Best regards,
Polina
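A sketch of a PR-line pattern that tolerates the optional parenthesised comment described in the issue above. The layout (tag, #protein IDs#, organism, optional "(...)", <reference IDs>) is assumed from the quoted Rattus norvegicus line. Lines with other layouts are not handled, and `parse_pr_line` is a hypothetical name, not brendapyrser's own parsing code. The Homo sapiens line is synthetic, included only to show the no-comment case.

```python
import re

# Assumed PR layout:  PR <ws> #ids# organism [ (comment) ] <refs>
PR_RE = re.compile(
    r"^PR\s+#(?P<ids>[\d,]+)#\s+"      # protein IDs, e.g. #2#
    r"(?P<organism>[^(<]+?)\s*"        # organism: shortest text before "(" or "<"
    r"(?:\((?P<comment>.*)\)\s*)?"     # optional parenthesised comment
    r"<(?P<refs>[\d,]+)>\s*$"          # reference IDs, e.g. <1,2,11,12,13>
)

def parse_pr_line(line):
    """Return IDs, organism, comment and refs from one PR line, or None."""
    m = PR_RE.match(line)
    if m is None:
        return None
    return {
        "ids": [int(i) for i in m.group("ids").split(",")],
        "organism": m.group("organism"),
        "comment": m.group("comment"),
        "refs": [int(r) for r in m.group("refs").split(",")],
    }

rat = parse_pr_line("PR\t#2# Rattus norvegicus (#2# isoform RBCK1 <1>) <1,2,11,12,13>")
human = parse_pr_line("PR\t#1# Homo sapiens <1,2>")  # synthetic, no comment
print(rat)
# {'ids': [2], 'organism': 'Rattus norvegicus', 'comment': '#2# isoform RBCK1 <1>', 'refs': [1, 2, 11, 12, 13]}
print(human)
# {'ids': [1], 'organism': 'Homo sapiens', 'comment': None, 'refs': [1, 2]}
```

Making the comment group optional lets one pattern cover both line shapes, so the organism is still captured when a comment is present.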