robaina / brendapyrser

A Python parser for the BRENDA database

License: Apache License 2.0

Python 93.64% TeX 6.36%
biochemistry brenda computational-biology metabolism bioinformatics enzyme enzyme-kinetics python

brendapyrser's Introduction

Hi 👋,

I'm Semi. Here you'll find source code for packages, research scripts, and fun side projects I have worked on.

About me

  • 💼 Principal Systems Modelling Scientist at New Atlantis Labs

  • ❤️ Python, scientific programming, open source, machine learning, bioinformatics, computational biology, building web apps

  • 💬 Open to collaborations!

brendapyrser's People

Contributors

mistyfield, robaina

brendapyrser's Issues

Encoding problem with reading files downloaded from brenda

The .txt file downloaded from http://www.brenda-enzymes.org is encoded in UTF-8, but line 76 of parser.py reads:
with open(path_to_database, encoding="iso-8859-1") as file:
Maybe it should be changed to:
with open(path_to_database, encoding="utf-8") as file:
If the file is decoded as ISO-8859-1, the meta of a reaction may become
'#4# pH 7.0, 30Â°C, recombinantwild-type enzyme <6>; #4# pH 7.0, 30Â°C, recombinant free enzyme <8>'
but decoded as UTF-8 it comes out normal:
'#4# pH 7.0, 30°C, recombinantwild-type enzyme <6>; #4# pH 7.0, 30°C, recombinant free enzyme <8>'
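
The effect described in this issue can be reproduced without the database file. The snippet below is a minimal sketch using only the standard library; the sample string is an excerpt of the meta text quoted above, and nothing here calls brendapyrser itself.

```python
# The degree sign is one character but two bytes in UTF-8 (0xC2 0xB0).
text = "pH 7.0, 30°C"
raw = text.encode("utf-8")

# ISO-8859-1 maps every byte to its own character, so the two bytes of the
# degree sign come back as two separate characters.
wrong = raw.decode("iso-8859-1")
right = raw.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

Decoding as ISO-8859-1 never raises an error, because every byte value is valid in that encoding, which is probably why the mismatch goes unnoticed until the text is inspected.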

r.KMvalues.filter_by_organism seems to make no difference between KKM and KM

Hello,

I have the impression that in some cases I get KKM values in addition to KM values when running the following:

from brendapyrser import BRENDA

brenda = BRENDA("brenda_download_01mar22.txt")
r = brenda.reactions.get_by_id('1.8.1.4')
tmp = []
tmp.append(r.KMvalues.filter_by_organism('Homo sapiens'))

As a part of the output of the code above, I see the following:
{'value': 4731.6, 'species': ['Homo sapiens'], 'meta': '#80# wild-type enzyme, pH 8.0, 37°C <135,136>', 'refs': []},
{'value': 4731.6, 'species': ['Homo sapiens'], 'meta': '#80# wild-type enzyme, pH 8.0, 37°C <135,136>', 'refs': []},
{'value': 3404.2, 'species': ['Homo sapiens'], 'meta': '#80# enzyme mutant P156A, pH 8.0, 37°C <135>', 'refs': []},
{'value': 1497.6, 'species': ['Homo sapiens'], 'meta': '#80# enzyme mutant P303A, pH 8.0, 37°C <135>', 'refs': []}]}

When I grep brenda_download_01mar22.txt for these numbers, e.g.
grep "4731.6" brenda_download_01mar22.txt

I get:
KKM #80# 4731.6 {NAD+} (#80# wild-type enzyme, pH 8.0, 37°C <135,136>)
KKM #80# 4731.6 {NAD+} (#80# wild-type enzyme, pH 8.0, 37°C <135,136>)
Biochim. Biophys. Acta (1973) 309, 289-295. {Pubmed:4731963} (c)
(1973) 309, 307-317. {Pubmed:4731964}
{Pubmed:4731965}
tomatoes. Biochim. Biophys. Acta (1973) 309, 363-369. {Pubmed:4731966}
(2002) 132, 935-943. {Pubmed:12473196}
Mycoplasma pneumoniae. Protein J. (2008) 27, 303-308. {Pubmed:18473156}

and similarly only KKM on
grep "3404.2" brenda_download_01mar22.txt
KKM #80# 3404.2 {NAD+} (#80# enzyme mutant P156A, pH 8.0, 37°C <135>) <135>
Biochem. Parasitol. (1999) 99, 167-181. {Pubmed:10340482} (c)

and similarly only KKM on
grep "1497.6" brenda_download_01mar22.txt

KKM #80# 1497.6 {NAD+} (#80# enzyme mutant P303A, pH 8.0, 37°C <135>) <135>
Biofuels (2019) 12, 208. {Pubmed:31497068}
Biofuels (2019) 12, 208. {Pubmed:31497068}
Chem. Commun. (Camb. ) (2004) 5, 592-3. {Pubmed:14973623}
(2013) 169, 77-87. {Pubmed:23149716}
{Pubmed:23149756}
37, 230-237. {Pubmed:14972646}
J. Antimicrob. Agents (2011) 37, 585-587. {Pubmed:21497068}
reticulum. Eur. J. Biochem. (1975) 51, 353-361. {Pubmed:1149736} (c)
Biosci. (2004) 9, 1944-1953. {Pubmed:14977600}
1016-1025. {Pubmed:11497462}
54, 175-184. {Pubmed:1149746} (c)
erythrocytes. Anal. Biochem. (1984) 141, 510-514. {Pubmed:6149706} (c)
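
One way to tell the two field types apart is to compare the whole leading label instead of searching for "KM" anywhere in the line. The following is only a sketch under assumptions: the line shape is inferred from the grep output quoted above, `FIELD_LINE` and `parse_field_line` are hypothetical names, and this is not how brendapyrser is known to parse the file.

```python
import re

# Assumed line shape, based on the grep output in this issue:
#   LABEL #protein_ids# value {substrate} (comment) <refs>
FIELD_LINE = re.compile(
    r"^(?P<label>[A-Z0-9_]+)\s+"
    r"#(?P<proteins>[\d,]+)#\s+"
    r"(?P<value>\S+)\s+"
    r"\{(?P<substrate>[^}]*)\}"
)

def parse_field_line(line):
    """Split a field line into its label, protein ids, value and substrate."""
    match = FIELD_LINE.match(line)
    if match is None:
        return None
    return {
        "label": match.group("label"),
        "protein_ids": match.group("proteins").split(","),
        "value": match.group("value"),  # kept as text; no numeric conversion here
        "substrate": match.group("substrate"),
    }

line = "KKM #80# 4731.6 {NAD+} (#80# wild-type enzyme, pH 8.0, 37°C <135,136>)"
record = parse_field_line(line)

# A substring test cannot separate the two labels, an exact comparison can.
substring_hit = "KM" in line
is_km = record["label"] == "KM"
print(substring_hit, is_km)
```

With an exact label comparison, the KKM line above would not be collected as a KM value.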

Also, I noticed what happens if I comment out "[(BRENDA_KMs < 1000) & (BRENDA_KMs >= 0)]" in the example from the README:

import numpy as np
import matplotlib.pyplot as plt

# Collect every KM value across all reactions in the parsed database
BRENDA_KMs = np.array([v for r in brenda.reactions
                       for v in r.KMvalues.get_values()])
# Range filter from the README example, disabled here
values = BRENDA_KMs #[(BRENDA_KMs < 1000) & (BRENDA_KMs >= 0)]

plt.hist(values)
plt.title(f'Median KM value: {np.median(values)}')
plt.xlabel('KM (mM)')
plt.show()
print(f'Minimum and maximum values in database: {values.min()} mM, {values.max()} mM')

With the BRENDA txt file downloaded on 1 March 2022, the minimum and maximum are then:
Minimum and maximum values in database: -1995000000.0 mM, 2100000000.0 mM
which do not look like valid KM values in mM.

Am I missing something or is it a bug?

Thank you in advance for addressing this issue
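
To see which values fall outside the range that the README example keeps, the same condition can be applied as a split instead of a filter. This is a small diagnostic sketch in plain Python; `split_by_range` is a hypothetical helper, and the sample list reuses numbers quoted in the issues on this page purely as illustration.

```python
def split_by_range(values, low=0.0, high=1000.0):
    """Return (inside, outside), where inside satisfies low <= v < high."""
    inside = [v for v in values if low <= v < high]
    outside = [v for v in values if not (low <= v < high)]
    return inside, outside

# Numbers quoted in the issues on this page, used only as illustration.
sample = [0.125, 4731.6, 3404.2, 1497.6, -1995000000.0, 2100000000.0]

inside, outside = split_by_range(sample)
print("kept:", inside)
print("dropped:", sorted(outside))
```

Listing the dropped values next to their EC numbers would make it easier to check them against the txt file.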

Re-implement BRENDApyrser with new BRENDA json

BRENDA started releasing the database in JSON format a while ago, in addition to the old txt format. This makes parsing the database far easier, so it makes sense to update the code to parse the JSON instead of the old txt file...
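
As a rough illustration of why JSON is easier to handle, the sketch below reads a small inline fragment with the standard `json` module. The key names (`data`, `km_value`, `value`, `substrate`, `organisms`) and the helper `km_values` are assumptions made for this example and are not taken from the BRENDA JSON schema; the values come from the issues on this page.

```python
import json

# Hypothetical fragment: the key names are assumptions for illustration,
# not the documented layout of the BRENDA JSON release.
SAMPLE = """
{
  "data": {
    "1.14.18.9": {
      "km_value": [
        {"value": 0.125,
         "substrate": "4alpha-methyl-5alpha-cholestan-3beta-ol",
         "organisms": ["Rattus norvegicus"]}
      ]
    }
  }
}
"""

def km_values(database, ec_number):
    """Return the list of KM records for an EC number, or an empty list."""
    entry = database.get("data", {}).get(ec_number, {})
    return entry.get("km_value", [])

database = json.loads(SAMPLE)
for record in km_values(database, "1.14.18.9"):
    print(record["substrate"], record["value"], record["organisms"])
```

Once the data is a nested dictionary, no line-pattern matching is needed to associate a value with its organism.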

species and refs are empty lists in some of r.KMvalues.items() while it doesn't look like this in the original txt-data and in the online version of Brenda

Hello,

Sorry to bother you again. I'm not sure, but I think I have noticed something else unexpected. Here is an example:

brenda = BRENDA(brenda_txt_file)

for r in brenda.reactions:
    if r.ec_number == "1.14.18.9":
        for met, vals in r.KMvalues.items():
            print("EC:", r.ec_number, "metab:", met, "data:", str(vals))

The output of which has empty lists in species and refs:
EC: 1.14.18.9 metab: 4alpha-methyl-5alpha-cholestan-3beta-ol data: [{'value': 0.125, 'species': [], 'meta': '#2# at pH 7.4 and37°C <11>', 'refs': []}]

However, in both the BRENDA txt file and the online version of BRENDA, this Km value is attributed to Rattus norvegicus.
I didn't go into the details of your code, but the empty species output is probably due to a slightly different line pattern in this case (and in some other cases): "PR #2# Rattus norvegicus (#2# isoform RBCK1 <1>) <1,2,11,12,13>" [most PR lines don't have the "(...)" part]. Sorry if these data are omitted from your parser intentionally.

Thank you!

Best regards,
Polina
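
A pattern that treats the "(...)" part of a PR line as optional could look like the sketch below. It is only an illustration under assumptions: `PR_LINE` and `parse_pr_line` are hypothetical names, the first test line is the one quoted in this issue, and the second is the same line with the comment removed by hand. It is not the pattern brendapyrser is known to use.

```python
import re

PR_LINE = re.compile(
    r"^PR\s+#(?P<id>\d+)#\s+"
    r"(?P<organism>.+?)"              # organism name, as short as possible
    r"(?:\s+\((?P<comment>.*)\))?"    # optional "(...)" comment
    r"\s+<(?P<refs>[\d,]+)>\s*$"      # trailing reference list
)

def parse_pr_line(line):
    """Parse a PR line into protein id, organism, comment and references."""
    match = PR_LINE.match(line)
    if match is None:
        return None
    return {
        "id": match.group("id"),
        "organism": match.group("organism"),
        "comment": match.group("comment"),  # None when there is no "(...)"
        "refs": match.group("refs").split(","),
    }

with_comment = parse_pr_line(
    "PR #2# Rattus norvegicus (#2# isoform RBCK1 <1>) <1,2,11,12,13>"
)
# Same line with the comment removed by hand, for comparison.
without_comment = parse_pr_line("PR #2# Rattus norvegicus <1,2,11,12,13>")

print(with_comment["organism"], with_comment["refs"])
print(without_comment["organism"], without_comment["comment"])
```

Both forms yield the same organism and reference list, so the species lookup for protein #2# would no longer depend on whether a comment is present.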
