knowledge-graph-hub / kg-microbe

Home Page: https://knowledge-graph-hub.github.io/kg-microbe/index.html

License: BSD 3-Clause "New" or "Revised" License

Python 19.46% Jupyter Notebook 80.30% Makefile 0.24%
knowledge-graph anatomical-knowledge chebi chemicals environments envo go media metabolism microbiology

kg-microbe's People

Contributors

bsantan, caufieldjh, cmungall, dependabot[bot], hrshdhgd, realmarcin, wdduncan

kg-microbe's Issues

TypeError in TraitsTransform due to Series when str expected

Build currently encounters this problem:

14:33:09  Traceback (most recent call last):
14:33:09    File "run.py", line 121, in <module>
14:33:09      cli()
14:33:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
14:33:09      return self.main(*args, **kwargs)
14:33:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
14:33:09      rv = self.invoke(ctx)
14:33:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
14:33:09      return _process_result(sub_ctx.command.invoke(sub_ctx))
14:33:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
14:33:09      return ctx.invoke(self.callback, **ctx.params)
14:33:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
14:33:09      return __callback(*args, **kwargs)
14:33:09    File "run.py", line 69, in transform
14:33:09      kg_transform(*args, **kwargs)
14:33:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/transform.py", line 43, in transform
14:33:09      t.run()
14:33:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/transform_utils/traits/traits.py", line 500, in run
14:33:09      write_node_edge_item(
14:33:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/utils/transform_utils.py", line 83, in write_node_edge_item
14:33:09      fh.write(sep.join(data) + "\n")
14:33:09  TypeError: sequence item 2: expected str instance, Series found

This function attempts to write a single tab-delimited list of strings to a file, but in this case, it can't join the data because it's passed a pandas Series.
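One possible guard is to coerce every item to a string before joining. The sketch below is an assumption, not the repository's code: `to_row` is a hypothetical helper, and it relies only on the fact that a pandas Series exposes `.tolist()`. Multi-valued items are flattened with a separator rather than raising.

```python
from typing import Any, Iterable, List


def to_row(data: Iterable[Any], multi_sep: str = "|") -> List[str]:
    """Coerce every item to str so that sep.join(...) cannot raise TypeError.

    Hypothetical helper, not part of kg-microbe. Items exposing .tolist()
    (for example a pandas Series) are flattened and joined with multi_sep;
    None becomes an empty string.
    """
    row: List[str] = []
    for item in data:
        if item is None:
            row.append("")
        elif hasattr(item, "tolist") and not isinstance(item, str):
            values = item.tolist()
            if not isinstance(values, list):  # 0-d arrays return a scalar
                values = [values]
            row.append(multi_sep.join(str(v) for v in values))
        else:
            row.append(str(item))
    return row
```

With this in place, the failing call could become `fh.write(sep.join(to_row(data)) + "\n")`. Whether a Series at item 2 should be flattened or treated as an upstream bug in TraitsTransform is a separate decision.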

Jenkins build error: can't find merged_graph_stats.yaml

On the most recent Jenkins build, the graph merge appears to complete as expected, but the KGX stats file isn't found:

17:57:46  [KGX][cli_utils.py][               merge] INFO: Merged graph has 2715138 nodes and 2979399 edges
18:00:07  [KGX][cli_utils.py][               merge] INFO: Writing merged graph to merged-kg-tsv
[Pipeline] sh
18:09:44  + cp merged_graph_stats.yaml merged_graph_stats_20230106.yaml
18:09:44  cp: cannot stat 'merged_graph_stats.yaml': No such file or directory

So either the stats file isn't created or the Jenkinsfile is set to look in the wrong place for it.
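To tell the two cases apart, one could search the workspace for the file before the copy step. This is a minimal sketch; `find_stats_files` is a hypothetical name, and the workspace path is whatever directory the Jenkins job runs in.

```python
from pathlib import Path
from typing import List


def find_stats_files(workspace: str, name: str = "merged_graph_stats.yaml") -> List[Path]:
    """Return every regular file called `name` anywhere under `workspace`."""
    return sorted(p for p in Path(workspace).rglob(name) if p.is_file())
```

An empty result suggests the stats file was never written; a hit in a subdirectory suggests the Jenkinsfile is looking in the wrong place.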

Ingest The Microbe Directory

A curated resource of information about 80,000 microbes. This includes viruses which we can deprioritize for now.

The source data tables from The Microbe Directory can be built using 'make' in this repository:
Data:
https://github.com/dcdanko/MD2

Paper:
https://dqo52087pnd5x.cloudfront.net/manuscripts/13832/792ee5f5-ccd8-43b5-8531-9dc29c1fa117_12772_-_christopher_mason.pdf?doi=10.12688/gatesopenres.12772.1&numberOfBrowsableCollections=3&numberOfBrowsableInstitutionalCollections=0&numberOfBrowsableGateways=8

Table 1: Inventory parameters and descriptions

Parameter | Definition and notes
Optimal pH | The optimal pH at which this species grows. If the species was not widely studied, the American Type Culture Collection (ATCC) was used to determine the optimal pH for storage. If two far ranges of pH were determined, the average was taken.
Optimal temperature | The optimal temperature at which this species grows. If the species was not widely studied, the ATCC was used to determine the optimal temperature for storage. If two far ranges of temperatures were determined, the average was taken.
COGEM pathogenicity rating | COGEM released a comprehensive database of pathogenicity assessment of around 2575 bacterial species in 2011. The database ranks the pathogenicity of species on a scale of 1 to 4: 1 being a species that does not belong to a recognized group of disease-invoking agents in humans or animals and has an extended history of safe usage, and 4 being a species that can cause a very serious human disease for which no prophylaxis is known.
Antimicrobial susceptibility | Are there any known antibiotics that this species is sensitive to? No = 0, Yes = 1
Spore-formation | Is the species spore-forming? No = 0, Yes = 1
Biofilm-formation | Is the species biofilm-forming? No = 0, Yes = 1
Extremophile | Extremophiles are organisms that live in extreme environments, as opposed to organisms that live in moderate (mesophilic) environments. This category includes acidophiles, thermophiles, osmophiles, halophiles, oligotrophs, and others. Mesophile = 0, Extremophile = 1
Gram-stain | Negative = 0, Positive = 1, Indeterminate = 2
Found in human microbiome | Microbes that live anywhere in the human body and are not pathogenic to humans (i.e. not capable of causing human disease). No = 0, Yes = 1
Plant pathogen | Does the species cause disease in plants? No = 0, Yes = 1
Animal pathogen | Does the species cause disease in animals? No = 0, Yes = 1

From GitHub: Details for available fields, across domains of life

Virus

  • Genetic material: Viruses have either RNA or DNA as their genetic material.
  • Strand: The nucleic acid may be single-stranded (ss) or double-stranded (ds).
  • Capsid symmetry: The way in which the capsid units are arranged.
      • Helical
      • Icosahedral
      • Complex
  • Envelope: The outer layer of a virus that protects the nucleic acid. Viruses without an envelope are called naked.
  • Is it a pathogen? If yes, which is its host?
      • Human
      • Animal
      • Plant
      • Bacteria
      • Fungi

Bacteria and Archaea Only

  • Gram stain: Used to distinguish and classify bacterial species into two large groups: Gram-positive and Gram-negative.
  • Antimicrobial resistance (AMR): Antimicrobial resistance occurs naturally over time, usually through genetic changes. However, the misuse and overuse of antimicrobials is accelerating this process.
  • Type of metabolism: The nutrition mode of microbes according to the sources of energy and carbon needed for living, growth and reproduction. All sorts of combinations may exist in nature.
      • Primary source of energy:
          • Phototrophs: Light is absorbed in photoreceptors and transformed into chemical energy.
          • Chemotrophs: Bond energy is released from a chemical compound.
      • Primary sources of reducing equivalents:
          • Organotrophs: Organic compounds are used as electron donors.
          • Lithotrophs: Inorganic compounds are used as electron donors.
      • Primary sources of carbon:
          • Heterotrophs: Organic compounds are metabolized to get carbon for growth and development.
          • Autotrophs: Carbon dioxide (CO2) is used as the source of carbon.

Bacteria, Archaea and Eukarya

  • Biofilm forming: Biofilms are multicellular communities held together by a self-produced extracellular matrix. Biofilms impact humans in many ways as they can form in natural, medical, and industrial settings.
  • Spore forming: Spores, also referred to as endospores, are the dormant form of vegetative microbes and are highly resistant to physical and chemical influences.
  • Microbiome: Host or environment where microbes are usually found.
      • Host: Microbes might be commensal or pathogenic to their host. Commensal microbes are found to be crucial to the survival of their hosts.
          • Sponges
          • Corals
          • Fungi
          • Plant
          • Animal
          • Human: Body sites of Human Microbiome Project
      • Soil: Microbes are essential for soils. They are the main drivers of nutrient cycles in soils, decompose organic matter, promote plant growth and control pests and diseases.
          • Tundra
          • Grassland
          • Croplands
          • Forest
              • Tropical
              • Temperate
              • Boreal
      • Extreme: Microbes that live in habitats considered hard to survive in due to their extreme conditions, such as temperature, accessibility to different energy sources or high pressure.
          • Desert
          • Polar
          • Deep ocean
          • Space
      • Water: Water can support the growth of many types of microorganisms. Microbes are the main drivers of biogeochemical processes and nutrient cycling.
          • Ocean
          • Fresh
          • Mangrove
          • Sediments
  • Is it a pathogen? If yes, which is its host?
      • Fungi
      • Plant
      • Animal
      • Human: Body sites of Human Microbiome Project
  • Extremophile: A microbe that thrives in physically or geochemically extreme conditions that are detrimental to most life on Earth. Microbes that can only live under optimal conditions are called mesophiles.
      • If extremophile, which type?
          • Acidophile: Microbes that live in acidic systems with pH -0.06 to 4.0.
          • Alkaliphile: Microbes capable of survival in alkaline environments with pH 8.5–11.
          • Halophile: Microbes that thrive in high salt concentrations.
          • Metallotolerant: Microbes that survive in environments with a high concentration of dissolved heavy metals.
          • Barophile: Also called piezophiles, these are microbes that thrive at high pressures, such as in the deep sea.
          • Psychrophile: Also called cryophiles, these are microbes capable of growth at low temperatures, ranging from −20°C to 10°C.
          • Radioresistant: Microbes capable of withstanding high levels of ionizing radiation.
          • Thermophile: Microbes that live at high temperatures, between 41°C and 122°C.
          • Xerophile: Microbes that grow and reproduce in conditions with a low availability of water.
          • Hypolith: Organisms that live underneath rocks in cold deserts.
          • Oligotroph: Microbes capable of growth in nutritionally limited environments.

Repair build errors due to ModuleNotFoundError of numpy

The graph currently fails to build on Jenkins (see error below).

08:23:38  Collecting pandas (from kg-microbe==1.0.0)
08:23:38    Using cached https://files.pythonhosted.org/packages/99/f0/f99700ef327e51d291efdf4a6de29e685c4d198cbf8531541fc84d169e0e/pandas-1.3.5.tar.gz
08:23:39      Complete output from command python setup.py egg_info:
08:23:39      Traceback (most recent call last):
08:23:39        File "<string>", line 1, in <module>
08:23:39        File "/tmp/pip-build-2_cqjoc1/pandas/setup.py", line 18, in <module>
08:23:39          import numpy
08:23:39      ModuleNotFoundError: No module named 'numpy'
08:23:39      
08:23:39      ----------------------------------------
08:23:39  Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-2_cqjoc1/pandas/

A full refresh of the Jenkins build may require a new Docker image, as currently used for KG-IDG or KG-COVID-19.

sources of microbial culturing media

  1. ATCC (American Type Culture Collection)
    ATCC Culture Guides
    ATCC offers a wide array of resources, including guidelines and recommendations for microbial culture media and their preparations.
  2. DSMZ (German Collection of Microorganisms and Cell Cultures)
    DSMZ Media List
    This provides a comprehensive list of media for cultivating various microorganisms, with detailed recipes and instructions.
  3. Sigma-Aldrich
    Sigma-Aldrich Culture Media
    The site includes recipes for various culture media, and you can purchase the necessary chemicals and reagents directly.
  4. MicrobeOnline
    MicrobeOnline Media Index
    MicrobeOnline provides detailed information on various bacteria and their appropriate culture media.
  5. Becton, Dickinson and Company (BD) - Difco & BBL Manual
    Difco & BBL Manual
    This manual includes guidelines and formulations for the preparation of culture media.
  6. Thermo Fisher Scientific - Oxoid
    Oxoid Culture Media
    Oxoid provides a catalog of various media and supplements for microbial culture.
  7. Scharlau
    Scharlau Microbiology Catalog
    Scharlau's catalog includes various dehydrated culture media and their uses.
  8. PubMed
    PubMed
    PubMed offers numerous peer-reviewed articles and publications, many of which include detailed protocols and recipes for microbial culture media.
  9. ScienceDirect
    ScienceDirect
    ScienceDirect is a repository of scientific publications, including articles related to microbiology and culture media recipes.
  10. LPSN - List of Prokaryotic names with Standing in Nomenclature
    LPSN
    LPSN is a compendious resource on prokaryotic nomenclature and can have relevant information or links regarding microbial culture media.
  11. Sigma Aldrich https://www.sigmaaldrich.com/US/en/products/industrial-microbiology/microbial-culture-media

ingest microbe taxon to taxon edges from mondo

include infectious disease branch of mondo, plus associated taxa

you may want to use robot here

note; we may want to remap. I would do this with a SPARQL CONSTRUCT

mondo has

  • disease has-agent taxon (typically virus/bacteria/fungus)
  • disease transmitted-by taxon (typically eukaryote, e.g. insect, mammal)

we want to transform these into organism-organism edges, and annotate the edges with the disease

e.g.

  • <<sars-cov-2 interacts-with homo-sapiens>> causes: mondo:COVID-19

in mondo you can assume the organism is human unless it has an in-taxon edge
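A rough sketch of that rewrite, under stated assumptions: the input is a plain list of (subject, predicate, object) triples using the predicate labels from this issue, `organism_edges` is a hypothetical function name, and the output dictionary keys are illustrative rather than a KGX schema. The default host is NCBITaxon:9606 (Homo sapiens) unless the disease carries an in-taxon edge.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
HUMAN = "NCBITaxon:9606"  # Homo sapiens, the assumed default host


def organism_edges(triples: Iterable[Triple]) -> List[Dict[str, str]]:
    """Turn disease-taxon edges into organism-organism edges annotated with the disease."""
    triples = list(triples)
    # Diseases with an explicit in-taxon edge override the human default.
    host_of = {s: o for s, p, o in triples if p == "in-taxon"}
    edges = []
    for disease, predicate, taxon in triples:
        if predicate in ("has-agent", "transmitted-by"):
            edges.append(
                {
                    "subject": taxon,
                    "predicate": "interacts-with",
                    "object": host_of.get(disease, HUMAN),
                    "disease": disease,
                    "source_predicate": predicate,  # keep provenance of the original edge
                }
            )
    return edges
```

In practice the same reshaping could be expressed as a SPARQL CONSTRUCT run through robot, as suggested above; the Python form only illustrates the intended shape of the output.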

we may actually want to push the generation of this kgx file into the mondo pipeline cc @matentzn

we may want to include these edges in kg-covid-19 cc @justaddcoffee @realmarcin

Build fails due to chebi.owl not found

Build currently fails at the transform step as follows:

13:20:08  + python3.8 run.py transform
13:20:11  org.semanticweb.owlapi.io.OWLOntologyInputSourceException: java.io.FileNotFoundException: data/raw/chebi.owl (No such file or directory)
13:20:11  Use the -vvv option to show the stack trace.
13:20:11  Use the --help option to see usage information.
13:20:14  Traceback (most recent call last):
13:20:14    File "run.py", line 121, in <module>
13:20:14      cli()
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
13:20:14      return self.main(*args, **kwargs)
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
13:20:14      rv = self.invoke(ctx)
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
13:20:14      return _process_result(sub_ctx.command.invoke(sub_ctx))
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
13:20:14      return ctx.invoke(self.callback, **ctx.params)
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
13:20:14      return __callback(*args, **kwargs)
13:20:14    File "run.py", line 69, in transform
13:20:14      kg_transform(*args, **kwargs)
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/transform.py", line 43, in transform
13:20:14      t.run()
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/transform_utils/traits/traits.py", line 131, in run
13:20:14      create_termlist(self.input_base_dir, "chebi")
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/utils/nlp_utils.py", line 109, in create_termlist
13:20:14      transform(
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/kgx/cli/cli_utils.py", line 560, in transform
13:20:14      transform_source(
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/kgx/cli/cli_utils.py", line 869, in transform_source
13:20:14      transformer.transform(input_args, output_args)
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/kgx/transformer.py", line 236, in transform
13:20:14      self.process(source_generator, intermediate_sink)
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/kgx/transformer.py", line 332, in process
13:20:14      for rec in source:
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/kgx/source/obograph_source.py", line 62, in parse
13:20:14      yield from chain(n, e)
13:20:14    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/kgx/source/obograph_source.py", line 84, in read_nodes
13:20:14      FH = open(filename, "rb")
13:20:14  FileNotFoundError: [Errno 2] No such file or directory: 'data/raw/chebi.json'

chebi.json is missing because ROBOT doesn't complete its conversion, and it doesn't do that because it can't find chebi.owl.
chebi.owl.gz is successfully downloaded, but the issue here is likely with the TraitsTransform - the CHEBI transform knows it has to work with a compressed version but the TraitsTransform expects to have chebi.json ready to use.
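One way to close that gap would be to decompress the download before ROBOT runs. The following is a sketch only: `ensure_decompressed` is a hypothetical helper, and where it would be called from (the TraitsTransform or a shared utility) is left open.

```python
import gzip
import shutil
from pathlib import Path


def ensure_decompressed(gz_path: str) -> Path:
    """Decompress e.g. chebi.owl.gz to chebi.owl next to it, unless it already exists."""
    src = Path(gz_path)
    dest = src.with_suffix("")  # strips only the trailing ".gz"
    if not dest.exists():
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)  # streams, so the whole file is never held in memory
    return dest
```

Calling `ensure_decompressed("data/raw/chebi.owl.gz")` before `create_termlist` would give ROBOT the `data/raw/chebi.owl` path it is looking for.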

Make schemas of sources more explicit

Context: Semantic ETL manifesto

We should make this repo an exemplar of a KG ETL repo. We should formally describe upstream sources in a lightweight way and make it very transparent how things map.

We have done most of the work already here:

https://github.com/Knowledge-Graph-Hub/kg-microbe/tree/master/schemas

This describes the core trait table that is ingested. We need to switch this to use linkml first (#21), but we may add more.

We can generate markdown for this using linkml, but we need to think how this integrates with https://knowledge-graph-hub.github.io/kg-microbe/index.html

I like how dipper includes a sources section: https://dipper.readthedocs.io/en/latest/sources.html

Would it make sense to

  1. generate markdown from all upstream schemas (just one so far)
  2. Deposit markdown here https://github.com/Knowledge-Graph-Hub/kg-microbe/tree/master/sphinx (can markdown be mixed with rst?)
  3. Publish

Alternatively we could use the generated python from the schema

Ingest distribution of pathobionts across 12 skin sites

The data used for figure 3 is publicly available and the source is listed in the figure legend. It was taken from Table S3 and S5 from Oh et al. 2014.

Source data is found here:
Table S3. Relative Abundance of Propionibacterium acnes Strains, Related to Supplemental Experimental Procedures
https://www.cell.com/cms/10.1016/j.cell.2016.04.008/attachment/755072a6-35e2-4b3e-8f9e-eb0b9ec09274/mmc4.xlsx

Table S5. Relative Abundance of Staphylococcus epidermidis Strains and Presence/Absence of Gene Clusters, Related to Supplemental Experimental Procedures
https://www.cell.com/cms/10.1016/j.cell.2016.04.008/attachment/a80c1d13-73b7-4bd3-90bc-c2e42b3ec6b3/mmc6.xlsx

Source paper link:
https://www.cell.com/fulltext/S0092-8674(16)30399-3#supplementaryMaterial

Secondary referenced article with figure:
Living in Your Skin: Microbes, Molecules, and Mechanisms
https://journals.asm.org/doi/10.1128/IAI.00695-20?utm_source=informz&utm_medium=email&utm_campaign=sign-up-journals&utm_term=20220127&utm_content=nn-hmb-mktg&utm_source=Informz&utm_medium=Email&utm_campaign=Campaign&utm_content=Message_Name&_zs=6bprl&_zl=CBs62

NER for chemicals aka carbon substrates in trait table

This is the unique list of chemicals under the carbon substrate column:
https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/master/schemas/distinct_carbon_substrates.txt

  • Run OGER NER on carbon substrates and characterize output based on exact string match vs X etc.
  • Use (four different types) synonyms from CHEBI to expand matches.
  • Identify poorly matching subset and see if there are additional matches (and append to mapping table).

The goal is to create an SSSOM mapping file to encode the results of the above chemical matching, as here for pathways:
https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/master/pathways.sssom.tsv
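The characterization step in the list above could start from something like the sketch below. It does not call OGER; it only illustrates sorting substrate strings into exact-label, synonym, and unmatched buckets. `classify_matches` and both lookup tables are hypothetical, with keys assumed to be lowercased CHEBI labels and synonyms mapped to CHEBI IDs.

```python
from typing import Dict, Iterable, List, Tuple


def classify_matches(
    terms: Iterable[str],
    labels: Dict[str, str],
    synonyms: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (term, matched_id, match_type) for each substrate string.

    labels and synonyms map lowercased strings to IDs; match_type is one of
    "exact_label", "synonym" or "unmatched" (matched_id is "" when unmatched).
    """
    rows = []
    for term in terms:
        # Normalize the trait-table spelling: case, padding, underscores.
        key = term.strip().lower().replace("_", " ")
        if key in labels:
            rows.append((term, labels[key], "exact_label"))
        elif key in synonyms:
            rows.append((term, synonyms[key], "synonym"))
        else:
            rows.append((term, "", "unmatched"))
    return rows
```

The "unmatched" rows are the poorly matching subset to review by hand, and the three-column output is close to what an SSSOM row needs (subject label, object ID, and a justification for the match).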

Ingest gutMEGA

The gutMEGA resource provides a few useful files for KG-Microbe.
http://gutmega.omicsbio.info/download.php

The first set of ingests covers different microbial taxonomy resources, including NCBI Taxonomy, which is already present in KG-Microbe. Looking at these text files, some alignment between the different taxonomies may be necessary, ideally as an NER task. There will be some disagreements in taxonomy structure, and we may even need to pick clique leaders. Note that the three non-NCBI taxonomies are all specific to microbes, unlike NCBI, which is why we had to trim it.

NCBI taxonomy | Reformatted NCBI taxonomy information, including different ranks of NCBI taxa
Greengenes taxonomy | Reformatted Greengenes taxonomy information, including different ranks of Greengenes taxa
RDP taxonomy | Reformatted RDP taxonomy information, including different ranks of RDP taxa
SILVA taxonomy | Reformatted SILVA taxonomy information, including different ranks of SILVA taxa

Quantitative data table ingest: this will be valuable, and to start it can be an NER task (though we need to identify a reference ontology set). The records will be taxa -> condition -> relative abundance, where the 'condition' is a short free-text description like a sample title.
gutMEGA data table | All quantification events provided in gutMEGA

This dataset provides the literature provenance for the quantitative data in the data table (above):
Literature information summary | Related information about the curated literature

map pathways column in big trait table

Split from #2.

1 trithionate_oxidation
1 carbonmonoxide_oxidation
1 tetrathionate_oxidation, iron_reduction
1 pyrrhotite_oxidation
1 galena_oxidation
1 thiocyanate_oxidation
1 carbonylsulfide_oxidation
...
182 sulfur_reduction
186 thiosulfate_reduction
353 aerobic_chemo_heterotrophy
366 fermentation
371 denitrification
400 nitrite_reduction
983 NA
1420 nitrate_reduction

These should all map to GO
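Because some cells hold more than one pathway (for example "tetrathionate_oxidation, iron_reduction"), the distinct terms need to be split out before mapping. A minimal sketch, assuming comma-separated cells and "NA" as the missing-value marker; `pathway_term_counts` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Optional


def pathway_term_counts(cells: Iterable[Optional[str]]) -> Counter:
    """Count individual pathway terms across comma-separated trait-table cells."""
    counts: Counter = Counter()
    for cell in cells:
        if cell is None:
            continue
        for term in cell.split(","):
            term = term.strip()
            if term and term != "NA":  # skip empty fragments and missing values
                counts[term] += 1
    return counts
```

The keys of the resulting counter are the strings that would each need a GO mapping.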

ingest Web of Microbes environment-metabolite change-microbe associations

General info here:
http://www.webofmicrobes.org/about

Download SQLite via 'download' button.

This is the primary relevant data:
"The Web" links the actions of microorganisms on metabolites from within a single environment.

And these are the two basic assertions:

There are two types of assertions that are made on the WoM:

Assertions of 'present in environment': The metabolite to be annotated as detected in >2/3 replicates. These are indicated by tan table cells or filled in circles on the web.
Assertions of 'increase' or 'decrease' by an organism: Metabolites that were significantly different from the control environment versus the transformed environment are asserted as increased (red cells on tables and red lines on The Web) or decreased (blue cells on tables and blue lines on The Web) with darker shading indicating a greater fold change.

Map cell_shape in big trait table

Split from #2

1 spirochete
1 ring
1 triangular
1 cell_shape
2 tailed
3 branced
3 spindle
3 star
4 irregular
5 flask
7 square
11 disc
12 fusiform
60 pleomorphic
289 vibrio
328 filament
400 spiral
402 coccobacillus
1563 coccus
3794 NA
4035 bacillus

I think the majority should map to PATO. We may want to consider making OBA terms for some

Ingest Disbiome microbiome data

Disbiome database is a collection of microbiome compositional effects related to human disease

https://disbiome.ugent.be/home

Here is an example record for a disbiome entry:
https://disbiome.ugent.be/experimentdetail/11342

The pertinent pieces of info for KG ingestion are:

  • "Disease" (MESH?)
  • "Organism" (taxonomic string)
  • "Qualitative outcome" (reduced, elevated etc)
  • "Sample type" (material really) eg faeces
  • Three level anatomical "Location" (MESH?)
  • possibly 'Reference classification database' (eg SILVA)
  • possibly cohort details? (eg age, BMI, M/F ratio)

Ingest Weissman et al human microbiome taxa microbial trait data

The data is from this paper:
Exploring the functional composition of the human microbiome using a hand-curated microbial trait database
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04216-2

The dataset itself is Additional File 1:
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04216-2#MOESM1

To start we could perform NER with the same dictionaries as Madin et al - so NCBI Taxonomy, ENVO, ECOCORE, ChEBI.

There are additional numerical columns of interest here beyond what Madin et al provided:

  • Temperature ranges
  • pH ranges
  • NaCl ranges (need to get unit from paper)
  • Growth on an expanded set of substrates
  • oxygen.Preference is separate column
  • Shape is encoded in two ways: as a string and as 0/1 for select shapes
  • A few other traits are 0/1 and performing NER on field names would give us the entity in many cases.

ingest gene knockout data from LBL microbial fitness experiments

All of the data is here (84G total):
http://genomics.lbl.gov/supplemental/bigfit/

The numerical relative growth data would have to be converted - growth vs no growth, via eg thresholding.

Just taking the first organism as an example:
http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/

On the organism page, under 'Genes' the 'Specific phenotypes' link gives a table of most significant phenotype per gene for this KO dataset:
http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/specific_phenotypes
and this file can serve as the primary data source.
These columns:

sysName desc name lrn t Group Condition_1 Concentration_1 Units_1

provide the following data:

gene name
description
internal name
log ratio normalized
t-statistic
condition group
condition name
concentration
unit
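The thresholding mentioned above could look like the following sketch. It assumes the table is available as tab-delimited text with the column names listed here; the cutoffs (`lrn_max`, `t_abs_min`) are arbitrary placeholders chosen for illustration, not values taken from the source, and `call_growth_defects` is a hypothetical name.

```python
import csv
import io
from typing import Dict, List


def call_growth_defects(
    tsv_text: str, lrn_max: float = -1.0, t_abs_min: float = 4.0
) -> List[Dict[str, str]]:
    """Flag gene-condition pairs whose normalized log ratio indicates a growth defect.

    A row is kept when lrn <= lrn_max and |t| >= t_abs_min. Both cutoffs are
    illustrative assumptions and should be tuned against the data.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    calls = []
    for row in reader:
        lrn = float(row["lrn"])
        t_stat = float(row["t"])
        if lrn <= lrn_max and abs(t_stat) >= t_abs_min:
            calls.append(
                {
                    "gene": row["sysName"],
                    "condition": row["Condition_1"],
                    "call": "growth_defect",
                }
            )
    return calls
```

Each returned record corresponds to one candidate gene-condition edge; concentration and unit could be carried along as edge properties in the same way.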

For reference under 'Genes' the 'Gene fitness' link gives a full table of relative fitness values:
http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/fit_logratios_good.tab
The y-axis labels are 'locusId' which are gene ids and the x-axis labels are condition (sample) ids including a text description.

There is additional data on each condition on the organism page under 'Tables' then 'Experiments' then 'Detailed metadata for experiments':
http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/expsUsed

A basic ingest of this data would model it as mutant alleles or as a gene-condition relation indicating that gene X is essential for growth in condition Y. As key supporting data the gene annotations should also be ingested:
http://genomics.lbl.gov/supplemental/bigfit/html/acidovorax_3H11/fit_genes.tab
with the caveat that these are 'free text' annotations so may require standardization.

Further ingests could include:

Jenkins build error: call to multi-indexer not working

Jenkins build encounters this error when setting up index pages:

13:34:04  + multi_indexer -v --directory kg-microbe --prefix https://kg-hub.berkeleybop.io/kg-microbe/ -x -u
13:34:04  /var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo@tmp/durable-51f8b30b/script.sh: 1: multi_indexer: not found

The Docker image in use may not have multi_indexer installed.
The Jenkins run should install it.

Build process fails due to missing argument for transform function

Build currently fails when calling transforms:

11:51:07  + python3.8 run.py transform
11:51:09  Traceback (most recent call last):
11:51:09    File "run.py", line 121, in <module>
11:51:09      cli()
11:51:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
11:51:09      return self.main(*args, **kwargs)
11:51:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
11:51:09      rv = self.invoke(ctx)
11:51:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
11:51:09      return _process_result(sub_ctx.command.invoke(sub_ctx))
11:51:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
11:51:09      return ctx.invoke(self.callback, **ctx.params)
11:51:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
11:51:09      return __callback(*args, **kwargs)
11:51:09    File "run.py", line 69, in transform
11:51:09      kg_transform(*args, **kwargs)
11:51:09    File "/var/lib/jenkins/workspace/edge-graph-hub_kg-microbe_master/gitrepo/kg_microbe/transform.py", line 43, in transform
11:51:09      t.run()
11:51:09  TypeError: run() missing 1 required positional argument: 'data_file'

If the function is reaching t.run() without including any sources, then some sources exist but aren't being recognized by name.

Could be related to specifying

def transform(input_dir: str, output_dir: str, sources: Optional[List[str]] = None) -> None:

instead of

def transform(input_dir: str, output_dir: str, sources: List[str] = None) -> None:

to avoid the no-implicit-optional thing.
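As a quick check on this hypothesis: Python does not enforce annotations at runtime, so on their own the two signatures behave identically when called. The sketch below uses two hypothetical stand-ins (`transform_explicit`, `transform_implicit`) that simply return the `sources` argument.

```python
from typing import List, Optional


def transform_explicit(
    input_dir: str, output_dir: str, sources: Optional[List[str]] = None
) -> Optional[List[str]]:
    """Signature with the explicit Optional annotation."""
    return sources


def transform_implicit(
    input_dir: str, output_dir: str, sources: List[str] = None  # implicit Optional
) -> Optional[List[str]]:
    """Signature relying on the implicit-Optional convention."""
    return sources


# Both accept a missing or explicit sources argument in the same way.
same_default = transform_explicit("in", "out") is transform_implicit("in", "out")
same_value = transform_explicit("in", "out", ["traits"]) == transform_implicit(
    "in", "out", ["traits"]
)
```

If both flags come out true, the annotation change is unlikely to be the cause unless some tool in the call chain inspects annotations, which would point back to how `t.run()` is being invoked without `data_file`.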

trait, qualities, characteristics

@deepakunni3 @hrshdhgd @cmungall @realmarcin

If we are going to interact with other OBO ontologies (e.g., ENVO, PATO, COB), I think it would be helpful if we could provide some clarity as to how the use of trait differs from other OBO terms such as quality and characteristic.

Getting clear on this may help us integrate better with OBO ontologies.
