microbiomedata / mixs

This project forked from genomicsstandardsconsortium/mixs

“Minimum Information about any (X) Sequence” (MIxS) specification

Home Page: https://microbiomedata.github.io/mixs/

License: Creative Commons Zero v1.0 Universal

Python 99.89% Makefile 0.03% Jinja 0.08%

mixs's People

Contributors

dependabot[bot], folker, genomicstandardsconsortium, lschriml, only1chunts, ramonawalls, renzok, sujaypatil96, turbomam

mixs's Issues

nitrogen terms for bioscales

Bioscales has a column for nitrogen, expressed as a percent

Here are all the MIxS terms containing 'nit' in their term names aka Structured comment names

| Environmental package | Structured comment name |
|---|---|
| microbial mat/biofilm | carb_nitro_ratio |
| sediment | carb_nitro_ratio |
| water | carb_nitro_ratio |
| water | diss_inorg_nitro |
| microbial mat/biofilm | diss_org_nitro |
| miscellaneous natural or artificial environment | diss_org_nitro |
| sediment | diss_org_nitro |
| water | diss_org_nitro |
| hydrocarbon resources-cores | nitrate |
| hydrocarbon resources-fluids/swabs | nitrate |
| microbial mat/biofilm | nitrate |
| miscellaneous natural or artificial environment | nitrate |
| sediment | nitrate |
| wastewater/sludge | nitrate |
| water | nitrate |
| hydrocarbon resources-cores | nitrite |
| hydrocarbon resources-fluids/swabs | nitrite |
| microbial mat/biofilm | nitrite |
| miscellaneous natural or artificial environment | nitrite |
| sediment | nitrite |
| water | nitrite |
| microbial mat/biofilm | nitro |
| miscellaneous natural or artificial environment | nitro |
| sediment | nitro |
| water | nitro |
| microbial mat/biofilm | org_nitro |
| miscellaneous natural or artificial environment | org_nitro |
| sediment | org_nitro |
| water | org_nitro |
| water | part_org_nitro |
| water | tot_diss_nitro |
| water | tot_inorg_nitro |
| agriculture | tot_nitro |
| hydrocarbon resources-cores | tot_nitro |
| hydrocarbon resources-fluids/swabs | tot_nitro |
| wastewater/sludge | tot_nitro |
| water | tot_nitro |
| food-farm environment | tot_nitro_cont_meth |
| soil | tot_nitro_cont_meth |
| food-farm environment | tot_nitro_content |
| microbial mat/biofilm | tot_nitro_content |
| sediment | tot_nitro_content |
| soil | tot_nitro_content |
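
A listing like the one above can be regenerated with a plain substring filter. The following is a minimal sketch, assuming the package/term pairs have already been loaded into a list of tuples. `ROWS` is a small hand-copied excerpt and `terms_containing` is a hypothetical helper, not part of any MIxS tooling.

```python
# Hand-copied excerpt of (environmental package, structured comment name) pairs.
# The last row is included only to show a non-match.
ROWS = [
    ("water", "carb_nitro_ratio"),
    ("water", "diss_inorg_nitro"),
    ("sediment", "nitrate"),
    ("soil", "tot_nitro_content"),
    ("water", "tot_phosp"),
]


def terms_containing(rows, fragment):
    """Return sorted (term, package) pairs whose term name contains `fragment`."""
    fragment = fragment.lower()
    return sorted((term, pkg) for pkg, term in rows if fragment in term.lower())


matches = terms_containing(ROWS, "nit")
for term, pkg in matches:
    print(f"{pkg}\t{term}")
```

A substring match is deliberately broad: it also catches `carb_nitro_ratio`, which may or may not be what a nitrogen-percent column should map to.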

formalize (axiomatize?) tokens in term names

Could start by making them annotations

Analytes, etc.

  • Ammonium
  • Carbon
  • Nitrate
  • Nitrite
  • Nitrogen
  • Phosphate
  • Phosphorus
  • Temperature

Modifiers

  • air
  • biomass
  • dissolved
  • inorganic
  • method
  • microbial
  • organic
  • particulate
  • reactive
  • soluble
  • total

Additional patterns from @mslarae13 ?

Ignore list ?

  • antimicrobial
  • dioxide
  • hydrocarbon
  • ratio

What term attributes should be invariant

For example, in the ...packages... sheet, where terms can be combined with different environments?

Proposal:

| MIxS | LinkML | seq | assumed invariant | notes |
|---|---|---|---|---|
| Environmental package | class | 1 | | |
| Structured comment name | slot's name | 2 | TRUE | |
| Package item | slot's title | 3 | TRUE | |
| Definition | description | 4 | TRUE | |
| Expected value | annotation | 5 | | |
| Value syntax | structured_pattern | 6 | | needs rules |
| Example | examples | 7 | | |
| Section | is_a parent | 8 | | only used for one term |
| Requirement | cardinality | 9 | | some error out; some not fully implemented |
| Preferred unit | annotation | 10 | | |
| Occurrence | multivalued | 11 | | use a vmap for m = True and 1 = False |
| MIXS ID | slot_uri | 12 | TRUE | |
| github ticket | annotation | 13 | | |
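
Whether an attribute really is invariant can be checked mechanically: group the package rows by one key and report any attribute that takes more than one value within a group. The sketch below assumes the rows are available as dictionaries. The four sample rows are copied from the phosphorus issue further down this page, and `invariance_violations` is a hypothetical helper.

```python
from collections import defaultdict

# Sample rows taken from the phosphorus table in this tracker (definitions omitted).
ROWS = [
    {"package": "agriculture", "name": "tot_phos",
     "title": "total phosphorous", "mixs_id": "MIXS:0000689"},
    {"package": "wastewater/sludge", "name": "tot_phosphate",
     "title": "total phosphate", "mixs_id": "MIXS:0000689"},
    {"package": "water", "name": "phosphate",
     "title": "phosphate", "mixs_id": "MIXS:0000505"},
    {"package": "sediment", "name": "phosphate",
     "title": "phosphate", "mixs_id": "MIXS:0000505"},
]


def invariance_violations(rows, key, attributes):
    """Map (key value, attribute) to its distinct values when there is more than one."""
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for attr in attributes:
            seen[row[key]][attr].add(row[attr])
    return {
        (key_value, attr): sorted(values)
        for key_value, attrs in seen.items()
        for attr, values in attrs.items()
        if len(values) > 1
    }


violations = invariance_violations(ROWS, "mixs_id", ["name", "title"])
for (mixs_id, attr), values in violations.items():
    print(mixs_id, attr, values)
```

Grouping by MIXS ID is one choice; grouping by structured comment name and checking the ID, title, and definition would test the same proposal from the other direction.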

definition hygiene

  • improve definitions for consistency and readability
  • scrub garbage characters
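
Scrubbing could begin with a small normalization pass. The sketch below is an assumption about what counts as "garbage": non-breaking and repeated whitespace, control and zero-width characters, and the Unicode replacement character. `clean_definition` is a hypothetical helper, and the sample string is invented.

```python
import re
import unicodedata


def clean_definition(text):
    """Normalize whitespace and drop invisible or replacement characters."""
    # NFKC folds compatibility characters (e.g. non-breaking space) to plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Turn every run of whitespace (tabs, newlines, ...) into a single space.
    text = re.sub(r"\s+", " ", text)
    # Drop control/format characters (Unicode category C*) and U+FFFD.
    text = "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] != "C" and ch != "\ufffd"
    )
    # Removing characters can leave doubled spaces behind, so collapse again.
    return re.sub(r" {2,}", " ", text).strip()


dirty = "Concentration\u00a0of  phosphate\ufffd\u200b "
print(repr(clean_definition(dirty)))
```

This only handles characters; consistency and readability of the wording still need human review.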

carbon terms for Bioscales (plus errors)

Bioscales has a carbon column for their soil samples, expressed in percent

Here are the existing MIxS terms containing 'car'

Note agriculture's 'tot_car' vs microbial mat/biofilm's & sediment's 'tot_carb'

| Environmental package | Structured comment name |
|---|---|
| microbial mat/biofilm | bacteria_carb_prod |
| sediment | bacteria_carb_prod |
| water | bacteria_carb_prod |
| air | carb_dioxide |
| built environment | carb_dioxide |
| air | carb_monoxide |
| microbial mat/biofilm | carb_nitro_ratio |
| sediment | carb_nitro_ratio |
| water | carb_nitro_ratio |
| hydrocarbon resources-cores | diss_carb_dioxide |
| hydrocarbon resources-fluids/swabs | diss_carb_dioxide |
| microbial mat/biofilm | diss_carb_dioxide |
| miscellaneous natural or artificial environment | diss_carb_dioxide |
| sediment | diss_carb_dioxide |
| water | diss_carb_dioxide |
| hydrocarbon resources-cores | diss_inorg_carb |
| hydrocarbon resources-fluids/swabs | diss_inorg_carb |
| microbial mat/biofilm | diss_inorg_carb |
| miscellaneous natural or artificial environment | diss_inorg_carb |
| sediment | diss_inorg_carb |
| water | diss_inorg_carb |
| hydrocarbon resources-cores | diss_org_carb |
| hydrocarbon resources-fluids/swabs | diss_org_carb |
| microbial mat/biofilm | diss_org_carb |
| sediment | diss_org_carb |
| water | diss_org_carb |
| built environment | height_carper_fiber |
| microbial mat/biofilm | org_carb |
| miscellaneous natural or artificial environment | org_carb |
| sediment | org_carb |
| water | org_carb |
| microbial mat/biofilm | part_org_carb |
| sediment | part_org_carb |
| water | part_org_carb |
| agriculture | root_med_carbon |
| food-farm environment | root_med_carbon |
| plant-associated | root_med_carbon |
| agriculture | tot_car |
| microbial mat/biofilm | tot_carb |
| sediment | tot_carb |
| agriculture | tot_org_carb |
| food-farm environment | tot_org_carb |
| microbial mat/biofilm | tot_org_carb |
| sediment | tot_org_carb |
| soil | tot_org_carb |
| water | tot_part_carb |
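
Near-miss names such as `tot_car` versus `tot_carb` can be surfaced automatically by comparing every pair of distinct term names. This is a rough sketch using `difflib.SequenceMatcher` from the Python standard library; the 0.9 threshold and the `near_duplicates` helper are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# A few structured comment names from the listing above.
NAMES = ["tot_car", "tot_carb", "tot_org_carb", "tot_part_carb", "org_carb"]


def near_duplicates(names, threshold=0.9):
    """Return (name_a, name_b, similarity) for pairs at or above the threshold."""
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


suspects = near_duplicates(NAMES)
for a, b, ratio in suspects:
    print(a, b, ratio)
```

A lower threshold would flag more pairs, including legitimately different terms such as `tot_carb` and `tot_org_carb`, so any hits still need manual review.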

Refactor documentation round 2

Comments from @cmungall:

  • full name not just short name
  • don’t bother including things like Usage
  • leave off mappings for now - there are no mappings in MIxS (yet)
  • put descriptions of classes (combos and packages and checklists) on the front page and just include a bit more text to let people know what is going on
  • for classes, let’s generate a title that is not camelcase and use that (this is an upstream change though)
    maybe show combos as a grid?
  • hyperlink URLs
  • Maybe “Inheritance” is not so useful - we know everything has max one ancestor here?
  • it’s not so useful to end up here - maybe exclude abstract classes?

clear up string_serialization and structured_pattern

Early on, I converted MIxS Value syntaxes to LinkML string_serializations. @cmungall also parses the Value syntaxes to detect potential enumerations. There are some hybrid Value syntaxes that get converted into enums with mangled permissible values that are especially problematic when serializing to RDF.

string_serializations have two possible applications:

  1. combining the contents of two separate fields into one, based on the pattern. I believe @sujaypatil96 has implemented that in linkml-convert.
  2. parsing the contents of a field. I don't believe that has been implemented anywhere, although it would be handy for making the conversion of DataHarmonizer output into schema-compliant JSON more declarative.
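
Both applications can be illustrated with a toy template. The sketch below is not how linkml-convert or any LinkML tool implements string_serialization; it only shows the two directions on a hypothetical `"{value} {unit}"` pattern, with `serialize` and `parse` as made-up helper names.

```python
import re

TEMPLATE = "{value} {unit}"  # hypothetical serialization pattern


def serialize(template, record):
    """Application 1: combine separate fields into one string."""
    return template.format(**record)


def parse(template, text):
    """Application 2: split one string back into named fields."""
    # re.split with a capturing group alternates literal text and field names.
    parts = re.split(r"\{(\w+)\}", template)
    regex = "".join(
        f"(?P<{part}>.+?)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(regex, text)
    return match.groupdict() if match else None


combined = serialize(TEMPLATE, {"value": "111.4", "unit": "mg/kg"})
print(combined)
print(parse(TEMPLATE, combined))
```

Parsing this way returns every field as a string and gives up (returns `None`) when the text does not fit the template, which is where type-aware patterns would be needed.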

structured_patterns can be used to assemble complex regular expressions from reusable components, like the {}-wrapped tokens from MIxS Value syntaxes. The complex regular expressions can then be used to validate input into DataHarmonizer.
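
The assembly step can be pictured as simple token substitution. The following sketch mimics the idea in plain Python rather than calling LinkML itself; the component regexes in `SETTINGS` are illustrative guesses, and `interpolate` is a hypothetical helper.

```python
import re

# Hypothetical reusable components keyed by the {}-wrapped token name.
SETTINGS = {
    "float": r"[-+]?\d+(\.\d+)?",
    "unit": r"\S+",
}


def interpolate(syntax, settings):
    """Replace each {token} in `syntax` with its regex from `settings`."""
    def expand(match):
        # Raises KeyError for tokens with no registered component.
        return f"(?:{settings[match.group(1)]})"
    return re.sub(r"\{(\w+)\}", expand, syntax)


pattern = re.compile(interpolate("^{float} {unit}$", SETTINGS))

for candidate in ["111.4 mg/kg", "abc mg/kg", "111.4"]:
    print(candidate, bool(pattern.match(candidate)))
```

Keeping the components in one table means a fix to, say, the `float` regex applies to every Value syntax that uses that token.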

I think I should switch the instantiation of MIxS Value syntaxes from string_serializations to structured_patterns.

phosph*? terms for Bioscales

Bioscales has a P column, with values like 111.4 mg/kg (ppm)

@mslarae13 has raised some concerns about errors or ambiguity in these slots:

Note term MIXS:0000689, tot_phos, from agriculture:

  • all other packages have a tot_phosp term
  • the Package Item 'total phosphorous' and the Definition 'Total amount or concentration of phosphate' are incompatible

| Environmental package | Structured comment name | Package item | Definition | MIXS ID |
|---|---|---|---|---|
| hydrocarbon resources-cores | diss_inorg_phosp | dissolved inorganic phosphorus | Concentration of dissolved inorganic phosphorus in the sample | MIXS:0000106 |
| hydrocarbon resources-fluids/swabs | diss_inorg_phosp | dissolved inorganic phosphorus | Concentration of dissolved inorganic phosphorus in the sample | MIXS:0000106 |
| water | diss_inorg_phosp | dissolved inorganic phosphorus | Concentration of dissolved inorganic phosphorus in the sample | MIXS:0000106 |
| microbial mat/biofilm | phosphate | phosphate | Concentration of phosphate | MIXS:0000505 |
| miscellaneous natural or artificial environment | phosphate | phosphate | Concentration of phosphate | MIXS:0000505 |
| sediment | phosphate | phosphate | Concentration of phosphate | MIXS:0000505 |
| wastewater/sludge | phosphate | phosphate | Concentration of phosphate | MIXS:0000505 |
| water | phosphate | phosphate | Concentration of phosphate | MIXS:0000505 |
| microbial mat/biofilm | phosplipid_fatt_acid | phospholipid fatty acid | Concentration of phospholipid fatty acids; can include multiple values | MIXS:0000181 |
| miscellaneous natural or artificial environment | phosplipid_fatt_acid | phospholipid fatty acid | Concentration of phospholipid fatty acids; can include multiple values | MIXS:0000181 |
| sediment | phosplipid_fatt_acid | phospholipid fatty acid | Concentration of phospholipid fatty acids; can include multiple values | MIXS:0000181 |
| water | phosplipid_fatt_acid | phospholipid fatty acid | Concentration of phospholipid fatty acids; can include multiple values | MIXS:0000181 |
| water | soluble_react_phosp | soluble reactive phosphorus | Concentration of soluble reactive phosphorus | MIXS:0000738 |
| agriculture | tot_phos | total phosphorous | Total amount or concentration of phosphate | MIXS:0000689 |
| hydrocarbon resources-cores | tot_phosp | total phosphorus | Total phosphorus concentration in the sample, calculated by: total phosphorus = total dissolved phosphorus + particulate phosphorus | MIXS:0000117 |
| hydrocarbon resources-fluids/swabs | tot_phosp | total phosphorus | Total phosphorus concentration in the sample, calculated by: total phosphorus = total dissolved phosphorus + particulate phosphorus | MIXS:0000117 |
| water | tot_phosp | total phosphorus | Total phosphorus concentration in the sample, calculated by: total phosphorus = total dissolved phosphorus + particulate phosphorus | MIXS:0000117 |
| wastewater/sludge | tot_phosphate | total phosphate | Total amount or concentration of phosphate | MIXS:0000689 |

improve search experience

Nobody should ever want to look at the spreadsheets.

Searches like pH, class and lat (from lat_lon) should bring the most relevant terms to the top. We're probably more interested in the term title and description.
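
A field-weighted ranking is one way to get that behavior. The sketch below is a guess at a scoring scheme, not a description of any existing MIxS search: exact name or title hits rank first, then whole-token hits in the name, then substring hits, then description hits. The records in `TERMS` use real term names but made-up titles and descriptions, and the weights are arbitrary.

```python
# Illustrative records; titles and descriptions are placeholders.
TERMS = [
    {"name": "phosphate", "title": "phosphate",
     "description": "Concentration of phosphate."},
    {"name": "lat_lon", "title": "latitude and longitude",
     "description": "Where the sample was collected."},
    {"name": "ph_meth", "title": "pH method",
     "description": "How pH was measured."},
    {"name": "ph", "title": "pH",
     "description": "Acidity or alkalinity of the sample."},
]


def score(term, query):
    """Higher scores mean a more relevant hit; 0 means no match."""
    q = query.lower()
    name = term["name"].lower()
    title = term["title"].lower()
    if q in (name, title):
        return 100  # exact match on name or title
    if q in name.split("_"):
        return 50   # whole token of the name, e.g. "lat" in lat_lon
    if q in name or q in title:
        return 20   # substring of name or title
    if q in term["description"].lower():
        return 5    # mentioned only in the description
    return 0


def search(terms, query):
    """Return matching term names, best first, ties broken alphabetically."""
    scored = [(score(t, query), t["name"]) for t in terms]
    return [name for s, name in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]


print(search(TERMS, "pH"))
print(search(TERMS, "lat"))
```

With these weights a search for pH puts the `ph` term ahead of `ph_meth` and `phosphate`, which a plain substring search over the spreadsheets would not guarantee.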
