karrlab / obj_tables

Tools for creating and reusing high-quality spreadsheets

Home Page: https://objtables.org

License: MIT License

Python 83.63% CSS 1.82% JavaScript 0.39% HTML 13.88% Dockerfile 0.29%
complex-datasets relational-data data-tables schema object-mapping excel csv tsv python

obj_tables's Introduction

ObjTables: Tools for creating and reusing high-quality spreadsheets

ObjTables is a toolkit which makes it easy to use spreadsheets (e.g., XLSX workbooks) to work with complex datasets by combining spreadsheets with rigorous schemas and an object-relational mapping system (ORM; similar to Active Record (Ruby), Django (Python), Doctrine (PHP), Hibernate (Java), Propel (PHP), SQLAlchemy (Python), etc.). This combination enables users to use programs such as Microsoft Excel, LibreOffice Calc, and OpenOffice Calc to view and edit spreadsheets and use schemas and the ObjTables software to validate the syntax and semantics of datasets, compare and merge datasets, and parse datasets into object-oriented data structures for further querying and analysis with languages such as Python.

ObjTables makes it easy to:

  • Use collections of tables (e.g., an XLSX workbook) to represent complex data consisting of multiple related objects of multiple types (e.g., rows of worksheets), each with multiple attributes (e.g., columns).
  • Use complex data types (e.g., numbers, strings, numerical arrays, symbolic mathematical expressions, chemical structures, biological sequences, etc.) within tables.
  • Use programs such as Excel and LibreOffice as a graphical interface for viewing and editing complex datasets.
  • Use embedded tables and grammars to encode relational information into columns and groups of columns of tables.
  • Define clear schemas for tabular datasets.
  • Use schemas to rigorously validate tabular datasets.
  • Use schemas to parse tabular datasets into data structures for further analysis in languages such as Python.
  • Compare, merge, split, revision, and migrate tabular datasets.

The ObjTables toolkit includes five components:

  • Format for schemas for tabular datasets
  • Numerous data types
  • Format for tabular datasets
  • Software tools for parsing, validating, and manipulating tabular datasets
  • Python package for more flexibility and analysis

Please see https://objtables.org for more information.

Installing the command-line program and Python API

Please see the documentation.

Examples, tutorials, and documentation

Please see the user documentation, developer documentation, and tutorials.

License

ObjTables is released under the MIT license.

Development team

ObjTables was developed by the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, USA, and the Applied Mathematics and Computer Science, from Genomes to the Environment research unit at the National Research Institute for Agriculture, Food and Environment in Jouy-en-Josas, France.

Questions and comments

Please contact the developers with any questions or comments.

obj_tables's Issues

Support schema migration

Stored models, such as spreadsheets and delimited files on disk, can become incompatible when model definitions in obj_model are updated. Create a utility that migrates stored models so that they are compatible with a modified model definition. For example, Django has a migration utility, which originated as the third-party package South.

Multiple tables in single file

Table header: \n!!SBtab ... \n

Example:

!!SBtab   TableID='def_table' SBtabVersion='1.0' TableType='Definition'   TableName='Allowed_types'
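
A minimal sketch of how a file holding several tables might be split on such header lines. It assumes every header attribute has the form Key='value' in single quotes, as in the example above; parse_table_header and split_tables are hypothetical helpers, not part of the package.

```python
import re

# Hypothetical helpers for illustration; not part of the package.
# Assumption: header attributes always look like Key='value'.
HEADER_PREFIX = "!!SBtab"
ATTR_PATTERN = re.compile(r"(\w+)='([^']*)'")


def parse_table_header(line):
    """Return the attributes of one table header line as a dict."""
    line = line.strip()
    if not line.startswith(HEADER_PREFIX):
        raise ValueError("not a table header: {!r}".format(line))
    return dict(ATTR_PATTERN.findall(line[len(HEADER_PREFIX):]))


def split_tables(text):
    """Split a file's text into (header attributes, body lines) pairs."""
    tables = []
    for line in text.splitlines():
        if line.startswith(HEADER_PREFIX):
            # A header line starts a new table.
            tables.append((parse_table_header(line), []))
        elif tables and line.strip():
            # Non-blank lines belong to the most recent table.
            tables[-1][1].append(line)
    return tables
```

Lines that appear before the first header are ignored in this sketch; a real reader would probably want to report them as errors.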

latent bug in ModelMeta.validate_attributes

consider this code:

is_attr = False
for base in bases:
    if attr_name in dir(base):
        is_attr = True

This code is risky because dir(base) may contain a name that matches attr_name but is not an attribute inherited from base. For example, the name of a method could match.
A better approach would be to check directly whether attr_name is an attribute of a base that is an obj_model.core.Model. namespace doesn't have the same problem.
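
A small sketch of the problem and one possible stricter check. Attribute and Base here are stand-ins written for this example, not the real obj_model classes, and is_inherited_attribute is a hypothetical helper; how obj_model actually stores attributes on a model may differ.

```python
class Attribute(object):
    """Stand-in for an obj_model attribute class (hypothetical)."""


class Base(object):
    id = Attribute()

    def name(self):  # a method, not an Attribute
        return 'base'


def is_inherited_attribute(attr_name, bases, attribute_type=Attribute):
    """Return True only if a base defines attr_name as an attribute_type instance."""
    for base in bases:
        # getattr follows the MRO just as dir() does, but the isinstance
        # check rejects methods and other non-attribute members.
        if isinstance(getattr(base, attr_name, None), attribute_type):
            return True
    return False
```

With these definitions, 'name' in dir(Base) is true even though name is a method, whereas is_inherited_attribute('name', (Base,)) is false and is_inherited_attribute('id', (Base,)) is true.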

Future enhancements to migration

Listed in decreasing order of my subjective assessment of importance

  1. avoid need for obj_model.core.ModelMeta.CHECK_SAME_RELATED_ATTRIBUTE_NAME by comparing related models by value, not name
  2. use Model.revision to label git commit of wc_lang and automatically migrate models to current schema and report inconsistency between a schema and model file
  3. move generate_wc_lang_migrator() to wc_lang
  4. provide a well-documented example
  5. YAML config examples with multiple existing_files and multiple migrated_files
  6. use deepcopy on obj_model.ontology.OntologyAttribute attributes when deepcopy of pronto terms works
  7. associate schema pairs with renaming maps
  8. separately specified default value for attribute
  9. improve performance of test_migration
  10. obtain sort order of sheets in existing model file and replicate in migrated model file
  11. confirm migration works for json, etc.
  12. test sym links in Migrator.parse_module_path
  13. use PARSED_EXPR everywhere applicable

Better error message for set_value()

Hi Jonathan

In def set_value(self, obj, new_value), at ../obj_model/obj_model/core.py:3884: ValueError
an example error looks like:
ValueError: Attribute '<wc_lang.core.RateLaw object at 0x7fbb31678a58>' of '<wc_lang.core.RateLawEquation object at 0x7fbb31678b00>' must be None

I think it would be better if the first value was the string value of the attribute and the 2nd was something like the classname, id and name of the Model.
Thanks

Use Excel validation

  • SlugAttribute --> custom
  • IntAttribute --> Whole number (min, max)
  • FloatAttribute --> Decimal (min, max)
  • OneToOneAttribute --> List
  • ManyToOneAttribute --> List
  • DateAttribute --> Date
  • TimeAttribute --> Time
  • EnumAttribute --> List

Add method to get nested objects

Features

  • traverse object graph in only one direction
  • filter related objects by attributes

Example:

  • Get DOIs of all nested references of a gene in a knowledge base
    • Get all nested objects
    • Filter for objects of type Identifier
    • Filter for namespace = DOI
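
The example above could be sketched roughly as follows. Gene, Reference, and Identifier are plain stand-in classes invented for this sketch, and get_nested_objects with its forward_attrs argument is a hypothetical signature, not an existing obj_model method. Traversal is kept one-directional by listing, per class, the only attributes that may be followed.

```python
class Identifier(object):
    def __init__(self, namespace, id):
        self.namespace = namespace
        self.id = id


class Reference(object):
    def __init__(self, identifiers):
        self.identifiers = identifiers


class Gene(object):
    def __init__(self, references):
        self.references = references


def get_nested_objects(root, forward_attrs):
    """Collect objects reachable from root by following only forward_attrs.

    forward_attrs maps a class to the names of the list-valued attributes
    that may be traversed from instances of that class.
    """
    seen, stack, found = {id(root)}, [root], []
    while stack:
        obj = stack.pop()
        for attr in forward_attrs.get(type(obj), ()):
            for child in getattr(obj, attr):
                if id(child) not in seen:  # guard against cycles
                    seen.add(id(child))
                    found.append(child)
                    stack.append(child)
    return found


def get_dois(gene):
    """Return the DOIs of all references nested under a gene."""
    forward = {Gene: ['references'], Reference: ['identifiers']}
    return [obj.id for obj in get_nested_objects(gene, forward)
            if isinstance(obj, Identifier) and obj.namespace == 'DOI']
```

Filtering is done after traversal here for simplicity; a real implementation might accept the filter as an argument so that large object graphs are not fully materialized.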

Query functions (both class & instance) for attributes.

Class methods

  • cls.get_related_attributes() - returns names of all RelatedAttributes of the class
  • cls.get_scalar_attributes() - returns all LiteralAttributes of the class
  • cls.get_attributes() - returns names of all Attributes
  • cls.get_related_name(attribute) - for a given RelatedAttribute of this class, get the related name

Instance methods

  • self.get_empty_scalar_attributes() - returns LiteralAttributes that are set to None
  • self.get_nonempty_scalar_attributes() - opposite of above
  • self.get_empty_related_attributes() - returns RelatedAttributes that are set to None or []
  • self.get_nonempty_related_attributes() - opposite of above

Better error checking on attribute_order

Verify that attribute_order is a tuple (or at least not a string) so that a model definition like this

class A(obj_model.Model):
    id = SlugAttribute()

    class Meta(obj_model.Model.Meta):
        attribute_order = ('id')

does not return an incomprehensible error like this:

                    raise ValueError("`attribute_order` must contain attribute names; '{}' not found in "
>                                    "attributes of {}".format(attr_name, name))
E                   ValueError: `attribute_order` must contain attribute names; 'i' not found in attributes of A
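
The confusing message arises because ('id') is simply the string 'id', so iterating over it yields the characters 'i' and 'd'. A minimal sketch of an up-front check follows; validate_attribute_order is a hypothetical helper name, and where such a check would be called inside obj_model is left open.

```python
def validate_attribute_order(attribute_order, model_name):
    """Raise a descriptive ValueError unless attribute_order is a tuple of strings."""
    if isinstance(attribute_order, str):
        # ('id') is just the string 'id'; the trailing comma makes a tuple.
        raise ValueError(
            "`attribute_order` of {} must be a tuple of attribute names, "
            "not the string {!r}; did you mean ({!r},)?".format(
                model_name, attribute_order, attribute_order))
    if not isinstance(attribute_order, tuple):
        raise ValueError(
            "`attribute_order` of {} must be a tuple, not {}".format(
                model_name, type(attribute_order).__name__))
    for attr_name in attribute_order:
        if not isinstance(attr_name, str):
            raise ValueError(
                "`attribute_order` of {} must contain only strings; found {!r}".format(
                    model_name, attr_name))
```

Calling validate_attribute_order(('id'), 'A') raises an error that names the likely fix, while validate_attribute_order(('id',), 'A') passes silently.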

test __prep_expr_for_tokenization

Add a test of ParsedExpression.__prep_expr_for_tokenization() to obj_model/tests/test_expression.py. It's covered because it's always called, but the substitutions it makes need to be tested.

Correct errors in Excel data validation

  • Incorrect links to column-oriented worksheets.
    • For example, in worksheet 19 of the h1_hesc KB (/xl/worksheets/sheet19.xml), 'Cell'!$B$0:$XFD$0 should be 'Cell'!$B$1:$XFD$1:
      <dataValidation type="list" errorStyle="warning" allowBlank="1" showInputMessage="1" showErrorMessage="1" errorTitle="Cell" error="Value must be a value from &quot;Cell:0&quot; or blank." promptTitle="Cell" prompt="Select a value from &quot;Cell:0&quot; or blank." sqref="B1:B2">
        <formula1>'Cell'!$B$0:$XFD$0</formula1>
      </dataValidation>
      
  • Incorrect links to parent models which are represented by multiple worksheets. For example,
    • Reference to "Polymer species types" from "Polymer" column of "Genes" worksheet. This should not reference a worksheet because PolymerSpeciesType is a parent class which is represented by multiple worksheets.
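
The first error looks like an off-by-one between 0-based row indices in Python and 1-based row numbers in Excel, although the issue does not confirm the cause. A sketch of building the reference with an explicit conversion follows; row_range_ref is a hypothetical helper, and it assumes the sheet name contains no apostrophes that would need escaping.

```python
def row_range_ref(sheet_name, row_index, first_col='B', last_col='XFD'):
    """Build an absolute reference to part of one worksheet row.

    row_index is 0-based (as in Python); Excel row numbers are 1-based,
    so row 0 does not exist in a worksheet.
    """
    if row_index < 0:
        raise ValueError('row_index must be non-negative')
    excel_row = row_index + 1
    return "'{}'!${}${}:${}${}".format(
        sheet_name, first_col, excel_row, last_col, excel_row)
```

For the worksheet above, row_range_ref('Cell', 0) gives 'Cell'!$B$1:$XFD$1.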

Docstring issue

In the return values of obj_model.io.get_fields(), what's the difference between attrs and sub_attrs? The docs say:

    :obj:`list` of :obj:`Attribute`: attributes in the order they should be printed
    :obj:`list` of tuple of :obj:`Attribute`: attributes in the order they should be printed

Unspecific error

A modeler writing a wc_lang spreadsheet might have trouble fixing this error:

   ValueError: The model cannot be loaded because '2_species_1_reaction.xlsx' contains error(s):
  Taxon
    The attributes must be defined in this order:
      Id
      Name
      Rank
      Comments
      References
  Submodel
    The attributes must be defined in this order:
      Id
      Name
      Algorithm
      Compartment
      Biomass reaction
      Objective function
      Comments
      References
  Compartment
    The attributes must be defined in this order:
      Id
      Name
      Initial volume
      Comments
      References
  SpeciesType
    The attributes must be defined in this order:
      Id
      Name
      Structure
      Empirical formula
      Molecular weight
      Charge
      Type
      Comments
      References
  Observable
    The attributes must be defined in this order:
      Id
      Name
      Species
      Observables
      Comments
  Function
    The attributes must be defined in this order:
      Id
      Name
      Expression
      Comments
  StopCondition
    The attributes must be defined in this order:
      Id
      Name
      Expression
      Comments
  Reference
    The attributes must be defined in this order:
      Id
      Name
      Title
      Author
      Editor
      Year
      Type
      Publication
      Publisher
      Series
      Volume
      Number
      Issue
      Edition
      Chapter
      Pages
      Comments

check 'return self.default_cleaned_value()'

the line return self.default_cleaned_value() in

        if isinstance(self.default_cleaned_value, (
                six.types.FunctionType, six.types.MethodType, six.types.LambdaType)):
            return self.default_cleaned_value()

looks wrong. default_cleaned_value isn't defined as a function anywhere.

don't have time to investigate now.
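
If the intent is for a default to be either a plain value or a zero-argument function that produces one, the built-in callable() is the usual test and avoids enumerating function types. This is only a guess at the intent; resolve_default is a hypothetical helper and is not taken from obj_model.

```python
def resolve_default(default):
    """Return default, calling it first if it is callable.

    Note: classes are callable too, so passing `list` returns a new
    empty list rather than the class itself.
    """
    return default() if callable(default) else default
```

A callable default also gives each object its own fresh value, which matters for mutable defaults such as lists.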

Migration enhancements to do later

  • add migrate commands to H1 & Mp
  • remove branch from Git metadata
  • expose the optional locations for migrated files
  • obtain schema commit metadata from pip installed package
  • detect erroneous schema changes file annotations
  • ensure that the Git version of a data file is a sentinel commit
  • enforce this invariant: each sentinel commit must be identified by one schema changes file

Add feature to define groups of Excel columns

  • Add attribute to obj_model.Model.Meta to define column groupings and the heading for the group
  • Export additional row with merged cells that are headings for multiple individual columns
  • Update Excel import
