
aind-data-schema-models's People

Contributors

arielleleon, dbirman, dyf, github-actions[bot], helen-m-lin, jtyoung84, lambdaloop, saskiad, sun-flow

aind-data-schema-models's Issues

Add tests for scripts folder

User story

As a software engineer, I want to add test coverage for the scripts folder, so I can test the code in the scripts package.

Acceptance criteria

  • Configure pyproject.toml coverage parameters
  • Add test scripts for code in scripts package

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Missing fields

Describe the bug

  1. Missing Fujinon in LENS_MANUFACTURERS (organizations.py)

Add funding organizations

Add to organizations and to funding list:

  • Chan Zuckerberg Initiative (CZI). ror = 02qenvm24
  • National Center for Complementary & Integrative Health (NCCIH). ror = 00190t495
  • Michael J. Fox Foundation for Parkinson's Research (MJFF). ror = 03arq3225
  • Templeton World Charity Foundation (TWCF). ror = 00x0z1472
  • MBF Bioscience. ror = 02zynam48

Import enums from aind_data_schema.data_description into this package

Is your feature request related to a problem? Please describe.
I'd like to import a few data models into packages other than aind-data-schema. It'd be nice if they were in this repo instead of aind-data-schema.

Describe the solution you'd like
Import this stuff here:

from datetime import datetime
from enum import Enum


class RegexParts(str, Enum):
    """regular expression components to be re-used elsewhere"""

    DATE = r"\d{4}-\d{2}-\d{2}"
    TIME = r"\d{2}-\d{2}-\d{2}"


class DataRegex(str, Enum):
    """regular expression patterns for different kinds of data and their properties"""

    DATA = f"^(?P<label>.+?)_(?P<c_date>{RegexParts.DATE.value})_(?P<c_time>{RegexParts.TIME.value})$"
    RAW = (
        f"^(?P<platform_abbreviation>.+?)_(?P<subject_id>.+?)_(?P<c_date>{RegexParts.DATE.value})_(?P<c_time>"
        f"{RegexParts.TIME.value})$"
    )
    DERIVED = (
        f"^(?P<input>.+?_{RegexParts.DATE.value}_{RegexParts.TIME.value})_(?P<process_name>.+?)_(?P<c_date>"
        f"{RegexParts.DATE.value})_(?P<c_time>{RegexParts.TIME.value})"
    )
    ANALYZED = (
        f"^(?P<project_abbreviation>.+?)_(?P<analysis_name>.+?)_(?P<c_date>"
        f"{RegexParts.DATE.value})_(?P<c_time>{RegexParts.TIME.value})$"
    )
    NO_UNDERSCORES = "^[^_]+$"
    NO_SPECIAL_CHARS = '^[^<>:;"/|? \\_]+$'
    NO_SPECIAL_CHARS_EXCEPT_SPACE = '^[^<>:;"/|?\\_]+$'


class DataLevel(str, Enum):
    """Data level name"""

    DERIVED = "derived"
    RAW = "raw"
    SIMULATED = "simulated"


class Group(str, Enum):
    """Data collection group name"""

    BEHAVIOR = "behavior"
    EPHYS = "ephys"
    MSMA = "MSMA"
    OPHYS = "ophys"


def datetime_to_name_string(dt):
    """Take a date and time object, format it as a string"""
    return dt.strftime("%Y-%m-%d_%H-%M-%S")


def datetime_from_name_string(d, t):
    """Take date and time strings, generate date and time objects"""
    d = datetime.strptime(d, "%Y-%m-%d").date()
    t = datetime.strptime(t, "%H-%M-%S").time()
    return datetime.combine(d, t)


def build_data_name(label, creation_datetime):
    """Construct a valid data description name"""
    dt_str = datetime_to_name_string(creation_datetime)
    return f"{label}_{dt_str}"
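
To illustrate how these pieces fit together, here is a minimal, self-contained sketch that builds a name and parses it back with the DATA pattern quoted above. The helper names build_name and parse_name are hypothetical and exist only for this example, and the label "ecephys_123456" is made up.

```python
import re
from datetime import datetime

# Same components as RegexParts.DATE / RegexParts.TIME in the snippet above.
DATE = r"\d{4}-\d{2}-\d{2}"
TIME = r"\d{2}-\d{2}-\d{2}"
# Same shape as DataRegex.DATA in the snippet above.
DATA_PATTERN = f"^(?P<label>.+?)_(?P<c_date>{DATE})_(?P<c_time>{TIME})$"


def build_name(label, creation_datetime):
    """Hypothetical helper: join a label and a timestamp into a data name."""
    return f"{label}_{creation_datetime.strftime('%Y-%m-%d_%H-%M-%S')}"


def parse_name(name):
    """Hypothetical helper: split a data name back into label and datetime."""
    match = re.match(DATA_PATTERN, name)
    if match is None:
        raise ValueError(f"Not a valid data name: {name!r}")
    created = datetime.strptime(
        f"{match['c_date']}_{match['c_time']}", "%Y-%m-%d_%H-%M-%S"
    )
    return match["label"], created


name = build_name("ecephys_123456", datetime(2024, 1, 2, 3, 4, 5))
label, created = parse_name(name)
```

Because the label group is lazy and the pattern is anchored at the end, underscores inside the label are preserved when parsing.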

Describe alternatives you've considered
Importing from aind-data-schema, but that requires importing all the other dependencies.

NCBI taxonomy ids

The registry_id for our species is currently only the number (e.g. 10090 for mouse). The correct ID is NCBI:txid10090.

  • Fix the registry_ids for the species in our models.
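
A minimal sketch of the conversion this implies, assuming the stored IDs are bare integers or numeric strings. The function name to_ncbi_registry_id is hypothetical and not part of the package.

```python
NCBI_PREFIX = "NCBI:txid"


def to_ncbi_registry_id(taxon_id):
    """Hypothetical helper: turn 10090 or "10090" into "NCBI:txid10090".

    Values that already carry the prefix are returned unchanged.
    """
    text = str(taxon_id).strip()
    if text.startswith(NCBI_PREFIX):
        text = text[len(NCBI_PREFIX):]
    if not text.isdigit():
        raise ValueError(f"Not a numeric NCBI taxonomy id: {taxon_id!r}")
    return f"{NCBI_PREFIX}{text}"


mouse_id = to_ncbi_registry_id(10090)
```

Making the helper idempotent means it can be run over a mixed list of old and already-fixed IDs.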

Schemas are brittle with respect to manufacturers

User story

As scientists, we may want to use a new piece of hardware. Right now, if that hardware is from a new manufacturer that is not listed in aind_data_schema_models.Organizations, we would need to use Other, or submit a PR to this repo and then bump all of the schemas we are using. With complex schemas, this could require significant updates. Am I missing something? Is there a better way to handle this issue?

Acceptance criteria

  • This is something that can be verified to show that this user story is satisfied.

Sprint Ready Checklist

  • 1. Acceptance criteria defined
  • 2. Team understands acceptance criteria
  • 3. Team has defined solution / steps to satisfy acceptance criteria
  • 4. Acceptance criteria is verifiable / testable
  • 5. External / 3rd Party dependencies identified
  • 6. Ticket is prioritized and sized

Control vocabulary for targeted structures

Is your feature request related to a problem? Please describe.
We need to control the vocabulary used to describe targeted structures - in multiple places.

Describe the solution you'd like
I don't know how to do this elegantly - I'd like to use existing ontology. But I'm not sure how to do this without making a GIANT enum.

Consider supporting backwards compatibility between `ophys` and `pophys`

A recent change made the previous tag ophys incompatible with the new pophys modality. In order to support previously serialized schemas, it would be nice to have some custom logic to coerce the previous definition to the current one.

This coercion mechanism should ideally make things compatible at the level of the python API (e.g. calling Modality.OPHYS) and at the level of deserialization. In both cases, the output should coerce to the new Modality.POPHYS object.
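
One possible shape for such a coercion, sketched with a plain standard-library Enum. This is an assumption for illustration only: the package's actual Modality class may be built differently, and the ECEPHYS member here is included just to give the enum a second entry.

```python
from enum import Enum


class Modality(str, Enum):
    """Simplified stand-in for the real Modality class (illustrative only)."""

    POPHYS = "pophys"
    ECEPHYS = "ecephys"
    # Re-using an existing value makes OPHYS an alias, so
    # Modality.OPHYS returns the POPHYS member at the Python API level.
    OPHYS = "pophys"

    @classmethod
    def _missing_(cls, value):
        # Called when Modality(value) finds no member with that value,
        # which covers deserialization of the legacy "ophys" tag.
        legacy_values = {"ophys": cls.POPHYS}
        return legacy_values.get(value)


coerced = Modality("ophys")
```

Returning None from `_missing_` for any other value keeps the normal ValueError for unknown tags.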

error related to pydantic 2.6.4

This is happening on pydantic 2.6.4. Let's update the minimum version to `>=2.7`.

Repro:

  1. install aind-data-schema-models with pydantic 2.6
  2. from aind_data_schema_models.registries import Registry
pydantic.errors.PydanticUserError: Field 'registry' defined on a base class was overridden by a non-annotated attribute. All field definitions, including overrides, require a type annotation.

Alphabetical order is inconsistent

Describe the bug
Use the model name to alphabetize models in both the model and enum lists.

To Reproduce
Example: IMEC. In the models it is ordered by "Interuniversity..." and in the enum it is ordered by "IMEC".

Runtime created classes do not afford user-friendly interface

The way that classes are created during runtime does not afford auto-completion.
There are a few ways to approach this problem.

One would be to overload a few dunder methods to allow IDEs to list object attributes. However, this doesn't fix the problem, since the attributes do not exist before the object is created.

Another way to approach this problem is to take inspiration from automatically generated C wrappers and simply create all classes via templating.
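
A rough sketch of the templating idea, assuming each model can be described by a class name, a display name, and an abbreviation. The template layout and the render_class helper are hypothetical; the example row reuses an organization mentioned in another issue.

```python
from string import Template

# Template for one generated class. The rendered text would be written
# to a .py file so that IDEs can see the attributes statically.
CLASS_TEMPLATE = Template(
    '''
class $class_name:
    """Generated model; do not edit by hand."""

    name = $name
    abbreviation = $abbreviation
'''
)


def render_class(class_name, name, abbreviation):
    """Hypothetical helper: render the source code of one model class."""
    return CLASS_TEMPLATE.substitute(
        class_name=class_name,
        name=repr(name),  # repr() yields a valid Python string literal
        abbreviation=repr(abbreviation),
    )


source = render_class("ChanZuckerbergInitiative", "Chan Zuckerberg Initiative", "CZI")
```

Since the output is ordinary source code, auto-completion works without running anything at import time.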

clean up process names

Once QualityControl is merged into aind-data-schema, remove the following from the ProcessName model:

MANUAL_ANNOTATION = "Manual annotation"
QUALITY_CONTROL_AND_ASSESSMENT = "Quality control and assessment"

Store models as CSV

Currently the models are large python files that are verbose, hard to edit, and hard to replace if someone else wants to use a different set of objects.

  1. Convert the data into CSV
  2. Add methods to dynamically generate the existing classes from those CSV files on import.
  3. Don't break the existing API
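
As a rough illustration of step 2, the following sketch builds an Enum from CSV text at runtime using the standard library. The column names (abbreviation, name, ror) and the enum_from_csv helper are assumptions for this example; the rows reuse values from the funding-organization issue. The real package exposes richer classes than a plain Enum, so this only shows the general mechanism.

```python
import csv
import io
import re
from enum import Enum


def enum_from_csv(enum_name, csv_text, key_column, value_column):
    """Hypothetical helper: build a str-valued Enum from CSV text."""
    members = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Turn the key into a valid upper-case Python identifier.
        member_name = re.sub(r"\W+", "_", row[key_column]).strip("_").upper()
        members[member_name] = row[value_column]
    # Functional Enum API: a mapping of member names to values.
    return Enum(enum_name, members, type=str)


FUNDERS_CSV = """abbreviation,name,ror
CZI,Chan Zuckerberg Initiative,02qenvm24
TWCF,Templeton World Charity Foundation,00x0z1472
"""

Funders = enum_from_csv("Funders", FUNDERS_CSV, "abbreviation", "name")
```

Keeping the member names stable (here derived from the abbreviation column) is what would let existing attribute access keep working.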

Capsule where I experimented with this:
https://codeocean.allenneuraldynamics.org/capsule/0992554/tree

Publish schemas to s3

Is your feature request related to a problem? Please describe.
As a user, I'd like to pull the data models from a database

Describe the solution you'd like

  • A github action will convert the csv files into json and push the json into docdb.
  • One document record per csv file name, such as modalities, harp_types, etc.
  • It will check if the record exists and overwrite the existing record
{"_id": UUID, "model": "modalities", "contents":[list of models] }
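
A minimal sketch of the CSV-to-record conversion described above, using only the standard library. The csv_to_record helper, the column names, and the sample row are made up for illustration; the upload to docdb and the existence check are deliberately left out because they depend on the client in use.

```python
import csv
import io
import json
import uuid


def csv_to_record(model_name, csv_text, record_id=None):
    """Hypothetical helper: wrap the rows of one CSV file in a single record."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        # Re-use the existing id when overwriting; otherwise create a new one.
        "_id": str(record_id or uuid.uuid4()),
        "model": model_name,
        "contents": rows,
    }


SAMPLE_CSV = """name,abbreviation
Example modality,example
"""

record = csv_to_record("modalities", SAMPLE_CSV)
payload = json.dumps(record)
```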

DataRegex for special chars are not working for `/`

Describe the bug

  • DataRegex.NO_SPECIAL_CHARS and DataRegex.NO_SPECIAL_CHARS_EXCEPT_SPACE are not working for / chars.
  • The result is that fields that should not allow / are allowing it without raising validation errors.
  • This is causing validation errors in the Metadata data entry app with: Invalid regular expression: /^[^<>:;"/|?\_]+$/u: Invalid escape

To Reproduce
Steps to reproduce the behavior:

  1. Create a pydantic model that uses DataRegex.NO_SPECIAL_CHARS or DataRegex.NO_SPECIAL_CHARS_EXCEPT_SPACE for the pattern of a field.
  2. Try adding a string that has / in that field.
  3. Observe that validation errors are not showing up, meaning that / is being allowed.

Alternatively, validate any data_description json using the Metadata entry app, and observe the "Invalid escape" error quoted in the bug description.

Expected behavior
The DataRegex class should have enums with the correct regex to match all special characters.
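
One candidate fix, offered as a sketch rather than the maintainers' solution: escape the backslash itself, so the pattern contains a literal backslash followed by a plain underscore instead of the `\_` escape that JavaScript rejects in unicode mode. This assumes the intent is to forbid both backslashes and underscores along with the other listed characters.

```python
import re

# Raw strings: "\\" is an escaped backslash in the regex, "_" is a literal.
NO_SPECIAL_CHARS = r'^[^<>:;"/|? \\_]+$'
NO_SPECIAL_CHARS_EXCEPT_SPACE = r'^[^<>:;"/|?\\_]+$'


def is_allowed(pattern, text):
    """Return True when the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


checks = {
    "plain": is_allowed(NO_SPECIAL_CHARS, "abc-123"),
    "slash": is_allowed(NO_SPECIAL_CHARS, "a/b"),
    "backslash": is_allowed(NO_SPECIAL_CHARS, "a\\b"),
    "underscore": is_allowed(NO_SPECIAL_CHARS, "a_b"),
    "space": is_allowed(NO_SPECIAL_CHARS, "a b"),
    "space_ok": is_allowed(NO_SPECIAL_CHARS_EXCEPT_SPACE, "a b"),
}
```

The checks above only exercise Python's `re` module; whether the Metadata entry app accepts the pattern would still need to be verified there.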

Add the MGI alleles as a csv

Is your feature request related to a problem? Please describe.
We want to have the MGI allele IDs in the subject metadata

Describe the solution you'd like
There is an rpt file here: https://www.informatics.jax.org/downloads/reports/MGI_PhenotypicAllele.rpt
Convert this to a csv, add it to the repo, and construct the allele class.
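
A rough sketch of the conversion step, assuming the report is tab-delimited text with comment lines that start with "#". That layout should be confirmed against the actual file. The rpt_to_csv helper and the sample rows are made up for illustration.

```python
import csv
import io


def rpt_to_csv(rpt_lines, out_file):
    """Hypothetical helper: copy tab-delimited rows into CSV, skipping comments.

    Returns the number of data rows written.
    """
    writer = csv.writer(out_file)
    written = 0
    for line in rpt_lines:
        if line.startswith("#") or not line.strip():
            continue  # assumed comment/header lines and blank lines
        writer.writerow(line.rstrip("\r\n").split("\t"))
        written += 1
    return written


# Made-up sample in the assumed layout; not real MGI data.
sample = io.StringIO("# comment line\nID-1\tallele one\nID-2\tallele, two\n")
output = io.StringIO()
row_count = rpt_to_csv(sample, output)
```

Using csv.writer rather than a plain string replace ensures that fields containing commas are quoted correctly.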

Convert data (not schemas) into normal table format

Is your feature request related to a problem? Please describe.
Putting enums and lists of objects in code is brittle and makes it difficult for others to contribute.

Describe the solution you'd like
Represent data in CSV or parquet, separate from the schema.
