allenneuraldynamics / aind-data-schema-models

Data models used in aind-data-schema

License: MIT License

Makefile 1.52% Batchfile 1.83% Python 96.65%

aind-data-schema-models's People

Contributors

Watchers

Forkers

lambdaloop bruno-f-cruz

aind-data-schema-models's Issues

User story

As a software engineer, I want to add coverage for the scripts folder, so I can test the code in the scripts package.

Acceptance criteria

Configure pyproject.toml coverage parameters
Add test scripts for code in scripts package

Sprint Ready Checklist

1. Acceptance criteria defined
2. Team understands acceptance criteria
3. Team has defined solution / steps to satisfy acceptance criteria
4. Acceptance criteria is verifiable / testable
5. External / 3rd Party dependencies identified
6. Ticket is prioritized and sized

Missing fields

Describe the bug

Missing Fujinon in LENS_MANUFACTURERS (organizations.py)

add a body part selection for mouse anatomy

Add funding organizations

Add to organizations and to funding list:

Chan Zuckerberg Initiative (CZI). ror = 02qenvm24
National Center for Complementary & Integrative Health (NCCIH). ror = 00190t495
Michael J. Fox Foundation for Parkinson's Research (MJFF). ror = 03arq3225
Templeton World Charity Foundation (TWCF). ror = 00x0z1472
MBF Bioscience. ror = 02zynam48
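
The entries above could be captured as simple records. This is a minimal sketch, assuming a hypothetical `FundingOrg` dataclass (not the package's actual organization model) and the `https://ror.org/<id>` URL form that appears elsewhere in these issues:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FundingOrg:
    """Hypothetical record type; not the package's real Organization model."""

    name: str
    abbreviation: Optional[str]
    ror_id: str

    @property
    def ror_url(self) -> str:
        # Assumes ROR ids resolve under https://ror.org/
        return f"https://ror.org/{self.ror_id}"


# Names and ROR ids copied from the issue text; MBF Bioscience has no
# abbreviation listed there, so it is left as None.
NEW_FUNDERS = [
    FundingOrg("Chan Zuckerberg Initiative", "CZI", "02qenvm24"),
    FundingOrg("National Center for Complementary & Integrative Health", "NCCIH", "00190t495"),
    FundingOrg("Michael J. Fox Foundation for Parkinson's Research", "MJFF", "03arq3225"),
    FundingOrg("Templeton World Charity Foundation", "TWCF", "00x0z1472"),
    FundingOrg("MBF Bioscience", None, "02zynam48"),
]
```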

add Dan to this repo

Since I don't have permission to do this, someone else needs to

Import enums from aind_data_schema.data_description into this package

Is your feature request related to a problem? Please describe.
I'd like to import a few data models into packages other than aind-data-schema. It'd be nice if they were in this repo instead of aind-data-schema.

Describe the solution you'd like
Import this stuff here:

from datetime import datetime
from enum import Enum


class RegexParts(str, Enum):
    """regular expression components to be re-used elsewhere"""

    DATE = r"\d{4}-\d{2}-\d{2}"
    TIME = r"\d{2}-\d{2}-\d{2}"


class DataRegex(str, Enum):
    """regular expression patterns for different kinds of data and their properties"""

    DATA = f"^(?P<label>.+?)_(?P<c_date>{RegexParts.DATE.value})_(?P<c_time>{RegexParts.TIME.value})$"
    RAW = (
        f"^(?P<platform_abbreviation>.+?)_(?P<subject_id>.+?)_(?P<c_date>{RegexParts.DATE.value})_(?P<c_time>"
        f"{RegexParts.TIME.value})$"
    )
    DERIVED = (
        f"^(?P<input>.+?_{RegexParts.DATE.value}_{RegexParts.TIME.value})_(?P<process_name>.+?)_(?P<c_date>"
        f"{RegexParts.DATE.value})_(?P<c_time>{RegexParts.TIME.value})"
    )
    ANALYZED = (
        f"^(?P<project_abbreviation>.+?)_(?P<analysis_name>.+?)_(?P<c_date>"
        f"{RegexParts.DATE.value})_(?P<c_time>{RegexParts.TIME.value})$"
    )
    NO_UNDERSCORES = "^[^_]+$"
    NO_SPECIAL_CHARS = '^[^<>:;"/|? \\_]+$'
    NO_SPECIAL_CHARS_EXCEPT_SPACE = '^[^<>:;"/|?\\_]+$'


class DataLevel(str, Enum):
    """Data level name"""

    DERIVED = "derived"
    RAW = "raw"
    SIMULATED = "simulated"


class Group(str, Enum):
    """Data collection group name"""

    BEHAVIOR = "behavior"
    EPHYS = "ephys"
    MSMA = "MSMA"
    OPHYS = "ophys"


def datetime_to_name_string(dt):
    """Take a date and time object, format it as a string"""
    return dt.strftime("%Y-%m-%d_%H-%M-%S")


def datetime_from_name_string(d, t):
    """Take date and time strings, generate date and time objects"""
    d = datetime.strptime(d, "%Y-%m-%d").date()
    t = datetime.strptime(t, "%H-%M-%S").time()
    return datetime.combine(d, t)


def build_data_name(label, creation_datetime):
    """Construct a valid data description name"""
    dt_str = datetime_to_name_string(creation_datetime)
    return f"{label}_{dt_str}"
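
A short usage sketch of the helpers above. The pattern and `build_data_name` are repeated in trimmed form so the snippet runs on its own; `parse_data_name`, the label `ecephys_123456`, and the timestamp are made-up illustrations, not part of the proposal.

```python
import re
from datetime import datetime

DATE = r"\d{4}-\d{2}-\d{2}"
TIME = r"\d{2}-\d{2}-\d{2}"
# Same pattern as DataRegex.DATA above, written without the enums.
DATA = f"^(?P<label>.+?)_(?P<c_date>{DATE})_(?P<c_time>{TIME})$"


def build_data_name(label, creation_datetime):
    """Construct a name of the form <label>_<date>_<time>."""
    return f"{label}_{creation_datetime.strftime('%Y-%m-%d_%H-%M-%S')}"


def parse_data_name(name):
    """Hypothetical inverse of build_data_name, based on the DATA pattern."""
    match = re.match(DATA, name)
    if match is None:
        raise ValueError(f"not a valid data name: {name!r}")
    created = datetime.strptime(
        f"{match['c_date']}_{match['c_time']}", "%Y-%m-%d_%H-%M-%S"
    )
    return match["label"], created


name = build_data_name("ecephys_123456", datetime(2023, 1, 2, 3, 4, 5))
label, created = parse_data_name(name)
```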

Describe alternatives you've considered
Importing from aind-data-schema, but that requires importing all the other dependencies.

NCBI taxonomy ids

The registry_id for our species is currently only the number (e.g. 10090 for mouse). The correct ID is NCBI:txid10090.

Fix the registry_ids for the species in our models.
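
The conversion described above is mechanical. A minimal sketch, assuming a hypothetical helper name `to_ncbi_taxon_id` and that existing ids are either bare numbers or already prefixed:

```python
NCBI_PREFIX = "NCBI:txid"


def to_ncbi_taxon_id(registry_id) -> str:
    """Return the id in NCBI:txid<number> form (hypothetical helper)."""
    text = str(registry_id).strip()
    # Accept ids that already carry the prefix, so the fix can be re-run safely.
    number = text[len(NCBI_PREFIX):] if text.startswith(NCBI_PREFIX) else text
    if not number.isdigit():
        raise ValueError(f"not a numeric taxonomy id: {registry_id!r}")
    return f"{NCBI_PREFIX}{number}"
```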

Add EMAPA to list of registries

name: Edinburgh Mouse Atlas Project
abbreviation: EMAPA

Schemas are brittle with respect to manufacturers

User story

As scientists, we may want to use a new piece of hardware. Right now, if that hardware is from a new manufacturer that is not listed in aind_data_schema_models.Organizations, we would need to use Other, or submit a PR to this repo and then bump all of the schemas we are using. With complex schemas, this could require significant updates. Am I missing something? Is there a better way to handle this issue?

Sprint Ready Checklist

1. Acceptance criteria defined
2. Team understands acceptance criteria
3. Team has defined solution / steps to satisfy acceptance criteria
4. Acceptance criteria is verifiable / testable
5. External / 3rd Party dependencies identified
6. Ticket is prioritized and sized

Control vocabulary for targeted structures

Is your feature request related to a problem? Please describe.
We need to control the vocabulary used to describe targeted structures - in multiple places.

Describe the solution you'd like
I don't know how to do this elegantly - I'd like to use existing ontology. But I'm not sure how to do this without making a GIANT enum.

Add NIMH as a Funder

name = National Institute of Mental Health
abbreviation = NIMH
ror = 04xeg9z08

Allow models to be built from CSV files with un-used columns

Consider supporting backwards compatibility between `ophys` and `pophys`

A recent change made the previous tag ophys incompatible with the new pophys modality. In order to support previously serialized schemas, it would be nice to have some custom logic to coerce the previous definition to the current one.

This coercion mechanism should ideally make things compatible at the level of the python API (e.g. calling Modality.OPHYS) and at the level of deserialization. In both cases, the output should coerce to the new Modality.POPHYS object.
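
One way to get both behaviours is sketched below with a plain `Enum`. This is only an illustration under assumptions: the class here is a hypothetical stand-in, and the real `Modality` objects in this package may be built differently, so the same idea might need a validator instead.

```python
from enum import Enum


class Modality(str, Enum):
    """Hypothetical stand-in for the real modality definitions."""

    BEHAVIOR = "behavior"
    POPHYS = "pophys"
    # Duplicate value makes OPHYS an alias: Modality.OPHYS is Modality.POPHYS.
    OPHYS = "pophys"

    @classmethod
    def _missing_(cls, value):
        # Called when lookup by value fails, e.g. while deserializing old files.
        # Returning None lets Enum raise its usual ValueError.
        legacy = {"ophys": cls.POPHYS}
        return legacy.get(value)


from_api = Modality.OPHYS          # Python-level access to the old name
from_file = Modality("ophys")      # value read from previously serialized data
```

Both `from_api` and `from_file` resolve to the `POPHYS` member, while unknown strings still fail.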

Provenance for CSV file data

Add some provenance information so we can track where CSV files came from / were generated from

Add voltage units

error related to pydantic 2.6.4

This is happening on pydantic 2.6.4. Let's update the minimum version to `>=2.7`.

Repro:

install aind-data-schema-models with pydantic 2.6
from aind_data_schema_models.registries import Registry

pydantic.errors.PydanticUserError: Field 'registry' defined on a base class was overridden by a non-annotated attribute. All field definitions, including overrides, require a type annotation.

Add Expansion procedure type

Add Expansion to the specimen_procedure_types.

Add IDT to organizations

name: Integrated DNA Technologies
abbreviation: IDT
rorid: https://ror.org/009jvpf03

Alphabetical order is inconsistent

Describe the bug
Use the model name to alphabetize models in both the model and enum lists.

To Reproduce
Example: IMEC. In the models it's ordered by "Interuniversity..." and in the enum it's ordered by "IMEC".
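
A minimal sketch of the requested behaviour: sort once by model name and derive both lists from that single ordering. The dictionaries are illustrative stand-ins for the real models, the IMEC name is left truncated as in the issue, and "Imaginary Labs" is a hypothetical entry added only to make the two orderings differ.

```python
ORGS = [
    {"name": "Interuniversity...", "abbreviation": "IMEC"},  # name truncated as in the issue
    {"name": "Integrated DNA Technologies", "abbreviation": "IDT"},
    {"name": "Imaginary Labs", "abbreviation": "XL"},  # hypothetical entry
]


def by_model_name(org):
    # Case-insensitive sort on the full model name, not the abbreviation.
    return org["name"].casefold()


ordered = sorted(ORGS, key=by_model_name)
model_names = [org["name"] for org in ordered]
enum_members = [org["abbreviation"] for org in ordered]
```

Sorting the enum members by abbreviation instead would place `XL` last, which is the kind of mismatch the issue describes.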

Add model for muscles

@lambdaloop will provide list

Add brain regions

Add the CCF brain regions.

https://api.brain-map.org/api/v2/data/Structure/query.json?criteria=[graph_id$eq1][hemisphere_id$eq3]&num_rows=all

data model is:

name <-- name
abbreviation <-- acronym
color <-- "#" + color_hex_triplet
hemisphere <-- 1 = "left", 2 = "right", 3 = "both"
id <-- id
parent_id <-- parent_structure_id
order <-- graph_order
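
The field mapping above can be sketched as a small function. The function name and the sample record are illustrative assumptions; the record keys are the API field names listed in the mapping, with `hemisphere_id` taken from the query URL.

```python
HEMISPHERES = {1: "left", 2: "right", 3: "both"}


def structure_to_region(record: dict) -> dict:
    """Map one structure record from the API onto the proposed data model."""
    return {
        "name": record["name"],
        "abbreviation": record["acronym"],
        "color": "#" + record["color_hex_triplet"],
        "hemisphere": HEMISPHERES[record["hemisphere_id"]],
        "id": record["id"],
        "parent_id": record["parent_structure_id"],
        "order": record["graph_order"],
    }


# Illustrative record shaped like one entry of the query response.
sample = {
    "id": 997,
    "name": "root",
    "acronym": "root",
    "color_hex_triplet": "FFFFFF",
    "hemisphere_id": 3,
    "parent_structure_id": None,
    "graph_order": 0,
}
region = structure_to_region(sample)
```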

Runtime created classes do not afford user-friendly interface

The way that classes are created during runtime does not afford auto-completion.
There are a few ways to approach this problem.

One would be to overload a few dunder methods to allow IDEs to list object attributes. However, this doesn't fix the problem since, before the object is created, the attributes do not exist.

Another way to approach this problem is to take inspiration from automatically generated C wrappers and simply create all classes via templating.
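
A rough sketch of the templating idea, assuming a hypothetical template and class layout rather than the package's real one. In practice the rendered source would be written to a `.py` file and committed, so that IDEs can index it; `exec` is used here only to show that the generated text is valid Python.

```python
from string import Template

# Hypothetical class layout; the real generated classes may look different.
CLASS_TEMPLATE = Template('''class $class_name:
    """Generated from a template; do not edit by hand."""

    name = $name
    abbreviation = $abbreviation
''')


def render_class(class_name: str, name: str, abbreviation: str) -> str:
    """Return Python source for one statically defined class."""
    return CLASS_TEMPLATE.substitute(
        class_name=class_name,
        name=repr(name),            # repr() yields a safely quoted literal
        abbreviation=repr(abbreviation),
    )


source = render_class("Nimh", "National Institute of Mental Health", "NIMH")
namespace = {}
exec(source, namespace)  # stand-in for importing the generated module
```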

clean up process names

Once QualityControl is merged into aind-data-schema:

remove
MANUAL_ANNOTATION = "Manual annotation"
QUALITY_CONTROL_AND_ASSESSMENT = "Quality control and assessment"
from ProcessName Model

Add Janelia and Addgene as Organization

Addgene is already a registry, but can also be a vendor, so it needs to be an Organization as well.
rorid = 01nn1pw54

Janelia Research Campus
rorid = 013sk6x84

Store models as CSV

Currently the models are large python files that are verbose, hard to edit, and hard to replace if someone else wants to use a different set of objects.

Convert the data into CSV
Add methods to dynamically generate the existing classes from those CSV files on import.
Don't break the existing API
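
The steps above could look roughly like the following. This is a sketch under assumptions: the column names, the inline CSV text, and the `enum_from_csv` helper are hypothetical, and the real classes may need more than a plain `Enum`. Columns the loader does not ask for are simply ignored.

```python
import csv
import io
from enum import Enum

# Illustrative CSV content; a real file would be read from the package data.
CSV_TEXT = """member,value,notes
BEHAVIOR,behavior,column not used by the loader
EPHYS,ephys,
OPHYS,ophys,
"""


def enum_from_csv(class_name, csv_text, key_column, value_column):
    """Build a str-valued Enum from two columns of a CSV document."""
    rows = csv.DictReader(io.StringIO(csv_text))
    members = {row[key_column]: row[value_column] for row in rows}
    # Functional Enum API: the mapping supplies member names and values.
    return Enum(class_name, members, type=str)


Group = enum_from_csv("Group", CSV_TEXT, "member", "value")
```

Existing call sites such as `Group.BEHAVIOR` keep working, which is the "don't break the existing API" requirement in miniature.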

Capsule where I experimented with this:
https://codeocean.allenneuraldynamics.org/capsule/0992554/tree

Publish schemas to s3

Is your feature request related to a problem? Please describe.
As a user, I'd like to pull the data models from a database

Describe the solution you'd like

A GitHub action will convert the CSV files into JSON and push the JSON into docdb.
One document record per CSV file name, like modalities, harp_types, etc.
It will check if the record exists and overwrite the existing record.

{"_id": UUID, "model": "modalities", "contents":[list of models] }
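
A sketch of the conversion and overwrite logic, with the database replaced by an in-memory dictionary. The function names and the sample CSV are hypothetical, and the actual docdb client calls are deliberately left out.

```python
import csv
import io
import json
import uuid


def csv_to_record(model_name: str, csv_text: str) -> dict:
    """Turn one CSV document into the record shape described above."""
    contents = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    return {"_id": str(uuid.uuid4()), "model": model_name, "contents": contents}


def upsert(store: dict, record: dict) -> dict:
    """Insert the record, or overwrite an existing one while keeping its _id."""
    existing = store.get(record["model"])
    if existing is not None:
        record = {**record, "_id": existing["_id"]}
    store[record["model"]] = record
    return record


store = {}  # stand-in for the document database
first = upsert(store, csv_to_record("modalities", "name,abbreviation\nBehavior,behavior\n"))
second = upsert(store, csv_to_record("modalities", "name,abbreviation\nEphys,ephys\n"))
payload = json.dumps(second)  # what would be pushed to the database
```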

DataRegex for special chars are not working for `/`

Describe the bug

DataRegex.NO_SPECIAL_CHARS and DataRegex.NO_SPECIAL_CHARS_EXCEPT_SPACE are not working for / chars.
The result is that fields that should not allow / are allowing it without raising validation errors.
This is causing validation errors in the Metadata data entry app with: Invalid regular expression: /^[^<>:;"/|?\_]+$/u: Invalid escape

To Reproduce
Steps to reproduce the behavior:

Create a pydantic model that uses DataRegex.NO_SPECIAL_CHARS or DataRegex.NO_SPECIAL_CHARS_EXCEPT_SPACE for the pattern of a field.
Try adding a string that has / in that field.
Observe that validation errors are not showing up, meaning that / is being allowed.

Alternatively, validate any data_description json using the Metadata entry app, and observe the invalid-escape error quoted above.

Expected behavior
The DataRegex class should have enums with the correct regex to match all special characters.
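
One possible shape for corrected patterns is sketched below. It assumes the intent was to exclude both the backslash and the underscore, and it writes the backslash as an explicit `\\` inside the character class so that no bare `\_` escape remains. The checks use Python's `re` module only; behaviour in pydantic's regex engine or in the JavaScript app is not verified here.

```python
import re

# Raw strings: inside the character class, \\ is a literal backslash
# and _ is a literal underscore.
NO_SPECIAL_CHARS = r'^[^<>:;"/|? \\_]+$'
NO_SPECIAL_CHARS_EXCEPT_SPACE = r'^[^<>:;"/|?\\_]+$'


def is_valid(pattern: str, text: str) -> bool:
    """Return True when the whole text satisfies the pattern."""
    return re.fullmatch(pattern, text) is not None


checks = {
    "plain": is_valid(NO_SPECIAL_CHARS, "abc123"),
    "slash": is_valid(NO_SPECIAL_CHARS, "a/b"),
    "underscore": is_valid(NO_SPECIAL_CHARS, "a_b"),
    "backslash": is_valid(NO_SPECIAL_CHARS, "a\\b"),
    "space_strict": is_valid(NO_SPECIAL_CHARS, "a b"),
    "space_relaxed": is_valid(NO_SPECIAL_CHARS_EXCEPT_SPACE, "a b"),
}
```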

Add Emory to Organizations

(CAMBER: Center for Advanced Motor Bioengineering and Research)

ignore this

Add the MGI alleles as a csv

Is your feature request related to a problem? Please describe.
We want to have the MGI allele IDs in the subject metadata

Describe the solution you'd like
there is a rpt file here: https://www.informatics.jax.org/downloads/reports/MGI_PhenotypicAllele.rpt
convert this to a csv, add it to the repo, and construct the allele class
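
The conversion step might look like this. It is a sketch that assumes the report is tab-delimited and that lines starting with `#` are comments; neither has been checked against the actual file, and the sample row is made up.

```python
import csv
import io


def rpt_to_csv(rpt_text: str) -> str:
    """Convert tab-delimited report text to CSV, skipping comments and blanks."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in rpt_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # assumed comment or empty line
        writer.writerow(line.split("\t"))
    return out.getvalue()


# Made-up sample; real rows come from the downloaded report.
sample_rpt = "# header comment\nMGI:0000001\tGene<tm1>\tname, with comma\n\n"
csv_text = rpt_to_csv(sample_rpt)
```

Using `csv.writer` rather than a plain string replace means fields that contain commas are quoted correctly.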
