allenneuraldynamics / aind-data-schema-models
Data models used in aind-data-schema
License: MIT License
As a software engineer, I want to add test coverage for the scripts folder, so I can test the code in the scripts package.
Describe the bug
Fujinon in LENS_MANUFACTURERS (organizations.py)
Add to organizations and to funding list:
Since I don't have permission to do this, someone else needs to
Is your feature request related to a problem? Please describe.
I'd like to import a few data models into packages other than aind-data-schema. It'd be nice if they were in this repo instead of aind-data-schema.
Describe the solution you'd like
Import this stuff here:
from datetime import datetime
from enum import Enum


class RegexParts(str, Enum):
    """regular expression components to be re-used elsewhere"""

    DATE = r"\d{4}-\d{2}-\d{2}"
    TIME = r"\d{2}-\d{2}-\d{2}"


class DataRegex(str, Enum):
    """regular expression patterns for different kinds of data and their properties"""

    DATA = f"^(?P<label>.+?)_(?P<c_date>{RegexParts.DATE.value})_(?P<c_time>{RegexParts.TIME.value})$"
    RAW = (
        f"^(?P<platform_abbreviation>.+?)_(?P<subject_id>.+?)_(?P<c_date>{RegexParts.DATE.value})_(?P<c_time>"
        f"{RegexParts.TIME.value})$"
    )
    DERIVED = (
        f"^(?P<input>.+?_{RegexParts.DATE.value}_{RegexParts.TIME.value})_(?P<process_name>.+?)_(?P<c_date>"
        f"{RegexParts.DATE.value})_(?P<c_time>{RegexParts.TIME.value})"
    )
    ANALYZED = (
        f"^(?P<project_abbreviation>.+?)_(?P<analysis_name>.+?)_(?P<c_date>"
        f"{RegexParts.DATE.value})_(?P<c_time>{RegexParts.TIME.value})$"
    )
    NO_UNDERSCORES = "^[^_]+$"
    NO_SPECIAL_CHARS = '^[^<>:;"/|? \\_]+$'
    NO_SPECIAL_CHARS_EXCEPT_SPACE = '^[^<>:;"/|?\\_]+$'


class DataLevel(str, Enum):
    """Data level name"""

    DERIVED = "derived"
    RAW = "raw"
    SIMULATED = "simulated"


class Group(str, Enum):
    """Data collection group name"""

    BEHAVIOR = "behavior"
    EPHYS = "ephys"
    MSMA = "MSMA"
    OPHYS = "ophys"


def datetime_to_name_string(dt):
    """Take a date and time object, format it as a string"""
    return dt.strftime("%Y-%m-%d_%H-%M-%S")


def datetime_from_name_string(d, t):
    """Take date and time strings, generate date and time objects"""
    d = datetime.strptime(d, "%Y-%m-%d").date()
    t = datetime.strptime(t, "%H-%M-%S").time()
    return datetime.combine(d, t)


def build_data_name(label, creation_datetime):
    """Construct a valid data description name"""
    dt_str = datetime_to_name_string(creation_datetime)
    return f"{label}_{dt_str}"
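To illustrate how these naming helpers fit together, here is a minimal round-trip sketch. It re-declares the date and time fragments so it runs on its own; `parse_data_name` and the label `ecephys_123456` are hypothetical and only serve the example.

```python
import re
from datetime import datetime

# Same date/time fragments as RegexParts in the request above.
DATE = r"\d{4}-\d{2}-\d{2}"
TIME = r"\d{2}-\d{2}-\d{2}"
# Same shape as DataRegex.DATA: <label>_<date>_<time>
DATA_PATTERN = re.compile(rf"^(?P<label>.+?)_(?P<c_date>{DATE})_(?P<c_time>{TIME})$")
NAME_FORMAT = "%Y-%m-%d_%H-%M-%S"


def build_data_name(label, creation_datetime):
    """Construct a data description name from a label and a datetime."""
    return f"{label}_{creation_datetime.strftime(NAME_FORMAT)}"


def parse_data_name(name):
    """Hypothetical inverse of build_data_name: return (label, datetime)."""
    match = DATA_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid data name: {name!r}")
    stamp = f"{match['c_date']}_{match['c_time']}"
    return match["label"], datetime.strptime(stamp, NAME_FORMAT)


# Illustrative label only; not a real asset name.
name = build_data_name("ecephys_123456", datetime(2024, 1, 2, 3, 4, 5))
label, created = parse_data_name(name)
print(name, label, created)
```

Because the label group is lazy and the date and time groups are anchored at the end, underscores inside the label survive the round trip.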
Describe alternatives you've considered
Importing from aind-data-schema, but that requires importing all the other dependencies.
The registry_id for our species is currently only the number (e.g. 10090 for mouse). The correct ID is NCBI:txid10090.
name: Edinburgh Mouse Atlas Project
abbreviation: EMAPA
As scientists, we may want to use a new piece of hardware. Right now, if that hardware is from a new manufacturer that is not listed in aind_data_schema_models.Organizations, then we would need to use Other, or submit a PR to this repo and then bump all of the schemas we are using. With complex schemas, this could require significant updates. Am I missing something? Is there a better way to handle this issue?
Is your feature request related to a problem? Please describe.
We need to control the vocabulary used to describe targeted structures - in multiple places.
Describe the solution you'd like
I don't know how to do this elegantly. I'd like to use an existing ontology, but I'm not sure how to do this without making a GIANT enum.
name = National Institute of Mental Health
abbreviation = NIMH
ror = 04xeg9z08
A recent change made the previous tag ophys incompatible with the new pophys modality. To support previously serialized schemas, it would be nice to have some custom logic to coerce the previous definition to the current one.
This coercion mechanism should ideally make things compatible at the level of the Python API (e.g. calling Modality.OPHYS) and at the level of deserialization. In both cases, the output should coerce to the new Modality.POPHYS object.
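One way such a coercion could work is sketched below with a plain `Enum`. This is an assumption for illustration: the real `Modality` in this package may be built differently, and the class here is a simplified stand-in. An enum alias covers the Python API, and `_missing_` covers lookup by the legacy serialized value.

```python
from enum import Enum


class Modality(str, Enum):
    """Simplified stand-in for the real Modality model (assumption)."""

    POPHYS = "pophys"
    # Alias: same value as POPHYS, so Modality.OPHYS is Modality.POPHYS.
    OPHYS = "pophys"

    @classmethod
    def _missing_(cls, value):
        # Called only when a value lookup fails; map legacy strings here.
        legacy = {"ophys": cls.POPHYS}
        if isinstance(value, str):
            return legacy.get(value)  # None makes Enum raise ValueError
        return None


print(Modality.OPHYS is Modality.POPHYS)
print(Modality("ophys"))
```

The alias keeps old attribute access working, while `_missing_` handles previously serialized values at deserialization time.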
Add some provenance information so we can track where CSV files came from / were generated from
This is happening on pydantic 2.6.4. Let's update the minimum version to `>=2.7`.
Repro:
from aind_data_schema_models.registries import Registry
pydantic.errors.PydanticUserError: Field 'registry' defined on a base class was overridden by a non-annotated attribute. All field definitions, including overrides, require a type annotation.
Add Expansion to the specimen_procedure_types.
name: Integrated DNA Technologies
abbreviation: IDT
rorid: https://ror.org/009jvpf03
Describe the bug
Use the model name to alphabetize models in both the model and enum lists.
To Reproduce
Example: IMEC. In the models it's ordered by "Interuniversity..." and in the enum it's ordered by "IMEC".
@lambdaloop will provide list
Add the CCF brain regions.
data model is:
name <-- name
abbreviation <-- acronym
color <-- "#" + color_hex_triplet
hemisphere <-- 1 = "left", 2 = "right", 3 = "both"
id <-- id
parent_id <-- parent_structure_id
order <-- graph_order
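The field mapping above could be sketched as a small conversion function. The `BrainRegion` class, the `region_from_ccf` helper, and the source key `hemisphere_id` are assumptions made for this example; the sample record is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

# Hemisphere codes as listed in the mapping above.
HEMISPHERES = {1: "left", 2: "right", 3: "both"}


@dataclass(frozen=True)
class BrainRegion:
    """Hypothetical target model following the mapping above."""

    name: str
    abbreviation: str
    color: str
    hemisphere: str
    id: int
    parent_id: Optional[int]
    order: int


def region_from_ccf(record: dict) -> BrainRegion:
    """Map one CCF structure record to the target model."""
    return BrainRegion(
        name=record["name"],
        abbreviation=record["acronym"],
        color="#" + record["color_hex_triplet"],
        # "hemisphere_id" is an assumed source key name.
        hemisphere=HEMISPHERES[record["hemisphere_id"]],
        id=record["id"],
        parent_id=record["parent_structure_id"],
        order=record["graph_order"],
    )


# Illustrative record; values are for demonstration only.
sample = {
    "id": 997, "name": "root", "acronym": "root",
    "color_hex_triplet": "FFFFFF", "hemisphere_id": 3,
    "parent_structure_id": None, "graph_order": 0,
}
region = region_from_ccf(sample)
print(region)
```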
The way that classes are created during runtime does not afford auto-completion.
There are a few ways to approach this problem.
One would be to overload a few dunder methods to allow IDEs to list object attributes. However, this doesn't fix the problem because the attributes do not exist before the object is created.
Another way to approach this problem is to take inspiration from automatically generated C wrappers and simply create all classes via templating.
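A minimal sketch of the templating idea follows, assuming the source data is a list of rows with `name` and `abbreviation` keys. The `render_module` helper and the generated `Organization` class are hypothetical; the point is that the output is ordinary static Python source, which an IDE can index for auto-completion.

```python
import re


def render_module(rows):
    """Render static Python source with one class attribute per row."""
    lines = [
        "class Organization:",
        '    """Generated code; edit the source data instead."""',
        "",
    ]
    for row in rows:
        # Turn the abbreviation into a valid upper-case identifier.
        const = re.sub(r"\W+", "_", row["abbreviation"]).upper()
        lines.append(f"    {const} = {row['name']!r}")
    return "\n".join(lines) + "\n"


rows = [
    {"name": "National Institute of Mental Health", "abbreviation": "NIMH"},
    {"name": "Integrated DNA Technologies", "abbreviation": "IDT"},
]
source = render_module(rows)
print(source)  # in practice this would be written to a .py file
```

Since the attributes exist in the generated file before anything is imported, static analysis tools do not need to execute any code to discover them.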
Once QualityControl is merged into aind-data-schema:
remove
MANUAL_ANNOTATION = "Manual annotation"
QUALITY_CONTROL_AND_ASSESSMENT = "Quality control and assessment"
from ProcessName Model
Currently the models are large python files that are verbose, hard to edit, and hard to replace if someone else wants to use a different set of objects.
Capsule where I experimented with this:
https://codeocean.allenneuraldynamics.org/capsule/0992554/tree
Is your feature request related to a problem? Please describe.
As a user, I'd like to pull the data models from a database
Describe the solution you'd like
modalities, harp_types, etc.
{"_id": UUID, "model": "modalities", "contents": [list of models]}
Describe the bug
DataRegex.NO_SPECIAL_CHARS and DataRegex.NO_SPECIAL_CHARS_EXCEPT_SPACE are not working for / chars.
/ is being allowed without raising validation errors.
Invalid regular expression: /^[^<>:;"/|?\_]+$/u: Invalid escape
To Reproduce
Steps to reproduce the behavior:
1. Use DataRegex.NO_SPECIAL_CHARS or DataRegex.NO_SPECIAL_CHARS_EXCEPT_SPACE for the pattern of a field.
2. Put / in that field.
3. / is being allowed.
Alternatively, validate any data_description json using the Metadata entry app, and observe the error.
Expected behavior
The DataRegex class should have enums with the correct regex to match all special characters.
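The sketch below examines the pattern under Python's `re` module only; it does not reproduce the JavaScript validator. In Python, `\_` inside a character class is read as a plain underscore, so the pattern compiles but does not exclude the backslash itself. JavaScript's unicode mode rejects `\_` as an invalid escape, which matches the error quoted in this report. `PROPOSED` is one possible replacement and an assumption, not the project's confirmed fix.

```python
import re

# Pattern as currently written in DataRegex.NO_SPECIAL_CHARS.
# The regex text is: ^[^<>:;"/|? \_]+$
CURRENT = '^[^<>:;"/|? \\_]+$'

# Possible replacement (assumption): escape the backslash itself.
# The regex text is: ^[^<>:;"/|? \\_]+$
PROPOSED = '^[^<>:;"/|? \\\\_]+$'


def is_allowed(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for text in ["abc", "a/b", "a_b", "a\\b"]:
    print(repr(text), is_allowed(CURRENT, text), is_allowed(PROPOSED, text))
```

With the doubled backslash, the character class contains an escaped backslash followed by a literal underscore, a form that both Python and unicode-mode JavaScript accept.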
(CAMBER: Center for Advanced Motor Bioengineering and Research)
ignore this
Is your feature request related to a problem? Please describe.
We want to have the MGI allele IDs in the subject metadata
Describe the solution you'd like
There is an rpt file here: https://www.informatics.jax.org/downloads/reports/MGI_PhenotypicAllele.rpt
Convert this to a csv, add it to the repo, and construct the allele class.
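The conversion step might look like the sketch below. It assumes the report is tab-delimited text with comment lines starting with `#`; that layout should be checked against the actual file. The `rpt_to_csv` helper is hypothetical and the sample rows are placeholders, not real MGI records.

```python
import csv
import io


def rpt_to_csv(rpt_text: str) -> str:
    """Convert tab-delimited report text to CSV, skipping '#' comment lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in rpt_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # drop blank lines and header comments
        writer.writerow(line.split("\t"))
    return out.getvalue()


# Placeholder rows for illustration only.
sample = "# header comment\nMGI:0000001\tGene<tm1>\tplaceholder, with comma\n"
csv_text = rpt_to_csv(sample)
print(csv_text)
```

Using the `csv` module rather than a plain string replace keeps fields that contain commas correctly quoted.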
name is "Midwest Optical Systems, Inc" abbreviation is "MidOpt"
no ror id
Add to organizations and to the FILTER_MANUFACTURER list
Is your feature request related to a problem? Please describe.
Putting enums and lists of objects in code is brittle and makes it difficult for others to contribute.
Describe the solution you'd like
Represent data in CSV or parquet, separate from the schema.
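As a rough sketch of what this could look like, the example below keeps the records in CSV text and loads them into simple objects at import time. The `Organization` dataclass, the `load_organizations` helper, and the column names are assumptions; the rows reuse organizations requested elsewhere in these issues.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# In practice this would live in a .csv file next to the package.
ORGANIZATIONS_CSV = """name,abbreviation,ror
National Institute of Mental Health,NIMH,04xeg9z08
Integrated DNA Technologies,IDT,009jvpf03
"Midwest Optical Systems, Inc",MidOpt,
"""


@dataclass(frozen=True)
class Organization:
    """Hypothetical schema class, kept separate from the data."""

    name: str
    abbreviation: str
    ror: Optional[str]


def load_organizations(text):
    """Parse CSV text into a dict keyed by abbreviation."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["abbreviation"]: Organization(
            name=row["name"],
            abbreviation=row["abbreviation"],
            ror=row["ror"] or None,  # an empty cell means no ROR id
        )
        for row in reader
    }


orgs = load_organizations(ORGANIZATIONS_CSV)
print(orgs["MidOpt"])
```

Adding an organization then becomes a one-line data change, with no edit to the schema code.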