The issue
tl;dr The Allen Institute ontology classifies hippocampus as part of cortex, not subcortex, which could cause problems for matching some microarray samples to ROIs.
When users provide a file or dataframe to the atlas_info parameter in abagen.get_expression_data(), they are required to specify a broad structural class for each region in their atlas (in a column labelled 'structure' in the file/dataframe). The current options for this structural class include:
- cortex,
- subcortex,
- cerebellum,
- brainstem,
- white matter, and
- other (i.e., ventricles and such)
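For concreteness, here is a rough sketch of what such a file might contain, along with a check that every 'structure' value is one of the six classes above. The 'hemisphere' and 'structure' columns are the ones discussed in this issue; the 'id' and 'label' columns, the 'L'/'R' hemisphere codes, and the region names are assumptions made purely for illustration.

```python
import csv
import io

# Hypothetical contents of an atlas_info CSV file (one row per atlas region).
# Column names other than 'hemisphere' and 'structure' are illustrative.
ATLAS_INFO_CSV = """\
id,label,hemisphere,structure
1,precentral gyrus,L,cortex
2,hippocampus,L,cortex
3,thalamus,R,subcortex
4,cerebellar cortex,R,cerebellum
"""

# The six broad structural classes listed above.
VALID_STRUCTURES = {
    "cortex", "subcortex", "cerebellum", "brainstem", "white matter", "other",
}

rows = list(csv.DictReader(io.StringIO(ATLAS_INFO_CSV)))

# Collect any 'structure' values that fall outside the permitted classes.
invalid = sorted({row["structure"] for row in rows} - VALID_STRUCTURES)
print(invalid)
```

An empty list means every region uses a permitted class.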
We match these designations with the information from the Allen ontology so that samples that don't fall directly within a region in the atlas aren't incorrectly assigned to atlas regions across hemispheric / structural boundaries.
That is, if one of the samples from the Allen Institute is labelled as having come from the left hemisphere subcortex, we make sure to only assign it to a region in the user-specified atlas labelled as belonging to the left hemisphere subcortex. This impacts only a minority of samples (i.e., we don't currently perform this check for samples whose coordinates fall directly within a region in the atlas), but a significant minority, nonetheless.
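The constraint described here can be sketched roughly as follows. This is not the actual abagen implementation; the function name candidate_regions and the plain-dict representation of samples and regions are hypothetical, and the 'L'/'R' codes are assumed.

```python
def candidate_regions(sample, atlas_info):
    """Return ids of atlas regions that share the sample's hemisphere
    and broad structural class (hypothetical helper, not abagen's API)."""
    return [
        region["id"]
        for region in atlas_info
        if region["hemisphere"] == sample["hemisphere"]
        and region["structure"] == sample["structure"]
    ]


# Illustrative atlas with one cortical and two subcortical regions.
atlas_info = [
    {"id": 1, "hemisphere": "L", "structure": "cortex"},
    {"id": 2, "hemisphere": "L", "structure": "subcortex"},
    {"id": 3, "hemisphere": "R", "structure": "subcortex"},
]

# A sample the ontology places in the left hemisphere subcortex.
sample = {"hemisphere": "L", "structure": "subcortex"}

# Only the left subcortical region remains eligible for this sample.
matches = candidate_regions(sample, atlas_info)
print(matches)
```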
While matching these designations seems like a reasonable approach in most cases, the one point of contention that a general user might have is that the Allen Institute ontology classifies the hippocampal formation (including the subiculum, dentate gyrus, and CA1-4) as part of "cortex" rather than "subcortex". Specifically, their ontology specifies:
brain
└─ gray matter
   └─ telencephalon
      └─ cerebral cortex
         └─ limbic lobe
            └─ hippocampal formation
Thus, if a researcher provides an atlas where they label all their hippocampal ROIs as "subcortex" they're liable to get vastly different results than if they label all their hippocampal ROIs as "cortex."
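To illustrate why the label matters, the sketch below derives a broad class from the ontology path shown above and compares it with the two ways a researcher might label a hippocampal ROI. The ANCESTOR_TO_CLASS mapping and the broad_class helper are hypothetical; only the "cerebral cortex" ancestry comes from the hierarchy above, and the real lookup in abagen may work differently.

```python
# Ontology path for the hippocampal formation, as listed above.
HIPPOCAMPAL_PATH = [
    "brain",
    "gray matter",
    "telencephalon",
    "cerebral cortex",
    "limbic lobe",
    "hippocampal formation",
]

# Hypothetical mapping from ontology ancestors to broad structural classes.
# Only the entry needed for this example is included.
ANCESTOR_TO_CLASS = {"cerebral cortex": "cortex"}


def broad_class(path):
    """Return the class of the first mapped ancestor in the path, or None."""
    for node in path:
        if node in ANCESTOR_TO_CLASS:
            return ANCESTOR_TO_CLASS[node]
    return None


sample_class = broad_class(HIPPOCAMPAL_PATH)

# The same hippocampal ROI, labelled two different ways by the user.
matches_if_cortex = sample_class == "cortex"
matches_if_subcortex = sample_class == "subcortex"
print(sample_class, matches_if_cortex, matches_if_subcortex)
```

Under these assumptions, a hippocampal sample is eligible for the ROI only when the user labels it "cortex".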
While I have it on good authority that the hippocampus is often considered part of "allocortex," I'm hesitant to add this as a permissible structural class to abagen since it seems quite a bit more specific than the current (rather broad) structural designations listed above.
Proposed solution
I genuinely don't know! It would be great to allow either specification for the hippocampus (i.e., "cortex" or "subcortex"), but the current framework for getting these structural classes from the Allen ontology doesn't allow for this hedging. I can think about how to modify it for this one instance in particular, but in the interim it would be great to come up with alternatives.
One option that might be worthwhile is to simply allow users to specify either (or both) of the expected 'hemisphere' and 'structure' information in atlas_info and just use whatever is available. Then, users who have hippocampal ROIs can refrain from specifying the 'structure' for their ROIs and we'll do our best to ensure samples simply don't cross hemispheric boundaries. This isn't necessarily ideal because there's the possibility that samples will get incorrectly assigned across, e.g., cortical/subcortical boundaries for regions that aren't the hippocampus (but we might still consider this option outside of the current problem!).
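A rough sketch of this "use whatever is available" behaviour is below. It is only one way the idea might be implemented; the candidate_regions helper and the dict-based data are hypothetical, and a missing key is taken to mean "unspecified".

```python
def candidate_regions(sample, atlas_info, keys=("hemisphere", "structure")):
    """Return ids of regions compatible with the sample, comparing only
    the keys each region actually specifies (hypothetical helper)."""
    matches = []
    for region in atlas_info:
        # A key that is absent (or None) for a region places no constraint.
        if all(region.get(k) is None or region[k] == sample[k] for k in keys):
            matches.append(region["id"])
    return matches


atlas_info = [
    {"id": 1, "hemisphere": "L"},                            # hippocampus, no 'structure'
    {"id": 2, "hemisphere": "L", "structure": "subcortex"},  # fully specified
    {"id": 3, "hemisphere": "R"},                            # hippocampus, no 'structure'
]

# A left hippocampal sample, which the Allen ontology labels "cortex".
hippocampal = {"hemisphere": "L", "structure": "cortex"}
hippocampal_matches = candidate_regions(hippocampal, atlas_info)

# A left subcortical sample is now also eligible for region 1, which
# illustrates the cross-boundary risk mentioned above.
subcortical = {"hemisphere": "L", "structure": "subcortex"}
subcortical_matches = candidate_regions(subcortical, atlas_info)
print(hippocampal_matches, subcortical_matches)
```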
Alternatively (and perhaps most immediately appealing), we can add a warning to the documentation about this designation and instruct users to specify that their hippocampal ROIs are part of "cortex" (not "subcortex") when they provide atlas_info.