
hdmf's People

Contributors

ajtritt, arokem, bendichter, codycbakerphd, d-sot, david-camp, dependabot[bot], dorukozturk, dsleiter, edeno, hrnciar, jayrbolton, jcfr, jfperkins, jhyearsley, kevinalexbrown, mavaylon1, mgrauer, musicinmybrain, nicain, nilegraddis, nnadeau, oruebel, pre-commit-ci[bot], rly, stephprince, t-b, tjd2002, ukos-git, yarikoptic

hdmf's Issues

get_class, supplied name

Bug

It is possible to define a class and lock it to a specific name:

from pynwb.spec import NamespaceBuilder, NWBGroupSpec

name = 'single_dataset'
doc = 'example extension for single dataset'

ns_builder = NamespaceBuilder(doc=doc, name=name, version='0.1',
                              author=['author'],
                              contact='email')

ns_builder.include_type('LabMetaData', namespace='core')

meta_data = NWBGroupSpec(neurodata_type_inc='LabMetaData', neurodata_type_def='SimulationMetaData', doc='doc',
                         name='simulation_metadata')
meta_data.add_attribute(name='help', doc='doc', dtype='text', default_value='help text here')
dataset = meta_data.add_dataset(name='pin', doc='doc', dtype='float', shape=(None, 12), dims=('runs', 'params'))
dataset.add_attribute(name='help', doc='doc', dtype='text', default_value='help text here')


ns_path = name + '.namespace.yaml'
ext_source = name + '.extensions.yaml'

# Export
for neurodata_type in [meta_data]:
    ns_builder.add_spec(ext_source, neurodata_type)
ns_builder.export(ns_path, outdir='spec')

However, the class generated by get_class still requires a name argument:

from pynwb import NWBHDF5IO, NWBFile, load_namespaces, get_class
from datetime import datetime
import numpy as np


nwbfile = NWBFile('aa', 'aa', datetime.now().astimezone())

load_namespaces('/Users/bendichter/dev/to_nwb/to_nwb/ndx_single_dataset/spec/single_dataset.namespace.yaml')

sim_meta_data = get_class('SimulationMetaData', 'single_dataset')

meta_data = sim_meta_data(pin=np.random.randn(1000, 12))
Traceback (most recent call last):
  File "/Users/bendichter/dev/hdmf/issue_910_2.py", line 12, in <module>
    meta_data = sim_meta_data(pin=np.random.randn(1000, 12))
  File "/Users/bendichter/dev/hdmf/src/hdmf/utils.py", line 386, in func_call
    raise_from(ExceptionType(msg), None)
  File "<string>", line 3, in raise_from
TypeError: missing argument 'name'

However, this parameter is unused; the spec name is used, not the supplied name:

from pynwb import NWBHDF5IO, NWBFile, load_namespaces, get_class
from datetime import datetime
import numpy as np

nwbfile = NWBFile('aa', 'aa', datetime.now().astimezone())

load_namespaces('/Users/bendichter/dev/to_nwb/to_nwb/ndx_single_dataset/spec/single_dataset.namespace.yaml')

sim_meta_data = get_class('SimulationMetaData', 'single_dataset')

meta_data = sim_meta_data(name='simulation_params', pin=np.random.randn(1000, 12))
nwbfile.add_lab_meta_data(meta_data)

with NWBHDF5IO('test_a.nwb', 'w') as io:
    io.write(nwbfile, cache_spec=True)


with NWBHDF5IO('test_a.nwb', 'r') as io:
    nwb2 = io.read()
    print(nwb2.lab_meta_data['simulation_metadata'].pin[:])

Environment

Please describe your environment according to the following bullet points.

Python Executable: Conda
Python Version: Python 3.6
Operating System: macOS
HDMF Version: latest

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

default_name unused

  1. Bug

I searched through the hdmf and pynwb source code and could not see any place where default_name of a spec is actually used besides when it is stored. In fact, I can change the default_name on a data type like Position in nwb.behavior.yaml and there is no effect:

- neurodata_type_def: Position
  ...
  default_name: Positionsdfsdfds
from datetime import datetime
from pynwb import NWBFile
from pynwb.behavior import Position

# The original snippet omitted creating the NWBFile; the session arguments here are placeholders.
nwbfile = NWBFile('session description', 'id', datetime.now().astimezone())

module = nwbfile.create_processing_module(name='a module name 2', description="a module description 2")
pos = Position()
module.add_data_interface(pos)
print(module)
a module name 2 <class 'pynwb.base.ProcessingModule'>
Fields:
  data_interfaces: { Position <class 'pynwb.behavior.Position'> }
  description: a module description 2

The default name is set as the class name when no name argument is specified.
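For comparison, a minimal sketch of the behavior one might expect, using a hypothetical name-resolution helper (this is an illustration, not how HDMF currently resolves names):

# Hypothetical illustration of how default_name could be honored when no name is passed.
def resolve_name(supplied_name, spec, cls):
    if supplied_name is not None:
        return supplied_name            # an explicit name always wins
    if getattr(spec, 'default_name', None) is not None:
        return spec.default_name        # fall back to the spec's default_name
    return cls.__name__                 # last resort: the class name (the behavior observed above)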

Environment

Please describe your environment according to the following bullet points.

Python Executable: Conda
Python Version: Python 3.7
Operating System: Windows
HDMF Version: latest

Cannot write external links

With the most recent 1.0.4 version of hdmf, parent-child relationships are enforced such that a child can have only one parent, which cannot be reassigned. This breaks linking of external files.

from pynwb import NWBFile, NWBHDF5IO, get_manager

manager = get_manager()

io = NWBHDF5IO(filename, 'r', manager=manager)
nwbfile1 = io.read()
ts = nwbfile1.get_acquisition('ts')

nwbfile2 = NWBFile(session_description='',
                   identifier='id',
                   session_start_time=start_time)
nwbfile2.add_acquisition(ts)

ts already has a parent nwbfile1. Therefore, nwbfile2 cannot add ts as a child.

Options:

  1. Allow Container.parent to be reset. This could lead to strange behavior and bugs down the road (or bugs that return).
  2. Linking is currently handled automagically during NWBHDF5IO.write. Instead of determining on write whether a link exists, ask the user to make all links explicit, e.g. nwbfile3.add_acquisition(NWBLink(ts)). This is more akin to a filesystem -- you can't have the same file in two different folders, but you can make a link/shortcut to the file and move that link around. A rough sketch of this idea follows below.
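A minimal sketch of what option 2 could look like, assuming a hypothetical NWBLink wrapper (nothing like this currently exists in PyNWB or HDMF):

# Hypothetical wrapper that marks an object as a link target rather than a child.
class NWBLink:
    def __init__(self, target):
        self.target = target  # the container being linked to; its parent is unchanged

# Usage sketch: nwbfile2 would record a link to ts instead of claiming it as a child.
# nwbfile2.add_acquisition(NWBLink(ts))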

Recursion error when copying H5DataIO

Description

As reported by @bendichter in #141, copy of H5DataIO currently causes a recursion error.

from hdmf.backends.hdf5.h5_utils import H5DataIO
from copy import copy

copy(H5DataIO(data=[1., 2., 3.]))
.....
 File "/Users/oruebel/anaconda3/envs/py4nwb/lib/python3.7/site-packages/hdmf/backends/hdf5/h5_utils.py", line 306, in valid
    if isinstance(self.data, Dataset) and not self.data.id.valid:
  File "/Users/oruebel/anaconda3/envs/py4nwb/lib/python3.7/site-packages/hdmf/data_utils.py", line 483, in data
    return self.__data
  File "/Users/oruebel/anaconda3/envs/py4nwb/lib/python3.7/site-packages/hdmf/data_utils.py", line 492, in __getattr__
    if not self.valid:
  File "/Users/oruebel/anaconda3/envs/py4nwb/lib/python3.7/site-packages/hdmf/backends/hdf5/h5_utils.py", line 306, in valid
    if isinstance(self.data, Dataset) and not self.data.id.valid:
  File "/Users/oruebel/anaconda3/envs/py4nwb/lib/python3.7/site-packages/hdmf/data_utils.py", line 483, in data
    return self.__data
RecursionError: maximum recursion depth exceeded while calling a Python object
  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

Refactor docval shape validation

In the process of writing unit tests on shape validation in docval, I realized that the interaction of shape validation and default values is not clear cut. For example, if TimeSeries.timestamps accepts ('array_data', 'data', 'TimeSeries') and has a required shape of (None,), then a default value of None is not compatible with the required shape. Should we check the shape only if a non-None value is provided?

The interaction of shape validation and non-standard types, e.g. when TimeSeries.timestamps is another TimeSeries or a DataChunkIterator, also gets complicated and is not currently tested.

Lastly, in PyNWB, these checks are performed primarily on __init__, but the object shapes can change later.

As per discussion with @ajtritt, I think it would make sense to move shape validation from docval to the setters created for the attribute names in __nwbfields__ (this will eventually move over with NWBBaseType to HDMF; see #100).

This is an enhancement that would be nice to do in order to test the code better and make maintaining the code easier.
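A minimal sketch of what setter-based shape validation could look like, using hdmf.utils.get_data_shape and an illustrative _shape_okay helper (a sketch of the proposal, not existing code):

from hdmf.utils import get_data_shape

def _shape_okay(value, required_shape):
    """Return True if value matches required_shape; None entries match any length."""
    valshape = get_data_shape(value)
    if valshape is None or len(valshape) != len(required_shape):
        return False
    return all(r is None or v == r for v, r in zip(valshape, required_shape))

class TimestampsShapeCheck:
    """Illustration: validate shape in a property setter instead of in docval."""
    @property
    def timestamps(self):
        return self._timestamps

    @timestamps.setter
    def timestamps(self, value):
        # Only check the shape when a non-None value is provided.
        if value is not None and not _shape_okay(value, (None,)):
            raise ValueError("timestamps must be one-dimensional")
        self._timestamps = value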

'numeric' dtype in spec acts as None dtype

Related to #140

For most subtypes of TimeSeries, the "data" dataset has the spec 'dtype: numeric'. In fact, that is the only place in the NWB schema where 'numeric' is used.

As far as I can see, the only times that 'numeric' is actually used in HDMF are in the validator and in the following code:

hdmf/src/hdmf/build/map.py

Lines 474 to 477 in 5a9a9cb

if spec.dtype is None:
    return value, None
if spec.dtype == 'numeric':
    return value, None

where 'numeric' serves the same purpose as not specifying a dtype at all.

When value, None is returned above, this None is used as the dtype for the DatasetBuilder, and this messes with code downstream where 'dtype=None' is used when writing an AbstractDataChunkIterator (#140).

The above code logic seems like it would also cause other problems that have not yet come up. For example, I can write a list of strings for 'data' of an ElectricalSeries:

signal = ElectricalSeries(name='ElectricalSeries',
                          data=['hello'],
                          electrodes=elecs_region,
                          starting_time=0.,
                          rate=1.)

Writing and reading this works just fine, but this should not be allowed. I think the check for that should be somewhere in HDMF, though I have not yet found it. Help in tackling this would be appreciated.
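For illustration, a minimal sketch of the kind of check that could reject non-numeric data when the spec dtype is 'numeric' (a sketch of where such a check might live, not existing HDMF code):

import numpy as np

def check_numeric(value):
    """Raise if value cannot be interpreted as numeric data."""
    arr = np.asarray(value)
    if not np.issubdtype(arr.dtype, np.number):
        raise TypeError("spec dtype is 'numeric' but got dtype %s" % arr.dtype)
    return value

check_numeric([1, 2, 3])   # ok
check_numeric(['hello'])   # raises TypeError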

Error when trying to write new data to existing file

I'm getting this error when adding new data to an existing file and trying to write it. I'm not sure if it's relevant, but this happens just the same for files with or without external links.
It happens on both hdmf==1.1.0 and hdmf==1.0.5.post0.dev14.
The error goes away and writing works fine on hdmf==1.0.5.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-90c1f5fb09a5> in <module>
     45 
     46     #print(nwb)
---> 47     io.write(nwb)

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    412                         raise_from(ExceptionType(msg), None)
    413 
--> 414                 return func(self, **parsed['args'])
    415         else:
    416             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\backends\hdf5\h5tools.py in write(self, **kwargs)
    217 
    218         cache_spec = popargs('cache_spec', kwargs)
--> 219         call_docval_func(super(HDF5IO, self).write, kwargs)
    220         if cache_spec:
    221             ref = self.__file.attrs.get(SPEC_LOC_ATTR)

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in call_docval_func(func, kwargs)
    305 def call_docval_func(func, kwargs):
    306     fargs, fkwargs = fmt_docval_args(func, kwargs)
--> 307     return func(*fargs, **fkwargs)
    308 
    309 

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    412                         raise_from(ExceptionType(msg), None)
    413 
--> 414                 return func(self, **parsed['args'])
    415         else:
    416             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\backends\io.py in write(self, **kwargs)
     39     def write(self, **kwargs):
     40         container = popargs('container', kwargs)
---> 41         f_builder = self.__manager.build(container, source=self.__source)
     42         self.write_builder(f_builder, **kwargs)
     43 

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    412                         raise_from(ExceptionType(msg), None)
    413 
--> 414                 return func(self, **parsed['args'])
    415         else:
    416             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in build(self, **kwargs)
    168                 # TODO: if Datasets attributes are allowed to be modified, we need to
    169                 # figure out how to handle that starting here.
--> 170                 result = self.__type_map.build(container, self, builder=result, source=source, spec_ext=spec_ext)
    171         return result
    172 

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    412                         raise_from(ExceptionType(msg), None)
    413 
--> 414                 return func(self, **parsed['args'])
    415         else:
    416             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in build(self, **kwargs)
   1658         builder.set_attribute('namespace', namespace)
   1659         builder.set_attribute(attr_map.spec.type_key(), data_type)
-> 1660         builder.set_attribute(attr_map.spec.id_key(), container.object_id)
   1661         return builder
   1662 

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    410                     if parse_err:
    411                         msg = ', '.join(parse_err)
--> 412                         raise_from(ExceptionType(msg), None)
    413 
    414                 return func(self, **parsed['args'])

~\AppData\Roaming\Python\Python37\site-packages\six.py in raise_from(value, from_value)

TypeError: incorrect type for 'value' (got 'NoneType', expected 'NoneType')

Error when reading a file that contains DecompositionSeries without 'source_timeseries'

  1. Bug
    When reading from a NWB file that contains a DecompositionSeries without 'source_timeseries', an error is raised. It happens with other classes as well.
    Error present in both 1.1.0 and 1.1.0.post0.dev2
    Here's a snippet:
import os
import numpy as np
from datetime import datetime
from dateutil.tz import tzlocal
import pynwb
from pynwb import NWBFile, NWBHDF5IO, get_manager, ProcessingModule
from pynwb.core import DynamicTable, DynamicTableRegion, VectorData
from pynwb.misc import DecompositionSeries

manager = get_manager()

# Creates file 1
nwb = NWBFile(session_description='session', identifier='1', session_start_time=datetime.now(tzlocal()))

# data: (ndarray) dims: num_times * num_channels * num_bands
Xp = np.zeros((1000,10,3))

# Spectral band power
# bands: (DynamicTable) frequency bands that signal was decomposed into
band_param_0V = VectorData(name='filter_param_0',
                  description='frequencies for bandpass filters',
                  data=np.array([1.,2.,3.]))
band_param_1V = VectorData(name='filter_param_1',
                  description='frequencies for bandpass filters',
                  data=np.array([1.,2.,3.]))
bandsTable = DynamicTable(name='bands',
                          description='Series of filters used for Hilbert transform.',
                          columns=[band_param_0V,band_param_1V],
                          colnames=['filter_param_0','filter_param_1'])
decs = DecompositionSeries(name='DecompositionSeries',
                           data=Xp,
                           description='Analytic amplitude estimated with Hilbert transform.',
                           metric='amplitude',
                           unit='V',
                           bands=bandsTable,
                           #source_timeseries=lfp
                           rate=1.)

ecephys_module = ProcessingModule(name='ecephys',
                                  description='Extracellular electrophysiology data.')
nwb.add_processing_module(ecephys_module)
ecephys_module.add_data_interface(decs)

with NWBHDF5IO('file_1.nwb', mode='w', manager=manager) as io:
    io.write(nwb)

# Open file 1
with NWBHDF5IO('file_1.nwb', 'r', manager=manager) as io:
    nwb = io.read()

Notice that the file saves fine; the error happens on reading:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-3c4f12e3d123> in <module>
     48 # Open file 1
     49 with NWBHDF5IO('file_1.nwb', 'r', manager=manager) as io:
---> 50     nwb = io.read()
     51 

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\backends\hdf5\h5tools.py in read(self, **kwargs)
    244                                        % (self.__path, self.__mode))
    245         try:
--> 246             return call_docval_func(super(HDF5IO, self).read, kwargs)
    247         except UnsupportedOperation as e:
    248             if str(e) == 'Cannot build data. There are no values.':

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in call_docval_func(func, kwargs)
    279 def call_docval_func(func, kwargs):
    280     fargs, fkwargs = fmt_docval_args(func, kwargs)
--> 281     return func(*fargs, **fkwargs)
    282 
    283 

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    386                         raise_from(ExceptionType(msg), None)
    387 
--> 388                 return func(self, **parsed['args'])
    389         else:
    390             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\backends\io.py in read(self, **kwargs)
     33             # TODO also check that the keys are appropriate. print a better error message
     34             raise UnsupportedOperation('Cannot build data. There are no values.')
---> 35         container = self.__manager.construct(f_builder)
     36         return container
     37 

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    386                         raise_from(ExceptionType(msg), None)
    387 
--> 388                 return func(self, **parsed['args'])
    389         else:
    390             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
    198         result = self.__containers.get(builder_id)
    199         if result is None:
--> 200             result = self.__type_map.construct(builder, self)
    201             parent_builder = self.__get_parent_dt_builder(builder)
    202             if parent_builder is not None:

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    386                         raise_from(ExceptionType(msg), None)
    387 
--> 388                 return func(self, **parsed['args'])
    389         else:
    390             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
   1670             raise ValueError('No ObjectMapper found for builder of type %s' % dt)
   1671         else:
-> 1672             return attr_map.construct(builder, build_manager)
   1673 
   1674     @docval({"name": "container", "type": Container, "doc": "the container to convert to a Builder"},

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    386                         raise_from(ExceptionType(msg), None)
    387 
--> 388                 return func(self, **parsed['args'])
    389         else:
    390             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
   1181         cls = manager.get_cls(builder)
   1182         # gather all subspecs
-> 1183         subspecs = self.__get_subspec_values(builder, self.spec, manager)
   1184         # get the constructor argument that each specification corresponds to
   1185         const_args = dict()

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in __get_subspec_values(self, builder, spec, manager)
   1125                         ret[subspec] = self.__flatten(sub_builder, subspec, manager)
   1126             # now process groups and datasets
-> 1127             self.__get_sub_builders(groups, spec.groups, manager, ret)
   1128             self.__get_sub_builders(datasets, spec.datasets, manager, ret)
   1129         elif isinstance(spec, DatasetSpec):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in __get_sub_builders(self, sub_builders, subspecs, manager, ret)
   1163                 if dt is None:
   1164                     # recurse
-> 1165                     ret.update(self.__get_subspec_values(sub_builder, subspec, manager))
   1166                 else:
   1167                     ret[subspec] = manager.construct(sub_builder)

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in __get_subspec_values(self, builder, spec, manager)
   1125                         ret[subspec] = self.__flatten(sub_builder, subspec, manager)
   1126             # now process groups and datasets
-> 1127             self.__get_sub_builders(groups, spec.groups, manager, ret)
   1128             self.__get_sub_builders(datasets, spec.datasets, manager, ret)
   1129         elif isinstance(spec, DatasetSpec):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in __get_sub_builders(self, sub_builders, subspecs, manager, ret)
   1155                 sub_builder = builder_dt.get(dt)
   1156                 if sub_builder is not None:
-> 1157                     sub_builder = self.__flatten(sub_builder, subspec, manager)
   1158                     ret[subspec] = sub_builder
   1159             else:

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in __flatten(self, sub_builder, subspec, manager)
   1168 
   1169     def __flatten(self, sub_builder, subspec, manager):
-> 1170         tmp = [manager.construct(b) for b in sub_builder]
   1171         if len(tmp) == 1 and not subspec.is_many():
   1172             tmp = tmp[0]

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in <listcomp>(.0)
   1168 
   1169     def __flatten(self, sub_builder, subspec, manager):
-> 1170         tmp = [manager.construct(b) for b in sub_builder]
   1171         if len(tmp) == 1 and not subspec.is_many():
   1172             tmp = tmp[0]

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    386                         raise_from(ExceptionType(msg), None)
    387 
--> 388                 return func(self, **parsed['args'])
    389         else:
    390             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
    198         result = self.__containers.get(builder_id)
    199         if result is None:
--> 200             result = self.__type_map.construct(builder, self)
    201             parent_builder = self.__get_parent_dt_builder(builder)
    202             if parent_builder is not None:

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    386                         raise_from(ExceptionType(msg), None)
    387 
--> 388                 return func(self, **parsed['args'])
    389         else:
    390             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
   1670             raise ValueError('No ObjectMapper found for builder of type %s' % dt)
   1671         else:
-> 1672             return attr_map.construct(builder, build_manager)
   1673 
   1674     @docval({"name": "container", "type": Container, "doc": "the container to convert to a Builder"},

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    386                         raise_from(ExceptionType(msg), None)
    387 
--> 388                 return func(self, **parsed['args'])
    389         else:
    390             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
   1181         cls = manager.get_cls(builder)
   1182         # gather all subspecs
-> 1183         subspecs = self.__get_subspec_values(builder, self.spec, manager)
   1184         # get the constructor argument that each specification corresponds to
   1185         const_args = dict()

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in __get_subspec_values(self, builder, spec, manager)
   1125                         ret[subspec] = self.__flatten(sub_builder, subspec, manager)
   1126             # now process groups and datasets
-> 1127             self.__get_sub_builders(groups, spec.groups, manager, ret)
   1128             self.__get_sub_builders(datasets, spec.datasets, manager, ret)
   1129         elif isinstance(spec, DatasetSpec):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in __get_sub_builders(self, sub_builders, subspecs, manager, ret)
   1155                 sub_builder = builder_dt.get(dt)
   1156                 if sub_builder is not None:
-> 1157                     sub_builder = self.__flatten(sub_builder, subspec, manager)
   1158                     ret[subspec] = sub_builder
   1159             else:

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in __flatten(self, sub_builder, subspec, manager)
   1168 
   1169     def __flatten(self, sub_builder, subspec, manager):
-> 1170         tmp = [manager.construct(b) for b in sub_builder]
   1171         if len(tmp) == 1 and not subspec.is_many():
   1172             tmp = tmp[0]

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in <listcomp>(.0)
   1168 
   1169     def __flatten(self, sub_builder, subspec, manager):
-> 1170         tmp = [manager.construct(b) for b in sub_builder]
   1171         if len(tmp) == 1 and not subspec.is_many():
   1172             tmp = tmp[0]

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    386                         raise_from(ExceptionType(msg), None)
    387 
--> 388                 return func(self, **parsed['args'])
    389         else:
    390             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
    198         result = self.__containers.get(builder_id)
    199         if result is None:
--> 200             result = self.__type_map.construct(builder, self)
    201             parent_builder = self.__get_parent_dt_builder(builder)
    202             if parent_builder is not None:

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    386                         raise_from(ExceptionType(msg), None)
    387 
--> 388                 return func(self, **parsed['args'])
    389         else:
    390             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
   1670             raise ValueError('No ObjectMapper found for builder of type %s' % dt)
   1671         else:
-> 1672             return attr_map.construct(builder, build_manager)
   1673 
   1674     @docval({"name": "container", "type": Container, "doc": "the container to convert to a Builder"},

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    386                         raise_from(ExceptionType(msg), None)
    387 
--> 388                 return func(self, **parsed['args'])
    389         else:
    390             def func_call(*args, **kwargs):

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
   1181         cls = manager.get_cls(builder)
   1182         # gather all subspecs
-> 1183         subspecs = self.__get_subspec_values(builder, self.spec, manager)
   1184         # get the constructor argument that each specification corresponds to
   1185         const_args = dict()

~\Anaconda3\envs\ecog_vis\lib\site-packages\hdmf\build\map.py in __get_subspec_values(self, builder, spec, manager)
   1119             for subspec in spec.links:
   1120                 if subspec.name is not None:
-> 1121                     ret[subspec] = manager.construct(links[subspec.name].builder)
   1122                 else:
   1123                     sub_builder = link_dt.get(subspec.target_type)

KeyError: 'source_timeseries'
Python Executable: Conda
Python Version: Python 3.7
Operating System: Windows
HDMF Version: 1.1.0

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

Regression bug reading array metadata extensions

Using pynwb 8539a82d139fd36cdf5aaf48bfadc60acd64b8bd (the current dev as of 4/23/19):

I can generate a minimal NWB file with an extension:

from datetime import datetime
from dateutil.tz import tzlocal
import os
from pynwb import NWBFile, register_class, docval, load_namespaces, popargs, NWBHDF5IO
from pynwb.spec import NWBNamespaceBuilder, NWBGroupSpec, NWBAttributeSpec, NWBDatasetSpec
from pynwb.file import LabMetaData
import numpy as np

# Settings:
neurodata_type = 'OphysBehaviorMetaData'
prefix = 'AIBS_ophys_behavior'
outdir = './'
extension_doc = 'AIBS Visual Behavior ophys lab metadata extension'


metadata_ext_group_spec = NWBGroupSpec(
    neurodata_type_def=neurodata_type,
    neurodata_type_inc='LabMetaData',
    doc=extension_doc,
    attributes=[NWBAttributeSpec(name='ophys_experiment_id', dtype='text', doc='HW', shape=(None,))])


# Export spec:
ext_source = '%s_extension.yaml' % prefix
ns_path = '%s_namespace.yaml' % prefix
ns_builder = NWBNamespaceBuilder(extension_doc, prefix)
ns_builder.add_spec(ext_source, metadata_ext_group_spec)
ns_builder.export(ns_path, outdir=outdir)


# Read spec and load namespace:
ns_abs_path = os.path.join(outdir, ns_path)
load_namespaces(ns_abs_path)


@register_class(neurodata_type, prefix)
class OphysBehaviorMetaData(LabMetaData):
    __nwbfields__ = ('ophys_experiment_id',)

    @docval({'name': 'name', 'type': str, 'doc': 'name'},
            {'name': 'ophys_experiment_id', 'type': np.ndarray, 'doc': 'HW'})
    def __init__(self, **kwargs):
        name, ophys_experiment_id = popargs('name', 'ophys_experiment_id', kwargs)
        super(OphysBehaviorMetaData, self).__init__(name=name)
        self.ophys_experiment_id = ophys_experiment_id


ophys_experiment_id = np.array(['A', 'B'])
nwbfile = NWBFile("a file with header data", "NB123A", datetime(2017, 5, 1, 12, 0, 0, tzinfo=tzlocal()))
nwbfile.add_lab_meta_data(OphysBehaviorMetaData(name='metadata', ophys_experiment_id=ophys_experiment_id))
np.testing.assert_equal(nwbfile.lab_meta_data['metadata'].ophys_experiment_id, ophys_experiment_id)

# Works
with NWBHDF5IO('./tmp.nwb', 'w') as write_io:
    write_io.write(nwbfile)

# Works
io = NWBHDF5IO('./tmp.nwb', 'r')
nwbfile_out = io.read()
np.testing.assert_equal(nwbfile_out.lab_meta_data['metadata'].ophys_experiment_id.astype('<U1'), ophys_experiment_id)

However, when run in a separate runtime environment, the read code:

from pynwb import load_namespaces, popargs, NWBHDF5IO
import numpy as np

# Doesn't work
load_namespaces('./AIBS_ophys_behavior_namespace.yaml')
ophys_experiment_id = np.array(['A', 'B'])
io = NWBHDF5IO('./tmp.nwb', 'r')
nwbfile_out = io.read()

print(nwbfile_out.get_lab_meta_data('metadata'))
print(nwbfile_out.lab_meta_data['metadata'].ophys_experiment_id)
np.testing.assert_equal(nwbfile_out.lab_meta_data['metadata'].ophys_experiment_id.astype('<U1'), ophys_experiment_id)

causes an exception when run in a separate python runtime (i.e. restarting python, all other config the same):

TypeError: cannot create weak reference to 'tuple' object

This read code worked at revision 46a7130 and stopped working at the next revision, e91823b, which was merged into dev several commits later in PR #30.

Environment:
Python Executable: Conda
Python Version: Python 3.6
Operating System: Linux

Migrate nwb-docutils to NWB agnostic location

Migrate nwb-docutils into the hdmf-dev organization and add documentation for it to the HDMF docs.

Originally, the HDMF documentation discussed using nwb-docutils for building schema documentation. This was removed here

Currently, nwb-docutils contains scripts for building schema documentation, but needs to be generalized for other standards.

Always cache spec

Is there a reason why we would not want to cache the spec when writing a file?

Advantages of always caching spec: fewer complex use cases and interactions to handle and tests to make.

Support HDF5 Virtual Datasets on Write

For very-large data arrays it can be useful to split the data across files to avoid creation of overly large single files.

HDF5 supports the concept of VirtualDatasets, which allows a dataset in a file to be composed of sub-blocks from datasets stored in external HDF5 files. Using this approach allows single large arrays to be broken up into sub-files while exposing to the user the same interface as if the data were stored in a single array. HDMF already supports reading VirtualDatasets; however, it does not currently support creating VirtualDatasets on write. To support creation of VirtualDatasets we'd need to:

  • Update H5DataIO to allow specification of the layout of the sub-files, i.e., define how the virtual dataset should be broken up into sub-blocks
  • Define a naming-convention for creating sub-files (e.g., we could store all sub-files in a subfolder along with the HDF5 file and name them according to their location in the main file plus a running index for the block)
  • In HDF5IO we would need to add a method to create virtual datasets. The main write functions should be able to remain as is (since we can write into a VirtualDataset) but how we create the dataset would need to change when a VirtualDataset is required

See http://docs.h5py.org/en/stable/vds.html for a short intro to VDS in h5py
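For context, a minimal h5py sketch of building a virtual dataset from sub-files (file and dataset names here are made up; see the h5py VDS docs linked above):

import h5py
import numpy as np

# Write four sub-files, each holding one row of the full array.
for i in range(4):
    with h5py.File('sub_%d.h5' % i, 'w') as f:
        f.create_dataset('data', data=np.full(100, i, dtype='i4'))

# Describe how the sub-file datasets map into one virtual dataset.
layout = h5py.VirtualLayout(shape=(4, 100), dtype='i4')
for i in range(4):
    layout[i] = h5py.VirtualSource('sub_%d.h5' % i, 'data', shape=(100,))

# The main file exposes the stitched-together array as a single dataset.
with h5py.File('main.h5', 'w') as f:
    f.create_virtual_dataset('vdata', layout, fillvalue=-1)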

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

Default mapping of child attributes depends on order the attrs are read

If a Container, e.g. VoltageClampSeries, has multiple datasets, e.g. data, capacitance_fast, with the same named attribute, e.g. unit, then the first spec named unit that gets encountered by the ObjectMapper gets mapped to the unit attribute on the VoltageClampSeries class. Specs with the name unit encountered later get mapped to the attribute name {dataset_name}_unit, e.g. data_unit.

Normally, this is fine: e.g. for TimeSeries, there is only one dataset/group with a unit attribute -- the data dataset, so the unit attribute spec of TimeSeries.data will be mapped to the unit attribute on TimeSeries. But when there are multiple datasets, which unit spec gets encountered first and is used to set the root level attribute on the class is arbitrary. On Python 3.5, this happens non-deterministically...

Proposed change:

In map.py, in ObjectMapper.__get_fields, change:

if name in all_names:
  name = '_'.join(name_stack)

to:

name = '__'.join(name_stack)

This means all attribute specs of children will be mapped to a class attribute like 'parent1__parent2__parent3__attrname'. To be usable, these specs need to be remapped explicitly to a more convenient name. There are a few classes in PyNWB that are affected by this: VoltageClampSeries, ImageSeriesMap, ImagingPlaneMap.

And rather than use '_' as the delimiter between a spec's parents in the default class attribute name, use '__' (two underscores) to make it less confusing with attribute names that use '_' as a word delimiter. We could completely remove this functionality of mapping child attributes to e.g. 'parent1__parent2__parent3__attrname' and make it so this mapping must be done explicitly in the class' ObjectMapper, but this approach would be cumbersome for dynamically generated classes.
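For reference, a rough sketch of the kind of explicit remapping that would be needed in a class's ObjectMapper (the attribute names and the registration mechanism are illustrative, not a definitive implementation):

from hdmf.build import ObjectMapper

class VoltageClampSeriesMap(ObjectMapper):
    def __init__(self, spec):
        super().__init__(spec)
        # Remap the auto-generated '...__unit'-style names to friendlier ones.
        data_spec = spec.get_dataset('data')
        self.map_spec('data_unit', data_spec.get_attribute('unit'))
        cap_fast_spec = spec.get_dataset('capacitance_fast')
        self.map_spec('capacitance_fast_unit', cap_fast_spec.get_attribute('unit'))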

load_namespaces feature is broken

  1. Bug

With the current dev branch of hdmf, loading any extension with the NWBHDF5IO flag load_namespaces=True results in the following error:

from pynwb import NWBHDF5IO

with NWBHDF5IO('test_maze.nwb', 'r', load_namespaces=True) as io:
    nwb2 = io.read()
/Users/bendichter/anaconda3/envs/dev_pynwb/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/Users/bendichter/dev/hdmf/src/hdmf/spec/namespace.py:452: UserWarning: ignoring namespace 'core' because it already exists
  warn("ignoring namespace '%s' because it already exists" % ns['name'])
Traceback (most recent call last):
  File "/Users/bendichter/dev/ndx-maze/src/pynwb/ndx_maze/test2.py", line 5, in <module>
    with NWBHDF5IO('test_maze.nwb', 'r', load_namespaces=True) as io:
  File "/Users/bendichter/dev/hdmf/src/hdmf/utils.py", line 388, in func_call
    return func(self, **parsed['args'])
  File "/Users/bendichter/dev/pynwb/src/pynwb/__init__.py", line 216, in __init__
    super(NWBHDF5IO, self).load_namespaces(tm, path)
  File "/Users/bendichter/dev/hdmf/src/hdmf/utils.py", line 388, in func_call
    return func(self, **parsed['args'])
  File "/Users/bendichter/dev/hdmf/src/hdmf/backends/hdf5/h5tools.py", line 109, in load_namespaces
    d.update(namespace_catalog.load_namespaces('namespace', reader=reader))
ValueError: dictionary update sequence element #0 has length 8; 2 is required

For testing, you can use this file: https://drive.google.com/file/d/1iPAokMkr3a4uRlxUZtXQ7LY2gJRb3uG3/view?usp=sharing

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

Improve docval usage for subclassed types

It is tedious to change the docval for an hdmf/pynwb class with many subclasses, like TimeSeries, and then have to change the docval for all subclasses like ElectricalSeries, PatchClampSeries, etc. In other words, there is a lot of duplication of docval values.

It would be a lot cleaner to specify docval for only arguments that are different from the superclass's docval values.

I originally thought something like this would be great:

class ElectricalSeries(TimeSeries):
    @docval({'name': 'data', 'type': ('array_data', 'data', TimeSeries),
             'shape': ((None, ), (None, None), (None, None, None)),
             'doc': 'The data this TimeSeries dataset stores. Can also store binary data e.g. image frames'},
            {'name': 'electrodes', 'type': DynamicTableRegion,
             'doc': 'the table region corresponding to the electrodes from which this series was recorded'},
            *get_docval(TimeSeries.__init__))
    def __init__(self, **kwargs):

or some way to automatically pull the docval values from the superclass, but the order of docval arguments is important and can change in subclasses, so that won't work. Instead, something like this could work:

class ElectricalSeries(TimeSeries):
    @docval(get_docval_arg(TimeSeries.__init__, 'name'),  # required
            {'name': 'data', 'type': ('array_data', 'data', TimeSeries),  # required
             'shape': ((None, ), (None, None), (None, None, None)),
             'doc': 'The data this TimeSeries dataset stores. Can also store binary data e.g. image frames'},
            {'name': 'electrodes', 'type': DynamicTableRegion,  # required
             'doc': 'the table region corresponding to the electrodes from which this series was recorded'},
            get_docval_arg(TimeSeries.__init__, 'resolution'),
            get_docval_arg(TimeSeries.__init__, 'conversion'),
            get_docval_arg(TimeSeries.__init__, 'timestamps'),
            get_docval_arg(TimeSeries.__init__, 'starting_time'),
            get_docval_arg(TimeSeries.__init__, 'comments'),
            get_docval_arg(TimeSeries.__init__, 'description'),
            get_docval_arg(TimeSeries.__init__, 'control'),
            get_docval_arg(TimeSeries.__init__, 'control_description'))
    def __init__(self, **kwargs):

Pro: much less code duplication
Con: chained referrals in code, e.g. if you want to see in the code what types are accepted for the comments argument of a VoltageClampStimulusSeries, you are referred to the docval of its superclass PatchClampSeries and then referred to the docval of its superclass TimeSeries.
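A rough sketch of how a get_docval_arg helper could be implemented, assuming the docval decorator stores its parsed argument list on the decorated function under __docval__ (that attribute name is an assumption about hdmf internals):

def get_docval_arg(func, name):
    """Return the docval argument dict named `name` from a docval-decorated function."""
    # Assumes docval attaches its parsed spec as func.__docval__['args'];
    # adjust if the internal layout differs.
    for arg in getattr(func, '__docval__', {}).get('args', ()):
        if arg['name'] == name:
            return dict(arg)  # copy so callers can tweak without side effects
    raise ValueError("no docval argument named %r on %s" % (name, func.__name__))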

Empty dataset throws MissingRequiredWarning

I think we decided that an empty dataset should be allowed. So the following MissingRequiredWarning should not be thrown.

In /tests/unit/build_tests/test_io_map.py:

def test_build_empty(self):
    ''' Test default mapping functionality when no attributes are nested '''
    container = Bar('my_bar', [], 'value1', 10)
    builder = self.mapper.build(container, self.manager)
    expected = GroupBuilder('my_bar', datasets={'data': DatasetBuilder('data', [])},
                            attributes={'attr1': 'value1', 'attr2': 10})
    self.assertDictEqual(builder, expected)

results in:

c:\users\ryan\documents\nwb\hdmf\src\hdmf\build\map.py:974: MissingRequiredWarning: dataset 'data' for 'my_bar' of type (Bar)

Overwriting file with object already in file creates broken link and failure to write

Bug

Pseudocode:

  • create container C1
  • add object O to container C1
  • write container C1 to file F
  • create container C2
  • add same object O to container C2
  • write container C2 to file F

This results in a broken link to object O, and file F no longer has object O.

This example uses pynwb but it also applies to custom containers.

from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile, TimeSeries, NWBHDF5IO, get_manager
import numpy as np

start_time = datetime(2017, 4, 3, 11, tzinfo=tzlocal())

nwbfile = NWBFile(session_description='demonstrate NWBFile basics',  # required
                  identifier='NWB123',  # required
                  session_start_time=start_time)  # required

data = list(range(100, 200, 10))
timestamps = list(range(10))
test_ts = TimeSeries(name='test_timeseries1', data=data, unit='m', timestamps=timestamps)
nwbfile.add_acquisition(test_ts)

with NWBHDF5IO('example_file_path.nwb', manager=get_manager(), mode='w') as io:
    io.write(nwbfile)

# do it again
nwbfile = NWBFile(session_description='demonstrate NWBFile basics',  # required
                  identifier='NWB123',  # required
                  session_start_time=start_time)  # required

# add object that already exists and was added to the file
nwbfile.add_acquisition(test_ts)

# add new object
test_ts2 = TimeSeries(name='test_timeseries2', data=data, unit='m', timestamps=timestamps)
nwbfile.add_acquisition(test_ts2)

with NWBHDF5IO('example_file_path.nwb', manager=get_manager(), mode='w') as io:
    io.write(nwbfile)  # broken link warnings

with NWBHDF5IO('example_file_path.nwb', manager=get_manager(), mode='r') as io:
    read_file = io.read()
    print(read_file)  # missing test_timeseries1

Environment

Please describe your environment according to the following bullet points.

Python Executable: Conda
Python Version: Python 3.7
Operating System: Windows
HDMF Version: latest dev

Docval shape cannot be checked on types without len

When shape is defined for a docval argument and the type can be an object that does not have a __len__, e.g. ElectricalSeries.data:

class ElectricalSeries(TimeSeries):
    @docval(...
            {'name': 'data', 'type': ('array_data', 'data', TimeSeries),
             'shape': ((None, ), (None, None), (None, None, None)),
             'doc': 'The data this TimeSeries dataset stores. Can also store binary data e.g. image frames'},
             ...)

then doing a direct link between data of one ElectricalSeries and another:

ts1 = ElectricalSeries('test_ts1', [0, 1, 2, 3, 4, 5], region, timestamps=[0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
ts2 = ElectricalSeries('test_ts2', ts1, region, timestamps=[0.0, 0.1, 0.2, 0.3, 0.4, 0.5])

results in the error:

Traceback (most recent call last):
  File "C:\Users\Ryan\Documents\NWB\temp\pynwb\tests\unit\test_ecephys.py", line 42, in test_timestamps_link
    ts2 = ElectricalSeries('test_ts2', ts1, region, timestamps=[0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\utils.py", line 379, in func_call
    allow_extra=allow_extra)
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\utils.py", line 176, in __parse_args
    if not __shape_okay_multi(argval, arg['shape']):
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\utils.py", line 71, in __shape_okay_multi
    return any(__shape_okay(value, a) for a in argshape)
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\utils.py", line 71, in <genexpr>
    return any(__shape_okay(value, a) for a in argshape)
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\utils.py", line 78, in __shape_okay
    if not len(valshape) == len(argshape):
TypeError: object of type 'NoneType' has no len()
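A rough sketch of the kind of guard that could avoid calling len() on values whose shape cannot be determined, using hdmf.utils.get_data_shape (illustrative, not the current __shape_okay implementation):

from hdmf.utils import get_data_shape

def shape_okay(value, required_shape):
    """Return True if value matches required_shape, or if its shape cannot be determined."""
    valshape = get_data_shape(value)
    if valshape is None:
        # e.g. a TimeSeries used as a link target: defer the check rather than crash
        return True
    if len(valshape) != len(required_shape):
        return False
    return all(r is None or v == r for v, r in zip(valshape, required_shape))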

miniconda27 failing

  1. Bug

miniconda27 started failing CI. @jcfr, any idea what's going on here?

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

optional absolute paths for external links

We are changing paths of external links from absolute to relative (#17) (for Groups; for Datasets it was already set this way), which I think is the ideal default. However, I can also imagine a situation where the user would really want an absolute path. Imagine a lab that stores acquisition data in a central read-only location and constructs NWB files that link to this data for everyday analysis. In this case they'll want the ability to move the analysis NWB files around without breaking the link. (I think I remember someone at AIBS talking about this kind of workflow.) For this use case, I think it would be useful to support absolute paths for external links as well. I think adding an option in the manager or something would make sense. The complication arises when a user wants to create multiple links in a single NWB file, some of which are absolute and some of which are relative. For this, would it be possible to have a wrapper object similar to HDF5IO that would instruct pynwb which type of link should be used?
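A rough sketch of what such a wrapper could look like (LinkStyle is entirely hypothetical; nothing like it exists in HDMF):

# Hypothetical wrapper telling the writer which kind of external link to create.
class LinkStyle:
    def __init__(self, target, absolute=False):
        self.target = target        # the container to link to
        self.absolute = absolute    # True -> absolute path, False -> relative path

# Usage sketch: mix relative and absolute links in one file.
# nwbfile.add_acquisition(LinkStyle(ts_central, absolute=True))
# nwbfile.add_acquisition(LinkStyle(ts_local))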

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

SpecCatalog copy broken

  1. Bug:
    SpecCatalog.copy and SpecCatalog.deepcopy are broken due to: 1) missing import of copy, 2) use of undefined variable spec, and 3) missing memo parameter for the deepcopy method. Pull request with fix is coming soon.

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

nested type definitions

There are two ways to define a new Container type in the spec language. You can either define a Container type in the /groups of the schema with a _def field (and optionally an _inc) and then use it in another place with an _inc field (and no _def field), or you can define it and use it at the same time by including _def and _inc fields in a nested spot of the schema. Most types are defined in /groups and /datasets and included separately, but there are cases in the NWB:N core schema where types are defined as they are used, e.g.

https://github.com/NeurodataWithoutBorders/pynwb/blob/a081b39803f0ffb535b2198a2fdd81007ffa442e/src/pynwb/data/nwb.file.yaml#L234-L243

As far as I can tell, this capability is purely for convenience and offers no additional features or expressive power for the defined types. Currently our get_class (i.e. type_map.get_container_cls in hdmf) function only works at the surface level (in '/groups') and will not register nested Container definitions. As a result, there are schemas that follow the schema language rules, but for which the API auto-generation will not work. I see a few possible solutions.

Here is a minimal example. The last 4 commands are the critical piece.

import os
import unittest2 as unittest
import tempfile
import warnings
import numpy as np

from hdmf.utils import docval, getargs
from hdmf.data_utils import DataChunkIterator
from hdmf.backends.hdf5.h5tools import HDF5IO
from hdmf.backends.hdf5 import H5DataIO
from hdmf.build import DatasetBuilder, BuildManager, TypeMap, ObjectMapper
from hdmf.spec.namespace import NamespaceCatalog
from hdmf.spec.spec import AttributeSpec, DatasetSpec, GroupSpec, ZERO_OR_MANY, ONE_OR_MANY
from hdmf.spec.namespace import SpecNamespace
from hdmf.spec.catalog import SpecCatalog
from hdmf.container import Container
from h5py import SoftLink, HardLink, ExternalLink, File

from tests.unit.test_utils import Foo, FooBucket, CORE_NAMESPACE  # FooFile (used below) is assumed to be defined alongside Foo and FooBucket

foo_spec = GroupSpec('A test group specification with a data type',
                     data_type_def='Foo',
                     datasets=[DatasetSpec('an example dataset',
                                           'int',
                                           name='my_data',
                                           attributes=[AttributeSpec('attr2',
                                                                     'an example integer attribute',
                                                                     'int')])],
                     attributes=[AttributeSpec('attr1', 'an example string attribute', 'text')])

tmp_spec = GroupSpec('A subgroup for Foos',
                     name='foo_holder',
                     groups=[GroupSpec('the Foos in this bucket', data_type_inc='Foo', quantity=ZERO_OR_MANY)])

bucket_spec = GroupSpec('A test group specification for a data type containing data type',
                        data_type_def='FooBucket',
                        groups=[tmp_spec])

class BucketMapper(ObjectMapper):
    def __init__(self, spec):
        super(BucketMapper, self).__init__(spec)
        foo_spec = spec.get_group('foo_holder').get_data_type('Foo')
        self.map_spec('foos', foo_spec)

file_spec = GroupSpec("A file of Foos contained in FooBuckets",
                      name='root',
                      data_type_def='FooFile',
                      groups=[GroupSpec('Holds the FooBuckets',
                                        name='buckets',
                                        groups=[GroupSpec("One ore more FooBuckets",
                                                          data_type_inc='FooBucket',
                                                          quantity=ONE_OR_MANY)])])

class FileMapper(ObjectMapper):
    def __init__(self, spec):
        super(FileMapper, self).__init__(spec)
        bucket_spec = spec.get_group('buckets').get_data_type('FooBucket')
        self.map_spec('buckets', bucket_spec)

spec_catalog = SpecCatalog()
spec_catalog.register_spec(foo_spec, 'test.yaml')
spec_catalog.register_spec(bucket_spec, 'test.yaml')
spec_catalog.register_spec(file_spec, 'test.yaml')
namespace = SpecNamespace(
    'a test namespace',
    CORE_NAMESPACE,
    [{'source': 'test.yaml'}],
    catalog=spec_catalog)
namespace_catalog = NamespaceCatalog()
namespace_catalog.add_namespace(CORE_NAMESPACE, namespace)
type_map = TypeMap(namespace_catalog)

type_map.register_container_type(CORE_NAMESPACE, 'Foo', Foo)
type_map.register_container_type(CORE_NAMESPACE, 'FooBucket', FooBucket)
type_map.register_container_type(CORE_NAMESPACE, 'FooFile', FooFile)

type_map.register_map(FooBucket, BucketMapper)
type_map.register_map(FooFile, FileMapper)

manager = BuildManager(type_map)

spec_catalog = manager.namespace_catalog.get_namespace(CORE_NAMESPACE).catalog
foo_spec = spec_catalog.get_spec('Foo')
# Baz1 class contains an object of Baz2 class
baz_spec2 = GroupSpec('A composition inside',
                      data_type_def='Baz2',
                      data_type_inc=foo_spec,
                      attributes=[
                          AttributeSpec('attr3', 'an example float attribute', 'float'),
                          AttributeSpec('attr4', 'another example float attribute', 'float')])

baz_spec1 = GroupSpec('A composition test outside',
                      data_type_def='Baz1',
                      data_type_inc=foo_spec,
                      attributes=[AttributeSpec('attr3', 'an example float attribute', 'float'),
                                  AttributeSpec('attr4', 'another example float attribute', 'float')],
                      groups=[baz_spec2])

# add directly into the existing spec_catalog. would not do this normally.
spec_catalog.register_spec(baz_spec1, 'test.yaml')

Baz2 = manager.type_map.get_container_cls(CORE_NAMESPACE, 'Baz2')

  1. Inform users that if they want to auto-generate the API, all type definitions must be in /groups and /datasets. This is a limitation only in style and not in function. The biggest downside here is that the rules used for writing extensions will be slightly more strict than those for the core (again, only stylistically).

  2. You could see this freedom of the schema language as a bug, since it provides multiple right ways to do something without offering any benefit. We could change the schema language rules so that you can only define types at the surface and change the schema to match. This should not cause any compatibility issues for the core, but would raise some eyebrows. If we are going to do this, we should do it now before community extensions start to accumulate.

  3. We could build logic to search the tree for new types.

Thoughts?
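As a sketch of option 3, the tree search could look roughly like this (the helper name and usage are illustrative; the actual spec API may differ slightly):

def find_nested_type_defs(spec, found=None):
    """Recursively collect every spec in the tree that defines a new data type."""
    if found is None:
        found = []
    if getattr(spec, 'data_type_def', None) is not None:
        found.append(spec)
    for sub in getattr(spec, 'groups', []) or []:
        find_nested_type_defs(sub, found)
    for sub in getattr(spec, 'datasets', []) or []:
        find_nested_type_defs(sub, found)
    return found

# Usage sketch: register classes for every nested definition, not just top-level ones.
# for nested_spec in find_nested_type_defs(baz_spec1):
#     type_map.get_container_cls(CORE_NAMESPACE, nested_spec.data_type_def)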

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

DataChunkIterator converting data to float

Description

pynwb==1.1.0.post0.dev1 and hdmf==1.2.0.post0.dev1
I'm trying to store int16 data using DataChunkIterator but it seems to be converting the dtype to float. Here's an example:

from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile
from pynwb import NWBHDF5IO
from pynwb.ecephys import ElectricalSeries
from hdmf.data_utils import DataChunkIterator
import numpy as np
import os

nwb = NWBFile(session_description='session', identifier='1', session_start_time=datetime.now(tzlocal()))

# Add electrode groups and channels
nChannels = 10
dev0 = nwb.create_device(name='dev0')
elecs_group = nwb.create_electrode_group(name='electrodes', description='', location='ctx', device=dev0)
for i in np.arange(nChannels):
    ii = float(i)
    nwb.add_electrode(x=ii, y=ii, z=ii, imp=ii, location='', filtering='', group=elecs_group)

#Add signal
elecs_region = nwb.electrodes.create_region(name='electrodes',
                                            region=np.arange(nChannels).tolist(),
                                            description='')

def data_generator(nChannels):
    X_data = np.zeros((nChannels,1000)).astype('int16')
    for id in range(0, nChannels):
        data = X_data[id,:]
        print(data.dtype)
        yield data
data = data_generator(nChannels=10)
iter_data = DataChunkIterator(data=data, iter_axis=1)

signal = ElectricalSeries(name='ElectricalSeries', 
                          data=iter_data, 
                          electrodes=elecs_region,
                          starting_time=0.,
                          rate=1.)
nwb.add_acquisition(signal)

#Write file
with NWBHDF5IO('file_1.nwb', mode='w') as io:
    io.write(nwb)
    
# Read file
io = NWBHDF5IO('file_1.nwb', 'r+')
nwbfile = io.read()
print(nwbfile.acquisition['ElectricalSeries'].data[:,0].dtype)
io.close()

If I pass the data directly, it works fine:

# iter_data = DataChunkIterator(data=data, iter_axis=1)
iter_data = np.zeros((nChannels,1000)).astype('int16')
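
As a possible workaround, assuming DataChunkIterator's dtype argument is honored when the dataset is written, the dtype can be given explicitly instead of letting it default:

# pass the element dtype up front so it does not have to be inferred
iter_data = DataChunkIterator(data=data_generator(nChannels=10), iter_axis=1,
                              dtype=np.dtype('int16'))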

Checklist

  • [ x] Have you ensured the feature or change was not already reported ?
  • [ x] Have you included a brief and descriptive title?
  • [ x] Have you included a clear description of the problem you are trying to solve?
  • [ x] Have you included a minimal code snippet that reproduces the issue you are encountering?
  • [ x] Have you checked our Contributing document?

Update NamespaceBuilder to expose namespace properties

Briefly describe the needed feature as well as the reasoning behind it

Is your feature request related to a problem? Please describe.

Trying to get a property of a namespace before exporting it is not possible without accessing the internal ivar _NamespaceBuilder__ns_args.

To learn about the context, see nwb-extensions/ndx-template#10

Describe the solution you'd like

Updating NamespaceBuilder to implement the __getattr__ function would make this easier.

Instead of:

ns_builder._NamespaceBuilder__ns_args['name']

we could have

ns_builder.name

See

class NamespaceBuilder(object):
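
A minimal sketch of the idea, assuming the builder keeps its namespace keyword arguments in the private __ns_args dict that the workaround above reaches into:

class NamespaceBuilder(object):
    # ... existing implementation ...

    def __getattr__(self, attr):
        # __getattr__ is only called when normal attribute lookup fails,
        # so it cannot shadow real attributes or methods of the builder
        ns_args = self.__dict__.get('_NamespaceBuilder__ns_args', {})
        if attr in ns_args:
            return ns_args[attr]
        raise AttributeError(attr)

With this in place, ns_builder.name and ns_builder.version would resolve to the values passed at construction.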

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

float64 types saved and read as float32

Bug

float (which is implicitly 64 bits) and np.float64 are downgraded to float32 in the resulting HDF5 file, and are read as np.float32.

problem identified by NeurodataWithoutBorders/pynwb#874

from pynwb import NWBFile, NWBHDF5IO
from datetime import datetime
from dateutil.tz import tzlocal

start_time = datetime(2019, 1, 1, 11, tzinfo=tzlocal())

nwbfile = NWBFile('aa', 'TSD', start_time)

nwbfile.add_trial(start_time=3.4, stop_time=5.5)
with NWBHDF5IO('/Users/bendichter/dev/pynwb/test_precision.nwb', 'w') as io:
    io.write(nwbfile)

with NWBHDF5IO('/Users/bendichter/dev/pynwb/test_precision.nwb', 'r') as io:
    print(type(io.read().trials['start_time'].data[0]))

<class 'numpy.float32'>

Environment

Please describe your environment according to the following bullet points.

Python Executable: Conda
Python Version: Python 3.6
Operating System: macOS
HDMF Version: latest dev

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

add shape check for docval args of dynamic classes (get_class)

I would like get_class to automatically add a 'shape' key to the docval arg-spec according to the schema.

The following should throw an error:

from pynwb.spec import NWBDatasetSpec, NWBGroupSpec, NWBNamespaceBuilder

name = 'test_shape'
ns_path = name + ".namespace.yaml"
ext_source = name + ".extensions.yaml"


test_type = NWBGroupSpec(
    neurodata_type_def='TestType',
    datasets=[
        NWBDatasetSpec(name='int_var', dtype='int', doc='doc', quantity='?', shape=(2, 2))
    ],
    doc='doc')


ns_builder = NWBNamespaceBuilder(name + ' extensions', name)
ns_builder.add_spec(ext_source, test_type)
ns_builder.export(ns_path)


from pynwb import load_namespaces, get_class

load_namespaces(ns_path)
test_type = get_class('TestType', name)

test_type(name='test_name', int_var=[1,2,3])
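
docval supports a 'shape' key in its arg-specs, so the generated constructor only needs to forward the shape from the schema. A minimal sketch of the kind of arg-spec get_class could emit (names taken from the spec above; everything else is an assumption):

from hdmf.utils import docval, getargs

class TestType(object):

    @docval({'name': 'name', 'type': str, 'doc': 'the name of this object'},
            {'name': 'int_var', 'type': 'array_data', 'doc': 'doc',
             'shape': (2, 2), 'default': None})
    def __init__(self, **kwargs):
        self.name, self.int_var = getargs('name', 'int_var', kwargs)

TestType(name='test_name', int_var=[[1, 2], [3, 4]])    # passes the shape check
# TestType(name='test_name', int_var=[1, 2, 3])         # should raise: shape is not (2, 2)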

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

Export extension specification YAML from structure of an HDMF file

2) Feature Request

This came out of a discussion with @ttngu207 about generating schema for specific NWB Files.

After writing an NWB file, a user would be able to export a schema extension that specifies the structure of the file.

HDMF would need to crawl the file and identify which components of a file are specializations of the included definition.

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

SpecNamespace.date returns full_name instead of date. Fix forthcoming

  1. Bug

SpecNamespace.date returns full_name instead of date. Fix forthcoming

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

Improve unit tests that use custom namespaces

Some unit tests within tests/unit/test_io_hdf5_h5tools.py and elsewhere create custom namespaces dynamically (see _get_manager()) using namespace_catalog.add_namespace. This function is deprecated. We should change these tests to create yaml spec files dynamically and load those in instead.

Move NWBBaseType functionality to Container

Some of the functionality in NWB core.py is general-purpose and should eventually be extracted into HDMF, including all of NWBBaseType.

Aside: tracking modified data, if we want to go down that route, should happen in _setter.

Clean up warnings

Running the test suite with all warnings on results in a few warnings, including:

  1. DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  2. DeprecationWarnings or PendingDeprecationWarnings that we generate (e.g. NamespaceCatalog.add_namespace)
  3. DeprecationWarning on assertRaisesRegexp -> assertRaisesRegex

TODO: clean up these tests and associated source code to make a clean build
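
For items 1 and 3 the fixes are mechanical one-liners wherever the warnings point; a sketch (the exact names imported vary by module):

# item 1: import the ABCs from collections.abc instead of collections
from collections.abc import Iterable, Mapping

# item 3: in the test cases, replace assertRaisesRegexp with assertRaisesRegex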

Provide more informative warnings if a link is broken

Enhancement

If an external link is broken, currently the user gets a warning message such as the following:

c:\users\ryan\documents\nwb\hdmf\src\hdmf\backends\hdf5\h5tools.py:306: UserWarning: Broken Link: /acquisition\acquisition
  warnings.warn('Broken Link: %s' % os.path.join(h5obj.name, k))
c:\users\ryan\documents\nwb\hdmf\src\hdmf\backends\hdf5\h5tools.py:306: UserWarning: Broken Link: /acquisition\ts_name
  warnings.warn('Broken Link: %s' % os.path.join(h5obj.name, k))

Suggestion: the warning should say whether the link is broken because

  • the file is open elsewhere (see #26), or
  • the file cannot be found, or
  • the path within the file cannot be found

User should also not get two warnings when there is only one broken link.

Unit tests should test these warnings as well.

Steps to Reproduce

from pynwb import NWBFile, TimeSeries, get_manager, NWBHDF5IO
from datetime import datetime
from dateutil.tz import tzlocal
import os

fpath = 'test_externals.nwb'
fpath2 = 'test_externals2.nwb'

manager = get_manager()

nwbfile = NWBFile("a file with header data", "NB123A", datetime(2017, 5, 1, 12, 0, 0, tzinfo=tzlocal()))
nwbfile.add_acquisition(TimeSeries('ts_name', data=[1,2,3], unit='m', rate=100.))

with NWBHDF5IO(fpath, mode='w', manager=manager) as io:
    io.write(nwbfile)
    io.close()

nwbfile2 = NWBFile("a file with header data", "NB123B", datetime(2017, 5, 1, 12, 0, 0, tzinfo=tzlocal()))

with NWBHDF5IO(fpath, mode='r', manager=manager) as io:
    nwb_read = io.read()
    nwbfile2.add_acquisition(nwb_read.acquisition['ts_name'])
    io.close()

with NWBHDF5IO(fpath2, mode='w', manager=manager) as io:
    io.write(nwbfile2)
    io.close()

os.remove(fpath)

with NWBHDF5IO(fpath2, mode='r') as io:
    nwbfile3 = io.read()
    io.close()

Environment

Python Executable: Conda
Python Version: Python 3.6
Operating System: Windows
HDMF Version: 1.0.3

relative paths for external links

Briefly describe the needed feature as well as the reasoning behind it

Is your feature request related to a problem? Please describe.
I was trying to share two files, one of which has an external link to the other, and I realized that this link used the absolute path to the second file, not the relative path, so when I moved the two files the link broke, even though the relative position of the files stayed the same (they were in the same directory). This makes it pretty much impossible to share files that have links between them.

Describe the solution you'd like
I would like to be able to use relative links instead, so that I can move both of the files at once and the link continues to work.

Additional context
I am currently using external links to hold subject-specific information. I am working with ECoG data that contains a cortical surface mesh object. My current solution is to save this mesh object in every session file, but sometimes there are 100+ session files for a single subject and this surface mesh never changes for a subject, so it would be better to store it in a separate file and link to it from each session file.
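
For reference, the underlying HDF5 external-link mechanism accepts relative file names, so the main work would be in how HDMF creates the links. A minimal h5py sketch with hypothetical file names:

import h5py

# subject.h5 holds the shared surface mesh; session.h5 links to it by a
# relative file name, so in most setups the pair keeps working as long as
# both files move together
with h5py.File('subject.h5', 'w') as f:
    f.create_dataset('mesh/vertices', data=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

with h5py.File('session.h5', 'w') as f:
    f['mesh'] = h5py.ExternalLink('subject.h5', '/mesh')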

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

__pycache__ sneaked into release tarballs uploaded to pypi

λ tar -xvf "hdmf-1.0.5.tar.gz"
x hdmf-1.0.5/
x hdmf-1.0.5/tests/
x hdmf-1.0.5/tests/__pycache__/
x hdmf-1.0.5/tests/__pycache__/__init__.cpython-36.pyc
x hdmf-1.0.5/tests/__init__.py
x hdmf-1.0.5/tests/coverage/
x hdmf-1.0.5/tests/coverage/runCoverage
x hdmf-1.0.5/tests/unit/
x hdmf-1.0.5/tests/unit/test_container.py
x hdmf-1.0.5/tests/unit/__pycache__/
x hdmf-1.0.5/tests/unit/__pycache__/__init__.cpython-36.pyc
x hdmf-1.0.5/tests/unit/__pycache__/test_utils.cpython-36.pyc
x hdmf-1.0.5/tests/unit/__pycache__/test_query.cpython-36.pyc
x hdmf-1.0.5/tests/unit/__pycache__/test_container.cpython-36.pyc
x hdmf-1.0.5/tests/unit/__pycache__/test_io_hdf5_h5tools.cpython-36.pyc
x hdmf-1.0.5/tests/unit/__pycache__/test_io_hdf5.cpython-36.pyc
x hdmf-1.0.5/tests/unit/test_io_hdf5.py
x hdmf-1.0.5/tests/unit/test_io_hdf5_h5tools.py
x hdmf-1.0.5/tests/unit/utils_test/
x hdmf-1.0.5/tests/unit/utils_test/test_core_DataChunkIterator.py
x hdmf-1.0.5/tests/unit/utils_test/__pycache__/
x hdmf-1.0.5/tests/unit/utils_test/__pycache__/__init__.cpython-36.pyc
x hdmf-1.0.5/tests/unit/utils_test/__pycache__/test_core_ShapeValidator.cpython-36.pyc
x hdmf-1.0.5/tests/unit/utils_test/__pycache__/test_core.cpython-36.pyc
x hdmf-1.0.5/tests/unit/utils_test/__pycache__/test_core_DataChunkIterator.cpython-36.pyc
x hdmf-1.0.5/tests/unit/utils_test/__init__.py
x hdmf-1.0.5/tests/unit/utils_test/test_core.py
x hdmf-1.0.5/tests/unit/utils_test/test_core_ShapeValidator.py
x hdmf-1.0.5/tests/unit/test_utils.py
x hdmf-1.0.5/tests/unit/__init__.py
x hdmf-1.0.5/tests/unit/build_tests/
x hdmf-1.0.5/tests/unit/build_tests/__pycache__/
x hdmf-1.0.5/tests/unit/build_tests/__pycache__/__init__.cpython-36.pyc
x hdmf-1.0.5/tests/unit/build_tests/__pycache__/test_io_map.cpython-36.pyc
x hdmf-1.0.5/tests/unit/build_tests/__pycache__/test_io_map_data.cpython-36.pyc
x hdmf-1.0.5/tests/unit/build_tests/__pycache__/test_io_manager.cpython-36.pyc
x hdmf-1.0.5/tests/unit/build_tests/__pycache__/test_io_build_builders.cpython-36.pyc
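
Assuming the sdist is built with setuptools, one fix is to build releases from a clean checkout and add an exclusion to MANIFEST.in so compiled caches are never picked up:

# MANIFEST.in
global-exclude *.py[cod]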

error when reading from nwb file with electricalseries with rate parameter

I get an error when reading from an NWB file that has an ElectricalSeries that was saved with the rate parameter. This happens with the current repo version of hdmf==1.2.0.post0.dev1 and did not happen with previous versions.

from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile
from pynwb import NWBHDF5IO
from pynwb.ecephys import ElectricalSeries
import numpy as np
import os

nwb = NWBFile(session_description='session', identifier='1', session_start_time=datetime.now(tzlocal()))

# Add electrode groups and channels
nChannels = 10
dev0 = nwb.create_device(name='dev0')
elecs_group = nwb.create_electrode_group(name='electrodes', description='', location='ctx', device=dev0)
for i in np.arange(nChannels):
    ii = float(i)
    nwb.add_electrode(x=ii, y=ii, z=ii, imp=ii, location='', filtering='', group=elecs_group)

#Add signal
elecs_region = nwb.electrodes.create_region(name='electrodes',
                                            region=np.arange(nChannels).tolist(),
                                            description='')
X_data = np.zeros((nChannels,1000))
signal = ElectricalSeries(name='ElectricalSeries', 
                          data=X_data, 
                          electrodes=elecs_region,
                          starting_time=0.,
                          rate=1.)
nwb.add_acquisition(signal)

#Write file
with NWBHDF5IO('file_1.nwb', mode='w') as io:
    io.write(nwb)
    
# Read file
io = NWBHDF5IO('file_1.nwb', 'r+')
nwbfile = io.read()
print(nwbfile)

Saving with timestamps instead of rate works:

signal = ElectricalSeries(name='ElectricalSeries', 
                          data=X_data, 
                          electrodes=elecs_region,
                          #starting_time=0.,
                          #rate=1.,
                          timestamps=np.arange(1000))

The error produced when reading the file that was saved with rate is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
   1188                               object_id=builder.attributes.get(self.__spec.id_key()))
-> 1189             obj.__init__(**kwargs)
   1190         except Exception as ex:

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    437 
--> 438                 return func(self, **parsed['args'])
    439         else:

~\Anaconda3\envs\spike_extract\lib\site-packages\pynwb\ecephys.py in __init__(self, **kwargs)
     95         name, electrodes, data = popargs('name', 'electrodes', 'data', kwargs)
---> 96         super(ElectricalSeries, self).__init__(name, data, 'volt', **kwargs)
     97         self.electrodes = electrodes

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    437 
--> 438                 return func(self, **parsed['args'])
    439         else:

~\Anaconda3\envs\spike_extract\lib\site-packages\pynwb\base.py in __init__(self, **kwargs)
    172         else:
--> 173             raise TypeError("either 'timestamps' or 'rate' must be specified")
    174 

TypeError: either 'timestamps' or 'rate' must be specified

The above exception was the direct cause of the following exception:

Exception                                 Traceback (most recent call last)
<ipython-input-1-f7b01394867c> in <module>
     35 # Read file
     36 io = NWBHDF5IO('file_1.nwb', 'r+')
---> 37 nwbfile = io.read()
     38 print(nwbfile)

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\backends\hdf5\h5tools.py in read(self, **kwargs)
    244                                        % (self.__path, self.__mode))
    245         try:
--> 246             return call_docval_func(super(HDF5IO, self).read, kwargs)
    247         except UnsupportedOperation as e:
    248             if str(e) == 'Cannot build data. There are no values.':

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\utils.py in call_docval_func(func, kwargs)
    325 def call_docval_func(func, kwargs):
    326     fargs, fkwargs = fmt_docval_args(func, kwargs)
--> 327     return func(*fargs, **fkwargs)
    328 
    329 

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    436                         raise_from(ExceptionType(msg), None)
    437 
--> 438                 return func(self, **parsed['args'])
    439         else:
    440             def func_call(*args, **kwargs):

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\backends\io.py in read(self, **kwargs)
     33             # TODO also check that the keys are appropriate. print a better error message
     34             raise UnsupportedOperation('Cannot build data. There are no values.')
---> 35         container = self.__manager.construct(f_builder)
     36         return container
     37 

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    436                         raise_from(ExceptionType(msg), None)
    437 
--> 438                 return func(self, **parsed['args'])
    439         else:
    440             def func_call(*args, **kwargs):

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
    205                 # we are at the top of the hierarchy,
    206                 # so it must be time to resolve parents
--> 207                 result = self.__type_map.construct(builder, self, None)
    208                 self.__resolve_parents(result)
    209             self.prebuilt(result, builder)

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    436                         raise_from(ExceptionType(msg), None)
    437 
--> 438                 return func(self, **parsed['args'])
    439         else:
    440             def func_call(*args, **kwargs):

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
   1667             raise ValueError('No ObjectMapper found for builder of type %s' % dt)
   1668         else:
-> 1669             return attr_map.construct(builder, build_manager, parent)
   1670 
   1671     @docval({"name": "container", "type": Container, "doc": "the container to convert to a Builder"},

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    436                         raise_from(ExceptionType(msg), None)
    437 
--> 438                 return func(self, **parsed['args'])
    439         else:
    440             def func_call(*args, **kwargs):

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
   1161         cls = manager.get_cls(builder)
   1162         # gather all subspecs
-> 1163         subspecs = self.__get_subspec_values(builder, self.spec, manager)
   1164         # get the constructor argument that each specification corresponds to
   1165         const_args = dict()

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in __get_subspec_values(self, builder, spec, manager)
   1103                         ret[subspec] = self.__flatten(sub_builder, subspec, manager)
   1104             # now process groups and datasets
-> 1105             self.__get_sub_builders(groups, spec.groups, manager, ret)
   1106             self.__get_sub_builders(datasets, spec.datasets, manager, ret)
   1107         elif isinstance(spec, DatasetSpec):

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in __get_sub_builders(self, sub_builders, subspecs, manager, ret)
   1141                 if dt is None:
   1142                     # recurse
-> 1143                     ret.update(self.__get_subspec_values(sub_builder, subspec, manager))
   1144                 else:
   1145                     ret[subspec] = manager.construct(sub_builder)

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in __get_subspec_values(self, builder, spec, manager)
   1103                         ret[subspec] = self.__flatten(sub_builder, subspec, manager)
   1104             # now process groups and datasets
-> 1105             self.__get_sub_builders(groups, spec.groups, manager, ret)
   1106             self.__get_sub_builders(datasets, spec.datasets, manager, ret)
   1107         elif isinstance(spec, DatasetSpec):

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in __get_sub_builders(self, sub_builders, subspecs, manager, ret)
   1133                 sub_builder = builder_dt.get(dt)
   1134                 if sub_builder is not None:
-> 1135                     sub_builder = self.__flatten(sub_builder, subspec, manager)
   1136                     ret[subspec] = sub_builder
   1137             else:

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in __flatten(self, sub_builder, subspec, manager)
   1146 
   1147     def __flatten(self, sub_builder, subspec, manager):
-> 1148         tmp = [manager.construct(b) for b in sub_builder]
   1149         if len(tmp) == 1 and not subspec.is_many():
   1150             tmp = tmp[0]

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in <listcomp>(.0)
   1146 
   1147     def __flatten(self, sub_builder, subspec, manager):
-> 1148         tmp = [manager.construct(b) for b in sub_builder]
   1149         if len(tmp) == 1 and not subspec.is_many():
   1150             tmp = tmp[0]

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    436                         raise_from(ExceptionType(msg), None)
    437 
--> 438                 return func(self, **parsed['args'])
    439         else:
    440             def func_call(*args, **kwargs):

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
    201             if parent_builder is not None:
    202                 parent = self.__get_proxy_builder(parent_builder)
--> 203                 result = self.__type_map.construct(builder, self, parent)
    204             else:
    205                 # we are at the top of the hierarchy,

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    436                         raise_from(ExceptionType(msg), None)
    437 
--> 438                 return func(self, **parsed['args'])
    439         else:
    440             def func_call(*args, **kwargs):

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
   1667             raise ValueError('No ObjectMapper found for builder of type %s' % dt)
   1668         else:
-> 1669             return attr_map.construct(builder, build_manager, parent)
   1670 
   1671     @docval({"name": "container", "type": Container, "doc": "the container to convert to a Builder"},

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
    436                         raise_from(ExceptionType(msg), None)
    437 
--> 438                 return func(self, **parsed['args'])
    439         else:
    440             def func_call(*args, **kwargs):

~\Anaconda3\envs\spike_extract\lib\site-packages\hdmf\build\map.py in construct(self, **kwargs)
   1190         except Exception as ex:
   1191             msg = 'Could not construct %s object' % (cls.__name__,)
-> 1192             raise_from(Exception(msg), ex)
   1193         return obj
   1194 

~\AppData\Roaming\Python\Python37\site-packages\six.py in raise_from(value, from_value)

Exception: Could not construct ElectricalSeries object

Checklist

  • [ x] Have you ensured the feature or change was not already reported ?
  • [ x] Have you included a brief and descriptive title?
  • [ x] Have you included a clear description of the problem you are trying to solve?
  • [ x] Have you included a minimal code snippet that reproduces the issue you are encountering?
  • [ x] Have you checked our Contributing document?

Incorrect error thrown when link is broken

Code flow bug. If an external link is broken, this error is thrown:

Traceback (most recent call last):
  File "test2.py", line 5, in <module>
    nwbfile3 = io.read()
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\utils.py", line 388, in func_call
    return func(self, **parsed['args'])
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\backends\io.py", line 32, in read
    f_builder = self.read_builder()
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\utils.py", line 388, in func_call
    return func(self, **parsed['args'])
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\backends\hdf5\h5tools.py", line 227, in read_builder
    f_builder = self.__read_group(self.__file, ROOT_NAME, ignore=ignore)
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\backends\hdf5\h5tools.py", line 302, in __read_group
    builder = read_method(sub_h5obj)
  File "c:\users\ryan\documents\nwb\hdmf\src\hdmf\backends\hdf5\h5tools.py", line 270, in __read_group
    if sub_h5obj.name in ignore:
AttributeError: 'NoneType' object has no attribute 'name'

Traced down to the code:

if sub_h5obj.name in ignore:
    continue
if not (sub_h5obj is None):
    ...
else:
    warnings.warn('Broken Link: %s' % os.path.join(h5obj.name, k))
    kwargs['datasets'][k] = None
    continue

Solution: logically, the first if statement above should be moved to inside the second if statement.
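
A minimal sketch of that reordering, keeping the names from the snippet above (the ... stands for the existing read logic):

if sub_h5obj is not None:
    if sub_h5obj.name in ignore:
        continue
    ...
else:
    warnings.warn('Broken Link: %s' % os.path.join(h5obj.name, k))
    kwargs['datasets'][k] = None
    continue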

dynamic class docval type constraints too lenient

Bug

from pynwb.spec import NWBNamespaceBuilder, NWBGroupSpec, NWBAttributeSpec
from pynwb.epoch import TimeIntervals

name = 'test_groups'
ns_path = name + ".namespace.yaml"
ext_source = name + ".extensions.yaml"

group_type = NWBGroupSpec(
    neurodata_type_def='Group1',
    name='group1',
    doc='doc',
    neurodata_type_inc="NWBDataInterface",
    groups=[
        NWBGroupSpec(doc='doc', neurodata_type_def='Group2', neurodata_type_inc="NWBDataInterface",
                     attributes=[NWBAttributeSpec(name='str_type', dtype='text', doc='doc')]
                    )
    ]
)

ns_builder = NWBNamespaceBuilder(name + ' extensions', name)
ns_builder.add_spec(ext_source, group_type)
ns_builder.export(ns_path)


from pynwb import load_namespaces, get_class

load_namespaces(ns_path)
Group1 = get_class('Group1', name)
Group2 = get_class('Group2', name)

#group2 = Group2(str_type='hello', name='hello')
group2 = TimeIntervals(name='hello')

print(Group1(group2=group2))

This passes although it should not, because I am adding a TimeIntervals object where only a Group2 object should go.
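
For comparison, a sketch of the stricter constraint using plain classes (Group2 here is just a stand-in for the dynamically generated class):

from hdmf.utils import docval, getargs

class Group2(object):
    """Stand-in for the dynamically generated Group2 class."""

class Group1(object):

    # the stricter constraint: 'type' names the contained class, so anything
    # that is not a Group2 instance is rejected by docval with a TypeError
    @docval({'name': 'name', 'type': str, 'doc': 'name'},
            {'name': 'group2', 'type': Group2, 'doc': 'contained Group2 object'})
    def __init__(self, **kwargs):
        self.name, self.group2 = getargs('name', 'group2', kwargs)

Group1(name='hello', group2=Group2())                        # accepted
# Group1(name='hello', group2=TimeIntervals(name='hello'))   # should raise TypeError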

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

pynwb validation broken since a16cc63e

The validation in pynwb is broken since a16cc63 (passively read datasets that are vectors of strings, 2019-05-03), see #46 and NeurodataWithoutBorders/pynwb#911.

thomas@thomas-win7-x64 MINGW64 /e/projekte/pynwb (better-validation)
$ python docs/gallery/domain/ophys.py

thomas@thomas-win7-x64 MINGW64 /e/projekte/pynwb (better-validation)
$ python -m pynwb.validate ophys_example.nwb; echo $?
Validating ophys_example.nwb against core namespace
 - found the following errors:
root/file_create_date (file_create_date): incorrect type - expected 'isodatetime', got 'object'
1

schema yaml validator

We should have a tool that checks the yaml files to make sure that schema specification files are correct. In other words, a meta-schema validator that checks the core and any extension schema files. This will be particularly useful for extension sharing, so we could run submitted extensions through an automated validator to ensure that they are at least syntactically correct.

https://json-schema.org/ would be a good tool for this.
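
A minimal sketch of what such a check could look like with the jsonschema package; the meta-schema and extension file names below are hypothetical:

import json
import yaml
from jsonschema import validate, ValidationError

# 'hdmf-spec.schema.json' is a hypothetical JSON Schema describing what a valid
# specification YAML file may contain; 'test.extensions.yaml' is the file to check
with open('hdmf-spec.schema.json') as f:
    meta_schema = json.load(f)
with open('test.extensions.yaml') as f:
    spec = yaml.safe_load(f)

try:
    validate(instance=spec, schema=meta_schema)
    print('specification is syntactically valid')
except ValidationError as err:
    print('invalid specification:', err.message)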

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

files with external links cannot be read in append mode

Bug

from pynwb import NWBFile, TimeSeries, get_manager, NWBHDF5IO
from datetime import datetime
from dateutil.tz import tzlocal

fpath = 'test_externals.nwb'
fpath2 = 'test_externals2.nwb'

manager = get_manager()

nwbfile = NWBFile("a file with header data", "NB123A", datetime(2017, 5, 1, 12, 0, 0, tzinfo=tzlocal()))

nwbfile.add_acquisition(TimeSeries('ts_name', data=[1,2,3], unit='m', rate=100.))


with NWBHDF5IO(fpath, mode='w', manager=manager) as io:
    io.write(nwbfile)

nwbfile2 = NWBFile("a file with header data", "NB123B", datetime(2017, 5, 1, 12, 0, 0, tzinfo=tzlocal()))

nwb_read = NWBHDF5IO(fpath, mode='r', manager=manager).read()

nwbfile2.add_acquisition(nwb_read.acquisition['ts_name'])

with NWBHDF5IO(fpath2, mode='w', manager=manager) as io:
    io.write(nwbfile2)

with NWBHDF5IO(fpath2, mode='a') as io:
    io.read()

error:

Traceback (most recent call last):
  File "/Users/bendichter/Library/Preferences/PyCharmCE2019.1/scratches/scratch_22.py", line 29, in <module>
    io.read()
  File "/Users/bendichter/dev/hdmf/src/hdmf/utils.py", line 381, in func_call
    return func(self, **parsed['args'])
  File "/Users/bendichter/dev/hdmf/src/hdmf/backends/io.py", line 32, in read
    f_builder = self.read_builder()
  File "/Users/bendichter/dev/hdmf/src/hdmf/utils.py", line 381, in func_call
    return func(self, **parsed['args'])
  File "/Users/bendichter/dev/hdmf/src/hdmf/backends/hdf5/h5tools.py", line 227, in read_builder
    f_builder = self.__read_group(self.__file, ROOT_NAME, ignore=ignore)
  File "/Users/bendichter/dev/hdmf/src/hdmf/backends/hdf5/h5tools.py", line 304, in __read_group
    builder = read_method(sub_h5obj)
  File "/Users/bendichter/dev/hdmf/src/hdmf/backends/hdf5/h5tools.py", line 272, in __read_group
    if sub_h5obj.name in ignore:
AttributeError: 'NoneType' object has no attribute 'name'

Please describe your environment according to the following bullet points.

Python Executable: Conda
Python Version: Python 3.6
Operating System: macOS
HDMF Version: dev

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?

Add PyNWB tests to CI

Changes in HDMF may break PyNWB. It would help the developers to set up CI so that the PyNWB tests are run automatically on each PR for HDMF. These tests could just be run on Linux with Python 3.7. These tests will not always succeed, since changes here may be needed before corresponding changes in PyNWB can be merged into the dev branch.

Add unique identifiers to data_types

data_type should be uniquely identifiable. This will help with identifying and retrieving objects from an HDMF dataset. The proposed solution is as follows:

  1. Any object with a data_type and namespace attribute will also have a new attribute, called object_id
  2. object_id will be a UUID-4 hexstring
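
A sketch of generating such an identifier with the standard library:

import uuid

object_id = str(uuid.uuid4())   # canonical form with hyphens
object_id = uuid.uuid4().hex    # 32-character hex string, no hyphens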

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
  • Have you checked our Contributing document?
