rymc / bhealth

This library is designed to be used with accelerometer and RSS data sets, collected as part of longitudinal studies in Digital Health.
A nice feature of the library would be a method to detect and remove periods of non-wear for wearable devices.
How I have done it in the past with the EurValve wearable is:
"Periods of time where the patient is not wearing the wearable are excluded by measuring the variance in arm angle changes over 20 minute blocks of time. If the variance in a block is less than 1×10−7, then that block of time is excluded from analysis."
Though we may want to try other methods.
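The EurValve rule quoted above could be sketched roughly like this. This is a hypothetical helper, not an existing bhealth function; the arm-angle signal, sampling rate, and the way blocks are masked are all assumptions:

```python
import numpy as np

def detect_non_wear(arm_angle, fs, block_minutes=20, var_threshold=1e-7):
    """Flag non-wear using the variance of arm-angle changes per block.

    arm_angle: 1-D array of arm-angle estimates, sampled at fs Hz.
    Returns a boolean mask the same length as arm_angle; True = worn.
    """
    block_len = int(fs * 60 * block_minutes)
    worn = np.ones(len(arm_angle), dtype=bool)
    for start in range(0, len(arm_angle), block_len):
        changes = np.diff(arm_angle[start:start + block_len])
        # A block with almost no angle change is treated as non-wear
        if changes.size and np.var(changes) < var_threshold:
            worn[start:start + block_len] = False
    return worn
```

The masked samples could then simply be excluded from downstream analysis.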
Running localisation_example.py results in the following error.
Traceback (most recent call last):
  File "examples/localisation_example.py", line 149, in <module>
    clf_grid.fit(X_train, y_train)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 722, in fit
    self._run_search(evaluate_candidates)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 1191, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 711, in evaluate_candidates
    cv.split(X, y, groups)))
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 917, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 182, in apply_async
    result = ImmediateResult(func)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 549, in __init__
    self.results = batch()
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 528, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/pipeline.py", line 265, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/pipeline.py", line 230, in _fit
    **fit_params_steps[name])
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/memory.py", line 342, in __call__
    return self.func(*args, **kwargs)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/pipeline.py", line 614, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/base.py", line 467, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/impute.py", line 223, in fit
    X = self._validate_input(X)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/impute.py", line 197, in _validate_input
    raise ve
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/impute.py", line 190, in _validate_input
    force_all_finite=force_all_finite, copy=self.copy)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/utils/validation.py", line 527, in check_array
    array = np.asarray(array, dtype=dtype, order=order)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/numpy/core/numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
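For reference, this ValueError typically means NumPy was asked to build a rectangular float array from ragged rows. A minimal re-creation (this is my assumption about the cause, not a confirmed diagnosis of the example):

```python
import numpy as np

# Rows of unequal length cannot be coerced into a 2-D float array,
# which is exactly what the imputer's check_array call attempts.
ragged_rows = [[1.0, 2.0], [3.0]]
try:
    np.asarray(ragged_rows, dtype=np.float64)
except ValueError as exc:
    print("ValueError:", exc)
```

If this is the cause, the fix would be in the feature-extraction step that builds X, not in the pipeline itself.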
The examples should not be part of the library, but should be use cases that use the library. A good pattern to follow is the scikit-learn examples folder, which has its own Sphinx-Gallery.
If the transform class is only a set of functions:
The current examples only consider accelerometer and RSSI values. However, if the library wants to be general and more useful to other research groups, it would be convenient to think about other modalities and see whether they fit in the current framework, or how it should be modified.
Some examples of modalities that may be non-trivial to add are RGB video data, or silhouettes.
In every example in the examples folder, if the csv_prep argument is set when creating a Wrapper, plot_metrics raises the following exception:
Traceback (most recent call last):
  File "synthetic_long_example.py", line 210, in <module>
    figures_dict = plot_metrics(metric_container_daily, date_container_daily, labels_=labels)
  File "../bhealth/visualisations.py", line 96, in plot_metrics
    for key in proportion:
TypeError: 'float' object is not iterable
To avoid the exception I have removed the optional argument from all the examples, but this needs to be investigated.
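A possible defensive fix, assuming `proportion` is a dict in the normal path but a bare float when csv_prep is set (the plot_metrics internals are assumed here, not taken from the source):

```python
def keys_of(proportion):
    """Return an iterable of keys whether `proportion` is a dict or a scalar."""
    if isinstance(proportion, dict):
        return list(proportion)    # previous behaviour: iterate dict keys
    return [proportion]            # scalar: wrap so the loop still works
```

The cleaner long-term fix is probably to make the upstream code return the same type in both cases, as suggested in the label_mappings issue below.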
We have been using NewLib, bHealth and digihealth as names for the library, modules and package. We need to unify all these references.
Classification models that consider the time domain sometimes need an additional dimension in the input. We need to think about whether this can be added as a transformation.
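As a sketch of what such a transformation could look like (a hypothetical transformer, not part of the library), reshaping flat feature vectors into the (samples, timesteps, features) layout that recurrent models expect:

```python
import numpy as np

class AddTimeDimension:
    """Reshape (n_samples, n_features) into (n_samples, timesteps,
    n_features // timesteps). Illustrative only; the class name and
    the even-split assumption are mine."""

    def __init__(self, timesteps):
        self.timesteps = timesteps

    def fit(self, X, y=None):
        return self  # stateless; fit kept for pipeline compatibility

    def transform(self, X):
        n, f = X.shape
        if f % self.timesteps != 0:
            raise ValueError("features must split evenly into timesteps")
        return X.reshape(n, self.timesteps, f // self.timesteps)
```

With fit/transform methods it would slot into a scikit-learn Pipeline like the existing transforms.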
Performance metrics like accuracy, Brier score and log-loss. It is not clear whether these should be part of this library, as it may be as simple as calling a function from scikit-learn.
For the documentation it would be nice if we added both a mathematical description of the function (where sensible), as well as a textual description.
Currently the transform functions (e.g. spectral entropy) contain no information as to what the function does.
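For instance, spectral entropy could carry both descriptions in one docstring. This is an illustrative sketch of the documentation style, not the library's implementation:

```python
import numpy as np

def spectral_entropy(x):
    """Normalised spectral entropy of a 1-D signal.

    Mathematically, H = -sum_k p_k log2(p_k) / log2(K), where p_k is the
    normalised periodogram power in frequency bin k and K is the number
    of bins. Textually: H is near 0 for a pure tone (power concentrated
    in one bin) and near 1 for white noise (power spread evenly).
    """
    psd = np.abs(np.fft.rfft(x)) ** 2
    n_bins = psd.size
    p = psd / psd.sum()
    p = p[p > 0]                                  # avoid log(0)
    return float(-(p * np.log2(p)).sum() / np.log2(n_bins))
```
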
The TSFEL: Time Series Feature Extraction Library for Python was released recently. This library may incorporate very useful feature extraction methods that we do not have currently.
Is it possible to incorporate TSFEL into some of our feature extraction functions? Maybe as a wrapper around TSFEL, or as examples of how this can be done.
Marília Barandas, Duarte Folgado, Letícia Fernandes, Sara Santos, Mariana Abreu, Patrícia Bota, Hui Liu, Tanja Schultz, Hugo Gamboa, TSFEL: Time Series Feature Extraction Library, SoftwareX, Volume 11, 2020, https://doi.org/10.1016/j.softx.2020.100456.
In accelerometer_example.py when using the provided example dataset:
bHealth/examples/accelerometer_example.py
Lines 51 to 58 in 858df1a
I get an out-of-bounds access error. The error might occur in transform.slide(), where windowed_raw is first assigned a value and then current_position moves forward:
Lines 402 to 411 in 858df1a
When running accelerometer_example.py I get the following error.
(base) rmc@gamma:~/NewLib/digihealth$ python examples/accelerometer_example.py
Found 4 house folders.
Found 4 experiment folders.
Running folder: 1
Running folder: 2
Running folder: 3
Running folder: 4
Window size of 10 seconds and overlap of 0.1%
Use number of mean crossings, spectral entropy as features...
index 69100 is out of bounds for axis 0 with size 69091
Traceback (most recent call last):
  File "examples/accelerometer_example.py", line 153, in <module>
    X, y = preprocess_X_y(ts, X, y)
  File "examples/accelerometer_example.py", line 66, in preprocess_X_y
    new_X = transform.feature_selection(new_X, new_y, 'uni')
  File "../digihealth/transforms.py", line 246, in feature_selection
    return X_new
UnboundLocalError: local variable 'X_new' referenced before assignment
It appears that the feature_selection method in transforms.py only handles two cases, 'l1' and 'tree', while the example file calls it with the value 'uni'.
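A possible shape for the missing 'uni' branch, assuming it is meant to mean univariate selection via scikit-learn (the existing 'l1'/'tree' branches are omitted here, and the parameter names are illustrative, not copied from transforms.py):

```python
from sklearn.feature_selection import SelectKBest, f_classif

def feature_selection(X, y, method, k=10):
    """Select features from X; only the hypothetical 'uni' branch is shown."""
    if method == 'uni':
        # Univariate selection: keep the k features with the highest
        # ANOVA F-score against the labels.
        selector = SelectKBest(f_classif, k=min(k, X.shape[1]))
        X_new = selector.fit_transform(X, y)
    else:
        # Raising here also fixes the UnboundLocalError for unknown methods.
        raise ValueError("unsupported method: %r" % method)
    return X_new
```

Raising on unknown method strings would turn the confusing UnboundLocalError into an explicit error message.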
We currently do not have any metrics implemented in this library.
Here is a list of metrics we currently use elsewhere:
Room Transfers - Daily average
Duration Outside - Daily average
Times Exited Home - Daily average
Typically Sleeps In - Daily
Sleep Efficiency - Daily average
Sleep Quality - Daily average
Main Sleep Length - Daily average
Total Sleep Length - Daily average
Walking - Hourly average
Sitting - Hourly average
Lying - Hourly average
Compliance (duration of wear)
Number of times bathroom visited during the night
Number of times kitchen visited during the night
Average speed walking - Daily average
Maximum speed of walking - Daily average
Speed of stand up/sit down - Daily average
Number of sit-to-stand transitions - Daily average
Number of times stairs used - Daily average
Speed travelling upstairs - Daily average
Time to go from room down stairs to upstairs - Daily average
Time to go from room upstairs to downstairs - Daily average
Number of times activities undertaken (e.g. cooking / cleaning) - Daily average
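As an illustration, the first metric in the list could be computed from timestamped room labels roughly like this (the input format and names are assumptions, not the library's schema):

```python
import pandas as pd

def daily_room_transfers(timestamps, rooms):
    """Daily count of room transfers from a sequence of room labels.

    timestamps: parseable datetimes; rooms: room label per timestamp.
    Returns a Series indexed by date with the number of label changes.
    """
    df = pd.DataFrame({'room': rooms}, index=pd.to_datetime(timestamps))
    # A transfer is any sample whose room differs from the previous one;
    # drop the first comparison (against NaN from the shift).
    transfers = (df['room'] != df['room'].shift()).iloc[1:]
    return transfers.groupby(transfers.index.date).sum()
```

Most of the daily-average metrics above reduce to a similar group-by-date pattern over labelled windows.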
Both functions label_mappings and label_mappings_localisation return different formats depending on whether a CSV was specified during the instantiation of a Wrapper. I would opt to always return the same format, converting it to the required format, if necessary, while exporting the CSV file. However, I am not familiar with the code, so I am not sure which format is the more generic one, nor what implications such a change may have.
@mkoz71 could you help me with this decision?
Also, this may be related to the issue #13
bHealth/bhealth/metric_wrappers.py
Line 531 in df5dadb
Currently in the example scripts we have function names such as metr.average_labels_per_window(labs, times).
While these abstractions make a lot of sense from a design perspective, I don't think they are clear to a human user of the library.
Can we create 'human friendly' wrapper functions, such as get_duration_walking(data, period='daily', level='hours'), for each of the metrics?
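A minimal sketch of the idea, with an assumed integer label code and hypothetical helper names (the real wrappers would delegate to the existing metric abstractions):

```python
WALKING = 1  # assumed label code for the walking activity

def duration_of_label(labels, durations, target):
    """Total time (in the units of `durations`) spent in `target` label."""
    return sum(d for lab, d in zip(labels, durations) if lab == target)

def get_duration_walking(labels, durations):
    """Human-friendly wrapper: total time spent walking."""
    return duration_of_label(labels, durations, WALKING)
```

One generic helper plus one thin, well-named wrapper per metric would keep the design clean while giving users readable entry points.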
In the test examples the sliding window is done externally in the test code. This should be incorporated inside the transform class.
It would be great if we could add documentation using the Sphinx framework from the beginning. In that way it would be easier to maintain in the future. Some useful links:
Hi,
I think we need to identify and use some public datasets for activity recognition and localization.
Ideally these datasets would be multi-day datasets, so we can easily calculate daily average metrics from them. Even better if they were multi-week datasets, so we can calculate weekly metrics from them, and so on.
Alternatively, we could use a dataset that doesn't meet these criteria and modify the timestamps to simulate multi-day/multi-week datasets for testing. For example, using https://www.nature.com/articles/sdata2018168
What do you think?
When extracting features in the code below:
bHealth/examples/accelerometer_example.py
Lines 51 to 65 in 96ce4b8
There seems to be a misalignment between windowed_raw (for X) and windowed_raw_labels (for y): after applying transform.slide(X) in line 52, transform.current_position is shifted forward by the stride value. Then, when transform.slide(y) is applied in line 63, it uses the new transform.current_position, which does not match where X was extracted from.
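One way to avoid the shared-cursor problem is to window X and y together from identical start positions. A standalone sketch of that approach (not the library's Transform class):

```python
import numpy as np

def slide_pairs(X, y, window, stride):
    """Window X and y in lockstep so each pair comes from the same span.

    No shared cursor: each iteration computes its own start index, so
    windowing X cannot shift where y is extracted from.
    """
    Xw, yw = [], []
    for start in range(0, len(X) - window + 1, stride):
        Xw.append(X[start:start + window])
        yw.append(y[start:start + window])
    return np.array(Xw), np.array(yw)
```

Alternatively, transform.slide() could take the position as an argument instead of mutating current_position as a side effect.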
Often in interventions it is useful to know the compliance rate of a patient with regard to the wearable. For example, over a two week period, they were wearing the wearable for 95% of the time.
Further, this compliance value can be calculated daily, and used to exclude days where the compliance value is below a threshold.
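A sketch of such a daily compliance calculation, assuming a boolean worn/not-worn mask (e.g. from a non-wear detector) and a fixed sampling rate (the function names and the trailing-partial-day handling are my assumptions):

```python
import numpy as np

def daily_compliance(worn_mask, samples_per_day):
    """Fraction of samples flagged as worn, per full day of data."""
    days = len(worn_mask) // samples_per_day
    mask = np.asarray(worn_mask[:days * samples_per_day], dtype=float)
    return mask.reshape(days, samples_per_day).mean(axis=1)

def compliant_days(worn_mask, samples_per_day, threshold=0.95):
    """Boolean per day: did compliance meet the threshold?"""
    return daily_compliance(worn_mask, samples_per_day) >= threshold
```

Days falling below the threshold could then be excluded before computing the daily-average metrics listed above.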
I think it would be useful if we had report generation functionality. That is, a method of wrapping up all the results (data quality, metrics, visualizations etc.) into a nice HTML or PDF report.
If anyone has any examples it would be great to post them below. Otherwise I can try and sketch out what I have in mind.
Add an example using an RNN or LSTM for training, and see if the library needs to be modified.
The localisation and activity examples use different formats of descriptor_map.
e.g. the localisation example uses arrays of integers denoting the labels:
bHealth/examples/localisation_example.py
Lines 100 to 105 in 73bd95f
While the accelerometer example uses bare integers:
bHealth/examples/accelerometer_example.py
Lines 114 to 121 in 73bd95f
What is the reasoning behind this, and which method should we choose as the standard? A benefit of the list format is that it can group labels; for example, "upstairs" could be associated with [0, 1, 2], and likewise "sedentary".
I am not sure whether this modification has collateral implications.
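If the list format were chosen as the standard, both existing styles could still be accepted by normalising bare integers into one-element lists. A sketch, with an assumed key/value layout for descriptor_map:

```python
def normalise_descriptor_map(descriptor_map):
    """Accept both styles: {'label': [0, 1]} and {'label': 3}.

    Bare integers are wrapped into one-element lists so downstream code
    only ever sees the list format.
    """
    return {label: codes if isinstance(codes, list) else [codes]
            for label, codes in descriptor_map.items()}
```

This would let both example scripts keep their current maps while the library handles a single internal representation.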
We should consider what to do with datasets that have NaN values, as the current transformations do not accept them.
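One simple policy would be to drop any window containing a NaN before the transforms run; interpolation or scikit-learn's SimpleImputer are alternatives with different trade-offs. A sketch of the dropping policy:

```python
import numpy as np

def drop_nan_windows(windows):
    """Remove any window (first axis) that contains at least one NaN."""
    windows = np.asarray(windows, dtype=float)
    # Collapse all axes except the first, keeping only fully finite windows
    keep = ~np.isnan(windows).any(axis=tuple(range(1, windows.ndim)))
    return windows[keep]
```

Which policy is right probably depends on the modality: dropping is safe for dense accelerometer data, while sparse RSS data may warrant imputation instead.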