rymc / bhealth

This library is designed to be used with accelerometer and RSS data sets, collected as part of longitudinal studies in Digital Health.
A nice feature of the library would be a method to detect and remove periods of non-wear for wearable devices.
How I have done it in the past with the EurValve wearable is:
"Periods of time where the patient is not wearing the wearable are excluded by measuring the variance in arm angle changes over 20 minute blocks of time. If the variance in a block is less than 1×10−7, then that block of time is excluded from analysis."
Though we may want to try other methods.
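The EurValve rule quoted above could be sketched roughly like this. This is a hypothetical helper, not an existing bhealth function; the arm-angle signal, sampling rate, and the way blocks are masked are all assumptions:

```python
import numpy as np

def detect_non_wear(arm_angle, fs, block_minutes=20, var_threshold=1e-7):
    """Flag non-wear using the variance of arm-angle changes per block.

    arm_angle: 1-D array of arm-angle estimates, sampled at fs Hz.
    Returns a boolean mask the same length as arm_angle; True = worn.
    """
    block_len = int(fs * 60 * block_minutes)
    worn = np.ones(len(arm_angle), dtype=bool)
    for start in range(0, len(arm_angle), block_len):
        changes = np.diff(arm_angle[start:start + block_len])
        # A block with almost no angle change is treated as non-wear
        if changes.size and np.var(changes) < var_threshold:
            worn[start:start + block_len] = False
    return worn
```

The masked samples could then simply be excluded from downstream analysis.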
Running localisation_example.py results in the following error.
Traceback (most recent call last):
  File "examples/localisation_example.py", line 149, in <module>
    clf_grid.fit(X_train, y_train)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 722, in fit
    self._run_search(evaluate_candidates)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 1191, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 711, in evaluate_candidates
    cv.split(X, y, groups)))
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 917, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 182, in apply_async
    result = ImmediateResult(func)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 549, in __init__
    self.results = batch()
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 528, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/pipeline.py", line 265, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/pipeline.py", line 230, in _fit
    **fit_params_steps[name])
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/externals/joblib/memory.py", line 342, in __call__
    return self.func(*args, **kwargs)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/pipeline.py", line 614, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/base.py", line 467, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/impute.py", line 223, in fit
    X = self._validate_input(X)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/impute.py", line 197, in _validate_input
    raise ve
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/impute.py", line 190, in _validate_input
    force_all_finite=force_all_finite, copy=self.copy)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/sklearn/utils/validation.py", line 527, in check_array
    array = np.asarray(array, dtype=dtype, order=order)
  File "/home/rmc/anaconda2/lib/python3.6/site-packages/numpy/core/numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
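For reference, this ValueError typically means NumPy was asked to build a rectangular float array from ragged rows. A minimal re-creation (this is my assumption about the cause, not a confirmed diagnosis of the example):

```python
import numpy as np

# Rows of unequal length cannot be coerced into a 2-D float array,
# which is exactly what the imputer's check_array call attempts.
ragged_rows = [[1.0, 2.0], [3.0]]
try:
    np.asarray(ragged_rows, dtype=np.float64)
except ValueError as exc:
    print("ValueError:", exc)
```

If this is the cause, the fix would be in the feature-extraction step that builds X, not in the pipeline itself.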
The examples should not be part of the library, but should be use cases that use the library. A good pattern to follow is the scikit-learn examples folder, which has its own Sphinx-Gallery.
If the transform class is only a set of functions:
The current examples only consider accelerometer and RSSI values. However, if the library wants to be general and more useful to other research groups, it would be convenient to think about other modalities and see whether they fit in the current framework, or how it should be modified.
Some examples of modalities that may be non-trivial to add are RGB video data, or silhouettes.
In every example in the examples folder, if the csv_prep argument is set when creating a Wrapper, plot_metrics raises the following exception:
Traceback (most recent call last):
  File "synthetic_long_example.py", line 210, in <module>
    figures_dict = plot_metrics(metric_container_daily, date_container_daily, labels_=labels)
  File "../bhealth/visualisations.py", line 96, in plot_metrics
    for key in proportion:
TypeError: 'float' object is not iterable
To avoid the exception I have removed the optional argument from all the examples, but this needs to be investigated.
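A possible defensive fix, assuming `proportion` is a dict in the normal path but a bare float when csv_prep is set (the plot_metrics internals are assumed here, not taken from the source):

```python
def keys_of(proportion):
    """Return an iterable of keys whether `proportion` is a dict or a scalar."""
    if isinstance(proportion, dict):
        return list(proportion)    # previous behaviour: iterate dict keys
    return [proportion]            # scalar: wrap so the loop still works
```

The cleaner long-term fix is probably to make the upstream code return the same type in both cases, as suggested in the label_mappings issue below.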
We have been using NewLib, bHealth and digihealth as names for the library, modules and package. We need to unify all these references.
Classification models that consider the time domain sometimes need an additional dimension in the input. We need to think about whether this can be added as a transformation.
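As a sketch of what such a transformation could look like (a hypothetical transformer, not part of the library), reshaping flat feature vectors into the (samples, timesteps, features) layout that recurrent models expect:

```python
import numpy as np

class AddTimeDimension:
    """Reshape (n_samples, n_features) into (n_samples, timesteps,
    n_features // timesteps). Illustrative only; the class name and
    the even-split assumption are mine."""

    def __init__(self, timesteps):
        self.timesteps = timesteps

    def fit(self, X, y=None):
        return self  # stateless; fit kept for pipeline compatibility

    def transform(self, X):
        n, f = X.shape
        if f % self.timesteps != 0:
            raise ValueError("features must split evenly into timesteps")
        return X.reshape(n, self.timesteps, f // self.timesteps)
```

With fit/transform methods it would slot into a scikit-learn Pipeline like the existing transforms.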
Performance metrics like accuracy, Brier score and log-loss. It is not clear whether these should be part of this library, as it may be as simple as calling a function from scikit-learn.
For the documentation it would be nice if we added both a mathematical description of the function (where sensible), as well as a textual description.
Currently the transform functions (e.g. spectral entropy) contain no information as to what the function does.
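For instance, spectral entropy could carry both descriptions in one docstring. This is an illustrative sketch of the documentation style, not the library's implementation:

```python
import numpy as np

def spectral_entropy(x):
    """Normalised spectral entropy of a 1-D signal.

    Mathematically, H = -sum_k p_k log2(p_k) / log2(K), where p_k is the
    normalised periodogram power in frequency bin k and K is the number
    of bins. Textually: H is near 0 for a pure tone (power concentrated
    in one bin) and near 1 for white noise (power spread evenly).
    """
    psd = np.abs(np.fft.rfft(x)) ** 2
    n_bins = psd.size
    p = psd / psd.sum()
    p = p[p > 0]                                  # avoid log(0)
    return float(-(p * np.log2(p)).sum() / np.log2(n_bins))
```
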
The TSFEL: Time Series Feature Extraction Library for Python was released recently. This library may incorporate very useful feature extraction methods that we do not have currently.
Is it possible to incorporate TSFEL into some of our feature extraction functions? Maybe as a wrapper around TSFEL, or as examples of how this can be done.
Marília Barandas, Duarte Folgado, Letícia Fernandes, Sara Santos, Mariana Abreu, Patrícia Bota, Hui Liu, Tanja Schultz, Hugo Gamboa, TSFEL: Time Series Feature Extraction Library, SoftwareX, Volume 11, 2020, https://doi.org/10.1016/j.softx.2020.100456.
In accelerometer_example.py when using the provided example dataset:
bHealth/examples/accelerometer_example.py
Lines 51 to 58 in 858df1a
I get an out-of-bounds access error. The error might occur in transform.slide(), where windowed_raw is first assigned a value and then current_position moves forward:
Lines 402 to 411 in 858df1a
When running accelerometer_example.py I get the following error.
(base) rmc@gamma:~/NewLib/digihealth$ python examples/accelerometer_example.py
Found 4 house folders.
Found 4 experiment folders.
Running folder: 1
Running folder: 2
Running folder: 3
Running folder: 4
Window size of 10 seconds and overlap of 0.1%
Use number of mean crossings, spectral entropy as features...
index 69100 is out of bounds for axis 0 with size 69091
Traceback (most recent call last):
  File "examples/accelerometer_example.py", line 153, in <module>
    X, y = preprocess_X_y(ts, X, y)
  File "examples/accelerometer_example.py", line 66, in preprocess_X_y
    new_X = transform.feature_selection(new_X, new_y, 'uni')
  File "../digihealth/transforms.py", line 246, in feature_selection
    return X_new
UnboundLocalError: local variable 'X_new' referenced before assignment
It appears that the feature_selection method in transforms.py only handles two cases, 'l1' and 'tree', while the example file calls it with the value 'uni'.
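A possible shape for the missing 'uni' branch, assuming it is meant to mean univariate selection via scikit-learn (the existing 'l1'/'tree' branches are omitted here, and the parameter names are illustrative, not copied from transforms.py):

```python
from sklearn.feature_selection import SelectKBest, f_classif

def feature_selection(X, y, method, k=10):
    """Select features from X; only the hypothetical 'uni' branch is shown."""
    if method == 'uni':
        # Univariate selection: keep the k features with the highest
        # ANOVA F-score against the labels.
        selector = SelectKBest(f_classif, k=min(k, X.shape[1]))
        X_new = selector.fit_transform(X, y)
    else:
        # Raising here also fixes the UnboundLocalError for unknown methods.
        raise ValueError("unsupported method: %r" % method)
    return X_new
```

Raising on unknown method strings would turn the confusing UnboundLocalError into an explicit error message.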
We currently do not have any metrics implemented in this library.
Here is a list of metrics we currently use elsewhere:
Room Transfers - Daily average
Duration Outside - Daily average
Times Exited Home - Daily average
Typically Sleeps In - Daily
Sleep Efficiency - Daily average
Sleep Quality - Daily average
Main Sleep Length - Daily average
Total Sleep Length - Daily average
Walking - Hourly average
Sitting - Hourly average
Lying - Hourly average
Compliance (duration of wear)
Number of times bathroom visited during the night
Number of times kitchen visited during the night
Average speed walking - Daily average
Maximum speed of walking - Daily average
Speed of stand up/sit down - Daily average
Number of sit-to-stand transitions - Daily average
Number of times stairs used - Daily average
Speed travelling upstairs - Daily average
Time to go from room down stairs to upstairs - Daily average
Time to go from room upstairs to downstairs - Daily average
Number of times activities undertaken (e.g. cooking / cleaning) - Daily average
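As an illustration, the first metric in the list could be computed from timestamped room labels roughly like this (the input format and names are assumptions, not the library's schema):

```python
import pandas as pd

def daily_room_transfers(timestamps, rooms):
    """Daily count of room transfers from a sequence of room labels.

    timestamps: parseable datetimes; rooms: room label per timestamp.
    Returns a Series indexed by date with the number of label changes.
    """
    df = pd.DataFrame({'room': rooms}, index=pd.to_datetime(timestamps))
    # A transfer is any sample whose room differs from the previous one;
    # drop the first comparison (against NaN from the shift).
    transfers = (df['room'] != df['room'].shift()).iloc[1:]
    return transfers.groupby(transfers.index.date).sum()
```

Most of the daily-average metrics above reduce to a similar group-by-date pattern over labelled windows.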
Both functions label_mappings and label_mappings_localisation return different formats depending on whether a CSV was specified during the instantiation of a Wrapper. I would opt to always return the same format, converting it to the required format, if necessary, while exporting the CSV file. However, I am not familiar with the code, so I am not sure which format is the more generic one, nor what implications such a change may have.
@mkoz71 could you help me with this decision?
Also, this may be related to the issue #13
bHealth/bhealth/metric_wrappers.py
Line 531 in df5dadb
Currently in the example scripts we have function names such as metr.average_labels_per_window(labs, times).
While these abstractions make a lot of sense from a design perspective, I don't think they are clear to a human user of the library.
Can we create 'human friendly' wrapper functions, such as get_duration_walking(data, period='daily', level='hours'), for each of the metrics?
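A minimal sketch of the idea, with an assumed integer label code and hypothetical helper names (the real wrappers would delegate to the existing metric abstractions):

```python
WALKING = 1  # assumed label code for the walking activity

def duration_of_label(labels, durations, target):
    """Total time (in the units of `durations`) spent in `target` label."""
    return sum(d for lab, d in zip(labels, durations) if lab == target)

def get_duration_walking(labels, durations):
    """Human-friendly wrapper: total time spent walking."""
    return duration_of_label(labels, durations, WALKING)
```

One generic helper plus one thin, well-named wrapper per metric would keep the design clean while giving users readable entry points.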
In the test examples the sliding window is done externally in the test code. This should be incorporated inside the transform class.
It would be great if we could add documentation using the Sphinx framework from the beginning. In that way it would be easier to maintain in the future. Some useful links:
Hi,
I think we need to identify and use some public datasets for activity recognition and localization.
Ideally these datasets would be multi-day datasets, so we can easily calculate daily average metrics from them. Even better if they were multi-week datasets, so we can calculate weekly metrics from them, and so on.
Alternatively, we could use a dataset that doesn't meet these criteria and modify the timestamps to simulate multi-day/multi-week datasets for testing. For example, using https://www.nature.com/articles/sdata2018168
What do you think?
When extracting features in the code below:
bHealth/examples/accelerometer_example.py
Lines 51 to 65 in 96ce4b8
There seems to be a misalignment between windowed_raw (for X) and windowed_raw_labels (for y): after applying transform.slide(X) in line 52, transform.current_position is shifted forward by the stride value. Then, when transform.slide(y) is applied in line 63, it uses the new transform.current_position, which does not match where X was extracted from.
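One way to avoid the shared-cursor problem is to window X and y together from identical start positions. A standalone sketch of that approach (not the library's Transform class):

```python
import numpy as np

def slide_pairs(X, y, window, stride):
    """Window X and y in lockstep so each pair comes from the same span.

    No shared cursor: each iteration computes its own start index, so
    windowing X cannot shift where y is extracted from.
    """
    Xw, yw = [], []
    for start in range(0, len(X) - window + 1, stride):
        Xw.append(X[start:start + window])
        yw.append(y[start:start + window])
    return np.array(Xw), np.array(yw)
```

Alternatively, transform.slide() could take the position as an argument instead of mutating current_position as a side effect.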
Often in interventions it is useful to know the compliance rate of a patient with regard to the wearable. For example, over a two week period, they were wearing the wearable for 95% of the time.
Further, this compliance value can be calculated daily, and used to exclude days where the compliance value is below a threshold.
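A sketch of such a daily compliance calculation, assuming a boolean worn/not-worn mask (e.g. from a non-wear detector) and a fixed sampling rate (the function names and the trailing-partial-day handling are my assumptions):

```python
import numpy as np

def daily_compliance(worn_mask, samples_per_day):
    """Fraction of samples flagged as worn, per full day of data."""
    days = len(worn_mask) // samples_per_day
    mask = np.asarray(worn_mask[:days * samples_per_day], dtype=float)
    return mask.reshape(days, samples_per_day).mean(axis=1)

def compliant_days(worn_mask, samples_per_day, threshold=0.95):
    """Boolean per day: did compliance meet the threshold?"""
    return daily_compliance(worn_mask, samples_per_day) >= threshold
```

Days falling below the threshold could then be excluded before computing the daily-average metrics listed above.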
I think it would be useful if we had report generation functionality. That is, a method of wrapping up all the results (data quality, metrics, visualizations etc.) into a nice HTML or PDF report.
If anyone has any examples it would be great to post them below. Otherwise I can try and sketch out what I have in mind.
Add an example using an RNN or LSTM for training, and see if the library needs to be modified.
The localisation and activity examples use different formats of descriptor_map.
e.g. the localisation example uses arrays of integers denoting the labels:
bHealth/examples/localisation_example.py
Lines 100 to 105 in 73bd95f
While the accelerometer example uses bare integers:
bHealth/examples/accelerometer_example.py
Lines 114 to 121 in 73bd95f
What is the reasoning behind this, and which method should we choose as the standard? A benefit of the list format is that it can group labels; for example, "upstairs" could be associated with [0, 1, 2], and likewise "sedentary".
I am not sure whether this modification has collateral implications.
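If the list format were chosen as the standard, both existing styles could still be accepted by normalising bare integers into one-element lists. A sketch, with an assumed key/value layout for descriptor_map:

```python
def normalise_descriptor_map(descriptor_map):
    """Accept both styles: {'label': [0, 1]} and {'label': 3}.

    Bare integers are wrapped into one-element lists so downstream code
    only ever sees the list format.
    """
    return {label: codes if isinstance(codes, list) else [codes]
            for label, codes in descriptor_map.items()}
```

This would let both example scripts keep their current maps while the library handles a single internal representation.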
We should consider what to do with datasets that have NaN values, as the current transformations do not accept them.
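One simple policy would be to drop any window containing a NaN before the transforms run; interpolation or scikit-learn's SimpleImputer are alternatives with different trade-offs. A sketch of the dropping policy:

```python
import numpy as np

def drop_nan_windows(windows):
    """Remove any window (first axis) that contains at least one NaN."""
    windows = np.asarray(windows, dtype=float)
    # Collapse all axes except the first, keeping only fully finite windows
    keep = ~np.isnan(windows).any(axis=tuple(range(1, windows.ndim)))
    return windows[keep]
```

Which policy is right probably depends on the modality: dropping is safe for dense accelerometer data, while sparse RSS data may warrant imputation instead.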