The psyphr from wendtke

Extra materials

What is the best location (in repo or out) for extra materials like the templates and example data from MindWare and the BIOPAC editing steps that one lab shared? Do we need a cloud folder, @iqis ?

Update description file (authors, contributors, funders, acknowledgements)

See #48 and #11

See here

Navigating authorship and contributions (from discussion with GBA)

Amanda, Audrey, rOpenSci peer-reviewers as possible contributors; acknowledge in README and Wiki
Mallory likely as third author
Brooke as either author or contributor -- TBD
NSF GRFP (and other funders) in description section (acknowledgement and disclosure)

"This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. 006784-00002 [to KEW]. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation."

unconf as original spot
acknowledgements, attribution, and/or disclaimers for and from supported companies (MW; BIOPAC) and maybe CSU. Ask them what they would like to include! Their legal teams might need to be in the loop, especially IF we use proprietary code from one of the supported companies; how does this influence the license (#48) selection?

Add BSA compatibility

Leftover from #32, because BSA files have unstable format, and need to be treated specially

Follow {tidyverse} principles

In addition to reading

consider (@geanders suggestions):

1. input/output data in same format (allows functions to retain order)
2. common prefix to function name (psyphr_read_wb())
3. check tidy eval book on how to manage column-naming conventions within mutate to allow users to bring in non-MindWare data
4. create unique geom
5. check lubridate or tidyr for examples of maintaining consistency across functions and packages (e.g., verbose = TRUE option within function)
6. create umbrella package with modular packages within it to wrangle raw and output, visualize, analyze (see #52)

Generics for `psyphr_workbook`

For every data format:

print.psyphr_workbook
summary.psyphr_workbook
plot.psyphr_workbook
...
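As a rough illustration of the generics listed above, the following is a minimal sketch that assumes a `psyphr_workbook` is a named list of data frames, one per worksheet. That layout and the constructor name `new_psyphr_workbook()` are hypothetical and not taken from the package.

```r
# Hypothetical constructor: a workbook is a named list of data frames,
# one element per worksheet (an assumption, not the package's actual layout).
new_psyphr_workbook <- function(sheets) {
  stopifnot(is.list(sheets), !is.null(names(sheets)))
  structure(sheets, class = "psyphr_workbook")
}

# print method: one line per sheet with its dimensions
print.psyphr_workbook <- function(x, ...) {
  cat("<psyphr_workbook> with", length(x), "sheet(s)\n")
  for (nm in names(x)) {
    cat("  ", nm, ": ", nrow(x[[nm]]), " rows x ", ncol(x[[nm]]), " cols\n",
        sep = "")
  }
  invisible(x)
}

# summary method: a data frame of sheet names and dimensions
summary.psyphr_workbook <- function(object, ...) {
  data.frame(
    sheet = names(object),
    rows  = unname(vapply(object, nrow, integer(1))),
    cols  = unname(vapply(object, ncol, integer(1))),
    stringsAsFactors = FALSE
  )
}

# Made-up example data
wb <- new_psyphr_workbook(list(
  "HRV Stats" = data.frame(a = 1:3, b = 4:6),
  "Settings"  = data.frame(key = "Segment Time", value = "60")
))
print(wb)
summary(wb)
```

Returning the object invisibly from `print()` follows the usual S3 convention, so the method can be used in a pipeline without printing twice.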

Common Data Quality Expectations

Hi @wendtke, as we discussed on the phone, moving some of the data QA work into the package seems worthwhile.

Could you please make a short list of common expectations, starting with those we've talked about?

For example:

HRV Stats : Respiration Peak Frequency is expected to be within the range of Settings : HF/RSA Frequency Band
HRV Stats : Segment Duration is expected to be the same for every segment and equal to Settings : Segment Time
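The two expectations above could be expressed as small vectorised checks. This is only a sketch: the function and argument names are hypothetical, the values are made up, and it assumes the frequency band is available as a numeric pair of lower and upper bounds.

```r
# Flag segments whose respiration peak frequency lies inside the HF/RSA band.
# `band` is assumed to be c(lower, upper) in the same unit as the frequencies.
check_resp_in_band <- function(resp_peak_freq, band) {
  stopifnot(is.numeric(resp_peak_freq), is.numeric(band), length(band) == 2)
  resp_peak_freq >= band[1] & resp_peak_freq <= band[2]
}

# Flag segments whose duration equals the Segment Time setting
# (within a small numerical tolerance).
check_segment_duration <- function(segment_duration, segment_time, tol = 1e-8) {
  stopifnot(is.numeric(segment_duration), is.numeric(segment_time))
  abs(segment_duration - segment_time) <= tol
}

# Made-up values for illustration
check_resp_in_band(c(0.15, 0.25, 0.45), band = c(0.12, 0.40))
check_segment_duration(c(60, 60, 42), segment_time = 60)
```

Both functions return one logical value per segment, so the result can be attached to the stats sheet as a flag column instead of silently dropping rows.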

Check file before reading

check:

Whether the file is indeed from the specified vendor (MindWare)
Whether the file is indeed the specified type ("HRV")
- auto guess file type?

Visualization Schemes

We would need some examples of commonly applicable visualizations, within a subject or across a study.

Implement GBA code suggestions

@iqis I met with @geanders today. She recommended making the following changes:

1. restructure expected study directory to separate folders (e.g., subject_1/task_1 and task_2 and task_....) rather than file name with subject_task
2. Use importFrom magrittr %>% in roxygen notes to specify pipe
3. Don’t put suppressWarnings() within function; instead, use purrr::quietly or purrr::safely
4. For exported functions, put an example in roxygen notes (deferred, pending better sample data)
5. Define precise parameters (e.g., a character string that gives path to... rather than path only)
6. user-friendly output of print(psyphr_workbook); see #43

Data Profiling Visualizations

Implement quick visualization:

using ggplot2, returning a modifiable ggplot object
easily adjustable parameters
meaningful and pretty defaults on color scheme and other design languages

Sample code from @wendtke at https://drive.google.com/drive/u/0/folders/1fvFlD5CT1ZP2Bgm4eaF9ERtNgg7WxBxN
See also #14

data munging example

brief description of data munging process (from start to end with data examples)

@iqis do you still want/need this? You mentioned it in the phone call.

Compare MW output

compare output data across all MindWare applications; create example data from demo software

Can transform_* family be used for all data types/applications (e.g., HRV, EDA)?

stats (sheet 1)
editing stats (sheet N)
settings (always last sheet)

Issues #3, #4

TIMELINE: rOpenSci > CRAN > journal

See #15 for rOpenSci info
See #45 for author discussion

Proposed dissemination timeline

rOpenSci: data munging

scope presubmission

CRAN

Commitment of maintainer (one person; can transfer responsibility) depends on package (complexity, reach, and versions)
1-2 emails/month about bugs or feature requests
Biggest thing: Keep up with tidyverse and system application versions

Journal: Compare JOSS vs. The R Journal vs. content area journal (e.g., Psychophysiology)

Study Scheme: Rename/restructure files

add a function to rename/restructure a collection of files to a specified naming convention that is used in psyphr.

Helper function to access sample data

For exported functions, put examples in roxygen notes

deferred pending more stable API and better sample data

Originally posted by @iqis in #55 (comment)

We want psyphr to work on a normal laptop, which nowadays has somewhere between 4 and 12 GB of usable memory, and R normally should not use more than half of the total memory. Currently read_study() reads everything at once. A really big study can create a problem.

If the problem exists, there are at least two ways to mitigate the problem:

Construct a promise in lieu of reading in the data; the data is read from disk as needed.
Read the study and cache the resulting R object onto disk incrementally.
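The first option can be sketched with base R's `delayedAssign()`, which creates a promise that is evaluated only on first access. This is an illustration under assumptions, not how `read_study()` works: `lazy_study()`, `add_lazy_workbook()`, and `fake_reader()` are hypothetical names, and the fake reader stands in for whatever function actually parses a workbook.

```r
# Register one promise per workbook path; `reader` is a stand-in for the
# real parsing function and is only called when an entry is first accessed.
add_lazy_workbook <- function(env, path, reader) {
  force(path)
  force(reader)
  delayedAssign(path, reader(path),
                eval.env = environment(), assign.env = env)
  invisible(env)
}

# Build an environment holding one lazy entry per file path
lazy_study <- function(paths, reader) {
  env <- new.env(parent = emptyenv())
  for (p in paths) add_lazy_workbook(env, p, reader)
  env
}

# Demonstration with a fake reader that counts how often it is called
counter <- new.env()
counter$n <- 0
fake_reader <- function(path) {
  counter$n <- counter$n + 1
  paste("data from", path)
}

study <- lazy_study(c("s1/rest.xlsx", "s2/rest.xlsx"), fake_reader)
counter$n                  # no file has been read at this point
study[["s1/rest.xlsx"]]    # first access triggers the read
counter$n
```

Once a promise has been forced, its value stays in memory, so this approach delays reading but does not by itself cap memory use. The second option (caching to disk with `saveRDS()`) would be needed for that.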

What is the likely total size of a study? I'm looking for a figure at about the 80th percentile, and I surely hope it will be small enough.

Best way to rename/recreate repo

@iqis @ajmcoqui @almccombs

I would like to change the repo/package name to psyphyr. Do you think the best approach to this would be to (eventually) recreate a new repo, transfer the content and collaborators, and delete psyr?

Any suggestions would be helpful. This is not a vital change at the moment, but I figured changing the name earlier (before making the repo public or trying to publish the package) would be better.

consolidate read_MW_*() family functions

read_MW() ->
validate data format ->
dispatch corresponding parsing function

Automatically detect and parse workbook format, using:

Unique names of the worksheets
Unique fields in the Settings worksheet.
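The validate-then-dispatch flow could look roughly like the sketch below. It assumes the workbook has already been read into a named list of data frames (the sheet names could come from `readxl::excel_sheets()`). "HRV Stats" and "SCR Stats" are sheet names mentioned in these issues, but the signature table as a whole, the function names, and the placeholder parsers are assumptions.

```r
# Signature worksheets per format; this mapping is an assumption
# for illustration and would need to cover all supported formats.
mw_signatures <- list(
  HRV = "HRV Stats",
  EDA = "SCR Stats"
)

# Guess the workbook format from its worksheet names;
# stop if no format or more than one format matches.
detect_mw_format <- function(sheet_names) {
  hits <- vapply(mw_signatures,
                 function(sig) all(sig %in% sheet_names),
                 logical(1))
  if (sum(hits) != 1) {
    stop("Cannot determine a unique MindWare format from the sheet names.")
  }
  names(mw_signatures)[hits]
}

# Dispatch to a format-specific parser (the parsers are placeholders here)
read_MW_sketch <- function(sheets) {
  format <- detect_mw_format(names(sheets))
  parser <- switch(format,
    HRV = function(s) s,  # placeholder for an HRV parser
    EDA = function(s) s   # placeholder for an EDA parser
  )
  wb <- parser(sheets)
  attr(wb, "format") <- format  # record the detected format on the object
  wb
}

# Made-up workbook content
wb <- read_MW_sketch(list(
  "HRV Stats" = data.frame(x = 1),
  "Settings"  = data.frame(key = "Segment Time")
))
attr(wb, "format")
```

Failing loudly when zero or several signatures match keeps an unsupported or ambiguous file from being parsed with the wrong function.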

README Page

create a README page with:

Package name, Badges

Brief Introduction

Installation instructions

Minimal working code example (pick one use case)

TODO

License

Supporting BIOPAC output

Follow up with BIOPAC reps (Tim Cook) about possibility of supporting BIOPAC output

Survey questions

Read Issue #19 first for more ideas.

Some ideas for questions. There are a lot, so we will probably have to cut some.

What kind of data collection system do you use? [multiple choice + other?]
If you use MindWare, which analysis applications do you use? [MC]
Which physiological measures do you include in your research? [MC?]
Please describe your study design(s) in terms of number of participants and laboratory procedure sequence (e.g., 200 healthy adults; physiological baseline period of X minutes, challenge period of Y minutes, and recovery period of Z minutes).
What is the length of your segments/bins/epochs? [MC - 30s, 1m, etc.]
What is your file directory structure for data editing and compilation?
What are your file naming conventions (e.g., subjectID_task)?
What is your desired output data structure?
Do you employ data filters based on empirical guidelines? If so, which ones? [MC + open]
-- Segment length 30s+ for valid RSA [HRV]
-- Drop segments >10% of estimated R-peaks [HRV]
-- Exclude segments outside expected range of respiratory peak frequency [HRV]
What common visualization and exploratory data analysis techniques do you employ? (e.g., time series of RSA; average RSA per time period - baseline, challenge, recovery)

Add MW workbook format name as an attribute in workbook object

psyphr Discord channel

Discord is a very popular free software for group audio chat, available on desktop and mobile. Let's give it a try!

The following link is to our channel on Discord:
https://discord.gg/swvHChq

add HRV compatibility

MW BioLab Epoch File?

An Epoch File contains the metadata of a subject's activity period; manually tagged? How to integrate with measurement data?

incorporate other MW formats

Workbook formats:

BPV
EMG
Startle EMG : @wendtke Are you familiar with this type? In the sample data there is no information on "Right Eye". Is this expected?
IMP
BSA: Unstable format, need a closer look

Evaluate BIDS Schema

Tom Johnston on Twitter suggested "BIDS", Brain Imaging Data Structure here.

Promotes creation of portable, open analysis pipelines & software.
spec on physiological data

Goals:

find useful fields for psychophysiology in BIDS schema,
explore higher level compatibility

Python API:

Validation Tool:

BIDS Validator

Cast data into correct type

Currently all data are read in verbatim as "character".
Make a parse_MW_() function family to address all kinds, then call from read_MW_() family

~~Use dplyr::mutate_*() family.~~
Keep categorical variables as "character" or press into "factor"? @wendtke This also raises another question: what are the possible levels of a factor? e.g. SCR Type in SCR Stats from EDA databook.
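One possible shape for a `parse_MW_*()` helper, sketched with base R's `type.convert()`. The function name `parse_sheet()`, the column names, the values, and the choice of missing-value strings are all made up for illustration.

```r
# Convert every character column to its most plausible type. Columns that
# do not look numeric or logical stay character (as.is = TRUE avoids factors).
# `na_strings` lists the markers to treat as missing (an assumption here).
parse_sheet <- function(df, na_strings = c("NA", "N/A", "")) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) {
      utils::type.convert(col, na.strings = na_strings, as.is = TRUE)
    } else {
      col
    }
  })
  df
}

# Made-up sheet in which everything was read in as character
raw <- data.frame(
  segment  = c("1", "2", "3"),
  rsa      = c("6.12", "5.87", "N/A"),
  scr_type = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

parsed <- parse_sheet(raw)
str(parsed)
```

Categorical columns are left as character in this sketch; they could be turned into factors later, once the set of possible levels is known.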

Submission to rOpenSci?

Is submitting to rOpenSci a good idea? The platform has hosted many scientific packages through a rigorous peer-review process, example here. The dev guide is very thorough and helpful.

psyphr can fall into the data munging category.

`assertthat` for program-level evaluation

Use assertthat for condition checking in functions, replacing if() or stopifnot()

Resources for Git, GitHub, pkg dvlpt, etc.

R packages

Happy Git

tidyverse style

Review survey responses & reactions

See #31 for original formulation of survey.

@MalloryJfeldman and @wendtke shared survey via email (departments; colleagues) and Twitter.

Discuss findings from survey responses
Follow up as needed with businesses (MindWare, BIOPAC, etc.) and recommended organizations (e.g., Brain Imaging Data Structure standards; see GitHub repo)

MindWare info/input

I have a video chat with a MW representative on Tuesday, June 4. I had to reschedule from a few weeks back.

I will ask about

MindWare's file naming conventions for output data
Best approach for end-user to set study schema (number of subjects, tasks, and files)
MindWare's structure for the other output data files per analysis application (i.e., are all of them structured similarly to EDA and HRV? can we have a sample of each output type for our package development and testing?)
quality control criteria for Electrodermal Activity (I asked them this in the past, and they were not that helpful. I am re-reading the EDA chapter from the Handbook of Psychophysiology.)
common visualization needs for end-user (I have gotten some insight on this from my recent analyses of respiratory sinus arrhythmia for a poster.)

@iqis Do you have any other questions for MindWare?

(Suspended) Add study/subject/activity information to workbook objs

Per Discord conversation @iqis @wendtke 20190625:

A workbook generally has three ID dimensions:

What subject(participant)...
doing what activity....
in what study.
... and may potentially be more.

This information shall be inferred from folder/name structure. See: #21.
This information is key to downstream analysis.
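A minimal sketch of inferring the three IDs from a file path, assuming a layout of the form `<study>/<subject>/<activity>.<ext>` with forward slashes. The layout, the function name `ids_from_path()`, and the example path are assumptions for illustration.

```r
# Derive study, subject, and activity from the last three path components.
# Assumes the layout <study>/<subject>/<activity>.<ext>.
ids_from_path <- function(path) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  n <- length(parts)
  stopifnot(n >= 3)
  list(
    study    = parts[n - 2],
    subject  = parts[n - 1],
    activity = tools::file_path_sans_ext(parts[n])
  )
}

# Made-up path
ids_from_path("my_study/subject_1/task_1.xlsx")
```

On Windows, paths would first need their backslashes converted, for example with `normalizePath(path, winslash = "/")`.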

Issue Suspension:

It is already possible to identify workbooks through read_study(), with the mechanism proposed here. Is it necessary to repeat this on individual workbooks?

When parsing workbook, check for "Interval Stats" sheet

The "Interval Stats" sheet is an optional sheet appearing in BPV, HRV and EDA.

Design and Implement a _Study_ Object

... composed of many psyphr_workbook objects, with subject/activity identification inferred from file structure of workbooks (See #21).

Able to:

Generate high-level summaries/visualization across subjects/activities
Output all data in desired format to the file system
- save_study()
~~Helper function to saveRDS() (?)~~

S3 object, class name: psyphr_study

generics:

print.psyphr_study()
summary.psyphr_study()
...

~~Use a control file in YAML or DCF.~~

Collect dummy data for use in munging/viz/analysis examples, vignettes, tutorials

Create example data (or ask @MalloryJfeldman for samples) for end-user practice. We can check readxl repo for example data placement within package. For example,

/data could house clean dummy data in .Rdata format
/inst/extdata subfolder could contain raw data for practice manipulation

@geanders referred to a function to give path name for user to pull data; I assume it is file.path()?
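For locating files shipped in a package's `inst/extdata` folder, base R provides `system.file()`, and readxl wraps it in a helper called `readxl_example()`. A comparable helper might look like the sketch below; the name `psyphr_example()` and its arguments are hypothetical.

```r
# List or locate example files in a package's inst/extdata folder.
# With no `file`, return the available file names; otherwise return the
# full path and fail if the file does not exist.
psyphr_example <- function(file = NULL, package = "psyphr") {
  if (is.null(file)) {
    return(dir(system.file("extdata", package = package)))
  }
  system.file("extdata", file, package = package, mustWork = TRUE)
}
```

With the package installed, `psyphr_example()` would list the bundled files, and passing one of those names would return a path that can be handed to a reader function.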

Data Profile Report Template

Quickly generate high-level overview of the data in PDF form. On individual workbook and study levels.
Use RMarkdown

Mechanism to drop (HRV) segments shorter than 30 seconds

See: #19
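Such a mechanism could be as simple as a row filter. The sketch below assumes the stats sheet is a data frame with a numeric duration column in seconds; the function name, the column name `segment_duration`, and the values are hypothetical.

```r
# Keep only segments at least `min_seconds` long.
# Rows with a missing duration are dropped as well.
drop_short_segments <- function(stats,
                                duration_col = "segment_duration",
                                min_seconds = 30) {
  stopifnot(is.data.frame(stats), duration_col %in% names(stats))
  duration <- stats[[duration_col]]
  keep <- !is.na(duration) & duration >= min_seconds
  stats[keep, , drop = FALSE]
}

# Made-up HRV stats
hrv <- data.frame(
  segment          = 1:4,
  segment_duration = c(60, 29.9, 30, NA)
)
drop_short_segments(hrv)
```

Exposing `min_seconds` as an argument leaves room for labs that apply a different threshold.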

Downstream analyses: Common approaches and use cases

Right now, as I'm trying to figure out the best approach, I need to know some common characteristics of downstream analyses. Some detailed use cases will help. For example, what are some frequently used statistical models? Is modeling usually done for each and every subject, or across some kind of summation of a group?

Originally posted by @iqis in #58 (comment)

Can transform_editing_sheet be used for both HRV and EDA?

Similar to #3, will transform_editing_sheet work for both HRV and EDA? Currently, one file exists.

Consider existing resources

Look at this and other related resources for ideas -- of what NOT to do. This kind of package is to clean raw heart rate variability data, not to wrangle existing data.

psyphr is different than what this project offers.

Background on MindWare Technologies

MindWare Technologies, Ohio sells 6 analysis applications that provide output data we are interested in wrangling. These include Basic Signal Analysis (BSA), Blood Pressure Variability (BPV) Analysis, Electrodermal Activity (EDA) Analysis, Electromyography (EMG) Analysis, Heart Rate Variability (HRV) Analysis, and Impedance Cardiography (IMP) Analysis. So far, I am only familiar with EDA and HRV. Eventually, I would like psyphr to wrangle data from all MindWare analysis applications and then move to add options for data from BIOPAC Research Solutions.

Here is some more information on EDA and HRV.
EDA Analysis 3.2 Manual
HRV Analysis 3.2 Manual

Aside: BioLab is the data acquisition software, which provides the raw data files for the analysis applications. The analysis applications then export the edited output data for compilation, analysis, and visualization.

Interesting tidbit: Years ago, MindWare had its own proprietary study compilation tool for use across analysis applications. They do not offer it to clients anymore, but maybe there is content in the manual that might inform our approach in managing the file naming problem or other things. It looks like they required users to enter subject ID, etc.

License

Update #11 and #45 with final decision

End-user license (MIT vs. GPL)
"Worst" case scenario: company takes psyphr, puts GUI on it, and sells it (could be with or without attribution). Let's talk through the scenario with each license and consider if we are comfortable with which/either outcome.
Can we change license after making repo public or submitting/publishing?
GBA 20190714 suggested not to do so:

Yes, I think you should be able to change licenses down the line, with all coauthors’ agreement. I’d try not to too often, though—if people ever use it within other things they make, I think a change from MIT to GPL might affect what they can do (if they’re creating under a license that isn’t open source). The general consideration, when you are maintaining a package, is to try to limit the changes you make that could break a lot of things “downstream” for people who might be building off your package. This, of course, is only a big issue when the package has a lot of users, which isn’t the case for plenty of packages (although I think yours could get a lot of downstream development, where people are using your package as a dependency in their own package). But it’s not the end of the world if you change your license later, I think.

Resources from GBA
R Packages
Understanding Open Source and Free Software Licensing

Set up continuous deployment (integration + delivery)

Continuous integration is a service that automatically checks for errors in your code each time a new commit is pushed to GitHub. A badge can be displayed showing whether the build passes the tests. At the current stage, it is most likely that our code will fail the CI's stringent standards. But don't be discouraged.

Before we can implement free CI service, our repo needs to be open.

Set up:

TravisCI
AppVeyor

Hi!

Introducing myself again! My name is Siqi Zhang. I've been using R since 2012 and am a freelance R developer. Rather than using the language for analysis, my edge is sharper on the language itself. It is my pleasure to be on board this open source project.

I'm excited that you've already made very substantial progress. I think when we're ready to take it further, we should exchange opinion on each other's thoughts and situations. Hit me up at [email protected].

In the meantime, I'm going to branch it off and start poking around. Looking forward to hearing from you!

Can transform_*_stats be used for both HRV and EDA?

Or are separate functions needed? Right now, they are separate (transform_eda_stats.R and transform_hrv_stats.R) but contain the same content.
