erikbjare / thesis

MSc thesis on: Classifying brain activity using EEG and automated time tracking of computer use (using ActivityWatch)

Home Page: https://erik.bjareholt.com/thesis/

Makefile 1.83% Python 54.90% Jupyter Notebook 43.27%
openbci activitywatch thesis muse msc-thesis eeg research neurosity machine-learning

MSc Thesis

My MSc thesis on "Classifying brain activity using electroencephalography and automated time tracking of computer use".

Progress was tracked using GitHub issues and the GitHub Projects board.

Abstract

We investigate the ability of EEG to distinguish between different activities users engage in on their devices, building on previous research which showed a considerable difference in brain activity between code- and prose-comprehension, as well as differences during code- and prose-synthesis. We perform a replication study and improve upon past results using state-of-the-art machine learning classifiers based on Riemannian geometry.

Furthermore, we extend the scope of previous work by introducing the automated time tracking application ActivityWatch, to track the device activities that the user is engaging in. This lets us label EEG data with naturalistic device activity, which we then use to train classifiers to discern activities such as code writing vs prose writing, or work vs media consumption. Our results indicate that a consumer-grade EEG device can discern between different activities that a user performs at the computer. Among other results, we show that not only can code and prose comprehension be distinguished, but also code and prose writing.

Writing

The latest version of the writing can be downloaded at:

Usage

Setting it up:

  • Ensure you have Python 3.7+ and poetry installed
  • Install dependencies with poetry install

Collecting data:

  • Run eegwatch --help for instructions on how to collect EEG data
  • Run ActivityWatch to collect device activity data
  • Run the codeprose task in eeg-notebooks to collect data for the code vs prose task
    • Install eeg-notebooks with pip install git+
    • Run the codeprose task with eegnb runexp -ex visual-codeprose -subject X

Running classifier:

  • Run ./scripts/query_aw.py to collect labels from the running ActivityWatch instance
    • You probably want to adjust the categorization rules embedded in the file
  • (TODO) Run eegclassify --help for instructions on how to train and run the classifier
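Categorization rules of the kind mentioned above could look roughly like the following. This is a hypothetical sketch, not the contents of query_aw.py: the category names, the regexes, and the `categorize` helper are all assumptions, and the event dicts only mimic the `app` and `title` fields that ActivityWatch window events carry.

```python
import re

# Hypothetical rules: (category, regex matched against "<app> <title>").
# The first matching rule wins, so put more specific rules first.
RULES = [
    ("Programming", re.compile(r"vim|VS ?Code|\.py\b|GitHub", re.IGNORECASE)),
    ("Writing", re.compile(r"Overleaf|\.tex\b|Google Docs", re.IGNORECASE)),
    ("Media", re.compile(r"YouTube|Netflix|Spotify", re.IGNORECASE)),
]


def categorize(event: dict) -> str:
    """Return the first category whose regex matches the event, else 'Uncategorized'."""
    data = event.get("data", {})
    text = f"{data.get('app', '')} {data.get('title', '')}"
    for category, pattern in RULES:
        if pattern.search(text):
            return category
    return "Uncategorized"


# Made-up events for illustration only.
events = [
    {"data": {"app": "Alacritty", "title": "vim main.py"}},
    {"data": {"app": "Firefox", "title": "Lofi beats - YouTube"}},
    {"data": {"app": "Nautilus", "title": "Home"}},
]
labels = [categorize(e) for e in events]
print(labels)
```

Keeping the rules as an ordered list makes it easy to adjust them per user without touching the matching logic.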

Devices

I've worked with multiple devices, but the experiments were performed using the Muse S, which is therefore the best-supported device.

  • Muse S
    • PPG support (experimental)
  • Neurosity Notion 1 & 2
  • Neurosity Crown
  • OpenBCI Cyton
  • In theory: any device supported by Brainflow or muse-lsl

Notebooks

Code notebooks are built in CI and available at:

  • Main - primary notebook for the thesis, where we train a classifier for the code vs prose comprehension task.
  • Signal - for signal filtering and quality checking.
  • Activity - for classification of device activities.
  • PPG - for a basic PPG analysis.

Acknowledgements

See the Acknowledgements section in the thesis.

thesis's Issues

Investigate classification tasks/datasets

We'll investigate previous approaches to classify EEG data that is similar to our task at hand.

See also #17.

Classifying tasks

Likely the type of classification most similar to ours.

Synchronized Brainwave Dataset (2015)

Dataset on Kaggle: https://www.kaggle.com/berkeley-biosense/synchronized-brainwave-dataset

Stimuli:

They both follow the same process:

  • Blinking
  • Relax (closed eyes, focus on breathing)
  • Arithmetic
  • Relax with music
  • Video clip
  • Come up with examples from category
  • Count squares of chosen color

A popular subset of the stimuli is relaxation vs math:

Reading prose vs code (Fucci et al)

No publicly available dataset. Ask Fucci?

Classifying sleep stages

Shares some similarities (long recordings, "organic" data).

See the excellent YASA: https://github.com/raphaelvallat/yasa

Classifying emotion

Might be somewhat similar. Often uses 1-minute clips of happy/sad movie scenes as stimuli.

Sometimes split into arousal/valence.

Classifying mental states (focus etc)

Classifying things like focus is sometimes considered a simpler task where acceptable classification can be achieved with a simple power band ratio.
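A power band ratio can be sketched in a few lines. The issue does not say which ratio is meant, so the theta/beta ratio used here, the band edges (4-8 Hz and 13-30 Hz), and the 256 Hz sampling rate are assumptions chosen for illustration. The signals are synthetic sinusoids, not EEG.

```python
import numpy as np


def band_power(signal, fs, low, high):
    """Sum of periodogram power in the band [low, high) Hz."""
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    mask = (freqs >= low) & (freqs < high)
    return float(power[mask].sum())


def theta_beta_ratio(signal, fs):
    """Theta (4-8 Hz) power divided by beta (13-30 Hz) power."""
    return band_power(signal, fs, 4, 8) / band_power(signal, fs, 13, 30)


fs = 256  # Hz, assumed sampling rate
t = np.arange(4 * fs) / fs  # 4 seconds of samples

# Synthetic test signals: one dominated by 6 Hz, one dominated by 20 Hz.
theta_heavy = 2.0 * np.sin(2 * np.pi * 6 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)
beta_heavy = 0.5 * np.sin(2 * np.pi * 6 * t) + 2.0 * np.sin(2 * np.pi * 20 * t)

ratio_theta_heavy = theta_beta_ratio(theta_heavy, fs)
ratio_beta_heavy = theta_beta_ratio(beta_heavy, fs)
print(ratio_theta_heavy, ratio_beta_heavy)
```

A classifier built on this would simply threshold the ratio per window. On real recordings a smoothed spectral estimate (for example Welch's method) would likely be more stable than a raw periodogram.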

EEG data for Mental Attention State Detection (focused, unfocused, drowsy)

Dataset on Kaggle (MATLAB files): https://www.kaggle.com/inancigdem/eeg-data-for-mental-attention-state-detection

Implement code vs prose comprehension task

Issue in eeg-notebooks: NeuroTechX/EEG-ExPy#70

Collect PPG data from Muse S

  • Basic support
  • Test resilience during longer recordings
    • There seem to be some issues, see #11.
  • Convert PPG data to actionable features (such as HR, HRV).
    • Unclear how to do this from the PPG1, PPG2, PPG3 columns in CSV.
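One possible route from a PPG waveform to HR and HRV is peak detection followed by inter-beat intervals. The sketch below assumes that a single PPG column carries a usable pulse waveform and that it is sampled at 64 Hz; both are assumptions, and which of PPG1, PPG2, PPG3 to use remains open. The peak detector is deliberately naive, and the input is a synthetic sine rather than real data.

```python
import numpy as np


def detect_peaks(x, min_distance):
    """Indices of local maxima above the mean, at least min_distance samples apart."""
    x = x - x.mean()
    mid = x[1:-1]
    candidates = np.flatnonzero((mid > x[:-2]) & (mid >= x[2:]) & (mid > 0)) + 1
    peaks = []
    for idx in candidates:
        if not peaks or idx - peaks[-1] >= min_distance:
            peaks.append(int(idx))
    return np.array(peaks)


def heart_rate_features(ppg, fs):
    """Return (mean heart rate in bpm, RMSSD in ms) from a PPG waveform."""
    # 0.3 s refractory period, i.e. heart rates above 200 bpm are ignored.
    peaks = detect_peaks(ppg, min_distance=int(0.3 * fs))
    ibi = np.diff(peaks) / fs  # inter-beat intervals in seconds
    hr = 60.0 / ibi.mean()
    rmssd = float(np.sqrt(np.mean(np.diff(ibi) ** 2)) * 1000.0)
    return float(hr), rmssd


fs = 64  # Hz, assumed PPG sampling rate
t = np.arange(30 * fs) / fs
ppg = np.sin(2 * np.pi * 1.2 * t)  # synthetic pulse at 1.2 Hz (72 bpm)

hr, rmssd = heart_rate_features(ppg, fs)
print(round(hr, 1), round(rmssd, 1))
```

Note that at a low sampling rate the peak positions are quantized to whole samples, which inflates RMSSD; interpolating around each peak would reduce that. For real recordings, `scipy.signal.find_peaks` with a prominence threshold is likely a more robust detector than the helper above.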

Phase 1: Pilot study

Tasks

GQM

TODO (fetch from goaldoc)

Publicize work

Places to publicize thesis once done:

  • Personal Twitter, Facebook, LinkedIn
  • Any relevant subreddits?
  • OpenBCI channels (their LinkedIn content gets decent engagement)
  • NeuroTechX meetups/hacknights
  • Conferences (Markus might know)
  • Kaggle? (if parts of dataset can be made public)
  • Journals
    • Journal of Open Source Software: https://joss.theoj.org/
      • Probably a good fit for ActivityWatch. Should probably consider publishing a paper on it there, eventually.

Phase 3: Analysis

Tasks

  • Split phase into issues/tasks
  • Clean/align data
  • Set up a pipeline using MNE-Python and scikit-learn
  • Try different classifiers (one-vs-all and multiclass)
  • Implement classifier for codeprose (#25)
    • More or less done
  • More?
    • Get feedback from someone who knows their EEG
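The "Clean/align data" step could be sketched as a lookup from EEG sample timestamps into labeled activity intervals. The `label_samples` helper and the `(start, end, label)` tuple layout are hypothetical, chosen for illustration; they are not taken from the repository.

```python
from bisect import bisect_right


def label_samples(sample_times, events):
    """Assign each timestamp the label of the event interval containing it.

    events: list of (start, end, label) tuples, sorted by start and
    non-overlapping. Timestamps outside every interval get None.
    """
    starts = [start for start, _, _ in events]
    labels = []
    for t in sample_times:
        i = bisect_right(starts, t) - 1  # last event starting at or before t
        if i >= 0 and t < events[i][1]:
            labels.append(events[i][2])
        else:
            labels.append(None)
    return labels


# Made-up intervals (seconds) for illustration only.
events = [(0.0, 10.0, "code"), (12.0, 20.0, "prose")]
sample_times = [1.0, 10.5, 12.0, 25.0]
labels = label_samples(sample_times, events)
print(labels)
```

Samples labeled `None` fall in gaps between tracked activities and would typically be dropped before training.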

GQM

TODO (fetch from goaldoc)

Metrics

  • Confusion matrix (which activities are hard to classify/discern?)
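As a small illustration of the metric, the sketch below builds a confusion matrix by hand, with rows as true labels and columns as predicted labels. The labels and predictions are made up for the example and are not results from the thesis.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return a nested list where matrix[i][j] counts true labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix


# Made-up example data.
labels = ["code", "prose", "media"]
y_true = ["code", "code", "prose", "prose", "media", "media"]
y_pred = ["code", "prose", "prose", "code", "media", "media"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>6}: {row}")
```

Large off-diagonal counts point at the pairs of activities that are hard to tell apart. In a scikit-learn pipeline, `sklearn.metrics.confusion_matrix` computes the same table.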

Make better use of MNE

I ran into some tricky issues and found that MNE already has tools for exactly these problems.

After browsing the documentation again, it seems I've duplicated a lot of work by not using MNE when I probably should have.

A partial rewrite is in order.

  • See if MNE is useful for our arbitrary-length epochs and window-sliding approach.
  • Complete restructure of dataset into BIDS (#10)
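For reference, the window-sliding approach over an arbitrary-length epoch can be sketched as below. The `sliding_windows` helper, the `(n_channels, n_samples)` array layout, and the window and step lengths are assumptions for illustration, not the repository's implementation.

```python
import numpy as np


def sliding_windows(data, fs, window_s, step_s):
    """Yield (start_index, window) pairs from an epoch.

    data has shape (n_channels, n_samples). Epochs shorter than one
    window yield nothing.
    """
    size = int(round(window_s * fs))
    step = int(round(step_s * fs))
    n_samples = data.shape[1]
    for start in range(0, n_samples - size + 1, step):
        yield start, data[:, start:start + size]


fs = 256  # Hz, assumed sampling rate
epoch = np.zeros((4, 10 * fs))  # placeholder 10-second, 4-channel epoch

windows = list(sliding_windows(epoch, fs, window_s=2.0, step_s=1.0))
print(len(windows), windows[0][1].shape)
```

MNE's `mne.make_fixed_length_epochs` covers a similar use case for continuous recordings, which is one of the things worth evaluating here.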

Muse data is frequently -1000 for TP9 and TP10

Not sure what's up with that, or how to deal with it.

From looking at the raw data, it looks like it's -1000 exactly every 5th row. Sometimes there are 2 in a row, and then it repeats every 5th row again.

Edit: Maybe this is just powerline noise? At a sampling frequency of about 250 Hz, one powerline cycle (50 or 60 Hz) would span roughly 4 to 5 samples. Why are TP9 and TP10 so much more sensitive, though?

Example:

1603711387.314,-1000.000,-44.434,-38.574,-1000.000,0.000
1603711387.318,-609.375,-29.297,-27.344,-574.707,0.000
1603711387.322,787.109,-19.531,-23.926,814.941,0.000
1603711387.325,-852.051,-27.832,-22.461,-858.887,0.000
1603711387.329,184.082,-37.598,-23.926,189.941,0.000
1603711387.333,-1000.000,-45.410,-39.062,-1000.000,0.000
1603711387.337,-836.914,-34.668,-30.762,-804.688,0.000
1603711387.341,519.043,-18.555,-11.719,561.523,0.000
1603711387.345,-801.758,-18.555,-20.508,-808.105,0.000
1603711387.349,150.391,-23.438,-27.832,155.762,0.000
1603711387.353,-1000.000,-29.785,-26.367,-1000.000,0.000
1603711387.357,-1000.000,-30.762,-27.832,-1000.000,0.000
1603711387.361,178.711,-21.484,-20.508,231.934,0.000
1603711387.365,-764.648,-22.949,-21.484,-768.555,0.000
1603711387.368,222.168,-33.203,-31.250,198.242,0.000
1603711387.372,-1000.000,-39.551,-32.715,-1000.000,0.000
1603711387.376,-1000.000,-33.203,-31.250,-1000.000,0.000
1603711387.380,-36.133,-21.484,-26.367,-6.836,0.000
1603711387.384,-789.062,-16.113,-27.832,-781.738,0.000
1603711387.388,409.668,-18.066,-33.691,378.906,0.000
1603711387.392,-909.180,-28.320,-34.180,-925.293,0.000
1603711387.396,-1000.000,-26.855,-33.203,-1000.000,0.000
1603711387.400,-213.867,-16.113,-29.785,-186.523,0.000
1603711387.404,-873.535,-15.137,-23.438,-854.492,0.000
1603711387.407,650.879,-21.484,-28.320,603.516,0.000
1603711387.411,-738.281,-66.406,-38.086,-779.785,0.000
1603711387.415,-1000.000,-78.613,-34.180,-1000.000,0.000
1603711387.419,-204.102,-25.879,-19.531,-210.449,0.000
1603711387.423,-943.848,-7.812,-18.555,-930.664,0.000
1603711387.427,878.418,-9.277,-25.879,835.938,0.000
1603711387.431,-439.453,-28.320,-37.109,-500.977,0.000
1603711387.435,-1000.000,-36.621,-27.344,-1000.000,0.000
1603711387.439,-177.734,-18.066,-20.996,-181.641,0.000
1603711387.443,-991.699,-17.578,-37.598,-982.910,0.000
1603711387.447,-974.121,-29.297,-38.574,-996.094,0.000
1603711387.450,-139.160,-31.738,-42.969,-194.824,0.000
1603711387.454,-1000.000,-18.066,-48.340,-1000.000,0.000
1603711387.458,-303.223,-12.695,-33.691,-268.066,0.000
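The pattern can be checked mechanically. The sketch below parses the first twelve rows of the example, estimates the sampling rate from the timestamps, and measures the spacing between rows where TP9 sits at -1000. It assumes the column order is timestamp, TP9, AF7, AF8, TP10, Right AUX, and it assumes 50 Hz mains; neither is stated in the issue.

```python
import csv
import io

# First twelve rows of the example above.
SAMPLE = """\
1603711387.314,-1000.000,-44.434,-38.574,-1000.000,0.000
1603711387.318,-609.375,-29.297,-27.344,-574.707,0.000
1603711387.322,787.109,-19.531,-23.926,814.941,0.000
1603711387.325,-852.051,-27.832,-22.461,-858.887,0.000
1603711387.329,184.082,-37.598,-23.926,189.941,0.000
1603711387.333,-1000.000,-45.410,-39.062,-1000.000,0.000
1603711387.337,-836.914,-34.668,-30.762,-804.688,0.000
1603711387.341,519.043,-18.555,-11.719,561.523,0.000
1603711387.345,-801.758,-18.555,-20.508,-808.105,0.000
1603711387.349,150.391,-23.438,-27.832,155.762,0.000
1603711387.353,-1000.000,-29.785,-26.367,-1000.000,0.000
1603711387.357,-1000.000,-30.762,-27.832,-1000.000,0.000
"""

CLIP_VALUE = -1000.0  # apparent lower bound of the recorded values
MAINS_HZ = 50.0       # assumed mains frequency

rows = [[float(v) for v in row] for row in csv.reader(io.StringIO(SAMPLE))]
timestamps = [r[0] for r in rows]
tp9 = [r[1] for r in rows]   # assumed column order
tp10 = [r[4] for r in rows]

# Sampling rate estimated from the timestamps, and samples per mains cycle.
fs = (len(rows) - 1) / (timestamps[-1] - timestamps[0])
samples_per_cycle = fs / MAINS_HZ

# Rows where TP9 is pinned at the lower bound, and the spacing between them.
clipped = [i for i, v in enumerate(tp9) if v <= CLIP_VALUE]
gaps = [b - a for a, b in zip(clipped, clipped[1:])]
both_clipped = all(tp10[i] <= CLIP_VALUE for i in clipped)

print(f"fs ~ {fs:.0f} Hz, {samples_per_cycle:.1f} samples per mains cycle")
print("clipped rows:", clipped, "gaps:", gaps, "TP10 clipped too:", both_clipped)
```

A spacing close to the number of samples per mains cycle would be consistent with the powerline hypothesis, with the signal saturating at the trough of each cycle. It does not explain why only TP9 and TP10 are affected.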

Phase 2: Controlled multi-subject study

Tasks

  • Split phase into issues/tasks
  • Design experiment
    • Which activities?
    • Electrode placement?
    • Duration?
  • Enlist volunteers
  • Collect data (#27)

GQM

TODO (fetch from goaldoc)

Fulfill requirements for goal document

The process for CS students: http://cs.lth.se/examensarbete/hur-gaar-det-till/
General CS dep resource: http://cs.lth.se/examensarbete/
General LTH resource: http://www.student.lth.se/studieinformation/examensarbete/examensarbetsprocessen/

  • Working title, names and contact details of those involved, and preliminary start and end dates.
  • Background/context and motivation for the thesis.
  • Overall goals and problem statements/research questions.
  • Approach/methodology and methods.
  • The scientific basis and proven experience that the thesis will build on. This can be described, for example, as a couple of key references to articles or other material.
  • How is the thesis expected to contribute to the development of knowledge?
  • Preliminary description of the resources required to carry out the work, e.g. workplace and equipment, and how these will be arranged and made available.

Switch to using BIDS as primary data structure

BIDS spec & common principles: https://bids-specification.readthedocs.io/en/stable/02-common-principles.html

Might differ for different devices. Only Muse S has been tested so far.
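For orientation, BIDS places EEG recordings under per-subject and per-session folders with entity-tagged filenames. The sketch below builds such a path; the `bids_eeg_path` helper, the `data` root, the `codeprose` task label, and the `.edf` extension are assumptions for illustration, not the layout used in this repository.

```python
from pathlib import PurePosixPath


def bids_eeg_path(root, subject, session, task, run, ext=".edf"):
    """Build a BIDS-style path for one EEG recording (hypothetical helper)."""
    stem = f"sub-{subject}_ses-{session}_task-{task}_run-{run:02d}_eeg"
    return PurePosixPath(root) / f"sub-{subject}" / f"ses-{session}" / "eeg" / (stem + ext)


path = bids_eeg_path("data", subject="01", session="01", task="codeprose", run=1)
print(path)
```

In practice the MNE-BIDS package handles path construction and the accompanying sidecar files, so a hand-rolled helper like this would mainly be useful for inspecting or validating an existing layout.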

PM

Hello Erik

Sorry to abuse your repo, but I emailed you on 10 Dec ("Muse pipeline wrapper") and didn't get a reply. Maybe it hit your spam folder? I'd be happy to get a ping back, to know if you're into it.

Thanks, and again - sorry for this repo-lution.
Oori
