This annotation process is part of a 10-year longitudinal study of elderly subjects that investigates the evolution of Alzheimer's disease and how it relates to speech and brain characteristics. Some of the participants will be completely healthy, while others will have a diagnosis of an early-stage neurodegenerative disease. The language spoken by the participants is Dutch.
Spontaneous speech data is already being collected through interviews. This data must be accurately transcribed in order to perform further Natural Language Processing (NLP) analysis.
This tool uses Automatic Speech Recognition (ASR) to greatly reduce the time needed to transcribe audio files. It also follows a streamlined process that keeps your annotations organized.
Video:
praat_correct_annotation.mp4
Highlights:
- Automatic data loading: unannotated files are automatically downloaded from the server and placed in your working directory (`data/01_annotate_me`).
- Data segmentation: instead of working with the entire audio file, this tool will split the interview into segments that are roughly 15 seconds long. This way each segment is manageable, and there's never a risk of losing too much progress (e.g. if you forget to save or your computer crashes).
- Transcript prediction: instead of having you, the annotator, indicate the start/stop times and transcribe each word manually, we are using Automatic Speech Recognition (ASR) to speed up the process. An ASR model will be used to automatically generate a transcript and it will be your job to fix any mistakes.
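
To make the segmentation idea concrete, here is a minimal Python sketch of how an interview could be divided into segments of roughly 15 seconds. It is an illustration only: the function name `segment_bounds` and the equal-length splitting strategy are assumptions, and the tool itself may choose its boundaries differently (for example, at pauses in speech).

```python
from typing import List, Tuple


def segment_bounds(duration_s: float, target_s: float = 15.0) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into consecutive segments of about target_s seconds.

    Hypothetical helper for illustration; not taken from the tool's source code.
    """
    if duration_s <= 0:
        return []
    # Pick the segment count that brings each segment closest to target_s.
    n = max(1, round(duration_s / target_s))
    step = duration_s / n
    bounds = []
    for i in range(n):
        start = i * step
        # Pin the final end time to the exact duration to avoid rounding drift.
        end = duration_s if i == n - 1 else (i + 1) * step
        bounds.append((start, end))
    return bounds
```

For a 60-second recording, `segment_bounds(60.0)` yields four segments of 15 seconds each. Recordings whose length is not a multiple of 15 seconds get slightly shorter or longer segments, so that no tiny leftover segment remains at the end.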
Download the repository by running the following command in a terminal (Linux) or in Git Bash (Windows).
git clone https://github.com/btamm12/fpack_webapp_client.git
Please follow the appropriate guide based on your operating system.
It is highly recommended to read the documentation before starting the annotations.
There are also more specific details available in the `data/` folder, where most of your time will be spent.
Before this application can do anything, it needs a certain file, called the "subject mapping". This file provides an extra layer of security to protect the data of the study participants.
To get access to this file, you must send me an email. Please mention "subject mapping" in the email subject.
When I send you the file `subject_mapping.txt`, you must place it in the root folder of the repository, i.e. `fpack_webapp_client/subject_mapping.txt`.
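
If you want to confirm that the file ended up in the right place, a quick check along the following lines may help. This is a hypothetical helper, not part of the tool: the function name `has_subject_mapping` is an assumption, and it only checks that the file exists, not that its contents are valid.

```python
from pathlib import Path


def has_subject_mapping(repo_root: str) -> bool:
    """Return True if subject_mapping.txt sits directly in the repository root.

    Hypothetical helper for illustration; the tool may perform its own checks.
    """
    return (Path(repo_root) / "subject_mapping.txt").is_file()


if __name__ == "__main__":
    # Assumes the repository was cloned into the current directory.
    if has_subject_mapping("fpack_webapp_client"):
        print("subject_mapping.txt found.")
    else:
        print("subject_mapping.txt is missing from the repository root.")
```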
You should coordinate with your fellow annotators about who will annotate which data. Otherwise you might both annotate the same interviews, which would not be a great use of time!
You must set your name and select some interview sections before the tool will start working. See the collaboration folder for more details.
To start the tool, run the following command in a terminal (Linux) or in Git Bash (Windows).
make app