Comments (15)
Thanks for your reply, but that's not exactly what I'm asking, sorry.
Here I try to describe it with a figure:
Let's say the raw data looks like this: 6 channels, 2 events.
We load and reshape it to get X, like the x in train_a.py line 46.
In this specific green task 'T1', the channel couples [FC1, FC2], [FC3, FC4], [FC5, FC6] are different but mostly similar.
If we randomly split this X, it's likely that some of those green channel couples are assigned to train_set and some of them to test_set.
As the dataset grows larger, this becomes practically certain.
In my opinion, we should use the green task's samples to predict the red task's samples, and should not use the green task's samples to predict other samples of the same green task.
Because in an application, new input data will be more like the red sample: none of its channels are in train_set.
My question is how your work makes sure of that.
Thank you very much!
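To make the concern above concrete, here is a minimal sketch with toy sizes (the trial count, number of couples and 80/20 ratio are assumptions for illustration, not values taken from the repository). It splits channel-couple samples at random and counts how many trials contribute couples to both sides:

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials, n_pairs = 20, 3  # hypothetical toy sizes
# One trial id per channel-couple sample: trial 0 -> [0, 0, 0], trial 1 -> [1, 1, 1], ...
trial_ids = np.repeat(np.arange(n_trials), n_pairs)

# Random 80/20 split over channel-couple samples, ignoring the trial they came from
perm = rng.permutation(trial_ids.size)
n_test = int(0.2 * trial_ids.size)
test_idx, train_idx = perm[:n_test], perm[n_test:]

# Trials that have at least one couple in train and at least one in test
shared_trials = np.intersect1d(trial_ids[train_idx], trial_ids[test_idx])
print(f"{shared_trials.size} of {n_trials} trials have couples in both train and test")
```

Any trial counted here is partly seen during training and partly used for evaluation, which is the situation described in the comment.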
from mi-eeg-1d-cnn.
Here are some papers that may have the same bug:
"Multi-class motor imagery EEG classification method with high accuracy and low individual differences based on hybrid neural network" https://iopscience.iop.org/article/10.1088/1741-2552/ac1ed0
"DWT and CNN based multi-class motor imagery electroencephalographic signal recognition" https://iopscience.iop.org/article/10.1088/1741-2552/ab6f15
As far as I know, it is nearly impossible to push 4-class MI accuracy above 90% on public MI datasets such as BCI IV 2a or Physionet (High-gamma is an exception, since it is a motor execution dataset). I think the reasons are that current MI tasks still have a low SNR and that we can't instruct subjects to imagine in a similar way.
This is just my opinion and I'm also glad to see progress in MI. So if you have implemented papers that push accuracy near or above 90%, please contact me at [email protected]
Thanks!
from mi-eeg-1d-cnn.
Thanks all for the useful discussion. This is exactly why I insist on publishing the code that corresponds to any article I publish. Unfortunately, programming mistakes will happen sometimes and there's no way around it, no matter how hard you work to check your code before submission.
from mi-eeg-1d-cnn.
I'm more than sorry to hear that.
It's still a very remarkable and enlightening work despite the accuracy.
Thanks!
from mi-eeg-1d-cnn.
Hi everyone, sorry I am responding late. I integrated the fix into the main branch; find the new generator and loader in the fix folder. I also updated the readme and alerted the Journal.
I explain below the problem that @zewail-liu (whom I thank again) pointed out.
Take a single trial: the subject thinks about moving the right fist for 4 seconds while 64 channels record the brain activity. Of these 64 channels, we take only 4 for this example: C3, C4, CP3 and CP4. The idea described in the paper is to divide this single instance into two, one consisting of C3 and C4 and one consisting of CP3 and CP4. So now we have two arrays: one of size (2, 640) composed of C3 and C4, and one of the same size (2, 640) composed of CP3 and CP4. The label corresponding to these two examples is the same: imagined movement of the right fist. Ideally, these two examples should both go into the same dataset: either both in the training set or both in the test set. What happens instead is that one can end up in the training set and the other in the test set. The image below clarifies this example.
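One way to keep both halves of a trial on the same side of the split is a group-aware splitter. The sketch below uses scikit-learn's `GroupShuffleSplit` on random toy data; it is an illustration under assumed shapes, not the code in the fix folder, and the names `trials`, `groups` and the sizes are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_trials, n_pairs, n_samples = 50, 2, 640  # hypothetical toy sizes

# Toy trials with 4 channels each, e.g. C3, C4, CP3, CP4: (trial, channel, time)
trials = rng.standard_normal((n_trials, 2 * n_pairs, n_samples))
trial_labels = rng.integers(0, 4, size=n_trials)

# Divide every trial into channel couples: (n_trials * n_pairs, 2, n_samples)
x = trials.reshape(n_trials * n_pairs, 2, n_samples)
y = np.repeat(trial_labels, n_pairs)             # each couple inherits the trial label
groups = np.repeat(np.arange(n_trials), n_pairs)  # each couple remembers its trial

# Split by trial id, so all couples of one trial land in the same set
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(x, y, groups=groups))

x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]
print(x_train.shape, x_test.shape)
```

Splitting the trials first and only then dividing them into couples would achieve the same effect; the group-based variant is convenient when the couples have already been generated.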
from mi-eeg-1d-cnn.
Hi, I just started a PhD thesis on EEG analysis and BCI, so I am currently looking for papers to reproduce and compare my results with.
I am quite sorry that the results of this paper were obtained with a bug in the splitting procedure, because they were quite impressive and I like the 1D-CNN architecture...
As you have mentioned @leoeooe, many papers seem to claim this kind of (too) high accuracy (this one also) but without sharing their code, which prevents verification...
I am currently trying to reproduce the CSP with filter bank and time-window data processing, as it seems to give really impressive results (if they are correct).
Nevertheless, I have found another paper using an SVM on BCI-IV 2a with CSP + FB + time windows. It achieves around 70-80% accuracy on average and the code seems consistent (I will share the link when I find it again), but that is not 90% yet and some subjects have poor accuracy (around 50%), so claiming >90% for all subjects seems really surprising...
Have you managed to reproduce some of these results since your last comment?
Next week I will try to integrate this kind of processing, as well as the STFT proposed in this paper, with the 1D-CNN and see whether such high accuracy is plausible on both the Physionet and BCI-IV 2a datasets.
from mi-eeg-1d-cnn.
Thanks for sharing this information. I am reproducing the paper "BENDR: Using Transformers and a Contrastive Self-Supervised Learning Task to Learn From Massive Amounts of EEG Data", which claims an accuracy of up to 90% in MI. Have you ever taken a look at this paper? Thanks again!
from mi-eeg-1d-cnn.
Here is the function:

import mne
import matplotlib.pyplot as plt
from sklearn import preprocessing
import numpy as np
from keras.utils import to_categorical  # keras.utils.np_utils is gone in recent Keras versions
from sklearn.model_selection import train_test_split

"""
Physionet MI-EEG Dataset
64 channels EEG, 160 Hz sampling rate, 4 seconds MI-task
14 runs for each of the 109 subjects
runs [1, 2] are baseline
others with marker
T0 : rest,
T1/T2: left/right fist in runs [3, 4, 7, 8, 11, 12]
       both fists/feet in runs [5, 6, 9, 10, 13, 14]
"""

data_path = r'D:\00-data\PhysioNet\ori\S001\\'
LR_fist_run = [3, 4, 7, 8, 11, 12]
fist_feet_run = [5, 6, 9, 10, 13, 14]
rename_mapping = {'Fc5.': 'FC5', 'Fc3.': 'FC3', 'Fc1.': 'FC1', 'Fcz.': 'FCZ', 'Fc2.': 'FC2', 'Fc4.': 'FC4',
                  'Fc6.': 'FC6', 'C5..': 'C5', 'C3..': 'C3', 'C1..': 'C1', 'Cz..': 'CZ', 'C2..': 'C2', 'C4..': 'C4',
                  'C6..': 'C6', 'Cp5.': 'CP5', 'Cp3.': 'CP3', 'Cp1.': 'CP1', 'Cpz.': 'CPZ', 'Cp2.': 'CP2',
                  'Cp4.': 'CP4', 'Cp6.': 'CP6', 'Fp1.': 'FP1', 'Fpz.': 'FPZ', 'Fp2.': 'FP2', 'Af7.': 'AF7',
                  'Af3.': 'AF3', 'Afz.': 'AFZ', 'Af4.': 'AF4', 'Af8.': 'AF8', 'F7..': 'F7', 'F5..': 'F5', 'F3..': 'F3',
                  'F1..': 'F1', 'Fz..': 'FZ', 'F2..': 'F2', 'F4..': 'F4', 'F6..': 'F6', 'F8..': 'F8', 'Ft7.': 'FT7',
                  'Ft8.': 'FT8', 'T7..': 'T7', 'T8..': 'T8', 'T9..': 'T9', 'T10.': 'T10', 'Tp7.': 'TP7', 'Tp8.': 'TP8',
                  'P7..': 'P7', 'P5..': 'P5', 'P3..': 'P3', 'P1..': 'P1', 'Pz..': 'PZ', 'P2..': 'P2', 'P4..': 'P4',
                  'P6..': 'P6', 'P8..': 'P8', 'Po7.': 'PO7', 'Po3.': 'PO3', 'Poz.': 'POZ', 'Po4.': 'PO4', 'Po8.': 'PO8',
                  'O1..': 'O1', 'Oz..': 'OZ', 'O2..': 'O2', 'Iz..': 'IZ'}


def get_physionet(subject: int):
    """
    :param subject: SN of subject : [1,109]
    :return: data shapes (-1, channels, 640)
    """
    # loading from file: concatenate all runs of each task type
    for r in LR_fist_run:
        raw_new = mne.io.read_raw_edf(data_path + 'S%03d' % subject + 'R%02d.edf' % r, verbose='ERROR')
        if r == LR_fist_run[0]:
            raw_LR_fist = raw_new
        else:
            raw_LR_fist.append(raw_new)
    for r in fist_feet_run:
        raw_new = mne.io.read_raw_edf(data_path + 'S%03d' % subject + 'R%02d.edf' % r, verbose='ERROR')
        if r == fist_feet_run[0]:
            raw_fist_feet = raw_new
        else:
            raw_fist_feet.append(raw_new)
    raw_LR_fist.rename_channels(rename_mapping)
    raw_fist_feet.rename_channels(rename_mapping)
    ch_pick = ["FC1", "FC2", "FC3", "FC4", "C3", "C4", "C1", "C2",
               "CP1", "CP2", "CP3", "CP4"]

    # get the data and labels
    event_id_LR_fist = dict(T1=0, T2=1)
    events, _ = mne.events_from_annotations(raw_LR_fist, event_id_LR_fist, verbose='ERROR')
    epochs_LR_fist = mne.Epochs(raw_LR_fist, events, tmin=1 / 160, tmax=4, baseline=None, preload=True,
                                verbose='ERROR')
    event_id_fist_feet = dict(T1=2, T2=3)
    events, _ = mne.events_from_annotations(raw_fist_feet, event_id_fist_feet, verbose='ERROR')
    epochs_fist_feet = mne.Epochs(raw_fist_feet, events, tmin=1 / 160, tmax=4, baseline=None, preload=True,
                                  verbose='ERROR')
    data = np.concatenate((epochs_LR_fist.get_data(picks=ch_pick), epochs_fist_feet.get_data(picks=ch_pick)))

    # standardize each trial separately
    scaler = preprocessing.StandardScaler()
    for i in range(len(data)):
        scaler.fit(data[i])
        data[i] = scaler.transform(data[i])
    labels = np.concatenate((epochs_LR_fist.events[:, 2], epochs_fist_feet.events[:, 2]))
    labels = to_categorical(labels)  # one-hot

    # split by trial first, so channel couples of one trial never end up in both sets
    train_data_ori, test_data_ori, train_label_ori, test_label_ori = train_test_split(data, labels, test_size=0.2,
                                                                                      random_state=42)

    # reshape into channel couples and return
    train_data = np.empty((0, 2, train_data_ori.shape[2]))
    train_label = np.empty((0, 4))
    test_data = np.empty((0, 2, test_data_ori.shape[2]))
    test_label = np.empty((0, 4))
    for i in range(0, len(ch_pick), 2):
        train_data = np.concatenate((train_data, train_data_ori[:, i:i + 2, :]))
        test_data = np.concatenate((test_data, test_data_ori[:, i:i + 2, :]))
        train_label = np.concatenate((train_label, train_label_ori))
        test_label = np.concatenate((test_label, test_label_ori))
    print('data loaded.')
    return train_data, test_data, train_label, test_label


if __name__ == '__main__':
    res = get_physionet(1)
    for r in res:
        print(r.shape)
from mi-eeg-1d-cnn.
In order to test your assertion: "in line 52, spliting reshape_x may split channel-couples in one task into train_set and test_set at the same time, that may be cause the acc rise not for the Model cause." I used the following script, which basically tests 2 things:
- If there are duplicate instances in the entire reshaped dataset, the train set, and the validation/test set.
- If there are instances present in both train and test/valid set.
import sys
sys.path.append("/workspace")
import numpy as np
import tensorflow as tf
from data_processing.general_processor import Utils
from sklearn.model_selection import train_test_split

tf.autograph.set_verbosity(0)
physical_devices = tf.config.experimental.list_physical_devices('GPU')
print(physical_devices)
if physical_devices:  # only configure memory growth when a GPU is present
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

# Params
source_path = "/dataset/paper/"

# Load data
channels = Utils.combinations["a"]  # [["FC1", "FC2"], ["FC3", "FC4"], ["FC5", "FC6"]]
exclude = [38, 88, 89, 92, 100, 104]
subjects = [n for n in np.arange(1, 110) if n not in exclude]
x, y = Utils.load(channels, subjects, base_path=source_path)

# Transform y to one-hot-encoding
y_one_hot = Utils.to_one_hot(y, by_sub=False)

# Reshape for scaling
reshaped_x = x.reshape(x.shape[0], x.shape[1] * x.shape[2])


def check_duplicate(element_list):
    # True if any element occurs more than once in the list
    for elem in element_list:
        if element_list.count(elem) > 1:
            return True
    return False


# Grab a test set before SMOTE
x_train_raw, x_valid_test_raw, y_train_raw, y_valid_test_raw = train_test_split(reshaped_x,
                                                                                y_one_hot,
                                                                                stratify=y_one_hot,
                                                                                test_size=0.20,
                                                                                random_state=42)

reshaped = reshaped_x.tolist()
x_train = x_train_raw.tolist()
x_valid = x_valid_test_raw.tolist()

# Check 1: duplicates inside the full dataset, the train set and the valid/test set
print(check_duplicate(reshaped))
print(check_duplicate(x_train))
print(check_duplicate(x_valid))

# Check 2: instances present in both the train and the valid/test set
for sample in reshaped_x.tolist():
    if (sample in x_train) and (sample in x_valid):
        print("Problems")
This simple script shows that no instance is present in both the train and the test/valid set.
If this is not the answer you are expecting, please describe your problem in more detail.
Thanks
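The script above compares samples for exact equality. Two channel couples cut from the same trial are different arrays, so a check of that kind passes even when one trial is spread across both sets. A trial-level check needs an identifier for the originating trial of each sample. The sketch below assumes such an array exists (`trial_ids` is hypothetical and not part of the repository's `Utils` API):

```python
import numpy as np


def leaked_trials(trial_ids, train_idx, test_idx):
    """Return the trial ids that contribute samples to both splits."""
    return np.intersect1d(trial_ids[train_idx], trial_ids[test_idx])


# Toy example: 3 trials, 2 channel couples each
trial_ids = np.array([0, 0, 1, 1, 2, 2])
train_idx = np.array([0, 2, 3, 4])
test_idx = np.array([1, 5])

print(leaked_trials(trial_ids, train_idx, test_idx))  # [0 2]
```

An empty result means every trial sits entirely on one side of the split.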
from mi-eeg-1d-cnn.
I have the same issue as you. Maybe we can learn from each other. weixin:laoyao_023
from mi-eeg-1d-cnn.
Forgive me for responding so late. I took some time to check your hypothesis and unfortunately it is correct! Thank you for finding this serious bug in the code. I am actively working to see if the same accuracy can be achieved by removing this bug. At the moment I see it as a little difficult to achieve the same accuracy as the network is heavily overfitting. If it is not possible to solve this bug I will personally contact the journal. I will post in the next few days the resolution of this bug. In case you are working and want to share some thoughts please don't hesitate. Thanks again for the support.
from mi-eeg-1d-cnn.
Thank you so much for pointing out this error. 🥇
I'm reopening the issue because it might be helpful to others.
I will post the fix shortly.
from mi-eeg-1d-cnn.
I tried splitting the train/test/valid sets before concatenating channels, and my accuracy on the BCI IV 2a dataset turned out to be 24.6%, roughly chance level for four classes, which means the network didn't work at all... I read your paper and I read "A_Simplified_CNN_Classification_Method_for_MI-EEG" from your references. The author's accuracy is about 97%; maybe she made the same mistake?
from mi-eeg-1d-cnn.
I tried to replicate the code from "A Simplified CNN Classification Method for MI-EEG". Unfortunately, I was unable to achieve the accuracy they claim. Be careful: the problem is not in reshaping the data. The problem is in the generation. Give me a few days and I'll post the fix below.
from mi-eeg-1d-cnn.
@Kubasinska
Hello,
"Be careful: the problem is not in reshaping the data. The problem is in the generation."
Could you please explain the bug in the generation process?
Thanks!
from mi-eeg-1d-cnn.