Question about the train_test_spliter about mi-eeg-1d-cnn HOT 15 OPEN

ambitious-octopus commented on September 4, 2024

Question about the train_test_spliter

from mi-eeg-1d-cnn.

Comments (15)

zewail-liu commented on September 4, 2024

Thanks for your reply, but that's not exactly what I'm asking, sorry.

Here I try to describe it with a figure.
Let's say the raw data looks like this: 6 channels, 2 events.

We load and reshape it and get X, like the x in train_a.py, line 46.

In this specific green task 'T1', the channel couples [FC1, FC2], [FC3, FC4] and [FC5, FC6] are different but mostly similar.
If we randomly split this X, it's likely that we assign some of those green channel couples to the train_set and others to the test_set.
As the dataset grows larger, this will happen for certain.

In my opinion, we should use the green task's samples to predict the red task's samples, and should not use the green task's samples to predict the green task's samples.
That's because in a real application, new input data will look more like the red sample: none of its channels are in the train_set.

My question is: how does your work make sure of that?

Thank you very much!
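To illustrate why this matters, here is a toy simulation with assumed sizes and noise level (not PhysioNet data). Each trial gets a random label and a waveform shared by all of its channel pairs, so the labels carry no real information. A 1-nearest-neighbour classifier, used here only as a stand-in for any model that can memorise, scores far above chance when pairs are split individually, because a test pair's nearest neighbour is usually a sibling pair of the same trial sitting in the training set. With a trial-level split it should drop back to roughly the 25% chance level.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy setup (assumed values): 200 trials, 3 channel pairs per trial, 64 features per pair.
rng = np.random.default_rng(0)
n_trials, n_pairs, n_feat = 200, 3, 64
labels = rng.integers(0, 4, size=n_trials)  # random labels: nothing real to learn
trial_signal = rng.standard_normal((n_trials, n_feat))  # shared by all pairs of a trial
pairs = trial_signal[:, None, :] + 0.3 * rng.standard_normal((n_trials, n_pairs, n_feat))

x = pairs.reshape(-1, n_feat)  # one row per (trial, pair), trial-major order
y = np.repeat(labels, n_pairs)
trial_id = np.repeat(np.arange(n_trials), n_pairs)


def knn_accuracy(train_mask):
    """Fit 1-NN on the masked rows and score it on the remaining rows."""
    clf = KNeighborsClassifier(n_neighbors=1).fit(x[train_mask], y[train_mask])
    return clf.score(x[~train_mask], y[~train_mask])


# Pair-level split: pairs from the same trial can end up on both sides.
rows_train, _ = train_test_split(np.arange(len(x)), test_size=0.2, random_state=42)
pair_mask = np.isin(np.arange(len(x)), rows_train)

# Trial-level split: all pairs of a trial stay on the same side.
trials_train, _ = train_test_split(np.arange(n_trials), test_size=0.2, random_state=42)
trial_mask = np.isin(trial_id, trials_train)

print(f"pair-level split:  {knn_accuracy(pair_mask):.2f}")
print(f"trial-level split: {knn_accuracy(trial_mask):.2f}")
```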

leoeooe commented on September 4, 2024

Here are some papers that may have the same bug:
"Multi-class motor imagery EEG classification method with high accuracy and low individual differences based on hybrid neural network" https://iopscience.iop.org/article/10.1088/1741-2552/ac1ed0
"DWT and CNN based multi-class motor imagery electroencephalographic signal recognition" https://iopscience.iop.org/article/10.1088/1741-2552/ab6f15
As far as I know, it's nearly impossible to push 4-class MI accuracy above 90% on public MI datasets such as BCI IV 2a or PhysioNet (High-Gamma excepted, since it is a motor execution dataset). I think the reasons are that current MI tasks still have a low SNR and that we can't instruct subjects to imagine in the same way.
This is just my opinion, and I'm also glad to see progress in MI. So if you have implemented papers that reach accuracy near or above 90%, please contact me at [email protected]
Thanks!

Mnaser95 commented on September 4, 2024

Thanks all for the useful discussion. This is exactly why I insist on publishing the code that corresponds to any article I publish. Unfortunately, programming mistakes will happen sometimes and there's no way around it, no matter how hard you work to check your code before submission.

zewail-liu commented on September 4, 2024

I'm really sorry to hear that.
It's still remarkable and enlightening work, despite the accuracy.
Thanks!

ambitious-octopus commented on September 4, 2024

Hi everyone, sorry for the late response. I integrated the fix into the main branch; you can find the new generator and loader in the fix folder. I also updated the README and alerted the journal.
Below I explain the problem that @zewail-liu (whom I thank again) pointed out.

Take a single trial: the subject imagines moving the right fist for 4 seconds while 64 channels record the brain activity. Of these 64 channels, we take only 4 for this example: C3, C4, CP3 and CP4. The idea described in the paper is to divide this single instance into two, one made of C3 and C4 and one made of CP3 and CP4. So we now have two arrays, each of size (2, 640) (4 s at 160 Hz): one composed of C3 and C4, and one composed of CP3 and CP4. Both examples have the same label: imagined movement of the right fist. Ideally, these two examples should go into the same dataset, either both into the training set or both into the test set. What happens instead is that one may go into the training set and the other into the test set. The image below clarifies this example.
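A minimal sketch of that grouping rule, using scikit-learn's GroupShuffleSplit with the trial index as the group. This is one generic way to enforce it, not the code in the repository's fix folder; the array sizes and random data are placeholders.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Placeholder trial-level data: 40 trials x 4 channels (C3, C4, CP3, CP4) x 640 samples.
rng = np.random.default_rng(1)
n_trials = 40
x = rng.standard_normal((n_trials, 4, 640))
y = rng.integers(0, 4, size=n_trials)  # one label per trial

# Cut every trial into a (2, 640) C3/C4 example and a (2, 640) CP3/CP4 example.
x_pairs = np.concatenate([x[:, 0:2, :], x[:, 2:4, :]])
y_pairs = np.concatenate([y, y])  # both halves keep the trial's label
groups = np.concatenate([np.arange(n_trials), np.arange(n_trials)])  # source trial of each example

# Every group (trial) is assigned to train or test as a whole.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(x_pairs, y_pairs, groups=groups))
x_train, y_train = x_pairs[train_idx], y_pairs[train_idx]
x_test, y_test = x_pairs[test_idx], y_pairs[test_idx]

# No trial contributes examples to both sets.
assert not set(groups[train_idx]) & set(groups[test_idx])
```

For cross-validation, GroupKFold accepts the same `groups` argument, so the same trial index can be reused there.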

Ananas120 commented on September 4, 2024

Hi, I just started a PhD thesis on EEG analysis and BCI, so I am currently looking for papers to reproduce and compare my results with.
I am quite sorry that the results of this paper were obtained with a bug in the splitting procedure, because they were quite impressive and I like the 1D-CNN architecture...
As you mentioned, @leoeooe, many papers seem to claim this kind of (too) high accuracy (this one too) without sharing their code, which prevents verification...
I am currently trying to reproduce CSP with filter-bank and time-window data processing, as it seems to give really impressive results (if they are correct).
Nevertheless, I found another paper that uses an SVM on BCI-IV 2a with CSP + FB + time windows and achieves around 70-80% accuracy on average, and its code seems consistent (I will share the link when I find it again). That is not 90%, though, and some subjects have poor accuracy (around 50%), so claiming >90% for all of them seems really surprising...
Have you managed to reproduce any of these results since your last comment?

Next week I will try to integrate this kind of processing, as well as the STFT proposed in this paper, into the 1D-CNN and see whether such high accuracy is consistently achievable on both the PhysioNet and BCI-IV 2a datasets.
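As a rough reference point for the CSP + filter-bank approach mentioned above, here is a minimal two-class filter-bank CSP feature extractor in NumPy/SciPy. The band edges, filter order, and number of filter pairs are illustrative assumptions, not values from the papers discussed; the time-window part is left out, and a 4-class problem would need a one-vs-rest extension. The detail relevant to this thread is that the CSP filters are fitted on the training trials only.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.signal import butter, sosfiltfilt


def csp_filters(x, y, n_pairs=2):
    """Two-class CSP. x: (trials, channels, samples); y: labels 0 or 1."""
    covs = []
    for c in (0, 1):
        # Mean trace-normalized spatial covariance of the trials in class c.
        covs.append(np.mean([t @ t.T / np.trace(t @ t.T) for t in x[y == c]], axis=0))
    # Generalized eigenproblem C0 w = lambda (C0 + C1) w; eigenvalues come out ascending.
    _, w = eigh(covs[0], covs[0] + covs[1])
    # Columns at both ends give the most discriminative spatial filters.
    return np.concatenate([w[:, :n_pairs], w[:, -n_pairs:]], axis=1).T


def log_var_features(x, filters):
    """Normalized log-variance of the spatially filtered trials."""
    z = np.einsum("fc,tcs->tfs", filters, x)
    var = z.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))


def filter_bank_csp(x_train, y_train, x_test, fs=160.0,
                    bands=((8, 12), (12, 16), (16, 24), (24, 30)), n_pairs=2):
    """Fit one CSP per band on the training trials only; return train and test features."""
    feats_train, feats_test = [], []
    for low, high in bands:
        sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
        band_train = sosfiltfilt(sos, x_train, axis=-1)
        band_test = sosfiltfilt(sos, x_test, axis=-1)
        filters = csp_filters(band_train, y_train, n_pairs)
        feats_train.append(log_var_features(band_train, filters))
        feats_test.append(log_var_features(band_test, filters))
    return np.hstack(feats_train), np.hstack(feats_test)
```

The feature matrices can go into any classifier, for example an SVM such as `sklearn.svm.SVC`. MNE-Python's `mne.decoding.CSP` is a maintained alternative to the hand-written `csp_filters`.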

dawin2015 commented on September 4, 2024

Thanks for sharing this information. I am reproducing the paper BENDR: Using Transformers and a Contrastive Self-Supervised Learning Task to Learn From Massive Amounts of EEG Data, which claimed accuracy of up to 90% in MI. Have you ever taken a look at this paper? Thanks again!

zewail-liu commented on September 4, 2024

Here is the function:

import mne
import numpy as np
from keras.utils import to_categorical  # keras.utils.np_utils is not available in recent Keras releases
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

"""
    Physionet MI-EEG Dataset
    64 channels EEG, 160 Hz sampling rate, 4-second MI task
    14 runs for each of the 109 subjects
        runs [1, 2] are baseline
        the others have markers
            T0   : rest
            T1/T2: left/right fist in runs [3, 4, 7, 8, 11, 12]
                   both fists/feet in runs [5, 6, 9, 10, 13, 14]

"""
data_path = r'D:\00-data\PhysioNet\ori\S001\\'  # local path; the S001 folder is hard-coded, so adjust it for other subjects
LR_fist_run = [3, 4, 7, 8, 11, 12]
fist_feet_run = [5, 6, 9, 10, 13, 14]
rename_mapping = {'Fc5.': 'FC5', 'Fc3.': 'FC3', 'Fc1.': 'FC1', 'Fcz.': 'FCZ', 'Fc2.': 'FC2', 'Fc4.': 'FC4',
                  'Fc6.': 'FC6', 'C5..': 'C5', 'C3..': 'C3', 'C1..': 'C1', 'Cz..': 'CZ', 'C2..': 'C2', 'C4..': 'C4',
                  'C6..': 'C6', 'Cp5.': 'CP5', 'Cp3.': 'CP3', 'Cp1.': 'CP1', 'Cpz.': 'CPZ', 'Cp2.': 'CP2',
                  'Cp4.': 'CP4', 'Cp6.': 'CP6', 'Fp1.': 'FP1', 'Fpz.': 'FPZ', 'Fp2.': 'FP2', 'Af7.': 'AF7',
                  'Af3.': 'AF3', 'Afz.': 'AFZ', 'Af4.': 'AF4', 'Af8.': 'AF8', 'F7..': 'F7', 'F5..': 'F5', 'F3..': 'F3',
                  'F1..': 'F1', 'Fz..': 'FZ', 'F2..': 'F2', 'F4..': 'F4', 'F6..': 'F6', 'F8..': 'F8', 'Ft7.': 'FT7',
                  'Ft8.': 'FT8', 'T7..': 'T7', 'T8..': 'T8', 'T9..': 'T9', 'T10.': 'T10', 'Tp7.': 'TP7', 'Tp8.': 'TP8',
                  'P7..': 'P7', 'P5..': 'P5', 'P3..': 'P3', 'P1..': 'P1', 'Pz..': 'PZ', 'P2..': 'P2', 'P4..': 'P4',
                  'P6..': 'P6', 'P8..': 'P8', 'Po7.': 'PO7', 'Po3.': 'PO3', 'Poz.': 'POZ', 'Po4.': 'PO4', 'Po8.': 'PO8',
                  'O1..': 'O1', 'Oz..': 'OZ', 'O2..': 'O2', 'Iz..': 'IZ'}


def get_physionet(subject: int):
    """
    :param subject: subject number, in [1, 109]
    :return: train_data, test_data with shape (-1, 2, 640), and the matching one-hot train_label, test_label
    """
    # loading from file
    for r in LR_fist_run:
        raw_new = mne.io.read_raw_edf(data_path + 'S%03d' % subject + 'R%02d.edf' % r, verbose='ERROR')
        if r == LR_fist_run[0]:
            raw_LR_fist = raw_new
        else:
            raw_LR_fist.append(raw_new)
    for r in fist_feet_run:
        raw_new = mne.io.read_raw_edf(data_path + 'S%03d' % subject + 'R%02d.edf' % r, verbose='ERROR')
        if r == fist_feet_run[0]:
            raw_fist_feet = raw_new
        else:
            raw_fist_feet.append(raw_new)

    raw_LR_fist.rename_channels(rename_mapping)
    raw_fist_feet.rename_channels(rename_mapping)
    ch_pick = ["FC1", "FC2", "FC3", "FC4", "C3", "C4", "C1", "C2",
               "CP1", "CP2", "CP3", "CP4"]

    # get the data and labels
    event_id_LR_fist = dict(T1=0, T2=1)
    events, _ = mne.events_from_annotations(raw_LR_fist, event_id_LR_fist, verbose='ERROR')
    epochs_LR_fist = mne.Epochs(raw_LR_fist, events, tmin=1 / 160, tmax=4, baseline=None, preload=True,
                                verbose='ERROR')
    event_id_fist_feet = dict(T1=2, T2=3)
    events, _ = mne.events_from_annotations(raw_fist_feet, event_id_fist_feet, verbose='ERROR')
    epochs_fist_feet = mne.Epochs(raw_fist_feet, events, tmin=1 / 160, tmax=4, baseline=None, preload=True,
                                  verbose='ERROR')
    data = np.concatenate((epochs_LR_fist.get_data(picks=ch_pick), epochs_fist_feet.get_data(picks=ch_pick)))
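    # StandardScaler scales columns, so each (channels, samples) trial is standardized per
    # time sample across channels; fitting on data[i].T would scale each channel over time instead.
    scaler = preprocessing.StandardScaler()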
    for i in range(len(data)):
        scaler.fit(data[i])
        data[i] = scaler.transform(data[i])
    labels = np.concatenate((epochs_LR_fist.events[:, 2], epochs_fist_feet.events[:, 2]))
    labels = to_categorical(labels)  # one-hot

    # split whole trials first, then cut each set into channel pairs, so no trial is shared
    train_data_ori, test_data_ori, train_label_ori, test_label_ori = train_test_split(data, labels, test_size=0.2,
                                                                                      random_state=42)
    train_data = np.empty((0, 2, train_data_ori.shape[2]))
    train_label = np.empty((0, 4))
    test_data = np.empty((0, 2, test_data_ori.shape[2]))
    test_label = np.empty((0, 4))
    for i in range(0, len(ch_pick), 2):
        train_data = np.concatenate((train_data, train_data_ori[:, i:i + 2, :]))
        test_data = np.concatenate((test_data, test_data_ori[:, i:i + 2, :]))
        train_label = np.concatenate((train_label, train_label_ori))
        test_label = np.concatenate((test_label, test_label_ori))
    print('data loaded.')
    return train_data, test_data, train_label, test_label


if __name__ == '__main__':
    res = get_physionet(1)
    for r in res:
        print(r.shape)
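Because train_test_split is applied to whole trials before the channel pairs are cut out, all pairs of a trial land in the same set, which is the property discussed in this issue. The pair-expansion loop can also be written without growing arrays inside the loop. `expand_pairs` below is a hypothetical helper (not part of the repository) that reproduces the loop's ordering: every trial's first channel pair, then every trial's second pair, and so on.

```python
import numpy as np


def expand_pairs(data, labels):
    """Cut (trials, channels, samples) into (trials * channels // 2, 2, samples).

    Output order matches the loop in get_physionet: every trial's first channel
    pair, then every trial's second pair, and so on. Labels are repeated to match.
    """
    n_channels = data.shape[1]
    pairs = np.concatenate([data[:, i:i + 2, :] for i in range(0, n_channels, 2)])
    repeated_labels = np.concatenate([labels] * (n_channels // 2))
    return pairs, repeated_labels
```

It would be called once per split, e.g. `train_data, train_label = expand_pairs(train_data_ori, train_label_ori)`, and likewise for the test arrays.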

ambitious-octopus commented on September 4, 2024

To test your assertion, "in line 52, spliting reshape_x may split channel-couples in one task into train_set and test_set at the same time, that may be cause the acc rise not for the Model cause.", I used the following script, which basically tests two things:

1. Whether there are duplicate instances in the entire reshaped dataset, in the train set, and in the validation/test set.
2. Whether there are instances present in both the train set and the test/valid set.

import sys
sys.path.append("/workspace")
import numpy as np
import tensorflow as tf
from data_processing.general_processor import Utils
from sklearn.model_selection import train_test_split
tf.autograph.set_verbosity(0)
physical_devices = tf.config.experimental.list_physical_devices('GPU')
print(physical_devices)
if physical_devices:  # set_memory_growth returns None and needs a GPU, so skip it on CPU-only machines
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

#Params
source_path = "/dataset/paper/"

# Load data
channels = Utils.combinations["a"]  # [["FC1", "FC2"], ["FC3", "FC4"], ["FC5", "FC6"]]

exclude = [38, 88, 89, 92, 100, 104]
subjects = [n for n in np.arange(1,110) if n not in exclude]
#Load data
x, y = Utils.load(channels, subjects, base_path=source_path)
#Transform y to one-hot-encoding
y_one_hot  = Utils.to_one_hot(y, by_sub=False)
#Reshape for scaling
reshaped_x = x.reshape(x.shape[0], x.shape[1] * x.shape[2])
#Grab a test set before SMOTE

def check_duplicate(element_list):
    # True if any row occurs more than once; tuples are hashable, so a set
    # replaces the quadratic list.count() scan.
    return len(set(map(tuple, element_list))) < len(element_list)

x_train_raw, x_valid_test_raw, y_train_raw, y_valid_test_raw = train_test_split(reshaped_x,
                                                                            y_one_hot,
                                                                            stratify=y_one_hot,
                                                                            test_size=0.20,
                                                                            random_state=42)
reshaped = reshaped_x.tolist()
x_train = x_train_raw.tolist()
x_valid = x_valid_test_raw.tolist()
print(check_duplicate(reshaped))
print(check_duplicate(x_train))
print(check_duplicate(x_valid))

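# Report every row that occurs in both the train set and the valid/test set.
train_rows, valid_rows = set(map(tuple, x_train)), set(map(tuple, x_valid))
for sample in map(tuple, reshaped):
    if sample in train_rows and sample in valid_rows:
        print("Problems")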

This simple script shows that no instance is present in both the train set and the test/valid set.
If this is not the answer you were expecting, please describe your problem in more detail.
Thanks
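This check passes even when the leak described in the issue is present, because the channel pairs of one trial are different arrays and are never exact duplicates of each other. A sketch on random placeholder data (assumed sizes) compares the script's row-identity criterion with a trial-identity criterion: the first finds no identical rows, while the second finds trials that contribute pairs to both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumed toy sizes: 50 trials, 3 channel pairs each, 160 samples per channel.
rng = np.random.default_rng(0)
n_trials, n_pairs = 50, 3
x = rng.standard_normal((n_trials, 2 * n_pairs, 160))

# Pair-level rows (flattened like reshaped_x) plus the trial each row came from.
pairs = np.concatenate([x[:, i:i + 2, :] for i in range(0, 2 * n_pairs, 2)])
rows = pairs.reshape(len(pairs), -1)
trial_id = np.tile(np.arange(n_trials), n_pairs)

x_tr, x_te, id_tr, id_te = train_test_split(rows, trial_id, test_size=0.2, random_state=42)

# The script's criterion: the same row appears in both sets.
identical_rows = {r.tobytes() for r in x_tr} & {r.tobytes() for r in x_te}
# The criterion raised in this issue: the same trial appears in both sets.
shared_trials = set(id_tr.tolist()) & set(id_te.tolist())

print(len(identical_rows), len(shared_trials))
```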

huangliuzhou commented on September 4, 2024

I have the same issue as you. Maybe we can learn from each other (weixin: laoyao_023).

ambitious-octopus commented on September 4, 2024

Forgive me for responding so late. I took some time to check your hypothesis and, unfortunately, it is correct! Thank you for finding this serious bug in the code. I am actively working to see if the same accuracy can be achieved after removing this bug. At the moment it looks difficult to reach the same accuracy, as the network is heavily overfitting. If it is not possible to solve this bug, I will personally contact the journal. I will post the resolution of this bug in the next few days. If you are working on this and want to share some thoughts, please don't hesitate. Thanks again for the support.

ambitious-octopus commented on September 4, 2024

Thank you so much for pointing out this error. 🥇
I'm reopening the issue because it might be helpful to others.
I will post the fix shortly.

DrugLover commented on September 4, 2024

I tried splitting the train/test/valid sets before concatenating channels, and my accuracy on the BCI IV 2a dataset turned out to be 24.6%, about the 25% chance level for its four classes, which means the network didn't work at all... I read your paper and, from your references, "A_Simplified_CNN_Classification_Method_for_MI-EEG". The author's accuracy is about 97%; maybe she made the same mistake?

ambitious-octopus commented on September 4, 2024

I tried to replicate the code from "A Simplified CNN Classification Method for MI-EEG". Unfortunately, I was unable to achieve the accuracy they claim. Be careful: the problem is not the reshaping of the data, the problem is in the generation. Give me a few days and I'll post the fix below.

yosider commented on September 4, 2024

@Kubasinska
Hello,

Be careful the problem is not reshaping the data. The problem is in the generation.

Could you please explain the bug in the generation process?
Thanks!
