neuspeech1's Introduction

Hi there 👋 I'm Yiqian Yang, and I run the NeuSpeech organisation.

  • 🔭 I'm currently working on different types of neural signals and a unified neural model. I am really interested in this.
  • 👯 I'm looking to collaborate on fine-grained MEG-to-speech and a unified neural model.
  • 📫 How to reach me: [email protected]
  • ⚡ Fun fact: my cat can do a back-flip.

Collaborators now:

Yiqun Duan (duanyiqun)

Hyejeong Jo (girlsending0)

Qiang Zhang (jonyzhang2023)

neuspeech1's Issues

I have some questions

Hi,
I am a researcher studying EEG-to-text. I recently read your NeuSpeech paper. I was impressed by it, and it has been a great help for my research direction. Thank you. I do have a few questions.

Have you ever applied your model to ZuCo data? Did it perform poorly on EEG data?
I have also been trying various things with ZuCo data, but I am having a lot of trouble because I am not getting any meaningful results.

Could I possibly get your baseline code? I would like to experiment with MEG data using your dataloader and code. I look forward to a positive response.
Thank you.

I am very happy to be doing the same research as you. 😄😄

Some questions about the data split

First, thank you for sharing your experiments and code on brain signals.
Please note that my English is not very good, so some of my sentences may be incorrect.

We ran a replication experiment based on the provided code and achieved results similar to the performance reported in the paper.
We could not get Schoffelen's data, so we only used GWilliams.

While analyzing the experimental results, we found that most predicted (generated) sentences either match the reference on every word or get every word wrong.
My background is in natural language processing. As far as I know, a generation model often gets only some of the words in a sentence wrong.
However, in my experiments with the provided code, such cases are very rare.
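
The all-or-nothing pattern described above can be quantified with a small helper. This is only a sketch, not part of the provided code: the function names `match_category` and `summarize` are hypothetical, the inputs are assumed to be plain (reference, prediction) string pairs, and words are compared position by position after lowercasing, which is a simplification.

```python
from collections import Counter


def match_category(reference: str, prediction: str) -> str:
    """Label one prediction as 'all_correct', 'all_wrong', or 'partial'."""
    ref_words = reference.lower().split()
    pred_words = prediction.lower().split()
    if ref_words == pred_words:
        return "all_correct"
    # Count positions where the predicted word equals the reference word.
    hits = sum(r == p for r, p in zip(ref_words, pred_words))
    return "all_wrong" if hits == 0 else "partial"


def summarize(pairs):
    """Count how many (reference, prediction) pairs fall into each category."""
    return Counter(match_category(ref, pred) for ref, pred in pairs)
```

A very small "partial" count relative to the other two categories would correspond to the behaviour reported here.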

We analyzed the data and found that the training and evaluation data contain the same sentences.
The training data has 23339 samples in total, but only 661 unique sentences.
Similarly, the evaluation data has 651 unique sentences out of 2918 samples.

Moreover, all 651 unique sentences in the evaluation data also appear in the training data.
Every MEG path is unique, and none is shared between the training data and the test data.
This is probably a result of how the data were generated: MEG data from multiple people were produced for the same sentences.

We believe that this data split makes an accurate evaluation difficult.
A pre-trained Whisper model can learn patterns in word sequences.
Therefore, in this setting, if the pre-trained Whisper model guesses the first word correctly, it can easily predict all subsequent words.
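
One possible way to avoid this overlap is to split by sentence rather than by sample, so that no sentence appears in more than one split. The following is a minimal sketch under stated assumptions, not the split used in the paper: `split_by_sentence` is a hypothetical helper, the fractions and seed are arbitrary, and each record is assumed to be a dict with a "sentence" key, as in the jsonl files.

```python
import random


def split_by_sentence(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Assign every record of a given sentence to exactly one split."""
    # Sort first so the shuffle is reproducible for a fixed seed.
    sentences = sorted({r["sentence"] for r in records})
    random.Random(seed).shuffle(sentences)

    n_test = max(1, round(len(sentences) * test_frac))
    n_val = max(1, round(len(sentences) * val_frac))
    test_sentences = set(sentences[:n_test])
    val_sentences = set(sentences[n_test:n_test + n_val])

    splits = {"train": [], "val": [], "test": []}
    for record in records:
        sentence = record["sentence"]
        if sentence in test_sentences:
            splits["test"].append(record)
        elif sentence in val_sentences:
            splits["val"].append(record)
        else:
            splits["train"].append(record)
    return splits
```

With a split like this, a test sentence is never seen during training, so the model cannot complete a sentence from memory after guessing its first word.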

Our simple data analysis code is shown below.
Also, we were unable to obtain the Schoffelen data. Could you tell us where we can download it?

import jsonlines

train_data_path = "{data_path}/preprocess5/split1/train.jsonl"
val_data_path = "{data_path}/preprocess5/split1/val.jsonl"
test_data_path = "{data_path}/preprocess5/split1/test.jsonl"


train_data_sent = []
train_data_meg_path = []
with jsonlines.open(train_data_path, mode='r') as reader:
    for json_obj in reader:
        train_data_sent.append(json_obj["sentence"])
        train_data_meg_path.append(json_obj["eeg"]["path"])

val_data_sent = []
val_data_meg_path = []
with jsonlines.open(val_data_path, mode='r') as reader:
    for json_obj in reader:
        val_data_sent.append(json_obj["sentence"])
        val_data_meg_path.append(json_obj["eeg"]["path"])
        
test_data_sent = []
test_data_meg_path = []
with jsonlines.open(test_data_path, mode='r') as reader:
    for json_obj in reader:
        test_data_sent.append(json_obj["sentence"])
        test_data_meg_path.append(json_obj["eeg"]["path"])

print("counting unique elements")
print("train")
print("sentence", len(train_data_sent))
print("unique_sentence", len(set(train_data_sent)))
print("meg", len(train_data_meg_path))
print("unique_meg", len(set(train_data_meg_path)))
print()
print("val")
print("sentence", len(val_data_sent))
print("unique_sentence", len(set(val_data_sent)))
print("meg", len(val_data_meg_path))
print("unique_meg", len(set(val_data_meg_path)))
print()
print("sentence", len(test_data_sent))
print("unique_sentence", len(set(test_data_sent)))
print("meg", len(test_data_meg_path))
print("unique_meg", len(set(test_data_meg_path)))
print()

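# Count test samples whose sentence or MEG path also appears in the training data.
train_sent_set = set(train_data_sent)
train_meg_set = set(train_data_meg_path)
same_sent = 0
same_meg = 0
for i, j in zip(test_data_sent, test_data_meg_path):
    # Compare against the training sentences, not the test list itself.
    if i in train_sent_set:
        same_sent += 1
    if j in train_meg_set:
        same_meg += 1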

print("number of test_data", len(test_data_sent))
print("counting of setence in train-data", same_sent)
print("counting of meg in train-data", same_meg)

result

counting unique elements
train
sentence 23339
unique_sentence 661
meg 23339
unique_meg 23339

val
sentence 2917
unique_sentence 647
meg 2917
unique_meg 2917

sentence 2918
unique_sentence 651
meg 2918
unique_meg 2918

number of test_data 2918
counting of setence in train-data 2918
counting of meg in train-data 0
