unimse's Issues
How do you set up the training and validation sets?
I noticed that you claim:
We integrate the training sets of MOSI,MOSEI, MELD, IEMOCAP to train the model and valid sets to select hyperparameters.
Did you merge all the datasets into a single training set, and did you evaluate the same model weights on all four benchmarks?
And did you also merge the validation sets to select a single best model for all four datasets?
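Environment setup problem
Hello, I set up my local environment following the configuration you provided. When the model reaches
if torch.cuda.is_available():
model = self.model.to(DEVICE)
it raises RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM. Running the model on the CPU works fine. Have you run into anything similar before?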
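About the Adapter layer (multimodal fusion layer)
Hello, what is the reason for embedding the multimodal fusion layer in both the encoder and the decoder in the paper? If the multimodal layer is added only to the encoder, does it perform worse than adding it to both the encoder and the decoder?
Thanks!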
Question about the paper's results
Are the results on the MOSI dataset in the paper averaged over five random seeds?
Model
Thank you for your work. Would it be possible for you to release the final trained model?
When will you open-source your code? I'm interested in it!
Feature dimension mismatch
In all your shared feature files (like meld_data_0610.pkl), the feature of a single utterance is two-dimensional, which means the features of a dialogue are three-dimensional and a batch results in a four-dimensional tensor.
However, in the model's input part, I noticed that visual and acoustic features are directly inputted into the RNN model, which means the batch-level visual and acoustic features are three-dimensional tensors. I did not find dimensionality reduction code in this repository. Could you please explain or update the code accordingly?
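One possible way to reconcile the shapes described above is to fold the dialogue axis into the batch axis before the RNN. The sketch below only illustrates that reshaping on nested Python lists; the helper names `fold_dialogues` and `shape` and the toy sizes are hypothetical, and the repository may handle this step differently or elsewhere.

```python
from typing import List

Utterance = List[List[float]]  # (seq_len, feat_dim)
Dialogue = List[Utterance]     # (num_utterances, seq_len, feat_dim)


def fold_dialogues(batch: List[Dialogue]) -> List[Utterance]:
    """Fold (batch, num_utterances, seq_len, feat_dim) into
    (batch * num_utterances, seq_len, feat_dim)."""
    return [utterance for dialogue in batch for utterance in dialogue]


def shape(nested) -> tuple:
    """Return the shape of a regular (non-ragged, non-empty) nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Toy batch (assumed sizes): 2 dialogues, 3 utterances each,
# 4 time steps per utterance, 5 features per step.
batch = [[[[0.0] * 5 for _ in range(4)] for _ in range(3)] for _ in range(2)]
folded = fold_dialogues(batch)
print(shape(batch), "->", shape(folded))
```

With real tensors the same idea is usually a single reshape, but padding of dialogues with different numbers of utterances would have to be handled first.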
Why is the multimodal fusion layer also used in the decoder?
Optimal parameters
Hello, can you provide the optimal parameter list or the config.py settings? I don't know whether the best results were obtained with the adapter configuration or with full fine-tuning of T5, and I can't find this information in the paper.
Questions about model performance reproduction
Dear Author, I am trying to reproduce your results, but my current run does not reach the reported performance. Here is what I have done:
- I set use_adapter, info_nce, use_cl, use_prefix_p, prefix_projection to True in the config and left the rest unchanged.
- I am using the dataset as shown in the picture. I did not run preprocess.py on the MOSI and MOSEI datasets because I did not find the corresponding train/dev/test csv.
- I have only run 50 epochs so far.
I think this is because I didn't run enough epochs or my dataset processing is wrong. I wonder if you can provide some suggestions? Much appreciated!
Are only two modalities, audio and video, ultimately fed into the T5 model? Looking at your dataloader, the text item passed to the model is an empty value.
How would this apply to a Chinese dataset?
Hello, I am very interested in this project.
1) When using T5, do you initialize it with the pre-trained weights and then fine-tune them during multimodal training, or are the T5 parameters frozen throughout?
2) If I want to apply it to a Chinese dataset, can I use mT5 instead of T5 for training?
3) How long did the training in the paper take?
Thank you very much!
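Using the ABSA datasets
I see that Sim_Process_v3 uses the ABSA Restaurants and Laptops datasets, but they are not used further in the paper or in the later code, and I could not find these two dataset files in the dataset links shared in the README. Can the related code simply be deleted, or could you explain how to download and use them? Thanks.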
Sharing the features via Google Drive
Hi again @LeMei, I see that you have made the features available via Baidu and was wondering whether you could also share the data via Google Drive, since I'm having issues with Baidu services.
How do I deal with this piece of code?
'../datasets/Laptops/all_convert.json' not found
Files missing in main.py
When dataset is moseld, i.e. when the three datasets mosi, mosei and meld are used, how are files such as new_moseld_train_align_0424.pkl and new_moseld_dev_align_0424.pkl generated?
No such file or directory: 't5-large/pytorch_model.bin'
Do I need to download the pytorch_model.bin file myself, and if so, from where?
Can you provide code that can be easily reproduced?
Has anyone successfully reproduced it?
Has anyone successfully reproduced it?
Reproduction in 2023
Hello,
I tried to run this repo and ran into some issues.
How to start
- Datasets
I downloaded the files for MOSI from the links provided in the README.
In MOSI.zip I found the already created U-labelsnew_MOSI-label-v3.csv
I changed the paths in the config to match my paths/to/data
(csv, mosi.pkl, etc.)
- T5
Each ../t5-base
I renamed to t5-base,
since this can be fetched from the Hugging Face repo (automatically).
For pytorch_model.bin, I opened the corresponding T5 Hugging Face repo and downloaded the .bin file.
Changed path/to/bin
- You can also clone the Hugging Face repo
So after this I ran
python main.py --dataset=mosi --multi=False
And...
in file data_loader.py
line 423
encoding = tokenizer( [task_prefix + sequence for sequence in inputs_seq], return_tensors="pt", padding=True )
this code raises an error because task_prefix + sequence
tries to do str + list[str]. To fix this I used
task_prefix + ' '.join(sequence)
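A minimal sketch of the failure and the workaround described above, without the tokenizer. The prefix string and the toy token lists are made-up values for illustration; only the `task_prefix + sequence` expression and the `' '.join(sequence)` fix come from the report.

```python
task_prefix = "sentiment: "  # hypothetical prefix, not taken from the repo
inputs_seq = [["hello", "world"], ["good", "movie"]]  # each item is list[str]

# The original expression fails because str + list is not defined.
error_message = None
try:
    [task_prefix + sequence for sequence in inputs_seq]
except TypeError as err:
    error_message = str(err)

# Workaround from the report: join the tokens into one string first.
prompts = [task_prefix + " ".join(sequence) for sequence in inputs_seq]
print(error_message)
print(prompts)
```

Whether joining with a single space matches what the author intended for `inputs_seq` is an assumption; if each item was meant to be a plain string, the underlying issue may be in how the data was pickled.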
next....
in modeling_t5_prefix.py
line 1848
else: input_ids = self._prepare_decoder_input_ids_for_generation( input_ids, decoder_start_token_id=decoder_start_token_id, bos_token_id=bos_token_id )
an error occurs because, as far as I can tell, we pass a tensor as the batch, while the function expects a batch size and returns tensor.ones((batch_size, 1)........)
and since input_ids
is passed here in place of batch_size as a tensor, I tried to fix it with input_ids.shape[1]
but then the following lines start to fail
line 1900
logits_processor = self._get_logits_processor()
expects the optional argument logits_processor,
which I set to None,
and I finally ended up with this failure:
File "modules/modeling_t5_prefix.py", line 524, in forward
scores += position_bias
RuntimeError: The size of tensor a (32) must match the size of tensor b (55) at non-singleton dimension 3
I stopped here because I don't want to fix this any further. I assume the code needs refactoring by the author.
I could not install transformers==4.14.5 because that version does not exist, so I am using transformers==4.16.0.
What does each part of the final pickle dataset represent?
Ran into a problem
Missing variable definition
meld_data_0610.pkl has only length 2
Hi, nice work!
I want to reproduce your work but I have some issues in reproducing.
I got meld_data_0610.pkl from the link https://drive.google.com/file/d/1pWH2xPVZFymxeJUrd6gF37qYbvmhh32s/view?usp=sharing in your readme
I tried to execute simcse/Sim_process_v3.py but I do not have Laptops directory.
So I tried to execute preprocess.py and create_dataset.py instead.
However, preprocess.py raised an error:
Traceback (most recent call last):
  File "src/preprocess.py", line 44, in <module>
    train_emotion_f, dev_emotion_f, test_emotion_f = emotion_features[0], emotion_features[1], emotion_features[2]
KeyError: 0
(I have changed all string '0424' into '0610')
so I checked the length of emotion_features and it is 2
What should I do in this case?
Thank you.
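A KeyError (rather than an IndexError) on `emotion_features[0]` indicates that the loaded object is a mapping with no key `0`. The sketch below shows one way to inspect such an object before indexing it. The stand-in dictionary and its key names `split_a` and `split_b` are invented for illustration; the real structure of meld_data_0610.pkl is not known here.

```python
import pickle


def describe(obj):
    """Summarise the top-level layout of a loaded pickle object."""
    if isinstance(obj, dict):
        return {key: type(value).__name__ for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [type(item).__name__ for item in obj]
    return type(obj).__name__


# Stand-in for pickle.load(open("meld_data_0610.pkl", "rb")).
# The keys are hypothetical placeholders, not the real ones.
emotion_features = pickle.loads(pickle.dumps({"split_a": [1, 2], "split_b": [3]}))

summary = describe(emotion_features)
print(len(emotion_features), summary)
```

Printing the summary for the real file shows which keys to use in place of 0, 1 and 2, assuming the splits are stored under those keys at all.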
Error when running evaluation function for MELD and IEMOCAP
Hi @LeMei, I am encountering problems when attempting to run the evaluation functions for the IEMOCAP and MELD datasets when running the model for MOSELDMP. Specifically when using the "eval_emotionlines(meld_results, meld_truths)" function. Hope to hear from you soon.
Environment setup problem
Hello,
Your README mentions both pytorch1.7 and tensorflow-gpu. Do both need to be installed, or only one of them? Can the TensorFlow version be 2.0?
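About the training loss
Hello,
(1) In the training loss, apart from the two contrastive losses, does L(task) for the generation task refer only to the cross-entropy loss (i.e. torch.nn.CrossEntropyLoss)?
(2) The original T5 paper mentions a BERT-style masking loss similar to SpanBERT. Is this objective used in the paper, or is only the seq2seq objective used?
Thank you very much!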
Running main.py as described in the README produces the error below. Could you help? Thanks.
FileNotFoundError: [Errno 2] No such file or directory: '/home/dwh/unimse/datasets/MOSELDMP/new_moseldmp_train_align_v4_0424_a_6c_contexts.pkl'
It says the file new_moseldmp_train_align_v4_0424_a_6c_contexts.pkl does not exist, and I could not find the code that generates it.
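Before launching main.py, it can help to list which expected data files are actually missing. This is a small sketch using only the standard library; the helper name `missing_files` and the relative directory `datasets/MOSELDMP` are assumptions, and the single file name is the one from the error above.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(data_dir: str, names: Iterable[str]) -> List[str]:
    """Return the names that do not exist as regular files under data_dir."""
    root = Path(data_dir)
    return [name for name in names if not (root / name).is_file()]


# File name taken from the FileNotFoundError; the directory is an assumption.
expected = ["new_moseldmp_train_align_v4_0424_a_6c_contexts.pkl"]
print(missing_files("datasets/MOSELDMP", expected))
```

This only reports what is absent. It does not generate the file, and the script that produces it is not identified in this thread.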
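Hello, do the extracted visual/audio features include context, or only the current utterance?
Can the visual and audio parts of the MOSEI file mosei_data_0610.pkl be restored to the corresponding images and audio files? Is there a conversion script for this?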
How to use these files
KeyError: 0 in preprocess
Help: missing files and datasets.
"new_train_por_v4_0610_6c_sep_contexts.pkl"
"new_dev_por_v4_0610_6c_sep_contexts.pkl"
"new_test_por_v4_0610_6c_sep_contexts_v.pkl"
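None of these can be found. It is unclear which files main.py uses or how config.py works, and I could not find the MOSELDMP dataset. Could you give step-by-step instructions for running the code on the IEMOCAP dataset?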
Reproduction
Hello,
Can you provide a complete and reproducible code for researchers? Thanks!
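Where can Laptops be downloaded?
Hello, thank you for contributing your code.
I ran into a problem when running Sim_Process_v3.py;
it reports that Laptops/all_convert.json cannot be found.
Where can I download this file? Thank you for your help.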
When will the paper be released?
Link to features is expired
Hi, I wanted to access the features for the different datasets, but the link says it is expired. Is it possible to make them available again? Thank you.
I printed the keys of the pickle and there are no keys 0, 1, 2. Could you explain? Thanks.
The ABSA dataset is not provided
Help: how do I fix this key 0 error? Thanks!
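How are the files iemocap_data_0610.pkl, meld_data_0610.pkl, mosei_data_0610.pkl and mosi_data_0610.pkl generated?
Hello author:
How were the four files iemocap_data_0610.pkl, meld_data_0610.pkl, mosei_data_0610.pkl and mosi_data_0610.pkl generated? I could not find where they are generated in the code. Could you provide the code that generates them? Thanks a lot.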
Does modules/generation_utils.p need to overwrite transformers/generation_utils.py?
In files such as modules/modeling_t5.py and modeling_t5_prefix.py, you rewrote the function _prepare_encoder_decoder_kwargs_for_generation (called via self._prepare_encoder_decoder_kwargs_for_generation), but the line input_ids = self._prepare_decoder_input_ids_for_generation(input_ids, decoder_start_token_id=decoder_start_token_id, bos_token_id=bos_token_id) still calls _prepare_decoder_input_ids_for_generation from transformers/generation_utils.py [note: its first argument should be the batch size, an int]. I see that in modules/generation_utils.py you changed the first argument of this function to a LongTensor [note: matching the type of input_ids], but modules/modeling_t5.py does not use modules/generation_utils.py and instead uses the default transformers/generation_utils.py.
As a result, running the code fails because the argument type of the called function does not match. How should I modify it to make it work?
Thank you very much!
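The mismatch described above comes down to which `_prepare_decoder_input_ids_for_generation` the model class resolves. One possible adaptation is to override the method on the model subclass so that it accepts `input_ids` and derives the batch size itself. The sketch below uses toy stand-in classes, not the real `transformers` classes: `LibraryGenerationMixin` and `PatchedModel` are hypothetical names, and a list of lists stands in for a tensor.

```python
class LibraryGenerationMixin:
    """Toy stand-in for the library mixin: expects an int batch size."""

    def _prepare_decoder_input_ids_for_generation(self, batch_size, decoder_start_token_id):
        if not isinstance(batch_size, int):
            raise TypeError("batch_size must be an int")
        # One start token per sequence in the batch.
        return [[decoder_start_token_id] for _ in range(batch_size)]

    def generate(self, input_ids, decoder_start_token_id=0):
        # Mirrors the call quoted in the issue: input_ids is passed first.
        return self._prepare_decoder_input_ids_for_generation(
            input_ids, decoder_start_token_id=decoder_start_token_id
        )


class PatchedModel(LibraryGenerationMixin):
    """Overrides the helper so it accepts input_ids instead of an int."""

    def _prepare_decoder_input_ids_for_generation(self, input_ids, decoder_start_token_id):
        # For a real tensor of shape (batch, seq_len) this would be input_ids.shape[0].
        batch_size = len(input_ids)
        return super()._prepare_decoder_input_ids_for_generation(
            batch_size, decoder_start_token_id=decoder_start_token_id
        )


input_ids = [[5, 6, 7], [8, 9, 10]]  # toy batch of 2 sequences
decoder_input_ids = PatchedModel().generate(input_ids)
print(decoder_input_ids)
```

The other route, pinning the `transformers` version whose signature the repository was written against, avoids the override entirely, but which version that is would need to be confirmed by the author.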
File question
Hello, may I ask which file generates these .json files?
Unable to find '../t5-base/pytorch_model.bin'
Hi again @LeMei,
I'm getting this error when I run the main.py. I don't see any t5-base folder in the repository. Where can I find this file?
How do I reproduce this? I followed the README and got an error:
OSError: Can't load config for 'princeton-nlp/sup-simcse-bert-base-uncased'. Make sure that:
- 'princeton-nlp/sup-simcse-bert-base-uncased' is a correct model identifier listed on 'https://huggingface.co/models'
  (make sure 'princeton-nlp/sup-simcse-bert-base-uncased' is not a path to a local directory with something else, in that case)
- or 'princeton-nlp/sup-simcse-bert-base-uncased' is the correct path to a directory containing a config.json file
The error is shown above.
How can I fix it?
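Do I need to download from both the Baidu and Google Drive links?
Are the contents of the two links the same? Why do the files have the same names but different sizes?
Thank you very much for your answer.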
FileNotFoundError: [Errno 2] No such file or directory: PycharmProjects\\pythonProject4\\UniMSE-main\\datasets\\MOSELDMP/new_moseldmp_train_align_v4_0424_a_6c_contexts.pkl' Start loading the data....
How do I reproduce the numbers?
Hi,
Could you please add more documentation and steps to reproduce the numbers you have reported in the paper? At the moment, based just on the code it's not clear what scripts to run first. It seems like a few essential details are missing.