
redbud-tree-depression's People

Contributors

talhanai


redbud-tree-depression's Issues

Question about X_{train, dev} shape and content

Hi @talhanai,

The input tensor to the LSTM model has shape [Nexamples, Ntimesteps, Nfeatures]. Nfeatures is the feature dimension (audio = 279, text = 100), but how do I make sense of Nexamples and Ntimesteps? I am guessing they relate to the timestep and stride parameters mentioned in the paper (audio: 30 timesteps, stride 1; text: 7 timesteps, stride 3). I would appreciate it if you could elaborate on how you used the (timestep, stride) parameters to reshape your feature set into the LSTM input tensor.

Initially, I thought Nexamples referred to the number of responses, e.g. 8,050 (from Sections 4.1.2 and 4.1.3), and thus that the number of examples for audio and for text should be the same. But then this part of Section 4.3.2 confused me:

"The audio and text inputs for each LSTM branch had different strides and timesteps yielding a different number of training (and development) examples, therefore we needed to equalize the number of examples (Audio was 30 timesteps, with stride 1. Text was 7 timesteps, and stride 3). This step was performed by padding the number of training examples in the smaller set (text) to match that larger set (audio) by mapping examples together that appeared in the same window of the interview."
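
For concreteness, here is the windowing I currently imagine; the function and the exact scheme are my guess, not necessarily what the paper does:

import numpy as np

def window_features(features, timesteps, stride):
    """Slice a [Nframes, Nfeatures] matrix into overlapping windows.

    Guessed scheme: each window of `timesteps` consecutive rows becomes
    one example, and windows start every `stride` rows, so
    Nexamples = (Nframes - timesteps) // stride + 1.
    """
    n_frames, n_feats = features.shape
    n_examples = (n_frames - timesteps) // stride + 1
    out = np.empty((n_examples, timesteps, n_feats), dtype=features.dtype)
    for i in range(n_examples):
        start = i * stride
        out[i] = features[start:start + timesteps]
    return out

# audio: 30 timesteps, stride 1; text: 7 timesteps, stride 3
audio = window_features(np.random.rand(100, 279), timesteps=30, stride=1)
text  = window_features(np.random.rand(100, 100), timesteps=7,  stride=3)
print(audio.shape, text.shape)  # (71, 30, 279) (32, 7, 100)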

Thanks in advance!

data/audio/x_train.npy

Hello, is there any code that can generate the training data (e.g. data/audio/x_train.npy), or do we need to write our own code based on the paper?

How to exclude some features?

"From the initial set of 553 features, we excluded all features without a statistically significant univariate correlation with outcomes on the training set (|ρ| < 1e-01, p > 1e-02) nor a significant L1 regularized logistic regression model coefficient (|β| < 1e-04), thus resulting in a subset of 279 features and 8,050 examples (responses)"

How do you exclude features to arrive at the subset of 279 features?
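
Here is how I understood the two filters. The thresholds come from the quoted passage, but everything else (Spearman ρ, scikit-learn, and the either-or reading of "without ... nor ...") is my own guess:

import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression

def select_features(X, y):
    """Guessed reading: drop a feature only if it fails BOTH tests,
    i.e. keep it if |rho| >= 1e-1 with p <= 1e-2, OR |beta| >= 1e-4."""
    rho = np.empty(X.shape[1])
    pval = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        rho[j], pval[j] = spearmanr(X[:, j], y)
    univariate_ok = (np.abs(rho) >= 1e-1) & (pval <= 1e-2)

    # L1-regularized logistic regression over all features
    l1 = LogisticRegression(penalty='l1', solver='liblinear')
    l1.fit(X, y)
    l1_ok = np.abs(l1.coef_.ravel()) >= 1e-4

    keep = univariate_ok | l1_ok
    return X[:, keep], keep

# X: [8050 examples, 553 features], y: binary labels
# X_sub, mask = select_features(X, y)  # should ideally leave 279 features

Is this what you did?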

Feature processing

How do you deal with the audio files containing the interviewer's voice? How do you remove the interviewer's voice? And how do you extract the higher-order statistical features from the 79 COVAREP features?
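
On the last point, this is the kind of per-response summary I currently have in mind; the exact set of statistics is my assumption:

import numpy as np
from scipy.stats import skew, kurtosis

def summarize_covarep(frames):
    """Collapse a [Nframes, 79] COVAREP matrix into one vector.

    Guessed statistics: mean, std, skewness, kurtosis, min, max per
    dimension, giving 79 * 6 = 474 values; the paper may use others.
    """
    return np.concatenate([frames.mean(axis=0),
                           frames.std(axis=0),
                           skew(frames, axis=0),
                           kurtosis(frames, axis=0),
                           frames.min(axis=0),
                           frames.max(axis=0)])

# vec = summarize_covarep(covarep_frames)  # covarep_frames: [Nframes, 79]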

Question about validation and test data

Hi @talhanai,

I hope you can help me out with a question about your trainLSTM.py code.

# train model
model.fit(X_train, Y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(X_dev, Y_dev),
          class_weight=cweight,
          callbacks=callbacks_list)

# load best model and evaluate
model.load_weights(filepath=filepath_best)

# gotta compile it
model.compile(loss=loss,
              optimizer=sgd,
              metrics=['accuracy'])

# return predictions of best model
pred       = model.predict(X_dev,   batch_size=None, verbose=0, steps=None)
pred_train = model.predict(X_train, batch_size=None, verbose=0, steps=None)

return pred, pred_train

and then, in the calling code:

# 5. evaluate performance
f1 = metrics.f1_score(Y_dev, np.round(pred), pos_label=1)

In particular, I am having trouble understanding why you use X_dev and Y_dev as both the validation data and the test data. Using them for both validation and testing would result in data leakage.

From reading your paper, I understand that you worked only with the training and development sets of the DAIC dataset. So here I am assuming that X_train and Y_train come from the training set, and X_dev and Y_dev from the development set.
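
If I understand correctly, the DAIC test labels were withheld for the AVEC challenge, which would explain reporting on the development set; still, a leakage-free protocol would look something like the following sketch (it reuses the variable names from your snippet, and the 80/20 split is arbitrary):

from sklearn.model_selection import train_test_split

# carve a validation split out of the training set so the development
# set stays untouched until the final evaluation
X_tr, X_val, Y_tr, Y_val = train_test_split(
    X_train, Y_train, test_size=0.2, stratify=Y_train, random_state=0)

model.fit(X_tr, Y_tr,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(X_val, Y_val),  # drives checkpointing only
          class_weight=cweight,
          callbacks=callbacks_list)

# the development set is now a genuine held-out test set
pred = model.predict(X_dev)
f1 = metrics.f1_score(Y_dev, np.round(pred), pos_label=1)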

Any insights would be very much appreciated!

A problem with accuracy

Hi, sorry to bother you. I tried to reproduce the experimental results of your work, but I ran into a problem. Without any changes to your code, the predictions for train and validation are all negative samples:

  • dev_pred: [[0.30978903], ........, [0.30978903]]
  • pred_train_audio: [[0.30978903], ........, [0.30978903]]

In particular, during training: 1) the loss decreases from 0.6555 to 0.5990; 2) the accuracy is stuck at 0.7143, which is the ratio of negative samples to all samples.
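
That 0.7143 matches the majority-class baseline, so the network seems to have collapsed to always predicting the negative class. A quick sanity check would be (variable names are mine; Y_train holds the binary training labels):

import numpy as np

# accuracy obtained by always predicting the majority class
baseline = max(np.mean(Y_train == 0), np.mean(Y_train == 1))
print('majority-class baseline:', baseline)  # 0.7143 here

# inverse-frequency class weights, one common way to fight the collapse
n = len(Y_train)
cweight = {0: n / (2 * np.sum(Y_train == 0)),
           1: n / (2 * np.sum(Y_train == 1))}
print('class weights:', cweight)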

Can you help me solve this problem? Thank you so much!

How to calculate the MAE and RMSE metrics?

Hi, I have read your paper. I am wondering why the MAE and RMSE values in your experiments are greater than 1. When I use the MAE metric in Keras to train my model, its value stays between 0 and 1. Could you tell me how you calculated the metrics in your paper?
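
My guess (unconfirmed) is that the paper computes MAE and RMSE on the raw PHQ score scale (roughly 0 to 24) rather than on normalized network outputs, which would explain values above 1; Keras reports MAE in whatever units the targets are in, so with 0/1 labels it stays below 1. The metrics themselves are straightforward:

import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# hypothetical PHQ scores: on a 0-24 scale, errors above 1 are expected
y_true = np.array([5.0, 12.0, 0.0, 18.0])
y_pred = np.array([7.0, 9.5, 2.0, 15.0])
print(mae(y_true, y_pred), rmse(y_true, y_pred))  # 2.375 ~2.41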

How to get 8050 training examples (subject responses)?

I really appreciate that you published your code!

I am currently trying to replicate your feature generation process. Could you please elaborate on how you narrowed the training set down to the 8,050 examples? My understanding is that they are only the subject's responses to Ellie's queries, but I am having difficulty arriving at the exact number of training examples that you report.
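
For reference, this is how I currently count participant responses from the DAIC transcripts (the file layout and column names are what I remember from the distributed TRANSCRIPT.csv files, and merging adjacent participant turns, or dropping some sessions, could easily account for the mismatch):

import glob
import pandas as pd

total = 0
for path in glob.glob('*/*_TRANSCRIPT.csv'):  # one transcript per interview
    df = pd.read_csv(path, sep='\t')
    # keep only the subject's turns, i.e. rows not spoken by Ellie
    responses = df[df['speaker'].str.strip().str.lower() == 'participant']
    total += len(responses)
print('participant responses:', total)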

Thanks in advance!
