florentf9 / deeptemporalclustering

:chart_with_upwards_trend: Keras implementation of the Deep Temporal Clustering (DTC) model

License: MIT License

Python 100.00%

clustering deep-learning dtc time-series

deeptemporalclustering's Issues

how to load model.h5

Hi Florent, I've got some doc vectors and tried to fit the model.
Now I have some trained models (named like 'model_40.h5'). I want to inspect the clustering details or predict clusters for new doc vectors. How can I load these DTC models?
Many thanks.

Practical Use

While the theory is interesting and has some application, using what you've made is difficult. I would like to see a usage example.

Loss interpretation

Hi,
Isn't the order of the losses loss[0] = reconstruction loss, loss[1] = clustering loss and loss[2] = heatmap loss, judging by the loss argument of the compile function?

The order below in the code looks incorrect. Please confirm.
logdict['L'] = loss[0]
logdict['Lr'] = loss[1]
logdict['Lc'] = loss[2]
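
For context, when a Keras model has several outputs, train_on_batch and evaluate return a list whose first entry is the total (weighted) loss, followed by one loss per output in output order. The plain-Python sketch below only imitates that ordering. The numbers, the loss weights and the helper name keras_style_losses are made up for illustration, and it assumes the reconstruction output comes before the clustering output.

```python
def keras_style_losses(per_output_losses, loss_weights):
    """Imitate the list Keras returns for a multi-output model:
    [total_loss, output_1_loss, output_2_loss, ...]."""
    total = sum(w * l for w, l in zip(loss_weights, per_output_losses))
    return [total] + list(per_output_losses)


# Hypothetical values: reconstruction loss first, clustering loss second.
loss = keras_style_losses(per_output_losses=[0.5, 0.25], loss_weights=[1.0, 1.0])

# Under that assumption, index 0 is the total and the per-output losses follow.
logdict = {"L": loss[0], "Lr": loss[1], "Lc": loss[2]}
print(logdict)
```

Under that assumption the quoted mapping would be consistent, with L being the total loss rather than one of the individual terms.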

Heatmap issue

I am getting the heatmap below for the data window above, where there is no event, yet the heatmap value is close to 15.

When there is an event in the data window, the heatmap threshold is 15, which is the same as the threshold above when there is no event in the data window.

Any suggestions for generating the heatmap graph?

CuDNNLSTM not found

Hey,

I'm using the newest versions of Keras (2.4.0) and TensorFlow (2.4), and with them I can't load CuDNNLSTM (TAE.py).

I also tried installing older versions of Keras (2.3.0) and TensorFlow that include CuDNNLSTM, but then new errors appear as well.

Could you create a new requirements.txt that lists all the libraries with the specific versions you use?

I hope you can help me.

Agglomerative Clustering without n_clusters

I am testing this out for a music-similarity dataset, which does not have a defined number of clusters. Would your DTC library work the same for use with Agglomerative Clustering where {n_clusters=None, distance_threshold=d, compute_full_tree=True}?

It would seem that TSClusteringLayer and heatmap generation require n_clusters.

Perform clustering without ground-truth?

Can you share sample code for using DTC on data that has no ground truth?
i.e. we have an unlabeled dataset and need to cluster it.
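
In general, ground-truth labels are only needed to score a clustering, not to produce one: hard cluster labels can be read off the soft assignments. The NumPy sketch below uses a made-up assignment matrix q (one row per series, one column per cluster). How to obtain q from this repository's model is not shown here and would need to be checked against the code.

```python
import numpy as np

# Made-up soft assignments: 4 series, 3 clusters, each row sums to 1.
q = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4],
              [0.6, 0.3, 0.1]])

labels = q.argmax(axis=1)                          # hard cluster index per series
sizes = np.bincount(labels, minlength=q.shape[1])  # number of series per cluster
print(labels, sizes)
```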

input shape

Hello,
I am trying to replicate this DTC with my own data, a single series of 25000 time steps with 17 features. When I pass it to the encoder, the number of time steps is reduced but the number of features is not. I tried to transpose the input data, but then I get a dimension error.

Can anyone tell me the correct input dimensions for the encoder?

data shape = (25000, 17),
reshaped = (25000, 1, 17)??
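
One common way to turn a single long multivariate series into the 3-D array that Keras sequence layers take, (samples, timesteps, features), is to cut it into fixed-length windows. This is a sketch, not the repository's own preprocessing: the helper name make_windows and the window length of 100 are assumptions chosen for illustration.

```python
import numpy as np


def make_windows(series, window, step=None):
    """Cut a (time, features) array into (n_windows, window, features)."""
    step = window if step is None else step   # default: non-overlapping windows
    starts = range(0, series.shape[0] - window + 1, step)
    return np.stack([series[s:s + window] for s in starts])


data = np.random.rand(25000, 17)       # stand-in for one series with 17 features
X = make_windows(data, window=100)     # window length is an arbitrary choice
print(X.shape)                         # (250, 100, 17)
```

With this layout each window is one sample to be clustered, timesteps would be 100 and input_dim would be 17.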

Heatmap use

Could you please explain how to visualize the heatmap weights on a time series?

About the loss value

When I use
python DeepTemporalClustering.py --heatmap False --dist_metric cid --dataset CBF --pool_size 8
to train the DTC, the loss value always increases suddenly at the 8th epoch, no matter which dataset is used. It looks like the weights of Lr and Lc also change with the epoch, which really confuses me.

Looking forward to your reply, thank you!

Dimension Reduction

Hi, thank you very much for your implementation. While looking into the autoencoder architecture, I didn't understand how it performs dimension reduction: the encoder seems to pass on a sequence of the same length, because return_sequences is True. I am sorry if I didn't fully understand your implementation.
I am working on an encoder for time series data. My data consists of 200 samples, each with 1500 points (acceleration signals from measurements), so its shape is (200, 1500). I want to encode these 200 responses into a latent space of, say, 2 or 4 dimensions. The output of the latent variable should look like (2, 1500).
Can you help me out with how to use this architecture?
Zohaib

ValueError: Input 0 is incompatible with layer AE: expected shape=(None, 5210, 6), found shape=(None, 6)

Hello,
Thank you very much for the code.
I am having some issues using your classes with my own program (also when running the main code in DeepTemporalClustering.py).
My code so far:

dataSource = web.DataReader('S68.SI', 'yahoo', start=start_date, end=end_date)
X_train = dataSource.to_numpy()
# Some constant values
n_clusters = 2
pretrain_optimizer = 'adam'
optimizer = 'adam'
batch_size = 64
# Initialize model
dtc.initialize()
dtc.model.summary()
dtc.compile(gamma=1.0, optimizer=optimizer, initial_heatmap_loss_weight=0.1, final_heatmap_loss_weight=0.9)
# Pre train
dtc.pretrain(X=X_train, optimizer=pretrain_optimizer,
                     epochs=10, batch_size=batch_size,
                     save_dir='results/tmp')

At this point I'm getting the above-mentioned error.

The X_train shape is 5210 by 6, i.e. 5210 timesteps and 6 features.

Upon investigation it seems that this line of code in TAE.py is causing the problem:
x = Input(shape=(timesteps, input_dim), name='input_seq')
Is it necessary for the Input shape to include the timesteps?
I checked online and it seems that only the features should be part of the input shape. Is this correct?

Thank you and regards.
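
On the question above: in Keras, Input(shape=(timesteps, input_dim)) describes a single sample and the batch axis is implicit, which is why the error message expects (None, 5210, 6). The array passed to training therefore needs three axes. Here is a minimal NumPy sketch of two ways to get there, using a zero-filled stand-in for the real data; the segment length of 520 is an arbitrary assumption.

```python
import numpy as np

X_train = np.zeros((5210, 6))          # stand-in for dataSource.to_numpy()

# Option 1: treat the whole history as a single sample.
X_single = X_train[np.newaxis, ...]
print(X_single.shape)                  # (1, 5210, 6)

# Option 2: keep the first 5200 rows and split them into 10 consecutive
# segments of 520 steps each (an arbitrary split, for illustration only).
X_segments = X_train[:5200].reshape(10, 520, 6)
print(X_segments.shape)                # (10, 520, 6)
```

A single sample gives a clustering model nothing to group, so something like the second option, with timesteps set to the segment length, is usually closer to what is wanted.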

Training and Validation Losses

Can you please explain the losses L, Lc, Lr and T?

Nan: Predicted Value

Hi,
I got all-NaN predicted values when running your code. In DTC.fit, everything goes well when p and q are calculated for the first time, just after "init_cluster_weights". But once it reaches "model.fit(X_train, [X_train, p]" for the first time and then computes the predicted values in the next iteration, all predicted values turn out to be NaN.

I have tried several ways to fix it, including:
enlarging the batch size
reducing the learning rate of Adam
gradient clipping
redefining the KL loss function to avoid log(0)

I am sure there is no nan or inf in the input data. Could you please help me to solve the problem?
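
One of the attempts listed above, guarding the KL term against log(0), can be sketched in NumPy as follows. This only illustrates epsilon clipping and is not the repository's loss function. The name safe_kl and the value eps=1e-7 are assumptions.

```python
import numpy as np


def safe_kl(p, q, eps=1e-7):
    """KL(p || q) per row, with both distributions clipped away from zero."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)


p = np.array([[0.5, 0.5, 0.0]])
q = np.array([[1.0, 0.0, 0.0]])   # exact zeros would give inf or nan if left unclipped
kl = safe_kl(p, q)
print(np.isfinite(kl).all())
```

Clipping only protects the loss computation. If q already contains NaN, np.clip passes it through, which would point to the weights having diverged earlier in training.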

Looking forward to your reply and HAPPY CHRISTMAS HOLIDAY!

Dependency Problems with cudnn and Tensorflow

Hi, I have been trying to use this implementation to cluster some time series data in the education domain. Keras on my machine uses the TensorFlow backend (1.13.1) instead of Theano. My cuDNN version was 7.6 and my CUDA toolkit was 10.0.
Running the code with this configuration in an Anaconda environment gives me the following error (while it prints epoch 1/10 on the terminal):
"Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR"
Can you please tell me whether the implementation depends on specific versions of TensorFlow and the CUDA toolkit? I tried to downgrade TensorFlow to 1.12.0, but then I had to downgrade the CUDA toolkit as well, and it still did not work.

variable time step

Hi, in the documentation for the DTC object, found in DeepTemporalClustering.py, it is indicated that the timesteps param can be variable. However, when I instantiate it as follows:

dtc = DTC(n_clusters=3,
          input_dim=X_train.shape[-1],
          timesteps=None,
          n_filters=50,
          kernel_size=10,
          strides=1,
          pool_size=None,
          n_units=[50, 1],
          alpha=1,
          dist_metric='eucl',
          cluster_init='kmeans',
          heatmap=False)

I get an error. There is an assert which raises a TypeError.

TypeError: unsupported operand type(s) for %: 'NoneType' and 'int'

Should I be using 0 instead of None?
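
The TypeError itself can be reproduced outside the model, since applying % to None fails before any assert can be evaluated. The sketch below assumes that the assert checks that timesteps is a multiple of pool_size (the assert is not quoted in this issue). The helper check_timesteps is hypothetical and not part of the repository.

```python
def check_timesteps(timesteps, pool_size):
    """Validate that pooling divides the sequence length evenly."""
    if timesteps is None:
        raise ValueError("timesteps must be a concrete integer, not None")
    if timesteps % pool_size != 0:
        raise ValueError(
            f"timesteps={timesteps} is not a multiple of pool_size={pool_size}"
        )
    return timesteps // pool_size   # sequence length after pooling


# Reproduce the failure mode from the report.
try:
    None % 8
except TypeError as err:
    message = str(err)
    print(message)

print(check_timesteps(128, 8))      # 16
```

Note that 0 would pass a modulo check like this one, but it would describe an empty sequence rather than a variable-length one.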

Heatmap

Hello, how can I see the generated heatmap? I set --heatmap to true, but I don't see the generated heatmap.

Assertion error

Hello, I have an assertion error when I run your code. I don't know how to modify it, and I don't know the function of this assertion. Can you explain it?

Problem with Autoencoder Dimensions

Hello, I'm trying to replicate your examples but keep getting this error on the output dimensions of the autoencoder.

Pretraining...
Traceback (most recent call last):
  File "DeepTemporalClustering.py", line 535, in <module>
    save_dir=args.save_dir)
  File "DeepTemporalClustering.py", line 313, in pretrain
    self.autoencoder.fit(X, X, batch_size=batch_size, epochs=epochs, verbose=verbose)
  File "C:\Users\Computer\Anaconda3\lib\site-packages\keras\engine\training.py", line 1154, in fit
    batch_size=batch_size)
  File "C:\Users\Computer\Anaconda3\lib\site-packages\keras\engine\training.py", line 621, in _standardize_user_data
    exception_prefix='target')
  File "C:\Users\Computer\Anaconda3\lib\site-packages\keras\engine\training_utils.py", line 145, in standardize_input_data
    str(data_shape))
ValueError: Error when checking target: expected output_seq to have shape (6400, 1) but got array with shape (128, 1)

The autoencoder output is expecting 6400 = 128 (timesteps) x 50 (n_filters). I know it's in the autoencoder because I checked the output dimensions of the encoder, decoder and autoencoder.

I tried replacing it with the

output = Conv1D(1, kernel_size, strides=strides, padding='same', activation='linear', name='output_seq')(decoded)

line that was commented out in TAE.py but that just returned another error:

ValueError: Input 0 is incompatible with layer output_seq: expected ndim=3, found ndim=4

I also tried using temporal_autoencoder_v2 in TAE.py but that just returned another shape error:

ValueError: Input 0 is incompatible with layer dense: expected shape=(None, 16, 100), found shape=(None, 16, 2)

I am wary of changing the architecture too much, as I want to be able to replicate the results. Any suggestions on what to try?

ValueError: The name "reshape" is used 2 times in the model. All layer names should be unique.

Hi,

Running the code in terminal resulted in the following error. Do you happen to know what the problem is? Thanks.

(DeepTemporalClustering) e:\DeepTemporalClustering>python DeepTemporalClustering.py --heatmap true --n_clusters 2 --pool_size 8
Namespace(ae_weights=None, alpha=1.0, batch_size=64, cluster_init='kmeans', dataset='CBF', dist_metric='eucl', epochs=100, eval_epochs=1, final_heatmap_loss_weight=0.9, finetune_heatmap_at_epoch=8, gamma=1.0, heatmap=True, initial_heatmap_loss_weight=0.1, kernel_size=10, n_clusters=2, n_filters=50, n_units=[50, 1], patience=5, pool_size=8, pretrain_epochs=10, save_dir='results/tmp', save_epochs=10, strides=1, tol=0.001)
128
0
2021-01-31 18:22:27.823495: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:AutoGraph could not transform <bound method TSClusteringLayer.call of <TSClusteringLayer.TSClusteringLayer object at 0x000001EAF288B9D0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Traceback (most recent call last):
  File "DeepTemporalClustering.py", line 516, in <module>
    dtc.initialize()
  File "DeepTemporalClustering.py", line 113, in initialize
    self.model = Model(inputs=self.autoencoder.input,
  File "E:\miniconda3_64\envs\DeepTemporalClustering\lib\site-packages\tensorflow\python\keras\engine\training.py", line 242, in __new__
    return functional.Functional(*args, **kwargs)
  File "E:\miniconda3_64\envs\DeepTemporalClustering\lib\site-packages\tensorflow\python\training\tracking\base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "E:\miniconda3_64\envs\DeepTemporalClustering\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 115, in __init__
    self._init_graph_network(inputs, outputs)
  File "E:\miniconda3_64\envs\DeepTemporalClustering\lib\site-packages\tensorflow\python\training\tracking\base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "E:\miniconda3_64\envs\DeepTemporalClustering\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 190, in _init_graph_network
    nodes, nodes_by_depth, layers, _ = _map_graph_network(
  File "E:\miniconda3_64\envs\DeepTemporalClustering\lib\site-packages\tensorflow\python\keras\engine\functional.py", line 941, in _map_graph_network
    raise ValueError('The name "' + name + '" is used ' +
ValueError: The name "reshape" is used 2 times in the model. All layer names should be unique.

Requirements are hard to find out

I'm having trouble using your implementation. If you could add a requirements.txt, it would save a lot of time!
Thanks in advance.