
artem-oppermann / deep-autoencoders-for-collaborative-filtering

112 stars · 3 watchers · 41 forks · 35.16 MB

Using Deep Autoencoders to predict movie ratings.

License: Apache License 2.0

Python 100.00%
tensorflow collaborative-filtering deep-learning autoencoder

deep-autoencoders-for-collaborative-filtering's Introduction

Deep-Autoencoders-For-Collaborative-Filtering

Collaborative Filtering is a method used by recommender systems to make predictions about the interests of a specific user by collecting taste or preference information from many other users. The technique rests on the underlying assumption that if a user A has the same taste or opinion on an issue as a user B, then A is more likely to share B's opinion on a different issue as well.

In this project I predict the ratings a user would give a movie based on that user's taste and the tastes of other users who watched and rated the same or similar movies.
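
Concretely, a deep autoencoder for collaborative filtering takes a user's (sparse) vector of movie ratings as input and reconstructs a dense rating vector from which the missing ratings are read off. A minimal Keras sketch of such a model (the layer sizes and activations are assumptions, not the repository's exact architecture; 3952 is the length of the ml-1m rating vector):

    import tensorflow as tf

    NUM_MOVIES = 3952  # length of a user's rating vector for ml-1m

    # Sketch of the general idea only, not the repository's exact model:
    # encode the rating vector into a small latent code and decode it back
    # into a dense reconstruction used for rating predictions.
    autoencoder = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="selu", input_shape=(NUM_MOVIES,)),
        tf.keras.layers.Dense(128, activation="selu"),
        tf.keras.layers.Dense(512, activation="selu"),
        tf.keras.layers.Dense(NUM_MOVIES),
    ])
    autoencoder.compile(optimizer="adam", loss="mse")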

Datasets

The current version supports only the MovieLens ml-1m.zip dataset, obtained from https://grouplens.org/datasets/movielens/.

Model Training

  • Download the ml-1m.zip dataset from https://grouplens.org/datasets/movielens/.

  • Divide the ratings.dat file from ml-1m.zip into the training and testing datasets train.dat and test.dat by using the command (a rough sketch of such a split is shown after this list):

     python src\data\train_test_split.py 

  • Use the shell to create TFRecord files from both the train.dat and test.dat files by executing the command:

     python src\data\tf_record_writer.py 

  • Use the shell to start the training by executing the command (optionally passing your hyperparameters):

      python training.py 
    
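The repository's train_test_split.py performs the split step; as a rough, hedged illustration of what such a per-line split can look like (the 80/20 ratio and the random assignment are assumptions, not necessarily what the script does):

    # Hypothetical sketch of a ratings.dat -> train.dat / test.dat split.
    # The 80/20 ratio and the random per-line assignment are assumptions;
    # src\data\train_test_split.py is the authoritative version.
    import random

    random.seed(0)
    with open("ml-1m/ratings.dat") as ratings, \
         open("train.dat", "w") as train, \
         open("test.dat", "w") as test:
        for line in ratings:  # format: UserID::MovieID::Rating::Timestamp
            (test if random.random() < 0.2 else train).write(line)
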

Training Results

During training, the loss on the training and testing datasets is shown after each epoch. This loss is a root mean squared error (RMSE). The mean absolute error (mean_abs_error), however, is the better metric for validating performance: it measures the average difference between predicted and true ratings. E.g., a mean_abs_error of 0.923 means that, on average, the predicted rating deviates from the actual rating by 0.923 stars.

   epoch_nr: 0, train_loss: 1.421, test_loss: 0.967, mean_abs_error: 0.801
   epoch_nr: 1, train_loss: 0.992, test_loss: 0.961, mean_abs_error: 0.797
   epoch_nr: 2, train_loss: 0.987, test_loss: 0.962, mean_abs_error: 0.798
   epoch_nr: 3, train_loss: 0.981, test_loss: 0.965, mean_abs_error: 0.801
   epoch_nr: 4, train_loss: 0.969, test_loss: 0.974, mean_abs_error: 0.808
   epoch_nr: 5, train_loss: 0.949, test_loss: 0.988, mean_abs_error: 0.822
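
For reference, the two metrics differ only in how the per-rating errors are aggregated; a minimal NumPy sketch (the array names are illustrative, not taken from the repository, and both metrics are typically computed over the observed ratings only):

    import numpy as np

    # Illustrative predicted and true ratings for a handful of observed entries.
    pred = np.array([3.8, 4.2, 2.9])
    true = np.array([4.0, 5.0, 3.0])

    rmse = np.sqrt(np.mean((pred - true) ** 2))  # reported above as train_loss / test_loss
    mae = np.mean(np.abs(pred - true))           # reported above as mean_abs_error
    print("RMSE: %.3f, MAE: %.3f" % (rmse, mae))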

deep-autoencoders-for-collaborative-filtering's People

Contributors

artem-oppermann


deep-autoencoders-for-collaborative-filtering's Issues

Performance issues in src/data/dataset.py (P2)

Hello, I found a performance issue in the definition of _get_training_data in
artem-oppermann_Deep-Autoencoders-For-Collaborative-Filtering/src/data/dataset.py:
dataset = dataset.map(parse) is called without num_parallel_calls.
I think it will increase the efficiency of your program if you add this.

The same issue also exists in dataset2 = dataset2.map(parse) and in dataset = dataset.map(parse).
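
A minimal, hedged illustration of the suggested change (the feature name and the AUTOTUNE value are assumptions; the parse function here only stands in for the repository's parse):

    import tensorflow as tf

    def parse(serialized):
        # Stand-in for the repository's parse(): decode one serialized
        # tf.train.Example into a dense rating vector (feature name assumed).
        features = {"movie_ratings": tf.io.FixedLenFeature([3952], tf.float32)}
        return tf.io.parse_single_example(serialized, features)["movie_ratings"]

    dataset = tf.data.TFRecordDataset(["TRAIN/train_001.tfrecord"])
    # Suggested change: parse records in parallel instead of one at a time.
    dataset = dataset.map(parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)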

Here is the TensorFlow documentation to support this.

Looking forward to your reply. Btw, I would be glad to create a PR to fix it if you are too busy.

error when running training.py

Hello, I ran training.py, but got this error:

WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py:1419: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /Users/ming/Desktop/Msc Project/Code/Deep-Autoencoders-For-Collaborative-Filtering/src/model/train_model.py:26: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0
[[{{node IteratorGetNext}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "training.py", line 114, in
tf.app.run()
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "training.py", line 92, in main
, loss=sess.run((train_op, train_loss_op))
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0
[[node IteratorGetNext (defined at training.py:66) ]]

Caused by op 'IteratorGetNext', defined at:
File "training.py", line 114, in
tf.app.run()
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "training.py", line 66, in main
x_train= iter_train.get_next()
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 414, in get_next
output_shapes=self._structure._flat_shapes, name=name)
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 1685, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in init
self._traceback = tf_stack.extract_stack()

DataLossError (see above for traceback): corrupted record at 0
[[node IteratorGetNext (defined at training.py:66) ]]
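
A DataLossError at record 0 usually means that the file being read is not a valid TFRecord file (for example an empty, partially written, or plain-text file). One way to check the generated files before training (a sketch, not part of the repository; the path is illustrative):

    import tensorflow as tf

    # Count the records in a generated file; this raises DataLossError
    # immediately if the file is not a readable TFRecord file.
    path = "TRAIN/train_001.tfrecord"
    count = sum(1 for _ in tf.python_io.tf_record_iterator(path))
    print("%s: %d records" % (path, count))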

error running training.py

Getting the following errors:

2019-02-08 14:52:20.870854: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0
[[{{node IteratorGetNext}} = IteratorGetNextoutput_shapes=[[?,3952]], output_types=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "training.py", line 110, in
tf.app.run()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "training.py", line 88, in main
, loss=sess.run((train_op, train_loss_op))
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0
[[node IteratorGetNext (defined at training.py:62) = IteratorGetNextoutput_shapes=[[?,3952]], output_types=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'IteratorGetNext', defined at:
File "training.py", line 110, in
tf.app.run()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "training.py", line 62, in main
x_train= iter_train.get_next()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 421, in get_next
name=name)), self._output_types,
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2069, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

DataLossError (see above for traceback): corrupted record at 0
[[node IteratorGetNext (defined at training.py:62) = IteratorGetNextoutput_shapes=[[?,3952]], output_types=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

How to get train.dat and test.dat?

Hi,
I tried to study your demo, but my results are bad, so I guess there is a problem with my data.
How did you generate the training and test sets from the raw data?

Test loss is increasing?

I'm getting weird behavior where the training loss is decreasing while the test loss is increasing. Any idea what might be happening?

epoch_nr: 0, train_loss: 1.323, test_loss: 0.964
epoch_nr: 1, train_loss: 0.990, test_loss: 0.960
epoch_nr: 2, train_loss: 0.988, test_loss: 0.964
epoch_nr: 3, train_loss: 0.990, test_loss: 0.963
epoch_nr: 4, train_loss: 0.986, test_loss: 0.965
epoch_nr: 5, train_loss: 0.974, test_loss: 0.968
epoch_nr: 6, train_loss: 0.963, test_loss: 0.975
epoch_nr: 7, train_loss: 0.951, test_loss: 0.985
epoch_nr: 8, train_loss: 0.942, test_loss: 0.996
epoch_nr: 9, train_loss: 0.930, test_loss: 0.997
epoch_nr: 10, train_loss: 0.926, test_loss: 0.999
epoch_nr: 11, train_loss: 0.921, test_loss: 1.004
epoch_nr: 12, train_loss: 0.920, test_loss: 1.005
epoch_nr: 13, train_loss: 0.919, test_loss: 1.007
.
.
.
epoch_nr: 265, train_loss: 0.589, test_loss: 1.291
epoch_nr: 266, train_loss: 0.589, test_loss: 1.292
epoch_nr: 267, train_loss: 0.588, test_loss: 1.293
epoch_nr: 268, train_loss: 0.588, test_loss: 1.293
epoch_nr: 269, train_loss: 0.588, test_loss: 1.292
epoch_nr: 270, train_loss: 0.586, test_loss: 1.294
epoch_nr: 271, train_loss: 0.586, test_loss: 1.294
epoch_nr: 272, train_loss: 0.585, test_loss: 1.293
epoch_nr: 273, train_loss: 0.586, test_loss: 1.293
epoch_nr: 274, train_loss: 0.585, test_loss: 1.294
epoch_nr: 275, train_loss: 0.585, test_loss: 1.294
epoch_nr: 276, train_loss: 0.586, test_loss: 1.297

Numbers of the movies and users

Hello:

Why do I count 3695 movies rated by the users rather than 3952,
and 5954 training samples rather than 5953?
How should these numbers be counted?

Thank you~

Error running tf_record_writer

Getting the following errors:

Traceback (most recent call last):
  File "data/tf_record_writer.py", line 81, in <module>
    main()
  File "data/tf_record_writer.py", line 57, in main
    with tf.python_io.TFRecordWriter(tf_filename) as tfrecord_writer:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/lib/io/tf_record.py", line 106, in __init__
    compat.as_bytes(path), compat.as_bytes(compression_type), status)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: TRAIN/train_001.tfrecord; No such file or directory
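
This NotFoundError is typically raised because the output directory (here TRAIN/) does not exist when the TFRecordWriter tries to create the file. A hedged sketch of the usual workaround, creating the directory first:

    import os
    import tensorflow as tf

    tf_filename = "TRAIN/train_001.tfrecord"  # path from the traceback above
    os.makedirs(os.path.dirname(tf_filename), exist_ok=True)  # ensure TRAIN/ exists
    with tf.python_io.TFRecordWriter(tf_filename) as tfrecord_writer:
        pass  # records would be written here, as in tf_record_writer.py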

Performance issues in /src/data (by P3)

Hello! I've found a performance issue in dataset.py: batch() should be called before map(), which could make your program more efficient. Here is the TensorFlow documentation to support it.

Detailed description is listed below:

Besides, you need to check whether the function called in map() (e.g., parse in dataset.map(parse)) is affected, so that the changed code still works properly. For example, if parse expected data with shape (x, y, z) as its input before the fix, it would afterwards receive data with shape (batch_size, x, y, z).
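
A hedged sketch of the suggested reordering (the batched parse function and feature name are illustrative; as noted above, the real parse must be adapted to accept a batch):

    import tensorflow as tf

    def parse_batch(serialized_batch):
        # Illustrative batched parse: decodes a whole batch of serialized
        # examples at once instead of one example at a time.
        features = {"movie_ratings": tf.io.FixedLenFeature([3952], tf.float32)}
        return tf.io.parse_example(serialized_batch, features)["movie_ratings"]

    dataset = tf.data.TFRecordDataset(["TRAIN/train_001.tfrecord"])
    # before: dataset = dataset.map(parse).batch(16)
    # after:  batch first, then map once per batch (vectorized parsing)
    dataset = dataset.batch(16).map(parse_batch)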

Looking forward to your reply. Btw, I am very glad to create a PR to fix it if you are too busy.
