changun / collmetric

A Tensorflow implementation of Collaborative Metric Learning (CML)

License: GNU General Public License v3.0

Python 100.00%

collmetric's Issues

Loss wouldn't decrease when training on GPU while everything is OK on CPU

Hello,
thanks for open-sourcing your code and for it being so nicely written!

We've run into the following issue: everything works fine when we train the model on CPU (the loss decreases very quickly). However, when we run the very same code on GPU, the loss stays approximately constant and the model does not converge at all.

Changing the batch size or learning rate didn't help.

We checked this on TensorFlow 1.4.x and 1.7.0 and with different CUDA versions.

Did you have the same problems with CML? Do you have any ideas why that happens and how to fix it?

Many thanks in advance.

Validation recall is not improving during training

I ran the baseline code by typing
python CML.py
I get a validation recall of 0.001952619578277473 at every epoch.
Although the training loss decreases slightly, the validation recall never changes.
Is the baseline code wrong?
How can I modify it to get the right result?

How to save the embedding matrix on CPU instead of GPU?

The embedding matrix is very large, and there are two embedding matrices.
Every time I run the script it returns an OOM error.
How can I store the embedding matrices on CPU instead of GPU?

Is it a bug that the first element of each line in `tag-item.dat` is not skipped?

In tag-item.dat, the first element of every line is the number of tags related to an article.
So I think the first element should be skipped when counting the number of tags.
But utils.py seems to treat the first element as a tag, according to the code below.

CollMetric/utils.py

Line 29 in d9026cf

if len(items) >= tag_occurence_thres:

CollMetric/utils.py

Line 37 in d9026cf

features[[int(i) for i in items], feature_index] = 1

Is this a bug, or am I misunderstanding something?
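For illustration, here is a minimal sketch of parsing that skips the leading count. It assumes each line has the form `<count> <id> <id> ...`, as described in this issue. The function names `parse_counted_line` and `select_frequent_tags` are hypothetical and do not exist in utils.py.

```python
def parse_counted_line(line):
    """Parse a line of the form '<count> <id> <id> ...' and return only the ids.

    Assumption: the first field is a count of the ids that follow it.
    """
    fields = line.split()
    if not fields:
        return []
    declared = int(fields[0])               # leading element: how many ids follow
    ids = [int(tok) for tok in fields[1:]]  # everything after the count
    if declared != len(ids):
        raise ValueError("count %d does not match %d ids" % (declared, len(ids)))
    return ids


def select_frequent_tags(lines, threshold):
    """Keep the id lists (count excluded) that have at least `threshold` entries."""
    parsed = (parse_counted_line(line) for line in lines)
    return [ids for ids in parsed if len(ids) >= threshold]
```

With a threshold of 2, the line `1 5` is dropped here because it holds a single id, whereas a check on the raw split (count included) would see two fields and keep it.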

Multiprocessing queue memory release

Hi,
Thank you for the clear implementation.

I just noticed that the Queue in the sampler is not explicitly closed after running. This may cause a memory leak and keep the memory occupied the whole time.
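One possible way to release the resources explicitly is sketched below. This is not the repository's sampler: `ClosableSampler`, `_produce`, and the `make_batch` callable are hypothetical names used only for illustration. The idea is to terminate the worker processes first and then close the queue.

```python
import multiprocessing as mp


def _produce(make_batch, queue):
    """Worker loop: keep pushing batches until the process is terminated."""
    while True:
        queue.put(make_batch())


class ClosableSampler(object):
    """Hypothetical sampler that owns a queue and worker processes and frees both."""

    def __init__(self, make_batch, n_workers=2, maxsize=10):
        self.queue = mp.Queue(maxsize=maxsize)
        self.workers = [
            mp.Process(target=_produce, args=(make_batch, self.queue), daemon=True)
            for _ in range(n_workers)
        ]
        for worker in self.workers:
            worker.start()
        self.closed = False

    def next_batch(self):
        return self.queue.get()

    def close(self):
        """Stop the workers, then close the queue. Safe to call more than once."""
        if self.closed:
            return
        for worker in self.workers:
            worker.terminate()
            worker.join()
        # After terminate() the queue must not be used again, only closed.
        self.queue.close()
        self.queue.join_thread()
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
```

Using the sampler in a `with` block means `close()` also runs when training stops with an exception.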

Sir, can you share the original paper (CML) with me? [email protected]. Thank you!

run your code but there is an error

I downloaded your code and got an error when running it with PyCharm on Windows.
The error is as follows:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "E:\anaconda\envs\tensorflow-gpu\lib\site-packages\scipy\sparse\dok.py", line 244, in __setitem__
    if (isintlike(i) and isintlike(j) and 0 <= i < self.shape[0]
  File "E:\anaconda\envs\tensorflow-gpu\lib\site-packages\scipy\sparse\base.py", line 576, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: shape not found

Traceback (most recent call last):
  File "C:/Users/lenovo/Desktop/CollMetric-master/CML.py", line 331, in <module>
    sampler = WarpSampler(train, batch_size=BATCH_SIZE, n_negative=N_NEGATIVE, check_negative=True)
  File "C:\Users\lenovo\Desktop\CollMetric-master\sampler.py", line 61, in __init__
    self.processors[-1].start()
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

how to work

Could you tell me how to run this source code? Do I need to set up a server?

Is the model code currently in the repository the same as the one you used to produce the paper results?

Great paper. I am really interested in replicating the results before using the model for a project involving transfer learning. Using the hyperparameters discussed here #15 (comment) together with the model code currently in the repo, I am unfortunately not able to reproduce the results. I've attached a screenshot of the model running for nearly 1000 epochs, along with the accompanying loss and validation recall.

Can you please advise? Thank you very much for your time!

Unsupported feed

Hi Changun,

I've been trying to get this project running for a few days now.
But I keep running into the same error over and over again: Unsupported feed type.
I've tried it with both TensorFlow GPU and CPU and on two different TF versions.
I'm using Windows and Python 3.5.2 and have numpy and all the other listed dependencies installed.

Any hint you could give me would be greatly appreciated.

10403 features over tag_occurence_thres (5)
Split data into train/valid/test: 100%|################################################| 7947/7947 [00:03<00:00, 2202.89it/s]
79527/25186/31555 train/valid/test samples
2017-07-30 13:19:08.928815: W C:\tf_jenkins\home\workspace\nightly-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Recall on (sampled) validation set: 0.0
Optimizing...:   0%|                                                                                 | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1327, in _do_call
    return fn(*args)
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1306, in _run_fn
    status, run_metadata)
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Unsupported feed type

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\CML.py", line 397, in <module>
    optimize(model, sampler, train, valid)
  File ".\CML.py", line 361, in optimize
    model.negative_samples: neg})
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 895, in run
    run_metadata_ptr)
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1321, in _do_run
    options, run_metadata)
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Unsupported feed type

SIR

Movielens 20m

I was trying to reproduce the results you reported for Movielens20m.
I failed to reproduce them and assume this is because of the way the dataset is filtered.

I was, however, able to reproduce the recall mentioned in the paper for movielens100k.

Could you tell us a bit more about how you filtered the Movielens20m dataset?
If you still have the dataset, I would be extremely grateful if you could make it available so that I can better compare my results to yours.
I could integrate it into the algorithm myself.

Covariance loss is different from paper

Hi,
Thanks for the great research. However, the covariance loss seems to be implemented differently from the description in your paper (https://github.com/changun/CollMetric/blob/master/CML.py#L174). According to the paper, the covariance loss is defined as 1/N(||C||_f - ||diag(C)||_2^2), where C is the covariance matrix between all pairs of dimensions. The code instead sums the off-diagonal elements of the covariance matrix, which can produce a negative value. Could you explain the covariance loss in more detail?
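To make the difference concrete, here is a small sketch that computes both quantities for a toy set of embeddings. It is written in plain Python so it runs without dependencies. It assumes N is the number of embedding vectors and that C is the mean-centered covariance normalized by N; neither assumption is confirmed against CML.py. The function names are hypothetical.

```python
import math


def covariance(rows):
    """Covariance matrix between dimensions of `rows` (a list of equal-length vectors)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]


def cov_loss_as_quoted(rows):
    """1/N * (||C||_f - ||diag(C)||_2^2), exactly as the formula is quoted in this issue."""
    c = covariance(rows)
    d = len(c)
    frob = math.sqrt(sum(c[i][j] ** 2 for i in range(d) for j in range(d)))
    diag_sq = sum(c[i][i] ** 2 for i in range(d))
    return (frob - diag_sq) / len(rows)


def cov_loss_offdiag_sum(rows):
    """Plain sum of the off-diagonal entries of C, as the issue describes the code."""
    c = covariance(rows)
    d = len(c)
    return sum(c[i][j] for i in range(d) for j in range(d) if i != j)
```

For two perfectly anti-correlated dimensions, for example `[[1.0, -1.0], [-1.0, 1.0]]`, the off-diagonal entries of C are both -1, so the off-diagonal sum is negative.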

Users with fewer than 5 items are ignored

I found that in the function split_data, a user who has fewer than 5 items is simply ignored. However, the paper says that users who have fewer than 5 "ratings" are only included in the training set. Isn't that a problem?
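A minimal sketch of the behavior the paper describes, where users below the threshold are kept but only in the training set, could look like this. It is not the repository's split_data: the function name `split_user_items`, the dict-of-lists input, and the 3/5, 1/5, 1/5 split proportions are all assumptions made for illustration.

```python
import random


def split_user_items(user_items, min_items=5, seed=0):
    """Split each user's items into train/valid/test dicts.

    Users with fewer than `min_items` items go entirely into train.
    The 3/5, 1/5, 1/5 proportions are an illustrative assumption.
    """
    rng = random.Random(seed)
    train, valid, test = {}, {}, {}
    for user, items in user_items.items():
        items = list(items)
        rng.shuffle(items)
        if len(items) < min_items:
            train[user] = items  # kept, but never used for evaluation
            continue
        n_train = len(items) * 3 // 5
        n_valid = len(items) // 5
        train[user] = items[:n_train]
        valid[user] = items[n_train:n_train + n_valid]
        test[user] = items[n_train + n_valid:]
    return train, valid, test
```

Keeping such users in training still lets their interactions shape the item embeddings, even though they contribute nothing to the validation or test recall.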

Which parameters do you use to achieve slightly better performance than the numbers reported in the paper?

Great approach. I am wondering which parameters you use to achieve slightly better performance than the numbers reported in the paper. I changed the learning rate to 0.001 and got 29% recall on the citeulike dataset, which is lower than the 33% recall reported in the paper. My parameters are as follows. I hope you can help.

model = CML(n_users,
            n_items,
            features=dense_features,
            embed_dim=100,
            margin=2.0,
            clip_norm=1.1,
            master_learning_rate=0.001,
            hidden_layer_dim=512,
            dropout_rate=0.3,
            feature_projection_scaling_factor=1,
            feature_l2_reg=0.1,
            use_rank_weight=True,
            use_cov_loss=True,
            cov_loss_weight=1
            )