changun / collmetric
A Tensorflow implementation of Collaborative Metric Learning (CML)
License: GNU General Public License v3.0
Hello,
thanks for open-sourcing your code and for it being so nicely written!
We've run into the following issue: everything works fine when we train the model on CPU (the loss decreases very quickly). However, when we run the very same code on GPU, the loss stays approximately the same and the method doesn't converge at all.
Modifying the batch size and learning rate didn't help.
We checked this on TensorFlow 1.4.x and 1.7.0 and with different CUDA versions.
Did you have the same problems with CML? Do you have any ideas why that happens and how to fix it?
Many thanks in advance.
I just ran the baseline code by typing
python CML.py
I get a validation recall of 0.001952619578277473 every epoch.
Although the training loss decreases slightly, the validation recall does not change.
Is the baseline code wrong?
How can I modify it to get the right result?
The embedding matrix is too large, and there are two embedding matrices.
Every time I run the script it returns an OOM error.
How can I store the embedding matrices on the CPU instead of the GPU?
tag-item.dat has the number of tags related to an article as the first element of each line.
So I think the first element should be excluded when counting the number of tags.
But utils.py seems to treat the first element as a tag, according to the code below.
Line 29 in d9026cf
Line 37 in d9026cf
Is it a bug, or am I misunderstanding something?
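To make the question concrete, here is a minimal sketch of a parser that treats the first field as a count rather than as an id. It assumes each line has the layout `<count> <id_1> ... <id_count>`, as described above; `parse_tag_item_line` is a hypothetical helper and not a function from utils.py.

```python
def parse_tag_item_line(line):
    """Parse one line of a tag-item.dat-style file.

    Assumed layout (taken from the description in this issue):
    '<count> <id_1> ... <id_count>'. The leading count is checked
    and then dropped, so it is never counted as an id.
    """
    fields = line.split()
    if not fields:
        return []
    declared = int(fields[0])
    ids = [int(field) for field in fields[1:]]
    if declared != len(ids):
        raise ValueError(
            "line declares %d ids but contains %d" % (declared, len(ids))
        )
    return ids


if __name__ == "__main__":
    print(parse_tag_item_line("3 10 42 7"))
```

Checking the declared count against the number of ids that follow is optional, but it would reveal quickly whether the file really has this layout.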
Hi,
Thank you for the clear implementation.
I just noticed that the Queue in the sampler is not explicitly closed after running; this may cause a memory leak and keep the memory occupied.
I downloaded your code and got an error when running the program with PyCharm on Windows.
The error is as follows:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
File "E:\anaconda\envs\tensorflow-gpu\lib\site-packages\scipy\sparse\dok.py", line 244, in __setitem__
Traceback (most recent call last):
File "C:/Users/lenovo/Desktop/CollMetric-master/CML.py", line 331, in <module>
if (isintlike(i) and isintlike(j) and 0 <= i < self.shape[0]
File "E:\anaconda\envs\tensorflow-gpu\lib\site-packages\scipy\sparse\base.py", line 576, in __getattr__
sampler = WarpSampler(train, batch_size=BATCH_SIZE, n_negative=N_NEGATIVE, check_negative=True)
File "C:\Users\lenovo\Desktop\CollMetric-master\sampler.py", line 61, in __init__
self.processors[-1].start()
File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\context.py", line 223, in _Popen
raise AttributeError(attr + " not found")
AttributeError: shape not found
return _default_context.get_context().Process._Popen(process_obj)
File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
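One possible workaround, assuming the failure comes from pickling the scipy dok_matrix when Windows spawns the worker processes (the traceback fails inside dok.py while the child unpickles its arguments): hand the workers a plain Python structure instead, and start them only under an `if __name__ == "__main__":` guard, which multiprocessing requires on Windows. `to_user_item_lists` is a hypothetical helper, and the sampler would have to be adapted to accept its output.

```python
import pickle
from collections import defaultdict


def to_user_item_lists(pairs):
    """Turn (user, item) pairs into {user: sorted list of items}.

    A plain dict of lists pickles without any scipy machinery, so it
    can be sent to a spawned worker process.
    """
    user_items = defaultdict(list)
    for user, item in pairs:
        user_items[user].append(item)
    return {user: sorted(items) for user, items in user_items.items()}


def main():
    # Toy interactions; in practice these would be the nonzero
    # entries of the training matrix.
    pairs = [(0, 3), (0, 1), (2, 5)]
    payload = to_user_item_lists(pairs)
    # Round-trip through pickle, which is what multiprocessing does
    # when it passes arguments to a spawned child.
    restored = pickle.loads(pickle.dumps(payload))
    return restored


if __name__ == "__main__":
    # On Windows, code that starts worker processes must live under
    # this guard, because each child re-imports the main module.
    print(main())
```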
May I ask how to run this source code? Do I need to set up a server?
Great paper. I am really interested in replicating the results before using the model for a project involving transfer learning. Using the hyperparameters discussed here #15 (comment) along with the model code currently in the repo, I am unfortunately not able to reproduce the results. I've attached a screenshot of the model running for nearly 1000 epochs, with the accompanying loss and validation recall.
Can you please advise? Thank you very much for your time!
Hi Changun,
I've been trying to start this project for a few days now.
But I keep running into the same error: Unsupported feed type.
I've tried it with both the GPU and CPU builds of TensorFlow and on two different TF versions.
I'm using Windows and Python 3.5.2, and I have numpy and all the other listed dependencies installed.
Any hint you could give me would be greatly appreciated.
10403 features over tag_occurence_thres (5)
Split data into train/valid/test: 100%|################################################| 7947/7947 [00:03<00:00, 2202.89it/s]
79527/25186/31555 train/valid/test samples
2017-07-30 13:19:08.928815: W C:\tf_jenkins\home\workspace\nightly-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Recall on (sampled) validation set: 0.0
Optimizing...: 0%| | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1327, in _do_call
return fn(*args)
File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1306, in _run_fn
status, run_metadata)
File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Unsupported feed type
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".\CML.py", line 397, in <module>
optimize(model, sampler, train, valid)
File ".\CML.py", line 361, in optimize
model.negative_samples: neg})
File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 895, in run
run_metadata_ptr)
File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1321, in _do_run
options, run_metadata)
File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Unsupported feed type
I was trying to reproduce the results you reported for Movielens20m.
I failed to reproduce them, and I assume this is because of the way the dataset is filtered,
since I was able to reproduce the recall mentioned in the paper for movielens100k.
Could you tell us a bit more about how you filtered the Movielens20m dataset?
If you still have the dataset, I would be extremely grateful if you could make it available so that I can better compare my results to yours.
I could plug it into the algorithm myself.
Hi,
Thanks for the great research. However, it seems that the covariance loss is implemented differently from the description in your paper.
(https://github.com/changun/CollMetric/blob/master/CML.py#L174) According to your paper, the covariance loss is defined as 1/N(||C||_f - ||diag(C)||_2^2), where C is the covariance matrix between all pairs of dimensions. But you implemented it as the sum of the off-diagonal elements of the covariance matrix, which may result in a negative value. Could you provide some more explanation of the covariance loss?
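To illustrate the difference, here is a small pure-Python sketch that evaluates both quantities on a toy batch. It implements the expression exactly as quoted above and, as an assumption, takes N to be the number of points in the batch. The function names are hypothetical; this is not the code from CML.py.

```python
import math


def covariance_matrix(points):
    """Covariance between all pairs of dimensions of a batch of points."""
    n = len(points)
    dim = len(points[0])
    means = [sum(p[k] for p in points) / n for k in range(dim)]
    return [
        [
            sum((p[i] - means[i]) * (p[j] - means[j]) for p in points) / n
            for j in range(dim)
        ]
        for i in range(dim)
    ]


def quoted_cov_loss(points):
    """1/N (||C||_f - ||diag(C)||_2^2), as quoted in this issue.

    Assumption: N is the number of points in the batch.
    """
    c = covariance_matrix(points)
    frobenius = math.sqrt(sum(v * v for row in c for v in row))
    diag_sq = sum(c[k][k] ** 2 for k in range(len(c)))
    return (frobenius - diag_sq) / len(points)


def off_diagonal_sum(points):
    """Sum of the off-diagonal entries of the covariance matrix."""
    c = covariance_matrix(points)
    size = len(c)
    return sum(c[i][j] for i in range(size) for j in range(size) if i != j)


if __name__ == "__main__":
    # Two anti-correlated dimensions: the off-diagonal covariance is negative.
    batch = [(0.5, -0.5), (-0.5, 0.5)]
    print(quoted_cov_loss(batch), off_diagonal_sum(batch))
```

On this toy batch the off-diagonal sum is negative because the two dimensions are anti-correlated, while the quoted expression is positive. That shows the two quantities can differ in sign, which is the concern raised here.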
I found that in the function split_data, when a user has fewer than 5 items, the user is simply ignored. However, the paper says that users who have fewer than 5 "ratings" are only included in the training set. Isn't that a problem?
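For reference, the behaviour described in the paper could look roughly like the following sketch, in which users below the threshold are kept in the training set instead of being dropped. `split_user_items`, the 60/20/20 ratios, and the seed are assumptions for illustration only and are not taken from the repository's split_data.

```python
import random


def split_user_items(user_items, min_items=5, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split each user's items into train/valid/test.

    Users with fewer than `min_items` items are placed in the training
    set only (the behaviour described in the paper), not discarded.
    The ratios are an assumption made for this sketch.
    """
    rng = random.Random(seed)
    train, valid, test = {}, {}, {}
    for user, items in user_items.items():
        items = list(items)
        if len(items) < min_items:
            # Too few interactions to evaluate on: training set only.
            train[user] = items
            continue
        rng.shuffle(items)
        n_train = int(len(items) * ratios[0])
        n_valid = int(len(items) * ratios[1])
        train[user] = items[:n_train]
        valid[user] = items[n_train:n_train + n_valid]
        test[user] = items[n_train + n_valid:]
    return train, valid, test


if __name__ == "__main__":
    data = {0: [1, 2, 3], 1: list(range(10))}
    train, valid, test = split_user_items(data)
    print(len(train[0]), len(train[1]), len(valid[1]), len(test[1]))
```

With this layout a user with three items appears only in `train`, so they still contribute to learning the embeddings without affecting the validation or test recall.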
Great approach. I am wondering which parameters you used to achieve slightly better performance than the numbers reported in the paper. I changed the learning rate to 0.001 and it achieves 29% recall on the citeulike dataset, which is lower than the 33% recall reported in the paper. The parameters are as follows. I hope you can help.
model = CML(n_users,
            n_items,
            features=dense_features,
            embed_dim=100,
            margin=2.0,
            clip_norm=1.1,
            master_learning_rate=0.001,
            hidden_layer_dim=512,
            dropout_rate=0.3,
            feature_projection_scaling_factor=1,
            feature_l2_reg=0.1,
            use_rank_weight=True,
            use_cov_loss=True,
            cov_loss_weight=1
            )