
chenchongthu / enmf


This is our implementation of ENMF: Efficient Neural Matrix Factorization (TOIS. 38, 2020). This also provides a fair evaluation of existing state-of-the-art recommendation models.

License: MIT License

Python 100.00%
deep-learning recommender-system efficient-algorithm recommendation evaluation state-of-the-art collaborative-filtering reproducibility reproducible-research sigir

enmf's People

Contributors

chenchongthu


enmf's Issues

After training with the default args, the results aren't good

Excuse me: after training with the default arguments I get recall and NDCG scores, but the results are not as good as those reported in the paper.
Here are my results after 500 epochs:

recall@50, ndcg@50
0.0912251655629139 0.04411650302936786
recall@100, ndcg@100
0.28807947019867547 0.08591003366756622
recall@200, ndcg@200
0.43559602649006623 0.10980291110041669

Why are these results so much lower than those reported in the paper?

Validation data for the ml-1m dataset

Hello, I have two questions after reading the ENMF paper:
1. For the ml-1m dataset, is the validation data the last item in each user's interaction sequence in ml.train.txt?
2. I also noticed that the dataset differs from the official release. What preprocessing was applied to it?

The nDCG formula may not be consistent between your paper and implementation?

Hello, I read your paper (TOIS) and your implementation.
I have a question about your implementation of the nDCG formula.

In your paper you write the nDCG formula using log2 (base-2 logarithm), but in the implementation I believe you use np.log (natural logarithm) (e.g., L209 or L303 in code/ENMF.py).

Should you use np.log2 in your implementation for consistency?

Anyway, thank you for putting together a good paper and implementation!
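As a side note, since nDCG is the ratio DCG/IDCG and switching the logarithm base only rescales numerator and denominator by the same constant (log_b x = ln x / ln b), the final nDCG value is base-invariant; only a raw, unnormalized DCG changes with the base. A minimal sketch with made-up relevance scores:

```python
import numpy as np

def dcg(relevances, log_fn):
    # DCG with a configurable logarithm: sum of rel_i / log(i + 1) over ranks i
    ranks = np.arange(2, len(relevances) + 2)  # rank i contributes log(i + 1)
    return np.sum(relevances / log_fn(ranks))

def ndcg(relevances, log_fn):
    # Normalize by the DCG of the ideal (descending-relevance) ordering
    ideal = np.sort(relevances)[::-1]
    return dcg(relevances, log_fn) / dcg(ideal, log_fn)

rel = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
print(ndcg(rel, np.log))   # natural log
print(ndcg(rel, np.log2))  # base 2: identical value, the base cancels
```

So using np.log does not change the reported nDCG as long as the IDCG normalizer uses the same base, though np.log2 would match the paper's formula more transparently.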

Aligning the Recall metric

Hello, I have looked at ENMF alongside methods such as LightGCN and NBPO, and I found that the recall computation in the ENMF code is not aligned with those methods. Could you confirm whether the reported ENMF results use the first or the second definition below?

First: ENMF uses len(hit_items) / min(topk, len(ground_truth))

Second: the other methods use len(hit_items) / len(ground_truth), as in:
NBPO: https://github.com/Wenhui-Yu/NBPO/blob/master/Library.py#L14
LightGCN: https://github.com/kuandeng/LightGCN/blob/master/evaluator/python/evaluate_foldout.py#L20
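For concreteness, the two definitions quoted above can be sketched as follows (the function names and toy numbers are mine, not from either codebase):

```python
def recall_capped(hit_items, ground_truth, topk):
    # Variant 1 (the ENMF evaluator, per this issue): denominator capped at topk
    return len(hit_items) / min(topk, len(ground_truth))

def recall_standard(hit_items, ground_truth, topk):
    # Variant 2 (NBPO / LightGCN style): plain recall over all held-out items
    return len(hit_items) / len(ground_truth)

ground_truth = set(range(30))  # a user with 30 held-out test items
hit_items = set(range(8))      # 8 of them recovered in the top-20 list
print(recall_capped(hit_items, ground_truth, topk=20))    # 8/20 = 0.4
print(recall_standard(hit_items, ground_truth, topk=20))  # 8/30 ≈ 0.267
```

Whenever a user holds more than topk test items, Variant 1 reports a strictly larger number, so results computed under the two definitions are not directly comparable.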

Cannot reproduce the results of ENMF on the ml-lcfn dataset as claimed in the README

Hi! You have done good work! I have been trying your code these days and got some of the expected results, but I found it hard to reproduce the results on the ml-lcfn dataset claimed in the README.

Here are my trials with the provided code:

  1. When using the default hyperparameters of the provided code, i.e., dropout keep_prob=0.7 and negative weight=0.1, the best results I got were: NDCG@5=0.22135453305703484, NDCG@10=0.22871178869000672, NDCG@20=0.2525169010557999.
  2. When using the suggested hyperparameters in README, i.e., dropout keep_prob=0.5 and negative weight=0.5, the best results I got were: NDCG@5=0.24160408294952565, NDCG@10=0.24239649929731227, NDCG@20=0.25935423043524214.
  3. When setting the hyperparameters as dropout keep_prob=0.7 and negative weight=0.5 (which is the best pair I have tried), the best results I got were: NDCG@5=0.24156951242563474, NDCG@10=0.24269257187356102, NDCG@20=0.26141558703625023.

Note that none of the above matches the results reported in the README, i.e., NDCG@5=0.2457, NDCG@10=0.2475, NDCG@20=0.2656. Could you help me figure out how to reproduce them?

Inconsistent results on the ml-1m dataset

Hi, thanks for sharing the code.
With your source code unmodified (dropout: 0.5, neg-weight: 0.5), I tried ml-1m and got the following results:
First col: Recall
Second col: NDCG

loss,loss_no_reg,loss_reg -20357.158033288044 -20357.158033288044 0.0
TopK: [10, 20, 50]
Recall@10: 0.1 NDCG@10: 0.04893161658667473
Recall@20: 0.16258278145695365 NDCG@20: 0.06466930639306495
Recall@50: 0.29817880794701984 NDCG@50: 0.09132394256584671

This is much lower than in the README:
NDCG@5, 10, 20
0.2457 0.2475 0.2656

Could you help me reproduce your results on ml-1m? Thanks.

Discussion of DHCF implementation details

I saw that you compare against DHCF, the method proposed in "Dual Channel Hypergraph Collaborative Filtering", and I would like to ask about it.
That paper seems to have some problems. First, Equation 6 and Equation 16 do not match, and it is unclear where \Theta should be multiplied; from the description, Equation 16 appears to be the one consistent with Figure 2. The min function in Equation 8 also seems to compare the wrong objects: a scalar 1 is compared against a matrix, which looks ill-formed to me. My reading is that it should return the matrix power when k > 1 and the identity matrix otherwise. Second, the incidence matrix built from each item's 2nd-order reachable users (i.e., HH^TH) is already extremely dense; for the MovieLens dataset used in the paper, the 2nd-order incidence matrix has very few zero entries left. With a moderately large number of items you would have to store a dense matrix of size M^2, which has no scalability to speak of and is also expensive to compute. Constructing an incidence matrix per batch is feasible, but inefficient compared with methods like LightGCN.
The paper's way of constructing hyperedges is also very heuristic: "divide-and-conquer" appears only in the abstract, introduction, and conclusion, and the paper never explains how it is reflected in the method. I reimplemented the paper and ran it on the LastFM dataset with the authors' experimental setup, and its performance was far worse than NGCF. I would like to hear your opinion on this paper and on the DHCF method.
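To illustrate the density concern about the 2nd-order incidence matrix HH^TH raised above, here is a synthetic sketch (the sizes and interaction rate are made up, not taken from the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, p = 2000, 1500, 0.01  # made-up sizes and interaction rate

# Random sparse user-item incidence matrix H (density ~1%)
H = (rng.random((n_users, n_items)) < p).astype(np.float64)

# 2nd-order incidence H H^T H: connects a user to every item reachable
# through any co-interacting user, so sparsity collapses quickly
H2 = H @ H.T @ H

print(np.count_nonzero(H) / H.size)    # around 0.01
print(np.count_nonzero(H2) / H2.size)  # close to 1: nearly dense
```

Even at 1% interaction density, the 2nd-order matrix comes out almost fully dense, which matches the storage and compute concern described in the issue.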
