According to item2idx, the index corresponding to <UNK> is max_item_num.

The problem with the <UNK> in the oitems variable returned by the skip-gram function in the Item2Vec model (daisyRec, closed)

amazingdd commented on May 26, 2024

The problem with the <UNK> in the oitems variable returned by the skip-gram function in the Item2Vec model

Comments (3)

AmazingDD commented on May 26, 2024

daisyRec/daisy/utils/data.py

Line 192 in 421d16a

self.wc = {self.unk: 1}

I think you can check this code.
I just treat <UNK> as one of the items and give it an index, e.g. 1,
then I let the other actual items be added to the item2idx dict after it.
P.S. In the dev branch, we do not focus on the Item2Vec model,
but you can see the Item2Vec demo in my master branch;
here is the link (https://github.com/AmazingDD/daisyRec/blob/master/test_kit/run_item2vec.py)

ACnoWA commented on May 26, 2024

I think you may have misremembered your code a bit.
The values of the dictionary self.wc represent the number of occurrences of each item.
The index of an item in item2idx is its position when items are sorted in descending order of occurrence count. So the index of self.unk will be max_item_num, which will cause errors in subsequent processing.
Also, may I ask why this item is added to oitems, the return value of skip-gram?
Please don't hesitate to enlighten me! Thank you very much.
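To make the indexing argument concrete, here is a minimal Python sketch, not daisyRec's actual code. It assumes counting works as in self.wc = {self.unk: 1}, that every real item occurs at least twice, and that max_item_num equals the number of distinct real items; rank_items and UNK are hypothetical names. Under those assumptions <UNK> ranks at position max_item_num, so the [:max_item_num] slice drops it.

```python
UNK = "<UNK>"  # hypothetical stand-in for self.unk


def rank_items(sequences, max_item_num):
    """Count items and rank them by frequency (a sketch, not daisyRec's code)."""
    # Seed the counter with the fake item, as in self.wc = {self.unk: 1}
    wc = {UNK: 1}
    for seq in sequences:
        for item in seq:
            wc[item] = wc.get(item, 0) + 1
    # Rank by descending count; Python's sort is stable, so ties keep insertion order
    ranked = sorted(wc, key=wc.get, reverse=True)
    # Truncate to the vocabulary size, as in the line 199 snippet
    idx2item = ranked[:max_item_num]
    return ranked, idx2item


# Three real items, each seen three times, so <UNK> (count 1) ranks last
ranked, idx2item = rank_items([[10, 20, 30], [20, 30, 10], [30, 10, 20]], max_item_num=3)
print(ranked.index(UNK))  # 3, i.e. max_item_num
print(UNK in idx2item)    # False: the slice drops it
```

If some real item occurs only once, it ties with <UNK>; because the sort is stable and <UNK> was inserted first, <UNK> would then rank ahead of that item rather than exactly at max_item_num.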

AmazingDD commented on May 26, 2024

I just looked at my code again and found that self.wc represents the word count (the occurrence frequency of each item).
At first, I thought this model might be used in a wider range of settings, so the items in the known dataset may not cover every new item that appears in the future. Therefore, I thought it was reasonable to create an unknown fake item to represent this situation. In any case, its count is always 1.
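The fallback described above can be sketched as follows, assuming a vocabulary dict named item2idx that already contains <UNK>; encode and the sample vocabulary are hypothetical, for illustration only. Any item missing from item2idx is mapped to the <UNK> index instead of raising a KeyError.

```python
UNK = "<UNK>"  # hypothetical stand-in for self.unk


def encode(items, item2idx):
    """Map items to indices, sending items unseen during training to <UNK> (a sketch)."""
    unk_idx = item2idx[UNK]
    # dict.get falls back to the <UNK> index for unknown items
    return [item2idx.get(item, unk_idx) for item in items]


# Hypothetical vocabulary in which <UNK> holds the last index
item2idx = {10: 0, 20: 1, 30: 2, UNK: 3}
print(encode([20, 99, 10], item2idx))  # [1, 3, 0]: item 99 was never seen
```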

daisyRec/daisy/utils/data.py

Line 199 in 421d16a

self.idx2item = sorted(self.wc, key=self.wc.get, reverse=True)[:max_item_num]

I guess this code would reflect my original intent if it were changed like this:
self.idx2item = sorted(self.wc, key=self.wc.get, reverse=True)[:max_item_num + 1]
so that it contains not only all the known items but also this fake item. The code that follows is then similar to Word2Vec, like in other repositories.
Besides, as I mentioned before, I didn't focus on Item2Vec in our paper, so this code is only a toy implementation and doesn't even have an interface in main.py. To be honest, I didn't delete this code only because I thought it would be a pity XD.
But if you have any ideas or optimizations, I'd really like to merge your pull request!
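A minimal sketch of the proposed +1 slice combined with Word2Vec-style skip-gram windows follows. It is an assumption, not confirmed in this thread, that daisyRec's skip-gram pads the context (oitems) with <UNK> when the window runs past either end of a sequence, as many Word2Vec preprocessing scripts do; build_vocab, skipgram, and window are illustrative names. With the +1 slice, every index in oitems, including the padding, stays below len(idx2item).

```python
UNK = "<UNK>"  # hypothetical stand-in for self.unk


def build_vocab(sequences, max_item_num):
    """Build item2idx with room for the fake item (sketch of the proposed +1 slice)."""
    wc = {UNK: 1}
    for seq in sequences:
        for item in seq:
            wc[item] = wc.get(item, 0) + 1
    # Keep max_item_num real items plus the fake <UNK> item
    idx2item = sorted(wc, key=wc.get, reverse=True)[:max_item_num + 1]
    item2idx = {item: idx for idx, item in enumerate(idx2item)}
    return idx2item, item2idx


def skipgram(sequence, i, window):
    """Return the center item and its context, padded with <UNK> at sequence edges.

    Assumption: daisyRec's skip-gram pads oitems the same way.
    """
    iitem = sequence[i]
    left = sequence[max(i - window, 0):i]
    right = sequence[i + 1:i + 1 + window]
    # Pad so the context always has 2 * window entries
    pad_left = [UNK] * (window - len(left))
    pad_right = [UNK] * (window - len(right))
    return iitem, pad_left + left + right + pad_right


seqs = [[10, 20, 30], [20, 30, 10], [30, 10, 20]]
idx2item, item2idx = build_vocab(seqs, max_item_num=3)
iitem, oitems = skipgram(seqs[0], 0, window=2)
print(oitems)                         # ['<UNK>', '<UNK>', 20, 30]
print([item2idx[o] for o in oitems])  # [3, 3, 1, 2]
```

Here an embedding table with len(idx2item) rows (max_item_num + 1 in this example) can look up every index the skip-gram produces, including the <UNK> padding.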