leavingseason / xdeepfm

Python 100.00%

xdeepfm's Issues

Bug in `_build_embedding` in class ExtremeDeepFMModel

@Leavingseason
There seems to be a bug in the function _build_embedding:

def _build_embedding(self, hparams):
    fm_sparse_index = tf.SparseTensor(self.iterator.dnn_feat_indices,
                                      self.iterator.dnn_feat_values,
                                      self.iterator.dnn_feat_shape)
    fm_sparse_weight = tf.SparseTensor(self.iterator.dnn_feat_indices,
                                       self.iterator.dnn_feat_weights,
                                       self.iterator.dnn_feat_shape)
    w_fm_nn_input_orgin = tf.nn.embedding_lookup_sparse(self.embedding,
                                                        fm_sparse_index,
                                                        fm_sparse_weight,
                                                        combiner="sum")
    embedding = tf.reshape(w_fm_nn_input_orgin, [-1, hparams.dim * hparams.FIELD_COUNT])
    embedding_size = hparams.FIELD_COUNT * hparams.dim
    return embedding, embedding_size

You reshape the output of tf.nn.embedding_lookup_sparse. Here is a simple example that imitates this:

import tensorflow as tf
import numpy as np
dim=3
FIELD_COUNT=4
batch_size=3
ids = tf.SparseTensor(indices=[[0,0], [0,3], [1,1],[2,1]], values=[1, 3, 6, 3], dense_shape=[batch_size,FIELD_COUNT])
sp_weights = tf.SparseTensor(indices=[[0,0], [0,3], [1,1],[2,1]], values=[1,1,1,1], dense_shape=[batch_size,FIELD_COUNT])
params = tf.constant([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[10,20,30],[40,50,60],[70,80,90]])
embed = tf.nn.embedding_lookup_sparse(params, ids, sp_weights, combiner="sum")
sess = tf.Session()
sess.run(embed)

# array([[14, 16, 18],
#       [70, 80, 90],
#       [10, 11, 12]], dtype=int32)

embedding = tf.reshape(embed, [-1, dim * FIELD_COUNT])
sess.run(embedding)

Running embedding with tf.Session then raises the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 9 values, but the requested shape requires a multiple of 12
	 [[Node: Reshape_3 = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_lookup_sparse_2, Reshape_3/shape)]]
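The shape arithmetic behind this error can be reproduced without TensorFlow. The sketch below uses a hypothetical NumPy stand-in, `lookup_sparse_sum`, for the sum combiner. It shows that one sparse row per example yields `batch_size * dim` values, while one sparse row per (example, field) pair yields the `batch_size * FIELD_COUNT * dim` values that the reshape expects. Whether the repository's iterator builds its indices the second way is an assumption here, not something the issue confirms.

```python
import numpy as np

def lookup_sparse_sum(params, indices, values, weights, n_rows):
    """Hypothetical NumPy stand-in for a sparse lookup with combiner="sum":
    every (row, _) entry adds weight * params[value] to output row `row`."""
    out = np.zeros((n_rows, params.shape[1]), dtype=params.dtype)
    for (row, _), idx, w in zip(indices, values, weights):
        out[row] += w * params[idx]
    return out

dim, FIELD_COUNT, batch_size = 3, 4, 3
params = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12],
                   [10, 20, 30], [40, 50, 60], [70, 80, 90]])
indices = [[0, 0], [0, 3], [1, 1], [2, 1]]  # (example, field) pairs from the issue
values = [1, 3, 6, 3]
weights = [1, 1, 1, 1]

# Layout from the issue: one sparse row per example, so fields are summed together.
per_example = lookup_sparse_sum(params, indices, values, weights, batch_size)

# Assumed alternative layout: one sparse row per (example, field) pair,
# with row id = example * FIELD_COUNT + field.
flat_indices = [[ex * FIELD_COUNT + field, 0] for ex, field in indices]
per_field = lookup_sparse_sum(params, flat_indices, values, weights,
                              batch_size * FIELD_COUNT)
embedding = per_field.reshape(-1, dim * FIELD_COUNT)
```

With the first layout the result has shape (3, 3), which cannot be reshaped to rows of 12 values. With the second layout the result has shape (12, 3) and reshapes cleanly to (3, 12), with zero vectors for missing fields.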

About utils/util.py

I can't find util.py in the repo. There is only a .pyc file in utils/__pycache__. Thanks~

About the experiment

I tried DeepFM on Criteo and tuned the parameters, but the logloss is 0.46.

So I want to ask whether the Criteo data needs preprocessing before running convert_to_ffm.py,

for example when generating input.csv, train.csv, and eval.csv.

Thank you very much.

Some questions

Can you explain the meaning of the dataset? I am a little confused.

How can I run this model on the MovieLens 25M dataset?

About the experiment in the paper

For the Criteo dataset, how did you preprocess the data in your experiments? I am trying to build a model for the Criteo dataset; how should I convert the data into the field-wise format? Thank you!

IO/din_cache is missing

It complains that the above module cannot be found when I try to run main.py.

requirements.txt

We should add a requirements.txt which lists all the ~24 dependencies needed to run the code. If you're using a virtual environment, you can create it by running pip freeze > requirements.txt.

About _cross_l_loss(self, hparams)

In base_model.py:

def _cross_l_loss(self, hparams):
    cross_l_loss = tf.zeros([1], dtype=tf.float32)
    for param in self.cross_params:
        cross_l_loss = tf.add(cross_l_loss, tf.multiply(hparams.cross_l1, tf.norm(param, ord=1)))
        cross_l_loss = tf.add(cross_l_loss, tf.multiply(hparams.cross_l2, tf.norm(param, ord=1)))
    return cross_l_loss

Why isn't hparams.cross_l2 multiplied by tf.norm(param, ord=2)?
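To make the difference concrete, here is a minimal NumPy sketch of the penalty the question seems to expect, with the L1 coefficient applied to the 1-norm and the L2 coefficient applied to the 2-norm. The function name `cross_reg_loss` and the coefficient values are illustrative assumptions, not the repository's code.

```python
import numpy as np

def cross_reg_loss(params, cross_l1, cross_l2):
    """Sum of cross_l1 * ||w||_1 + cross_l2 * ||w||_2 over all parameters.
    Each parameter is flattened so the norm is taken over all of its entries."""
    loss = 0.0
    for w in params:
        flat = np.ravel(w)
        loss += cross_l1 * np.linalg.norm(flat, ord=1)
        loss += cross_l2 * np.linalg.norm(flat, ord=2)
    return loss

w = np.array([[3.0, -4.0]])
l1_norm = np.linalg.norm(w.ravel(), ord=1)  # |3| + |-4| = 7
l2_norm = np.linalg.norm(w.ravel(), ord=2)  # sqrt(9 + 16) = 5
loss = cross_reg_loss([w], cross_l1=0.1, cross_l2=0.01)  # 0.1 * 7 + 0.01 * 5
```

For this parameter the two norms differ (7 versus 5), so using ord=1 in both terms changes the penalty and makes cross_l2 act as a second L1 coefficient.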

function _build_extreme_FM_quick in exDeepFM.py file

hparams.logger.info("split connect")
if idx != len(hparams.cross_layer_sizes) - 1:
    next_hidden, direct_connect = tf.split(curr_out, 2 * [int(layer_size / 2)], 1)
    final_len += int(layer_size / 2)
else:
    direct_connect = curr_out
    next_hidden = 0
    final_len += layer_size
field_nums.append(int(layer_size / 2))

I want to know the purpose of the split "next_hidden, direct_connect = tf.split(curr_out, 2 * [int(layer_size / 2)], 1)".
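What the split does mechanically can be shown with NumPy, under the assumption that curr_out is shaped [batch, layer_size, dim]. The shapes and values below are illustrative. The layer's feature maps are cut into two equal halves along axis 1: one half feeds the next layer and the other goes to the output.

```python
import numpy as np

batch, layer_size, dim = 2, 4, 3  # illustrative sizes
curr_out = np.arange(batch * layer_size * dim).reshape(batch, layer_size, dim)

half = layer_size // 2
# Counterpart of tf.split(curr_out, 2 * [half], 1): two chunks of `half`
# feature maps each, cut along axis 1.
next_hidden, direct_connect = np.split(curr_out, [half], axis=1)

# Putting the halves back together recovers the original tensor.
restored = np.concatenate([next_hidden, direct_connect], axis=1)
```

Each half has shape (2, 2, 3), which is consistent with the snippet appending int(layer_size / 2) to field_nums as the number of feature maps seen by the next layer.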

Some differences between the code and the paper

In the code (exDeepFM, lines 227-229), the curr_out tensor is split into hidden_layer and direct_connect. hidden_layer is used in the next layer, but it is not used in the softmax function that generates the output. This is not mentioned in the paper.

TensorFlow error

Which version of TensorFlow is required?
It raised the error "InvalidArgumentError (see above for traceback): Segment id -1 out of range [0, 3583), possibly because 'segment_ids' input is not sorted."

CIN conv1d problems

In the paper, Section 3.2 (CIN analysis),

the number of weights at the k-th layer is h_k * h_{k-1} * m.
However, in your code you use conv1d, which has h_k * h_{k-1} * m * d weights.

Should we use tf.nn.depthwise_conv1d instead of conv1d? I hope for your response.
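As a rough check of the counts in this question: a 1-D convolution filter holds width * in_channels * out_channels weights and is shared across every position of the length axis. The sketch below assumes, without confirmation from the issue, a filter of width 1 with h_{k-1} * m input channels and h_k output channels, with the embedding dimension d as the length axis. The sizes and the helper name `conv1d_filter_params` are illustrative.

```python
def conv1d_filter_params(width, in_channels, out_channels):
    """Number of weights in a 1-D convolution filter (no bias)."""
    return width * in_channels * out_channels

# Illustrative sizes: m fields, h_prev and h_k feature maps, embedding dim d.
m, h_prev, h_k, d = 39, 200, 200, 10

# Assumed layout: width 1, in_channels = h_prev * m, out_channels = h_k.
# The filter is reused at each of the d positions, so d does not multiply in.
shared_count = conv1d_filter_params(1, h_prev * m, h_k)

# Count if every one of the d positions had its own separate weights.
unshared_count = shared_count * d
```

Under that assumption the filter holds h_k * h_{k-1} * m weights, matching the figure from the paper. The extra factor d would appear only if the weights were not shared along the embedding dimension.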

How to train on the whole Criteo dataset

The Criteo training set (http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/) has 45,000,000 examples. I found that many DeepFM implementations on GitHub sample only a small number of training examples. How many training examples did you use to achieve the AUC in your paper? When training on all of them, it is not possible to load all the data into memory.
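One common way around the memory limit is to stream the file and build batches lazily instead of loading everything at once. The sketch below is a generic illustration, not the repository's data pipeline. The helper name `iter_batches` and the temporary demo file are assumptions.

```python
import itertools
import os
import tempfile

def iter_batches(path, batch_size):
    """Yield lists of at most batch_size lines, reading the file lazily so that
    only one batch is held in memory at a time."""
    with open(path, "r") as f:
        while True:
            batch = list(itertools.islice(f, batch_size))
            if not batch:
                return
            yield [line.rstrip("\n") for line in batch]

# Small demo file standing in for a large training file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join("row%d" % i for i in range(5)) + "\n")

sizes = [len(batch) for batch in iter_batches(tmp.name, batch_size=2)]
os.remove(tmp.name)
```

Each epoch reopens the file and iterates again, so memory use depends on the batch size and not on the number of examples.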

There is a problem in the function "check_tensorflow_version()"

def check_tensorflow_version():
    if tf.__version__ < "1.2.0":
        raise EnvironmentError("Tensorflow version must >= 1.2.0,but version is {0}".
                               format(tf.__version__))

When the TensorFlow version is "1.10.0", the function doesn't work well.
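The check compares version strings character by character, so "1.10.0" sorts before "1.2.0". A minimal sketch of one possible fix is to compare tuples of integers instead. The helper names `version_tuple` and `check_version` are hypothetical, and the current version is passed in as an argument so the example runs without TensorFlow installed.

```python
import re

def version_tuple(version):
    """Turn '1.10.0' into (1, 10, 0). Non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)

def check_version(current, minimum="1.2.0"):
    """Raise if `current` is older than `minimum`, comparing numerically."""
    if version_tuple(current) < version_tuple(minimum):
        raise EnvironmentError(
            "Tensorflow version must >= {0}, but version is {1}".format(minimum, current))

string_compare_is_wrong = "1.10.0" < "1.2.0"  # True: lexicographic comparison
check_version("1.10.0")                       # passes: (1, 10, 0) >= (1, 2, 0)
```

In the original function, the same comparison could be applied to tf.__version__.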

A question about multi-GPU training

According to your paper, you used 4 GPUs together for the training. Did you update the model parameters across GPUs synchronously or asynchronously?

What do the arguments 'reduce_D' and 'f_dim' mean?

Sorry, I don't understand the meaning of the arguments 'reduce_D' and 'f_dim'. Can you give me any help? Thanks a lot!
def _build_extreme_FM(hparams, nn_input, res=True, direct=False, bias=False, reduce_D=True, f_dim=2)

About get_feat in convert_ffm_process.py

Hi, I want to know whether the code below is a general solution for continuous values, such as train_df[field_name] = [-8.9, -7.3, -2, -1.9, -0.1, 0.6, 2, 3, 8.8, 12.8]. Thank you so much!
if val == '':
    featSet.add(str(key) + '#' + 'absence')
else:
    val = int(float(val))
    if val > 2:
        val = int(math.log(float(val)) ** 2)
    else:
        val = 'SP' + str(val)
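To see how this rule treats the sample values from the question, the branch can be wrapped in a small function and applied to each value. The wrapper name `bucketize` and the bare 'absence' return value are illustrative assumptions. The arithmetic is copied from the snippet above.

```python
import math

def bucketize(val):
    """Apply the snippet's rule to one raw string value and return its bucket."""
    if val == '':
        return 'absence'
    val = int(float(val))  # truncates toward zero, so -0.1 and 0.6 both become 0
    if val > 2:
        return int(math.log(float(val)) ** 2)
    return 'SP' + str(val)

values = [-8.9, -7.3, -2, -1.9, -0.1, 0.6, 2, 3, 8.8, 12.8]
buckets = [bucketize(str(v)) for v in values]
# buckets == ['SP-8', 'SP-7', 'SP-2', 'SP-1', 'SP0', 'SP0', 'SP2', 1, 4, 6]
```

Every value at or below 2, including each negative one, becomes its own 'SP' token after truncation, and only values above 2 are log-compressed. If the negative range is wide, this creates one category per integer, which may not be the desired treatment for general continuous features.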

How to predict?

I am having some trouble with the prediction process. Could someone help me out, please?
