Comments (4)
Each row of the Avito data csv file contains:
<target label><target one-hot fts><target multi-hot fts><ctxt1 one-hot fts, ctxt2 one-hot fts, ..., ctxt5 one-hot fts><ctxt1 multi-hot fts, ctxt2 multi-hot fts, ..., ctxt5 multi-hot fts><clk1 one-hot fts, clk2 one-hot fts, ..., clk5 one-hot fts><clk1 multi-hot fts, clk2 multi-hot fts, ..., clk5 multi-hot fts><unclk1 one-hot fts, unclk2 one-hot fts, ..., unclk5 one-hot fts><unclk1 multi-hot fts, unclk2 multi-hot fts, ..., unclk5 multi-hot fts>
Each ad has n_one_hot_slot = 25 one-hot fts and n_mul_hot_slot = 2 multi-hot fts. Each multi-hot ft holds max_len_per_slot = 5 values; e.g., the multi-hot ft "search_params" may contain param a, param b, ..., param e.
Therefore, there are 1 + (25 + 2*5)*(1+5+5+5) = 561 columns in total in the csv file: 1 label column, plus 25 + 2*5 = 35 feature columns for each of the 1 + 5 + 5 + 5 = 16 ads (the target ad and 5 ctxt, 5 clk, and 5 unclk ads).
The 25 one_hot fts are:
bias, ad_id, position, ip_id, user_id, is_user_logged_on, search_query_keyword, search_loc_id, search_loc_level, search_region_id, search_city_id, search_cate_id, search_cate_level, search_par_cate_id, search_sub_cate_id, user_agent_id, user_agent_os_id, user_device_id, user_agent_family_id, ad_title_keyword, ad_cate_id, ad_cate_level, ad_parent_cate_id, ad_sub_cate_id, hist_ctr_bin
The 2 multi-hot fts are:
search_params, ad_params
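The row layout described above can be sketched as a table of column ranges. This is only an illustration, not code from the dstn repository: the block names (`target_one_hot`, `ctxt_mul_hot`, ...) are hypothetical, and the ranges assume 0-indexed columns stored exactly in the order listed, with the fts of the 5 ads in each group placed one after another.

```python
# Sizes taken from the description above.
N_ONE_HOT_SLOT = 25      # one-hot fts per ad
N_MUL_HOT_SLOT = 2       # multi-hot fts per ad
MAX_LEN_PER_SLOT = 5     # values per multi-hot ft
N_ADS_PER_GROUP = 5      # ctxt, clk and unclk each hold 5 ads


def column_layout():
    """Return {block_name: (start, end)} as half-open, 0-indexed column ranges.

    Block names are hypothetical labels chosen for this sketch.
    """
    one_hot_width = N_ONE_HOT_SLOT
    mul_hot_width = N_MUL_HOT_SLOT * MAX_LEN_PER_SLOT

    # Blocks in the order they appear in a csv row.
    blocks = [
        ("label", 1),
        ("target_one_hot", one_hot_width),
        ("target_mul_hot", mul_hot_width),
    ]
    for group in ("ctxt", "clk", "unclk"):
        blocks.append((group + "_one_hot", N_ADS_PER_GROUP * one_hot_width))
        blocks.append((group + "_mul_hot", N_ADS_PER_GROUP * mul_hot_width))

    layout = {}
    start = 0
    for name, width in blocks:
        layout[name] = (start, start + width)
        start += width
    return layout


layout = column_layout()
n_columns = max(end for _, end in layout.values())
print(n_columns)
for name, (start, end) in layout.items():
    print(name, start, end)
```

Under these assumptions, a parsed row `row` (a list of 561 values) could be sliced as `row[start:end]` for any block, for example `row[1:26]` for the target ad's one-hot fts.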
from dstn.
'bias' is simply 1. It is redundant if your model has a bias parameter.
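To see why a constant feature is redundant, here is a minimal sketch using a plain linear score rather than the actual dstn model. The function names and numbers are made up for illustration. The weight attached to a feature that is always 1 plays the same role as a separate bias parameter.

```python
def score_with_bias_feature(weights, features):
    # features[0] is the constant 'bias' feature (always 1.0),
    # so weights[0] acts as the bias term.
    return sum(w * x for w, x in zip(weights, features))


def score_with_bias_param(weights, bias, features):
    # Same score, but the bias is an explicit model parameter
    # and the constant feature has been dropped.
    return bias + sum(w * x for w, x in zip(weights, features))


weights = [0.5, -1.0, 2.0]    # hypothetical weights; weights[0] belongs to 'bias'
features = [1.0, 3.0, 4.0]    # hypothetical inputs; features[0] is the 'bias' feature

with_feature = score_with_bias_feature(weights, features)
with_param = score_with_bias_param(weights[1:], weights[0], features[1:])
print(with_feature == with_param)
```

If a model already learns its own bias parameter, keeping the 'bias' column just adds a second parameter that does the same job.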
from dstn.
Thanks for your patience!:)
from dstn.
One more question: what does the 'bias' (the first one-hot feature) refer to? It does not seem to be included in the Avito dataset.
from dstn.