dl-rerank (alpha)

Deep learning powered personalized re-ranking solution

User Interest Modeling Strategy

Given the list of items to be ranked, we use DIN (Deep Interest Network) to model the user's diverse interests. DIEN is another good solution for this problem, but because it relies on an RNN it needs a lot of engineering optimization to reach good performance. SRU may be a candidate replacement for the RNN.
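
As a rough illustration of the DIN idea, the sketch below pools a user's behavior embeddings with weights that depend on the candidate item. It is a simplification under stated assumptions: a plain dot product stands in for DIN's small feed-forward activation unit, and the name din_interest is hypothetical, not taken from this repository.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def din_interest(behaviors: List[Vector], candidate: Vector) -> Vector:
    """Candidate-aware weighted sum of behavior embeddings.

    Assumption: the relevance weight is a dot product. DIN itself learns
    this weight with a small feed-forward network. As in DIN, the weights
    are not softmax-normalized.
    """
    weights = [dot(b, candidate) for b in behaviors]
    dim = len(candidate)
    return [sum(w * b[i] for w, b in zip(weights, behaviors)) for i in range(dim)]
```

The same behavior history gives a different interest vector for each candidate item, which is what lets the model represent diverse interests.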

Reference

DIN: Deep Interest Network for Click-Through Rate Prediction
DIEN: Deep Interest Evolution Network for Click-Through Rate Prediction
SRU: Simple Recurrent Units for Highly Parallelizable Recurrence

Item Modeling Strategy

After modeling user interest, we have, for each item, a vectorized representation of the user that is personalized to that item, the vectorized representation of the item list, and a clicked / not-clicked label for each item. To model the (personalized user representation, item representation, context, label) relation precisely, we need to take the whole item list into account.

With the item list information, we can compute a precise vectorized representation for each (personalized user representation, item representation) pair. To stay within the computation budget, we can apply a dense transformation before applying the Transformer for self-attention. A Transformer could also be used for user interest modeling (BST).
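
The sketch below shows the order of operations described above: a dense projection to a smaller width first, then self-attention across the item list. It is a minimal single-head version under assumptions: queries, keys and values are all the projected items themselves (no learned Q/K/V matrices), and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n, k) matrix by a (k, m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def project_then_attend(items: Matrix, projection: Matrix) -> Matrix:
    """Reduce item width with a dense layer, then let every item attend to all items."""
    reduced = matmul(items, projection)  # (n_items, small_dim)
    scale = math.sqrt(len(reduced[0]))
    scores = [[sum(q * k for q, k in zip(qi, kj)) / scale for kj in reduced]
              for qi in reduced]
    weights = [softmax(r) for r in scores]  # each row sums to 1
    return matmul(weights, reduced)
```

Attention cost grows with the feature width, so shrinking the width before attention is where the saving comes from.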

Convolution kernels give us an alternative to self-attention for capturing interactions between items. We can do this with plain convolution, with lightweight convolution (LightConv), or by combining a Transformer with lightweight convolution, which is known as Long-Short Range Attention (LSRA).
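
To make the lightweight convolution idea concrete, here is a minimal sketch: the kernel weights are softmax-normalized and the same kernel is applied to every channel. It assumes a single head, an odd kernel size and centered zero padding; light_conv is a hypothetical name.

```python
import math
from typing import List


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def light_conv(sequence: List[List[float]], raw_kernel: List[float]) -> List[List[float]]:
    """Depthwise convolution over the item axis with a softmax-normalized kernel.

    Assumptions: one head (all channels share the kernel), odd kernel size,
    positions outside the sequence contribute zero.
    """
    kernel = softmax(raw_kernel)
    half = len(kernel) // 2
    n, dim = len(sequence), len(sequence[0])
    output = []
    for t in range(n):
        row = [0.0] * dim
        for j, weight in enumerate(kernel):
            pos = t + j - half
            if 0 <= pos < n:
                for c in range(dim):
                    row[c] += weight * sequence[pos][c]
        output.append(row)
    return output
```

Unlike self-attention, the mixing weights here depend only on relative position, so the cost is linear in the list length.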

Reference

Transformer: Attention Is All You Need
PRM-Rerank: Personalized Re-ranking for Recommendation
BST: Behavior Sequence Transformer for E-commerce Recommendation in Alibaba
ConvSeq2Seq: Convolutional Sequence to Sequence Learning
LightConv: Pay Less Attention with Light Weight and Dynamic Convolutions
LSRA: Lite Transformer with Long-Short Range Attention
GLU: GLU Variants Improve Transformer

Query & Item text Modeling

We model the match between the query and item text fields with a Convolutional Neural Network.
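
A TextCNN-style encoder for one text field can be sketched as follows: each filter slides over the token embeddings, a ReLU is applied, and the maximum over all windows is kept. This is only an illustration with hypothetical names; how the query and item text features are combined afterwards is not shown.

```python
from typing import List

Matrix = List[List[float]]


def conv_max_pool(tokens: Matrix, filt: Matrix) -> float:
    """Apply one filter (kernel_size x dim) with ReLU, then max-over-time pooling."""
    kernel_size = len(filt)
    best = 0.0  # ReLU floor; also the result when the text is shorter than the kernel
    for start in range(len(tokens) - kernel_size + 1):
        window = tokens[start:start + kernel_size]
        activation = sum(w * x
                         for f_row, t_row in zip(filt, window)
                         for w, x in zip(f_row, t_row))
        best = max(best, activation)
    return best


def text_features(tokens: Matrix, filters: List[Matrix]) -> List[float]:
    """One pooled feature per filter; filters may have different kernel sizes."""
    return [conv_max_pool(tokens, f) for f in filters]
```

Filters with kernel sizes 2 and 3, as in the performance settings further down, would respond to token bigrams and trigrams respectively.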

Reference

TextCNN: Convolutional Neural Networks for Sentence Classification
RankCNN: Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks

Multi-task learning

When developing a complex machine learning system, we need to consider multiple objectives, such as click, add to basket, and buy. Multi-task learning lets us learn multiple objectives simultaneously.

There are two types of multi-task learning: hard parameter sharing and soft parameter sharing. Here we use MMoE2, which is a soft parameter sharing method. Because the Transformer we use to model inter-item relations is computationally expensive, we use it as the shared bottom layer. This architecture has also been tested by MT-DNN.
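
The gating step of a multi-gate mixture-of-experts can be sketched as below: every task has its own softmax gate over the same set of expert outputs. This is a minimal illustration, not this repository's implementation; the expert outputs and gate logits are passed in directly, whereas in a real model both would be computed from the shared bottom layer.

```python
import math
from typing import List


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def mmoe_mix(expert_outputs: List[List[float]],
             gate_logits_per_task: List[List[float]]) -> List[List[float]]:
    """Return one mixed representation per task.

    expert_outputs: one vector per expert (all the same width).
    gate_logits_per_task: one list of per-expert logits for each task.
    """
    dim = len(expert_outputs[0])
    mixed = []
    for logits in gate_logits_per_task:
        gates = softmax(logits)  # task-specific weights over the experts
        mixed.append([sum(g * e[i] for g, e in zip(gates, expert_outputs))
                      for i in range(dim)])
    return mixed
```

Each mixed vector would then feed its own task tower (click, add to basket, buy), so tasks share experts but weight them differently.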

For regression objectives such as dwell time, whose range is not between 0 and 1, we have two ways to handle the target:

apply a log10 transformation to dwell time, then normalize it with min-max normalization
bucketize dwell time to turn the regression problem into a classification problem, use the predicted probabilities as class weights, compute the weighted sum over the classes, and normalize that value by the largest bucket's class to get the final result. This method is somewhat similar to McRank
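
Both options can be sketched in a few lines. The function names are hypothetical, dwell time is assumed to be strictly positive, and the min and max of the log values are assumed to come from the training data.

```python
import math
from typing import List


def log_minmax(dwell_time: float, min_log: float, max_log: float) -> float:
    """Option 1: log10 transform, then min-max normalization into [0, 1].

    Assumes dwell_time > 0 and min_log < max_log.
    """
    return (math.log10(dwell_time) - min_log) / (max_log - min_log)


def bucket_expectation(probabilities: List[float]) -> float:
    """Option 2: expected bucket index under the predicted class probabilities,
    divided by the largest bucket index so the result lies in [0, 1].

    Buckets are indexed 0 .. K-1 in increasing dwell time.
    """
    largest = len(probabilities) - 1
    expected = sum(index * p for index, p in enumerate(probabilities))
    return expected / largest
```

For example, with three buckets and probabilities [0.5, 0.0, 0.5] the expected index is 1, which normalizes to 0.5.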

Performance (3 tasks)
hidden_size=256, kernel_size=3, batch_size=32, layer_num=3, filter_size=1024
qtxt_filters=32, qtxt_kernel_sizes='2,3', ttxt_filters=32, ttxt_kernel_sizes='2,3', ctxt_filters=16, ctxt_kernel_sizes='2,3'
hardware: (os) macOS 10.13.4; (cpu) Core i7 2.3 GHz; (mem) 16 GB

transformer	flatten transformer	lite transformer	light conv
21ms/sample	19.3ms/sample	20.8ms/sample	19.2ms/sample

Reference

Survey: An Overview of Multi-Task Learning in Deep Neural Networks
MMoE: Modeling task relationships in multi-task learning with multi-gate mixture-of-experts
MMoE2: Recommending What Video to Watch Next: A Multitask Ranking System
SNR: Sub-Network Routing for Flexible Parameter Sharing in Multi-Task Learning
MT-DNN: Multi-Task Deep Neural Networks for Natural Language Understanding
McRank: McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

Important Details

Position Bias Modeling

Training Phase: randomly mask the show position of 10% of items as unknown
Evaluation Phase: set every item's show position to unknown
Modeling Strategy: use a shallow tower to model position bias
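
The masking rule for the two phases can be sketched as follows. It assumes show positions are 1-based integers, so that id 0 is free to mean "unknown"; that id and the function name are illustrative choices, not taken from this repository.

```python
import random
from typing import List, Optional

UNKNOWN_POSITION = 0  # assumption: real show positions start at 1


def mask_positions(positions: List[int],
                   training: bool,
                   mask_rate: float = 0.1,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Training: replace each position with UNKNOWN_POSITION with probability mask_rate.
    Evaluation: replace every position with UNKNOWN_POSITION.
    """
    if not training:
        return [UNKNOWN_POSITION] * len(positions)
    rng = rng or random.Random()
    return [UNKNOWN_POSITION if rng.random() < mask_rate else p for p in positions]
```

Seeing the unknown id during training means the shallow tower already has a learned embedding for it when every position is unknown at evaluation time.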

Ranking Position Modeling

Item position: given by the ranking phase
Modeling Strategy: add the item position embedding to the other item features
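
A minimal sketch of this strategy, assuming the position embedding has the same width as the item feature vector and that position_table (a hypothetical name) maps a rank position to its embedding:

```python
from typing import Dict, List


def add_position_embedding(item_features: List[List[float]],
                           positions: List[int],
                           position_table: Dict[int, List[float]]) -> List[List[float]]:
    """Element-wise sum of each item's features and its rank-position embedding."""
    return [[f + p for f, p in zip(features, position_table[pos])]
            for features, pos in zip(item_features, positions)]
```

Summing rather than concatenating keeps the input width unchanged for the layers that follow.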

Embedding

supports shared embeddings

Mini-batch aware Regularization

supports mini-batch aware regularization for sparse categorical features
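
The idea, as described in the DIN paper, is to apply the L2 penalty only to the embedding rows that actually appear in the current mini-batch, with each row scaled down by how often its feature occurs in the training data. The sketch below is a simplified illustration with hypothetical names, not this repository's implementation.

```python
from typing import Dict, List


def mini_batch_l2(embeddings: Dict[int, List[float]],
                  batch_feature_ids: List[int],
                  global_counts: Dict[int, int],
                  strength: float) -> float:
    """L2 penalty restricted to feature ids present in the mini-batch.

    Each squared norm is divided by the feature's occurrence count in the
    whole training set, so frequent features are penalized less per batch.
    """
    penalty = 0.0
    for feature_id in set(batch_feature_ids):  # each id counted once per batch
        squared_norm = sum(w * w for w in embeddings[feature_id])
        penalty += squared_norm / global_counts[feature_id]
    return strength * penalty
```

Rows that are absent from the batch are never touched, which keeps the cost proportional to the batch rather than to the full embedding table.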

Dimension Reduction

When modeling user behavior or item information, we usually deal with billions of categorical feature values. To control training and serving cost, we can apply feature selection, use the hashing trick to reduce the dimension of each type of categorical feature, or combine the two. Here we implement a strategy based on feature selection. To use the hashing trick for feature reduction instead, we can use categorical_column_with_hash_bucket.
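
The hashing trick itself is simple enough to sketch without TensorFlow. The example below uses MD5 from the standard library purely for illustration; TensorFlow's column applies its own hash function, so the bucket ids it produces would differ from these.

```python
import hashlib


def hash_bucket(value: str, num_buckets: int) -> int:
    """Map a categorical value to a stable bucket id in [0, num_buckets).

    Different values can collide in the same bucket; that is the price
    paid for a fixed, smaller embedding table.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets
```

The number of buckets trades memory against collisions: fewer buckets mean a smaller embedding table but more unrelated values sharing a row.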

Engineering Related

XLA: supports XLA
Mixed Precision: supports mixed precision; this feature requires tf >= 2.2.0
Distributed Training: supports the parameter-server distributed training strategy

wjj5881005 / dl-rerank
