ijcai18-mama-ads-competition's Introduction

Tianchi Alimama Search Advertising: Preliminary-Round Solution

Competition overview

Dependencies

sklearn, lightgbm, catboost, pandas, numpy, matplotlib, pickle, h5py, tqdm

Runtime environment

Jupyter + Python 3.5

Setup

Create the directories: mkdir input && mkdir cache && mkdir feats && mkdir rests

Data preprocessing:

Add time-dimension features, merge the data, and save the result in pkl format. (source)
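A minimal sketch of this preprocessing step, not the author's actual code. It assumes the raw tables carry a Unix-seconds column named `context_timestamp` and that Beijing time (UTC+8) is the relevant clock; the column names, the `is_test` marker, and the cache path are hypothetical.

```python
import pandas as pd


def add_time_features(df, ts_col="context_timestamp", utc_offset_hours=8):
    """Return a copy of df with 'day' and 'hour' derived from a Unix-seconds column.

    ts_col and utc_offset_hours are assumptions made for this sketch.
    """
    out = df.copy()
    # Shift to local time by a fixed offset, then parse as seconds since the epoch.
    local = pd.to_datetime(out[ts_col] + utc_offset_hours * 3600, unit="s")
    out["day"] = local.dt.day
    out["hour"] = local.dt.hour
    return out


def build_dataset(train, test, cache_path=None):
    """Stack train and test, add time features, and optionally cache as a pickle."""
    train = train.assign(is_test=0)
    test = test.assign(is_test=1)  # test rows have no label column
    data = pd.concat([train, test], ignore_index=True, sort=False)
    data = add_time_features(data)
    if cache_path is not None:
        data.to_pickle(cache_path)  # e.g. "cache/data.pkl" (hypothetical path)
    return data
```

Merging train and test before deriving features keeps the time columns consistent across both sets; the label column is simply missing (NaN) on the test rows.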

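Feature engineering:

  • 1. General click-through rate: user (source), item (source), shop (source)
  • 2. Time-weighted click counts for each dimension (source)
  • 3. Smoothed CVR (source)
  • 4. Target feature processing (descriptive statistics such as mean, variance, and standard deviation) (source)
  • 5. Descriptive statistics of each raw level feature (e.g. item_price_level) within the item, user, and shop dimensions (source)
  • 6. Features of the user's last 1-2 behaviours (source)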
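Three of the feature families above can be sketched with pandas as follows. This is an illustration under assumptions, not the repository's implementation: the exponential decay, the Beta-prior smoothing constants `alpha` and `beta`, and the column names `day`, `context_timestamp`, `user_id`, and `is_trade` are all chosen for the example.

```python
import pandas as pd


def time_weighted_count(df, key, day_col="day", ref_day=None, decay=0.8):
    """Sum of decay ** (ref_day - day) per key, so recent rows count more."""
    if ref_day is None:
        ref_day = df[day_col].max()
    weights = decay ** (ref_day - df[day_col])
    return weights.groupby(df[key]).sum().rename(key + "_tw_count")


def smoothed_cvr(df, key, label="is_trade", alpha=1.0, beta=20.0):
    """Beta-prior smoothed conversion rate: (conversions + alpha) / (n + alpha + beta)."""
    stats = df.groupby(key)[label].agg(["sum", "count"])
    cvr = (stats["sum"] + alpha) / (stats["count"] + alpha + beta)
    return cvr.rename(key + "_smooth_cvr")


def recency_rank(df, key="user_id", ts_col="context_timestamp"):
    """1 marks each key's most recent row, 2 the one before it, and so on."""
    ranks = df.groupby(key)[ts_col].rank(method="first", ascending=False)
    return ranks.astype(int)
```

Any statistic that uses the label, such as the smoothed CVR, should be computed only from rows earlier than the ones it is joined onto; otherwise the label leaks into the feature.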

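Model

I used LightGBM and CatBoost (slow to train, but performs well). I chose them partly out of laziness: both handle categorical features automatically and handle them well. Each algorithm trains two models; the second is trained after dropping the features that the first model rated as unimportant. See the source for details.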
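The two-model procedure can be sketched generically. The helper names `select_important` and `train_two_stage` and the `keep_ratio` cut-off are hypothetical; the source does not say how many features were dropped or by what criterion. The sketch only assumes a scikit-learn-style estimator whose `fit` returns the estimator and which exposes `feature_importances_` afterwards.

```python
import numpy as np
import pandas as pd


def select_important(feature_names, importances, keep_ratio=0.8):
    """Keep the top keep_ratio share of features by importance, in original order."""
    order = np.argsort(importances)[::-1]  # most important first
    n_keep = max(1, int(np.ceil(len(feature_names) * keep_ratio)))
    keep = sorted(int(i) for i in order[:n_keep])
    return [feature_names[i] for i in keep]


def train_two_stage(make_model, X: pd.DataFrame, y, keep_ratio=0.8):
    """Fit once on all features, then refit on the features the first fit ranked highest.

    make_model is a zero-argument factory returning a fresh, unfitted estimator.
    """
    first = make_model().fit(X, y)
    kept = select_important(list(X.columns), first.feature_importances_, keep_ratio)
    second = make_model().fit(X[kept], y)
    return first, second, kept
```

With the libraries installed, `make_model` could be something like `lambda: lightgbm.LGBMClassifier()` or `lambda: catboost.CatBoostClassifier(verbose=False)`; the hyperparameters actually used are not given in the source.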

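Summary

Thanks to the organizers for such a good competition and the chance to learn alongside so many experts. This was my first data competition of this kind and the result was not great, but I genuinely learned a lot and read many papers. There are countless CTR prediction models (LR, GBDT, Wide&Deep, FM & FFM, DeepFM...), but in the end I used only simple models because my time and energy were limited. I put all my effort into feature processing: without good features and data processing to feed it, even the most powerful model is wasted. For features, I started with the raw ones, added new ones a few at a time, checked feature importance, and then extended the important ones while keeping the feature set diverse. After validating with the model I dropped the features that did not help, and kept iterating.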

Related links

Contributors

duoan
