Coder Social home page Coder Social logo

2016ccf-sougou's Introduction

2016CCF-SouGou-大数据精准营销下的用户画像精准识别

在本次比赛中TNT_000团队获得了二等奖

团队方案如下

1.特征工程

TFIDF特征:对分词后的用户搜索词列表计算TFIDF矩阵,并通过卡方检验筛选出Top10万维特征

LDA特征:对分词后的用户搜索词列表以5为步长设置10~100的主题数目来训练LDA特征,通过连接得到用户向量

Word2vec特征:利用word2vec模型对分词后的用户搜索词列表训练100维的词向量,把每个用户搜索词的词向量做均值得到用户向量

Doc2vec特征:把分词后的用户搜索词列表当作一个文档,利用doc2vec模型训练100维的文档向量向量,得到用户向量

统计型特征:统计用户搜索词数量、英文搜索词数量等

2.模型

level1:使用TFIDF特征stack方式用LR、LinearSVC、MNB、BNB(Navie Bayes)模型得到result1

level2:使用result1和LDA、W2V、D2V和统计类型特征训练第二层模型,使用模型有CNN、XGBoost

文件目录 souGou

--1.src:源码文件存放文件夹

------excute.py:执行python文件构造count、pattern、lsi、lda、word2vec、doc2vec等特征并训练模型预测结果

------function.py:工程涉及函数逻辑python文件

------model.py:工程涉及模型逻辑python文件

------sougou_cnn_model.py:工程涉及CNN模型逻辑的python文件

------cnn.py 构造CNN stacking 特征

------fasttext.py 构造fasttext stacking特征

------tfidf.py 文件构造LR LinearSVC mnb bnb stacking特征

--2.data:存放工程数据文件夹

------data:存放原始文件和预处理后的文件

------word2vec_models:存放word2vec模型文件

--3.feature:存放特征文件夹

--4.result:存放预测结果文件夹

--5.model:存放模型结果文件夹

--6.importance:存放模型特征重要性文件夹

#---执行流程

1.配置好excute.py 里的config后 执行excute.py 先处理原始数据后再构造特征

2.执行cnn.py文件构造CNN stacking特征

3.执行fasttext.py文件构造fasttext stacking特征

4.执行tfidf.py文件构造LR LinearSVC mnb bnb stacking特征

5.构造count、pattern、lsi、lda、word2vec、doc2vec等特征

6.本地验证结果用cross-validation

7.在线预测结果提交

2016ccf-sougou's People

Contributors

abneryang avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

2016ccf-sougou's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.