iap's Introduction

IAP

This project clusters a corpus collected by a web crawler. The corpus contains keywords, key sentences, and key paragraphs.

Keyword clustering

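This part first trains word vectors with Word2vec and then clusters those vectors. Because the provided corpus is small, the vectors are trained on the Sogou Labs corpus instead. Two algorithms are used, AP (affinity propagation) and k-means; AP combined with the new corpus gives very good results. Part of the AP implementation in sklearn was modified, as shown in cluster_algos.py: the distance between vectors was changed from Euclidean distance to cosine distance. At a later stage, AP is used to assess the clustering quality.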
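
The cosine-based AP step can be sketched as follows. This is a minimal illustration and not the code in cluster_algos.py: instead of patching sklearn, it passes a precomputed cosine-similarity matrix to the stock `AffinityPropagation` class. The function name `cluster_by_cosine` and the toy 2-D vectors (stand-ins for trained Word2vec vectors) are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import cosine_similarity


def cluster_by_cosine(vectors, random_state=0):
    """Run affinity propagation on pairwise cosine similarities.

    Hypothetical helper: sklearn's default affinity is negative squared
    Euclidean distance, so the similarity matrix is built here and
    passed in as a precomputed affinity.
    """
    similarity = cosine_similarity(vectors)
    model = AffinityPropagation(affinity="precomputed", random_state=random_state)
    return model.fit_predict(similarity)


# Toy stand-ins for word vectors: two groups pointing in different directions.
toy_vectors = np.array([
    [1.0, 0.1], [2.0, 0.1], [3.0, 0.4],   # roughly along the x axis
    [0.1, 1.0], [0.5, 2.0], [0.3, 3.1],   # roughly along the y axis
])
labels = cluster_by_cosine(toy_vectors)
print(labels)
```

Because cosine similarity ignores vector length, vectors of different magnitude that point the same way are grouped together, which is usually the desired behaviour for word embeddings.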

Sentence clustering

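This part first removes stop words from each key sentence and then selects keywords with tf-idf; the number of keywords to keep is configurable. The word vectors of the selected keywords are summed and averaged, and the result serves as the sentence vector. Clustering on these sentence vectors gives good results. Each sentence was also represented as a one-hot vector and clustered in that form, but the results were poor.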
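
The sentence-vector construction described above can be sketched roughly as follows. It assumes sentences are already tokenized and that word vectors are available as a plain dictionary; the names `top_keywords` and `sentence_vectors`, the toy embeddings, and the particular tf-idf formula (term frequency times log of inverse document frequency) are illustrative assumptions, not the project's actual code.

```python
import math
from collections import Counter

import numpy as np


def top_keywords(tokens, doc_freq, n_docs, top_k):
    """Rank the tokens of one sentence by TF-IDF and keep the top_k."""
    counts = Counter(tokens)
    scores = {
        tok: (cnt / len(tokens)) * math.log(n_docs / doc_freq[tok])
        for tok, cnt in counts.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def sentence_vectors(sentences, embeddings, stopwords, top_k=2):
    """Average the word vectors of each sentence's top TF-IDF keywords."""
    # Drop stop words and tokens without a word vector.
    cleaned = [[t for t in s if t not in stopwords and t in embeddings]
               for s in sentences]
    doc_freq = Counter(tok for toks in cleaned for tok in set(toks))
    dim = len(next(iter(embeddings.values())))
    vectors = []
    for toks in cleaned:
        if not toks:  # nothing usable left: fall back to a zero vector
            vectors.append(np.zeros(dim))
            continue
        keywords = top_keywords(toks, doc_freq, len(cleaned), top_k)
        vectors.append(np.mean([embeddings[k] for k in keywords], axis=0))
    return np.vstack(vectors)


# Toy data (hypothetical): 2-D embeddings and pre-tokenized sentences.
embeddings = {"cat": [1.0, 0.0], "dog": [0.8, 0.2], "pet": [0.9, 0.1],
              "stock": [0.0, 1.0], "market": [0.2, 0.8]}
sentences = [["the", "cat", "and", "dog"],
             ["the", "stock", "market"],
             ["the", "pet", "cat"]]
stopwords = {"the", "and"}

vecs = sentence_vectors(sentences, embeddings, stopwords, top_k=2)
print(vecs)
```

The resulting matrix has one row per sentence and can be passed to AP or k-means in the same way as word vectors.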

Paragraph clustering

Paragraphs are clustered in the same way as sentences.

Future plans

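The next step is to use a CNN to extract features from sentences and paragraphs. The CNN parameters would be trained on a separate dataset, and the trained CNN would then be applied to this project's corpus through transfer learning; attention could also be added to improve feature extraction. In addition, AP will be used for a second round of clustering, summaries will be extracted from the clustered words, and custom evaluation criteria will be added to measure clustering quality.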

iap's People

Contributors

jingchunzhen