roshanson / textinfoexp
Natural language processing experiments (Sogou dataset): TF-IDF, text classification, clustering, word vectors, sentiment recognition, relation extraction, and more.
As the title says.
Could the author upload the complete set of files? The text files are missing whenever I run the code.
Hi,
The Sogou open corpus is of good quality, and wikidata is good too. Here is a word2vec model I built:
https://github.com/huyingxi/Synonyms
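You are welcome to compare against it and use it so we can improve it together. Thanks!
Regarding the similarity calculation method given here: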
https://github.com/Roshanson/TextInfoExp/tree/master/Part4_Word_Similarity/get_similarity
we could run an evaluation together:
Synonyms uses https://github.com/fssqawj/SentenceSim/blob/master/train.txt to find the best model parameters, and then reaches 88% accuracy on https://github.com/fssqawj/SentenceSim/blob/master/dev.txt.
For details, see chatopera/Synonyms#6.
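The post does not say which parameters were tuned. As a rough sketch of the train-then-dev procedure it describes, the example below assumes the only parameter is a similarity threshold and that each example is a (score, label) pair. The function names, the data layout, and the toy numbers are all hypothetical; none of them come from Synonyms, train.txt, or dev.txt.

```python
from typing import List, Tuple

# Each example is (similarity_score, gold_label); label 1 = similar, 0 = not similar.
# This layout is an assumption, not the actual format of train.txt or dev.txt.
Example = Tuple[float, int]


def accuracy(examples: List[Example], threshold: float) -> float:
    """Fraction of examples where (score >= threshold) agrees with the gold label."""
    if not examples:
        raise ValueError("no examples to score")
    correct = sum(1 for score, label in examples if int(score >= threshold) == label)
    return correct / len(examples)


def best_threshold(train: List[Example]) -> float:
    """Pick the candidate threshold with the highest accuracy on the training split."""
    candidates = sorted({score for score, _ in train})
    # max() keeps the first maximum, so ties resolve to the lowest threshold.
    return max(candidates, key=lambda t: accuracy(train, t))


# Toy values for illustration only; they are not real evaluation data.
toy_train = [(0.91, 1), (0.78, 1), (0.40, 0), (0.35, 0), (0.66, 1)]
toy_dev = [(0.85, 1), (0.30, 0), (0.70, 0), (0.72, 1)]

threshold = best_threshold(toy_train)      # tuned on the training split only
dev_accuracy = accuracy(toy_dev, threshold)  # reported on the held-out split
print(f"threshold={threshold:.2f}, dev accuracy={dev_accuracy:.2f}")
```

Tuning on one split and reporting on the other keeps the reported accuracy from being inflated by the parameter search, which is presumably why the post separates train.txt from dev.txt.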
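I would like to find data for other categories to train on. Where can I find it? Or is there a way to label data myself?
Thanks.
Which Sogou dataset was used for training? Thanks.
I am new to this and a bit confused.
The corpus used for word-vector training in Part 4 is not documented at all. If it is convenient, please upload the corpus; if not, please at least describe it, for example which corpus was used and where to download it.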
In this chapter, the code under "Getting and labeling the data" is as follows:
data = pd.read_table('Art.txt', header=None, sep=',')
data2 = pd.read_table('Computer.txt', header=None, sep=',')
data3 = pd.read_table('Sports.txt', header=None, sep=',')
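However, Art.txt and the other two files are nowhere to be found in the code or the related resources. Could you upload these three files? Thanks.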
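Until the files are available, anyone hitting the same error could check up front which inputs are missing instead of failing on the first read. This is a minimal standard-library sketch, not the repository's code: the file names come from the snippet above, while the integer labels, the UTF-8 encoding, and the helper name `load_labeled_rows` are assumptions.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple

# File names are taken from the quoted snippet; the label values are an assumption.
CATEGORY_FILES: Dict[str, int] = {"Art.txt": 0, "Computer.txt": 1, "Sports.txt": 2}


def load_labeled_rows(data_dir: Path) -> Tuple[List[Tuple[List[str], int]], List[str]]:
    """Return (rows paired with their category label, names of files not found)."""
    rows: List[Tuple[List[str], int]] = []
    missing: List[str] = []
    for name, label in CATEGORY_FILES.items():
        path = data_dir / name
        if not path.is_file():
            missing.append(name)  # record the gap and keep going
            continue
        # UTF-8 is assumed here; the real corpus may use a different encoding.
        with path.open(newline="", encoding="utf-8") as handle:
            for fields in csv.reader(handle):  # comma-separated, as in sep=','
                if fields:  # skip blank lines
                    rows.append((fields, label))
    return rows, missing


rows, missing = load_labeled_rows(Path("."))
if missing:
    print("Missing data files:", ", ".join(missing))
print(f"Loaded {len(rows)} labeled rows")
```

Run from the directory that should contain the three files, it lists whichever ones are absent and still loads the rest.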