w-zm / python-sentence2vec

This tool provides some implementations of sentence to vector. (sentence2vec)

License: Apache License 2.0

Python 100.00%
sentence-embeddings embeddings sentence2vec python nlp sif word-embedding sentence


sentence2vec

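A library for converting sentences into vector representations, bundling several commonly used algorithms. Its usage is modeled on scikit-learn to keep it as simple as possible, and it will continue to be updated.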

Input: a list of sentences, e.g. ['I like natural language processing', ..., 'This is an example']

Output: one vector per sentence, e.g. [[0.1, 0.1, ..., 0.1], ..., [0.1, 0.1, ..., 0.1]]
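To make the input/output contract concrete, here is a minimal sketch that maps a list of sentences to one fixed-length vector each. It is not this library's implementation: `TOY_VECTORS` and `average_embed` are hypothetical names, the word vectors are made up, and plain averaging stands in for the real models.

```python
import numpy as np

# Hypothetical toy word vectors (3 dimensions); real use would load pretrained embeddings.
TOY_VECTORS = {
    "i": np.array([1.0, 0.0, 0.0]),
    "like": np.array([0.0, 1.0, 0.0]),
    "nlp": np.array([0.0, 0.0, 1.0]),
}


def average_embed(sentences, vectors, dim=3):
    """Map each sentence to the mean of its known word vectors (zeros if none are known)."""
    rows = []
    for sentence in sentences:
        known = [vectors[w] for w in sentence.lower().split() if w in vectors]
        rows.append(np.mean(known, axis=0) if known else np.zeros(dim))
    return np.vstack(rows)


# Three sentences in, three vectors of equal length out.
embeddings = average_embed(["I like NLP", "I like", "unknown words only"], TOY_VECTORS)
print(embeddings.shape)  # (3, 3)
```

Whatever the model, the shape of the result is the same: one row per input sentence, one column per embedding dimension.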

Dependencies

  • python 3.6
  • numpy 1.17.0
  • gensim 3.6.0
  • scikit-learn 0.21.2

The version numbers above are for reference only.

Current implementations

| Model | Year | Status | Reference |
| --- | --- | --- | --- |
| SIF[1] (smooth inverse frequency) | 2016 | Finished | https://github.com/PrincetonML/SIF |
| CPM[2] (concatenated power mean) | 2018 | Planned | None |
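For orientation, the SIF method from [1] can be sketched in a few lines of NumPy: each sentence vector is a weighted average of its word vectors with weight a / (a + p(w)), where p(w) is the word's estimated probability, and the projection onto the first principal component(s) of the sentence matrix is then subtracted. This is a sketch under assumptions, not the code in this repository: `sif_embed`, `toy_vectors`, `toy_prob`, and all numeric values are hypothetical, and tokenization is a plain whitespace split.

```python
import numpy as np


def sif_embed(sentences, vectors, word_prob, a=1e-3, rmpc=1):
    """SIF sketch: weighted average of word vectors, then remove top principal components."""
    dim = len(next(iter(vectors.values())))
    emb = np.zeros((len(sentences), dim))
    for i, sentence in enumerate(sentences):
        words = [w for w in sentence.lower().split() if w in vectors]
        if not words:
            continue  # sentences with no known words stay as zero vectors
        # Frequent words get small weights; words without a probability get weight 1.
        weights = np.array([a / (a + word_prob.get(w, 0.0)) for w in words])
        vecs = np.array([vectors[w] for w in words])
        emb[i] = weights @ vecs / len(words)
    if rmpc > 0:
        # Top right-singular vectors of the (uncentered) sentence matrix.
        _, _, vt = np.linalg.svd(emb, full_matrices=False)
        pc = vt[:rmpc]
        emb = emb - emb @ pc.T @ pc  # subtract the projection onto those directions
    return emb


# Hypothetical toy data: 4-dimensional vectors and made-up unigram probabilities.
toy_vectors = {
    "i": np.array([0.9, 0.1, 0.0, 0.2]),
    "like": np.array([0.1, 0.8, 0.3, 0.0]),
    "nlp": np.array([0.0, 0.2, 0.9, 0.4]),
    "this": np.array([0.7, 0.0, 0.1, 0.5]),
    "example": np.array([0.2, 0.3, 0.1, 0.9]),
}
toy_prob = {"i": 0.05, "like": 0.01, "nlp": 0.0001, "this": 0.04, "example": 0.001}
toy_sentences = ["I like NLP", "This example"]

sif_vectors = sif_embed(toy_sentences, toy_vectors, toy_prob, a=1e-3, rmpc=1)
print(sif_vectors.shape)  # (2, 4)
```

The parameters `a` and `rmpc` here play the same roles as `weight_para` and `rmpc` in the reference implementation linked in the table; how this library computes word probabilities and stores them in its weight file is not shown in this README.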

Example

See example_sif.py

example_sif.py:

from sentence2vec.utils import glove2w2v
from sentence2vec.SIF import SIF

######## Convert the vector format ########
# The conversion uses the gensim API, so please provide absolute paths
glove_file = 'C:/data/glove.840B.300d.txt'    # download from https://nlp.stanford.edu/projects/glove/
w2v_file = 'C:/data/glove_w2v.840B.300d.txt'
glove2w2v(glove_file, w2v_file)
################################

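sentences = ['I like natural language processing', 'This is an example']   # list of all sentences
weight_file = './data/weight_file.txt'   # path where the word weights are stored
weight_para = 1e-3   # SIF weighting parameter, see the paper [1]
rmpc = 1   # number of principal components to remove, see the paper [1]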

sif = SIF(sentences, w2v_file, weight_file, weight_para, rmpc)
sentences_embedding = sif.transform()
print(len(sentences_embedding), len(sentences_embedding[0]))
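The format conversion step in the example exists because GloVe text files and word2vec text files differ only by a header: word2vec text format starts with a line holding the vocabulary size and the vector dimension. The sketch below shows that idea in plain Python on a tiny temporary file. It is an illustration only: `glove_to_word2vec_text` is a hypothetical helper, not the library's `glove2w2v`, and the two toy vectors are made up.

```python
import os
import tempfile


def glove_to_word2vec_text(glove_path, w2v_path):
    """Copy a GloVe text file to word2vec text format by prepending a 'count dim' header."""
    num_lines = 0
    dim = 0
    # First pass: count the vectors and read the dimension from the first line.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if num_lines == 0:
                dim = len(line.rstrip().split(" ")) - 1
            num_lines += 1
    # Second pass: write the header, then copy every line unchanged.
    with open(glove_path, encoding="utf-8") as src, open(w2v_path, "w", encoding="utf-8") as dst:
        dst.write(f"{num_lines} {dim}\n")
        for line in src:
            dst.write(line)
    return num_lines, dim


# Demonstrate on a toy two-word GloVe file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    toy_glove = os.path.join(tmp, "toy_glove.txt")
    toy_w2v = os.path.join(tmp, "toy_w2v.txt")
    with open(toy_glove, "w", encoding="utf-8") as f:
        f.write("hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
    shape = glove_to_word2vec_text(toy_glove, toy_w2v)
    with open(toy_w2v, encoding="utf-8") as f:
        converted = f.read()

print(shape)  # (2, 3)
```

For real embedding files, the gensim-based helper used in the example is the safer choice; the sketch only makes the difference between the two formats visible.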

Reference

[1] Arora S, Liang Y, Ma T. A simple but tough-to-beat baseline for sentence embeddings[J]. 2016.

[2] Rücklé A, Eger S, Peyrard M, et al. Concatenated power mean word embeddings as universal cross-lingual sentence representations[J]. arXiv preprint arXiv:1803.01400, 2018.

To-Do

  • pip install
  • more models
