Word Embedding in Go

Wego is a collection of word embedding (a.k.a. word representation) models implemented in Go. Word embedding maps the meaning, structure, and concepts of words into a low-dimensional vector space. A representative example:

Vector("King") - Vector("Man") + Vector("Woman") = Vector("Queen")

As in this example, the models generate word vectors on which arithmetic operations capture relationships between word meanings.

Wego provides a CLI that supports not only training embedding models but also similarity search between words.

Models

🎃 Word2Vec: Distributed Representations of Words and Phrases and their Compositionality [pdf]

🎃 GloVe: Global Vectors for Word Representation [pdf]

🎃 LexVec: Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations [pdf]

Why Go?

Data Science in Go @chewxy

Installation

$ go get -u github.com/ynqa/wego
$ bin/wego -h

Demo

Run the following command to download the text8 corpus and train a Word2Vec model on it.

$ sh demo.sh

Usage

Usage:
  wego [flags]
  wego [command]

Available Commands:
  glove       GloVe: Global Vectors for Word Representation
  help        Help about any command
  lexvec      Lexvec: Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations
  repl        Search similar words with REPL mode
  search      Search similar words
  word2vec    Word2Vec: Continuous Bag-of-Words and Skip-gram model

Flags:
  -h, --help   help for wego

For more information about each sub-command, see below: word2vec, glove, lexvec, search, repl

File I/O

Input

The input corpus must consist of words separated by whitespace, like text8, because wego parses it with scanner.Split(bufio.ScanWords).

Output

Wego outputs a .txt file in which each line describes one word vector in the following format:

<word> <value1> <value2> ...

Example

Word vectors can also be trained using the wego APIs. An example follows.

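package main

import (
	"log"
	"os"

	"github.com/ynqa/wego/builder"
	"github.com/ynqa/wego/model/word2vec"
)

func main() {
	b := builder.NewWord2vecBuilder()

	// Configure the model: 10-dimensional vectors, a context window of 5,
	// CBOW with negative sampling (5 negative samples), and verbose output.
	b.Dimension(10).
		Window(5).
		Model(word2vec.CBOW).
		Optimizer(word2vec.NEGATIVE_SAMPLING).
		NegativeSampleSize(5).
		Verbose()

	m, err := b.Build()
	if err != nil {
		log.Fatalf("failed to build word2vec: %v", err)
	}

	// Open the training corpus (whitespace-separated words).
	input, err := os.Open("text8")
	if err != nil {
		log.Fatalf("failed to open corpus: %v", err)
	}

	// Start training.
	if err = m.Train(input); err != nil {
		log.Fatalf("failed to train word2vec: %v", err)
	}

	// Save word vectors to a text file.
	m.Save("example.txt")
}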

Contributors

chewxy, mistidoi, mrngm, ynqa
