ynqa / wego
Word Embeddings in Go!
License: Apache License 2.0
Hello,
I have not tried this out yet (I am excited to), but I was wondering how this deals with very large datasets, particularly those larger than RAM. Can wego handle this use case?
Thanks,
Glen
Could you make the neighbor word and its similarity score accessible?
Do you think it is possible to use word embeddings for French? The difficulty is finding a corpus in French. What do you think?
I raised this issue on the gorgonia repo but the fix does not solve the problem. Any help appreciated.
..\gorgonia.org\gorgonia\walker.go:43:43: cannot use g (type *ExprGraph) as type graph.Directed in argument to topo.SortStabilized:
*ExprGraph does not implement graph.Directed (wrong type for Edge method)
have Edge(graph.Node, graph.Node) graph.Edge
want Edge(int64, int64) graph.Edge
..\gorgonia.org\gorgonia\walker.go:53:33: cannot use g (type *ExprGraph) as type graph.Directed in argument to topo.Sort:
*ExprGraph does not implement graph.Directed (wrong type for Edge method)
have Edge(graph.Node, graph.Node) graph.Edge
want Edge(int64, int64) graph.Edge
I ran the example with the text8 file and I get the following error:
"read 17005207 words 2.4695522s
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc00029a008)
c:/go/src/runtime/sema.go:56 +0x49
sync.(*WaitGroup).Wait(0xc00029a000)
c:/go/src/sync/waitgroup.go:130 +0x6b
github.com/ynqa/wego/pkg/model/word2vec.(*word2vec).batchTrain(0xc00007c100, 0x2710, 0xa)
D:/FMI/golang_workspace/src/github.com/ynqa/wego/pkg/model/word2vec/word2vec.go:176 +0x14c
github.com/ynqa/wego/pkg/model/word2vec.(*word2vec).Train(0xc00007c100, 0x2c32e0, 0xc000006028, 0x0, 0xc000006028)
D:/FMI/golang_workspace/src/github.com/ynqa/wego/pkg/model/word2vec/word2vec.go:128 +0x5af
main.main()
D:/FMI/golang_workspace/src/project/main.go:71 +0x1cb
goroutine 50 [chan receive]:............................................."
and so on...
I have tried my own files with the same format as the text8 file, but I get the same error at some point during execution.
Am I doing something wrong, or is there some kind of issue with the example?
Along the way, this also means fixing up the serialization strategy.
go get -u github.com/ynqa/word-embedding
package github.com/chewxy/gorgonia/tensor: cannot find package "github.com/chewxy/gorgonia/tensor" in any of:
.../chewxy/gorgonia/tensor (from $GOROOT)
.../src/github.com/chewxy/gorgonia/tensor (from $GOPATH)
Currently, it is not possible to flush the output to sinks other than a file, whose name is provided as the argument of Save(outputFile string). Would it make sense to provide a sibling Save signature like the following one?
func (w *<Model>) Save(output io.Writer) error
This would handle files as well.
I would love to help with the implementation.
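A minimal sketch of what an io.Writer-based Save could look like (the Model type and its vectors field here are assumptions for illustration, not the package's actual internals):

package main

import (
	"bufio"
	"io"
	"os"
	"strconv"
)

// Model is a stand-in for a trained model; the vectors field is an assumption
// for illustration only, not the package's real internal layout.
type Model struct {
	vectors map[string][]float64
}

// Save writes the word vectors to any io.Writer instead of opening a file by name.
func (m *Model) Save(output io.Writer) error {
	w := bufio.NewWriter(output)
	for word, vec := range m.vectors {
		if _, err := w.WriteString(word); err != nil {
			return err
		}
		for _, v := range vec {
			if _, err := w.WriteString(" " + strconv.FormatFloat(v, 'f', -1, 64)); err != nil {
				return err
			}
		}
		if err := w.WriteByte('\n'); err != nil {
			return err
		}
	}
	return w.Flush()
}

func main() {
	m := &Model{vectors: map[string][]float64{"apple": {0.1, 0.2}}}
	// Writing to a file is now just one special case of writing to an io.Writer.
	f, err := os.Create("model.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := m.Save(f); err != nil {
		panic(err)
	}
}

With this shape, callers can just as easily pass a bytes.Buffer, a network connection, or a gzip writer.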
Could you add also doc2vec (aka paragraph vectors)?
I was noticing some numerical issues yesterday while doing some profiling on this code. I seriously think we can get a speed boost by pre-calculating sigmoids and using a sigmoid table.
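A minimal sketch of the kind of lookup table meant here, modeled on the classic word2vec C implementation; the table size and clamp range are illustrative choices, not values taken from this repo:

package main

import (
	"fmt"
	"math"
)

const (
	expTableSize = 1000 // number of precomputed entries (illustrative)
	maxExp       = 6.0  // inputs outside [-maxExp, maxExp] are clamped
)

// expTable[i] holds sigmoid(x) for x = (i/expTableSize*2 - 1) * maxExp.
var expTable [expTableSize]float64

func init() {
	for i := range expTable {
		x := (float64(i)/expTableSize*2 - 1) * maxExp
		expTable[i] = 1.0 / (1.0 + math.Exp(-x))
	}
}

// sigmoid looks up the precomputed value instead of calling math.Exp each time.
func sigmoid(x float64) float64 {
	if x <= -maxExp {
		return 0
	}
	if x >= maxExp {
		return 1
	}
	idx := int((x + maxExp) / (2 * maxExp) * expTableSize)
	return expTable[idx]
}

func main() {
	fmt.Println(sigmoid(0), sigmoid(2.5), sigmoid(-8))
}

The clamping also sidesteps the overflow/underflow of math.Exp for large magnitudes, which may be related to the numerical issues mentioned above.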
I used Python to train a word2vec model. I want to use Go to serve it, but I don't know how to load the model I have trained. Does anybody have experience with that? Please help me, thanks very much.
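If the model was exported in the plain-text word2vec format (for example with gensim's save_word2vec_format and binary=False), a loader along these lines should work; the file name and the header handling are assumptions, not part of this package:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// loadWord2VecText reads the plain-text word2vec format:
// an optional "vocabSize dim" header line, then "word v1 v2 ... vdim" per line.
func loadWord2VecText(path string) (map[string][]float64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 1024*1024), 1024*1024) // allow long lines
	vectors := make(map[string][]float64)
	first := true
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if first {
			first = false
			if len(fields) == 2 { // header line: skip it
				continue
			}
		}
		if len(fields) < 2 {
			continue
		}
		vec := make([]float64, 0, len(fields)-1)
		for _, s := range fields[1:] {
			v, err := strconv.ParseFloat(s, 64)
			if err != nil {
				return nil, err
			}
			vec = append(vec, v)
		}
		vectors[fields[0]] = vec
	}
	return vectors, sc.Err()
}

func main() {
	vectors, err := loadWord2VecText("model.txt") // hypothetical file name
	if err != nil {
		panic(err)
	}
	fmt.Println("loaded", len(vectors), "words")
}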
Dear,
Is it possible to do sentence similarity using embeddings?
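One common approach, sketched here generically rather than as something this package provides, is to average the word vectors of each sentence and compare the averages with cosine similarity:

package main

import (
	"fmt"
	"math"
	"strings"
)

// sentenceVector averages the vectors of the words found in the embedding map.
func sentenceVector(sentence string, vectors map[string][]float64, dim int) []float64 {
	avg := make([]float64, dim)
	n := 0
	for _, w := range strings.Fields(strings.ToLower(sentence)) {
		if vec, ok := vectors[w]; ok {
			for i, v := range vec {
				avg[i] += v
			}
			n++
		}
	}
	if n > 0 {
		for i := range avg {
			avg[i] /= float64(n)
		}
	}
	return avg
}

// cosine returns the cosine similarity of two vectors of equal length.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	// Toy 2-dimensional embeddings, purely for illustration.
	vectors := map[string][]float64{
		"cat": {1, 0}, "dog": {0.9, 0.1}, "car": {0, 1},
	}
	a := sentenceVector("the cat", vectors, 2)
	b := sentenceVector("a dog", vectors, 2)
	fmt.Printf("similarity: %.3f\n", cosine(a, b))
}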
The current implementation of CBOW uses a channel of type []float64 for data exchange, but every trainOne call needs to send and receive two []float64 vectors. These communications are not atomic: when multiple goroutines race on this channel, a pair of such vectors can be interleaved with vectors sent by another goroutine.
The CBOW training example of this package can freeze because of this issue: it hangs during training and cannot proceed.
In fact, any CBOW training using this package can freeze.
pkg/model/word2vec/model.go, lines 80-83 (the data structure):
type cbow struct {
	ch     chan []float64
	window int
}
pkg/model/word2vec/model.go, lines 103-107 (the goroutine):
agg, tmp := <-mod.ch, <-mod.ch
defer func() {
	mod.ch <- agg
	mod.ch <- tmp
}()
As shown, these sends and receives are not atomic.
We could define an auxiliary struct to avoid such race conditions, for example:
type cbowToken struct {
	agg []float64
	tmp []float64
}
Revise the data structure of Cbow as:
type cbow struct {
	ch     chan cbowToken
	window int
}
Then, in the goroutine:
token := <-mod.ch
agg, tmp := token.agg, token.tmp
defer func() {
	token := cbowToken{agg, tmp}
	mod.ch <- token
}()
In this way, all sends and receives on this channel are atomic, and the race condition is eliminated.
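For completeness, a minimal, self-contained sketch of the proposed pattern; the initialization, worker count, and vector dimensions are assumptions for illustration, not the package's actual code:

package main

import "fmt"

type cbowToken struct {
	agg []float64
	tmp []float64
}

type cbow struct {
	ch     chan cbowToken
	window int
}

// newCbow pre-fills the channel with one token per worker, so each
// goroutine checks out a complete pair of scratch buffers atomically.
func newCbow(workers, dim, window int) *cbow {
	c := &cbow{ch: make(chan cbowToken, workers), window: window}
	for i := 0; i < workers; i++ {
		c.ch <- cbowToken{agg: make([]float64, dim), tmp: make([]float64, dim)}
	}
	return c
}

func (c *cbow) trainOne() {
	token := <-c.ch
	defer func() { c.ch <- token }()
	// ... use token.agg and token.tmp as scratch buffers ...
}

func main() {
	c := newCbow(4, 100, 5)
	c.trainOne()
	fmt.Println("ok")
}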
I am trying to save my model every epoch; how can I achieve this?