snowball's Introduction

Snowball Stemmer for Go

Usage

package snowball_test

import (
	"fmt"

	"github.com/tebeka/snowball"
)

func Example() {
	stemmer, err := snowball.New("english")
	if err != nil {
		fmt.Println("error", err)
		return
	}
	defer stemmer.Close()

	fmt.Println(stemmer.Stem("worked"))
	fmt.Println(stemmer.Stem("working"))
	fmt.Println(stemmer.Stem("works"))
	// Output:
	// work
	// work
	// work
}

This project was mostly a learning exercise for me; I don't consider it production quality.

Development

If you want to update the underlying C library, run update-c.sh, then run the tests.

snowball's Issues

Not working on Linux or in Docker

This is an amazing implementation, congrats!
I'm trying to use it in a project. Everything works perfectly on macOS, but when I try to use it on Linux or in Docker I get the following error:

/usr/local/go/src/runtime/internal/sys/consts.go:18:7: DefaultPhysPageSize redeclared in this block
	/usr/local/go/src/runtime/internal/sys/arch_amd64.go:10:2: other declaration of DefaultPhysPageSize
/usr/local/go/src/runtime/internal/sys/consts.go:22:7: PCQuantum redeclared in this block
	/usr/local/go/src/runtime/internal/sys/arch_amd64.go:11:2: other declaration of PCQuantum
/usr/local/go/src/runtime/internal/sys/consts.go:25:7: Int64Align redeclared in this block
	/usr/local/go/src/runtime/internal/sys/arch_amd64.go:12:2: other declaration of Int64Align
/usr/local/go/src/runtime/internal/sys/consts.go:32:7: MinFrameSize redeclared in this block
	/usr/local/go/src/runtime/internal/sys/arch_amd64.go:13:2: other declaration of MinFrameSize
/usr/local/go/src/runtime/internal/sys/intrinsics_common.go:9:5: len8tab redeclared in this block
	/usr/local/go/src/runtime/internal/sys/intrinsics.go:76:7: other declaration of len8tab
/usr/local/go/src/runtime/internal/sys/intrinsics_common.go:28:5: ntz8tab redeclared in this block
	/usr/local/go/src/runtime/internal/sys/intrinsics.go:25:7: other declaration of ntz8tab
/usr/local/go/src/runtime/internal/sys/intrinsics_common.go:48:6: Len64 redeclared in this block
	/usr/local/go/src/runtime/internal/sys/intrinsics.go:99:6: other declaration of Len64
/usr/local/go/src/runtime/internal/sys/intrinsics_common.go:66:7: m0 redeclared in this block
	/usr/local/go/src/runtime/internal/sys/intrinsics.go:117:7: other declaration of m0
/usr/local/go/src/runtime/internal/sys/intrinsics_common.go:67:7: m1 redeclared in this block
	/usr/local/go/src/runtime/internal/sys/intrinsics.go:118:7: other declaration of m1
/usr/local/go/src/runtime/internal/sys/intrinsics_common.go:68:7: m2 redeclared in this block
	/usr/local/go/src/runtime/internal/sys/intrinsics.go:119:7: other declaration of m2
/usr/local/go/src/runtime/internal/sys/intrinsics_common.go:68:7: too many errors
# math
/usr/local/go/src/math/acosh.go:43:6: Acosh defined in both Go and assembly
/usr/local/go/src/math/asin.go:20:6: Asin defined in both Go and assembly
/usr/local/go/src/math/asin.go:58:6: Acos defined in both Go and assembly
/usr/local/go/src/math/asinh.go:40:6: Asinh defined in both Go and assembly
/usr/local/go/src/math/atan.go:96:6: Atan defined in both Go and assembly
/usr/local/go/src/math/atan2.go:30:6: Atan2 defined in both Go and assembly
/usr/local/go/src/math/atanh.go:48:6: Atanh defined in both Go and assembly
/usr/local/go/src/math/cbrt.go:26:6: Cbrt defined in both Go and assembly
/usr/local/go/src/math/erf.go:189:6: Erf defined in both Go and assembly
/usr/local/go/src/math/erf.go:274:6: Erfc defined in both Go and assembly
/usr/local/go/src/math/erf.go:274:6: too many errors
github.com/tebeka/snowball: build constraints exclude all Go files in /home/ubuntu/go/pkg/mod/github.com/tebeka/[email protected]

Various fatal error panics when using in goroutines

Hey there. I've been trying to use this snowball stemmer implementation, and with sequential code it works great. Unfortunately, when I call it from goroutines I get seemingly random panics with different kinds of errors. Perhaps I'm doing something wrong, but the issue is really easy to reproduce. I did so by cloning the project and writing the following example next to the existing one:

import (
	"fmt"
	"sync"

	"github.com/tebeka/snowball"
	"github.com/zhexuany/wordGenerator"
)

func ExampleStemInGoroutines() {
	var wg sync.WaitGroup

	stemmer, err := snowball.New("english")
	if err != nil {
		fmt.Println("error", err)
		return
	}

	words := wordGenerator.GetWords(500, 20)
	stemmas := make([]*string, len(words))

	for i, word := range words {
		wg.Add(1)

		go func(i int, word string) {
			defer wg.Done()
			stemma := stemmer.Stem(word)
			stemmas[i] = &stemma
		}(i, word)
	}

	wg.Wait()
	fmt.Println("test")

	// Output:
	// test
}

Note: with a smaller number of words the issue doesn't show up, so I used github.com/zhexuany/wordGenerator to demonstrate when and how it emerges.

Here are just some of the errors I'm getting:

fatal error: unexpected signal during runtime execution     
snowball.test(2553,0x700001f1f000) malloc: *** set a breakpoint in malloc_error_break to debug                          
[signal SIGSEGV: segmentation violation code=0x1 addr=0x5afffea pc=0x7fff70574b49]                                      

runtime stack:                                              
runtime.throw(0x416fb4b, 0x2a)                              
        /usr/local/Cellar/go/1.14.1/libexec/src/runtime/panic.go:1114 +0x72                                             
runtime.sigpanic()                                          
        /usr/local/Cellar/go/1.14.1/libexec/src/runtime/signal_unix.go:679 +0x46a
snowball.test(4038,0x70000950a000) malloc: Region cookie corrupted for region 0x9c00000 (value is 7277)[0x9c0407c]                                                                                                                              
snowball.test(4038,0x70000950a000) malloc: *** set a breakpoint in malloc_error_break to debug                                                                                                                                                  
SIGABRT: abort                                                                                                                                                                                                                                  
PC=0x7fff704c633a m=7 sigcode=0     
panic: runtime error: gobytes: length out of range

goroutine 121 [running]:
github.com/tebeka/snowball._Cfunc_GoBytes(...)
        _cgo_gotypes.go:63
github.com/tebeka/snowball.(*Stemmer).Stem.func4(0x5c04338, 0xfffffffe, 0xc000099a00, 0x9, 0x5c04338)
        /Users/smileart/Sync/Projects/snowball/snowball.go:73 +0x59
github.com/tebeka/snowball.(*Stemmer).Stem(0xc00008e020, 0xc000099a00, 0x9, 0x0, 0x0)
        /Users/smileart/Sync/Projects/snowball/snowball.go:73 +0xcb
github.com/tebeka/snowball_test.ExampleStemGoroutines.func1(0xc000098020, 0xc00008e020, 0xc0000cc000, 0x1f4, 0x1f4, 0x56, 0xc000099a00, 0x9)
        /Users/smileart/Sync/Projects/snowball/example_test.go:41 +0x8b
created by github.com/tebeka/snowball_test.ExampleStemGoroutines
        /Users/smileart/Sync/Projects/snowball/example_test.go:38 +0x19a
exit status 2
FAIL    github.com/tebeka/snowball      0.207s

And so on. I'm not providing all the stack traces because the issue is consistent and easy to reproduce with the code above, so I guess it'd be easier for you to get them yourself. I'm not a C expert by any means, but I've seen similar symptoms discussed in many places (here are just a couple of them: link, link), so my guess is that either my usage or the C code is wrong.

And here's my Go env, just in case:

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/smileart/Library/Caches/go-build"
GOENV="/Users/smileart/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/smileart/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.14.1/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.14.1/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/smileart/Sync/Projects/snowball/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/3x/b7d_4f997rxfrv1qb81hscp80000gn/T/go-build699026935=/tmp/go-build -gno-record-gcc-switches -fno-common"

Thanks in advance. And thank you for the project. 👌🙏