go-pinyin's People

Contributors

bors-homu, dependabot-preview[bot], gitter-badger, homu, huacnlee, levinit, mozillazg, wdscxsj

go-pinyin's Issues

Heteronym parsing is incorrect

hans := "重"
a := pinyin.NewArgs()
a.Heteronym = true
fmt.Println(pinyin.Pinyin(hans, a))

Result: zhong chong tong. There is an extra reading, tong.

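Are the heteronym results correct?

I built a JSON API on top of the go-pinyin library and tested a few characters. The results do not look right.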
config := pinyin.NewArgs()
config.Heteronym = true
config.Style = pinyin.Tone

{"han":"我她","pinyin":[["wǒ"],["tā","jiě","chí"]]}
{"han":"我大","pinyin":[["wǒ"],["dà","dài","tài"]]}
{"han":"我太","pinyin":[["wǒ"],["tài","tā"]]}

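Suggestion: filter out duplicate pinyin in the result

When applyStyle runs, it would be better to filter out identical pinyin. For example, with tone marks disabled, the single character '要' returns [yao,yao,yao]; returning just [yao] would be enough.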
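
A minimal sketch of the kind of filtering suggested here, written against plain string slices rather than the library's internals. The helper name uniqueStrings is hypothetical and is not part of go-pinyin; it keeps the first occurrence of each reading so the original order is preserved.

```go
package main

import "fmt"

// uniqueStrings returns items with duplicates removed, keeping first-seen order.
// Hypothetical helper for illustration; not part of go-pinyin.
func uniqueStrings(items []string) []string {
	seen := make(map[string]struct{}, len(items))
	out := make([]string, 0, len(items))
	for _, it := range items {
		if _, ok := seen[it]; ok {
			continue // already emitted this reading
		}
		seen[it] = struct{}{}
		out = append(out, it)
	}
	return out
}

func main() {
	fmt.Println(uniqueStrings([]string{"yao", "yao", "yao"})) // prints [yao]
}
```

A caller could apply this to each inner slice of the conversion result after the style has been applied.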

Many heteronym errors

setting := pinyin.NewArgs()
setting.Heteronym = true
setting.Style = pinyin.Tone2
fmt.Println(pinyin.Pinyin("广州深圳", setting))

The return value is:

[[gua3ng ya3n a1n] [zho1u] [she1n] [zhe4n qua3n cho2u hua2i]]

The heteronym readings are clearly wrong.

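Keep characters that have no dictionary match

For characters with no match in the dictionary, please return the character unchanged rather than an empty slice. That way, input that mixes Chinese characters with other characters produces more complete output.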
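
The requested behavior can be sketched without the library, assuming a lookup table keyed by rune. Both sampleDict and convertKeepUnknown are hypothetical names used only for illustration, and the two dictionary entries are sample data. In go-pinyin itself, the Args.Fallback hook that appears in another report on this page looks like the natural place for this logic.

```go
package main

import "fmt"

// sampleDict is a tiny hypothetical dictionary for illustration only.
var sampleDict = map[rune]string{
	'中': "zhong",
	'国': "guo",
}

// convertKeepUnknown maps each rune through dict and returns runes
// that are not in dict unchanged instead of dropping them.
func convertKeepUnknown(s string, dict map[rune]string) []string {
	out := make([]string, 0, len(s))
	for _, r := range s {
		if py, ok := dict[r]; ok {
			out = append(out, py)
		} else {
			out = append(out, string(r))
		}
	}
	return out
}

func main() {
	fmt.Println(convertKeepUnknown("中a国!", sampleDict)) // prints [zhong a guo !]
}
```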

Debug build fails after importing the package in a beego project

System: macOS 10.15.3, Go 1.12.1

/usr/local/Cellar/go/1.12.1/libexec/bin/go build -o /private/var/folders/1z/n48jxz9d3pdggngj8xs48dwr0000gn/T/___go_build_main_go -gcflags "all=-N -l" /Users/xuhaoxian/go/src/douyin-api/main.go #gosetup
"/Users/xuhaoxian/Library/Application Support/JetBrains/Toolbox/apps/Goland/ch-0/193.6015.58/GoLand.app/Contents/plugins/go/lib/dlv/mac/dlv" --listen=localhost:51018 --headless=true --api-version=2 --check-go-version=false exec /private/var/folders/1z/n48jxz9d3pdggngj8xs48dwr0000gn/T/___go_build_main_go -- #gosetup
API server listening at: 127.0.0.1:51018
debugserver-@(#)PROGRAM:LLDB PROJECT:lldb-1100.0.28..1
for x86_64.
Got a connection, launched process /private/var/folders/1z/n48jxz9d3pdggngj8xs48dwr0000gn/T/___go_build_main_go (pid = 46349).
dyld: malformed mach-o image: segment __DWARF has vmsize < filesize
Exiting.

Rewriting this module

Hi everyone, FYI, I'm planning to rewrite this module with the following goals:

  • Lib: smaller footprint
    • separate the lib and the CLI tool
    • move everything unrelated to the core Go lib out of the package
  • CLI: more user-friendly
    • support input from the command line, a pipe, or a file
    • allow options that can be combined to be combined. E.g., in the current pinyin/main.go, choosing FirstLetter makes it impossible to also choose Tone2, etc.
    • make output more convenient for full-text conversion. A sample illustrates what I mean; for an input of, say,

名著:《红楼梦》〖清〗曹雪芹 著、高鹗 续/『人民文学』出版社/1996—9月30日/59.70【元】,《三国演义》〖明〗罗贯中。

The output will look more like normal text and so be easier to read:

MingZhu:《HongLouMeng》〖Qing〗CaoXueQin Zhu、GaoZuo Xu/『RenMinWenXue』ChuBanShe/1996—9Yue30Ri/59.70【Yuan】,《SanGuoYanYi》〖Ming〗LuoGuanZhong。

Of course, versatile command-line options will let people easily choose their final format.
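
A rough sketch of the full-text output style described above, under stated assumptions: sampleDict is a hypothetical three-entry table standing in for a real dictionary, and convertText and capitalize are illustrative names, not part of the planned rewrite. Each matched character becomes a capitalized syllable, and everything else (punctuation, digits, Latin text) is copied through unchanged.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// sampleDict is a hypothetical stand-in for a real pinyin dictionary.
var sampleDict = map[rune]string{'红': "hong", '楼': "lou", '梦': "meng"}

// capitalize upper-cases the first character of s, whole rune at a time.
func capitalize(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	if size == 0 {
		return s
	}
	return string(unicode.ToUpper(r)) + s[size:]
}

// convertText replaces characters found in dict with capitalized
// syllables and copies all other characters through unchanged.
func convertText(s string, dict map[rune]string) string {
	var b strings.Builder
	for _, r := range s {
		if py, ok := dict[r]; ok {
			b.WriteString(capitalize(py))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(convertText("《红楼梦》1996", sampleDict)) // prints 《HongLouMeng》1996
}
```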

Once again, this is a great pinyin conversion library, which is why I chose to base my work on it. I considered submitting patches, but realized the changes would be so dramatic that a rewrite is more appropriate.

I expect to finish within a month. If you are interested in the rewrite, please let me know. Thanks!

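Traditional character 鉆 returns chān

Thank you for generously sharing this project. The data I work with is mainly ** traditional characters.
I have not tested thoroughly yet, only a few characters, but I found that
the pinyin listed for 鉆 is "shuǎ".
I looked it up in pinyin_dict.go:
0x29246: "shuǎ"
But my program outputs "chān".
Why is that?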
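
When comparing program output against entries in pinyin_dict.go, it can help to confirm the exact code point of the character first, since hex keys that share trailing digits are easy to confuse when searching the file. The sketch below prints a character's code point in the 0x… form shown in the dictionary entry above. dictKey is a hypothetical helper, not part of go-pinyin.

```go
package main

import "fmt"

// dictKey formats a rune as an upper-case hex literal, e.g. 0x4E2D.
// Hypothetical helper for looking up entries in a hex-keyed dictionary.
func dictKey(r rune) string {
	return fmt.Sprintf("0x%X", r)
}

func main() {
	// Print the key for each character in the input string.
	for _, r := range "鉆" {
		fmt.Println(string(r), dictKey(r))
	}
}
```

Searching the dictionary file for the exact key printed here, including the 0x prefix, avoids matching a longer key that merely ends in the same digits.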

Request: please add support for passing English text through unchanged. Thanks!

package converter

import "github.com/stretchr/testify/assert"

func (e *EncoderSuite) Test_ChineseToPinYinString() {
	grids := []struct {
		in       string
		sep      string
		expected string
	}{
		{in: "hello, **", sep: "", expected: "hello,zhongguo"},
	}

	for _, grid := range grids {
		actual := ChineseToPinYinString(grid.in, grid.sep)
		assert.Equal(e.T(), grid.expected, actual, "has error, want:%+v, but got: %+v, resource data value of the: %+v\n", grid.expected, actual, grid.in)
	}
}

undefined: gojieba.NewJieba

Getting this error running the example code:

go: finding github.com/mozillazg/go-pinyin v0.16.0
go: downloading github.com/mozillazg/go-pinyin v0.16.0
go: extracting github.com/mozillazg/go-pinyin v0.16.0
go: downloading github.com/yanyiwu/gojieba v1.1.0
go: extracting github.com/yanyiwu/gojieba v1.1.0
go: finding github.com/yanyiwu/gojieba v1.1.0

github.com/mozillazg/go-pinyin

../gopath503249456/pkg/mod/github.com/mozillazg/go-pinyin@v0.16.0/phrase.go:10:10: undefined: gojieba.NewJieba

With heteronyms enabled, some returned readings do not belong to the character

My code:

hans := "一余五哎欧额"              
hans2 := "刘茜"         
hans3 := "王"
py := pinyin.NewArgs()
py.Heteronym = true 
fmt.Println(pinyin.Pinyin(hans, py))
fmt.Println(pinyin.Pinyin(hans2, py))  
fmt.Println(pinyin.Pinyin(hans3, py))

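Output:

[[yi yi yi] [yu tu xu yu] [wu] [ai] [ou] [e]] // are these all readings of 余?
[[liu] [qian xi]]
[[wang wang yu]] // does 王 have the reading yu?

From a quick search, 余 and 王 do not seem to have that many readings.
Could you take a look? Is there a problem with my code, or is there something else I need to do?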

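Performance improvements

Are there any plans to improve performance? For example, the regular-expression matching in toFixed is relatively inefficient.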
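
To illustrate the kind of change this might mean, here is a sketch that removes tone marks with a rune lookup table and strings.Map instead of a regular expression. This is not the library's toFixed, whose internals are not shown on this page; stripTone and toneToPlain are hypothetical names, and the table covers only the standard toned vowels.

```go
package main

import (
	"fmt"
	"strings"
)

// toneToPlain maps toned pinyin vowels to their plain forms.
// Hypothetical table for illustration; escapes are used to keep the
// keys unambiguous (precomposed code points).
var toneToPlain = map[rune]rune{
	'\u0101': 'a', '\u00E1': 'a', '\u01CE': 'a', '\u00E0': 'a',
	'\u0113': 'e', '\u00E9': 'e', '\u011B': 'e', '\u00E8': 'e',
	'\u012B': 'i', '\u00ED': 'i', '\u01D0': 'i', '\u00EC': 'i',
	'\u014D': 'o', '\u00F3': 'o', '\u01D2': 'o', '\u00F2': 'o',
	'\u016B': 'u', '\u00FA': 'u', '\u01D4': 'u', '\u00F9': 'u',
	'\u01D6': '\u00FC', '\u01D8': '\u00FC', '\u01DA': '\u00FC', '\u01DC': '\u00FC',
}

// stripTone replaces each toned vowel with its plain form in one pass,
// leaving all other characters untouched.
func stripTone(s string) string {
	return strings.Map(func(r rune) rune {
		if plain, ok := toneToPlain[r]; ok {
			return plain
		}
		return r
	}, s)
}

func main() {
	fmt.Println(stripTone("zh\u014Dng")) // prints zhong
}
```

Whether this is actually faster than the current code would need a benchmark against the real implementation.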

Bug when overriding Fallback

func main() {
	a := pinyin.NewArgs()
	a.Separator = ""
	a.Style = pinyin.FIRST_LETTER
	a.Fallback = func(r rune, a pinyin.Args) []string {
		return []string{string(r)}
	}
	var s string = "重。,a庆"
	p := pinyin.Pinyin(s, a)
	fmt.Println(p)
}

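Output: [[z] [�] [,] [a] [q]]
After overriding Fallback to return the original character, Chinese punctuation marks are affected by tone processing and similar steps and come out garbled. Values produced by Fallback should not go through style processing.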
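
One plausible explanation for the garbled output, offered as an assumption rather than something confirmed from the library source, is that the first-letter style takes the first byte of the string instead of the first character. The sketch below contrasts the two approaches; firstLetterByByte and firstLetterByRune are hypothetical names.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstLetterByByte takes the first byte of s. For a multi-byte
// character such as "。" this yields an invalid UTF-8 fragment.
func firstLetterByByte(s string) string {
	if s == "" {
		return s
	}
	return s[:1]
}

// firstLetterByRune takes the first complete UTF-8 encoded character.
func firstLetterByRune(s string) string {
	_, size := utf8.DecodeRuneInString(s)
	return s[:size]
}

func main() {
	for _, s := range []string{"zhong", "。", ","} {
		fmt.Printf("%q byte=%q rune=%q valid=%v\n",
			s, firstLetterByByte(s), firstLetterByRune(s),
			utf8.ValidString(firstLetterByByte(s)))
	}
}
```

This pattern matches the report: the single-byte values ("z", ",", "a") survive, while the three-byte "。" is cut apart. Skipping style processing for Fallback values, as proposed above, would avoid the problem regardless of the cause.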

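Initialization problem when using the pinyin library

Problem: our service is compiled and then deployed to a dedicated server, and all external data and configuration files it references live under a separate, designated directory. After we added the pinyin library, the service compiled without problems but crashed with a core dump when run on the server. The cause turned out to be that the pinyin library uses the jieba segmentation library and initializes it with default arguments, "jieba = gojieba.NewJieba()", passing nothing. In that case jieba loads its data from the default path "/github.com/yanyiwu/gojieba/dict", and the program deployed to the server cannot find those data files. So a suggestion: could the library expose a way for its users to load the data from a specific path? I hope that makes sense, and I look forward to your reply.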
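
A small sketch of one way such a setting could be resolved, assuming the segmenter's constructor can be given dictionary paths. Everything here is hypothetical: resolveDictDir, the PINYIN_DICT_DIR environment variable, and the default directory name are illustrative and not part of go-pinyin or gojieba.

```go
package main

import (
	"fmt"
	"os"
)

// defaultDictDir is a hypothetical fallback location.
const defaultDictDir = "dict"

// resolveDictDir picks the dictionary directory in priority order:
// an explicit value from the caller, then an environment value,
// then the built-in default.
func resolveDictDir(explicit, envValue string) string {
	switch {
	case explicit != "":
		return explicit
	case envValue != "":
		return envValue
	default:
		return defaultDictDir
	}
}

func main() {
	// PINYIN_DICT_DIR is a made-up variable name for this sketch.
	dir := resolveDictDir("", os.Getenv("PINYIN_DICT_DIR"))
	fmt.Println("loading dictionaries from", dir)
}
```

The resolved directory would then be used to build the dictionary file paths handed to the segmenter at initialization, so a deployed binary no longer depends on the source tree layout.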
