mozillazg / go-pinyin
Converts Chinese characters (hanzi) to pinyin
Home Page: https://godoc.org/github.com/mozillazg/go-pinyin
License: MIT License
Heteronym parsing is incorrect
hans := "重"
a := pinyin.NewArgs()
a.Heteronym = true
fmt.Println(pinyin.Pinyin(hans, a))
Result: zhong chong tong. There is an extra tong.
$ pinyin -s Initials 五个
g
$ pinyin -s FinalsTone 文武
uén uǔ
A suggestion: when the input is not a Chinese character, return the original content.
For example, "**人hello66" would become "zhongguorenhello66".
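The pass-through behaviour requested here can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `toyDict` and `convertKeepingOthers` are hypothetical names, the three-entry dictionary stands in for a real pinyin table, and the sample input assumes the masked text was 中国人 (which matches the quoted zhongguoren output).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// toyDict is a tiny, hypothetical stand-in for a real pinyin dictionary.
var toyDict = map[rune]string{'中': "zhong", '国': "guo", '人': "ren"}

// convertKeepingOthers replaces Han characters found in dict with their
// pinyin and copies every other rune (ASCII, digits, punctuation, and Han
// characters missing from dict) to the output unchanged.
func convertKeepingOthers(s string, dict map[rune]string) string {
	var b strings.Builder
	for _, r := range s {
		if unicode.Is(unicode.Han, r) {
			if py, ok := dict[r]; ok {
				b.WriteString(py)
				continue
			}
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(convertKeepingOthers("中国人hello66", toyDict))
}
```

With a real dictionary the same loop would keep mixed input such as product names or IDs readable instead of dropping the non-Chinese parts.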
I built a JSON API on top of the go-pinyin library and tested a few characters; the results look wrong.
config := pinyin.NewArgs()
config.Heteronym = true
config.Style = pinyin.Tone
{"han":"我她","pinyin":[["wǒ"],["tā","jiě","chí"]]}
{"han":"我大","pinyin":[["wǒ"],["dà","dài","tài"]]}
{"han":"我太","pinyin":[["wǒ"],["tài","tā"]]}
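In applyStyle, please consider filtering out duplicate pinyin. For example, with tone marks disabled, the single character '要' returns [yao,yao,yao]; returning just [yao] would be enough.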
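The filtering suggested above amounts to an order-preserving de-duplication applied after the style has been applied. A minimal sketch follows; `dedupe` is a hypothetical helper, not a function of the library.

```go
package main

import "fmt"

// dedupe returns the input with repeated values removed, keeping the
// first occurrence of each value and preserving the original order.
func dedupe(in []string) []string {
	seen := make(map[string]struct{}, len(in))
	out := make([]string, 0, len(in))
	for _, s := range in {
		if _, ok := seen[s]; ok {
			continue
		}
		seen[s] = struct{}{}
		out = append(out, s)
	}
	return out
}

func main() {
	// Readings that differ only by tone collapse once tones are dropped.
	fmt.Println(dedupe([]string{"yao", "yao", "yao"}))
	fmt.Println(dedupe([]string{"zhong", "chong", "zhong"}))
}
```

Running the de-duplication after styling matters: the toned readings are distinct, and only become duplicates once the tone marks are removed.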
setting := pinyin.NewArgs()
setting.Heteronym = true
setting.Style=pinyin.Tone2
fmt.Println(pinyin.Pinyin("广州深圳",setting))
The return value is:
[[gua3ng ya3n a1n] [zho1u] [she1n] [zhe4n qua3n cho2u hua2i]]
Clearly, the heteronym readings are wrong.
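For characters with no match in the dictionary, please return them unchanged instead of an empty slice, so that input mixing Chinese characters with other characters yields more complete output.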
System: macOS 10.15.3, go1.12.1
/usr/local/Cellar/go/1.12.1/libexec/bin/go build -o /private/var/folders/1z/n48jxz9d3pdggngj8xs48dwr0000gn/T/___go_build_main_go -gcflags "all=-N -l" /Users/xuhaoxian/go/src/douyin-api/main.go #gosetup
"/Users/xuhaoxian/Library/Application Support/JetBrains/Toolbox/apps/Goland/ch-0/193.6015.58/GoLand.app/Contents/plugins/go/lib/dlv/mac/dlv" --listen=localhost:51018 --headless=true --api-version=2 --check-go-version=false exec /private/var/folders/1z/n48jxz9d3pdggngj8xs48dwr0000gn/T/___go_build_main_go -- #gosetup
API server listening at: 127.0.0.1:51018
debugserver-@(#)PROGRAM:LLDB PROJECT:lldb-1100.0.28..1
for x86_64.
Got a connection, launched process /private/var/folders/1z/n48jxz9d3pdggngj8xs48dwr0000gn/T/___go_build_main_go (pid = 46349).
dyld: malformed mach-o image: segment __DWARF has vmsize < filesize
Exiting.
Expected chang, but got zhang.
Hi everyone. FYI, I'm planning to rewrite this module so that, given input like:
名著:《红楼梦》〖清〗曹雪芹 著、高鹗 续/『人民文学』出版社/1996—9月30日/59.70【元】,《三国演义》〖明〗罗贯中。
The output will read more like normal text and thus be easier to read:
MingZhu:《HongLouMeng》〖Qing〗CaoXueQin Zhu、GaoZuo Xu/『RenMinWenXue』ChuBanShe/1996—9Yue30Ri/59.70【Yuan】,《SanGuoYanYi》〖Ming〗LuoGuanZhong。
Of course, versatile command-line options will let people easily choose their final format.
Once again, this is a great pinyin conversion library, which is why I chose to base my work on it. I considered submitting patches, but realized that the changes would be so dramatic that a rewrite would be most appropriate.
I should be finished within a month. If anyone is interested in my rewrite, please let me know. Thanks!
As the title says.
$ pinyin 重庆
zhòng qìng
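Thank you for generously sharing this project. In my use case, the data is mainly ** traditional characters.
I haven't tested thoroughly yet, only a few characters, but I found that
鉆 should map to the pinyin "shuǎ".
I looked it up in pinyin_dict.go:
0x29246: "shuǎ"
But my program outputs "chān".
Why is that?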
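One quick check for a report like this is to print the character's code point and compare it with the dictionary key that was looked up. The sketch below uses only the standard library; `dictKey` and `inBMP` are names introduced for illustration. The key 0x29246 has five hex digits, so it lies outside the Basic Multilingual Plane, whereas ordinary CJK ideographs such as 鉆 lie inside it, which suggests the entry that was found may belong to a different character.

```go
package main

import "fmt"

// dictKey is the dictionary key quoted in the report above.
const dictKey = 0x29246

// inBMP reports whether r lies in the Basic Multilingual Plane
// (U+0000 through U+FFFF).
func inBMP(r rune) bool { return r >= 0 && r <= 0xFFFF }

func main() {
	r := '鉆'
	fmt.Printf("%c has code point %U\n", r, r)
	fmt.Printf("dictionary key 0x%X is the character %c\n", dictKey, rune(dictKey))
	fmt.Println("same character:", r == dictKey)
	fmt.Println("鉆 in BMP:", inBMP(r), "| key in BMP:", inBMP(dictKey))
}
```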
go-pinyin converts 重量级的重庆市 to [zhong][liang][ji][de][zhong][qing][shi].
The PHP library https://github.com/overtrue/pinyin returns zhongliangjidechongqingshi.
As far as I can see, both are built on https://github.com/mozillazg/pinyin-data.
Could this be fixed?
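The difference between the two results is what a phrase dictionary would produce: a per-character lookup always picks one default reading for 重, while a phrase-first lookup can read 重庆 as a unit. The following is a rough sketch of longest-match phrase lookup with a per-character fallback. The two tiny tables and all names (`phrases`, `chars`, `maxPhraseLen`, `convert`, `toPinyin`) are hypothetical, and this does not describe how either library is actually implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical miniature tables; a real converter would load much larger ones.
var phrases = map[string][]string{
	"重庆": {"chong", "qing"},
	"重量": {"zhong", "liang"},
}

var chars = map[rune]string{
	'重': "zhong", '量': "liang", '级': "ji",
	'的': "de", '庆': "qing", '市': "shi",
}

// maxPhraseLen is the length in runes of the longest key in phrases.
const maxPhraseLen = 2

// convert tries the longest phrase starting at each position first and
// falls back to the per-character table, then to the raw character.
func convert(s string) []string {
	rs := []rune(s)
	var out []string
	for i := 0; i < len(rs); {
		matched := false
		for n := maxPhraseLen; n >= 2 && !matched; n-- {
			if i+n > len(rs) {
				continue
			}
			if py, ok := phrases[string(rs[i:i+n])]; ok {
				out = append(out, py...)
				i += n
				matched = true
			}
		}
		if matched {
			continue
		}
		if py, ok := chars[rs[i]]; ok {
			out = append(out, py)
		} else {
			out = append(out, string(rs[i]))
		}
		i++
	}
	return out
}

// toPinyin joins the converted syllables without a separator.
func toPinyin(s string) string { return strings.Join(convert(s), "") }

func main() {
	fmt.Println(toPinyin("重量级的重庆市"))
}
```

Greedy longest-match is the simplest strategy; a word segmenter can do better on ambiguous boundaries, at the cost of an extra dependency.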
exec: "gcc": executable file not found in %PATH%
System: windows 10
IDE: goland
Go version: 1.13.4
Could you support swapping the dictionary, so that users can pass in their own map[int]string?
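One way such a feature could look is a lookup that consults a user-supplied map before the built-in one. The sketch below is only an illustration of that idea: `lookup`, `base`, and `custom` are hypothetical names, and the comma-separated value format is an assumption made for the example.

```go
package main

import "fmt"

// lookup returns the entry for r, preferring the user-supplied custom
// dictionary and falling back to the base dictionary. Both maps are keyed
// by Unicode code point. A nil custom map is allowed.
func lookup(r rune, custom, base map[int]string) (string, bool) {
	if v, ok := custom[int(r)]; ok {
		return v, true
	}
	v, ok := base[int(r)]
	return v, ok
}

func main() {
	// Hypothetical base entry listing two readings.
	base := map[int]string{int('重'): "zhòng,chóng"}
	// A caller who only ever wants one reading overrides the entry.
	custom := map[int]string{int('重'): "chóng"}

	v, _ := lookup('重', custom, base)
	fmt.Println(v)
	v, _ = lookup('重', nil, base)
	fmt.Println(v)
}
```

Layering the maps instead of replacing the built-in table keeps overrides small: callers only list the characters whose readings they want to change.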
Hi @mozillazg,
I noticed something is wrong with the Initials style (声母风格) and the Finals style (韵母风格) while working on #26.
(Ref: https://github.com/go-cc/cc2py/blob/master/cc2py_test.go#L32-L39)
Please take a look at
https://github.com/go-cc/cc-table/blob/master/text/tools/go-pinyin.go
You can see that the output for "yín" is blank in the Initials style, and in the Finals style it is "iín".
I.e., both cases are wrong.
Would you look into it please? Thx.
package converter

import "github.com/stretchr/testify/assert"

// EncoderSuite and ChineseToPinYinString are not shown in the report;
// they are presumably defined elsewhere in the reporter's package.
func (e *EncoderSuite) Test_ChineseToPinYinString() {
	grids := []struct {
		in       string
		sep      string
		expected string
	}{
		{in: "hello, **", sep: "", expected: "hello,zhongguo"},
	}
	for _, grid := range grids {
		actual := ChineseToPinYinString(grid.in, grid.sep)
		assert.Equal(e.T(), grid.expected, actual, "has error, want:%+v, but got: %+v, resource data value of the: %+v\n", grid.expected, actual, grid.in)
	}
}
Getting this error running the example code:
go: finding github.com/mozillazg/go-pinyin v0.16.0
go: downloading github.com/mozillazg/go-pinyin v0.16.0
go: extracting github.com/mozillazg/go-pinyin v0.16.0
go: downloading github.com/yanyiwu/gojieba v1.1.0
go: extracting github.com/yanyiwu/gojieba v1.1.0
go: finding github.com/yanyiwu/gojieba v1.1.0
../gopath503249456/pkg/mod/github.com/mozillazg/go-pinyin@v0.16.0/phrase.go:10:10: undefined: gojieba.NewJieba
Add it directly to the README. For the mapping, see #19 (comment).
My code is as follows:
hans := "一余五哎欧额"
hans2 := "刘茜"
hans3 := "王"
py := pinyin.NewArgs()
py.Heteronym = true
fmt.Println(pinyin.Pinyin(hans, py))
fmt.Println(pinyin.Pinyin(hans2, py))
fmt.Println(pinyin.Pinyin(hans3, py))
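The output is:
[[yi yi yi] [yu tu xu yu] [wu] [ai] [ou] [e]] // are these really readings of 余?
[[liu] [qian xi]]
[[wang wang yu]] // does 王 have a yu reading?
I did a quick search, and 余 and 王 don't seem to have as many readings as shown above.
Could you take a look? Is something wrong with my code, or do I need to do something else?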
This Chinese character cannot be converted to pinyin; its pinyin is yan3.
Hi 闲耘,
Was it you who answered my question at hotoo regarding the 多音字字典?
I.e., do you realize that the pinyin dictionary used in this go-pinyin
package is different from hotoo's? Thx.
长沙 is converted to zhang sha
[[zhang] [sha]]
Are there plans to improve performance? For example, the regular-expression matching in toFixed is rather inefficient.
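As a general illustration of the kind of change this question hints at, fixed character substitutions such as stripping tone marks can be done with a precomputed `strings.Replacer` instead of a regular expression. The library's actual toFixed code is not shown in this thread, so this is not a drop-in patch; `toneStripper` and `stripTones` are hypothetical names, and only the common toned vowels are covered.

```go
package main

import (
	"fmt"
	"strings"
)

// toneStripper maps each toned vowel to its plain form. It is built once
// at package initialisation and is safe for concurrent use.
var toneStripper = strings.NewReplacer(
	"ā", "a", "á", "a", "ǎ", "a", "à", "a",
	"ē", "e", "é", "e", "ě", "e", "è", "e",
	"ī", "i", "í", "i", "ǐ", "i", "ì", "i",
	"ō", "o", "ó", "o", "ǒ", "o", "ò", "o",
	"ū", "u", "ú", "u", "ǔ", "u", "ù", "u",
	"ǖ", "ü", "ǘ", "ü", "ǚ", "ü", "ǜ", "ü",
)

// stripTones removes tone marks from a pinyin syllable without using a
// regular expression.
func stripTones(py string) string {
	return toneStripper.Replace(py)
}

func main() {
	for _, py := range []string{"zhòng", "qìng", "lǜ", "wǒ"} {
		fmt.Println(py, "->", stripTones(py))
	}
}
```

Whether this is faster in practice depends on the real hot path, so a benchmark with `testing.B` against the existing code would be the first step before changing anything.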
Test string: "测1试"; function under test: go-pinyin.Slug
For details, see #19 (comment)
func main() {
	a := pinyin.NewArgs()
	a.Separator = ""
	a.Style = pinyin.FIRST_LETTER
	a.Fallback = func(r rune, a pinyin.Args) []string {
		return []string{string(r)}
	}
	var s string = "重。,a庆"
	p := pinyin.Pinyin(s, a)
	fmt.Println(p)
}
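Output: [[z] [�] [,] [a] [q]]
After overriding Fallback to return the original value, Chinese punctuation is still affected by tone processing and other style handling and comes out garbled. Values produced by Fallback should not be run through style processing.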
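A replacement character in a first-letter result is what one would see if the first byte of a multi-byte character were taken instead of the first rune. Whether that is the actual cause inside the library is an assumption here, not something confirmed in this thread; the sketch only shows the difference between the two approaches, using the hypothetical helpers `firstByte` and `firstRune`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstByte slices off the first byte. For multi-byte UTF-8 characters
// this yields an invalid string fragment.
func firstByte(s string) string {
	if s == "" {
		return ""
	}
	return s[:1]
}

// firstRune slices off the first complete UTF-8 encoded character.
func firstRune(s string) string {
	if s == "" {
		return ""
	}
	_, size := utf8.DecodeRuneInString(s)
	return s[:size]
}

func main() {
	for _, s := range []string{"zhong", "a", "。"} {
		b := firstByte(s)
		fmt.Printf("%q: byte slice %q (valid UTF-8: %v), rune slice %q\n",
			s, b, utf8.ValidString(b), firstRune(s))
	}
}
```

For ASCII input the two helpers agree, which would explain why letters such as a survive while the full-width period does not. Skipping style processing for Fallback values, as the report proposes, would avoid the problem regardless of the cause.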
Does it support heteronyms (characters with multiple readings)?
Starting with Go 1.17, install the standalone binary with go install:
go install github.com/mozillazg/go-pinyin/cmd/pinyin@latest
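Problem: our service is compiled and then deployed to a dedicated server, where all external data and configuration files live in one designated directory. After we added the pinyin library, the service still compiled fine but core dumped when run on the server. The cause turned out to be that the pinyin library uses the jieba word-segmentation library and initializes it with default arguments, "jieba = gojieba.NewJieba()", passing nothing. In that case jieba loads its data from the default path "/github.com/yanyiwu/gojieba/dict", which the program deployed on the server cannot find. So my suggestion: could the library expose a way for its users to load the data from a specific path? I hope that makes sense, and I look forward to your reply.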