Comments (1)
I rewrote the method net.sourceforge.pinyin4j.PinyinHelper#toHanYuPinyinString myself.
You can give this version a try. It has held up in my own testing.
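/**
 * Replacement for the original toHanYuPinyinString.
 * The original has a bug:
 * for example, converting "一二三" with # as the separator returns yi#ersan; the # between 二 and 三 is lost.
 * Some inputs lose a separator and some do not, depending on whether the phrase table contains a phrase that includes 一.
 * add by zhaiwei5
 * @param str the text to convert
 * @param outputFormat the pinyin output format
 * @param separate the separator placed between entries; null is treated as an empty string
 * @param retain whether to keep characters that have no pinyin entry
 * @return the converted entries joined with the separator
 * @throws BadHanyuPinyinOutputFormatCombination
 */
public static String toHanYuPinyinString(String str, HanyuPinyinOutputFormat outputFormat,
        String separate, boolean retain) throws BadHanyuPinyinOutputFormatCombination {
    ChineseToPinyinResource resource = ChineseToPinyinResource.getInstance();
    // Collects one entry per syllable (or per retained character)
    List<String> list = new ArrayList<>();
    char[] chars = str.toCharArray();
    for (int i = 0; i < chars.length; i++) {
        // Longest match found so far
        String result = null;
        char currentChar = chars[i];
        Trie root = resource.getUnicodeToHanyuPinyinTable();
        int index = i;
        // Hex code of the current character
        String hexStr = Integer.toHexString(currentChar).toUpperCase();
        // Trie node for the current character
        Trie nowTrie = root.get(hexStr);
        // Check whether the resource files define a pinyin for this character
        if (nowTrie == null || nowTrie.getPinyin() == null) {
            if (retain) {
                list.add(Character.toString(chars[i]));
            }
            // No pinyin defined: skip this character and move on to the next one
            continue;
        }
        result = nowTrie.getPinyin();
        if (i + 1 == chars.length) {
            // Last character: parse its pinyin
            String[] pinyinStrArray = resource.parsePinyinString(result);
            // For a character with several readings, take the first one
            list.add(PinyinFormatter.formatHanyuPinyin(pinyinStrArray[0], outputFormat));
        } else {
            // Whether a multi-character phrase was matched
            boolean isMulti = false;
            // Child node of the current character
            Trie nextMap = nowTrie.getNextTire();
            while (true) {
                if (index + 1 == chars.length) {
                    // Matched as far as possible, up to the last character
                    break;
                }
                // Take the next character
                char nextChar = chars[++index];
                // The previous character has phrases that continue from it
                if (nextMap != null) {
                    // Trie node for the next character under the matched prefix
                    Trie nextTrie = nextMap.get(Integer.toHexString(nextChar).toUpperCase());
                    if (nextTrie == null) {
                        break;
                    }
                    if (nextTrie.getPinyin() != null) {
                        // This is a phrase; keep going to match the longest phrase possible
                        result = nextTrie.getPinyin();
                        isMulti = true;
                        // index was already incremented above
                        i = index;
                    }
                    // Descend to the next node
                    nextMap = nextTrie.getNextTire();
                } else {
                    break;
                }
            }
            String[] pinyinStrArray = resource.parsePinyinString(result);
            if (!isMulti) {
                // No phrase matched from the current character onward, so use the current character's pinyin;
                // for a character with several readings, take the first one
                list.add(PinyinFormatter.formatHanyuPinyin(pinyinStrArray[0], outputFormat));
            } else {
                // Phrase: add every syllable as its own entry
                for (String singlePinyin : pinyinStrArray) {
                    list.add(PinyinFormatter.formatHanyuPinyin(singlePinyin, outputFormat));
                }
            }
        }
    }
    // Join once at the end so the separator lands between every pair of entries
    return String.join(separate == null ? "" : separate, list);
}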
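If you want to try the idea without pulling in pinyin4j, here is a rough, self-contained sketch of the same approach: greedy longest match over a trie, collect every syllable into a list, then join once at the end so no separator can go missing. Everything in it (LongestMatchSketch, Node, put, convert, and the tiny dictionary) is made up for illustration and is not pinyin4j's API. It also assumes a phrase's syllables are stored separated by spaces, which is only a convenience for the toy data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy example; none of these names come from pinyin4j.
class LongestMatchSketch {

    /** Toy trie node keyed by character; reading is non-null when a word ends here. */
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        String reading; // syllables separated by spaces (an assumption of this sketch)
    }

    /** Registers a word (one or more characters) with its reading. */
    static void put(Node root, String word, String reading) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.reading = reading;
    }

    /** Converts text by always taking the longest dictionary match at each position. */
    static String convert(Node root, String text, String separator, boolean retain) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            Node node = root;
            String best = null;
            int bestEnd = i;
            // Walk down the trie as far as the text allows, remembering the last full match
            for (int j = i; j < text.length(); j++) {
                node = node.next.get(text.charAt(j));
                if (node == null) {
                    break;
                }
                if (node.reading != null) {
                    best = node.reading;
                    bestEnd = j + 1;
                }
            }
            if (best == null) {
                // No entry for this character: keep it as-is only when asked to
                if (retain) {
                    tokens.add(String.valueOf(text.charAt(i)));
                }
                i++;
            } else {
                // One list entry per syllable, whether the match was one character or a phrase
                for (String syllable : best.split(" ")) {
                    tokens.add(syllable);
                }
                i = bestEnd;
            }
        }
        // Joining once here is what guarantees a separator between every pair of entries
        return String.join(separator == null ? "" : separator, tokens);
    }

    public static void main(String[] args) {
        Node root = new Node();
        put(root, "一", "yi");
        put(root, "二", "er");
        put(root, "三", "san");
        put(root, "一二", "yi er"); // a phrase that starts with 一
        System.out.println(convert(root, "一二三", "#", true)); // prints yi#er#san
    }
}
```

The point of building a list and joining at the end is that the separator logic no longer depends on how many characters the last match consumed, which is the situation the yi#ersan case ran into.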
from pinyin4j.