
Comments (1)

zhaiwei3000 avatar zhaiwei3000 commented on August 12, 2024

I rewrote the method net.sourceforge.pinyin4j.PinyinHelper#toHanYuPinyinString myself.

You can give this version a try. It held up fine in my own testing.
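/**
 * Replacement for the original toHanYuPinyinString.
 * The original has a bug: for example, converting "一二三" with "#" as the separator
 * yields yi#ersan, so the "#" between 二 and 三 is lost. The separator is dropped in
 * some cases and not in others, depending on whether the dictionary contains phrases
 * that include 一.
 * add by zhaiwei5
 * @param str the text to convert
 * @param outputFormat the pinyin output format
 * @param separate the separator placed between output items; null means no separator
 * @param retain whether to keep characters that have no pinyin entry
 * @return the converted string
 * @throws BadHanyuPinyinOutputFormatCombination
 */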
// Requires: java.util.ArrayList, java.util.List
public static String toHanYuPinyinString(String str, HanyuPinyinOutputFormat outputFormat,
        String separate, boolean retain) throws BadHanyuPinyinOutputFormatCombination {
    ChineseToPinyinResource resource = ChineseToPinyinResource.getInstance();
    // Holds the pinyin syllables (and any retained characters)
    List<String> list = new ArrayList<>();
    char[] chars = str.toCharArray();
    for (int i = 0; i < chars.length; i++) {
        // Longest match found so far
        String result = null;
        char currentChar = chars[i];
        Trie root = resource.getUnicodeToHanyuPinyinTable();
        int index = i;
        // Code point of the current character, as an uppercase hex string
        String hexStr = Integer.toHexString(currentChar).toUpperCase();
        // Trie node for the current character
        Trie nowTrie = root.get(hexStr);
        // Check whether the resource file defines a pinyin for this character
        if (nowTrie == null || nowTrie.getPinyin() == null) {
            if (retain) {
                list.add(Character.toString(chars[i]));
            }
            // No pinyin defined: skip this character and move on to the next one
            continue;
        }
        result = nowTrie.getPinyin();
        if (i + 1 == chars.length) {
            // Last character: parse its pinyin
            String[] pinyinStrArray = resource.parsePinyinString(result);
            // For a polyphonic character, take the first reading by default
            list.add(PinyinFormatter.formatHanyuPinyin(pinyinStrArray[0], outputFormat));
        } else {
            // Whether a multi-character phrase was matched
            boolean isMulti = false;
            // Child trie of the current character
            Trie nextMap = nowTrie.getNextTire();
            while (true) {
                if (index + 1 == chars.length) {
                    // Reached the last character while matching greedily
                    break;
                }
                // Take the next character
                char nextChar = chars[++index];
                // The previous character has phrases that continue from it
                if (nextMap != null) {
                    // Trie node for the next character within the matched prefix
                    Trie nextTrie = nextMap.get(Integer.toHexString(nextChar).toUpperCase());
                    if (nextTrie == null) {
                        break;
                    }
                    if (nextTrie.getPinyin() != null) {
                        // A phrase ends here; keep going to match the longest phrase
                        result = nextTrie.getPinyin();
                        isMulti = true;
                        // index was already incremented above
                        i = index;
                    }
                    // Descend to the next level
                    nextMap = nextTrie.getNextTire();
                } else {
                    break;
                }
            }
            String[] pinyinStrArray = resource.parsePinyinString(result);
            if (!isMulti) {
                // No phrase matched from the current character, so use its own pinyin;
                // for a polyphonic character, take the first reading
                list.add(PinyinFormatter.formatHanyuPinyin(pinyinStrArray[0], outputFormat));
            } else {
                // Phrase: add one entry per syllable
                for (String singlePinyin : pinyinStrArray) {
                    list.add(PinyinFormatter.formatHanyuPinyin(singlePinyin, outputFormat));
                }
            }
        }
    }
    // Joining the list puts the separator between every pair of items
    return String.join(separate == null ? "" : separate, list);
}
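
For anyone who wants to see the matching idea without pulling in pinyin4j, here is a rough standalone sketch of the same greedy longest-match approach. It is not the library's code: the class LongestMatchSketch, its Node type, and the tiny dictionary in demo() are all made up for illustration, each entry has a single reading (so there is no polyphone handling), and it works on char values just like the method above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy class, not part of pinyin4j.
class LongestMatchSketch {
    /** Toy trie node: pinyin is non-null only where a dictionary entry ends. */
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String[] pinyin;
    }

    private final Node root = new Node();

    /** Registers a word together with one syllable per character. */
    void put(String word, String... syllables) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.pinyin = syllables;
    }

    /** Converts text, always preferring the longest dictionary entry at each position. */
    String convert(String text, String separator, boolean retain) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            Node node = root;
            String[] best = null;
            int bestEnd = i;
            // Walk down the trie as far as the text allows, remembering the last full entry.
            for (int j = i; j < text.length(); j++) {
                node = node.children.get(text.charAt(j));
                if (node == null) {
                    break;
                }
                if (node.pinyin != null) {
                    best = node.pinyin;
                    bestEnd = j + 1;
                }
            }
            if (best == null) {
                // No entry starts here: keep or drop the character, then move on.
                if (retain) {
                    out.add(String.valueOf(text.charAt(i)));
                }
                i++;
            } else {
                // One list item per syllable, so the separator lands between all of them.
                for (String syllable : best) {
                    out.add(syllable);
                }
                i = bestEnd;
            }
        }
        return String.join(separator == null ? "" : separator, out);
    }

    /** Builds a tiny made-up dictionary for the demo. */
    static LongestMatchSketch demo() {
        LongestMatchSketch t = new LongestMatchSketch();
        t.put("一", "yi");
        t.put("二", "er");
        t.put("三", "san");
        t.put("长", "chang");
        t.put("大", "da");
        t.put("长大", "zhang", "da");
        return t;
    }

    public static void main(String[] args) {
        LongestMatchSketch t = demo();
        System.out.println(t.convert("一二三", "#", true)); // prints yi#er#san
        System.out.println(t.convert("长大A", "#", true));  // prints zhang#da#A
    }
}
```

The point of collecting everything into a list and joining once at the end is that the separator cannot be skipped for any pair of neighbours, which is exactly what went wrong in the yi#ersan case.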

from pinyin4j.
