
pinyin's People

Contributors

iccanobif, lxyu, pmitros

pinyin's Issues

pinyin._compat.py cannot decode to unicode

_compat.py fails to decode its input to unicode when run from CMD

Error: "UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 0: invalid continuation byte"

Solution:

s = unicode(s, "utf-8") ===> s = unicode(s, "gbk")
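The workaround above hard-codes GBK, which would break input that really is UTF-8. A minimal sketch of a more defensive variant, assuming Python 3 and a hypothetical helper named `to_text` (not part of pinyin), tries UTF-8 first and falls back to GBK only when decoding fails:

```python
def to_text(raw, encodings=("utf-8", "gbk")):
    """Decode bytes by trying each encoding in order (hypothetical helper)."""
    if isinstance(raw, str):
        return raw  # already text, nothing to decode
    last_error = None
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError as error:
            last_error = error  # remember the failure and try the next encoding
    raise last_error


print(to_text("你好".encode("gbk")))    # bytes as a GBK console would supply them
print(to_text("你好".encode("utf-8")))  # UTF-8 bytes decode on the first attempt
```

Trying UTF-8 first is a design choice: it rejects most GBK byte sequences outright, so the fallback rarely misfires.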

UnicodeDecodeError when pip install

When trying to pip install pinyin, error occurs: UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 571: character maps to <undefined> How to resolve this issue? Thanks
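The report includes no traceback, so the cause is unconfirmed. One common source of this message on Windows, assumed here, is a UTF-8 file being read with the locale's default code page. The sketch below reproduces the same kind of failure with cp1252 and shows that naming the encoding explicitly avoids it; the sample text is illustrative only.

```python
data = "逐字".encode("utf-8")  # these UTF-8 bytes happen to contain 0x90

try:
    # Roughly what a default Western Windows code page would attempt.
    data.decode("cp1252")
except UnicodeDecodeError as error:
    print("cp1252 failed:", error.reason)

# Decoding with the file's actual encoding succeeds.
print(data.decode("utf-8"))
```

When reading a file, the equivalent step is passing `encoding="utf-8"` to `open()` rather than relying on the platform default.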

Wrong pronunciation

In [3]: pinyin.get(u"猎神")
Out[3]: 'xishen'

Wrong pronunciation

夏->sha4, 莆->fu3

The '夏' in 夏门 should be 'xia'
The '莆' in 莆田 should be 'pu'

Is there any way to fix it?

Thanks in advance.

Please document that words are not accounted for

Thank you for the work on this library. A big and easy improvement to its usefulness is to prominently document that it currently performs a character-level translation, which is incorrect for many common words and sentences.

>>> import pinyin
>>> pinyin.get('了解', format='numerical')
'le5jie3'

The output should be

'liao3jie3'
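One way to account for words is a longest-match lookup that consults a word table before falling back to per-character readings. The sketch below assumes two tiny hypothetical tables, `WORD_READINGS` and `CHAR_READINGS`; neither comes from pinyin's data files, and `word_aware_pinyin` is not a function the library provides.

```python
# Hypothetical sample data for illustration only.
WORD_READINGS = {"了解": "liao3jie3"}
CHAR_READINGS = {"了": "le5", "解": "jie3", "我": "wo3"}


def word_aware_pinyin(text, max_word_len=4):
    """Prefer the longest known word at each position, else a single character."""
    out = []
    i = 0
    while i < len(text):
        # Try multi-character candidates from longest to shortest.
        for size in range(min(max_word_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in WORD_READINGS:
                out.append(WORD_READINGS[chunk])
                i += size
                break
        else:
            # No multi-character word matched; fall back to one character.
            out.append(CHAR_READINGS.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(word_aware_pinyin("我了解了"))  # wo3liao3jie3le5
```

The same character 了 is read two ways in that sentence, which is exactly what a character-level table cannot express.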

The ü sound from 女 is shown as a v

I think that's how mandarin.dat represents it, but it would be nice to translate that to the more common ü (obviously with the proper tone markers, as well).
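A post-processing step along these lines could do the translation. This is a minimal sketch, assuming the library emits `v` for ü and, in diacritical output, places a combining tone mark right after the vowel; `v_to_umlaut` is a hypothetical name, not part of pinyin.

```python
import unicodedata


def v_to_umlaut(syllable):
    """Replace the 'v' placeholder with 'ü' and compose any combining tone mark."""
    replaced = syllable.replace("v", "\u00fc")  # U+00FC is ü
    # NFC merges ü plus a following combining mark into one character where possible.
    return unicodedata.normalize("NFC", replaced)


print(v_to_umlaut("nv3"))        # numerical format keeps the digit
print(v_to_umlaut("nv\u030c"))   # ü followed by a combining caron (third tone)
```

Pinyin has no native `v`, so a blanket replacement is safe for pinyin syllables, though it should not be applied to arbitrary mixed text.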

incorrect diacritical placement in v 0.3 of the package

Version 0.3 is producing incorrect diacritical marks:

In [3]: pinyin.get("绝")
Out[3]: 'júe'

This issue is caused by a problem in line 41 of pinyin.py:
pinyin = pinyin[:vowel] + tonemarks[tone] + pinyin[vowel:]

Changing that line to the following:
pinyin = pinyin[:vowel] + pinyin[vowel:vowel+1] + tonemarks[tone] + pinyin[vowel+1:]

produces correct output for more cases:

In [2]: pinyin.get("绝")
Out[2]: 'jué'

In [3]: pinyin.get("小")
Out[3]: 'xiǎo'

In [4]: pinyin.get("许")
Out[4]: 'xǔ'

However, this fails for characters like 操 and 被, where it places the mark incorrectly on the second vowel. So maybe the following:

        elif format == "diacritical":
            # Collect the vowels; with more than one, target the second
            # vowel, otherwise the only one.
            vowels = [x for x in pinyin if x in "aeiou"]
            vowel = pinyin.index(vowels[1]) if len(vowels) > 1 else pinyin.index(vowels[0])
            if vowels[0] in "aeo" and len(vowels) > 1:
                # A leading a, e, or o carries the tone: insert the mark
                # just before the second vowel.
                pinyin = pinyin[:vowel] + tonemarks[tone] + pinyin[vowel:]
            else:
                # Otherwise the mark goes right after the target vowel.
                pinyin = pinyin[:vowel] + pinyin[vowel:vowel+1] + tonemarks[tone] + pinyin[vowel+1:]

Which produces the best-seeming output of all:

In [16]: pinyin.get("他")
Out[16]: 'tā'

In [17]: pinyin.get("绝")
Out[17]: 'jué'

In [18]: pinyin.get("小")
Out[18]: 'xiǎo'

In [19]: pinyin.get("被")
Out[19]: 'bèi'
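For comparison, the conventional pinyin placement rule can be stated directly: the mark goes on a or e if either is present, on the o of ou, and otherwise on the last vowel. The sketch below is a standalone illustration of that rule, not the patch proposed above and not pinyin's own code. `mark_tone` and `TONE_MARKS` are hypothetical names, and the use of combining diacritics composed with NFC is an assumption.

```python
import unicodedata

# Combining diacritics for tones 1-4; tone 5 (neutral) carries no mark.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}


def mark_tone(syllable, tone):
    """Place a tone mark on a toneless syllable using the conventional rule."""
    syllable = syllable.replace("v", "\u00fc")  # treat 'v' as ü
    mark = TONE_MARKS.get(tone)
    if mark is None:
        return syllable
    if "a" in syllable:
        pos = syllable.index("a")
    elif "e" in syllable:
        pos = syllable.index("e")
    elif "ou" in syllable:
        pos = syllable.index("o")
    else:
        vowels = [i for i, ch in enumerate(syllable) if ch in "aeiou\u00fc"]
        if not vowels:
            return syllable  # nothing to mark
        pos = vowels[-1]  # last vowel takes the mark
    marked = syllable[:pos + 1] + mark + syllable[pos + 1:]
    return unicodedata.normalize("NFC", marked)


for syllable, tone in [("ta", 1), ("jue", 2), ("xiao", 3), ("bei", 4), ("cao", 1)]:
    print(mark_tone(syllable, tone))
```

Because the rule never depends on how many vowels precede the marked one, it needs no special case for syllables such as 操 or 被.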

Strange u -> v behavior: how should I understand it?

Test

In [4]:  pinyin.get('战略')
Out[4]: 'zhànlvè'


In [13]: pinyin.get("战略", format='strip', delimiter=" ")
Out[13]: 'zhan lve'

I want to get zhan lue.
How should I fix this?

Using 'String'.translate()

Python 3 strings have a method, str.translate(), which takes a mapping (for example a dictionary keyed by Unicode code point) and uses it to "translate" the string character by character.

It seems to me that your code would be more Pythonic if you took advantage of this.
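A minimal sketch of the suggestion, assuming a hypothetical three-entry table `READINGS` in place of the library's real data. `str.maketrans` turns single-character keys into the code-point mapping that `str.translate` expects, and characters missing from the table pass through unchanged.

```python
# Hypothetical sample table; a real one would be built from the library's data.
READINGS = {"你": "ni3", "好": "hao3", "吗": "ma5"}
TABLE = str.maketrans(READINGS)

print("你好吗?".translate(TABLE))  # ni3hao3ma5?
```

This stays a character-by-character mapping, so it does not by itself pick context-dependent readings.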
