python-romkan's Introduction

python-romkan

python-romkan is a Romaji/Kana conversion library for Python, which is used to convert a Japanese Romaji (ローマ字) string to a Japanese Kana (仮名) string or vice versa.

It is a Pythonic port of Ruby/Romkan, which was originally written by Satoru Takabayashi and ported to Python by Masato Hagiwara.

python-romkan works on Python 2 and Python 3 (fully tested on Python 2.6, 2.7, 3.2, 3.3 and PyPy). It handles both Katakana (片仮名) and Hiragana (平仮名) with the Hepburn (ヘボン式) romanization system, as well as the modern Kunrei-shiki (訓令式) romanization system.

Project homepage: http://www.soimort.org/python-romkan

Fork me on GitHub: https://github.com/soimort/python-romkan

Installation

1. Install via Pip:

$ pip install romkan

2. Install via EasyInstall:

$ easy_install romkan

3. Install from Git:

$ git clone https://github.com/soimort/python-romkan.git
$ cd python-romkan
$ python setup.py install

Usage

Python 3.x:

$ python
>>> import romkan
>>> print(romkan.to_roma("にんじゃ"))
ninja
>>> print(romkan.to_hepburn("にんじゃ"))
ninja
>>> print(romkan.to_kunrei("にんじゃ"))
ninzya
>>> print(romkan.to_hiragana("ninja"))
にんじゃ
>>> print(romkan.to_katakana("ninja"))
ニンジャ

Python 2.x:

$ python2
>>> import romkan
>>> print romkan.to_roma(u"にんじゃ")
ninja
>>> print romkan.to_hepburn(u"にんじゃ")
ninja
>>> print romkan.to_kunrei(u"にんじゃ")
ninzya
>>> print romkan.to_hiragana("ninja")
にんじゃ
>>> print romkan.to_katakana("ninja")
ニンジャ

API Reference

  • to_katakana(string)

Convert a Romaji (ローマ字) to a Katakana (片仮名).

  • to_hiragana(string)

Convert a Romaji (ローマ字) to a Hiragana (平仮名).

  • to_kana(string)

Convert a Romaji (ローマ字) to a Katakana (片仮名). (same as to_katakana)

  • to_hepburn(string)

Convert a Kana (仮名) or a Kunrei-shiki Romaji (訓令式ローマ字) to a Hepburn Romaji (ヘボン式ローマ字).

  • to_kunrei(string)

Convert a Kana (仮名) or a Hepburn Romaji (ヘボン式ローマ字) to a Kunrei-shiki Romaji (訓令式ローマ字).

  • to_roma(string)

Convert a Kana (仮名) to a Hepburn Romaji (ヘボン式ローマ字).

License

python-romkan is licensed under the BSD license.

python-romkan's People

Contributors

melissaboiko, soimort

python-romkan's Issues

fails on double 'n'

In both traditional and modified Hepburn romanization, "annai" should be rendered as あんない, but romkan renders it as あんあい.

>>> import romkan
>>> romkan.to_hiragana("annai")
'あんあい'
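
The behaviour reported above can be reproduced with a toy converter. The sketch below is an illustration only: `convert`, `BASE_TABLE` and `WAPURO_TABLE` are hypothetical names, the tables hold just the syllables needed for "annai", and whether romkan's own table works this way internally is an assumption. It shows that a greedy longest-match conversion yields あんない when "n" and "na" are separate entries, and あんあい as soon as "nn" is also accepted as a spelling of ん.

```python
# Toy romaji-to-hiragana tables (hypothetical, NOT romkan's real tables).
BASE_TABLE = {"a": "あ", "i": "い", "na": "な", "ni": "に", "n": "ん"}

# Same table plus the word-processor style "nn" spelling of ん.
WAPURO_TABLE = dict(BASE_TABLE, nn="ん")


def convert(romaji, table):
    """Greedy longest-match conversion of `romaji` using `table`."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(romaji):
        # Try the longest candidate first, then shorter ones.
        for length in range(longest, 0, -1):
            chunk = romaji[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError("cannot convert %r at position %d" % (romaji, i))
    return "".join(out)


print(convert("annai", BASE_TABLE))    # "n" + "na" are matched separately
print(convert("annai", WAPURO_TABLE))  # "nn" is consumed as a single ん
```

With the second table the "nn" entry swallows the "n" that belongs to "na", which is one plausible way the reported output can arise.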

Failed to install when default encoding is not UTF-8

Traceback (most recent call last):
  File "setup.py", line 12, in <module>
    README = open(os.path.join(here, 'README.rst')).read()
  File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 181: ordinal not in range(128)
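
The traceback shows `open()` falling back to the locale's default codec. A minimal sketch of a locale-independent read follows, assuming the README is stored as UTF-8; `read_text` is a hypothetical helper name, not something that exists in setup.py.

```python
import io


def read_text(path):
    """Read a text file as UTF-8 regardless of the locale's default encoding."""
    # io.open accepts the `encoding` argument on both Python 2 and Python 3.
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()
```

In setup.py this would be used as `README = read_text(os.path.join(here, 'README.rst'))`.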

Exceptions for "kannji" and "kannzi"

Test cases for the following functions currently check the output of "kannji" and "kannzi":

to_katakana(...) - Convert a Romaji (ローマ字) to a Katakana (片仮名).
to_kana(...) - Convert a Romaji (ローマ字) to a Katakana (片仮名). (same as to_katakana)
to_hiragana(...) - Convert a Romaji (ローマ字) to a Hiragana (平仮名).
to_hepburn(...) - Convert a Kana (仮名) or a Kunrei-shiki Romaji (訓令式ローマ字) to a Hepburn Romaji (ヘボン式ローマ字).

These two forms ("kannji" and "kannzi") are valid in neither the Hepburn nor the Kunrei romanization scheme, so they cannot be considered valid "Romaji" input. When given these inputs, instead of returning values ("カンジ", "かんじ", and "kanji"), the four functions above should raise an exception.

If we wanted to preserve the ability to convert from these forms, support for a third form of romanization should be added - "Wāpuro rōmaji" (word processor romaji):
http://en.wikipedia.org/wiki/W%C4%81puro_r%C5%8Dmaji
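
The strict behaviour proposed above could look roughly like the following. This is a heuristic sketch, not part of romkan: `check_strict_romaji` is a hypothetical name, and the rule it applies (an "nn" that is not followed by a vowel or "y" is treated as word-processor input) is an assumption chosen so that "annai" and "konnyaku" still pass.

```python
import re

# "nn" followed by anything other than a vowel or "y", including end of string.
_WAPURO_NN = re.compile(r"nn(?![aiueoy])")


def check_strict_romaji(text):
    """Return `text` unchanged, or raise ValueError for wapuro-style "nn"."""
    match = _WAPURO_NN.search(text.lower())
    if match:
        raise ValueError(
            "%r is not valid Hepburn or Kunrei romaji "
            "(unexpected 'nn' at position %d)" % (text, match.start())
        )
    return text
```

A caller would run this check before converting, so that "kanji" is accepted while "kannji" and "kannzi" are rejected.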

Macrons for some long vowels

The Hepburn romaniser seems to follow the Modified Hepburn version.
http://en.wikipedia.org/wiki/Hepburn_romanization

For example, it produces 'shinpai' instead of 'shimpai'.

Modified Hepburn has complex rules for placing macrons to indicate that some vowels are long, which the romaniser does not currently do.

Expected results:

ちゅうい  chūi
みずうみ  mizuumi

Actual results:

ちゅうい  chuui
みずうみ  mizuumi

Another example is with 'tōkyō', which currently gets output as 'toukyou'.
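
To illustrate why the macron rules are called complex, here is a deliberately naive post-processing sketch. `naive_macrons` is a hypothetical function, not a romkan API, and plain string replacement is not the Modified Hepburn rule set: it has no knowledge of morpheme boundaries.

```python
# Naive long-vowel replacements applied blindly to romaji output.
NAIVE_MACRONS = [("ou", "ō"), ("oo", "ō"), ("uu", "ū")]


def naive_macrons(romaji):
    """Replace doubled vowels with macrons, ignoring word structure."""
    for plain, long_vowel in NAIVE_MACRONS:
        romaji = romaji.replace(plain, long_vowel)
    return romaji


print(naive_macrons("toukyou"))  # tōkyō
print(naive_macrons("chuui"))    # chūi
print(naive_macrons("mizuumi"))  # mizūmi, which differs from the expected mizuumi
```

The last case shows the limitation: みずうみ keeps "uu" in the expected output listed above, so a correct implementation needs more information than the romaji string alone.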

Add reversible romanization method

Neither of the current romanization methods is fully reversible.

Effort seems to have been put towards making to_kunrei reversible, even though this forces it to not strictly follow the Kunrei scheme:
to_kunrei:

ち -> ti
てぃ  -> texi (should also be 'ti')

(See "ティーム" vs. "チーム", http://en.wikipedia.org/wiki/Kunrei-shiki_romanization)

A reversible method is a useful thing to have, however:

  1. The current function probably should be named differently because it doesn't follow the Kunrei scheme exactly (how about "reversible", or "romkan"?)
  2. The current to_kunrei function is nearly there, but still not perfectly reversible:

to_kunrei:

ぢ -> dyi
でぃ  -> dyi

The Hepburn function is also not reversible, but this is a known property of the scheme.

to_hepburn:

ず -> zu
づ -> zu
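
Irreversibility of this kind can be detected mechanically: a mapping is reversible only if no two kana share the same romaji. The sketch below uses only the pairs quoted in this issue as sample data; `find_collisions` and the two sample dictionaries are hypothetical names, not part of romkan.

```python
from collections import defaultdict

# Sample pairs copied from the examples in this issue.
HEPBURN_SAMPLE = {"ず": "zu", "づ": "zu"}
KUNREI_SAMPLE = {"ち": "ti", "てぃ": "texi", "ぢ": "dyi", "でぃ": "dyi"}


def find_collisions(kana_to_romaji):
    """Map each romaji string to the kana that share it, if more than one."""
    by_romaji = defaultdict(list)
    for kana, romaji in kana_to_romaji.items():
        by_romaji[romaji].append(kana)
    return {
        romaji: sorted(kana_list)
        for romaji, kana_list in by_romaji.items()
        if len(kana_list) > 1
    }


print(find_collisions(HEPBURN_SAMPLE))
print(find_collisions(KUNREI_SAMPLE))
```

An empty result for a full table would mean that every romaji string maps back to exactly one kana.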

Install of romkan 0.2.1 fails

pip install romkan failed on Windows 10 with Python 3.9.6, with the following error message:

    README = open(os.path.join(here, 'README.rst')).read()
  File "C:\Python396\lib\encodings\cp1250.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x83 in position 182

I could work around this issue by setting the PYTHONUTF8=1 environment variable.
(In the code, encoding="utf-8" should be added to the open() call.)

Also, there was a deprecation notice:
DeprecationWarning: the imp module is deprecated in favour of importlib

to_kana() doesn't consistently return Hepburn or Kunrei

Hello,

I have already reported a couple of other issues and a PR, but I haven't yet even taken the time to thank you for this neat package... Thank you!!

I am opening this issue because I am a bit confused about which inverse romanization I should expect to_kana(str) to return.

These lines suggest that your intent was for it to return the Hepburn version if possible, otherwise the Kunrei version:
https://github.com/soimort/python-romkan/blob/master/src/romkan/common.py#L373-376

Later however, ROMKAN.update( {"ti": "チ"} ) explicitly prescribes Kunrei over Hepburn:
https://github.com/soimort/python-romkan/blob/master/src/romkan/common.py#L382-383
("チ" is Kunrei, "ティ" is Hepburn)

What is the rationale behind this?
Is the intent to emulate keyboard input method ("wapuro" style) inverse romanization?

Thanks!
Baptiste
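
The precedence question raised in this issue comes down to how `dict.update` works: a later update silently replaces an earlier entry for the same key. The sketch below uses a toy two-entry table to show the effect; `build_table` and its contents are hypothetical and do not reproduce the actual tables in common.py.

```python
def build_table(base, overrides):
    """Return a copy of `base` where entries in `overrides` take precedence."""
    table = dict(base)
    table.update(overrides)  # later entries replace earlier ones for the same key
    return table


# Toy data only: "ti" initially maps to ティ, then is overridden with チ.
toy_base = {"chi": "チ", "ti": "ティ"}
toy_table = build_table(toy_base, {"ti": "チ"})

print(toy_table["ti"])   # チ
print(toy_table["chi"])  # チ
```

After the override, both "chi" and "ti" produce チ, and no key produces ティ any more.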

importlib load_source Error while running setup.py

Hi,

I am getting the following error when running the setup.py file. Please help.

$ python setup.py install
Traceback (most recent call last):
  File "C:\Users\lahhe\Documents\python-romkan-master\setup.py", line 17, in <module>
    VERSION = imp.load_source('version', os.path.join(here, 'src/%s/version.py' % PACKAGE_NAME)).version
              ^^^^^^^^^^^^^^^
AttributeError: module 'importlib' has no attribute 'load_source'
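
The `imp` module was removed in Python 3.12, and `importlib` has no direct `load_source` equivalent, which matches the error above. A minimal replacement can be sketched with `importlib.util`; the helper name `load_source` below is chosen to mirror the old call and is not an existing standard-library function.

```python
import importlib.util


def load_source(module_name, path):
    """Load a Python source file as a module, like the old imp.load_source."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module
```

In setup.py the call `imp.load_source('version', ...)` could then be replaced with `load_source('version', ...)`, leaving the attribute access that follows it unchanged.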
