Coder Social home page Coder Social logo

katahiromz / imestudy Goto Github PK

View Code? Open in Web Editor NEW
20.0 4.0 1.0 5.76 MB

Windows IME/IMM Study by katahiromz

License: MIT License

Batchfile 0.65% C 87.58% C++ 7.53% Assembly 1.73% CMake 2.51%
ime articles chinese-language input-method japanese-language korean-language study win32 win32api windows

imestudy's Introduction

Windows IME/IMM Study by katahiromz

このレポジトリは、WindowsのIME/IMMに関する研究をまとめたものです。

  • ARTICLES - 日本語の記事です。
  • STUDY - 研究らしきものです。

What is IME?

IME is Input Method Editor, that is text conversion software for Far East Asian Windows users. IME allows the Far East Asian users to input Far East Asian text.

The body of an IME is a DLL file that is registered as an IME in the operating system. An IME can provide the functions whose names begin with "Ime", and NotifyIME function.

Usually, the IME software has to provide its IME installer to register itself into the system.

For more details, see IME Hackerz (REF011 and REF012).

What is IMM?

IMM is Input Method Manager for Windows, that provides the interface between the user and an IME software.

What is IMM32?

IMM32 is 32-bit IMM for Windows. That is a DLL file that is located in C:\Windows\system32 as the name of "imm32.dll".

IMM32 provides API functions whose names begin with "Imm". For more details, see IME Hackerz.

How to use IMM32 in my program?

First, please include <imm.h> and link to imm32.dll as follows:

#include <windows.h>
#include <imm.h>
#pragma comment(lib, "imm32.lib")

And then use the following basic form:

HIMC hImc = ImmGetContext(hWnd);
...(Use IMM functions against hImc)...
ImmReleaseContext(hWnd, hImc);

The IMM function has a name that begins with Imm, those are listed at https://katahiromz.web.fc2.com/colony3rd/imehackerz/en/functions.html .

What is "keyboard layout"?

The key code system differs depending on the keyboard or the keyboard language. The position of the physical key may change. Also, the key code mapping may change.

Usually, Japanese people uses Japanese keyboards. Chinese people uses Chinese or English keyboards. Korean people uses Korean keyboards. Japanese user doesn't want to use English keyboard to input Japanese text.

What is Chinese?

That is Asian people and/or languages, that is mainly living in the continent of Far East. They use Chinese characters. Many Chinese are not Japanese nor Korean.

See also REF020.

What is Japanese?

That is Asian people and/or languages, that is mainly living in the islands of Far East. They use Hiragana, Katakana, Kanji and English characters. Many Japanese are not Chinese nor Korean.

See also REF021.

What is Korean?

That is Asian people and/or language, that is mainly living in the peninsula of Far East. They use Hangul characters (like 김정은) and English characters. Many Korean are not Chinese nor Japanese.

See also REF022.

What is Zenkaku or Hankaku?

A Hankaku (半角) character is a single-byte character in Shift_JIS (except UTF-8). A non-Hankaku character is a Zenkaku (全角) character (in Traditional Japanese). Traditionally, a non-single-byte character (in Shift_JIS encoding) has double width of a Hankaku character. A Japanese fixed-width font should follow this traditional rule.

What is Hiragana and Katakana?

  • Hiragana (ひらがな) is a Japanese phonetic character collection (like あいうえお etc.).
  • Katakana (カタカナ) is a Japanese phonetic character collection (like アイウエオ etc.).

Why so many characters in Japanese?

Japanese text was born as a domestic international language through exchanges with China and neighboring countries.

  1. Japanese didn't have any characters in the initial time. Japanese was sounds only.
  2. The character of Katakana is derived from part(s) of Chinese Kanji, to write pronunciation of foreign word.
  3. The character of Hiragana is derived from a deformation of Chinese Kanji, to write Japanese word.
  4. Many Kanji characters also continued to be used frequently.

What is Kana?

Kana (かな/カナ) is Hiragana and/or Katakana.

What is Kanji?

The Kanji (漢字) characters in Japanese originate on the Chinese characters (like 亜阿唖).

What is Kanji radical?

The Kanji radical (部首) is a systematic component of the Chinese character (or a Kanji character).

What is Romaji conversion?

Romaji conversion (ローマ字変換) is a translation from English alphabet text into Kana character text.

What is Kana-Kanji conversion?

Kana-Kanji conversion (かな漢字変換) is a conversion from Kana text into Kanji or something text. A normal Japanese keyboard cannot type the Kanji characters directly.

How is the Japanese keyboard?

The current Japanese keyboard standard is 109-keyboard. It can type English alphabet and Hiragana characters, and some Japanese symbols and punctuations. Additionally it has the VK_KANJI, VK_KANA, VK_CONVERT, and VK_NONCONVERT virtual keys.

  • VK_KANJI (半角/全角) key is Hankaku/Zenkaku key to toggle Hankaku input mode and Zenkaku input mode.
  • VK_KANA (かな) key is Kana key to begin the Kana (Hiragana and Katakana) input or toggle the Hiragana mode and the Katakana mode. These two modes are exclusive.
  • VK_CONVERT (変換) is Convert key to convert the text.
  • VK_NONCONVERT (無変換) is Don't-Convert key to revert conversion.

Some punctuation key mapping differs from English key mapping.

How is the Chinese keyboard?

It can type English alphabet and the Chinese radical symbols. Chinese keyboard is usually compatible to English keyboard. Optionally, the Chinese radical symbols are printed on key tops.

How is the Korean keyboard?

It can type English alphabet and the Hangul radical symbols. Additionally, it has VK_HANGUL, VK_JUNJA and VK_HANJA virtual keys.

  • VK_HANGUL is the Hangul input mode key (same as VK_KANA).
  • VK_JUNJA is the Junja mode key.
  • VK_HANJA is the Hanja mode key.

How to input Japanese text?

A normal Japanese keyboard cannot type the Kanji characters directly. The Japanese user inputs the Hiragana text (or Romaji-converted text) into IME and converts into Kanji or something text by the IME.

There is Romaji input mode and Kana input mode. These modes are exclusive. You can toggle these modes by Alt+VK_KANA key. In Romaji input mode, typing Alphabet key makes translation from English Alphabet to Kana. In Kana input mode, typing actual Hiragana key makes Hiragana character input.

To begin Japanese text, press Alt+VK_KANJI (or simply VK_KANJI in new Windows). It enables Zenkaku mode and "あ" will be displayed on Taskbar or IME bar. Pressing Alt+VK_KANJI again, it disables Zenkaku mode and "A" will be displayed on Taskbar or IME bar (Hankaku mode).

In Zenkaku mode, the Zenkaku characters that the user typed is displayed with underlined text (indeterminated composition text). Then Space key or VK_CONVERT key makes Kana-Kanji conversion of that text. Then the conversion candidates will be displayed with highlighted text. Pressing Space or VK_CONVERT key again makes next Kana-Kanji conversion (it might show the list of the candidates). Pressing Enter key commits the conversion text and that the selected candidate text will be actually entered the text box. Esc key in Zenkaku mode makes cancellation of conversion.

How to input Chinese text?

The Chinese user enters the Kanji radicals into IME. The IME in Kanji mode automatically converts them into Kanji characters.

Where is free Japanese IME?

See REF013 (mzimeja).

What is DefWindowProc?

DefWindowProc is the default window procedure. There are two versions of it, as DefWindowProcA (ANSI version) and DefWindowProcW (Unicode version). If the application window doesn't process the window message, then the window message will be processed by DefWindowProc.

What is EDIT control?

The EDIT control is a text box, that is a child window whose window class name is "Edit".

What is standard IME message?

The standard IME messages are the Windows messages whose names begin with "WM_IME_".

What is Cicero?

Cicero is the code name of TSF.

What is TSF?

TSF stands for "Text Services Framework". TSF appears on Windows XP and later. It has many other names, such as "Text Framework", "Text Services", "Text Frame Service" etc. TSF is an empowered framework over IMM32. TSF has COM (Component Object Model)-based design.

Old IMM32-based IME runs under "CUAS" emulation layer of TSF. New IMM32 has ImmDisableTextFrameService API to disable TSF in the thread. See REF024.

What is CUAS?

CUAS (Cicero Unaware Application Support) is an emulation layer that connects between the old IMM32-based application and a TSF TIP.

What is COM (Component Object Model)?

See REF023.

What does an IME installer?

It copys the IME-related files into the system, writes some settings in the registry, and call the ImmInstallIME function.

What is registry key "Run"?

Registry key "Run" is as follows:

  • In cases of Windows 95/98/Me: "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run"
  • In the other cases: "HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Run"

This registry key contains the list of the registered start-up entries. The key of an entry is the start-up program name. The value of an entry is a start-up command line.

What is internat.exe?

internat.exe is an old program that is called "international keyboard/language indicator applet". Windows 95, Windows 98, Windows Me, and Windows 2000 had internat.exe. This program can display and choose a keyboard / IME from notification area of task bar. It creates an invisible window "Indicator". It is a start-up program. Some keyboard/IME settings can register internat.exe to registry key "Run". It manages keyboard layout list.

What is ctfmon.exe?

ctfmon.exe is Language Bar front-end and a replacement of internat.exe. This program can also display and choose a keyboard / IME from Language Bar. ctfmon.exe hates internat.exe. ctfmon.exe kills internat.exe process, "Indicator" window, and registry key "Run" entry of internat.exe. ctfmon.exe is a start-up program. ctfmon.exe registers itself as a start-up program to registry key "Run".

See $(REACTOS)/base/applications/ctfmon .

What is HKL?

A handle of keyboard layout.

The LOWORD of an HKL is a language ID.

#define IS_IME_HKL(hKL)     ((((ULONG_PTR)(hKL)) & 0xF0000000) == 0xE0000000)
#define IS_SPECIAL_HKL(hKL) ((((ULONG_PTR)(hKL)) & 0xF0000000) == 0xF0000000)

If IS_IME_HKL(hKL) is TRUE, then hKL is a handle for an IME keyboard layout. If IS_SPECIAL_HKL(hKL) is TRUE, then hKL is a handle for a special keyboard layout.

See $(REACTOS)/dll/cpl/input/layout_list.c and registry key "Keyboard Layouts" for details.

What is registry key "Keyboard Layouts"?

The registry key "Keyboard Layouts" is located at:

  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layouts .

See $(REACTOS)/dll/cpl/input/layout_list.c.

What is a keyboard layout list?

A "keyboard layout list" is the list of HKLs that can obtain from USER32!GetKeyboardLayoutList function. internat.exe manages the keyboard layout list.

What is CTF?

CTF stands for "Collaborative Translation Framework".

CTF provides msctf.dll, msctfime.ime, CTF IMEs etc.

What is IMM IME?

It is a DLL file that contains the following exported functions:

  • ImeInquire
  • ImeConversionList
  • ImeRegisterWord
  • ImeUnregisterWord
  • ImeGetRegisterWordStyle
  • ImeEnumRegisterWord
  • ImeConfigure
  • ImeDestroy
  • ImeEscape
  • ImeProcessKey
  • ImeSelect
  • ImeSetActiveContext
  • ImeToAsciiEx
  • NotifyIME
  • ImeSetCompositionString
  • ImeGetImeMenuItems

And it must have a version info in the following conditions (See IMM32!ImmLoadLayout):

  • The FILETYPE value must be VFT_DRV (0x3).
  • The FILESUBTYPE value must be VFT2_DRV_INPUTMETHOD (0xB).
  • The StringFileInfo block must contain correct codepage, language, and FileDescription values.

Usually an installed IME file is in %WINDIR%\system32. The filename extension is ".ime".

See $(REACTOS)/sdk/include/reactos/imetable.h and $(REACTOS)/dll/win32/imm32/utils.c .

What is CTF IME?

It is an extension of "IMM IME". It contains the IMM IME functions and the following exported functions:

  • CtfImeInquireExW
  • CtfImeSelectEx
  • CtfImeEscapeEx
  • CtfImeGetGuidAtom
  • CtfImeIsGuidMapEnable

See $(REACTOS)/sdk/include/reactos/imetable.h and $(REACTOS)/dll/win32/imm32/ctf.c .

How to export functions?

Just define the functions and add a .DEF file to your DLL file project.

What is TIP?

TIP stands for "Text Input Processor". The new-style IME of new design is called as TIP. A TIP is a DLL file, that is built with many COM objects and interfaces. The filename extension of a TIP is usually .dll. Strictly saying, a TIP is not an IME file (Not IMM IME nor CTF IME!).

What is msctfime.ime?

The system IME file msctfime.ime is the back-end of TIP. msctfime.ime communicates with the current TIP.

You can find the name of msctfime.ime in registry key:

  • HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\IMM

of Windows XP/2003.

See $(REACTOS)/dll/ime/msctfime .

What is msutb.dll?

msutb.dll is the back-end of TIP Bar.

What is msctf.dll?

msctf.dll is the back-end of Language Bar.

See $(REACTOS)/dll/win32/msctf .

Reference

imestudy's People

Contributors

katahiromz avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Forkers

vn-os

imestudy's Issues

mzimejaが勝手にON OFFを繰り返す不具合

現状の最新版でIME ONとIME OFFの描画がうまくいかず繰り返しONOFFを繰り返す状態があるようです。
いまいち再現性が取れていませんが、デスクトップを選択しているときに起きるようです。

これはmzimeja側の不具合と思われる。

状態ウィンドウがときどき表示されない

[ファイル名を指定して実行] (Win+R) などでkatahiromz/mzimejaの状態ウィンドウが表示されないことがある。
日本語XPでは常に表示されるので、これはReactOS側の不具合と考えられる。

変換時にフリーズが起こる

「書かない」「見ない」「寝ない」「来ない」でフリーズ。
「書くな」「見るな」「寝るな」「来るな」でフリーズ。
「運びます」「直します」「直して」でフリーズ。

候補ウィンドウが表示されない

katahiromz/mzimejaでカナ漢字変換中に候補ウィンドウが表示されない。
日本語XPでは表示されるので、これはReactOS側の不具合と考えられる。

候補ウィンドウを正しい位置で表示させて下さい。

[半角/全角]キーが効かない

katahiromz/mzimejaの状態ウィンドウが表示されているときに[半角/全角]キーを押しても漢字入力モードにならない。
日本語XPではモード切り替えできるので、これはReactOS側の不具合だと考えられる。

未確定文字列ウィンドウが表示されない

2023年3月27日の現状では、EDITで「未確定文字列ウィンドウ」らしきものは、EDITコントロールの内部であり、これは「未確定文字列ウィンドウ」ではありません。

未確定文字列ウィンドウ(composition window)を表示させて下さい。

くだけた表現を適切に処理する

現在、mzimejaは、次のようなくだけた表現(casual speech)をうまく処理できない。

  • 要らない要らぬ
  • 要らない要らん
  • 知らない知らねー
  • ならないわならんわ
  • 寝ています寝てます
  • 違うのですよ違うんすよ
  • 買っておこうね買っとこうね
  • 好きなのだろうなあ好きなんやろうなあ
  • 見ているんだよ見てんだよ
  • 飛んでいる飛んでる
  • 止まらない止まんない
  • ならないだろうならんやろ
  • やってられないやってらんない
  • 余裕じゃない余裕ちゃう
  • たまらないたまらん
  • 選んではだめ選んじゃだめ
  • ならないだろうならんやろ
  • 話にならない話にならん
  • 合わない合わん
  • 辛いのだよな辛いんよな
  • 馴染んでいる馴染んでる
  • 生きていて生きてて

A greet from Indonesia

@katahiromz

Hi, Katayama Hirofumi!

I am very happy to found out this study.

Can we discuss this furthermore? I have zero experience on IME handling in Win32 applications...

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.