Input method

From ArchWiki


Tip: If parts of the following text show on your screen as gibberish, please make sure you have installed a Chinese or Japanese font on your system. See Fonts#Chinese, Japanese, Korean, Vietnamese for a non-exhaustive list of available fonts.
Note: For the sake of simplicity only the keyboard has been mentioned in the following examples, but an IME can actually work with a number of input sources, such as drawing characters by hand with the mouse or with a Wacom tablet.

From Wikipedia:Input method:

An input method (or input method editor, commonly abbreviated as IME) is an operating system component or program that enables users to generate characters not natively available on their input devices by using sequences of characters (or mouse operations) that are natively available on their input devices. Using an input method is usually necessary for languages that have more graphemes than there are keys on the keyboard.

In simpler terms, an IME is an application that lets us type non-Latin characters using Latin ones.

Some IMEs do this through a process called romanization, which is the transliteration of non-Latin language sounds into the Latin equivalents that most closely resemble them. As an example, the Japanese written word for "sake" or "rice wine" is 酒, also written as さけ, and romanized as "sake". The IME's role is to act as a middleman between the keyboard and the input fields, so that when we type "sake" it will intercept the keyboard's input, replace "sake" with 酒 or さけ (as chosen by the user) and type the native characters for us instead of the keys we pressed.
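The lookup step described above can be illustrated with a minimal sketch. The table `ROMAJI_CANDIDATES` and the function `convert` are hypothetical names made up for this example; a real IME uses a large dictionary, candidate ranking and an interactive selection window rather than a single hard-coded entry.

```python
# Toy romanization table (hypothetical); a real IME ships a full dictionary.
ROMAJI_CANDIDATES = {
    "sake": ["酒", "さけ"],
}


def convert(typed: str, choice: int = 0) -> str:
    """Return the candidate the user picked, or the raw keys if nothing matches."""
    candidates = ROMAJI_CANDIDATES.get(typed)
    if not candidates:
        return typed  # unknown input is passed through unchanged
    return candidates[choice]


print(convert("sake"))     # first candidate
print(convert("sake", 1))  # second candidate
print(convert("hello"))    # not in the table, so the keys pass through
```

The `choice` argument stands in for the user picking an entry from the candidate list.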

There are also IMEs that do not make use of romanization. One of the most prominent ones, Cangjie, works by decomposing Chinese characters into their radicals, matching these radicals to a second set of its own internal radicals, and finally matching these internal radicals to the Latin characters. As an example, the Chinese written word for "wine" is also 酒, which consists of the radicals 氵, 一, 儿, and 囗. Cangjie matches these radicals to the internal radicals 水, 一, 金, and 田, and then matches these to the Latin characters emcw; this means that when we type "emcw", Cangjie will intercept the keyboard's input, replace "emcw" with 酒, and type that character on the screen.
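The two mapping stages can be sketched as follows. Only the four keys from the example are included, and `CODE_TO_CHAR` is a hypothetical one-entry table; an actual Cangjie table covers every key and many thousands of characters.

```python
from typing import List, Optional

# Cangjie internal radicals for the four keys used in the example above.
KEY_TO_RADICAL = {"e": "水", "m": "一", "c": "金", "w": "田"}

# Hypothetical one-entry code table for illustration only.
CODE_TO_CHAR = {"emcw": "酒"}


def radicals_for(code: str) -> List[str]:
    """Translate each typed Latin key into its Cangjie internal radical."""
    return [KEY_TO_RADICAL[key] for key in code]


def lookup(code: str) -> Optional[str]:
    """Return the character for a complete code, or None if the code is unknown."""
    return CODE_TO_CHAR.get(code)


print(radicals_for("emcw"))
print(lookup("emcw"))
```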

Input method framework

Most IMEs work as part of an input method framework (commonly abbreviated as IMF), which is an application that allows the user to easily switch between different IMEs. In fact, this is the same application that many of us unknowingly use every day to switch between different Latin keyboard layouts (e.g. English, Spanish, German).
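As a rough mental model, a framework can be thought of as a registry of engines plus a notion of which one is currently active. The class and engine names below are hypothetical and do not correspond to the API of IBus, Fcitx5 or any other real framework.

```python
from typing import Callable, Dict

# An engine is modelled as a function from typed keys to output text.
Engine = Callable[[str], str]


class InputMethodFramework:
    """Hypothetical framework: holds engines and routes input to the active one."""

    def __init__(self) -> None:
        self._engines: Dict[str, Engine] = {}
        self._active = ""

    def register(self, name: str, engine: Engine) -> None:
        self._engines[name] = engine
        if not self._active:
            self._active = name  # the first registered engine becomes active

    def switch(self, name: str) -> None:
        if name not in self._engines:
            raise KeyError(f"unknown engine: {name}")
        self._active = name

    def feed(self, keys: str) -> str:
        return self._engines[self._active](keys)


imf = InputMethodFramework()
imf.register("latin", lambda keys: keys)  # plain layout, no conversion
imf.register("toy-japanese", lambda keys: {"sake": "さけ"}.get(keys, keys))

print(imf.feed("sake"))  # handled by the "latin" engine
imf.switch("toy-japanese")
print(imf.feed("sake"))  # handled by the "toy-japanese" engine
```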

The most common IMF is IBus (often used in GTK-based environments like GNOME), followed by Fcitx5 (mostly used in Qt-based environments like KDE), Scim, Fcitx, and Uim. Very uncommon ones include Gcin, Nimf and Hime. [1] Additionally, Emacs is a very popular text editor that has its own internal IMF.

See also Wikipedia:List of input methods for Unix platforms.

Note: SCIM currently lacks maintenance and is therefore not recommended.

List of available input method editors

The following table shows the IMEs for various languages currently available in the Arch repositories and the AUR.

Note: In some cases, multiple packages exist for the same IME. A good example of this is Mozc: as the currently most popular Japanese IME, multiple packagers have attempted over the years to create the "perfect" Mozc package. In the table below, a single Mozc package has been included for each IMF, but this does not imply that these are the only Mozc packages a user should consider installing.
Warning: Not all IMEs and IMFs are up to date; for example, Anthy is no longer maintained and Mozc is usually preferred. Some of the IMEs are also rather unpopular and receive little support. Check the corresponding Localization page for more information.
Frameworks: Fcitx5, Fcitx, IBus, Uim, Emacs, Scim, Hime, Gcin, Nimf
Chinese
Rime: fcitx5-rime, fcitx-rime, ibus-rime, built-in
Pinyin: fcitx5-chinese-addons, built-in, ibus-pinyin (AUR), scim-pinyin (AUR), built-in
Zhuyin: fcitx5-chewing, fcitx-chewing, ibus-chewing, scim-chewing, built-in
Cangjie, Sucheng, SmartCangjie: fcitx5-table-extra, fcitx-table-extra, ibus-table-chinese, scim-tables (AUR)
Wubi: built-in, built-in, ibus-table, built-in, scim-tables (AUR)
Libpinyin: fcitx-libpinyin, ibus-libpinyin
SunPinyin: fcitx-sunpinyin, ibus-sunpinyin
Japanese
Mozc: fcitx5-mozc-ut (AUR), fcitx-mozc-ut (AUR), ibus-mozc (AUR), emacs-mozc (AUR)
Anthy: fcitx5-anthy, fcitx-anthy, ibus-anthy, built-in, scim-anthy (AUR), built-in, built-in, built-in
SKK: fcitx5-skk, fcitx-skk, ibus-skk, built-in
KKC: fcitx5-kkc, fcitx-kkc, ibus-kkc
Korean
Libhangul: fcitx5-hangul, fcitx-hangul, ibus-hangul, built-in, scim-hangul (AUR), built-in
Vietnamese
UniKey: fcitx5-unikey, fcitx-unikey, ibus-unikey
Bamboo: fcitx5-bamboo, ibus-bamboo (AUR)
Indic
Avro (Bangla): ibus-avro-git (AUR)
Helakuru (Sinhala): ibus-helakuru (AUR)
m17n: fcitx5-m17n, fcitx-m17n, ibus-m17n (AUR), scim-m17n (AUR)
OpenBangla Keyboard (Bangla): fcitx5-openbangla-git (AUR), openbangla-keyboard (AUR)
Sayura (Sinhala): fcitx5-sayura, fcitx-sayura
Varnam: govarnam-ibus-git (AUR)

Configuration

In order for your desktop environment to properly register an installed input method framework as available and assign it to handle user input, a set of environment variables must be configured accordingly.

Note: If these variables are not set, both GTK and Qt will attempt to read the system's locale settings to determine which IMF they should use, but this process relies on guesswork and can be very error-prone. For a properly working system, you should always opt to explicitly set these variables yourself.
Tip: If for some reason you wish to prevent your desktop environment from handling input via an IMF altogether (NOT recommended in GNOME due to its tight integration with IBus), you may either leave these variables unset or, in the case of GTK and Qt, replace their values with GTK_IM_MODULE=gtk-im-context-simple and QT_IM_MODULE=simple.
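A quick way to see which of these variables a session actually received is a small check like the one below. It is only a sketch: the function names are made up for this example, and it assumes the three-variable pattern used by most of the sections that follow (GTK_IM_MODULE, QT_IM_MODULE and XMODIFIERS all naming the same framework). A setup that deliberately leaves one of them unset will be flagged even though it may be intentional.

```python
import os
from typing import Dict, List, Mapping

IM_VARIABLES = ("GTK_IM_MODULE", "QT_IM_MODULE", "XMODIFIERS")


def framework_of(name: str, value: str) -> str:
    """Return the framework name, stripping the '@im=' prefix used by XMODIFIERS."""
    if name == "XMODIFIERS" and value.startswith("@im="):
        return value[len("@im="):]
    return value


def check_im_env(env: Mapping[str, str]) -> List[str]:
    """List unset variables and report when the set ones disagree."""
    problems: List[str] = []
    seen: Dict[str, str] = {}
    for name in IM_VARIABLES:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        else:
            seen[name] = framework_of(name, value)
    if len(set(seen.values())) > 1:
        details = ", ".join(f"{k}={v}" for k, v in seen.items())
        problems.append(f"variables name different frameworks: {details}")
    return problems


if __name__ == "__main__":
    # Inspect the environment of the current session.
    for problem in check_im_env(os.environ):
        print(problem)
```

Run it from a terminal inside the graphical session, since the variables may be set by the session startup files rather than by the login shell.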

Fcitx5

See Fcitx5#Integration for more information.

Fcitx

See Fcitx for more information.

GTK_IM_MODULE=fcitx
QT_IM_MODULE=fcitx
XMODIFIERS=@im=fcitx

IBus

See IBus for more information.

GTK_IM_MODULE=ibus
QT_IM_MODULE=ibus
XMODIFIERS=@im=ibus

Uim

See Uim for more information.

GTK_IM_MODULE=uim
QT_IM_MODULE=uim
XMODIFIERS=@im=uim

Emacs

The factual accuracy of this article or section is disputed.

Reason: This needs to be verified. (Discuss in Talk:Input method)

According to this Fcitx wiki entry, "in some case, including emacs and java. Emacs has a historical bug, that under en_US.UTF-8 or similar locale, it will never use XIM (Though emacs is a gtk app, it use XIM). The only way to walkaround this is to use LC_CTYPE to fix this."

Scim

See Scim for more information.

GTK_IM_MODULE=scim
QT_IM_MODULE=scim
XMODIFIERS=@im=scim

Xim

GTK_IM_MODULE=xim
QT_IM_MODULE=xim

See also