Input method

From ArchWiki

From Wikipedia:Input method:

An input method (or input method editor, commonly abbreviated as IME) is an operating system component or program that enables users to generate characters not natively available on their input devices by using sequences of characters (or mouse operations) that are natively available on their input devices. Using an input method is usually necessary for languages that have more graphemes than there are keys on the keyboard.

The factual accuracy of this article or section is disputed.

Reason: Not all IMEs work by romanisation, and their input is not always Latin characters. A redefinition is required. (Discuss in Talk:Input method#Not all IMEs work by romanisation)

In very simplified terms, an IME is an application that takes Latin characters that you type on your keyboard and outputs them on your screen as non-Latin characters. The IME does this through a process called romanization, which is the transliteration of non-Latin language sounds into the Latin equivalents that most closely resemble them.

Tip: If parts of the following text show on your screen as gibberish, please make sure you have installed a Japanese font on your system. See Localization/Japanese#Fonts for a non-exhaustive list of available fonts.

As an example, the Japanese written word for sake is 酒, also written as さけ, and romanized as "sake". The IME's role is to act as a middleman between the keyboard and the screen: when we type "sake", it intercepts the keyboard's input, replaces "sake" with 酒 or さけ (as chosen by us) and puts the native characters on the screen instead of what we actually typed.

Note: For the sake of simplicity, only the keyboard has been mentioned in the example, but an IME can produce native characters in a number of ways and with other input devices, e.g. by drawing characters by hand using a mouse or a Wacom tablet.
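
To make the romanization step concrete, the following is a toy sketch in Python, not a description of how any real IME is implemented. It assumes a tiny hand-written syllable table (ROMAJI_TO_HIRAGANA) and a one-entry candidate dictionary (CANDIDATES); both names and their contents are invented for illustration, and real IMEs rely on far larger tables and dictionaries.

```python
# Toy romaji -> hiragana table (hypothetical, deliberately incomplete).
ROMAJI_TO_HIRAGANA = {
    "a": "あ",
    "i": "い",
    "ka": "か",
    "ki": "き",
    "sa": "さ",
    "ke": "け",
    "su": "す",
    "shi": "し",
}

# Toy conversion dictionary: kana reading -> candidates offered to the user.
CANDIDATES = {"さけ": ["酒", "さけ"]}


def to_hiragana(text):
    """Greedy longest-match transliteration; unknown input passes through unchanged."""
    longest = max(len(key) for key in ROMAJI_TO_HIRAGANA)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                i += size
                break
        else:
            # No table entry matched: keep the typed character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


def candidates(romaji):
    """Return the choices a user would pick from for the typed romaji."""
    kana = to_hiragana(romaji)
    return CANDIDATES.get(kana, [kana])
```

With these toy tables, candidates("sake") offers both 酒 and さけ, mirroring the choice described in the example above. Input with no table entry is returned unchanged.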

Input method framework

Most IMEs work as part of an input method framework (commonly abbreviated as IMF), which is an application that allows the user to easily switch between different IMEs. In fact, this is the same application that many of us unknowingly use every day to switch between different Latin keyboard layouts (e.g. English, Spanish, German).

The factual accuracy of this article or section is disputed.

Reason: This is described from a Western point of view (i.e. mine). I know for a fact that in the Eastern hemisphere, things are somewhat different when it comes to IME/IMF popularity and development activity. If these differences should be taken into account, input from native users will be needed. (Discuss in Talk:Input method#)

The most common, most diverse and best-supported input method frameworks are Fcitx (mostly used in KDE and other Qt-based environments) and IBus (mostly used in GNOME and other GTK-based environments). Less common ones include Uim, Scim, Hime, Gcin and Nimf. Additionally, Emacs is a very popular text editor that contains its own internal IMF.

See also Wikipedia:List of input methods for Unix platforms.

List of available input method editors

The following table shows the IMEs for the various languages currently available in the Arch repositories and the AUR.

Note: In some cases, multiple AUR packages exist for the same IME. A good example of this is Mozc: as it is currently the most popular Japanese IME, multiple packagers have attempted over the years to create the "perfect" Mozc package. In the table below, a single Mozc package has been included for each IMF, but this does not imply that these are the only Mozc packages a user should ever consider installing.
Tip: Fcitx5 is the successor of Fcitx. It is still somewhat new, so some IMEs may currently lack support for it, and even where support exists the required packages may not yet have been published in the repositories or the AUR. Still, if it covers your needs, you should prefer it over the previous version.

The factual accuracy of this article or section is disputed.

Reason: There are a few more IMEs in the repos than what is listed here, especially for Chinese but maybe for other languages too. Again, I'm only including what I'm familiar with or what I could find with a quick scan of the repos and the AUR. For better results we should consult a user who actively uses them. (Discuss in Talk:Input method#)
Fcitx5 Fcitx IBus Uim Emacs Scim Hime Gcin Nimf
Chinese
Rime fcitx5-rime fcitx-rime ibus-rime built-in
Sogou fcitx-sogoupinyin (AUR)
Baidu fcitx-baidupinyin (AUR)
Chewing fcitx5-chewing fcitx-chewing ibus-chewing scim-chewing built-in
Cangjie 3/5 fcitx5-table-extra fcitx-table-extra ibus-table-chinese scim-tables (AUR)
Pinyin fcitx5-chinese-addons built-in ibus-pinyin scim-pinyin (AUR) built-in
Wubixing built-in built-in ibus-table built-in scim-tables (AUR)
Libpinyin fcitx-libpinyin ibus-libpinyin
Google Pinyin fcitx-googlepinyin ibus-googlepinyin (AUR)
SunPinyin fcitx-sunpinyin ibus-sunpinyin
Japanese
Mozc fcitx5-mozc-ut (AUR) fcitx-mozc-ut (AUR) ibus-mozc-ut (AUR) uim-mozc-ut2 (AUR) emacs-mozc-ut (AUR)
Anthy fcitx5-anthy fcitx-anthy ibus-anthy built-in built-in built-in built-in
SKK fcitx5-skk fcitx-skk ibus-skk built-in
KKC fcitx5-kkc fcitx-kkc ibus-kkc
Korean
Libhangul fcitx5-hangul fcitx-hangul ibus-hangul built-in scim-hangul (AUR) built-in
Vietnamese
UniKey fcitx5-unikey fcitx-unikey ibus-unikey
Bamboo ibus-bamboo (AUR)
Indic
Avro ibus-avro-git (AUR)
Helakuru ibus-helakuru (AUR)
m17n fcitx5-m17n fcitx-m17n ibus-m17n scim-m17n (AUR)
OpenBangla Keyboard openbangla-keyboard (AUR)
Varnam libvarnam-ibus-git (AUR)

Configuration

In order for your desktop environment to properly register an installed input method framework as available and assign it to handle user input, a set of environment variables must be configured accordingly. A good place to do so is /etc/environment.

Note: If these variables are not set, both GTK and Qt will attempt to read the system's locale settings to determine which IMF they should use, but this process relies on guesswork and can be very error-prone. For a properly working system, you should always opt to explicitly set these variables yourself.
Tip: If for some reason you want to prevent your desktop environment from handling input via an IMF (NOT recommended in GNOME due to its tight integration with IBus), you may either leave these variables unset or, in the case of GTK and Qt, set them to GTK_IM_MODULE=gtk-im-context-simple and QT_IM_MODULE=simple.
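
As a rough way to check that a session actually carries a consistent set of these variables, the following Python sketch inspects GTK_IM_MODULE, QT_IM_MODULE and XMODIFIERS and reports anything that looks off. The helper names (im_framework_from, check_im_environment) are hypothetical, and the check assumes only that XMODIFIERS uses the @im=<framework> form and that all three variables should name the same framework.

```python
import os

IM_VARIABLES = ("GTK_IM_MODULE", "QT_IM_MODULE", "XMODIFIERS")


def im_framework_from(name, value):
    """Return the framework name encoded in one variable's value, or None."""
    if name == "XMODIFIERS":
        # XMODIFIERS is expected to look like "@im=<framework>".
        prefix = "@im="
        return value[len(prefix):] if value.startswith(prefix) else None
    return value


def check_im_environment(env):
    """Return a list of problems found; an empty list means the variables agree."""
    problems = []
    frameworks = set()
    for name in IM_VARIABLES:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        framework = im_framework_from(name, value)
        if framework is None:
            problems.append(f"{name}={value} does not look like @im=<framework>")
        else:
            frameworks.add(framework)
    if len(frameworks) > 1:
        problems.append(
            "variables name different frameworks: " + ", ".join(sorted(frameworks))
        )
    return problems


if __name__ == "__main__":
    # Inspect the environment of the current process.
    for line in check_im_environment(os.environ) or ["all three variables agree"]:
        print(line)
```

Run from a terminal inside the desktop session, the script prints one line per problem. Because it only compares names, it also flags the IMF-disabling values from the tip above (gtk-im-context-simple and simple), so treat its output as a hint rather than a verdict.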

The factual accuracy of this article or section is disputed.

Reason: There have been reports (even in this Wiki) that these variables sometimes (or maybe in some specific desktop environments like LXDE) do not work as expected, with users claiming that they had to specify xim instead of their IMF. I'm not really sure if these claims are accurate or when exactly they were reported (it might have been years ago, when the IM frameworks were even more of a mess than they are now). For what it's worth, with regards to Fcitx and IBus at least, I haven't had any problem using them in both GTK and Qt environments and applications by only specifying the variables exactly as laid out below. Maybe for Uim and Scim (which don't seem to have been updated much in recent times) the situation is different, but I wouldn't know. (Discuss in Talk:Input method#)

Fcitx5

Note: The following also applies to Fcitx.

See Fcitx5 for more information.

GTK_IM_MODULE=fcitx
QT_IM_MODULE=fcitx
XMODIFIERS=@im=fcitx

IBus

See IBus for more information.

GTK_IM_MODULE=ibus
QT_IM_MODULE=ibus
XMODIFIERS=@im=ibus

Uim

See Uim for more information.

GTK_IM_MODULE=uim
QT_IM_MODULE=uim
XMODIFIERS=@im=uim

Emacs

The factual accuracy of this article or section is disputed.

Reason: This needs to be verified. (Discuss in Talk:Input method#)

According to this Fcitx wiki entry, "in some case, including emacs and java. Emacs has a historical bug, that under en_US.UTF-8 or similar locale, it will never use XIM (Though emacs is a gtk app, it use XIM). The only way to walkaround this is to use LC_CTYPE to fix this."
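
If the workaround quoted above applies to your setup, one way to try it is to start Emacs with LC_CTYPE overridden while leaving the rest of the environment untouched. The Python sketch below is only an illustration: the function names are hypothetical, and the default locale zh_CN.UTF-8 is an assumption that must be replaced with a locale actually generated on your system.

```python
import os
import subprocess


def emacs_environment(base_env, ctype_locale="zh_CN.UTF-8"):
    """Return a copy of base_env with LC_CTYPE overridden for Emacs."""
    env = dict(base_env)
    # Assumption: ctype_locale is a locale that exists on this system.
    env["LC_CTYPE"] = ctype_locale
    return env


def launch_emacs(ctype_locale="zh_CN.UTF-8"):
    """Start Emacs with the overridden LC_CTYPE; requires emacs on PATH."""
    return subprocess.Popen(
        ["emacs"], env=emacs_environment(os.environ, ctype_locale)
    )
```

Calling launch_emacs() starts Emacs with the modified environment. Note that if LC_ALL is set, it takes precedence over LC_CTYPE, in which case the override has no effect.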

Scim

See Scim for more information.

GTK_IM_MODULE=scim
QT_IM_MODULE=scim
XMODIFIERS=@im=scim

See also