Difference between revisions of "Input Japanese using uim"

From ArchWiki
Revision as of 09:21, 26 May 2012

This page explains how to set up Japanese input using uim.
If you use SCIM, see Smart Common Input Method platform.
If you use IBus, see Ibus.

Installation

You need the following packages to input Japanese.

  • Japanese fonts
  • Japanese input method (Kana to Kanji conversion engine): This article covers Anthy and Mozc.
  • Input method framework: uim

Japanese fonts

See also Fonts and Font Configuration for configuration and more detail.

Recommended Japanese fonts are as follows.

A high quality and formal style opensource font set including Gothic (sans-serif) and Mincho (serif) glyphs. Default font of openSUSE-ja.
Default Gothic font of Debian-ja, Fedora-ja, Vine Linux, et al.
Default Gothic font of Mandriva Linux ja environment.

If you want to show 2channel Shift JIS art properly, use one of the following fonts:

uim

Using pacman

Install uim with pacman:

# pacman -S uim

Compiling uim from source using PKGBUILD

If you want to build uim with your own configuration, you can compile it from source.

Arch's configuration

In the official Arch repository, uim is built with the following custom configuration (as of 1.7.1):

  • --with-anthy-utf8 : Enable Anthy(UTF-8) support
  • --with-qt4 : Build uim-tools for Qt

See the official wiki for all configure options.

Steps to build

One easy way to build from source is to use ABS.
First, install ABS:

# pacman -S abs

Update ABS:

# abs

Then copy uim's directory to a location under your $HOME. For example:

$ cp -R /var/abs/extra/uim ~/sources/

Edit PKGBUILD.

Finally, run makepkg in the uim directory to build and install the package:

$ makepkg -s -i

Input method

Anthy

Anthy is one of the most popular Japanese input methods in the open source world. Although the original project has not been maintained for a long time, Debian has taken over its maintenance since May 2010.

To install Anthy:

# pacman -S anthy

Extra dictionary

The default dictionary of the original Anthy does not include characters that are not defined in EUC-JP (JIS X 0208), such as "①" and "♥". alt-cannadic provides extra dictionaries that include those characters.

Download the alt-cannadic dictionary and copy its extra dictionary files into ~/.anthy/imported_words_default.d (create the directory if it does not exist).

$ tar jxvf alt-cannadic-091230.tar.bz2
$ mkdir -p ~/.anthy/imported_words_default.d
$ cp alt-cannadic-091230/extra/*.t ~/.anthy/imported_words_default.d/

See the official wiki for more detail (Japanese).

Warning: If you use this extra dictionary, choose Anthy (UTF-8) as the default input method in uim.
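
The copy step above can be wrapped in a small function so that it is safe to re-run. This is only a minimal sketch and not part of Anthy or alt-cannadic: the function name install_anthy_extra_dict and its two arguments are made up for illustration, and it assumes the unpacked archive keeps its *.t files in an extra/ subdirectory as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: copy the extra *.t dictionaries from an unpacked
# alt-cannadic tree into Anthy's per-user import directory.
# $1: unpacked alt-cannadic directory (e.g. alt-cannadic-091230)
# $2: destination (defaults to ~/.anthy/imported_words_default.d)
install_anthy_extra_dict() {
    src_dir=$1
    dest_dir=${2:-"$HOME/.anthy/imported_words_default.d"}

    # Stop early if the archive layout is not the assumed one.
    if [ ! -d "$src_dir/extra" ]; then
        echo "no extra/ directory in $src_dir" >&2
        return 1
    fi

    # -p creates the directory only when it is missing.
    mkdir -p "$dest_dir" || return 1

    copied=0
    for dict in "$src_dir"/extra/*.t; do
        # An unmatched glob stays literal; skip it.
        [ -e "$dict" ] || continue
        cp "$dict" "$dest_dir/" || return 1
        copied=$((copied + 1))
    done
    echo "copied $copied dictionary file(s) to $dest_dir"
}
```

For example, after unpacking the tarball, install_anthy_extra_dict alt-cannadic-091230 would copy the files to the default location.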

Modified Anthy (anthy-ut)

Modified Anthy is a set of patches and huge extended dictionaries that aim to improve the Kana to Kanji conversion quality of the original Anthy.

Modified Anthy combines work from two different upstreams:

  • Patched source of Anthy by G-HAL
  • Huge extended dictionaries by UTSUMI

Warning: Modified Anthy applies only to Anthy (UTF-8), so you have to choose Anthy (UTF-8) as the default input method in uim.
Warning: The dictionaries and learning data of Modified Anthy are not compatible with those of the original Anthy.

Compiling Modified Anthy using PKGBUILD

Modified Anthy is available in the AUR as anthy-ut.

Download the anthy-ut tarball and run makepkg to build and install the package:

$ wget https://aur.archlinux.org/packages/anthy-ut/anthy-ut.tar.gz
$ tar xvf anthy-ut.tar.gz
$ cd anthy-ut
$ makepkg -s -i

If you have already used the original Anthy, you have to convert the format of the existing learning data.

$ rm ~/.anthy/last-record1_*.bin
$ anthy-agent --update-base-record
$ rm ~/.anthy/last-record1_*.bin
$ anthy-agent --update-base-record

(The commands are intentionally repeated twice; this is not a typo.)

Anthy Kaomoji

Anthy Kaomoji is a modified version of Anthy that converts Hiragana text to mixed Kana and Kanji text and includes emoticon (顔文字) and 2ch dictionaries. It can be found in the AUR as anthy-kaomoji.

Mozc

Mozc (in the AUR) is an open source Japanese input method derived from Google Japanese Input. It is generally considered to convert multi-segment input (for example, a whole sentence at once) better than Anthy, although its dictionary is less extensive than that of Google Japanese Input. Mozc itself supports only the ibus input method framework, but macuim provides a uim-mozc plugin, which you can use with mozc-svn or mozc-ut from the AUR.

  • mozc-svn (AUR)
    • mozc-svn builds from the published svn repository instead of the source tarball and can build the uim-mozc plugin.
      Note: If you do not use uim (that is, you use ibus), you should not use mozc-svn. It is essentially identical to mozc (the published svn repository is not actually trunk), and makepkg takes longer for mozc-svn than for mozc.
  • mozc-ut (AUR)
    • mozc-ut comes with the Mozc UT dictionary and can build the uim-mozc plugin. The dictionary adds over 350,000 words to the original one.
      Note: Building mozc-ut takes even longer because the dictionary seed has to be generated.

The packages are composed as follows:

Package     mozc        mozc-svn        mozc-ut        Description
Group       mozc-im     mozc-im-svn     mozc-im-ut
Component   mozc        mozc-svn        mozc-ut        Server part of Mozc
            ibus-mozc   ibus-mozc-svn   ibus-mozc-ut   IBus engine module (optional)
            N/A         uim-mozc-svn    uim-mozc-ut    uim plugin module (optional)
            N/A         fcitx-mozc-svn  N/A            Fcitx module (optional)
            emacs-mozc  emacs-mozc-svn  emacs-mozc-ut  Mozc for Emacs (optional)

Using the unofficial user repository

There is an unofficial user repository for Mozc. Add the following to /etc/pacman.conf.

[pnsft-pur]
Server = http://downloads.sourceforge.net/project/pnsft-aur/pur/$arch
Note: This repo provides x86_64 packages only.
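
If you prefer to script this change, the sketch below appends the section only when it is not already present. It is an illustration under stated assumptions, not an official tool: add_pnsft_repo is a made-up name, and the path of the configuration file is passed as an argument so the function can be tried on a copy before touching /etc/pacman.conf as root.

```shell
#!/bin/sh
# Hypothetical helper: append the [pnsft-pur] section to a pacman.conf-style
# file unless a section with that name is already present.
# $1: path to the configuration file
add_pnsft_repo() {
    conf=$1
    # -x matches the whole line, -F treats the brackets literally.
    if grep -qxF '[pnsft-pur]' "$conf" 2>/dev/null; then
        echo "repository already configured in $conf"
        return 0
    fi
    # Single quotes keep $arch literal; pacman substitutes it itself.
    printf '\n[pnsft-pur]\nServer = %s\n' \
        'http://downloads.sourceforge.net/project/pnsft-aur/pur/$arch' >> "$conf"
}
```

After changing the real configuration file, refresh the package databases (pacman -Sy) before installing from the new repository.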

You can install the Mozc packages by group:

# pacman -S mozc-im (Or mozc-im-svn / mozc-im-ut)

Or, specify package names directly. For example:

# pacman -S uim-mozc-svn emacs-mozc-svn

Compiling Mozc for uim using PKGBUILD

Preparing to build Mozc

In addition to its dependencies, building Mozc requires the following packages:

Install zinnia from the AUR before building Mozc.

Edit PKGBUILD

First, download the mozc-svn or mozc-ut tarball from the AUR and edit the PKGBUILD to enable uim-mozc: uncomment the _uim_mozc line and, if you do not need the ibus module, comment out the _ibus_mozc line:

## You can choose the input method framework to use either ibus, uim or both.
## If you will be not using ibus, comment out below.
#_ibus_mozc="yes"
## If you will be using uim, uncomment below.
_uim_mozc="yes"
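
The same edit can be made non-interactively with sed. This is a sketch under assumptions: enable_uim_mozc is a made-up name, and it assumes the unedited PKGBUILD contains the two option lines exactly in the form shown above, with _uim_mozc initially commented out by a single leading #.

```shell
#!/bin/sh
# Hypothetical helper: enable uim-mozc and disable ibus-mozc in a PKGBUILD.
# Assumes option lines of the form `_uim_mozc="yes"` / `_ibus_mozc="yes"`,
# optionally prefixed by one `#`. Lines starting with `##` are left alone.
# $1: path to the PKGBUILD
enable_uim_mozc() {
    pkgbuild=$1
    tmp_file=$(mktemp) || return 1
    # First expression: strip the comment marker in front of _uim_mozc=.
    # Second expression: put a comment marker in front of _ibus_mozc=.
    if ! sed -e 's/^#[[:space:]]*\(_uim_mozc=\)/\1/' \
             -e 's/^\(_ibus_mozc=\)/#\1/' \
             "$pkgbuild" > "$tmp_file"; then
        rm -f "$tmp_file"
        return 1
    fi
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp_file" > "$pkgbuild"
    rm -f "$tmp_file"
}
```

Running it twice gives the same result as running it once, so it can be repeated after fetching an updated PKGBUILD.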

If you want to use mozc.el in Emacs instead of uim.el, uncomment the _emacs_mozc line.

## If you will be using mozc.el on Emacs, uncomment below.
_emacs_mozc="yes"

Build and install

Finally, build and install:

$ makepkg -s -i

Re-register Mozc on uim

You must run the following command whenever you upgrade or (re-)install uim.

# uim-module-manager --register mozc

Settings

Add the following to ~/.xprofile, ~/.xinitrc or ~/.xsession:

Environment variables

export GTK_IM_MODULE='uim'
export QT_IM_MODULE='uim'
uim-xim &
export XMODIFIERS=@im='uim'
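
If the same startup file is shared with machines where uim is not installed, the settings can be guarded. The following is a minimal sketch, not something uim provides: setup_uim_env is a made-up name, and exporting all three variables before starting uim-xim is a choice of this sketch rather than a documented requirement.

```shell
#!/bin/sh
# Hypothetical helper for ~/.xprofile: set the input method variables and
# start uim-xim, but do nothing when uim-xim is not installed.
setup_uim_env() {
    # command -v succeeds only if uim-xim can be found in PATH.
    command -v uim-xim >/dev/null 2>&1 || return 1
    export GTK_IM_MODULE=uim
    export QT_IM_MODULE=uim
    export XMODIFIERS=@im=uim
    # Run the XIM bridge in the background so session startup continues.
    uim-xim &
}
```

Define the function in the startup file and call setup_uim_env once below it.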

Toolbar utilities

If you want to use the UimToolbar utilities, which show and control the uim mode, also add one of the following.

uim-toolbar-gtk/qt

This toolbar appears as a window.

For Gtk2:

uim-toolbar-gtk &

For Gtk3:

uim-toolbar-gtk3 &

For Qt:

uim-toolbar-qt4 &

uim-toolbar-gtk-systray

This toolbar is shown in the system tray.

For Gtk2:

uim-toolbar-gtk-systray &

For Gtk3:

uim-toolbar-gtk3-systray &
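
Since only one of these commands should be started, a startup script can pick whichever one is installed. This is a sketch only: pick_uim_toolbar is a made-up name, the candidates are the commands listed in this section, and the preference order (system tray first, GTK 3 before GTK 2, Qt last) is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the first uim toolbar command
# found in PATH, or return 1 if none of them is installed.
pick_uim_toolbar() {
    for candidate in uim-toolbar-gtk3-systray uim-toolbar-gtk-systray \
                     uim-toolbar-gtk3 uim-toolbar-gtk uim-toolbar-qt4; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

In a startup file this could be used as: toolbar=$(pick_uim_toolbar) && "$toolbar" &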

Panel applet

Alternatively, if you use GNOME, KDE or Xfce, you can use the uim-toolbar panel applet (Xfce requires xfce4-xfapplet-plugin to use uim-applet-gnome).

uim preferences

Configure the uim preferences by running uim-pref-gtk (or uim-pref-gtk3 / uim-pref-qt4), which opens a GUI:

$ uim-pref-gtk

Choose "Anthy", "Anthy (UTF-8)" or "Mozc" for 'Default input method'.

Note: Mozc is not listed under 'Default input method' at first, so you need to add it to 'Enabled input methods' before you can select it.


You can run uim-xim or restart X to test your settings.
Provided everything went well you should be able to input Japanese in X.
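
If input does not work, a quick first check is whether the variables from the Settings section actually reached your session. The function below is a small diagnostic sketch, not a uim tool; check_uim_env is a made-up name and the expected values are simply the ones used on this page.

```shell
#!/bin/sh
# Hypothetical diagnostic: print one line for every input method variable
# that differs from the value suggested on this page.
# Returns 0 when all three match, 1 otherwise.
check_uim_env() {
    status=0
    for pair in GTK_IM_MODULE=uim QT_IM_MODULE=uim XMODIFIERS=@im=uim; do
        name=${pair%%=*}       # text before the first "="
        expected=${pair#*=}    # text after the first "="
        # Indirect lookup of the variable whose name is stored in $name.
        eval "actual=\${$name:-}"
        if [ "$actual" != "$expected" ]; then
            echo "$name is '$actual', expected '$expected'"
            status=1
        fi
    done
    return $status
}
```

Run check_uim_env in a terminal inside the X session; no output means the three variables are set as described.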

お疲れ様です!

Input Japanese on Emacs

This section describes using uim.el (by uim), mozc.el (by Mozc) and anthy.el (by Anthy).

Using uim.el

uim provides uim.el, a bridge between Emacs and uim. The following are sample settings for using uim in Emacs with UTF-8 encoding.

See the official wiki for more detail.

LEIM or minor-mode

You can call uim.el from Emacs in two ways: directly, as a minor mode, or through the LEIM (Library of Emacs Input Method) framework. The settings differ, but the basic functions are the same. If you frequently switch between uim.el and other Emacs input methods, use the LEIM framework.

Settings for the minor-mode

If you want to use the minor mode, add the following settings to your .emacs.d/init.el or another Emacs customization file.

;; read uim.el
(require 'uim)
;; uncomment next and comment out previous to load uim.el on-demand
;; (autoload 'uim-mode "uim" nil t)

;; key-binding for activate uim (ex. C-\)
(global-set-key "\C-\\" 'uim-mode)

Settings for the LEIM

If you want to use uim.el via LEIM, add the following settings to your .emacs.d/init.el or another Emacs customization file and choose the default input method.

;; read uim.el with LEIM initializing
(require 'uim-leim)

;; Set the default IM. Uncomment one of the following.
;(setq default-input-method "japanese-anthy-utf8-uim") ; Anthy (UTF-8)
;(setq default-input-method "japanese-mozc-uim")       ; Mozc

Preferred character encoding

uim.el uses the euc-jp character encoding by default. To make UTF-8 the preferred encoding, add the following to your .emacs.d/init.el or another Emacs customization file.

;; Set UTF-8 as preferred character encoding (default is euc-jp).
(setq uim-lang-code-alist
      (cons '("Japanese" "Japanese" utf-8 "UTF-8")
            (delete (assoc "Japanese" uim-lang-code-alist)
                    uim-lang-code-alist)))

Enable inline candidates displaying mode by default

The inline candidates displaying mode shows conversion candidates vertically just below (or above) the preedit text instead of in the echo area. To enable it by default, add the following.

;; set inline candidates displaying mode as default
(setq uim-candidate-display-inline t)

Set Hiragana input mode by default

To switch to Hiragana input mode when uim is activated, add settings like the following:

;; Set Hiragana input mode at activating uim.
(setq uim-default-im-prop '("action_anthy_utf8_hiragana" "action_mozc_hiragana"))

Ignoring C-SPC on uim.el

If activation/deactivation of the input method is assigned to C-SPC, uim.el captures C-SPC to switch the input mode while it is active. To prevent this and keep C-SPC for set-mark-command, add the following to your .emacs.d/init.el or another Emacs customization file.

(add-hook 'uim-load-hook
          (lambda ()
            ;; 67108896 is the key code of C-SPC; 0 is C-@.
            (define-key uim-mode-map [67108896] nil)
            (define-key uim-mode-map [0] nil)))

Using mozc.el

If you use Mozc, you can use mozc.el. Add the following settings to your .emacs.d/init.el or another Emacs customization file.

(require 'mozc)  ; or (load-file "/path/to/mozc.el")
(setq default-input-method "japanese-mozc")

mozc.el provides an "overlay" candidate style (since mozc r77), similar to the inline candidates displaying mode of uim.el. To use overlay mode, add the following.

(setq mozc-candidate-style 'overlay)

Using anthy.el

If you use Anthy, you can use anthy.el. Add the following settings to your .emacs.d/init.el or another Emacs customization file.

(load-library "anthy")
(setq default-input-method "japanese-anthy")

Note: anthy.el may not support UTF-8 properly.

Disabling XIM on Emacs

When you use an input method on your desktop and have assigned its activation/deactivation to C-SPC, you cannot use C-SPC/C-@ as set-mark-command in Emacs. To avoid this problem, add the following to your ~/.Xresources or ~/.Xdefaults; this disables XIM in Emacs.

Emacs*UseXIM: false
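
The resource can also be added from a script. The sketch below is an illustration, not part of Emacs or uim: disable_emacs_xim is a made-up name, the file defaults to ~/.Xresources, and the resource is loaded with xrdb -merge only when an X display is available and xrdb is installed.

```shell
#!/bin/sh
# Hypothetical helper: make sure the X resources file contains the line that
# disables XIM in Emacs, then load it into the running X server if possible.
# $1: resources file (defaults to ~/.Xresources)
disable_emacs_xim() {
    resources=${1:-"$HOME/.Xresources"}
    line='Emacs*UseXIM: false'
    touch "$resources" || return 1
    # -x matches the whole line, -F keeps the "*" literal.
    if ! grep -qxF "$line" "$resources"; then
        # If the file does not end with a newline, add one first so the
        # new entry starts on its own line.
        if [ -s "$resources" ] && [ -n "$(tail -c 1 "$resources")" ]; then
            echo >> "$resources"
        fi
        printf '%s\n' "$line" >> "$resources"
    fi
    # Merge into the running X server when one is reachable.
    if [ -n "${DISPLAY:-}" ] && command -v xrdb >/dev/null 2>&1; then
        xrdb -merge "$resources"
    fi
}
```

Emacs reads X resources at startup, so restart Emacs after running it.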

Troubleshooting

Cannot input Japanese on Opera

If you use Opera and cannot input Japanese with uim, try changing the environment variable as follows:

export QT_IM_MODULE='xim'
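
If you would rather not change QT_IM_MODULE for every Qt application, the variable can be overridden for a single program. This is a minimal sketch; run_with_xim is a made-up name, and whether the override alone is enough for Opera depends on your setup.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with QT_IM_MODULE=xim without
# changing the variable for the rest of the session.
run_with_xim() {
    # env sets the variable only in the environment of the child process.
    env QT_IM_MODULE=xim "$@"
}
```

For example, run_with_xim opera starts Opera with the XIM module while other applications keep using the session-wide value. env is used instead of a plain VAR=value prefix because, in some shells, such a prefix in front of a function call can persist after the call.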

uim-toolbar-gtk-systray: tray icon is squashed

Some desktop environments, window managers or panel applications provide only one icon slot per application in the system tray (notification area), but uim-toolbar-gtk-systray displays several icons by default, so the icons get squashed. To solve this, display just one of them. For example, to display only the 'Input mode' icon:

  1. Run uim-pref-gtk.
  2. Click 'Toolbar' on 'Group' list.
  3. Clear all the checkmarks.
  4. Click 'Anthy', 'Anthy (UTF-8)' or 'Mozc', whichever you are using, on the 'Group' list.
  5. Click the 'Edit' button on the 'Enable toolbar buttons' line in the 'Toolbar' box.
  6. Enable only 'Input mode' and click 'Close' button.
  7. Click 'OK' button to close uim-pref-gtk.

The tray icon will then show "あ" (Hiragana mode) or "ー" (Direct mode).

I use a dark theme and cannot read the uim mode icons

You can choose icons designed for a dark background (uim 1.6.0 or later).

  1. Run uim-pref-gtk.
  2. Click 'Toolbar' on 'Group' list.
  3. Check 'Use icon for dark background'.

Useful literature

uim
uim official document
uim on wikibooks
Fonts
Japanese fonts showcase
modified Japanese fonts