Localization/Simplified Chinese

From ArchWiki

In keeping with "The Arch Way", Arch does not configure everything for you, because "preferences and needs are different for everyone". It does, however, try to keep configuration convenient and simple. In practice, this can be even easier than using some Chinese Linux distributions.

This article provides Chinese localization guidance for as much common software as possible. In practice you may still run into all kinds of trouble. Do not be discouraged when that happens: solving problems is a pleasure in itself, and you can seek help through various channels.

Basic Chinese support

To display Chinese correctly, you must set the correct locale and install the appropriate Chinese fonts.

Locale settings

Install Chinese locale

In Linux, a locale defines the language and regional environment in which programs run. Commonly used Chinese locales are listed below; the most visible difference between them is the number of characters they can represent:

zh_CN.GB2312
zh_CN.GBK
zh_CN.GB18030
zh_CN.UTF-8
zh_TW.BIG5
zh_TW.UTF-8

A UTF-8 locale is recommended. To choose which locales are available on the system, edit /etc/locale.gen and uncomment the corresponding entries (remove the leading "#"):

en_US.UTF-8 UTF-8
zh_CN.UTF-8 UTF-8

Then run the locale-gen command to make these locales available on the system. Use the locale command to view the locale currently in use, and the locale -a command to list the locales currently available.
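The uncommenting step can be sketched in shell. This is only a minimal sketch that works on a scratch file rather than the real /etc/locale.gen; the `enable_locale` helper and the temporary file are illustrative assumptions, not part of any Arch tool.

```shell
#!/bin/sh
# Hypothetical helper: uncomment one "<locale> <charset>" entry in a
# locale.gen-style file. $1 = file, $2 = entry such as "zh_CN.UTF-8 UTF-8".
enable_locale() {
    # Escape dots so they match literally in the sed pattern.
    entry_re=$(printf '%s' "$2" | sed 's/[.]/\\./g')
    sed "s/^#[[:space:]]*\(${entry_re}\)[[:space:]]*\$/\1/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Scratch copy standing in for /etc/locale.gen (assumption for the demo).
demo=$(mktemp)
printf '%s\n' '#en_US.UTF-8 UTF-8' '#zh_CN.UTF-8 UTF-8' '#zh_TW.UTF-8 UTF-8' > "$demo"

enable_locale "$demo" 'en_US.UTF-8 UTF-8'
enable_locale "$demo" 'zh_CN.UTF-8 UTF-8'
cat "$demo"
```

On a real system you would edit /etc/locale.gen as root, run locale-gen, and then confirm the result with locale -a.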

Enable Chinese locale

In Arch Linux, set the globally effective locale through the /etc/locale.conf file:

LANG=en_US.UTF-8
Warning: Setting a Chinese locale here is not recommended, because it causes garbled characters on the tty. Chinese can be displayed and entered on the tty, but this requires zhconAUR or a similar package.
Tip: If you want to apply a Chinese-support patch to the kernel, see [1].

Individual users can also set their own locale in ~/.bashrc, ~/.xinitrc, or ~/.xprofile. These files differ as follows:

  • .bashrc: read and applied each time you log in through a terminal.
  • .xinitrc: read and applied each time you start X with startx or SLiM.
  • .xprofile: read and applied each time you log in through a display manager such as GDM.

Enable the Chinese locale only in the graphical interface

Setting a Chinese locale globally in /etc/locale.conf is not recommended, because it causes garbled characters on the tty.

As mentioned earlier, the Chinese locale can instead be set in ~/.xinitrc or ~/.xprofile. Add the following lines near the top of the file, right after any leading comments (if you are not sure which file to use, add them to both):

export LANG=zh_CN.UTF-8
export LANGUAGE=zh_CN:en_US
Note: If you put these two lines in ~/.xinitrc, place them before the exec example_WM_or_DE line. Placing them after it is a common mistake.
Note: This method suits users of SLiM and users without a display manager. GDM and SDDM users can select the language in the GNOME or KDE settings.
Note: Exporting LC_ALL globally, which overrides all other locale settings, is not recommended. LC_ALL should be reserved for diagnostics and debugging; setting it globally makes locale problems unnecessarily hard to diagnose.
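The ordering rule for ~/.xinitrc can be illustrated with a small sketch. It writes a sample file to a scratch location (not your real ~/.xinitrc) and checks that the locale exports come before the exec line. The `lang_before_exec` helper and the `example_wm` command are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sample .xinitrc-style file written to a scratch location.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#!/bin/sh
# Locale for the graphical session only; the tty keeps the system-wide LANG.
export LANG=zh_CN.UTF-8
export LANGUAGE=zh_CN:en_US
# "example_wm" is a placeholder for your window manager or desktop environment.
exec example_wm
EOF

# Hypothetical check: succeed only if no LANG/LANGUAGE export
# appears after the first exec line of the given file.
lang_before_exec() {
    awk '
        /^[ \t]*exec[ \t]/ { seen_exec = 1 }
        /^[ \t]*export[ \t]+LANG(UAGE)?=/ && seen_exec { bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

if lang_before_exec "$sample"; then
    echo "locale exports come before exec"
else
    echo "locale exports come after exec and will never run"
fi
```

Anything placed after exec is never reached, because exec replaces the shell process with the window manager or desktop environment.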

Chinese fonts

Install fonts

In addition to setting up the locale, you also need to install Chinese fonts.

Commonly used free (GPL or compatible copyright) Chinese fonts are:

System fonts are installed to /usr/share/fonts by default. If you do not have root privileges, or only want certain fonts for your own use, you can copy the fonts directly to the ~/.fonts directory (or one of its subdirectories) and add the path to /etc/fonts/local.conf. For details, see the following sections.

See also: [2]

Chinese fonts configuration

fontconfig settings

fontconfig is configured in ~/.fonts.conf (per user) or in /etc/fonts/conf.d (system-wide). Modifying the former is recommended.

For Chinese font settings, see Fonts (简体中文) and Font configuration (简体中文).

Font Configuration (简体中文)/Chinese (简体中文) provides a demonstration of Chinese fontconfig.

Fix Simplified Chinese being displayed with variant (Japanese) glyphs

After installing Noto Sans CJK, adobe-source-han-sans-otc-fonts (Source Han Sans), or adobe-source-han-serif-otc-fonts (Source Han Serif), Chinese characters are in some cases (in areas where the framework does not define a language) rendered with glyphs that do not match the standard Chinese forms. For example, the glyphs of 门, 关, and 复 differ from those of standard Chinese.

This happens because each program can set a different default font, such as Arial or Tahoma. fontconfig controls the attributes of these fonts and orders fallback fonts by region code, by default alphabetically from A to Z. Because ja-JP sorts before zh_{CN,HK,SG,TW}, Japanese fonts are used first.

Tip: You can set the font separately in the settings of the Chromium/Chrome/Firefox browser, for example, adjust the font option to Noto xxx CJK SC.

You can solve this in either of the following ways (using Simplified Chinese as an example):

  • Add LANG=zh_CN.UTF-8 to locale.conf to make Simplified Chinese the default language. Because the locale then determines the CJK priority, the default ordering is ignored.
  • Manually adjust the priority so that Chinese fonts come before Japanese fonts [4]:

Create a file under /etc/fonts/conf.d/ or /etc/fonts/conf.avail/, such as 64-language-selector-prefer.conf, or modify or create ~/.fonts.conf (effective only for the current user):

If noto-fonts-cjk is installed, write:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <alias>
    <family>sans-serif</family>
    <prefer>
      <family>Noto Sans CJK SC</family>
      <family>Noto Sans CJK TC</family>
      <family>Noto Sans CJK JP</family>
    </prefer>
  </alias>
  <alias>
    <family>monospace</family>
    <prefer>
      <family>Noto Sans Mono CJK SC</family>
      <family>Noto Sans Mono CJK TC</family>
      <family>Noto Sans Mono CJK JP</family>
    </prefer>
  </alias>
</fontconfig>

If you installed adobe-source-han-sans-otc-fonts, write:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <alias>
    <family>sans-serif</family>
    <prefer>
      <family>Source Han Sans SC</family>
      <family>Source Han Sans TC</family>
      <family>Source Han Sans HW</family>
      <family>Source Han Sans K</family>
    </prefer>
  </alias>
  <alias>
    <family>monospace</family>
    <prefer>
      <family>Source Han Sans SC</family>
      <family>Source Han Sans TC</family>
      <family>Source Han Sans HW</family>
      <family>Source Han Sans K</family>
    </prefer>
  </alias>
</fontconfig>

Note that if you create the XML file in /etc/fonts/conf.avail, you must also create a symbolic link to it in /etc/fonts/conf.d, for example:

# ln -s /etc/fonts/conf.avail/64-language-selector-prefer.conf /etc/fonts/conf.d/64-language-selector-prefer.conf

Then update the font cache to take effect:

# fc-cache -fv

Execute the following command to check. If NotoSansCJK-Regular.ttc: "Noto Sans CJK SC" "Regular" appears, the setting is successful:

# fc-match -s | grep 'Noto Sans CJK'
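As a complement to the grep check, the following sketch prints only the first CJK family in the fallback order, which is the one actually used for otherwise unmatched Chinese characters. It assumes the Noto setup (it looks for "CJK" in each line, so it does not apply to Source Han Sans) and the default fc-match output format shown above; `first_cjk_family` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `fc-match -s` output on stdin and print the
# family name (first quoted field) of the first line that mentions "CJK".
first_cjk_family() {
    grep -m1 'CJK' | sed -E 's/^[^"]*"([^"]*)".*/\1/'
}

# Only query fontconfig when it is installed.
if command -v fc-match >/dev/null 2>&1; then
    fc-match -s sans-serif | first_cjk_family
fi
```

If the priority is set correctly, the printed family should be Noto Sans CJK SC rather than Noto Sans CJK JP.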

Chinese input method

Commonly used Chinese input method frameworks are IBus, fcitx and scim. For installation and configuration details, see their respective articles.

Note: scim is poorly maintained at present and is not recommended.

Terminal Chinese support

Bootloader Chinese support

See GRUB2 (简体中文).

Localization of specific software

Firefox

Simplified Chinese users should install firefox-i18n-zh-cn.

Traditional Chinese users should install firefox-i18n-zh-tw.

LibreOffice

Simplified Chinese users should install libreoffice-fresh-zh-cn or libreoffice-still-zh-cn.

Traditional Chinese users should install libreoffice-fresh-zh-tw or libreoffice-still-zh-tw.

PDF reader

Most PDF viewers already support Chinese, but some need additional language packs or fonts:

For Acrobat Reader, install acroread-fontsAUR, or install acroread-fonts-systemwideAUR to use system-wide fonts.

For okular, evince and other poppler-based readers, as well as Inkscape, Krita, MyPaint and other image tools that can handle PDF files, install poppler-data.

Java

For Sun Java users, create a fallback directory in /opt/java/jre/lib/fonts, and then link or copy a few Chinese fonts into it so that Java programs display Chinese correctly. For example, if jreAUR and opendesktop-fonts are installed, use the following commands:

# mkdir -p /opt/java/jre/lib/fonts/fallback
# ln -s /usr/share/fonts/TTF/odosung.ttc /opt/java/jre/lib/fonts/fallback/
# cd /opt/java/jre/lib/fonts/fallback/
# mkfontscale
# mkfontdir

vim

If the locale uses UTF-8, files in other Chinese encodings may appear garbled when opened in vim. To fix this, add the following setting to ~/.vimrc:

~/.vimrc
...
set fileencodings=utf8,cp936,gb18030,big5
...

Chinese video subtitles

MPlayer

For MPlayer to display subtitles correctly, the encoding of the subtitle file must match the subcp setting in the MPlayer configuration. If the subtitle file is encoded in GBK, use subcp=cp936; if it is encoded in UTF-8, use subcp=utf8. A UTF-8 subtitle file combined with subcp=cp936 produces garbled characters. A simpler approach is to set subcp=enca:zh:ucs-2, which leaves detecting the subtitle encoding to enca.

Modify ~/.mplayer/config:

~/.mplayer/config
font='文泉驿正黑'
subcp=enca:zh:ucs-2

Use the following command to manually load subtitles:

$ mplayer xxx.avi -sub xxxxx.srt

With a graphical front end such as SMPlayer this is simpler: just set the default subtitle encoding and font in the settings dialog.

xine

Xine can also display Chinese subtitles, but you need to generate the Chinese fonts yourself. For details, see [5].

gstreamer

totem 1.4.0 uses gstreamer0.10, so it should be able to load srt subtitles with the same name as the video automatically.

LaTeX

First you need to install the CJK package, then you need to install the appropriate font. For details, please refer to: [6].

Garbled text

The basic rule for avoiding garbled characters: use UTF-8 instead of GBK/GB2312.

Garbled file names

Install convmv and use the convmv command to convert the encoding of file names. For example:

$ convmv -f GBK -t UTF-8 --notest --nosmart file

-f specifies the original encoding, and -t specifies the output encoding. Use convmv --list to list all supported encodings. --notest performs the conversion for real (without it, convmv only prints what would be renamed and changes nothing). --nosmart forces the conversion even when a file name already appears to be UTF-8; by default, convmv skips such files.
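Before converting, it can help to see which names actually need it. The following is a rough sketch, not part of convmv: it treats a name as needing conversion when it fails strict UTF-8 decoding with iconv. The `is_utf8` helper is a hypothetical name, and the check is only a heuristic, since a few non-UTF-8 byte sequences can also happen to be valid UTF-8.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given string decodes as valid UTF-8.
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Report entries in the current directory whose names are not valid UTF-8.
for name in ./*; do
    [ -e "$name" ] || continue
    is_utf8 "$name" || printf 'needs conversion: %s\n' "$name"
done
```

Names reported by the loop are candidates for convmv; run convmv without --notest first to preview the result.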

Garbled file contents

Use the iconv command to convert the format. For example:

$ iconv -f GBK -t UTF-8 -o new-file origin-file

-f specifies the original encoding, and -t specifies the output encoding. Use iconv -l to query all supported encodings. -o specifies the output file.
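A small round trip shows the conversion in action. This sketch works in a scratch directory, creates a GBK-encoded sample from a UTF-8 string, and converts it back. The file names mirror the example above; it assumes the script itself is saved as UTF-8 and that the local iconv supports GBK.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
work=$(mktemp -d)

# Create a GBK-encoded sample file from a UTF-8 string.
printf '中文测试\n' | iconv -f UTF-8 -t GBK > "$work/origin-file"

# Convert it back to UTF-8; redirecting stdout has the same effect as -o new-file.
iconv -f GBK -t UTF-8 "$work/origin-file" > "$work/new-file"

# Compare sizes: these characters take 2 bytes each in GBK and 3 in UTF-8.
wc -c < "$work/origin-file"
wc -c < "$work/new-file"
cat "$work/new-file"
```

Writing to a new file and comparing it with the original before deleting anything is a safe habit, because a wrong -f value produces garbage without necessarily raising an error.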

Garbled file names in zip archives

How to avoid: in a non-UTF-8 environment (typically a Chinese-language Windows system), do not compress files with zip; 7z is recommended instead.

Solution: install unzip-iconvAUR or unzip-natspecAUR and use it in place of the stock unzip. For example:

$ unzip -O gbk file.zip

Here file.zip is the archive and gbk is the encoding of the file names inside it, specified with -O (the stock unzip has no -O option).

Garbled MP3 tags

For players that use GStreamer as the backend, such as Rhythmbox and totem, set the following environment variables so that GBK-encoded ID3 tags in MP3 files are read correctly:

export GST_ID3_TAG_ENCODING=GBK:UTF-8:GB18030
export GST_ID3V2_TAG_ENCODING=GBK:UTF-8:GB18030

For Beep Media Player, select the MPEG Audio plugin under Preferences -> Plugins -> Media and click Preferences below it. In the dialog that appears, select Title and check the Disable ID3v2 and Convert non-UTF8 ID3 tags to UTF8 boxes, then enter gbk in the ID3 encoding field. BMP can then display GBK-encoded ID3 tags correctly.

The Quod Libet player supports tag editing and lets you set the ID3v2 encoding. This can be set in ~/.quodlibet/config:

~/.quodlibet/config
...
id3encoding = gbk
...
Note: Quod Libet supports utf8 encoding by default.

The most thorough solution is to convert GBK-encoded ID3 tags to UTF-8. First install python-mutagen, and then convert with the following command:

$ mid3iconv -e gbk XXX.mp3

Garbled Chinese file names on Windows partitions

This usually happens when the character set used for mounting differs from the locale. To fix it, modify /etc/fstab (if you are unfamiliar with this file, read fstab carefully first). If the locale uses UTF-8, change the entry to:

/etc/fstab
...
/dev/sdxx /media/win ntfs defaults,iocharset=utf8 0 0

If the locale is GBK, it should be:

/etc/fstab
...
/dev/sdxx /media/win ntfs defaults,iocharset=cp936 0 0
...

Garbled file names with Samba

When Arch is used as the Samba server, adding the following line to /etc/samba/smb.conf can fix garbled file names on Windows clients:

/etc/samba/smb.conf
...
unix charset=gb2312
...

Garbled file names with ftp

Many FTP sites use GBK encoding. With a UTF-8 locale, downloaded file names may be garbled. For lftp, add the following settings to .lftp/rc:

.lftp/rc
...
set ftp:charset "gbk"
set file:charset "UTF-8"
...

For gftp, add the following setting to .gftp/gftprc:

.gftp/gftprc
...
remote_charsets=gb2312
...

However, downloaded file names are still garbled; fixing this requires patching and recompiling gftp. The patch is available at: http://www.teatime.com.tw/%7Etommy/linux/gftp_remote_charsets.patch

Translation software

  • stardict: StarDict.
  • sdcv: command line StarDict.
  • ydcv: Youdao dictionary on the command line.
  • youdao-dictAUR: Youdao dictionary (graphical interface) with on-screen word lookup.
  • goldendict: ships without dictionaries, so you need to download dictionary files separately. It supports Babylon dictionaries (.BGL), the dictionary format of the no-longer-maintained StarDict (.ifo/.dict/.idx/.syn), dictd dictionaries (.index/.dict(.dz)), ABBYY Lingvo dictionaries (.dsl/.lsa/.dat), mdict dictionaries, and others. Dictionary files in these formats can be downloaded from the Internet and imported into GoldenDict (this may raise copyright issues).
  • moedictAUR: a multi-platform Chinese dictionary. In addition to Chinese characters, words and idioms, it covers Hakka, Hokkien, simple foreign-language translation, stroke order, and more. An online version of moedict is also available.
  • linedictAUR: an online English-Chinese dictionary that obtains its results by scraping the Youdao translation web page, with partial support for English-Chinese translation. It imitates dmenu by displaying results at the top of the screen and is easy to use. Because the API used by ydcv has expired and the new API limits how often it can be used for free, linedictAUR is a good alternative.