Character encoding

From ArchWiki

Character encoding is the process of interpreting bytes as readable characters. UTF-8 has been the dominant encoding since 2009 and is promoted as a de facto standard [1].
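
As a small illustration of what interpreting bytes as characters means, the following Python sketch decodes the same two bytes under two encodings. The function name decode_bytes is hypothetical and chosen only for this example; it is not part of any tool mentioned in this article.

```python
# Illustrative sketch: the same bytes read under two different encodings.
def decode_bytes(data: bytes, encoding: str) -> str:
    """Interpret raw bytes as text using the named encoding."""
    return data.decode(encoding)

raw = "\u00e9".encode("utf-8")            # "é" becomes the two bytes C3 A9
as_utf8 = decode_bytes(raw, "utf-8")      # "é", the intended character
as_latin1 = decode_bytes(raw, "latin-1")  # "Ã©", garbled by the wrong encoding

print(raw.hex())  # c3a9
```

The garbled result in the last assignment is the typical symptom of text being read with an encoding other than the one it was written in.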

UTF-8

Terminal

The following lists some terminals that support UTF-8:

Gnome-terminal or rxvt-unicode

These applications must be launched from a UTF-8 locale; otherwise they drop UTF-8 support. Enable the en_US.UTF-8 locale (or your local UTF-8 alternative), set it as the default locale, then reboot.
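
To see which locale a newly launched program would inherit, the environment can be inspected. The Python sketch below is a minimal example, assuming the usual POSIX precedence of LC_ALL over LC_CTYPE over LANG. The helper names effective_ctype and is_utf8_locale are hypothetical. It only looks at the locale name and does not verify that the locale has actually been generated on the system.

```python
import os

def effective_ctype(env) -> str:
    """Return the locale name that governs character encoding.

    Assumed POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    Empty values count as unset.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"  # fallback when none of the variables is set

def is_utf8_locale(name: str) -> bool:
    """True if the codeset part of a locale name (after the dot) is UTF-8."""
    if "." not in name:
        return False
    codeset = name.split(".", 1)[1].split("@", 1)[0]  # drop any @modifier
    return codeset.lower().replace("-", "") == "utf8"

if __name__ == "__main__":
    name = effective_ctype(os.environ)
    print(name, "is UTF-8" if is_utf8_locale(name) else "is not UTF-8")
```

If this reports a non-UTF-8 locale in the session that starts the terminal, the terminal is likely to fall back to a legacy encoding as described above.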

Troubleshooting

  • Use mp3unicode to fix encoding problems in the ID3 tags of MP3 files.

Incorrect encoding for extracted files

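Old versions of Windows (XP, Vista, and 7) encode file names inside ZIP archives in the system's legacy code page instead of UTF-8, so the names can appear garbled after extraction. To extract such an archive, pass the original code page to unzip with the -O option, for example CP936 for an archive created on a Simplified Chinese system: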

$ unzip -O CP936 file.zip
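
To illustrate what re-decoding file names with a given code page means, here is a minimal Python sketch using the standard zipfile module. It is not how unzip is implemented. It relies on zipfile decoding names as CP437 when an entry is not flagged as UTF-8, and the helper names fix_name and list_names are hypothetical.

```python
import io
import zipfile

def fix_name(garbled: str, original_encoding: str = "cp936") -> str:
    """Undo a CP437 misreading of a file name stored in another code page."""
    return garbled.encode("cp437").decode(original_encoding)

def list_names(archive, original_encoding: str = "cp936") -> list:
    """List member names, re-decoding those not flagged as UTF-8."""
    names = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.flag_bits & 0x800:   # bit 11 set: name is already UTF-8
                names.append(info.filename)
            else:                        # zipfile decoded the name as CP437
                names.append(fix_name(info.filename, original_encoding))
    return names
```

For example, list_names("file.zip") would return the member names of file.zip interpreted as CP936. A different code page can be passed as the second argument for archives created on other legacy systems.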