id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc
182	UTF-8	ken	clfs-commits@…	"Couldn't find a ticket for this, so starting a new one as an aide-memoire.

If people want to use UTF-8 (and so far there seems to be a lack of consensus), the assumption is that it should be optional.  I've been using it for a couple of years or so, and I'm aware of at least the following additions (there are probably others):

1. for glibc add libidn.  Now that glibc no longer gets releases, I'm going to try this with upstream libidn (v1.9), but I haven't yet.

2. for ncurses --enable-widec so that we build the ...w versions and remove/replace the non-wide versions similar to in LFS (ISTR the detail is slightly different for how to do this on multilib).

3. perhaps a note that if procps fails to compile in a UTF-8 system, check what you did to ncurses.

4. for groff, optionally sed characters U+2010, U+2018, U+2019 and U+2212 to ASCII characters more likely to be found in common screen fonts, as in LFS.

5. for man, convert the message files from various legacy encodings to UTF-8, and similarly the supplied non-English man pages (apropos, makewhatis, etc).  I don't know if any other core packages need this; the problem for each package is to find a message that has been translated, and work out how to generate that error so it can be tested to see whether the translation appears or a legacy encoding appears.

6. follow man by groff-utf8 and sed man.conf to use it.

7. alter vim to put UTF-8 pages (fr, it, pl, ru) into the language directory instead of fr.UTF-8 etc.  My notes say that Russian otherwise goes into ru.KOI8-R, but I apparently don't do any recoding, so that needs to be checked again - certainly, with vim-7.1 I've got UTF-8 pages installed.

8. At the moment, I don't think there are any UTF-8 pages shipped in any of the core packages.  Shadow used to have loads, but those seem to have been dropped when Debian rescued it.  Perhaps we should have something a bit like what is in LFS explaining how to recode pages, but with the presumption that anyone doing this will be recoding to UTF-8.  Maybe also a note that support for non-alphabetic scripts in groff-utf8 is not perfect - sometimes there are warnings about fitting the text to the line, e.g.
<standard input>:51: warning [p 1, 2.3i]: cannot adjust line
This applies particularly to Japanese, but maybe also to Chinese or Korean (I can only trigger it for Japanese).
 
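As an illustration of the substitution in item 4, here is a minimal Python sketch that maps the four code points to ASCII in a text string.  It is not the build step itself (LFS does this with sed on groff's own files).  The ASCII replacements chosen and the names ASCII_FALLBACKS and to_ascii_fallbacks are assumptions made for the example.

```python
# Hypothetical mapping for the four code points named in item 4.
# The ASCII targets are an assumption, not taken from the ticket.
ASCII_FALLBACKS = {
    0x2010: '-',    # HYPHEN
    0x2018: '`',    # LEFT SINGLE QUOTATION MARK
    0x2019: '\'',   # RIGHT SINGLE QUOTATION MARK
    0x2212: '-',    # MINUS SIGN
}


def to_ascii_fallbacks(text):
    '''Replace the four code points with their ASCII stand-ins.'''
    return text.translate(ASCII_FALLBACKS)


if __name__ == '__main__':
    sample = '\u2018quoted\u2019 text, a hy\u2010phen and \u22121'
    print(to_ascii_fallbacks(sample))
```

Any character not in the table passes through unchanged, so the rest of a UTF-8 page is left alone.
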
Doing the recoding of the man files apparently means that 'man' cannot use legacy encodings (e.g. latin2, koi8r) - even latin1 might have oddities.
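
The recoding itself can be sketched in Python as below.  This is only an illustration under stated assumptions: the function names are hypothetical, the source encoding of each page has to be known in advance, and in practice a tool such as iconv would do the same job.

```python
def recode_to_utf8(data, source_encoding):
    '''Decode bytes from a legacy encoding and re-encode them as UTF-8.'''
    return data.decode(source_encoding).encode('utf-8')


def recode_file(path, source_encoding):
    '''Recode one file in place; path and encoding are supplied by the caller.'''
    with open(path, 'rb') as f:
        raw = f.read()
    # Convert before reopening for writing, so a decode error
    # cannot leave a truncated file behind.
    converted = recode_to_utf8(raw, source_encoding)
    with open(path, 'wb') as f:
        f.write(converted)


if __name__ == '__main__':
    # One Cyrillic letter encoded as KOI8-R, then recoded to UTF-8.
    legacy = '\u043f'.encode('koi8_r')
    print(recode_to_utf8(legacy, 'koi8_r'))
```

A page that is already UTF-8, or whose encoding is guessed wrongly, will either raise UnicodeDecodeError or silently produce wrong characters, so the result still needs to be checked by eye.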

Note that man pages in UTF-8 alphabetic languages work in the console, provided you have a suitable font.  For Chinese, Japanese and Korean you need a graphical display - rxvt-unicode works, and I assume gnome-terminal does too.

We would also need some explanation of why to use this (easy - it supports multiple languages on screen at the same time, rather than just a number of neighbouring languages, and handles ""fancy quotes"" sometimes found in English pages, e.g. from smartmontools), and alternatively why not to use it (perhaps for people who have a large amount of text in legacy encodings, or who need to use legacy encodings).

 Discussion about the ""should we do this"" part on -dev, please."	task	closed	minor	CLFS Standard 1.2.0	BOOK	CLFS Standard GIT	fixed