Moheb's Coptic Pages
Tuesday, February 19, 2013

Typing and displaying Unicode Coptic texts under X11


1. Displaying Coptic Unicode text in an editor

Suppose you already have a Coptic text, for example in UTF-8 format, and you would like to display it. Take for example the following sample (right-click it to download it, then open it with an editor). Assuming that you work under X11, you need:
  • a Unicode capable editor (like gtk2edit, gedit) or a word processor such as abiword or OpenOffice
  • an appropriate Unicode font installed in your system, in which the Coptic glyphs are included
Since there are several Unicode encoding forms (UTF-8, UTF-16, ...), you must know which one your text uses, in our case UTF-8, and tell the editor when you open the file. Refer to the section "installing Unicode Coptic font" for a detailed description of how to install the font.

2. Entering Coptic text in an editor

Now suppose you would also like to modify this text, or write your own. Then you must be able to switch your current keymap to Coptic. There are two ways to do this:
  • define a new X11 keymap, or
  • use a graphical virtual keyboard, such as xvkbd, to enter your text.
Both methods are described in the section "adding Coptic keyboard mapping".

3. Displaying Coptic text in an X-Terminal

If, for example, you would like to display the above sample using "more" from an X terminal, then your terminal must support Unicode (GNOME Terminal does, for example), and of course you must set the terminal font correctly. For a simple command like "more", this should be sufficient.
But many other applications depend on your current glibc locale, and many also use the ncurses library. In such cases you will additionally need to:
  • modify at least some of your locale files, as described in more detail below in "upgrading the glibc locale for Coptic support"
  • make sure you are using an ncurses library with UTF-8 support (ncursesw), which should be the case in most modern Linux distributions.


Coptic Keyboard Mapping


This section describes how you can enter Coptic Unicode text under X11. It explains the key map file "cop" and, as an alternative, the virtual keyboard xvkbd.

The key map file "cop"

XFree86 and other X11 implementations define each keyboard layout in a file that is normally named after the language or country code, for example "de" for German or "us" for US English. On my systems these files reside in the directory:
/usr/lib/X11/xkb/symbols/pc
In the /etc/X11/XF86Config file there is one section for each input device, which also describes the keyboard parameters (layout, variant, model, ...). In XFree86 it looks like this:

Section "InputDevice"
Identifier  "Keyboard0"
Driver      "keyboard"
Option      "XkbRules" "xfree86"
Option      "XkbModel" "pc105"
Option      "XkbLayout" "de"
Option      "XkbVariant" "nodeadkeys"
EndSection

In the example above, the file /usr/lib/X11/xkb/symbols/pc/de defines the default German key mapping as well as its variants (such as nodeadkeys).
I have prepared a similar file for Coptic. It follows the encoding suggested by Logos Research Systems; a detailed document describing their exact layout is available at http://www.logos.com/support/lbs/fonts/CopticKeyboard.
They also offer a Coptic layout for Windows XP for download. My X11 implementation follows the same layout for the normal, Shift, AltGr and Shift+AltGr states. To install it, follow these steps:
  1. download the cop file (if you are used to the older CS-Coptic encoding, you can download the file cop_CS instead, though I would recommend sticking to the cop file, which follows the Logos encoding).
  2. as root, copy it to /usr/lib/X11/xkb/symbols/pc (or similar, depending on your system)
  3. If you would like to try it out in your current X11 session, type (as the same user who owns the X11 session): setxkbmap -layout "cop". ATTENTION: once you have typed this, you will no longer be able to type anything in Latin letters! It is therefore better to first type something like setxkbmap -layout "us" in a terminal, so that you can later recall it with the up-arrow key in your X terminal and set the mapping back to "us". Alternatively, you can type: setxkbmap -layout "us,cop" -option "grp:alt_shift_toggle", which should allow toggling between the "us" and "cop" key maps by pressing <Alt> + <Shift> simultaneously.
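  4. To have this every time you start your X session, as root, add the following lines to the InputDevice section of your /etc/X11/XF86Config file, replacing any existing XkbLayout line (this example toggles with <Ctrl> + <Shift>; use "grp:alt_shift_toggle" instead for <Alt> + <Shift>):
Option      "XkbLayout" "us,cop"
Option      "XkbOptions" "grp:ctrl_shift_toggle"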


Virtual Keyboard

There is another way to enter Unicode text that does not need the steps above: instead of typing on the keyboard, you click the keys of a virtual keyboard with the mouse. One such virtual keyboard is xvkbd (written by Tom Sato and distributed under the terms of the GNU General Public License). The main problem with this tool is that it uses the Xaw (or Xaw3d) widget toolkit, which is fine in itself but does not seem to support Unicode. I have succeeded in implementing a workaround that works fine, at least for Coptic Unicode, though with some limitations. It does, however, require patching the source code.

You can have a look at some snapshots of the Coptic version of xvkbd.
Now, here are the detailed steps you should follow for installing xvkbd:
  1. first download xvkbd version 2.7a from the xvkbd home page. You can also download newer versions, but my patch is based on version 2.7a.
  2. download the patch I have prepared: xvkbd-2.7a-Coptic.patch.
  3. download and install for example the TTF font New Athena Unicode. Follow these steps for installing the font.
  4. untar the file xvkbd-2.7a.tar.gz
  5. apply the patch; for example, from the directory one level above xvkbd-2.7a, type:
    • patch -p0 < xvkbd-2.7a-Coptic.patch
  6. follow the steps described in the directory xvkbd-2.7a for building and installing xvkbd (xmkmf; make install)
  7. copy the file XVkbd-coptic.ad into the app-defaults directory (if it should be accessible to all users: /usr/X11R6/lib/X11/app-defaults; if only for you: $HOME/app-defaults, in which case define this directory in the rc file of your shell, e.g.: setenv XAPPLRESDIR $HOME/app-defaults for (t)csh or export XAPPLRESDIR=$HOME/app-defaults for bash).
  8. Make sure that the file XVkbd in your app default directory includes both lines:
    #include "XVkbd-common"
    #include "XVkbd-coptic.ad"
  9. If the key labels look very weird when you start xvkbd, then either the font is not installed correctly or the patch was not applied. Try "xlsfonts | grep -i athena"; if this does not output anything, the font is not installed.


glibc support for Coptic


The glibc library defines the locales (internationalization files) of your Linux operating system, and many applications depend on it. A locale defines a set of attributes for each country or region, such as the character set, the collation (sort order) of characters, the currency, the date format, and so on.

Since there is no "Coptic" territory, it does not make sense to define a dedicated locale for Coptic. It is entirely sufficient to extend your current locale with a few more definitions, so that at least the Coptic characters are known.

Type locale; the output should look something like this:

LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

If your current locale settings end with the extension UTF-8, then your current locale already knows UTF-8, and there is almost nothing you have to do. I say almost, because the UTF-8 definition will probably lack the Coptic range, which is new (it was introduced in Unicode 4.1.0). In other words: if you wait until a new version of glibc is available, you will probably have nothing to do. But if you are impatient, read on.

What if your current locale does not have the UTF-8 extension? List all available locales by typing:
locale -a
If there is not a single locale ending in .utf8, you should consider updating your glibc (which is a very critical step because of its dependencies with other applications). It may be more convenient to update your whole distribution!

Otherwise, you need to modify the following files:
  • /usr/share/i18n/locales/i18n
  • /usr/share/i18n/charmaps/UTF-8
and optionally
  • /usr/share/i18n/locales/iso14651_t1

The i18n file can be updated systematically with the utility "gen-unicode-ctype", which is included in the glibc source tarball. You can also get it directly here. Compile it with "gcc gen-unicode-ctype.c -o gen-unicode-ctype", then download the latest Unicode definition file (UnicodeData.txt) from the Unicode.org server. Generate a new version of i18n with "./gen-unicode-ctype UnicodeData.txt", rename the output file to i18n and copy it to the directory /usr/share/i18n/locales/. You can also get the version that I generated this way here.

Updating the UTF-8 file requires more "hand work". I have prepared a version that only adds the Coptic range; you can get it here.

So far I have not updated my iso14651_t1 file.

After updating these files, you have to recompile your current locale so that it reflects the changes. If, for example, your current locale is en_US.UTF-8, then as root type:

localedef --charmap=UTF-8 --inputfile=en_US en_US.utf8

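Make sure that the subdirectory en_US.utf8 under /usr/lib/locale has now been updated (on systems that keep their compiled locales in the archive file /usr/lib/locale/locale-archive, check that file instead).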

To test whether your modified locale works, compile and run this test code. It converts the upper-case Coptic letter alfa (U+2C80) to lower case and should output: 0x2c81

Re-encoding Coptic texts with iconv


GNU libiconv is a character-set conversion library that also provides a command-line tool, iconv. It can convert between a large number of encodings and Unicode formats. glibc ships its own iconv implementation and iconv command as well, so the chances that you already have an iconv command on your Linux system are almost 100%. Type iconv -l to see all encodings that your iconv can handle.

This is the perfect tool for re-encoding your older Coptic texts into Unicode. I have prepared a patch that extends the encodings of iconv with the Coptic Font Standard (CS Coptic), as established a few years ago by CopticChurch.Net. The patch applies to libiconv-1.9.2 (you can download it here). After applying the patch, compiling and installing, you should see the cs_coptic encoding when you type iconv -l (make sure you run the iconv binary built from the patched libiconv, not the one from glibc).
The good news is that patching iconv means many other applications that rely on it will now understand the cs_coptic encoding as well, for example the font editor FontForge.



Moheb Mekhaiel