Conversion of old Tibetan texts
Offline conversion
Conversion of old Tibetan PDFs
The GitHub project Python Tibetan Legacy Encodings tool by Elie Roux provides a Python-based toolset for converting PDFs that were created with old (pre-Unicode) Tibetan fonts.
The GitHub project page lists an email address where you can get support with the conversion.
Supported formats are:
Tibetisch dBu-can
DBu-can
Youtsoweb (TCRC)
Youtso (TCRC)
Bod-Yig (TCRC)
Ededris
Dedris
Drutsa
Khamdris
Sama / Esama
LTibetan, LTibetanExtension and LMantra
TibetanMachine
TibetanMachineWeb
TibetanMachineSkt
TibetanChogyal (PKTC)
TibetanClassic (PKTC)
DzongkhaCalligraphic (PKTC)
TB-Youtso, TB-TTYoutso, TB2-Youtso, TB2-TTYoutso (LTWA)
Monlam ouchan and Monlam yigchong
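Legacy (pre-Unicode) Tibetan fonts place Tibetan glyphs on ordinary 8-bit character codes, so the same code means different things in different fonts and a converter needs a separate table for each font. The Python sketch below only illustrates that idea: it is not the API of the Python Tibetan Legacy Encodings tool, and the font name `HypotheticalLegacyFont` and all of its mapping entries are made-up placeholders, not the real encoding of any font listed above. It assumes the input has already been extracted as (font name, text) runs.

```python
from typing import Dict, Iterable, Tuple

# HYPOTHETICAL table for illustration only: the font name and every entry
# are placeholders, NOT the real encoding of any font listed above.
LEGACY_TABLES: Dict[str, Dict[str, str]] = {
    "HypotheticalLegacyFont": {
        "!": "\u0f40",        # placeholder slot -> TIBETAN LETTER KA
        '"': "\u0f41",        # placeholder slot -> TIBETAN LETTER KHA
        "#": "\u0f66\u0f90",  # one glyph -> SA + SUBJOINED KA (a stack)
        "-": "\u0f0b",        # placeholder slot -> TSHEG (intersyllabic mark)
    },
}


def convert_runs(runs: Iterable[Tuple[str, str]]) -> str:
    """Convert (font_name, text) runs, e.g. from PDF text extraction, to Unicode.

    Runs in a font with a known table are mapped character by character;
    runs in any other font (e.g. a Latin font) are passed through unchanged.
    """
    out = []
    for font_name, text in runs:
        table = LEGACY_TABLES.get(font_name)
        if table is None:
            out.append(text)
            continue
        for ch in text:
            # Codes missing from the table are kept so they stay visible for review.
            out.append(table.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    sample = [("HypotheticalLegacyFont", '!-"#'), ("Times New Roman", " p. 1")]
    print(convert_runs(sample))
```

A real converter needs the complete table for each font and may also need rules that look at neighbouring glyphs, which a per-character lookup like this cannot express.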
UTFC converter
The Trace Foundation has supported the development of Yalasoo UTFC, a Tibetan text converter. The table below shows which file formats UTFC supports for each encoding:
Encoding | TXT | Unicode TXT | RTF | HTML
---|---|---|---|---
ACIP Transliteration | yes | yes | yes | yes
ALA-LC Transliteration | yes | yes | yes | yes
Bandrida | yes | no | yes | yes
Beida Founder | yes | no | no | no
Huanguang | yes | no | no | no
LTibetan | no | no | yes | yes
National Standard Extended | no | yes | yes | yes
Sambhota 1.0 (Sama) | no | no | yes | yes
Sambhota 2.0 (Dedris) | no | no | yes | yes
TCRC Bod-Yig | no | no | yes | yes
THDL Wylie | yes | yes | yes | yes
Tibetan Machine | no | no | yes | yes
Tibetan Machine Web | no | no | yes | yes
Tongyuan | yes | no | yes | yes
Unicode | no | yes | yes | yes
Wylie Transliteration | yes | yes | yes | yes
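If you script around UTFC (for example, to decide which file format to export before converting), the table can be kept as data. The Python sketch below transcribes the UTFC table as printed and answers two questions: which formats are marked “yes” for an encoding, and which encodings are marked “yes” for a format. It does not call UTFC itself; `encodings_for` and `formats_for` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

# Column order of the UTFC table.
FORMATS: Tuple[str, ...] = ("txt", "unicode txt", "rtf", "html")

Y, N = True, False

# One row per encoding, transcribed from the UTFC table (yes -> True, no -> False).
UTFC_SUPPORT: Dict[str, Tuple[bool, bool, bool, bool]] = {
    "ACIP Transliteration":       (Y, Y, Y, Y),
    "ALA-LC Transliteration":     (Y, Y, Y, Y),
    "Bandrida":                   (Y, N, Y, Y),
    "Beida Founder":              (Y, N, N, N),
    "Huanguang":                  (Y, N, N, N),
    "LTibetan":                   (N, N, Y, Y),
    "National Standard Extended": (N, Y, Y, Y),
    "Sambhota 1.0 (Sama)":        (N, N, Y, Y),
    "Sambhota 2.0 (Dedris)":      (N, N, Y, Y),
    "TCRC Bod-Yig":               (N, N, Y, Y),
    "THDL Wylie":                 (Y, Y, Y, Y),
    "Tibetan Machine":            (N, N, Y, Y),
    "Tibetan Machine Web":        (N, N, Y, Y),
    "Tongyuan":                   (Y, N, Y, Y),
    "Unicode":                    (N, Y, Y, Y),
    "Wylie Transliteration":      (Y, Y, Y, Y),
}


def encodings_for(fmt: str) -> List[str]:
    """Encodings marked 'yes' in the given file-format column."""
    col = FORMATS.index(fmt)  # raises ValueError for an unknown format name
    return [enc for enc, row in UTFC_SUPPORT.items() if row[col]]


def formats_for(encoding: str) -> List[str]:
    """File formats marked 'yes' for one encoding (KeyError if not listed)."""
    row = UTFC_SUPPORT[encoding]
    return [fmt for fmt, ok in zip(FORMATS, row) if ok]


if __name__ == "__main__":
    print(formats_for("Bandrida"))  # ['txt', 'rtf', 'html']
    print(encodings_for("unicode txt"))
```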
Attu
The makers of PechaMaker have developed Attu, a Windows program that converts RTF documents set in any of a large number of legacy Tibetan fonts into Unicode.
Attu currently supports the conversion of the following legacy fonts into Unicode:[1]
Monlam: Monlam ouchan 1, Monlam ouchan 2, Monlam ouchan 3, Monlam ouchan 4, Monlam yigchong
Nitartha International (Sambhota): Dedris, Drutsa, Ededris, Khamdris, Sama
Tibetan Computer Company (TCC): TibetanMachine, TibetanChogyal, TibetanClassic, TibetanCalligraphic, DzongkhaCalligraphic
Tibetan Computing Resource Center (TCRC): TCRC Bod-Yig, TCRC Youtso, TCRC Youtsoweb
Library of Tibetan Works and Archives (LTWA): TB-Youtso, TB-TTYoutso, TB2-Youtso, TB2-TTYoutso
Others: LTibetan, LTibetanExtension, LMantra
See the Attu website for more information.
Although Attu is a Windows program, it also runs on Mac and Linux with Wine.
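Before converting, it can help to check which fonts an RTF document actually uses. RTF files declare their fonts in a `{\fonttbl ...}` group near the top of the file. The Python sketch below extracts the font names from that group with a simple regular expression (not a full RTF parser, so unusual entries may be skipped) and compares them case-insensitively against the Attu list above. Font names stored in real documents may differ from the spellings on the Attu website (for example numbered or lettered variants), so treat a mismatch as a hint to check by hand; the helper names are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Dict, List

# Legacy fonts Attu is listed as supporting (transcribed from the list above).
ATTU_FONTS = {
    "Monlam ouchan 1", "Monlam ouchan 2", "Monlam ouchan 3", "Monlam ouchan 4",
    "Monlam yigchong",
    "Dedris", "Drutsa", "Ededris", "Khamdris", "Sama",
    "TibetanMachine", "TibetanChogyal", "TibetanClassic",
    "TibetanCalligraphic", "DzongkhaCalligraphic",
    "TCRC Bod-Yig", "TCRC Youtso", "TCRC Youtsoweb",
    "TB-Youtso", "TB-TTYoutso", "TB2-Youtso", "TB2-TTYoutso",
    "LTibetan", "LTibetanExtension", "LMantra",
}

# One font entry: {\fN, then control words or simple {...} subgroups, then "Name;"
FONT_ENTRY = re.compile(r"\{\\f\d+(?:\\[a-zA-Z]+-?\d* ?|\{[^{}]*\})*\s*([^;{}\\]+);")


def font_table(rtf: str) -> str:
    """Return the {\\fonttbl ...} group, located by counting braces."""
    start = rtf.find("{\\fonttbl")
    if start == -1:
        return ""
    depth = 0
    for i in range(start, len(rtf)):
        if rtf[i] == "{":
            depth += 1
        elif rtf[i] == "}":
            depth -= 1
            if depth == 0:
                return rtf[start:i + 1]
    return rtf[start:]  # unterminated group: use the rest of the file


def fonts_used(rtf: str) -> List[str]:
    """Font names declared in the RTF font table, in file order."""
    return [name.strip() for name in FONT_ENTRY.findall(font_table(rtf))]


def attu_supported(rtf: str) -> Dict[str, bool]:
    """Map each declared font name to whether it appears on the Attu list."""
    known = {f.lower() for f in ATTU_FONTS}
    return {name: name.lower() in known for name in fonts_used(rtf)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # latin-1 maps every byte to a character, so decoding never fails.
    text = Path(sys.argv[1]).read_text(encoding="latin-1")
    for name, ok in attu_supported(text).items():
        print(f"{name}: {'on Attu list' if ok else 'not on Attu list'}")
```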
UDP
Tibetan/Dzongkha Font Based Formats[2]
UDP is a very useful program for converting legacy Tibetan text formats. It converts the following legacy formats into Tibetan Unicode:
TibetanMachine
TibetanMachineWeb
Tibetan Modern A
Robillard (LTibetan, etc.)
Sambhota including Dedris, Ededris, Esama/b/c, Sama/b/c, Samw
TIBETBT
fonts derived from the “P.R.C. National Standard for Tibetan (Extension A)” (aka “Set A”). For documentation, see Chris Fynn’s website: Tibetan Extension A. Chris Fynn’s Jomolhari font supports this standard.
TCRC Bod-Yig, TCRC Youtsoweb, TCRC Youtso
All Tibetan Unicode fonts (of course).
See the Pre-Unicode font list for download sources for most of these fonts.
How to use UDP
First-time configuration
Download UDP from the UDP website and install it. UDP can be installed on computers running Windows, or on computers running Linux with Wine.
Start UDP, select Options/Font..., select Unicode, and choose a Unicode Tibetan font.
Select Options/Advanced... and, under “Document are saved by default in:”, select Unicode RTF.
From now on, every document you load into UDP will be displayed in a Unicode font and saved as Unicode RTF by default. Unicode RTF files can be edited directly in OpenOffice or Microsoft Word.
Converting files
Note: These conversion procedures work best on Windows, but UDP can also be run on Linux with Wine (see below).
TibetDoc documents:
No conversion needed, continue with: Steps common to TibetDoc and Word documents
Word documents
Export as RTF: Save the document containing legacy Tibetan fonts as an RTF document.
Simplify the RTF encoding: Many word processors (such as Microsoft Word) create RTF files whose encoding is too complex for UDP and may cause it to crash. You can simplify the encoding by opening the RTF file in WordPad (included with Windows) and saving it again: WordPad writes RTF in a simpler form that UDP can process more easily. Steps: (1) Open the RTF file created in step 1 in WordPad. (2) Save it in WordPad without making changes.
Steps common to TibetDoc and Word documents
Load into UDP: Open the TibetDoc file, or the RTF file saved in the steps above, in UDP.
Create a Tibetan Unicode RTF file: In UDP, choose File/Save as... and select “Rich text Unicode” as the output format.
Done: Use any Unicode-capable application (e.g. LibreOffice) to work with the resulting file.
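To confirm that the conversion actually produced Tibetan Unicode, you can count the Tibetan characters in the saved file. RTF normally stores non-ASCII characters as `\uN` control words (N is a signed 16-bit decimal number), and this sketch assumes UDP's “Rich text Unicode” output does the same. The Python code below scans for those control words and counts the ones in the Unicode Tibetan block (U+0F00 to U+0FFF). It is a rough regex-based sanity check, not a full RTF parser; a count of zero suggests the text is still in a legacy encoding. `tibetan_char_count` is a hypothetical helper name.

```python
import re
import sys
from pathlib import Path

# \uN control word: N is a signed 16-bit decimal code point.
UNICODE_ESCAPE = re.compile(r"\\u(-?\d+)")

TIBETAN_BLOCK = range(0x0F00, 0x1000)  # Unicode Tibetan block, U+0F00..U+0FFF


def tibetan_char_count(rtf: str) -> int:
    """Count \\uN control words whose code point lies in the Tibetan block."""
    count = 0
    for match in UNICODE_ESCAPE.finditer(rtf):
        cp = int(match.group(1))
        if cp < 0:
            cp += 0x10000  # RTF writes code points above 32767 as negative numbers
        if cp in TIBETAN_BLOCK:
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # latin-1 maps every byte to a character, so decoding never fails.
    text = Path(sys.argv[1]).read_text(encoding="latin-1")
    print(f"{tibetan_char_count(text)} Tibetan characters found")
```

Control words such as `\uc1` or `\ul` are not counted, because the pattern requires digits (or a minus sign) directly after `\u`.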
Using UDP on Linux or OS X with Wine
UDP can be installed on Linux once Wine is installed: download the UDP installation program from the UDP website and run it with Wine.
Mac OS X users need to install Wine first, for example with Homebrew.
Converting between Wylie and Tibetan Unicode
ACIP conversion
The pyewts library by Elie Roux also supports conversion from ACIP to Unicode.
Converting Tibetan Unicode into phonetics
Other converters
THDL offers a comparison table of Tibetan converters and reverters: Tabular Survey Of Converters & Reverters For Tibetan.
Resources and standards
ACIP
A backup of the ACIP Tibetan Input Code Standards as of July 1998 (ticode.pdf) can be downloaded here.[3]
Wylie (EWTS)
TeachingEWTS.pdf by Alexandru Anton-Luca, the de facto standard for the Extended Wylie Transliteration Scheme.