Nathan W. Hill – A rule based tagger for Classical Tibetan


Event Date: 21 March 2014
Room T101
Language Centre
22 Russell Square
SOAS, University of London

The Department of Linguistics at SOAS presents:

Tibetan in Digital Communication

Tibetan in Digital Communication is a research project funded by the Arts and Humanities Research Council, engaged in building a 1,000,000-syllable part-of-speech tagged corpus of Tibetan texts spanning the language’s entire history. In addition to the corpus, the project is developing a number of digital tools that allow the corpus to be employed in many areas of humanities research and enable other researchers to develop their own corpora or software tools more easily.
The corpus will itself be a powerful resource for scholars working with Tibetan language materials in a wide range of disciplines, including history, religion, literature and linguistics, since it offers ready access to, and comparison across, texts from different time periods, regions and genres. It will also provide an important foundation for subsequent work on a historically comprehensive, lexicographically rigorous dictionary of Tibetan, akin to the Oxford English Dictionary.
Building this corpus for Tibetan will reduce the cost of developing language technologies such as text messaging, spellcheckers and machine-aided translation. These technologies would give Tibetans the choice to use their language as they see fit in a world that is increasingly shaped by digital communication.

Dr Nathan W. Hill (SOAS) – A rule based tagger for Classical Tibetan




