Event Date: 21 March 2014
22 Russell Square
SOAS, University of London
The Department of Linguistics at SOAS presents:
Tibetan in Digital Communication
Tibetan in Digital Communication is a research project funded by the Arts and Humanities Research Council that is building a 1,000,000-syllable, part-of-speech-tagged corpus of Tibetan texts spanning the language’s entire history. In addition to the corpus, the project is developing a number of digital tools that allow the corpus to be employed in many areas of humanities research and enable other researchers to develop their own corpora or software tools more easily.
The corpus will itself be a powerful resource for scholars working with Tibetan language materials in a wide range of disciplines, including history, religion, literature and linguistics, since it offers ready access to, and comparison across, texts from different time periods, regions and genres. It will also provide an important foundation for subsequent work on a historically comprehensive, lexicographically rigorous dictionary of Tibetan, akin to the Oxford English Dictionary.
Building this corpus for Tibetan will reduce the cost of developing language technologies such as text messaging, spellcheckers and machine-aided translation. These technologies would give Tibetans the choice to use their language as they see fit in a world that is increasingly shaped by digital communication.
Introduction to the day by Dr Nathan W. Hill (SOAS):
Dr Nathan W. Hill (SOAS) – Tibetan Word Breaking and Part of Speech Categories
Abel Zadoks (SOAS) – The Middle Tibetan auxiliary system
Dr Nathan W. Hill (SOAS) – A rule based tagger for Classical Tibetan
Dr Edward Garrett (SOAS) – An interface for corpus based Tibetan linguistics research