Development Of A Finite-State Model For Morphological Processing Of Tuvan

Document Type

Conference Proceeding

Publication Date

2016

Published In

Rodnoy Yazyk

Abstract

This paper describes the development of a free/open-source finite-state morphological transducer for Tuvan, a Turkic language spoken in and around the Tuvan Republic in Russia. The work uses the Helsinki Finite-State Toolkit (HFST), with the lexc formalism for modelling the morphotactics and the twol formalism for modelling morphophonological alternations. We describe how developing a transducer can provide new insight into grammatical generalisations, since the transducer functions as a testable model of the language’s morphology. On this basis, we add to the existing literature on Tuvan morphology a novel description of the morphological combinatorics of quasi-derivational morphemes, as well as some previously undescribed morphophonological phenomena. An evaluation shows that the transducer has reasonable coverage (around 93%) on freely available corpora of the language and high precision (over 99%) on a manually verified test set.
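As an illustration of the two evaluation figures, the following minimal Python sketch computes coverage and precision under their usual definitions: coverage as the share of corpus tokens that receive at least one analysis, and precision as the share of returned analyses that also appear in a hand-verified gold standard. These definitions are an assumption, since the abstract does not spell them out. The function names, the toy forms (`form1`, `form2`, `form3`) and the tag strings are hypothetical placeholders, not real Tuvan data or actual transducer output.

```python
from typing import Callable, Dict, Iterable, Set


def coverage(tokens: Iterable[str], analyse: Callable[[str], Set[str]]) -> float:
    """Share of tokens for which the analyser returns at least one analysis."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    analysed = sum(1 for token in tokens if analyse(token))
    return analysed / len(tokens)


def precision(predicted: Dict[str, Set[str]], gold: Dict[str, Set[str]]) -> float:
    """Share of returned analyses that also occur in the gold standard."""
    returned = 0
    correct = 0
    for form, analyses in predicted.items():
        returned += len(analyses)
        correct += len(analyses & gold.get(form, set()))
    return correct / returned if returned else 0.0


# Hypothetical stand-in for a transducer lookup (placeholder forms and tags).
toy_analyses: Dict[str, Set[str]] = {
    "form1": {"form1<n><nom>"},
    "form2": {"form2<v><past>", "form2<n><pl>"},
}

# Hypothetical hand-verified analyses for the same forms.
toy_gold: Dict[str, Set[str]] = {
    "form1": {"form1<n><nom>"},
    "form2": {"form2<v><past>"},
}

# A toy corpus in which "form3" is unknown to the analyser.
toy_corpus = ["form1", "form2", "form3", "form1"]

toy_coverage = coverage(toy_corpus, lambda t: toy_analyses.get(t, set()))
toy_precision = precision(toy_analyses, toy_gold)
```

On this toy data three of the four tokens are analysed, and two of the three returned analyses are in the gold standard. In a real evaluation the lookup would be backed by the compiled transducer rather than a dictionary.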

Keywords

Tuvan, morphological analysis, finite-state transducers
