Original Research
Die ontwikkeling van ’n fleksievormgenereerder vir Afrikaans
Literator | Vol 29, No 1 | a102 | DOI: https://doi.org/10.4102/lit.v29i1.102 | © 2008 S. Pilon | This work is licensed under CC Attribution 4.0
Submitted: 25 July 2008 | Published: 25 July 2008
About the author(s)
S. Pilon, Sentrum vir Tekstegnologie (CTexT), Potchefstroomkampus, Noordwes-Universiteit, South Africa
Full Text: PDF (260KB)
Abstract
The development of an inflected form generator for Afrikaans
This article describes the development of an inflected form generator for Afrikaans. Two requirements are set for the generator: it must be able to generate one specific inflected form of a lemma, and it must be able to generate all possible inflected forms of a lemma. The decision to use machine learning instead of the more traditional rule-based approach in developing this core technology is explained, and a brief overview is given of the development of LIA, a lemmatiser for Afrikaans. Experiments with three different methods show that the most effective way of developing an inflected form generator for Afrikaans is to train a separate classifier for each affix: one classifier generates the plural form, one the diminutive, one the plural of the diminutive, et cetera. The final inflected form generator for Afrikaans (AIL-3) reaches an average accuracy of 86.37% on the training data and 86.88% on a small amount of new data. With the help of a preprocessing module, AIL-3 is shown to meet the requirements set for an Afrikaans inflected form generator. Finally, suggestions are made on how to improve the accuracy of AIL-3.
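The idea of one classifier per affix can be illustrated with a minimal sketch. It is not the AIL-3 implementation: the class `AffixClassifier`, the helpers `edit_rule`, `apply_rule` and `generate_all`, the longest-matching-ending lookup used as the learner, and the tiny word lists are all assumptions made for illustration. Each classifier learns, for its own inflection category, which edit (characters to strip, suffix to add) tends to follow a given lemma ending.

```python
from collections import Counter, defaultdict


def edit_rule(lemma, form):
    """Return (chars_to_strip, suffix_to_add) that turns lemma into form."""
    i = 0
    while i < min(len(lemma), len(form)) and lemma[i] == form[i]:
        i += 1
    return (len(lemma) - i, form[i:])


def apply_rule(lemma, rule):
    """Apply a (chars_to_strip, suffix_to_add) rule to a lemma."""
    strip, suffix = rule
    return lemma[:len(lemma) - strip] + suffix


class AffixClassifier:
    """Hypothetical classifier for ONE inflection category.

    It stores, per lemma ending, how often each edit rule was seen, and
    predicts with the longest ending it knows (a simple stand-in for a
    real machine-learning algorithm).
    """

    def __init__(self, max_suffix=4):
        self.max_suffix = max_suffix
        self.votes = defaultdict(Counter)  # lemma ending -> rule counts

    def train(self, pairs):
        for lemma, form in pairs:
            rule = edit_rule(lemma, form)
            for n in range(min(self.max_suffix, len(lemma)) + 1):
                self.votes[lemma[len(lemma) - n:]][rule] += 1

    def generate(self, lemma):
        # Back off from the longest ending to the empty ending.
        for n in range(min(self.max_suffix, len(lemma)), -1, -1):
            ending = lemma[len(lemma) - n:]
            if ending in self.votes:
                rule = self.votes[ending].most_common(1)[0][0]
                return apply_rule(lemma, rule)
        return lemma  # untrained classifier: leave the lemma unchanged


# Illustrative training data only (not taken from the article).
TRAINING = {
    "plural": [("hond", "honde"), ("boek", "boeke"), ("kat", "katte"),
               ("tafel", "tafels")],
    "diminutive": [("hond", "hondjie"), ("boek", "boekie"),
                   ("kat", "katjie"), ("tafel", "tafeltjie")],
}

# One separately trained classifier per affix category.
generators = {}
for category, pairs in TRAINING.items():
    clf = AffixClassifier()
    clf.train(pairs)
    generators[category] = clf


def generate_all(lemma):
    """Generate every inflected form the trained classifiers cover."""
    return {cat: clf.generate(lemma) for cat, clf in generators.items()}


print(generators["plural"].generate("mond"))  # one specific form: monde
print(generate_all("koek"))  # all forms: plural koeke, diminutive koekie
```

Keeping the classifiers separate mirrors the two requirements stated in the abstract: a single classifier is queried when one specific inflected form is wanted, and all classifiers are queried in turn when every inflected form of a lemma is wanted.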
Keywords
Afrikaans linguistics; Core Technologies; Inflected Form Generator; Lemmatiser; Machine Learning