Outomatiese lemma-identifisering vir Afrikaans

H.J. Groenewald; G.B. van Huyssteen

doi:10.4102/lit.v29i1.101

Original Research

About the author(s)

H.J. Groenewald, Sentrum vir Tekstegnologie (CTexT), Potchefstroomkampus, Noordwes-Universiteit, South Africa
G.B. van Huyssteen, Sentrum vir Tekstegnologie (CTexT), Potchefstroomkampus, Noordwes-Universiteit, South Africa

Abstract

Automatic lemmatisation for Afrikaans

Automatic lemmatisation is a general normalisation procedure in text processing, in which all inflected forms of a lexical word are normalised to a single lemma (i.e. a meaningful, uninflected base form from which more complex word forms can be formed). Traditionally, lemmatisers are developed by writing language-specific rules to identify lemmas. This article investigates an alternative, a machine learning approach, to develop a lemmatiser for Afrikaans (LIA: “Lemmaidentifiseerder vir Afrikaans”). An overview of inflection in Afrikaans is provided in order to identify the categories of inflection that are relevant for lemmatisation. The format of the input and output is described with special reference to the nine inflectional categories that the system should be able to handle. Lemmatisation is then framed as a classification task for machine learning, and a concise introduction to memory-based learning is provided. The development and evaluation of LIA are discussed in detail, and it is shown how the performance of the initial classifier is improved through feature selection and parameter optimisation. The best classifier reaches an accuracy of 92.8%. The article concludes with an outlook on future work.
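
To make the idea of lemmatisation as a classification task concrete, the following is a minimal sketch in Python. It is not LIA itself, and the abstract does not specify the class or feature design. The sketch assumes that each class is a suffix transformation (letters to strip, letters to add), that the features are the last six letters of the word, and that classification is a nearest-neighbour vote using a simple overlap count. All names (`features`, `derive_class`, `MemoryBasedLemmatiser`, `TRAINING_PAIRS`) are hypothetical, and the toy data cover only suffixal inflection.

```python
from collections import Counter

N_FEATURES = 6   # assumption: number of right-aligned letters used as features
PAD = "_"        # padding symbol for words shorter than N_FEATURES

# Toy training data (word form, lemma); illustrative only.
TRAINING_PAIRS = [
    ("honde", "hond"),
    ("boeke", "boek"),
    ("katte", "kat"),
    ("tafels", "tafel"),
    ("huisie", "huis"),
]


def features(word, n=N_FEATURES):
    """Return the last n letters of the word, left-padded, as a feature tuple."""
    padded = PAD * n + word.lower()
    return tuple(padded[-n:])


def derive_class(word, lemma):
    """Encode the word -> lemma change as (suffix to strip, suffix to add)."""
    i = 0
    while i < min(len(word), len(lemma)) and word[i] == lemma[i]:
        i += 1
    return (word[i:], lemma[i:])


def apply_class(word, cls):
    """Apply a (strip, add) class; return the word unchanged if it does not fit."""
    strip, add = cls
    if not word.endswith(strip):
        return word
    return word[:len(word) - len(strip)] + add


class MemoryBasedLemmatiser:
    """Stores every training instance and classifies by nearest neighbours."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (feature tuple, class) instances

    def train(self, pairs):
        for word, lemma in pairs:
            self.memory.append((features(word), derive_class(word, lemma)))

    def classify(self, word):
        target = features(word)

        def overlap(instance):
            # Number of feature positions that match the target exactly.
            return sum(a == b for a, b in zip(instance[0], target))

        nearest = sorted(self.memory, key=overlap, reverse=True)[:self.k]
        votes = Counter(cls for _, cls in nearest)
        return votes.most_common(1)[0][0]

    def lemmatise(self, word):
        return apply_class(word, self.classify(word))


lemmatiser = MemoryBasedLemmatiser(k=1)
lemmatiser.train(TRAINING_PAIRS)
print(lemmatiser.lemmatise("stoele"))  # unseen plural form
```

In this framing, feature selection would correspond to choosing which letter positions are compared, and parameter optimisation to settings such as `k` and the similarity measure. Both are fixed arbitrarily here.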

Keywords

Afrikaans; Feature Selection; Inflection; Lemmatisation; Machine Learning; Morphology; Natural Language Processing; Parameter Optimisation; Text Technology

All articles published in this journal are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, unless otherwise stated.

Literator | ISSN: 0258-2279 (PRINT) | ISSN: 2219-8237 (ONLINE)