Die ontwikkeling van ’n woordafbreker en kompositumanaliseerder vir Afrikaans

S. Pilon; M.J. Puttkammer; G.B. van Huyssteen

doi:10.4102/lit.v29i1.99

Original Research

About the author(s)

S. Pilon, Sentrum vir Tekstegnologie (CTexT), Potchefstroomkampus, Noordwes-Universiteit, South Africa
M.J. Puttkammer, Sentrum vir Tekstegnologie (CTexT), Potchefstroomkampus, Noordwes-Universiteit, South Africa
G.B. van Huyssteen, Sentrum vir Tekstegnologie (CTexT), Potchefstroomkampus, Noordwes-Universiteit, South Africa

Abstract

The development of a hyphenator and compound analyser for Afrikaans

This article describes the development of two core-technologies for Afrikaans, viz. a hyphenator and a compound analyser. As no annotated Afrikaans data existed prior to this project to serve as training data for a machine learning classifier, the core-technologies in question were first developed using a rule-based approach. When evaluated, the rule-based hyphenator obtained an f-score of 90,84%, while the rule-based compound analyser reached an f-score of only 78,20%. Since these results were somewhat disappointing and insufficient for practical implementation, it was decided to use a machine learning technique (memory-based learning) instead. Training data for each of the two core-technologies was then developed using “TurboAnnotate”, an interface designed to improve the accuracy and speed of manual annotation. The machine-learned hyphenator, trained on 39 943 words, reaches an f-score of 98,11%, while the compound analyser reaches an f-score of 90,57% after being trained on 77 589 annotated words. It is concluded that machine learning (specifically memory-based learning) seems an appropriate approach for developing core-technologies for Afrikaans.
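To make the two technical notions in the abstract concrete, the following is a minimal sketch of memory-based (nearest-neighbour) hyphenation and of the f-score calculation. It is not the authors' system: the abstract does not specify features, window size, similarity metric or learner settings, so the character-window features, the overlap metric, all function names and the three toy training words are assumptions made purely for illustration.

```python
from collections import Counter


def windows(word, size=2):
    """Yield one character window per gap between two adjacent letters.

    The window holds `size` characters on each side of the gap
    (an assumed feature design; padded with '_' at the word edges).
    """
    padded = "_" * size + word + "_" * size
    for gap in range(1, len(word)):
        centre = gap + size
        yield tuple(padded[centre - size:centre + size])


def instances(annotated, size=2):
    """Turn an annotated word such as 'wa-ter' into (window, is_break) pairs."""
    word = annotated.replace("-", "")
    breaks, pos = set(), 0
    for ch in annotated:
        if ch == "-":
            breaks.add(pos)  # a break sits after `pos` letters
        else:
            pos += 1
    return [(w, gap in breaks)
            for gap, w in enumerate(windows(word, size), start=1)]


def classify(window, memory, k=1):
    """Majority vote among the k stored windows sharing the most positions."""
    def overlap(a, b):
        return sum(x == y for x, y in zip(a, b))

    ranked = sorted(memory, key=lambda inst: overlap(window, inst[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def hyphenate(word, memory, size=2, k=1):
    """Insert '-' at every gap that the stored instances label as a break."""
    out = [word[0]]
    for gap, w in enumerate(windows(word, size), start=1):
        if classify(w, memory, k):
            out.append("-")
        out.append(word[gap])
    return "".join(out)


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall over predicted break points."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy "memory": illustrative words only, not data from the article.
memory = [inst for w in ("wa-ter", "ta-fel", "boe-ke") for inst in instances(w)]
print(hyphenate("water", memory))
print(f_score(tp=2, fp=1, fn=1))
```

The learner here simply stores every annotated gap and compares new gaps against that store, which is the defining idea of memory-based learning. A realistic system would need far more training instances (the abstract reports tens of thousands of annotated words) and a weighted similarity metric.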

Keywords

Afrikaans Linguistics; Compound Analyser; Core-Technologies; Hyphenator; Machine Learning

All articles published in this journal are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, unless otherwise stated.

Literator | ISSN: 0258-2279 (PRINT) | ISSN: 2219-8237 (ONLINE)