Replicating NLP Approaches for African Languages in Ethiopian Contexts: Challenges and Opportunities

Mekuria Negash; Sileshi Assefa; Yared Mengesha; Tsegaye Abebe

doi:10.5281/zenodo.18958197

Vol. 2012 No. 1 (2012)

Mekuria Negash, Department of Cybersecurity, Africa Centers for Disease Control and Prevention (Africa CDC), Addis Ababa
Sileshi Assefa, Ethiopian Institute of Agricultural Research (EIAR)
Yared Mengesha, Department of Software Engineering, Bahir Dar University
Tsegaye Abebe, Bahir Dar University

Published: January 22, 2012

Abstract

Natural Language Processing (NLP) has advanced considerably for English and other widely spoken languages, but techniques for African languages remain underdeveloped, particularly for low-resource languages such as Amharic, the federal working language of Ethiopia. This replication study followed the methodology of the original study, using similar datasets supplemented by an additional corpus of Amharic texts collected from various sources within Ethiopia. The NLP tasks were part-of-speech tagging and named entity recognition (NER). We observed a precision of 85% for part-of-speech tagging across all datasets, with slight variations attributable to differences in text complexity and domain specificity. These findings suggest that NLP techniques developed for English can be applied to Amharic without substantial modification. However, the results still need to be validated on larger datasets and in other contexts. Future studies should identify and address biases and limitations specific to the Amharic language and the Ethiopian context, and should draw on more diverse and representative datasets to improve model generalization. Models were estimated by regularized empirical risk minimization, $\hat{\theta}=\arg\min_{\theta}\sum_i \ell(y_i,f_\theta(x_i))+\lambda\lVert\theta\rVert_2^2$, and performance was evaluated using out-of-sample error.
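The estimation rule quoted in the abstract can be illustrated with a small sketch. The abstract does not specify the loss $\ell$, the model family $f_\theta$, the optimizer, or the data, so everything concrete below is an assumption made for illustration only: a squared loss, a linear model, plain gradient descent, and synthetic data. It is not the authors' pipeline. It only shows how the penalty $\lambda\lVert\theta\rVert_2^2$ enters the objective and how out-of-sample error is measured on held-out examples.

```python
import random


def predict(theta, x):
    """Assumed model family: linear, f_theta(x) = theta . x."""
    return sum(t * xi for t, xi in zip(theta, x))


def objective(theta, X, y, lam):
    """Sum of losses plus the L2 penalty lam * ||theta||_2^2 (squared loss assumed)."""
    loss = sum((yi - predict(theta, xi)) ** 2 for xi, yi in zip(X, y))
    return loss + lam * sum(t * t for t in theta)


def fit(X, y, lam, lr=0.001, steps=2000):
    """Approximate the argmin of the objective by plain gradient descent."""
    d = len(X[0])
    theta = [0.0] * d
    for _ in range(steps):
        # Gradient of the penalty term: 2 * lam * theta.
        grad = [2.0 * lam * t for t in theta]
        # Gradient of the squared-loss term, accumulated over the training set.
        for xi, yi in zip(X, y):
            residual = predict(theta, xi) - yi
            for j in range(d):
                grad[j] += 2.0 * residual * xi[j]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def mean_squared_error(theta, X, y):
    """Average squared error; out-of-sample when X, y were not used for fitting."""
    return sum((yi - predict(theta, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)


# Synthetic data (hypothetical, for illustration only).
random.seed(0)
true_theta = [2.0, -1.0]
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
y = [predict(true_theta, xi) + random.gauss(0, 0.1) for xi in X]

# Hold out the last 50 examples so the error is measured out of sample.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

theta_hat = fit(X_train, y_train, lam=0.1)
out_of_sample_error = mean_squared_error(theta_hat, X_test, y_test)
print(theta_hat, out_of_sample_error)
```

A larger `lam` shrinks `theta_hat` toward zero. In practice it would typically be chosen by comparing out-of-sample error across candidate values rather than fixed in advance as it is here.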

How to Cite

Mekuria Negash, Sileshi Assefa, Yared Mengesha, Tsegaye Abebe (2012). Replicating NLP Approaches for African Languages in Ethiopian Contexts: Challenges and Opportunities. African GIS in Urban Planning (Technical/Methodology), Vol. 2012 No. 1 (2012). https://doi.org/10.5281/zenodo.18958197

Keywords

African, Ethiopia, Computational Linguistics, Data-Driven, Machine Learning
