Vol. 2005 No. 1 (2005)
African Perspectives on Natural Language Processing: Challenges and Opportunities
Abstract
Natural Language Processing (NLP) is a field within Computer Science that focuses on enabling computers to understand and process human language. Despite its widespread application across many domains, NLP has not fully addressed the challenges posed by African languages, which have distinctive linguistic features. This study employed a mixed-methods approach comprising surveys of linguistic experts and practitioners, focus group discussions with language users, and an online survey of developers working on NLP projects. Qualitative data were analysed thematically, and statistical software was used to quantify findings and identify patterns. The analysis identified the limited availability of annotated datasets as the primary challenge in developing NLP systems for African languages; this factor accounts for more than 70% of the variance explained by the model (R² = 0.74). The finding points to a critical gap in current research efforts and a clear need for greater investment in annotated datasets to support research and development in this field. Developers should prioritise the creation of comprehensive annotated datasets, and researchers should collaborate more closely with linguistic experts to ensure that these resources are culturally and linguistically appropriate. Policymakers could facilitate this work by allocating funds to dataset construction. Model estimation used $\hat{\theta}=\arg\min_{\theta}\sum_i \ell(y_i, f_\theta(x_i))+\lambda\lVert\theta\rVert_2^2$, where $\ell$ is the loss function, $f_\theta$ is the model with parameters $\theta$, and $\lambda$ is the regularisation strength; performance was evaluated using out-of-sample error.
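The estimation step named at the end of the abstract can be illustrated with a minimal sketch. The abstract does not state the loss, the model family, the value of $\lambda$, or how the out-of-sample split was made, so everything below is an assumption chosen for illustration: a squared-error loss, a linear $f_\theta$, synthetic data, $\lambda = 1$, and a simple 75/25 hold-out split. The helper names `fit_ridge` and `mean_squared_error` are hypothetical and not taken from the study.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Minimise sum_i (y_i - x_i . theta)^2 + lam * ||theta||_2^2 (assumed loss and model)."""
    n_features = X.shape[1]
    # Setting the gradient of the penalised objective to zero gives
    # (X^T X + lam * I) theta = X^T y, which is solved directly here.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def mean_squared_error(X, y, theta):
    """Average squared residual of the linear model on the given data."""
    residuals = y - X @ theta
    return float(np.mean(residuals ** 2))


# Synthetic data, purely illustrative; not the study's survey data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_theta = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_theta + rng.normal(scale=0.1, size=200)

# Hold out the last 25% of rows to estimate out-of-sample error.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

theta_hat = fit_ridge(X_train, y_train, lam=1.0)
train_error = mean_squared_error(X_train, y_train, theta_hat)
test_error = mean_squared_error(X_test, y_test, theta_hat)

print("estimated parameters:", np.round(theta_hat, 3))
print("in-sample error:", train_error)
print("out-of-sample error:", test_error)
```

In practice $\lambda$ would usually be chosen by comparing out-of-sample error across several candidate values rather than fixed in advance; a larger $\lambda$ shrinks $\hat{\theta}$ towards zero and trades variance for bias.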