Replication Study on Natural Language Processing for African Languages in Cape Verde Context

Fernando Coelho Lima

doi:10.5281/zenodo.18819594

Abstract

Natural Language Processing (NLP) is a core area of computer science, with applications ranging from machine translation to sentiment analysis. In Africa, and in Cape Verde in particular, where multiple languages and dialects coexist, NLP research has been held back by linguistic diversity and limited resources. This replication study re-analyses data from previous studies using established machine learning algorithms and evaluates their performance across several linguistic tasks in Cape Verdean Creole, with particular attention to patterns that may not have been apparent in the original research. Model parameters were estimated by L2-regularised loss minimisation, $\hat{\theta}=\arg\min_{\theta}\sum_{i}\ell(y_i,f_{\theta}(x_i))+\lambda\lVert\theta\rVert_2^2$, and performance was evaluated using out-of-sample error. A key finding is a statistically significant improvement (p < 0.05) in sentiment analysis accuracy when using a custom pre-trained model tailored to Cape Verdean Creole, which points to the potential benefits of localised linguistic resources. The study confirms and extends the applicability of NLP models to African languages in the Cape Verdean context and contributes insights into resource development for under-resourced languages. Future work should extend the pre-trained models to additional Cape Verdean Creole dialects and integrate user feedback loops to refine model performance continuously.

Keywords: Natural Language Processing, African Languages, NLP Models, Cape Verde, Machine Learning
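
The penalised estimator stated in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss $\ell$, the model $f_{\theta}$, or the data, so the example below assumes a squared loss, a linear model $f_{\theta}(x)=x^{\top}\theta$, and synthetic data. Under those assumptions the estimator reduces to ridge regression with a closed-form solution. The function names (fit_ridge, objective, out_of_sample_mse) and all numeric settings are hypothetical and chosen only for illustration; this is not the study's implementation.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Minimise sum_i (y_i - x_i^T theta)^2 + lam * ||theta||_2^2.

    Assumes squared loss and a linear model; the closed form follows from
    the normal equations (X^T X + lam * I) theta = X^T y.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def objective(theta, X, y, lam):
    """Value of the penalised objective at theta (same assumptions as above)."""
    residual = y - X @ theta
    return float(residual @ residual + lam * (theta @ theta))


def out_of_sample_mse(theta, X_test, y_test):
    """Mean squared error on data that was not used for fitting."""
    return float(np.mean((y_test - X_test @ theta) ** 2))


# Synthetic data, purely illustrative (not the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_theta = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ true_theta + rng.normal(scale=0.1, size=200)

# Hold out the last 50 rows to estimate out-of-sample error.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

lam = 1.0  # hypothetical regularisation strength
theta_hat = fit_ridge(X_train, y_train, lam)
test_error = out_of_sample_mse(theta_hat, X_test, y_test)
print("theta_hat:", theta_hat)
print("out-of-sample MSE:", test_error)
```

In practice $\lambda$ would typically be selected by comparing out-of-sample error across candidate values. With a non-quadratic loss such as cross-entropy for sentiment classification, the closed form would be replaced by an iterative optimiser.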