Abstract
Water treatment systems in many developing nations face chronic underperformance, leading to unreliable supply. Accurate forecasting of system yield is critical for infrastructure planning and operational management, yet robust, context-specific models are often lacking. This study aimed to develop and evaluate a novel time-series forecasting model that predicts treated water yield, thereby providing a methodological framework for performance improvement in treatment facilities. A seasonal autoregressive integrated moving average model with exogenous variables (SARIMAX) was developed, formalised as $\phi(B)\Phi(B^s)\nabla^d\nabla_s^D y_t = \theta(B)\Theta(B^s)\epsilon_t + \beta X_t$, where $y_t$ is the treated water yield at time $t$, $B$ is the backshift operator, $s$ is the seasonal period, and $X_t$ represents rainfall and operational expenditure. The model was trained and validated using high-frequency operational data from multiple facilities. It achieved a mean absolute percentage error (MAPE) of 8.7% on test data. One-year-ahead forecasts, reported with 95% confidence intervals, indicate a potential yield improvement of 12–18% through optimised chemical dosing schedules aligned with predicted raw water quality. The proposed SARIMAX model provides a statistically robust and operationally actionable tool for forecasting treated water yield, and it was more accurate than conventional moving-average approaches in this context. Water utilities should integrate this forecasting methodology into their asset management systems to enable proactive maintenance and resource allocation. Further research should focus on real-time model integration using supervisory control and data acquisition (SCADA) systems.

Keywords: water treatment yield, time-series analysis, SARIMAX, forecasting, infrastructure performance, operational management

This paper presents a novel application of a SARIMAX model incorporating local climatic and operational drivers to forecast water treatment yield, providing a new evidence-based tool for engineers managing similar systems in resource-constrained settings.
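
To make the evaluation terms used in this abstract concrete, the following Python sketch shows how a mean absolute percentage error can be computed, how a flat moving-average baseline forecast can be formed, and what one pass of the seasonal differencing operator $\nabla_s$ does to a series. It is a minimal illustration under stated assumptions, not the study's implementation: the function names, the window length, the seasonal period, and the example values are all hypothetical, and the exact form of the moving-average baseline used in the study is not specified here.

```python
from statistics import mean


def mape(actual, forecast):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * mean(abs(a - f) / abs(a) for a, f in zip(actual, forecast))


def moving_average_forecast(history, window, horizon):
    """Flat baseline: repeat the mean of the last `window` observations.

    Assumption: this is one common reading of a 'conventional
    moving-average approach'; the study may have used another variant.
    """
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    level = mean(history[-window:])
    return [level] * horizon


def seasonal_difference(series, s):
    """Apply the seasonal difference once: z_t = y_t - y_{t-s}."""
    if s <= 0 or s >= len(series):
        raise ValueError("s must be between 1 and len(series) - 1")
    return [series[t] - series[t - s] for t in range(s, len(series))]


if __name__ == "__main__":
    # Hypothetical monthly yield values, for illustration only.
    train = [410.0, 395.0, 380.0, 402.0, 415.0, 398.0, 385.0, 407.0]
    test = [420.0, 401.0, 388.0, 410.0]

    baseline = moving_average_forecast(train, window=4, horizon=len(test))
    print("Baseline MAPE (%):", round(mape(test, baseline), 2))
    print("Seasonally differenced (s=4):", seasonal_difference(train, s=4))
```

A fitted SARIMAX model would replace `moving_average_forecast` in this comparison, with its forecasts scored against the same held-out values by the same `mape` function.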