Abstract. Nutrient data from catchments discharging to receiving waters are collected to support catchment
management. However, nutrient data are often sparse in time and space and respond non-linearly
to environmental factors, making it difficult to systematically analyse long- and
short-term trends and construct nutrient budgets. To address these challenges, we developed a
hybrid machine learning (ML) framework that first separated baseflow and quickflow from total
flow, then generated data for missing nutrient species, and finally used the pre-generated
nutrient data as additional variables in the simulation of tributary water quality. Hybrid random
forest (RF) and gradient boosting machine (GBM) models were employed and their performance
compared with a linear model, a multivariate weighted regression model, and stand-alone RF and GBM
models that did not pre-generate nutrient data. The six models were used to predict six different
nutrients discharged from two study sites in Western Australia: Ellen Brook (small and ephemeral)
and the Murray River (large and perennial). Our results showed that the hybrid RF and GBM models
had significantly higher accuracy and lower prediction uncertainty for almost all nutrient species
across the two sites. The pre-generated nutrient and hydrological data were identified as the
most important components of the hybrid model. The model results also indicated different
hydrological transport pathways for total nitrogen (TN) export from the two tributary
catchments. We demonstrated that
the hybrid model provides a flexible method to combine data of varied resolution and quality and
accurately predicts the response of surface water nutrient concentrations to hydrologic
variability.
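
As an illustration of the first step of the framework (separating baseflow and quickflow from total flow), a minimal sketch is given below. The abstract does not state which separation method was used; the sketch assumes a single forward pass of a one-parameter recursive digital filter of the Lyne-Hollick type. The function name `separate_flow`, the filter parameter `alpha = 0.925`, and the example flow series are all hypothetical choices made for illustration.

```python
def separate_flow(total_flow, alpha=0.925):
    """Split total flow into (baseflow, quickflow).

    Assumption: one forward pass of a Lyne-Hollick-type recursive filter;
    the method actually used in the study is not stated in the abstract.
    """
    if not total_flow:
        return [], []
    quick = [0.0]  # assume the record starts with baseflow only
    for t in range(1, len(total_flow)):
        q = alpha * quick[-1] + 0.5 * (1 + alpha) * (total_flow[t] - total_flow[t - 1])
        # keep quickflow physically plausible: non-negative, no larger than total flow
        quick.append(min(max(q, 0.0), total_flow[t]))
    base = [q_total - q_quick for q_total, q_quick in zip(total_flow, quick)]
    return base, quick


# Hypothetical daily flow series containing one storm peak
flow = [1.0, 1.0, 5.0, 3.0, 2.0, 1.5, 1.2]
baseflow, quickflow = separate_flow(flow)
```

The two resulting series could then be supplied, together with the pre-generated nutrient data, as predictor variables to the RF or GBM models.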