Abstract. Evapotranspiration (ET) is a major component of the land surface processes that govern energy fluxes and the energy balance, and it is especially important in the hydrological cycle of agricultural ecosystems. Although many models have been developed to estimate ET, there is no agreement on which model performs best. In this study, we evaluate four widely used ET models, namely the Shuttleworth-Wallace (SW) model, the Penman-Monteith (PM) model, the Priestley-Taylor and Flint-Childs (PT-FC) model, and the Advection-Aridity (AA) model, using half-hourly ET observations obtained at a spring maize field in an arid region. The evaluation is based on Bayesian model comparison and ranking using the Bayesian model evidence (BME), which balances goodness-of-fit to the data against model complexity. The BME-based model ranking, from best to worst, is SW, PM, PT-FC, and AA. The residuals between observations and the corresponding model simulations are also analyzed, and the same ranking is obtained from residual-based statistics, i.e., the coefficient of determination (R2), index of agreement (IA), root mean square error (RMSE), and model efficiency (EF). Over the study period as a whole, the PM and SW models overestimate ET, whereas the PT-FC and AA models underestimate it. During periods of partial crop cover, however, all four models underestimate ET. The PT-FC and AA models underestimate ET most consistently during the late maturity stage, where they give the worst simulated ET. Consequently, at the half-hourly time scale, the SW model performs best and is recommended as the first choice for estimating ET of spring maize in arid desert oasis areas.
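For readers unfamiliar with the residual-based statistics named above, the following is a minimal sketch of how R2, IA, RMSE, and EF might be computed from paired observations and simulations. The abstract does not state the formulas, so the definitions used here are assumptions: R2 as the squared Pearson correlation, IA as Willmott's index of agreement, and EF as the Nash-Sutcliffe form of model efficiency. The function name `residual_stats` and the sample values are hypothetical and are not data from this study.

```python
import math


def residual_stats(obs, sim):
    """Return R2, IA, RMSE and EF for paired observed and simulated values.

    Assumed definitions (not confirmed by the source text):
      R2   = squared Pearson correlation between obs and sim
      IA   = 1 - SSE / sum((|sim - mean(obs)| + |obs - mean(obs)|)^2)
      RMSE = sqrt(SSE / n)
      EF   = 1 - SSE / sum((obs - mean(obs))^2)
    """
    n = len(obs)
    if n == 0 or n != len(sim):
        raise ValueError("obs and sim must be non-empty and of equal length")

    o_mean = sum(obs) / n
    s_mean = sum(sim) / n

    # Sum of squared residuals between simulation and observation.
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))

    # Sums of squared deviations and cross-deviations about the means.
    ss_obs = sum((o - o_mean) ** 2 for o in obs)
    ss_sim = sum((s - s_mean) ** 2 for s in sim)
    cross = sum((o - o_mean) * (s - s_mean) for o, s in zip(obs, sim))

    # Potential error term used in the denominator of the index of agreement.
    potential = sum((abs(s - o_mean) + abs(o - o_mean)) ** 2
                    for o, s in zip(obs, sim))

    return {
        "R2": cross ** 2 / (ss_obs * ss_sim),
        "IA": 1.0 - sse / potential,
        "RMSE": math.sqrt(sse / n),
        "EF": 1.0 - sse / ss_obs,
    }


if __name__ == "__main__":
    # Made-up values for illustration only; not measurements from this study.
    observed = [0.05, 0.12, 0.20, 0.31, 0.27, 0.15]
    simulated = [0.07, 0.15, 0.22, 0.36, 0.30, 0.14]
    for name, value in residual_stats(observed, simulated).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, a better model has R2, IA, and EF closer to 1 and a smaller RMSE. The sketch does not handle the degenerate case in which the observations or the simulations are constant, where R2 and EF are undefined.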