Basin-centric long short-term memory (LSTM) network models have recently
been shown to be an exceptionally powerful tool for simulating stream
temperature (Ts, temperature measured in rivers), among other
hydrological variables. However, spatial extrapolation is a well-known
challenge to modeling Ts and it is uncertain how an LSTM-based daily Ts
model will perform in unmonitored or dammed basins. Here we compiled a
new benchmark dataset consisting of >400 basins across
the contiguous United States, divided into different data availability
groups (DAGs, defined by daily sampling frequency) with or without major
dams, and studied how to assemble suitable training datasets for
predictions in monitored or unmonitored situations. For temporal generalization,
CONUS-median best root-mean-square error (RMSE) values for sites with
extensive (99%), intermediate (60%), scarce (10%), and absent (0%,
unmonitored) data for training were 0.75, 0.83, 0.88, and 1.59°C,
representing the state of the art. For prediction in unmonitored basins
(PUB), LSTM’s results surpassed those reported in the literature. Even
for unmonitored basins with major reservoirs, we obtained a median RMSE
of 1.492°C and an R² of 0.966. The most suitable training set was the
DAG matching the basin's own data availability, e.g., the 60% DAG
for a basin with 61% data availability. However, for PUB, a training
dataset including all basins with data is preferred. An input-selection
ensemble moderately mitigated attribute overfitting. Our results suggest
there are influential latent processes not sufficiently described by the
inputs (e.g., geology, wetland covers), but temporal fluctuations are
well predictable, and LSTM appears to be the most accurate Ts modeling
tool when sufficient training data are available.