Understanding catchment influences on flood generating processes - accounting for correlated attributes

Author(s):  
Lina Stein ◽  
Martyn Clark ◽  
Francesca Pianosi ◽  
Wouter Knoben ◽  
Ross Woods

Understanding flood generating mechanisms is critical for model development and evaluation. While several studies analyse how catchment attributes influence flood magnitude and duration, very few examine how they influence flood generating processes. Based on prior knowledge about runoff behaviour and flood generation, we assume that flood processes depend not only on climate, but also on catchment characteristics such as topography, vegetation and geology. Specifically, we hypothesize that the influence of catchment attributes on flood processes varies between climate types. We tested our hypothesis on the CAMELS dataset, a large sample (671) of catchments in the United States. We classified 61,828 flood events into flood process types using a previously published location-independent classification methodology. Then we quantified the importance of both individual attributes (comparing probability distributions of different flood types) and interacting attributes (using random forests). Accumulated local effects make the random forests interpretable even when attributes are correlated. Results show that climate attributes most strongly influence the distribution of flood generating processes within a catchment. However, other catchment attributes can be influential, depending on climate type. Based on the subset of influential catchment attributes, a random forest model can predict flood generating processes with high accuracy for most processes and climates, demonstrating the potential to predict flood processes in ungauged catchments. Some attributes proved less influential than common hydrologic knowledge would suggest and are not informative in predicting flood process distribution.
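A minimal sketch of the classification step described above: a random forest trained on catchment attributes to predict a flood process type. The attribute names, the decision rule, and the process labels below are illustrative stand-ins, not the CAMELS attributes or the study's classes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400
# Hypothetical catchment attributes: aridity, mean slope, forest fraction
X = rng.random((n, 3))
# Toy rule standing in for real flood process labels
y = np.where(X[:, 0] + X[:, 1] > 1.0, "excess_rain", "snowmelt")

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy labels
```

On real data, attribute importance would then be probed with accumulated local effects rather than raw feature importances, since ALE remains meaningful when attributes are correlated.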

Author(s):  
Roger L. Wayson ◽  
Kenneth Kaliski

Modeling road traffic noise levels without including the effects of meteorology may lead to substantial errors. In the United States, the required model is the Traffic Noise Model, which does not include meteorological effects caused by refraction. In response, the Transportation Research Board sponsored NCHRP 25-52, Meteorological Effects on Roadway Noise, to collect highway noise data under different meteorological conditions, document the meteorological effects on roadway noise propagation under different atmospheric conditions, develop best practices, and provide guidance on how to: (a) quantify meteorological effects on roadway noise propagation; and (b) explain those effects to the public. The completed project, at 16 barrier and no-barrier measurement positions adjacent to Interstate 17 (I-17) in Phoenix, Arizona, provided the database that has enabled substantial developments in modeling. This report provides more recent information on the model development that the noise analyst can apply directly to include meteorological effects, ranging from simple look-up tables to more precise statistical equations.


Author(s):  
David Berry

Abstract Healthcare is fully embracing the promise of Big Data for improving performance and efficiency. Such a paradigm shift, however, brings many unforeseen impacts, both positive and negative. Healthcare has largely looked at business models for inspiration to guide model development and practical implementation of Big Data. Business models, however, are limited in their application to healthcare, as the two represent a complicated system versus a complex system, respectively. Healthcare must, therefore, look toward other examples of complex systems to better gauge the potential impacts of Big Data. Military systems have many similarities with healthcare, with a wealth of systems research, as well as practical field experience, from which healthcare can draw. The experience of the United States Military with Big Data during the Vietnam War is a case study with striking parallels to issues described in modern healthcare literature. Core principles can be extracted from this analysis that will need to be considered as healthcare seeks to integrate Big Data into its active operations.


Author(s):  
Shane E. Powers ◽  
William C. Wood

With the renewed interest in the construction of coal-fired power plants in the United States, there has also been an increased interest in the methodology used to determine the overall performance of a coal-fired power plant. This methodology is detailed in the ASME PTC 46 (1996) Code, which provides an excellent framework for determining the power output and heat rate of coal-fired power plants. Unfortunately, the power industry has been slow to adopt this methodology, in part because the Code lacks some details regarding the planning needed to design a performance test program for the determination of coal-fired power plant performance. This paper expands on the ASME PTC 46 (1996) Code by discussing key concepts that need to be addressed when planning an overall plant performance test of a coal-fired power plant. The most difficult aspect of calculating coal-fired power plant performance is integrating the calculation of boiler performance with the calculation of turbine cycle performance and other balance-of-plant aspects. If the performance test is not planned properly, the integration of boiler and turbine data will produce a test result that does not accurately reflect the true performance of the overall plant. This planning must start very early in the development of the test program and be implemented in all stages of the test program design. This paper addresses the necessary planning of the test program, including:
• Determination of Actual Plant Performance.
• Selection of a Test Goal.
• Development of the Basic Correction Algorithm.
• Designing a Plant Model.
• Development of Correction Curves.
• Operation of the Power Plant during the Test.
All nomenclature in this paper utilizes the ASME PTC 46 definitions for the calculation and correction of plant performance.
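The basic correction idea the paper builds on can be sketched as measured output adjusted to reference conditions by a product of correction factors taken from correction curves. The factor values below are purely illustrative and are not taken from ASME PTC 46.

```python
def corrected_power(measured_kw, correction_factors):
    """Corrected power = measured power x product of correction factors.

    Each factor adjusts for one off-reference condition (e.g. ambient
    temperature, condenser pressure); values here are made up.
    """
    result = measured_kw
    for factor in correction_factors:
        result *= factor
    return result

# Hypothetical corrections for ambient temperature and condenser pressure
print(corrected_power(600_000, [1.012, 0.997]))  # corrected kW
```

An actual PTC 46 test would derive each factor from plant-model-based correction curves agreed on before the test, which is exactly the planning the paper argues must start early.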


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 859
Author(s):  
Abdulaziz O. AlQabbany ◽  
Aqil M. Azmi

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams, as it requires learners to adapt to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. The Adaptive Random Forest (ARF), in turn, is a stream learning algorithm that has shown promising results in terms of accuracy and the ability to deal with various types of drift. The continuity of the incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms’ efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value of ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed enhancement method yielded considerable improvement in most situations.
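The Poisson resampling step being tuned can be sketched as follows: in online bagging (and in ARF), each incoming instance is presented to each base learner k times, with k drawn from Poisson(λ). λ = 1 reproduces the classical online-bagging approximation of the bootstrap; the study searches over λ for the value that maximizes its resampling-effectiveness score ρ. The learner count and λ below are illustrative (λ = 6 is the value commonly cited for ARF).

```python
import numpy as np

def resample_counts(n_learners, lam, rng):
    """Per-learner replication counts for one incoming stream instance."""
    return rng.poisson(lam, size=n_learners)

rng = np.random.default_rng(42)
counts = resample_counts(10, 6.0, rng)
print(counts)  # e.g. how many times each of 10 trees sees this instance
```

Larger λ means more replication per instance (higher accuracy potential, more execution time), which is exactly the accuracy/time trade-off that ρ is designed to capture.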


2018 ◽  
Vol 146 (12) ◽  
pp. 4079-4098 ◽  
Author(s):  
Thomas M. Hamill ◽  
Michael Scheuerer

Abstract Hamill et al. described a multimodel ensemble precipitation postprocessing algorithm that is used operationally by the U.S. National Weather Service (NWS). This article describes further changes that produce improved, reliable, and skillful probabilistic quantitative precipitation forecasts (PQPFs) for single or multimodel prediction systems. For multimodel systems, final probabilities are produced through the linear combination of PQPFs from the constituent models. The new methodology is applied to each prediction system. Prior to adjustment of the forecasts, parametric cumulative distribution functions (CDFs) of model and analyzed climatologies are generated using the previous 60 days’ forecasts and analyses and supplemental locations. The CDFs, which can be stored with minimal disk space, are then used for quantile mapping to correct state-dependent bias for each member. In this stage, the ensemble is also enlarged using a stencil of forecast values from the 5 × 5 surrounding grid points. Different weights and dressing distributions are assigned to the sorted, quantile-mapped members, with generally larger weights for outlying members and broader dressing distributions for members with heavier precipitation. Probability distributions are generated from the weighted sum of the dressing distributions. The NWS Global Ensemble Forecast System (GEFS), the Canadian Meteorological Centre (CMC) global ensemble, and the European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble forecast data are postprocessed for April–June 2016. Single prediction system postprocessed forecasts are generally reliable and skillful. Multimodel PQPFs are roughly as skillful as the ECMWF system alone. Postprocessed guidance was generally more skillful than guidance using the Gamma distribution approach of Scheuerer and Hamill, with coefficients generated from data pooled across the United States.
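The quantile-mapping stage described above can be sketched with empirical CDFs: a forecast value is mapped to the analyzed-climatology value at the same cumulative probability. The operational scheme uses parametric CDFs fitted to 60 days of data plus supplemental locations; the tiny empirical climatologies below are stand-ins for illustration only.

```python
import numpy as np

def quantile_map(forecast_value, model_climo, analyzed_climo):
    """Map a forecast value through model CDF -> analyzed inverse CDF."""
    # Cumulative probability of the value under the model climatology
    p = np.searchsorted(np.sort(model_climo), forecast_value) / len(model_climo)
    # Value at the same probability under the analyzed climatology
    return np.quantile(analyzed_climo, min(p, 1.0))

model = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # model climatology (dry bias)
analysis = np.array([0.0, 2.0, 4.0, 6.0, 8.0])   # analyzed climatology
print(quantile_map(2.0, model, analysis))
```

In the full algorithm this correction is applied member by member, after which the 5 × 5 stencil, weights, and dressing distributions build the final probability distribution.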


2021 ◽  
Author(s):  
Ibrahim Abaker Targio Hashem ◽  
Raja Sher Afgun Usmani ◽  
Asad Ali Shah ◽  
Abdulwahab Ali Almazroi ◽  
Muhammad Bilal

The COVID-19 pandemic has emerged as the world's most serious health crisis, affecting millions of people all over the world. The majority of nations have imposed nationwide curfews and reduced economic activity to combat the spread of this infectious disease. Governments are monitoring the situation and making critical decisions based on the daily number of new cases and deaths reported. Therefore, this study aims to predict daily new deaths using four tree-based ensemble models, i.e., Gradient Tree Boosting (GB), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Voting Regressor (VR), for the three most affected countries: the United States, Brazil, and India. The results showed that VR outperformed the other models in predicting daily new deaths for all three countries. The predictions of daily new deaths made using VR for Brazil and India are very close to the actual new deaths, whereas the prediction of daily new deaths for the United States still needs to be improved.
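A hedged sketch of the voting-regressor setup: scikit-learn's VotingRegressor averages the predictions of its base regressors. GB and RF are included here via scikit-learn; XGBoost is omitted to keep the example dependency-free, and the data are synthetic, not COVID-19 counts.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)

rng = np.random.default_rng(1)
X = rng.random((200, 4))
# Toy regression target standing in for daily new deaths
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(0, 0.1, 200)

vr = VotingRegressor([
    ("gb", GradientBoostingRegressor(random_state=1)),
    ("rf", RandomForestRegressor(random_state=1)),
]).fit(X, y)
print(vr.score(X, y))  # R^2 on the training data
```

Averaging uncorrelated errors is what lets a voting ensemble outperform its individual members, consistent with the VR result reported above.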


2018 ◽  
Vol 49 (4) ◽  
pp. 947-981 ◽  
Author(s):  
Guillermina Jasso

Newly precise evidence of the trajectory of top incomes in the United States and around the world relies on shares and ratios, prompting new inquiry into their properties as inequality measures. Current evidence suggests a mathematical link between top shares and the Gini coefficient and empirical links extending as well to the Atkinson measure. The work reported in this article strengthens that evidence, making several contributions: First, it formalizes the shares and ratios, showing that as monotonic transformations of each other, they are different manifestations of a single inequality measure, here called TopBot. Second, it presents two standard forms of TopBot, which satisfy the principle of normalization. Third, it presents a new link between top shares and the Gini coefficient, showing that properties and results associated with the Lorenz curve pertain as well to top shares. Fourth, it investigates TopBot in mathematically specified probability distributions, showing that TopBot is monotonically related to classical measures such as the Gini, Atkinson, and Theil measures and the coefficient of variation. Thus, TopBot appears to be a genuine inequality measure. Moreover, TopBot is further distinguished by its ease of calculation and ease of interpretation, making it an appealing People’s measure of inequality. This work also provides new insights, for example, that, given nonlinearities in the (monotonic) relations among inequality measures, Spearman correlations are more appropriate than Pearson correlations and that weakening of correlations signals differences and shifts in distributional form, themselves signals of income dynamics.
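The share-and-ratio idea can be made concrete with a small numeric sketch: a top-share over bottom-share ratio (in the spirit of TopBot, though not the article's specific standard forms) alongside the Gini coefficient it is shown to be linked to. The income vector is invented.

```python
import numpy as np

def top_bottom_ratio(incomes, p=0.1):
    """Income share of the top p fraction over that of the bottom p fraction."""
    x = np.sort(incomes)
    k = max(1, int(len(x) * p))
    return x[-k:].sum() / x[:k].sum()

def gini(incomes):
    """Gini coefficient via the standard sorted-weights formula."""
    x = np.sort(incomes)
    n = len(x)
    return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum())

incomes = np.array([1.0, 2.0, 3.0, 4.0, 90.0])
print(top_bottom_ratio(incomes, p=0.2), gini(incomes))
```

Both measures are order statistics of the sorted income vector, which is the Lorenz-curve connection the article exploits; because their relation is monotonic but nonlinear, Spearman rather than Pearson correlation is the appropriate comparison.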


Author(s):  
Samira Ahangari ◽  
Mansoureh Jeihani ◽  
Anam Ardeshiri ◽  
Md Mahmudur Rahman ◽  
Abdollah Dehzangi

Distracted driving is known to be one of the main causes of crashes in the United States, accounting for about 40% of all crashes. Drivers’ situational awareness, decision-making, and driving performance are impaired as a result of temporarily diverting their attention from the primary task of driving to other, unrelated tasks. Detecting driver distraction would help in adopting the most effective countermeasures. To tackle this problem, we employed a random forest (RF) classifier, one of the best-performing classifiers, which has attained promising results for a wide range of problems. Here, we trained the RF using data collected from a driving simulator, in which 92 participants drove under six different distraction scenarios (handheld calling, hands-free calling, texting, voice command, clothing, and eating/drinking) on four different road classes (rural collector, freeway, urban arterial, and local road in a school zone). Various driving performance measures such as speed, acceleration, throttle, lane changing, brake, collision, and offset from the lane center were investigated. Using the RF method, we achieved 76.5% prediction accuracy on the independent test set, which is 8.2% better than results reported in previous studies. We also obtained a 76.6% true positive rate, which is 14% better than those reported in previous studies. Such results demonstrate the advantage of RF over other machine learning methods for identifying driver distraction.
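The evaluation reported above can be sketched as a random forest scored by both accuracy and true positive rate on a held-out set. The features below are synthetic stand-ins for the driving performance measures (speed, acceleration, lane offset, and so on), and the label rule is invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.random((500, 5))                           # toy driving measures
y = (X[:, 0] + 0.5 * X[:, 2] > 0.8).astype(int)    # 1 = "distracted" (toy rule)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)
clf = RandomForestClassifier(random_state=7).fit(X_tr, y_tr)
pred = clf.predict(X_te)
# recall on the positive class is the true positive rate
print(accuracy_score(y_te, pred), recall_score(y_te, pred))
```

Reporting TPR alongside accuracy matters here because missing a distracted-driving episode (a false negative) is the costlier error.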


SLEEP ◽  
2020 ◽  
Vol 43 (Supplement_1) ◽  
pp. A148-A148
Author(s):  
O J Veatch ◽  
D R Mazzotti

Abstract Introduction Transitions to and from daylight saving time (DST) are natural experiments of circadian disruption and are associated with negative health consequences. Yet, the majority of the United States and several other countries still adopt these changes. Large observational studies focused on understanding the impact of DST transitions on sleep are difficult to conduct. Social media platforms, like Twitter, are powerful sources of human behavior data. We used machine learning to identify tweets reporting sleep complaints (TRSC) during the week of the transition from DST to standard time (ST). Next, we evaluated the circadian patterns of TRSC and compared their prevalence before and after the transition. Methods Using data publicly available via the Twitter API, we collected 500 tweets with evidence of sleep complaints and manually annotated each tweet to validate true sleep complaints. Next, we calculated the term frequency-inverse document frequency of each word in each tweet and trained a random forest to classify TRSC using a 3-fold cross-validation design. The trained model was then used to annotate a collection of tweets captured between Oct. 30 and Nov. 6, 2019, overlapping with the DST-ST transition, which occurred on Nov. 3, 2019. Results The random forest demonstrated good performance in classifying TRSC (AUC [95% CI] = 0.85 [0.82-0.89]). This model was applied to 3,738,383 tweets collected around the DST-ST transition and identified 11,044 TRSC. Posting of these tweets had a circadian pattern, with a peak during nighttime. We found a higher frequency of TRSC after the DST-ST transition (0.33% vs. 0.27%, p < 0.00001), corresponding to a ~20% increase in the odds of reporting sleep complaints (OR [95% CI] = 1.21 [1.16-1.25]).
Conclusion Using machine learning and Twitter data, we identified tweets reporting sleep complaints, described their circadian patterns, and demonstrated that the prevalence of these types of tweets is significantly increased after the transition from DST to ST. These results demonstrate the applicability of social media data mining for public health in sleep medicine. Support NIH (K01LM012870); AASM Foundation (194-SR-18)
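The TF-IDF plus random forest pipeline described in the Methods can be sketched with scikit-learn; cross_val_score handles the 3-fold design. The toy corpus and labels below are invented, since the study's annotated tweets are not reproduced here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Invented examples: 1 = sleep complaint, 0 = not
texts = ["could not sleep at all last night", "so tired, no sleep again",
         "insomnia is the worst", "great game tonight",
         "love this song", "coffee with friends today"] * 10
labels = [1, 1, 1, 0, 0, 0] * 10

pipe = make_pipeline(TfidfVectorizer(),
                     RandomForestClassifier(random_state=0))
scores = cross_val_score(pipe, texts, labels, cv=3)  # 3-fold CV accuracy
print(scores.mean())
```

On real tweets the evaluation would use AUC rather than accuracy, since true sleep complaints are a small minority of the stream.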


2020 ◽  
Vol 51 (4) ◽  
pp. 648-665
Author(s):  
Min Wu ◽  
Qi Feng ◽  
Xiaohu Wen ◽  
Ravinesh C. Deo ◽  
Zhenliang Yin ◽  
...  

Abstract The study evaluates the potential utility of the random forest (RF) predictive model to simulate daily reference evapotranspiration (ET0) at two stations located in the arid oasis area of northwestern China. To construct an accurate RF-based predictive model, ET0 is estimated from an appropriate combination of model inputs comprising maximum air temperature (Tmax), minimum air temperature (Tmin), sunshine duration (Sun), wind speed (U2), and relative humidity (Rh). The outputs of the RF models are tested against ET0 calculated using the Penman–Monteith FAO 56 (PMF-56) equation. Results showed that the RF model is a suitable way to predict ET0 for the arid oasis area with limited data. Apart from air temperature, Rh was the most influential factor on the behavior of ET0 in the study area. Moreover, an uncertainty analysis with a Monte Carlo method was carried out to verify the reliability of the results; it concluded that the RF model had lower uncertainty and can be used successfully to simulate ET0. The study shows RF to be a sound modeling approach for predicting ET0 in arid areas where reliable weather data sets are available but relatively limited.
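The Monte Carlo uncertainty check described above can be sketched by repeating an RF prediction under random perturbations of the inputs and examining the spread of the outputs. The five features below are synthetic stand-ins for Tmax, Tmin, Sun, U2, and Rh, and the perturbation size is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.random((300, 5))
y = 2 * X[:, 0] - X[:, 4] + rng.normal(0, 0.05, 300)  # toy ET0 signal
rf = RandomForestRegressor(random_state=3).fit(X, y)

x0 = X[:1]  # one day's weather inputs
# Monte Carlo: re-predict under small random input perturbations
draws = np.array([rf.predict(x0 + rng.normal(0, 0.02, (1, 5)))[0]
                  for _ in range(200)])
print(draws.mean(), draws.std())  # central estimate and its spread
```

A small standard deviation across the draws is what "lower uncertainty" means operationally: the prediction is stable under plausible measurement noise in the weather inputs.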

