Identification of the Debris Flow Process Types within Catchments of Beijing Mountainous Area

Water ◽  
2019 ◽  
Vol 11 (4) ◽  
pp. 638 ◽  
Author(s):  
Nan Wang ◽  
Weiming Cheng ◽  
Min Zhao ◽  
Qiangyi Liu ◽  
Jing Wang

Distinct sediment concentrations, densities, and transport mechanisms characterize the different magnitudes of destruction caused by debris flow processes (DFP). Identifying the dominant DFP type within a catchment is of paramount importance in determining efficient delineation and mitigation strategies. However, few studies have focused on identifying the DFP types (water-flood, debris-flood, and debris-flow) with machine learning methods. Therefore, taking Beijing as the study area, this paper aims to establish an integrated framework for the identification of the DFP types, which consists of an indicator calculation system, imbalanced dataset learning (borderline-Synthetic Minority Oversampling Technique (borderline-SMOTE)), and classification model selection (Random Forest (RF), AdaBoost, Gradient Boosting (GBDT)). The classification accuracies of the models were compared and the significance of the parameters was then assessed. The results indicate that Random Forest has the highest accuracy (0.752), together with the highest area under the receiver operating characteristic curve (AUROC = 0.73) and the lowest root-mean-square error (RMSE = 0.544). This study confirms that the catchment shape and the relief gradient features benefit the identification of the DFP types. In particular, the roughness index (RI) and the relief ratio (Rr) can be used to effectively describe the DFP types. The spatial distribution of the DFP types is analyzed in this paper to provide a reference for diverse practical measures suited to the particular characteristics of highly destructive catchments.
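
The oversampling step mentioned above can be illustrated with a minimal sketch. The code implements only the interpolation at the core of plain SMOTE, not the borderline variant named in the abstract (which, as commonly described, restricts synthesis to minority samples near the class boundary). The function name `smote_like`, the parameter `k`, and the toy indicator values are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Plain SMOTE-style interpolation only; the borderline selection step
    is deliberately omitted (assumption: illustration, not the paper's code).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Random point on the segment between base and neighbour
        u = rng.random()
        synthetic.append(tuple(b + u * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class catchments described by two indicators
minority = [(0.10, 0.30), (0.15, 0.35), (0.20, 0.25), (0.12, 0.40)]
new_points = smote_like(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.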

2020 ◽  
Author(s):  
Zhanyou Xu ◽  
Andreomar Kurek ◽  
Steven B. Cannon ◽  
Williams D. Beavis

Abstract: Selection of markers linked to alleles at quantitative trait loci (QTL) for tolerance to Iron Deficiency Chlorosis (IDC) has not been successful. Genomic selection has been advocated for continuous numeric traits such as yield and plant height. For ordinal data types such as IDC, genomic prediction models have not been systematically compared. The objectives of the research reported in this manuscript were to compare the most commonly used genomic prediction method, ridge regression, and its equivalent logistic ridge regression method, with algorithmic modeling methods including random forest, gradient boosting, support vector machine, K-nearest neighbors, Naïve Bayes, and artificial neural network, using the usual comparator metric of prediction accuracy. In addition, we compared the methods using metrics of greater importance for decisions about selecting and culling lines for use in variety development and genetic improvement projects. These metrics include specificity, sensitivity, precision, decision accuracy, and area under the receiver operating characteristic curve. We found that Support Vector Machine provided the best specificity for culling IDC susceptible lines, while Random Forest GP models provided the best combined set of decision metrics for retaining IDC tolerant and culling IDC susceptible lines.
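
As a rough illustration of the decision metrics listed above, the sketch below computes sensitivity, specificity, precision, and accuracy from confusion-matrix counts using their standard definitions. The counts, the choice of "tolerant" as the positive class, and the function name `decision_metrics` are hypothetical; "decision accuracy" is not reproduced because the abstract does not define it.

```python
def decision_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, precision and accuracy from counts."""
    return {
        "sensitivity": tp / (tp + fn),   # share of actual positives recovered
        "specificity": tn / (tn + fp),   # share of actual negatives recovered
        "precision": tp / (tp + fp),     # share of positive calls that are right
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, treating "tolerant" as the positive class
m = decision_metrics(tp=40, fp=10, tn=45, fn=5)
```

A model tuned for culling susceptible lines would be judged mainly on specificity, whereas one tuned for retaining tolerant lines would be judged on sensitivity and precision.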


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Stephanie O Frisch ◽  
Zeineb Bouzid ◽  
Jessica Zègre-Hemsey ◽  
Clifton W Callaway ◽  
Holli A Devon ◽  
...  

Introduction: Overcrowded emergency departments (ED) and undifferentiated patients make the provision of care and resources challenging. We examined whether machine learning algorithms could identify ED patients’ disposition (hospitalization and critical care admission) using readily available objective triage data among patients with symptoms suggestive of acute coronary syndrome (ACS). Methods: This was a retrospective observational cohort study of adult patients who were triaged at the ED for a suspected coronary event. A total of 162 input variables (k) were extracted from the electronic health record: demographics (k=3), mode of transportation (k=1), past medical/surgical history (k=57), first ED vital signs (k=7), home medications (k=31), symptomology (k=40), and the computer generated automatic interpretation of 12-lead electrocardiogram (k=23). The primary outcomes were hospitalization and critical care admission (i.e., admission to intensive or step-down care unit). We used 10-fold stratified cross validation to evaluate the performance of five machine learning algorithms to predict the study outcomes: logistic regression, naïve Bayes, random forest, gradient boosting and artificial neural network classifiers. We determined the best model by comparing the area under the receiver operating characteristic curve (AUC) of all models. Results: Included were 1201 patients (age 64±14, 39% female; 10% Black) with a total of 956 hospitalizations, and 169 critical care admissions. The best performing machine learning classifier for the outcome of hospitalization was gradient boosting machine with an AUC of 0.85 (95% CI, 0.82–0.89), 89% sensitivity, and F-score of 0.83; random forest classifier performed the best for the outcome of critical care admission with an AUC of 0.73 (95% CI, 0.70–0.77), 76% sensitivity, and F-score of 0.56. 
Conclusion: Predictive machine learning algorithms demonstrate excellent to good discriminative power to predict hospitalization and critical care admission, respectively. Administrators and clinicians could benefit from machine learning approaches to predict hospitalization and critical care admission, to optimize and allocate scarce ED and hospital resources and provide optimal care.
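
The AUC used above to rank the models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the scores and labels are hypothetical, and the function name `auc` is an assumption rather than anything taken from the study.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half, which matches the Mann-Whitney U formulation.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and true outcomes (1 = admitted)
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
value = auc(scores, labels)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of this size; a rank-based computation would scale better.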


Author(s):  
Nelson Yego ◽  
Juma Kasozi ◽  
Joseph Nkrunziza

The role of insurance in financial inclusion as well as in economic growth is immense. However, low uptake seems to impede the growth of the sector, hence the need for a model that robustly predicts uptake of insurance among potential clients. In this research, we compared the performances of eight (8) machine learning models in predicting the uptake of insurance. The classifiers considered were Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, K Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting Machines and Extreme Gradient Boosting. The data used in the classification was from the 2016 Kenya FinAccess Household Survey. Because the data were imbalanced, performance was compared for both upsampled and downsampled data. For upsampled data, the Random Forest classifier showed the highest accuracy and precision compared to other classifiers, but for downsampled data, gradient boosting was optimal. It is noteworthy that for both upsampled and downsampled data, tree-based classifiers were more robust than others in insurance uptake prediction. However, in spite of hyper-parameter optimization, the area under the receiver operating characteristic curve remained highest for Random Forest as compared to other tree-based models. Also, the confusion matrix for Random Forest showed the fewest false positives and the most true positives, so it could be construed as the most robust model for predicting insurance uptake. Finally, the most important feature in predicting uptake was having a bank product, hence bancassurance could be said to be a plausible channel of distribution of insurance products.
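
The upsampling and downsampling mentioned above can be sketched as simple random resampling. The abstract does not say which resampling procedure was used, so this is an assumption; the function name `resample`, the `mode` parameter, and the toy data are hypothetical.

```python
import random
from collections import Counter


def resample(rows, labels, mode="up", seed=0):
    """Balance classes by random upsampling (with replacement) or downsampling."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    sizes = [len(group) for group in by_class.values()]
    target = max(sizes) if mode == "up" else min(sizes)
    out_rows, out_labels = [], []
    for y, group in by_class.items():
        if len(group) >= target:
            # Shrink (or keep) the class by sampling without replacement
            picked = rng.sample(group, target)
        else:
            # Keep the originals and add random duplicates up to the target
            picked = group + rng.choices(group, k=target - len(group))
        out_rows.extend(picked)
        out_labels.extend([y] * target)
    return out_rows, out_labels


# Hypothetical survey rows: 2 insured (1) versus 8 uninsured (0)
rows = list(range(10))
labels = [1] * 2 + [0] * 8
up_rows, up_labels = resample(rows, labels, mode="up")
down_rows, down_labels = resample(rows, labels, mode="down")
```

Resampling of this kind is usually applied to the training split only, so that evaluation still reflects the original class balance.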


2021 ◽  
Vol 11 (21) ◽  
pp. 10336
Author(s):  
Yitao Wang ◽  
Lei Yang ◽  
Xin Song ◽  
Quan Chen ◽  
Zhenguo Yan

AIS (Automatic Identification System) is an effective navigation aid system aimed to realize ship monitoring and collision avoidance. Space-based AIS data, which are received by satellites, have become a popular and promising approach for providing ship information around the world. To recognize the types of ships from the massive space-based AIS data, we propose a multi-feature ensemble learning classification model (MFELCM). The method consists of three steps. Firstly, the static and dynamic information of the original data is preprocessed and features are then extracted in order to obtain static feature samples, dynamic feature distribution samples, time-series samples, and time-series feature samples. Secondly, four base classifiers, namely Random Forest, 1D-CNN (one-dimensional convolutional neural network), Bi-GRU (bidirectional gated recurrent unit), and XGBoost (extreme gradient boosting), are trained by the above four types of samples, respectively. Finally, the base classifiers are integrated by another Random Forest, and the final ship classification is outputted. In this paper, we use the global space-based AIS data of passenger ships, cargo ships, fishing boats, and tankers. The model gets a total accuracy of 0.9010 and an F1 score of 0.9019. The experiments prove that MFELCM is better than the base classifiers. In addition, MFELCM can achieve near real-time online classification, which has important applications in ship behavior anomaly detection and maritime supervision.


2021 ◽  
Vol 25 (5) ◽  
pp. 1291-1322
Author(s):  
Sandeep Kumar Singla ◽  
Rahul Dev Garg ◽  
Om Prakash Dubey

Recent advances in information technology and statistical techniques have enabled sophisticated and reliable analysis based on machine learning methods. A number of machine learning data analysis tools may be used for classification and regression problems. These tools and techniques can be used effectively for highly data-intensive operations such as agricultural and meteorological applications, bioinformatics, and stock market analysis based on daily market prices. Machine learning ensemble methods such as Decision Tree (C5.0), Classification and Regression Trees (CART), Gradient Boosting Machine (GBM) and Random Forest (RF) have been investigated in the proposed work. The proposed work demonstrates that temporal variations in the spectral data and the computational efficiency of machine learning methods may be effectively used to discriminate between types of sugarcane. The discrimination has been treated as a binary classification problem that separates ratoon from plantation sugarcane. Variable importance selection based on Mean Decrease in Accuracy (MDA) and Mean Decrease in Gini (MDG) has been used to create the appropriate dataset for the classification. The binary classification model based on RF performs best for all the possible combinations of input images. Feature selection based on the MDA and MDG measures of RF is also important for dimensionality reduction. The RF model performed best, with 97% accuracy, whereas the GBM method performed worst. Binary classification based on remotely sensed data can be handled effectively using the random forest method.


2020 ◽  
Author(s):  
Zhu Liang ◽  
Changming Wang ◽  
Donghe Ma ◽  
Kaleem Ullah Jan Khan

Abstract. The aim of the present study is to explore the potential relationship between debris flow and soil slide by establishing susceptibility zoning maps (SZM) separately with the use of random forest. Longzi County, located in Southeastern Tibet, where historical landslides occurred commonly, was selected as the study area. The work was carried out in the following steps: (1) An inventory map consisting of 448 landslides (399 soil slides and 49 debris flows) was determined; (2) Slope units and 11 conditioning factors were prepared for the susceptibility modelling of landslide, and watershed units and 12 factors for debris flow; (3) SZM were constructed for landslide and debris flow, respectively, with the use of random forest; (4) The performance of the two models was evaluated by 5-fold cross-validation using the relative operating characteristic curve (ROC), area under the curve (AUC) and statistical measures; (5) The potential relationship between soil slide and debris flow was explored by the superimposition of the two zoning maps; (6) The Gini index was applied to determine the major factors and analyze the difference between debris flow and soil slide; (7) A combined susceptibility map with the two kinds of disaster was obtained. Both models demonstrated strong predictive capability, with accuracy and AUC of 87.33 % and 0.902 and of 85.17 % and 0.892, respectively. The loose sources needed by the debris flow were not necessarily brought by the landslides, although most landslides can be converted into debris flow. The area prone to debris flow did not promote the occurrence of landslide. A susceptibility zoning map composed of two or more natural disasters is comprehensive and significant in this regard, and provides a valuable reference for research on disaster chains and for engineering applications.
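
Step (6) above relies on the Gini index to rank conditioning factors. In random forests this is commonly done by accumulating the impurity decrease that each factor produces across splits; the sketch below shows that calculation for a single hypothetical split. The class labels, the split itself, and the function names `gini` and `gini_decrease` are illustrative assumptions, not details from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Hypothetical node with 6 soil slides and 4 debris flows, split on one factor
parent = ["slide"] * 6 + ["flow"] * 4
left = ["slide"] * 5 + ["flow"] * 1
right = ["slide"] * 1 + ["flow"] * 3
decrease = gini_decrease(parent, left, right)
```

A factor whose splits repeatedly yield large decreases of this kind would rank as a major factor under this measure.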


Sensors ◽  
2020 ◽  
Vol 20 (15) ◽  
pp. 4238
Author(s):  
Fanglin Mu ◽  
Yu Gu ◽  
Jie Zhang ◽  
Lei Zhang

In this study, an electronic nose (E-nose) consisting of seven metal oxide semiconductor sensors is developed to identify milk sources (dairy farms) and to estimate the content of milk fat and protein, which are indicators of milk quality. The developed E-nose is a low-cost and non-destructive device. For milk source identification, odor features from the E-nose, composition features from Dairy Herd Improvement (DHI) analysis, and fusion features are reduced in dimension by principal component analysis (PCA) and linear discriminant analysis (LDA), and then three machine learning algorithms, logistic regression (LR), support vector machine (SVM), and random forest (RF), are used to construct the classification model for milk source (dairy farm) identification. The results show that the SVM model based on the fusion features after LDA has the best performance, with an accuracy of 95%. Estimation models of the content of milk fat and protein from E-nose features are constructed using gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and random forest (RF). The results show that the RF models give the best performance (R2 = 0.9399 for milk fat; R2 = 0.9301 for milk protein) and indicate that the proposed method can improve the estimation accuracy of milk fat and protein, which provides a technical basis for predicting the quality of milk.
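
The R2 values reported above are coefficients of determination. A minimal sketch of the standard calculation follows; the measured and estimated milk-fat values are hypothetical, and the function name `r_squared` is an assumption.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical milk-fat percentages: measured vs. model estimate
measured = [3.2, 3.8, 4.1, 3.5, 4.4]
estimated = [3.3, 3.7, 4.0, 3.6, 4.3]
score = r_squared(measured, estimated)
```

A value near 1 means the estimates explain almost all of the variance in the measured values, while a value near 0 means they do no better than predicting the mean.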


Energies ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 1809
Author(s):  
Mohammed El Amine Senoussaoui ◽  
Mostefa Brahami ◽  
Issouf Fofana

Machine learning is widely used as a panacea in many engineering applications including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on some aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: the oils that can be maintained in service, the oils that should be reconditioned or filtered, the oils that should be reclaimed, and the oils that must be discarded. From the two algorithms, random forest exhibited a better performance and high accuracy with only a small amount of data. Good performance was achieved through not only the application of the proposed algorithm but also the approach of data preprocessing. Before feeding the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were again retransformed by conducting the principal component analysis and were passed through the CFsSubset filter. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of the datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.

