Prediction model based on the Laplacian eigenmap method combined with a random forest algorithm for rainstorm satellite images during the first annual rainy season in South China

2021 ◽  
Author(s):  
Xiao-yan Huang ◽  
Li He ◽  
Hua-sheng Zhao ◽  
Ying Huang ◽  
Yu-shuang Wu

2014 ◽
Vol 20 (5) ◽  
Author(s):  
P. Dohnalek ◽  
M. Dvorsky ◽  
P. Gajdos ◽  
L. Michalek ◽  
R. Sebesta ◽  
...  

Machines ◽  
2019 ◽  
Vol 7 (4) ◽  
pp. 69 ◽  
Author(s):  
Gino Iannace ◽  
Giuseppe Ciaburro ◽  
Amelia Trematerra

Wind energy is one of the most widely used renewable energy sources in the world and has grown rapidly in recent years. However, wind turbines generate noise that is perceived as an annoyance by people living near wind farms. It is therefore important to develop new tools that can help wind farm builders and administrations. In this study, measurements of the noise emitted by a wind farm and the data recorded by the supervisory control and data acquisition (SCADA) system were used to construct a prediction model. First, the acoustic measurements and control system data were analyzed to characterize the phenomenon. An appropriate number of observations was then extracted, and these data were pre-processed. Subsequently, two models predicting the sound pressure level at the receiver were built: one based on multiple linear regression and one based on the Random Forest algorithm. The wind speed measured near the wind turbines and the active power of the turbines were selected as predictors; both quantities were recorded by the SCADA system of the wind turbines. The model based on the Random Forest algorithm showed a high Pearson correlation coefficient (0.981) between predicted and measured values, indicating close agreement. This model can be extremely useful, both for the receiver and for the wind farm manager. The model's results make it possible to establish the wind speed values at which the noise produced by the wind turbines becomes dominant. Furthermore, the predictive model can give an overview of the noise from the plant perceived at the receiver under different operating conditions. Finally, the prediction model does not require shutting down the plant, a very expensive procedure because of the resulting loss of production.
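As an illustration of the regression baseline and the scoring step described above, the following Python sketch fits a multiple linear regression of sound pressure level on wind speed and active power, then reports the Pearson correlation between predicted and measured levels. The data are synthetic and the helper names (`pearson_r`, `fit_two_predictor_ols`) are hypothetical; the sketch does not reproduce the authors' SCADA data, their Random Forest model, or their results.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_two_predictor_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the centred normal equations (Cramer's rule)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Synthetic stand-ins for SCADA records (hypothetical values, not the study's data).
rng = random.Random(0)
wind = [rng.uniform(3.0, 12.0) for _ in range(200)]                       # wind speed, m/s
power = [min(2000.0, 1.2 * v ** 3) + rng.gauss(0.0, 30.0) for v in wind]  # active power, kW
# Receiver sound pressure level in dB(A), linear in both predictors plus noise.
spl = [35.0 + 1.5 * v + 0.004 * p + rng.gauss(0.0, 0.8) for v, p in zip(wind, power)]

b0, b1, b2 = fit_two_predictor_ols(wind, power, spl)
predicted = [b0 + b1 * v + b2 * p for v, p in zip(wind, power)]
r = pearson_r(predicted, spl)
print(f"SPL = {b0:.2f} + {b1:.3f}*v + {b2:.5f}*P, Pearson r = {r:.3f}")
```

Because wind speed and active power are strongly correlated, the individual coefficients of such a linear model can be unstable even when r is high.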


2020 ◽  
Vol 221 (Supplement_2) ◽  
pp. S263-S271 ◽  
Author(s):  
Peng Lan ◽  
Qiucheng Shi ◽  
Ping Zhang ◽  
Yan Chen ◽  
Rushuang Yan ◽  
...  

Background: Hypervirulent Klebsiella pneumoniae (hvKP) infections can have high morbidity and mortality rates owing to their invasiveness and virulence. However, there are no effective tools or biomarkers to discriminate between hvKP and nonhypervirulent K. pneumoniae (nhvKP) strains. We aimed to use a random forest algorithm to predict hvKP based on core-genome data. Methods: In total, 272 K. pneumoniae strains were collected from 20 tertiary hospitals in China and divided into hvKP and nhvKP groups according to clinical criteria. Clinical data comparisons, whole-genome sequencing, virulence profile analysis, and core genome multilocus sequence typing (cgMLST) were performed. We then established a random forest predictive model based on the cgMLST scheme to prospectively identify hvKP. A random forest is an ensemble learning method that builds multiple decision trees during training; each tree outputs its own prediction for a given input, and the ensemble combines these outputs into a single prediction. The predictive ability of the model was assessed by the area under the receiver operating characteristic curve. Results: Patients in the hvKP group were younger than those in the nhvKP group (median age, 58.0 vs 68.0 years; P < .001). More patients in the hvKP group had underlying diabetes mellitus (43.1% vs 20.1%; P < .001). Clinically, carbapenem-resistant K. pneumoniae was less common in the hvKP group (4.1% vs 63.8%; P < .001), whereas the K1/K2 serotype, sequence type (ST) 23, and positive string tests were significantly more common in the hvKP group. A cgMLST-based minimum spanning tree revealed that hvKP strains were scattered sporadically within nhvKP clusters. ST23 showed greater genome diversification than ST11, according to cgMLST-based allelic differences. Primary virulence factors (rmpA, iucA, a positive string test result, and the presence of the virulence plasmid pLVPK) were poor predictors of the hypervirulence phenotype.
The random forest model based on the core genome allelic profile showed excellent predictive power in both the training and validation sets (area under the receiver operating characteristic curve, 0.987 and 0.999, respectively). Conclusions: A random forest predictive model based on the core genome allelic profiles of K. pneumoniae accurately identified hypervirulent isolates.
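The ensemble idea described in this abstract (many trees, each making its own prediction, combined into one answer) can be sketched in plain Python. This is a deliberately minimal random forest, assuming binary allele-presence features and a binary hvKP/nhvKP label: each tree is a single-split stump grown on a bootstrap sample with a random subset of features, and the forest predicts by majority vote. Real implementations grow much deeper trees and resample features at every split; the toy data and function names are hypothetical and unrelated to the study's cgMLST profiles.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity among feature_ids."""
    best = None
    for f in feature_ids:
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=51, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random sqrt-sized feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample rows with replacement
        feats = rng.sample(range(n_features), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Each stump votes for a class; the most common vote wins."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: columns 0 and 1 track the label, columns 2 and 3 are noise.
X, y = [], []
for label in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            for _ in range(5):
                X.append((label, label, a, b))
                y.append(label)

forest = fit_forest(X, y)
print(predict_forest(forest, (1, 1, 0, 1)))
```

An odd number of trees avoids tied votes in this two-class setting.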


2021 ◽  
Vol 8 ◽  
Author(s):  
Guan Wang ◽  
Yanbo Zhang ◽  
Sijin Li ◽  
Jun Zhang ◽  
Dongkui Jiang ◽  
...  

Objective: Preeclampsia affects 2–8% of women and doubles their subsequent risk of cardiovascular disease. This study aimed to develop a machine learning model to predict postpartum cardiovascular risk in women with preeclampsia. Methods: We retrospectively collected demographic characteristics and clinical serum markers associated with preeclampsia during pregnancy from 907 preeclamptic women and predicted their cardiovascular risk (ischemic heart disease, ischemic cerebrovascular disease, peripheral vascular disease, chronic kidney disease, metabolic system disease, or arterial hypertension). The samples were randomly split into a training set and a test set in an 8:2 ratio. Prediction models were developed with five different machine learning algorithms, including Random Forest. Ten-fold cross-validation was performed on the training set, and model performance was evaluated on the test set. Results: Cardiovascular risk outcomes occurred in 186 (20.5%) of these women. Judged by the area under the curve (AUC), the Random Forest algorithm performed best (AUC = 0.711 [95% CI: 0.697–0.726]) and was adopted for feature selection and for building the prediction model. The most important variables in the Random Forest model were systolic blood pressure, urea nitrogen, neutrophil count, glucose, and D-dimer. The Random Forest model was well calibrated in the test set (Brier score = 0.133) and obtained the highest net benefit in the decision curve analysis. Conclusion: Based on patients' general characteristics and clinical variables, a machine learning model was developed and validated for the individualized prediction of cardiovascular risk in women after preeclampsia.
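The evaluation protocol in the Methods (a random 8:2 train/test split, 10-fold cross-validation on the training set, then AUC and Brier score on the test set) can be sketched with a few standard-library helpers. The function names are hypothetical, no model is fitted, and the sketch assumes a plain unstratified shuffle, which the abstract does not specify.

```python
import random


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle 0..n-1 and return (train, test) index lists; 0.2 gives an 8:2 split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and the 0/1 outcome; lower is better."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)


train_idx, test_idx = split_indices(907)  # 907 women, as in the abstract
print(len(train_idx), len(test_idx))      # 726 181
for fold_train, fold_val in k_fold(train_idx, k=10):
    pass  # a candidate model would be fitted on fold_train and scored on fold_val here
```

The rank-based AUC is quadratic in the number of cases, which is fine at this sample size.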


2021 ◽  
Vol 8 (3) ◽  
pp. 209-221 ◽
Author(s):  
Li-Li Wei ◽  
Yue-Shuai Pan ◽  
Yan Zhang ◽  
Kai Chen ◽  
Hao-Yu Wang ◽  
...  

Objective: To study the application of a machine learning algorithm to predicting gestational diabetes mellitus (GDM) in early pregnancy. Methods: Indicators related to GDM were identified through a literature review and expert discussion. Pregnant women who attended medical institutions for antenatal examinations between November 2017 and August 2018 were selected, and the collected indicators were analyzed retrospectively. Using Python, the indicators were classified and modeled with a random forest regression algorithm, and the performance of the prediction model was analyzed. Results: We obtained 4806 analyzable records from 1625 pregnant women. Of these, 3265 samples with all 67 indicators formed data set F1, and 4806 samples with 38 shared indicators formed data set F2. F1 and F2 were each used to train the random forest algorithm. For the F1 model, the overall predictive accuracy was 93.10%, the area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy for GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%, so the F2 model performed better than the F1 model. To explore the impact of the sacrificed (excluded) indicators on GDM prediction, data set F3 was built from the 3265 samples of F1 restricted to the 38 indicators of F2. After training, the F3 model had an overall predictive accuracy of 91.60%, an AUC of 0.58, and a predictive accuracy for positive cases of 15.85%. Conclusions: In this study, a model for predicting GDM from several types of input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model performed well and is a useful reference for predicting GDM in women at an early stage of pregnancy.
In addition, the proportions of negative and positive cases in the sample data sets must meet certain requirements when the random forest algorithm is applied to the early prediction of GDM.
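A gap between overall accuracy and accuracy on positive cases, like the one reported for F1 (93.10% vs 37.10%), commonly arises when negatives dominate a data set; this is one plausible reading of why the class proportions matter. The minimal sketch below uses hypothetical numbers (100 cases, 10 positive), not the study's data: a classifier that labels every case negative still scores 90% overall but finds no positives, while one that gives up some overall accuracy finds most of them. The function names are illustrative only.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of all cases classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def positive_accuracy(y_true, y_pred):
    """Fraction of truly positive cases predicted positive (sensitivity); assumes 0/1 labels."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_on_positives) / len(preds_on_positives)


# Hypothetical imbalanced sample: 10 positives followed by 90 negatives.
y_true = [1] * 10 + [0] * 90
all_negative = [0] * 100                                   # never predicts GDM
recall_oriented = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 78  # 8 true positives, 12 false positives

print(overall_accuracy(y_true, all_negative), positive_accuracy(y_true, all_negative))        # 0.9 0.0
print(overall_accuracy(y_true, recall_oriented), positive_accuracy(y_true, recall_oriented))  # 0.86 0.8
```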
