Improving Radar-Based Precipitation Nowcasts with Machine Learning Using an Approach Based on Random Forest

2020 ◽  
Vol 35 (6) ◽  
pp. 2461-2478
Author(s):  
Yiwen Mao ◽  
Asgeir Sorteberg

Abstract A binary classification model is trained by random forest using data from 41 stations in Norway to predict the precipitation in a given hour. The predictors consist of results from radar nowcasts and numerical weather predictions. The results demonstrate that the random forest model can improve on the precipitation predictions of both the radar nowcasts and the numerical weather predictions. This study clarifies whether certain potential factors related to model training can influence the predictive skill of the random forest method. The results indicate that enforcing a balanced prediction, either by resampling the training datasets or by lowering the threshold probability for classification, does not improve the predictive skill of the random forest model. The study reveals that the predictive skill of the random forest model shows seasonality but is only weakly influenced by the geographic diversity of the training dataset. Finally, the study shows that the most important predictor is the precipitation prediction from the radar nowcasts, followed by the precipitation prediction from the numerical weather predictions. Although meteorological variables other than precipitation are weaker predictors, the results suggest that they can help to reduce the false alarm ratio and to increase the success ratio of the precipitation prediction.
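The false alarm ratio and success ratio mentioned in this abstract are derived from the 2x2 contingency table of forecast versus observed events. The sketch below is a minimal illustration, not the authors' code: the function name `contingency_scores`, the sample values, and the default threshold of 0.5 are assumptions. It uses the standard definitions FAR = false alarms / (hits + false alarms) and success ratio = 1 - FAR.

```python
def contingency_scores(probs, observed, threshold=0.5):
    """Score yes/no precipitation forecasts derived from predicted probabilities.

    probs     -- predicted probability of precipitation per hour (hypothetical data)
    observed  -- 1 if precipitation was observed in that hour, else 0
    threshold -- probability at or above which "precipitation" is forecast
    """
    hits = misses = false_alarms = 0
    for p, obs in zip(probs, observed):
        forecast = p >= threshold
        if forecast and obs:
            hits += 1
        elif forecast and not obs:
            false_alarms += 1
        elif obs:
            misses += 1
    forecast_yes = hits + false_alarms
    observed_yes = hits + misses
    far = false_alarms / forecast_yes if forecast_yes else float("nan")
    pod = hits / observed_yes if observed_yes else float("nan")
    return {"pod": pod, "far": far, "success_ratio": 1.0 - far}


# Made-up example values, for illustration only.
probs = [0.9, 0.7, 0.4, 0.35, 0.6, 0.1, 0.32]
observed = [1, 0, 0, 0, 1, 0, 1]
default_scores = contingency_scores(probs, observed)
lowered_scores = contingency_scores(probs, observed, threshold=0.3)
```

In this toy data, lowering the threshold from 0.5 to 0.3 catches one more observed event but adds two false alarms, which is the kind of trade-off a threshold change produces.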

Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 1312-1312
Author(s):  
Panxiang Cao ◽  
Mingyu Wang ◽  
Guangsi Zhang ◽  
Fang Wang ◽  
Xue Chen ◽  
...  

Abstract Background B-cell precursor acute lymphoblastic leukemia (B-ALL) is a genetically heterogeneous group of acute leukemia with stage-specific phenotypes and cytogenetic features. Although the research on the molecular profile of B-ALL benefits diagnosis and risk stratification, the idiographic leukemogenesis beyond the transcriptome remains unknown. Genomic lesions in B-ALL frequently involve genes belonging to transcription factors, such as TCF3, EBF1, PAX5, and IKZF1. The investigation of dysregulated transcriptional networks behind various B-ALL subtypes may help unravel the specific process of leukemogenesis. Methods A random forest model was trained on a well-defined molecular subtype B-ALL cohort (n = 504) to improve the molecular classification. The subtype-specific transcriptional network was constructed by weighted correlation network analysis (WGCNA) once the B-ALL subtypes were genetically determined by the random forest model. Additionally, alternative splicing analysis from RNA-seq was emphasized since aberrant splicing events could lead to abnormalities in transcription factors or tumor suppressor genes. Results The random forest model performs well for the classification of most B-ALL subtypes (Figure 1A). It also benefits the classification of Ph-like B-ALL, which displays a gene expression profile similar to BCR-ABL1 B-ALL, as it achieves 100% accuracy on well-known Ph-like cases characterized by ABL-class gene fusions, PAX5-JAK2, EBF1-PDGFRB, and IGH-EPOR. We successfully separated a candidate molecular subtype characterized by CXCR4 alteration (CXCR4alt) for the first time, through our novel classification model (Figure 1B). This newly identified CXCR4alt subtype accounts for 2% of B-ALL cases (11/504), characterized by CXCR4 C-terminal mutation R334X or FLNA overexpression. 
Both the C-terminal mutation and upregulated FLNA contribute to delayed CXCR4 receptor internalization and enhanced CXCL12-CXCR4 signaling, which in turn continuously activates the downstream MAPK pathway. This is further supported by the high expression of the two oncogenic MAPK signaling pathway genes KIAA1549 and KIAA1549L in the co-expression network of CXCR4alt in these cases. Transcriptional co-expression networks constructed by WGCNA and network hub genes for most B-ALL subtypes also help to elucidate the mechanism of leukemogenesis (Figure 2). We identified an alternative first exon of BLNK (BLNKaf) that leads to loss of function as a shared event in specific subtypes, such as BCR-ABL1, BCR-ABL1-like, and PAX5alt, whereas pre-BCR signaling-positive subtypes, such as TCF3-PBX1 and MEF2D-r, express only normal BLNK transcripts. Discussion Using a comprehensive transcriptome-based classification model and co-expression network analysis, we identified a newly defined CXCR4alt subtype with an incidence of 2% in B-ALL. We also observed that BLNKaf might serve as a practical marker for monitoring pre-BCR signaling. Our report emphasizes the role of transcriptome-based machine learning and WGCNA in mining the molecular mechanism of B-ALL. The molecular pathogenesis and clinical significance of these newly identified molecular subtypes and molecular abnormalities are worthy of further investigation. Disclosures No relevant conflicts of interest to declare.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Lu Liu ◽  
Shushan Zhang ◽  
Xiaofei Yao ◽  
Hongmei Gao ◽  
Zhihua Wang ◽  
...  

Evaluating earthquake-induced liquefaction of sands is important for engineers in seismic design. In this study, the random forest (RF) method is introduced and adopted to evaluate the seismic liquefaction potential of soils based on the shear wave velocity. The RF model was developed using the Andrus database as a training dataset comprising 225 sets of liquefaction performance and shear wave velocity measurements. Five training parameters are selected for the RF model: seismic magnitude (Mw), peak horizontal ground surface acceleration (amax), stress-corrected shear wave velocity of soil (Vs1), sandy-layer buried depth (ds), and a newly introduced parameter, stress ratio (k). In addition, the optimal hyperparameters for the random forest model, such as the number of classification trees, maximum depth of trees, and maximum number of features, are determined based on the minimum error rate for the out-of-bag dataset (ERROOB). The established random forest model was validated using the Kayen database as a testing dataset and compared with the Chinese code and the Andrus methods. The results indicated that the random forest method established based on the training dataset was credible. The random forest method gave a success rate for liquefied sites, and a total success rate for all cases, higher than 80%, which is acceptable. By contrast, the Chinese code method and the Andrus method gave a high success rate for liquefaction but a very low one for nonliquefaction, which leads to increased engineering cost. The developed RF model can provide a reference for engineers evaluating liquefaction potential.
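The out-of-bag error used here for hyperparameter selection relies on each tree being fit to a bootstrap sample, so the rows a tree never saw can act as its validation set. The following is a minimal sketch of that bookkeeping only, not the study's implementation: the helper name `bootstrap_split` and the seed are assumptions, and only the dataset size of 225 comes from the abstract.

```python
import random


def bootstrap_split(n_samples, rng):
    """Return (in_bag, out_of_bag) row indices for one bootstrap draw.

    in_bag is sampled with replacement, so it may contain duplicates;
    out_of_bag holds every row index that was never drawn.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
n_samples = 225  # size of the training database quoted in the abstract
in_bag, out_of_bag = bootstrap_split(n_samples, rng)

# On average about 1/e (roughly 37%) of rows are out-of-bag for a given tree.
oob_fraction = len(out_of_bag) / n_samples
```

In a full random forest, each row's out-of-bag prediction is the vote of only those trees for which the row was out-of-bag, and the out-of-bag error rate is the share of rows misclassified that way.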


CrystEngComm ◽  
2018 ◽  
Vol 20 (28) ◽  
pp. 3947-3950 ◽  
Author(s):  
Rajni M. Bhardwaj ◽  
Susan M. Reutzel-Edens ◽  
Blair F. Johnston ◽  
Alastair J. Florence

A random forest (RF) classification model obtained from physicochemical properties of solvents and crystal structures of olanzapine has for the first time enabled the prediction of 3-D crystal packings of solvates. A novel solvate was obtained by targeted crystallization from the solvent identified by RF model.


2021 ◽  
Author(s):  
Christian Thiele ◽  
Gerrit Hirschfeld ◽  
Ruth von Brachel

Abstract Registries of clinical trials are a potential source for scientometric analysis of medical research and serve important functions for the research community and the public at large. Clinical trials that recruit patients in Germany are usually registered in the German Clinical Trials Register (DRKS) or in international registries such as ClinicalTrials.gov. Furthermore, the International Clinical Trials Registry Platform (ICTRP) aggregates trials from multiple primary registries. We queried the DRKS, ClinicalTrials.gov, and the ICTRP for trials with a recruiting location in Germany. Trials that were registered in multiple registries were linked using the primary and secondary identifiers and a Random Forest model based on various similarity metrics. We identified 35,912 trials that were conducted in Germany. The majority of the trials were registered in multiple databases. 32,106 trials were linked using primary IDs, 26 were linked using a Random Forest model, and 10,537 internal duplicates on ICTRP were identified using the Random Forest model after finding pairs with matching primary or secondary IDs. In cross-validation on a manually labelled data set, the Random Forest increased the F1-score from 96.4% to 97.1% compared to a linkage based solely on secondary IDs. 28% of all trials were registered in the German DRKS. 54% of the trials on ClinicalTrials.gov, 43% of the trials on the DRKS and 56% of the trials on the ICTRP were pre-registered. The ratio of pre-registered studies and the ratio of studies that are registered in the DRKS increased over time.
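Linking records with a classifier usually means turning each candidate pair into a vector of similarity features and then scoring the predicted links against labelled ones. The sketch below illustrates both steps under stated assumptions: the field names `title` and `secondary_ids`, the two chosen metrics, and the sample records are hypothetical, since the abstract does not list the similarity metrics actually used.

```python
from difflib import SequenceMatcher


def pair_features(rec_a, rec_b):
    """Build similarity features for a candidate pair of registry records.

    The field names ("title", "secondary_ids") are hypothetical.
    """
    title_sim = SequenceMatcher(
        None, rec_a["title"].lower(), rec_b["title"].lower()
    ).ratio()
    ids_a, ids_b = set(rec_a["secondary_ids"]), set(rec_b["secondary_ids"])
    union = ids_a | ids_b
    id_jaccard = len(ids_a & ids_b) / len(union) if union else 0.0
    return [title_sim, id_jaccard]


def f1_score(true_links, predicted_links):
    """F1 = harmonic mean of precision and recall over sets of linked pairs."""
    tp = len(true_links & predicted_links)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted_links)
    recall = tp / len(true_links)
    return 2 * precision * recall / (precision + recall)


# Invented records and links, for illustration only.
a = {"title": "Trial of Drug X in Asthma", "secondary_ids": ["ABC-1"]}
b = {"title": "Trial of drug X in asthma", "secondary_ids": ["ABC-1", "XYZ-9"]}
features = pair_features(a, b)
score = f1_score({(1, 2), (3, 4), (5, 6)}, {(1, 2), (3, 4), (7, 8)})
```

Feature vectors of this kind would be the input rows for a classifier such as a random forest, with the manually labelled pairs providing the targets.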


2021 ◽  
Vol 10 (8) ◽  
pp. 503
Author(s):  
Hang Liu ◽  
Riken Homma ◽  
Qiang Liu ◽  
Congying Fang

The simulation of future land use can provide decision support for urban planners and decision makers, which is important for sustainable urban development. Using a cellular automata-random forest model, we considered two scenarios to predict intra-land use changes in Kumamoto City from 2018 to 2030: an unconstrained development scenario, and a planning-constrained development scenario that considers disaster-related factors. The random forest was used to calculate the transition probabilities and the importance of driving factors, and cellular automata were used for future land use prediction. The results show that disaster-related factors greatly influence land vacancy, while urban planning factors are more important for medium high-rise residential, commercial, and public facilities. Under the unconstrained development scenario, urban land use tends towards spatially disordered growth while the total amount grows steadily, with the largest increase in low-rise residential areas. Under the planning-constrained development scenario that considers disaster-related factors, the urban land area will continue to grow, albeit slowly and with a compact growth trend. This study provides planners with information on the relevant trends in different scenarios of land use change in Kumamoto City. Furthermore, it provides a reference for Kumamoto City’s future post-disaster recovery and reconstruction planning.
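One way to couple the two components described here is to let a classifier supply a per-cell transition probability and let the cellular automaton weight it by the state of neighbouring cells. The sketch below shows a single update step under assumptions that are not taken from the study: the binary land-use states, the 3x3 Moore neighbourhood, the multiplication rule, the threshold, and the `prob` grid standing in for random forest output are all illustrative choices.

```python
def ca_step(grid, transition_prob, threshold=0.5):
    """Apply one cellular-automaton update to a binary land-use grid.

    grid            -- 2D list, 1 = developed, 0 = undeveloped
    transition_prob -- 2D list of per-cell development probabilities
                       (placeholder for a classifier's output)
    A cell develops when probability * neighbourhood density >= threshold.
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                continue  # developed cells stay developed in this sketch
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            density = sum(neighbours) / len(neighbours)
            if transition_prob[r][c] * density >= threshold:
                new_grid[r][c] = 1
    return new_grid


# Toy 3x3 example with invented values.
grid = [[1, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]
prob = [[0.9, 0.9, 0.9],
        [0.9, 0.9, 0.2],
        [0.9, 0.2, 0.1]]
next_grid = ca_step(grid, prob, threshold=0.25)
```

The update reads only from the original `grid` and writes to a copy, so all cells change synchronously. A constrained scenario could be approximated by setting the probability to zero in cells where development is not permitted.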


2021 ◽  
pp. 100017
Author(s):  
Xinyu Dou ◽  
Cuijuan Liao ◽  
Hengqi Wang ◽  
Ying Huang ◽  
Ying Tu ◽  
...  

2021 ◽  
Vol 49 (3) ◽  
pp. 030006052199398
Author(s):  
Jinwu Peng ◽  
Zhili Duan ◽  
Yamin Guo ◽  
Xiaona Li ◽  
Xiaoqin Luo ◽  
...  

Objectives Liver echinococcosis is a severe zoonotic disease caused by Echinococcus (tapeworm) infection, which is epidemic in the Qinghai region of China. Here, we aimed to explore biomarkers and establish a predictive model for the diagnosis of liver echinococcosis. Methods Microarray profiling followed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analysis was performed in liver tissue from patients with liver hydatid disease and from healthy controls from the Qinghai region of China. A protein–protein interaction (PPI) network and random forest model were established to identify potential biomarkers and predict the occurrence of liver echinococcosis, respectively. Results Microarray profiling identified 1152 differentially expressed genes (DEGs), including 936 upregulated genes and 216 downregulated genes. Several previously unreported biological processes and signaling pathways were identified. The FCGR2B and CTLA4 proteins were identified by the PPI networks and random forest model. The random forest model based on FCGR2B and CTLA4 reliably predicted the occurrence of liver hydatid disease, with an area under the receiver operating characteristic curve of 0.921. Conclusion Our findings give new insight into gene expression in patients with liver echinococcosis from the Qinghai region of China, improving our understanding of hepatic hydatid disease.
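The area under the receiver operating characteristic curve reported here can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the labels and scores are invented for illustration and are unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative one; ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented labels and scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, which is fine for small cohorts; a rank-based formula gives the same value more efficiently on large ones.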

