Evolutionary drivers of the hump-shaped latitudinal gradient of benthic polychaete species richness along the Southeastern Pacific coast

PeerJ ◽

10.7717/peerj.12010 ◽

2021 ◽

Vol 9 ◽

pp. e12010

Author(s):

Rodrigo A. Moreno ◽

Fabio A. Labra ◽

Darko D. Cotoras ◽

Patricio A. Camus ◽

Dimitri Gutiérrez ◽

...

Keyword(s):

Species Richness ◽

Random Forest ◽

Random Forest Model ◽

Pairwise Distance ◽

Predictor Variables ◽

Salinity Range ◽

Shelf Area ◽

Forest Model ◽

Southeastern Pacific ◽

Latitudinal Range

Latitudinal diversity gradients (LDG) and their explanatory factors are among the most challenging topics in macroecology and biogeography. Despite the apparent generality of the LDG, a growing body of evidence shows that ‘anomalous’ LDG (i.e., inverse or hump-shaped trends) are common among marine organisms along the Southeastern Pacific (SEP) coast. Here, we evaluate the shape of the LDG of marine benthic polychaetes and its underlying causes using a dataset of 643 species inhabiting the continental shelf (<200 m depth), using latitudinal bands with a spatial resolution of 0.5°, along the SEP (3–56° S). The explanatory value of six oceanographic variables (Sea Surface Temperature (SST), SST range, salinity, salinity range, primary productivity and shelf area) and one macroecological proxy (median latitudinal range of species) was assessed using a random forest model. The taxonomic structure was used to estimate the degree of niche conservatism of predictor variables and to estimate latitudinal trends in phylogenetic diversity, based on three indices (phylogenetic richness (PDSES), mean pairwise distance (MPDSES), and variation of pairwise distances (VPD)). The LDG exhibits a hump-shaped trend, with a maximum peak of species richness at ca. 42° S, declining towards the northern and southern areas of the SEP. The latitudinal pattern was also evident in local samples controlled by sampling effort. The random forest model had a high accuracy (pseudo-r² = 0.95) and showed that the LDG could be explained by four variables (median latitudinal range, SST, salinity, and SST range), yet the functional relationship between species richness and these predictors was variable. A significant degree of phylogenetic conservatism was detected for the median latitudinal range and SST. PDSES increased toward the southern region, whereas VPD showed the opposite trend; both trends were statistically significant. MPDSES showed the same trend as PDSES, but it was not significant. 
Our results reinforce the idea that the southern Chile fjord area, particularly the Chiloé region, was likely the evolutionary source of new species of marine polychaetes along the SEP, creating a hotspot of diversity. Thus, just as the canonical LDG shows a decline in diversity moving away from the tropics, in this case the decline occurs moving away from Chiloé Island. These results, coupled with a strong phylogenetic signal of the main predictor variables, suggest that processes operating mainly at evolutionary timescales govern the LDG.
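As a rough illustration of two of the indices named in the abstract above, mean pairwise distance and variation of pairwise distances can be computed from the distances between the taxa present in one latitudinal band. The sketch below uses a small hypothetical distance table and hypothetical function names, and it assumes that "variation" means the population variance of the pairwise distances. It gives unstandardized values only; the standardized effect sizes (the SES suffix) would additionally need a null model, which is not shown.

```python
from itertools import combinations
from statistics import mean, pvariance

# Hypothetical taxonomic distances between four species in one band
# (illustrative values only, not taken from the study).
DIST = {
    ("a", "b"): 1.0,
    ("a", "c"): 2.0,
    ("a", "d"): 3.0,
    ("b", "c"): 2.0,
    ("b", "d"): 3.0,
    ("c", "d"): 3.0,
}


def pairwise_distances(species, dist):
    """Return the distance for every unordered pair of the given species."""
    # combinations() over a sorted list yields pairs in sorted order,
    # which matches the key order used in DIST.
    return [dist[pair] for pair in combinations(sorted(species), 2)]


def mpd(species, dist):
    """Mean pairwise distance among the species present."""
    return mean(pairwise_distances(species, dist))


def vpd(species, dist):
    """Variance of pairwise distances (assumed: population variance)."""
    return pvariance(pairwise_distances(species, dist))


band_species = ["a", "b", "c", "d"]
band_mpd = mpd(band_species, DIST)
band_vpd = vpd(band_species, DIST)
```

Repeating the calculation for each latitudinal band would give the kind of latitudinal trend in these indices that the abstract describes.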

P1511 ARTIFICIAL INTELLIGENCE (RANDOM FOREST) PREDICTIVE MODEL OF SURVIVAL IN HEMODIALYSIS. DATA FROM THE ANDALUSIAN REGISTRY OF RENAL DISEASES. SICATA

Nephrology Dialysis Transplantation ◽

10.1093/ndt/gfaa142.p1511 ◽

2020 ◽

Vol 35 (Supplement_3) ◽

Author(s):

Manuel Benítez Sánchez ◽

Guillermo Martín ◽

Luis Gil Sacaluga ◽

Maria Jose Garcia Cortes ◽

Sergio García Marcos ◽

...

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Roc Curve ◽

Explanatory Models ◽

Multivariate Model ◽

Random Forest Model ◽

Predictor Variables ◽

Multivariate Logistic Regression ◽

Analytical Technique ◽

Forest Model

Abstract Background and Aims Random Forest (RF) is an Artificial Intelligence (AI) analytical technique that consists of an ensemble of trees built by bootstrapping (resampling with replacement). At each node, a random subset of the predictor variables is selected and the best cut point among them is determined, so each split of the tree is based on a random sample of the predictors. The trees are grown as deep as possible. In the construction of each RF tree, part of the observations (approx. 37%) is not used. This out-of-bag (OOB) sample is used to obtain an honest estimate of the predictive capacity of the model, so a separate validation sample is not required. Each analysis fits a few hundred regression or classification trees, depending on whether the response variable is numerical or qualitative, respectively. The result is an average of the repeated predictions of the model (bagging). RF also allows the importance of the predictor variables to be calculated, which can later be used to select variables for a multivariate regression model. Method We analyzed 14,750 records from 2011 to 2014 contained in the Information System of the Autonomous Transplant Coordination of Andalusia (SICATA), a system that includes clinical-epidemiological variables on anemia, bone metabolism, adequacy of dialysis and vascular access. A total of 1,911 patients presented the event of interest (exitus). Three predictive and explanatory models of survival were developed: (1) RF; (2) multivariate logistic regression; (3) multivariate logistic regression that includes the important variables from the previous RF model. We compared them in terms of accuracy (AUC of the ROC curve). Results The AUC of the ROC curve of the multivariate model without prior RF was 0.75. The AUC of the ROC curve of the multivariate model with prior RF was 0.81. 
The AUC of the ROC curve of the Random Forest model was 0.98. Conclusion The Random Forest model achieved 98% discrimination of mortality in patients on hemodialysis, far superior to the classic multivariate analyses. The multivariate logistic regression performed with the important RF variables improves on the AUC of the previous model (0.81 vs. 0.75).
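The out-of-bag share of roughly 37% mentioned in the abstract above follows from drawing n records with replacement: the chance that a given record is never drawn is (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The sketch below checks this both analytically and by simulating one bootstrap sample. The function names and the fixed seed are assumptions made for illustration, and this is not the authors' code.

```python
import math
import random


def expected_oob_fraction(n):
    """Probability that one record is left out of a bootstrap sample of size n."""
    return (1 - 1 / n) ** n


def simulated_oob_fraction(n, seed=0):
    """Draw one bootstrap sample of size n and return the share of records never drawn."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n


# Record count reported in the abstract, used here only as an example size.
N_RECORDS = 14750
analytic = expected_oob_fraction(N_RECORDS)
simulated = simulated_oob_fraction(N_RECORDS)
limit = math.exp(-1)
```

Because each tree has its own out-of-bag records, predictions for those records can be scored without a held-out set, which is the reasoning behind the abstract's remark about validation.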

Learning Daily Activity Sequences of Population Groups using Random Forest Theory

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198118773197 ◽

2018 ◽

Vol 2672 (47) ◽

pp. 194-207 ◽

Cited By ~ 9

Author(s):

Mohammad Hesam Hafezi ◽

Lei Liu ◽

Hugh Millward

Keyword(s):

Random Forest ◽

Daily Activity ◽

Travel Demand ◽

Confusion Matrix ◽

Demographic Characteristics ◽

Random Forest Model ◽

Predictor Variables ◽

Demand Model ◽

Time Activity ◽

Forest Model

The choice of daily activity sequences differs between individuals based on their socio-demographic characteristics and their health and/or mobility status. The aim of this paper is to provide an improved methodology for learning and modeling the daily activity engagement patterns of individuals using a state-of-the-art machine learning algorithm. The dependencies between activity type, activity frequency, activity sequence, and socio-demographic characteristics of individuals are taken into account by employing a random forest model. In order to capture the heterogeneity and diversity among the predictor variables, we employed two different methods for split selection in the random forest algorithm: Classification and Regression Tree (CART) and curvature search. These two methods were examined under two different layer settings. In the first setting, the algorithm grows trees using all alternative predictor variables, whereas in the second setting the importance of the predictor variables is estimated and then the algorithm grows trees using only high-score predictor variables. The models were applied to time use data from the large Halifax Space-Time Activity Research (STAR) household travel diary survey. We evaluated the estimation accuracy of the proposed models using confusion matrix, transition matrix, and sequential alignment techniques. Results show that the random forest model with CART split selection using the first layer setting has the best accuracy in replicating activity agendas and activity sequences of individuals. The results of this paper are expected to be implemented within the activity-based travel demand model, Scheduler for Activities, Locations, and Travel (SALT).
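One of the evaluation tools named in the abstract above is the confusion matrix, which counts how often each actual class is predicted as each class. The sketch below is a minimal pure-Python version. The activity labels and the two short label sequences are hypothetical and are not drawn from the STAR survey.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes, in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of cases on the diagonal, i.e. predicted class equals actual class."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical activity types and labels, for illustration only.
LABELS = ["home", "work", "shop"]
ACTUAL = ["home", "work", "work", "shop", "home", "work"]
PREDICTED = ["home", "work", "shop", "shop", "home", "home"]

MATRIX = confusion_matrix(ACTUAL, PREDICTED, LABELS)
OVERALL_ACCURACY = accuracy(MATRIX)
```

Off-diagonal cells show which activity types are confused with one another, which a single accuracy figure hides.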

Spatial modeling of gully head erosion on the Loess Plateau using a certainty factor and random forest model

The Science of The Total Environment ◽

10.1016/j.scitotenv.2021.147040 ◽

2021 ◽

Vol 783 ◽

pp. 147040

Author(s):

Chengcheng Jiang ◽

Wen Fan ◽

Ningyu Yu ◽

Enlong Liu

Keyword(s):

Random Forest ◽

Loess Plateau ◽

Spatial Modeling ◽

Random Forest Model ◽

Certainty Factor ◽

The Loess Plateau ◽

Forest Model ◽

Gully Head

Clinical trial registries as Scientometric data: A novel solution for linking and deduplicating clinical trials from multiple registries

Scientometrics ◽

10.1007/s11192-021-04111-w ◽

2021 ◽

Author(s):

Christian Thiele ◽

Gerrit Hirschfeld ◽

Ruth von Brachel

Keyword(s):

Clinical Trials ◽

Random Forest ◽

Random Forest Model ◽

Scientometric Analysis ◽

Data Set ◽

The Public ◽

Forest Model ◽

Clinical Trial Registries ◽

Multiple Primary ◽

Clinical Trials Registry

Abstract Registries of clinical trials are a potential source for scientometric analysis of medical research and serve important functions for the research community and the public at large. Clinical trials that recruit patients in Germany are usually registered in the German Clinical Trials Register (DRKS) or in international registries such as ClinicalTrials.gov. Furthermore, the International Clinical Trials Registry Platform (ICTRP) aggregates trials from multiple primary registries. We queried the DRKS, ClinicalTrials.gov, and the ICTRP for trials with a recruiting location in Germany. Trials that were registered in multiple registries were linked using the primary and secondary identifiers and a Random Forest model based on various similarity metrics. We identified 35,912 trials that were conducted in Germany. The majority of the trials were registered in multiple databases. 32,106 trials were linked using primary IDs, 26 were linked using a Random Forest model, and 10,537 internal duplicates on ICTRP were identified using the Random Forest model after finding pairs with matching primary or secondary IDs. In cross-validation, the Random Forest increased the F1-score from 96.4% to 97.1% compared to a linkage based solely on secondary IDs on a manually labelled data set. 28% of all trials were registered in the German DRKS. 54% of the trials on ClinicalTrials.gov, 43% of the trials on the DRKS and 56% of the trials on the ICTRP were pre-registered. The ratio of pre-registered studies and the ratio of studies that are registered in the DRKS increased over time.
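The F1-score quoted in the abstract above is the harmonic mean of precision and recall. For record linkage, a true positive would be a pair of registry entries correctly linked, a false positive a wrongly linked pair, and a false negative a missed link. The sketch below shows the calculation with hypothetical counts. It does not reproduce the study's data, and the function name is an assumption.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    predicted = true_positives + false_positives
    relevant = true_positives + false_negatives
    if predicted == 0 or relevant == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / relevant
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical linkage outcome: 90 correct links, 5 wrong links, 10 missed links.
example_f1 = f1_score(true_positives=90, false_positives=5, false_negatives=10)
```

Because F1 ignores true negatives, it is a natural choice for linkage tasks, where the overwhelming majority of candidate pairs are non-matches.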

Discrimination of the geographic origins and varieties of wine grapes using high-throughput sequencing assisted by a random forest model

LWT ◽

10.1016/j.lwt.2021.111333 ◽

2021 ◽

pp. 111333

Author(s):

Feifei Gao ◽

Guihua Zeng ◽

Bin Wang ◽

Jing Xiao ◽

Liang Zhang ◽

...

Keyword(s):

Random Forest ◽

High Throughput ◽

High Throughput Sequencing ◽

Random Forest Model ◽

Wine Grapes ◽

Forest Model ◽

Geographic Origins

Multi-Scenario Prediction of Intra-Urban Land Use Change Using a Cellular Automata-Random Forest Model

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10080503 ◽

2021 ◽

Vol 10 (8) ◽

pp. 503

Author(s):

Hang Liu ◽

Riken Homma ◽

Qiang Liu ◽

Congying Fang

Keyword(s):

Land Use ◽

Random Forest ◽

Cellular Automata ◽

Land Use Change ◽

Urban Land ◽

Urban Land Use ◽

Random Forest Model ◽

Growth Trend ◽

Related Factors ◽

Forest Model

The simulation of future land use can provide decision support for urban planners and decision makers, which is important for sustainable urban development. Using a cellular automata-random forest model, we considered two scenarios to predict intra-urban land use changes in Kumamoto City from 2018 to 2030: an unconstrained development scenario, and a planning-constrained development scenario that considers disaster-related factors. The random forest was used to calculate the transition probabilities and the importance of driving factors, and cellular automata were used for future land use prediction. The results show that disaster-related factors greatly influence land vacancy, while urban planning factors are more important for medium high-rise residential, commercial, and public facilities. Under the unconstrained development scenario, urban land use tends towards spatially disordered growth while its total amount grows steadily, with the largest increase in low-rise residential areas. Under the planning-constrained development scenario that considers disaster-related factors, the urban land area will continue to grow, albeit slowly and with a compact growth trend. This study provides planners with information on the relevant trends in different scenarios of land use change in Kumamoto City. Furthermore, it provides a reference for Kumamoto City’s future post-disaster recovery and reconstruction planning.

Estimates of daily ground-level NO2 concentrations in China based on Random Forest model integrated K-means

Advances in Applied Energy ◽

10.1016/j.adapen.2021.100017 ◽

2021 ◽

pp. 100017

Author(s):

Xinyu Dou ◽

Cuijuan Liao ◽

Hengqi Wang ◽

Ying Huang ◽

Ying Tu ◽

...

Keyword(s):

Random Forest ◽

Ground Level ◽

Random Forest Model ◽

Forest Model

Improving satellite-based estimation of surface ozone across China during 2008–2019 using iterative random forest model and high-resolution grid meteorological data

Sustainable Cities and Society ◽

10.1016/j.scs.2021.102807 ◽

2021 ◽

pp. 102807

Author(s):

Gongbo Chen ◽

Jiang Chen ◽

Guang-hui Dong ◽

Bo-yi Yang ◽

Yisi Liu ◽

...

Keyword(s):

High Resolution ◽

Random Forest ◽

Meteorological Data ◽

Surface Ozone ◽

Random Forest Model ◽

Forest Model

Identification of candidate biomarkers of liver hydatid disease via microarray profiling, bioinformatics analysis, and machine learning

Journal of International Medical Research ◽

10.1177/0300060521993980 ◽

2021 ◽

Vol 49 (3) ◽

pp. 030006052199398

Author(s):

Jinwu Peng ◽

Zhili Duan ◽

Yamin Guo ◽

Xiaona Li ◽

Xiaoqin Luo ◽

...

Keyword(s):

Random Forest ◽

Hydatid Disease ◽

Characteristic Curve ◽

Receiver Operator Characteristic Curve ◽

Random Forest Model ◽

Hepatic Hydatid Disease ◽

Forest Model ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Microarray Profiling

Objectives Liver echinococcosis is a severe zoonotic disease caused by Echinococcus (tapeworm) infection, which is epidemic in the Qinghai region of China. Here, we aimed to explore biomarkers and establish a predictive model for the diagnosis of liver echinococcosis. Methods Microarray profiling followed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analysis was performed in liver tissue from patients with liver hydatid disease and from healthy controls from the Qinghai region of China. A protein–protein interaction (PPI) network and random forest model were established to identify potential biomarkers and predict the occurrence of liver echinococcosis, respectively. Results Microarray profiling identified 1152 differentially expressed genes (DEGs), including 936 upregulated genes and 216 downregulated genes. Several previously unreported biological processes and signaling pathways were identified. The FCGR2B and CTLA4 proteins were identified by the PPI networks and random forest model. The random forest model based on FCGR2B and CTLA4 reliably predicted the occurrence of liver hydatid disease, with an area under the receiver operator characteristic curve of 0.921. Conclusion Our findings give new insight into gene expression in patients with liver echinococcosis from the Qinghai region of China, improving our understanding of hepatic hydatid disease.
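The area under the receiver operator characteristic curve reported in the abstract above can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition. The labels and scores are hypothetical, and the function name is an assumption rather than anything from the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = disease, 0 = healthy control; scores are model outputs.
LABELS = [1, 1, 1, 0, 0, 0]
SCORES = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
example_auc = roc_auc(LABELS, SCORES)
```

The pairwise loop is quadratic in the number of samples, which is fine for small illustrations; rank-based formulas give the same value more efficiently on large data.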

MapReduce-Based Improved Random Forest Model for Massive Educational Data Processing and Classification

Mobile Networks and Applications ◽

10.1007/s11036-020-01699-w ◽

2021 ◽

Author(s):

Wei Xu ◽

Vinh Truong Hoang

Keyword(s):

Random Forest ◽

Data Processing ◽

Random Forest Model ◽

Forest Model