Combining antibody markers for serosurveillance of SARS-CoV-2 to estimate seroprevalence and time-since-infection

SummarySerosurveillance is an important epidemiologic tool for SARS-CoV-2, used to estimate burden of disease and degree of population immunity. Which antibody biomarker, and the optimal number of biomarkers, has not been well-established, especially with the emerging rollout of vaccines globally. Here, we used random forest models to demonstrate that a single spike or receptor-binding domain (RBD) antibody was adequate for classifying prior infection, while a combination of two antibody biomarkers performed better than any single marker for estimating time-since-infection. Nucleocapsid antibodies performed worse than spike or RBD antibodies for classification, but is of utility for estimating time-since-infection, and in distinguishing infection-induced from vaccine-induced responses. Our analysis has the potential to inform the design of serosurveys for SARS-CoV-2, including decisions regarding number of antibody biomarkers measured.

Download Full-text

Data-Driven Wildfire Risk Prediction in Northern California

Atmosphere ◽

10.3390/atmos12010109 ◽

2021 ◽

Vol 12 (1) ◽

pp. 109

Author(s):

Ashima Malik ◽

Megha Rajam Rao ◽

Nandini Puppala ◽

Prathusha Koouri ◽

Venkata Anil Kumar Thota ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Curves ◽

Data Driven ◽

Northern California ◽

Combined Model ◽

Wildfire Risk ◽

Study Results ◽

Forest Models ◽

Random Forest Models

Over the years, rampant wildfires have plagued the state of California, creating economic and environmental loss. In 2018, wildfires cost nearly 800 million dollars in economic loss and claimed more than 100 lives in California. Over 1.6 million acres of land has burned and caused large sums of environmental damage. Although, recently, researchers have introduced machine learning models and algorithms in predicting the wildfire risks, these results focused on special perspectives and were restricted to a limited number of data parameters. In this paper, we have proposed two data-driven machine learning approaches based on random forest models to predict the wildfire risk at areas near Monticello and Winters, California. This study demonstrated how the models were developed and applied with comprehensive data parameters such as powerlines, terrain, and vegetation in different perspectives that improved the spatial and temporal accuracy in predicting the risk of wildfire including fire ignition. The combined model uses the spatial and the temporal parameters as a single combined dataset to train and predict the fire risk, whereas the ensemble model was fed separate parameters that were later stacked to work as a single model. Our experiment shows that the combined model produced better results compared to the ensemble of random forest models on separate spatial data in terms of accuracy. The models were validated with Receiver Operating Characteristic (ROC) curves, learning curves, and evaluation metrics such as: accuracy, confusion matrices, and classification report. The study results showed and achieved cutting-edge accuracy of 92% in predicting the wildfire risks, including ignition by utilizing the regional spatial and temporal data along with standard data parameters in Northern California.

Download Full-text

Incorporating space and time into random forest models for analyzing geospatial patterns of drug-related crime incidents in a major U.S. metropolitan area

Computers Environment and Urban Systems ◽

10.1016/j.compenvurbsys.2021.101599 ◽

2021 ◽

Vol 87 ◽

pp. 101599

Author(s):

Zhiyue Xia ◽

Kathleen Stewart ◽

Junchuan Fan

Keyword(s):

Random Forest ◽

Metropolitan Area ◽

Space And Time ◽

Forest Models ◽

Random Forest Models

Download Full-text

Landslide susceptibility assessment for a transmission line in Gansu Province, China by using a hybrid approach of fractal theory, information value, and random forest models

Environmental Earth Sciences ◽

10.1007/s12665-021-09737-w ◽

2021 ◽

Vol 80 (12) ◽

Author(s):

Binbin Zhao ◽

Yunfeng Ge ◽

Hongzhi Chen

Keyword(s):

Random Forest ◽

Landslide Susceptibility ◽

Fractal Theory ◽

Hybrid Approach ◽

Gansu Province ◽

Information Value ◽

Susceptibility Assessment ◽

Landslide Susceptibility Assessment ◽

Forest Models ◽

Random Forest Models

Download Full-text

Random forest models of 305-days milk yield for Holstein cows in Bulgaria

10.1063/5.0034778 ◽

2020 ◽

Author(s):

A. Yordanova ◽

H. Kulina

Keyword(s):

Random Forest ◽

Milk Yield ◽

Holstein Cows ◽

Forest Models ◽

Random Forest Models

Download Full-text

Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2012040103 ◽

2012 ◽

Vol 8 (2) ◽

pp. 44-63 ◽

Cited By ~ 30

Author(s):

Baoxun Xu ◽

Joshua Zhexue Huang ◽

Graham Williams ◽

Qiang Wang ◽

Yunming Ye

Keyword(s):

Random Forest ◽

High Dimensional Data ◽

Real Life ◽

Classification Performance ◽

Feature Weighting ◽

Random Forest Model ◽

High Dimensional ◽

Forest Model ◽

Forest Models ◽

Random Forest Models

The selection of feature subspaces for growing decision trees is a key step in building random forest models. However, the common approach using randomly sampling a few features in the subspace is not suitable for high dimensional data consisting of thousands of features, because such data often contains many features which are uninformative to classification, and the random sampling often doesn’t include informative features in the selected subspaces. Consequently, classification performance of the random forest model is significantly affected. In this paper, the authors propose an improved random forest method which uses a novel feature weighting method for subspace selection and therefore enhances classification performance over high-dimensional data. A series of experiments on 9 real life high dimensional datasets demonstrated that using a subspace size of features where M is the total number of features in the dataset, our random forest model significantly outperforms existing random forest models.

Download Full-text

Gully erosion zonation mapping using integrated geographically weighted regression with certainty factor and random forest models in GIS

Journal of Environmental Management ◽

10.1016/j.jenvman.2018.11.110 ◽

2019 ◽

Vol 232 ◽

pp. 928-942 ◽

Cited By ~ 46

Author(s):

Alireza Arabameri ◽

Biswajeet Pradhan ◽

Khalil Rezaei

Keyword(s):

Random Forest ◽

Geographically Weighted Regression ◽

Gully Erosion ◽

Weighted Regression ◽

Certainty Factor ◽

Forest Models ◽

Random Forest Models

Download Full-text

Forest floor temperature and greenness link significantly to canopy attributes in South Africa’s fragmented coastal forests

10.7287/peerj.preprints.27168 ◽

2018 ◽

Author(s):

Marion Pfeifer ◽

Michael JW Boyle ◽

Stuart Dunning ◽

Pieter Olivier

Keyword(s):

Land Use ◽

Food Security ◽

Surface Temperature ◽

Habitat Quality ◽

Ground Surface ◽

Quality Metrics ◽

Ground Vegetation ◽

Ground Surface Temperature ◽

Forest Models ◽

Random Forest Models

Tropical landscapes are changing rapidly due to changes in land use and land management. Being able to predict and monitor land use change impacts on species for conservation or food security concerns requires the use of habitat quality metrics, that are consistent, can be mapped using above - ground sensor data and are relevant for species performance. Here, we focus on ground surface temperature (Thermalground) and ground vegetation greenness (NDVIdown) as potentially suitable metrics of habitat quality. We measure both across habitats differing in tree cover (natural grassland to forest edges to forests and tree plantations) in the human-modified coastal forested landscapes of Kwa-Zulua Natal, South Africa. We show that both habitat quality metrics decline linearly as a function of increasing canopy closure (FCover, %) and canopy leaf area index (LAI). Opening canopies by about 20% or reducing canopy leaf area by 1% would result in an increase of temperatures on the ground by more than 1°C, and an increase in ground vegetation greenness by 0.2 and 0.14 respectively. Upscaling LAI and FCover to develop maps from Landsat imagery using random forest models allowed us to map Thermalground and NDVIdown using the linear relationships. However, map accuracy was constrained by the predictive capacity of the random forest models predicting canopy attributes and the linear models linking canopy attributes to the habitat quality metrics. Accounting for micro-scale variation in temperature is seen as essential to improve biodiversity impact predictions. Our upscaling approach suggests that mapping ground surface temperature based on radiation and vegetation properties might be possible, and that canopy cover maps could provide a useful tool for mapping habitat quality metrics that matter to species. However, we need to increase sampling of surface temperature spatially and temporally to improve and validate upscaled models. We also need to link surface temperature maps to demographic traits of species of different threat status or functions in landscapes with different disturbance and management histories testing for generalities in relationships. The derived understanding could then be exploited for targeted landscape restoration that benefits biodiversity conservation and food security sustainably at the landscape scale.

Download Full-text

Data Mining Crystallization Kinetics

10.26434/chemrxiv.11708286 ◽

2020 ◽

Author(s):

Cameron Brown ◽

Diego Maldonado ◽

Antony Vassileiou ◽

Blair Johnston ◽

Alastair Florence

Keyword(s):

Random Forest ◽

Kinetic Parameters ◽

Crystallization Kinetics ◽

Balance Model ◽

Forest Models ◽

Vast Literature ◽

Random Forest Models ◽

Kinetic Expression ◽

Population Balances ◽

Different Sources

<p>Population balance model is a valuable modelling tool which facilitates the optimization and understanding of crystallization processes. However, in order to use this tool, it is necessary to have previous knowledge of the crystallization kinetics, specifically crystal growth and nucleation. The majority of approaches to achieve proper estimations of kinetic parameters required experimental data. Across time, a vast literature about the estimation of kinetic parameters and population balances have been published. Considering the availability of data, this work built a database with information on solute, solvent, kinetic expression, parameters, crystallization method and seeding. Correlations were assessed and clusters structures identified by hierarchical clustering analysis. The final database contains 336 data of kinetic parameters from 185 different sources. The data were analysed using kinetic parameters of the most common expressions. Subsequently, clusters were identified for each kinetic model. With these clusters, classification random forest models were made using solute descriptors, seeding, solvent, and crystallization methods as classifiers. Random forest models had an overall classification accuracy higher than 70% whereby they were useful to provide rough estimates of kinetic parameters, although these methods have some limitations.</p>

Download Full-text

Identification of SARS-CoV-2 P.1-related lineages in Brazil provides new insights about the mechanisms of emergence of Variants of Concern

10.21203/rs.3.rs-580195/v1 ◽

2021 ◽

Author(s):

Tiago Gräf ◽

Gonzalo Bello ◽

Taina Moreira Martins Venas ◽

Elisa Cavalcante Pereira ◽

Anna Carolina Dias Paixão ◽

...

Keyword(s):

Human Population ◽

Infected Individual ◽

Lag Time ◽

Receptor Binding Domain ◽

Evolutionary Pattern ◽

Population Immunity ◽

Sequential Addition ◽

Specific Factors ◽

Amazonas State

Abstract One of the most remarkable features of the SARS-CoV-2 Variants of Concern (VOC) is the unusually large number of mutations they carry. However, the specific factors that drove the emergence of such variants since the second half of 2020 are not fully resolved. In this study, we described a new SARS-CoV-2 lineage provisionally designated as P.1-like-II that, as well as the previously described lineage P.1-like-I, shares several lineage-defining mutations with the VOC P.1 circulating in Brazil. Reconstructions of P.1 ancestor sequences demonstrate that the entire constellation of mutations that define the VOC P.1 did not accumulate within a single long-term infected individual, but was acquired by sequential addition during interhost transmissions. Our evolutionary analyses further estimate that P.1-ancestors strains carrying half of the P.1-lineage-defining mutations, including those at the receptor-binding domain of the Spike protein, circulated cryptically in the Amazonas state since August 2020. This evolutionary pattern is consistent with the hypothesis that partial human population immunity acquired from natural SARS-CoV-2 infections during the first half of 2020 might have been the major driving force behind natural selection that allowed VOCs' emergence and worldwide spread. These findings also support a long lag-time between the emergence of variants with key mutations of concern and expansion of the VOC P.1 in Brazil.

Download Full-text

Applicability of an Automated Model and Parameter Selection in the Prediction of Screening-Level PTSD in Danish Soldiers Following Deployment: Development Study of Transferable Predictive Models Using Automated Machine Learning (Preprint)

10.2196/preprints.17119 ◽

2019 ◽

Author(s):

Karen-Inge Karstoft ◽

Ioannis Tsamardinos ◽

Kasper Eskelund ◽

Søren Bo Andersen ◽

Lars Ravnborg Nissen

Keyword(s):

Machine Learning ◽

Operating Characteristic ◽

Linear Models ◽

Prediction Models ◽

Characteristic Curve ◽

Ptsd Symptoms ◽

Forest Models ◽

Random Forest Models ◽

Automated Machine Learning ◽

Military Rank

BACKGROUND Posttraumatic stress disorder (PTSD) is a relatively common consequence of deployment to war zones. Early postdeployment screening with the aim of identifying those at risk for PTSD in the years following deployment will help deliver interventions to those in need but have so far proved unsuccessful. OBJECTIVE This study aimed to test the applicability of automated model selection and the ability of automated machine learning prediction models to transfer across cohorts and predict screening-level PTSD 2.5 years and 6.5 years after deployment. METHODS Automated machine learning was applied to data routinely collected 6-8 months after return from deployment from 3 different cohorts of Danish soldiers deployed to Afghanistan in 2009 (cohort 1, N=287 or N=261 depending on the timing of the outcome assessment), 2010 (cohort 2, N=352), and 2013 (cohort 3, N=232). RESULTS Models transferred well between cohorts. For screening-level PTSD 2.5 and 6.5 years after deployment, random forest models provided the highest accuracy as measured by area under the receiver operating characteristic curve (AUC): 2.5 years, AUC=0.77, 95% CI 0.71-0.83; 6.5 years, AUC=0.78, 95% CI 0.73-0.83. Linear models performed equally well. Military rank, hyperarousal symptoms, and total level of PTSD symptoms were highly predictive. CONCLUSIONS Automated machine learning provided validated models that can be readily implemented in future deployment cohorts in the Danish Defense with the aim of targeting postdeployment support interventions to those at highest risk for developing PTSD, provided the cohorts are deployed on similar missions.

Download Full-text