Data-Driven Wildfire Risk Prediction in Northern California

Atmosphere ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 109
Author(s):  
Ashima Malik ◽  
Megha Rajam Rao ◽  
Nandini Puppala ◽  
Prathusha Koouri ◽  
Venkata Anil Kumar Thota ◽  
...  

Over the years, rampant wildfires have plagued the state of California, creating economic and environmental loss. In 2018, wildfires cost nearly 800 million dollars in economic loss and claimed more than 100 lives in California. Over 1.6 million acres of land burned, causing extensive environmental damage. Although researchers have recently introduced machine learning models and algorithms for predicting wildfire risk, these results focused on specific perspectives and were restricted to a limited number of data parameters. In this paper, we propose two data-driven machine learning approaches based on random forest models to predict wildfire risk in areas near Monticello and Winters, California. This study demonstrates how the models were developed and applied with comprehensive data parameters such as powerlines, terrain, and vegetation, improving the spatial and temporal accuracy of predicting wildfire risk, including fire ignition. The combined model uses the spatial and temporal parameters as a single combined dataset to train and predict fire risk, whereas the ensemble model was fed separate parameters that were later stacked to work as a single model. Our experiments show that the combined model produced better accuracy than the ensemble of random forest models trained on separate spatial data. The models were validated with Receiver Operating Characteristic (ROC) curves, learning curves, and evaluation metrics such as accuracy, confusion matrices, and classification reports. The study achieved a state-of-the-art accuracy of 92% in predicting wildfire risk, including ignition, by utilizing regional spatial and temporal data along with standard data parameters in Northern California.
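
The abstract contrasts a single "combined" forest with an ensemble of forests trained on separate parameter groups. The scikit-learn sketch below illustrates that contrast under stated assumptions: the CSV file, the column names, and the spatial/temporal groupings are hypothetical placeholders, not the authors' data or code.

```python
# Minimal sketch (not the authors' implementation) of the two setups:
# (1) one random forest on the combined spatial + temporal features, and
# (2) one forest per feature group, stacked by a meta-learner.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("fire_risk_features.csv")           # hypothetical file
spatial = ["powerline_dist", "slope", "vegetation"]   # hypothetical columns
temporal = ["temperature", "humidity", "wind_speed"]  # hypothetical columns
X, y = df[spatial + temporal], df["fire_occurred"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# (1) Combined model: a single forest sees every parameter at once.
combined = RandomForestClassifier(n_estimators=500, random_state=0)
combined.fit(X_tr, y_tr)

# (2) Ensemble model: one forest per feature group, stacked into one model.
def group_forest(cols):
    return Pipeline([
        ("select", ColumnTransformer([("keep", "passthrough", cols)])),
        ("rf", RandomForestClassifier(n_estimators=500, random_state=0)),
    ])

stacked = StackingClassifier(
    estimators=[("spatial", group_forest(spatial)),
                ("temporal", group_forest(temporal))],
    final_estimator=LogisticRegression(),
)
stacked.fit(X_tr, y_tr)

for name, model in [("combined", combined), ("stacked", stacked)]:
    print(name)
    print(classification_report(y_te, model.predict(X_te)))
```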

2021 ◽  
Vol 5 (CHI PLAY) ◽  
pp. 1-29
Author(s):  
Alessandro Canossa ◽  
Dmitry Salimov ◽  
Ahmad Azadvar ◽  
Casper Harteveld ◽  
Georgios Yannakakis

Is it possible to detect toxicity in games just by observing in-game behavior? If so, what are the behavioral factors that will help machine learning to discover the unknown relationship between gameplay and toxic behavior? In this initial study, we examine whether it is possible to predict toxicity in the MOBA game For Honor by observing the in-game behavior of players that have been labeled as toxic (i.e., players that have been sanctioned by Ubisoft community managers). We test our hypothesis of detecting toxicity through gameplay on a dataset of almost 1,800 sanctioned players, comparing these sanctioned players with unsanctioned players. Sanctioned players are characterized by their toxic action type (offensive behavior vs. unfair advantage) and degree of severity (warned vs. banned). Our findings, based on supervised learning with random forests, suggest that it is not only possible to behaviorally distinguish sanctioned from unsanctioned players based on selected features of gameplay; it is also possible to predict both the sanction severity (warned vs. banned) and the sanction type (offensive behavior vs. unfair advantage). In particular, all random forest models predict toxicity, its severity, and its type with an accuracy of at least 82%, on average, on unseen players. This research shows that observing in-game behavior can support the work of community managers in moderating and possibly containing the burden of toxic behavior.
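
As a rough illustration of the supervised setup described above, the sketch below trains a random forest to separate sanctioned from unsanctioned players and estimates accuracy on unseen players via cross-validation. The file name and behavioral features are hypothetical assumptions, not Ubisoft telemetry or the authors' pipeline.

```python
# Minimal sketch: random forest on hypothetical per-player gameplay features,
# with cross-validation approximating accuracy on players unseen in training.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("for_honor_gameplay.csv")            # hypothetical export
features = ["matches_played", "quit_rate", "friendly_damage",
            "chat_reports", "kd_ratio"]                # hypothetical features
X, y = df[features], df["sanctioned"]                  # 1 = sanctioned

rf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                            random_state=0)
scores = cross_val_score(rf, X, y, cv=10, scoring="accuracy")
print(f"mean accuracy on unseen players: {scores.mean():.2f}")

# Which behaviors drive the prediction?
rf.fit(X, y)
for name, imp in sorted(zip(features, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```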


2021 ◽  
Author(s):  
Enzo Losi ◽  
Mauro Venturini ◽  
Lucrezia Manservigi ◽  
Giuseppe Fabio Ceschini ◽  
Giovanni Bechini ◽  
...  

Abstract A gas turbine trip is an unplanned shutdown, of which the most relevant consequences are business interruption and a reduction of the equipment's remaining useful life. Thus, understanding the underlying causes of gas turbine trips would allow their occurrence to be predicted in order to maximize gas turbine profitability and improve availability. In the ever-competitive Oil & Gas sector, data mining and machine learning are increasingly being employed to support deeper insight into and improved operation of gas turbines. Among the various machine learning tools, Random Forests are an ensemble learning method consisting of an aggregation of decision tree classifiers. This paper presents a novel methodology that exploits information embedded in the data and develops Random Forest models for predicting gas turbine trips based on information gathered during a timeframe of historical data acquired from multiple sensors. The novel approach exploits time series segmentation to increase the amount of training data, thus reducing overfitting. First, data are transformed according to a feature engineering methodology developed in a separate work by the same authors. Then, Random Forest models are trained and tested on unseen observations to demonstrate the benefits of the novel approach. The superiority of the novel approach is proved by considering two real-world case studies, involving field data taken during three years of operation of two fleets of Siemens gas turbines located in different regions. The novel methodology yields Precision, Recall, and Accuracy values in the range of 75–85%, thus demonstrating the industrial feasibility of the predictive methodology.
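
The sketch below illustrates the time series segmentation idea in general terms: overlapping windows of multi-sensor data become individual training examples for a random forest that flags windows preceding a trip. Sensor names, window lengths, the labeling rule, and the per-window summary features are assumptions; the authors' actual feature engineering is described in their separate work.

```python
# Minimal sketch (not the authors' or Siemens' implementation): segment the
# sensor time series into windows, summarise each window, and train a forest
# to predict whether a trip occurs within a short horizon after the window.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("gt_sensors.csv")                    # hypothetical sensor log
sensors = ["exhaust_temp", "vibration", "fuel_flow"]  # hypothetical sensors
WINDOW, HORIZON = 60, 30   # 60 samples per segment, predict trips 30 ahead

X_rows, y_rows = [], []
for start in range(0, len(df) - WINDOW - HORIZON):
    seg = df[sensors].iloc[start:start + WINDOW]
    # Simple per-segment features: mean and std of each sensor.
    X_rows.append(np.r_[seg.mean().values, seg.std().values])
    # Label = 1 if a trip occurs within the prediction horizon.
    y_rows.append(int(df["trip"].iloc[start + WINDOW:
                                      start + WINDOW + HORIZON].any()))

X, y = np.array(X_rows), np.array(y_rows)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)  # keep time order
rf = RandomForestClassifier(n_estimators=400, class_weight="balanced",
                            random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print("precision", precision_score(y_te, pred),
      "recall", recall_score(y_te, pred),
      "accuracy", accuracy_score(y_te, pred))
```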


2020 ◽  
Author(s):  
Liam Brierley ◽  
Anna Fowler

Abstract The COVID-19 pandemic has demonstrated the serious potential for novel zoonotic coronaviruses to emerge and cause major outbreaks. The immediate animal origin of the causative virus, SARS-CoV-2, remains unknown, a notoriously challenging task for emerging disease investigations. Coevolution with hosts leads to specific evolutionary signatures within viral genomes that can inform likely animal origins. We obtained a set of 650 spike protein and 511 whole genome nucleotide sequences from 225 and 187 viruses belonging to the family Coronaviridae, respectively. We then trained random forest models independently on genome composition biases of spike protein and whole genome sequences, including dinucleotide and codon usage biases, in order to predict animal host (of nine possible categories, including human). In hold-one-out cross-validation, predictive accuracy on unseen coronaviruses consistently reached ~73%, indicating that the evolutionary signal in spike proteins is just as informative as whole genome sequences. However, different composition biases were informative in each case. Applying optimised random forest models to classify human sequences of MERS-CoV and SARS-CoV revealed evolutionary signatures consistent with their recognised intermediate hosts (camelids, carnivores), while human sequences of SARS-CoV-2 were predicted as having bat hosts (suborder Yinpterochiroptera), supporting bats as the suspected origins of the current pandemic. In addition to phylogeny, variation in genome composition can act as an informative approach to predict emerging virus traits as soon as sequences are available. More widely, this work demonstrates the potential of combining genetic resources with machine learning algorithms to address long-standing challenges in emerging infectious diseases.
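
A minimal sketch of the general workflow, under stated assumptions: Biopython is available, host labels are encoded in hypothetical FASTA headers, and dinucleotide frequencies stand in for the paper's richer composition-bias feature set. A random forest is then scored with hold-one-out (leave-one-out) cross-validation.

```python
# Minimal sketch: dinucleotide composition features per coronavirus sequence,
# random forest host classification, hold-one-out cross-validation.
from collections import Counter
from itertools import product
import numpy as np
from Bio import SeqIO                                   # assumes Biopython
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]

def dinuc_freqs(seq):
    """Overlapping dinucleotide frequencies of one nucleotide sequence."""
    s = str(seq).upper()
    counts = Counter(s[i:i + 2] for i in range(len(s) - 1))
    total = sum(counts[d] for d in DINUCS) or 1
    return np.array([counts[d] / total for d in DINUCS])

# Hypothetical FASTA whose headers carry the host label, e.g. ">MG772933 host=bat".
records = list(SeqIO.parse("coronaviridae_spike.fasta", "fasta"))
X = np.array([dinuc_freqs(r.seq) for r in records])
y = np.array([r.description.split("host=")[-1] for r in records])

rf = RandomForestClassifier(n_estimators=1000, random_state=0)
acc = cross_val_score(rf, X, y, cv=LeaveOneOut(), scoring="accuracy")
print(f"hold-one-out accuracy: {acc.mean():.2f}")
```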


2022 ◽  
Vol 21 (1) ◽  
Author(s):  
Luca Boniardi ◽  
Federica Nobile ◽  
Massimo Stafoggia ◽  
Paola Michelozzi ◽  
Carla Ancona

Abstract Background Air pollution is one of the main concerns for the health of European citizens, and cities are currently striving to comply with EU air pollution regulation. The 2020 COVID-19 lockdown measures can be seen as an unintended but effective experiment to assess the impact of traffic restriction policies on air pollution. Our objective was to estimate the impact of the lockdown measures on NO2 concentrations and health in the two largest Italian cities. Methods NO2 concentration datasets were built using data from a 1-month citizen science monitoring campaign that took place in Milan and Rome just before the Italian lockdown period. Annual mean NO2 concentrations were estimated for a lockdown scenario (Scenario 1) and a scenario without lockdown (Scenario 2) by applying city-specific annual adjustment factors to the 1-month data. The latter were estimated from Air Quality Network station data using a machine learning approach. The NO2 spatial distribution was estimated at a neighbourhood scale by applying Land Use Random Forest models for the two scenarios. Finally, the health impact of the lockdown was estimated by subtracting the attributable deaths for Scenario 1 from those for Scenario 2, both estimated by applying a literature-based dose–response function with a counterfactual concentration of 10 μg/m3. Results The Land Use Random Forest models were able to capture 41–42% of the total NO2 variability. Passing from Scenario 2 (annual NO2 without lockdown) to Scenario 1 (annual NO2 with lockdown), the population-weighted exposure to NO2 for Milan and Rome decreased by 15.1% and 15.3%, respectively, on an annual basis. Considering the 10 μg/m3 counterfactual, prevented deaths were 213 and 604, respectively. Conclusions Our results show that the lockdown had a beneficial impact on air quality and human health. However, compliance with the current EU legal limit is not enough to avoid a high number of NO2-attributable deaths. This contribution reaffirms the potential of the citizen science approach and calls for more ambitious traffic calming policies and a re-evaluation of the legal annual limit value for NO2 for the protection of human health.
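
The health impact step can be illustrated with a short calculation: a log-linear dose–response function applied above the 10 μg/m3 counterfactual yields attributable deaths per scenario, and their difference gives prevented deaths. The relative risk, exposures, and baseline deaths below are placeholders, not the study's inputs.

```python
# Minimal sketch of the attributable-deaths calculation (illustrative numbers).
import numpy as np

RR_PER_10 = 1.05           # assumed relative risk per 10 μg/m3 NO2 (placeholder)
beta = np.log(RR_PER_10) / 10.0
COUNTERFACTUAL = 10.0       # μg/m3

def attributable_deaths(no2_by_area, baseline_deaths_by_area):
    """Deaths attributable to NO2 exposure above the counterfactual."""
    excess = np.maximum(np.asarray(no2_by_area) - COUNTERFACTUAL, 0.0)
    att_frac = 1.0 - np.exp(-beta * excess)            # attributable fraction
    return float(np.sum(att_frac * np.asarray(baseline_deaths_by_area)))

# Hypothetical neighbourhood-level annual exposures (μg/m3) and baseline deaths.
no2_no_lockdown = [38.0, 45.0, 52.0]    # Scenario 2
no2_lockdown    = [32.0, 38.0, 44.0]    # Scenario 1
deaths          = [900.0, 1200.0, 800.0]

prevented = (attributable_deaths(no2_no_lockdown, deaths)
             - attributable_deaths(no2_lockdown, deaths))
print(f"deaths prevented by lockdown: {prevented:.0f}")
```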


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Elizabeth Harrison ◽  
Sana Syed ◽  
Lubaina Ehsan ◽  
Najeeha T. Iqbal ◽  
Kamran Sadiq ◽  
...  

Abstract Background Stunting affects up to one-third of the children in low-to-middle-income countries (LMICs) and has been correlated with declines in cognitive capacity and vaccine immunogenicity. Early identification of infants at risk is critical for early intervention and prevention of morbidity. The aim of this study was to investigate patterns of growth in infants up through 48 months of age, to assess whether the growth of infants with stunting eventually improved, and to identify potential predictors of growth. Methods Height-for-age z-scores (HAZ) of children from Matiari (a rural site in Pakistan) at birth, 18 months, and 48 months were obtained. Results of serum-based biomarkers collected at 6 and 9 months were recorded. A descriptive analysis of the population was followed by assessment of growth predictors via traditional machine learning random forest models. Results Of the 107 children who were followed up to 48 months of age, 51% were stunted (HAZ < −2) at birth, which increased to 54% by 48 months of age. Stunting status for the majority of children at 48 months was found to be the same as at 18 months. Most children with large gains started off stunted or severely stunted, while all of those with notably large losses were not stunted at birth. Random forest models identified HAZ at birth as the most important feature in predicting HAZ at 18 months. Of the biomarkers, AGP (alpha-1-acid glycoprotein), CRP (C-reactive protein), and IL1 (interleukin-1) were identified as strong predictors of subsequent growth across both the classification and regression models. Conclusion We demonstrated that most children with stunting at birth remained stunted at 48 months of age. Traditional machine learning random forest models added value in predicting growth outcomes. HAZ at birth was found to be a strong predictor of subsequent growth in infants up through 48 months of age. Biomarkers of systemic inflammation (AGP, CRP, IL1) were also strong predictors of growth outcomes. These findings provide support for continued focus on interventions prenatally, at birth, and in early infancy in children at risk for stunting who live in resource-constrained regions of the world.
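
A minimal sketch of the modeling step, with hypothetical column names standing in for the study variables: a random forest regressor for 18-month HAZ and a classifier for stunting status, each reporting feature importances so that birth HAZ and the inflammation biomarkers can be ranked.

```python
# Minimal sketch (hypothetical data file and columns, not the Matiari cohort).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("matiari_growth.csv")               # hypothetical file
features = ["haz_birth", "agp", "crp", "il1"]         # hypothetical columns
X = df[features]

# Regressor: continuous HAZ at 18 months.
reg = RandomForestRegressor(n_estimators=500, random_state=0)
reg.fit(X, df["haz_18m"])

# Classifier: stunted (HAZ < -2) vs not stunted at 18 months.
y_stunted = (df["haz_18m"] < -2).astype(int)
clf = RandomForestClassifier(n_estimators=500, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y_stunted, cv=5).mean())

clf.fit(X, y_stunted)
for name, r_imp, c_imp in zip(features, reg.feature_importances_,
                              clf.feature_importances_):
    print(f"{name}: regressor {r_imp:.2f}, classifier {c_imp:.2f}")
```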


2022 ◽  
Vol 305 ◽  
pp. 117916
Author(s):  
Yifan Wen ◽  
Ruoxi Wu ◽  
Zihang Zhou ◽  
Shaojun Zhang ◽  
Shengge Yang ◽  
...  

2020 ◽  
Vol 13 (Suppl_1) ◽  
Author(s):  
Hsin-Fang Li

Background: In the 2018 AHA/ACC Blood Cholesterol Guideline, it is recommended that ASCVD patients be classified as very high-risk (VHR) vs not-VHR (NVHR) to guide treatment decisions. This has important implications for ezetimibe and PCSK9 inhibitor eligibility. We aimed to develop a tool that could assist in more easily identifying VHR patients based on machine learning (ML) techniques. This approach offers a powerful, assumption-free alternative to conventional methods, such as logistic regression, for identifying potential interactions among risk factors while incorporating the hierarchy of interaction among variables. Method: We used EHR-derived ICD-10 codes to identify patients within our health system with ASCVD. VHR was defined by ≥2 major ASCVD events (ACS ≤12 months, history of MI >12 months, ischemic stroke, or symptomatic PAD) or 1 major ASCVD event and ≥2 high-risk conditions (age ≥65, diabetes, hypertension, smoking, heterozygous familial hypercholesterolemia, CKD, CHF, persistently elevated LDL-C ≥100 mg/dl, or prior CABG/PCI). Patients not meeting these criteria were classified as NVHR. We randomly assigned patients to a training set and a testing set. Classification and regression tree (CART) modeling was performed on the training set and validated on the testing set. The results were compared with a random forest model. Variables in both models included age, sex, race, ethnicity, and each of the VHR criteria above. The primary outcome for both models was VHR classification. Performance of the two models was compared using area under the curve (AUC). Result: A total of 180,669 ASCVD patients were identified in 2018: 104,123 (58%) were VHR and 76,546 (42%) were NVHR. Mean age was 73.1±11.9 years (55% male) in the VHR group and 70.1±13.4 years (54% male) in the NVHR group. Half the population was randomly selected as the training dataset (n=90,334) and the other half was used as the testing dataset (n=90,335). Both CART and random forest models identified recent ACS, ischemic stroke, hypertension, PAD, and history of MI as the top five predictors of VHR status. Ninety-six percent of patients with recent ACS were classified as VHR. Among patients with no recent ACS, 95% were classified as VHR if they had a stroke and hypertension. Among patients with no ACS or stroke, 89% were classified as VHR if they had PAD. Finally, among patients with no ACS, stroke, or PAD, 90% were classified as VHR if they had a history of MI. The misclassification rate of the CART model on the testing set was 4.3%. The AUC for the CART and random forest models was 0.949 and 0.968, respectively. Conclusion: Both ML methods were highly predictive of VHR status among those with ASCVD. Use of this approach affords a simplified means to drive clinical decision making at the point of care.
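
The CART-versus-random-forest comparison can be sketched as follows, with a hypothetical cohort file and feature flags standing in for the EHR-derived variables; AUC on a held-out half of the data mirrors the reported evaluation.

```python
# Minimal sketch (hypothetical columns, not the actual EHR cohort): a single
# classification tree (CART) versus a random forest predicting VHR status,
# compared by area under the ROC curve on a held-out test set.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("ascvd_cohort.csv")                      # hypothetical file
features = ["recent_acs", "ischemic_stroke", "hypertension",
            "pad", "history_mi", "age", "diabetes"]        # hypothetical flags
X, y = df[features], df["vhr"]                             # 1 = VHR, 0 = NVHR
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                          random_state=0)  # 50/50 split

cart = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

for name, model in [("CART", cart), ("random forest", rf)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name} AUC: {auc:.3f}")
```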


2020 ◽  
Vol 31 (6) ◽  
pp. 1018-1024.e4 ◽  
Author(s):  
Ishan Sinha ◽  
Dilum P. Aluthge ◽  
Elizabeth S. Chen ◽  
Indra Neil Sarkar ◽  
Sun Ho Ahn

2021 ◽  
Vol 17 (4) ◽  
pp. e1009149
Author(s):  
Liam Brierley ◽  
Anna Fowler

The COVID-19 pandemic has demonstrated the serious potential for novel zoonotic coronaviruses to emerge and cause major outbreaks. The immediate animal origin of the causative virus, SARS-CoV-2, remains unknown, a notoriously challenging task for emerging disease investigations. Coevolution with hosts leads to specific evolutionary signatures within viral genomes that can inform likely animal origins. We obtained a set of 650 spike protein and 511 whole genome nucleotide sequences from 222 and 185 viruses belonging to the family Coronaviridae, respectively. We then trained random forest models independently on genome composition biases of spike protein and whole genome sequences, including dinucleotide and codon usage biases, in order to predict animal host (of nine possible categories, including human). In hold-one-out cross-validation, predictive accuracy on unseen coronaviruses consistently reached ~73%, indicating that the evolutionary signal in spike proteins is just as informative as whole genome sequences. However, different composition biases were informative in each case. Applying optimised random forest models to classify human sequences of MERS-CoV and SARS-CoV revealed evolutionary signatures consistent with their recognised intermediate hosts (camelids, carnivores), while human sequences of SARS-CoV-2 were predicted as having bat hosts (suborder Yinpterochiroptera), supporting bats as the suspected origins of the current pandemic. In addition to phylogeny, variation in genome composition can act as an informative approach to predict emerging virus traits as soon as sequences are available. More widely, this work demonstrates the potential of combining genetic resources with machine learning algorithms to address long-standing challenges in emerging infectious diseases.
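
Codon usage is the other composition bias named in this abstract (the dinucleotide features were sketched after the preprint version of this record above). The snippet below computes simple per-sequence codon frequencies from hypothetical coding sequences; the published work may use a different normalization (e.g., RSCU), so treat this purely as an illustration of the feature family.

```python
# Minimal sketch: codon-usage frequency features from coding sequences,
# suitable as an input matrix for a random forest host classifier.
from collections import Counter
from itertools import product
import numpy as np
from Bio import SeqIO                         # assumes Biopython

CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

def codon_freqs(seq):
    """Frequencies of the 64 codons in one coding nucleotide sequence."""
    s = str(seq).upper()
    counts = Counter(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    total = sum(counts[c] for c in CODONS) or 1
    return np.array([counts[c] / total for c in CODONS])

# Hypothetical FASTA of spike coding sequences.
records = list(SeqIO.parse("coronaviridae_spike_cds.fasta", "fasta"))
X_codon = np.array([codon_freqs(r.seq) for r in records])
print(X_codon.shape)    # (n_sequences, 64) feature matrix for the forest
```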


10.2196/23948 ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. e23948
Author(s):  
Yuanfang Chen ◽  
Liu Ouyang ◽  
Forrest S Bao ◽  
Qian Li ◽  
Lei Han ◽  
...  

Background Effectively and efficiently diagnosing patients with COVID-19 and accurately determining the clinical type of the disease is essential to achieve optimal outcomes for the patients as well as to reduce the risk of overloading the health care system. Currently, severe and nonsevere COVID-19 types are differentiated by only a few features, which do not comprehensively characterize the complicated pathological, physiological, and immunological responses to SARS-CoV-2 infection in the different disease types. In addition, these type-defining features may not be readily testable at the time of diagnosis. Objective In this study, we aimed to use a machine learning approach to understand COVID-19 more comprehensively, accurately differentiate severe and nonsevere COVID-19 clinical types based on multiple medical features, and provide reliable predictions of the clinical type of the disease. Methods For this study, we recruited 214 confirmed patients with nonsevere COVID-19 and 148 patients with severe COVID-19. The clinical characteristics (26 features) and laboratory test results (26 features) upon admission were acquired as two input modalities. Exploratory analyses demonstrated that these features differed substantially between the two clinical types. Machine learning random forest models based on all the features in each modality, as well as on the top 5 features in each modality combined, were developed and validated to differentiate COVID-19 clinical types. Results Using clinical and laboratory results independently as input, the random forest models achieved >90% and >95% predictive accuracy, respectively. The importance scores of the input features were further evaluated, and the top 5 features from each modality were identified (age, hypertension, cardiovascular disease, gender, and diabetes for the clinical features modality; and dimerized plasmin fragment D, high sensitivity troponin I, absolute neutrophil count, interleukin 6, and lactate dehydrogenase for the laboratory testing modality, in descending order). Using these top 10 multimodal features as the only input instead of all 52 features combined, the random forest model was able to achieve 97% predictive accuracy. Conclusions Our findings shed light on how the human body reacts to SARS-CoV-2 infection as a unit and provide insights on effectively evaluating the disease severity of patients with COVID-19 based on more common medical features when gold standard features are not available. We suggest that clinical information can be used as an initial screening tool for self-evaluation and triage, while laboratory test results should be applied when accuracy is the priority.
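
The two-modality workflow described above can be sketched as follows: one random forest per modality, feature ranking by importance, then a retrained forest on the top five features from each modality combined. The file name, column prefixes, and outcome column are hypothetical placeholders, not the study's data.

```python
# Minimal sketch: per-modality random forests, importance-based top-5
# selection, and a combined top-10 model, each scored by cross-validation.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("covid_admissions.csv")          # hypothetical file
# Hypothetical column groups standing in for the 26 clinical and 26 lab features.
clinical = [c for c in df.columns if c.startswith("clin_")]
laboratory = [c for c in df.columns if c.startswith("lab_")]
y = df["severe"]                                   # 1 = severe, 0 = nonsevere

def top_features(cols, k=5):
    """Rank one modality's features by random forest importance."""
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(df[cols], y)
    ranked = sorted(zip(cols, rf.feature_importances_), key=lambda t: -t[1])
    return [name for name, _ in ranked[:k]]

top10 = top_features(clinical) + top_features(laboratory)

for name, cols in [("clinical", clinical), ("laboratory", laboratory),
                   ("top 10 combined", top10)]:
    acc = cross_val_score(
        RandomForestClassifier(n_estimators=500, random_state=0),
        df[cols], y, cv=5).mean()
    print(f"{name}: accuracy {acc:.2f}")
```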

