Skill of large-scale seasonal drought impact forecasts

2020 ◽  
Vol 20 (6) ◽  
pp. 1595-1608
Author(s):  
Samuel J. Sutanto ◽  
Melati van der Weert ◽  
Veit Blauhut ◽  
Henny A. J. Van Lanen

Abstract. Forecasting of drought impacts is still lacking in drought early-warning systems (DEWSs), which presently do not go beyond hazard forecasting. Therefore, we developed drought impact functions using machine learning approaches (logistic regression and random forest) to predict drought impacts with lead times of up to 7 months. The observed and forecasted hydrometeorological drought hazards – such as the standardized precipitation index (SPI), standardized precipitation evapotranspiration index (SPEI), and standardized runoff index (SRI) – were obtained from the EU-funded Enhancing Emergency Management and Response to Extreme Weather and Climate Events (ANYWHERE) DEWS. Reported drought impact data, taken from the European Drought Impact Report Inventory (EDII), were used to develop and validate the drought impact functions. The skill of the drought impact functions in forecasting drought impacts was evaluated using the Brier skill score and relative operating characteristic metrics for five cases representing different spatial aggregation and lumping of impacted sectors. Results show that the hydrological drought hazard, represented by SRI, has higher skill than meteorological drought, represented by SPI and SPEI. For German regions, impact functions developed using random forests indicate a higher discriminative ability to forecast drought impacts than logistic regression. Moreover, skill is higher for cases with higher spatial resolution and less-lumped impacted sectors (cases 4 and 5), with considerable skill up to 3–4 months ahead. The forecasting skill of drought impacts using machine learning greatly depends on the availability of impact data. This study also shows that drought impact functions could not be developed for certain regions and impacted sectors owing to the lack of reported impacts.
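The abstract does not include an implementation; the sketch below illustrates, under loose assumptions, how such impact functions and their verification could be set up with scikit-learn. The SPI/SPEI/SRI predictors and binary EDII impact labels are replaced by synthetic placeholders, and all variable names are hypothetical rather than taken from the authors' code.

```python
# Hypothetical sketch: logistic-regression and random-forest "impact functions"
# trained on drought-hazard indices and evaluated with Brier skill score and ROC AUC.
# Synthetic data stand in for the SPI/SPEI/SRI forecasts and EDII impact reports.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_months = 600
X = rng.normal(size=(n_months, 3))            # columns: SPI, SPEI, SRI (synthetic)
p_impact = 1 / (1 + np.exp(2.0 * X[:, 2]))    # drier (more negative SRI) -> higher impact odds
y = rng.binomial(1, p_impact)                 # 1 = impact reported that month, 0 = none

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("logistic regression", LogisticRegression()),
                    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0))]:
    prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    bs = brier_score_loss(y_te, prob)                                 # Brier score of the forecast
    bs_ref = brier_score_loss(y_te, np.full_like(prob, y_tr.mean()))  # climatology reference forecast
    bss = 1.0 - bs / bs_ref                                           # Brier skill score
    auc = roc_auc_score(y_te, prob)                                   # ROC (discrimination) skill
    print(f"{name}: BSS = {bss:.2f}, ROC AUC = {auc:.2f}")
```

Here the Brier skill score is computed against a climatology reference (the training-period impact frequency), a common but not the only possible choice of reference forecast.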

2020 ◽  
Author(s):  
Samuel J. Sutanto ◽  
Melati van der Weert ◽  
Veit Blauhut ◽  
Henny A. J. Van Lanen

Abstract. Forecasting of drought impacts is still missing in drought early-warning systems, which presently do not go beyond hazard forecasting. Therefore, we developed drought impact functions using machine learning approaches (logistic regression and random forest) to predict drought impacts with lead times of up to 7 months. The skill of the drought impact functions in forecasting drought impacts was evaluated using the Brier skill score and relative operating characteristic metrics for five cases representing different spatial aggregation and lumping of impacted sectors. For German regions, impact functions developed using random forests show a higher discriminative ability to forecast drought impacts than logistic regression. Moreover, skill is higher for cases with higher spatial resolution and less-lumped impacted sectors (cases 4 and 5), with considerable skill up to 3–4 months ahead.


2015 ◽  
Vol 15 (6) ◽  
pp. 1381-1397 ◽  
Author(s):  
S. Bachmair ◽  
I. Kohn ◽  
K. Stahl

Abstract. Current drought monitoring and early warning systems use different indicators for monitoring drought conditions and apply different indicator thresholds and rules for assigning drought intensity classes or issuing warnings or alerts. Nevertheless, there is little knowledge on the meaning of different hydro-meteorologic indicators for impact occurrence on the ground. To date, there have been very few attempts to systematically characterize the indicator–impact relationship owing to sparse and patchy data on drought impacts. The newly established European Drought Impact report Inventory (EDII) offers the possibility to investigate this linkage. The aim of this study was to explore the link between hydro-meteorologic indicators and drought impacts for the case study area of Germany and thus to test the potential of qualitative impact data for evaluating the performance of drought indicators. As drought indicators, two climatological drought indices – the Standardized Precipitation Index (SPI) and the Standardized Precipitation Evapotranspiration Index (SPEI) – as well as streamflow and groundwater level percentiles were selected. Linkage was assessed through data visualization, extraction of indicator values concurrent with impact onset, and correlation analysis between monthly time series of indicator and impact data at the federal state level, and between spatial patterns for selected drought events. The analysis clearly revealed a significant moderate to strong correlation for some states and drought events, allowing for an intercomparison of the performance of different drought indicators. Important findings were the strongest correlation for intermediate accumulation periods of SPI and SPEI, a slightly better performance of SPEI than SPI, and, in many cases, a performance of streamflow percentiles similar to that of SPI. Apart from these commonalities, the analysis also exposed differences among federal states and drought events, suggesting that the linkage is time-variant and region-specific to some degree. Concerning "thresholds" for drought impact onset, i.e. indicator values concurrent with past impact onsets, we found that no single "best" threshold value can be identified but impacts occur within a range of indicator values. Nevertheless, the median of the threshold distributions showed differences between northern/northeastern versus southern/southwestern federal states, and among drought events. While the findings strongly depend on data and may change with a growing number of EDII entries in the future, this study clearly demonstrates the feasibility of evaluating hydro-meteorologic variables with text-based impact reports and highlights the value of impact reporting as a tool for monitoring drought conditions.
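As a simple illustration of the correlation step described above, the hypothetical sketch below rank-correlates synthetic monthly SPI-like series at several accumulation periods with synthetic monthly counts of reported impacts for a single region; it is not the authors' analysis, and the data and variable names are invented.

```python
# Hypothetical sketch of the indicator-impact correlation step: rank-correlating
# monthly SPI series at several accumulation periods against monthly counts of
# reported impacts for one federal state. Data are synthetic placeholders.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
n_months = 240
impact_counts = rng.poisson(lam=1.0, size=n_months)   # EDII impact reports per month (synthetic)

# Synthetic SPI-like indices at different accumulation periods (e.g. SPI-3, SPI-6, SPI-12)
spi = {acc: rng.normal(size=n_months) for acc in (3, 6, 12)}
# For illustration, make the intermediate accumulation period track the impacts most closely
spi[6] -= 0.5 * impact_counts

for acc, series in spi.items():
    rho, p = spearmanr(series, impact_counts)
    flag = "significant" if p < 0.05 else "not significant"
    print(f"SPI-{acc}: Spearman rho = {rho:.2f} ({flag}, p = {p:.3f})")
```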


2019 ◽  
Author(s):  
Oskar Flygare ◽  
Jesper Enander ◽  
Erik Andersson ◽  
Brjánn Ljótsson ◽  
Volen Z Ivanov ◽  
...  

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test whether it is possible to reliably predict remission from BDD in a sample of 88 individuals who had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower at subsequent follow-ups (68%, 66%, and 61% correctly classified at the 3-, 12-, and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.
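For readers unfamiliar with the approach, the hypothetical sketch below shows how a random forest could be compared with logistic regression and its variable importances inspected. The feature names mirror the predictors mentioned in the abstract, but the data and model settings are synthetic and not the study's implementation.

```python
# Hypothetical sketch: random-forest prediction of remission vs. non-remission with
# cross-validated accuracy and variable importances, compared against logistic
# regression. Features and outcomes are synthetic stand-ins for the clinical data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 88  # sample size reported in the abstract
X = pd.DataFrame({
    "depressive_symptoms": rng.normal(size=n),
    "treatment_credibility": rng.normal(size=n),
    "working_alliance": rng.normal(size=n),
    "baseline_bdd_severity": rng.normal(size=n),
})
# Synthetic outcome loosely tied to the predictors (1 = remitter at post-treatment)
y = (X["treatment_credibility"] - X["depressive_symptoms"]
     + rng.normal(scale=1.5, size=n) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=500, random_state=0)
lr = LogisticRegression()

print("RF accuracy:", cross_val_score(rf, X, y, cv=5).mean().round(2))
print("LR accuracy:", cross_val_score(lr, X, y, cv=5).mean().round(2))

rf.fit(X, y)
for name, imp in sorted(zip(X.columns, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: importance = {imp:.2f}")
```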


2019 ◽  
Vol 19 (1) ◽  
pp. 4-16 ◽  
Author(s):  
Qihui Wu ◽  
Hanzhong Ke ◽  
Dongli Li ◽  
Qi Wang ◽  
Jiansong Fang ◽  
...  

Over the past decades, peptides as therapeutic candidates have received increasing attention in drug discovery, especially antimicrobial peptides (AMPs), anticancer peptides (ACPs), and anti-inflammatory peptides (AIPs). Peptides are considered capable of regulating various complex diseases that were previously considered undruggable. In recent years, the critical problem of antimicrobial resistance has driven the pharmaceutical industry to look for new therapeutic agents. Compared to small-molecule drugs, peptide-based therapy exhibits high specificity and minimal toxicity. Thus, peptides are widely used in the design and discovery of new potent drugs. Currently, large-scale screening of peptide activity with traditional approaches is costly, time-consuming, and labor-intensive. Hence, in silico methods, mainly machine learning approaches, have been introduced to predict peptide activity because of their accuracy and effectiveness. In this review, we document the recent progress in machine-learning-based prediction of peptide activity, which will be of great benefit to the discovery of potential active AMPs, ACPs, and AIPs.
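The review does not prescribe a specific model, but a common baseline in this area is to featurize each sequence (for example, by amino-acid composition) and train a standard classifier. The sketch below is a hypothetical, minimal example of that idea; the sequences, labels, and model choice are invented for illustration.

```python
# Hypothetical baseline sketch for peptide-activity prediction: represent each peptide
# by its amino-acid composition and train a classifier (1 = active, 0 = inactive).
# The peptide sequences and labels below are made up.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq: str) -> np.ndarray:
    """Fraction of each of the 20 standard amino acids in the sequence."""
    seq = seq.upper()
    return np.array([seq.count(a) / len(seq) for a in AMINO_ACIDS])

# Toy training data: (sequence, label)
peptides = [("KWKLFKKIEK", 1), ("GIGKFLHSAK", 1), ("AAAAGGGSSS", 0), ("DEDEDEDEDE", 0)]
X = np.vstack([aa_composition(seq) for seq, _ in peptides])
y = np.array([label for _, label in peptides])

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(model.predict_proba([aa_composition("KKLLKKILKK")])[:, 1])  # predicted activity probability
```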


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Janna Hastings ◽  
Martin Glauer ◽  
Adel Memariani ◽  
Fabian Neuhaus ◽  
Till Mossakowski

Abstract. Chemical data is increasingly openly available in databases such as PubChem, which contains approximately 110 million compound entries as of February 2021. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial neural networks, and different encoding approaches for the chemical structures, including cheminformatics fingerprints and character-based encoding from chemical line notation representations. We find that classical learning approaches such as logistic regression perform well with sets of relatively specific, disjoint chemical classes, while the neural network is able to handle larger sets of overlapping classes but needs more examples per class to learn from, and is not able to make a class prediction for every molecule. Future work will explore hybrid and ensemble approaches, as well as alternative network architectures including neuro-symbolic approaches.
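To make the "classical learning" setup concrete, the hypothetical sketch below trains a one-vs-rest logistic regression on fingerprint-style bit vectors for a handful of possibly overlapping classes. The fingerprints here are random placeholders (in practice they would be computed from structures, e.g. Morgan fingerprints via RDKit), and nothing in the code reflects the authors' actual pipeline or the ChEBI class set.

```python
# Hypothetical sketch: molecules encoded as binary fingerprint vectors and a
# one-vs-rest logistic regression predicting membership in several (possibly
# overlapping) chemical classes. All data here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(3)
n_mols, n_bits, n_classes = 2000, 256, 5
X = rng.integers(0, 2, size=(n_mols, n_bits))               # fingerprint bits (synthetic)
W = rng.normal(size=(n_bits, n_classes))
Y = (X @ W > np.quantile(X @ W, 0.8, axis=0)).astype(int)   # multi-label class membership

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)
print("micro-F1:", f1_score(Y_te, clf.predict(X_te), average="micro").round(2))
```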


2019 ◽  
Vol 78 (5) ◽  
pp. 617-628 ◽  
Author(s):  
Erika Van Nieuwenhove ◽  
Vasiliki Lagou ◽  
Lien Van Eyck ◽  
James Dooley ◽  
Ulrich Bodenhofer ◽  
...  

Objectives: Juvenile idiopathic arthritis (JIA) is the most common class of childhood rheumatic diseases, with distinct disease subsets that may have diverging pathophysiological origins. Both adaptive and innate immune processes have been proposed as primary drivers, which may account for the observed clinical heterogeneity, but few high-depth studies have been performed. Methods: Here we profiled the adaptive immune system of 85 patients with JIA and 43 age-matched controls with in-depth flow cytometry and machine learning approaches. Results: Immune profiling identified immunological changes in patients with JIA. This immune signature was shared across a broad spectrum of childhood inflammatory diseases. The immune signature was identified in clinically distinct subsets of JIA, but was accentuated in patients with systemic JIA and those patients with active disease. Despite the extensive overlap in the immunological spectrum exhibited by healthy children and patients with JIA, machine learning analysis of the data set proved capable of discriminating patients with JIA from healthy controls with ~90% accuracy. Conclusions: These results pave the way for large-scale longitudinal immune phenotyping studies of JIA. The ability to discriminate between patients with JIA and healthy individuals provides proof of principle for the use of machine learning to identify immune signatures that are predictive of treatment response.
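A minimal, hypothetical sketch of the discrimination step is shown below: a cross-validated classifier on per-individual immune-feature vectors. The cohort sizes match the abstract, but the feature values are synthetic and the model choice (a random forest) is an assumption, since the abstract does not name the algorithm.

```python
# Hypothetical sketch of the patient-vs-control discrimination step: cross-validated
# classification on per-individual immune-cell frequencies. Feature values are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_patients, n_controls, n_features = 85, 43, 40     # cohort sizes from the abstract
X_patients = rng.normal(loc=0.3, size=(n_patients, n_features))  # shifted "immune signature"
X_controls = rng.normal(loc=0.0, size=(n_controls, n_features))
X = np.vstack([X_patients, X_controls])
y = np.array([1] * n_patients + [0] * n_controls)   # 1 = JIA, 0 = healthy control

clf = RandomForestClassifier(n_estimators=500, random_state=0)
acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
print(f"cross-validated accuracy: {acc:.2f}")
```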


2021 ◽  
Vol 42 (Supplement_1) ◽  
pp. S33-S34
Author(s):  
Morgan A Taylor ◽  
Randy D Kearns ◽  
Jeffrey E Carter ◽  
Mark H Ebell ◽  
Curt A Harris

Abstract. Introduction: A nuclear disaster would generate an unprecedented volume of thermal burn patients from the explosion and subsequent mass fires (Figure 1). Prediction models characterizing outcomes for these patients may better equip healthcare providers and other responders to manage large-scale nuclear events. Logistic regression models have traditionally been employed to develop prediction scores for mortality of all burn patients. However, other healthcare disciplines have increasingly transitioned to machine learning (ML) models, which are automatically generated and continually improved, potentially increasing predictive accuracy. Preliminary research suggests ML models can predict burn patient mortality more accurately than commonly used prediction scores. The purpose of this study is to examine the efficacy of various ML methods in assessing thermal burn patient mortality and length of stay in burn centers. Methods: This retrospective study identified patients with fire/flame burn etiologies in the National Burn Repository between 2009 and 2018. Patients were randomly partitioned into a 67%/33% split for training and validation. A random forest model (RF) and an artificial neural network (ANN) were then constructed for each outcome, mortality and length of stay. These models were then compared to logistic regression models and previously developed prediction tools with similar outcomes using a combination of classification and regression metrics. Results: During the study period, 82,404 burn patients with a thermal etiology were identified and included in the analysis. The ANN models will likely tend to overfit the data, which can be resolved by ending the model training early or adding additional regularization parameters. Further exploration of the advantages and limitations of these models is forthcoming as metric analyses become available. Conclusions: In this proof-of-concept study, we anticipate that at least one ML model will predict the targeted outcomes of thermal burn patient mortality and length of stay as judged by the fidelity with which it matches the logistic regression analysis. These advancements can then help disaster preparedness programs consider resource limitations during catastrophic incidents resulting in burn injuries.
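As a rough illustration of the described setup (and nothing more), the hypothetical sketch below applies a 67%/33% split and fits random forests for the two outcomes: a classifier for mortality and a regressor for length of stay. The registry data are not public, so the records, predictor names, and effect sizes below are entirely synthetic.

```python
# Hypothetical sketch: 67%/33% train/validation split with a random-forest classifier
# for mortality and a random-forest regressor for length of stay. Synthetic records
# stand in for the (non-public) burn registry data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(11)
n = 5000
X = np.column_stack([
    rng.uniform(0, 90, n),        # age (years)
    rng.uniform(0, 100, n),       # total body surface area burned (%)
    rng.binomial(1, 0.3, n),      # inhalation injury (0/1)
])
mortality = rng.binomial(1, 1 / (1 + np.exp(-(0.03 * X[:, 0] + 0.05 * X[:, 1] - 5))))
los_days = np.maximum(1, 0.8 * X[:, 1] + rng.normal(scale=5, size=n))

X_tr, X_te, m_tr, m_te, l_tr, l_te = train_test_split(
    X, mortality, los_days, test_size=0.33, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, m_tr)
reg = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, l_tr)

print(f"mortality ROC AUC: {roc_auc_score(m_te, clf.predict_proba(X_te)[:, 1]):.2f}")
print(f"length-of-stay MAE (days): {mean_absolute_error(l_te, reg.predict(X_te)):.1f}")
```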


2014 ◽  
Vol 2 (12) ◽  
pp. 7583-7620 ◽  
Author(s):  
S. Bachmair ◽  
I. Kohn ◽  
K. Stahl

Abstract. Current drought monitoring and early warning systems use different indicators for monitoring drought conditions and apply different indicator thresholds and rules for assigning drought intensity classes or issuing warnings or alerts. Nevertheless, there is little knowledge on the meaning of different hydro-meteorologic indicators for impact occurrence on the ground. To date, there have been very few attempts to systematically characterize the indicator–impact relationship owing to the sparse and patchy data for ground-truthing hydro-meteorologic variables. The newly established European Drought Impact report Inventory (EDII) offers the possibility to investigate this linkage. The aim of this study was to explore the link between hydro-meteorologic indicators and drought impacts for the case study area of Germany and thus to test the potential of qualitative impact data for evaluating the performance of drought indicators. As drought indicators, two climatological drought indices as well as streamflow and groundwater level percentiles were selected. Linkage was assessed through data visualization and correlation analysis between monthly time series of indicator and impact data at the federal state level, and between spatial patterns for selected drought events. The analysis clearly revealed a significant moderate to strong correlation for some states and drought events, allowing for an intercomparison of the performance of different drought indicators. While several commonalities could be identified regarding the "best" indicator, indicator metric, and time scale of climatic anomaly, the analysis also exposed differences among federal states and drought events, suggesting that the linkage is time-variant and region-specific to some degree. Concerning thresholds associated with drought impact onset, we found that no single "best" threshold value can be identified but impacts occur within a range of indicator values. While the findings strongly depend on data and may change with a growing number of EDII entries in the future, this study clearly demonstrates the feasibility of ground-truthing hydro-meteorologic variables with text-based impact reports and highlights the value of impact reporting as a tool for monitoring drought conditions.


Author(s):  
Ying Qin

This study extracts the comments from a large-scale corpus of Chinese EFL learners' translations to study the taxonomy of translation errors. Two unsupervised machine learning approaches are used to obtain computational evidence for the translation error taxonomy. After manual revision, ten types of English-to-Chinese (E2C) and eight types of Chinese-to-English (C2E) translation errors are finally confirmed. The hierarchical clustering results suggest three categories of top-level errors. In addition, three supervised learning methods are applied to automatically recognize the types of errors, among which the highest performance reaches F1 = 0.85 on E2C and F1 = 0.90 on C2E translation. Further comparison with intuitive or theoretical studies of translation taxonomy reveals some phenomena that accompany the improvement of Chinese learners' language skills. Analysis of translation problems based on machine learning provides objective insight into and understanding of the students' translations.
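The abstract names neither the clustering nor the classification algorithms, so the hypothetical sketch below substitutes common choices: TF-IDF features over error annotations, agglomerative clustering into three candidate top-level categories, and a linear SVM scored with macro F1. All comments, labels, and parameter values are invented for illustration.

```python
# Hypothetical sketch of the two stages described above: unsupervised clustering of
# error annotations to suggest a taxonomy, then supervised recognition of error
# types scored with F1. The comments and labels here are invented examples.
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

comments = ["wrong word order", "mistranslated idiom", "tense error in verb",
            "missing article", "literal translation of idiom", "verb agreement error"]
labels = ["syntax", "lexis", "grammar", "grammar", "lexis", "grammar"]

X = TfidfVectorizer().fit_transform(comments)

# Stage 1: hierarchical clustering of the comments (suggests candidate top-level categories)
clusters = AgglomerativeClustering(n_clusters=3).fit_predict(X.toarray())
print("cluster assignments:", clusters)

# Stage 2: supervised recognition of error types, evaluated with macro F1
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.33, random_state=0)
pred = LinearSVC().fit(X_tr, y_tr).predict(X_te)
print("macro F1:", f1_score(y_te, pred, average="macro"))
```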


Author(s):  
Bradford William Hesse

The presence of large-scale data systems can be felt, consciously or not, in almost every facet of modern life, whether through the simple act of selecting travel options online, purchasing products from online retailers, or navigating through the streets of an unfamiliar neighborhood using global positioning system (GPS) mapping. These systems operate through the momentum of big data, a term introduced by data scientists to describe a data-rich environment enabled by a superconvergence of advanced computer-processing speeds and storage capacities; advanced connectivity between people and devices through the Internet; the ubiquity of smart, mobile devices and wireless sensors; and the creation of accelerated data flows among systems in the global economy. Some researchers have suggested that big data represents the so-called fourth paradigm in science, wherein the first paradigm was marked by the evolution of the experimental method, the second was brought about by the maturation of theory, the third was marked by an evolution of statistical methodology as enabled by computational technology, while the fourth extended the benefits of the first three, but also enabled the application of novel machine-learning approaches to an evidence stream that exists in high volume, high velocity, high variety, and differing levels of veracity. In public health and medicine, the emergence of big data capabilities has followed naturally from the expansion of data streams from genome sequencing, protein identification, environmental surveillance, and passive patient sensing. In 2001, the National Committee on Vital and Health Statistics published a road map for connecting these evidence streams to each other through a national health information infrastructure. Since then, the road map has spurred national investments in electronic health records (EHRs) and motivated the integration of public surveillance data into analytic platforms for health situational awareness. More recently, the boom in consumer-oriented mobile applications and wireless medical sensing devices has opened up the possibility for mining new data flows directly from altruistic patients. In the broader public communication sphere, the ability to mine the digital traces of conversation on social media presents an opportunity to apply advanced machine learning algorithms as a way of tracking the diffusion of risk communication messages. In addition to utilizing big data for improving the scientific knowledge base in risk communication, there will be a need for health communication scientists and practitioners to work as part of interdisciplinary teams to improve the interfaces to these data for professionals and the public. Too much data, presented in disorganized ways, can lead to what some have referred to as “data smog.” Much work will be needed for understanding how to turn big data into knowledge, and just as important, how to turn data-informed knowledge into action.

