A Unifying Framework and Comparative Evaluation of Statistical and Machine Learning Approaches to Non-Specific Syndromic Surveillance

Monitoring the development of infectious diseases is of great importance for the prevention of major outbreaks. Syndromic surveillance aims at developing algorithms which can detect outbreaks as early as possible by monitoring data sources which allow to capture the occurrences of a certain disease. Recent research mainly concentrates on the surveillance of specific, known diseases, putting the focus on the definition of the disease pattern under surveillance. Until now, only little effort has been devoted to what we call non-specific syndromic surveillance, i.e., the use of all available data for detecting any kind of infectious disease outbreaks. In this work, we give an overview of non-specific syndromic surveillance from the perspective of machine learning and propose a unified framework based on global and local modeling techniques. We also present a set of statistical modeling techniques which have not been used in a local modeling context before and can serve as benchmarks for the more elaborate machine learning approaches. In an experimental comparison of different approaches to non-specific syndromic surveillance we found that these simple statistical techniques already achieve competitive results and sometimes even outperform more elaborate approaches. In particular, applying common syndromic surveillance methods in a non-specific setting seems to be promising.

Download Full-text

Machine learning for metagenomics: methods and tools

Metagenomics ◽

10.1515/metgen-2016-0001 ◽

2017 ◽

Vol 1 (1) ◽

Cited By ~ 15

Author(s):

Hayssam Soueidan ◽

Macha Nikolski

Keyword(s):

Machine Learning ◽

Gene Prediction ◽

Full Range ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Unified Framework ◽

Comparative Metagenomics ◽

Learning Techniques ◽

Ngs Data ◽

Modern Machine

AbstractOwing to the complexity and variability of metagenomic studies, modern machine learning approaches have seen increased usage to answer a variety of question encompassing the full range of metagenomic NGS data analysis.We review here the contribution of machine learning techniques for the field of metagenomics, by presenting known successful approaches in a unified framework. This review focuses on five important metagenomic problems:OTU-clustering, binning, taxonomic proffiing and assignment, comparative metagenomics and gene prediction. For each of these problems, we identify the most prominent methods, summarize the machine learning approaches used and put them into perspective of similar methods.We conclude our review looking further ahead at the challenge posed by the analysis of interactions within microbial communities and different environments, in a field one could call “integrative metagenomics”.

Download Full-text

Validation of a Derived International Patient Severity Algorithm to Support COVID-19 Analytics from Electronic Health Record Data

10.1101/2020.10.13.20201855 ◽

2020 ◽

Cited By ~ 2

Author(s):

Jeffrey G Klann ◽

Griffin M Weber ◽

Hossein Estiri ◽

Bertrand Moal ◽

Paul Avillach ◽

...

Keyword(s):

Machine Learning ◽

Chart Review ◽

Learning Approach ◽

Learning Approaches ◽

Electronic Health Record Data ◽

Icu Admission ◽

Machine Learning Approach ◽

Proxy Measure ◽

Definition Of

AbstractIntroductionThe Consortium for Clinical Characterization of COVID-19 by EHR (4CE) includes hundreds of hospitals internationally using a federated computational approach to COVID-19 research using the EHR.ObjectiveWe sought to develop and validate a standard definition of COVID-19 severity from readily accessible EHR data across the Consortium.MethodsWe developed an EHR-based severity algorithm and validated it on patient hospitalization data from 12 4CE clinical sites against the outcomes of ICU admission and/or death. We also used a machine learning approach to compare selected predictors of severity to the 4CE algorithm at one site.ResultsThe 4CE severity algorithm performed with pooled sensitivity of 0.73 and specificity 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of single code categories for acuity were unacceptably inaccurate - varying by up to 0.65 across sites. A multivariate machine learning approach identified codes resulting in mean AUC 0.956 (95% CI: 0.952, 0.959) compared to 0.903 (95% CI: 0.886, 0.921) using expert-derived codes. Billing codes were poor proxies of ICU admission, with 49% precision and recall compared against chart review at one partner institution.DiscussionWe developed a proxy measure of severity that proved resilient to coding variability internationally by using a set of 6 code classes. In contrast, machine-learning approaches may tend to overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold standard outcomes, possibly due to pandemic conditions.ConclusionWe developed an EHR-based algorithm for COVID-19 severity and validated it at 12 international sites.

Download Full-text

An Experimental Comparison between Deep Learning and Classical Machine Learning Approaches for Writer Identification in Medieval Documents

Journal of Imaging ◽

10.3390/jimaging6090089 ◽

2020 ◽

Vol 6 (9) ◽

pp. 89

Author(s):

Nicole Dalia Cilia ◽

Claudio De Stefano ◽

Francesco Fontanella ◽

Claudio Marrocco ◽

Mario Molinara ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

High Performance ◽

Ad Hoc ◽

Digital Images ◽

Experimental Comparison ◽

Learning Approaches ◽

Test Bed ◽

Ancient Manuscripts ◽

Ancient Documents

In the framework of palaeography, the availability of both effective image analysis algorithms, and high-quality digital images has favored the development of new applications for the study of ancient manuscripts and has provided new tools for decision-making support systems. The quality of the results provided by such applications, however, is strongly influenced by the selection of effective features, which should be able to capture the distinctive aspects to which the paleography expert is interested in. This process is very difficult to generalize due to the enormous variability in the type of ancient documents, produced in different historical periods with different languages and styles. The effect is that it is very difficult to define standard techniques that are general enough to be effectively used in any case, and this is the reason why ad-hoc systems, generally designed according to paleographers’ suggestions, have been designed for the analysis of ancient manuscripts. In recent years, there has been a growing scientific interest in the use of techniques based on deep learning (DL) for the automatic processing of ancient documents. This interest is not only due to their capability of designing high-performance pattern recognition systems, but also to their ability of automatically extracting features from raw data, without using any a priori knowledge. Moving from these considerations, the aim of this study is to verify if DL-based approaches may actually represent a general methodology for automatically designing machine learning systems for palaeography applications. To this purpose, we compared the performance of a DL-based approach with that of a “classical” machine learning one, in a particularly unfavorable case for DL, namely that of highly standardized schools. The rationale of this choice is to compare the obtainable results even when context information is present and discriminating: this information is ignored by DL approaches, while it is used by machine learning methods, making the comparison more significant. The experimental results refer to the use of a large sets of digital images extracted from an entire 12th-century Bibles, the “Avila Bible”. This manuscript, produced by several scribes who worked in different periods and in different places, represents a severe test bed to evaluate the efficiency of scribe identification systems.

Download Full-text

Human Computation vs. Machine Learning: an Experimental Comparison for Image Classification

Human Computation ◽

10.15346/hc.v5i1.2 ◽

2018 ◽

Vol 5 ◽

pp. 13-30

Author(s):

Gloria Re Calegari ◽

Gioele Nasi ◽

Irene Celino

Keyword(s):

Machine Learning ◽

Image Classification ◽

Learning System ◽

Human Computation ◽

Experimental Comparison ◽

Learning Approaches ◽

Domain Experts ◽

To Come ◽

Combined Strategy

Image classification is a classical task heavily studied in computer vision and widely required in many concrete scientific and industrial scenarios. Is it better to rely on human eyes, thus asking people to classify pictures, or to train a machine learning system to automatically solve the task? The answer largely depends on the specific case and the required accuracy: humans may be more reliable - especially if they are domain experts - but automatic processing can be cheaper, even if less capable to demonstrate an "intelligent" behaviour.In this paper, we present an experimental comparison of different Human Computation and Machine Learning approaches to solve the same image classification task on a set of pictures used in light pollution research. We illustrate the adopted methods and the obtained results and we compare and contrast them in order to come up with a long term combined strategy to address the specific issue at scale: while it is hard to ensure a long-term engagement of users to exclusively rely on the Human Computation approach, the human classification is indispensable to overcome the "cold start" problem of automated data modelling.

Download Full-text

Don’t Dismiss Logistic Regression: The Case for Sensible Extraction of Interactions in the Era of Machine Learning

10.1101/2019.12.15.877134 ◽

2019 ◽

Cited By ~ 1

Author(s):

Joshua J. Levy ◽

A. James O’Malley

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Model Building ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Statistical Machine Learning ◽

Forest Model ◽

Learning Techniques ◽

Modeling Techniques

AbstractBackgroundMachine learning approaches have become increasingly popular modeling techniques, relying on data-driven heuristics to arrive at its solutions. Recent comparisons between these algorithms and traditional statistical modeling techniques have largely ignored the superiority gained by the former approaches due to involvement of model-building search algorithms. This has led to alignment of statistical and machine learning approaches with different types of problems and the under-development of procedures that combine their attributes. In this context, we hoped to understand the domains of applicability for each approach and to identify areas where a marriage between the two approaches is warranted. We then sought to develop a hybrid statistical-machine learning procedure with the best attributes of each.MethodsWe present three simple examples to illustrate when to use each modeling approach and posit a general framework for combining them into an enhanced logistic regression model building procedure that aids interpretation. We study 556 benchmark machine learning datasets to uncover when machine learning techniques outperformed rudimentary logistic regression models and so are potentially well-equipped to enhance them. We illustrate a software package, InteractionTransformer, which embeds logistic regression with advanced model building capacity by using machine learning algorithms to extract candidate interaction features from a random forest model for inclusion in the model. Finally, we apply our enhanced logistic regression analysis to two real-word biomedical examples, one where predictors vary linearly with the outcome and another with extensive second-order interactions.ResultsPreliminary statistical analysis demonstrated that across 556 benchmark datasets, the random forest approach significantly outperformed the logistic regression approach. We found a statistically significant increase in predictive performance when using hybrid procedures and greater clarity in the association with the outcome of terms acquired compared to directly interpreting the random forest output.ConclusionsWhen a random forest model is closer to the true model, hybrid statistical-machine learning procedures can substantially enhance the performance of statistical procedures in an automated manner while preserving easy interpretation of the results. Such hybrid methods may help facilitate widespread adoption of machine learning techniques in the biomedical setting.

Download Full-text

Machine Learning Approaches for Epidemiological Investigations of Food-Borne Disease Outbreaks

Frontiers in Microbiology ◽

10.3389/fmicb.2019.01722 ◽

2019 ◽

Vol 10 ◽

Cited By ~ 2

Author(s):

Baiba Vilne ◽

Irēna Meistere ◽

Lelde Grantiņa-Ieviņa ◽

Juris Ķibilds

Keyword(s):

Machine Learning ◽

Disease Outbreaks ◽

Learning Approaches ◽

Food Borne

Download Full-text

Human Computation vs. Machine Learning: an Experimental Comparison for Image Classification

Human Computation ◽

10.15346/hc.v5i1.93 ◽

2018 ◽

Vol 5 ◽

pp. 13-30

Author(s):

Gloria Re Calegari ◽

Gioele Nasi ◽

Irene Celino

Keyword(s):

Machine Learning ◽

Image Classification ◽

Learning System ◽

Human Computation ◽

Experimental Comparison ◽

Learning Approaches ◽

Domain Experts ◽

To Come ◽

Combined Strategy

Download Full-text

A Unified Framework for Automatic Detection of Wound Infection with Artificial Intelligence

Applied Sciences ◽

10.3390/app10155353 ◽

2020 ◽

Vol 10 (15) ◽

pp. 5353

Author(s):

Jin-Ming Wu ◽

Chia-Jui Tsai ◽

Te-Wei Ho ◽

Feipei Lai ◽

Hao-Chih Tai ◽

...

Keyword(s):

Machine Learning ◽

Wound Infection ◽

Characteristic Curve ◽

Automatic Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

Unified Framework ◽

Unique Problem

Background: The surgical wound is a unique problem requiring continuous postoperative care, and mobile health technology is implemented to bridge the care gap. Our study aim was to design an integrated framework to support the diagnosis of wound infection. Methods: We used a computer-vision approach based on supervised learning techniques and machine learning algorithms, to help detect the wound region of interest (ROI) and classify wound infection features. The intersection-union test (IUT) was used to evaluate the accuracy of the detection of color card and wound ROI. The area under the receiver operating characteristic curve (AUC) of our model was adopted in comparison with different machine learning approaches. Results: 480 wound photographs were taken from 100 patients for analysis. The average value of IUT on the validation set with fivefold stratification to detect wound ROI was 0.775. For prediction of wound infection, our model achieved a significantly higher AUC score (83.3%) than the other three methods (kernel support vector machines, 44.4%; random forest, 67.1%; gradient boosting classifier, 66.9%). Conclusions: Our evaluation of a prospectively collected wound database demonstrates the effectiveness and reliability of the proposed system, which has been developed for automatic detection of wound infections in patients undergoing surgical procedures.

Download Full-text