Introduction to Machine Learning in Digital Healthcare Epidemiology

2018 ◽  
Vol 39 (12) ◽  
pp. 1457-1462 ◽  
Author(s):  
Jan A. Roth ◽  
Manuel Battegay ◽  
Fabrice Juchler ◽  
Julia E. Vogt ◽  
Andreas F. Widmer

Abstract
To exploit the full potential of big routine data in healthcare, and to communicate and collaborate efficiently with information technology specialists and data analysts, healthcare epidemiologists should have some knowledge of large-scale analysis techniques, particularly machine learning. This review covers the broad area of machine learning and its first applications in the emerging field of digital healthcare epidemiology.

Author(s):  
Sangeeta Lal ◽  
Neetu Sardana ◽  
Ashish Sureka

Log statements in source code provide important information to software developers because they support activities such as debugging, anomaly detection, and remote issue resolution. Most previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and the catch-block level. They answer several research questions related to statistical and content analysis, which reveals differentiating properties between logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-block logging prediction, which is found to be effective.
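The abstract does not specify which features or learner the chapter uses, so the sketch below is only a hypothetical illustration of the idea: tokenize catch-block source text and train a small from-scratch multinomial Naive Bayes classifier to predict whether a block should contain a log statement. All training snippets and labels are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(code):
    """Crude tokenizer: pull identifier-like tokens from catch-block source."""
    return [t.lower() for t in re.findall(r"[A-Za-z_]\w*", code)]

class NaiveBayesLogger:
    """Multinomial Naive Bayes over catch-block tokens (illustrative sketch)."""
    def fit(self, blocks, labels):
        self.classes = set(labels)
        self.priors = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        self.totals = defaultdict(int)
        self.vocab = set()
        for code, y in zip(blocks, labels):
            toks = tokenize(code)
            self.counts[y].update(toks)
            self.totals[y] += len(toks)
            self.vocab.update(toks)
        return self

    def predict(self, code):
        toks = tokenize(code)
        n_vocab = len(self.vocab)
        n_docs = sum(self.priors.values())
        best, best_lp = None, float("-inf")
        for c in self.classes:
            lp = math.log(self.priors[c] / n_docs)
            for t in toks:
                # Laplace smoothing so unseen tokens do not zero out the class
                lp += math.log((self.counts[c][t] + 1) / (self.totals[c] + n_vocab))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

# hypothetical training data: catch blocks labelled logged / nonlogged
train = [
    ("catch (IOException e) { log.error(e); retry(); }", "logged"),
    ("catch (SQLException e) { logger.warn(e.getMessage()); }", "logged"),
    ("catch (InterruptedException e) { Thread.currentThread().interrupt(); }", "nonlogged"),
    ("catch (NumberFormatException e) { return DEFAULT; }", "nonlogged"),
]
model = NaiveBayesLogger().fit([c for c, _ in train], [y for _, y in train])
print(model.predict("catch (IOException e) { log.error(e); }"))
```

In practice a model like the chapter's would be trained on thousands of catch blocks mined from real projects, with richer features than raw tokens, but the prediction mechanics follow this shape.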


2005 ◽  
Vol 94 (11) ◽  
pp. 916-925 ◽  
Author(s):  
Marcus Dittrich ◽  
Ingvild Birschmann ◽  
Christiane Stuhlfelder ◽  
Albert Sickmann ◽  
Sabine Herterich ◽  
...  

Summary
New large-scale analysis techniques such as bioinformatics, mass spectrometry, and SAGE data analysis allow a new framework for understanding platelets. This review analyses important options and tasks for these tools and sketches the new, refined picture of the platelet that they outline. Looking at the platelet-specific building blocks of the genome, the (active) transcriptome, and the proteome (notably the secretome and phospho-proteome), we summarize current bioinformatic and biochemical approaches and tasks, as well as their limitations. Understanding the surprisingly complex platelet, including its compartmentalization, key cascades, and pathways and their clinical implications, will remain an exciting and hopefully fruitful challenge for the future.


Biomolecules ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. 1605
Author(s):  
Christian Feldmann ◽  
Dimitar Yonchev ◽  
Jürgen Bajorath

Predicting compounds with single- and multi-target activity and exploring the origins of compound specificity and promiscuity is of high interest for chemical biology and drug discovery. We present a large-scale analysis of compound promiscuity with two major components. First, high-confidence datasets of compounds with multi-target and corresponding single-target activity were extracted from biological screening data; positive and negative assay results were taken into account and data completeness was ensured. Second, these datasets were investigated using diagnostic machine learning to systematically distinguish between compounds with multi- and single-target activity. Models built on the basis of chemical structure consistently produced meaningful predictions, providing evidence for structural features that differentiate promiscuous and non-promiscuous compounds. Machine learning under varying conditions using modified datasets revealed a strong influence of nearest neighbor relationships on the predictions. Many multi-target compounds were found to be more similar to other multi-target compounds than to single-target compounds, and vice versa, which resulted in consistently accurate predictions. The results of our study confirm the presence of structural relationships that differentiate promiscuous and non-promiscuous compounds.
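The nearest-neighbor effect the abstract describes can be illustrated with a minimal sketch, assuming compounds are represented as fingerprint bit sets compared by Tanimoto similarity; the fingerprints and labels below are hypothetical, not the paper's data.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def nearest_neighbor_label(query, reference):
    """Predict the label of the most Tanimoto-similar reference compound."""
    return max(reference, key=lambda item: tanimoto(query, item[0]))[1]

# hypothetical fingerprints: sets of "on" bit positions with activity labels
reference = [
    (frozenset({1, 2, 3, 7, 9}), "multi-target"),
    (frozenset({1, 2, 3, 8, 9}), "multi-target"),
    (frozenset({4, 5, 6, 10}),   "single-target"),
    (frozenset({4, 5, 6, 11}),   "single-target"),
]
query = frozenset({1, 2, 3, 9, 12})
print(nearest_neighbor_label(query, reference))
```

When, as in the study's datasets, multi-target compounds tend to be more similar to one another than to single-target compounds, even this simple 1-nearest-neighbor rule classifies accurately, which is the structural-neighborhood effect the authors report.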


2018 ◽  
Vol 145 ◽  
pp. 243-254
Author(s):  
Alassane Samba ◽  
Yann Busnel ◽  
Alberto Blanc ◽  
Philippe Dooze ◽  
Gwendal Simon

Author(s):  
Carlos Arcila Calderón ◽  
Félix Ortega Mohedano ◽  
Mateo Álvarez ◽  
Miguel Vicente Mariño

The large-scale analysis of tweets in real time using supervised sentiment analysis presents a unique opportunity for communication and audience research. Bringing together machine learning and streaming analytics in a distributed environment can help scholars obtain valuable data from Twitter and immediately classify messages by context, without restrictions of time or storage, empowering cross-sectional, longitudinal, and experimental designs with new inputs. Even as communication and audience researchers begin to use computational methods, most remain unfamiliar with the distributed technologies needed to face big data challenges. This paper describes the implementation of parallelized machine learning methods in Apache Spark to predict sentiment in real-time tweets and explains how this process can be scaled up using academic or commercial distributed computing when personal computers cannot support the required computation and storage. We discuss the limitations of these methods and their implications for communication, audience, and media studies.
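The abstract's pipeline (train a supervised sentiment model, then score partitions of incoming tweets in parallel) can be sketched without a Spark cluster. The stdlib-only snippet below is a hypothetical stand-in: a hashing-trick perceptron replaces the MLlib classifier the paper would use, and a plain map over lists of tweets mimics scoring Spark RDD partitions independently. All tweets and labels are invented.

```python
import re
import zlib

HASH_DIM = 2 ** 16  # hashed feature space (hashing trick)

def features(tweet):
    """Hash word tokens into a sparse {index: count} mapping."""
    idx = {}
    for tok in re.findall(r"\w+", tweet.lower()):
        i = zlib.crc32(tok.encode()) % HASH_DIM
        idx[i] = idx.get(i, 0) + 1
    return idx

def train_perceptron(data, epochs=10):
    """Plain perceptron: label +1 = positive, -1 = negative."""
    w = [0.0] * HASH_DIM
    for _ in range(epochs):
        for tweet, y in data:
            x = features(tweet)
            score = sum(w[i] * v for i, v in x.items())
            if y * score <= 0:  # misclassified: nudge weights toward y
                for i, v in x.items():
                    w[i] += y * v
    return w

def predict(w, tweet):
    score = sum(w[i] * v for i, v in features(tweet).items())
    return "positive" if score > 0 else "negative"

# hypothetical labelled tweets (stand-in for a training set)
train = [("love this show", 1), ("great episode tonight", 1),
         ("terrible boring plot", -1), ("worst episode ever", -1)]
w = train_perceptron(train)

# "distributed" scoring: each partition is scored independently with the
# broadcast weights, mirroring a map over RDD partitions in Spark
partitions = [["love this episode"], ["boring plot tonight"]]
results = [[predict(w, t) for t in part] for part in partitions]
print(results)
```

Because scoring each tweet needs only the trained weight vector, the work is embarrassingly parallel, which is what makes the Spark streaming setup described in the paper scale across machines.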

