An Introduction to Machine Learning Approaches for Biomedical Research

Machine learning (ML) approaches are a collection of algorithms that attempt to extract patterns from data and to associate such patterns with discrete classes of samples in the data. For example, given a series of features describing persons, an ML model predicts whether a person is diseased or healthy; given features of animals, it predicts whether an animal is treated or control; or it predicts whether molecules have the potential to interact or not. ML approaches can also find such patterns in an agnostic manner, i.e., without having information about the classes. These two families of methods are referred to as supervised and unsupervised ML, respectively. A third type of ML is reinforcement learning, which attempts to find a sequence of actions that contribute to achieving a specific goal. All of these methods are becoming increasingly popular in biomedical research in quite diverse areas, including drug design, stratification of patients, medical image analysis, molecular interactions, prediction of therapy outcomes and many more. We describe several supervised and unsupervised ML techniques, and illustrate a series of prototypical examples using state-of-the-art computational approaches. Given the complexity of reinforcement learning, it is not discussed in detail here; instead, interested readers are referred to excellent reviews on that topic. We focus on concepts rather than procedures, as our goal is to attract the attention of researchers in biomedicine toward the plethora of powerful ML methods and their potential to leverage basic and applied research programs.
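The distinction between supervised and unsupervised learning drawn in this abstract can be illustrated with a minimal sketch. The toy data, the feature meaning, and the choice of a nearest-centroid classifier and k-means clustering are assumptions made for illustration only; the abstract does not name specific algorithms.

```python
import random
from statistics import mean

# Toy data (hypothetical): two features per sample, e.g. two biomarker levels.
samples = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["healthy", "healthy", "healthy", "diseased", "diseased", "diseased"]

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(mean(axis) for axis in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Supervised: the class labels are used to fit one centroid per class.
def fit_nearest_centroid(xs, ys):
    return {c: centroid([x for x, y in zip(xs, ys) if y == c]) for c in set(ys)}

def predict(model, x):
    return min(model, key=lambda c: sq_dist(model[c], x))

# Unsupervised: k-means groups the same samples without ever seeing the labels.
def kmeans(xs, k, iterations=10, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda i: sq_dist(centers[i], x))].append(x)
        # Keep the old center if a group happens to be empty.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return [min(range(k), key=lambda i: sq_dist(centers[i], x)) for x in xs]

model = fit_nearest_centroid(samples, labels)
prediction = predict(model, (3.9, 4.1))   # class of a new, unlabeled sample
clusters = kmeans(samples, k=2)           # cluster index per sample, no labels used
```

In this sketch the supervised model returns a named class for a new sample, whereas k-means only returns anonymous cluster indices that a researcher would still have to interpret.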

A State-of-the-Art Survey on Deep Learning Theory and Architectures

Electronics ◽

10.3390/electronics8030292 ◽

2019 ◽

Vol 8 (3) ◽

pp. 292 ◽

Cited By ~ 157

Author(s):

Md Zahangir Alom ◽

Tarek M. Taha ◽

Chris Yakopcic ◽

Stefan Westberg ◽

Paheding Sidike ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Reinforcement Learning ◽

Language Processing ◽

Large Scale ◽

Medical Information ◽

State Of The Art ◽

Generative Models ◽

Learning Approaches

In recent years, deep learning has garnered tremendous success in a variety of application domains. This new field of machine learning has been growing rapidly and has been applied to most traditional application domains, as well as some new areas that present more opportunities. Different methods have been proposed based on different categories of learning, including supervised, semi-supervised, and unsupervised learning. Experimental results show state-of-the-art performance using deep learning when compared to traditional machine learning approaches in the fields of image processing, computer vision, speech recognition, machine translation, art, medical imaging, medical information processing, robotics and control, bioinformatics, natural language processing, cybersecurity, and many others. This work presents a brief survey of the advances that have occurred in the area of Deep Learning (DL), starting with the Deep Neural Network (DNN). The survey goes on to cover the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), the Auto-Encoder (AE), the Deep Belief Network (DBN), the Generative Adversarial Network (GAN), and Deep Reinforcement Learning (DRL). Additionally, we discuss recent developments, such as advanced variant DL techniques based on these DL approaches. This work considers most of the papers published after 2012, when the history of deep learning began. Furthermore, DL approaches that have been explored and evaluated in different application domains are also included in this survey. We also include recently developed frameworks, SDKs, and benchmark datasets that are used for implementing and evaluating deep learning approaches. There are some surveys that have been published on DL using neural networks and a survey on Reinforcement Learning (RL). However, those papers have not discussed individual advanced techniques for training large-scale deep learning models and the recently developed method of generative models.

Supervised learning for the detection of negation and of its scope in French and Brazilian Portuguese biomedical corpora

Natural Language Engineering ◽

10.1017/s1351324920000352 ◽

2020 ◽

pp. 1-21 ◽

Cited By ~ 2

Author(s):

Clément Dalloux ◽

Vincent Claveau ◽

Natalia Grabar ◽

Lucas Emanuel Silva Oliveira ◽

Claudia Maria Cabral Moro ◽

...

Keyword(s):

Machine Learning ◽

Information Extraction ◽

State Of The Art ◽

Automatic Detection ◽

Brazilian Portuguese ◽

Supervised Machine Learning ◽

Biomedical Domain ◽

Learning Approaches ◽

Cross Domain ◽

Automatic Methods

Automatic detection of negated content is often a prerequisite in information extraction systems in various domains. In the biomedical domain especially, this task is important because negation plays an important role. In this work, two main contributions are proposed. First, we work with languages which have been poorly addressed up to now: Brazilian Portuguese and French. Thus, we developed new corpora for these two languages, which have been manually annotated to mark up the negation cues and their scope. Second, we propose automatic methods based on supervised machine learning approaches for the automatic detection of negation marks and of their scopes. The methods prove to be robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical languages) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. In addition, the application is accessible and usable online. We assume that, through these contributions (new annotated corpora, application accessible online, and cross-domain robustness), the reproducibility of the results and the robustness of the NLP applications will be augmented.

Inference of an Integrative, Executable Network for Rheumatoid Arthritis Combining Data-Driven Machine Learning Approaches and a State-of-the-Art Mechanistic Disease Map

Journal of Personalized Medicine ◽

10.3390/jpm11080785 ◽

2021 ◽

Vol 11 (8) ◽

pp. 785

Author(s):

Quentin Miagoux ◽

Vidisha Singh ◽

Dereck de Mézquita ◽

Valerie Chaudru ◽

Mohamed Elati ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

State Of The Art ◽

Biological Information ◽

Response To Treatment ◽

Patient Specific ◽

Learning Approaches ◽

Data Types ◽

Disease Heterogeneity ◽

Combining Data

Rheumatoid arthritis (RA) is a multifactorial, complex autoimmune disease that involves various genetic, environmental, and epigenetic factors. Systems biology approaches provide the means to study complex diseases by integrating different layers of biological information. Combining multiple data types can help compensate for missing or conflicting information and limit the possibility of false positives. In this work, we aim to unravel mechanisms governing the regulation of key transcription factors in RA and derive patient-specific models to gain more insights into the disease heterogeneity and the response to treatment. We first use publicly available transcriptomic datasets (peripheral blood) relative to RA and machine learning to create an RA-specific transcription factor (TF) co-regulatory network. The TF cooperativity network is subsequently enriched in signalling cascades and upstream regulators using a state-of-the-art, RA-specific molecular map. Then, the integrative network is used as a template to analyse patients’ data regarding their response to anti-TNF treatment and identify master regulators and upstream cascades affected by the treatment. Finally, we use the Boolean formalism to simulate in silico subparts of the integrated network and identify combinations and conditions that can switch on or off the identified TFs, mimicking the effects of single and combined perturbations.
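The final step of this abstract, simulating a network with the Boolean formalism and testing perturbations, can be sketched as follows. The four nodes and their update rules are hypothetical placeholders, not taken from the RA map or the inferred TF network, and the synchronous update scheme is an assumption.

```python
# Hypothetical network: node names and rules are illustrative assumptions only.
rules = {
    "TNF": lambda s: s["TNF"],                          # input node keeps its value
    "INHIBITOR": lambda s: s["INHIBITOR"],              # input node keeps its value
    "NFKB": lambda s: s["TNF"] and not s["INHIBITOR"],  # activated by TNF unless inhibited
    "TF": lambda s: s["NFKB"],                          # target TF follows NFKB
}

def step(state, rules, clamp=None):
    """Synchronous update: every node is computed from the previous state."""
    new = {node: bool(rule(state)) for node, rule in rules.items()}
    new.update(clamp or {})  # a perturbation forces selected nodes on or off
    return new

def simulate(state, rules, clamp=None, max_steps=20):
    """Iterate until a fixed point is reached; otherwise return the last state."""
    state = {**state, **(clamp or {})}
    for _ in range(max_steps):
        nxt = step(state, rules, clamp)
        if nxt == state:
            return state
        state = nxt
    return state

start = {"TNF": True, "INHIBITOR": False, "NFKB": False, "TF": False}
untreated = simulate(start, rules)
treated = simulate(start, rules, clamp={"INHIBITOR": True})  # single perturbation
```

Comparing `untreated["TF"]` with `treated["TF"]` shows whether the perturbation switches the target TF off; combined perturbations would be expressed by clamping several nodes at once.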

Applying a Deep Q Network for OpenAI's Car Racing Game

10.14293/s2199-1006.1.sor-.ppd7fvs.v1 ◽

2020 ◽

Author(s):

Ali Fakhry

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Transfer Learning ◽

State Of The Art ◽

Learning Techniques ◽

Car Racing ◽

Custom Made ◽

Learning Technique ◽

Reward Threshold

The applications of Deep Q-Networks are seen throughout the field of reinforcement learning, a large subset of machine learning. Using a classic environment from OpenAI, CarRacing-v0, a 2D car racing environment, alongside a custom modification of the environment, a Deep Q-Network (DQN) was created to solve both the classic and custom environments. The environments are tested using custom-made CNN architectures and applying transfer learning from Resnet18. While DQNs were state of the art years ago, using them for CarRacing-v0 appears somewhat unappealing and not as effective as other reinforcement learning techniques. Overall, while the model did train and the agent learned various parts of the environment, attempting to reach the reward threshold for the environment with this reinforcement learning technique seems problematic and difficult, as other techniques would be more useful.
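A DQN approximates the Q-function with a neural network; the underlying update rule is easier to see in its tabular form. The sketch below is therefore tabular Q-learning, not a DQN, and it uses a hypothetical five-state line world rather than CarRacing-v0. All constants are illustrative assumptions.

```python
import random

N_STATES = 5            # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)      # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def env_step(state, action):
    """Hypothetical environment: reward 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=300, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            nxt, reward, done = env_step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best next-state value.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
    return q

q_table = train()
# Greedy action for every non-terminal state.
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
```

In a DQN the table `q` would be replaced by a network (for example a CNN over image frames) trained to regress toward the same target.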

Label Enhancement for Label Distribution Learning via Prior Knowledge

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/446 ◽

2020 ◽

Author(s):

Yongbiao Gao ◽

Yu Zhang ◽

Xin Geng

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Emotion Recognition ◽

Prior Knowledge ◽

Decision Process ◽

Age Estimation ◽

State Of The Art ◽

Learning Agent ◽

Label Distribution Learning ◽

Label Distribution

Label distribution learning (LDL) is a novel machine learning paradigm that assigns to an instance a description degree for each label. However, most training datasets only contain simple logical labels rather than label distributions, due to the difficulty of obtaining the label distributions directly. We propose to use prior knowledge to recover the label distributions. The process of recovering the label distributions from the logical labels is called label enhancement. In this paper, we formulate label enhancement as a dynamic decision process. Thus, the label distribution is adjusted by a series of actions conducted by a reinforcement learning agent according to sequential state representations. The target state is defined by the prior knowledge. Experimental results show that the proposed approach outperforms the state-of-the-art methods in both age estimation and image emotion recognition.

Learning fair models and representations

Intelligenza Artificiale ◽

10.3233/ia-190034 ◽

2020 ◽

Vol 14 (1) ◽

pp. 151-178

Author(s):

Luca Oneto

Keyword(s):

Machine Learning ◽

Social Services ◽

Ethical Issues ◽

State Of The Art ◽

Online Advertising ◽

Radical Change ◽

Data Representation ◽

Learning Approaches ◽

Central Question ◽

Disparate Treatment

Machine learning based systems and products are reaching society at large in many aspects of everyday life, including financial lending, online advertising, pretrial and immigration detention, child maltreatment screening, health care, social services, and education. This phenomenon has been accompanied by an increase in concern about the ethical issues that may arise from the adoption of these technologies. In response to this concern, a new area of machine learning has recently emerged that studies how to address disparate treatment caused by algorithmic errors and bias in the data. The central question is how to ensure that the learned model does not treat subgroups in the population unfairly. While the design of solutions to this issue requires an interdisciplinary effort, fundamental progress can only be achieved through a radical change in the machine learning paradigm. In this work, we will describe the state of the art on algorithmic fairness using statistical learning theory, machine learning, and deep learning approaches that are able to learn fair models and data representation.
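One common way to make "treating subgroups unfairly" measurable is to compare the rate of positive predictions across groups (often called demographic parity). The sketch below assumes that criterion; the abstract does not state which fairness definitions the paper covers, and the predictions and group labels are hypothetical.

```python
def positive_rate(predictions, groups, group):
    """Fraction of positive (1) predictions among members of one group."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)

def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = {g: positive_rate(predictions, groups, g) for g in set(groups)}
    return max(rates.values()) - min(rates.values())

# Hypothetical binary predictions and group membership for eight individuals.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(preds, groups)
```

A gap of zero would mean both groups receive positive predictions at the same rate; a fair-learning method would typically constrain or penalize this quantity during training.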

From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00025 ◽

2018 ◽

Vol 6 ◽

pp. 343-356 ◽

Cited By ~ 2

Author(s):

Egoitz Laparra ◽

Dongfang Xu ◽

Steven Bethard

Keyword(s):

Neural Network ◽

Machine Learning ◽

Comparative Analysis ◽

State Of The Art ◽

Learning Approaches ◽

Semantic Parsing ◽

Time Intervals ◽

Semantic Composition ◽

Previous State ◽

New Scoring

This paper presents the first model for time normalization trained on the SCATE corpus. In the SCATE schema, time expressions are annotated as a semantic composition of time entities. This novel schema favors machine learning approaches, as it can be viewed as a semantic parsing task. In this work, we propose a character level multi-output neural network that outperforms previous state-of-the-art built on the TimeML schema. To compare predictions of systems that follow both SCATE and TimeML, we present a new scoring metric for time intervals. We also apply this new metric to carry out a comparative analysis of the annotations of both schemes in the same corpus.
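A scoring metric over time intervals can be pictured as precision and recall computed on overlapping interval length instead of on exact matches. The sketch below is one plausible reading under stated assumptions and is not claimed to be the paper's exact definition: intervals are (start, end) pairs on a numeric timeline, and the intervals within each set do not overlap one another.

```python
def total_length(intervals):
    """Summed length of a set of (start, end) intervals."""
    return sum(end - start for start, end in intervals)

def overlap_length(gold, predicted):
    """Summed pairwise overlap; assumes no overlaps inside either set."""
    return sum(
        max(0, min(g_end, p_end) - max(g_start, p_start))
        for g_start, g_end in gold
        for p_start, p_end in predicted
    )

def interval_scores(gold, predicted):
    """Length-based precision, recall and F1 between two interval sets."""
    inter = overlap_length(gold, predicted)
    pred_len, gold_len = total_length(predicted), total_length(gold)
    precision = inter / pred_len if pred_len else 0.0
    recall = inter / gold_len if gold_len else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical example: the gold interval spans days 0-10, the system predicts days 5-15.
precision, recall, f1 = interval_scores([(0, 10)], [(5, 15)])
```

Because both schemas can be mapped to intervals on the timeline, a metric of this kind lets systems that follow different annotation schemes be compared on the same footing.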

Comparing Deep-Learning Architectures and Traditional Machine-Learning Approaches for Satire Identification in Spanish Tweets

Mathematics ◽

10.3390/math8112075 ◽

2020 ◽

Vol 8 (11) ◽

pp. 2075

Author(s):

Óscar Apolinario-Arzube ◽

José Antonio García-Díaz ◽

José Medina-Moreira ◽

Harry Luna-Aveiga ◽

Rafael Valencia-García

Keyword(s):

Machine Learning ◽

Deep Learning ◽

User Interfaces ◽

State Of The Art ◽

Learning Approaches ◽

Word Embeddings ◽

Linguistic Features ◽

Intended Meaning ◽

Language User ◽

Learning Architectures

Automatic satire identification can help to identify texts in which the intended meaning differs from the literal meaning, improving tasks such as sentiment analysis, fake news detection or natural-language user interfaces. Typically, satire identification is performed by training a supervised classifier for finding linguistic clues that can determine whether a text is satirical or not. For this, the state-of-the-art relies on neural networks fed with word embeddings that are capable of learning interesting characteristics regarding the way humans communicate. However, as far as our knowledge goes, there are no comprehensive studies that evaluate these techniques in Spanish in the satire identification domain. Consequently, in this work we evaluate several deep-learning architectures with Spanish pre-trained word-embeddings and compare the results with strong baselines based on term-counting features. This evaluation is performed with two datasets that contain satirical and non-satirical tweets written in two Spanish variants: European Spanish and Mexican Spanish. Our experimentation revealed that term-counting features achieved similar results to deep-learning approaches based on word-embeddings, both outperforming previous results based on linguistic features. Our results suggest that term-counting features and traditional machine learning models provide competitive results regarding automatic satire identification, slightly outperforming state-of-the-art models.
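The "term-counting features with traditional machine learning" baseline mentioned here can be sketched as a bag-of-words representation fed to a multinomial Naive Bayes classifier. The choice of Naive Bayes, the whitespace tokenizer, and the tiny English training set are assumptions for illustration; the paper works on Spanish tweets and the abstract does not name its classifiers.

```python
import math
from collections import Counter, defaultdict

def term_counts(text):
    """Bag-of-words features: lowercase whitespace tokens and their counts."""
    return Counter(text.lower().split())

def train_naive_bayes(texts, labels):
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(term_counts(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def classify(model, text):
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # class prior
        for word, n in term_counts(text).items():
            if word in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing avoids log(0).
                p = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += n * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy corpus.
texts = [
    "government announces rain is now illegal",
    "president declares mondays cancelled forever",
    "government approves new health budget",
    "president visits flood victims today",
]
labels = ["satire", "satire", "news", "news"]
model = train_naive_bayes(texts, labels)
predicted = classify(model, "government declares rain cancelled")
```

A deep-learning counterpart would replace `term_counts` with pre-trained word embeddings and the Naive Bayes scorer with a neural network, which is the comparison the abstract describes.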

Of mice and models: improved animal models for biomedical research

Physiological Genomics ◽

10.1152/physiolgenomics.00067.2002 ◽

2002 ◽

Vol 11 (3) ◽

pp. 115-132 ◽

Cited By ~ 110

Author(s):

Ernesto Bockamp ◽

Marko Maringer ◽

Christian Spangenberg ◽

Stephan Fees ◽

Stuart Fraser ◽

...

Keyword(s):

Animal Models ◽

Mouse Models ◽

Biomedical Research ◽

Applied Research ◽

Mouse Genome ◽

Gene Knockout ◽

Genetic Disorders ◽

Murine Models ◽

Good State ◽

Basic And Applied Research

The ability to engineer the mouse genome has profoundly transformed biomedical research. During the last decade, conventional transgenic and gene knockout technologies have become invaluable experimental tools for modeling genetic disorders, assigning functions to genes, evaluating drugs and toxins, and by and large helping to answer fundamental questions in basic and applied research. In addition, the growing demand for more sophisticated murine models has also become increasingly evident. Good state-of-principle knowledge about the enormous potential of second-generation conditional mouse technology will be beneficial for any researcher interested in using these experimental tools. In this review we will focus on practice, pivotal principles, and progress in the rapidly expanding area of conditional mouse technology. The review will also present an internet compilation of available tetracycline-inducible mouse models as tools for biomedical research ( http://www.zmg.uni-mainz.de/tetmouse/ ).

Deep Machine Learning provides state-of-the-art performance in image-based plant phenotyping

10.1101/053033 ◽

2016 ◽

Cited By ~ 12

Author(s):

Michael P. Pound ◽

Alexandra J. Burgess ◽

Michael H. Wilson ◽

Jonathan A. Atkinson ◽

Marcus Griffiths ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Analysis ◽

Paradigm Shift ◽

State Of The Art ◽

Plant Phenotyping ◽

Learning Approaches ◽

Challenging Problem ◽

Feature Identification ◽

Art Performance

Deep learning is an emerging field that promises unparalleled results on many data analysis problems. We show the success offered by such techniques when applied to the challenging problem of image-based plant phenotyping, and demonstrate state-of-the-art results for root and shoot feature identification and localisation. We predict a paradigm shift in image-based phenotyping thanks to deep learning approaches.
