CompAIRR: ultra-fast comparison of adaptive immune receptor repertoires by exact and approximate sequence matching

Adaptive immune receptor (AIR) repertoires (AIRRs) record past immune encounters with exquisite specificity. Therefore, identifying identical or similar AIR sequences across individuals is a key step in AIRR analysis for revealing convergent immune response patterns that may be exploited for diagnostics and therapy. Existing methods for quantifying AIRR overlap do not scale with increasing dataset numbers and sizes. To address this limitation, we developed CompAIRR, which enables ultra-fast computation of AIRR overlap, based on either exact or approximate sequence matching. CompAIRR improves computational speed 1000-fold relative to the state of the art and uses only one-third of the memory: on the same machine, the exact pairwise AIRR overlap of 10⁴ AIRRs with 10⁵ sequences is found in ~17 minutes, while the fastest alternative tool requires 10 days. CompAIRR has been integrated with the machine learning ecosystem immuneML to speed up various commonly used AIRR-based machine learning applications.
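To make "exact" and "approximate" overlap concrete, here is a minimal Python sketch. It is a naive illustration, not CompAIRR's implementation, which the abstract does not describe. All function names and the toy sequences are hypothetical, and "approximate" is assumed here to mean at most one substitution (Hamming distance ≤ 1) between sequences of equal length.

```python
from itertools import combinations

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def exact_overlap(rep_a, rep_b):
    """Number of distinct sequences present in both repertoires."""
    return len(set(rep_a) & set(rep_b))


def hamming_neighbors(seq, alphabet=AMINO_ACIDS):
    """Yield every sequence that differs from seq by exactly one substitution."""
    for i, original in enumerate(seq):
        for letter in alphabet:
            if letter != original:
                yield seq[:i] + letter + seq[i + 1:]


def approximate_overlap(rep_a, rep_b, alphabet=AMINO_ACIDS):
    """Number of distinct sequences in rep_a with a match in rep_b
    at Hamming distance <= 1 (substitutions only, no indels)."""
    index_b = set(rep_b)  # hash-based lookup instead of all-vs-all comparison
    count = 0
    for seq in set(rep_a):
        if seq in index_b or any(n in index_b for n in hamming_neighbors(seq, alphabet)):
            count += 1
    return count


def pairwise_overlap(repertoires, overlap=exact_overlap):
    """Overlap for every unordered pair of named repertoires."""
    names = sorted(repertoires)
    return {(a, b): overlap(repertoires[a], repertoires[b])
            for a, b in combinations(names, 2)}


# Toy repertoires (made-up CDR3-like strings, for illustration only).
repertoires = {
    "r1": ["CASSLGTDTQYF", "CASSPGQGNEQFF", "CASSIRSSYEQYF"],
    "r2": ["CASSLGTDTQYF", "CASSPGQGNEQYF", "CAWSVGGNTEAFF"],
    "r3": ["CASSIRSSYEQYF", "CASSLGTDTQYF"],
}

print(pairwise_overlap(repertoires))
print(pairwise_overlap(repertoires, overlap=approximate_overlap))
```

The number of repertoire pairs grows quadratically with the number of repertoires, so even with hashed lookups a naive approach like this becomes expensive at the scale quoted in the abstract. That is the scaling problem the abstract says CompAIRR was built to address.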


Mining adaptive immune receptor repertoires for biological and clinical information using machine learning

Current Opinion in Systems Biology ◽

10.1016/j.coisb.2020.10.010 ◽

2020 ◽

Vol 24 ◽

pp. 109-119 ◽

Cited By ~ 2

Author(s):

Victor Greiff ◽

Gur Yaari ◽

Lindsay G. Cowell

Keyword(s):

Machine Learning ◽

Clinical Information ◽

Immune Receptor ◽

Adaptive Immune


Profiling the baseline performance and limits of machine learning models for adaptive immune receptor repertoire classification

10.1101/2021.05.23.445346 ◽

2021 ◽

Author(s):

Chakravarthi Kanduri ◽

Milena Pavlović ◽

Lonneke Scheffer ◽

Keshav Motwani ◽

Maria Chernigovskaya ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Accuracy ◽

Immune Receptor ◽

Adaptive Immune ◽

High Prediction Accuracy ◽

Wide Range ◽

Benchmark Datasets ◽

High Prediction ◽

Penalized Logistic Regression

Background: Machine learning (ML) methodology development for classification of immune states in adaptive immune receptor repertoires (AIRR) has seen a recent surge of interest. However, so far, there does not exist a systematic evaluation of scenarios where classical ML methods (such as penalized logistic regression) already perform adequately for AIRR classification. This hinders investigative reorientation to those scenarios where further method development of more sophisticated ML approaches may be required. Results: To identify those scenarios where a baseline method is able to perform well for AIRR classification, we generated a collection of synthetic benchmark datasets encompassing a wide range of dataset architecture-associated and immune state-associated sequence pattern (signal) complexity. We trained ≈1300 ML models with varying assumptions regarding immune signal on ≈850 datasets with a total of ≈210000 repertoires containing ≈42 billion TCRβ CDR3 amino acid sequences, thereby surpassing the sample sizes of current state-of-the-art AIRR ML setups by two orders of magnitude. We found that L1-penalized logistic regression achieved high prediction accuracy even when the immune signal occurs only in 1 out of 50000 AIR sequences. Conclusions: We provide a reference benchmark to guide new AIRR ML classification methodology by: (i) identifying those scenarios characterised by immune signal and dataset complexity, where baseline methods already achieve high prediction accuracy and (ii) facilitating realistic expectations of the performance of AIRR ML models given training dataset properties and assumptions. Our study serves as a template for defining specialized AIRR benchmark datasets for comprehensive benchmarking of AIRR ML methods.
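A back-of-the-envelope calculation gives a feel for how sparse a signal of "1 out of 50000" is. The sketch below uses only the rounded figures quoted in the abstract and assumes, purely for illustration, that all repertoires have the same size. The abstract does not state this, and the function names are hypothetical.

```python
def mean_repertoire_size(total_sequences, total_repertoires):
    """Average number of sequences per repertoire (integer division)."""
    return total_sequences // total_repertoires


def expected_signal_sequences(repertoire_size, sequences_per_signal):
    """Expected count of signal-carrying sequences in one repertoire
    when one in `sequences_per_signal` sequences carries the signal."""
    return repertoire_size / sequences_per_signal


# Rounded figures from the abstract: about 42 billion sequences in about 210000 repertoires.
size = mean_repertoire_size(42_000_000_000, 210_000)

# Lowest signal frequency reported as still classifiable: 1 in 50000 sequences.
signal = expected_signal_sequences(size, 50_000)

print(size, signal)
```

Under these assumptions an average repertoire holds about 200000 sequences, of which only about 4 would carry the immune signal.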


The immuneML ecosystem for machine learning analysis of adaptive immune receptor repertoires

Nature Machine Intelligence ◽

10.1038/s42256-021-00413-z ◽

2021 ◽

Vol 3 (11) ◽

pp. 936-944

Author(s):

Milena Pavlović ◽

Lonneke Scheffer ◽

Keshav Motwani ◽

Chakravarthi Kanduri ◽

Radmila Kompova ◽

...

Keyword(s):

Machine Learning ◽

Immune Receptor ◽

Adaptive Immune ◽

Learning Analysis


immuneML: an ecosystem for machine learning analysis of adaptive immune receptor repertoires

10.1101/2021.03.08.433891 ◽

2021 ◽

Cited By ~ 1

Author(s):

Milena Pavlović ◽

Lonneke Scheffer ◽

Keshav Motwani ◽

Chakravarthi Kanduri ◽

Radmila Kompova ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Antigen Specificity ◽

Immune Receptor ◽

Adaptive Immune ◽

User Adoption ◽

Extensive Documentation ◽

Command Line Tool ◽

Novel Method ◽

Learning Analysis

Adaptive immune receptor repertoires (AIRR) are key targets for biomedical research as they record past and ongoing adaptive immune responses. The capacity of machine learning (ML) to identify complex discriminative sequence patterns renders it an ideal approach for AIRR-based diagnostic and therapeutic discovery. To date, widespread adoption of AIRR ML has been inhibited by a lack of reproducibility, transparency, and interoperability. immuneML (immuneml.uio.no) addresses these concerns by implementing each step of the AIRR ML process in an extensible, open-source software ecosystem that is based on fully specified and shareable workflows. To facilitate widespread user adoption, immuneML is available as a command-line tool and through an intuitive Galaxy web interface, and extensive documentation of workflows is provided. We demonstrate the broad applicability of immuneML by (i) reproducing a large-scale study on immune state prediction, (ii) developing, integrating, and applying a novel method for antigen specificity prediction, and (iii) showcasing streamlined interpretability-focused benchmarking of AIRR ML.


Machine learning applications for shock train diagnostics

AIAA Scitech 2021 Forum ◽

10.2514/6.2021-1878 ◽

2021 ◽

Author(s):

Jared Chin ◽

Mirko Gamba

Keyword(s):

Machine Learning ◽

Shock Train ◽

Machine Learning Applications


A Survey for Predicting Enzyme Family Classes Using Machine Learning Methods

Current Drug Targets ◽

10.2174/1389450119666181002143355 ◽

2019 ◽

Vol 20 (5) ◽

pp. 540-550 ◽

Cited By ~ 11

Author(s):

Jiu-Xin Tan ◽

Hao Lv ◽

Fang Wang ◽

Fu-Ying Dao ◽

Wei Chen ◽

...

Keyword(s):

Machine Learning ◽

Catalytic Mechanism ◽

Biological Function ◽

Learning Methods ◽

Biochemical Processes ◽

Machine Learning Methods ◽

Enzyme Family ◽

The Family ◽

Speed Up ◽

Family Classification

Enzymes are proteins that act as biological catalysts to speed up cellular biochemical processes. According to their main Enzyme Commission (EC) numbers, enzymes are divided into six categories: EC-1: oxidoreductase; EC-2: transferase; EC-3: hydrolase; EC-4: lyase; EC-5: isomerase; and EC-6: synthetase. Different enzymes have different biological functions and act on different substrates. Therefore, knowing which family an enzyme belongs to can help infer its catalytic mechanism and provide information about the relevant biological function. With the large number of protein sequences flowing into databanks in the post-genomic age, annotating the family of an enzyme is very important. Since experimental methods are not cost-effective, bioinformatics tools will be a great help for accurately classifying enzyme families. In this review, we summarize the application of machine learning methods to the prediction of enzyme families from different aspects. We hope that this review will provide insights and inspiration for research on enzyme family classification.


Exploring the Applications of Machine Learning in Healthcare

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327910666191220103417 ◽

2020 ◽

Vol 10 (4) ◽

pp. 458-472

Author(s):

Tausifa Jan Saleem ◽

Mohammad Ahsan Chishti

Keyword(s):

Machine Learning ◽

Disease Risk ◽

Disease Diagnosis ◽

Machine Intelligence ◽

Healthcare Applications ◽

Comprehensive Overview ◽

Machine Learning Applications ◽

Remote Healthcare ◽

Healthcare Monitoring ◽

Applications Of Machine Learning

The rapid progress in domains like machine learning and big data has created plenty of opportunities in data-driven applications, particularly healthcare. Incorporating machine intelligence in healthcare can result in breakthroughs like precise disease diagnosis, novel methods of treatment, remote healthcare monitoring, drug discovery, and reduced healthcare costs. Running machine intelligence algorithms on massive healthcare datasets is computationally expensive. However, substantial progress in computational power in recent years has facilitated the deployment of machine intelligence algorithms in healthcare applications. Motivated to explore these applications, this paper presents a review of research works dedicated to the implementation of machine learning on healthcare datasets. The studies have been categorized into the following groups: (a) disease diagnosis and detection, (b) disease risk prediction, (c) health monitoring, (d) healthcare-related discoveries, and (e) epidemic outbreak prediction. The objective of the research is to give researchers in this field a comprehensive overview of machine learning applications in healthcare. Apart from revealing the potential of machine learning in healthcare, this paper will serve as a motivation to foster advanced research in the domain of machine intelligence-driven healthcare.


Learning and control

10.1093/oso/9780199674923.003.0026 ◽

2018 ◽

Author(s):

Ivan Herreros

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Brain Function ◽

Control Strategies ◽

Learning Problems ◽

Animal Learning ◽

Feed Forward Control ◽

Machine Learning Applications ◽

And Control

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and later introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the domain of the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, reinforcement learning and operant conditioning, and between unsupervised and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback vs anticipatory and adaptive control. Finally, it argues that this framework of translating knowledge between formal and biological disciplines can serve not only to structure and advance our understanding of brain function but also to enrich engineering solutions at the level of robot learning and control with insights coming from biology.


Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology

Machine Learning and Knowledge Extraction ◽

10.3390/make3020020 ◽

2021 ◽

Vol 3 (2) ◽

pp. 392-413

Author(s):

Stefan Studer ◽

Thanh Binh Bui ◽

Christian Drescher ◽

Alexander Hanuschkin ◽

Ludwig Winkler ◽

...

Keyword(s):

Machine Learning ◽

Quality Assurance ◽

Process Model ◽

Practical Experience ◽

Special Focus ◽

Close Monitoring ◽

Machine Learning Applications ◽

Project Organizations ◽

Considerable Impact ◽

Learning Development

Machine learning is an established and frequently used technique in industry and academia, but a standard process model to improve success and efficiency of machine learning applications is still missing. Project organizations and machine learning practitioners face manifold challenges and risks when developing machine learning applications and have a need for guidance to meet business expectations. This paper therefore proposes a process model for the development of machine learning applications, covering six phases from defining the scope to maintaining the deployed machine learning application. Business and data understanding are executed simultaneously in the first phase, as both have considerable impact on the feasibility of the project. The next phases comprise data preparation, modeling, evaluation, and deployment. Special focus is applied to the last phase, as a model running in changing real-time environments requires close monitoring and maintenance to reduce the risk of performance degradation over time. For each task of the process, this work proposes quality assurance methodology that is suitable to address challenges in machine learning development that are identified in the form of risks. The methodology is drawn from practical experience and scientific literature, and has proven to be general and stable. The process model expands on CRISP-DM, a data mining process model that enjoys strong industry support but fails to address machine learning specific tasks. The presented work proposes an industry- and application-neutral process model tailored for machine learning applications with a focus on technical tasks for quality assurance.


How Do Machines Learn? Artificial Intelligence as a New Era in Medicine

Journal of Personalized Medicine ◽

10.3390/jpm11010032 ◽

2021 ◽

Vol 11 (1) ◽

pp. 32

Author(s):

Oliwia Koteluk ◽

Adrian Wartecki ◽

Sylwia Mazurek ◽

Iga Kołodziejczak ◽

Andrzej Mackiewicz

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Health Care ◽

Medical Data ◽

General Process ◽

New Era ◽

Automated Evaluation ◽

Machine Learning Applications ◽

And Training ◽

Current Standards

With an increasing amount of medical data generated every day, there is a strong need for reliable, automated evaluation tools. Machine learning has the potential to revolutionize many fields of medicine, helping to make faster and more accurate decisions and improving current standards of treatment. Today, machines can analyze, learn, communicate, and understand processed data and are increasingly used in health care. This review explains different models and the general process of machine learning and training the algorithms. Furthermore, it summarizes the most useful machine learning applications and tools in different branches of medicine and health care (radiology, pathology, pharmacology, infectious diseases, personalized decision making, and many others). The review also addresses the future prospects and threats of applying artificial intelligence as an advanced, automated tool in medicine.
