Neural Network Potentials: A Concise Overview of Methods

In the past two decades, machine learning potentials (MLPs) have reached a level of maturity that now enables applications to large-scale atomistic simulations of a wide range of systems in chemistry, physics, and materials science. Different machine learning algorithms have been used with great success in the construction of these MLPs. In this review, we discuss an important group of MLPs relying on artificial neural networks to establish a mapping from the atomic structure to the potential energy. In spite of this common feature, there are important conceptual differences among MLPs, which concern the dimensionality of the systems, the inclusion of long-range electrostatic interactions, global phenomena like nonlocal charge transfer, and the type of descriptor used to represent the atomic structure, which can be either predefined or learnable. A concise overview is given along with a discussion of the open challenges in the field. Expected final online publication date for the Annual Review of Physical Chemistry, Volume 73 is April 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

The Basis of Navigation Across Species

Annual Review of Psychology ◽

10.1146/annurev-psych-020821-111311 ◽

2021 ◽

Vol 73 (1) ◽

Author(s):

Cody A. Freas ◽

Ken Cheng

Keyword(s):

Large Scale ◽

Error Reduction ◽

Annual Review ◽

Publication Date ◽

Additional Component ◽

Oscillatory Systems ◽

Neural Architecture ◽

Wide Range ◽

Reduction Strategies ◽

Cycling Behavior

Animals navigate a wide range of distances, from a few millimeters to globe-spanning journeys of thousands of kilometers. Despite this array of navigational challenges, similar principles underlie these behaviors across species. Here, we focus on the navigational strategies and supporting mechanisms in four well-known systems: the large-scale migratory behaviors of sea turtles and lepidopterans, as well as navigation on a smaller scale by rats and solitarily foraging ants. In lepidopterans, rats, and ants, we also discuss the current understanding of the neural architecture that supports navigation. The orientation and navigational behaviors of these animals are defined in terms of behavioral error-reduction strategies reliant on multiple goal-directed servomechanisms. We conclude by proposing to incorporate an additional component into this system: the observation that servomechanisms operate on oscillatory systems of cycling behavior. These oscillators and servomechanisms form the basis for directed orientation and navigational behaviors. Expected final online publication date for the Annual Review of Psychology, Volume 73 is January 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Machine learning potentials for extended systems: a perspective

The European Physical Journal B ◽

10.1140/epjb/s10051-021-00156-1 ◽

2021 ◽

Vol 94 (7) ◽

Author(s):

Jörg Behler ◽

Gábor Csányi

Keyword(s):

Machine Learning ◽

Electrostatic Interactions ◽

Large Scale ◽

Atomistic Simulations ◽

Extended Systems ◽

Empirical Potentials ◽

Materials Modelling ◽

Wide Range ◽

Long Range Interactions ◽

Non Local

In the past two and a half decades, machine learning potentials have evolved from a special-purpose solution to a broadly applicable tool for large-scale atomistic simulations. By combining the efficiency of empirical potentials and force fields with an accuracy close to that of first-principles calculations, they now enable computer simulations of a wide range of molecules and materials. In this perspective, we summarize the present status of these new types of models for extended systems, which are increasingly used for materials modelling. There are several approaches, but they all exploit the locality of atomic properties in some form. Long-range interactions, most prominently electrostatic interactions, can also be included, even for systems in which non-local charge transfer leads to an electronic structure that depends globally on all atomic positions. Remaining challenges and limitations of current approaches are discussed.
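
The following is a minimal sketch, in plain Python, of the locality idea these abstracts describe. It assumes a Behler-Parrinello-style construction, which is one common choice and not necessarily the one either review emphasizes. The total energy is written as E = Σ_i E_i(G_i). Each atomic energy E_i is produced by a small neural network applied to a descriptor G_i. That descriptor includes only neighbours inside a cutoff radius r_c. Here the descriptor is the radial symmetry function G_i = Σ_j exp(-η (r_ij - r_s)²) f_c(r_ij), where f_c is a cosine cutoff. Several elements are illustrative assumptions: the names AtomicNet, radial_descriptor and total_energy, the η values, and r_c = 6.0. The network is also untrained and uses random weights, so the energies it produces have no physical meaning.

```python
import math
import random


def cosine_cutoff(r, r_c):
    """Cosine cutoff f_c(r): 1 at r = 0, decays smoothly, exactly 0 for r >= r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_descriptor(positions, i, etas, r_s=0.0, r_c=6.0):
    """Radial symmetry functions G2 of atom i, one value per width eta.

    Only neighbours closer than r_c contribute, so the descriptor (and hence
    the atomic energy) depends only on the local environment of atom i.
    """
    g = []
    for eta in etas:
        total = 0.0
        for j, pos_j in enumerate(positions):
            if j == i:
                continue
            r = math.dist(positions[i], pos_j)
            total += math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_c)
        g.append(total)
    return g


class AtomicNet:
    """Hypothetical, untrained one-hidden-layer network: descriptor -> atomic energy."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
        # Zero biases are an illustrative choice: with them, an isolated atom
        # (all-zero descriptor) has an atomic energy of exactly 0.
        self.b1 = [0.0] * n_hidden
        self.b2 = 0.0

    def __call__(self, x):
        hidden = [
            math.tanh(sum(w * xk for w, xk in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


def total_energy(positions, net, etas, r_c=6.0):
    """E = sum_i E_i(G_i): one shared network applied to every atom's descriptor."""
    return sum(
        net(radial_descriptor(positions, i, etas, r_c=r_c))
        for i in range(len(positions))
    )


if __name__ == "__main__":
    etas = [0.5, 1.0, 2.0]
    net = AtomicNet(n_in=len(etas), n_hidden=8)
    cluster = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
    print(total_energy(cluster, net, etas))
```

Every atom uses the same network, and only neighbours within r_c enter its descriptor. As a result, the sum does not change when the system is translated or when identical atoms are permuted. An atom farther than r_c from all others also contributes nothing. Practical potentials go further than this sketch. They typically use element-specific networks, fitted weights, and angular or learnable descriptors. Where needed, they also add explicit long-range electrostatic terms.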

Machine Learning for Social Science: An Agnostic Approach

Annual Review of Political Science ◽

10.1146/annurev-polisci-053119-015921 ◽

2021 ◽

Vol 24 (1) ◽

Author(s):

Justin Grimmer ◽

Margaret E. Roberts ◽

Brandon M. Stewart

Keyword(s):

Social Sciences ◽

Machine Learning ◽

Social Science ◽

Scientific Data ◽

Annual Review ◽

Publication Date ◽

Learning Methods ◽

Machine Learning Methods ◽

The Social ◽

Wide Range

Social scientists are now in an era of data abundance, and machine learning tools are increasingly used to extract meaning from data sets both massive and small. We explain how the inclusion of machine learning in the social sciences requires us to rethink not only applications of machine learning methods but also best practices in the social sciences. In contrast to the traditional tasks for machine learning in computer science and statistics, when machine learning is applied to social scientific data, it is used to discover new concepts, measure the prevalence of those concepts, assess causal effects, and make predictions. The abundance of data and resources facilitates the move away from a deductive social science to a more sequential, interactive, and ultimately inductive approach to inference. We explain how an agnostic approach to machine learning methods focused on the social science tasks facilitates progress across a wide range of questions. Expected final online publication date for the Annual Review of Political Science, Volume 24 is May 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Efficient Image Retrieval approach for Large-scale Chest X Ray data using Hand-Crafted Features and Machine Learning Algorithms

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i11.890896 ◽

2018 ◽

Vol 6 (11) ◽

pp. 890-896

Author(s):

Irene Getzi S ◽

D. Christopher Durairaj ◽

V Joseph Raj

Keyword(s):

Machine Learning ◽

Image Retrieval ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

X Ray ◽

Chest X Ray

Global soil moisture data derived through machine learning trained with in-situ measurements

Scientific Data ◽

10.1038/s41597-021-00964-1 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Sungmin O. ◽

Rene Orth

Keyword(s):

Machine Learning ◽

Soil Moisture ◽

Large Scale ◽

Short Term Memory ◽

Temporal Dynamics ◽

Soil Moisture Data ◽

Wide Range ◽

Global Soil

While soil moisture information is essential for a wide range of hydrologic and climate applications, spatially continuous soil moisture data are only available from satellite observations or model simulations. Here we present SoMo.ml, a global, long-term dataset of soil moisture derived through machine learning trained with in-situ measurements. We train a Long Short-Term Memory (LSTM) model to extrapolate daily soil moisture dynamics in space and in time, based on in-situ data collected from more than 1,000 stations across the globe. SoMo.ml provides multi-layer soil moisture data (0–10 cm, 10–30 cm, and 30–50 cm) at 0.25° spatial and daily temporal resolution over the period 2000–2019. The performance of the resulting dataset is evaluated through cross validation and inter-comparison with existing soil moisture datasets. SoMo.ml performs especially well in terms of temporal dynamics, making it particularly useful for applications requiring time-varying soil moisture, such as anomaly detection and memory analyses. Given its distinct derivation, SoMo.ml complements the existing suite of modelled and satellite-based datasets and can support large-scale hydrological, meteorological, and ecological analyses.

Clinician checklist for assessing suitability of machine learning applications in healthcare

BMJ Health & Care Informatics ◽

10.1136/bmjhci-2020-100251 ◽

2021 ◽

Vol 28 (1) ◽

pp. e100251

Author(s):

Ian Scott ◽

Stacey Carter ◽

Enrico Coiera

Keyword(s):

Machine Learning ◽

Large Scale ◽

Clinical Decision Making ◽

Improve Patient Care ◽

Clinical Decision ◽

Routine Care ◽

Machine Learning Algorithms ◽

Clinical Settings ◽

Machine Learning Applications ◽

Key Issues

Machine learning algorithms are being used to screen and diagnose disease, prognosticate and predict therapeutic responses. Hundreds of new algorithms are being developed, but whether they improve clinical decision making and patient outcomes remains uncertain. If clinicians are to use algorithms, they need to be reassured that key issues relating to their validity, utility, feasibility, safety and ethical use have been addressed. We propose a checklist of 10 questions that clinicians can ask of those advocating for the use of a particular algorithm, but that do not expect clinicians, as non-experts, to demonstrate mastery over what can be highly complex statistical and computational concepts. The questions are: (1) What is the purpose and context of the algorithm? (2) How good were the data used to train the algorithm? (3) Were there sufficient data to train the algorithm? (4) How well does the algorithm perform? (5) Is the algorithm transferable to new clinical settings? (6) Are the outputs of the algorithm clinically intelligible? (7) How will this algorithm fit into and complement current workflows? (8) Has use of the algorithm been shown to improve patient care and outcomes? (9) Could the algorithm cause patient harm? and (10) Does use of the algorithm raise ethical, legal or social concerns? We provide examples where an algorithm may raise concerns and apply the checklist to a recent review of diagnostic imaging applications. This checklist aims to assist clinicians in assessing algorithm readiness for routine care and to identify situations where further refinement and evaluation are required prior to large-scale use.

57 Precision neoantigen discovery using novel algorithms and expanded HLA-ligandome datasets

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0057 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A62-A62

Author(s):

Dattatreya Mellacheruvu ◽

Rachel Pyke ◽

Charles Abbott ◽

Nick Phillips ◽

Sejal Desai ◽

...

Keyword(s):

Machine Learning ◽

Cell Lines ◽

Antigen Processing ◽

Large Scale ◽

Prediction Models ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Training Data ◽

High Quality ◽

Tissue Samples

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended this work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies.

Methods: In-house immunopeptidomic data were generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using the W6/32 antibody and LC-MS/MS. Public immunopeptidomics data were downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at a 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples with the ImmunoID NeXT Platform.

Results: We have generated large-scale, high-quality immunopeptidomics data from approximately 60 mono-allelic cell lines, which unambiguously assign peptides to their presenting alleles, and used these data to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding pockets, while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data were integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples.

Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared with a state-of-the-art public algorithm and furthers this objective.

Mechanical Patterning in Animal Morphogenesis

Annual Review of Cell and Developmental Biology ◽

10.1146/annurev-cellbio-120319-030931 ◽

2021 ◽

Vol 37 (1) ◽

Author(s):

Yonit Maroudas-Sacks ◽

Kinneret Keren

Keyword(s):

Pattern Formation ◽

Large Scale ◽

Coarse Grained ◽

Annual Review ◽

Publication Date ◽

Biochemical Processes ◽

Substantial Progress ◽

Biological Pattern Formation ◽

Effective Theories ◽

Biochemical Signals

Morphogenesis is one of the most remarkable examples of biological pattern formation. Despite substantial progress in the field, we still do not understand the organizational principles responsible for the robust convergence of the morphogenesis process across scales to form viable organisms under variable conditions. Achieving large-scale coordination requires feedback between mechanical and biochemical processes, spanning all levels of organization and relating the emerging patterns to the mechanisms driving their formation. In this review, we highlight the role of mechanics in the patterning process, emphasizing the active and synergistic manner in which mechanical processes participate in developmental patterning rather than merely following a program set by biochemical signals. We discuss the value of applying a coarse-grained approach toward understanding this complex interplay; such an approach considers the large-scale dynamics and feedback and complements the reductionist approach focused on molecular detail. A central challenge in this approach is identifying relevant coarse-grained variables and developing effective theories that can serve as a basis for an integrated framework for understanding this remarkable pattern-formation process. Expected final online publication date for the Annual Review of Cell and Developmental Biology, Volume 37 is October 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Gang Research in the Twenty-First Century

Annual Review of Criminology ◽

10.1146/annurev-criminol-030920-094656 ◽

2021 ◽

Vol 5 (1) ◽

Author(s):

Caylin Louis Moore ◽

Forrest Stuart

Keyword(s):

Large Scale ◽

Criminological Theory ◽

First Century ◽

Annual Review ◽

Publication Date ◽

Theoretical Frameworks ◽

Twenty First Century ◽

State Interventions ◽

And Behavior ◽

Analytical Approaches

For nearly a century, gang scholarship has remained foundational to criminological theory and method. Twenty-first-century scholarship continues to refine and, in some cases, supplant long-held axioms about gang formation, organization, and behavior. Recent advances can be traced to shifts in the empirical social reality and conditions within which gangs exist and act. We draw out this relationship—between the ontological and epistemological—by identifying key macrostructural shifts that have transformed gang composition and behavior and, in turn, forced scholars to revise dominant theoretical frameworks and analytical approaches. These shifts include large-scale economic transformations, the expansion of punitive state interventions, the proliferation of the Internet and social media, intensified globalization, and the increasing presence of women and LGBTQ individuals in gangs and gang research. By introducing historically unprecedented conditions and actors, these developments provide novel opportunities to reconsider previous analyses of gang structure, violence, and other related objects of inquiry. Expected final online publication date for the Annual Review of Criminology, Volume 5 is January 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Bacterial Multicellularity: The Biology of Escherichia coli Building Large-Scale Biofilm Communities

Annual Review of Microbiology ◽

10.1146/annurev-micro-031921-055801 ◽

2021 ◽

Vol 75 (1) ◽

Author(s):

Diego O. Serra ◽

Regine Hengge

Keyword(s):

Escherichia Coli ◽

Large Scale ◽

Second Messengers ◽

Emergent Properties ◽

Annual Review ◽

Publication Date ◽

Growth And Survival ◽

Chemical Gradients ◽

Life Itself ◽

Scale Matrix

Biofilms are a widespread multicellular form of bacterial life. The spatial structure and emergent properties of these communities depend on a polymeric extracellular matrix architecture that is orders of magnitude larger than the cells that build it. Using as a model the wrinkly macrocolony biofilms of Escherichia coli, which contain amyloid curli fibers and phosphoethanolamine (pEtN)-modified cellulose as matrix components, we summarize here the structure, building, and function of this large-scale matrix architecture. Based on different sigma and other transcription factors as well as second messengers, the underlying regulatory network reflects the fundamental trade-off between growth and survival. It controls matrix production spatially in response to long-range chemical gradients, but it also generates distinct patterns of short-range matrix heterogeneity that are crucial for tissue-like elasticity and macroscopic morphogenesis. Overall, these biofilms confer protection and a potential for homeostasis, thereby reducing maintenance energy, which makes multicellularity an emergent property of life itself. Expected final online publication date for the Annual Review of Microbiology, Volume 75 is October 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
