Augmenting Transformers with KNN-Based Composite Memory for Dialog

Author(s):  
Angela Fan ◽  
Claire Gardent ◽  
Chloé Braud ◽  
Antoine Bordes

Various machine learning tasks can benefit from access to external information of different modalities, such as text and images. Recent work has focused on learning architectures with large memories capable of storing this knowledge. We propose augmenting generative Transformer neural networks with KNN-based Information Fetching (KIF) modules. Each KIF module learns a read operation to access fixed external knowledge. We apply these modules to generative dialog modeling, a challenging task where information must be flexibly retrieved and incorporated to maintain the topic and flow of conversation. We demonstrate the effectiveness of our approach by identifying relevant knowledge required for knowledgeable but engaging dialog from Wikipedia, images, and human-written dialog utterances, and show that leveraging this retrieved information improves model performance, measured by automatic and human evaluation.
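As a rough illustration of the kind of read operation a KIF module performs (the paper's modules learn the query encoder end-to-end; the function and memory names below are hypothetical), a KNN read scores a query embedding against a fixed external memory and returns the nearest entries:

```python
import math

def knn_read(query_vec, memory, k=2):
    """Illustrative sketch of a KIF-style read: score a query vector against
    a fixed external memory by cosine similarity and return the top-k keys.
    Names, shapes, and the similarity choice are assumptions, not the
    paper's implementation."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    # Rank memory entries by similarity to the query, best first.
    scored = sorted(memory.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [key for key, _ in scored[:k]]

# Toy memory of knowledge embeddings (hypothetical keys and vectors).
memory = {
    "wiki:dogs": [1.0, 0.1, 0.0],
    "wiki:cats": [0.9, 0.2, 0.1],
    "wiki:cars": [0.0, 0.1, 1.0],
}
print(knn_read([1.0, 0.0, 0.0], memory, k=2))
```

In the paper's setting, the retrieved entries (Wikipedia sentences, image features, or past utterances) are then fed to the Transformer rather than simply returned.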

Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 39
Author(s):  
Carlos Lassance ◽  
Vincent Gripon ◽  
Antonio Ortega

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are usually left unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the three following problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved via enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
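A minimal sketch of the graph-construction step the abstract describes, under simplifying assumptions (cosine similarity and a fixed threshold; the paper's LGG construction may weight or normalize edges differently):

```python
import math

def latent_geometry_graph(batch_reps, threshold=0.5):
    """Illustrative sketch: build a similarity graph over a batch of
    intermediate representations, connecting the pairs whose cosine
    similarity exceeds a threshold. Returns the edge set over batch indices."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    n = len(batch_reps)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(batch_reps[i], batch_reps[j]) > threshold:
                edges.add((i, j))
    return edges

# Toy batch of 2-D intermediate representations: the first two inputs are
# nearly parallel, the third is orthogonal to both.
reps = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(latent_geometry_graph(reps))
```

A constraint such as (i) teacher mimicking would then compare the student's edge structure (or similarity matrix) at a given layer with the teacher's, penalizing disagreement.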


2020 ◽  
Vol 4 (1) ◽  
Author(s):  
Jamie Miles ◽  
Janette Turner ◽  
Richard Jacques ◽  
Julia Williams ◽  
Suzanne Mason

Abstract
Background: The primary objective of this review is to assess the accuracy of machine learning methods in triaging the acuity of patients presenting in the Emergency Care System (ECS). The population is patients who have contacted the ambulance service or presented at the Emergency Department. The index test is a machine-learning algorithm that aims to stratify the acuity of incoming patients at initial triage, compared against either an existing decision support tool, clinical opinion, or, in the absence of these, no comparator. The outcomes of this review are calibration, discrimination, and classification statistics.
Methods: Only derivation studies (with or without internal validation) were included. MEDLINE, CINAHL, PubMed and the grey literature were searched on 14 December 2019. Risk of bias was assessed using the PROBAST tool and data were extracted using the CHARMS checklist. Discrimination (C-statistic) was a commonly reported model performance measure, and these statistics were therefore represented as a range within each machine learning method. The majority of studies had poorly reported outcomes, so a narrative synthesis of results was performed.
Results: A total of 92 models (from 25 studies) were included in the review. There were two main triage outcomes: hospitalisation (56 models) and critical care need (25 models). For hospitalisation, neural networks and tree-based methods both had a median C-statistic of 0.81 (IQR 0.80-0.84 and 0.79-0.82, respectively). Logistic regression had a median C-statistic of 0.80 (0.74-0.83). For critical care need, neural networks had a median C-statistic of 0.89 (0.86-0.91), tree-based methods 0.85 (0.84-0.88), and logistic regression 0.83 (0.79-0.84).
Conclusions: Machine-learning methods appear accurate in triaging undifferentiated patients entering the Emergency Care System. There was no clear benefit of using one technique over another; however, models derived by logistic regression were more transparent in reporting model performance. Future studies should adhere to reporting guidelines and use them at the protocol design stage.
Registration and funding: This systematic review is registered on the International Prospective Register of Systematic Reviews (PROSPERO) and can be accessed online at: https://www.crd.york.ac.uk/PROSPERO/display_record.php?ID=CRD42020168696. This study was funded by the NIHR as part of a Clinical Doctoral Research Fellowship.
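The C-statistic the review compares across models is, for a binary outcome, equivalent to the AUC: the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case. A direct pairwise computation on toy data:

```python
def c_statistic(scores, labels):
    """C-statistic (equivalently AUC) for a binary outcome: the fraction of
    positive/negative pairs where the positive case receives the higher
    predicted risk, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy triage scores: higher score = higher predicted acuity.
# A model that ranks every admitted patient above every discharged one
# achieves a C-statistic of 1.0.
print(c_statistic([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0]))
```

A C-statistic of 0.5 corresponds to random ranking, which is why the 0.80-0.89 medians reported above indicate useful, though imperfect, discrimination.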


2011 ◽  
pp. 81-104 ◽  
Author(s):  
G. Camps-Valls ◽  
J. F. Guerrero-Martinez

In this chapter, we review the vast field of application of artificial neural networks in cardiac pathology discrimination based on electrocardiographic signals. We discuss advantages and drawbacks of neural and adaptive systems in cardiovascular medicine and catch a glimpse of forthcoming developments in machine learning models for the real clinical environment. Some problems are identified in the learning tasks of beat detection, feature selection/extraction, and classification, and some proposals and suggestions are given to alleviate the problems of interpretability, overfitting, and adaptation. These have become important problems in recent years and will surely constitute the basis of some investigations in the immediate future.


Author(s):  
Niall Rooney

The concept of ensemble learning has its origins in research from the late 1980s/early 1990s into combining a number of artificial neural networks (ANNs) models for regression tasks. Ensemble learning is now a widely deployed and researched topic within the area of machine learning and data mining. Ensemble learning, as a general definition, refers to the concept of being able to apply more than one learning model to a particular machine learning problem using some method of integration. The desired goal of course is that the ensemble as a unit will outperform any of its individual members for the given learning task. Ensemble learning has been extended to cover other learning tasks such as classification (refer to Kuncheva, 2004 for a detailed overview of this area), online learning (Fern & Givan, 2003) and clustering (Strehl & Ghosh, 2003). The focus of this article is to review ensemble learning with respect to regression, where by regression, we refer to the supervised learning task of creating a model that relates a continuous output variable to a vector of input variables.
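The integration step described above can be as simple as averaging. A minimal sketch for regression, with hypothetical toy models standing in for trained learners (weighted averaging and stacking are common refinements):

```python
def ensemble_predict(models, x):
    """Minimal sketch of ensemble integration for regression: each model is
    any callable mapping an input vector to a continuous prediction; the
    simplest integration method averages their outputs."""
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)

# Two toy linear models with different coefficients (illustrative only):
# the ensemble's error can be lower than either member's when their
# individual errors partially cancel.
m1 = lambda x: 2.0 * x[0] + 1.0
m2 = lambda x: 1.0 * x[0] + 3.0
print(ensemble_predict([m1, m2], [2.0]))
```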


2021 ◽  
pp. 1-18
Author(s):  
Ilenna Simone Jones ◽  
Konrad Paul Kording

Abstract Physiological experiments have highlighted how the dendrites of biological neurons can nonlinearly process distributed synaptic inputs. However, it is unclear how aspects of a dendritic tree, such as its branched morphology or its repetition of presynaptic inputs, determine neural computation beyond this apparent nonlinearity. Here we use a simple model where the dendrite is implemented as a sequence of thresholded linear units. We manipulate the architecture of this model to investigate the impacts of binary branching constraints and repetition of synaptic inputs on neural computation. We find that models with such manipulations can perform well on machine learning tasks, such as Fashion MNIST or Extended MNIST. We find that model performance on these tasks is limited by binary tree branching and dendritic asymmetry and is improved by the repetition of synaptic inputs to different dendritic branches. These computational experiments further neuroscience theory on how different dendritic properties might determine neural computation of clearly defined tasks.
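The binary-tree arrangement of thresholded linear units described above can be sketched as follows (ReLU as the threshold nonlinearity and a flat weight layout are assumptions here; the paper's exact parameterization may differ):

```python
def relu(z):
    return max(0.0, z)

def binary_dendrite(inputs, weights, bias=0.0):
    """Illustrative sketch of a dendritic tree as a binary tree of thresholded
    linear units: leaves are weighted synaptic inputs, and each internal node
    sums its two children and applies a threshold nonlinearity. The input
    length must be a power of two for the tree to close at a single root."""
    layer = [w * x for w, x in zip(weights, inputs)]
    while len(layer) > 1:
        # Pair adjacent branches and apply the node nonlinearity.
        layer = [relu(layer[i] + layer[i + 1] + bias)
                 for i in range(0, len(layer), 2)]
    return layer[0]

print(binary_dendrite([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]))
```

Repetition of synaptic inputs, as studied in the paper, would correspond to presenting the same input value at several leaves with independently learned weights.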


2021 ◽  
pp. 43-53
Author(s):  
Adnan Mohsin Abdulazeez

Due to many new medical applications, the demand for accurate ECG classification is high. Several Machine Learning (ML) algorithms are currently available for ECG data processing and classification. The key limitation of these ML studies, however, is their use of heuristic, hand-crafted, or engineered features with shallow learning architectures. The difficulty lies in the risk of not selecting the most suitable features to achieve good classification accuracy on this ECG problem. One suggested choice is to use deep learning algorithms, in which the first layer of a CNN acts as a feature extractor. This paper summarizes some of the key machine learning approaches to ECG classification, assessing them in terms of the features they use, the classification accuracy they achieve on key physiological ECG biomarkers, and their use of statistical modeling and supporting simulation.
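The idea of a first CNN layer acting as a learned feature extractor can be illustrated with a single 1-D convolution over raw samples (the kernel values here are hand-picked for illustration; in a trained network they are learned):

```python
def conv1d(signal, kernel):
    """Sketch of a first convolutional layer on a raw 1-D signal: sliding a
    kernel over the samples produces a feature map, replacing hand-crafted
    features with learned ones."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A difference kernel responds strongly to sharp deflections, such as the
# rising and falling edges around an R-peak in a toy ECG trace.
ecg = [0.0, 0.1, 1.0, 0.2, 0.0]
print(conv1d(ecg, [-1.0, 1.0]))
```

In a real deep model, many such kernels are learned jointly with the classifier, so the network discovers which morphological features matter rather than relying on engineered ones.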


2019 ◽  
Author(s):  
Jakub M. Bartoszewicz ◽  
Anja Seidel ◽  
Robert Rentzsch ◽  
Bernhard Y. Renard

Abstract
Motivation: We expect novel pathogens to arise due to their fast-paced evolution, and new species to be discovered thanks to advances in DNA sequencing and metagenomics. What is more, recent developments in synthetic biology raise concerns that some strains of bacteria could be modified for malicious purposes. Traditional approaches to open-view pathogen detection depend on databases of known organisms, limiting their performance on unknown, unrecognized, and unmapped sequences. In contrast, machine learning methods can infer pathogenic phenotypes from single NGS reads even though the biological context is unavailable. However, modern neural architectures treat DNA as a simple character string and may predict conflicting labels for a given sequence and its reverse-complement. This undesirable property may impact model performance.
Results: We present DeePaC, a Deep Learning Approach to Pathogenicity Classification. It includes a universal, extensible framework for neural architectures ensuring identical predictions for any given DNA sequence and its reverse-complement. We implement reverse-complement convolutional neural networks and LSTMs, which outperform the state-of-the-art methods based on both sequence homology and machine learning. Combining a reverse-complement architecture with integrating the predictions for both mates in a read pair results in cutting the error rate almost in half in comparison to the previous state-of-the-art.
Availability: The code and the models are available at: https://gitlab.com/rki_bioinformatics/DeePaC
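The invariance property the abstract describes can be made concrete. DeePaC builds it into the network layers themselves; a simpler route to the same guarantee, shown here for illustration, wraps an arbitrary scoring function so a read and its reverse-complement always receive the same prediction:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Reverse-complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def rc_symmetric_predict(model, seq):
    """Symmetrized prediction: averaging a model's scores on a sequence and
    its reverse-complement guarantees identical outputs for both strands.
    This wrapper is an illustrative stand-in for DeePaC's reverse-complement
    layers, which enforce the property architecturally."""
    return 0.5 * (model(seq) + model(reverse_complement(seq)))

# Toy "model": GC fraction, a stand-in for a trained network's score.
gc = lambda s: sum(b in "GC" for b in s) / len(s)
print(rc_symmetric_predict(gc, "ATGCG"))
```

The architectural version is preferable in practice because it shares parameters between the two strand orientations rather than doubling inference cost.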


2020 ◽  
Vol 8 (4) ◽  
pp. 551-573 ◽  
Author(s):  
Hao Yin ◽  
Austin R. Benson ◽  
Johan Ugander

Abstract
Recent work studying triadic closure in undirected graphs has drawn attention to the distinction between measures that focus on the “center” node of a wedge (i.e., length-2 path) versus measures that focus on the “initiator,” a distinction with considerable consequences. Existing measures in directed graphs, meanwhile, have all been center-focused. In this work, we propose a family of eight directed closure coefficients that measure the frequency of triadic closure in directed graphs from the perspective of the node initiating closure. The eight coefficients correspond to different labeled wedges, where the initiator and center nodes are labeled, and we observe dramatic empirical variation in these coefficients on real-world networks, even in cases when the induced directed triangles are isomorphic. To understand this phenomenon, we examine the theoretical behavior of our closure coefficients under a directed configuration model. Our analysis illustrates an underlying connection between the closure coefficients and moments of the joint in- and out-degree distributions of the network, offering an explanation of the observed asymmetries. We also use our directed closure coefficients as predictors in two machine learning tasks. We find interpretable models with AUC scores above 0.92 in class-balanced binary prediction, substantially outperforming models that use traditional center-focused measures.
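One member of such a family can be sketched directly: among out-out wedges u → v → w initiated at u, count the fraction closed by an edge u → w. (The other seven coefficients vary the edge directions at the initiator and center; the exact labeling conventions follow the paper, which this simplified version may not match in every detail.)

```python
def out_out_closure_coefficient(edges, u):
    """Initiator-focused closure for out-out wedges: among length-2 directed
    paths u -> v -> w starting at u (with w != u), the fraction that are
    closed by a directed edge u -> w."""
    edge_set = set(edges)
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    wedges = closed = 0
    for v in succ.get(u, ()):
        for w in succ.get(v, ()):
            if w == u:
                continue  # returning to the initiator is not a wedge
            wedges += 1
            closed += (u, w) in edge_set
    return closed / wedges if wedges else 0.0

# Toy digraph: node 0 initiates two wedges (0->1->2 and 0->1->3),
# and only the first is closed by the edge 0->2.
edges = [(0, 1), (1, 2), (0, 2), (1, 3)]
print(out_out_closure_coefficient(edges, 0))
```

A center-focused measure would instead condition on v, which is why the two families can diverge sharply on the same network.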


Earth ◽  
2021 ◽  
Vol 2 (1) ◽  
pp. 174-190
Author(s):  
Sujan Pal ◽  
Prateek Sharma

Machine learning (ML), as an artificial intelligence tool, has driven significant progress in data-driven research in Earth sciences. Land Surface Models (LSMs) are important components of climate models, which help to capture the water, energy, and momentum exchange between the land surface and the atmosphere, providing lower boundary conditions to the atmospheric models. The objectives of this review paper are to highlight the areas of improvement in land modeling using ML and discuss the crucial ML techniques in detail. Literature searches were conducted using relevant keywords to obtain an extensive list of articles. The bibliographic lists of these articles were also considered. To date, ML-based techniques have been able to upgrade the performance of LSMs and reduce uncertainties by improving the estimation of evapotranspiration and heat fluxes, parameter optimization, crop yield prediction, and model benchmarking. Widely used ML techniques for these purposes include Artificial Neural Networks and Random Forests. We conclude that further improvements in land modeling are possible in terms of high-resolution data preparation, parameter calibration, uncertainty reduction, efficient model performance, and data assimilation using ML. In addition to the traditional techniques, convolutional neural networks, long short-term memory, and other deep learning methods can be implemented.


2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Jayakrishnan Ajayakumar ◽  
Andrew J. Curtis ◽  
Vanessa Rouzier ◽  
Jean William Pape ◽  
Sandra Bempah ◽  
...  

Abstract
Background: The health burden in developing-world informal settlements often coincides with a lack of spatial data that could be used to guide intervention strategies. Spatial video (SV) has proven to be a useful tool to collect environmental and social data at a granular scale, though the effort required to turn these spatially encoded video frames into maps limits sustainability and scalability. In this paper we explore the use of convolutional neural networks (CNNs) to solve this problem by automatically identifying disease-related environmental risks in a series of SV collected from Haiti. Our objective is to determine the potential of machine learning in health risk mapping for these environments by assessing the challenges faced in adequately training the required classification models.
Results: We show that SV can be a suitable source for automatically identifying and extracting health risk features using machine learning. While well-defined objects such as drains, buckets, tires and animals can be efficiently classified, more amorphous masses such as trash or standing water are difficult to classify. Our results further show that variations in the number of image frames selected, the image resolution, and combinations of these can be used to improve the overall model performance.
Conclusion: Machine learning in combination with spatial video can be used to automatically identify environmental risks associated with common health problems in informal settlements, though there are likely to be variations in the type of data needed for training based on location. Success is also likely to vary geographically depending on the risk type being identified. However, we are confident in identifying a series of best practices for data collection, model training and performance in these settings. We also discuss the next step of testing these findings in other environments, and how adding in the simultaneously collected geographic data could be used to create an automatic health risk mapping tool.

