Subgraph augmented non-negative tensor factorization (SANTF) for modeling clinical narrative text

Abstract. Objective: Extracting medical knowledge from electronic medical records requires automated approaches to combat scalability limitations and selection biases. However, existing machine learning approaches are often regarded by clinicians as black boxes. Moreover, training data for these automated approaches are often sparsely annotated at best. The authors target unsupervised learning for modeling clinical narrative text, aiming at improving both accuracy and interpretability. Methods: The authors introduce a novel framework named subgraph augmented non-negative tensor factorization (SANTF). In addition to relying on atomic features (e.g., words in clinical narrative text), SANTF automatically mines higher-order features (e.g., relations of lymphoid cells expressing antigens) from clinical narrative text by converting sentences into a graph representation and identifying important subgraphs. The authors compose a tensor using patients, higher-order features, and atomic features as its respective modes. We then apply non-negative tensor factorization to cluster patients, and simultaneously identify latent groups of higher-order features that link to patient clusters, as in clinical guidelines where a panel of immunophenotypic features and laboratory results are used to specify diagnostic criteria. Results and Conclusion: SANTF demonstrated over 10% improvement in averaged F-measure on patient clustering compared to widely used non-negative matrix factorization (NMF) and k-means clustering methods. Multiple baselines were established by modeling patient data using patient-by-features matrices with different feature configurations and then performing NMF or k-means to cluster patients. Feature analysis identified latent groups of higher-order features that lead to medical insights. We also found that the latent groups of atomic features help to better correlate the latent groups of higher-order features.
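
The tensor step described in this abstract can be illustrated with a minimal sketch. It factorizes a tiny patients x higher-order features x atomic features tensor into non-negative factors and reads a cluster for each patient off the patient factor. The multiplicative update rule, the toy tensor, and all names (`ntf`, `reconstruction_error`) are assumptions made for illustration; the abstract does not state which solver SANTF uses, and no subgraph mining is shown here.

```python
import random


def ntf(X, rank, iters=200, seed=0, eps=1e-9):
    """Non-negative CP factorization of a 3-mode tensor X[i][j][k] (illustrative)."""
    rng = random.Random(seed)
    dims = (len(X), len(X[0]), len(X[0][0]))
    # Strictly positive random start so multiplicative updates can move every entry.
    A, B, C = ([[rng.random() + 0.1 for _ in range(rank)] for _ in range(d)]
               for d in dims)

    def gram(M):
        # rank x rank matrix M^T M
        return [[sum(row[r] * row[s] for row in M) for s in range(rank)]
                for r in range(rank)]

    def update(F, G, H, get):
        # Multiplicative update of factor F with the other two factors held fixed.
        GG, HH = gram(G), gram(H)
        for f in range(len(F)):
            new_row = []
            for r in range(rank):
                num = sum(get(f, g, h) * G[g][r] * H[h][r]
                          for g in range(len(G)) for h in range(len(H)))
                den = sum(F[f][s] * GG[s][r] * HH[s][r] for s in range(rank))
                new_row.append(F[f][r] * num / (den + eps))
            F[f] = new_row

    for _ in range(iters):
        update(A, B, C, lambda i, j, k: X[i][j][k])
        update(B, A, C, lambda j, i, k: X[i][j][k])
        update(C, A, B, lambda k, i, j: X[i][j][k])
    return A, B, C


def reconstruction_error(X, A, B, C):
    """Squared error between X and the sum of rank-one terms."""
    err = 0.0
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            for k, c in enumerate(C):
                approx = sum(a[r] * b[r] * c[r] for r in range(len(a)))
                err += (X[i][j][k] - approx) ** 2
    return err


# Toy tensor (invented): 4 patients, 3 higher-order features, 3 atomic features.
X = [[[0.0] * 3 for _ in range(3)] for _ in range(4)]
for i in (0, 1):
    X[i][0][0] = 1.0
for i in (2, 3):
    for j in (1, 2):
        for k in (1, 2):
            X[i][j][k] = 1.0

A, B, C = ntf(X, rank=2)
# Assign each patient to the latent group with the largest loading.
clusters = [max(range(2), key=lambda r: row[r]) for row in A]
```

In this sketch the rows of `B` and `C` play the role of the latent groups of higher-order and atomic features that the abstract links to patient clusters.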

Multilayer Soil Moisture Mapping at a Regional Scale from Multisource Data via a Machine Learning Method

Remote Sensing ◽

10.3390/rs11030284 ◽

2019 ◽

Vol 11 (3) ◽

pp. 284 ◽

Cited By ~ 1

Author(s):

Linglin Zeng ◽

Shun Hu ◽

Daxiang Xiang ◽

Xiang Zhang ◽

Deren Li ◽

...

Keyword(s):

Machine Learning ◽

Soil Moisture ◽

Regional Scale ◽

Remotely Sensed ◽

Temporal Variations ◽

Training Data ◽

Estimation Accuracy ◽

Learning Approaches ◽

Remotely Sensed Data ◽

Deep Soil

Soil moisture mapping at a regional scale is commonplace since these data are required in many applications, such as hydrological and agricultural analyses. The use of remotely sensed data for the estimation of deep soil moisture at a regional scale has received far less emphasis. The objective of this study was to map the 500-m, 8-day average and daily soil moisture at different soil depths in Oklahoma from remotely sensed and ground-measured data using the random forest (RF) method, which is one of the machine-learning approaches. In order to investigate the estimation accuracy of the RF method at both a spatial and a temporal scale, two independent soil moisture estimation experiments were conducted using data from 2010 to 2014: a year-to-year experiment (with a root mean square error (RMSE) ranging from 0.038 to 0.050 m³/m³) and a station-to-station experiment (with an RMSE ranging from 0.044 to 0.057 m³/m³). Then, the data requirements, importance factors, and spatial and temporal variations in estimation accuracy were discussed based on the results using the training data selected by iterated random sampling. The highly accurate estimations of both the surface and the deep soil moisture for the study area reveal the potential of RF methods when mapping soil moisture at a regional scale, especially when considering the high heterogeneity of land-cover types and topography in the study area.
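
The two evaluation designs named in this abstract can be sketched as follows: hold out whole years (year-to-year) or whole stations (station-to-station), then score predictions with RMSE. The record layout, the field names `year` and `station`, and the helper names are hypothetical; the random forest model itself is not shown.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def split_by(records, key, held_out):
    """Split records so that every value in `held_out` appears only in the test set."""
    train = [r for r in records if r[key] not in held_out]
    test = [r for r in records if r[key] in held_out]
    return train, test


# Invented records, only to show the two split styles.
records = [
    {"year": 2010, "station": "S1", "soil_moisture": 0.21},
    {"year": 2011, "station": "S1", "soil_moisture": 0.25},
    {"year": 2010, "station": "S2", "soil_moisture": 0.30},
    {"year": 2011, "station": "S2", "soil_moisture": 0.28},
]
year_train, year_test = split_by(records, "year", {2011})
station_train, station_test = split_by(records, "station", {"S2"})
```

Holding out a year probes temporal generalization, while holding out a station probes spatial generalization, which matches the two scales the abstract says were investigated.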

Experiments of Image Classification Using Dissimilarity Spaces Built with Siamese Networks

Sensors ◽

10.3390/s21051573 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1573

Author(s):

Loris Nanni ◽

Giovanni Minchio ◽

Sheryl Brahnam ◽

Gianluca Maguolo ◽

Alessandra Lumini

Keyword(s):

Vector Space ◽

Image Classification ◽

Ad Hoc ◽

Feature Space ◽

Medical Data ◽

Training Data ◽

Data Sets ◽

Large Set ◽

Clustering Methods ◽

Siamese Networks

Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here trains classifiers to predict patterns within a vector space by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids from the patterns in the training data sets is calculated with supervised k-means clustering. The centroids are used to generate the dissimilarity space via the Siamese networks. The vector space descriptors are extracted by projecting patterns onto the dissimilarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach in image classification is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system’s performance is competitive with the best-performing methods in the literature, obtaining state-of-the-art performance on one of the medical data sets, and does so without ad-hoc optimization of the clustering methods on the tested data sets.
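
The projection into a dissimilarity space can be sketched in a few lines: each pattern is described by its dissimilarity to every centroid. In this sketch, Euclidean distance stands in for the trained Siamese network and a per-class mean stands in for supervised k-means (effectively k = 1 per class). Both substitutions, and all function names, are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Stand-in dissimilarity; the paper uses a learned Siamese network instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_centroids(patterns, labels):
    """One mean vector per class (a simplification of supervised k-means)."""
    grouped = {}
    for pattern, label in zip(patterns, labels):
        grouped.setdefault(label, []).append(pattern)
    return [
        [sum(column) / len(members) for column in zip(*members)]
        for _, members in sorted(grouped.items())
    ]


def to_dissimilarity_vector(pattern, centroids, dissimilarity=euclidean):
    """Describe a pattern by its dissimilarity to every centroid."""
    return [dissimilarity(pattern, c) for c in centroids]


# Invented 2-D patterns for two classes.
patterns = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = ["a", "a", "b", "b"]
centroids = class_centroids(patterns, labels)
descriptor = to_dissimilarity_vector([0.0, 1.0], centroids)
```

The resulting descriptors, one per image, are what a downstream classifier such as an SVM would be trained on.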

Deep Learning of Appearance Affinity for Multi-Object Tracking and Re-Identification: A Comparative View

Electronics ◽

10.3390/electronics9111757 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1757

Author(s):

María J. Gómez-Silva ◽

Arturo de la Escalera ◽

José M. Armingol

Keyword(s):

Deep Learning ◽

Object Tracking ◽

Loss Function ◽

Neural Model ◽

Training Data ◽

Learning Approaches ◽

The Core ◽

Triplet Loss ◽

Affinity Model

Recognizing the identity of a query individual in a surveillance sequence is the core of Multi-Object Tracking (MOT) and Re-Identification (Re-Id) algorithms. Both tasks can be addressed by measuring the appearance affinity between observations of people with a deep neural model. Nevertheless, the differences in their specifications and, consequently, in the characteristics and constraints of the available training data for each one of these tasks, give rise to the necessity of employing different learning approaches to attain each one of them. This article offers a comparative view of the Double-Margin-Contrastive and the Triplet loss function, and analyzes the benefits and drawbacks of applying each one of them to learn an Appearance Affinity model for Tracking and Re-Identification. A batch of experiments has been conducted, and their results support the hypothesis concluded from the presented study: the Triplet loss function is more effective than the Contrastive one when a Re-Id model is learnt, and, conversely, in the MOT domain, the Contrastive loss can better discriminate between pairs of images rendering the same person or not.
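
The two loss functions compared in this abstract can be written down in one commonly used form, shown here as a minimal sketch on precomputed embedding distances. The exact formulation in the article may differ (for example in whether the hinge terms are squared), and the margin values are hypothetical defaults.

```python
def double_margin_contrastive_loss(distance, same_identity,
                                   margin_pos=0.5, margin_neg=1.5):
    """Pairwise loss: pull matching pairs inside margin_pos,
    push non-matching pairs beyond margin_neg (assumed form)."""
    if same_identity:
        return max(0.0, distance - margin_pos) ** 2
    return max(0.0, margin_neg - distance) ** 2


def triplet_loss(d_anchor_positive, d_anchor_negative, margin=1.0):
    """Triplet loss: the positive should be closer to the anchor
    than the negative by at least `margin`."""
    return max(0.0, d_anchor_positive - d_anchor_negative + margin)
```

The contrastive loss constrains absolute distances for each pair, whereas the triplet loss only constrains the relative ordering within a triplet, which is one way to read the task-dependent difference the abstract reports.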

New active learning algorithms for near-infrared spectroscopy in agricultural applications

at - Automatisierungstechnik ◽

10.1515/auto-2020-0143 ◽

2021 ◽

Vol 69 (4) ◽

pp. 297-306

Author(s):

Julius Krause ◽

Maurice Günder ◽

Daniel Schulz ◽

Robin Gruna

Keyword(s):

Active Learning ◽

Near Infrared ◽

Agricultural Products ◽

Training Data ◽

Calibration Model ◽

Learning Approaches ◽

Training Samples ◽

Agricultural Applications ◽

Selection Of

The selection of training data determines the quality of a chemometric calibration model. In order to cover the entire parameter space of known influencing parameters, an experimental design is usually created. Nevertheless, even with a carefully prepared Design of Experiments (DoE), redundant reference analyses are often performed during the analysis of agricultural products. Because the number of possible reference analyses is usually very limited, the presented active learning approaches are intended to provide a tool for better selection of training samples.

Dense Associative Memory Is Robust to Adversarial Inputs

Neural Computation ◽

10.1162/neco_a_01143 ◽

2018 ◽

Vol 30 (12) ◽

pp. 3151-3167 ◽

Cited By ~ 19

Author(s):

Dmitry Krotov ◽

John Hopfield

Keyword(s):

Objective Function ◽

Associative Memory ◽

Energy Function ◽

Human Subjects ◽

Human Perception ◽

Higher Order ◽

Human Vision ◽

Training Data ◽

Decision Boundary ◽

Interaction Vertex

Deep neural networks (DNNs) trained in a supervised way suffer from two known problems. First, the minima of the objective function used in learning correspond to data points (also known as rubbish examples or fooling images) that lack semantic similarity with the training data. Second, a clean input can be changed by a small perturbation, often imperceptible to human vision, so that the resulting deformed input is misclassified by the network. These findings emphasize the differences between the ways DNNs and humans classify patterns and raise a question of designing learning algorithms that more accurately mimic human perception compared to the existing methods. Our article examines these questions within the framework of dense associative memory (DAM) models. These models are defined by the energy function, with higher-order (higher than quadratic) interactions between the neurons. We show that in the limit when the power of the interaction vertex in the energy function is sufficiently large, these models have the following three properties. First, the minima of the objective function are free from rubbish images, so that each minimum is a semantically meaningful pattern. Second, artificial patterns poised precisely at the decision boundary look ambiguous to human subjects and share aspects of both classes that are separated by that decision boundary. Third, adversarial images constructed by models with small power of the interaction vertex, which are equivalent to DNNs with rectified linear units, fail to transfer to and fool the models with higher-order interactions. This opens up the possibility of using higher-order models for detecting and stopping malicious adversarial attacks. The results we present suggest that DAMs with higher-order energy functions are more robust to adversarial and rubbish inputs than DNNs with rectified linear units.
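
The role of the interaction power can be illustrated with a minimal sketch of a dense associative memory. It assumes a rectified polynomial energy, E(state) = -sum over stored patterns of max(0, overlap)^n, which is a form commonly used in the dense associative memory literature; the abstract itself does not spell out the energy function. The stored patterns, the power n = 3, and the function names are invented for illustration.

```python
def energy(state, patterns, n):
    """Energy with an n-th power interaction (assumed rectified polynomial form)."""
    total = 0.0
    for pattern in patterns:
        overlap = sum(p * s for p, s in zip(pattern, state))
        total -= max(0, overlap) ** n
    return total


def recall(state, patterns, n, sweeps=5):
    """Asynchronous updates: flip a unit only if the flip strictly lowers the energy."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            flipped = state[:i] + [-state[i]] + state[i + 1:]
            if energy(flipped, patterns, n) < energy(state, patterns, n):
                state = flipped
    return state


# Two invented, mutually orthogonal binary patterns.
patterns = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
noisy = [-1, 1, 1, 1, 1, 1, 1, 1]  # first pattern with one unit flipped
restored = recall(noisy, patterns, n=3)
```

In this toy case the corrupted input settles back into the first stored pattern, and raising `n` sharpens the energy wells around the stored patterns.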

SCIENTIFIC APPROACH IN EFL YOUNG ADOLESCENTS TO PROMOTE HIGHER ORDER THINKING SKILLS: TEACHER’S STRATEGY, BENEFITS, AND CHALLENGES

JALL (Journal of Applied Linguistics and Literacy) ◽

10.25157/jall.v4i2.3857 ◽

2020 ◽

Vol 4 (2) ◽

Author(s):

Sri Setyarini

Keyword(s):

Teaching Practice ◽

Higher Order Thinking ◽

Thinking Skills ◽

Higher Order ◽

Research Report ◽

Learning Approaches ◽

Development Programs ◽

Young Adolescents ◽

Higher Order Thinking Skills ◽

Scientific Approach

Higher Order Thinking, as one of the main agendas in the Curriculum 2013, introduces several learning approaches, one of which is the scientific approach. However, so far, the majority of English teachers in Indonesia still face some challenges due to their insufficient knowledge and experience in implementing this approach. This paper presents a research report on strategies for promoting higher order thinking skills (HOTS) in EFL young adolescents’ classrooms through the scientific approach. It aimed to investigate how HOTS was promoted in the EFL classroom, identify benefits gained by the teacher and the students from the implemented approach, and find out the teacher’s challenges and solutions from the teaching practice. This study employed a case study involving a class of seventh grade students as research participants. To collect data, three instruments were used: classroom observation, interviews with the teacher and the students, and document analysis. The findings revealed that the scientific approach with its components (observing, questioning, associating, exploring, and communicating) may promote students’ HOTS, as seen from their enthusiasm and active participation in the classroom. The students also focused more on showing ideas, arguments, and views toward the questions from other groups, as evidenced by their statements in the interview claiming that they were trained to do analysis, evaluation, and creation through learning activities. Meanwhile, the teacher stated that her challenges in teaching dealt with her limited experience and knowledge to implement this approach. To overcome them, she committed to join professional development programs and improve her linguistic skills. Keywords: EFL Classroom, Higher Order Thinking Skills, Scientific Approach, The Curriculum 2013, Young Adolescents

3D Convolutional Neural Networks and a CrossDocked Dataset for Structure-Based Drug Design

10.26434/chemrxiv.11833323.v2 ◽

2020 ◽

Author(s):

Paul Francoeur ◽

Tomohide Masuda ◽

David R. Koes

Keyword(s):

Machine Learning ◽

Ligand Binding ◽

Binding Affinity ◽

Mean Squared Error ◽

Comprehensive Evaluation ◽

Training Data ◽

Learning Approaches ◽

Neural Network Models ◽

Structure Based Drug Design ◽

Affinity Prediction

One of the main challenges in drug discovery is predicting protein-ligand binding affinity. Recently, machine learning approaches have made substantial progress on this task. However, current methods of model evaluation are overly optimistic in measuring generalization to new targets, and there does not exist a standard dataset of sufficient size to compare performance between models. We present a new dataset for structure-based machine learning, the CrossDocked2020 set, with 22.5 million poses of ligands docked into multiple similar binding pockets across the Protein Data Bank and perform a comprehensive evaluation of grid-based convolutional neural network models on this dataset. We also demonstrate how the partitioning of the training data and test data can impact the results of models trained with the PDBbind dataset, how performance improves by adding more, lower-quality training data, and how training with docked poses imparts pose sensitivity to the predicted affinity of a complex. Our best performing model, an ensemble of 5 densely connected convolutional networks, achieves a root mean squared error of 1.42 and Pearson R of 0.612 on the affinity prediction task, an AUC of 0.956 at binding pose classification, and a 68.4% accuracy at pose selection on the CrossDocked2020 set. By providing data splits for clustered cross-validation and the raw data for the CrossDocked2020 set, we establish the first standardized dataset for training machine learning models to recognize ligands in non-cognate target structures while also greatly expanding the number of poses available for training. In order to facilitate community adoption of this dataset for benchmarking protein-ligand binding affinity prediction, we provide our models, weights, and the CrossDocked2020 set at https://github.com/gnina/models.
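
The idea behind clustered cross-validation can be sketched as follows: every member of a cluster of similar targets is placed in the same fold, so no cluster is ever split between training and test data. The greedy balancing strategy, the function name, and the toy cluster labels are assumptions; the abstract does not describe how the published splits were constructed.

```python
def clustered_folds(cluster_of, n_folds=3):
    """Assign whole clusters to folds.

    cluster_of maps an item (e.g. a binding pocket id) to its cluster id.
    Returns a list of folds, each a list of items.
    """
    clusters = {}
    for item, cluster_id in cluster_of.items():
        clusters.setdefault(cluster_id, []).append(item)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing (an assumption): largest cluster goes to the smallest fold.
    for cluster_id in sorted(clusters, key=lambda c: (-len(clusters[c]), c)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(clusters[cluster_id])
    return folds


# Invented pocket ids and cluster labels.
cluster_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "C", "p5": "C", "p6": "C"}
folds = clustered_folds(cluster_of, n_folds=2)
```

A random split would instead scatter similar targets across folds, which is the kind of overly optimistic evaluation the abstract warns about.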

Generating stable molecules using imitation and reinforcement learning

Machine Learning: Science and Technology ◽

10.1088/2632-2153/ac3eb4 ◽

2021 ◽

Author(s):

Søren Ager Meldgaard ◽

Jonas Köhler ◽

Henrik Lund Mortensen ◽

Mads-Peter Verner Christiansen ◽

Frank Noé ◽

...

Keyword(s):

Reinforcement Learning ◽

Chemical Space ◽

Training Data ◽

Graph Representation ◽

Imitation Learning ◽

Training Set ◽

Machine Learning Methods ◽

Multiple Copies ◽

The Stability ◽

3D Information

Chemical space is routinely explored by machine learning methods to discover interesting molecules, before time-consuming experimental synthesis is attempted. However, these methods often rely on a graph representation, ignoring 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning approach for generating molecules in Cartesian coordinates allowing for quantum chemical prediction of the stability. To improve sample-efficiency we learn basic chemical rules from imitation learning on the GDB-11 database to create an initial model applicable for all stoichiometries. We then deploy multiple copies of the model conditioned on a specific stoichiometry in a reinforcement learning setting. The models correctly identify low energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how reinforcement learning further refines the imitation learning model in domains far from the training data.

Real-Time Automated Classification of Sky Conditions Using Deep Learning and Edge Computing

Remote Sensing ◽

10.3390/rs13193859 ◽

2021 ◽

Vol 13 (19) ◽

pp. 3859

Author(s):

Joby M. Prince Czarnecki ◽

Sathishkumar Samiappan ◽

Meilun Zhou ◽

Cary Daniel McCraine ◽

Louis L. Wasson

Keyword(s):

Neural Network ◽

Deep Learning ◽

Image Quality ◽

Convolutional Neural Network ◽

Precision Agriculture ◽

Edge Computing ◽

Training Data ◽

Learning Approaches ◽

Sky Conditions

The radiometric quality of remotely sensed imagery is crucial for precision agriculture applications because estimations of plant health rely on the underlying quality. Sky conditions, and specifically shadowing from clouds, are critical determinants in the quality of images that can be obtained from low-altitude sensing platforms. In this work, we first compare common deep learning approaches to classify sky conditions with regard to cloud shadows in agricultural fields using a visible spectrum camera. We then develop an artificial-intelligence-based edge computing system to fully automate the classification process. Training data consisting of 100 oblique angle images of the sky were provided to a convolutional neural network and two deep residual neural networks (ResNet18 and ResNet34) to facilitate learning two classes, namely (1) good image quality expected, and (2) degraded image quality expected. The expectation of quality stemmed from the sky condition (i.e., density, coverage, and thickness of clouds) present at the time of the image capture. These networks were tested using a set of 13,000 images. Our results demonstrated that ResNet18 and ResNet34 classifiers produced better classification accuracy when compared to a convolutional neural network classifier. The best overall accuracy was obtained by ResNet34, which was 92% accurate, with a Kappa statistic of 0.77. These results demonstrate a low-cost solution to quality control for future autonomous farming systems that will operate without human intervention and supervision.
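
The two summary statistics quoted in this abstract, overall accuracy and the Kappa statistic, can both be computed from a confusion matrix, as in this minimal sketch. The matrix values below are invented for illustration and are not the paper's results.

```python
def overall_accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(size)
    ) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Invented two-class example: good quality expected vs. degraded quality expected.
accuracy, kappa = overall_accuracy_and_kappa([[45, 5], [5, 45]])
```

Kappa discounts the agreement that would occur by chance, which is why it can be noticeably lower than overall accuracy when the classes are imbalanced.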

Identification of Adverse Drug Events in Chinese Clinical Narrative Text

Ubiquitous Computing Application and Wireless Sensor - Lecture Notes in Electrical Engineering ◽

10.1007/978-94-017-9618-7_62 ◽

2015 ◽

pp. 605-612 ◽

Cited By ~ 1

Author(s):

Caixia Ge ◽

Yinsheng Zhang ◽

Huilong Duan ◽

Haomin Li

Keyword(s):

Adverse Drug Events ◽

Narrative Text ◽

Clinical Narrative
