Training data sets for TensorFlow models from TeleEcho data.

2020 ◽  
Author(s):  
Anil Kumar Bheemaiah

Abstract: Data streams are persisted and visualized for a practice of biofeedback-based therapy, with the option of @edge decision support for premium services, in the form of on-demand telemedical services, CDS-based decision support services, and integrated services such as Amazon Pharmacy.

Keywords: digital medicine, CDS HL7 webhooks, biofeedback, LSL streams, AWS S3, Wolfram Cloud, feature extraction functions, visualization of filters.

What: Extraction of data by data mining from hyperscale tele-echo data repositories, to create training data sets for a specific thread of TensorFlow model templates for transfer learning, with deployment of pre-trained networks using TensorFlow Lite. Pre-trained models are evaluated for prediction accuracy in integrated feature space and classification fitness models, for scalable deployment.

How: We consider the use of TensorFlow models, trained on an EC2 P3 instance using GPU computing on SageMaker, with a dedicated thread for the purpose. We consider the creation of the following: a MUSE 2 headset pipeline for PPG, gyroscope, and accelerometer data for breath and heart diagnostics, built with a Python script and a 1D tensor model. (alexandrebarachant n.d.; "tf.nn.conv1d | TensorFlow Core r2.0" n.d.; "tf.keras.layers.Conv1D | TensorFlow Core r2.0" n.d.; "Tensorflow - Math behind 1D Convolution with Advanced Examples in TF | Tensorflow Tutorial" n.d.; Lee 2018)

Why: Digital medicine is accessible to the mental wellness community through an EEG wearable such as the MUSE 2, whose PPG and accelerometer data can be mined with a 1D-convolution classifier network to detect any anomalies that require telemedicine.
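The 1D-convolution classifier referenced above can be sketched without TensorFlow; the function below is a minimal NumPy stand-in for `tf.nn.conv1d` applied to a synthetic PPG-like trace (the signal, kernel, and stride values are illustrative assumptions, not from the abstract):

```python
import numpy as np

def conv1d(signal, kernel, stride=1):
    """Valid-mode 1D convolution, the core operation behind tf.nn.conv1d."""
    k = len(kernel)
    out_len = (len(signal) - k) // stride + 1
    return np.array([np.dot(signal[i * stride:i * stride + k], kernel)
                     for i in range(out_len)])

# Hypothetical PPG-like trace: a slow oscillation (pulse) plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 256)
ppg = np.sin(t) + 0.05 * rng.standard_normal(256)

# A moving-average kernel stands in for a learned filter.
features = conv1d(ppg, np.ones(8) / 8.0, stride=2)  # shape (125,)
```

A trained Conv1D layer would learn many such kernels in parallel; stacking their outputs yields the feature maps a classification head operates on.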

2021 ◽  
Vol 16 (1) ◽  
pp. 1-24
Author(s):  
Yaojin Lin ◽  
Qinghua Hu ◽  
Jinghua Liu ◽  
Xingquan Zhu ◽  
Xindong Wu

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information, but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data, and uses features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the creation of the feature embedding space is based on a single label only, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes into consideration label-specific features, label correlation, and a weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label's negative and positive instances, MULFE first creates features customized to each label. After that, MULFE utilizes the label correlation to optimize the margin distribution of the base classifiers induced by the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting, and ensemble learning, MULFE achieves the maximum-margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.
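A minimal sketch of the first MULFE step, label-specific feature construction. MULFE clusters each label's positive and negative instances into several centroids; the simplification below uses a single centroid per side, and all data and names are hypothetical:

```python
import numpy as np

def label_specific_features(X, y):
    """Represent each instance by its distances to the positive-class and
    negative-class centroids of one label (the paper clusters each side into
    several centroids; one per side keeps the sketch short)."""
    pos_c = X[y == 1].mean(axis=0)
    neg_c = X[y == 0].mean(axis=0)
    return np.stack([np.linalg.norm(X - pos_c, axis=1),
                     np.linalg.norm(X - neg_c, axis=1)], axis=1)

# Toy single-label view of a multi-label problem.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([1, 1, 0, 0])
F = label_specific_features(X, y)  # one such space is built per label
```

Repeating this per label and weighting the resulting base classifiers by label correlation gives the ensemble the abstract describes.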


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1573
Author(s):  
Loris Nanni ◽  
Giovanni Minchio ◽  
Sheryl Brahnam ◽  
Gianluca Maguolo ◽  
Alessandra Lumini

Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here trains classifiers to predict patterns within a vector space by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids from the patterns in the training data sets is calculated with supervised k-means clustering. The centroids are used to generate the dissimilarity space via the Siamese networks. The vector space descriptors are extracted by projecting patterns onto the similarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach in image classification is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system performs competitively against the best-performing methods in the literature, obtaining state-of-the-art performance on one of the medical data sets, and does so without ad hoc optimization of the clustering methods on the tested data sets.
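The dissimilarity-space construction can be illustrated compactly. The sketch below maps patterns to their distances from a set of centroids; in the paper the dissimilarity comes from trained Siamese networks and the descriptors feed an SVM, while here a plain Euclidean measure stands in (names and data are illustrative):

```python
import numpy as np

def dissimilarity_space(X, centroids, dissim):
    """Project each pattern onto its vector of dissimilarities to the centroids."""
    return np.array([[dissim(x, c) for c in centroids] for x in X])

def euclid(a, b):
    # Stand-in for a learned Siamese dissimilarity d(a, b).
    return float(np.linalg.norm(a - b))

X = np.array([[0.0], [1.0], [10.0]])
centroids = [np.array([0.0]), np.array([10.0])]
D = dissimilarity_space(X, centroids, euclid)  # descriptors an SVM would consume
```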


2020 ◽  
Vol 10 (7) ◽  
pp. 1486-1493
Author(s):  
Jianjun Sun

The rehabilitation of armless or footless patients is of great importance. One choice for this issue is to use an electroencephalograph (EEG) brain-computer interface to help the patients communicate with the outside world. Classifying the EEG signals generated from mental activity is one of the most important enabling technologies. However, existing classification methods often suffer from overfitting, caused by small training data sets combined with a high-dimensional feature space. Fuzzy inference can imitate human judgement, effectively dealing with uncertainty and small-sample learning problems. Besides, biclustering has shown excellent performance in constructing rule bases. This paper proposes a novel biclustering-based fuzzy inference method for EEG classification. It can be divided into five steps. The first step is generating features with the common spatial pattern. The second step is searching for local coherent patterns with column-nearly-constant biclustering. The third step is to transform the patterns into if-then rules with a column-averaging and majority-voting strategy. The fourth step is to employ Mamdani fuzzy inference to map the input feature vector into decimals. Finally, particle swarm optimization is utilized to generate the optimal threshold for linear classification. Experiments on several commonly used data sets show that the proposed method has advantages over competitors in terms of classification accuracy.
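The fourth step, Mamdani inference mapping a feature value to a decimal, can be sketched with two illustrative triangular rules and centroid defuzzification (the rule shapes and the final threshold are assumptions; the paper derives its rules from biclusters and its threshold via particle swarm optimization):

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet at a and c, peak at b."""
    return np.maximum(0.0, np.minimum((x - a) / (b - a), (c - x) / (c - b)))

def mamdani_scalar(x):
    """Map a scalar feature in [0, 1] to a decimal via two illustrative rules:
    IF x is LOW THEN output is near 0.2; IF x is HIGH THEN output is near 0.8."""
    w_low = tri(x, -0.5, 0.0, 0.6)    # firing strength of the LOW rule
    w_high = tri(x, 0.4, 1.0, 1.5)    # firing strength of the HIGH rule
    u = np.linspace(0.0, 1.0, 101)
    agg = np.maximum(np.minimum(w_low, tri(u, 0.0, 0.2, 0.4)),
                     np.minimum(w_high, tri(u, 0.6, 0.8, 1.0)))
    return float((u * agg).sum() / agg.sum())  # centroid defuzzification

decision = mamdani_scalar(0.9) > 0.5  # 0.5 stands in for the PSO-found threshold
```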


Author(s):  
Ruslan Babudzhan ◽  
Konstantyn Isaienkov ◽  
Danilo Krasiy ◽  
Oleksii Vodka ◽  
Ivan Zadorozhny ◽  
...  

The paper investigates the relationship between the vibration acceleration of bearings and their operational state. To determine these dependencies, a test bench was built and 112 experiments were carried out with different bearings: 100 bearings that developed an internal defect during operation and 12 bearings without a defect. From the obtained records, a dataset was formed, which was used to build classifiers. The dataset is freely available. A method for classifying new and used bearings was proposed, which consists in searching for dependencies and regularities of the signal using descriptive functions: statistical, entropy, fractal dimensions, and others. In addition to processing the signal itself, the frequency domain of the bearing operation signal was also used to complement the feature space. The paper considered the possibility of generalizing the classification for its application on those signals that were not obtained in the course of laboratory experiments. An extraneous dataset was found in the public domain. This dataset was used to determine how accurate a classifier was when it was trained and tested on significantly different signals. Training and validation were carried out using the bootstrapping method to mitigate the effect of randomness, given the small amount of training data available. To estimate the quality of the classifiers, the F1-measure was used as the main metric due to the imbalance of the data sets. The following supervised machine learning methods were chosen as classifier models: logistic regression, support vector machine, random forest, and K nearest neighbors. The results are presented in the form of plots of density distribution and diagrams.
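The descriptive functions mentioned (statistical, entropy, and a frequency-domain complement) can be sketched as a feature extractor for one vibration record; the feature subset and bin counts below are illustrative choices, not those of the paper:

```python
import numpy as np

def describe_signal(x):
    """Descriptive features of one vibration record (a subset of those named)."""
    x = np.asarray(x, dtype=float)
    feats = {
        "mean": x.mean(),
        "std": x.std(),
        "rms": np.sqrt((x ** 2).mean()),
        "kurtosis": ((x - x.mean()) ** 4).mean() / x.var() ** 2,
        "crest": np.abs(x).max() / np.sqrt((x ** 2).mean()),
    }
    # Shannon entropy of the amplitude histogram (32 bins is an arbitrary choice).
    counts, _ = np.histogram(x, bins=32)
    p = counts[counts > 0] / counts.sum()
    feats["entropy"] = float(-(p * np.log2(p)).sum())
    # Dominant bin of the real FFT, complementing the time-domain features.
    feats["peak_freq_bin"] = int(np.abs(np.fft.rfft(x - x.mean())).argmax())
    return feats
```

Concatenating such per-record features across all experiments yields the tabular dataset the classifiers are trained on.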


2019 ◽  
Vol 5 ◽  
pp. e242
Author(s):  
Hyukjun Gweon ◽  
Matthias Schonlau ◽  
Stefan H. Steiner

Multi-label classification is a type of supervised learning where an instance may belong to multiple labels simultaneously. Predicting each label independently has been criticized for not exploiting any correlation between labels. In this article we propose a novel approach, Nearest Labelset using Double Distances (NLDD), that predicts the labelset observed in the training data that minimizes a weighted sum of the distances in both the feature space and the label space to the new instance. The weights specify the relative tradeoff between the two distances. The weights are estimated from a binomial regression of the number of misclassified labels as a function of the two distances. Model parameters are estimated by maximum likelihood. NLDD only considers labelsets observed in the training data, thus implicitly taking into account label dependencies. Experiments on benchmark multi-label data sets show that the proposed method on average outperforms other well-known approaches in terms of 0/1 loss and multi-label accuracy, and ranks second on the F-measure (after a method called ECC) and on Hamming loss (after a method called RF-PCT).
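The core NLDD prediction rule can be sketched as follows. The paper estimates the distance weights by binomial regression and obtains label-space estimates from a base classifier; this sketch fixes the weights and substitutes the nearest neighbour's labels (all names and data are hypothetical):

```python
import numpy as np

def nldd_predict(x, X_train, Y_train, w_feat=0.5, w_label=0.5):
    """Return the observed training labelset minimising a weighted sum of
    feature-space and label-space distances to the new instance."""
    d_feat = np.linalg.norm(X_train - x, axis=1)
    # Stand-in label-space estimate: labels of the nearest feature-space
    # neighbour (the paper uses per-label probability estimates instead).
    y_hat = Y_train[d_feat.argmin()].astype(float)
    d_label = np.linalg.norm(Y_train - y_hat, axis=1)
    return Y_train[(w_feat * d_feat + w_label * d_label).argmin()]

X = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
pred = nldd_predict(np.array([0.1, 0.1]), X, Y)
```

Because only observed labelsets are candidates, label dependencies are respected without being modeled explicitly.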


Author(s):  
Gustavo Camps-Valls ◽  
Manel Martínez-Ramón ◽  
José Luis Rojo-Álvarez

Machine learning experienced great advances in the eighties and nineties due to active research in artificial neural networks and adaptive systems. These tools have demonstrated good results in many real applications, since neither a priori knowledge about the distribution of the available data nor the relationships among the independent variables need be assumed. Overfitting due to reduced training data sets is controlled by means of a regularized functional which minimizes the complexity of the machine. Working with high-dimensional input spaces is no longer a problem thanks to the use of kernel methods. Such methods also provide us with new ways to interpret the classification or estimation results. Kernel methods are emerging and innovative techniques that are based on first mapping the data from the original input feature space to a kernel feature space of higher dimensionality, and then solving a linear problem in that space. These methods allow us to geometrically design (and interpret) learning algorithms in the kernel space (which is nonlinearly related to the input space), thus combining statistics and geometry in an effective way. This theoretical elegance is also matched by their practical performance.
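The "map to a kernel space, then solve a linear problem" recipe can be made concrete with kernel ridge regression, a minimal sketch assuming an RBF kernel; the regularization term lam plays the role of the regularized functional controlling machine complexity:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix of the RBF kernel between row-sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, gamma=1.0, lam=1e-6):
    # The learning problem is linear in the kernel space: one regularized solve.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

The nonlinearity lives entirely in the kernel; the solver never touches the high-dimensional feature space explicitly.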


2007 ◽  
Vol 19 (7) ◽  
pp. 1919-1938 ◽  
Author(s):  
Jooyoung Park ◽  
Daesung Kang ◽  
Jongho Kim ◽  
James T. Kwok ◽  
Ivor W. Tsang

The support vector data description (SVDD) is one of the best-known one-class support vector learning methods, in which one uses balls defined in the feature space to distinguish a set of normal data from all other possible abnormal objects. The major concern of this letter is to extend the main idea of SVDD to pattern denoising. Combining the geodesic projection to the spherical decision boundary resulting from the SVDD with a solution of the preimage problem, we propose a new method for pattern denoising. We first solve the SVDD for the training data, and then, for each noisy test pattern, obtain its denoised feature by moving its feature vector along the geodesic on the manifold to the nearest decision boundary of the SVDD ball. Finally, we find the location of the denoised pattern by obtaining the preimage of the denoised feature. The applicability of the proposed method is illustrated by a number of toy and real-world data sets.
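The SVDD ball can be approximated, for illustration only, by centering it at the kernel-space mean of the training data rather than solving the SVDD quadratic program; the kernel-trick expansion of the squared feature-space distance is what makes this computable without ever forming feature vectors:

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    return np.exp(-gamma * ((a - b) ** 2).sum())

def kernel_ball(X, gamma=0.5):
    """Squared feature-space distance to the kernel-space mean of X:
    ||phi(z) - c||^2 = k(z,z) - (2/n) sum_i k(z,x_i) + (1/n^2) sum_ij k(x_i,x_j).
    (A simplification: SVDD proper finds the minimal enclosing ball via QP.)"""
    kxx = np.mean([[rbf(xi, xj, gamma) for xj in X] for xi in X])
    def sq_dist(z):
        return rbf(z, z, gamma) - 2.0 * np.mean([rbf(z, x, gamma) for x in X]) + kxx
    return sq_dist
```

Patterns whose distance exceeds the ball radius are the ones the letter's method projects back to the boundary before solving the preimage problem.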


Author(s):  
YUE JIANG ◽  
BOJAN CUKIC ◽  
TIM MENZIES ◽  
JIE LIN

The identification of fault-prone modules has a significant impact on software quality assurance. In addition to prediction accuracy, one of the most important goals is to detect fault-prone modules as early as possible in the development lifecycle. Requirements, design, and code metrics have been successfully used for predicting fault-prone modules. In this paper, we investigate the benefits of the incremental development of software fault prediction models. We compare the performance of these models as the volume of data and their life-cycle origin (design, code, or their combination) evolve during project development. We analyze 14 data sets from publicly available software engineering data repositories. These data sets offer both design and code metrics. Using a number of modeling techniques and statistical significance tests, we confirm that increasing the volume of training data improves model performance. Further, models built from code metrics typically outperform those built using design metrics only. However, both types of models prove to be useful, as they can be constructed in different phases of the life cycle. Code-based models can be used to increase the effectiveness of assigning verification and validation activities late in the development life cycle. We also conclude that models that utilize a combination of design- and code-level metrics outperform models which use either metric set exclusively.


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Jiangyuan Mei ◽  
Jian Hou ◽  
Jicheng Chen ◽  
Hamid Reza Karimi

Large data sets classification is widely used in many industrial applications. It is a challenging task to classify large data sets efficiently, accurately, and robustly, as large data sets always contain numerous instances with a high-dimensional feature space. In order to deal with this problem, in this paper we present an online Logdet divergence based metric learning (LDML) model by making use of the power of metric learning. We first generate a Mahalanobis matrix by learning the training data with the LDML model. Meanwhile, we propose a compressed representation for the high-dimensional Mahalanobis matrix to reduce the computation complexity in each iteration. The final Mahalanobis matrix obtained this way measures the distances between instances accurately and serves as the basis of classifiers, for example, the k-nearest neighbors classifier. Experiments on benchmark data sets demonstrate that the proposed algorithm compares favorably with the state-of-the-art methods.
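Once a Mahalanobis matrix M is available (however learned), it plugs into a k-nearest-neighbors classifier as below; this sketch takes M as given rather than running the Logdet-divergence updates of the paper:

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance induced by the (positive-definite) matrix M."""
    d = x - y
    return float(d @ M @ d)

def knn_predict(x, X_train, y_train, M, k=3):
    """Majority vote among the k training instances nearest to x under M."""
    d = np.array([mahalanobis_sq(x, xi, M) for xi in X_train])
    nearest = d.argsort()[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[counts.argmax()]
```

With M equal to the identity this reduces to ordinary Euclidean k-NN; the learned M reweights and correlates feature dimensions so that the same classifier separates instances more accurately.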


AI Magazine ◽  
2019 ◽  
Vol 40 (3) ◽  
pp. 41-57
Author(s):  
Manisha Mishra ◽  
Pujitha Mannaru ◽  
David Sidoti ◽  
Adam Bienkowski ◽  
Lingyi Zhang ◽  
...  

A synergy between AI and the Internet of Things (IoT) will significantly improve sense-making, situational awareness, proactivity, and collaboration. However, the key challenge is to identify the underlying context within which humans interact with smart machines. Knowledge of the context facilitates proactive allocation among members of a human–smart machine (agent) collective that balances autonomy with human interaction, without displacing humans from their supervisory role of ensuring that the system goals are achievable. In this article, we address four research questions as a means of advancing toward proactive autonomy: how to represent the interdependencies among the key elements of a hybrid team; how to rapidly identify and characterize critical contextual elements that require adaptation over time; how to allocate system tasks among machines and agents for superior performance; and how to enhance the performance of machine counterparts to provide intelligent and proactive courses of action while considering the cognitive states of human operators. The answers to these four questions help us to illustrate the integration of AI and IoT applied to the maritime domain, where we define context as an evolving multidimensional feature space for heterogeneous search, routing, and resource allocation in uncertain environments via proactive decision support systems.

