AN ALGORITHM FOR IMPROVING REPRESENTATIVENESS OF TRAINING DATA SETS IN HAZARD ZONATION OF DIFFERENT TYPES OF SIMULTANEOUS SLOPE FAILURES AND ITS PRACTICAL APPLICATION

Affective bio-feedback can be an important instrument to enhance the game experience. Several studies have provided evidence of the usefulness of physiological signals for affective gaming; however, due to the limited knowledge about the distinctive autonomic signatures for every emotion, the pattern matching models employed are limited in the number of emotions they are able to classify. This paper presents a bio-affective gaming interface (BAGI) that can be used to customize a game experience according to the player’s emotional response. Its architecture offers important characteristics for gaming that are important because they make possible the reusability of previous findings and the inclusion of new models to the system. In order to prove the effectiveness of BAGI, two different types of neural networks have been trained to recognize emotions. They were incorporated into the system to customize, in real-time, the computer wallpaper according to the emotion experienced by the user. Best results were obtained with a probabilistic neural network with accuracy results of 84.46% on the training data and 78.38% on the validation for new independent data sets.

Download Full-text

Machine learning for improved data analysis of biological aerosol using the WIBS

10.5194/amt-2018-126 ◽

2018 ◽

Cited By ~ 1

Author(s):

Simon Ruske ◽

David O. Topping ◽

Virginia E. Foot ◽

Andrew P. Morse ◽

Martin W. Gallagher

Keyword(s):

Ultraviolet Light ◽

Fungal Spores ◽

Training Data ◽

Gradient Boosting ◽

Classification Error ◽

Data Sets ◽

Data Preparation ◽

Different Types ◽

Laboratory Training ◽

Selection Of

Abstract. Primary biological aerosol including bacteria, fungal spores and pollen have important implications for public health and the environment. Such particles may have different concentrations of chemical fluorophores and will provide different responses in the presence of ultraviolet light which potentially could be used to discriminate between different types of biological aerosol. Development of ultraviolet light induced fluorescence (UV-LIF) instruments such as the Wideband Integrated Bioaerosol Sensor (WIBS) has made is possible to collect size, morphology and fluorescence measurements in real-time. However, it is unclear without studying responses from the instrument in the laboratory, the extent to which we can discriminate between different types of particles. Collection of laboratory data is vital to validate any approach used to analyse the data and to ensure that the data available is utilised as effectively as possible. In this manuscript we test a variety of methodologies on traditional reference particles and a range of laboratory generated aerosols. Hierarchical Agglomerative Clustering (HAC) has been previously applied to UV-LIF data in a number of studies and is tested alongside other algorithms that could be used to solve the classification problem: Density Based Spectral Clustering and Noise (DBSCAN), k-means and gradient boosting. Whilst HAC was able to effectively discriminate between the reference particles, yielding a classification error of only 1.8 %, similar results were not obtained when testing on laboratory generated aerosol where the classification error was found to be between 11.5 % and 24.2 %. Furthermore, there is a worryingly large uncertainty in this approach in terms of the data preparation and the cluster index used, and we were unable attain consistent results across the different sets of laboratory generated aerosol tested. The best results were obtained using gradient boosting, where the misclassification rate was between 4.38 % and 5.42 %. The largest contribution to this error was the pollen samples where 28.5 % of the samples were misclassified as fungal spores. The technique was also robust to changes in data preparation provided a fluorescent threshold was applied to the data. Where laboratory training data is unavailable, DBSCAN was found to be an potential alternative to HAC. In the case of one of the data sets where 22.9 % of the data was left unclassified we were able to produce three distinct clusters obtaining a classification error of only 1.42 % on the classified data. These results could not be replicated however for the other data set where 26.8 % of the data was not classified and a classification error of 13.8 % was obtained. This method, like HAC, also appeared to be heavily dependent on data preparation, requiring different selection of parameters dependent on the preparation used. Further analysis will also be required to confirm our selection of parameters when using this method on ambient data. There is a clear need for the collection of additional laboratory generated aerosol to improve interpretation of current databases and to aid in the analysis of data collected from an ambient environment. New instruments with a greater resolution are likely improve on current discrimination between pollen, bacteria and fungal spores and even between their different types, however the need for extensive laboratory training data sets will grow as a result.

Download Full-text

Comparative Analysis of Machine Learning Techniques Using Predictive Modeling

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200904164539 ◽

2020 ◽

Vol 13 ◽

Author(s):

Ritu Khandelwal ◽

Hemlata Goyal ◽

Rajveer Singh Shekhawat

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Data Science ◽

Training Data ◽

Machine Learning Techniques ◽

Future Trends ◽

Data Set ◽

Learning Stage ◽

Learning Techniques ◽

Different Types

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on findings to get valuable insights on available data. The large part of Indian Cinema is Bollywood which is a multi-million dollar industry. This paper attempts to predict whether the upcoming Bollywood Movie would be Blockbuster, Superhit, Hit, Average or Flop. For this Machine Learning techniques (classification and prediction) will be applied. To make classifier or prediction model first step is the learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations. Methods: All the techniques related to classification and Prediction such as Support Vector Machine(SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, Adaboost, and KNN will be applied and try to find out efficient and effective results. All these functionalities can be applied with GUI Based workflows available with various categories such as data, Visualize, Model, and Evaluate. Result: To make classifier or prediction model first step is learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations Conclusion: This paper focuses on Comparative Analysis that would be performed based on different parameters such as Accuracy, Confusion Matrix to identify the best possible model for predicting the movie Success. By using Advertisement Propaganda, they can plan for the best time to release the movie according to the predicted success rate to gain higher benefits. Discussion: Data Mining is the process of discovering different patterns from large data sets and from that various relationships are also discovered to solve various problems that come in business and helps to predict the forthcoming trends. This Prediction can help Production Houses for Advertisement Propaganda and also they can plan their costs and by assuring these factors they can make the movie more profitable.

Download Full-text

MULFE: Multi-Label Learning via Label-Specific Feature Space Ensemble

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3451392 ◽

2021 ◽

Vol 16 (1) ◽

pp. 1-24

Author(s):

Yaojin Lin ◽

Qinghua Hu ◽

Jinghua Liu ◽

Xingquan Zhu ◽

Xindong Wu

Keyword(s):

Empirical Studies ◽

Feature Space ◽

Training Data ◽

Data Sets ◽

Learning Framework ◽

Feature Spaces ◽

Public Data ◽

Margin Distribution ◽

Label Correlations ◽

Label Correlation

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information, but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data, and uses feature highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the creation of the feature embedding space is only based on a single label, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, mu lti- l abel-specific f eature space e nsemble (MULFE), takes consideration label-specific features, label correlation, and weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label’s negative and positive instances, MULFE first creates features customized to each label. After that, MULFE utilizes the label correlation to optimize the margin distribution of the base classifiers which are induced by the related label-specific feature spaces. By combining multiple label-specific features, label correlation based weighting, and ensemble learning, MULFE achieves maximum margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets manifest the effectiveness of MULFE.

Download Full-text

Experiments of Image Classification Using Dissimilarity Spaces Built with Siamese Networks

Sensors ◽

10.3390/s21051573 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1573

Author(s):

Loris Nanni ◽

Giovanni Minchio ◽

Sheryl Brahnam ◽

Gianluca Maguolo ◽

Alessandra Lumini

Keyword(s):

Vector Space ◽

Image Classification ◽

Ad Hoc ◽

Feature Space ◽

Medical Data ◽

Training Data ◽

Data Sets ◽

Large Set ◽

Clustering Methods ◽

Siamese Networks

Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here trains classifiers to predict patterns within a vector space by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids from the patterns in the training data sets is calculated with supervised k-means clustering. The centroids are used to generate the dissimilarity space via the Siamese networks. The vector space descriptors are extracted by projecting patterns onto the similarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach in image classification is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system’s performance competes competitively against the best-performing methods in the literature, obtaining state-of-the-art performance on one of the medical data sets, and does so without ad-hoc optimization of the clustering methods on the tested data sets.

Download Full-text

Intelligent system of training data sets for current reported normality levels and physical fitness analysis

Science & Sports ◽

10.1016/j.scispo.2020.10.001 ◽

2021 ◽

Author(s):

S.-Y. Jang ◽

W.-Y. So ◽

T.T. Jeong

Keyword(s):

Physical Fitness ◽

Intelligent System ◽

Training Data ◽

Data Sets ◽

Fitness Analysis

Download Full-text

Data-based structural health monitoring using small training data sets

Structural Control and Health Monitoring ◽

10.1002/stc.1744 ◽

2015 ◽

Vol 22 (10) ◽

pp. 1240-1264 ◽

Cited By ~ 21

Author(s):

Luciana Balsamo ◽

Raimondo Betti

Keyword(s):

Structural Health Monitoring ◽

Health Monitoring ◽

Training Data ◽

Data Sets ◽

Structural Health

Download Full-text

Application of the C4.5 Algorithm to Predict the Types of Disease in Pigs Based on Android

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2021.v10.i01.p14 ◽

2021 ◽

Vol 10 (1) ◽

pp. 105

Author(s):

I Gusti Ayu Purnami Indryaswari ◽

Ida Bagus Made Mahendra

Keyword(s):

Programming Language ◽

Test Data ◽

Training Data ◽

Data Sets ◽

Android Application ◽

C4.5 Algorithm ◽

Sqlite Database

Many Indonesian people, especially in Bali, make pigs as livestock. Pig livestock are susceptible to various types of diseases and there have been many cases of pig deaths due to diseases that cause losses to breeders. Therefore, the author wants to create an Android-based application that can predict the type of disease in pigs by applying the C4.5 Algorithm. The C4.5 algorithm is an algorithm for classifying data in order to obtain a rule that is used to predict something. In this study, 50 training data sets were used with 8 types of diseases in pigs and 31 symptoms of disease. which is then inputted into the system so that the data is processed so that the system in the form of an Android application can predict the type of disease in pigs. In the testing process, it was carried out by testing 15 test data sets and producing an accuracy value that is 86.7%. In testing the application features built using the Kotlin programming language and the SQLite database, it has been running as expected.

Download Full-text

Development of Reliable NARX Models of Gas Turbine Cold, Warm and Hot Start-Up

Volume 9: Oil and Gas Applications; Supercritical CO2 Power Cycles; Wind Energy ◽

10.1115/gt2017-63332 ◽

2017 ◽

Cited By ~ 2

Author(s):

Hilal Bahlawan ◽

Mirko Morini ◽

Michele Pinelli ◽

Pier Ruggero Spina ◽

Mauro Venturini

Keyword(s):

Gas Turbine ◽

Training Data ◽

Series Data ◽

Data Sets ◽

Control Logic ◽

Start Up ◽

Hot Start ◽

Narx Models ◽

Set Up ◽

Rapid Transients

This paper documents the set-up and validation of nonlinear autoregressive exogenous (NARX) models of a heavy-duty single-shaft gas turbine. The considered gas turbine is a General Electric PG 9351FA located in Italy. The data used for model training are time series data sets of several different maneuvers taken experimentally during the start-up procedure and refer to cold, warm and hot start-up. The trained NARX models are used to predict other experimental data sets and comparisons are made among the outputs of the models and the corresponding measured data. Therefore, this paper addresses the challenge of setting up robust and reliable NARX models, by means of a sound selection of training data sets and a sensitivity analysis on the number of neurons. Moreover, a new performance function for the training process is defined to weigh more the most rapid transients. The final aim of this paper is the set-up of a powerful, easy-to-build and very accurate simulation tool which can be used for both control logic tuning and gas turbine diagnostics, characterized by good generalization capability.

Download Full-text