An Entropy-Based Index Evaluation Scheme for Multiple Sensor Fusion in Classification Process

1999 ◽  
Vol 121 (4) ◽  
pp. 727-732 ◽  
Author(s):  
Y. Chen ◽  
E. Orady

Sensor fusion aims to identify useful information to facilitate decision-making using data from multiple sensors. Signals from each sensor are usually processed, through feature extraction, into different indices by which knowledge can be better represented. However, caution is needed in decision-making when multiple indices are used, since each index may carry different information or reflect different aspects of the process/system under study. To this end, a practical scheme for index evaluation based on entropy and information gain is presented. This procedure is useful when index ranking is needed in designing a classifier for a complex system or process. Both regional entropy and class entropy are introduced based on a set of training data. Application of the scheme is illustrated using a data set from a tapping process.
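The abstract does not spell out the regional and class entropy definitions, so the sketch below only illustrates the underlying idea with standard Shannon entropy: candidate indices (columns of a NumPy feature matrix) are ranked by the information gain of their best single-threshold split over the labeled training data. The single-threshold split and the function names are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(index_values, labels, threshold):
    """Entropy reduction from splitting one index at a threshold."""
    left, right = labels[index_values <= threshold], labels[index_values > threshold]
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    return entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))

def rank_indices(X, y):
    """Rank candidate indices (columns of X) by their best single-split gain."""
    gains = []
    for j in range(X.shape[1]):
        thresholds = np.unique(X[:, j])[:-1]
        best = max((information_gain(X[:, j], y, t) for t in thresholds), default=0.0)
        gains.append((j, best))
    return sorted(gains, key=lambda g: g[1], reverse=True)
```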

2019 ◽  
Vol 12 (1) ◽  
pp. 106 ◽  
Author(s):  
Romulus Costache ◽  
Quoc Bao Pham ◽  
Ehsan Sharifi ◽  
Nguyen Thi Thuy Linh ◽  
S.I. Abba ◽  
...  

Concerning the significant increase in the negative effects of flash-floods worldwide, the main goal of this research is to evaluate the power of the Analytical Hierarchy Process (AHP), k-Nearest Neighbors (kNN), and K-Star (KS) algorithms and their ensembles in flash-flood susceptibility mapping. To train the two stand-alone models and their ensembles, in the first stage, the areas affected in the past by torrential phenomena are identified using remote sensing techniques. Approximately 70% of these areas are used as a training data set along with 10 flash-flood predictors. It should be noted that remote sensing techniques play a crucial role in obtaining eight out of the 10 flash-flood conditioning factors. The predictive capability of the predictors is evaluated through the Information Gain Ratio (IGR) method. As expected, the slope angle proves to be the factor with the highest predictive capability. The application of the AHP model involves the construction of ten pair-wise comparison matrices for calculating the normalized weights of each flash-flood predictor. The computed weights are used as input data in the kNN–AHP and KS–AHP ensemble models for calculating the Flash-Flood Potential Index (FFPI). The FFPI is also determined through the kNN and KS stand-alone models. The performance of the models is evaluated using statistical metrics (i.e., sensitivity, specificity and accuracy), while the results are validated by constructing the Receiver Operating Characteristic (ROC) curve, computing Area Under Curve (AUC) values, and calculating the density of torrential pixels within the FFPI classes. Overall, the best performance is obtained by the kNN–AHP ensemble model.
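As a companion to the AHP step described above, here is a minimal sketch of how normalized predictor weights can be derived from a pairwise comparison matrix with the principal-eigenvector method, together with Saaty's consistency check (the random index of 1.49 corresponds to a 10x10 matrix in Saaty's table); the actual comparison matrices used in the paper are not reproduced here.

```python
import numpy as np

def ahp_weights(pairwise):
    """Normalized criterion weights from an AHP pairwise comparison matrix
    (principal-eigenvector method)."""
    pairwise = np.asarray(pairwise, dtype=float)
    eigvals, eigvecs = np.linalg.eig(pairwise)
    principal = np.argmax(eigvals.real)
    w = np.abs(eigvecs[:, principal].real)
    return w / w.sum()

def consistency_ratio(pairwise, random_index=1.49):
    """Saaty consistency ratio; RI = 1.49 is the value for a 10x10 matrix,
    and CR < 0.1 is usually considered acceptable."""
    pairwise = np.asarray(pairwise, dtype=float)
    n = pairwise.shape[0]
    lam_max = np.max(np.linalg.eigvals(pairwise).real)
    return ((lam_max - n) / (n - 1)) / random_index
```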


2010 ◽  
Vol 6 (3) ◽  
pp. 28-42 ◽  
Author(s):  
Bijan Raahemi ◽  
Ali Mumtaz

This paper presents a new approach using data mining techniques, in particular a two-stage architecture, for the classification of Peer-to-Peer (P2P) traffic in IP networks. In the first stage, the traffic is filtered using standard port numbers and layer-4 port matching to label well-known P2P and NonP2P traffic. The labeled traffic produced in the first stage is used to train a Fast Decision Tree (FDT) classifier with high accuracy. The unknown traffic is then applied to the FDT model, which classifies it into P2P and NonP2P with high accuracy. The two-stage architecture not only classifies well-known P2P applications, but also classifies applications that use random or non-standard port numbers and cannot be classified otherwise. The authors captured the internet traffic at a gateway router, performed pre-processing on the data, selected the most significant attributes, and prepared a training data set to which the new algorithm was applied. Finally, the authors built several models using combinations of various attribute sets for different ratios of P2P to NonP2P traffic in the training data.
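A hedged sketch of the two-stage idea follows: stage 1 labels flows by layer-4 port matching, and stage 2 trains a decision tree on those labels to classify the remaining unknown flows. The port lists, flow field names, and the use of scikit-learn's DecisionTreeClassifier as a stand-in for the Fast Decision Tree are assumptions for illustration only.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical layer-4 port lists used for labeling in stage 1.
P2P_PORTS = {4662, 4672, 6346, 6881}    # e.g., eDonkey, Gnutella, BitTorrent
NONP2P_PORTS = {25, 53, 80, 110, 443}   # e.g., SMTP, DNS, HTTP(S), POP3

def stage1_label(flow):
    """Label a flow by port matching; None marks it as 'unknown' traffic."""
    ports = {flow["src_port"], flow["dst_port"]}
    if ports & P2P_PORTS:
        return 1          # P2P
    if ports & NONP2P_PORTS:
        return 0          # NonP2P
    return None

def two_stage_classify(flows, feature_names):
    """Train a decision tree on stage-1 labels, then classify unknown flows."""
    labeled = [(f, stage1_label(f)) for f in flows if stage1_label(f) is not None]
    unknown = [f for f in flows if stage1_label(f) is None]
    X = [[f[a] for a in feature_names] for f, _ in labeled]
    y = [lab for _, lab in labeled]
    tree = DecisionTreeClassifier(max_depth=10).fit(X, y)   # stand-in for the FDT
    return tree.predict([[f[a] for a in feature_names] for f in unknown]) if unknown else []
```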


2020 ◽  
Vol 16 ◽  
Author(s):  
Yifan Ying ◽  
Yongxi Jin ◽  
Xianchuan Wang ◽  
Jianshe Ma ◽  
Min Zeng ◽  
...  

Introduction: Hydrogen sulfide (H2S) is a lethal environmental and industrial poison. The reported mortality rate of occupational acute H2S poisoning in China ranges from 23.1% to 50%. Because poisoning produces a large volume of metabolomic change information, intelligent algorithms are needed to mine the multivariate interactions. Methods: This paper first uses GC-MS metabolomics to detect changes in the urine components of poisoned and control rats to form a metabolic data set, and then uses the SVM classification algorithm in machine learning to train on the hydrogen sulfide poisoning training data set and obtain a classification model. A batch of rats (n = 15) was randomly selected and exposed to 20 ppm H2S gas for 40 days (twice daily, morning and evening, 1 hour per exposure) to prepare a chronic H2S poisoning rat model. The other rats (n = 15) were exposed to the same volume of air with 0 ppm hydrogen sulfide gas as the control group. The treated urine samples were analyzed by GC-MS. Results: The method locates the optimal parameters of the SVM, which improves the classification accuracy to 100%. The information gain attribute evaluation method is used to screen out the top 6 biomarkers that contribute to the predicted category (glycerol, β-hydroxybutyric acid, arabinofuranose, pentitol, L-tyrosine, L-proline). Conclusion: The SVM diagnostic model of hydrogen sulfide poisoning constructed in this work offers fast training and high prediction accuracy; it achieved excellent results and provides an intelligent decision-making method for the diagnosis of hydrogen sulfide poisoning.
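The following sketch shows one plausible way to reproduce the two computational steps described here: a cross-validated grid search to locate the SVM parameters and an information-based ranking of metabolites. Mutual information is used below in place of the paper's information gain attribute evaluation (the two are closely related), and the parameter grid is an assumption.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.feature_selection import mutual_info_classif

def fit_svm(X_train, y_train):
    """Locate SVM hyperparameters by cross-validated grid search."""
    grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1], "kernel": ["rbf"]}
    search = GridSearchCV(SVC(), grid, cv=5)
    return search.fit(X_train, y_train).best_estimator_

def top_biomarkers(X, y, metabolite_names, k=6):
    """Rank metabolites by an information-based score and keep the top k."""
    scores = mutual_info_classif(X, y)
    order = scores.argsort()[::-1][:k]
    return [(metabolite_names[i], scores[i]) for i in order]
```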


2018 ◽  
Vol 24 (3) ◽  
Author(s):  
VALENTIN STOYANOV ◽  
IVAYLO STOYANOV ◽  
TEODOR ILIEV

Modeling of solar radiation with a neural network can be used for real-time calculation of the radiation on tilted surfaces with different orientations. In the artificial neural network (ANN), latitude, day of the year, slope, surface azimuth and average daily radiation on a horizontal surface are the inputs, and average daily radiation on a tilted surface of definite orientation is the output. The possible ANN structure, the size of the training data set, the number of hidden neurons, and the type of training algorithm were analyzed in order to identify the most appropriate model. The same ANN structure was trained and tested using data generated from the Klein and Theilacker model and from long-term measurements. Reasonable accuracy was obtained for all predictions for practical needs.
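A minimal sketch of such an ANN is given below using scikit-learn's MLPRegressor with the five inputs listed above; the hidden-layer size, activation, and training algorithm are placeholders, since the paper analyzes several options without fixing them here.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Inputs: latitude, day of year, slope, surface azimuth, daily horizontal radiation.
# Output: average daily radiation on the tilted surface.
def build_model(hidden_neurons=10):
    return make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(hidden_neurons,),
                     activation="tanh", solver="lbfgs",
                     max_iter=2000, random_state=0),
    )

# model = build_model().fit(X_train, y_train)        # X_train: (n, 5), y_train: (n,)
# r_tilted = model.predict([[45.0, 172, 30.0, 0.0, 5.8]])   # hypothetical sample
```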


Author(s):  
H. Sheikhian ◽  
M. R. Delavar ◽  
A. Stein

Uncertainty is one of the main concerns in geospatial data analysis. It affects different parts of decision making based on such data. In this paper, a new methodology to handle uncertainty in multi-criteria decision making problems is proposed. It integrates hierarchical rough granulation and rule extraction to build an accurate classifier. Rough granulation provides information granules with a detailed quality assessment. The granules are the basis for rule extraction in granular computing, which applies quality measures to the rules to obtain the best set of classification rules. The proposed methodology is applied to assess seismic physical vulnerability in Tehran. Six effective criteria reflecting building age, height and material, topographic slope and earthquake intensity of the North Tehran fault were tested. The criteria were discretized and the data set was granulated using a hierarchical rough method, where the best-describing granules are determined according to the quality measures. The granules are fed into the granular computing algorithm, resulting in classification rules that provide the highest prediction quality. This detailed uncertainty management resulted in 84% prediction accuracy on a training data set. The methodology was next applied to the whole study area to obtain the seismic vulnerability map of Tehran. A sensitivity analysis showed that earthquake intensity is the most effective criterion in the seismic vulnerability assessment of Tehran.
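The granulation and rule-extraction machinery is not detailed in the abstract, so the sketch below only illustrates the rule-quality idea: candidate rules over discretized criteria are scored by coverage and confidence and filtered by a quality threshold. The attribute encoding, rule sizes, and threshold are illustrative assumptions, not the authors' rough-set formulation.

```python
from itertools import combinations

def rule_quality(cases, labels, condition):
    """Coverage and confidence of a candidate rule 'condition -> majority class'."""
    hits = [i for i, c in enumerate(cases)
            if all(c[a] == v for a, v in condition.items())]
    if not hits:
        return 0.0, 0.0, None
    classes = [labels[i] for i in hits]
    majority = max(set(classes), key=classes.count)
    return len(hits) / len(cases), classes.count(majority) / len(hits), majority

def extract_rules(cases, labels, attributes, min_confidence=0.8):
    """Keep one- and two-attribute rules whose confidence meets the threshold."""
    rules = []
    for size in (1, 2):
        for attrs in combinations(attributes, size):
            for values in {tuple(c[a] for a in attrs) for c in cases}:
                condition = dict(zip(attrs, values))
                coverage, confidence, cls = rule_quality(cases, labels, condition)
                if cls is not None and confidence >= min_confidence:
                    rules.append((condition, cls, coverage, confidence))
    return sorted(rules, key=lambda r: (r[3], r[2]), reverse=True)
```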


Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1384
Author(s):  
Yuyu Yuan ◽  
Wen Wen ◽  
Jincui Yang

In algorithmic trading, an adequate training data set is key to making profits. However, stock trading data in units of a day cannot meet the great data demand of reinforcement learning. To address this problem, we propose a framework named data augmentation based reinforcement learning (DARL), which uses minute-candle data (open, high, low, close) to train the agent. The agent is then used to guide daily stock trading. In this way, we can increase the number of training instances by hundreds of times, which can substantially improve the reinforcement learning effect. But not all stocks are suitable for this kind of trading. Therefore, we propose an access mechanism based on skewness and kurtosis to select stocks that can be traded properly using this algorithm. In our experiment, we find that proximal policy optimization (PPO) is the most stable algorithm for achieving high risk-adjusted returns, while deep Q-learning (DQN) and soft actor-critic (SAC) can beat the market in terms of Sharpe ratio.
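The access mechanism is described only in terms of skewness and kurtosis, so the following is a minimal sketch of such a filter over minute-level returns; the thresholds and the use of log returns are assumptions, not values from the paper.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def eligible_for_darl(minute_closes, skew_limit=1.0, kurt_limit=5.0):
    """Admit a stock for minute-candle training only if its minute-return
    distribution is not too skewed or heavy-tailed (thresholds are assumed)."""
    returns = np.diff(np.log(minute_closes))
    return abs(skew(returns)) <= skew_limit and kurtosis(returns) <= kurt_limit
```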


Author(s):  
Delisman Laia ◽  
Efori Buulolo ◽  
Matias Julyus Fika Sirait

PT. Go-Jek Indonesia is a service company. Go-jek online is a technology-based motorcycle taxi service that is leading the transportation industry revolution. Data mining algorithms are used to help PT. Go-Jek Indonesia predict the level of ordering of online go-jek drivers and determine busy and quiet periods. The proposed method is Naive Bayes, a classification algorithm that assigns data to particular classes. The purpose of this study is to examine the prediction pattern of each attribute in the data set using the Naive Bayes algorithm and to test the model built on the training data against testing data to see whether the learned pattern is good or not. The input consists of the previous month's driver-order records, organized by day and time. The Naive Bayes algorithm is used to predict the daily ordering of online go-jek drivers for each period, such as morning, afternoon and evening. The results of this study make it easier for the company to analyze the booking data of each go-jek driver when setting policies that benefit both drivers and consumers or customers.

Keywords: Go-jek Driver, Data Mining, Naive Bayes
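A small illustrative sketch of this approach uses scikit-learn's CategoricalNB on made-up (day, time slot) order records; the attribute values and class labels below are hypothetical, not the company's data.

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical one-month history: (day_of_week, time_slot) -> demand class.
orders = [("Monday", "morning", "busy"), ("Monday", "evening", "quiet"),
          ("Saturday", "evening", "busy"), ("Sunday", "afternoon", "quiet")]

X = [[day, slot] for day, slot, _ in orders]
y = [label for _, _, label in orders]

encoder = OrdinalEncoder()
model = CategoricalNB().fit(encoder.fit_transform(X), y)

# Predict whether Saturday morning is expected to be busy or quiet.
print(model.predict(encoder.transform([["Saturday", "morning"]])))
```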


Author(s):  
Alina Köchling ◽  
Shirin Riazy ◽  
Marius Claus Wehner ◽  
Katharina Simbeck

The study aims to identify whether algorithmic decision making leads to unfair (i.e., unequal) treatment of certain protected groups in the recruitment context. Firms increasingly implement algorithmic decision making to save costs and increase efficiency. Moreover, algorithmic decision making is often considered to be fairer than human decisions, which are subject to social prejudices. Recent publications, however, imply that the fairness of algorithmic decision making is not necessarily given. Therefore, to investigate this further, highly accurate algorithms were used to analyze a pre-existing data set of 10,000 video clips of individuals in self-presentation settings. The analysis shows that the under-representation of certain genders and ethnicities in the training data set leads to an unpredictable overestimation and/or underestimation of the likelihood of inviting representatives of these groups to a job interview. Furthermore, algorithms replicate the existing inequalities in the data set. Firms have to be careful when implementing algorithmic video analysis during recruitment, as biases occur if the underlying training data set is unbalanced.
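One simple way to quantify the kind of group-level disparity discussed here is to compare per-group invitation rates in a model's output; the sketch below assumes a pandas data frame with hypothetical 'gender' and 'invited' columns and is not the fairness analysis used in the study.

```python
import pandas as pd

def selection_rates(results, group_col, invited_col="invited"):
    """Per-group invitation rates and the min/max rate ratio
    (a common 'disparate impact' style check)."""
    rates = results.groupby(group_col)[invited_col].mean()
    return rates, rates.min() / rates.max()

# results = pd.DataFrame({"gender": [...], "invited": [0, 1, ...]})
# rates, ratio = selection_rates(results, "gender")
```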


2014 ◽  
Vol 11 (2) ◽  
Author(s):  
Pavol Král’ ◽  
Lukáš Sobíšek ◽  
Mária Stachová

Data quality can be seen as a very important factor for the validity of information extracted from data sets using statistical or data mining procedures. In this paper we propose a description of data quality that allows us to characterize the data quality of the whole data set, as well as the data quality of particular variables and individual cases. On the basis of the proposed description, we define a distance-based measure of data quality for individual cases as the distance of each case from the ideal one. Such a measure can be used as additional information for preparing a training data set, fitting models, making decisions based on the results of analyses, etc. It can be utilized in different ways, ranging from a simple weighting function to belief functions.
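Since the abstract does not fix the distance metric or the per-variable quality indicators, the sketch below assumes quality scores in [0, 1] per variable and Euclidean distance from the all-ones ideal case, rescaled so that 1 means ideal.

```python
import numpy as np

def case_quality(quality_scores):
    """Distance-based data quality per case.

    quality_scores: (n_cases, n_variables) array of per-variable quality
    indicators in [0, 1], where 1 means perfect.  The ideal case scores 1
    on every variable; each case's quality is its closeness to that ideal.
    """
    q = np.asarray(quality_scores, dtype=float)
    ideal = np.ones(q.shape[1])
    dist = np.linalg.norm(q - ideal, axis=1)
    worst = np.linalg.norm(ideal)        # distance of an all-zero case
    return 1.0 - dist / worst            # 1 = ideal, 0 = worst possible

# weights = case_quality(Q)   # e.g., use as case weights when fitting a model
```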

