An Enhanced Hidden Semi-Markov Model for Outlier Detection in Multivariate Datasets

Author(s):  
G Manoharan ◽  
K Sivakumar

Outlier detection is an important area of data mining in which detection models are developed to discover objects that do not conform to expected behavior. The huge volume of data generated by real-time applications makes outlier detection more crucial and more challenging. Traditional detection techniques based on mean and covariance are not suited to large amounts of data, and their results are themselves affected by outliers. It is therefore essential to develop an efficient model for detecting outliers in large datasets. The objective of this research work is to develop an efficient outlier detection model for multivariate data employing an enhanced Hidden Semi-Markov Model (HSMM). The HSMM extends the conventional Hidden Markov Model (HMM) by allowing an arbitrary duration distribution in each state. Experimental results demonstrate the better performance of the proposed model in terms of detection accuracy and detection rate. The proposed model obtains a detection accuracy of 98.62%, which is significantly better than conventional HMM-based outlier detection for large multivariate datasets.
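The difference in state durations between the two models can be made concrete. In a conventional HMM, the time spent in a state with self-transition probability a_ii is implicitly geometric, whereas an HSMM attaches an explicit duration distribution to each state. The following is a minimal sketch of that contrast only, not the authors' enhanced model; the function names and the example duration table are hypothetical.

```python
from typing import Dict


def hmm_duration_pmf(self_transition: float, d: int) -> float:
    """P(stay exactly d steps) implied by an HMM self-transition probability."""
    if d < 1:
        return 0.0
    # Stay d-1 times, then leave once: a geometric distribution.
    return self_transition ** (d - 1) * (1.0 - self_transition)


def hsmm_duration_pmf(duration_table: Dict[int, float], d: int) -> float:
    """P(stay exactly d steps) read from an explicit, arbitrary duration table."""
    return duration_table.get(d, 0.0)


# Hypothetical state that lasts 3 or 4 steps; no geometric law has this shape.
example_durations = {3: 0.6, 4: 0.4}

if __name__ == "__main__":
    for d in range(1, 6):
        print(d, round(hmm_duration_pmf(0.7, d), 4), hsmm_duration_pmf(example_durations, d))
```

The geometric form always puts its largest probability on a duration of one step, which is why a model with explicit durations can describe states that typically persist for several steps.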

2021 ◽  
Vol 15 (4) ◽  
pp. 18-30
Author(s):  
Om Prakash Samantray ◽  
Satya Narayan Tripathy

Several available malware detection techniques rely on a signature-based approach. This approach detects known malware very effectively but may fail to detect unknown or zero-day attacks. In this article, the authors propose a malware detection model that uses the operation codes (opcodes) of malicious and benign executables as features. The proposed model uses the opcode extract and count (OPEC) algorithm to prepare the opcode feature vector for the experiment. The most relevant features are selected using the extra-trees classifier feature selection technique and then passed to several supervised learning algorithms, namely support vector machine, naive Bayes, decision tree, random forest, logistic regression, and k-nearest neighbour, to build classification models for malware detection. The proposed model achieves a detection accuracy of 98.7%, which is better than many of the similar works discussed in the literature.
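An opcode count feature vector of the kind described above can be sketched as follows. The abstract does not give the details of the OPEC algorithm, so this is only an assumed, simplified counting step; the function name, the vocabulary, and the sample opcode stream are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def opcode_count_vector(opcodes: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    """Count how often each vocabulary opcode occurs in one executable's opcode stream."""
    counts = Counter(op.lower() for op in opcodes)
    # Opcodes outside the vocabulary are ignored; missing ones count as zero.
    return [counts[op] for op in vocabulary]


# Hypothetical vocabulary and disassembly output, for illustration only.
VOCABULARY = ["mov", "push", "call", "jmp", "xor"]
sample = ["MOV", "push", "mov", "call", "xor", "mov"]

if __name__ == "__main__":
    print(opcode_count_vector(sample, VOCABULARY))
```

One such vector per executable, stacked into a matrix, is the usual input shape for a feature selection step followed by a supervised classifier.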


2019 ◽  
Vol 16 (5) ◽  
pp. 172988141987679
Author(s):  
Kohjiro Hashimoto ◽  
Tetsuyasu Yamada ◽  
Takeshi Tsuchiya ◽  
Kae Doki ◽  
Yuki Funabora ◽  
...  

With the increase in the number of elderly people in Japanese society, traffic accidents caused by elderly drivers are considered a problem. The primary factor in these accidents is a reduction in the drivers' cognitive performance. Therefore, a system that supports the cognitive performance of drivers can contribute greatly to preventing accidents. Recently, devices that provide information visually, such as smart glasses and head-up displays, have been under development. These devices can provide more effective supporting information for cognitive performance. In this article, we focus on the problem of selecting the information to be presented to drivers in order to realize a cognitive support system. It has been reported that presenting excessive information to a driver reduces the driver's judgment ability and makes the information less trustworthy. Thus, indiscriminate presentation of information in the driver's field of vision is not effective cognitive support, and a mechanism is required for determining the information to be presented based on the current driving situation. In this study, an object that contributes to the execution of an avoidance driving operation is regarded as an object that the driver must recognize and that should be presented to the driver. Such an object is called a contributing object. In this article, we propose a method that selects contributing objects among the objects appearing in the current driving scene. The proposed method expresses the relation between the time-series change of an appearing object and the avoidance operation of the driver by a mathematical model. This model can predict the execution timing of an avoidance driving operation and estimate the contributing object based on the predicted driving operation. The model, named the contributing model, consists of multiple hidden Markov models. The hidden Markov model is a probabilistic time-series model with high readability, because its parameters express probability distributions and their statistics. A characteristic of the contributing model is therefore that it enables the designer to understand the basis for an output decision. In this article, we evaluated the detection accuracy of contributing objects based on the proposed method, and the readability of the contributing model, through several experiments. The results confirmed high detection accuracy for contributing objects. Moreover, it was confirmed that the basis on which a contributing object was detected can be understood from the contributing model.


2011 ◽  
Vol 63-64 ◽  
pp. 178-181
Author(s):  
Hong Zhi Liu ◽  
Li Gao

A new method of quality control for information engineering surveillance based on the Hidden Markov Model (HMM) is proposed, and the related model is built. The process of information engineering quality surveillance can be seen as a two-layered random process. By abstracting the characteristics of the surveillance process, the five elements of the HMM are mapped to the process of quality surveillance. Software quality can be estimated under the model. In this paper, we divided the five elements; the model was thereby extended from a single dimension to multiple dimensions and trained with the Baum-Welch algorithm. Experimental results show that the proposed model is feasible and operates in real time when used for quality control.
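The abstract does not list the five elements. In the usual textbook description of a discrete HMM they are the number of hidden states N, the number of observation symbols M, the transition matrix A, the emission matrix B, and the initial distribution pi. The sketch below assumes that reading; the class name and the two-state example values are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiscreteHMM:
    n_states: int                  # N: number of hidden states
    n_symbols: int                 # M: number of distinct observation symbols
    transition: List[List[float]]  # A: N x N state-transition probabilities
    emission: List[List[float]]    # B: N x M observation probabilities
    initial: List[float]           # pi: initial state distribution

    def is_valid(self, tol: float = 1e-9) -> bool:
        """Check the shapes and that every probability row sums to one."""
        shapes_ok = (
            len(self.transition) == self.n_states
            and len(self.emission) == self.n_states
            and len(self.initial) == self.n_states
            and all(len(row) == self.n_states for row in self.transition)
            and all(len(row) == self.n_symbols for row in self.emission)
        )
        if not shapes_ok:
            return False
        rows = self.transition + self.emission + [self.initial]
        return all(abs(sum(row) - 1.0) <= tol for row in rows)


# Hypothetical two-state model (e.g. "acceptable" / "defective" quality)
# observed through three symbols.
example = DiscreteHMM(
    n_states=2,
    n_symbols=3,
    transition=[[0.8, 0.2], [0.3, 0.7]],
    emission=[[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    initial=[0.9, 0.1],
)

if __name__ == "__main__":
    print(example.is_valid())
```

A training procedure such as Baum-Welch re-estimates A, B, and pi from observed sequences while N and M stay fixed.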


Data mining (DM) is a method for detecting intrusions in networks. It draws on ideas from a variety of areas, including statistics, machine learning, and database processes. The decreasing price of digital networking has made network intrusion detection economically viable. This analysis chiefly examines network intrusion detection with machine learning and DM methods. To improve the accuracy and efficiency of the SHMM, we collect multiple observations in the SHMM; the result is called the Multiple Hidden Markov Model (MHMM). It is used to achieve better detection accuracy than the SHMM. The standard Hidden Markov Model poses three fundamental problems: evaluation, decoding, and learning. The evaluation problem can be used for word recognition, and the decoding problem is related to constant attention and segmentation. In the proposed research, the primary purpose is to model the sequence of observations in network logs and credit card transaction logs using an Enhanced Hidden Markov Model (EHMM), and to show how it can be used for intrusion detection in networks. In this procedure, an EHMM is first trained on the conventional behavior of intruders. If the trained EHMM does not recognize an incoming intruder transaction with sufficiently high probability, the transaction is considered fraudulent.
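The final step described above, rejecting a sequence that the trained model does not recognize with sufficiently high probability, is an instance of the evaluation problem and can be sketched with the forward algorithm for a plain discrete HMM. This is a minimal illustration, not the EHMM itself; the model parameters, the observation encoding, and the threshold are made-up values, and the function names are hypothetical.

```python
from typing import List, Sequence


def sequence_likelihood(
    obs: Sequence[int],
    initial: List[float],
    transition: List[List[float]],
    emission: List[List[float]],
) -> float:
    """Forward algorithm: P(obs | model) for a discrete HMM."""
    if not obs:
        return 1.0
    n = len(initial)
    # alpha[j] = P(observations so far, current state = j)
    alpha = [initial[i] * emission[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n)) * emission[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


def is_suspicious(obs, initial, transition, emission, threshold: float) -> bool:
    """Flag a sequence whose likelihood under the trained model is below the threshold."""
    return sequence_likelihood(obs, initial, transition, emission) < threshold


# Made-up parameters standing in for a model trained on ordinary behavior.
INITIAL = [0.9, 0.1]
TRANSITION = [[0.9, 0.1], [0.5, 0.5]]
EMISSION = [[0.95, 0.05], [0.6, 0.4]]  # symbol 0 = common event, 1 = rare event

if __name__ == "__main__":
    for seq in ([0, 0, 0], [1, 1, 1]):
        print(seq, sequence_likelihood(seq, INITIAL, TRANSITION, EMISSION))
```

Raw likelihoods shrink as sequences get longer, so a practical detector would usually compare log-likelihood per observation, or the change in likelihood when a new observation replaces the oldest one, rather than a fixed raw threshold.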


2018 ◽  
Vol 7 (2.32) ◽  
pp. 153
Author(s):  
N Arunachalam ◽  
P Prabavathy ◽  
S Priyatharshini

Credit card fake detection raises unique challenges due to the streaming, imbalanced, and non-stationary nature of transaction data. It additionally includes an active learning step, since the labeling (fake or genuine) of a subset of transactions is obtained in near-real time through human investigators who contact the cardholders. In this paper, the Hidden Markov Model (HMM) algorithm is applied to the sequence of credit card operations in transaction processing, and fakes are detected by the fake detection model during transaction processing. The HMM, the fake detection model, and image processing play an important role in detecting credit card fakes in online transactions. Fake detection is a challenging data problem for two major reasons: first, the profiles of normal and fraudulent cardholder behavior change constantly, and second, credit card fake data sets change markedly. With the fake detection (FD) algorithm, detection performance on credit card transactions is strongly affected by the sampling approach applied to the dataset and by the selection of the HMM and the fake detection model. An image technique is also used with the FD algorithm. A reliable augmentation of the scarce target population of fakes is important given issues such as labeling cost, the HMM algorithm, fake detection, and outlines in the streamed data source. We examined several scenarios, which showed the feasibility of improving detection capabilities, evaluated by means of receiver operating characteristic (ROC) curves and several key performance indicators (KPIs) commonly used in the financial business.
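The ROC evaluation mentioned above can be illustrated with a small, self-contained sketch. It assumes each transaction has a detector score (higher means more likely fake) and a 0/1 label; the scores, labels, and function names are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per distinct threshold.

    Assumes labels contain at least one 1 (fake) and at least one 0 (genuine).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical detector scores and ground-truth labels (1 = fake, 0 = genuine).
scores = [0.9, 0.8, 0.35, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]

if __name__ == "__main__":
    curve = roc_points(scores, labels)
    print(curve)
    print(auc(curve))
```

Because the true and false positive rates are each normalized within their own class, the curve is not distorted by the ratio of fake to genuine transactions, which is one reason it is commonly reported for imbalanced data.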


2014 ◽  
Vol 93 (18) ◽  
pp. 26-31 ◽  
Author(s):  
Hemlata Sukhwani ◽  
Vikas Sharma ◽  
Sanjay Sharma

Sensors ◽  
2019 ◽  
Vol 19 (12) ◽  
pp. 2670 ◽  
Author(s):  
Yan Li ◽  
Fan Wang ◽  
Hui Ke ◽  
Li-li Wang ◽  
Cheng-cheng Xu

Lane changing is considered as one of the most dangerous driving behaviors because drivers have to deal with the traffic conflicts on both the current and target lanes. This study aimed to propose a method of predicting the driving risks during the lane-changing process using drivers’ physiology measurement data and vehicle dynamic data. All the data used in the proposed model were obtained by portable sensors with the capability of recording data in the actual driving process. A hidden Markov model (HMM) was proposed to link driving risk with drivers’ physiology information and vehicle dynamic data. The two-factor indicators were established to evaluate the performances of eye movement, heart rate variability, and vehicle dynamic parameters on driving risk. The standard deviation of normal to normal R–R intervals of the heart rate (SDNN), fixation duration, saccade range, and average speed were then selected as the input of the HMM. The HMM was trained and tested using field-observed data collected in Xi’an City. The proposed model using the data from the physiology measurement sensor can identify dangerous driving state from normal driving state and predict the transition probability between these two states. The results match the perceptions of the tested drivers with an accuracy rate of 90.67%. The proposed model can be used to develop proactive crash prevention strategies.
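Of the HMM inputs named above, SDNN is simple to compute once the normal-to-normal (NN) intervals are available. The sketch below assumes intervals in milliseconds and uses the sample standard deviation; the abstract does not state which convention or analysis window the authors used, and the interval values are made up.

```python
import statistics
from typing import Sequence


def sdnn(nn_intervals_ms: Sequence[float]) -> float:
    """Standard deviation of normal-to-normal R-R intervals, in milliseconds."""
    if len(nn_intervals_ms) < 2:
        raise ValueError("SDNN needs at least two intervals")
    # Sample standard deviation (n - 1 in the denominator); an assumption here.
    return statistics.stdev(nn_intervals_ms)


# Hypothetical NN intervals recorded around one lane change.
example_intervals = [800.0, 810.0, 790.0, 805.0, 795.0]

if __name__ == "__main__":
    print(round(sdnn(example_intervals), 2))
```

In a pipeline like the one described, a value of this kind would be computed per time window and combined with the eye-movement and speed features to form one observation vector for the HMM.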


2021 ◽  
Vol 13 (10) ◽  
pp. 5391
Author(s):  
Yinsheng Yang ◽  
Gang Yuan ◽  
Jiaxiang Cai ◽  
Silin Wei

Disassembly waste generation forecasting is the foundation for determining disassembly waste treatment and process formulation and is also an important prerequisite for optimizing waste management. The prediction of disassembly waste generation is a complex process which is affected by potential time, environment, and economy characteristic variables. Uncertainty features, such as disassembly amount, disassembly component status, and workshop scheduling, play an important role in predicting the fluctuation of disassembly waste generation. We therefore focus on revealing the trend of waste generation in disassembly remanufacturing that faces significant influences of technology and economic changes to achieve circular industry sustainable development. To dynamically predict the generation of disassembly waste under uncertainty, this work proposes a statistical method driven by a probabilistic model, which integrates the digital twinning, Gaussian mixture, and the hidden Markov model (DG-HMM). First, digital twinning technology is used for real-time data interaction between simulation prediction and decision evaluation. Then, the Gaussian mixture and HMM are used to dynamically predict the generation of disassembly waste. In order to effectively predict the amount of disassembly waste generation, real data collected from a disassembly enterprise are used to train and verify the model. Finally, the proposed model is compared with other general prediction models to illustrate the correctness and feasibility of the proposed model. The comparison results show that DG-HMM has better prediction accuracy for the actual disassembly waste generation.

