Robust Mahalanobis-distance based spatial outlier detection on discrete GNSS velocity fields

Author(s):  
Balint Magyar ◽  
Ambrus Kenyeres ◽  
Sandor Toth ◽  
Istvan Hajdu

<p>GNSS velocity field filtering can be framed as a multi-dimensional, unsupervised spatial outlier detection problem. In the case discussed here, we jointly interpreted the horizontal and vertical velocity fields and their uncertainties as a six-dimensional space. To detect and classify the spatial outliers, we applied Principal Component Analysis (PCA), an orthogonal linear transformation, to dynamically project the data onto a lower-dimensional subspace while retaining most (~99%) of the explained variance of the input data.</p><p>The resulting component space can therefore be seen as an attribute function that describes the investigated deformation patterns. We then constructed two subspace mapping functions: a neighbor function based on the k-nearest neighbors (k-NN) and their median under the Haversine metric, and a samplewise comparison function that compares each sample with the properties of its k-NN environment. The resulting comparison function scores highlight significantly different observations as outliers. Assuming that the data come from a multivariate Gaussian distribution (MVD), we evaluated the corresponding Mahalanobis distance using a robust estimate of the covariance matrix of the investigated area. Then, as the main result of the Robust Mahalanobis-distance (RMD) based approach, we implemented the binary classification by thresholding on the p-value and the critical Mahalanobis distance.</p><p>Compared to the previously investigated and applied One-Class Support Vector Machine (OCSVM) approach, the RMD based solution gives <em>~17%</em> more accurate results for filtering European-scale velocity fields (such as EPN D1933), and it corrects the ambiguities and undesired features (such as overfitting) of the former OCSVM approach.</p><p>The results will also be presented as an interactive web page of the velocity fields of the latest version of EPN D2050, filtered with the introduced RMD approach.</p>
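The neighbor comparison and thresholding steps described in this abstract can be illustrated with a deliberately simplified, one-dimensional sketch. It is not the authors' implementation: the station coordinates, the attribute values, `k = 3` and the significance level `ALPHA` are all hypothetical, and a median/MAD standardization is swapped in for the robust covariance estimate, since in one dimension the robust Mahalanobis distance reduces to a robustly standardized score.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)
ALPHA = 0.01              # hypothetical significance level for the p-value test

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def knn_median_residuals(stations, k=3):
    """Attribute of each station minus the median attribute of its k nearest neighbors."""
    residuals = []
    for i, (lat, lon, value) in enumerate(stations):
        others = [
            (haversine_km(lat, lon, la, lo), v)
            for j, (la, lo, v) in enumerate(stations)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        residuals.append(value - median([v for _, v in others[:k]]))
    return residuals

def robust_p_values(scores):
    """p-values of robustly standardized scores (one-dimensional case)."""
    center = median(scores)
    mad = median([abs(s - center) for s in scores])
    scale = 1.4826 * mad  # MAD rescaled to be consistent with a Gaussian sigma
    # In one dimension d**2 follows a chi-square law with 1 degree of freedom,
    # so P(chi2_1 > d**2) = erfc(d / sqrt(2)).
    return [math.erfc(abs(s - center) / scale / math.sqrt(2)) for s in scores]

# Hypothetical 3x3 grid of stations: (latitude, longitude, attribute value).
stations = [
    (46.0, 17.0, 1.00), (46.0, 18.0, 1.10), (46.0, 19.0, 0.90),
    (47.0, 17.0, 1.05), (47.0, 18.0, 5.00), (47.0, 19.0, 0.95),
    (48.0, 17.0, 1.00), (48.0, 18.0, 1.10), (48.0, 19.0, 0.90),
]

residuals = knn_median_residuals(stations, k=3)
p_values = robust_p_values(residuals)
flagged = [i for i, p in enumerate(p_values) if p < ALPHA]
print(flagged)
```

In the multi-dimensional setting of the abstract, the scalar scale would be replaced by a robust covariance matrix and the p-value would come from a chi-square distribution whose degrees of freedom equal the number of retained components.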

Author(s):  
Wenjuan An ◽  
Mangui Liang ◽  
He Liu

Outlier detection, as a type of one-class classification problem, is one of the important research topics in data mining and machine learning. Its task is to identify sample points that deviate markedly from the normal data. A reliable outlier detector needs to build a model that encloses the normal data tightly. In this paper, an improved one-class SVM (OC-SVM) classifier is proposed for outlier detection problems. We name this method OC-SVM with minimum within-class scatter (OC-WCSSVM); it exploits the inner-class structure of the training set by minimizing the within-class scatter of the training data. This constructs a more accurate hyperplane for outlier detection, such that the margin between the training data and the origin in a higher-dimensional space is as large as possible, while at the same time the decision boundary around the normal data is as tight as possible. Experimental results on a synthetic dataset and 10 real-world datasets demonstrate that the proposed OC-WCSSVM algorithm is effective and superior to the compared algorithms.


2021 ◽  
Vol 38 (2) ◽  
pp. 261-268
Author(s):  
Oana-Diana Hrisca-Eva ◽  
Anca Mihaela Lazar

The purpose of this research is to evaluate the performance of several feature extraction methods and classification algorithms for electroencephalographic (EEG) signals recorded in a motor task imagery paradigm. The sessions were performed by the same subject in eight consecutive years. The methods used to obtain the relevant information are modeling the EEG signal as an autoregressive process (by means of the Itakura distance and the symmetric Itakura distance), amplitude modulation (using the amplitude modulation energy index), and phase synchronization (measuring the phase locking value, phase lag index and weighted phase lag index). The extracted features are classified using linear discriminant analysis, quadratic discriminant analysis, Mahalanobis distance, support vector machine and k-nearest neighbor classifiers. The highest classification rates are achieved when the Itakura distance is combined with the Mahalanobis distance based classifier. The outcomes of this research may improve the design of assistive devices that restore movement and communication for physically disabled patients, helping them rehabilitate their lost motor abilities and improving the quality of their daily life.


Author(s):  
S. Vijaya Rani ◽  
G. N. K. Suresh Babu

Hackers illegally penetrate the servers and networks of corporate and financial institutions to gain money and extract vital information. Attacks range from a single computing system to many systems. Attackers gain access by sending malicious packets through the network by means of viruses, worms, Trojan horses, etc. They scan a network with various tools and collect information about the network and its hosts. Hence it is essential to detect attacks as they enter a network. The methods available for intrusion detection include Naive Bayes, Decision Tree, Support Vector Machine, K-Nearest Neighbor, and Artificial Neural Networks. A neural network consists of processing units connected in a complex manner and is able to store information and make it available for use. It acts like the human brain and acquires knowledge from the environment through a training and learning process. Many algorithms are available for the learning process. This work analyzes malicious packets and predicts the error rate in the detection of injured packets using artificial neural network algorithms.


2019 ◽  
Vol 20 (5) ◽  
pp. 488-500 ◽  
Author(s):  
Yan Hu ◽  
Yi Lu ◽  
Shuo Wang ◽  
Mengying Zhang ◽  
Xiaosheng Qu ◽  
...  

Background: Globally, the numbers of cancer patients and cancer deaths continue to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: In this review, in order to study the application of machine learning in predicting anticancer drug activity, several machine learning approaches, namely Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), were selected, and examples of their applications in anticancer drug design are listed. Results: Machine learning contributes a lot to anticancer drug design and helps researchers by saving time and cost. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of successful identification and prediction of anticancer drug activity are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.


2021 ◽  
Vol 186 (Supplement_1) ◽  
pp. 445-451
Author(s):  
Yifei Sun ◽  
Navid Rashedi ◽  
Vikrant Vaze ◽  
Parikshit Shah ◽  
Ryan Halter ◽  
...  

Abstract. Introduction: Early prediction of the acute hypotensive episode (AHE) in critically ill patients has the potential to improve outcomes. In this study, we apply different machine learning algorithms to the MIMIC III Physionet dataset, containing more than 60,000 real-world intensive care unit records, to test commonly used machine learning technologies and compare their performance. Materials and Methods: Five classification methods, namely K-nearest neighbor, logistic regression, support vector machine, random forest, and a deep learning method called long short-term memory, are applied to predict an AHE 30 minutes in advance. An analysis comparing model performance when including versus excluding invasive features was conducted. To further study the pattern of the underlying mean arterial pressure (MAP), we apply linear regression to predict the continuous MAP values over the next 60 minutes. Results: Support vector machine yields the best performance in terms of recall (84%). Including the invasive features in the classification improves the performance significantly, with both recall and precision increasing by more than 20 percentage points. We were able to predict the MAP 60 minutes in the future with a root mean square error (a frequently used measure of the differences between the predicted values and the observed values) of 10 mmHg. After converting continuous MAP predictions into AHE binary predictions, we achieve a 91% recall and 68% precision. In addition to predicting AHE, the MAP predictions provide clinically useful information regarding the timing and severity of the AHE occurrence. Conclusion: We were able to predict AHE with precision and recall above 80% 30 minutes in advance with the large real-world dataset. The predictions of the regression model can provide a more fine-grained, interpretable signal to practitioners. Model performance is improved by the inclusion of invasive features in predicting AHE, when compared to predicting the AHE based on only the available, restricted set of noninvasive technologies. This demonstrates the importance of exploring more noninvasive technologies for AHE prediction.
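The step of converting continuous MAP predictions into binary AHE predictions and scoring them by recall and precision can be sketched as follows. This is only an illustration: the abstract does not state how an AHE is defined, so the threshold `MAP_THRESHOLD_MMHG` and all pressure values below are hypothetical.

```python
MAP_THRESHOLD_MMHG = 65.0  # hypothetical cut-off; the abstract gives no AHE definition

def to_binary(map_values, threshold=MAP_THRESHOLD_MMHG):
    """Flag a sample as hypotensive when its MAP falls below the threshold."""
    return [value < threshold for value in map_values]

def precision_recall(y_true, y_pred):
    """Precision and recall of boolean predictions against boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical observed and predicted MAP values in mmHg.
observed_map = [72.0, 61.0, 58.0, 80.0, 64.0, 70.0]
predicted_map = [70.0, 63.0, 66.0, 78.0, 60.0, 62.0]

precision, recall = precision_recall(to_binary(observed_map), to_binary(predicted_map))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Keeping the regression output and applying the threshold afterwards is what lets the same model report both the binary flag and the predicted pressure itself.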


Polymers ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1205
Author(s):  
Ruiqi Wang ◽  
Riqiang Duan ◽  
Haijun Jia

This publication focuses on the experimental validation of film models by comparing constructed and experimental velocity fields based on model and elementary experimental data. The film experiment covers Kapitza numbers Ka = 278.8 and Ka = 4538.6, a Reynolds number range of 1.6–52, and disturbance frequencies of 0, 2, 5, and 7 Hz. Compared to previous publications, the applied methodology has boundary identification procedures that are more refined and provide additional adaptive particle image velocimetry (PIV) method access to synthetic particle images. The experimental method was validated with a comparison with experimental particle image velocimetry and planar laser induced fluorescence (PIV/PLIF) results, Nusselt’s theoretical prediction, and experimental particle tracking velocimetry (PTV) results of flat steady cases, and a good continuity equation reproduction of transient cases proves the method’s fidelity. The velocity fields are reconstructed based on different film flow model velocity profile assumptions such as experimental film thickness, flow rates, and their derivatives, providing a validation method of film model by comparison between reconstructed velocity experimental data and experimental velocity data. The comparison results show that the first-order weighted residual model (WRM) and regularized model (RM) are very similar, although they may fail to predict the velocity field in rapidly changing zones such as the front of the main hump and the first capillary wave troughs.


2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers' tweets using machine-learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis, and are thus limited in their ability to capture the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection for the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with their followers who share their posts and have similar interests, from a hackers' community on Twitter. The list is built from a set of suggested keywords that are commonly used by hackers in their tweets. After that, a complex network is generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is used in a machine learning process by applying different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers' community. Correctly classified instances were measured for accuracy using the average values of the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split, respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a future risk to institutions and individuals, providing early warning of possible attacks.
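As a rough illustration of how a closeness score can rank users in a follower network, the sketch below computes closeness centrality with a breadth-first search. The user names and edges are hypothetical, the graph is assumed to be undirected and connected, and betweenness is left out for brevity; this is not the paper's pipeline.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """Hop counts from source to every reachable node, found by breadth-first search."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def closeness_centrality(graph):
    """(n - 1) divided by the sum of distances to all other nodes (connected graph assumed)."""
    n = len(graph)
    scores = {}
    for node in graph:
        total = sum(shortest_path_lengths(graph, node).values())
        scores[node] = (n - 1) / total if total else 0.0
    return scores

# Hypothetical undirected network stored as an adjacency list.
edges = [("u1", "u2"), ("u1", "u3"), ("u1", "u4"), ("u4", "u5")]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

closeness = closeness_centrality(graph)
most_central = max(closeness, key=closeness.get)
print(most_central, round(closeness[most_central], 3))
```

Users with the highest scores would be candidates for the "most influential users" subset whose tweets are then collected and classified.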


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Aaron Frederick Bulagang ◽  
James Mountstephens ◽  
Jason Teo

Abstract. Background: Emotion prediction is a method that recognizes human emotion from the subject's psychological data. The problem in question is the limited use of heart rate (HR) as the prediction feature with common classifiers such as Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Random Forest (RF) in emotion prediction. This paper aims to investigate whether HR signals can be used to classify four-class emotions, based on Russell's emotion model, in a virtual reality (VR) environment using machine learning. Method: An experiment was conducted using the Empatica E4 wristband to acquire the participant's HR and a VR headset as the display device for participants to view the 360° emotional videos; the Empatica E4 real-time application was used during the experiment to extract and process the participant's recorded heart rate. Findings: For intra-subject classification, all three classifiers (SVM, KNN, and RF) achieved 100% as the highest accuracy, while inter-subject classification achieved 46.7% for SVM, 42.9% for KNN and 43.3% for RF. Conclusion: The results demonstrate the potential of SVM, KNN and RF classifiers to classify HR as a feature for emotion prediction in four distinct emotion classes in a virtual reality environment. The potential applications include interactive gaming, affective entertainment, and VR health rehabilitation.


2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Jun Meng ◽  
Qiang Kang ◽  
Zheng Chang ◽  
Yushi Luan

Abstract. Background: Long noncoding RNAs (lncRNAs) play an important role in regulating biological activities, and their prediction is significant for exploring biological processes. Long short-term memory (LSTM) and convolutional neural network (CNN) models can automatically extract and learn abstract information from encoded RNA sequences, avoiding complex feature engineering. An ensemble model learns the information from multiple perspectives and shows better performance than a single model. It is feasible and interesting to treat the RNA sequence as a sentence and as an image to train an LSTM and a CNN, respectively, and then to hybridize the trained models to predict lncRNAs. To date, there are various predictors for lncRNAs, but few of them have been proposed for plants. A reliable and powerful predictor for plant lncRNAs is necessary. Results: To boost the performance of lncRNA prediction, this paper proposes a hybrid deep learning model based on two encoding styles (PlncRNA-HDeep), which does not require prior knowledge and uses only RNA sequences to train the models for predicting plant lncRNAs. It not only learns diversified information from RNA sequences encoded by p-nucleotide and one-hot encodings, but also takes advantage of the lncRNA-LSTM proposed in our previous study and of a CNN. The parameters are adjusted and three hybrid strategies are tested to maximize its performance. Experimental results show that PlncRNA-HDeep is more effective than lncRNA-LSTM and CNN, and obtains 97.9% sensitivity, 95.1% precision, 96.5% accuracy and 96.5% F1 score on the Zea mays dataset, which are better than those of several shallow machine learning methods (support vector machine, random forest, k-nearest neighbor, decision tree, naive Bayes and logistic regression) and some existing tools (CNCI, PLEK, CPC2, LncADeep and lncRNAnet). Conclusions: PlncRNA-HDeep is feasible and obtains credible predictive results. It may also provide valuable references for other related research.
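The reported F1 score can be cross-checked against the reported sensitivity and precision, since F1 is the harmonic mean of precision and recall and sensitivity is another name for recall. The short calculation below uses only the figures quoted in the abstract; the function name is chosen here for illustration.

```python
def f1_from_precision_recall(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

reported_precision = 0.951    # 95.1% precision from the abstract
reported_sensitivity = 0.979  # 97.9% sensitivity (recall) from the abstract

f1 = f1_from_precision_recall(reported_precision, reported_sensitivity)
print(round(100 * f1, 1))  # 96.5
```

The result agrees with the 96.5% F1 score stated in the abstract.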

