Feature Extraction of Sequence of Keystrokes in Fixed Text Using the Multivariate Hawkes Process

In this paper, we propose a new method for extracting keystroke features. A Hawkes process with an exponential excitation kernel is used to model the sequence of keystrokes in fixed text, and the intensity function vector and adjacency matrix of the trained model are taken as the keystroke features. We carry out a visual analysis of the raw CMU keystroke data and of the feature data extracted with the proposed method. Using a one-class classifier, we compare the classification performance of the raw CMU keystroke data with that of the feature data extracted by the Hawkes process model and by the POHMM model. The experimental results show that the features extracted with the proposed method contain rich information for distinguishing users. In addition, for some users who are hard to distinguish, these features give slightly better classification performance than the raw CMU keystroke data.
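As a rough illustration of the model class named in this abstract, the sketch below evaluates the conditional intensity of a multivariate Hawkes process with an exponential kernel, lambda_i(t) = mu_i + sum over j and over past events s in dimension j of alpha[i][j] * beta * exp(-beta * (t - s)). The function name `hawkes_intensity`, the single shared decay `beta`, and the kernel normalization are assumptions made for illustration; the abstract does not give the authors' exact parameterization or fitting procedure.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Intensity vector at time t (hypothetical helper, not the paper's code).

    events: list of lists, events[j] holds the event times of dimension j
    mu:     baseline intensity per dimension
    alpha:  adjacency matrix, alpha[i][j] is the influence of j on i
    beta:   decay rate of the exponential kernel (assumed shared)
    """
    dims = len(mu)
    lam = list(mu)  # start from the baseline rates
    for i in range(dims):
        for j in range(dims):
            for s in events[j]:
                if s < t:  # only strictly earlier events excite the process
                    lam[i] += alpha[i][j] * beta * math.exp(-beta * (t - s))
    return lam
```

In a keystroke setting one might map each key or key event type to one dimension; that mapping is also an assumption here, not something the abstract specifies.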


Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in terms of polarity classification is very important in everyday life. Knowing the polarity of a document lets people find out whether it carries positive or negative sentiment, which helps them choose and make decisions. Sentiment analysis is usually done manually, so an automatic sentiment classification process is needed. However, studies that discuss which feature extraction methods and learning models suit unstructured sentiment analysis, with Amazon food reviews as the case, are rare. This research explores several feature extraction methods (bag of words, TF-IDF, Word2Vector, and a combination of TF-IDF and Word2Vector) with several machine learning models (Random Forest, SVM, KNN and Naïve Bayes) to find a combination of feature extraction and learning model that can add variety to polarity sentiment analysis. With document preparation that handles HTML tags, punctuation and special characters, and with Snowball stemming, TF-IDF combined with SVM proved suitable for polarity classification in unstructured sentiment analysis of Amazon food reviews, with a performance of 87.3 percent.
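To make the TF-IDF step concrete, here is a minimal standard-library sketch. It assumes one common variant (term frequency as count divided by document length, inverse document frequency as log(N / df)) and plain whitespace tokenization; the abstract does not say which variant or tokenizer the authors used, and the stemming and SVM stages are not shown. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (illustrative variant)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a term that occurs in every document gets weight zero, so only terms that separate documents contribute to the feature vector passed on to a classifier.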


Data Quality Visual Analysis (DQVA) A tool to process and pinspot raw data irregularities

2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC) ◽

10.1109/ccwc51732.2021.9375961 ◽

2021 ◽

Author(s):

Celio Carvalho ◽

Rui S. Moreira ◽

Jose Manuel Torres

Keyword(s):

Data Quality ◽

Visual Analysis ◽

Raw Data


Predicting Water Pipe Failures with a Recurrent Neural Hawkes Process Model

2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽

10.1109/smc42975.2020.9282941 ◽

2020 ◽

Author(s):

Jeroen Verheugd ◽

Paulo R de Oliveira da Costa ◽

Reza Refaei Afshar ◽

Yingqian Zhang ◽

Sjoerd Boersma

Keyword(s):

Process Model ◽

Hawkes Process ◽

Water Pipe


A Multi-Branch Feature Fusion Strategy Based on an Attention Mechanism for Remote Sensing Image Scene Classification

Remote Sensing ◽

10.3390/rs13101950 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1950

Author(s):

Cuiping Shi ◽

Xin Zhao ◽

Liguo Wang

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

Classification Accuracy ◽

Feature Fusion ◽

State Of The Art ◽

Rapid Development ◽

Remote Sensing Image ◽

Classification Performance ◽

Attention Mechanism ◽

Scene Classification

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve classification performance, many studies have increased the depth of convolutional neural networks (CNNs) and expanded the width of the network to extract more deep features, which increases the complexity of the model. To solve this problem, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. First, we propose two convolution combination modules for feature extraction, through which the deep features of images can be fully extracted by several convolutions working together. Then, the weights of the features are calculated, and the extracted deep features are sent to the attention mechanism for further feature extraction. Next, all of the extracted features are fused by multiple branches. Finally, depthwise separable convolution and asymmetric convolution are used to greatly reduce the number of parameters. The experimental results show that, compared with some state-of-the-art methods, the proposed method retains a clear advantage in classification accuracy while using very few parameters.
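The parameter savings mentioned in this abstract can be checked with simple arithmetic. The sketch below counts weights (biases ignored) for a standard k x k convolution, a depthwise separable convolution (a k x k depthwise step followed by a 1 x 1 pointwise step), and an asymmetric pair (k x 1 followed by 1 x k). The function names and the assumption that the asymmetric pair keeps `c_out` channels in between are illustrative; the layer sizes of AMB-CNN itself are not given in the abstract.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution, bias ignored."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise filter per input channel, then 1 x 1 pointwise mixing."""
    return k * k * c_in + c_in * c_out

def asymmetric_conv_params(k, c_in, c_out):
    """k x 1 convolution followed by 1 x k, assuming c_out channels in between."""
    return k * c_in * c_out + k * c_out * c_out

if __name__ == "__main__":
    # Example sizes chosen only for illustration.
    k, c_in, c_out = 3, 64, 128
    print(standard_conv_params(k, c_in, c_out))
    print(depthwise_separable_params(k, c_in, c_out))
    print(asymmetric_conv_params(k, c_in, c_out))
```

For a 3 x 3 kernel with 64 input and 128 output channels, the depthwise separable form needs 8,768 weights against 73,728 for the standard convolution, roughly an eightfold reduction.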


AR-Tri-Training: Tri-Training with Assistant Strategy

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.1840 ◽

2014 ◽

Vol 513-517 ◽

pp. 1840-1844 ◽

Cited By ~ 1

Author(s):

Long Jie Cui ◽

Hong Li Wang ◽

Rong Yi Cui

Keyword(s):

Learning Strategy ◽

Voice Recognition ◽

Classification Performance ◽

Experimental Results ◽

Training Algorithm ◽

Information Strategy ◽

Testing Rate ◽

Rich Information ◽

Validation Set

In Tri-training, the use of unlabeled samples introduces noisy samples, which weakens the classification performance of the classifier. In this paper, a new Tri-training style algorithm named AR-Tri-training (Tri-training with assistant and rich strategy) is proposed. First, the assistant learning strategy is introduced. Then a supporting learner is designed by combining the assistant learning strategy with the rich information strategy. The supporting learner reduces the number of mislabeled samples produced while the three classifiers label samples for one another; moreover, the unlabeled samples and the misclassified samples of the validation set can be fully used. The proposed algorithm is applied to voice recognition. The experimental results show that the AR-Tri-training algorithm can compensate for the shortcomings of the Tri-training algorithm and further improve the testing rate.


Information Overload: How Technology Can Help Convert Raw Data into Rich Information for Transitional Justice Processes

International Journal of Transitional Justice ◽

10.1093/ijtj/ijy029 ◽

2018 ◽

Vol 13 (1) ◽

pp. 71-91 ◽

Cited By ~ 1

Author(s):

Daniela Gavshon ◽

Erol Gorur

Keyword(s):

Transitional Justice ◽

Information Overload ◽

Raw Data ◽

Rich Information


SEMANTIC SEGMENTATION OF BENTHIC COMMUNITIES FROM ORTHO-MOSAIC MAPS

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-w10-151-2019 ◽

2019 ◽

Vol XLII-2/W10 ◽

pp. 151-158 ◽

Cited By ~ 5

Author(s):

G. Pavoni ◽

M. Corsini ◽

M. Callieri ◽

M. Palma ◽

R. Scopigno

Keyword(s):

Visual Analysis ◽

Marine Organism ◽

Benthic Communities ◽

Semantic Segmentation ◽

Classification Performance ◽

Training Dataset ◽

Non Invasive ◽

Visual Sampling ◽

Organism Identification

Abstract. Visual sampling techniques represent a valuable resource for rapid, non-invasive data acquisition for underwater monitoring purposes. Long-term monitoring projects usually require the collection of large quantities of data, and visual analysis by a human expert operator remains, in this context, a very time-consuming task. It has been estimated that only 1-2% of the acquired images are later analyzed by scientists (Beijbom et al., 2012). Strategies for the automatic recognition of benthic communities are required to effectively exploit all the information contained in visual data. Supervised learning methods, the most promising classification techniques in this field, are commonly affected by two recurring issues: the wide diversity of marine organisms and the small amount of labeled data. In this work, we discuss the advantages offered by annotated high-resolution ortho-mosaics of the seabed for classifying and segmenting the investigated specimens, and we suggest several strategies to obtain considerable per-pixel classification performance despite using a reduced training dataset composed of a single ortho-mosaic. The proposed methodology can be applied to a large number of different species, making the procedure of marine organism identification a highly adaptable task.


Efficient pan-cancer whole-slide image classification and outlier detection using convolutional neural networks

10.1101/633123 ◽

2019 ◽

Cited By ~ 1

Author(s):

Seda Bilaloglu ◽

Joyce Wu ◽

Eduardo Fierro ◽

Raul Delgado Sanchez ◽

Paolo Santiago Ocampo ◽

...

Keyword(s):

Visual Analysis ◽

Classification Problem ◽

Classification Performance ◽

Neoplastic Tissue ◽

Multiple Tumor ◽

Slide Image ◽

Prediction Systems ◽

Multi Class Classification ◽

The Many ◽

Whole Slide Images

Abstract. Visual analysis of solid tissue mounted on glass slides is currently the primary method used by pathologists for determining the stage, type and subtypes of cancer. Although whole slide images are usually large (tens to hundreds of thousands of pixels wide), an exhaustive though time-consuming assessment is necessary to reduce the risk of misdiagnosis. In an effort to address the many diagnostic challenges faced by trained experts, recent research has focused on developing automatic prediction systems for this multi-class classification problem. Typically, complex convolutional neural network (CNN) architectures, such as Google’s Inception, are used to tackle this problem. Here, we introduce a greatly simplified CNN architecture, PathCNN, which allows for more efficient use of computational resources and better classification performance. Using this improved architecture, we trained simultaneously on whole-slide images from multiple tumor sites and corresponding non-neoplastic tissue. Dimensionality reduction analysis of the weights of the last layer of the network captures groups of images that faithfully represent the different types of cancer, while also highlighting differences in staining and capturing outliers, artifacts and misclassification errors. Our code is available online at: https://github.com/sedab/PathCNN.


Multivariate Hawkes process model of market participants behavior in the high frequency world

International Journal of Financial Engineering ◽

10.1142/s2424786320500541 ◽

2021 ◽

Vol 08 (01) ◽

pp. 2050054

Author(s):

Sugato Chakravarty ◽

Kiseop Lee ◽

Yang Xi

Keyword(s):

Public Sector ◽

High Frequency ◽

Process Model ◽

Hawkes Process ◽

Transaction Data ◽

The Public ◽

Market Participants ◽

Public Sector Banks ◽

Average Return ◽

Causality Relationship

We propose a multivariate Hawkes process to model the interaction between the behavior (buy and sell) of non-high frequency traders (NHFTs) and that of high frequency traders (HFTs). We apply our model to intraday transaction data for public sector bank stocks in India, sampled from March 2012 to June 2012. We find that mutually exciting NHFT and HFT behaviors benefit the stocks, which have an average return above that of the public sector bank index. We further identify a Granger causality relationship for the stocks in which mutual excitation dominates: HFT activities cause NHFT activities. In other words, NHFTs are market followers in those stocks.


Independent Analysis of Decelerations and Resting Periods through CEEMDAN and Spectral-Based Feature Extraction Improves Cardiotocographic Assessment

Applied Sciences ◽

10.3390/app9245421 ◽

2019 ◽

Vol 9 (24) ◽

pp. 5421 ◽

Cited By ~ 2

Author(s):

Patricio Fuentealba ◽

Alfredo Illanes ◽

Frank Ortmeier

Keyword(s):

Feature Extraction ◽

Uterine Contraction ◽

Fetal Monitoring ◽

Ensemble Empirical Mode Decomposition ◽

Classification Performance ◽

Compensatory Mechanisms ◽

Main Hypothesis ◽

Mode Decomposition ◽

Independent Analysis ◽

Adaptive Noise

Fetal monitoring is commonly based on the joint recording of the fetal heart rate (FHR) and uterine contraction signals obtained with a cardiotocograph (CTG). Unfortunately, CTG analysis is difficult, and the interpretation problems are mainly associated with the analysis of FHR decelerations. Several approaches have been proposed to improve this analysis; however, the results obtained are not satisfactory enough for implementation in clinical practice. Current clinical research indicates that a correct CTG assessment requires a good understanding of the fetal compensatory mechanisms. In previous works, we have shown that the complete ensemble empirical mode decomposition with adaptive noise, in combination with time-varying autoregressive modeling, may be useful for the analysis of those characteristics. In this work, based on this methodology, we propose to analyze the FHR deceleration episodes separately. The main hypothesis is that the proposed feature extraction strategy, applied separately to the complete signal, deceleration episodes, and resting periods (between contractions), improves the CTG classification performance compared with the analysis of only the complete signal. Results reveal that, considering the complete signal alone, the classification performance reached 81.7% quality. Including information extracted from resting periods improved it to 83.2%.
