Malware Variant Identification Using Incremental Clustering

Dynamic analysis and pattern matching techniques are widely used in industry and provide a straightforward method for identifying malware samples. Yara is a pattern matching tool that can use sandbox memory dumps to identify malware families. However, pattern matching techniques fail silently when faced with minor code variations, leaving malware samples unidentified. This paper presents a two-layered Malware Variant Identification using Incremental Clustering (MVIIC) process and proposes clustering the unidentified malware samples to enable the identification of malware variants and new malware families. A novel incremental clustering algorithm is used to identify new malware variants among the unidentified samples. The research shows that clustering can provide a higher level of performance than Yara rules and that clustering is resistant to the small changes introduced by malware variants. The proposed hybrid approach uses Yara scanning to eliminate known malware, followed by clustering, so that the two act in concert to identify new malware variants. The F1 score and V-Measure clustering metrics are used to evaluate the results.
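The two-layer idea in this abstract (signature scan first, incremental clustering of whatever stays unidentified) can be illustrated with a minimal sketch. This is not the MVIIC algorithm from the paper: the subset-based signature check is only a stand-in for Yara scanning, the clustering is a simple leader-style scheme with Jaccard distance, and every name, feature string, and threshold here (`matches_known_signature`, `IncrementalClusterer`, `0.5`, and so on) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    """A cluster represented by the feature set of its first member (the leader)."""
    leader: frozenset
    members: list = field(default_factory=list)


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - |a & b| / |a | b|: 0.0 for identical sets, 1.0 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def matches_known_signature(features: frozenset, signatures: dict) -> Optional[str]:
    """Layer 1 (stand-in for Yara): a signature matches only if ALL of its
    required features are present, so a small change makes it miss silently."""
    for family, required in signatures.items():
        if required <= features:
            return family
    return None


class IncrementalClusterer:
    """Layer 2: assign each new sample to the nearest existing cluster leader,
    or open a new cluster when no leader is within the distance threshold."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.clusters = []

    def add(self, name: str, features: frozenset) -> int:
        best_idx, best_dist = None, None
        for idx, cluster in enumerate(self.clusters):
            dist = jaccard_distance(features, cluster.leader)
            if best_dist is None or dist < best_dist:
                best_idx, best_dist = idx, dist
        if best_idx is not None and best_dist <= self.threshold:
            self.clusters[best_idx].members.append(name)
            return best_idx
        self.clusters.append(Cluster(leader=features, members=[name]))
        return len(self.clusters) - 1


def identify(samples: dict, signatures: dict, clusterer: IncrementalClusterer) -> dict:
    """Label each sample with a known family, or with a cluster id if unidentified."""
    labels = {}
    for name, features in samples.items():
        family = matches_known_signature(features, signatures)
        if family is not None:
            labels[name] = family
        else:
            labels[name] = f"cluster-{clusterer.add(name, features)}"
    return labels


# Hypothetical feature sets, e.g. strings extracted from sandbox memory dumps.
signatures = {"FamilyA": frozenset({"mutex_a", "c2_a"})}
samples = {
    "s1": frozenset({"mutex_a", "c2_a", "api_x"}),            # known sample
    "s2": frozenset({"mutex_a", "c2_a2", "api_x", "api_y"}),  # variant: signature misses
    "s3": frozenset({"mutex_a", "c2_a2", "api_x", "api_z"}),  # close to s2
    "s4": frozenset({"mutex_q", "c2_q", "api_k"}),            # unrelated sample
}
labels = identify(samples, signatures, IncrementalClusterer(threshold=0.5))
print(labels)
```

In this toy data, the variant `s2` differs from the signature by one string and so escapes the signature layer, but it still shares most features with `s3`, which is why a distance-based grouping can keep the two together. The threshold controls how much variation a cluster tolerates and would need tuning on real data.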

Secured packet inspection with hierarchical pattern matching implemented using incremental clustering algorithm

2014 International Conference on High Performance Computing and Applications (ICHPCA), 2014. DOI: 10.1109/ichpca.2014.7045309

Author(s): Purna Chandra Sethi, Prafulla Kumar Behera

Keyword(s): Pattern Matching, Clustering Algorithm, Incremental Clustering, Packet Inspection, Hierarchical Pattern

Suspended Sediment Modeling Using a Heuristic Regression Method Hybridized with Kmeans Clustering

Sustainability, 2021, Vol 13 (9), pp. 4648. DOI: 10.3390/su13094648

Author(s): Rana Muhammad Adnan, Kulwinder Singh Parmar, Salim Heddam, Shamsuddin Shahid, Ozgur Kisi

Keyword(s): Clustering Algorithm, Suspended Sediments, Ecological Impacts, Accurate Estimation, Data Sets, The Novel, Neuro Fuzzy, Adaptive Regression, Novel Method, Model Training

The accurate estimation of suspended sediments (SSs) is important for determining the volume of dam storage, river carrying capacity, pollution susceptibility, soil erosion potential, and aquatic ecological impacts, and for the design and operation of hydraulic structures. This study proposes a new method for accurately estimating daily SSs using antecedent discharge and sediment information. The method is developed by hybridizing the multivariate adaptive regression spline (MARS) with the Kmeans clustering algorithm (MARS–KM). Its efficacy is established by comparing its performance with the adaptive neuro-fuzzy inference system (ANFIS), MARS, and M5 tree (M5Tree) models in predicting SSs at two stations on the Yangtze River in China, using three assessment measures: RMSE, MAE, and NSE. Two modeling scenarios are employed: in the first, the data are divided 50–50% for model training and testing; in the second, the training and test data sets are swapped. At Guangyuan Station, MARS–KM improved on the ANFIS, MARS, and M5Tree methods in terms of RMSE by 39%, 30%, and 18% in the first scenario and by 24%, 22%, and 8% in the second scenario, respectively. At Beibei Station, the corresponding RMSE improvements over ANFIS, MARS, and M5Tree were 34%, 26%, and 27% in the first scenario and 7%, 16%, and 6% in the second scenario, respectively. Additionally, the MARS–KM models provided much more satisfactory estimates using only discharge values as inputs.
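The three assessment measures named in this abstract have standard definitions, sketched below in plain Python. The function names are hypothetical, the sample values are made up for illustration, and `improvement_pct` assumes that a reported "improvement in RMSE" means the relative reduction with respect to the baseline model, which the abstract does not state explicitly.

```python
import math


def rmse(observed, simulated):
    """Root mean square error: sqrt(mean((obs - sim)^2))."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def mae(observed, simulated):
    """Mean absolute error: mean(|obs - sim|)."""
    n = len(observed)
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / n


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).
    1.0 is a perfect fit; 0.0 is no better than predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    denominator = sum((o - mean_obs) ** 2 for o in observed)
    if denominator == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    numerator = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return 1.0 - numerator / denominator


def improvement_pct(baseline_rmse, new_rmse):
    """Assumed meaning of 'improvement': relative RMSE reduction in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


def two_scenarios(records):
    """Split records 50-50, then swap roles: returns [(train, test), (train, test)]."""
    half = len(records) // 2
    first, second = records[:half], records[half:]
    return [(first, second), (second, first)]


# Made-up daily sediment values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 2.0, 3.0, 5.0]
print(rmse(observed, simulated), mae(observed, simulated), nse(observed, simulated))
```

Lower RMSE and MAE are better, while NSE is better the closer it gets to 1, so the three measures should be read together rather than ranked on a single scale.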

On the Semi-incremental Clustering Algorithm based on Kalman Filter and Bayes Approach

2019 Chinese Control Conference (CCC), 2019. DOI: 10.23919/chicc.2019.8866599

Author(s): Wei Zhou, Daqing Zhang

Keyword(s): Kalman Filter, Clustering Algorithm, Incremental Clustering, Bayes Approach

Planning a holistic summative eHealth evaluation in an interdisciplinary and multi-national setting: a case study and propositions for guideline development

BMC Medical Informatics and Decision Making, 2021, Vol 21 (1). DOI: 10.1186/s12911-021-01399-9

Author(s): Monika Jurkeviciute, Amia Enam, Johanna Torres-Bonilla, Henrik Eriksson

Keyword(s): Pattern Matching, Quality Evaluation, Empirical Process, Guideline Development, Evaluation Process, Feasibility Analysis, National Team, Matching Technique, Interdisciplinary Setting

Background: Summative eHealth evaluations frequently lack quality, which affects the generalizability of the evidence and its use in practice and further research. To guarantee quality, a number of activities are recommended in the guidelines for evaluation planning. This study aimed to examine a case of eHealth evaluation planning in a multi-national and interdisciplinary setting and to provide recommendations for eHealth evaluation planning guidelines.

Methods: An empirical eHealth evaluation process was developed through a case study. The empirical process was compared with selected guidelines for eHealth evaluation planning using a pattern-matching technique.

Results: Planning in the interdisciplinary and multi-national team demanded extensive negotiation and alignment to support the future use of the evidence created. The evaluation planning guidelines did not provide specific strategies for different set-ups of the evaluation teams. Further, they did not address important aspects of quality evaluation, such as feasibility analysis of the outcome measures and data collection, monitoring of data quality, and consideration of the methods and measures employed in similar evaluations.

Conclusions: Activities to prevent quality problems need to be incorporated in the guidelines for evaluation planning. Additionally, evaluators could benefit from guidance in evaluation planning related to the different set-ups of the evaluation teams.

The Novel Improved Hybrid Clustering Algorithm of Particle Swarm and K-Means Considering Applications

2021 6th International Conference on Communication and Electronics Systems (ICCES), 2021. DOI: 10.1109/icces51350.2021.9489040

Author(s): Shuying Liu

Keyword(s): Clustering Algorithm, Particle Swarm, The Novel, Hybrid Clustering

Iconic indexing using generalized pattern matching techniques

Computer Vision Graphics and Image Processing, 1986, Vol 35 (3), pp. 383-403. DOI: 10.1016/0734-189x(86)90007-1

Cited by: ~16

Author(s): William I. Grosky, Yi Lu

Keyword(s): Pattern Matching, Generalized Pattern Matching, Matching Techniques

A Hybrid Deep Clustering Approach for Robust Cell Type Profiling Using Single-cell RNA-seq Data

2019. DOI: 10.1101/511626

Cited by: ~2

Author(s): Suhas Srinivasan, Nathan T. Johnson, Dmitry Korkin

Keyword(s): Deep Learning, Single Cell, Clustering Algorithm, Hybrid Approach, Feature Learning, Specific Cell, Clustering Methods, Model Based Clustering, Clustering And Classification, Living Organisms

Single-cell RNA sequencing (scRNA-seq) is a recent technology that enables fine-grained discovery of cellular subtypes and specific cell states. It routinely uses machine learning methods, such as feature learning, clustering, and classification, to assist in uncovering novel information from scRNA-seq data. However, current methods are not well suited to deal with the substantial amount of noise created by the experiments or with the variation that occurs due to differences between cells of the same type. Here, we develop a new hybrid approach, Deep Unsupervised Single-cell Clustering (DUSC), that integrates feature generation based on a deep learning architecture with a model-based clustering algorithm to find a compact and informative representation of the single-cell transcriptomic data, generating robust clusters. We also include a technique to estimate an efficient number of latent features in the deep learning model. Our method outperforms both classical and state-of-the-art feature learning and clustering methods, approaching the accuracy of supervised learning. The method is freely available to the community and will hopefully facilitate our understanding of the cellular atlas of living organisms as well as provide the means to improve patient diagnostics and treatment.
