Machine-Learning Classifiers for Malware Detection Using Data Features

Saleh Abdulaziz Habtor; Ahmed Haidarah Hasan Dahah

doi:10.5614/itbj.ict.res.appl.2021.15.3.5

Machine-Learning Classifiers for Malware Detection Using Data Features

Journal of ICT Research and Applications ◽

10.5614/itbj.ict.res.appl.2021.15.3.5 ◽

2021 ◽

Vol 15 (3) ◽

pp. 265-290

Author(s):

Saleh Abdulaziz Habtor ◽

Ahmed Haidarah Hasan Dahah

Keyword(s):

Machine Learning ◽

Decision Making ◽

Data Processing ◽

Evaluation Framework ◽

Artificial Learning ◽

Processing Module ◽

N Gram ◽

Using Data ◽

Complete Process

The spread of ransomware has risen exponentially over the past decade, causing heavy financial damage to many organizations. Various anti-ransomware firms have proposed methods for preventing malware threats. The growing pace, scale, and sophistication of malware pose further challenges for the anti-malware industry. Recent literature indicates that academics and anti-virus organizations have begun to use machine learning as well as fundamental modeling techniques for the analysis and identification of malware. Traditional signature-based anti-virus programs struggle to identify unfamiliar malware and to track new forms of malware. In this study, a machine-learning-based malware evaluation framework was adopted that consists of several modules: dataset compilation in two separate classes (malicious and benign software), file disassembly, data processing, decision making, and updated malware identification. The data processing module uses grayscale images, imported functions, and opcode n-grams to extract malware features. The decision-making module detects malware and flags suspected malware. Different classifiers were considered in the research methodology for the detection and classification of malware. The framework's effectiveness was validated on the basis of the accuracy of the complete process.
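
The opcode n-gram step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of n = 2, the relative-frequency weighting, and the two opcode sequences are all hypothetical, chosen only to show how a disassembled opcode stream might be turned into a fixed-length feature vector for a classifier.

```python
from collections import Counter
from typing import List, Tuple


def opcode_ngrams(opcodes: List[str], n: int = 2) -> Counter:
    """Count contiguous n-grams in a disassembled opcode sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields no n-grams (empty Counter).
    grams = (tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))
    return Counter(grams)


def to_feature_vector(counts: Counter,
                      vocabulary: List[Tuple[str, ...]]) -> List[float]:
    """Map n-gram counts onto a fixed vocabulary as relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts.get(gram, 0) / total for gram in vocabulary]


# Hypothetical opcode sequences standing in for two disassembled files.
sample_a = ["push", "mov", "call", "mov", "call", "ret"]
sample_b = ["mov", "add", "mov", "add", "ret"]

counts_a = opcode_ngrams(sample_a, n=2)
counts_b = opcode_ngrams(sample_b, n=2)

# The shared vocabulary fixes the order and length of every feature vector.
vocabulary = sorted(set(counts_a) | set(counts_b))
vector_a = to_feature_vector(counts_a, vocabulary)
vector_b = to_feature_vector(counts_b, vocabulary)
```

Vectors built this way could then be passed, together with malicious/benign labels, to whichever classifier is being evaluated; the abstract does not say which classifiers or which value of n were used.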

Classification of sentiment reviews using n-gram machine learning approach

Expert Systems with Applications ◽

10.1016/j.eswa.2016.03.028 ◽

2016 ◽

Vol 57 ◽

pp. 117-126 ◽

Cited By ~ 189

Author(s):

Abinash Tripathy ◽

Ankit Agrawal ◽

Santanu Kumar Rath

Keyword(s):

Machine Learning ◽

Learning Approach ◽

Machine Learning Approach ◽

N Gram

Optimizing taxonomic classification of marker gene amplicon sequences

10.7287/peerj.preprints.3208v2 ◽

2018 ◽

Cited By ~ 4

Author(s):

Nicholas A Bokulich ◽

Benjamin D Kaehler ◽

Jai Ram Rideout ◽

Matthew Dillon ◽

Evan Bolyen ◽

...

Keyword(s):

Machine Learning ◽

Sequence Data ◽

Marker Gene ◽

Parameter Tuning ◽

Operating Conditions ◽

Evaluation Framework ◽

Taxonomic Classification ◽

Consensus Methods ◽

Learning Classifier

Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based taxonomy classifiers that meet or exceed the accuracy of existing methods for marker-gene amplicon sequence classification. We evaluated and optimized several commonly used taxonomic classification methods (RDP, BLAST, UCLUST) and several new methods (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods of VSEARCH, BLAST+, and SortMeRNA) for classification of marker-gene amplicon sequence data. Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for a range of standard operating conditions. q2-feature-classifier and our evaluation framework, tax-credit, are both free, open-source, BSD-licensed packages available on GitHub.
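
To make the idea of a naive Bayes sequence classifier concrete, here is a small pure-Python sketch. It is not the q2-feature-classifier code and does not use scikit-learn: the k-mer featurization, the class name `KmerNaiveBayes`, the parameters `k` and `alpha`, and the toy sequences and taxon labels are all assumptions made for illustration. It implements multinomial naive Bayes with additive smoothing over k-mer counts.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def kmers(sequence: str, k: int = 3) -> List[str]:
    """Return all overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts (illustrative only)."""

    def __init__(self, k: int = 3, alpha: float = 1.0) -> None:
        self.k = k                      # k-mer length (tunable)
        self.alpha = alpha              # additive smoothing strength (tunable)
        self.class_counts: Counter = Counter()
        self.kmer_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocabulary: set = set()

    def fit(self, sequences: List[str], labels: List[str]) -> "KmerNaiveBayes":
        for seq, label in zip(sequences, labels):
            self.class_counts[label] += 1
            for kmer in kmers(seq, self.k):
                self.kmer_counts[label][kmer] += 1
                self.vocabulary.add(kmer)
        return self

    def log_scores(self, sequence: str) -> Dict[str, float]:
        """Unnormalized log posterior for each class."""
        n_train = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_label in self.class_counts.items():
            total = sum(self.kmer_counts[label].values())
            score = math.log(n_label / n_train)  # log prior
            for kmer in kmers(sequence, self.k):
                if kmer not in self.vocabulary:
                    continue  # ignore k-mers never seen in training
                count = self.kmer_counts[label][kmer]
                score += math.log(
                    (count + self.alpha) / (total + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, sequence: str) -> str:
        scores = self.log_scores(sequence)
        return max(scores, key=scores.get)


# Hypothetical reference sequences and taxon labels.
train_sequences = ["ACGTACGTAC", "ACGTACGAAC", "GGCCGGCCGG", "GGCCGGTCGG"]
train_labels = ["TaxonA", "TaxonA", "TaxonB", "TaxonB"]

model = KmerNaiveBayes(k=3, alpha=1.0).fit(train_sequences, train_labels)
prediction_a = model.predict("ACGTACGT")
prediction_b = model.predict("GGCCGG")
```

In this sketch `k` and `alpha` are the kind of settings whose tuning the abstract argues matters; the values used here are arbitrary.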

Optimizing taxonomic classification of marker gene sequences

10.7287/peerj.preprints.3208v1 ◽

2017 ◽

Cited By ~ 4

Author(s):

Nicholas A Bokulich ◽

Benjamin D Kaehler ◽

Jai Ram Rideout ◽

Matthew Dillon ◽

Evan Bolyen ◽

...

Keyword(s):

Machine Learning ◽

Marker Gene ◽

Parameter Tuning ◽

Operating Conditions ◽

Evaluation Framework ◽

Taxonomic Classification ◽

Gene Sequences ◽

Learning Classifier ◽

Classifier Performance

Background. Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results. We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based taxonomy classifiers that meet or exceed classification accuracy of existing methods. We evaluated and optimized several commonly used taxonomic classification methods (RDP, BLAST, BLAST+, UCLUST) and several new methods (a scikit-learn naive Bayes machine-learning classifier, and VSEARCH and SortMeRNA alignment-based methods). Conclusions. Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make explicit recommendations regarding parameter choices for a range of standard operating conditions. q2-feature-classifier and our evaluation framework, tax-credit, are both free, open-source, BSD-licensed packages available on GitHub.

Retail Giant Sales Forecasting using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1277.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 2408-2411

Keyword(s):

Machine Learning ◽

Data Mining ◽

Decision Making ◽

Learning Model ◽

Accurate Prediction ◽

Machine Learning Model ◽

Different Types ◽

Using Data ◽

Near Future ◽

Sales Trend

Sales forecasting plays a widely recognized and major role in an organization’s decision making. It is an integral part of business execution for retail giants, allowing them to adjust their strategy to improve sales in the near future. It also supports better management of resources such as machines, money, and manpower, and helps in managing revenue and inventory accordingly. This paper proposes a model that can forecast the most profitable segments at a granular level. Because most retail giants have many branches in different locations, consolidating sales with data mining is hard; a machine learning model instead helps to obtain reliable and accurate results. The paper helps in understanding the sales trend in order to monitor or predict future sales, and the approach is applicable to different types of sales patterns and products to produce accurate prediction results.

Prediction of Blood Donors Population Using Data Mining Classification Technique

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit217519 ◽

2021 ◽

pp. 25-27

Author(s):

Aman Paul ◽

Daljeet Singh

Keyword(s):

Machine Learning ◽

Data Mining ◽

Association Rules ◽

User Interfaces ◽

Relational Data ◽

Easy Access ◽

Data Mining Technique ◽

Open Source Data ◽

Using Data

Data mining is a technique that finds relationships and trends in large datasets to support decision making. Classification is a data mining technique that maps data into predefined classes; it is often referred to as supervised learning because the classes are determined before the data are examined. Different classification algorithms have been proposed for the effective classification of data. Among others, Weka is open-source data mining software with which classification can be achieved. It is also well suited for developing new machine learning schemes. It allows users to quickly compare different machine learning methods on new datasets. It has several graphical user interfaces that enable easy access to the underlying functionality. CBA is a data mining tool that not only produces an accurate classifier for prediction but is also able to mine various forms of association rules. It has better classification accuracy and faster mining speed. It can build accurate classifiers from relational data and mine association rules from relational and transactional data. CBA also has many other features, such as cross validation for evaluating classifiers, and allows the user to view and query the discovered rules.
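
Cross validation, which this abstract mentions as a way of evaluating classifiers, can be sketched in a few lines. The code below is not how Weka or CBA implement it: the interleaved fold assignment, the one-feature nearest-neighbour classifier, and the toy data (a hypothetical "months since last donation" feature with "yes"/"no" donor labels) are assumptions chosen only to show the mechanics of k-fold evaluation.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, test) pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Interleaved assignment: sample i goes to fold i % k.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def nearest_neighbour_predict(train_x: Sequence[float],
                              train_y: Sequence[str],
                              x: float) -> str:
    """Predict the label of the closest training point (1-NN, one feature)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def cross_validated_accuracy(xs: Sequence[float],
                             ys: Sequence[str],
                             k: int) -> float:
    """Fraction of samples classified correctly when each is held out once."""
    correct = 0
    for train, test in k_fold_indices(len(xs), k):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct += sum(
            nearest_neighbour_predict(train_x, train_y, xs[i]) == ys[i]
            for i in test
        )
    return correct / len(xs)


# Hypothetical data: months since last donation -> will donate again?
months_since_donation = [1, 2, 3, 4, 20, 22, 25, 30]
donated_again = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

splits = k_fold_indices(len(months_since_donation), k=4)
accuracy = cross_validated_accuracy(months_since_donation, donated_again, k=4)
```

Every sample is used for testing exactly once and never appears in its own training set, which is the property that makes the averaged accuracy a fairer estimate than accuracy on the training data.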

Can machine learning augment clinician adjudication of events in cardiovascular trials? A case study of major adverse cardiovascular events (MACE) across CVRM trials

European Heart Journal ◽

10.1093/eurheartj/ehab724.3061 ◽

2021 ◽

Vol 42 (Supplement_1) ◽

Author(s):

H Lea ◽

E Hutchinson ◽

A Meeson ◽

S Nampally ◽

G Dennis ◽

...

Keyword(s):

Machine Learning ◽

Clinical Trials ◽

Ischemic Stroke ◽

Cardiovascular Events ◽

Learning Algorithms ◽

Model Performance ◽

Machine Learning Algorithms ◽

Classification Models ◽

Using Data

Background and introduction: Accurate identification of clinical outcome events is critical to obtaining reliable results in cardiovascular outcomes trials (CVOTs). Current processes for event adjudication are expensive and hampered by delays. As part of a larger project to more reliably identify outcomes, we evaluated the use of machine learning to automate event adjudication using data from the SOCRATES trial (NCT01994720), a large randomized trial comparing ticagrelor and aspirin in reducing risk of major cardiovascular events after acute ischemic stroke or transient ischemic attack (TIA). Purpose: We studied whether machine learning algorithms could replicate the outcome of the expert adjudication process for clinical events of ischemic stroke and TIA. Could classification models be trained on historical CVOT data and demonstrate performance comparable to human adjudicators? Methods: Using data from the SOCRATES trial, multiple machine learning algorithms were tested using grid search and cross validation. Models tested included Support Vector Machines, Random Forest and XGBoost. Performance was assessed on a validation subset of the adjudication data not used for training or testing in model development. Metrics used to evaluate model performance were Receiver Operating Characteristic (ROC), Matthews Correlation Coefficient, Precision and Recall. The contribution of features (attributes of the data used by the algorithm as it is trained to classify an event) to each classification was examined using both Mutual Information and Recursive Feature Elimination. Results: Classification models were trained on historical CVOT data using adjudicator consensus decision as the ground truth. Best performance was observed on models trained to classify ischemic stroke (ROC 0.95) and TIA (ROC 0.97). Top-ranked features that contributed to classification of ischemic stroke or TIA corresponded to site investigator decision or variables used to define the event in the trial charter, such as duration of symptoms. Model performance was comparable across the different machine learning algorithms tested, with XGBoost demonstrating the best ROC on the validation set for correctly classifying both stroke and TIA. Conclusions: Our results indicate that machine learning may augment or even replace clinician adjudication in clinical trials, with potential to gain efficiencies, speed up clinical development, and retain reliability. Our current models demonstrate good performance at binary classification of ischemic stroke and TIA within a single CVOT with high consistency and accuracy between automated and clinician adjudication. Further work will focus on harmonizing features between multiple historical clinical trials and training models to classify several different endpoint events across trials. Our aim is to utilize these clinical trial datasets to optimize the delivery of CVOTs in further cardiovascular drug development. Funding Acknowledgement: Type of funding sources: Private company. Main funding source(s): AstraZeneca Plc
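
The evaluation metrics named in this abstract can be computed directly from labels and model scores. The sketch below is only an illustration: it assumes the reported "ROC" values refer to the area under the ROC curve, it uses a hypothetical 0.5 decision threshold, and the eight labels and scores are invented toy data, not trial data.

```python
import math
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels coded as 0/1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def matthews_corrcoef(tp: int, fp: int, fn: int, tn: int) -> float:
    """MCC; defined as 0.0 here when any marginal total is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the probability that a random positive
    outscores a random negative (ties count as one half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical adjudicated labels (1 = event) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred: List[int] = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
model_precision = precision(tp, fp)
model_recall = recall(tp, fn)
model_mcc = matthews_corrcoef(tp, fp, fn, tn)
model_auc = roc_auc(y_true, scores)
```

Precision, recall, and MCC depend on the chosen threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds, which is one reason such studies tend to report several metrics side by side.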

LOCALIZATION OF SOIL MONITORING DATA PROCESSING

EurasianUnionScientists ◽

10.31618/esu.2413-9335.2021.1.86.1351 ◽

2021 ◽

pp. 22-25

Author(s):

V. Fartukov ◽

N. Hanov

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analysis ◽

Data Processing ◽

The State ◽

Monitoring Data ◽

Soil Monitoring ◽

Monitoring Data Processing ◽

State Of The Field

A tree of data analysis for the formation and preprocessing, storage and protection of data based on Big Data and Blockchain technologies has been developed. The developed algorithm allows for the classification of data on the state of the field, split testing of data, forecasting and machine learning for the implementation of differential irrigation with sprinklers.

Machine Learning–Based Predictive Farmland Optimization and Crop Monitoring System

Scientifica ◽

10.1155/2020/9428281 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Marion Olubunmi Adebiyi ◽

Roseline Oluwaseun Ogundokun ◽

Aneoghena Amarachi Abokhai

Keyword(s):

Machine Learning ◽

Decision Making ◽

Mobile Application ◽

Mobile System ◽

Random Forest Algorithm ◽

User Input ◽

Crop Monitoring ◽

Integration Of Technology ◽

Crop Type ◽

Using Data

E-agriculture is the integration of technology and digital mechanisms into agricultural processes for more efficient output. This study provided a machine learning–aided mobile system for farmland optimization, using various inputs such as location, crop type, soil type, soil pH, and spacing. The random forest algorithm and BigML were employed to analyze and classify datasets containing crop features, generating subclasses based on random crop feature parameters. The subclasses were further grouped into three main classes to match the crops using data from the companion crops. The study concluded that the approach aided decision making and also assisted in the design of a mobile application using Appery.io. The Appery.io application then took in some user input parameters, thereby offering various optimization sets. It was also deduced that the system led to users’ optimization of information when implemented on their farmlands.

Machine Learning Based Automotive Forensic Analysis for Mobile Applications Using Data Mining

TELKOMNIKA Indonesian Journal of Electrical Engineering ◽

10.11591/tijee.v16i2.1623 ◽

2015 ◽

Vol 16 (2) ◽

pp. 350

Author(s):

MD. Hussain Khan ◽

G. Pradeepini

Keyword(s):

Machine Learning ◽

Data Mining ◽

Mobile Apps ◽

Forensic Analysis ◽

Evaluation Framework ◽

Security Evaluation ◽

Analysis Framework ◽

Security Issues ◽

The People ◽

Using Data

A phone is a device that enables communication between people through voice, text, video, etc. Nowadays people may live without food but not without phones. A number of operating systems are in use in various versions, each with various security issues. Security is a very important concern for mobile devices and mobile apps. To improve the security status of mobile devices, the existing methodology uses cloud computing and data mining. Our traditional method, named MobSafe, identifies whether a mobile app is malicious or benign. In the proposed system, we adopt the Android Security Evaluation Framework (ASEF) and the Static Android Analysis Framework (SAAF). In this paper, our proposed system uses machine learning to conduct automotive forensic analysis of mobile apps based on the multifaceted data generated in this stage.

Semantic N-Gram Feature Analysis and Machine Learning–Based Classification of Drivers’ Hazardous Actions at Signal-Controlled Intersections

Journal of Computing in Civil Engineering ◽

10.1061/(asce)cp.1943-5487.0000895 ◽

2020 ◽

Vol 34 (4) ◽

pp. 04020015 ◽

Cited By ~ 1

Author(s):

Keneth Morgan Kwayu ◽

Valerian Kwigizile ◽

Jiansong Zhang ◽

Jun-Seok Oh

Keyword(s):

Machine Learning ◽

Feature Analysis ◽

N Gram
