Machine-Learning Classifiers for Malware Detection Using Data Features

2021 ◽  
Vol 15 (3) ◽  
pp. 265-290
Author(s):  
Saleh Abdulaziz Habtor ◽  
Ahmed Haidarah Hasan Dahah

The spread of ransomware has risen exponentially over the past decade, causing huge financial damage to many organizations. Various anti-ransomware firms have suggested methods for preventing malware threats. The growing pace, scale and sophistication of malware pose further challenges for the anti-malware industry. Recent literature indicates that academics and anti-virus organizations have begun to use machine learning as well as fundamental modeling techniques for the analysis and identification of malware. Traditional signature-based anti-virus programs struggle to identify unfamiliar malware and track new forms of malware. In this study, a machine-learning-based malware evaluation framework was adopted that consists of several modules: dataset compilation in two separate classes (malicious and benign software), file disassembly, data processing, decision making, and updated malware identification. The data processing module uses grey-scale images, import functions, and opcode n-grams to extract malware features. The decision-making module detects malware and recognizes suspected malware. Different classifiers were considered in the research methodology for the detection and classification of malware. The framework's effectiveness was validated on the basis of the accuracy of the complete process.
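The opcode n-gram features mentioned above can be illustrated with a minimal sketch. It assumes the disassembly step has already produced a flat list of opcode mnemonics; the function name `opcode_ngrams` and the sample sequence are hypothetical and are not taken from the study.

```python
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Count contiguous opcode n-grams in a disassembled instruction sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # zip over n shifted views of the list yields every window of length n.
    windows = zip(*(opcodes[i:] for i in range(n)))
    return Counter(windows)


# Hypothetical opcode sequence for illustration only.
sample = ["push", "mov", "call", "push", "mov", "ret"]
bigrams = opcode_ngrams(sample, n=2)
trigrams = opcode_ngrams(sample, n=3)
```

The resulting counts could serve as one input vector per file for a classifier; how the study itself weights or selects n-grams is not stated in the abstract.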

2016 ◽  
Vol 57 ◽  
pp. 117-126 ◽  
Author(s):  
Abinash Tripathy ◽  
Ankit Agrawal ◽  
Santanu Kumar Rath

Author(s):  
Nicholas A Bokulich ◽  
Benjamin D Kaehler ◽  
Jai Ram Rideout ◽  
Matthew Dillon ◽  
Evan Bolyen ◽  
...  

Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based taxonomy classifiers that meet or exceed the accuracy of existing methods for marker-gene amplicon sequence classification. We evaluated and optimized several commonly used taxonomic classification methods (RDP, BLAST, UCLUST) and several new methods (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods of VSEARCH, BLAST+, and SortMeRNA) for classification of marker-gene amplicon sequence data. Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for a range of standard operating conditions. q2-feature-classifier and our evaluation framework, tax-credit, are both free, open-source, BSD-licensed packages available on GitHub.


Author(s):  
Nicholas A Bokulich ◽  
Benjamin D Kaehler ◽  
Jai Ram Rideout ◽  
Matthew Dillon ◽  
Evan Bolyen ◽  
...  

Background. Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results. We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based taxonomy classifiers that meet or exceed classification accuracy of existing methods. We evaluated and optimized several commonly used taxonomic classification methods (RDP, BLAST, BLAST+, UCLUST) and several new methods (a scikit-learn naive Bayes machine-learning classifier, and VSEARCH and SortMeRNA alignment-based methods). Conclusions. Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make explicit recommendations regarding parameter choices for a range of standard operating conditions. q2-feature-classifier and our evaluation framework, tax-credit, are both free, open-source, BSD-licensed packages available on GitHub.
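As a rough illustration of how a naive Bayes classifier can assign taxonomy from sequence data, the sketch below implements a small multinomial naive Bayes over k-mer counts in plain Python. It is not the q2-feature-classifier implementation; the class name `KmerNaiveBayes`, the parameters `k` and `alpha`, the toy sequences, and the taxon labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=3):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts with Laplace smoothing."""

    def __init__(self, k=3, alpha=1.0):
        self.k = k
        self.alpha = alpha

    def fit(self, sequences, labels):
        self.counts = defaultdict(Counter)   # k-mer counts per class
        self.class_totals = Counter(labels)  # training sequences per class
        self.vocab = set()
        for seq, label in zip(sequences, labels):
            grams = kmers(seq, self.k)
            self.counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, seq):
        n_train = sum(self.class_totals.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_totals.items():
            total = sum(self.counts[label].values())
            score = math.log(n_label / n_train)  # log prior
            for gram in kmers(seq, self.k):
                if gram not in self.vocab:
                    continue  # ignore k-mers never seen in training
                smoothed = self.counts[label][gram] + self.alpha
                score += math.log(smoothed / (total + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy reference sequences with hypothetical taxon labels.
train_seqs = ["ACGTACGTAC", "ACGTTCGTAC", "GGCCGGCCGG", "GGCCGGTCGG"]
train_labels = ["taxon_A", "taxon_A", "taxon_B", "taxon_B"]
model = KmerNaiveBayes(k=3).fit(train_seqs, train_labels)
pred_a = model.predict("ACGTACGA")
pred_b = model.predict("GGCCGG")
```

The k-mer length and smoothing constant are the kind of parameters whose tuning the abstract highlights; the values used here are arbitrary.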


2019 ◽  
Vol 8 (2S11) ◽  
pp. 2408-2411

Sales forecasting is widely recognized as playing a major role in an organization’s decision making. It is an integral part of the business execution of retail giants, allowing them to change their strategy to improve sales in the near future. This helps in better management of resources such as machines, money and manpower. Forecasting sales helps in managing revenue and inventory accordingly. This paper proposes a model that can forecast the most profitable segments at a granular level. As most retail giants have many branches in different locations, consolidating sales is hard using data mining. Using a machine learning model instead helps in getting reliable and accurate results. This paper helps in understanding the sales trend in order to monitor or predict future sales, and the approach is applicable to different types of sales patterns and products to produce accurate prediction results.


Author(s):  
Aman Paul ◽  
Daljeet Singh

Data mining is a technique that finds relationships and trends in large datasets to promote decision support. Classification is a data mining technique that maps data into predefined classes; it is often referred to as supervised learning because the classes are determined before the data are examined. Different classification algorithms have been proposed for the effective classification of data. Among others, Weka is an open-source data mining software with which classification can be achieved. It is also well suited for developing new machine learning schemes. It allows users to quickly compare different machine learning methods on new datasets. It has several graphical user interfaces that enable easy access to the underlying functionality. CBA is a data mining tool that not only produces an accurate classifier for prediction but is also able to mine various forms of association rules. It has better classification accuracy and faster mining speed. It can build accurate classifiers from relational data and mine association rules from relational and transactional data. CBA also has many other features, such as cross validation for evaluating classifiers, and allows the user to view and query the discovered rules.


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
H Lea ◽  
E Hutchinson ◽  
A Meeson ◽  
S Nampally ◽  
G Dennis ◽  
...  

Background and introduction: Accurate identification of clinical outcome events is critical to obtaining reliable results in cardiovascular outcomes trials (CVOTs). Current processes for event adjudication are expensive and hampered by delays. As part of a larger project to more reliably identify outcomes, we evaluated the use of machine learning to automate event adjudication using data from the SOCRATES trial (NCT01994720), a large randomized trial comparing ticagrelor and aspirin in reducing risk of major cardiovascular events after acute ischemic stroke or transient ischemic attack (TIA). Purpose: We studied whether machine learning algorithms could replicate the outcome of the expert adjudication process for clinical events of ischemic stroke and TIA. Could classification models be trained on historical CVOT data and demonstrate performance comparable to human adjudicators? Methods: Using data from the SOCRATES trial, multiple machine learning algorithms were tested using grid search and cross validation. Models tested included Support Vector Machines, Random Forest and XGBoost. Performance was assessed on a validation subset of the adjudication data not used for training or testing in model development. Metrics used to evaluate model performance were Receiver Operating Characteristic (ROC), Matthews Correlation Coefficient, Precision and Recall. The contribution of features (attributes of the data used by the algorithm as it is trained to classify an event) to a classification was examined using both Mutual Information and Recursive Feature Elimination. Results: Classification models were trained on historical CVOT data using adjudicator consensus decision as the ground truth. Best performance was observed on models trained to classify ischemic stroke (ROC 0.95) and TIA (ROC 0.97). Top-ranked features that contributed to classification of ischemic stroke or TIA corresponded to site investigator decision or variables used to define the event in the trial charter, such as duration of symptoms. Model performance was comparable across the different machine learning algorithms tested, with XGBoost demonstrating the best ROC on the validation set for correctly classifying both stroke and TIA. Conclusions: Our results indicate that machine learning may augment or even replace clinician adjudication in clinical trials, with potential to gain efficiencies, speed up clinical development, and retain reliability. Our current models demonstrate good performance at binary classification of ischemic stroke and TIA within a single CVOT with high consistency and accuracy between automated and clinician adjudication. Further work will focus on harmonizing features between multiple historical clinical trials and training models to classify several different endpoint events across trials. Our aim is to utilize these clinical trial datasets to optimize the delivery of CVOTs in further cardiovascular drug development. Funding Acknowledgement: Type of funding sources: Private company. Main funding source(s): AstraZeneca Plc
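Three of the evaluation metrics named in the Methods (Precision, Recall and the Matthews Correlation Coefficient) can be computed directly from binary confusion counts. The sketch below is a minimal illustration, assuming labels are encoded as 1 (event) and 0 (no event); the function names and the example labels are hypothetical and do not come from the SOCRATES data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_mcc(y_true, y_pred):
    """Compute precision, recall and Matthews Correlation Coefficient."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical adjudicated labels versus model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
precision, recall, mcc = precision_recall_mcc(y_true, y_pred)
```

Unlike precision and recall, the MCC uses all four confusion counts, which is one reason it is often reported alongside them when classes are imbalanced.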


Author(s):  
V. Fartukov ◽  
N. Hanov

A tree of data analysis for the formation and preprocessing, storage and protection of data based on Big Data and Blockchain technologies has been developed. The developed algorithm allows for the classification of data on the state of the field, split testing of data, forecasting and machine learning for the implementation of differential irrigation with sprinklers.


Scientifica ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-12 ◽  
Author(s):  
Marion Olubunmi Adebiyi ◽  
Roseline Oluwaseun Ogundokun ◽  
Aneoghena Amarachi Abokhai

E-agriculture is the integration of technology and digital mechanisms into agricultural processes for more efficient output. This study provided a machine learning–aided mobile system for farmland optimization, using various inputs such as location, crop type, soil type, soil pH, and spacing. Random forest algorithm and BigML were employed to analyze and classify datasets containing crop features that generated subclasses based on random crop feature parameters. The subclasses were further grouped into three main classes to match the crops using data from the companion crops. The study concluded that the approach aided decision making and also assisted in the design of a mobile application using Appery.io. This Appery.io then took in some user input parameters, thereby offering various optimization sets. It was also deduced that the system led to users’ optimization of information when implemented on their farmlands.


2015 ◽  
Vol 16 (2) ◽  
pp. 350
Author(s):  
MD. Hussain Khan ◽  
G. Pradeepini

A phone is a device that enables communication between people through voice, text, video, etc. Nowadays people may live without food but not without using phones. A number of operating systems are in use in various versions, each with various security issues. Security is a very important concern for mobile devices and mobile apps. To improve the security of mobile devices, the existing methodology uses cloud computing and data mining. Our traditional method, named MobSafe, identifies whether a mobile app is malicious or benign. In the proposed system, we adopt the Android Security Evaluation Framework (ASEF) and the Static Android Analysis Framework (SAAF). In this paper, our proposed system uses machine learning to conduct automatic forensic analysis of mobile apps based on the multifaceted data generated at this stage.

