High Performance Classification Model to Identify Ransomware Payments for Heterogeneous Bitcoin Networks

Bitcoin is a widely used digital currency, conceptualized in 2008 as a distributed transaction system. Bitcoin transactions run over peer-to-peer network nodes without a third-party intermediary, and the nodes themselves can verify the transactions. Although Bitcoin networks have proven highly efficient as financial transaction systems, their payment transactions are vulnerable to several ransomware attacks. For that reason, investigators have been developing ransomware payment identification techniques for Bitcoin transaction networks to prevent such harmful cyberattacks. In this paper, we propose a high-performance Bitcoin transaction predictive system that examines Bitcoin payment transactions to learn data patterns that can recognize and classify ransomware payments for heterogeneous Bitcoin networks. Specifically, our system uses two supervised machine learning methods to learn the distinguishing patterns in Bitcoin payment transactions: shallow neural networks (SNN) and optimizable decision trees (ODT). To validate the effectiveness of our approach, we evaluate our machine-learning-based predictive models on a recent Bitcoin transactions dataset, using classification accuracy as the key performance indicator alongside other evaluation metrics such as the confusion matrix, positive predictive value, true positive rate, and the corresponding prediction errors. Our best experimental results were obtained with the decision-tree-based model, which scored 99.9% detection accuracy (two-class classifier) and 99.4% classification accuracy (multiclass classifier). These accuracy results surpass many state-of-the-art models developed to identify ransomware payments in Bitcoin transactions.
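
The evaluation metrics named in this abstract can all be read off a two-class confusion matrix. The following is a minimal sketch, not the authors' code: the counts are hypothetical, and treating the "corresponding prediction errors" as the false discovery rate (1 - PPV) and false negative rate (1 - TPR) is an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive accuracy, PPV, TPR and their complements from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total   # share of all predictions that are correct
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    tpr = tp / (tp + fn)           # true positive rate (recall)
    return {
        "accuracy": accuracy,
        "ppv": ppv,
        "tpr": tpr,
        "fdr": 1.0 - ppv,          # false discovery rate (assumed error paired with PPV)
        "fnr": 1.0 - tpr,          # false negative rate (assumed error paired with TPR)
    }


# Hypothetical counts for illustration only; they are not taken from the paper.
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=895)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can look high when one class dominates, which is why PPV and TPR are usually reported next to it.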

Genetic Programming as Supervised Machine Learning Algorithm

Optimized Genetic Programming Applications - Advances in Medical Technologies and Clinical Practice ◽

10.4018/978-1-5225-6005-0.ch002 ◽

2018 ◽

pp. 48-101

Keyword(s):

Machine Learning ◽

Genetic Programming ◽

High Performance ◽

Learning Algorithm ◽

Confusion Matrix ◽

Computer Programs ◽

Multiclass Classification ◽

Supervised Machine Learning ◽

Learning Problems ◽

Machine Learning Algorithm

This chapter presents the theory and procedures behind supervised machine learning and shows how genetic programming can be applied as an effective machine learning algorithm. Owing to the simple and powerful concept of computer programs, genetic programming can solve many supervised machine learning problems, especially regression and classification. The chapter starts with the theory of supervised machine learning by describing the three main groups of modelling: regression, binary classification, and multiclass classification. Through those kinds of modelling, the most important performance parameters and skill scores are introduced. The chapter also describes the procedures for model evaluation and for constructing the confusion matrix for binary and multiclass classification. The second part describes in detail how to use genetic programming to build high-performance GP models for regression and classification. It also describes the procedure for generating computer programs for binary and multiclass classification problems by introducing the concept of a predefined root node.

A Scalable Machine Learning Pipeline for Paddy Rice Classification Using Multi-Temporal Sentinel Data

Remote Sensing ◽

10.3390/rs13091769 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1769

Author(s):

Vasileios Sitokonstantinou ◽

Alkiviadis Koukos ◽

Thanassis Drivas ◽

Charalampos Kontoes ◽

Ioannis Papoutsis ◽

...

Keyword(s):

Machine Learning ◽

Satellite Data ◽

High Performance ◽

Large Scale ◽

Paddy Rice ◽

Machine Learning Algorithms ◽

Classification Model ◽

Supervised Machine Learning ◽

Rice Area ◽

Multi Temporal

The demand for rice production in Asia is expected to increase by 70% in the next 30 years, which makes evident the need for balanced productivity and effective food security management at the national and continental level. Consequently, the timely and accurate mapping of paddy rice extent and the assessment of its productivity are of utmost significance. In turn, this requires continuous area monitoring and large-scale mapping, at the parcel level, through the processing of big satellite data of high spatial resolution. This work designs and implements a paddy rice mapping pipeline in South Korea that is based on a time series of Sentinel-1 and Sentinel-2 data for the year 2018. We address two challenges: the first is the ability of our model to manage big satellite data and scale to a nationwide application; the second is the algorithm's capacity to cope with scarce labeled data when training supervised machine learning algorithms. Specifically, we implement an approach that combines unsupervised and supervised learning. First, we generate pseudo-labels for rice classification from a single site (Seosan-Dangjin) by using a dynamic k-means clustering approach. The pseudo-labels are then used to train a Random Forest (RF) classifier that is fine-tuned to generalize to two other sites (Haenam and Cheorwon). The optimized model was then tested against 40 labeled plots, evenly distributed across the country. The paddy rice mapping pipeline is scalable, as it has been deployed in a High Performance Data Analytics (HPDA) environment using distributed implementations of both the k-means and RF classifiers. When tested across the country, our model provided an overall accuracy of 96.69% and a kappa coefficient of 0.87. Moreover, the accurate paddy rice area mapping was returned early in the year (late July), which is key for timely decision-making. Finally, the performance of the generalized paddy rice classification model, when applied to the sites of Haenam and Cheorwon, was compared to the performance of two equivalent models that were trained with locally sampled labels. The results were comparable and highlighted the success of the model's generalization and its applicability to other regions.
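
The pseudo-labelling idea described above (cluster one site without labels, then train a supervised model on the cluster assignments and apply it elsewhere) can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's pipeline: the single scalar feature and all values are hypothetical, a plain two-cluster k-means replaces the dynamic k-means, and a one-threshold decision stump stands in for the Random Forest.

```python
def kmeans_1d(values, iters=20):
    """Two-cluster k-means on scalars (assumes at least two distinct values)."""
    lo, hi = min(values), max(values)  # deterministic initial centroids
    labels = []
    for _ in range(iters):
        # Assign each value to the nearer centroid (0 = low cluster, 1 = high cluster).
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = sum(group0) / len(group0)
        new_hi = sum(group1) / len(group1)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels, (lo, hi)


def fit_stump(values, labels):
    """Find the threshold t that best reproduces the labels with 'predict 1 if v > t'."""
    candidates = sorted(set(values))
    best_t, best_err = None, None
    for a, b in zip(candidates, candidates[1:]):
        t = (a + b) / 2
        err = sum((v > t) != bool(lab) for v, lab in zip(values, labels))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


# Hypothetical per-parcel feature values for an unlabeled training site.
site_a = [0.10, 0.15, 0.20, 0.70, 0.80, 0.75]
pseudo_labels, _ = kmeans_1d(site_a)          # unsupervised step
threshold = fit_stump(site_a, pseudo_labels)  # supervised step on pseudo-labels

# Apply the trained stump to a second, unseen site (also hypothetical values).
site_b = [0.12, 0.68, 0.30, 0.90]
predictions = [int(v > threshold) for v in site_b]
print(pseudo_labels, predictions)
```

The supervised model never sees a hand-made label; its quality therefore depends on how well the clusters line up with the real classes, which is why the abstract reports a separate test against labeled plots.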

Predictive Modelling of Employee Turnover in Indian IT Industry Using Machine Learning Techniques

Vision The Journal of Business Perspective ◽

10.1177/0972262918821221 ◽

2019 ◽

Vol 23 (1) ◽

pp. 12-21 ◽

Cited By ~ 2

Author(s):

Shikha N. Khera ◽

Divya

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Confusion Matrix ◽

Predictive Modelling ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

It Industry ◽

Knowledge Based ◽

Employee Attrition

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to the companies. The aim of this research is to develop a model to predict employee attrition and give organizations the opportunity to address any issues and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, including the employees' employment status (response variable) at the time of collection. Results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. The results also show that the model performs better at predicting who will leave the firm than at predicting who will not.

Performance Improvement of Decision Tree: A Robust Classifier Using Tabu Search Algorithm

Applied Sciences ◽

10.3390/app11156728 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6728

Author(s):

Muhammad Asfand Hafeez ◽

Muhammad Rashid ◽

Hassan Tariq ◽

Zain Ul Abideen ◽

Saud S. Alotaibi ◽

...

Keyword(s):

Machine Learning ◽

Tabu Search ◽

Decision Tree ◽

Decision Trees ◽

Search Algorithm ◽

Learning Algorithms ◽

Performance Comparison ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Tabu Search Algorithm

Classification and regression are the major applications of machine learning algorithms, which are widely used to solve problems in numerous domains of engineering and computer science. Different classifiers based on the optimization of the decision tree have been proposed; however, the approach is still evolving. This paper presents a novel and robust classifier based on decision tree and tabu search algorithms. With the aim of improving performance, our proposed algorithm constructs multiple decision trees while employing a tabu search algorithm to consistently monitor the leaf and decision nodes in the corresponding decision trees. Additionally, the tabu search algorithm is responsible for balancing the entropy of the corresponding decision trees. For training the model, we used the clinical data of COVID-19 patients to predict whether a patient is suffering. The experimental results were obtained using our proposed classifier, built on the scikit-learn library in Python. An extensive performance comparison against conventional supervised machine learning algorithms is presented using Big O and statistical analysis. Moreover, a performance comparison with optimized state-of-the-art classifiers is also presented. The achieved accuracy of 98%, the required execution time of 55.6 ms, and the area under the receiver operating characteristic curve (AUROC) of 0.95 for the proposed method show that the proposed classifier is suitable for large datasets.

Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics

JCO Clinical Cancer Informatics ◽

10.1200/cci.19.00008 ◽

2019 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Tomasz Oliwa ◽

Steven B. Maron ◽

Leah M. Chase ◽

Samantha Lomnicki ◽

Daniel V.T. Catenacci ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Classification Model ◽

Supervised Machine Learning ◽

Named Entity ◽

Pathology Reports

PURPOSE Robust institutional tumor banks depend on continuous sample curation or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions. PATIENTS AND METHODS Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step. RESULTS We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance. CONCLUSION Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.

Analysis of Decision Tree Induction Algorithms

Research Society and Development ◽

10.33448/rsd-v8i11.1473 ◽

2019 ◽

Vol 8 (11) ◽

pp. e298111473

Author(s):

Hugo Kenji Rodrigues Okada ◽

Andre Ricardo Nascimento das Neves ◽

Ricardo Shitsuka

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Decision Trees ◽

Quantitative Study ◽

Data Structures ◽

Execution Time ◽

Supervised Machine Learning ◽

Decision Tree Induction ◽

Classification And Regression ◽

Cart Algorithm

Decision trees are data structures or computational methods that enable nonparametric supervised machine learning and are used in classification and regression tasks. The aim of this paper is to present a comparison between the decision tree induction algorithms C4.5 and CART. A quantitative study is performed in which the two methods are compared by analyzing the following aspects: operation and complexity. The experiments showed practically equal hit percentages; in the execution time for tree induction, however, the CART algorithm was approximately 46.24% slower than C4.5 and was considered to be more effective.
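
One concrete difference in how the two algorithms operate is the split criterion: C4.5 ranks candidate splits by entropy-based gain ratio, while CART (for classification) uses the decrease in Gini impurity. The sketch below scores a single binary split both ways; the function names and the toy labels are illustrative assumptions, not material from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(left, right):
    """Score one binary split with C4.5's gain ratio and CART's Gini decrease."""
    parent = left + right
    n = len(parent)
    w_left, w_right = len(left) / n, len(right) / n
    info_gain = entropy(parent) - (w_left * entropy(left) + w_right * entropy(right))
    # Split information: entropy of the partition sizes themselves.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    gini_decrease = gini(parent) - (w_left * gini(left) + w_right * gini(right))
    return gain_ratio, gini_decrease


# Toy example: a split that separates the two classes perfectly.
gain_ratio, gini_decrease = split_scores(["yes"] * 4, ["no"] * 4)
print(gain_ratio, gini_decrease)
```

Both criteria reward purer child nodes, so they often agree on the best split; the logarithms in the entropy calculation are one reason their costs per candidate split can differ.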

An Interpretable Machine Learning Model Enhanced Integrated CPU-GPU DVFS Governor

ACM Transactions on Embedded Computing Systems ◽

10.1145/3470974 ◽

2021 ◽

Vol 20 (6) ◽

pp. 1-28

Author(s):

Jurn-Gyu Park ◽

Nikil Dutt ◽

Sung-Soo Lim

Keyword(s):

Machine Learning ◽

Energy Efficiency ◽

High Performance ◽

Linear Models ◽

Piecewise Linear ◽

Prediction Errors ◽

Linear Regression Models ◽

Mobile Games ◽

Interpretable Machine Learning ◽

Mathematical Formulas

Modern heterogeneous CPU-GPU-based mobile architectures, which execute intensive mobile gaming/graphics applications, use software governors to achieve high performance with energy efficiency. However, existing governors typically use simple statistical or heuristic models that assume linear relationships and are built from a small, unbalanced dataset of mobile games; these limitations result in high prediction errors for dynamic and diverse gaming workloads on heterogeneous platforms. To overcome these limitations, we propose an interpretable machine learning (ML) model enhanced integrated CPU-GPU governor: (1) it builds tree-based piecewise linear models (i.e., model trees) offline, considering both high accuracy (low error) and interpretable ML models based on mathematical formulas using a simulatability operation counts quantitative metric; and (2) it deploys the selected models for online estimation in an integrated CPU-GPU Dynamic Voltage Frequency Scaling governor. Our experiments on a test set of 20 mobile games exhibiting diverse characteristics show that our governor achieved significant energy-efficiency gains, averaging over 10% (up to 38%) improvement in energy-per-frame, with a surprising-but-modest 3% improvement in Frames-per-Second performance, compared to a typical state-of-the-art governor that employs simple linear regression models.

Machine Learning-Based Coding Decision Making in H.265/HEVC CTU Division and Intra Prediction

International Journal of Mobile Computing and Multimedia Communications ◽

10.4018/ijmcmc.2020040103 ◽

2020 ◽

Vol 11 (2) ◽

pp. 41-60

Author(s):

Wenchan Jiang ◽

Ming Yang ◽

Ying Xie ◽

Zhigang Li

Keyword(s):

Machine Learning ◽

Video Coding ◽

High Efficiency ◽

Main Tool ◽

Decision Time ◽

Mode Decision ◽

Supervised Machine Learning ◽

High Efficiency Video Coding ◽

Neuron Network ◽

Prediction Mode

High efficiency video coding (HEVC) is regarded as the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. In this research project, in compliance with the H.265 standard, the authors focused on improving encoding/decoding performance by optimizing the partition of prediction blocks in the coding unit with the help of supervised machine learning. The authors used the Keras library as the main tool to implement the experiments. Key parameters were tuned for the convolutional neural network model. The coding tree unit mode decision time produced by the model was compared with that produced by the reference software for HEVC and was shown to have improved significantly. The intra-picture prediction mode decision was also investigated with a modified model and yielded satisfactory results.

2107. Decision Trees vs. Neural Networks for Supervised Machine Learning-Based Prediction of Healthcare-Associated Urinary Tract Infections

Open Forum Infectious Diseases ◽

10.1093/ofid/ofy210.1763 ◽

2018 ◽

Vol 5 (suppl_1) ◽

pp. S618-S618

Author(s):

Philip Zachariah ◽

Elioth Mirsha Sanabria Buenaventura ◽

Jianfang Liu ◽

Bevin Cohen ◽

David Yao ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Urinary Tract ◽

Decision Trees ◽

Urinary Tract Infections ◽

Supervised Machine Learning ◽

Tract Infections ◽

Healthcare Associated

The prototype device for non-invasive diagnosis of arteriovenous fistula condition using machine learning methods

Scientific Reports ◽

10.1038/s41598-020-72336-5 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Marcin Grochowina ◽

Lucyna Leniowska ◽

Agnieszka Gala-Błądzińska

Keyword(s):

Machine Learning ◽

Arteriovenous Fistula ◽

Low Cost ◽

Principal Component ◽

Special Kind ◽

Classification Model ◽

Supervised Machine Learning ◽

Signal Acquisition ◽

Non Invasive ◽

Prototype Device

Pattern recognition and automatic decision support methods provide significant advantages in the area of health protection. The aim of this work is to develop a low-cost tool for monitoring arteriovenous fistula (AVF) using the phono-angiography method. This article presents a diagnostic device that implements classification algorithms to identify, among 38 patients with end-stage renal disease who are chronically hemodialysed using an AVF, those at risk of vascular access stenosis. We report on the design, fabrication, and preliminary testing of a prototype device for non-invasive diagnosis, which is very important for hemodialysed patients. The system includes three sub-modules: AVF signal acquisition, information processing and classification, and a unit for presenting results. This is a non-invasive and inexpensive procedure for evaluating the sound pattern of the bruit produced by an AVF. A sound signal from the fistula was recorded with a special kind of head that has greater sensitivity than a conventional stethoscope. Signal acquisition was performed by dedicated software written specifically for the purpose of our study. From the obtained phono-angiogram, 23 features were isolated for the vectors used in a decision-making algorithm, including 6 features based on the time-domain waveform and 17 features based on the frequency spectrum. The final composition of the feature vector was obtained using several selection methods: feature-class correlation, forward search, Principal Component Analysis, and the Joined-Pairs method. A supervised machine learning technique was then applied to develop the best classification model.
