Detecting Illicit Entities in Bitcoin using Supervised Learning of Ensemble Decision Trees

2019 ◽

pp. 176-190

Author(s):

Maryna Nehrey ◽

Taras Hnot

Keyword(s):

Decision Making ◽

Logistic Regression ◽

Unsupervised Learning ◽

Decision Trees ◽

Supervised Learning ◽

Data Science ◽

Business Processes ◽

Business Decision ◽

Logistic Regression Models ◽

Using Data

Running a successful business means making decisions under uncertainty using large amounts of information. Modern modeling approaches based on data science algorithms are a necessity for the effective management of business processes in aviation. Data science comprises the principles, processes, and techniques for understanding business processes through the analysis of data. The main goal of this chapter is to improve decision making using data science algorithms. The chapter describes several frequently used algorithms: linear and logistic regression models and decision trees as classical examples of supervised learning, and k-means and hierarchical clustering as examples of unsupervised learning. Applying data science algorithms allows deeper analysis and understanding of business processes in aviation, helps structure problems, and systematizes business processes. Business process modeling based on data science algorithms makes it possible to substantiate solutions and even to automate business decision making.

Data Science Tools Application for Business Processes Modelling in Aviation

Research Anthology on Reliability and Safety in Aviation Systems, Spacecraft, and Air Transport ◽

10.4018/978-1-7998-5357-2.ch024 ◽

2021 ◽

pp. 617-631

Author(s):

Maryna Nehrey ◽

Taras Hnot

Keyword(s):

Decision Making ◽

Logistic Regression ◽

Unsupervised Learning ◽

Decision Trees ◽

Supervised Learning ◽

Data Science ◽

Business Processes ◽

Business Decision ◽

Logistic Regression Models ◽

Using Data

Running a successful business means making decisions under uncertainty using large amounts of information. Modern modeling approaches based on data science algorithms are a necessity for the effective management of business processes in aviation. Data science comprises the principles, processes, and techniques for understanding business processes through the analysis of data. The main goal of this chapter is to improve decision making using data science algorithms. The chapter describes several frequently used algorithms: linear and logistic regression models and decision trees as classical examples of supervised learning, and k-means and hierarchical clustering as examples of unsupervised learning. Applying data science algorithms allows deeper analysis and understanding of business processes in aviation, helps structure problems, and systematizes business processes. Business process modeling based on data science algorithms makes it possible to substantiate solutions and even to automate business decision making.

Weakly supervised learning with decision trees applied to fisheries acoustics.

2010 IEEE International Conference on Acoustics, Speech and Signal Processing ◽

10.1109/icassp.2010.5495855 ◽

2010 ◽

Cited By ~ 1

Author(s):

R. Lefort ◽

R. Fablet ◽

J-M. Boucher

Keyword(s):

Decision Trees ◽

Supervised Learning ◽

Weakly Supervised Learning ◽

Fisheries Acoustics ◽

Weakly Supervised

On Speed of Stochastic CART Model Search

Austrian Journal of Statistics ◽

10.17713/ajs.v36i1.318 ◽

2016 ◽

Vol 36 (1) ◽

Author(s):

Márton Ispány ◽

Ilona Krasznahorkay

Keyword(s):

Decision Trees ◽

Supervised Learning ◽

Mixing Time ◽

Bayesian Framework ◽

Stochastic Search ◽

Predictor Variables ◽

Mcmc Method ◽

Model Search ◽

Cart Model

Decision trees are commonly used nonlinear tools for supervised learning. The technique divides the space of the predictor variables into bricks so that the resulting partitions are as homogeneous as possible. We improved the CART method proposed by Breiman et al. (1984) using a stochastic search, first suggested by Chipman et al. (1998) in the Bayesian framework. In this paper, estimates are given for the rate of convergence and the mixing time of the MCMC method defined on decision trees.
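
The idea of dividing predictor space into homogeneous bricks can be illustrated with a minimal sketch of one greedy, axis-aligned split on a single predictor. This is the plain deterministic CART-style step, not the stochastic Bayesian search studied in the paper. The function names, the use of Gini impurity as the homogeneity measure, and the toy data are assumptions made for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, weighted_impurity) for the best split on one predictor.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    If no split improves on the unsplit impurity, the threshold is None.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best

# Toy data (hypothetical): two well-separated groups on one predictor.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(xs, ys)
print(threshold, impurity)
```

Applying the same step recursively to each side of the threshold would grow a full tree; a stochastic search instead proposes random changes to the tree and accepts or rejects them.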

Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer

Atmospheric Measurement Techniques ◽

10.5194/amt-10-695-2017 ◽

2017 ◽

Vol 10 (2) ◽

pp. 695-708 ◽

Cited By ~ 25

Author(s):

Simon Ruske ◽

David O. Topping ◽

Virginia E. Foot ◽

Paul H. Kaye ◽

Warren R. Stanley ◽

...

Keyword(s):

Neural Networks ◽

Decision Trees ◽

Supervised Learning ◽

Ensemble Methods ◽

Gradient Boosting ◽

Support Vector ◽

Data Sets ◽

Data Set ◽

Shape Information ◽

Accuracy Of Measurements

Abstract. Characterisation of bioaerosols has important implications within environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we tested hierarchical agglomerative clustering with various different linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations for support vector machines (libsvm and liblinear) and Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis, the k-nearest neighbours algorithm and artificial neural networks). The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67.6 and 91.1 % for the two data sets respectively. For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 82.8 and 98.27 % of the testing data, respectively, across the two data sets. A possible alternative to gradient boosting is neural networks. We do however note that this method requires much more user input than the other methods, and we suggest that further research should be conducted using this method, especially using parallelised hardware such as the GPU, which would allow for larger networks to be trained, which could possibly yield better results. We also saw that some methods, such as clustering, failed to utilise the additional shape information provided by the instrument, whilst for others, such as the decision trees, ensemble methods and neural networks, improved performance could be attained with the inclusion of such information.

Security in Mission Critical Communication Systems

Advances in Wireless Technologies and Telecommunication - Multimedia Services and Applications in Mission Critical Communication Systems ◽

10.4018/978-1-5225-2113-6.ch012 ◽

2017 ◽

pp. 270-291 ◽

Cited By ~ 3

Author(s):

Karen Medhat ◽

Rabie A. Ramadan ◽

Ihab Talkhan

Keyword(s):

Wireless Communications ◽

Decision Tree ◽

Unsupervised Learning ◽

Decision Trees ◽

Supervised Learning ◽

Binary Tree ◽

Communication Systems ◽

Clustering Algorithm ◽

Critical Systems ◽

Mission Critical

This chapter introduces two algorithms that detect intrusions in mission critical communication systems in order to guarantee their security. The first is a classification algorithm that applies supervised learning. The second is a clustering algorithm that applies unsupervised learning. Both algorithms detect intrusions using a set of detection rules structured as decision trees. The algorithms are described in detail and their results on a well-known dataset are presented. An enhancement of the J48 algorithm is also introduced, in which the algorithm's decision tree is changed to a binary tree. The change reduces the complexity of reaching a decision. The chapter includes a brief introduction to security in mission critical systems and the reasons for securing such systems. It also reviews the methodologies that have been proposed to detect intrusions in wireless communications.

Supervised Learning: Decision Trees, Rule Algorithms, and Their Hybrids

Data Mining ◽

10.1007/978-0-387-36795-8_12 ◽

2007 ◽

pp. 381-417 ◽

Cited By ~ 2

Author(s):

Krzysztof J. Cios ◽

Roman W. Swiniarski ◽

Witold Pedrycz ◽

Lukasz A. Kurgan

Keyword(s):

Decision Trees ◽

Supervised Learning

Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria

2013 IEEE International Conference on Computer Vision ◽

10.1109/iccv.2013.232 ◽

2013 ◽

Cited By ~ 4

Author(s):

Christoph Straehle ◽

Ullrich Koethe ◽

Fred A. Hamprecht

Keyword(s):

Decision Trees ◽

Supervised Learning ◽

Weakly Supervised Learning ◽

Weakly Supervised

A Hybrid Cancer Prognosis System Based on Semi-Supervised Learning and Decision Trees

Neural Information Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-642-42042-9_79 ◽

2013 ◽

pp. 640-648 ◽

Cited By ~ 1

Author(s):

Yonghyun Nam ◽

Hyunjung Shin

Keyword(s):

Decision Trees ◽

Supervised Learning ◽

Cancer Prognosis

Prediksi Waktu Sandar Kapal Di Pelabuhan Batu Ampar, Kota Batam, Provinsi Kepulauan Riau [Prediction of Ship Berthing Time at Batu Ampar Port, Batam City, Riau Islands Province]

Jurnal Sistem Cerdas ◽

10.37396/jsc.v2i2.26 ◽

2019 ◽

Vol 2 (2) ◽

pp. 119-134

Author(s):

Saiful Rizal ◽

Candra Kurniawan ◽

Fahrur Rozi

Keyword(s):

Data Mining ◽

Decision Tree ◽

Decision Trees ◽

Supervised Learning ◽

Multiple Regression ◽

Analysis Of Variance ◽

Dwelling Time

Batu Ampar Port is the largest cargo port in Batam City and has the highest traffic for both export and import activities. Dwelling time remains a problem in port services and is one indicator of how efficiently a port is managed. In the first quarter of 2015, the average dwelling time at Batu Ampar Port was 7 days for unloading and 5 days for loading. As a result, the port's performance still draws many complaints and long queues of ships build up. An analysis is therefore needed to produce a model that describes dwelling time at the port, and to evaluate the analytical model that has been built. Secondary data from Batu Ampar Port were analysed using data mining. The data mining methods used supervised learning algorithms, namely multiple regression and decision trees. The general purpose of multiple regression is to learn more about the relationship between several independent (predictor) variables and a dependent (criterion) variable. The decision trees used to explore the port data perform classification. Decision tree classification can reveal whether the data contain well-separated classes of objects, so that the classes can be interpreted meaningfully in the context of substantive theory. Two model evaluation methods were applied to the two models built. An analysis of variance (ANOVA) test was used to evaluate the multiple regression model, while the decision tree model was evaluated with a confusion matrix. The results show that the time a ship spends loading or unloading is influenced by three variables: expedition type, flag, and volume. Multiple regression yielded a model for predicting ship berthing time. The model evaluation shows that the model is significant: at a 95% confidence level, the predictive model represents the true value. For the decision tree, the evaluation shows that the model fits, with a precision of 84.50%.
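
As a rough illustration of evaluating a classifier with a confusion matrix and reading precision off it, here is a minimal sketch using only the Python standard library. The class labels ("long" and "short" dwelling time), the toy predictions, and the function names are hypothetical and are not taken from the paper; the sketch does not reproduce the reported 84.50% figure.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs; missing pairs count as 0."""
    return Counter(zip(actual, predicted))

def precision(cm, positive):
    """Precision for one class: true positives / all predictions of that class."""
    true_pos = cm[(positive, positive)]
    predicted_pos = sum(count for (_, pred), count in cm.items() if pred == positive)
    return true_pos / predicted_pos if predicted_pos else 0.0

# Hypothetical labels: whether a ship's dwelling time was "long" or "short".
actual    = ["long", "long",  "short", "short", "long", "short"]
predicted = ["long", "short", "short", "long",  "long", "short"]

cm = confusion_matrix(actual, predicted)
p = precision(cm, "long")
print(dict(cm))
print(f"precision for 'long': {p:.2%}")
```

Here two of the three "long" predictions are correct, so the precision for that class is two thirds.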
