Detecting Illicit Entities in Bitcoin using Supervised Learning of Ensemble Decision Trees

Author(s):  
Pranav Nerurkar ◽  
Yann Busnel ◽  
Romaric Ludinard ◽  
Kunjal Shah ◽  
Sunil Bhirud ◽  
...  
Author(s):  
Maryna Nehrey ◽  
Taras Hnot

Successful business involves making decisions under uncertainty using large amounts of information. Modern modeling approaches based on data science algorithms are necessary for the effective management of business processes in aviation. Data science comprises principles, processes, and techniques for understanding business processes through the analysis of data. The main goal of this chapter is to improve decision making using data science algorithms. The chapter describes several frequently used algorithms: linear and logistic regression models and decision trees as classical examples of supervised learning, and k-means and hierarchical clustering as examples of unsupervised learning. Applying data science algorithms enables deep analysis and understanding of business processes in aviation, helps structure problems, and systematizes business processes. Business process modeling based on data science algorithms makes it possible to substantiate solutions and even automate business decision making.


2016 ◽  
Vol 36 (1) ◽  
Author(s):  
Márton Ispány ◽  
Ilona Krasznahorkay

Decision trees are commonly used nonlinear tools for supervised learning. The technique divides the space of the predictor variables into bricks so that the resulting partitions are as homogeneous as possible. We improved the CART method proposed by Breiman et al. (1984) using a stochastic search, first suggested by Chipman et al. (1998) in the Bayesian framework. In this paper, estimates are given for the rate of convergence and the mixing time of the MCMC method defined on decision trees.
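
As a rough illustration of the partitioning idea only (not the stochastic search or the MCMC method of the paper), the Python sketch below finds the single axis-aligned split that minimises the weighted Gini impurity, which is the greedy step that CART repeats recursively. The function names `gini` and `best_split` and the toy data are hypothetical and do not come from the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best split.

    rows is a list of equal-length numeric feature tuples. Candidate
    thresholds are midpoints between consecutive distinct feature values.
    """
    n = len(rows)
    best = (None, None, gini(labels))  # keep "no split" unless a split improves on it
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best

# Toy data (assumed): the class depends only on the first feature.
rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (6.0, 2.0), (7.0, 6.0), (8.0, 3.0)]
labels = ["a", "a", "a", "b", "b", "b"]
feature, threshold, impurity = best_split(rows, labels)
print(feature, threshold, impurity)  # prints: 0 4.5 0.0
```

On this toy data the split on the first feature at 4.5 yields two pure bricks, so the weighted impurity drops to zero. A full tree would apply the same search again inside each brick.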


2017 ◽  
Vol 10 (2) ◽  
pp. 695-708 ◽  
Author(s):  
Simon Ruske ◽  
David O. Topping ◽  
Virginia E. Foot ◽  
Paul H. Kaye ◽  
Warren R. Stanley ◽  
...  

Abstract. Characterisation of bioaerosols has important implications within the environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real-world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we tested hierarchical agglomerative clustering with various linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations of support vector machines (libsvm and liblinear), Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis), the k-nearest neighbours algorithm and artificial neural networks. The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67.6 and 91.1 % for the two data sets respectively. For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 82.8 and 98.27 % of the testing data, respectively, across the two data sets. A possible alternative to gradient boosting is neural networks. We do however note that this method requires much more user input than the other methods, and we suggest that further research should be conducted using this method, especially using parallelised hardware such as the GPU, which would allow for larger networks to be trained and could possibly yield better results. We also saw that some methods, such as clustering, failed to utilise the additional shape information provided by the instrument, whilst for others, such as the decision trees, ensemble methods and neural networks, improved performance could be attained with the inclusion of such information.


Author(s):  
Karen Medhat ◽  
Rabie A. Ramadan ◽  
Ihab Talkhan

This chapter introduces two different algorithms for detecting intrusions in mission-critical communication systems in order to guarantee their security. The first is a classification algorithm that applies the concept of supervised learning. The second is a clustering algorithm that applies the concept of unsupervised learning. The algorithms detect intrusions using a set of detection rules structured in the form of decision trees. The algorithms are described in detail and their results on a well-known dataset are presented. An enhancement of the J48 algorithm is also introduced, in which the decision tree for the algorithm is changed to a binary tree. The change improves the complexity of reaching a decision. The chapter includes a brief introduction to security in mission-critical systems and the reasons for securing such systems. It also reviews different methodologies that have been introduced to detect intrusions in wireless communications.


Data Mining ◽  
2007 ◽  
pp. 381-417 ◽  
Author(s):  
Krzysztof J. Cios ◽  
Roman W. Swiniarski ◽  
Witold Pedrycz ◽  
Lukasz A. Kurgan

2019 ◽  
Vol 2 (2) ◽  
pp. 119-134
Author(s):  
Saiful Rizal ◽  
Candra Kurniawan ◽  
Fahrur Rozi

Batu Ampar Port is the largest cargo port in Batam City and has the highest traffic for both export and import activities. Dwelling time remains a problem in port services and is one indicator of port management efficiency. The average dwelling time at Batu Ampar Port in the first quarter of 2015 was 7 days for unloading and 5 days for loading. As a result, the port's performance still draws many complaints and ships face long queues. An analysis is therefore needed to produce a model that describes dwelling time at the port, and to evaluate the analytical model that is built. Secondary data from Batu Ampar Port were analysed using data mining. The data mining methods used are supervised learning algorithms, namely multiple regression and decision trees. The general purpose of multiple regression is to learn more about the relationship between several independent (predictor) variables and a dependent (criterion) variable. The decision trees used to explore the port data perform classification. Decision tree classification can reveal whether the data contain well-separated classes of objects, so that the classes can be interpreted meaningfully in the context of substantive theory. Two model evaluation methods were applied to the two models built. An analysis of variance (ANOVA) test was used to evaluate the multiple regression model, while the decision tree model was evaluated with a confusion matrix. The results show that the time a ship takes to unload or load is influenced by three variables: expedition type, flag, and volume. Multiple regression yielded a model for predicting ship berthing time. The model evaluation shows that the model is significant. At a 95% confidence level, the predictive model represents the actual values. For the decision tree, the evaluation shows that the model fits, with a precision of 84.50%.
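
To make the confusion-matrix evaluation concrete, here is a minimal Python sketch of how precision is derived from the matrix counts for one class of interest. The abstract does not say how its precision figure was computed (for example, whether it is per-class or averaged), so the binary setup, the class labels "long" and "short", and the sample predictions are all assumptions for illustration.

```python
def confusion_counts(actual, predicted, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn

def precision(tp, fp):
    """Share of positive predictions that are correct: tp / (tp + fp)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical dwelling-time classes for eight ships (not data from the study).
actual = ["long", "long", "short", "short", "long", "short", "short", "long"]
predicted = ["long", "short", "short", "long", "long", "short", "short", "long"]

tp, fp, fn, tn = confusion_counts(actual, predicted, positive="long")
print(tp, fp, fn, tn)     # prints: 3 1 1 3
print(precision(tp, fp))  # prints: 0.75
```

Precision looks only at the predicted-positive column of the matrix, so a model can score well on it while still missing many true positives. Recall, tp / (tp + fn), covers that complementary view.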

