Synchronous and Open, Real World, Vehicle, ADAS, and Infrastructure Data Streams for Automotive Machine Learning Algorithms Research

Abstract

Background: Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making.

Methods: This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods, and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist.

Results: A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. Diverse methods, statistical packages, and approaches were used across the identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation, but only two conducted external validation. Most studies used one algorithm, and only eight applied multiple machine learning algorithms to the data. Seven items on the Luo checklist were not met by more than 50% of published studies.

Conclusions: A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. Multiple machine learning approaches should be used, the model selection strategy should be clearly defined, and both internal and external validation are necessary to ensure that decisions for patient care are made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.

Online Machine Learning Algorithms over Data Streams

Encyclopedia of Big Data Technologies ◽

10.1007/978-3-319-63962-8_329-1 ◽

2018 ◽

pp. 1-9

Author(s):

András A. Benczúr ◽

Levente Kocsis ◽

Róbert Pálovics

Keyword(s):

Machine Learning ◽

Data Streams ◽

Learning Algorithms ◽

Machine Learning Algorithms

Supervised Machine Learning Algorithms for Credit Card Fraudulent Transaction Detection

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit195274 ◽

2019 ◽

pp. 394-401

Author(s):

Karthik R ◽

Navinkumar R ◽

Rammkumar U ◽

Mothilal K. C.

Keyword(s):

Machine Learning ◽

Real World ◽

Credit Card ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Transaction Data ◽

Online Transactions ◽

Spending Behavior ◽

Financial Transactions

Cashless transactions such as online transactions, credit card transactions, and mobile wallet payments are becoming more popular in financial transactions. As the number of cashless transactions grows, the number of fraudulent transactions is also increasing. Fraud can be distinguished by analyzing the spending behavior of customers (users) in previous transaction data. Publicly available credit card fraud datasets are highly imbalanced. In this paper, we apply many supervised machine learning algorithms to detect fraudulent credit card transactions using a real-world dataset. Furthermore, we employ these algorithms to implement a super classifier using ensemble learning methods. We identify the most important variables that may lead to higher accuracy in detecting fraudulent credit card transactions. Additionally, we compare and discuss the performance of various supervised machine learning algorithms from the literature against the super classifier implemented in this paper.
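The abstract does not say which ensemble method the super classifier uses. As a minimal sketch, assuming simple majority voting over base classifiers, the idea can be illustrated as follows. The record fields (`amount`, `foreign`, `hour`) and the rule-based base classifiers are hypothetical stand-ins for trained models, not the paper's variables. Precision and recall are computed because plain accuracy can be misleading on highly imbalanced fraud data.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Transaction = Dict[str, float]             # hypothetical record layout
Classifier = Callable[[Transaction], int]  # returns 1 for fraud, 0 for legitimate


def majority_vote(classifiers: Sequence[Classifier], tx: Transaction) -> int:
    """Combine base classifiers by majority; ties resolve to legitimate (0)."""
    votes = Counter(clf(tx) for clf in classifiers)
    return 1 if votes[1] > votes[0] else 0


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Precision and recall for the fraud class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical rule-based stand-ins for trained base models (assumption).
BASE_CLASSIFIERS: List[Classifier] = [
    lambda tx: int(tx["amount"] > 1000.0),  # unusually large amount
    lambda tx: int(tx["foreign"] == 1.0),   # foreign merchant flag
    lambda tx: int(tx["hour"] < 6.0),       # night-time transaction
]

# Toy data, invented for illustration only.
TRANSACTIONS: List[Transaction] = [
    {"amount": 1500.0, "foreign": 1.0, "hour": 3.0},
    {"amount": 20.0, "foreign": 0.0, "hour": 14.0},
    {"amount": 2000.0, "foreign": 0.0, "hour": 12.0},
    {"amount": 50.0, "foreign": 1.0, "hour": 2.0},
]
LABELS: List[int] = [1, 0, 0, 1]

if __name__ == "__main__":
    preds = [majority_vote(BASE_CLASSIFIERS, tx) for tx in TRANSACTIONS]
    print("predictions:", preds)
    print("precision, recall:", precision_recall(LABELS, preds))
```

In practice each base classifier would be a trained model, and weighted voting or stacking are alternative ways to combine them.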

Classification and Analysis of Real-World Earthquake Data Using Various Machine Learning Algorithms

Lecture Notes in Electrical Engineering - Advances in Data Sciences, Security and Applications ◽

10.1007/978-981-15-0372-6_1 ◽

2019 ◽

pp. 1-14

Author(s):

Manka Vasti ◽

Amita Dev

Keyword(s):

Machine Learning ◽

Real World ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Earthquake Data

Employing traditional machine learning algorithms for big data streams analysis: The case of object trajectory prediction

Journal of Systems and Software ◽

10.1016/j.jss.2016.06.016 ◽

2017 ◽

Vol 127 ◽

pp. 249-257 ◽

Cited By ~ 25

Author(s):

Angelos Valsamis ◽

Konstantinos Tserpes ◽

Dimitrios Zissis ◽

Dimosthenis Anagnostopoulos ◽

Theodora Varvarigou

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Streams ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Trajectory Prediction ◽

Big Data Streams

Performance of Machine Learning Algorithms and Diversity in Data

MATEC Web of Conferences ◽

10.1051/matecconf/201821004019 ◽

2018 ◽

Vol 210 ◽

pp. 04019 ◽

Cited By ~ 1

Author(s):

Hyontai SUG

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Real World ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Real World Data ◽

Random Data ◽

Data Set ◽

World Data

Recent Go matches between human players and the artificial intelligence AlphaGo showed the major advances in machine learning technologies. While AlphaGo was trained using real-world data, AlphaGo Zero was trained using massive random data, and the fact that AlphaGo Zero defeated AlphaGo decisively revealed that diversity and size of training data are important for better performance of machine learning algorithms, especially deep learning algorithms based on neural networks. Artificial neural networks and decision trees, meanwhile, are widely accepted machine learning algorithms because of their robustness to errors and their comprehensibility, respectively. In this paper, to show empirically that diversity and size of data are important factors for better performance of machine learning algorithms, these two representative algorithms are used in an experiment. A real-world data set called Breast Tissue was chosen because it consists of real numbers, a property well suited to artificial random data generation. The results of the experiment showed that the diversity and size of data are very important factors for better performance.

Use of Supervised Machine Learning for GNSS Signal Spoofing Detection with Validation on Real-World Meaconing and Spoofing Data—Part II

Sensors ◽

10.3390/s20071806 ◽

2020 ◽

Vol 20 (7) ◽

pp. 1806

Author(s):

Silvio Semanjski ◽

Ivana Semanjski ◽

Wim De Wilde ◽

Sidharta Gautama

Keyword(s):

Machine Learning ◽

Real World ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Added Value ◽

Supervised Machine Learning ◽

Training Dataset ◽

Support Vector ◽

Correlation Pattern ◽

The Real

Global Navigation Satellite System (GNSS) meaconing and spoofing are considered key threats to Safety-of-Life (SoL) applications, which mostly rely on open service (OS) signals without signal-level or data-level protection. While a number of pre-correlation and post-correlation techniques have been proposed so far, the possible use of supervised machine learning algorithms to detect GNSS meaconing and spoofing is currently being examined. One supervised machine learning algorithm, Support Vector Machine classification (C-SVM), is proposed for use at the GNSS receiver level because a number of measurements and observables exist at that stage of signal processing. It is possible to establish the correlation pattern among those GNSS measurements and observables and monitor it with C-SVM classification, the results of which we present in this paper. By adding real-world spoofing and meaconing datasets to the laboratory-generated spoofing datasets at the training stage of the C-SVM, we complement the experiments and results obtained in Part I of this paper, where the training was conducted solely with laboratory-generated spoofing datasets. In two experiments presented in this paper, the C-SVM algorithm was cross-fed with the real-world meaconing and spoofing datasets, such that the meaconing addition to the training was validated by the spoofing dataset, and vice versa. The comparative analysis of all four experiments presented in this paper shows promising results in two aspects: (i) the added value of the training dataset enrichment seems to be relevant for detecting real-world GNSS signal manipulation attempts and (ii) the C-SVM-based approach seems to be promising for detecting GNSS signal manipulation attempts, as well as in the context of potential federated learning applications.

Analysis of Various Machine Learning Algorithms for Enhanced Opinion Mining Using Twitter Data Streams

2016 International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE) ◽

10.1109/icmete.2016.19 ◽

2016 ◽

Cited By ~ 5

Author(s):

Praveen Kumar ◽

Tanupriya Choudhury ◽

Seema Rawat ◽

Shobhna Jayaraman

Keyword(s):

Machine Learning ◽

Data Streams ◽

Opinion Mining ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Twitter Data

Recognizing Power-law Graphs by Machine Learning Algorithms using a Reduced Set of Structural Features

10.5753/eniac.2019.9319 ◽

2019 ◽

Cited By ~ 1

Author(s):

Alane Lima ◽

André Vignatti ◽

Murilo Silva

Keyword(s):

Machine Learning ◽

Power Law ◽

Real World ◽

Optimization Problems ◽

Learning Algorithms ◽

Structural Features ◽

Machine Learning Algorithms ◽

Graph Representation ◽

Gradient Boosting ◽

Graph Properties

The empirical study of large real-world networks over the last 20 years has shown that a variety of real-world graphs are power-law. There is evidence that optimization problems seem easier in these graphs; however, for a given graph, classifying it as power-law is a problem in itself. In this work, we propose using machine learning algorithms (KNN, SVM, Gradient Boosting and Random Forests) for this task. We suggest a graph representation based on [Canning et al. 2018], but using a much simplified set of structural properties of the graph. We compare the proposed representation with the one generated by the graph2vec framework. The experiments attained high accuracy, indicating that a reduced set of structural graph properties is enough for the presented problem.
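The abstract does not list the structural properties in the reduced set. As a rough sketch of what a compact, degree-based feature vector might look like, the function below computes four illustrative features from an edge list. The feature names and the choice of features are assumptions made for this example, not the representation used in the paper. Heavy-tailed degree distributions typically show a maximum degree and degree variance that are large relative to the mean degree, which is why such features are plausible classifier inputs.

```python
from collections import Counter
from statistics import mean, pvariance
from typing import Dict, Iterable, Tuple

Edge = Tuple[int, int]


def degree_features(edges: Iterable[Edge]) -> Dict[str, float]:
    """Summarize an undirected graph by a few degree statistics.

    Only nodes that appear in at least one edge are counted, so
    isolated nodes are ignored in this sketch.
    """
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1  # a self-loop (u == v) contributes 2 to the degree
    degrees = list(degree.values())
    if not degrees:
        raise ValueError("graph has no edges")
    return {
        "nodes": float(len(degrees)),
        "mean_degree": float(mean(degrees)),
        "max_degree": float(max(degrees)),
        "degree_variance": float(pvariance(degrees)),
    }


if __name__ == "__main__":
    # A star graph: one hub connected to four leaves.
    star = [(0, 1), (0, 2), (0, 3), (0, 4)]
    print(degree_features(star))
```

A vector like this, computed for each graph in a labeled collection, could be passed to any of the classifiers named in the abstract.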

Supporting Real World Decision Making in Coronary Diseases Using Machine Learning

INQUIRY The Journal of Health Care Organization Provision and Financing ◽

10.1177/0046958021997338 ◽

2021 ◽

Vol 58 ◽

pp. 004695802199733

Author(s):

Peter Kokol ◽

Jan Jurman ◽

Tajda Bogovič ◽

Tadej Završnik ◽

Jernej Završnik ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Decision Making ◽

Decision Trees ◽

Real World ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Second Phase ◽

Learning Approaches ◽

Support Decision Making

Cardiovascular diseases are one of the leading global causes of death. Following the positive experiences with machine learning in medicine, we performed a study in which we assessed how machine learning can support decision making regarding coronary artery diseases. While a plethora of studies have reported high accuracy rates of machine learning algorithms (MLA) in medical applications, the majority of these studies used cleansed medical databases without the presence of “real world noise.” In contrast, the aim of our study was to perform machine learning on the routinely collected Anonymous Cardiovascular Database (ACD), extracted directly from a hospital information system of the University Medical Centre Maribor. Many studies used tens of different machine learning approaches with substantially varying results regarding accuracy (ACU), hence they were not usable as a base to validate the results of our study. Thus, we decided to perform our study in 2 phases. During the first phase we trained the different MLAs on a comparable dataset, the University of California Irvine (UCI) Heart Disease Dataset. The aim of this phase was first to define the “standard” ACU values and second to reduce the set of all MLAs to the most appropriate candidates to be used on the ACD during the second phase. Seven MLAs were selected, and the standard ACU for the 2-class diagnosis was 0.85. Surprisingly, the same MLAs achieved ACUs of around 0.96 on the ACD. A general comparison of both databases revealed that the performance of different machine learning algorithms differs significantly. The accuracy on the ACD reached the highest levels using decision trees and neural networks, while linear regression and AdaBoost performed best on the UCI database. This might indicate that decision-tree-based algorithms and neural networks are better at coping with real-world clinical data that is not “noise free” and could successfully support decision making concerned with coronary diseases.
