Synchronous and Open, Real World, Vehicle, ADAS, and Infrastructure Data Streams for Automotive Machine Learning Algorithms Research

2020 ◽  
Author(s):  
Aaron I. Rabinowitz ◽  
Tushar Gaikwad ◽  
Samantha White ◽  
Thomas Bradley ◽  
Zachary Asher
2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Alan Brnabic ◽  
Lisa M. Hess

Abstract Background Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making. Methods This systematic literature review was conducted to identify published observational research of employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. There were diverse methods, statistical packages and approaches used across identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies. Conclusions A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, the model selection strategy is clearly defined, and both internal and external validation are necessary to be sure that decisions for patient care are being made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.


Author(s):  
András A. Benczúr ◽  
Levente Kocsis ◽  
Róbert Pálovics

Author(s):  
Karthik R ◽  
Navinkumar R ◽  
Rammkumar U ◽  
Mothilal K. C.

Cashless transactions such as online transactions, credit card transactions, and mobile wallet are becoming more popular in financial transactions nowadays. With increased number of such cashless transaction, number of fraudulent transactions is also increasing. Fraud can be distinguished by analyzing spending behavior of customers (users) from previous transaction data. Credit card fraud has highly imbalanced publicly available datasets. In this paper, we apply many supervised machine learning algorithms to detect credit card fraudulent transactions using a real-world dataset. Furthermore, we employ these algorithms to implement a super classifier using ensemble learning methods. We identify the most important variables that may lead to higher accuracy in credit card fraudulent transaction detection. Additionally, we compare and discuss the performance of various supervised machine learning algorithms that exist in literature against the super classifier that we implemented in this paper.


2017 ◽  
Vol 127 ◽  
pp. 249-257 ◽  
Author(s):  
Angelos Valsamis ◽  
Konstantinos Tserpes ◽  
Dimitrios Zissis ◽  
Dimosthenis Anagnostopoulos ◽  
Theodora Varvarigou

2018 ◽  
Vol 210 ◽  
pp. 04019 ◽  
Author(s):  
Hyontai SUG

Recent world events in go games between human and artificial intelligence called AlphaGo showed the big advancement in machine learning technologies. While AlphaGo was trained using real world data, AlphaGo Zero was trained using massive random data, and the fact that AlphaGo Zero won AlphaGo completely revealed that diversity and size in training data is important for better performance for the machine learning algorithms, especially in deep learning algorithms of neural networks. On the other hand, artificial neural networks and decision trees are widely accepted machine learning algorithms because of their robustness in errors and comprehensibility respectively. In this paper in order to prove that diversity and size in data are important factors for better performance of machine learning algorithms empirically, the two representative algorithms are used for experiment. A real world data set called breast tissue was chosen, because the data set consists of real numbers that is very good property for artificial random data generation. The result of the experiment proved the fact that the diversity and size of data are very important factors for better performance.


Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1806
Author(s):  
Silvio Semanjski ◽  
Ivana Semanjski ◽  
Wim De Wilde ◽  
Sidharta Gautama

Global Navigation Satellite System (GNSS) meaconing and spoofing are being considered as the key threats to the Safety-of-Life (SoL) applications that mostly rely upon the use of open service (OS) signals without signal or data-level protection. While a number of pre and post correlation techniques have been proposed so far, possible utilization of the supervised machine learning algorithms to detect GNSS meaconing and spoofing is currently being examined. One of the supervised machine learning algorithms, the Support Vector Machine classification (C-SVM), is proposed for utilization at the GNSS receiver level due to fact that at that stage of signal processing, a number of measurements and observables exists. It is possible to establish the correlation pattern among those GNSS measurements and observables and monitor it with use of the C-SVM classification, the results of which we present in this paper. By adding the real-world spoofing and meaconing datasets to the laboratory-generated spoofing datasets at the training stage of the C-SVM, we complement the experiments and results obtained in Part I of this paper, where the training was conducted solely with the use of laboratory-generated spoofing datasets. In two experiments presented in this paper, the C-SVM algorithm was cross-fed with the real-world meaconing and spoofing datasets, such that the meaconing addition to the training was validated by the spoofing dataset, and vice versa. The comparative analysis of all four experiments presented in this paper shows promising results in two aspects: (i) the added value of the training dataset enrichment seems to be relevant for real-world GNSS signal manipulation attempt detection and (ii) the C-SVM-based approach seems to be promising for GNSS signal manipulation attempt detection, as well as in the context of potential federated learning applications.


Author(s):  
Alane Lima ◽  
André Vignatti ◽  
Murilo Silva

The empirical study of large real world networks in the last 20 years showed that a variety of real-world graphs are power-law. There are evidence that optimization problems seem easier in these graphs; however, for a given graph, classifying it as power-law is a problem in itself. In this work, we propose using machine learning algorithms (KNN, SVM, Gradient Boosting and Random Forests) for this task. We suggest a graph representation based on [Canning et al. 2018], but using a much simplified set of structural properties of the graph. We compare the proposed representation with the one generated by the graph2vec framework. The experiments attained high accuracy, indicating that a reduced set of structural graph properties is enough for the presented problem.


Author(s):  
Peter Kokol ◽  
Jan Jurman ◽  
Tajda Bogovič ◽  
Tadej Završnik ◽  
Jernej Završnik ◽  
...  

Cardiovascular diseases are one of the leading global causes of death. Following the positive experiences with machine learning in medicine we performed a study in which we assessed how machine learning can support decision making regarding coronary artery diseases. While a plethora of studies reported high accuracy rates of machine learning algorithms (MLA) in medical applications, the majority of the studies used the cleansed medical data bases without the presence of the “real world noise.” Contrary, the aim of our study was to perform machine learning on the routinely collected Anonymous Cardiovascular Database (ACD), extracted directly from a hospital information system of the University Medical Centre Maribor). Many studies used tens of different machine learning approaches with substantially varying results regarding accuracy (ACU), hence they were not usable as a base to validate the results of our study. Thus, we decided, that our study will be performed in the 2 phases. During the first phase we trained the different MLAs on a comparable University of California Irvine UCI Heart Disease Dataset. The aim of this phase was first to define the “standard” ACU values and second to reduce the set of all MLAs to the most appropriate candidates to be used on the ACD, during the second phase. Seven MLAs were selected and the standard ACUs for the 2-class diagnosis were 0.85. Surprisingly, the same MLAs achieved the ACUs around 0.96 on the ACD. A general comparison of both databases revealed that different machine learning algorithms performance differ significantly. The accuracy on the ACD reached the highest levels using decision trees and neural networks while Liner regression and AdaBoost performed best in UCI database. This might indicate that decision trees based algorithms and neural networks are better in coping with real world not “noise free” clinical data and could successfully support decision making concerned with coronary diseasesmachine learning.


Sign in / Sign up

Export Citation Format

Share Document