Deep Ensemble Neural Network Approach for Federal Highway Administration Axle-Based Vehicle Classification Using Advanced Single Inductive Loops

The Federal Highway Administration (FHWA) vehicle classification scheme is designed to serve various transportation needs such as pavement design, emission estimation, and transportation planning. Many transportation agencies rely on Weigh-In-Motion and Automatic Vehicle Classification sites to collect these essential vehicle classification counts. However, the spatial coverage of these detection sites across the highway network is limited by high installation and maintenance costs. One cost-effective approach has been the use of single inductive loop sensors as an alternative to obtaining FHWA vehicle classification data. However, most data sets used to develop such models are skewed since many classes associated with larger truck configurations are less commonly observed in the roadway network. This makes it more difficult to accurately classify under-represented classes, even though many of these minority classes may have disproportionately adverse effects on pavement infrastructure and the environment. Therefore, previous models have been unable to adequately classify under-represented classes, and the overall performance of the models is often masked by excellent classification accuracy of majority classes, such as passenger vehicles and five-axle tractor-trailers. To resolve the challenge of imbalanced data sets in the FHWA vehicle classification, this paper constructed a bootstrap aggregating deep neural network model on a truck-focused data set using single inductive loop signatures. The proposed method significantly improved the model performance on several truck classes, especially minority classes such as Classes 7 and 11 which were overlooked in previous research. The model was tested on a distinct data set obtained from four spatially independent sites and achieved an accuracy of 0.87 and an average F1 score of 0.72.
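
As a rough illustration of bootstrap aggregating (bagging), the sketch below trains several copies of a deliberately simple base learner on bootstrap resamples and combines them by majority vote. It is not the paper's model: the nearest-centroid learner, the one-dimensional toy features, and all names (`fit_centroids`, `fit_bagged`, `bagged_predict`) are assumptions made for illustration, standing in for the deep neural networks and inductive loop signatures described above.

```python
import random
from collections import Counter

def fit_centroids(xs, ys):
    """Toy base learner: the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def fit_bagged(xs, ys, n_members=25, seed=0):
    """Train one base learner per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(fit_centroids([xs[i] for i in idx], [ys[i] for i in idx]))
    return members

def bagged_predict(members, x):
    """Aggregate the member predictions by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in members)
    return votes.most_common(1)[0][0]

# Hypothetical imbalanced toy data: class 1 is the minority class.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 5.0, 5.1, 5.2, 5.3]
ys = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
ensemble = fit_bagged(xs, ys)
print(bagged_predict(ensemble, 0.2), bagged_predict(ensemble, 5.1))
```

A plain bootstrap resample can occasionally miss a minority class altogether, in which case that ensemble member never predicts it; the vote across members is what keeps the combined prediction stable.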

Automatic detection of volcanic eruptions in Doppler radar observations using a neural network approach

10.5194/egusphere-egu2020-11123; 2020

Author(s): Matthias Hort, Daniel Uhle, Fabio Venegas, Lea Scharff, Jan Walda, ...

Keyword(s): Neural Network, Doppler Radar, Volcanic Eruptions, Radar Data, Visual Observation, Data Sets, Network Approach, Neural Network Approach, Data Set, The Impact

Immediate detection of volcanic eruptions is essential when trying to mitigate the impact on the health of people living in the vicinity of a volcano or the impact on infrastructure and aviation. Eruption detection is most often done by either visual observation or the analysis of acoustic data. While visual observation is often difficult due to environmental conditions, infrasound data usually provide the onset of an event. Doppler radar data, although not available for many volcanoes, additionally provide information on the dynamics of the eruption and the amount of material released. Eruptions can be easily detected in the data by visual analysis, and here we present a neural network approach for the automatic detection of eruptions in Doppler radar data. We use data recorded at Colima volcano in Mexico in 2014/2015 and a data set recorded at Turrialba volcano between 2017 and 2019. In a first step we picked eruptions, rain and typical noise in both data sets, which were then used for training two networks (training data set) and testing the performance of the network using a separate test data set. The accuracy for classifying the different types of signals was between 95 and 98% for both data sets, which we consider quite successful. In the case of the Turrialba data set, eruptions were picked based on observations of OVSICORI data. When classifying the complete Turrialba data set using the trained network, an additional 40 eruptions were found that were not in the OVSICORI catalogue.

In most cases data from the instruments are transmitted to an observatory by radio, so the amount of data available is an issue. We therefore tested by how much the data could be reduced while still successfully detecting an eruption. We also kept the network as small as possible so that it could ideally run on a small computer (e.g. a Raspberry Pi architecture) for eruption detection on site, so that only the information that an eruption has been detected needs to be transmitted.

Implementation of convolutional neural network approach for COVID-19 disease detection

Physiological Genomics; 10.1152/physiolgenomics.00084.2020; 2020; Vol 52 (12); pp. 590-601

Author(s): Emrah Irmak

Keyword(s): Neural Network, Convolutional Neural Network, Clinical Data, Data Sets, Grid Search, Neural Network Approach, Data Set, X Ray, Average Accuracy, Chest X Ray

In this paper, two novel, powerful, and robust convolutional neural network (CNN) architectures are designed and proposed for two different classification tasks using publicly available data sets. The first architecture is able to decide whether a given chest X-ray image of a patient contains COVID-19 or not with 98.92% average accuracy. The second CNN architecture is able to divide a given chest X-ray image of a patient into three classes (COVID-19 versus normal versus pneumonia) with 98.27% average accuracy. The hyperparameters of both CNN models are automatically determined using Grid Search. Experimental results on large clinical data sets show the effectiveness of the proposed architectures and demonstrate that the proposed algorithms can overcome the disadvantages mentioned above. Moreover, the proposed CNN models are fully automatic in that they do not require the extraction of diseased tissue, which is a great improvement over the automatic methods available in the literature. To the best of the author’s knowledge, this is the first study to detect COVID-19 disease from chest X-ray images using a CNN whose hyperparameters are automatically determined by Grid Search. Another important contribution of this study is that it is the first CNN-based COVID-19 chest X-ray image classification study that uses the largest possible clinical data set. A total of 1,524 COVID-19, 1,527 pneumonia, and 1,524 normal X-ray images were collected. The aim was to collect the largest number of COVID-19 X-ray images available in the literature at the time of writing.
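
Grid Search itself can be sketched in a few lines: every combination of candidate hyperparameter values is scored, and the best-scoring combination is kept. The search space, the function names, and the stand-in scoring function below are all hypothetical; in a real study the score would come from training and validating a model, which is not reproduced here.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical search space; these are not the hyperparameters from the paper.
param_grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [16, 32, 64],
}

def toy_validation_score(learning_rate, batch_size):
    """Stand-in for 'train a model and return its validation accuracy'."""
    return -abs(learning_rate - 0.01) - abs(batch_size - 32) / 1000.0

best_params, best_score = grid_search(param_grid, toy_validation_score)
print(best_params, best_score)
```

The cost grows multiplicatively with the number of values per hyperparameter (here 3 x 3 = 9 evaluations), which is the main practical limit of an exhaustive grid.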

Imbalanced Data Detection Kernel Method in Closed Systems

Advanced Materials Research; 10.4028/www.scientific.net/amr.756-759.3652; 2013; Vol 756-759; pp. 3652-3658

Author(s): You Li Lu, Jun Luo

Keyword(s): Kernel Methods, Kernel Method, Imbalanced Data, Data Detection, Data Sets, System Call, Data Set, Imbalanced Data Sets, Lower Complexity, Closed Systems

Building on kernel methods, this paper puts forward two improved algorithms, R-SVM and I-SVDD, to cope with imbalanced data sets in closed systems. R-SVM uses the K-means algorithm to cluster the space samples, while I-SVDD improves the performance of the original SVDD through imbalanced sample training. Experiments on two system call data sets show that both algorithms are more effective and that R-SVM has lower complexity.

A Comparison Study of Mahalanobis-Taguchi System and Neural Network for Multivariate Pattern Recognition

Design Engineering, Parts A and B; 10.1115/imece2005-80029; 2005; Cited By ~ 10

Author(s): Jungeui Hong, Elizabeth A. Cudney, Genichi Taguchi, Rajesh Jugulum, Kioumars Paryani, ...

Keyword(s): Neural Network, Small Data, Data Sets, Comparison Study, Data Set, Set Size, Breast Cancer Study, Discriminant Ability, Small Data Sets, Multivariate Pattern

The Mahalanobis-Taguchi System is a diagnostic and predictive method for analyzing patterns in multivariate cases. The goal of this study is to compare the ability of the Mahalanobis-Taguchi System and a neural network to discriminate using small data sets. We examine the discriminant ability as a function of data set size using an application area where reliable data are publicly available. The study uses the Wisconsin Breast Cancer study with nine attributes and one class.

Neural Network Approach for Predicting Ship Speed and Fuel Consumption

Journal of Marine Science and Engineering; 10.3390/jmse9020119; 2021; Vol 9 (2); pp. 119

Author(s): Lúcia Moreira, Roberto Vettor, Carlos Guedes Soares

Keyword(s): Neural Network, Fuel Consumption, Computing System, Fuel Oil, Network Computing, Peak Period, Neural Network Approach, Data Set, Navigation Data, Ship Speed

In this paper, simulations of a ship travelling on a given oceanic route were performed by a weather routing system to provide a large realistic navigation data set, which could represent a collection of data obtained on board a ship in operation. This data set was employed to train a neural network computing system in order to predict ship speed and fuel consumption. The model was trained using the Levenberg–Marquardt backpropagation scheme to establish the relation between the ship speed and the respective propulsion configuration for the existing sea conditions, i.e., the output torque of the main engine, the revolutions per minute of the propulsion shaft, the significant wave height, and the peak period of the waves, together with the relative angle of wave encounter. Additional results were obtained by also using the model to train the relationship between the same inputs used to determine the speed of the ship and the fuel consumption. A sensitivity analysis was performed to analyze the artificial neural network capability to forecast the ship speed and fuel oil consumption without information on the status of the engine (the revolutions per minute and torque) using as inputs only the information of the sea state. The results obtained with the neural network model show very good accuracy both in the prediction of the speed of the vessel and the fuel consumption.

The Application of Probabilistic Neural Network in Speech Recognition Based on Partition Clustering

Applied Mechanics and Materials; 10.4028/www.scientific.net/amm.263-266.2173; 2012; Vol 263-266; pp. 2173-2178

Author(s): Xin Guang Li, Min Feng Yao, Li Rui Jian, Zhen Jiang Li

Keyword(s): Neural Network, Speech Recognition, Clustering Algorithm, Probabilistic Neural Network, Back Propagation, Back Propagation Neural Network, Data Sets, Data Set, Proposed Model, Partition Clustering

A probabilistic neural network (PNN) speech recognition model based on the partition clustering algorithm is proposed in this paper. The most important advantage of PNN is that training is easy and instantaneous. Therefore, PNN is capable of dealing with real-time speech recognition. Moreover, the selection of the data set is one of the most important factors in PNN performance, so this paper proposes using the partition clustering algorithm to select data. The proposed model is tested on two data sets from the field of spoken Arabic numbers, with promising results. The performance of the proposed model is compared to that of a single back propagation neural network and an integrated back propagation neural network. The final comparison shows that the proposed model performs better than the other two neural networks, with an accuracy rate of 92.41%.
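
The claim that PNN training is "instantaneous" is easier to see in code: a basic PNN simply stores the training samples per class and, at prediction time, averages a Gaussian kernel over them. The sketch below is a generic textbook-style PNN, not the authors' model; it omits their partition-clustering data selection, and the class name, the smoothing width `sigma`, and the two-dimensional toy features are assumptions.

```python
import math

class ProbabilisticNeuralNetwork:
    """Minimal Parzen-window PNN with a Gaussian kernel (illustrative only)."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma      # kernel width, a hypothetical default
        self.patterns = {}      # class label -> list of stored samples

    def fit(self, X, y):
        # "Training" only stores the samples, grouped by class.
        self.patterns = {}
        for x, label in zip(X, y):
            self.patterns.setdefault(label, []).append(tuple(x))
        return self

    def class_scores(self, x):
        # Average kernel response of each class to the query point x.
        scores = {}
        for label, samples in self.patterns.items():
            total = 0.0
            for s in samples:
                sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
                total += math.exp(-sq_dist / (2.0 * self.sigma ** 2))
            scores[label] = total / len(samples)
        return scores

    def predict(self, x):
        scores = self.class_scores(x)
        return max(scores, key=scores.get)

# Hypothetical 2-D feature vectors standing in for speech features.
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.2)]
y = ["zero", "zero", "one", "one"]
pnn = ProbabilisticNeuralNetwork(sigma=0.5).fit(X, y)
print(pnn.predict((0.1, 0.0)), pnn.predict((1.0, 1.1)))
```

Because every stored sample is visited at prediction time, the choice of which samples to keep affects both speed and accuracy, which is where a data selection step would plug in.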

Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm

Genes; 10.3390/genes11070717; 2020; Vol 11 (7); pp. 717

Author(s): Garba Abdulrauf Sharifai, Zurinahni Zainol

Keyword(s): Feature Selection, Optimization Algorithm, Imbalanced Data, High Dimensional, Data Sets, Biomedical Data, Data Set, Grasshopper Optimization Algorithm, Imbalanced Class, Grasshopper Optimization

Training a machine learning algorithm on an imbalanced data set is an inherently challenging task. It becomes more demanding when samples are limited but the number of features is massive (high dimensionality). High dimensional and imbalanced data sets pose severe challenges in many real-world applications, such as biomedical data sets. Numerous researchers have investigated either imbalanced class or high dimensional data sets and come up with various methods. Nonetheless, few approaches reported in the literature have addressed the intersection of the high dimensional and imbalanced class problems, due to their complicated interactions. Lately, feature selection has become a well-known technique for overcoming this problem by selecting discriminative features that represent both the minority and majority classes. This paper proposes a new method called Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA); rCBR-BGOA employs an ensemble of multi-filters coupled with the Correlation-Based Redundancy method to select optimal feature subsets. A binary Grasshopper optimisation algorithm (BGOA) is used to formulate the feature selection process as an optimisation problem and to select the best (near-optimal) combination of features from the majority and minority classes. The obtained results, supported by the proper statistical analysis, indicate that rCBR-BGOA can improve the classification performance for high dimensional and imbalanced datasets in terms of G-mean and the Area Under the Curve (AUC) performance metrics.
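
For reference, the binary G-mean is the geometric mean of sensitivity and specificity, so it collapses to zero whenever one class is ignored entirely. The sketch below is a minimal illustration on made-up labels; the function name and the example data are assumptions and are not taken from the paper.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# Hypothetical data: 4 positives (minority) and 96 negatives.
y_true = [1] * 4 + [0] * 96
always_negative = [0] * 100                      # 96% accurate, finds no positives
balanced_model = [1, 1, 1, 0] + [1] * 6 + [0] * 90  # 3 of 4 positives, 6 false alarms

print(g_mean(y_true, always_negative))
print(g_mean(y_true, balanced_model))
```

The always-negative predictor scores high on accuracy but zero on G-mean, which is why the metric is commonly reported for imbalanced problems.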

A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms

BMC Medical Informatics and Decision Making; 10.1186/s12911-019-1014-6; 2020; Vol 20 (1); Cited By ~ 8

Author(s): André M. Carrington, Paul W. Fieguth, Hammad Qazi, Andreas Holzinger, Helen H. Chen, ...

Keyword(s): ROC Curve, Diagnostic Testing, Real Life, Imbalanced Data, Machine Learning Algorithms, Data Sets, Data Set, Partial AUC, C Statistic, Future Work

Background: In classification and diagnostic testing, the receiver-operator characteristic (ROC) plot and the area under the ROC curve (AUC) describe how an adjustable threshold causes changes in two types of error: false positives and false negatives. However, only part of the ROC curve and AUC are informative when they are used with imbalanced data. Hence, alternatives to the AUC have been proposed, such as the partial AUC and the area under the precision-recall curve. However, these alternatives cannot be as fully interpreted as the AUC, in part because they ignore some information about actual negatives. Methods: We derive and propose a new concordant partial AUC and a new partial c statistic for ROC data, as foundational measures and methods to help understand and explain parts of the ROC plot and AUC. Our partial measures are continuous and discrete versions of the same measure, are derived from the AUC and c statistic respectively, are validated as equal to each other, and are validated as equal in summation to whole measures where expected. Our partial measures are tested for validity on a classic ROC example from Fawcett, a variation thereof, and two real-life benchmark data sets in breast cancer: the Wisconsin and Ljubljana data sets. Interpretation of an example is then provided. Results: Results show the expected equalities between our new partial measures and the existing whole measures. The example interpretation illustrates the need for our newly derived partial measures. Conclusions: The concordant partial area under the ROC curve was proposed and, unlike previous partial measure alternatives, it maintains the characteristics of the AUC. The first partial c statistic for ROC plots was also proposed as an unbiased interpretation for part of an ROC curve. The expected equalities among and between our newly derived partial measures and their existing full measure counterparts are confirmed. These measures may be used with any data set, but this paper focuses on imbalanced data with low prevalence. Future work: Future work with our proposed measures may demonstrate their value for imbalanced data with high prevalence; compare them to other measures not based on areas; and combine them with other ROC measures and techniques.
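
As background for the whole (not partial) measures mentioned above, the c statistic can be computed directly as the fraction of positive-negative pairs that a classifier ranks correctly, with ties counted as one half; this pairwise value equals the empirical AUC. The sketch below covers only that standard whole measure. It does not implement the paper's concordant partial AUC or partial c statistic, and the function name and example scores are assumptions.

```python
def c_statistic(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

# Hypothetical classifier scores and true labels (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(c_statistic(scores, labels))
```

Here eight of the nine positive-negative pairs are ordered correctly; only the positive scored 0.6 falls below the negative scored 0.7.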

THE STAG OIL FIELD FORMATION EVALUATION: A NEURAL NETWORK APPROACH

The APPEA Journal; 10.1071/aj98026; 1999; Vol 39 (1); pp. 451; Cited By ~ 5

Author(s): H. Crocker, C.C. Fung, K.W. Wong

Keyword(s): Neural Network, Back Propagation, Back Propagation Neural Network, Oil Field, Training Data, ANN Model, Neural Network Approach, Data Set, Formation Evaluation, Core Data

The producing M. australis Sandstone of the Stag Oil Field is a bioturbated glauconitic sandstone that is difficult to evaluate using conventional methods. Well log and core data are available for the Stag Field and for the nearby Centaur–1 well. Eight wells have log data; six also have core data. In the past few years artificial intelligence has been applied to formation evaluation. In particular, artificial neural networks (ANN) used to match log and core data have been studied. The ANN approach has been used to analyse the producing Stag Field sands. In this paper, new ways of applying the ANN are reported. Results from a simple ANN approach are unsatisfactory. An integrated ANN approach comprising the unsupervised Self-Organising Map (SOM) and the supervised Back Propagation Neural Network (BPNN) appears to give a more reasonable analysis. In this case study the mineralogical and petrophysical characteristics of a cored well are predicted from the 'training' data set of the other cored wells in the field. The prediction from the ANN model is then used for comparison with the known core data. In this manner, the accuracy of the prediction is determined and a prediction qualifier computed. This new approach to formation evaluation should provide a match between log and core data that may be used to predict the characteristics of a similar uncored interval. Although the results for the Stag Field are satisfactory, further study applying the method to other fields is required.

Precision-Recall versus Accuracy and the Role of Large Data Sets

Proceedings of the AAAI Conference on Artificial Intelligence; 10.1609/aaai.v33i01.33014039; 2019; Vol 33; pp. 4039-4048; Cited By ~ 8

Author(s): Brendan Juba, Hai S. Le

Keyword(s): Machine Learning, Class Imbalance, Imbalanced Data, Large Data, Constant Factor, Data Sets, Data Set, Small Constant, Classifier Performance, Necessary And Sufficient

Practitioners of data mining and machine learning have long observed that the imbalance of classes in a data set negatively impacts the quality of classifiers trained on that data. Numerous techniques for coping with such imbalances have been proposed, but nearly all lack any theoretical grounding. By contrast, the standard theoretical analysis of machine learning admits no dependence on the imbalance of classes at all. The basic theorems of statistical learning establish the number of examples needed to estimate the accuracy of a classifier as a function of its complexity (VC-dimension) and the confidence desired; the class imbalance does not enter these formulas anywhere. In this work, we consider the measures of classifier performance in terms of precision and recall, a measure that is widely suggested as more appropriate to the classification of imbalanced data. We observe that whenever the precision is moderately large, the worse of the precision and recall is within a small constant factor of the accuracy weighted by the class imbalance. A corollary of this observation is that a larger number of examples is necessary and sufficient to address class imbalance, a finding we also illustrate empirically.
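
A small worked example shows why precision and recall are suggested over accuracy for imbalanced data. The sketch below uses made-up counts (1,000 examples, 1% positive) and a hypothetical helper `binary_metrics`; it illustrates the definitions only and does not reproduce the paper's theoretical bound.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical data: 10 positives followed by 990 negatives.
y_true = [1] * 10 + [0] * 990
# The classifier finds 2 of the 10 positives and raises 3 false alarms.
y_pred = [1] * 2 + [0] * 8 + [1] * 3 + [0] * 987

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

With these counts the accuracy is 989/1000 while precision is 2/5 and recall is 2/10, so a classifier that looks nearly perfect by accuracy misses most of the minority class.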
