Recognition of overlapping elliptical objects in a binary image

AbstractRecognition of overlapping objects is required in many applications in the field of computer vision. Examples include cell segmentation, bubble detection and bloodstain pattern analysis. This paper presents a method to identify overlapping objects by approximating them with ellipses. The method is intended to be applied to complex-shaped regions which are believed to be composed of one or more overlapping objects. The method has two primary steps. First, a pool of candidate ellipses are generated by applying the Euclidean distance transform on a compressed image and the pool is filtered by an overlaying method. Second, the concave points on the contour of the region of interest are extracted by polygon approximation to divide the contour into segments. Then, the optimal ellipses are selected from among the candidates by choosing a minimal subset that best fits the identified segments. We propose the use of the adjusted Rand index, commonly applied in clustering, to compare the fitting result with ground truth. Through a set of computational and optimization efficiencies, we are able to apply our approach in complex images comprised of a number of overlapped regions. Experimental results on a synthetic data set, two types of cell images and bloodstain patterns show superior accuracy and flexibility of our method in ellipse recognition, relative to other methods.

Download Full-text

Comparative Analysis of Supervised and Unsupervised Approaches Applied to Large-Scale “In The Wild” Face Verification

Symmetry ◽

10.3390/sym12111832 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1832

Author(s):

Tomasz Hachaj ◽

Patryk Mazurek

Keyword(s):

Pattern Recognition ◽

Large Scale ◽

Statistical Significance ◽

Ground Truth ◽

Classification Algorithm ◽

Adjusted Rand Index ◽

Face Verification ◽

Data Set ◽

In The Wild ◽

Unsupervised Approaches

Deep learning-based feature extraction methods and transfer learning have become common approaches in the field of pattern recognition. Deep convolutional neural networks trained using tripled-based loss functions allow for the generation of face embeddings, which can be directly applied to face verification and clustering. Knowledge about the ground truth of face identities might improve the effectiveness of the final classification algorithm; however, it is also possible to use ground truth clusters previously discovered using an unsupervised approach. The aim of this paper is to evaluate the potential improvement of classification results of state-of-the-art supervised classification methods trained with and without ground truth knowledge. In this study, we use two sufficiently large data sets containing more than 200,000 “taken in the wild” images, each with various resolutions, visual quality, and face poses which, in our opinion, guarantee the statistical significance of the results. We examine several clustering and supervised pattern recognition algorithms and find that knowledge about the ground truth has a very small influence on the Fowlkes–Mallows score (FMS) of the classification algorithm. In the case of the classification algorithm that obtained the highest accuracy in our experiment, the FMS improved by only 5.3% (from 0.749 to 0.791) in the first data set and by 6.6% (from 0.652 to 0.718) in the second data set. Our results show that, beside highly secure systems in which face verification is a key component, face identities discovered by unsupervised approaches can be safely used for training supervised classifiers. We also found that the Silhouette Coefficient (SC) of unsupervised clustering is positively correlated with the Adjusted Rand Index, V-measure score, and Fowlkes–Mallows score and, so, we can use the SC as an indicator of clustering performance when the ground truth of face identities is not known. All of these conclusions are important findings for large-scale face verification problems. The reason for this is the fact that skipping the verification of people’s identities before supervised training saves a lot of time and resources.

Download Full-text

So you think you can PLS-DA?

BMC Bioinformatics ◽

10.1186/s12859-019-3310-7 ◽

2020 ◽

Vol 21 (S1) ◽

Author(s):

Daniel Ruiz-Perez ◽

Haibin Guan ◽

Purnima Madhivanan ◽

Kalai Mathee ◽

Giri Narasimhan

Keyword(s):

Feature Selection ◽

Signal To Noise Ratio ◽

Synthetic Data ◽

Principal Component ◽

Ground Truth ◽

Close Relative ◽

Data Set ◽

Series Of Experiments ◽

Feature Selector ◽

Class Labels

Abstract Background Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to its close relative from which it was initially invented, namely Principal Component Analysis (PCA). Results We demonstrate that even though PCA ignores the information regarding the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases, it outperforms PLS-DA, which is made aware of the class labels in its input. Our experiments range from looking at the signal-to-noise ratio in the feature selection task, to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated. Finally, we analyzed an interesting data set from 396 vaginal microbiome samples where the ground truth for the feature selection was available. All the 3D figures shown in this paper as well as the supplementary ones can be viewed interactively at http://biorg.cs.fiu.edu/plsda Conclusions Our results highlighted the strengths and weaknesses of PLS-DA in comparison with PCA for different underlying data models.

Download Full-text

A data set of bloodstain patterns for teaching and research in bloodstain pattern analysis: Impact beating spatters

Data in Brief ◽

10.1016/j.dib.2018.02.070 ◽

2018 ◽

Vol 18 ◽

pp. 648-654 ◽

Cited By ~ 6

Author(s):

Daniel Attinger ◽

Yu Liu ◽

Tyler Bybee ◽

Kris De Brabanter

Keyword(s):

Pattern Analysis ◽

Bloodstain Pattern Analysis ◽

Data Set ◽

Teaching And Research ◽

Bloodstain Pattern

Download Full-text

Mixture Modeling with Pairwise, Instance-Level Class Constraints

Neural Computation ◽

10.1162/0899766054796914 ◽

2005 ◽

Vol 17 (11) ◽

pp. 2482-2507 ◽

Cited By ~ 14

Author(s):

Qi Zhao ◽

David J. Miller

Keyword(s):

Synthetic Data ◽

Ground Truth ◽

Mixture Modeling ◽

Data Set ◽

Ground Truth Data ◽

New Class ◽

Recent Approach ◽

Semisupervised Clustering ◽

The One ◽

Number Of Classes

The goal of semisupervised clustering/mixture modeling is to learn the underlying groups comprising a given data set when there is also some form of instance-level supervision available, usually in the form of labels or pairwise sample constraints. Most prior work with constraints assumes the number of classes is known, with each learned cluster assumed to be a class and, hence, subject to the given class constraints. When the number of classes is unknown or when the one-cluster-per-class assumption is not valid, the use of constraints may actually be deleterious to learning the ground-truth data groups. We address this by (1) allowing allocation of multiple mixture components to individual classes and (2) estimating both the number of components and the number of classes. We also address new class discovery, with components void of constraints treated as putative unknown classes. For both real-world and synthetic data, our method is shown to accurately estimate the number of classes and to give favorable comparison with the recent approach of Shental, Bar-Hillel, Hertz, and Weinshall (2003).

Download Full-text

Fuel Prediction and Reduction in Public Transportation by Sensor Monitoring and Bayesian Networks

Sensors ◽

10.3390/s21144733 ◽

2021 ◽

Vol 21 (14) ◽

pp. 4733

Author(s):

Federico Delussu ◽

Faisal Imran ◽

Christian Mattia ◽

Rosa Meo

Keyword(s):

Bayesian Networks ◽

Fuel Consumption ◽

Granger Causality ◽

Public Transportation ◽

Heuristic Algorithms ◽

Synthetic Data ◽

Ground Truth ◽

Area Network ◽

Data Set ◽

Sensor Monitoring

We exploit the use of a controller area network (CAN-bus) to monitor sensors on the buses of local public transportation in a big European city. The aim is to advise fleet managers and policymakers on how to reduce fuel consumption so that air pollution is controlled and public services are improved. We deploy heuristic algorithms and exhaustive ones to generate Bayesian networks among the monitored variables. The aim is to describe the relevant relationships between the variables, to discover and confirm the possible cause–effect relationships, to predict the fuel consumption dependent on the contextual conditions of traffic, and to enable an intervention analysis to be conducted on the variables so that our goals are achieved. We propose a validation technique using Bayesian networks based on Granger causality: it relies upon observations of the time series formed by successive values of the variables in time. We use the same method based on Granger causality to rank the Bayesian networks obtained as well. A comparison of the Bayesian networks discovered against the ground truth is proposed in a synthetic data set, specifically generated for this study: the results confirm the validity of the Bayesian networks that agree on most of the existing relationships.

Download Full-text

A data set of bloodstain patterns for teaching and research in bloodstain pattern analysis: Gunshot backspatters

Data in Brief ◽

10.1016/j.dib.2018.11.075 ◽

2019 ◽

Vol 22 ◽

pp. 269-278 ◽

Cited By ~ 3

Author(s):

Daniel Attinger ◽

Yu Liu ◽

Ricky Faflak ◽

Yalin Rao ◽

Bryce A. Struttman ◽

...

Keyword(s):

Pattern Analysis ◽

Bloodstain Pattern Analysis ◽

Data Set ◽

Teaching And Research ◽

Bloodstain Pattern

Download Full-text

COMAP: A SYNTHETIC DATASET FOR COLLECTIVE MULTI-AGENT PERCEPTION OF AUTONOMOUS DRIVING

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xliii-b2-2021-255-2021 ◽

2021 ◽

Vol XLIII-B2-2021 ◽

pp. 255-263

Author(s):

Y. Yuan ◽

M. Sester

Keyword(s):

Synthetic Data ◽

Simulated Data ◽

Vehicle Detection ◽

Ground Truth ◽

Autonomous Driving ◽

Superior Performance ◽

Detection Accuracy ◽

Data Set ◽

Cloud Data ◽

Data Generator

Abstract. Collective perception of connected vehicles can sufficiently increase the safety and reliability of autonomous driving by sharing perception information. However, collecting real experimental data for such scenarios is extremely expensive. Therefore, we built a computational efficient co-simulation synthetic data generator through CARLA and SUMO simulators. The simulated data contain image and point cloud data as well as ground truth for object detection and semantic segmentation tasks. To verify the superior performance gain of collective perception over single-vehicle perception, we conducted experiments of vehicle detection, which is one of the most important perception tasks for autonomous driving, on this data set. A 3D object detector and a Bird’s Eye View (BEV) detector are trained and then test with different configurations of the number of cooperative vehicles and vehicle communication ranges. The experiment results showed that collective perception can not only dramatically increase the overall mean detection accuracy but also the localization accuracy of detected bounding boxes. Besides, a vehicle detection comparison experiment showed that the detection performance drop caused by sensor observation noise can be canceled out by redundant information collected by multiple vehicles.

Download Full-text

Localizing microseismic events on field data using a U-Net based convolutional neural network trained on synthetic data

Geophysics ◽

10.1190/geo2020-0868.1 ◽

2021 ◽

pp. 1-47

Author(s):

N. A. Vinard ◽

G. G. Drijkoningen ◽

D. J. Verschuur

Keyword(s):

Hydraulic Fracturing ◽

Field Data ◽

Synthetic Data ◽

Region Of Interest ◽

Velocity Model ◽

Mitigation Measures ◽

Large Field ◽

Data Sets ◽

Data Set ◽

Full Waveforms

Hydraulic fracturing plays an important role when it comes to the extraction of resources in unconventional reservoirs. The microseismic activity arising during hydraulic fracturing operations needs to be monitored to both improve productivity and to make decisions about mitigation measures. Recently, deep learning methods have been investigated to localize earthquakes given field-data waveforms as input. For optimal results, these methods require large field data sets that cover the entire region of interest. In practice, such data sets are often scarce. To overcome this shortcoming, we propose initially to use a (large) synthetic data set with full waveforms to train a U-Net that reconstructs the source location as a 3D Gaussian distribution. As field data set for our study we use data recorded during hydraulic fracturing operations in Texas. Synthetic waveforms were modelled using a velocity model from the site that was also used for a conventional diffraction-stacking (DS) approach. To increase the U-Nets ability to localize seismic events, we augmented the synthetic data with different techniques, including the addition of field noise. We select the best performing U-Net using 22 events that have previously been identified to be confidently localized by DS and apply that U-Net to all 1245 events. We compare our predicted locations to DS and the DS locations refined by a relative location (DSRL) method. The U-Net based locations are better constrained in depth compared to DS and the mean hypocenter difference with respect to DSRL locations is 163 meters. This shows potential for the use of synthetic data to complement or replace field data for training. Furthermore, after training, the method returns the source locations in near real-time given the full waveforms, alleviating the need to pick arrival times.

Download Full-text

So you think you can PLS-DA?

10.1101/207225 ◽

2017 ◽

Cited By ~ 3

Author(s):

Daniel Ruiz-Perez ◽

Haibin Guan ◽

Purnima Madhivanan ◽

Kalai Mathee ◽

Giri Narasimhan

Keyword(s):

Feature Selection ◽

Signal To Noise Ratio ◽

Synthetic Data ◽

Principal Component ◽

Ground Truth ◽

Close Relative ◽

Data Set ◽

Series Of Experiments ◽

Feature Selector ◽

Class Labels

AbstractBackgroundPartial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to its close relative from which it was initially invented, namely Principal Component Analysis (PCA).ResultsWe demonstrate that even though PCA ignores the information regarding the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases, it outperforms PLS-DA, which is made aware of the class labels in its input. Our experiments range from looking at the signal-to-noise ratio in the feature selection task, to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated. Finally, we analyzed an interesting data set from 396 vaginal microbiome samples where the ground truth for the feature selection was available. All the 3D figures shown in this paper as well as the supplementary ones can be viewed interactively at http://biorg.cs.fiu.edu/plsdaConclusionsOur results highlighted the strengths and weaknesses of PLS-DA in comparison with PCA for different underlying data models.

Download Full-text

Fig Plant Segmentation from Aerial Images Using a Deep Convolutional Encoder-Decoder Network

Remote Sensing ◽

10.3390/rs11101157 ◽

2019 ◽

Vol 11 (10) ◽

pp. 1157 ◽

Cited By ~ 8

Author(s):

Jorge Fuentes-Pacheco ◽

Juan Torres-Olivares ◽

Edgar Roman-Rangel ◽

Salvador Cervantes ◽

Porfirio Juarez-Lopez ◽

...

Keyword(s):

Precision Agriculture ◽

Image Data ◽

Ground Truth ◽

Aerial Images ◽

Aerial Image ◽

Data Set ◽

Visual Appearance ◽

Aerial Robots ◽

Lighting Conditions ◽

Convolutional Encoder

Crop segmentation is an important task in Precision Agriculture, where the use of aerial robots with an on-board camera has contributed to the development of new solution alternatives. We address the problem of fig plant segmentation in top-view RGB (Red-Green-Blue) images of a crop grown under open-field difficult circumstances of complex lighting conditions and non-ideal crop maintenance practices defined by local farmers. We present a Convolutional Neural Network (CNN) with an encoder-decoder architecture that classifies each pixel as crop or non-crop using only raw colour images as input. Our approach achieves a mean accuracy of 93.85% despite the complexity of the background and a highly variable visual appearance of the leaves. We make available our CNN code to the research community, as well as the aerial image data set and a hand-made ground truth segmentation with pixel precision to facilitate the comparison among different algorithms.

Download Full-text