Comparative Analysis of Supervised and Unsupervised Approaches Applied to Large-Scale “In The Wild” Face Verification

Symmetry, 2020, Vol 12 (11), pp. 1832
Author(s): Tomasz Hachaj, Patryk Mazurek

Deep learning-based feature extraction methods and transfer learning have become common approaches in the field of pattern recognition. Deep convolutional neural networks trained with triplet-based loss functions generate face embeddings that can be applied directly to face verification and clustering. Knowledge of the ground truth of face identities might improve the effectiveness of the final classification algorithm; however, it is also possible to train on clusters previously discovered by an unsupervised approach. The aim of this paper is to evaluate the potential improvement in classification results when state-of-the-art supervised classification methods are trained with and without ground truth knowledge. In this study, we use two sufficiently large data sets containing more than 200,000 “taken in the wild” images of varying resolution, visual quality, and face pose, which, in our opinion, guarantees the statistical significance of the results. We examine several clustering and supervised pattern recognition algorithms and find that knowledge of the ground truth has a very small influence on the Fowlkes–Mallows score (FMS) of the classification algorithm. For the classification algorithm that obtained the highest accuracy in our experiment, the FMS improved by only 5.3% (from 0.749 to 0.791) on the first data set and by 6.6% (from 0.652 to 0.718) on the second data set. Our results show that, except in highly secure systems in which face verification is a key component, face identities discovered by unsupervised approaches can be safely used for training supervised classifiers. We also found that the Silhouette Coefficient (SC) of unsupervised clustering is positively correlated with the Adjusted Rand Index, V-measure score, and Fowlkes–Mallows score, so the SC can be used as an indicator of clustering performance when the ground truth of face identities is not known.
These conclusions are important for large-scale face verification problems, because skipping the verification of people’s identities before supervised training saves considerable time and resources.
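The SC-as-proxy finding can be illustrated with a small sketch. The synthetic blobs below stand in for face embeddings and the scores are not the paper's; only the scikit-learn metric names are real:

```python
# Cluster unlabeled points, then compare the unsupervised Silhouette
# Coefficient (SC) with the supervised scores used in the paper.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score, adjusted_rand_score,
                             fowlkes_mallows_score, v_measure_score)

X, y_true = make_blobs(n_samples=600, centers=5, cluster_std=1.0,
                       random_state=0)  # stand-in for face embeddings
y_pred = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

sc = silhouette_score(X, y_pred)            # needs no ground truth
ari = adjusted_rand_score(y_true, y_pred)   # the rest need ground truth
fms = fowlkes_mallows_score(y_true, y_pred)
vm = v_measure_score(y_true, y_pred)
print(f"SC={sc:.3f}  ARI={ari:.3f}  FMS={fms:.3f}  V={vm:.3f}")
```

When ground truth identities are unavailable, only `sc` can be computed, which is why its correlation with the supervised scores matters.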

2019, Vol 7 (3), pp. SE113-SE122
Author(s): Yunzhi Shi, Xinming Wu, Sergey Fomel

Salt boundary interpretation is important for the understanding of salt tectonics and for velocity model building for seismic migration. Conventional methods consist of computing salt attributes and extracting salt boundaries. We have formulated the problem as 3D image segmentation and evaluated an efficient approach based on deep convolutional neural networks (CNNs) with an encoder-decoder architecture. To train the model, we design a data generator that extracts randomly positioned subvolumes from a large-scale 3D training data set, applies data augmentation, and feeds a large number of subvolumes into the network, using salt/nonsalt binary labels generated by thresholding the velocity model as ground truth. We test the model on validation data sets and compare the blind test predictions with the ground truth. Our results indicate that our method is capable of automatically capturing subtle salt features from the 3D seismic image with little or no need for manual input. We further test the model on a field example to demonstrate the generalization of this deep CNN method across different data sets.
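The data generator described above can be sketched roughly as follows; the volume shapes, patch size, and salt velocity threshold `v_salt` are illustrative assumptions, not the authors' values:

```python
# Crop randomly positioned subvolumes from a large 3D volume and derive
# binary salt labels by thresholding the co-located velocity model.
import numpy as np

def sample_patches(image, velocity, patch=32, n=4, v_salt=4.4, rng=None):
    """Yield (subvolume, binary salt label) pairs from random positions."""
    rng = rng or np.random.default_rng(0)
    for _ in range(n):
        i, j, k = (rng.integers(0, s - patch + 1) for s in image.shape)
        sub = image[i:i+patch, j:j+patch, k:k+patch]
        lab = velocity[i:i+patch, j:j+patch, k:k+patch] > v_salt
        yield sub, lab.astype(np.float32)

# toy stand-ins for a seismic image and its velocity model
image = np.random.default_rng(1).normal(size=(64, 64, 64)).astype(np.float32)
velocity = np.full((64, 64, 64), 3.0)
velocity[20:40] = 4.5  # toy high-velocity "salt body"
pairs = list(sample_patches(image, velocity))
```

The augmentation step (e.g., random flips of each subvolume) would slot in just before `yield`.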


2020, Vol 10 (10), pp. 2512-2521

Vision-based activity monitoring has enabled applications that are transforming the e-health sector. To exploit the potential of crowdsourced data for large-scale applications, researchers are working on integrating smart hospitals with crowdsourcing platforms. A key challenge in extracting meaningful patterns from such huge volumes of data is that the data must be annotated; the annotation of medical images, in particular, plays an important role in providing pervasive health services. Although multiple image annotation methods exist, such as manual and semi-supervised annotation, their high cost and computation time remain major issues. To overcome these issues, we propose a methodology for the automatic annotation of images. The proposed approach is based on three tiers: frame extraction, interest-point generation, and clustering. Because medical imaging lacks an appropriate dataset for our experimentation, we introduce a new dataset of Human Health care Actions (HHA). The dataset comprises videos of multiple medical emergencies, including allergic reactions, burns, asthma, brain injury, bleeding, poisoning, heart attack, choking, and spinal injury. We also propose an evaluation model to assess the effectiveness of the proposed methodology; the promising results indicate an effectiveness of 78% in terms of the Adjusted Rand Index. Furthermore, to investigate the effectiveness of the proposed technique, we compare neural network classifiers trained on annotation labels generated by the proposed methodology and by existing techniques such as semi-supervised and manual methods. The overall precision of the proposed methodology is 0.75 (75%), versus 0.69 (69%) for semi-supervised learning.
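A rough sketch of the three-tier idea (frame extraction, feature generation, clustering) on toy data; the pooled-statistics features below are a stand-in for the paper's interest points, and the two synthetic "action classes" are invented for illustration:

```python
# Tier 1: subsample frames; tier 2: per-frame features; tier 3: cluster
# the features so cluster IDs serve as automatic annotation labels,
# evaluated against ground truth with the Adjusted Rand Index.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def extract_frames(video, step=5):
    return video[::step]                      # tier 1: frame extraction

def frame_features(frames):
    # tier 2 stand-in: coarse 4x4 grid statistics instead of interest points
    pooled = frames.reshape(len(frames), 4, 8, 4, 8).mean(axis=(2, 4))
    return pooled.reshape(len(frames), -1)

rng = np.random.default_rng(0)
# two toy "action classes" with different intensity statistics
video = np.concatenate([rng.normal(0.2, 0.05, (50, 32, 32)),
                        rng.normal(0.8, 0.05, (50, 32, 32))])
truth = np.repeat([0, 1], 10)                 # labels per extracted frame
feats = frame_features(extract_frames(video))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
ari = adjusted_rand_score(truth, labels)      # annotation quality
```

The cluster labels would then be used to train the downstream classifier, as in the comparison against semi-supervised and manual annotation.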


2019, Vol 11 (6), pp. 1803
Author(s): Han Zhong, Geqi Qi, Wei Guan, Xiaochen Hua

With the rapid expansion of railways, represented by high-speed rail (HSR) in China, competition between rail and aviation will become increasingly common on a large scale. Beijing, Shanghai, and Guangzhou are the busiest cities and the hubs of railway and aviation transportation in China, and obtaining their supply configuration patterns can help identify defects in planning. To this end, a supply level measure is proposed: a weighted supply traffic volume that takes population and distance factors into account. The supply configuration can then be expressed as the distribution of supply level over time periods for different railway stations, airports, and city categories. Furthermore, nonnegative tensor factorization (NTF) is applied to pattern recognition by introducing CP (CANDECOMP/PARAFAC) decomposition and the block coordinate descent (BCD) algorithm for the selected data set. Numerical experiments show that the designed method performs well in terms of computation speed and solution quality. The recognition results extract significant pattern characteristics of rail–air transport for Beijing, Shanghai, and Guangzhou, which can provide theoretical references for policymakers.
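A minimal illustration of nonnegative CP factorization fitted by block coordinate descent, here using multiplicative updates for each factor block; this is an illustrative stand-in for the paper's solver, not its exact algorithm:

```python
# Rank-R nonnegative CP: X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r].
# One BCD pass updates each factor matrix in turn while fixing the others.
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product, shape (J*K, R)."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def ntf_cp(X, rank, n_iter=500, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    dims = X.shape
    factors = [rng.random((d, rank)) for d in dims]
    for _ in range(n_iter):
        for mode in range(3):                 # one block per factor matrix
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            Xm = np.moveaxis(X, mode, 0).reshape(dims[mode], -1)
            F = factors[mode]
            # multiplicative update keeps every entry nonnegative
            F *= (Xm @ kr) / (F @ (kr.T @ kr) + eps)
    return factors

# sanity check: factor a random nonnegative rank-2 tensor
rng = np.random.default_rng(1)
A, B, C = (rng.random((d, 2)) for d in (6, 5, 4))
X = np.einsum('ir,jr,kr->ijk', A, B, C)
fA, fB, fC = ntf_cp(X, rank=2)
Xhat = np.einsum('ir,jr,kr->ijk', fA, fB, fC)
rel_err = np.linalg.norm(X - Xhat) / np.linalg.norm(X)
```

In the paper's setting, the three tensor modes would index time periods, stations/airports, and city categories, and each recovered rank-1 component is one supply configuration pattern.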


2001, Vol 5 (2), pp. 201-213
Author(s): P. Fiorucci, P. La Barbera, L.G. Lanza, R. Minciardi

Abstract. A rain field reconstruction and downscaling methodology is presented, which allows suitable integration of large-scale rainfall information and rain-gauge measurements at the ground. The former data set is assumed to provide probabilistic indicators that are used to infer the parameters of the probability density function of the stochastic rain process at each pixel site. Rain-gauge measurements are assumed as the ground truth and used to constrain the reconstructed rain field to the associated point values. Downscaling is performed by assuming the a posteriori estimates of the rain figures at each grid cell as the a priori large-scale conditioning values for reconstruction of the rain field at finer scale. The case study of an intense rain event recently observed in northern Italy is presented and results are discussed with reference to the modelling capabilities of the proposed methodology.
Keywords: reconstruction, downscaling, remote sensing, geostatistics, Meteosat
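The conditioning of the reconstructed field on gauge values can be illustrated, in a deliberately simplified scalar form (all numbers hypothetical, not from the paper), as a conjugate normal update at a gauged pixel:

```python
# Prior on the pixel rain rate comes from the large-scale (satellite)
# indicators; the gauge reading acts as near-exact ground truth.
mu0, var0 = 4.0, 2.0**2      # prior rain rate (mm/h) and its variance
gauge, var_g = 6.5, 0.1**2   # gauge reading and its small error variance

# conjugate normal update: posterior precision = sum of precisions
post_var = 1.0 / (1.0 / var0 + 1.0 / var_g)
post_mu = post_var * (mu0 / var0 + gauge / var_g)
# the posterior collapses onto the gauge value, as the methodology requires
```

The full method performs this kind of a posteriori estimation jointly over the field and then reuses the results as a priori conditioning values at the finer scale.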


Author(s): Tong Zou, Tianyu Pan, Michael Taylor, Hal Stern

Abstract. Recognition of overlapping objects is required in many computer vision applications; examples include cell segmentation, bubble detection, and bloodstain pattern analysis. This paper presents a method to identify overlapping objects by approximating them with ellipses. The method is intended for complex-shaped regions believed to be composed of one or more overlapping objects, and it has two primary steps. First, a pool of candidate ellipses is generated by applying the Euclidean distance transform to a compressed image, and the pool is filtered by an overlaying method. Second, the concave points on the contour of the region of interest are extracted by polygon approximation to divide the contour into segments; the optimal ellipses are then selected from among the candidates by choosing a minimal subset that best fits the identified segments. We propose the use of the adjusted Rand index, commonly applied in clustering, to compare the fitting result with the ground truth. Through a set of computational and optimization efficiencies, we are able to apply our approach to complex images comprising a number of overlapping regions. Experimental results on a synthetic data set, two types of cell images, and bloodstain patterns show the superior accuracy and flexibility of our method in ellipse recognition relative to other methods.
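The candidate-generation step can be sketched on a toy binary region of two overlapping discs: maxima of the Euclidean distance transform serve as seed points for candidate ellipses. The compression, overlaying filter, and concave-point analysis of the actual method are omitted here:

```python
# The distance transform of a binary region peaks near object centres,
# so its local maxima give seed points for candidate ellipses.
import numpy as np
from scipy import ndimage

mask = np.zeros((40, 60), dtype=bool)
yy, xx = np.ogrid[:40, :60]
mask |= (yy - 20)**2 + (xx - 20)**2 < 12**2   # two overlapping discs
mask |= (yy - 20)**2 + (xx - 38)**2 < 12**2

dist = ndimage.distance_transform_edt(mask)
# local maxima of the distance map = candidate centres (small peaks pruned)
maxima = (dist == ndimage.maximum_filter(dist, size=9)) & (dist > 5)
centres = np.argwhere(maxima)
```

The candidate pool seeded this way is deliberately over-complete; the later selection step keeps only the minimal subset of ellipses that best fits the contour segments.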


2020, Vol 13 (4)
Author(s): Ioannis Agtzidis, Mikhail Startsev, Michael Dorr

In this short article we present our manual annotation of the eye movement events in a subset of the large-scale eye tracking data set Hollywood2. Our labels include fixations, saccades, and smooth pursuits, as well as a noise event type (the latter representing blinks, loss of tracking, or physically implausible signals). In order to achieve more consistent annotations, the gaze samples were labelled by a novice rater based on rudimentary algorithmic suggestions and subsequently corrected by an expert rater. Overall, we annotated eye movement events in the recordings corresponding to 50 randomly selected test set clips and 6 training set clips from Hollywood2, which were viewed by 16 observers and amount to a total of approximately 130 minutes of gaze data. In these labels, 62.4% of the samples were attributed to fixations, 9.1% to saccades, and, notably, 24.2% to pursuit (the remainder marked as noise). After evaluating 15 published eye movement classification algorithms on our newly collected annotated data set, we found that the most recent algorithms perform very well on average, and even reach human-level labelling quality for fixations and saccades, but all have much larger room for improvement when it comes to smooth pursuit classification. The data set is made available at https://gin.g-node.org/ioannis.agtzidis/hollywood2_em.
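For context, the simplest family of classifiers in such comparisons is velocity thresholding. A minimal I-VT-style sketch (thresholds, sampling rate, and the toy gaze trace are illustrative, not taken from any evaluated algorithm):

```python
# Label each gaze sample by its instantaneous speed: slow drift is
# treated as smooth pursuit, fast jumps as saccades, the rest fixation.
import numpy as np

def classify_ivt(x, y, hz=500.0, sacc_deg_s=100.0, purs_deg_s=5.0):
    """Return per-sample labels: 0=fixation, 1=saccade, 2=pursuit."""
    vx, vy = np.gradient(x) * hz, np.gradient(y) * hz
    speed = np.hypot(vx, vy)            # deg/s if x, y are in degrees
    labels = np.zeros(len(x), dtype=int)
    labels[speed > purs_deg_s] = 2      # slow drift -> smooth pursuit
    labels[speed > sacc_deg_s] = 1      # fast jump  -> saccade
    return labels

# toy trace: fixation, then an 8-degree saccade, then 10 deg/s pursuit
t = np.arange(0, 1, 1 / 500.0)
x = np.where(t < 0.5, 0.0, 8.0) + np.where(t > 0.7, (t - 0.7) * 10, 0.0)
labels = classify_ivt(x, np.zeros_like(t))
```

The article's finding is precisely that fixations and saccades yield to such thresholding-style logic far more readily than smooth pursuit, whose velocities overlap with fixational drift and head-compensation signals.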


2021
Author(s): Mitja Nikolaus, Eliot Maes, Jeremy Auguste, Laurent Prévot, Abdellah Fourtassi

Studies of children's language use in the wild (e.g., in the context of child-caregiver social interaction) have been slowed by the time- and resource-consuming task of hand annotating utterances for communicative intents/speech acts. Existing studies have typically focused on rather small samples of children, raising the question of how their findings generalize both to larger and more representative populations and to a richer set of interaction contexts. Here we propose a simple automatic model for speech act labeling in early childhood based on the INCA-A coding scheme (Ninio, Snow, Pan, & Rollins, 1994). After validating the model against ground truth labels, we automatically annotated the entire English-language data from the CHILDES corpus. The major theoretical result was that earlier findings generalize quite well at a large scale. Further, we introduced two complementary measures for the age of acquisition of speech acts, which allow us to rank different speech acts by their order of emergence in production and comprehension. Our model will be shared with the community so that researchers can use it with their data to investigate various questions related to language use in both typical and atypical populations of children.
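This is not the authors' model, but the general shape of an automatic speech-act labeler can be sketched with a toy bag-of-words classifier; the utterance labels below are invented stand-ins for INCA-A categories:

```python
# Map utterance text to a coarse intent label with a linear classifier
# over word counts, trained on a handful of made-up examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_utts = ["what is that", "where did it go", "look at the ball",
              "give me the cup", "that is a dog", "this is your book"]
train_acts = ["question", "question", "directive",
              "directive", "statement", "statement"]

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_utts, train_acts)
pred = model.predict(["what is a dog"])   # expect a question-like label
```

A real labeler would of course be validated against hand-coded ground truth before annotating a corpus the size of CHILDES, as the paper describes.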


2021, Vol 11 (1)
Author(s): Yao Shen, Zhipeng Yan

Abstract. To study the drug resistance problem caused by transporters, we leveraged multiple large-scale public data sets of drug sensitivity, cell line genetic and transcriptional profiles, and gene silencing experiments. Through systematic integration of these data sets, we built various machine learning models to predict the difference between cell viability upon drug treatment and upon silencing of the drug's target across the same cell lines. More than 50% of the models built with the same data set or with independent data sets successfully predicted the testing set with significant correlation to the ground truth data. For more than 60% of the models, the features they selected were also significantly enriched in known drug transporters annotated in DrugBank. Novel drug-transporter interactions were discovered, such as lapatinib and gefitinib with ABCA1, olaparib and NVP-ADW742 with ABCC3, and gefitinib and AZ628 with SLC4A4. Furthermore, we identified ABCC3, SLC12A7, SLCO4A1, SERPINA1, and SLC22A3 as potential transporters for erlotinib, three of which are also significantly more highly expressed in patients who were resistant to therapy in a clinical trial.
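The transporter-enrichment check can be sketched with a hypergeometric test; all counts below are made-up illustrative numbers, not the paper's:

```python
# Given the features a model selected, test whether known DrugBank
# transporters are over-represented among them.
from scipy.stats import hypergeom

n_genes = 20000        # background gene universe (assumed size)
n_transporters = 300   # genes annotated as transporters (assumed count)
n_selected = 50        # features selected by the model
n_hits = 8             # selected features that are known transporters

# P(X >= n_hits) if n_selected genes were drawn at random
p_value = hypergeom.sf(n_hits - 1, n_genes, n_transporters, n_selected)
enriched = p_value < 0.05
```

With these numbers, random selection would yield under one transporter on average, so eight hits is strong evidence that the model is picking up transporter signal.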


2021
Author(s): Mitja Nikolaus, Juliette Maes, Jeremy Auguste, Laurent Prévot, Abdellah Fourtassi

Studies of children's language use in the wild (e.g., in the context of child-caregiver social interaction) have been slowed by the time- and resource-consuming task of hand annotating utterances for communicative intents/speech acts. Existing studies have typically focused on rather small samples of children, raising the question of how their findings generalize both to larger and more representative populations and to a richer set of interaction contexts. Here we propose a simple automatic model for speech act labeling in early childhood based on the INCA-A coding scheme (Ninio et al., 1994). After validating the model against ground truth labels, we automatically annotated the entire English-language data from the CHILDES corpus. The major theoretical result was that earlier findings generalize quite well at a large scale. Our model will be shared with the community so that researchers can use it with their data to investigate various questions related to language use development.

