Limitations of CNNs for Approximating the Ideal Observer Despite Quantity of Training Data or Depth of Network

Author(s):  
Khalid Omer ◽  
Luca Caucci ◽  
Meredith Kupinski

The performance of a convolutional neural network (CNN) on an image texture detection task is investigated as a function of linear image processing and the number of training images. Performance is quantified by the area under the receiver operating characteristic (ROC) curve (AUC). The Ideal Observer (IO) maximizes AUC but depends on high-dimensional image likelihoods. In many cases, CNN performance can approximate IO performance. This work demonstrates counterexamples in which a full-rank linear transform degrades CNN performance below the IO even in the limit of large quantities of training data and network layers. A subsequent linear transform changes the images’ correlation structure, improves the AUC, and again demonstrates the CNN’s dependence on linear processing. Compression strictly decreases or maintains IO detection performance, whereas compression can increase CNN performance, especially for small quantities of training data. Results indicate an optimal compression ratio for the CNN based on task difficulty, compression method, and number of training images.
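The IO’s invariance to full-rank linear processing can be checked numerically. The sketch below assumes, for illustration, zero-mean Gaussian texture classes with unequal covariances, the setting in which the IO reduces to a closed-form quadratic test statistic; the dimensions, seed, and helper names are invented here, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 5000  # illustrative ambient dimension and samples per class

# Two zero-mean Gaussian "texture" classes with unequal covariances.
A0, A1 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
K0, K1 = A0 @ A0.T + d * np.eye(d), A1 @ A1.T + d * np.eye(d)
x0 = rng.multivariate_normal(np.zeros(d), K0, n)  # class-0 samples
x1 = rng.multivariate_normal(np.zeros(d), K1, n)  # class-1 samples

def io_stat(x, K0, K1):
    # IO test statistic for zero-mean Gaussians: the quadratic form
    # x^T (K0^-1 - K1^-1) x, which is monotone in the likelihood ratio.
    Q = np.linalg.inv(K0) - np.linalg.inv(K1)
    return np.einsum('ij,jk,ik->i', x, Q, x)

def auc(s0, s1):
    # Empirical AUC: fraction of (class-1, class-0) score pairs ranked correctly.
    return (s1[:, None] > s0[None, :]).mean()

a_orig = auc(io_stat(x0, K0, K1), io_stat(x1, K0, K1))

# Apply an (almost surely) invertible linear transform T to every image;
# the IO recomputed on the transformed covariances yields the same statistic.
T = rng.standard_normal((d, d))
y0, y1 = x0 @ T.T, x1 @ T.T
K0t, K1t = T @ K0 @ T.T, T @ K1 @ T.T
a_trans = auc(io_stat(y0, K0t, K1t), io_stat(y1, K0t, K1t))

print(a_orig, a_trans)  # equal up to floating-point error
```

Algebraically, y^T (K0t^-1 - K1t^-1) y collapses back to x^T (K0^-1 - K1^-1) x, so every sample keeps its score and the AUC is unchanged; a CNN trained on the transformed images enjoys no such guarantee.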

2020 ◽  
Vol 2020 (10) ◽  
pp. 310-1-310-7
Author(s):  
Khalid Omer ◽  
Luca Caucci ◽  
Meredith Kupinski

This work reports on convolutional neural network (CNN) performance on an image texture classification task as a function of linear image processing and the number of training images. Detection performance of single- and multi-layer CNNs (sCNN/mCNN) is compared to that of optimal observers. Performance is quantified by the area under the receiver operating characteristic (ROC) curve, also known as the AUC; AUC = 1.0 corresponds to perfect detection and AUC = 0.5 to guessing. The Ideal Observer (IO) maximizes AUC but is prohibitive in practice because it depends on high-dimensional image likelihoods. IO performance is invariant to any full-rank, invertible linear image processing. This work demonstrates the existence of full-rank, invertible linear transforms that can degrade both sCNN and mCNN performance even in the limit of large quantities of training data. A subsequent invertible linear transform changes the images’ correlation structure again and can improve this AUC. Stationary textures sampled from zero-mean, unequal-covariance Gaussian distributions allow closed-form analytic expressions for the IO and for optimal linear compression. Linear compression is a mitigation technique for high-dimension, low-sample-size (HDLSS) applications. By definition, compression strictly decreases or maintains IO detection performance. For small quantities of training data, linear image compression prior to the sCNN architecture can increase AUC from 0.56 to 0.93. Results indicate an optimal compression ratio for the CNN based on task difficulty, compression method, and number of training images.
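The zero-mean Gaussian setting suggests a concrete linear compression rule. The sketch below is one plausible heuristic, not necessarily the paper’s optimal compression: simultaneously diagonalize the two class covariances via a generalized eigendecomposition and keep the m directions whose eigenvalue lies farthest from 1 (eigenvalue 1 means the classes agree along that direction). All dimensions and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 16, 4, 4000  # ambient dim, compressed dim, samples per class

A0, A1 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
K0, K1 = A0 @ A0.T + d * np.eye(d), A1 @ A1.T + d * np.eye(d)
x0 = rng.multivariate_normal(np.zeros(d), K0, n)
x1 = rng.multivariate_normal(np.zeros(d), K1, n)

def io_auc(x0, x1, K0, K1):
    # Closed-form IO for zero-mean Gaussians, scored by empirical AUC.
    Q = np.linalg.inv(K0) - np.linalg.inv(K1)
    s0 = np.einsum('ij,jk,ik->i', x0, Q, x0)
    s1 = np.einsum('ij,jk,ik->i', x1, Q, x1)
    return (s1[:, None] > s0[None, :]).mean()

# Generalized eigenproblem K1 v = lam * K0 v via Cholesky whitening
# (numpy-only equivalent of scipy.linalg.eigh(K1, K0)).
L = np.linalg.cholesky(K0)
Li = np.linalg.inv(L)
lam, U = np.linalg.eigh(Li @ K1 @ Li.T)
V = Li.T @ U  # columns are generalized eigenvectors

# Heuristic "detectability" ranking: keep eigendirections with lam far from 1.
keep = np.argsort(np.abs(np.log(lam)))[::-1][:m]
C = V[:, keep].T  # m x d compression matrix

a_full = io_auc(x0, x1, K0, K1)
a_comp = io_auc(x0 @ C.T, x1 @ C.T, C @ K0 @ C.T, C @ K1 @ C.T)
print(a_full, a_comp)  # compression can only maintain or lower the IO AUC
```

The point of the abstract is that the ordering above holds only for the IO: a small CNN trained on the m-dimensional compressed images can outperform the same CNN trained on the raw d-dimensional images when training data are scarce.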


2020 ◽  
Vol 2020 (16) ◽  
pp. 41-1-41-7
Author(s):  
Orit Skorka ◽  
Paul J. Kane

Many of the metrics developed for informational imaging are useful in automotive imaging, since many of the tasks (for example, object detection and identification) are similar. This work discusses sensor characterization parameters for the Ideal Observer SNR model and elaborates on the noise power spectrum. It presents cross-correlation analysis results for matched-filter detection of a tribar pattern in sets of resolution-target images captured with three image sensors over a range of illumination levels. Lastly, the work compares the cross-correlation data to predictions made by the Ideal Observer model and demonstrates good agreement between the two methods on relative evaluation of detection capabilities.
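Matched-filter detection by cross-correlation, as applied here to a tribar target, can be sketched in a few lines. The template geometry, scene size, and noise level below are invented for illustration; a real characterization pipeline would use the measured sensor noise power spectrum rather than white Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical tribar template: three bright vertical bars on a dark field.
template = np.zeros((15, 15))
template[:, 2:4] = template[:, 7:9] = template[:, 12:14] = 1.0

# Embed the pattern in a larger noisy "capture".
scene = rng.normal(0.0, 0.5, (64, 64))
scene[20:35, 20:35] += template

def matched_filter_peak(img, tmpl):
    # Slide the zero-mean template over the image and return the maximum
    # cross-correlation value (valid-mode 2-D correlation, numpy only).
    t = tmpl - tmpl.mean()  # zero-mean: response ignores local DC offsets
    H, W = img.shape
    h, w = t.shape
    best = -np.inf
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            best = max(best, float(np.sum(img[i:i+h, j:j+w] * t)))
    return best

present = matched_filter_peak(scene, template)
absent = matched_filter_peak(rng.normal(0.0, 0.5, (64, 64)), template)
print(present > absent)  # peak is larger when the pattern is present
```

Thresholding the peak value turns this into a detector; sweeping the noise level (standing in for illumination) traces out the detection-capability curves compared against the Ideal Observer SNR model in the paper.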


2015 ◽  
Vol 114 (6) ◽  
pp. 3076-3096 ◽  
Author(s):  
Ryan M. Peters ◽  
Phillip Staibano ◽  
Daniel Goldreich

The ability to resolve the orientation of edges is crucial to daily tactile and sensorimotor function, yet the means by which edge perception occurs is not well understood. Primate cortical area 3b neurons have diverse receptive field (RF) spatial structures that may participate in edge orientation perception. We evaluated five candidate RF models for macaque area 3b neurons, previously recorded while an oriented bar contacted the monkey's fingertip. We used a Bayesian classifier to assign each neuron a best-fit RF structure. We generated predictions for human performance by implementing an ideal observer that optimally decoded stimulus-evoked spike counts in the model neurons. The ideal observer predicted a saturating reduction in bar orientation discrimination threshold with increasing bar length. We tested 24 humans on an automated, precision-controlled bar orientation discrimination task and observed performance consistent with that predicted. We next queried the ideal observer to discover the RF structure and number of cortical neurons that best matched each participant's performance. Human perception was matched with a median of 24 model neurons firing throughout a 1-s period. The 10 lowest-performing participants were fit with RFs lacking inhibitory sidebands, whereas 12 of the 14 higher-performing participants were fit with RFs containing inhibitory sidebands. Participants whose discrimination improved as bar length increased to 10 mm were fit with longer RFs; those who performed well on the 2-mm bar, with narrower RFs. These results suggest plausible RF features and computational strategies underlying tactile spatial perception and may have implications for perceptual learning.
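The decoding step described above — an ideal observer that optimally reads out stimulus-evoked spike counts from a model population — can be sketched with a toy Poisson population. The cosine-squared tuning curves, rates, and population size below are illustrative stand-ins, not the paper’s fitted area 3b RF models.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy orientation-tuned population: 24 neurons, Poisson counts over 1 s.
n_neurons = 24
pref = np.linspace(0, np.pi, n_neurons, endpoint=False)  # preferred angles

def rate(th):
    # Cosine-squared orientation tuning (period pi), 5 Hz baseline.
    return 5.0 + 20.0 * np.cos(2 * (th - pref))**2

def ideal_observer_accuracy(th0, th1, trials=2000):
    # Ideal observer for two known orientations and independent Poisson
    # counts k_i: pick the orientation with the larger log-likelihood
    # sum_i [k_i * log(f_i(th)) - f_i(th)].
    r0, r1 = rate(th0), rate(th1)
    correct = 0
    for label, true_r in ((0, r0), (1, r1)):
        k = rng.poisson(true_r, (trials, n_neurons))
        ll0 = (k * np.log(r0) - r0).sum(axis=1)
        ll1 = (k * np.log(r1) - r1).sum(axis=1)
        choice = (ll1 > ll0).astype(int)
        correct += (choice == label).sum()
    return correct / (2 * trials)

# Discrimination gets easier as the two bar orientations separate:
accs = []
for d_deg in (2, 5, 10):
    d = np.deg2rad(d_deg)
    accs.append(ideal_observer_accuracy(np.pi / 4 - d / 2, np.pi / 4 + d / 2))
print(accs)
```

Sweeping the angular separation until accuracy crosses a criterion (e.g., 75% correct) yields the discrimination threshold, which is the quantity the paper matches against each human participant to infer RF structure and neuron count.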


2001 ◽  
Author(s):  
Hongbin Zhang ◽  
Eric Clarkson ◽  
Harrison H. Barrett

2015 ◽  
Vol 15 (12) ◽  
pp. 1341
Author(s):  
Steven Shimozaki ◽  
Eleanor Swan ◽  
Claire Hutchinson ◽  
Jaspreet Mahal
