The Phantom Pattern Problem

Author(s):  
Gary Smith ◽  
Jay Cordes

Pattern recognition prowess served our ancestors well. However, today we are confronted by a deluge of data that are far more abstract, complicated, and difficult to interpret than were annual seasons and the sounds of predators. The number of possible patterns that can be identified relative to the number that are genuinely useful has grown exponentially, which means that the chance that a discovered pattern is useful is rapidly approaching zero. Coincidental streaks, clusters, and correlations are the norm, not the exception. Computer algorithms can easily identify an essentially unlimited number of phantom patterns and relationships that vanish when confronted with fresh data. The paradox of big data is that the more data we ransack for patterns, the more likely it is that what we find will be worthless. Our challenge is to overcome our inherited inclination to think that all patterns are meaningful.
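The claim that coincidental correlations multiply as more data are searched can be illustrated with a small simulation. The sketch below is not taken from the book; it assumes independent Gaussian noise series, and the function names, the sample size of 20, and the 0.5 correlation cutoff are all hypothetical choices made for illustration. It counts how many pairs of unrelated series appear strongly correlated as the number of series grows.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def count_phantom_pairs(n_vars, n_obs, threshold, seed=0):
    """Count pairs of independent noise series with |r| >= threshold.

    Every series is pure noise, so any pair counted here is a
    coincidence by construction (illustrative assumption).
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return sum(
        1 for a, b in combinations(data, 2) if abs(pearson(a, b)) >= threshold
    )


for n_vars in (10, 50, 200):
    n_pairs = n_vars * (n_vars - 1) // 2
    hits = count_phantom_pairs(n_vars, n_obs=20, threshold=0.5)
    print(f"{n_vars} series, {n_pairs} pairs searched, {hits} 'strong' correlations")
```

The number of pairs grows roughly with the square of the number of series, so the count of coincidental "discoveries" grows with it even though no series carries any information about another.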

Author(s):  
Trevor J. Bihl ◽  
William A. Young II ◽  
Gary R. Weckman

Despite the natural advantage humans have for recognizing and interpreting patterns, large and complex datasets, as in Big Data, preclude efficient human analysis. Artificial neural networks (ANNs) provide a family of pattern recognition approaches for prediction, clustering, and classification applicable to KDD, with ANN model complexity ranging from simple (for small problems) to highly complex (for large issues). To provide a starting point for readers, this chapter first describes foundational concepts that relate to ANNs. A listing of commonly used ANN methods, heuristics, and criteria for initializing ANNs is then discussed. Common pre- and post-processing methods for dimensionality reduction and data quality issues are then described. The authors then provide a tutorial example of ANN analysis. Finally, the authors list and describe applications of ANNs to specific business-related endeavors for further reading.


Author(s):  
Trevor J. Bihl ◽  
William A. Young II ◽  
Gary R. Weckman

Despite the natural advantage humans have for recognizing and interpreting patterns, large and complex datasets, as in big data, preclude efficient human analysis. Artificial neural networks (ANNs) provide a family of pattern recognition approaches for prediction, clustering, and classification applicable to KDD, with ANN model complexity ranging from simple (for small problems) to highly complex (for large issues). To provide a starting point for readers, this chapter first describes foundational concepts that relate to ANNs. A listing of commonly used ANN methods, heuristics, and criteria for initializing ANNs is then discussed. Common pre- and post-processing methods for dimensionality reduction and data quality issues are then described. The authors then provide a tutorial example of ANN analysis. Finally, the authors list and describe applications of ANNs to specific business-related endeavors for further reading.
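For readers who want a concrete picture of the "simple (for small problems)" end of the ANN range, the following is a minimal sketch of a one-hidden-layer network trained by gradient descent. It is not the chapter's tutorial example, which is not reproduced in this abstract. The XOR task, the function name train_xor, the tanh and sigmoid activations, the random uniform weight initialization, and the hyperparameters are all assumptions chosen for illustration.

```python
import math
import random


def train_xor(hidden=4, epochs=2000, lr=0.5, seed=0):
    """Train a tiny feed-forward network on XOR with per-sample gradient descent.

    Architecture (illustrative): 2 inputs -> `hidden` tanh units -> 1 sigmoid output,
    trained on binary cross-entropy loss.
    """
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    # Small random initial weights; biases start at zero.
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
             for j in range(hidden)]
        z = sum(w2[j] * h[j] for j in range(hidden)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def mean_loss():
        total = 0.0
        for x, y in data:
            _, p = forward(x)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
        return total / len(data)

    initial = mean_loss()
    for _ in range(epochs):
        for x, y in data:
            h, p = forward(x)
            dz2 = p - y  # gradient of the loss w.r.t. the output pre-activation
            for j in range(hidden):
                # Backpropagate through tanh using the pre-update output weight.
                dz1 = dz2 * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * dz2 * h[j]
                w1[j][0] -= lr * dz1 * x[0]
                w1[j][1] -= lr * dz1 * x[1]
                b1[j] -= lr * dz1
            b2 -= lr * dz2
    return initial, mean_loss(), lambda x: forward(x)[1]


initial_loss, final_loss, predict = train_xor()
print(f"loss before training: {initial_loss:.3f}, after: {final_loss:.3f}")
for point in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(point, round(predict(point), 3))
```

Larger problems change the scale (more layers, more units, mini-batches, a library implementation) rather than the basic loop of forward pass, loss, and gradient update shown here.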


2020 ◽  
pp. 137-152
Author(s):  
Gary Smith ◽  
Jay Cordes

Attempts to replicate reported studies often fail because the research relied on data mining—searching through data for patterns without any pre-specified, coherent theories. The perils of data mining can be exacerbated by data torturing—slicing, dicing, and otherwise mangling data to create patterns. If there is no underlying reason for a pattern, it is likely to disappear when someone attempts to replicate the study. Big data and powerful computers are part of the problem, not the solution, in that they can easily identify an essentially unlimited number of phantom patterns and relationships, which vanish when confronted with fresh data. If a researcher will benefit from a claim, it is likely to be biased. If a claim sounds implausible, it is probably misleading. If the statistical evidence sounds too good to be true, it probably is.
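The replication failure described above can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' analysis: the target and all candidate predictors are independent Gaussian noise, and the function names and sample sizes are hypothetical. It mines many candidates for the one that best correlates with the target, then checks that same candidate against fresh data.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mine_then_replicate(n_candidates=1000, n_obs=20, seed=0):
    """Pick the noise predictor that best fits the original sample,
    then measure its correlation with the target on a fresh sample.

    Returns (best in-sample r, that candidate's out-of-sample r).
    """
    rng = random.Random(seed)
    noise = lambda: [rng.gauss(0, 1) for _ in range(n_obs)]
    target_old, target_new = noise(), noise()
    best_old, best_new = 0.0, 0.0
    for _ in range(n_candidates):
        cand_old, cand_new = noise(), noise()
        r_old = pearson(cand_old, target_old)
        if abs(r_old) > abs(best_old):
            best_old = r_old
            best_new = pearson(cand_new, target_new)
    return best_old, best_new


r_mined, r_fresh = mine_then_replicate()
print(f"best correlation found by mining: {r_mined:+.2f}")
print(f"same predictor on fresh data:     {r_fresh:+.2f}")
```

Because there is no underlying reason for the mined relationship, the in-sample correlation reflects only the size of the search, and the fresh-data correlation is just another random draw.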


10.1142/10153 ◽  
2016 ◽  
Author(s):  
Amita Pal ◽  
Sankar K Pal

2018 ◽  
Author(s):  
Shefali Setia Verma ◽  
Anurag Verma ◽  
Dokyoon Kim ◽  
Christian Darabos

Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2328 ◽  
Author(s):  
Alireza Entezami ◽  
Hassan Sarmadi ◽  
Behshid Behkamal ◽  
Stefano Mariani

Recent advances in sensor technologies and data acquisition systems have opened up the era of big data in the field of structural health monitoring (SHM). Data-driven methods based on statistical pattern recognition provide outstanding opportunities to implement a long-term SHM strategy by exploiting measured vibration data. However, their main limitation, due to big data or high-dimensional features, is linked to the complex and time-consuming procedures for feature extraction and/or statistical decision-making. To cope with this issue, in this article we propose a strategy based on autoregressive moving average (ARMA) modeling for feature extraction, and on an innovative hybrid divergence-based method for feature classification. Data from a cable-stayed bridge are used to assess the effectiveness and efficiency of the proposed method. The results show that the proposed hybrid divergence-based method, in conjunction with ARMA modeling, succeeds in detecting damage in cases strongly characterized by big data.
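The general idea of time-series feature extraction followed by a divergence-based comparison can be sketched as follows. This is a deliberately simplified stand-in, not the article's method: it fits a pure AR(2) model by the Yule-Walker equations instead of a full ARMA model, and it uses the Kullback-Leibler divergence between zero-mean Gaussian residual distributions instead of the authors' hybrid divergence, whose details are not given in the abstract. The simulated signals, coefficients, and function names are all hypothetical.

```python
import math
import random


def simulate_ar2(phi1, phi2, n, rng, burn_in=200):
    """Simulate n samples of an AR(2) process driven by unit-variance noise."""
    x = [0.0, 0.0]
    for _ in range(n + burn_in):
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0, 1))
    return x[-n:]


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def fit_ar2(x):
    """Estimate AR(2) coefficients from the Yule-Walker equations."""
    r1, r2 = autocorr(x, 1), autocorr(x, 2)
    phi1 = r1 * (1 - r2) / (1 - r1 ** 2)
    phi2 = (r2 - r1 ** 2) / (1 - r1 ** 2)
    return phi1, phi2


def residual_variance(x, phi1, phi2):
    """Variance of one-step prediction residuals under a fixed AR(2) model."""
    res = [x[t] - phi1 * x[t - 1] - phi2 * x[t - 2] for t in range(2, len(x))]
    m = sum(res) / len(res)
    return sum((r - m) ** 2 for r in res) / len(res)


def gaussian_kl(var_p, var_q):
    """KL divergence between zero-mean Gaussians N(0, var_p) and N(0, var_q)."""
    ratio = var_p / var_q
    return 0.5 * (ratio - 1 - math.log(ratio))


rng = random.Random(0)
baseline = simulate_ar2(0.6, -0.3, 2000, rng)    # reference (undamaged) state
phi = fit_ar2(baseline)
ref_var = residual_variance(baseline, *phi)

healthy = simulate_ar2(0.6, -0.3, 2000, rng)     # same dynamics as baseline
damaged = simulate_ar2(-0.2, -0.3, 2000, rng)    # altered dynamics (hypothetical damage)
kl_healthy = gaussian_kl(residual_variance(healthy, *phi), ref_var)
kl_damaged = gaussian_kl(residual_variance(damaged, *phi), ref_var)
print(f"divergence, healthy: {kl_healthy:.4f}  damaged: {kl_damaged:.4f}")
```

A baseline model that no longer predicts the signal well leaves larger residuals, so the divergence from the reference residual distribution rises; a real SHM pipeline would need a threshold calibrated on undamaged data and would have to account for environmental variability.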

