Cluster Validation

Author(s):  
Ricardo Vilalta
Tomasz Stepinski

Spacecraft orbiting a selected suite of planets and moons of our solar system continuously send long sequences of data back to Earth. The availability of such data provides an opportunity to invoke tools from machine learning and pattern recognition to extract patterns that can help us understand the geological processes shaping planetary surfaces. Given the scientific community's marked interest in Mars, we base our current discussion on that planet, where three spacecraft are presently in orbit (NASA's Mars Odyssey Orbiter and Mars Reconnaissance Orbiter, and ESA's Mars Express). Despite the abundance of data describing the Martian surface, only a small fraction of it is analyzed in detail, because current techniques for the analysis of planetary surfaces rely on simple visual inspection and descriptive characterization of surface landforms (Wilhelms, 1990). The demand for automated analysis of the Martian surface has prompted the use of machine learning and pattern recognition tools to generate geomorphic maps, which are thematic maps of landforms (or topographical expressions); examples of landforms are craters, valley networks, hills, and basins. Machine learning can play a vital role in automating the process of geomorphic mapping. A learning system can be employed either to fully automate the discovery of meaningful landform classes using clustering techniques, or to predict the class of unlabeled landforms (after an expert has manually labeled a representative sample) using classification techniques. The impact of these techniques on the analysis of Mars topography can be of immense value given the sheer size of the Martian surface that remains unmapped.
While it is now clear that machine learning can greatly help in automating the detailed analysis of Mars' surface (Stepinski et al., 2007; Stepinski et al., 2006; Bue and Stepinski, 2006; Stepinski and Vilalta, 2005), an interesting problem arises when automated data analysis produces a novel classification of a specific site's landforms. The problem lies in interpreting this new classification relative to traditionally derived classifications generated through visual inspection by domain experts. Is the new classification novel in all senses? Or is it only partially novel, with many landforms matching existing classifications? This article discusses how to assess the value of clusters generated by machine learning tools as applied to the analysis of Mars' surface.
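The agreement between a machine-generated clustering and an expert-derived classification can be quantified. As one standard illustration (not the specific metric used by the authors), the adjusted Rand index scores how well two partitions of the same landforms agree, corrected for chance agreement; the landform labels below are hypothetical:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two partitions of the same items, corrected
    for chance: 1.0 means identical partitions, ~0.0 random agreement."""
    n = len(labels_a)
    # Contingency counts: how many items share a (cluster, class) pair
    contingency = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)

# Hypothetical machine clusters vs. expert landform labels
machine = [0, 0, 1, 1, 2, 2]
expert = ["crater", "crater", "valley", "valley", "hill", "basin"]
print(round(adjusted_rand_index(machine, expert), 3))  # → 0.762
```

A score near 1 indicates the clustering rediscovers the expert classes; a low score flags clusters that may be genuinely novel and worth expert scrutiny.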

2020
Vol 30 (3)
pp. 112-126
Author(s):  
S. V. Palmov

Data analysis carried out with machine learning tools now covers almost all areas of human activity. This is due to the large amounts of data that need to be processed, for example, to predict the occurrence of specific events (an emergency, a customer contacting the organization's technical support, a natural disaster, etc.) or to formulate recommendations regarding interaction with a certain group of people (personalized offers for a customer, a person's reaction to advertising, etc.). The paper examines the capabilities of the Multitool analytical system, built on the decision-tree machine learning method, for constructing predictive models suitable for solving practical data analysis problems. For this purpose, a series of ten experiments was conducted in which the results generated by the system were evaluated for reliability and robustness using five criteria: arithmetic mean, standard deviation, variance, probability, and F-measure. It was found that Multitool, despite its limited functionality, makes it possible to create predictive models of sufficient quality for practical use.
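A minimal sketch of such an evaluation, using Python's standard library: the run accuracies and confusion counts below are made up for illustration, not taken from the paper.

```python
from statistics import mean, pstdev, pvariance

def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical model accuracies observed over ten experimental runs
runs = [0.91, 0.88, 0.90, 0.92, 0.89, 0.90, 0.91, 0.87, 0.93, 0.90]
print("mean:", round(mean(runs), 4))       # central tendency of the runs
print("stdev:", round(pstdev(runs), 4))    # spread across the runs
print("variance:", round(pvariance(runs), 6))
print("F1:", round(f_measure(tp=45, fp=5, fn=10), 3))  # → 0.857
```

A low standard deviation across runs supports the robustness claim; the F-measure summarizes per-class prediction quality in a single number.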


Author(s):  
Bo-Wei Chen
Jia-Ching Wang

This chapter discusses missing-value problems from the perspective of machine learning. Missing values frequently occur during data acquisition. When a dataset contains missing values, nonvectorial data are generated, which causes a serious problem in pattern recognition models because nonvectorial data require further wrangling before models are built. In view of this, the chapter reviews the methodologies of related works and examines their empirical effectiveness. A great deal of effort has been devoted to this field, and the existing works can be roughly divided into two types: multiple imputation and single imputation, where the latter can be further classified into subcategories. These include deletion, fixed-value replacement, K-nearest neighbors, regression, tree-based algorithms, and latent component-based approaches. The chapter introduces and discusses these approaches, and finally provides numerical examples along with recommendations for future development.
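As one concrete example from the single-imputation family, a K-nearest-neighbors imputer fills a missing entry with the average of that feature over the k most similar complete rows. The following is an illustrative sketch, not code from the chapter; it assumes each row has at most one missing (None) entry:

```python
import math

def knn_impute(rows, k=2):
    """Fill missing entries (None) using the mean of that feature over
    the k nearest complete rows (distance computed on shared features)."""
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum((x - y) ** 2 for x, y in shared))

    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        neighbors = sorted(complete, key=lambda c: dist(r, c))[:k]
        filled.append([v if v is not None
                       else sum(nb[j] for nb in neighbors) / len(neighbors)
                       for j, v in enumerate(r)])
    return filled

data = [[1.0, 2.0], [1.2, 2.1], [9.0, 9.5], [1.1, None]]
# The None is filled from the two nearest rows: (2.0 + 2.1) / 2 = 2.05
print(knn_impute(data, k=2))
```

Unlike fixed-value replacement, this respects local structure: the distant row [9.0, 9.5] does not influence the imputed value.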


Author(s):  
Khalid K. Al-jabery
Tayo Obafemi-Ajayi
Gayla R. Olbricht
Donald C. Wunsch II

2019
Vol 15 (10)
pp. 155014771988160
Author(s):  
Jersson X Leon-Medina
Leydi J Cardenas-Flechas
Diego A Tibaduiza

Electronic tongue-type sensor arrays are devices used to determine the quality of substances; they seek to imitate the main components of the human sense of taste. For this purpose, an electronic tongue-based system makes use of sensors, data acquisition systems, and a pattern recognition system. In the latter, machine learning techniques are useful for data analysis and have been used to solve classification and regression problems. However, one of the difficulties with this kind of device is developing reliable pattern recognition algorithms and robust data analysis. This work therefore introduces a taste recognition methodology composed of several steps: unfolding the data, normalizing it, compressing it with principal component analysis, and classifying it with different machine learning models. The proposed methodology is tested using data from an electronic tongue exposed to 13 different liquid substances; this electronic tongue uses multifrequency large-amplitude pulse signal voltammetry. Results show that the methodology performs the classification accurately, and the best results in terms of accuracy are obtained when K-nearest neighbors is used, compared with other machine learning approaches. In addition, the methodology is evaluated with several classification performance measures, each of which summarizes the behavior of the process in a single number.
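A minimal sketch of that pipeline (standardization, PCA compression, k-NN classification) might look as follows. The sensor readings and substance names here are synthetic stand-ins, and the unfolding of the voltammetric signals into one row per sample is assumed to have been done already:

```python
import numpy as np

def pca_knn(X_train, y_train, X_test, n_components=2, k=1):
    """Standardize, compress with PCA, classify with k-nearest neighbors."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-12       # avoid division by zero
    Xtr = (X_train - mu) / sigma
    Xte = (X_test - mu) / sigma
    # PCA: principal axes are the right singular vectors of the data matrix
    _, _, Vt = np.linalg.svd(Xtr - Xtr.mean(axis=0), full_matrices=False)
    W = Vt[:n_components].T                   # projection matrix
    Ztr, Zte = Xtr @ W, Xte @ W               # compressed representations
    preds = []
    for z in Zte:
        nearest = np.argsort(np.linalg.norm(Ztr - z, axis=1))[:k]
        votes = [y_train[i] for i in nearest]
        preds.append(max(set(votes), key=votes.count))  # majority vote
    return preds

# Synthetic "substances": two well-separated groups of sensor readings
X_train = np.array([[0.0, 0.0, 0.1], [0.1, 0.0, 0.0],
                    [5.0, 5.1, 5.0], [5.1, 5.0, 5.0]])
y_train = ["water", "water", "coffee", "coffee"]
X_test = np.array([[0.05, 0.0, 0.05], [5.05, 5.05, 5.0]])
print(pca_knn(X_train, y_train, X_test))   # → ['water', 'coffee']
```

PCA reduces the high-dimensional voltammetric features before the distance-based classifier, which helps k-NN avoid the curse of dimensionality.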


2021
Author(s):  
Andrew Imrie

Cement bond log interpretation methods consist of human pattern recognition and evaluation of the quality of the downhole isolation. Typically, a log interpreter compares acquisition data to their predefined classifications of cement bond quality. This paper outlines a complementary technique of intelligent cement evaluation: the analysis of cement evaluation data by means of automatic pattern matching and machine learning. The proposed method can define bond quality across multiple distinct subclassifications through analysis of image data using pattern recognition. Libraries of real log responses are used as comparisons to the input data and may additionally be supplemented with synthetic data. Using machine learning and image-based pattern recognition, the bond quality is classified into succinct categories to determine the presence of channeling. Successful classifications of the input data can then be added to the libraries, thus improving future analysis through an iterative process. The system uses the outputs of a conventional azimuthal ultrasonic scanning cement evaluation log and a 5-ft CBL waveform to arrive at a cement bond interpretation. The 5-ft CBL waveform is an optional addition to the process and improves the interpretation. The system searches for similarities between the acquisition data and that contained in the library; these similarities are compared to evaluate the bonding. The process is described in two parts: i) image collection and library classification and ii) pattern recognition and interpretation. The former is the process of generating a readable library of reference data from historical cement evaluation logs and laboratory measurements; the latter is the machine learning and comparison method. Example results show good correlations between automated analysis and interpreter analysis.
The system is shown to be particularly capable of automatically identifying channels of varying sizes, something that would be a challenge using only the scalar curve representation of azimuthal data. Previously published methodologies for automated classification of bond quality typically use scalar data, whereas this approach uses image-based pattern recognition for automated, learning, and intelligent cement evaluation (ALICE). A discussion is presented on the limitations and merits of the ALICE process, which include quality control, the removal of analyst bias during interpretation, and the fact that such a system continually improves in accuracy through supervised training.
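The library-matching idea can be illustrated with a toy nearest-template classifier: each input image is scored against labeled reference images by normalized cross-correlation, and the label of the best match is returned. This is a hypothetical stand-in for the ALICE matcher, not the published implementation:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float((a * b).mean())

def classify_bond(image, library):
    """Return the label whose reference images best match `image`."""
    scores = {label: max(ncc(image, ref) for ref in refs)
              for label, refs in library.items()}
    return max(scores, key=scores.get)

# Toy 8x8 "azimuthal maps": a channel appears as a low-amplitude vertical stripe
rng = np.random.default_rng(0)
good_ref = rng.normal(1.0, 0.01, (8, 8))          # well bonded: mild texture
channel_ref = np.ones((8, 8)); channel_ref[:, 3] = 0.0
library = {"good bond": [good_ref], "channel": [channel_ref]}

test_img = np.ones((8, 8)); test_img[:, 3] = 0.2  # fainter stripe, same place
print(classify_bond(test_img, library))           # → channel
```

Because the correlation is normalized, the stripe is recognized even at a different amplitude, mirroring the paper's point that image-based matching handles channels of varying sizes better than scalar curves.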


Author(s):  
Sabrina Bagnato
Antonina Barreca
Roberta Costantini
Francesca Quintiliani

The current uncertain, dynamic scenario calls for a systemic perspective on organizational complexity and behavior. Our research contributes to the analysis of organizational complexity through multidimensional behavioral mapping. Our method uses machine learning tools to detect the interconnections between the different behaviors of a person in their operating context. First, the research project prototyped a model for reading organizational behavior, the related detection tool, and a data analysis methodology. It used machine learning tools and ended with a data visualization phase. We built the model for reading organizational behavior by comparing benchmark theories from the literature with our field experience. The model was organized around 4 areas and 16 behaviors, which were the basis for singling out the indicators and the questionnaire items. The data analysis methodology aimed at detecting the interconnections between behaviors. We designed it by joining univariate analysis with a multivariate technique based on machine learning tools. This led to a high-resolution network map through three specific steps: (a) creating a multidimensional topology based on a Kohonen map (a type of unsupervised artificial neural network) to geometrically represent behavioral relationships; (b) applying k-means clustering to identify which areas of the map share behavior similarity or affinity factors; and (c) locating people and the identified clusters within the map. The research highlighted the validity of machine learning tools for detecting the multidimensionality of organizational behavior. We could therefore delineate the networking of the observed elements and visualize an otherwise unattainable complexity through multimedia and interactive reporting. The applied part of the research consisted of the design and development of a prototype integrated with our LMS platform via a plugin.
Field experimentation confirmed the effectiveness of the method for creating professional growth and development paths. Furthermore, this experimentation allowed us to obtain significant data by applying our model to several sectors, namely pharmaceutical, TLC, banking, automotive, machinery, and services.
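Step (b) of the pipeline, k-means clustering over the map, can be sketched as follows; the 2-D points and initial centroids are illustrative, and the preceding Kohonen-map step is omitted:

```python
import numpy as np

def kmeans(points, centroids, iters=20):
    """Plain k-means: alternately assign each point to its nearest
    centroid, then move each centroid to the mean of its members."""
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :],
                               axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(len(centroids))])
    return labels, centroids

# Two illustrative behavioral groupings on a 2-D map
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centers = kmeans(points, centroids=points[[0, 3]])
print(labels)   # → [0 0 0 1 1 1]
```

In the described method, the points would be positions on the trained Kohonen map, so the clusters delineate regions of the map sharing behavioral affinity.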


Science
2019
Vol 366 (6468)
pp. 999-1004
Author(s):  
Philip S. Thomas
Bruno Castro da Silva
Andrew G. Barto
Stephen Giguere
Yuriy Brun
...  

Intelligent machines using machine learning algorithms are ubiquitous, ranging from simple data analysis and pattern recognition tools to complex systems that achieve superhuman performance on various tasks. Ensuring that they do not exhibit undesirable behavior—that they do not, for example, cause harm to humans—is therefore a pressing problem. We propose a general and flexible framework for designing machine learning algorithms. This framework simplifies the problem of specifying and regulating undesirable behavior. To show the viability of this framework, we used it to create machine learning algorithms that precluded the dangerous behavior caused by standard machine learning algorithms in our experiments. Our framework for designing machine learning algorithms simplifies the safe and responsible application of machine learning.


Author(s):  
Roberto Tagliaferri
Francesco Iorio
Francesco Napolitano
Giancarlo Raiconi
Gennaro Miele

2020
Vol 14 (1)
pp. 6
Author(s):  
Dianna McAllister
Mauro Mendez
Ariana Bermúdez
Pascal Tyrrell

Introduction: Convolutional neural networks (CNNs) are machine learning tools with great potential in the field of medical imaging. However, a CNN is often regarded as a "black box" because the process the machine uses to reach a result is not transparent. It would be valuable to have a method for understanding how the machine comes to its decision. The purpose of this study is therefore to examine how effective gradient-weighted class activation mapping (grad-CAM) visualizations are for certain layers in a CNN-based dental x-ray artifact prediction model. Methods: Python code using PyTorch was written to train a CNN to classify dental plates as usable or unusable depending on the presence of artifacts, and to overlay grad-CAM visualizations on the input images for various layers within the model. One image with seventeen different artifact overlays was used in this study. Results: In earlier layers the model appeared to focus on general features such as lines and edges of the teeth, while in later layers it attended to more detailed aspects of the image. For all images containing artifacts, the model focused on detailed areas of the image rather than on the artifacts themselves, whereas for images without artifacts the model focused on the areas surrounding the teeth. Discussion and Conclusion: Because subsequent layers examined more detailed aspects of the image, as shown by the grad-CAM visualizations, they provided better insight into how the model processes information when making its classifications. Since all the images with artifacts showed similar trends in the visualizations across layers, the evidence suggests that the location and size of an artifact do not affect the model's pattern recognition and image classification.
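The grad-CAM computation itself is simple once a layer's activations and gradients are in hand: each channel is weighted by its globally averaged gradient, the weighted channels are summed, and a ReLU keeps only the evidence in favor of the predicted class. A minimal NumPy sketch of this combination step (shapes and values are illustrative, not from the study):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heat map for one layer.
    activations, gradients: arrays of shape (channels, height, width)."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k: pooled gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k
    cam = np.maximum(cam, 0.0)                        # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                         # scale to [0, 1] for overlay
    return cam

# Two 2x2 channels: only channel 0 carries signal and a nonzero gradient
acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 0.0], [0.0, 1.0]]])
grads = np.array([[[1.0, 1.0], [1.0, 1.0]],
                  [[0.0, 0.0], [0.0, 0.0]]])
print(grad_cam(acts, grads))   # heat map highlights the top-left pixel
```

In practice the activations and gradients come from a forward and backward pass through the chosen layer (e.g. via PyTorch hooks), and the normalized map is upsampled and overlaid on the input x-ray.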

