Machine Learning and Data Mining in Pattern Recognition

2005 ◽  
2020 ◽  
Author(s):  
Yasuhiro Date ◽  
Feifei Wei ◽  
Yuuri Tsuboi ◽  
Kengo Ito ◽  
Kenji Sakata ◽  
...  

Abstract Nuclear magnetic resonance (NMR)-based relaxometry is widely used in various fields of research because of its advantages such as simple sample preparation, easy handling, and relatively low cost compared with metabolomics approaches. However, there have been no reports on the application of the T2 relaxation curves in metabolomics studies involving the evaluation of metabolic mixtures, such as geographical origin determination and feature extraction by pattern recognition and data mining. In this study, we describe a data mining method for relaxometric data (i.e., relaxometric learning). This method is based on a machine learning algorithm supported by the analytical framework optimized for the relaxation curve analyses. In the analytical framework, we incorporated a variable optimization approach and bootstrap resampling-based matrixing to enhance the classification performance and balance the sample size between groups, respectively. The relaxometric learning enabled the extraction of features related to the physical properties of fish muscle and the determination of the geographical origin of the fish by improving the classification performance. Our results suggest that relaxometric learning is a powerful and versatile alternative to conventional metabolomics approaches for evaluating fleshiness of chemical mixtures in food and for other biological and chemical research requiring a nondestructive, cost-effective, and time-saving method.
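The abstract does not spell out how the bootstrap resampling-based matrixing works, so the following is only a minimal sketch of the general idea it names: resampling with replacement so that every group contributes the same number of samples. The function name `bootstrap_balance`, the group labels, and the toy decay curves are all hypothetical and are not taken from the study.

```python
import random

def bootstrap_balance(groups, n_per_group, seed=0):
    """Draw n_per_group samples with replacement from every group."""
    rng = random.Random(seed)
    return {
        label: [rng.choice(curves) for _ in range(n_per_group)]
        for label, curves in groups.items()
    }

# Made-up toy decay curves (signal intensity at three echo times).
groups = {
    "origin_A": [(1.0, 0.61, 0.37), (1.0, 0.58, 0.35)],
    "origin_B": [(1.0, 0.70, 0.49), (1.0, 0.72, 0.52),
                 (1.0, 0.69, 0.47), (1.0, 0.71, 0.50)],
}

balanced = bootstrap_balance(groups, n_per_group=5)
for label, curves in balanced.items():
    print(label, len(curves))
```

After resampling, both groups hold five curves each, so a downstream classifier is not biased toward the larger group simply because of its size.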


Author(s):  
Yan Zhao ◽  
Yiyu Yao

Classification is one of the main tasks in machine learning, data mining, and pattern recognition. Compared with the extensively studied automated approaches, interactive approaches, which are centered on human users, are less explored. This chapter studies interactive classification at three levels. At the philosophical level, the motivations and a process-based framework of interactive classification are proposed. At the technical level, a granular computing model is suggested for re-examining not only existing classification problems, but also interactive classification problems. At the application level, an interactive classification system (ICS), using a granule network as the search space, is introduced. ICS allows multiple strategies for granule tree construction and enhances the understanding and interpretation of the classification process. Interactive classification is complementary to the existing classification methods.


Author(s):  
Giovanni Felici ◽  
Klaus Truemper

The method described in this chapter is designed for data mining and learning on logic data. This type of data is composed of records that can be described by the presence or absence of a finite number of properties. Formally, such records can be described by variables that may assume only the values true or false, usually referred to as logic (or Boolean) variables. In real applications, it may also happen that the presence or absence of some property cannot be verified for some record; in such a case we consider that variable to be unknown (the ability to treat data with missing values formally is a feature of logic-based methods). For example, to describe patient records in medical diagnosis applications, one may use the logic variables healthy, old, has_high_temperature, among many others. A very common data mining task is to find, based on training data, the rules that separate two subsets of the available records, or that explain the membership of the data in one subset or the other. For example, one may desire to find a rule that, based on the many variables observed in patient records, is able to distinguish healthy patients from sick ones. Such a rule, if sufficiently precise, may then be used to classify new data and/or to gain information from the available data. This task is often referred to as machine learning or pattern recognition and accounts for a significant portion of the research conducted in the data mining community. When the data considered is in logic form or can be transformed into it by some reasonable process, it is of great interest to determine explanatory rules in the form of combinations of logic variables, or logic formulas. In the example above, a rule derived from data could be: if (has_high_temperature is true) and (running_nose is true) then (the patient is not healthy).
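The example rule above can be sketched in a few lines of Python. This is an illustration only, not the chapter's method: it assumes that an unknown value is represented by `None` and that a conjunction is evaluated with the usual three-valued (Kleene) convention, where a single false literal makes the whole rule false and an unknown literal otherwise makes it unknown. The helper `conjunction` and the patient records are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[bool]]   # None stands for "unknown"
Literal = Tuple[str, bool]           # (variable name, required value)

def conjunction(record: Record, literals: List[Literal]) -> Optional[bool]:
    """Evaluate an AND of literals; returns True, False, or None (unknown)."""
    unknown = False
    for name, required in literals:
        value = record.get(name)     # a missing key is treated as unknown
        if value is None:
            unknown = True
        elif value != required:
            return False             # one false literal settles the result
    return None if unknown else True

# Premise of the rule from the text: high temperature AND running nose.
NOT_HEALTHY_RULE = [("has_high_temperature", True), ("running_nose", True)]

# Made-up patient records for illustration.
patients = {
    "p1": {"has_high_temperature": True, "running_nose": True},
    "p2": {"has_high_temperature": True, "running_nose": False},
    "p3": {"has_high_temperature": True, "running_nose": None},
    "p4": {"has_high_temperature": False, "running_nose": None},
}

for pid, record in patients.items():
    print(pid, conjunction(record, NOT_HEALTHY_RULE))
```

For `p3` the rule cannot be decided because `running_nose` is unknown, whereas for `p4` the known false value of `has_high_temperature` is enough to conclude that the premise does not hold.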


Author(s):  
Adrian Mackenzie

Contemporary attempts to find patterns in data, ranging from the now mundane technologies of handwriting recognition through to mammoth infrastructure-heavy practices of deep learning conducted by major business and government actors, rely on a group of techniques intensively developed during the 1950s and 1960s in physics, engineering and psychology. Whether we designate them as pattern recognition, data mining, or machine learning, these techniques all seek to uncover patterns in data that cannot appear directly to the human eye, either because there are too many items for anyone to look at, or because the patterns are too subtly woven through the data. From the techniques in current use, three developed in the Cold War era iconify contemporary modes of pattern finding: Monte Carlo simulation, gradient descent, and clustering algorithms that search for groups or clusters in data. Each of these techniques implements a different mode of pattern, and these different modes of pattern recognition flow through into contemporary scientific, technological, business and governmental problematizations. The different perspectives on event, trajectory, and proximity they embody imbue many power relations, forms of value and the play of truth/falsehood today.


BMC Chemistry ◽  
2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Yasuhiro Date ◽  
Feifei Wei ◽  
Yuuri Tsuboi ◽  
Kengo Ito ◽  
Kenji Sakata ◽  
...  

Abstract Nuclear magnetic resonance (NMR)-based relaxometry is widely used in various fields of research because of its advantages such as simple sample preparation, easy handling, and relatively low cost compared with metabolomics approaches. However, there have been no reports on the application of the T2 relaxation curves in metabolomics studies involving the evaluation of metabolic mixtures, such as geographical origin determination and feature extraction by pattern recognition and data mining. In this study, we describe a data mining method for relaxometric data (i.e., relaxometric learning). This method is based on a machine learning algorithm supported by the analytical framework optimized for the relaxation curve analyses. In the analytical framework, we incorporated a variable optimization approach and bootstrap resampling-based matrixing to enhance the classification performance and balance the sample size between groups, respectively. The relaxometric learning enabled the extraction of features related to the physical properties of fish muscle and the determination of the geographical origin of the fish by improving the classification performance. Our results suggest that relaxometric learning is a powerful and versatile alternative to conventional metabolomics approaches for evaluating fleshiness of chemical mixtures in food and for other biological and chemical research requiring a nondestructive, cost-effective, and time-saving method.


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Kai Zeng ◽  
Siyuan Jing

Rough set theory has been successfully applied to many fields, such as data mining, pattern recognition, and machine learning. Kernel rough sets and neighborhood rough sets are two important models that differ in terms of granulation. The kernel rough sets model, which captures fuzziness, is susceptible to noise in the decision system. The neighborhood rough sets model can handle noisy data well but cannot describe the fuzziness of the samples. In this study, we define a novel model called kernel neighborhood rough sets, which integrates the advantages of the neighborhood and kernel models. Moreover, the model is applied to the problem of feature selection. The proposed method is tested on UCI datasets. The results show that our model outperforms classic models.
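The abstract does not define the proposed kernel neighborhood model, so the sketch below illustrates only the classic neighborhood rough set idea it builds on, under stated assumptions: a sample belongs to the positive region if every sample within Euclidean distance `delta` of it shares its decision label, and a feature subset is scored by the fraction of samples in the positive region. The function name `dependency`, the radius, and the toy data are hypothetical.

```python
import math

def dependency(samples, labels, features, delta):
    """Fraction of samples whose delta-neighborhood contains a single label."""
    def dist(a, b):
        # Euclidean distance restricted to the chosen feature indices.
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    consistent = 0
    for i, x in enumerate(samples):
        neighbors = [j for j, y in enumerate(samples) if dist(x, y) <= delta]
        if all(labels[j] == labels[i] for j in neighbors):
            consistent += 1          # x lies in the positive region
    return consistent / len(samples)

# Made-up toy decision system: feature 0 separates the classes, feature 1 does not.
samples = [(0.1, 0.9), (0.2, 0.1), (0.8, 0.8), (0.9, 0.2)]
labels = ["a", "a", "b", "b"]

for subset in ([0], [1], [0, 1]):
    print(subset, dependency(samples, labels, subset, delta=0.3))
```

A feature selection procedure in this style would typically prefer subsets with a higher dependency score; here feature 0 alone already places every sample in the positive region.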

