Biological Data Mining

2008 ◽  
pp. 1696-1705
Author(s):  
George Tzanis ◽  
Christos Berberidis ◽  
Ioannis Vlahavas

At the end of the 1980s, a new discipline named data mining emerged. The introduction of new technologies such as computers, satellites, and new mass storage media has led to an exponential growth of collected data. Traditional data analysis techniques often fail to process large amounts of often noisy data efficiently in an exploratory fashion. The scope of data mining is the extraction of knowledge from large amounts of data with the help of computers. It is an interdisciplinary area of research that has its roots in databases, machine learning, and statistics, with contributions from many other areas such as information retrieval, pattern recognition, visualization, and parallel and distributed computing. Data mining has many real-world applications; customer relationship management, fraud detection, market and industry characterization, stock management, medicine, pharmacology, and biology are some examples (Two Crows Corporation, 1999).


Author(s):  
Anjan Mukherjee ◽  
Ajoy Kanti Das

In this chapter, the authors introduce a new sequence of fuzzy soft multi sets in fuzzy soft multi topological spaces and study its basic properties. The concepts of subsequences, convergent sequences, and cluster fuzzy soft multi sets of fuzzy soft multi sets are proposed. Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. The authors also define the notions of net and filter and establish the correspondence between net convergence and filter convergence in fuzzy soft multi topological spaces.
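As an aside on the clustering notion mentioned above, grouping by similarity can be sketched with a greedy "leader" scheme on 1-D points. This is only an illustration of what clustering means, entirely separate from the fuzzy soft multi set machinery; the radius and data are made-up values.

```python
def leader_cluster(points, radius=2.0):
    """Greedy "leader" clustering: each point joins the first existing group
    whose leader is within `radius`, otherwise it starts a new group."""
    leaders, groups = [], []
    for p in points:
        for i, lead in enumerate(leaders):
            if abs(p - lead) <= radius:
                groups[i].append(p)
                break
        else:
            leaders.append(p)
            groups.append([p])
    return groups

# Points near 1-2 end up together; points near 9 form a second group.
print(leader_cluster([1.0, 1.5, 9.0, 2.0, 8.5]))
```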


2020 ◽  
Vol 4 (3) ◽  
pp. 88 ◽  
Author(s):  
Vadim Kapp ◽  
Marvin Carl May ◽  
Gisela Lanza ◽  
Thorsten Wuest

This paper presents a framework that utilizes multivariate time series data to automatically identify reoccurring events, e.g., resembling failure patterns in real-world manufacturing data, by combining selected data mining techniques. The use case revolves around the auxiliary polymer manufacturing process of drying and feeding plastic granulate to extrusion or injection molding machines. The overall framework includes a comparison of two different approaches to the identification of unique patterns in the real-world industrial data set. The first approach uses sequential heuristic segmentation and clustering, while the second features a method with a built-in time dependency structure at its core (Toeplitz inverse covariance-based clustering, TICC). Both alternatives are supported by standard principal component analysis (PCA) for feature fusion and a hyperparameter optimization approach (tree-structured Parzen estimator, TPE). The performance of the corresponding approaches was evaluated through established and commonly accepted metrics in the field of (unsupervised) machine learning. The results suggest the existence of several common failure sources (patterns) for the machine. Automatically detected events such as these can be harnessed to develop an advanced monitoring method to predict upcoming failures, ultimately reducing unplanned machine downtime.
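The first branch's idea, heuristic segmentation followed by clustering of the segments, can be sketched on a toy univariate signal. This is only an illustration of the principle: the paper's framework works on multivariate data with PCA feature fusion, TICC, and TPE-based tuning, none of which appear here, and the jump threshold and tolerance below are invented values.

```python
def segment(series, jump=3.0):
    """Heuristic segmentation: cut wherever consecutive values jump by more than `jump`."""
    segments, current = [], [series[0]]
    for prev, nxt in zip(series, series[1:]):
        if abs(nxt - prev) > jump:
            segments.append(current)
            current = []
        current.append(nxt)
    segments.append(current)
    return segments

def cluster_by_mean(segments, tol=1.0):
    """Group segments whose mean level lies within `tol` of an existing group's level."""
    groups = []  # each entry: [representative mean, member segments]
    for seg in segments:
        mean = sum(seg) / len(seg)
        for g in groups:
            if abs(g[0] - mean) <= tol:
                g[1].append(seg)
                break
        else:
            groups.append([mean, [seg]])
    return groups

# A signal alternating between a "normal" level (~1) and a "failure" level (~10).
signal = [1.0, 1.2, 0.9, 10.0, 10.3, 1.1, 0.8, 10.1, 9.9, 10.2]
segs = segment(signal)
groups = cluster_by_mean(segs)
print(len(segs), "segments,", len(groups), "recurring levels")
```

The two recovered levels play the role of the "reoccurring events" the framework looks for; a real pipeline would cluster multivariate segment features, not plain means.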


2014 ◽  
Vol 556-562 ◽  
pp. 3949-3951
Author(s):  
Jian Xin Zhu

Data mining is a technique that aims to analyze and understand large amounts of source data and reveal the knowledge hidden in them. It has been viewed as an important evolution in information processing. The growing attention it has received from researchers and businesses is due to the wide availability of huge amounts of data and the imminent need to turn such data into valuable information. Over the past decade or more, the concepts and techniques of data mining have been presented, and some of them have been discussed at higher levels in recent years. Data mining involves an integration of techniques from databases, artificial intelligence, machine learning, statistics, knowledge engineering, object-oriented methods, information retrieval, high-performance computing, and visualization. Essentially, data mining is a high-level analysis technology with a strong orientation toward business profit. Unlike OLTP applications, data mining should provide in-depth data analysis and support for business decisions.


Author(s):  
Imad Rahal ◽  
Baoying Wang ◽  
James Schnepf

Since the invention of the printing press, text has been the predominant mode for collecting, storing, and disseminating a vast, rich range of information. With the unprecedented increase in electronic storage and dissemination, document collections have grown rapidly, increasing the need to manage and analyze this form of data in spite of its unstructured or semistructured form. Text-data analysis (Hearst, 1999) has emerged as an interdisciplinary research area at the junction of a number of older fields like machine learning, natural language processing, and information retrieval (Grobelnik, Mladenic, & Milic-Frayling, 2000). It is sometimes viewed as an adapted form of a very similar research field that has also emerged recently, namely, data mining, which focuses primarily on structured data mostly represented in relational tables or multidimensional cubes. This article provides an overview of the various research directions in text-data analysis. After the “Introduction,” the “Background” section describes a ubiquitous text-data representation model along with the preprocessing steps employed to achieve better text-data representations and applications. The focal section, “Text-Data Analysis,” presents a detailed treatment of text-data analysis subprocesses such as information extraction, information retrieval and information filtering, document clustering, and document categorization. The article closes with a “Future Trends” section followed by a “Conclusion” section.
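The ubiquitous representation alluded to here is the bag-of-words model, usually weighted by TF-IDF. A minimal sketch, assuming whitespace tokenization and lowercasing as the only preprocessing (the toy documents are invented, not from the article):

```python
import math
from collections import Counter

def tfidf(docs):
    """Represent each document as a bag of words weighted by TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({term: (count / len(doc)) * math.log(n / df[term])
                        for term, count in tf.items()})
    return vectors

docs = ["data mining finds patterns",
        "text mining analyzes documents",
        "documents contain text"]
vecs = tfidf(docs)
# Terms shared across documents ("mining") get lower weight than rare ones.
print(sorted(vecs[0], key=vecs[0].get, reverse=True))
```

Real systems add stemming, stop-word removal, and length normalization on top of this skeleton, as the "Background" section's preprocessing steps suggest.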


2014 ◽  
Vol 70 (5) ◽  
Author(s):  
Mohammad Babrdelbonb ◽  
Siti Zaiton Mohd Hashim ◽  
Nor Erne Nazira Bazin

Data clustering is one of the most used methods of data mining. The k-means clustering approach is one of the main algorithms in the pattern recognition and machine learning literature; it is very popular because of its simple application and high operational speed. However, obstacles such as the dependence of results on the initial cluster centers and the risk of getting trapped in local optima hinder its performance. In this paper, inspired by the Imperialist Competitive Algorithm and based on the k-means method, a new approach is developed in which cluster centers are selected and computed appropriately. The Imperialist Competitive Algorithm (ICA) is a method in the field of evolutionary computation that tries to find the optimum solution to diverse optimization problems. The underlying traits of this algorithm are taken from the evolutionary process of the social, economic, and political development of countries, so that by partly mathematically modeling this process some operators are obtained in regular algorithmic forms. The results of evaluating the suggested approach on standard data sets and comparing it with alternative methods in the literature reveal that the proposed algorithm outperforms the k-means algorithm and the other candidate algorithms.
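The sensitivity to initial centers that motivates the paper can be demonstrated with a minimal 1-D k-means implementation. This is a generic sketch of plain k-means, not the authors' ICA-based method; the data and starting centers are invented to show one good and one poor initialization.

```python
def kmeans(points, centers, iters=20):
    """Plain 1-D k-means; the outcome depends on the initial centers."""
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster
        # (keep the old center if a cluster went empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

data = [0.0, 1.0, 2.0, 100.0, 101.0, 102.0, 200.0]

good = kmeans(data, centers=[1.0, 150.0])   # finds the natural grouping
bad = kmeans(data, centers=[100.0, 200.0])  # stuck: one center hoards six points
print(good, bad)
```

The "bad" run converges to a local optimum with a clearly worse within-cluster spread, which is exactly the failure mode the proposed ICA-based center selection targets.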


Author(s):  
Dharmpal Singh

Social media are based on computer-mediated technologies that facilitate the creation and distribution of information, thoughts, ideas, career interests, and other forms of expression via virtual communities and networks. Social network analysis (SNA) has emerged with the increasing popularity of social networking services like Facebook and Twitter. Information about group cohesion, participation in activities, and associations among subjects can therefore be obtained from the analysis of blogs. This analysis requires well-known knowledge discovery tools that help the administrator discover participants' collaborative activities or patterns and draw inferences to improve the learning and sharing process. The goal of this chapter is therefore to provide data mining tools for information retrieval, statistical modelling, and machine learning, employing data pre-processing, data analysis, and data interpretation processes to support the use of social network analysis (SNA) to improve collaborative activities for better performance.
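One of the simplest SNA measures such tools compute is degree centrality: counting each participant's direct ties. A minimal sketch on hypothetical "who replied to whom" links (the names and data are invented for illustration):

```python
from collections import defaultdict

def degree_centrality(edges):
    """Count each member's direct ties, a basic SNA measure of involvement."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)

# Hypothetical reply links mined from a course blog.
replies = [("ann", "bob"), ("ann", "cas"), ("bob", "cas"), ("ann", "dan")]
centrality = degree_centrality(replies)
print(max(centrality, key=centrality.get))  # the most connected participant
```

In practice an administrator would feed real interaction logs through pre-processing first and combine degree with richer measures (betweenness, clustering coefficient) to assess group cohesion.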


Author(s):  
Kapil Patidar ◽  
Manoj Kumar ◽  
Sushil Kumar

In the real world, data grows continually, and such huge amounts of data are called big data. It is a well-known term used to describe the exponential growth of data, in both structured and unstructured formats. Data analysis is a process of cleaning and transforming data, learning valuable statistics, making decisions, and advising on assumptions with the help of algorithms and procedures such as classification and clustering. In this chapter, the authors discuss big data analysis using soft computing techniques and propose how to pair two different approaches, evolutionary algorithms and machine learning, to seek better results.
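Pairing an evolutionary search with a learning objective can be illustrated by a tiny (1+1) evolution strategy that evolves a split threshold scored by within-group variance, a clustering-style criterion. This is a generic sketch under assumed parameters, not the chapter's specific proposal; the data, mutation scale, and seed are all invented.

```python
import random

def fitness(threshold, data):
    """Within-group variance when the data is split at `threshold` (lower is better)."""
    left = [x for x in data if x < threshold]
    right = [x for x in data if x >= threshold]
    def var(group):
        if not group:
            return 0.0
        mean = sum(group) / len(group)
        return sum((x - mean) ** 2 for x in group)
    return var(left) + var(right)

def evolve(data, generations=300, seed=1):
    """(1+1) evolution strategy: mutate the current threshold, keep it if no worse."""
    rng = random.Random(seed)
    best = min(data)                       # deliberately poor starting point
    for _ in range(generations):
        child = best + rng.gauss(0, 1.0)   # mutation
        if fitness(child, data) <= fitness(best, data):
            best = child                   # selection
    return best

data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]      # two obvious groups
t = evolve(data)
# The evolved threshold is never worse than the start and typically lands
# between the two groups.
print(round(t, 2))
```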


2020 ◽  
Author(s):  
Svetlana Simić ◽  
Zorana Banković ◽  
José R Villar ◽  
Dragan Simić ◽  
Svetislav D Simić

Clustering is one of the most fundamental and essential data analysis tasks, with broad applications. It has been studied in various research fields: data mining, machine learning, pattern recognition, and in engineering, economics, and biomedical data analysis. Headache is not a disease that typically shortens one’s life, but it can be a serious social as well as a health problem. Approximately 27 billion euros per year are lost through reduced work productivity in the European Community. This paper focuses on a new strategy based on a hybrid model that combines a fuzzy partition method and a maximum likelihood estimation clustering algorithm for diagnosing primary headache disorders. The proposed hybrid system is tested on two data sets for diagnosing headache disorders, collected from the Clinical Centre of Vojvodina in Serbia.
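A fuzzy partition assigns each sample a degree of membership in every cluster rather than a hard label. A minimal fuzzy c-means-style membership computation, purely illustrative of the fuzzy partition idea and not the authors' hybrid model (the 1-D feature and the cluster centers are invented):

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Membership of one sample in each cluster, fuzzy c-means style
    (fuzzifier m > 1; memberships always sum to 1)."""
    d = [abs(point - c) for c in centers]
    if any(x == 0 for x in d):                  # sample sits exactly on a center
        return [1.0 if x == 0 else 0.0 for x in d]
    return [1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(d)))
            for i in range(len(d))]

# Hypothetical 1-D feature (say, attack duration in hours) with two centers.
u = fuzzy_memberships(3.0, centers=[1.0, 9.0])
print(u)  # the closer center gets the larger membership share
```

In a diagnostic setting, such graded memberships let a borderline patient partially belong to two headache classes instead of being forced into one.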


Author(s):  
Tatiana V. Sambukova

The work is devoted to the solution of two interconnected key problems of data mining: discretization of numerical attributes, and inferring pattern recognition rules (decision rules) from a training set of examples with the use of machine learning methods. The discretization method is based on a learning procedure that extracts intervals of attribute values whose bounds are chosen so that the distributions of the attribute's values inside these intervals differ to the greatest possible degree between two classes of samples given by an expert. The number of intervals is defined to be no more than three. The application of interval data analysis allowed describing the functional state of persons in healthy condition, depending on the absence or presence in their life of episodes of secondary immune deficiency, more fully than traditional statistical methods of comparing data-set distributions. Interval data analysis makes it possible (1) to make the discretization procedure clear and controllable by an expert, (2) to evaluate the information gain index of attributes with respect to distinguishing the given classes of persons before any machine learning procedure, and (3) to decrease crucially the computational complexity of machine learning.
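The idea of choosing interval bounds so that class-conditional distributions differ as much as possible can be sketched in a simplified two-interval form: exhaustively try each candidate cut between sorted values and score it by the difference in class proportions on either side. This is not the work's exact learning procedure (which allows up to three intervals and expert control); the data are invented.

```python
def best_cut(values, labels):
    """Pick the boundary that makes the two classes' distributions differ most
    between the resulting intervals (simplified two-interval discretization)."""
    pairs = sorted(zip(values, labels))
    best, best_score = None, -1.0
    for i in range(1, len(pairs)):
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v < cut]
        right = [lab for v, lab in pairs if v >= cut]
        # Score: how different the share of class 1 is on each side of the cut.
        score = abs(left.count(1) / len(left) - right.count(1) / len(right))
        if score > best_score:
            best, best_score = cut, score
    return best

# Hypothetical attribute whose low values mostly come from class 0.
values = [0.2, 0.4, 0.5, 0.9, 1.1, 1.4]
labels = [0, 0, 0, 1, 1, 1]
print(best_cut(values, labels))  # a boundary between 0.5 and 0.9 separates the classes
```

Extending this to three intervals means searching pairs of cuts with the same distribution-difference criterion, which is what makes the procedure inspectable by an expert.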

