FROM DATA MINING TO BEHAVIOR MINING

Author(s):
Zhengxin Chen

The knowledge economy requires data mining to be more goal-oriented so that more tangible results can be produced. This requirement implies that the semantics of the data should be incorporated into the mining process. Data mining is ready to meet this challenge because recent developments have shown an increasing interest in mining complex data (as exemplified by graph mining, text mining, etc.). By incorporating the relationships among the data along with the data itself (rather than focusing on the data alone), complex data injects semantics into the mining process, thus enhancing its potential to make a better contribution to the knowledge economy. Since the relationships between the data reveal certain behavioral aspects underlying the plain data, this shift of mining from simple data to complex data signals a fundamental change to a new stage in the research and practice of knowledge discovery, which can be termed behavior mining. Behavior mining also has the potential to unify some other recent activities in data mining. We discuss important aspects of behavior mining and its implications for the future of data mining.

2012
Vol 17 (4)
pp. 496-506
Author(s):
Frans Cornelissen
Miroslav Cik
Emmanuel Gustin

High-content screening has brought new dimensions to cellular assays by generating rich data sets that characterize cell populations in great detail and detect subtle phenotypes. To derive relevant, reliable conclusions from these complex data, it is crucial to have informatics tools supporting quality control, data reduction, and data mining. These tools must reconcile the complexity of advanced analysis methods with the user-friendliness demanded by the user community. After reviewing existing applications, we saw an opportunity to add innovative analysis options. Phaedra was developed to support workflows for drug screening and target discovery, interact with several laboratory information management systems, and process data generated by a range of techniques, including high-content imaging, multicolor flow cytometry, and traditional high-throughput screening assays. The application is modular and flexible, with an interface that can be tuned to specific user roles. It offers user-friendly data visualization and reduction tools for HCS but also integrates Matlab for custom image analysis and the Konstanz Information Miner (KNIME) framework for data mining. Phaedra features efficient JPEG2000 compression and full drill-down functionality from dose-response curves down to individual cells, with exclusion and annotation options, cell classification, statistical quality controls, and reporting.


Biotechnology
2019
pp. 305-321
Author(s):  
Fatima Kabli

The mass of data available on the Internet is rapidly increasing, and its complexity arises from the multiplicity of information sources, formats, modalities, and versions. Facing the complexity of biological data, such as DNA sequences, protein sequences, and protein structures, biologists cannot simply use traditional techniques to analyze this type of data. Knowledge extraction with data mining methods for the analysis and processing of complex biological data is considered a real scientific challenge: a systematic search for potential relationships without prior knowledge of their nature. In this chapter, the authors discuss the Knowledge Discovery in Databases (KDD) process applied to biological data. They present a state of the art of the best-known and most effective data mining methods for analyzing biological data, along with the bioinformatics problems related to data mining.
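The chapter surveys KDD on sequence data in general terms; as one hypothetical illustration (not taken from the chapter), a common preprocessing step is turning raw DNA strings into k-mer count features that standard data mining methods can then consume:

```python
# Illustrative sketch: extract overlapping k-mer counts from a DNA sequence
# as numeric features for downstream mining. The sequence below is a toy
# example, not data from the chapter.
from collections import Counter

def kmer_features(seq, k=3):
    """Count all overlapping k-mers in a sequence string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

features = kmer_features("ATGCGATGAC", k=3)
# "ATG" occurs twice (positions 0 and 5); 8 k-mers in total.
```

Feature vectors built this way can feed clustering or classification without any sequence-specific algorithm.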


2021
Vol 4
Author(s):
Shailesh Tripathi
David Muhr
Manuel Brunner
Herbert Jodlbauer
Matthias Dehmer
...  

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely accepted framework in production and manufacturing. This data-driven knowledge discovery framework provides an orderly partition of the often complex data mining process to ensure a practical implementation of data analytics and machine learning models. However, the practical application of robust industry-specific data-driven knowledge discovery models faces multiple data- and model-development-related issues. These issues need to be carefully addressed by a flexible, customized, and industry-specific knowledge discovery framework. For this reason, extensions of CRISP-DM are needed. In this paper, we provide a detailed review of CRISP-DM and summarize extensions of this model into a novel framework we call the Generalized Cross-Industry Standard Process for Data Science (GCRISP-DS). This framework is designed to allow dynamic interactions between different phases to adequately address data- and model-related issues for achieving robustness. Furthermore, it also emphasizes the need for a detailed business understanding and the interdependencies with the developed models and data quality for fulfilling higher business objectives. Overall, such a customizable GCRISP-DS framework enhances model improvement and reusability by minimizing robustness issues.
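The core idea of the framework, phases that can loop back to earlier phases when a quality check fails, can be sketched in a few lines. The phase names follow standard CRISP-DM; the feedback edges and transition logic below are a simplified illustration, not the paper's formal framework:

```python
# Minimal sketch of phase-to-phase feedback in a CRISP-DM-style process.
# FEEDBACK edges are illustrative assumptions, not taken from the paper.
PHASES = ["business_understanding", "data_understanding", "data_preparation",
          "modeling", "evaluation", "deployment"]

FEEDBACK = {"data_preparation": "data_understanding",
            "modeling": "data_preparation",
            "evaluation": "business_understanding"}

def run(failures, max_steps=20):
    """Walk the phases; a phase in `failures` fails once and jumps back
    along its feedback edge, then passes on the next visit."""
    i, trace = 0, []
    while i < len(PHASES) and len(trace) < max_steps:
        phase = PHASES[i]
        trace.append(phase)
        if phase in failures:
            failures.discard(phase)            # fail only once
            i = PHASES.index(FEEDBACK[phase])  # revisit an earlier phase
        else:
            i += 1
    return trace

trace = run({"evaluation"})  # evaluation fails -> restart at business understanding
```

A failed evaluation thus replays the whole cycle before deployment, which is the dynamic interaction between phases that a strictly linear process model cannot express.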


Data Mining
2013
pp. 1422-1448
Author(s):
Fadila Bentayeb
Nora Maïz
Hadj Mahboubi
Cécile Favre
Sabine Loudcher
...  

Research in data warehousing and OLAP has produced important technologies for the design, management, and use of information systems for decision support. With the development of the Internet, the availability of various types of data has increased. Thus, users require applications that help them obtain knowledge from the Web. One possible solution is to extract information from the Web, then transform and load it into a Web warehouse, which provides uniform access methods for automatic processing of the data. In this chapter, we present three recent research contributions that extend the capabilities of decision support systems, namely (1) the use of XML as a logical and physical model for complex data warehouses, (2) combining data mining with OLAP to allow elaborate analysis tasks on complex data, and (3) schema evolution in complex data warehouses for personalized analyses. Our contributions cover the main phases of the data warehouse design process: data integration and modeling, and user-driven OLAP analysis.


Author(s):  
Wenyuan Li
Wee-Keong Ng
Kok-Leong Ong

With the most expressive representation for characterizing complex data, graph mining is an emerging and promising domain in data mining. Meanwhile, the graph has a long history of study, with many theoretical results from foundational fields such as mathematics, physics, and artificial intelligence. In this chapter, we systematically review theories and techniques newly studied and proposed in these areas. Moreover, we focus on those approaches that are potentially valuable to graph-based data mining. These approaches provide different perspectives and motivations for this new domain. To illustrate how methods from other areas contribute to graph-based data mining, we present a case study on a classic graph problem that can be widely applied in many application areas. Our results show that methods from foundational areas may contribute to graph-based data mining.
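The abstract does not name the case-study problem, so it is not reproduced here; as a generic illustration of the kind of classic graph computation that graph-based data mining builds on, the sketch below finds connected components of an adjacency-list graph with breadth-first search:

```python
# Illustrative only: connected components via BFS over an adjacency list.
# The graph below is a made-up toy example.
from collections import deque

def connected_components(adj):
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(sorted(comp))
    return comps

graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"],
         "d": ["e"], "e": ["d"], "f": []}
comps = connected_components(graph)
# → [["a", "b", "c"], ["d", "e"], ["f"]]
```

In a mining setting the components would typically be subsets of records linked by some relationship, with each component treated as a candidate group for further analysis.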


2013
Vol 760-762
pp. 2267-2271
Author(s):  
Wu Hao

This paper discusses issues in data mining and business processes, including marketing, finance, and health. In turn, the use of KDD on complex real-world databases in business and government will push IT researchers to identify and solve cutting-edge problems in KDD modelling, techniques, and processes. From an IT perspective, issues in the economic sciences include business modelling and mining, aberrant behavior detection, and health economics. Issues in KDD include data mining for complex data structures and complex modelling. These novel strategies will be integrated to build a one-stop KDD system.


2021
Vol 1 (1)
pp. 7-14
Author(s):
Nur Afni Syahpitri Damanik
Irianto Irianto
Dahriansah Dahriansah

Abstract: Theft is the illegal taking of another person's property or belongings without the owner's permission. Theft is the most common crime problem in Asahan District, and the district police (POLRES) still have trouble determining which areas experience it most often. To address this problem, the areas where theft frequently occurs need to be grouped, and the process used is data mining. Data mining is one of the processes of Knowledge Discovery in Databases (KDD). KDD is an activity that includes collecting and using historical data to find regularities, patterns, or relationships in large data sets. One of the techniques known in data mining is clustering. The K-Means method is a clustering technique that partitions data into groups so that data with the same characteristics are placed in the same group and data with different characteristics are placed in other groups. The attributes used in grouping the data are the yearly figures for 2015, 2016, 2017, 2018, and 2019, in a case study of 9 POLSEK (sector police stations) in Asahan District.

Keywords: Data Mining, Clustering, K-Means Algorithm, Theft Crimes Grouping.
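The clustering step the abstract describes, partitioning regions by their yearly theft counts so that similar regions fall into the same group, can be sketched with a plain K-Means loop. The station count vectors below are hypothetical placeholders, not figures from the study:

```python
# Minimal K-Means sketch for grouping regions by yearly counts (2015-2019).
# All data values are invented for illustration.
import random

def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k initial centroids from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid (squared distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # recompute each centroid as the mean of its cluster
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, clusters

# One vector per region: theft counts for 2015-2019 (hypothetical).
data = [(12, 15, 11, 14, 13), (40, 38, 45, 42, 44), (13, 12, 14, 15, 12),
        (41, 43, 40, 39, 45), (25, 27, 26, 24, 28)]
centroids, clusters = kmeans(data, k=2)
```

With real POLSEK figures, each resulting cluster would correspond to a group of stations with similar theft levels, which is the grouping the study is after.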

