Implementation and Evaluation of Diabetes Management System Using Clustering Technique

Author(s):  
Snehlata Mandal ◽  
Vivek Dubey

Data mining is a field of computer science concerned with discovering new patterns in large data sets. Clustering is the task of discovering groups and structures of similar items in data without relying on known structures of the data. Much of this data is temporal in nature. Data mining and business intelligence techniques are often used to discover patterns in such data; however, mining temporal relationships is typically a complex task. The paper proposes a data analysis and visualization technique for representing trends in temporal data using a clustering-based approach: a system implements the cluster graph construct, which maps data to a two-dimensional directed graph that identifies trends in dominant data types over time. In this paper, a clustering-based technique is used to visualize temporal data and identify trends for controlling diabetes mellitus. Given the complexity of chronic disease prevention, diabetes risk prevention and assessment may be a critical area for improving clinical decision support. Information visualization exploits the high processing capability of the human visual system to reveal patterns in data that are not apparent in non-visual data analysis.
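The clustering step behind such a trend visualization can be sketched in a few lines. The following is a minimal, illustrative example only: a hand-rolled 1-D k-means groups hypothetical blood-glucose readings, and the dominant cluster per day becomes a node label for a cluster graph. The function names, readings, and initial centers are all invented for the sketch, not taken from the paper or from clinical reference ranges.

```python
def kmeans_1d(values, centers, iterations=20):
    """Simple 1-D k-means: returns (updated centers, cluster labels)."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        for c in range(len(centers)):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels

# Hypothetical daily glucose readings (mg/dL) over three days.
readings = [
    ("day1", [92, 98, 105]),
    ("day2", [150, 162, 158]),
    ("day3", [95, 101, 99]),
]
all_values = [v for _, day in readings for v in day]
centers, _ = kmeans_1d(all_values, centers=[90, 160])

def dominant_cluster(day_values):
    """Label a day by the cluster most of its readings fall into."""
    _, labels = kmeans_1d(day_values, centers, iterations=1)
    return max(set(labels), key=labels.count)

# Nodes of a cluster graph: consecutive days labeled by dominant cluster,
# so day1 -> day2 shows a shift in the dominant data type over time.
trend = [(day, dominant_cluster(vals)) for day, vals in readings]
```

A real cluster graph would then draw directed edges between the dominant clusters of consecutive time windows; here the `trend` list holds the same information in tabular form.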

F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 146 ◽  
Author(s):  
Guanming Wu ◽  
Eric Dawson ◽  
Adrian Duong ◽  
Robin Haw ◽  
Lincoln Stein

High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.
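Pathway enrichment of a gene list, as described above, is commonly backed by a one-sided hypergeometric test. The sketch below shows that statistic with invented gene and pathway sets; it is not ReactomeFIViz's actual implementation, whose statistics and background universe may differ.

```python
from math import comb

def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) when drawing n genes from a universe of N genes,
    of which K belong to the pathway (one-sided enrichment test)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

universe = 20000                                  # background gene count
pathway = {"TP53", "MDM2", "CDKN1A", "ATM"}       # hypothetical pathway members
gene_list = {"TP53", "MDM2", "ATM", "BRCA1", "EGFR"}

overlap = len(pathway & gene_list)
p = hypergeom_pvalue(universe, len(pathway), len(gene_list), overlap)
# A tiny p-value: 3 of 5 listed genes hitting a 4-gene pathway is far
# beyond what chance predicts in a 20,000-gene universe.
```

In practice the test is repeated for every pathway in the database and the resulting p-values are corrected for multiple testing.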


Author(s):  
Zheng-Hua Tan

The explosive increase in computing power, network bandwidth and storage capacity has greatly facilitated the production, transmission and storage of multimedia data. Compared to alphanumeric databases, non-text media such as audio, image and video are different in that they are unstructured by nature and, although containing rich information, are not nearly as expressive from the viewpoint of a contemporary computer. As a consequence, an overwhelming amount of data is created and then left unstructured and inaccessible, boosting the desire for efficient content management of these data. This has become a driving force of multimedia research and development, and has led to a new field termed multimedia data mining. While text mining is relatively mature, mining information from non-text media is still in its infancy, but holds much promise for the future. In general, data mining is the process of applying analytical approaches to large data sets to discover implicit, previously unknown, and potentially useful information. This process often involves three steps: data preprocessing, data mining and postprocessing (Tan, Steinbach, & Kumar, 2005). The first step transforms the raw data into a format more suitable for subsequent mining. The second step conducts the actual mining, while the last validates and interprets the mining results. Data preprocessing is a broad area, and it is the part of data mining in which the essential techniques depend most heavily on data types. Unlike textual data, which is typically based on a written language, image, video and some audio are inherently non-linguistic. Speech as a spoken language lies in between and often provides valuable information about the subjects, topics and concepts of multimedia content (Lee & Chen, 2005). The linguistic nature of speech makes information extraction from speech less complicated, yet more precise and accurate, than from image and video.
This fact motivates content-based speech analysis for multimedia data mining and retrieval, where audio and speech processing is a key enabling technology (Ohtsuki, Bessho, Matsuo, Matsunaga, & Kayashi, 2006). Progress in this area can impact numerous business and government applications (Gilbert, Moore, & Zweig, 2005). Examples include discovering patterns and generating alarms for intelligence organizations as well as for call centers, analyzing customer preferences, and searching through vast audio warehouses.
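The three-step process above (preprocessing, mining, postprocessing) can be sketched on hypothetical call-center transcripts. The "mining" step here is just frequent-term counting, a deliberately simple stand-in for the pattern-discovery algorithms a real system would use; the transcripts and stopword list are invented for the example.

```python
from collections import Counter

STOPWORDS = {"the", "a", "to", "i"}

def preprocess(transcripts):
    """Step 1: normalize raw transcripts into token lists."""
    return [t.lower().replace(",", "").replace(".", "").split()
            for t in transcripts]

def mine(token_lists, min_count=2):
    """Step 2: discover terms that recur across the collection."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {term: n for term, n in counts.items() if n >= min_count}

def postprocess(discovered):
    """Step 3: validate/interpret -- drop trivial patterns (stopwords)."""
    return {t: n for t, n in discovered.items() if t not in STOPWORDS}

transcripts = [
    "I want to cancel my subscription.",
    "Please cancel the subscription today.",
    "The billing for my subscription is wrong.",
]
patterns = postprocess(mine(preprocess(transcripts)))
# "subscription" and "cancel" survive as non-trivial recurring terms.
```

In a speech-mining pipeline, the preprocessing step would additionally include the speech recognition pass that turns audio into these transcripts in the first place.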


2021 ◽  
pp. 1826-1839
Author(s):  
Sandeep Adhikari ◽  
Sunita Chaudhary

The exponential growth in the use of computers over networks, as well as the proliferation of applications that operate on different platforms, has drawn attention to network security. Attackers take advantage of security flaws in operating systems that are both technically difficult and costly to fix. As a result, intrusion poses a worldwide threat to a computer resource's integrity, availability, and confidentiality. The Intrusion Detection System (IDS) is critical in detecting network anomalies and attacks. In this paper, data mining principles are combined with an IDS to efficiently and quickly identify important, confidential data of interest to the user. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial-of-service attacks. We also develop a decision tree classifier with a variety of parameters. The previous algorithm classified intrusions correctly up to 90% of the time and was not appropriate for large data sets; our proposed algorithm is designed to classify large data sets accurately. In addition, we quantify several further decision tree classifier parameters.
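A decision tree over network-connection features can be sketched as follows. Everything here is illustrative: the features, thresholds, labels, and records are invented, and a real IDS would learn the tree from labeled traffic (e.g. with information-gain splits) rather than hand-build it.

```python
def classify(record, tree):
    """Walk a tree of (feature, threshold, left, right) tuples.
    Leaves are plain string labels."""
    if isinstance(tree, str):
        return tree
    feature, threshold, left, right = tree
    branch = left if record[feature] <= threshold else right
    return classify(record, branch)

# Hand-built tree: a very high connection rate suggests flooding;
# otherwise, an unusually large payload is merely suspicious.
ids_tree = (
    "conn_per_sec", 100,
    ("bytes", 1_000_000, "normal", "suspicious"),
    "attack",
)

records = [
    {"conn_per_sec": 12,  "bytes": 4_096},       # ordinary browsing
    {"conn_per_sec": 80,  "bytes": 5_000_000},   # bulk transfer
    {"conn_per_sec": 900, "bytes": 512},         # connection flood
]
labels = [classify(r, ids_tree) for r in records]
```

The tuple encoding keeps the sketch short; a learned tree would be produced by a training procedure and typically carry many more features per split.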


Author(s):  
Gebeyehu Belay Gebremeskel ◽  
Chai Yi ◽  
Zhongshi He

Data Mining (DM) is a rapidly expanding field in many disciplines, and it is well suited to analyzing massive data of many types, including geospatial, image and other forms of data sets. Such fast-growing data, characterized by high volume, velocity, variety, variability, value and other properties, is collected and generated from various sources; it is too complex and too big for traditional tools to capture, store, and analyze. Spatial Data Mining (SDM) is, therefore, the process of searching for and discovering valuable information and knowledge in large volumes of spatial data, drawing basic principles from concepts in databases, machine learning, statistics, pattern recognition and 'soft' computing. Using DM techniques enables a more efficient use of the data warehouse. SDM is thus becoming an emerging research field in the geosciences because of the increasing amount of data, which leads to promising new applications. The integral SDM on which we focus in this chapter is inference over geospatial and GIS data.
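One elementary SDM technique is grid-density clustering: bin geospatial points into cells and flag dense cells as candidate clusters or hotspots. The sketch below uses invented coordinates and an arbitrary density threshold; it is a first step, not a full spatial clustering algorithm such as DBSCAN.

```python
from collections import Counter

def dense_cells(points, cell_size=1.0, min_points=3):
    """Map (x, y) points onto a square grid and return the cells
    that contain at least min_points points."""
    counts = Counter((int(x // cell_size), int(y // cell_size))
                     for x, y in points)
    return {cell for cell, n in counts.items() if n >= min_points}

points = [
    (0.1, 0.2), (0.4, 0.9), (0.8, 0.5),   # three points in cell (0, 0)
    (5.1, 5.2),                           # isolated point in cell (5, 5)
    (3.2, 3.1), (3.9, 3.8), (3.5, 3.3),   # three points in cell (3, 3)
]
hotspots = dense_cells(points)            # {(0, 0), (3, 3)}
```

In GIS practice, the cell size would be chosen from the data's spatial resolution, and dense cells would typically be merged with their dense neighbors to form larger cluster regions.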


Author(s):  
Hocine Chebi

The number of hits to web pages continues to grow. The web has become one of the most popular platforms for disseminating and retrieving information. Consequently, many website operators are encouraged to analyze the use of their sites in order to improve their response to the expectations of internet users. However, the way a website is visited can change depending on a variety of factors. Usage models must therefore be continuously updated in order to accurately reflect visitor behavior. This remains difficult when the time dimension is neglected or simply introduced as an additional numeric attribute in the description of the data. Data mining is defined as the application of data analysis and discovery algorithms on large databases with the goal of discovering non-trivial models. Several algorithms have been proposed in order to formalize the new models discovered, to build more efficient models, to process new types of data, and to measure the differences between the data sets. However, most traditional data mining algorithms assume that the models are static and do not take into account the possible evolution of these models over time. These considerations have motivated significant efforts in the analysis of temporal data as well as the adaptation of static data mining methods to data that evolves over time. A review of the main aspects of data mining addressed in this work constitutes the body of this chapter, followed by a state of the art of current research in the field and a discussion of its major open issues. Interest in temporal databases has increased considerably in recent years, for example in the fields of finance, telecommunications, surveillance, etc. A growing number of prototypes and systems are being implemented to take the time dimension of data into account explicitly, for example to study the variability of analysis results over time.
To model an application, it is necessary to choose a common language that is precise and known by all members of a team. UML (Unified Modeling Language) is an object-oriented modeling language standardized by the OMG. This chapter presents modeling with package and class diagrams built using UML. It then presents the conceptual data model and, finally, specifies the SQL queries used to extract descriptive statistical variables of the navigations from a warehouse containing the preprocessed usage data.
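The kind of SQL extraction described above can be sketched with an in-memory database. The schema, column names, and rows below are invented for the example; an actual usage warehouse would be larger and its descriptive variables more numerous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE navigation (
        session_id TEXT,     -- one browsing session per visitor
        page       TEXT,     -- page visited within the session
        duration_s REAL      -- time spent on the page, in seconds
    )
""")
conn.executemany(
    "INSERT INTO navigation VALUES (?, ?, ?)",
    [("s1", "/home", 12.0), ("s1", "/products", 45.0),
     ("s2", "/home", 8.0),  ("s2", "/contact", 20.0), ("s2", "/home", 5.0)],
)

# Descriptive statistical variables per navigation: number of pages
# viewed and mean time per page, aggregated by session.
stats = conn.execute("""
    SELECT session_id,
           COUNT(*)        AS pages,
           AVG(duration_s) AS mean_duration
    FROM navigation
    GROUP BY session_id
    ORDER BY session_id
""").fetchall()
```

Variables extracted this way (page counts, durations, entry/exit pages, and so on) become the feature vectors on which the temporal usage models discussed in the chapter are built.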


Author(s):  
Scott Nicholson ◽  
Jeffrey Stanton

Most people think of a library as the little brick building in the heart of their community or the big brick building in the center of a campus. These notions greatly oversimplify the world of libraries, however. Most large commercial organizations have dedicated in-house library operations, as do schools, non-governmental organizations, and local, state, and federal governments. With the increasing use of the Internet and the World Wide Web, digital libraries have burgeoned, and these serve a wide variety of user audiences. With this expanded view of libraries, two key insights arise. First, libraries are typically embedded within larger institutions. Corporate libraries serve their corporations, academic libraries serve their universities, and public libraries serve taxpaying communities who elect overseeing representatives. Second, libraries play a pivotal role within their institutions as repositories and providers of information resources. In the provider role, libraries represent in microcosm the intellectual and learning activities of the people who comprise the institution. This fact provides the basis for the strategic importance of library data mining: by ascertaining what users are seeking, bibliomining can reveal insights that have meaning in the context of the library's host institution. The use of data mining to examine library data might aptly be termed bibliomining. With the widespread adoption of computerized catalogs and search facilities over the past quarter century, library and information scientists have often used bibliometric methods (e.g., the discovery of patterns in authorship and citation within a field) to explore patterns in bibliographic information. During the same period, various researchers have developed and tested data mining techniques—advanced statistical and visualization methods to locate non-trivial patterns in large data sets.
Bibliomining refers to the use of these bibliometric and data mining techniques to explore the enormous quantities of data generated by the typical automated library.
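A toy instance of bibliomining is counting which subject areas are borrowed together across checkout sessions, a simple stand-in for the bibliometric and data mining techniques the passage describes. The checkout log below is fabricated for illustration.

```python
from collections import Counter
from itertools import combinations

# Each set is one patron's checkout session (subject headings borrowed).
checkouts = [
    {"machine learning", "statistics"},
    {"machine learning", "databases"},
    {"machine learning", "statistics", "databases"},
    {"statistics", "machine learning"},
    {"poetry"},
]

# Count co-borrowed subject pairs (sorted so each pair counts once).
pairs = Counter()
for session in checkouts:
    for a, b in combinations(sorted(session), 2):
        pairs[(a, b)] += 1

top_pair, top_count = pairs.most_common(1)[0]
# Frequent pairs suggest collection-development and shelving decisions
# meaningful to the library's host institution.
```

Real bibliomining would work over circulation, catalog-search, and interlibrary-loan records, with patron identifiers anonymized before analysis.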

