Data Mining: Modeling, Algorithms, Applications and Systems

Data mining is a multidisciplinary field that emerged gradually during the 20th century. This paper reviews data mining modeling, algorithms, applications, and software tools. It gives the definition and scope of data mining and describes the characteristics of the data sets encountered in various practical situations; summarizes the basic steps and processes of data mining in practical applications; discusses the tasks and modeling issues that arise across a variety of applications; lists the algorithms currently most popular in the field and briefly analyzes the issues to consider in algorithm design; surveys the application of current data mining algorithms in a number of areas; and gives a fairly comprehensive description of the capabilities of current data mining software tools and the state of their development. Finally, it discusses the prospects and future directions of data mining.

Secured Multi-Party Data Release on Cloud for Big Data Privacy-Preserving Using Fusion Learning

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i3.1893 ◽

2021 ◽

Vol 12 (3) ◽

pp. 4716-4725

Author(s):

Divya Dangi et al.

Keyword(s):

Data Mining ◽

Data Privacy ◽

Algorithm Design ◽

Privacy Preserving ◽

Current Data ◽

Data Publishing ◽

Data Sets ◽

Complex Data ◽

Security Model ◽

Serial Data

Previous work on data protection focuses on data sets that are never updated and need only a one-time release. Serial data publishing over a complex data collection has received little attention in the literature, and the work that exists does not treat it completely: existing approaches either cannot be applied against various kinds of background knowledge, or the utility of the published serial data is poor. A new generalization hypothesis is developed on the basis of a theoretical analysis, which effectively decreases the risk of re-publication of certain sensitive attributes. The results suggest that our algorithm achieves higher anonymity and lower hiding rates. The work proceeds in two phases. Design and implementation of the proposed privacy-preserving technique: in this phase the proposed technique is implemented to demonstrate the entire scenario of data aggregation and privacy-preserving data mining. Comparative performance of the proposed and the traditional technique when applied with C4.5: in this phase the performance is evaluated and the proposed data mining security model is compared with the standard algorithm.

Personalized web search on e-commerce using ontology based association mining

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i1.1.9487 ◽

2017 ◽

Vol 7 (1.1) ◽

pp. 286

Author(s):

B. Sekhar Babu ◽

P. Lakshmi Prasanna ◽

P. Vidyullatha

Keyword(s):

Data Mining ◽

Web Search ◽

Large Data ◽

Association Mining ◽

Data Sets ◽

Data Mining Algorithm ◽

Web Data ◽

Data Mining Technique ◽

Web Data Mining ◽

The Web

Today, the World Wide Web has become a familiar medium for finding new information, business trends, trading strategies, and so on. Many organizations and companies also use the web to present their products and services across the world. E-commerce is a kind of business or commercial transaction that involves the transfer of information across the web or internet. In this setting a huge amount of data is collected and stored in web services. This data overload makes it difficult to find accurate and valuable information; hence web data mining is used as a tool to discover and extract knowledge from the web. E-commerce organizations can apply web data mining technology to offer personalized e-commerce solutions and better meet the needs of customers. A data mining algorithm such as ontology-based association rule mining with the Apriori algorithm extracts useful information from large data sets. We implement this data mining technique in Java; the data sets are generated dynamically while transactions are processed, and various patterns are extracted from them.
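
The association-rule step described above can be illustrated with a minimal sketch. It is written in Python rather than the Java the authors report using, it implements plain Apriori without the ontology layer, and the function names, thresholds, and sample transactions are all assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        # Candidate evaluation: keep itemsets that reach min_support.
        kept = {}
        for cand in level:
            s = support(cand, transactions)
            if s >= min_support:
                kept[cand] = s
        frequent.update(kept)
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets
        # and prune any candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for each rule A -> B."""
    found = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so frequent[a] is always present.
                confidence = s / frequent[a]
                if confidence >= min_confidence:
                    found.append((a, itemset - a, confidence))
    return found


# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "bag"},
    {"laptop", "bag"},
    {"mouse"},
]
itemsets = apriori(baskets, min_support=0.5)
strong_rules = rules(itemsets, min_confidence=0.9)
```

With these sample baskets, every basket that contains a bag also contains a laptop, so the rule bag -> laptop has confidence 1.0 and survives the 0.9 threshold.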

Data Mining Clustering Algorithm Research and Application

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.3608 ◽

2014 ◽

Vol 926-930 ◽

pp. 3608-3611 ◽

Cited By ~ 1

Author(s):

Yi Fan Zhang ◽

Yong Tao Qian ◽

Tai Yu Liu ◽

Shu Yan Wu

Keyword(s):

Data Mining ◽

Cluster Analysis ◽

Clustering Analysis ◽

Clustering Algorithm ◽

Dimensional Space ◽

Data Mining Algorithm ◽

Practical Application ◽

Advantages And Disadvantages ◽

Analysis Theory ◽

Data Points

This paper first introduces the basics of data mining and then focuses on cluster analysis algorithms. It covers the classification of clustering algorithms and the typical algorithms in each class, giving a formal description of each algorithm together with a more detailed account of its advantages and disadvantages. It then introduces data mining algorithms based on cluster analysis. Finally, a cohesion-based clustering algorithm and the DBSCAN algorithm are applied to consumer spending data in two-dimensional space, with 2,000 data points for each area. The clustering results are reasonable, and the hierarchical clustering results yield valuable information, which combines the practical application of the algorithms with cluster analysis theory.
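
As a rough illustration of density-based clustering in two dimensions, the following is a minimal pure-Python sketch of DBSCAN. It is not the authors' implementation; the parameter names `eps` and `min_pts`, the brute-force neighbor search, and the sample points are assumptions chosen for brevity.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            more = region_query(points, j, eps)
            if len(more) >= min_pts:  # j is a core point: expand through it
                queue.extend(more)
    return labels


# Hypothetical two-dimensional data: two dense groups and one outlier.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
sample_labels = dbscan(sample, eps=1.5, min_pts=3)
```

The brute-force neighbor search is quadratic in the number of points; for data sets of a few thousand points, as in the paper, that is usually acceptable, while larger sets would call for a spatial index.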

ITERATIVE STRUCTURE DISCOVERY IN GRAPH-BASED DATA

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213005002016 ◽

2005 ◽

Vol 14 (01n02) ◽

pp. 101-124 ◽

Cited By ~ 6

Author(s):

JEFFREY A. COBLE ◽

RUNU RATHI ◽

DIANE J. COOK ◽

LAWRENCE B. HOLDER

Keyword(s):

Data Mining ◽

Current Data ◽

Data Sets ◽

Minimal Impact ◽

Dynamic Memory ◽

Demographic Group ◽

Data Mining Approach ◽

Complex Relationships ◽

Conventional Computer ◽

Iterative Structure

Much of current data mining research is focused on discovering sets of attributes that discriminate data entities into classes, such as shopping trends for a particular demographic group. In contrast, we are working to develop data mining techniques to discover patterns consisting of complex relationships between entities. Our research is particularly applicable to domains in which the data is event-driven or relationally structured. In this paper we present approaches to address two related challenges; the need to assimilate incremental data updates and the need to mine monolithic datasets. Many realistic problems are continuous in nature and therefore require a data mining approach that can evolve discovered knowledge over time. Similarly, many problems present data sets that are too large to fit into dynamic memory on conventional computer systems. We address incremental data mining by introducing a mechanism for summarizing discoveries from previous data increments so that the globally-best patterns can be computed by mining only the new data increment. To address monolithic datasets we introduce a technique by which these datasets can be partitioned and mined serially with minimal impact on the result quality. We present applications of our work in both the counter-terrorism and bioinformatics domains.

Neural networks in data mining

Agricultural Economics (Zemědělská ekonomika) ◽

10.17221/5427-agricecon ◽

2012 ◽

Vol 49 (No. 9) ◽

pp. 427-431 ◽

Cited By ~ 3

Author(s):

A. Veselý

Keyword(s):

Data Mining ◽

Neural Networks ◽

Knowledge Engineering ◽

Data Sets ◽

Data Mining Algorithm ◽

Self Organizing Map ◽

New Methods ◽

Area Of Interest ◽

Database Theory ◽

Self Organizing

Possessing relevant information is a necessary condition for successful enterprise in modern business. Information can be divided into data and knowledge. How to gather, store, and retrieve data is studied in database theory. Knowledge engineering centres on knowledge and studies methods of formalizing and acquiring it. Knowledge can be obtained from experts, the specialists in the area of interest, or it can be obtained by induction from sets of data. Automatic induction of knowledge from data sets, usually stored in large databases, is called data mining. The classical methods of obtaining knowledge from data sets are statistical methods. Data mining uses new methods in addition to statistical ones. These new methods have their origin in artificial intelligence. They look for unknown and unexpected relations, which can be uncovered by exploring the data in a database. The article describes the use of modern data mining methods, with particular attention to methods based on neural network theory. The advantages and drawbacks of applying multilayer feed-forward neural networks and Kohonen's self-organizing maps are discussed. Kohonen's self-organizing map is the most promising neural data mining algorithm with regard to its capability to visualize high-dimensional data.

Massive Data Mining Algorithm for Web Text Based on Clustering Algorithm

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2019.p0362 ◽

2019 ◽

Vol 23 (2) ◽

pp. 362-365 ◽

Cited By ~ 1

Author(s):

Nan-Chao Luo

Keyword(s):

Data Mining ◽

Vector Space ◽

Clustering Algorithm ◽

Current Data ◽

Massive Data ◽

Data Mining Algorithm ◽

Quality Of Data ◽

Chi Square ◽

Data Mining Algorithms ◽

Mining Algorithm

The massive data of Web text is high-dimensional and sparsely distributed in space, which leads to low mining precision and long running times when current data mining algorithms are applied to it. To solve these problems, a massive data mining algorithm for Web text based on a clustering algorithm is proposed. Using the chi-square test, the feature words of the massive data are extracted to obtain a set of feature words. The feature set is clustered hierarchically, the TF-IDF value of each word in the clustering set is calculated, and a vector space model is constructed. Introducing a fair operation and a clone operation into the bee colony algorithm improves the diversity of the vector space models. For the resulting cluster centers, K-means is introduced to extract the local centroids and improve the quality of data mining. Experimental results show that the proposed algorithm effectively improves data mining accuracy and reduces running time.
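
The TF-IDF and vector space model step can be sketched as follows. This is only an illustration of that one step: it omits the chi-square feature selection, the bee colony operations, and K-means, and it assumes one common TF-IDF variant (term frequency normalized by document length, inverse document frequency as log(N / df)), which may differ from the weighting used in the paper. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns (vocabulary, list of weight vectors)."""
    vocab = sorted({w for d in docs for w in d})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        length = len(d) or 1  # guard against empty documents
        vectors.append([counts[w] / length * idf[w] for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical tokenized Web texts.
texts = [
    ["data", "mining", "web"],
    ["data", "cluster"],
    ["web", "text", "web"],
]
vocabulary, weights = tfidf_vectors(texts)
similarity = cosine(weights[0], weights[2])
```

Under this weighting a word that occurs in every document receives weight zero, which is why very common words contribute nothing to the similarity between documents.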

Predictive Analysis using Data Mining Techniques for Heart Disease Diagnosis

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.1.17079 ◽

2018 ◽

Vol 7 (3.1) ◽

pp. 166 ◽

Cited By ~ 1

Author(s):

Siddharth Joshi ◽

Ashish Sasanapuri ◽

Shreyash Anand ◽

Saurav Nandi ◽

Varsha Nemade

Keyword(s):

Data Mining ◽

Disease Diagnosis ◽

Content Management ◽

Data Sets ◽

Data Mining Algorithm ◽

Large Hospital ◽

Use Of Data ◽

Using Data ◽

Heart Disease Diagnosis ◽

Python Programming

Owing to technological advancements in computer science and data warehousing techniques, the healthcare industry, from small clinics to large hospital campuses, uses content management systems, which have made storing and accessing data faster. Regrettably, the large amounts of data generated are not mined and remain unexploited. Through this research we aim to demonstrate the use of data mining algorithms, implemented in the Python programming language, in a desktop-based application. This paper analyzes performance by comparing data analysis metrics such as accuracy, precision, and recall, and introduces our software solution, which tries to be more accurate than previous work on the Cleveland, VA, and Hungarian data sets taken from the UCI repository [1].
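
The three metrics named above can be computed from a confusion matrix, as in the sketch below. It assumes a binary task in which label 1 means "heart disease present"; that encoding, the function names, and the sample labels are illustrative assumptions, not details taken from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical ground truth and predictions for eight patients.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 0]
scores = classification_metrics(actual, predicted)
```

In a diagnostic setting recall is often watched most closely, because a false negative (a missed case of disease) is usually costlier than a false positive.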

An ontology-based approach to the analysis of the acid-base state of patients at operative measures

PeerJ Computer Science ◽

10.7717/peerj-cs.777 ◽

2021 ◽

Vol 7 ◽

pp. e777

Author(s):

Man Tianxing ◽

Mikhail Lushnov ◽

Dmitry I. Ignatov ◽

Yulia Alexandrovna Shichkina ◽

Natalia Alexandrovna Zhukova ◽

...

Keyword(s):

Data Mining ◽

Current Data ◽

Data Sets ◽

Acid Base ◽

Algorithm Selection ◽

Specific Domain ◽

Data Mining Techniques ◽

Complex Processes ◽

Task Requirements ◽

Operative Measures

Researchers working in various domains are focusing on extracting information from data sets using data mining techniques. However, data mining is a complicated task involving multiple complex processes, which makes it unfriendly to researchers outside computer science. Lacking experience, they cannot design suitable workflows that lead to satisfactory results. This article proposes an ontology-based approach to help users choose appropriate data mining techniques for analyzing domain data. By merging with a domain ontology and extracting the corresponding sub-ontology based on the task requirements, an ontology oriented to a specific domain is generated that can be used for algorithm selection. Users can query step by step for suitable algorithms according to the current data characteristics and task requirements. We build a workflow to analyze the acid-base state of patients at operative measures based on the proposed approach and obtain appropriate conclusions.

Human talent forecasting

Proceedings of the International Conference on Business Excellence ◽

10.1515/picbe-2017-0047 ◽

2017 ◽

Vol 11 (1) ◽

pp. 437-447

Author(s):

Bogdan Nedelcu

Keyword(s):

Data Mining ◽

Human Resources ◽

Large Data ◽

Data Sets ◽

Data Mining Algorithm ◽

Data Systems ◽

Strategic Role ◽

Human Skills ◽

Resource Data ◽

A Company

The demand for talent has increased while the supply has declined, and these worrying trends show no sign of changing in the near future. According to Bloomberg Businessweek, the USA, Canada, the UK, and Japan (among many others) will face varying degrees of talent shortages in almost every industry in the coming years. This study focuses on identifying patterns that relate to human skills. Recently, with the new demand and increasing visibility, human resources departments have been seeking a more strategic role by harnessing data mining methods. This can be achieved by discovering patterns in the existing useful data in HR databases. The main objective of the paper is to determine which data mining algorithm is best suited to extracting knowledge from human resource data when it comes to determining how well suited a candidate is for a specific job. First, a way must be found to evaluate a candidate as objectively as possible and to rate the candidate with a mark from 0 to 10. To do so, data sets with different numbers of values or different values were generated and processed using Weka. The results were plotted so that they would be easier to interpret. The study also shows that the importance of using large volumes of data to make informed decisions has recently become widely discussed in most organizations. While finance, marketing, and other departments within a company receive data systems and customized analysis, human resources is still not supported by expert systems that process large data volumes. The software prototype designed for the experiment rates individuals (working for the company, or in trials) on a scale from 0 to 10, offering decision makers an objective analysis. This way, a company looking for talent will know whether the person applying for the job is suited or not, and how much the hiring will influence the overall rating of the department.

Using Prior Knowledge in Data Mining

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch308 ◽

2011 ◽

pp. 2019-2023

Author(s):

Francesca A. Lisi

Keyword(s):

Data Mining ◽

Prior Knowledge ◽

Pattern Discovery ◽

General Pattern ◽

Current Data ◽

Frequent Pattern ◽

Language L ◽

Binary Predicate ◽

Intended Interpretation ◽

Definition Of

One of the most important and challenging problems in current Data Mining research is the definition of the prior knowledge that can originate from the process or the domain. This contextual information may help select the appropriate information, features or techniques, decrease the space of hypotheses, represent the output in a more comprehensible way and improve the process. Ontological foundation is a precondition for efficient automated usage of such information (Chandrasekaran et al., 1999). An ontology is a formal explicit specification of a shared conceptualization for a domain of interest (Gruber, 1993). Among other things, this definition emphasizes the fact that an ontology has to be specified in a language that comes with a formal semantics. Due to this formalization, ontologies provide the machine-interpretable meaning of concepts and relations that is expected when using a semantic-based approach (Staab & Studer, 2004). In its most prevalent use in Artificial Intelligence (AI), an ontology refers to an engineering artifact (more precisely, produced according to the principles of Ontological Engineering (Gómez-Pérez et al., 2004)), constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary words. This set of assumptions usually has the form of a First-Order Logic (FOL) theory, where vocabulary words appear as unary or binary predicate names, respectively called concepts and relations. In the simplest case, an ontology describes a hierarchy of concepts related by subsumption relationships; in more sophisticated cases, suitable axioms are added in order to express other relationships between concepts and to constrain their intended interpretation. Ontologies can play several roles in Data Mining (Nigro et al., 2007). In this chapter we investigate the use of ontologies as prior knowledge in Data Mining.
As an illustrative case throughout the chapter, we choose the task of Frequent Pattern Discovery, it being the most representative product of the cross-fertilization among Databases, Machine Learning and Statistics that has given rise to Data Mining. Indeed it is central to an entire class of descriptive tasks in Data Mining among which Association Rule Mining (Agrawal et al., 1993; Agrawal & Srikant, 1994) is the most popular. A pattern is considered as an intensional description (expressed in a given language L) of a subset of a data set r. The support of a pattern is the relative frequency of the pattern within r and is computed with the evaluation function supp. The task of Frequent Pattern Discovery aims at the extraction of all frequent patterns, i.e. all patterns whose support exceeds a user-defined threshold of minimum support. The blueprint of most algorithms for Frequent Pattern Discovery is the levelwise search (Mannila & Toivonen, 1997). It is based on the following assumption: If a generality order ⪰ for the language L of patterns can be found such that ⪰ is monotonic w.r.t. supp, then the resulting space (L, ⪰) can be searched breadth-first by starting from the most general pattern in L and alternating candidate generation and candidate evaluation phases.
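
The levelwise search described above can be sketched generically, with the pattern language supplied through two functions. The names `levelwise`, `refine`, and `itemset_space` are hypothetical, and the itemset instantiation (where a smaller itemset is the more general pattern) is only one possible choice of language L. The sketch keeps patterns whose support is at least the threshold, which is a convention assumed here.

```python
def levelwise(most_general, refine, supp, min_supp):
    """Breadth-first search of a pattern space ordered by generality.

    refine(p) returns the immediate specializations of pattern p and
    supp(p) returns its relative frequency. Because support cannot grow
    under specialization, infrequent patterns are never refined.
    """
    frequent = {}
    level = {most_general}
    while level:
        # Candidate evaluation phase: score each candidate once.
        scored = {p: supp(p) for p in level}
        kept = {p: s for p, s in scored.items() if s >= min_supp}
        frequent.update(kept)
        # Candidate generation phase: specialize only the frequent patterns.
        level = {q for p in kept for q in refine(p)}
    return frequent


def itemset_space(transactions):
    """Instantiate the search for itemsets over a list of transactions."""
    items = sorted({i for t in transactions for i in t})

    def refine(pattern):
        # Specialize a pattern by adding one item it does not yet contain.
        return [pattern | {i} for i in items if i not in pattern]

    def supp(pattern):
        covered = sum(1 for t in transactions if pattern <= t)
        return covered / len(transactions)

    # The empty itemset covers every transaction: the most general pattern.
    return frozenset(), refine, supp


# Hypothetical data set r with four transactions.
r = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
top, refine_fn, supp_fn = itemset_space(r)
patterns = levelwise(top, refine_fn, supp_fn, min_supp=0.5)
```

Separating the search from `refine` and `supp` mirrors the text: any language with a generality order that is monotonic with respect to support can reuse the same loop, whereas a different order would need its own refinement function.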
