Data Mining: Modeling, Algorithms, Applications and Systems

2014 ◽  
Vol 926-930 ◽  
pp. 2786-2789
Author(s):  
Jing Zhu Li ◽  
Qian Li ◽  
Tai Yu Liu ◽  
Wei Hong Niu

Data mining is a multidisciplinary field that emerged gradually during the 20th century. This paper reviews data mining modeling, algorithms, applications, and software tools. It presents the definition of data mining, its scope, the characteristics of data sets, and the various practical situations in which data mining is applied; summarizes the basic steps and processes of data mining in practical applications; discusses data mining tasks and modeling issues across a variety of applications; lists the algorithms currently popular in the field and briefly analyzes the issues to consider in algorithm design; gives an overview of current applications of data mining algorithms in a number of areas; describes in some detail the current performance of data mining software tools and the state of their development; and finally discusses the prospects and directions for the development of data mining.

Author(s):  
Divya Dangi et al.

Previous data protection research focuses on static data sets that are not updated and require only a one-time release. Serial data publishing on complex data collections has received little attention in the literature, and the work that exists does not treat the problem completely: existing methods either cannot defend against various kinds of background knowledge, or the utility of the serially published data is low. A new generalization principle is developed on the basis of a theoretical analysis, which effectively reduces the risk to certain sensitive attributes on re-publication. The results suggest that our algorithm achieves higher anonymity and lower hiding rates. Design and implementation of the proposed privacy-preserving technique: in this phase, the proposed technique is implemented to demonstrate the entire scenario of data aggregation and privacy-preserving data mining. Comparative performance of the proposed technique and the traditional technique when applying C4.5: in this stage, performance is evaluated and a comparison with the standard algorithm is presented for the proposed data mining security model.


2017 ◽  
Vol 7 (1.1) ◽  
pp. 286
Author(s):  
B. Sekhar Babu ◽  
P. Lakshmi Prasanna ◽  
P. Vidyullatha

Today the World Wide Web has become a familiar medium for finding new information, business trends, trading strategies, and so on. Many organizations and companies also use the web to present their products or services across the world. E-commerce is a kind of business or commercial transaction that involves the transfer of data across the web or internet. In this situation a huge amount of data is generated and dumped into web services. This data overload makes it difficult to find accurate and valuable information, so web data mining is used as a tool to discover and extract knowledge from the web. E-commerce organizations can apply web data mining technology to offer personalized e-commerce solutions and better meet the needs of customers. A data mining approach such as ontology-based association rule mining using the Apriori algorithm extracts useful information from large data sets. We implement this technique in Java; data sets are generated dynamically while transactions are processed, and various patterns are extracted.
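
As a rough illustration of the Apriori step mentioned in this abstract, the following is a minimal sketch in Python (the authors report a Java implementation, which is not reproduced here). It covers only frequent-itemset discovery and omits the ontology layer; the function name `apriori`, the `min_support` parameter, and the shopping baskets are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s) >= min_support}

    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Candidate evaluation: keep only candidates that are frequent.
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
frequent_itemsets = apriori(baskets, min_support=0.5)
```

With these baskets, every single item and every pair reaches the 0.5 threshold, while the three-item set appears in only one basket and is discarded. Association rules would then be derived from the frequent itemsets by comparing supports, a step this sketch leaves out.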


2014 ◽  
Vol 926-930 ◽  
pp. 3608-3611 ◽  
Author(s):  
Yi Fan Zhang ◽  
Yong Tao Qian ◽  
Tai Yu Liu ◽  
Shu Yan Wu

This paper first introduces background knowledge on data mining and then focuses on cluster analysis algorithms, including a classification of clustering algorithms and the typical algorithms in each class, with a formal description of each algorithm and a more detailed account of its advantages and disadvantages. It then introduces data mining algorithms based on cluster analysis. Finally, an agglomeration-based (hierarchical) clustering algorithm and the DBSCAN algorithm are applied to consumer spending data in two-dimensional space, with 2,000 data points for each area. The clustering results are reasonable, and the hierarchical clustering results yield valuable information, thereby combining cluster analysis theory with the practical application of the algorithms.
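
To make the DBSCAN part of this abstract concrete, here is a minimal pure-Python sketch of the standard algorithm on two-dimensional points. It is not the authors' implementation: the names `dbscan`, `eps`, and `min_pts` and the nine sample points are assumptions chosen for illustration, and the brute-force neighbor search is only practical for small inputs.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical 2-D points: two dense groups and one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 5),
]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Here the two square groups become clusters 0 and 1, and the isolated point at (5, 5) is labeled -1 as noise. Unlike K-means, the number of clusters is not specified in advance; it follows from `eps` and `min_pts`.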


2005 ◽  
Vol 14 (01n02) ◽  
pp. 101-124 ◽  
Author(s):  
JEFFREY A. COBLE ◽  
RUNU RATHI ◽  
DIANE J. COOK ◽  
LAWRENCE B. HOLDER

Much of current data mining research is focused on discovering sets of attributes that discriminate data entities into classes, such as shopping trends for a particular demographic group. In contrast, we are working to develop data mining techniques to discover patterns consisting of complex relationships between entities. Our research is particularly applicable to domains in which the data is event-driven or relationally structured. In this paper we present approaches to address two related challenges: the need to assimilate incremental data updates and the need to mine monolithic datasets. Many realistic problems are continuous in nature and therefore require a data mining approach that can evolve discovered knowledge over time. Similarly, many problems present data sets that are too large to fit into dynamic memory on conventional computer systems. We address incremental data mining by introducing a mechanism for summarizing discoveries from previous data increments so that the globally-best patterns can be computed by mining only the new data increment. To address monolithic datasets we introduce a technique by which these datasets can be partitioned and mined serially with minimal impact on the result quality. We present applications of our work in both the counter-terrorism and bioinformatics domains.


2012 ◽  
Vol 49 (No. 9) ◽  
pp. 427-431 ◽  
Author(s):  
A. Veselý

Possessing relevant information is a necessary condition for successful enterprise in modern business. Information can be divided into data and knowledge. How to gather, store, and retrieve data is studied in database theory. Knowledge engineering centres on knowledge and studies methods for formalizing and acquiring it. Knowledge can be acquired from experts, specialists in the area of interest, or by induction from sets of data. Automatic induction of knowledge from data sets, usually stored in large databases, is called data mining. The classical methods of acquiring knowledge from data sets are statistical. Data mining also uses newer methods, which have their origin in artificial intelligence. They look for unknown and unexpected relations, which can be uncovered by exploring the data in a database. The article describes the use of modern data mining methods, with particular attention to methods based on neural network theory. The advantages and drawbacks of applying multilayer feedforward neural networks and Kohonen’s self-organizing maps are discussed. Kohonen’s self-organizing map is the most promising neural data-mining algorithm with regard to its capability to visualize high-dimensional data.


Author(s):  
Nan-Chao Luo ◽  

Massive Web text data is high-dimensional and sparsely distributed, so current data mining algorithms suffer from low precision and long running times when mining it. To solve these problems, a clustering-based algorithm for mining massive Web text data is proposed. Feature words are extracted from the data using the chi-square test, giving a set of feature words. The feature sets are clustered hierarchically, the TF-IDF value of each word in the clustering set is calculated, and a vector space model is constructed. Introducing a fair operation and a clone operation into the bee colony algorithm improves the diversity of the vector space models. K-means is then applied to the resulting cluster centers to extract local centroids and improve the quality of data mining. Experimental results show that the proposed algorithm effectively improves data mining accuracy and reduces running time.
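
The TF-IDF weighting used to build the vector space model can be sketched as follows. The abstract does not say which TF-IDF variant is used, so this example assumes relative term frequency multiplied by log(N / document frequency); the function name `tf_idf` and the three tokenized documents are hypothetical, and every document is assumed to be non-empty.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical tokenized documents, invented for illustration.
docs = [
    ["web", "text", "mining"],
    ["web", "data", "clustering"],
    ["web", "text", "clustering", "clustering"],
]
vectors = tf_idf(docs)
```

A term such as "web" that occurs in every document gets weight zero, while a term confined to one document, such as "mining", gets the highest inverse document frequency. The resulting sparse dictionaries play the role of the document vectors that a clustering step would consume.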


2018 ◽  
Vol 7 (3.1) ◽  
pp. 166 ◽  
Author(s):  
Siddharth Joshi ◽  
Ashish Sasanapuri ◽  
Shreyash Anand ◽  
Saurav Nandi ◽  
Varsha Nemade

Owing to technological advances in computer science and data warehousing, the healthcare industry, from small clinics to large hospital campuses, uses content management systems, which have made storing and accessing data faster. Regrettably, the large amounts of data generated are not mined and remain unexploited. In this research we aim to demonstrate the use of a data mining algorithm, implemented in the Python programming language, in a desktop-based application. The paper analyzes performance by comparing data analysis metrics such as accuracy, precision, and recall, and introduces our software solution, which tries to be more accurate than previous work on the Cleveland, VA, and Hungarian data sets taken from the UCI repository [1].
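
The three metrics named in this abstract can be computed from the confusion counts of a binary classifier. The sketch below uses the standard definitions; the function name `classification_metrics` and the label vectors are hypothetical and are not taken from the paper or from the UCI data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / len(pairs),
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of true positives that the classifier found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical ground truth and predictions (1 = disease present).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

For these vectors there are 3 true positives, 2 false positives, 1 false negative, and 2 true negatives, giving accuracy 0.625, precision 0.6, and recall 0.75. In a medical screening setting recall is often weighted more heavily, since a false negative is a missed diagnosis.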


2021 ◽  
Vol 7 ◽  
pp. e777
Author(s):  
Man Tianxing ◽  
Mikhail Lushnov ◽  
Dmitry I. Ignatov ◽  
Yulia Alexandrovna Shichkina ◽  
Natalia Alexandrovna Zhukova ◽  
...  

Researchers working in various domains are focusing on extracting information from data sets by data mining techniques. However, data mining is a complicated task, including multiple complex processes, so that it is unfriendly to non-computer researchers. Due to the lack of experience, they cannot design suitable workflows that lead to satisfactory results. This article proposes an ontology-based approach to help users choose appropriate data mining techniques for analyzing domain data. By merging with domain ontology and extracting the corresponding sub-ontology based on the task requirements, an ontology oriented to a specific domain is generated that can be used for algorithm selection. Users can query for suitable algorithms according to the current data characteristics and task requirements step by step. We build a workflow to analyze the Acid-Base State of patients at operative measures based on the proposed approach and obtain appropriate conclusions.


Author(s):  
Bogdan Nedelcu

The demand for talent has increased while the supply has declined, and these worrying trends show no sign of changing in the near future. According to Bloomberg Businessweek, the USA, Canada, the UK, and Japan (among many others) will face varying degrees of talent shortages in almost every industry in the coming years. This study focuses on identifying patterns that relate to human skills. Recently, with this new demand and increasing visibility, human resources departments have been seeking a more strategic role by harnessing data mining methods. This can be achieved by discovering patterns in the useful data that already exists in HR databases. The main objective of the paper is to determine which data mining algorithm is best suited to extracting knowledge from human resource data when it comes to determining how well suited a candidate is for a specific job. First, a way must be found to evaluate a candidate as objectively as possible and rate the candidate with a mark from 0 to 10. To do so, data sets were generated with different numbers of values or different values and were processed using Weka. The results were plotted to make them easier to interpret. The study also shows the importance of using large volumes of data to make informed decisions, a topic that has recently become widely discussed in most organizations. While finance, marketing, and other departments within a company receive data systems and customized analysis, human resources are still not supported by expert systems that process large data volumes. The software prototype designed for the experiment rates individuals (working for the company, or in trials) on a scale from 0 to 10, offering decision makers an objective analysis. In this way, a company looking for talent will know whether the person applying for the job is suitable, and how much the hire will influence the overall rating of the department.


Author(s):  
Francesca A. Lisi

One of the most important and challenging problems in current Data Mining research is the definition of the prior knowledge that can be originated from the process or the domain. This contextual information may help select the appropriate information, features or techniques, decrease the space of hypotheses, represent the output in a most comprehensible way and improve the process. Ontological foundation is a precondition for efficient automated usage of such information (Chandrasekaran et al., 1999). An ontology is a formal explicit specification of a shared conceptualization for a domain of interest (Gruber, 1993). Among other things, this definition emphasizes the fact that an ontology has to be specified in a language that comes with a formal semantics. Due to this formalization ontologies provide the machine interpretable meaning of concepts and relations that is expected when using a semantic-based approach (Staab & Studer, 2004). In its most prevalent use in Artificial Intelligence (AI), an ontology refers to an engineering artifact (more precisely, produced according to the principles of Ontological Engineering (Gómez-Pérez et al., 2004)), constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary words. This set of assumptions has usually the form of a First-Order Logic (FOL) theory, where vocabulary words appear as unary or binary predicate names, respectively called concepts and relations. In the simplest case, an ontology describes a hierarchy of concepts related by subsumption relationships; in more sophisticated cases, suitable axioms are added in order to express other relationships between concepts and to constrain their intended interpretation. Ontologies can play several roles in Data Mining (Nigro et al., 2007). In this chapter we investigate the use of ontologies as prior knowledge in Data Mining. 
As an illustrative case throughout the chapter, we choose the task of Frequent Pattern Discovery, it being the most representative product of the cross-fertilization among Databases, Machine Learning and Statistics that has given rise to Data Mining. Indeed it is central to an entire class of descriptive tasks in Data Mining among which Association Rule Mining (Agrawal et al., 1993; Agrawal & Srikant, 1994) is the most popular. A pattern is considered as an intensional description (expressed in a given language L) of a subset of a data set r. The support of a pattern is the relative frequency of the pattern within r and is computed with the evaluation function supp. The task of Frequent Pattern Discovery aims at the extraction of all frequent patterns, i.e. all patterns whose support exceeds a user-defined threshold of minimum support. The blueprint of most algorithms for Frequent Pattern Discovery is the levelwise search (Mannila & Toivonen, 1997). It is based on the following assumption: If a generality order ⪰ for the language L of patterns can be found such that ⪰ is monotonic w.r.t. supp, then the resulting space (L, ⪰) can be searched breadth-first by starting from the most general pattern in L and alternating candidate generation and candidate evaluation phases.
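
The definitions in this paragraph can be written compactly. The following is a sketch under stated assumptions: ⪰ denotes the generality order on L, "P covers t" is a hypothetical shorthand for the pattern P describing the data item t, minsup names the user-defined threshold, and "exceeds" is read as "greater than or equal to", which the text does not settle.

```latex
% Support as relative frequency of pattern P within data set r
\[
\mathrm{supp}(P, r) \;=\; \frac{\bigl|\{\, t \in r : P \text{ covers } t \,\}\bigr|}{|r|},
\qquad
P \text{ is frequent} \iff \mathrm{supp}(P, r) \ge \mathit{minsup}.
\]

% Monotonicity of the generality order with respect to supp
\[
P \succeq Q \;\Longrightarrow\; \mathrm{supp}(P, r) \ge \mathrm{supp}(Q, r)
\qquad \text{for all } P, Q \in L.
\]
```

Read contrapositively, the second line is what makes the levelwise search effective: if a pattern P is not frequent, no pattern more specific than P can be frequent, so the candidate generation phase never needs to specialize P further.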

