Computing on Vertices in Data Mining

2021 ◽  
Author(s):  
Leon Bobrowski

The main challenges in data mining are related to large, multi-dimensional data sets. There is a need to develop algorithms that are precise and efficient enough to deal with big data problems. The Simplex algorithm from linear programming can be seen as an example of a successful big data problem-solving tool. According to the fundamental theorem of linear programming, the solution of the optimization problem can be found in one of the vertices of the parameter space. The basis exchange algorithms likewise search for the optimal solution among a finite number of vertices in the parameter space. Basis exchange algorithms enable the design of complex layers of classifiers or predictive models based on a small number of multivariate data vectors.
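
For illustration only (a minimal sketch, not from the paper), SciPy's `linprog` shows the fundamental theorem in action, with the optimum landing on a vertex of the feasible polytope. The toy objective and constraints are invented for the example.

```python
# Minimal sketch (invented toy problem): the LP optimum lies at a vertex
# of the feasible polytope, per the fundamental theorem of linear programming.
from scipy.optimize import linprog

# maximize x + 2y  <=>  minimize -x - 2y
c = [-1, -2]
A_ub = [[1, 1],   # x + y <= 4
        [1, 0]]   # x     <= 3
b_ub = [4, 3]

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)], method="highs")
print(res.x)  # [0. 4.] -- a vertex of the polygon with corners
              # (0,0), (3,0), (3,1), (0,4)
```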

Web Services ◽  
2019 ◽  
pp. 618-638
Author(s):  
Goran Klepac ◽  
Kristi L. Berg

This chapter proposes a new analytical approach that consolidates the traditional analytical approach for solving problems such as churn detection, fraud detection, predictive modeling, and segmentation modeling with data sources and analytical techniques from the big data area. The solutions presented offer a structured approach for integrating the different concepts into one, which helps analysts as well as managers use the potential of different areas in a systematic way. By using this concept, companies have the opportunity to introduce big data potential into everyday data mining projects. As the chapter shows, neglecting big data potential often results in incomplete analytical results, which imply incomplete information for business decisions and can lead to bad business decisions. The chapter also provides suggestions on how to recognize useful data sources from the big data area and how to analyze them along with traditional data sources to achieve higher-quality information for business decisions.


2021 ◽  
Vol 22 (2) ◽  
pp. 119-134
Author(s):  
Ahad Shamseen ◽  
Morteza Mohammadi Zanjireh ◽  
Mahdi Bahaghighat ◽  
Qin Xin

Data mining is the extraction of information and patterns from vast amounts of data, and it is one of the most important topics today. Massive amounts of data are generated and stored every day, and these data contain useful information in many fields, attracting the attention of programmers and engineers. One of the primary data mining classification algorithms is the decision tree. Decision tree techniques have several advantages but also present drawbacks; one of the main drawbacks is the need to keep the data in main memory. SPRINT is a decision tree classifier that proposed a fix for this problem. In this paper, we develop a new parallel decision tree classifier that builds on SPRINT. Our experimental results show considerable improvements in runtime and memory requirements compared to the SPRINT classifier. The proposed classifier can be implemented in both serial and parallel environments and can deal with big data.
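
The split criterion at the heart of SPRINT-style tree building is the Gini index, evaluated for every candidate split point of every attribute. A minimal sketch with invented toy data (not the paper's implementation):

```python
# Minimal sketch of the Gini-index split criterion that SPRINT-style
# decision tree builders evaluate for each candidate split point.
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(values, labels, threshold):
    """Weighted Gini impurity of splitting a numeric attribute at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy data: splitting 'age' at 30 separates the classes perfectly (Gini 0).
ages = [23, 25, 30, 41, 52]
risk = ["high", "high", "high", "low", "low"]
best = min(ages[:-1], key=lambda t: split_gini(ages, risk, t))
print(best)  # 30
```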


2009 ◽  
Vol 19 (1) ◽  
pp. 123-132 ◽  
Author(s):  
Nikolaos Samaras ◽  
Angelo Sifaleras ◽  
Charalampos Triantafyllidis

The aim of this paper is to present a new simplex-type algorithm for the linear programming problem. The Primal-Dual method is a simplex-type pivoting algorithm that generates two paths in order to converge to the optimal solution: the first path is primal feasible, while the second is dual feasible for the original problem. Specifically, we use a three-phase implementation. The first two phases construct the required primal and dual feasible solutions using the Primal Simplex algorithm; in the third phase, the Primal-Dual algorithm is applied. Moreover, a computational study has been carried out, using randomly generated sparse optimal linear problems, to compare the algorithm's computational efficiency with the Primal Simplex algorithm and with MATLAB's Interior Point Method implementation. The algorithm appears very promising, since it clearly shows its superiority to the Primal Simplex algorithm as well as its robustness over the IPM algorithm.
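
For reference, in standard textbook notation (not necessarily the paper's), the primal-dual pair whose feasibility the two paths maintain can be written as:

```latex
% Generic primal-dual pair in standard form (textbook notation):
\begin{align*}
\text{(P)}\quad & \min_{x \in \mathbb{R}^n} c^{\top}x
  \quad \text{s.t. } Ax = b,\ x \ge 0, \\
\text{(D)}\quad & \max_{y \in \mathbb{R}^m} b^{\top}y
  \quad \text{s.t. } A^{\top}y \le c.
\end{align*}
```

By weak duality, the dual objective of any dual-feasible point never exceeds the primal objective of any primal-feasible point, so the two paths bracket the optimum, and a method of this kind terminates when the two objective values coincide.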


Author(s):  
Onur Doğan ◽  
Hakan Aşan ◽  
Ejder Ayç

In today’s competitive world, organizations need to make the right decisions to prolong their existence. Non-scientific methods and emotional decision making have given way to scientific methods in the decision-making process in this competitive area. Within this scope, many decision support models are still being developed to assist the decision makers and owners of organizations. It is easy for organizations to collect massive amounts of data, but the problem is generally using this data to achieve economic advantage. There is a critical need for specialization and automation to transform the data in big data sets into knowledge. Data mining techniques are capable of providing description, estimation, prediction, classification, clustering, and association. Recently, many data mining techniques have been developed to find hidden patterns and relations in big data sets. It is important to obtain new correlations, patterns, and trends that are understandable and useful to decision makers, and there has been much research and many applications focusing on different data mining techniques and methodologies.

In this study, we aim to obtain understandable and applicable results from a large volume of records belonging to a firm active in the meat processing industry, using data mining techniques. In the application part, data cleaning and data integration, the first steps of the data mining process, are first performed on the data in the database, yielding a data set suitable for data mining. Then, various association rule algorithms were applied to this data set. The analysis revealed that finding unexplored patterns in the data would be beneficial for the decision makers of the firm. Finally, many association rules useful to the decision makers of the local firm are obtained.
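
As an illustration of the kind of association-rule analysis described (with invented product names, transactions, and thresholds, not the firm's actual data), a minimal sketch using the `mlxtend` library:

```python
# Minimal sketch of association-rule mining on market-basket data
# (product names, transactions, and thresholds are invented).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["ground beef", "sausage", "salami"],
    ["ground beef", "sausage"],
    ["salami", "bacon"],
    ["ground beef", "sausage", "bacon"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Frequent itemsets with support >= 50%, then rules with confidence >= 80%.
frequent = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
# e.g. {sausage} -> {ground beef} with support 0.75 and confidence 1.0
```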


Author(s):  
Tianxiang He

The development of artificial intelligence (AI) technology is firmly connected to the availability of big data. However, using data sets involving copyrighted works for AI analysis or data mining without authorization will incur risks of copyright infringement. Considering the fact that incomplete data collection may lead to data bias, and since it is impossible for the user of AI technology to obtain a copyright licence from each and every right owner of the copyrighted works used, a mechanism that can free the data from copyright restrictions under certain conditions is needed. In the case of China, it is crucial to check whether China’s current copyright exception model can take on the role and offer that kind of function. This chapter suggests that a special AI analysis and data mining copyright exception that follows a semi-open style should be added to the current exceptions list under the Copyright Law of China.


2018 ◽  
Vol 7 (2.12) ◽  
pp. 184
Author(s):  
Konda Sreenu ◽  
Dr Boddu Raja Srinivasa Reddy

Computers play a key role everywhere in the world, and data grows along with their usage. In everyday life we use computers for various purposes and store bulk information, and one way or another we want to retrieve data from the storage system. Retrieving bulk data is not a simple thing, nor is it a magic show. Every user wants data in different forms, such as reports or output information, and all of these exercises require a process, one that resembles marching ant colonies: databases and tables related to the data are collected, the relevant data is selected from huge tables and databases, aggregate functions are applied to the data, and information or reports related to the data are output. The paper focuses on how efficiently software can be used, to some extent, to solve business-related problems. The paper may not handle a century's worth of data, but something can still be achieved. For a century's worth of data, it is better to adopt a data mining approach, because such big problems otherwise take a great deal of time to solve.
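
The collect-select-aggregate-report process described above can be illustrated with an in-memory SQLite database (invented schema and numbers):

```python
# Minimal sketch (invented schema): collect a table, select the relevant
# rows, apply an aggregate function, and output a small report.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 120.0), ("north", 80.0), ("south", 200.0)])

# Aggregate per group and print the 'report' the user asked for.
for region, total in con.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(f"{region}: {total}")  # north: 200.0 / south: 200.0
```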


Author(s):  
Cataldo Zuccaro ◽  
Michel Plaisent ◽  
Prosper Bernard

This chapter presents a preliminary framework to tackle tax evasion in the field of residential renovation. This industry plays a major role in economic development and employment growth. Tax evasion and fraud are extremely difficult to combat in the industry since it is characterized by a large number of stakeholders (manufacturers, retailers, tradesmen, and households) generating complex transactional dynamics that often defy attempts to deploy transactional analytics to detect anomalies, fraud, and tax evasion. This chapter proposes a framework that applies transactional analytics and data mining to develop standard measures and predictive models to detect fraud and tax evasion. Combining big data sets, cross-referencing, and predictive modeling (i.e., anomaly detection, artificial neural networks, support vector machines, Bayesian networks, and association rules) can assist government agencies in combating highly stealthy tax evasion and fraud in residential renovation.
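
One of the named techniques, anomaly detection, can be sketched as follows on synthetic invoice amounts (not the chapter's data), here using an isolation forest rather than any specific model from the chapter:

```python
# Minimal sketch (synthetic data) of flagging anomalous transactions
# with an isolation forest; a prediction of -1 marks a suspected anomaly.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=5_000, scale=1_200, size=(200, 1))  # typical invoices
suspect = np.array([[45_000.0], [150.0], [60_000.0]])       # planted outliers
invoices = np.vstack([normal, suspect])

model = IsolationForest(contamination=0.02, random_state=0).fit(invoices)
flags = model.predict(invoices)
print(invoices[flags == -1].ravel())  # transactions to audit
```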


Author(s):  
M. Govindarajan

Big data mining involves knowledge discovery from large data sets. The purpose of this chapter is to provide an analysis of the different machine learning algorithms available for performing big data analytics. The machine learning algorithms fall into three key categories: supervised, unsupervised, and semi-supervised. Supervised learning algorithms are trained with a completely labeled set of data and are therefore used to predict or forecast; example algorithms include logistic regression and the back-propagation neural network. Unsupervised learning algorithms start learning from scratch, without labels, and are therefore used for clustering; example algorithms include the Apriori algorithm and K-Means. Semi-supervised learning combines both supervised and unsupervised learning: the algorithms are trained on a mix of labeled and unlabeled data.
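
A minimal sketch (toy data) contrasting the first two categories: a supervised learner that predicts from labels, and an unsupervised learner that clusters without them.

```python
# Minimal sketch (toy data): supervised prediction vs. unsupervised clustering.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels: only the supervised learner sees these

clf = LogisticRegression().fit(X, y)   # supervised: learns from the labels
print(clf.predict([[2.5], [10.5]]))    # -> [0 1]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # unsupervised
print(km.labels_)                      # two groups found without any labels
```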

