Computing on Vertices in Data Mining

2021 ◽  
Author(s):  
Leon Bobrowski

The main challenges in data mining are related to large, multi-dimensional data sets. There is a need to develop algorithms that are precise and efficient enough to deal with big data problems. The Simplex algorithm from linear programming can be seen as an example of a successful big data problem-solving tool. According to the fundamental theorem of linear programming, the solution of the optimization problem can be found in one of the vertices of the parameter space. The basis exchange algorithms likewise search for the optimal solution among a finite number of vertices in the parameter space. Basis exchange algorithms enable the design of complex layers of classifiers or predictive models based on a small number of multivariate data vectors.
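
For illustration only (a minimal sketch, not from the paper), SciPy's `linprog` shows the fundamental theorem in action, with the optimum landing on a vertex of the feasible polytope. The toy objective and constraints are invented for the example.

```python
# Minimal sketch (invented toy problem): the LP optimum lies at a vertex
# of the feasible polytope, per the fundamental theorem of linear programming.
from scipy.optimize import linprog

# maximize x + 2y  <=>  minimize -x - 2y
c = [-1, -2]
A_ub = [[1, 1],   # x + y <= 4
        [1, 0]]   # x     <= 3
b_ub = [4, 3]

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)], method="highs")
print(res.x)  # [0. 4.] -- a vertex of the polygon with corners
              # (0,0), (3,0), (3,1), (0,4)
```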

Web Services ◽  
2019 ◽  
pp. 618-638
Author(s):  
Goran Klepac ◽  
Kristi L. Berg

This chapter proposes a new analytical approach that consolidates the traditional analytical approach for solving problems such as churn detection, fraud detection, predictive modeling, and segmentation modeling with data sources and analytical techniques from the big data area. The solutions presented offer a structured approach for integrating the different concepts into one, which helps analysts as well as managers use the potential of different areas in a systematic way. By using this concept, companies have the opportunity to introduce big data potential into everyday data mining projects. As the chapter shows, neglecting big data potential often results in incomplete analytical results, which imply incomplete information for business decisions and can lead to bad business decisions. The chapter also provides suggestions on how to recognize useful data sources from the big data area and how to analyze them along with traditional data sources to achieve higher-quality information for business decisions.


2021 ◽  
Vol 22 (2) ◽  
pp. 119-134
Author(s):  
Ahad Shamseen ◽  
Morteza Mohammadi Zanjireh ◽  
Mahdi Bahaghighat ◽  
Qin Xin

Data mining is the extraction of information and patterns from vast amounts of data, and it is one of the most important topics today. Massive amounts of data are generated and stored every day, and these data contain useful information in many fields, attracting the attention of programmers and engineers. One of the primary data mining classification algorithms is the decision tree. Decision tree techniques have several advantages but also present drawbacks; one of the main drawbacks is the need to keep the data in main memory. SPRINT is a decision tree classifier that proposed a fix for this problem. In this paper, we develop a new parallel decision tree classifier that builds on SPRINT. Our experimental results show considerable improvements in runtime and memory requirements compared to the SPRINT classifier. The proposed classifier can be implemented in both serial and parallel environments and can deal with big data.
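
The split criterion at the heart of SPRINT-style tree building is the Gini index, evaluated for every candidate split point of every attribute. A minimal sketch with invented toy data (not the paper's implementation):

```python
# Minimal sketch of the Gini-index split criterion that SPRINT-style
# decision tree builders evaluate for each candidate split point.
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(values, labels, threshold):
    """Weighted Gini impurity of splitting a numeric attribute at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy data: splitting 'age' at 30 separates the classes perfectly (Gini 0).
ages = [23, 25, 30, 41, 52]
risk = ["high", "high", "high", "low", "low"]
best = min(ages[:-1], key=lambda t: split_gini(ages, risk, t))
print(best)  # 30
```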


2009 ◽  
Vol 19 (1) ◽  
pp. 123-132 ◽  
Author(s):  
Nikolaos Samaras ◽  
Angelo Sifaleras ◽  
Charalampos Triantafyllidis

The aim of this paper is to present a new simplex-type algorithm for the linear programming problem. The Primal-Dual method is a simplex-type pivoting algorithm that generates two paths in order to converge to the optimal solution: the first path is primal feasible, while the second is dual feasible for the original problem. Specifically, we use a three-phase implementation. The first two phases construct the required primal and dual feasible solutions using the Primal Simplex algorithm; in the third phase, the Primal-Dual algorithm is applied. Moreover, a computational study has been carried out, using randomly generated sparse optimal linear problems, to compare the algorithm's computational efficiency with the Primal Simplex algorithm and with MATLAB's Interior Point Method implementation. The algorithm appears very promising, since it clearly shows its superiority to the Primal Simplex algorithm as well as its robustness over the IPM algorithm.
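
For reference, in standard textbook notation (not necessarily the paper's), the primal-dual pair whose feasibility the two paths maintain can be written as:

```latex
% Generic primal-dual pair in standard form (textbook notation):
\begin{align*}
\text{(P)}\quad & \min_{x \in \mathbb{R}^n} c^{\top}x
  \quad \text{s.t. } Ax = b,\ x \ge 0, \\
\text{(D)}\quad & \max_{y \in \mathbb{R}^m} b^{\top}y
  \quad \text{s.t. } A^{\top}y \le c.
\end{align*}
```

By weak duality, the dual objective of any dual-feasible point never exceeds the primal objective of any primal-feasible point, so the two paths bracket the optimum, and a method of this kind terminates when the two objective values coincide.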


Author(s):  
Onur Doğan ◽  
Hakan Aşan ◽  
Ejder Ayç

In today’s competitive world, organizations need to make the right decisions to prolong their existence. Non-scientific methods and emotional decision making have given way to scientific methods in the decision-making process in this competitive area. Within this scope, many decision support models are still being developed to assist the decision makers and owners of organizations. It is easy for organizations to collect massive amounts of data, but the problem is generally using this data to achieve economic advantage. There is a critical need for specialization and automation to transform the data in big data sets into knowledge. Data mining techniques are capable of providing description, estimation, prediction, classification, clustering, and association. Recently, many data mining techniques have been developed to find hidden patterns and relations in big data sets. It is important to obtain new correlations, patterns, and trends that are understandable and useful to decision makers, and there has been much research and many applications focusing on different data mining techniques and methodologies.

In this study, we aim to obtain understandable and applicable results from a large volume of records belonging to a firm active in the meat processing industry, using data mining techniques. In the application part, data cleaning and data integration, the first steps of the data mining process, are first performed on the data in the database, yielding a data set suitable for data mining. Then, various association rule algorithms were applied to this data set. The analysis revealed that finding unexplored patterns in the data would be beneficial for the decision makers of the firm. Finally, many association rules useful to the decision makers of the local firm are obtained.
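
As an illustration of the kind of association-rule analysis described (with invented product names, transactions, and thresholds, not the firm's actual data), a minimal sketch using the `mlxtend` library:

```python
# Minimal sketch of association-rule mining on market-basket data
# (product names, transactions, and thresholds are invented).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["ground beef", "sausage", "salami"],
    ["ground beef", "sausage"],
    ["salami", "bacon"],
    ["ground beef", "sausage", "bacon"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Frequent itemsets with support >= 50%, then rules with confidence >= 80%.
frequent = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
# e.g. {sausage} -> {ground beef} with support 0.75 and confidence 1.0
```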


Author(s):  
Tianxiang He

The development of artificial intelligence (AI) technology is firmly connected to the availability of big data. However, using data sets involving copyrighted works for AI analysis or data mining without authorization will incur risks of copyright infringement. Considering the fact that incomplete data collection may lead to data bias, and since it is impossible for the user of AI technology to obtain a copyright licence from each and every right owner of the copyrighted works used, a mechanism that can free the data from copyright restrictions under certain conditions is needed. In the case of China, it is crucial to check whether China’s current copyright exception model can take on the role and offer that kind of function. This chapter suggests that a special AI analysis and data mining copyright exception that follows a semi-open style should be added to the current exceptions list under the Copyright Law of China.


2018 ◽  
Vol 7 (2.12) ◽  
pp. 184
Author(s):  
Konda Sreenu ◽  
Dr Boddu Raja Srinivasa Reddy

Computers play a key role everywhere in the world, and data grows along with their usage. In everyday life we use computers for various purposes and store bulk information, and one way or another we want to retrieve data from the storage system. Retrieving bulk data is not a simple thing, nor is it a magic show. Every user wants data in different forms, such as reports or output information, and all of these exercises require a process, one that resembles marching ant colonies: databases and tables related to the data are collected, the relevant data is selected from huge tables and databases, aggregate functions are applied to the data, and information or reports related to the data are output. The paper focuses on how efficiently software can be used, to some extent, to solve business-related problems. The paper may not handle a century's worth of data, but something can still be achieved. For a century's worth of data, it is better to adopt a data mining approach, because such big problems otherwise take a great deal of time to solve.
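
The collect-select-aggregate-report process described above can be illustrated with an in-memory SQLite database (invented schema and numbers):

```python
# Minimal sketch (invented schema): collect a table, select the relevant
# rows, apply an aggregate function, and output a small report.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 120.0), ("north", 80.0), ("south", 200.0)])

# Aggregate per group and print the 'report' the user asked for.
for region, total in con.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(f"{region}: {total}")  # north: 200.0 / south: 200.0
```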


Author(s):  
Cataldo Zuccaro ◽  
Michel Plaisent ◽  
Prosper Bernard

This chapter presents a preliminary framework to tackle tax evasion in the field of residential renovation. This industry plays a major role in economic development and employment growth. Tax evasion and fraud are extremely difficult to combat in the industry since it is characterized by a large number of stakeholders (manufacturers, retailers, tradesmen, and households) generating complex transactional dynamics that often defy attempts to deploy transactional analytics to detect anomalies, fraud, and tax evasion. This chapter proposes a framework that applies transactional analytics and data mining to develop standard measures and predictive models to detect fraud and tax evasion. Combining big data sets, cross-referencing, and predictive modeling (i.e., anomaly detection, artificial neural networks, support vector machines, Bayesian networks, and association rules) can assist government agencies in combating highly stealthy tax evasion and fraud in residential renovation.
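
One of the named techniques, anomaly detection, can be sketched as follows on synthetic invoice amounts (not the chapter's data), here using an isolation forest rather than any specific model from the chapter:

```python
# Minimal sketch (synthetic data) of flagging anomalous transactions
# with an isolation forest; a prediction of -1 marks a suspected anomaly.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=5_000, scale=1_200, size=(200, 1))  # typical invoices
suspect = np.array([[45_000.0], [150.0], [60_000.0]])       # planted outliers
invoices = np.vstack([normal, suspect])

model = IsolationForest(contamination=0.02, random_state=0).fit(invoices)
flags = model.predict(invoices)
print(invoices[flags == -1].ravel())  # transactions to audit
```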


Author(s):  
M. Govindarajan

Big data mining involves knowledge discovery from large data sets. The purpose of this chapter is to provide an analysis of the different machine learning algorithms available for performing big data analytics. The machine learning algorithms fall into three key categories: supervised, unsupervised, and semi-supervised. Supervised learning algorithms are trained with a completely labeled set of data and are therefore used to predict or forecast; example algorithms include logistic regression and the back-propagation neural network. Unsupervised learning algorithms start learning from scratch, without labels, and are therefore used for clustering; example algorithms include the Apriori algorithm and K-Means. Semi-supervised learning combines both supervised and unsupervised learning: the algorithms are trained on a mix of labeled and unlabeled data.
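
A minimal sketch (toy data) contrasting the first two categories: a supervised learner that predicts from labels, and an unsupervised learner that clusters without them.

```python
# Minimal sketch (toy data): supervised prediction vs. unsupervised clustering.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels: only the supervised learner sees these

clf = LogisticRegression().fit(X, y)   # supervised: learns from the labels
print(clf.predict([[2.5], [10.5]]))    # -> [0 1]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # unsupervised
print(km.labels_)                      # two groups found without any labels
```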

