Training Data Reduction for Performance Models of Data Analytics Jobs in the Cloud

Author(s):  
Jonathan Will ◽  
Onur Arslan ◽  
Jonathan Bader ◽  
Dominik Scheinert ◽  
Lauritz Thamsen
2019 ◽  
Vol 8 (3) ◽  
pp. 4373-4378

Data from many different domains is being stored at a rapid pace in repositories across the globe. Extracting useful information from such large volumes is difficult because the stored data changes constantly. Data mining is a knowledge discovery process that extracts hidden information, in the form of patterns, from data stored in repositories known as warehouses. One of its most popular tasks is classification, which assigns every instance of a data set to one of a set of predefined class labels. Banking is a real-world domain that collects a large amount of client data every day. In this work, we collected two variants of the bank marketing data set from a Portuguese financial institution, consisting of 41188 and 45211 instances, and classified them using two data reduction techniques. Attribute subset selection was applied to the first data set, and the training data with the selected features was used for classification. Principal Component Analysis was applied to the second data set, and the training data with the extracted features was used for classification. A deep neural network classifier based on backpropagation was developed to classify both data sets. Finally, each deep neural network classifier was compared with four standard classifiers: decision trees, Naïve Bayes, support vector machines, and k-nearest neighbors. The deep neural network classifier outperformed the existing classifiers in terms of accuracy.
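As a rough illustration of the feature-extraction step mentioned above, the following sketch computes the first principal component of a small numeric table and projects the data onto it. It is not the authors' pipeline: the function names (`center`, `covariance`, `first_component`, `project`) are hypothetical, the power-iteration method is a choice made here for brevity, and only a single component is extracted.

```python
import math

def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered)
    d = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Power iteration: approximates the dominant eigenvector of cov.

    Assumption: the all-ones start vector is not orthogonal to the
    dominant eigenvector and cov is not the zero matrix.
    """
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, component):
    """Reduce each row to a single score along the given component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]

# Toy data: the second attribute is exactly twice the first.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = center(rows)
component = first_component(covariance(centered))
scores = project(centered, component)
```

In practice a library implementation would normally replace this, and several components would be kept; the extracted scores would then serve as the classifier's input features.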


2014 ◽  
Vol 41 (2) ◽  
pp. 405-420 ◽  
Author(s):  
Senzhang Wang ◽  
Zhoujun Li ◽  
Chunyang Liu ◽  
Xiaoming Zhang ◽  
Haijun Zhang

2020 ◽  
Vol 10 (6) ◽  
pp. 2134 ◽  
Author(s):  
Yemao Man ◽  
Tobias Sturm ◽  
Monica Lundh ◽  
Scott N. MacKinnon

The shipping industry constantly strives to achieve efficient use of energy during sea voyages. Previous research that combines ethnographic studies with big data analytics to understand the factors contributing to fuel consumption, and to seek solutions that support decision making, is scarce. This paper first employed ethnographic research on the use of a commercially available fuel-monitoring system. This contextualized the real challenges on ships and established the need for a big data approach to achieving energy efficiency (EE). The study then constructed two machine-learning models based on the recorded voyage data of five different ferries over a one-year period. The evaluation showed that the models generalize well on different training data sets, and the model outputs indicated a potential for better performance than the existing commercial EE system. How this predictive-analytical approach could impact the design of navigational decision support systems and management practices was also discussed. It is hoped that this interdisciplinary research can contribute to a richer methodological framework for future maritime energy research.


Author(s):  
Rainer Mühlhoff

Data analytics and data-driven approaches in Machine Learning are now among the most hailed computing technologies in many industrial domains. One major application is predictive analytics, which is used to predict sensitive attributes, future behavior, or cost, risk and utility functions associated with target groups or individuals based on large sets of behavioral and usage data. This paper stresses the severe ethical and data protection implications of predictive analytics if it is used to predict sensitive information about single individuals or to treat individuals differently based on the data many unrelated individuals provided. To tackle these concerns from an applied ethics perspective, the paper first introduces the concept of “predictive privacy” to formulate an ethical principle protecting individuals and groups against differential treatment based on Machine Learning and Big Data analytics. Secondly, it analyses the typical data processing cycle of predictive systems to provide a step-by-step discussion of ethical implications, locating occurrences of predictive privacy violations. Thirdly, the paper sheds light on what is qualitatively new in the way predictive analytics challenges ethical principles such as human dignity and the (liberal) notion of individual privacy. These new challenges arise when predictive systems transform statistical inferences, which provide knowledge about the cohort of training data donors, into individual predictions, thereby crossing what I call the “prediction gap”. Finally, the paper concludes that data protection in the age of predictive analytics is a collective matter, as we face situations where an individual’s (or group’s) privacy is violated using data other individuals provide about themselves, possibly even anonymously.


2014 ◽  
Vol 11 (2) ◽  
pp. 665-678 ◽  
Author(s):  
Stefanos Ougiaroglou ◽  
Georgios Evangelidis

Data reduction techniques improve the efficiency of k-Nearest Neighbour classification on large datasets since they accelerate the classification process and reduce storage requirements for the training data. IB2 is an effective prototype selection data reduction technique. It selects some items from the initial training dataset and uses them as representatives (prototypes). Contrary to many other techniques, IB2 is a very fast, one-pass method that builds its reduced (condensing) set in an incremental manner. New training data can update the condensing set without needing the “old” removed items. This paper proposes a variation of IB2 that generates new prototypes instead of selecting them. The variation is called AIB2 and attempts to improve the efficiency of IB2 by positioning the prototypes in the center of the data areas they represent. The empirical study conducted in the present work, as well as the Wilcoxon signed ranks test, shows that AIB2 performs better than IB2.
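The difference between selecting and generating prototypes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, Euclidean distance is assumed, and the AIB2 update shown here (moving the nearest same-class prototype to the running mean of the items it has absorbed) is an assumption about how prototypes are kept at the center of the area they represent.

```python
import math

def nearest(prototypes, x):
    """Index of the prototype whose vector is closest to x (Euclidean)."""
    return min(range(len(prototypes)),
               key=lambda i: math.dist(prototypes[i][0], x))

def ib2(training):
    """One-pass IB2 sketch: keep an item only if the current condensing
    set would misclassify it with a 1-nearest-neighbour rule."""
    condensing = []  # entries are (vector, label)
    for x, label in training:
        if not condensing or condensing[nearest(condensing, x)][1] != label:
            condensing.append((list(x), label))
    return condensing

def aib2(training):
    """AIB2-style sketch (assumed update rule): a correctly classified
    item pulls its nearest prototype toward itself via a running mean."""
    condensing = []  # entries are [vector, label, weight]
    for x, label in training:
        if condensing:
            i = nearest(condensing, x)
            vec, proto_label, weight = condensing[i]
            if proto_label == label:
                # Weighted average of the prototype and the new item.
                condensing[i][0] = [(v * weight + xi) / (weight + 1)
                                    for v, xi in zip(vec, x)]
                condensing[i][2] = weight + 1
                continue
        # Empty set or misclassified item: it becomes a new prototype.
        condensing.append([list(x), label, 1])
    return condensing

# Toy stream of (vector, label) pairs, processed in arrival order.
stream = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"), ((10.0, 0.0), "b"),
          ((11.0, 0.0), "b"), ((0.5, 0.0), "a")]
selected = ib2(stream)    # prototypes are original training items
generated = aib2(stream)  # prototypes are averaged positions
```

On this toy stream both variants keep two prototypes, but IB2 keeps the first item seen for each class while the AIB2-style variant ends with prototypes at the mean of the items each one absorbed. Both remain single-pass, and neither needs previously discarded items when new data arrives.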

