Tobit Regressive Based Gaussian Independence Bayes Map Reduce Classifier on Data Warehouse for Predictive Analytics

A data warehouse comprises data collected from different, possibly heterogeneous, sources at different time intervals, with the objective of responding to users' analytic queries. Big data is a field that helps in analysing and extracting information from large datasets. However, incorporating big data imposes multiple challenges that compromise feasible business research practice: the heterogeneous sources, high dimensionality, and massive volumes that characterize the big data paradigm may prevent effective data and system integration. In this work, we develop a Tobit Regressive based Gaussian Independence Bayes Map Reduce Classifier (TR-GIBMRC) method for categorizing the collected and stored data, which helps users make decisions with minimum time consumption. The TR-GIBMRC method consists of two processes: Tobit Regressive Feature Selection and Gaussian Independence Bayes Map Reduce Classification. Tobit Regressive Feature Selection selects relevant features from the collected and stored data; the Tobit statistical model, which describes the relationship between a non-negative dependent variable and an independent variable, is used to select the relevant features. Next, the Gaussian Independence Bayes Map Reduce Classifier classifies the selected features to support decision making with less time consumption. This probabilistic classifier segments the data by class, measuring the mean and variance of the data in each class, and a data point is allocated to the class with minimal variance. This in turn helps to perform efficient data classification for accurate decision making. Experimental evaluation is carried out on factors such as feature selection rate, classification accuracy, classification time, and error rate with respect to the number of features and the number of data points.
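As a minimal sketch of the Gaussian independence (naive) Bayes step the abstract describes, the snippet below estimates the per-class mean and variance of each feature and then assigns a point using the standard Gaussian naive Bayes rule (highest log-posterior); the abstract's "minimal variance" allocation rule is not spelled out, so the conventional assignment is shown instead, and all function names are illustrative rather than from the paper.

```python
import numpy as np

def fit_gaussian_bayes(X, y):
    """Estimate per-class prior, feature means, and feature variances."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # A small floor on the variance avoids division by zero.
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(X))
    return params

def classify(x, params):
    """Assign x to the class with the highest Gaussian log-posterior."""
    def score(mu, var, prior):
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return log_lik + np.log(prior)
    return max(params, key=lambda c: score(*params[c]))
```

In a MapReduce setting, the per-class sums and squared sums needed for these means and variances can be accumulated in the map phase and combined in the reduce phase, which is what makes this classifier a natural fit for the framework.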

Author(s):  
Jorge Bernardino ◽  
Joaquim Lapa ◽  
Ana Almeida

A big data warehouse enables the analysis of large amounts of information that typically comes from the organization's transactional (OLTP) systems. However, today's data warehouse systems do not have the capacity to handle the massive amount of data currently produced. Business intelligence (BI) is a collection of decision support technologies that enable executives, managers, and analysts to make better and faster decisions. Organizations must make good use of BI platforms to quickly acquire desirable information from the huge volume of data, reducing the time and increasing the efficiency of decision-making processes. In this chapter, the authors present a comparative analysis of commercial and open source BI tool capabilities, in order to aid organizations in selecting the most suitable BI platform. They also evaluated and compared six major open source BI platforms: Actuate, Jaspersoft, Jedox/Palo, Pentaho, SpagoBI, and Vanilla; and six major commercial BI platforms: IBM Cognos, Microsoft BI, MicroStrategy, Oracle BI, SAP BI, and SAS BI & Analytics.


Author(s):  
Iftikhar U. Sikder ◽  
Aryya Gangopadhyay

This chapter introduces the research issues of spatial decision-making in the context of a distributed geo-spatial data warehouse. Spatial decision-making in a distributed environment involves access to data and models from heterogeneous sources and composing disparate services into a meaningful integration. The chapter reviews system integration and interoperability issues of spatial data and models in a distributed computing environment. We present a prototype system to illustrate collaborative access to data and models for supporting spatial decision-making.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Surendran Rajendran ◽  
Osamah Ibrahim Khalaf ◽  
Youseef Alotaibi ◽  
Saleh Alghamdi

In recent times, big data classification has become a hot research topic in various domains, such as healthcare, e-commerce, finance, etc. The inclusion of the feature selection process helps to improve the big data classification process and can be done by the use of metaheuristic optimization algorithms. This study focuses on the design of a big data classification model using chaotic pigeon inspired optimization (CPIO)-based feature selection with an optimal deep belief network (DBN) model. The proposed model is executed in the Hadoop MapReduce environment to manage big data. Initially, the CPIO algorithm is applied to select a useful subset of features. In addition, the Harris hawks optimization (HHO)-based DBN model is derived as a classifier to allocate appropriate class labels. The design of the HHO algorithm to tune the hyperparameters of the DBN model assists in boosting the classification performance. To examine the superiority of the presented technique, a series of simulations were performed, and the results were inspected under various dimensions. The resultant values highlighted the supremacy of the presented technique over the recent techniques.
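The abstract does not give the CPIO update equations, so the following hedged sketch shows only the wrapper-style feature selection pattern it describes: candidate binary feature masks are scored by a classifier's cross-validated accuracy, with a plain random-flip search standing in for the pigeon-inspired update; `fitness` and `select_features` are illustrative names, not from the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def fitness(mask, X, y):
    """Cross-validated accuracy of a simple classifier on the masked features."""
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()

def select_features(X, y, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    best = rng.random(X.shape[1]) < 0.5   # random initial binary mask
    best_fit = fitness(best, X, y)
    for _ in range(iters):
        cand = best.copy()
        j = rng.integers(X.shape[1])
        cand[j] = not cand[j]             # flip one feature in or out
        f = fitness(cand, X, y)
        if f >= best_fit:                 # keep the better mask
            best, best_fit = cand, f
    return best
```

A metaheuristic such as CPIO replaces the single random flip with a population of masks updated by the algorithm's own operators, but the fitness-driven loop structure is the same.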


Author(s):  
R. Sathya, et al.

In recent times, big data is generated in an exponential way from diverse textual data sources like review sites, media, blogs, etc. Sentiment analysis (SA) is useful for classifying the opinions in big data into different kinds of sentiments. Therefore, SA on big data helps a business derive beneficial commercial understanding from text-based content. Though several SA approaches have been presented, there is still a need to improve the performance of SA to interpret customer feedback and increase product quality. This paper introduces a novel social spider optimization based feature selection with wavelet kernel extreme learning machine (SSO-WKELM) model. The proposed model initially undergoes pre-processing to remove unwanted words. Then, Term Frequency-Inverse Document Frequency (TF-IDF) is utilized as a feature extraction technique to extract the set of feature vectors. Besides, a social spider optimization (SSO) algorithm is utilized for the feature selection process, thereby achieving improved classification performance. Subsequently, WKELM is employed as a classifier to classify the incidence of positive or negative user reviews. For experimental validation, a product review dataset derived from Amazon, along with synthetic data, is used. The experimental results demonstrated the superior classification performance of the SSO-WKELM model.
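The abstract does not specify the wavelet kernel's form, so the sketch below illustrates the kernel extreme learning machine (KELM) classification step with an RBF kernel as a stand-in: training solves the closed-form system beta = (K + I/C)^(-1) Y over the kernel matrix K, and a new review is scored as k(x)^T beta. The class name and parameters here are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

class KernelELM:
    """Kernel ELM sketch; an RBF kernel stands in for the wavelet kernel."""
    def __init__(self, C=1.0, gamma=0.1):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        """X: feature matrix (e.g., TF-IDF vectors); y: integer labels 0..k-1."""
        self.X_train = X
        K = rbf_kernel(X, X, gamma=self.gamma)
        # One-hot-style targets: +1 for the true class, -1 elsewhere.
        Y = np.where(np.arange(y.max() + 1) == y[:, None], 1.0, -1.0)
        # Closed-form KELM solution: beta = (K + I/C)^(-1) Y.
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, Y)
        return self

    def predict(self, X_new):
        K_new = rbf_kernel(X_new, self.X_train, gamma=self.gamma)
        return K_new.dot(self.beta).argmax(axis=1)
```

Because training reduces to one linear solve rather than iterative backpropagation, KELM-style classifiers are comparatively fast to fit, which is part of their appeal for large review corpora.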


Author(s):  
Dariusz Jacek Jakobczak ◽  
Ahan Chatterjee

The burst of data that came with affordable access to the internet led to the rise of the cloud computing market, which stores this data, and deriving results from these data drove the growth of the "big data" industry, which analyses this enormous volume of data and draws conclusions using various algorithms. Hadoop, as a big data platform, uses the map-reduce framework to produce analysis reports over big data. The term "big data" can be defined as a modern technique to store, capture, and manage datasets on the scale of petabytes or larger, with high velocity and varied structures. Addressing this massive growth of data requires huge computing capacity to ensure fruitful results from data processing, and cloud computing is the technology that can perform large-scale and highly complex computation. Cloud analytics enables organizations to perform better business intelligence, data warehouse operations, and online analytical processing (OLAP).
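As a minimal in-process illustration of the map-reduce pattern the chapter refers to, the canonical word-count example below shows the data flow: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase aggregates each group. Hadoop distributes these phases across a cluster; this single-machine sketch only demonstrates the programming model.

```python
from collections import defaultdict

def map_phase(record):
    """Emit a (word, 1) pair for every word in the input record."""
    for word in record.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all values for one key into a single result."""
    return key, sum(values)

records = ["big data needs big storage", "cloud computing stores big data"]
pairs = (kv for rec in records for kv in map_phase(rec))
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # e.g., {'big': 3, 'data': 2, ...}
```

The appeal of the model is that map and reduce are side-effect-free per key, so the framework can parallelize them freely across cheap commodity or cloud nodes.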

