A Big Data Query Optimization Framework for Telecom Customer Churn Analysis

Abstract: Most private and public sector organizations now handle massive amounts of raw data whose useful information and knowledge lie hidden beneath the surface. The format, scale, variety, and velocity of the generated data also make it harder to apply algorithms efficiently, so managing raw data calls for sophisticated methods, strategies, and algorithms. Big data query optimization (BDQO) enables businesses to define, diagnose, forecast, prescribe, and cognize hidden growth opportunities, and guides them toward achieving market value. BDQO uses advanced analytical methods to extract information from a rapidly growing volume of data, which makes decision-making less difficult. Hadoop, Apache Hive, NoSQL, MapReduce, and HPCC are among the technologies used in big data applications to manage large data sets. Because big data platforms scale, consuming data for query processing becomes less costly. Small businesses, however, may still be unable to query large databases: joining tables with millions of tuples could take hours. Parallelism, which addresses the problem by using more processors, is a potential solution, but small businesses operating on a shoestring budget often cannot afford it. Several other techniques exist to tackle the problem. This paper discusses in depth the technologies used in the big data query optimization process.

Significance-Based Feature Extraction for Customer Churn Prediction Data in the Telecom Sector

Journal of Computational and Theoretical Nanoscience ◽ 10.1166/jctn.2019.8303 ◽ 2019 ◽ Vol 16 (8) ◽ pp. 3428-3431

Author(s): Kamya Eria ◽ Booma Poolan Marikannan

Keyword(s): Feature Extraction ◽ Big Data ◽ Real Time ◽ Missing Values ◽ Service Providers ◽ Prediction Errors ◽ Churn Prediction ◽ Logistic Regression Models ◽ Customer Churn ◽ Churn Analysis

The telecom industry is saturated with service providers competing for highly rational customers. The current era of big data and advanced technology calls for real-time churn analysis and decision making, a need also highlighted in previous studies. However, telecom data is highly dimensional, which in the big data era increases computational and processing costs. The complexity and dimensionality of telecom data, together with the need for real-time or near-real-time churn analysis, therefore demand feature selection-based models that efficiently consider the variables most relevant to explaining customer churn behavior. This study proposes a feature extraction-based churn prediction model that concentrates on the most relevant features, those with significant discriminatory power for churn. The data was reduced on the basis of missing values and irrelevant variables; irrelevant variables were first identified using Random Forest and Logistic Regression models. The findings give churn analysts insight into the prediction errors to consider and minimize in future churn analyses. They also help reduce the computational costs that churn analysts incur when working with big data in churn prediction and analysis.
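
The missing-value part of the data reduction described above could look roughly like the following. This is a minimal sketch, not the authors' method: the column names, the sample rows, and the 0.5 threshold are all hypothetical, and the Random Forest and Logistic Regression relevance step is not reproduced here.

```python
from typing import Dict, List, Optional

# One customer record; None marks a missing value (hypothetical schema).
Row = Dict[str, Optional[float]]


def missing_fraction(rows: List[Row], column: str) -> float:
    """Return the share of rows in which `column` is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def keep_dense_columns(rows: List[Row], max_missing: float = 0.5) -> List[str]:
    """Keep columns whose missing fraction does not exceed `max_missing`.

    The threshold is an assumption for illustration; the abstract does not
    state which cutoff the study used.
    """
    columns = sorted({column for row in rows for column in row})
    return [c for c in columns if missing_fraction(rows, c) <= max_missing]


# Hypothetical telecom records.
rows: List[Row] = [
    {"tenure": 12.0, "monthly_charge": 70.5, "fax_usage": None},
    {"tenure": 3.0, "monthly_charge": None, "fax_usage": None},
    {"tenure": 24.0, "monthly_charge": 55.0, "fax_usage": None},
    {"tenure": 8.0, "monthly_charge": 80.0, "fax_usage": 1.0},
]

kept = keep_dense_columns(rows, max_missing=0.5)
print(kept)
```

Here `fax_usage` is missing in three of four rows and is dropped, while `monthly_charge` (missing once) survives. A relevance ranking from a fitted model would then be applied to the surviving columns.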

Research on Big Data Query Optimization Method of Power System Substation Equipment Condition Monitoring

10.1109/icpsasia52756.2021.9621504 ◽ 2021

Author(s): Lixia Wang ◽ Dawei Wang ◽ Wei Li

Keyword(s): Big Data ◽ Power System ◽ Condition Monitoring ◽ Query Optimization ◽ Optimization Method ◽ Data Query ◽ Substation Equipment

Adaptive correlation exploitation in big data query optimization

The VLDB Journal ◽ 10.1007/s00778-018-0515-8 ◽ 2018 ◽ Vol 27 (6) ◽ pp. 873-898 ◽ Cited By ~ 2

Author(s): Yuchen Liu ◽ Hai Liu ◽ Dongqing Xiao ◽ Mohamed Y. Eltabakh

Keyword(s): Big Data ◽ Query Optimization ◽ Data Query

Real-Time Internet of Things (IOT) Application Big Data Stream Graph Optimization Framework

International Journal of Computer Sciences and Engineering ◽ 10.26438/ijcse/v7i8.163167 ◽ 2019 ◽ Vol 7 (8) ◽ pp. 163-167

Author(s): Sharmila G.

Keyword(s): Big Data ◽ Internet Of Things ◽ Real Time ◽ Data Stream ◽ Optimization Framework ◽ Graph Optimization

An Optimal Framework for Spatial Query Optimization Using Hadoop in Big Data Analytics

Recent Patents on Computer Science ◽ 10.2174/2213275912666190419215231 ◽ 2019 ◽ Vol 12

Author(s): Pankaj Dadheech ◽ Dinesh Goyal ◽ Sumit Srivastava ◽ Ankit Kumar

Keyword(s): Big Data ◽ Query Optimization ◽ Spatial Data ◽ Spatial Information ◽ Big Data Analytics ◽ Spatial Query ◽ Data Process ◽ Boolean Queries ◽ Spatial Query Optimization ◽ Hadoop System

Spatial queries are frequently used in Hadoop for large-scale data processing. However, the massive size of spatial information makes it difficult to process spatial queries efficiently, which motivates using the Hadoop system to process Big Data. We have used Boolean Queries & Geometry Boolean Spatial Data for Query Optimization using Hadoop System. In this paper, we present a lightweight and adaptable spatial data index for big data that runs on Hadoop frameworks. Results demonstrate the efficiency and effectiveness of our spatial indexing system for various spatial queries.

Mobile Big Data Query Based on Double R-Tree and Double Indexing

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.756-759.916 ◽ 2013 ◽ Vol 756-759 ◽ pp. 916-921

Author(s): Ye Liang

Keyword(s): Big Data ◽ Dynamic Condition ◽ Index Structure ◽ Limited Area ◽ Universal Method ◽ Mobile Objects ◽ Data Query ◽ Mobile Big Data ◽ Structure Query ◽ Window Query

The amount of data in our industry and the world is exploding. Data is being collected and stored at unprecedented rates. The challenge is not only to store and manage this vast volume of data, also called big data, but also to analyze and query it. To put forward a universal method for responding to mobile big data queries, queries over massive numbers of mobile objects in space are separated and grouped by query type. The indexing method that groups mobile objects with a grid (GG TPR-tree) manages a massive number of mobile objects within a limited area very efficiently, but used on its own it meets only part of the requirements of mobile big data queries. This work offers solutions to simple immediate queries, simple continuous queries, active window queries, continuous window queries, dynamic condition queries, and other query requests by employing the DTDI index structure. The experiments show that, with the support of the DTDI index structure, queries over massive numbers of mobile objects achieve higher precision and better query performance.
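
To illustrate the general idea of grouping mobile objects by grid cell and answering a window query, here is a minimal sketch. It uses a plain uniform grid and is not the GG TPR-tree or the DTDI structure from the paper; the class name `GridIndex`, the cell size, and the object identifiers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


class GridIndex:
    """Uniform grid over 2-D space; each cell holds the ids of objects in it."""

    def __init__(self, cell_size: float) -> None:
        if cell_size <= 0:
            raise ValueError("cell_size must be positive")
        self.cell_size = cell_size
        self.cells: Dict[Cell, Set[str]] = defaultdict(set)
        self.positions: Dict[str, Tuple[float, float]] = {}

    def _cell(self, x: float, y: float) -> Cell:
        # Floor division maps a coordinate to its grid cell.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, obj_id: str, x: float, y: float) -> None:
        """Insert an object, or move it if it is already indexed."""
        old = self.positions.get(obj_id)
        if old is not None:
            self.cells[self._cell(*old)].discard(obj_id)
        self.positions[obj_id] = (x, y)
        self.cells[self._cell(x, y)].add(obj_id)

    def window_query(self, x_min: float, y_min: float,
                     x_max: float, y_max: float) -> List[str]:
        """Return ids of objects inside the rectangle (bounds inclusive)."""
        cx_min, cy_min = self._cell(x_min, y_min)
        cx_max, cy_max = self._cell(x_max, y_max)
        hits: List[str] = []
        # Only cells overlapping the window are scanned.
        for cx in range(cx_min, cx_max + 1):
            for cy in range(cy_min, cy_max + 1):
                for obj_id in self.cells.get((cx, cy), ()):
                    x, y = self.positions[obj_id]
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        hits.append(obj_id)
        return sorted(hits)


index = GridIndex(cell_size=10.0)
index.update("taxi-1", 3.0, 4.0)
index.update("taxi-2", 15.0, 18.0)
index.update("taxi-3", 42.0, 7.0)
index.update("taxi-1", 12.0, 16.0)  # taxi-1 moves to a new cell

result = index.window_query(10.0, 10.0, 20.0, 20.0)
print(result)
```

The sketch covers only an immediate window query. Continuous queries and motion prediction, which a TPR-tree style index handles through velocity information, are outside its scope.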

Customer churn analysis: A case study on the telecommunication industry of Thailand

2017 12th International Conference for Internet Technology and Secured Transactions (ICITST) ◽ 10.23919/icitst.2017.8356410 ◽ 2017 ◽ Cited By ~ 2

Author(s): Paweena Wanchai

Keyword(s): Telecommunication Industry ◽ Customer Churn ◽ Churn Analysis

A Strategy of Efficient and Accurate Cardinality Estimation Based on Query Result

Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University ◽ 10.1051/jnwpu/20183640768 ◽ 2018 ◽ Vol 36 (4) ◽ pp. 768-777

Author(s): Jintao Gao ◽ Zhanhuai Li ◽ Wenjie Liu

Keyword(s): Big Data ◽ Query Optimization ◽ Cardinality Estimation ◽ Estimation Strategy ◽ Query Result ◽ Original Table ◽ Data Statistics ◽ Low Efficiency

Cardinality estimation is an important component of query optimization; its accuracy and efficiency directly determine the effectiveness of query optimization. Traditional cardinality estimation strategies collect statistics from the original table or a sample, then infer cardinality from the collected statistics. This approach has three problems: it is inefficient when handling big data; the statistics suffer from update latency and are obtained by inference, so correctness cannot be guaranteed; and although some strategies obtain the actual cardinality by executing subqueries, they do not keep the result, which makes fetching statistics inefficient. To address these problems, this paper proposes a novel cardinality estimation strategy called cardinality estimation based on query result (CEQR). To keep cardinalities correct, CEQR obtains statistics directly from query results, which is independent of data size. We build a cardinality table that stores the statistics of base tables and intermediate results under specific predicates. The cardinality table provides cardinality services for subsequent queries, and we build a set of rules to maintain it. To improve the efficiency of fetching statistics, we introduce a source-aware strategy, which hashes each cardinality item to an appropriate cache. This paper gives an adaptability and deviation analysis of CEQR, and shows experimentally that CEQR is more efficient than traditional cardinality estimation strategies.
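
The core idea of a cardinality table, recording the actual row count of an executed query and reusing it for later queries, can be sketched as follows. This is an illustration under stated assumptions, not the CEQR implementation: the class `CardinalityTable`, its keying by (source, predicate string), and the invalidate-on-change rule are hypothetical, and the source-aware caching strategy is not modeled.

```python
from typing import Dict, List, Optional, Tuple

# Key: (base table or intermediate result name, normalized predicate text).
Key = Tuple[str, str]


class CardinalityTable:
    """Stores exact cardinalities observed from executed query results."""

    def __init__(self) -> None:
        self._entries: Dict[Key, int] = {}
        self.hits = 0
        self.misses = 0

    def record(self, source: str, predicate: str, result_rows: List[dict]) -> None:
        """Store the observed row count of an executed (sub)query."""
        self._entries[(source, predicate)] = len(result_rows)

    def lookup(self, source: str, predicate: str) -> Optional[int]:
        """Return a stored cardinality, or None if none has been recorded."""
        cardinality = self._entries.get((source, predicate))
        if cardinality is None:
            self.misses += 1
        else:
            self.hits += 1
        return cardinality

    def invalidate(self, source: str) -> None:
        """Drop all entries for a source whose data changed.

        Assumed maintenance rule; the paper's own rules are not given here.
        """
        for key in [k for k in self._entries if k[0] == source]:
            del self._entries[key]


# Hypothetical base table.
customers = [
    {"id": 1, "plan": "prepaid"},
    {"id": 2, "plan": "postpaid"},
    {"id": 3, "plan": "prepaid"},
]

table = CardinalityTable()
predicate = "plan = 'prepaid'"

before = table.lookup("customers", predicate)      # nothing recorded yet
result = [row for row in customers if row["plan"] == "prepaid"]
table.record("customers", predicate, result)       # learn from the query result
after = table.lookup("customers", predicate)       # exact count is now available

table.invalidate("customers")                      # e.g. after an update to the table
stale = table.lookup("customers", predicate)
print(before, after, stale)
```

An optimizer would fall back to conventional estimation whenever `lookup` returns `None`. How predicates are normalized so that equivalent ones share a key is left open in this sketch.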
