Spark Framework: Latest Research Papers

Adaptive Optimization-Enabled Neural Networks to Handle the Imbalance Churn Data in Churn Prediction

Churn prediction from telecom data has received considerable attention as the number of telecom providers grows, but inconsistent, sparse, and very large data make the task complicated and challenging. This research therefore proposes a churn prediction mechanism based on an adaptive firefly-spider optimization (adaptive FSO) algorithm. The proposed method works on telecom data, a trending research domain for churn prediction, and is reported to increase classification accuracy. The adaptive FSO algorithm integrates spider monkey optimization (SMO), the firefly optimization algorithm (FA), and an adaptive concept. The input data is first given to the master node of the Spark framework. Feature selection is carried out using Kendall’s correlation to select the appropriate features for further processing. The selected unique features are then given to the master node to perform churn prediction. The prediction is made by a deep convolutional neural network (DCNN) trained with the proposed adaptive FSO algorithm. The model was evaluated with the Dice coefficient, accuracy, and the Jaccard coefficient while varying the training data percentage and the selected features. The proposed adaptive FSO-based DCNN achieved a Dice coefficient of 99.76%, an accuracy of 98.65%, and a Jaccard coefficient of 99.52%.
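The Dice and Jaccard coefficients reported above are closely related. The following is a minimal sketch, not the authors' evaluation code, showing how both can be computed from confusion counts and how one converts to the other. The function names and the example counts are hypothetical, and the sketch assumes both metrics are computed on the same set of predictions.

```python
def jaccard(tp: int, fp: int, fn: int) -> float:
    """Jaccard coefficient: true positives over the union of predicted and actual positives."""
    return tp / (tp + fp + fn)


def dice(tp: int, fp: int, fn: int) -> float:
    """Dice coefficient: twice the true positives over the sum of predicted and actual positives."""
    return 2 * tp / (2 * tp + fp + fn)


def dice_from_jaccard(j: float) -> float:
    """Convert a Jaccard coefficient to the Dice coefficient of the same predictions."""
    return 2 * j / (1 + j)


if __name__ == "__main__":
    # Hypothetical confusion counts, chosen only for illustration.
    tp, fp, fn = 8, 1, 1
    print(jaccard(tp, fp, fn), dice(tp, fp, fn))
    # Converting the reported Jaccard coefficient of 99.52% to a Dice value.
    print(round(100 * dice_from_jaccard(0.9952), 2))
```

Under that assumption, a Jaccard coefficient of 99.52% corresponds to a Dice coefficient of about 99.76%, which agrees with the two figures in the abstract.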

Basketball Data Analysis Based on Spark Framework and K-means Algorithm

DOI: 10.1007/978-3-030-89511-2_116, 2021, pp. 853-857

Author(s): Ning Zhu, Qiongjie Dai

Keyword(s): Data Analysis, Spark Framework

Towards Analyzing Computational Costs of Spark for SARS-CoV-2 Sequences Comparisons on a Commercial Cloud

DOI: 10.5753/wscad.2021.18523, 2021

Author(s): Alan L. Nunes, Alba Cristina Magalhaes Alves de Melo, Cristina Boeres, Daniel de Oliveira, Lúcia Maria de Assumpção Drummond

Keyword(s): Fault Tolerance, South America, Virtual Machines, Spot Markets, Computational Costs, On Demand, Financial Costs, Execution Times, And Storage, Spark Framework

In this paper, we developed a Spark application, named Diff Sequences Spark, which compares 540 SARS-CoV-2 sequences from South America on the Amazon EC2 cloud and outputs the positions where differences occur. We analyzed the performance of the application on selected memory-optimized and storage-optimized virtual machines (VMs) in the on-demand and spot markets. The memory-optimized VMs outperformed the storage-optimized ones in both execution time and financial cost. Regarding the markets, Diff Sequences Spark reduced average execution times and monetary costs when using spot VMs compared with their respective on-demand VMs, even in scenarios with several spot revocations, benefiting from the low-overhead fault tolerance of the Spark framework.
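The core operation described above, reporting the positions at which two sequences differ, can be sketched locally in plain Python. This is an illustration only and not the Diff Sequences Spark implementation: the function names are hypothetical, the sequences are assumed to be already aligned to the same length, and the pairwise loop that a Spark job would distribute across workers runs sequentially here.

```python
from __future__ import annotations

from itertools import combinations


def diff_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def compare_all(sequences: dict[str, str]) -> dict[tuple[str, str], list[int]]:
    """Compare every unordered pair of named sequences."""
    return {
        (name_a, name_b): diff_positions(sequences[name_a], sequences[name_b])
        for name_a, name_b in combinations(sorted(sequences), 2)
    }


if __name__ == "__main__":
    # Toy sequences, invented for illustration.
    toy = {"s1": "ACGT", "s2": "ACCT", "s3": "TCGT"}
    for pair, positions in compare_all(toy).items():
        print(pair, positions)
```

Because each pair is compared independently, the work splits naturally into tasks, which is the property a distributed framework can exploit.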

Performance Optimization of a Parallel Error Correction Tool

Engineering Proceedings, DOI: 10.3390/engproc2021007034, 2021, Vol 7 (1), pp. 34

Author(s): Marco Martínez-Sánchez, Roberto R. Expósito, Juan Touriño

Keyword(s): Big Data, Next Generation Sequencing, Error Correction, Performance Optimization, Core Cluster, Improved Performance, Next Generation Sequencing (NGS), Spark Framework, Generation Sequencing

Continuous development in Next Generation Sequencing (NGS) technologies has allowed researchers to take advantage of larger genetic samples in less time, which makes it important to improve the existing algorithms that enhance the quality of the generated reads. In this work, we present a Big Data tool built on the open-source Apache Spark framework that executes validated error-correction algorithms with improved performance. The experimental evaluation, conducted on a multi-core cluster, shows significant improvements in execution times, with a maximum speedup of 9.5 over existing error correction tools when processing an NGS dataset with 25 million reads.

Big Data-aware News Recommendation System According to Regional Twitter Users’ Interests

DOI: 10.21203/rs.3.rs-392181/v1, 2021

Author(s): Maryam Bagheri, Shahram Jamali, Reza Fotohi

Keyword(s): Big Data, Recommendation System, Research Area, Small Data, City Region, Home Pages, News Websites, Twitter Users, News Recommendation, Spark Framework

With the development of technology and Internet access available everywhere and to everyone, interest in getting news from newspapers and other traditional media is decreasing. The popularity of news websites is therefore rising as newspapers move to electronic versions. News websites can be accessed from anywhere: any country, city, or region. Presenting news according to where the reader is from is thus a possible research area, because readers facing a wide variety of topics prefer websites whose home pages more often show the news they are interested in. Based on this idea, we present a technique to find the favorite topics of Twitter users in particular geographical districts, giving news websites a way to increase their popularity. In this work we processed tweets. Tweets may seem to be small data, but we found that processing them takes a long time because the algorithm is repeated many times and many searches must be performed. We therefore categorized our work as big data and, to address this problem, developed it in the Spark framework. Our technique has two phases: a Feature Extraction Phase and a Topic Discovery Phase. Our analysis shows that the technique achieves an accuracy between 68% and 76% across three setups: 3-fold, 5-fold, and 10-fold.

Multi-dimensional data analysis technology of business application system based on Spark framework

Journal of Physics Conference Series, DOI: 10.1088/1742-6596/2010/1/012067, 2021, Vol 2010 (1), pp. 012067

Author(s): Changchao Dong, Yanbin Jiao, Youyong Chen, Lanxian Feng

Keyword(s): Data Analysis, Application System, Business Application, Spark Framework

Basketball Data Analysis Using Spark Framework and K-Means Algorithm

Journal of Healthcare Engineering, DOI: 10.1155/2021/6393560, 2021, Vol 2021, pp. 1-7

Author(s): Xijun Hong

Keyword(s): Big Data, Clustering Algorithm, Search Algorithm, Rapid Development, Cuckoo Search, Cuckoo Search Algorithm, Suggested Approach, Training Impact, Experimental Findings, Spark Framework

With the rapid development of wearable and sensing technology, different kinds of sports information can now be recorded as useful big data. Applying big data technology has become a pressing challenge in present-day basketball training, where it can improve the effect of basketball analysis. In this study, we propose using the Spark framework, which is based on in-memory computing, for big data processing. First, we use a new swarm intelligence optimization method, the cuckoo search algorithm, because it has few parameters, powerful global search ability, and fast convergence. Second, we apply the traditional K-means clustering algorithm in the Spark distributed environment to improve the final output. Last, we examine the aspects that could lead to high-pressure game circumstances to study professional athletes’ defensive performance. Both recruiters and trainers may use our technique to better understand essential player qualities and, eventually, to assess and improve a team’s performance. The experimental findings show that the suggested approach outperforms previous methods in clustering performance and practical utility. It has the greatest influence on the training effect of shooting while moving, yielding complementary outcomes in the training effect.
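The K-means step mentioned in the abstract can be illustrated with a minimal single-machine sketch of the standard algorithm (Lloyd's iteration). It is not the author's implementation: it omits the cuckoo search stage and the Spark distribution entirely, takes the initial centroids as an explicit argument, and uses hypothetical names (`closest`, `kmeans`) and invented toy data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def closest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid nearest to `point` by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans(
    points: Sequence[Point], centroids: Sequence[Point], max_iter: int = 100
) -> Tuple[List[Point], List[int]]:
    """Alternate assignment and centroid update until the centroids stop moving."""
    current = list(centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [closest(p, current) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for k, old in enumerate(current):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its previous centroid
        if updated == current:
            break
        current = updated
    return current, labels


if __name__ == "__main__":
    # Toy two-dimensional data, invented for illustration.
    data = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

K-means is sensitive to the initial centroids, which is one reason a global search method could be paired with it; how the paper combines the two is not specified in the abstract.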

Rider Chaotic Biography Optimization-driven Deep Stacked Auto-encoder for Big Data Classification Using Spark Architecture

International Journal of Web Services Research, DOI: 10.4018/ijwsr.2021070103, 2021, Vol 18 (3), pp. 42-62

Author(s): Anilkumar V Brahmane, Chaitanya B Krishna

Keyword(s): Feature Selection, Big Data, Optimization Algorithm, Imbalanced Data, Data Classification, Software Tools, Novel Technique, Big Data Classification, Spark Framework, Day By Day

The novelty in big data is rising day by day, to the point that existing software tools have difficulty managing it. Furthermore, the rate of imbalanced data in huge datasets is a key constraint for the research industry. This paper therefore proposes a novel technique for handling big data using the Spark framework. The technique classifies big data in two steps, feature selection and classification, which are performed in the initial nodes of the Spark architecture. The proposed optimization algorithm, named the rider chaotic biography optimization (RCBO) algorithm, integrates the rider optimization algorithm (ROA) and the standard chaotic biogeography-based optimisation (CBBO). The proposed RCBO-based deep stacked auto-encoder on the Spark framework handles big data effectively for classification. Here, the proposed RCBO is employed for selecting suitable features from the massive dataset.

Performance Evaluation of Map Reduce vs. Spark framework on Amazon Machine Image for TeraSort Algorithm

International Journal for Research in Applied Science and Engineering Technology, DOI: 10.22214/ijraset.2021.35540, 2021, Vol 9 (VI), pp. 2728-2732

Author(s): Gangadhara Rao Kommu

Keyword(s): Performance Evaluation, Map Reduce, Time Data, Amazon EC2, Spark Framework, Machine Image, Compute Time

TeraSort is one of Hadoop’s widely used benchmarks. Hadoop’s distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort conducts the sorting. We focus on comparing the TeraSort algorithm on different distributed platforms with different resource configurations. As measures of efficiency we considered Compute Time, Data Read, Data Write, and Speedup. We conducted experiments using Hadoop MapReduce and Spark (Java). We empirically evaluate the performance of the TeraSort algorithm on Amazon EC2 Machine Images and demonstrate that it achieves a speedup between 2.4× and 3.95×, compared with TeraSort, for typical settings of interest.
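The idea behind TeraSort-style distributed sorting is range partitioning: sample the keys, pick split points, route each record to the partition that owns its key range, sort partitions independently, and concatenate. The sketch below runs that idea on a single machine as an illustration. It is not the benchmark's code or the paper's setup; the function name, the sample size, and the seed are assumptions made for this example.

```python
import bisect
import random
from typing import List


def range_partition_sort(
    records: List[str], num_partitions: int, sample_size: int = 100, seed: int = 0
) -> List[str]:
    """Sort records by range-partitioning them, then sorting each partition."""
    if not records:
        return []
    rng = random.Random(seed)
    # Sample keys to estimate how they are distributed.
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    # Choose num_partitions - 1 split points spaced evenly through the sample.
    splits = [sample[(i * len(sample)) // num_partitions] for i in range(1, num_partitions)]
    # Route every record to the partition whose key range contains it.
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[bisect.bisect_right(splits, record)].append(record)
    # Each partition could be sorted on a different worker; here they run in turn.
    output: List[str] = []
    for part in partitions:
        output.extend(sorted(part))
    return output


if __name__ == "__main__":
    # Synthetic fixed-width keys, generated only for illustration.
    keys = [f"{n:04d}" for n in random.Random(1).sample(range(10000), 500)]
    print(range_partition_sort(keys, num_partitions=4) == sorted(keys))
```

Because every key in one partition sorts before every key in the next, no merge step is needed after the per-partition sorts, which is what lets the sorting work scale out across workers.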

Person’s multiple intelligence classification based on tweet post using SentiStrength and processed on the Apache Spark framework

Journal of Physics Conference Series, DOI: 10.1088/1742-6596/1882/1/012125, 2021, Vol 1882 (1), pp. 012125

Author(s): B Siregar, M N Misyuari, E B Nababan, Fahmi

Keyword(s): Multiple Intelligence, Apache Spark, Spark Framework

Spark Framework: Recently Published Documents

Adaptive Optimization-Enabled Neural Networks to Handle the Imbalance Churn Data in Churn Prediction

Basketball Data Analysis Based on Spark Framework and K-means Algorithm

Towards Analyzing Computational Costs of Spark for SARS-CoV-2 Sequences Comparisons on a Commercial Cloud

Performance Optimization of a Parallel Error Correction Tool

Big Data-aware News Recommendation System According to Regional Twitter Users’ Interests

Multi-dimensional data analysis technology of business application system based on Spark framework

Basketball Data Analysis Using Spark Framework and K-Means Algorithm

Rider Chaotic Biography Optimization-driven Deep Stacked Auto-encoder for Big Data Classification Using Spark Architecture

Performance Evaluation of Map Reduce vs. Spark framework on Amazon Machine Image for TeraSort Algorithm

Person’s multiple intelligence classification based on tweet post using SentiStrength and processed on the Apache Spark framework
