Big Data in Cloud Computing

Author(s):  
Jayashree K. ◽  
Swaminathan B.

The huge volume of data produced by applications ranging from social networks to scientific computing is termed big data. Cloud computing, as a delivery model for IT services, enhances business productivity by reducing costs, and it aims to provide a solution for managing big data such as high-dimensional data sets. Thus, this chapter discusses the background of big data and cloud computing. It also discusses the various applications of big data in detail. Related work, the research challenges of big data in cloud computing, and future directions are addressed in this chapter.

2019 ◽  
Author(s):  
Justin L. Balsor ◽  
David G. Jones ◽  
Kathryn M. Murphy

Abstract New techniques for quantifying large numbers of proteins or genes are transforming the study of plasticity mechanisms in visual cortex (V1) into the era of big data. With those changes comes the challenge of applying new analytical methods designed for high-dimensional data. Studies of V1, however, can take advantage of the known functions that many proteins have in regulating experience-dependent plasticity to facilitate linking big data analyses with neurobiological functions. Here we discuss two workflows and provide example R code for analyzing high-dimensional changes in a group of proteins (or genes) using two data sets. The first data set includes 7 neural proteins, 9 visual conditions, and 3 regions in V1 from an animal model for amblyopia. The second data set includes 23 neural proteins and 31 ages (20d-80yrs) from human post-mortem samples of V1. Each data set presents different challenges, and we describe using PCA, tSNE, and various clustering algorithms, including sparse high-dimensional clustering. We also describe a new approach for identifying high-dimensional features and using them to construct a plasticity phenotype that identifies neurobiological differences among clusters. We include an R package, “v1hdexplorer”, that aggregates the various coding packages and custom visualization scripts written in RStudio.
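The authors distribute their workflows as R code (the “v1hdexplorer” package), which is not reproduced here. Purely as an illustration of the same analysis pattern, a minimal Python sketch of dimensionality reduction plus clustering on a samples-by-proteins matrix might look like the following; the sample count, the choice of k = 3, and the random data are assumptions, with only the 7-protein dimensionality taken from the first data set.

```python
# Minimal sketch (not the authors' v1hdexplorer R package): PCA + t-SNE +
# k-means on a samples-by-proteins expression matrix, using scikit-learn.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 7))              # hypothetical: 60 samples x 7 proteins

X_std = StandardScaler().fit_transform(X) # z-score each protein

pca = PCA(n_components=3)
scores = pca.fit_transform(X_std)         # low-dimensional PCA scores
print("explained variance ratio:", pca.explained_variance_ratio_)

# t-SNE embedding for visualization (perplexity must be < n_samples)
embedding = TSNE(n_components=2, perplexity=15,
                 random_state=0).fit_transform(X_std)

# cluster samples in PCA space; k = 3 is an arbitrary illustrative choice
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
print(labels)
```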


2018 ◽  
Vol 8 (2) ◽  
pp. 377-406
Author(s):  
Almog Lahav ◽  
Ronen Talmon ◽  
Yuval Kluger

Abstract A fundamental question in data analysis, machine learning, and signal processing is how to compare data points. The choice of distance metric is especially challenging for high-dimensional data sets, where the problem of meaningfulness is more prominent (e.g. the Euclidean distance between images). In this paper, we propose to exploit a property of high-dimensional data that is usually ignored: the structure stemming from the relationships between the coordinates. Specifically, we show that organizing similar coordinates in clusters can be exploited for the construction of the Mahalanobis distance between samples. When the observable samples are generated by a nonlinear transformation of hidden variables, the Mahalanobis distance allows the recovery of the Euclidean distances in the hidden space. We illustrate the advantage of our approach on a synthetic example where the discovery of clusters of correlated coordinates improves the estimation of the principal directions of the samples. Our method was applied to real gene-expression data for lung adenocarcinomas (lung cancer). Using the proposed metric, we found a partition of subjects into risk groups with a good separation between their Kaplan–Meier survival plots.
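As a rough sketch of the two generic ingredients involved (not the paper's exact estimator), the following Python code first groups similar coordinates by hierarchical clustering on their correlations and then computes a Mahalanobis distance between samples from an estimated inverse covariance. The data, the cluster count, and the clustering method are all assumptions for illustration.

```python
# Generic ingredients of the idea above (not the paper's estimator):
# (1) cluster coordinates by correlation, (2) compute Mahalanobis
# distances between samples from an estimated covariance.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))        # hypothetical: 200 samples x 10 coords

# (1) group similar coordinates: hierarchical clustering on 1 - |correlation|
corr = np.corrcoef(X, rowvar=False)
coord_dist = 1.0 - np.abs(corr)
Z = linkage(coord_dist[np.triu_indices(10, k=1)], method="average")
coord_clusters = fcluster(Z, t=3, criterion="maxclust")
print("coordinate clusters:", coord_clusters)

# (2) Mahalanobis distance between two samples via the inverse covariance
VI = np.linalg.pinv(np.cov(X, rowvar=False))
d = mahalanobis(X[0], X[1], VI)
print("Mahalanobis distance between samples 0 and 1:", d)
```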


2021 ◽  
Vol 26 (1) ◽  
pp. 67-77
Author(s):  
Siva Sankari Subbiah ◽  
Jayakumar Chinnappan

Nowadays, organizations collect huge volumes of data without knowing their usefulness. The fast development of the Internet helps organizations capture data in many different formats through the Internet of Things (IoT), social media, and other disparate sources. The dimensionality of these datasets increases day by day at an extraordinary rate, resulting in large-scale datasets with high dimensionality. The present paper reviews the opportunities and challenges of feature selection for processing high-dimensional data with reduced complexity and improved accuracy. In the modern big data world, feature selection plays a significant role in reducing the dimensionality and overfitting of the learning process. Many feature selection methods have been proposed for obtaining more relevant features, especially from big datasets, that help provide accurate learning results without degradation in performance. This paper discusses the importance of feature selection, basic feature selection approaches, centralized and distributed big data processing using Hadoop and Spark, and the challenges of feature selection, and it summarizes the related research work done by various researchers. As a result, big data analysis with feature selection improves the accuracy of learning.
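To make the filter-style feature selection discussed above concrete, here is a small single-machine Python sketch using scikit-learn's univariate selection; on a distributed Spark cluster the analogous step would be expressed with pyspark.ml.feature selectors instead. The dataset, the mutual-information scorer, and k = 20 are illustrative assumptions.

```python
# Minimal sketch of filter-based feature selection (one of the basic
# approaches reviewed above), using scikit-learn's univariate scoring.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# hypothetical high-dimensional data: 500 samples, 1000 features,
# of which only 20 are informative
X, y = make_classification(n_samples=500, n_features=1000,
                           n_informative=20, random_state=0)

selector = SelectKBest(score_func=mutual_info_classif, k=20)
X_reduced = selector.fit_transform(X, y)  # keep the 20 best-scoring features

print("original shape:", X.shape)         # (500, 1000)
print("reduced shape:", X_reduced.shape)  # (500, 20)
print("selected feature indices:", selector.get_support(indices=True))
```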


2022 ◽  
pp. 1865-1875
Author(s):  
Krishan Tuli ◽  
Amanpreet Kaur ◽  
Meenakshi Sharma

Cloud computing offers various IT services to many users on a pay-as-you-use basis. As data increases day by day, there is a huge requirement for cloud applications that can manage such large amounts of data; what is needed is a sound solution for analyzing that data and handling large datasets. Various companies provide such frameworks for particular applications. A cloud framework is an assembly of different components, such as development tools, middleware for particular applications, and the database management services needed for deploying, developing, and managing cloud applications. This results in an effective model for scaling huge amounts of data on dynamically allocated resources while also solving the associated complex problems. This article surveys the performance of cloud-based big data frameworks from various endeavors, which assists ventures in picking a suitable framework for their work and achieving the desired outcome.
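As a minimal taste of the kind of framework such surveys benchmark, the Python (PySpark) sketch below runs a simple distributed aggregation job on Spark, one of the commonly compared frameworks. The input path and column names are hypothetical.

```python
# Hypothetical PySpark sketch of a simple aggregation job, the kind of
# workload typically benchmarked when comparing cloud big data frameworks.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("framework-benchmark-sketch").getOrCreate()

# hypothetical input: a large CSV of events with 'user' and 'bytes' columns
events = spark.read.csv("s3://example-bucket/events.csv",
                        header=True, inferSchema=True)

# aggregate per user; Spark distributes this across the cluster's executors
per_user = (events.groupBy("user")
                  .agg(F.sum("bytes").alias("total_bytes"))
                  .orderBy(F.desc("total_bytes")))

per_user.show(10)
spark.stop()
```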


2017 ◽  
pp. 83-99
Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: the volume, velocity, and variety of data. Big data analytics is the process of analyzing large data sets that contain a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. Since Big Data is a newly emerging field, there is a need for the development of new technologies and algorithms for handling big data. The main objective of this paper is to provide knowledge about the various research challenges of Big Data analytics. A brief overview of the various types of Big Data analytics is discussed in this paper. For each type of analytics, the paper describes the process steps and tools, and a banking application is given. Some of the research challenges of big data analytics, and possible solutions to them, are also discussed.
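As a toy illustration of descriptive analytics, one of the types such overviews typically cover, the following Python sketch summarizes a hypothetical banking transaction table with pandas; all column names and values are invented.

```python
# Toy sketch of descriptive analytics on hypothetical banking transactions.
import pandas as pd

tx = pd.DataFrame({
    "account": ["A1", "A1", "A2", "A2", "A3"],
    "channel": ["atm", "online", "online", "branch", "atm"],
    "amount":  [120.0, 45.5, 300.0, 75.0, 60.0],
})

# uncover simple patterns: spend per account and per channel
print(tx.groupby("account")["amount"].agg(["count", "sum", "mean"]))
print(tx.groupby("channel")["amount"].sum().sort_values(ascending=False))
```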


Author(s):  
Jayashree K. ◽  
Chithambaramani R.

Big data has become a chief strength of innovation across academia, government, and industry. Big data comprises massive sensor data, raw and semi-structured log data from IT industries, and the exploded quantity of data from social media. Big data needs big storage, and this volume makes operations such as analytical, process, and retrieval operations very difficult and time consuming. One way to overcome these difficulties is to have the big data clustered in a compact format. Thus, this chapter discusses the background of big data and clustering. It also discusses the various applications of big data in detail. Related work, the research challenges of big data, and future directions are addressed in this chapter.
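A scalable clustering pass of the kind motivated above can be sketched in Python with scikit-learn's MiniBatchKMeans, which processes data in small batches and so keeps memory use bounded on large volumes; the data shape and cluster count here are hypothetical.

```python
# Sketch of clustering big data into a compact representation: MiniBatchKMeans
# processes the data in small batches, keeping memory use bounded.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(2)
X = rng.normal(size=(100_000, 16))        # hypothetical large dataset

km = MiniBatchKMeans(n_clusters=50, batch_size=1024, n_init=3, random_state=0)
labels = km.fit_predict(X)

# the compact format: 50 centroids plus a label per point, instead of raw rows
print("centroids shape:", km.cluster_centers_.shape)  # (50, 16)
print("labels shape:", labels.shape)                   # (100000,)
```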


2020 ◽  
Vol 25 (4) ◽  
pp. 1376-1391
Author(s):  
Liangfu Lu ◽  
Wenbo Wang ◽  
Zhiyuan Tan

Abstract The Parallel Coordinates Plot (PCP) is a popular technique for the exploration of high-dimensional data. In many cases, researchers apply it as an effective method to analyze and mine data. However, as today's data volumes grow larger, visual clutter and data clarity become two of the main challenges for parallel coordinates plots. Although the Arc Coordinates Plot (ACP) is a popular approach to addressing these challenges, few optimizations and improvements have been made to it. In this paper, we make three main contributions to state-of-the-art PCP methods. The first is an improvement of the visual method itself; the other two improve perceptual scalability when the scale or dimensionality of the data becomes large, as in some mobile and wireless practical applications. 1) We present an improved visualization method based on ACP, termed the double arc coordinates plot (DACP). It not only reduces the visual clutter in ACP but also uses a dimension-based bundling method with further optimization to deal with the issues of the conventional PCP. 2) To reduce the clutter caused by the order of the axes and to reveal patterns hidden in the data sets, we propose our first dimension-reordering method, a contribution-based method in DACP built on the singular value decomposition (SVD) algorithm. The approach computes an importance score for each attribute (dimension) of the data using SVD and visualizes the dimensions from left to right in DACP according to that score. 3) Moreover, a similarity-based method, based on the combination of a nonlinear correlation coefficient and the SVD algorithm, is proposed as well. To measure the correlation between two dimensions and explain how they interact with each other, we propose a reordering method based on nonlinear correlation information measurements. We mainly use mutual information to calculate the partial similarity of dimensions in high-dimensional data visualization, while SVD is used to measure the data globally. Lastly, we use five case scenarios to evaluate the effectiveness of DACP; the results show that our approaches not only perform well in visualizing multivariate datasets but also effectively alleviate the visual clutter of the conventional PCP, giving users a better visual experience.
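The contribution-based reordering can be sketched in Python as follows: score each dimension from the SVD of the standardized data matrix and sort the axes by that score. The scoring rule below (loading magnitudes weighted by singular values) is a plausible reading of "importance score using SVD", not the authors' exact formula, and the data are random placeholders.

```python
# Hedged sketch of an SVD-based "contribution" score for axis reordering
# (a plausible reading of the method above, not the authors' exact formula).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 8))             # hypothetical: 300 rows x 8 dims

Xc = (X - X.mean(axis=0)) / X.std(axis=0) # standardize each dimension
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# assumed score: each dimension's loading magnitude across the singular
# vectors, weighted by the corresponding singular values
score = (np.abs(Vt) * s[:, None]).sum(axis=0)

order = np.argsort(score)[::-1]           # axes drawn left-to-right in DACP
print("axis order by contribution:", order)
```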

