Large-Scale Data: Latest Research Papers

A Dynamic Scaling Approach in Hadoop YARN

International Journal of Organizational and Collective Intelligence ◽

10.4018/ijoci.286176 ◽

2022 ◽

Vol 12 (2) ◽

pp. 0-0

Keyword(s):

Large Scale ◽

Distributed Processing ◽

Dynamic Scaling ◽

Data Sets ◽

Large Scale Data ◽

The People ◽

Big Data Applications ◽

Scaling Process ◽

And Performance

In cloud-based Big Data applications, Hadoop has been widely adopted for the distributed processing of large-scale data sets. However, energy wastage in data centers remains an important research topic because of resource overuse and extra overhead costs. Dynamic scaling of resources in a Hadoop YARN cluster is a practical way to address this challenge. This paper proposes a dynamic scaling approach in Hadoop YARN (DSHYARN) that adds or removes nodes automatically based on workload. It relies on two algorithms (scaling up and scaling down) that automate the scaling process in the cluster. The article aims to ensure the energy efficiency and performance of Hadoop YARN clusters. To validate the effectiveness of DSHYARN, a case study on sentiment analysis of tweets about the COVID-19 vaccine is provided; the goal is to analyze tweets that people posted on Twitter. The results showed improvements in CPU utilization, RAM utilization, and job completion time. In addition, energy consumption was reduced by 16% under an average workload.
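The abstract does not spell out the two scaling algorithms, so the following is only a minimal sketch of what a workload-driven scale-up/scale-down rule could look like. The function name `next_node_count`, the thresholds, and the node limits are all hypothetical and are not taken from DSHYARN.

```python
def next_node_count(cpu_util, ram_util, nodes,
                    min_nodes=1, max_nodes=10, high=0.8, low=0.3):
    """Return the node count for the next interval (hypothetical rule).

    cpu_util and ram_util are cluster-wide utilizations in [0, 1].
    The thresholds `high` and `low` are illustrative assumptions.
    """
    # Treat the busier of the two resources as the current load.
    load = max(cpu_util, ram_util)
    # Scale up by one node when the cluster is saturated.
    if load > high and nodes < max_nodes:
        return nodes + 1
    # Scale down by one node when the cluster is mostly idle.
    if load < low and nodes > min_nodes:
        return nodes - 1
    # Otherwise keep the cluster size unchanged.
    return nodes
```

Keeping a gap between the two thresholds is a common way to avoid adding and removing the same node on consecutive intervals; whether DSHYARN does this is not stated in the abstract.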

An Evaluation of Supervised Dimensionality Reduction For Large Scale Data

Journal of Machine and Computing ◽

10.53759/7669/jmc202202003 ◽

2022 ◽

pp. 17-25

Author(s):

Nancy Jan Sliper

Keyword(s):

Dimensionality Reduction ◽

Large Scale ◽

Simulated Data ◽

Principal Component ◽

Low Rank ◽

Learning Tools ◽

Large Scale Data ◽

Reduction Methods ◽

Low Dimensional ◽

Scale Data

Experimenters today frequently quantify millions or even billions of characteristics (measurements) per sample to address critical biological issues, in the hope that machine learning tools will be able to make correct data-driven judgments. An efficient analysis requires a low-dimensional representation that preserves the differentiating features (e.g., whether a certain ailment is present in the person's body) in data whose size and complexity are orders of magnitude apart. While there are several systems that can handle millions of variables and yet have strong empirical and conceptual guarantees, there are few that can be clearly understood. This research presents an evaluation of supervised dimensionality reduction for large-scale data. We provide a methodology for extending Principal Component Analysis (PCA) by including class moment estimates in low-dimensional projections. Linear Optimum Low-Rank (LOLR) projection, the cheapest variant, includes the class-conditional means. Using both experimental and simulated benchmark data, we show that LOLR projection and its extensions improve data representations for subsequent classification while retaining computational flexibility and reliability. In terms of accuracy, LOLR outperforms other modular linear dimension reduction methods that require much longer computation times on conventional computers. LOLR uses more than 150 million attributes in brain image processing datasets, and many genome sequencing datasets have more than half a million attributes.
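The idea of adding class-conditional means to a PCA-style projection can be illustrated as follows. This is a rough sketch under stated assumptions, not the paper's implementation: the function name `mean_augmented_projection` is hypothetical, NumPy is assumed to be available, and the construction (class-mean differences first, then principal directions of the class-centered data, orthonormalized with QR) is one plausible reading of the abstract.

```python
import numpy as np

def mean_augmented_projection(X, y, k):
    """Return a (d, k) orthonormal basis for a supervised projection.

    X is an (n, d) data matrix, y holds n class labels, and k is the
    target dimension. Illustrative sketch only.
    """
    classes = np.unique(y)  # sorted unique labels
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    # Differences between each class mean and the first class mean.
    deltas = (means[1:] - means[0]).T            # shape (d, C-1)
    # Center every sample by the mean of its own class.
    centered = X - means[np.searchsorted(classes, y)]
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    n_pca = max(k - deltas.shape[1], 0)
    basis = np.hstack([deltas, vt[:n_pca].T])[:, :k]
    # Orthonormalize so the columns form a proper projection basis.
    q, _ = np.linalg.qr(basis)
    return q
```

Projecting with `X @ mean_augmented_projection(X, y, k)` then yields a k-dimensional representation whose first coordinate follows the direction separating the class means.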

Neural Network for Big Data Sets

10.4018/978-1-6684-2408-7.ch003 ◽

2022 ◽

pp. 41-67

Author(s):

Vo Ngoc Phu ◽

Vo Thi Ngoc Tran

Keyword(s):

Neural Network ◽

Big Data ◽

Computer Science ◽

Large Scale ◽

Massive Data ◽

Data Sets ◽

Massive Data Sets ◽

Large Scale Data ◽

Commercial Applications ◽

Novel Model

Machine learning (ML), neural networks (NNs), evolutionary algorithms (EAs), fuzzy systems (FSs), and computer science in general have been well known and significant for many years. They have been applied to many different areas and have contributed much to the development of many large-scale corporations, massive organizations, etc. These corporations and organizations have generated large amounts of information and massive data sets (MDSs). These big data sets (BDSs) have posed challenges for many commercial applications, research projects, etc. Therefore, many algorithms from ML, NNs, EAs, FSs, and computer science have been developed to handle these massive data sets successfully. To support this process, the authors present the available NN algorithms for large-scale data sets (LSDSs) in this chapter. Finally, they present a novel NN model for BDSs in a sequential environment (SE) and a distributed network environment (DNE).

Mapping the Collaborative Platform Economy Business Practice

10.4018/978-1-7998-7545-1.ch003 ◽

2022 ◽

pp. 52-80

Author(s):

Shouheng Sun ◽

Dafei Yang ◽

Xue Yan

Keyword(s):

Large Scale ◽

Business Practice ◽

Full Spectrum ◽

Large Scale Data ◽

Public Media ◽

Empirical Typology ◽

Collaborative Platform ◽

Almost All ◽

Status Data ◽

Platform Economy

This study aims to develop a typological configuration that characterizes the full spectrum of collaborative platform economy business practice in the real world. The analysis is based on a large-scale data set containing information on 1,335 representative platforms in more than 60 countries on five continents, covering almost all collaborative platform economy business practices mentioned in academic journals and public media. Using the k-means clustering method, an empirical typology comprising seven categories of collaborative platform economy business practice is proposed: collaborative support platform, resource supply platform, authentic C2C platform, C2C mutualized mobility platform, hybrid service platform, B2C service platform, and collaborative finance platform. In addition, operating status data of the collaborative platform economy were used to carry out a cross-comparative analysis of category differences and geographic differences.
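For readers unfamiliar with k-means, the clustering step mentioned above can be sketched in a few lines. This is a generic textbook version (Lloyd's algorithm) with a naive deterministic initialization; the function name `kmeans` and the toy setup are illustrative assumptions, and nothing here reflects the study's actual features or settings.

```python
def kmeans(points, k, iters=20):
    """Cluster `points` (sequences of floats) into k groups.

    Uses the first k points as initial centroids, which is a
    simplification chosen here for reproducibility.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members)
                                for c in zip(*members)]
    return labels, centroids
```

In a typology study, each platform would be encoded as a feature vector and the resulting clusters interpreted as categories; with k set to 7 this would correspond to the seven-category typology reported above.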

Efficient estimation and computation in generalized varying coefficient models with unknown link and variance functions for large-scale data

Statistica Sinica ◽

10.5705/ss.202020.0063 ◽

2022 ◽

Author(s):

Huazhen Lin ◽

Jiaxin Liu ◽

Haoqi Li ◽

Lixian Pan ◽

Yi Li

Keyword(s):

Large Scale ◽

Efficient Estimation ◽

Varying Coefficient Models ◽

Varying Coefficient ◽

Large Scale Data ◽

Variance Functions ◽

Scale Data ◽

Unknown Link

Artificial Neural Network Models for Large-Scale Data

10.4018/978-1-6684-2408-7.ch006 ◽

2022 ◽

pp. 112-145

Author(s):

Vo Ngoc Phu ◽

Vo Thi Ngoc Tran

Keyword(s):

Large Scale ◽

Network Models ◽

Data Sets ◽

Neural Network Models ◽

Large Scale Data ◽

The World ◽

Commercial Applications ◽

Artificial Neural Network Models ◽

Scale Data ◽

Large Scale Data Sets

Artificial intelligence (ARTINT) and information have been well-known fields for many years. One reason is that many different areas have advanced quickly on the basis of ARTINT and information, creating significant value over many years. This value has been used more and more by national economies around the world, other sciences, companies, organizations, etc. Many massive corporations, big organizations, etc. have been established rapidly as these economies have grown strongly. Unsurprisingly, these corporations and organizations have created large amounts of information and large-scale data sets. Processing and storing them successfully has been a major challenge for many commercial applications, studies, etc. To handle this problem, many algorithms have been proposed for processing these big data sets.

A Survey on Big IoT Data Indexing: Potential Solutions, Recent Advancements, and Open Issues

Future Internet ◽

10.3390/fi14010019 ◽

2021 ◽

Vol 14 (1) ◽

pp. 19

Author(s):

Zineddine Kouahla ◽

Ala-Eddine Benrazek ◽

Mohamed Amine Ferrag ◽

Brahim Farou ◽

Hamid Seridi ◽

...

Keyword(s):

Data Storage ◽

Large Scale ◽

Search Time ◽

Large Data ◽

Open Problems ◽

Large Scale Data ◽

Indexing Techniques ◽

Efficient Retrieval ◽

Data Collections ◽

Scale Data

The past decade has been characterized by the growing volumes of data due to the widespread use of the Internet of Things (IoT) applications, which introduced many challenges for efficient data storage and management. Thus, the efficient indexing and searching of large data collections is a very topical and urgent issue. Such solutions can provide users with valuable information about IoT data. However, efficient retrieval and management of such information in terms of index size and search time require optimization of indexing schemes which is rather difficult to implement. The purpose of this paper is to examine and review existing indexing techniques for large-scale data. A taxonomy of indexing techniques is proposed to enable researchers to understand and select the techniques that will serve as a basis for designing a new indexing scheme. The real-world applications of the existing indexing techniques in different areas, such as health, business, scientific experiments, and social networks, are presented. Open problems and research challenges, e.g., privacy and large-scale data mining, are also discussed.

SHAT: A Novel Asynchronous Training Algorithm That Provides Fast Model Convergence in Distributed Deep Learning

Applied Sciences ◽

10.3390/app12010292 ◽

2021 ◽

Vol 12 (1) ◽

pp. 292

Author(s):

Yunyong Ko ◽

Sang-Wook Kim

Keyword(s):

Large Scale ◽

Heterogeneous Environments ◽

Local Models ◽

Training Algorithm ◽

Distributed Training ◽

Large Scale Data ◽

Synchronization Overhead ◽

The Difference ◽

Asynchronous Training ◽

Scale Data

The recent unprecedented success of deep learning (DL) in various fields is underpinned by its use of large-scale data and models. Training a large-scale deep neural network (DNN) model with large-scale data, however, is time-consuming. To speed up the training of massive DNN models, data-parallel distributed training based on the parameter server (PS) has been widely applied. In general, synchronous PS-based training suffers from synchronization overhead, especially in heterogeneous environments. To reduce the synchronization overhead, asynchronous PS-based training employs asynchronous communication between the PS and workers so that the PS processes the request of each worker independently without waiting. Despite the performance improvement of asynchronous training, however, it inevitably causes the local models of workers to differ, and this difference among workers may slow model convergence. To address this problem, we propose a novel asynchronous PS-based training algorithm, SHAT, that considers (1) the scale of distributed training and (2) the heterogeneity among workers in order to reduce the difference among the local models of workers. Extensive empirical evaluation demonstrates that (1) the model trained by SHAT converges to an accuracy up to 5.22% higher than that of state-of-the-art algorithms, and (2) the model convergence of SHAT is robust under various heterogeneous environments.

What Might Books Be Teaching Young Children About Gender?

Psychological Science ◽

10.1177/09567976211024643 ◽

2021 ◽

pp. 095679762110246

Author(s):

Molly Lewis ◽

Matt Cooper Borkenhagen ◽

Ellen Converse ◽

Gary Lupyan ◽

Mark S. Seidenberg

Keyword(s):

Young Children ◽

Gender Stereotypes ◽

Large Scale ◽

Children's Books ◽

Large Scale Data ◽

Occurrence Data ◽

Gender Biases ◽

Scale Data ◽

Early Source

We investigated how gender is represented in children’s books using a novel 200,000-word corpus comprising 247 popular, contemporary books for young children. Using adult human judgments and word co-occurrence data, we quantified gender biases of words in individual books and in the whole corpus. We found that children’s books contain many words that adults judge as gendered. Semantic analyses based on co-occurrence data yielded word clusters related to gender stereotypes (e.g., feminine: emotions; masculine: tools). Co-occurrence data also indicated that many books instantiate gender stereotypes identified in other research (e.g., girls are better at reading, and boys are better at math). Finally, we used large-scale data to estimate the gender distribution of the audience for individual books, and we found that children are more often exposed to stereotypes for their own gender. Together, the data suggest that children’s books may be an early source of gender associations and stereotypes.

Multimedia Image Encryption Analysis Based on High-Dimensional Chaos Algorithm

Advances in Multimedia ◽

10.1155/2021/7384170 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Xing Zhang

Keyword(s):

Image Encryption ◽

Calculation Method ◽

Large Scale ◽

Multimedia Communication ◽

Data Encryption ◽

High Dimensional ◽

Mixed Pixel ◽

Large Scale Data ◽

Graph Mapping ◽

The Lorenz System

With the development of network and multimedia technology, multimedia communication has attracted the attention of researchers. Image encryption has become an urgent need for secure multimedia communication. Compared with traditional encryption systems, encryption algorithms based on chaos are easier to implement, which makes them more suitable for large-scale data encryption. The calculation method of image encryption proposed in this paper is a combination of high-dimensional chaotic systems. This algorithm is mainly used for graph mapping and uses the Lorenz system to expand and replace them one by one. Studies have shown that this calculation method produces mixed pixel values, good diffusion performance, and strong key performance with strong resistance. The pixels of the encrypted picture are distributed relatively randomly, and the characteristics of similar loudness are not relevant. Experiments show that the above calculation method has strong security performance.
