Big Data Processing

Author(s):  
Can Eyupoglu

Big data has attracted significant and growing attention in recent years and has become a hot topic in the IT industry, finance, business, academia, and scientific research. In the digital world, the amount of generated data continues to grow rapidly. According to research by the International Data Corporation (IDC), 33 zettabytes of data were created in 2018, and the volume of data is estimated to scale up more than five times between 2018 and 2025. In addition, the advertising sector, healthcare industry, biomedical companies, private firms, and governmental agencies have made substantial investments in the collection, aggregation, and sharing of enormous amounts of data. Processing data at this scale requires specialized techniques rather than conventional methodologies. This chapter deals with the concepts, architectures, technologies, and techniques used to process big data.

Author(s):  
C. Infant Louis Richards ◽  
T. Yuva ◽  
J. Sylvester Britto

Cloud architectures address key challenges surrounding large-scale data processing. In traditional data processing, it is first difficult to acquire as many machines as an application needs. Second, it is difficult to obtain the machines exactly when one needs them. Third, it is difficult to distribute and coordinate a large-scale job across different machines, run processes on them, and provision a replacement machine to recover if one machine fails. Fourth, it is difficult to scale up and down automatically based on dynamic workloads. Fifth, it is difficult to release all those machines when the job is done. Cloud architectures solve these difficulties. Optical character recognition (OCR) of cursive scripts presents a number of challenging problems in both the segmentation and recognition stages, which attracts much research in the field of machine learning. This paper presents an approach based on a combination of OCR and cloud computing that satisfies Apple's requirements for publication in the App Store, with the goal of designing a high-quality OCR application for outdoor portable documents. Performance results on a comprehensive database show a high degree of accuracy that meets the requirements of commercial use.
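As a concrete illustration of the recognition stage only, the sketch below runs OCR on a single document photo. The pytesseract binding and the file name are assumptions made here for illustration; the paper's own OCR engine and cloud pipeline are not reproduced.

```python
# Minimal OCR sketch using pytesseract (an assumption; the paper's own
# engine is not specified). Requires the Tesseract binary to be installed.
from PIL import Image
import pytesseract

def recognize(path: str) -> str:
    """Run OCR on a single document photo and return the extracted text."""
    image = Image.open(path)
    # Convert to grayscale to reduce noise before recognition.
    return pytesseract.image_to_string(image.convert("L"))

if __name__ == "__main__":
    print(recognize("sample_document.jpg"))  # hypothetical input file
```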


2021 ◽  
Author(s):  
Yavuz Melih Özgüven ◽  
Utku Gönener ◽  
Süleyman Eken

Abstract The big data revolution has also reached the area of sports analytics. Many large companies have started to see the benefits of combining sports analytics and big data to make a profit. Aggregating and processing big sport data from different sources becomes challenging if we rely on central processing techniques, which hurts the accuracy and the timeliness of the information. Distributed systems come to the rescue as a solution to these problems, and the MapReduce paradigm is promising for large-scale data analytics. In this study, we present a big data architecture based on Docker containers running Apache Spark. We demonstrate the architecture on four data-intensive case studies in sports analytics, covering structured analysis, streaming, machine learning methods, and graph-based analysis, and show its ease of use.
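To make the structured-analysis case concrete, here is a minimal PySpark sketch in the spirit of the architecture described above. The input file, schema, and column names (match_events.csv, event_type, team) are hypothetical; the paper's actual jobs are not reproduced.

```python
# A minimal PySpark sketch of a structured sports-analytics job.
# Dataset path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sports-analytics-sketch")
         .getOrCreate())

# Load match events from a hypothetical CSV file.
events = spark.read.csv("match_events.csv", header=True, inferSchema=True)

# Aggregate goals per team -- a stand-in for the structured-analysis case study.
goals = (events.filter(F.col("event_type") == "goal")
               .groupBy("team")
               .count()
               .orderBy(F.desc("count")))
goals.show()
spark.stop()
```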


2014 ◽  
Vol 513-517 ◽  
pp. 1464-1469 ◽  
Author(s):  
Zhi Kun Chen ◽  
Shu Qiang Yang ◽  
Shuang Tan ◽  
Hui Zhao ◽  
Li He ◽  
...  

With the development of Internet technology and cloud computing, more and more applications are confronted with the challenges of big data. NoSQL databases are well suited to managing big data because of their high scalability, high availability, and high fault tolerance, and they have become one of the core technologies for big data management. We improve the performance of massive data processing in NoSQL databases through large-scale parallel data processing and data-local computation, so how data is allocated becomes a major challenge for NoSQL databases. In this paper we propose a data allocation strategy based on node load, which adjusts the allocation according to the execution status of the system and keeps data allocation balanced at a small cost. Finally, we verify the effectiveness of the proposed strategy through experiments, which show that it improves system performance compared with other allocation strategies.
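The following sketch illustrates the general idea of load-aware allocation, greedily placing each data block on the currently least-loaded node. It is an illustrative simplification, not the paper's exact strategy; the block sizes and node names are invented.

```python
# Illustrative load-aware allocation (not the paper's exact algorithm):
# each incoming data block is placed on the node with the lowest current load,
# approximating the balance the strategy aims for.
import heapq

def allocate(blocks, node_loads):
    """Assign each block to the least-loaded node; returns a block -> node map."""
    # Min-heap of (load, node) pairs so the lightest node is always on top.
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)
    placement = {}
    for block, size in blocks:
        load, node = heapq.heappop(heap)
        placement[block] = node
        heapq.heappush(heap, (load + size, node))  # node's load grows by block size
    return placement

print(allocate([("b1", 3), ("b2", 1), ("b3", 2)],
               {"node-a": 0, "node-b": 2}))
```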


2008 ◽  
Vol 25 (5) ◽  
pp. 287-300 ◽  
Author(s):  
B. Martin ◽  
A. Al‐Shabibi ◽  
S.M. Batraneanu ◽  
Ciobotaru ◽  
G.L. Darlea ◽  
...  

2014 ◽  
Vol 26 (6) ◽  
pp. 1316-1331 ◽  
Author(s):  
Gang Chen ◽  
Tianlei Hu ◽  
Dawei Jiang ◽  
Peng Lu ◽  
Kian-Lee Tan ◽  
...  

2021 ◽  
Author(s):  
Mohammad Hassan Almaspoor ◽  
Ali Safaei ◽  
Afshin Salajegheh ◽  
Behrouz Minaei-Bidgoli

Abstract Classification is one of the most important and widely used tasks in machine learning; its purpose is to learn, from a training set, a rule for assigning data to pre-existing categories. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising classification methods in machine learning. With the advent of big data, many machine learning methods have been challenged by big data characteristics. The standard SVM was proposed for batch learning, in which all data are available at the same time. The SVM also has a high time complexity: increasing the number of training samples sharply increases the demand for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and to the use of large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for adapting the SVM to online conditions and large-scale data. These methods can be employed to classify big data, and the paper proposes research areas for future studies. Considering its advantages, the SVM can be among the first options for classifying big data. For this purpose, appropriate techniques should be developed to preprocess data into a form suitable for learning. Existing frameworks for parallel and distributed processing should also be employed so that SVMs can be made scalable and properly online, enabling them to handle big data.
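One common way to make a linear SVM incremental, in the spirit of the methods surveyed above, is stochastic gradient descent on the hinge loss. The sketch below uses scikit-learn's SGDClassifier with partial_fit on streamed mini-batches; the synthetic data and batch sizes are invented for illustration and are not from the paper.

```python
# Minimal online linear-SVM sketch: SGDClassifier with hinge loss trained
# incrementally on mini-batches, so all data never needs to fit in memory.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="hinge")  # hinge loss => linear SVM objective

classes = np.array([0, 1])
for _ in range(100):  # stream mini-batches instead of loading all data at once
    X = rng.normal(size=(32, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic labels for illustration
    clf.partial_fit(X, y, classes=classes)

X_test = rng.normal(size=(8, 5))
print(clf.predict(X_test))
```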


2021 ◽  
Author(s):  
R. Salter ◽  
Quyen Dong ◽  
Cody Coleman ◽  
Maria Seale ◽  
Alicia Ruvinsky ◽  
...  

The Engineer Research and Development Center, Information Technology Laboratory's (ERDC-ITL's) Big Data Analytics team specializes in the analysis of large-scale datasets, with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers and highlighted that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals efficiently executing a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. The researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the resulting Data Lake Ecosystem Workflow, focusing on the six phases required to efficiently transfer large datasets to the supercomputing resources located at ERDC-ITL.


Big data is large-scale data collected for knowledge discovery and has been widely used in various applications. Big data often includes image data from these applications and requires effective techniques to process it. In this paper, a survey of big image data research is carried out to analyze the performance of existing methods. Deep learning techniques provide better performance than other methods, including wavelet-based methods. However, deep learning has the drawback of requiring more computational time, which can be mitigated by lightweight methods.
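As one example of a lightweight method, the sketch below shows a MobileNet-style depthwise-separable convolution block, a standard way to cut the parameter count and compute time of a deep network. PyTorch and the specific layer sizes are assumptions made for illustration; the survey does not prescribe a particular implementation.

```python
# A tiny depthwise-separable convolution block in PyTorch -- one common
# "lightweight" technique (MobileNet-style); layer sizes here are illustrative.
import torch
import torch.nn as nn

class SeparableConv(nn.Module):
    """Depthwise + pointwise convolution: far fewer parameters than a full conv."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch)
        # Pointwise: 1x1 convolution mixes channels cheaply.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return torch.relu(self.pointwise(self.depthwise(x)))

block = SeparableConv(16, 32)
print(block(torch.randn(1, 16, 64, 64)).shape)  # -> torch.Size([1, 32, 64, 64])
```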

