Big Data Processing

Author(s):  
Can Eyupoglu

Big data has attracted significant and growing attention in recent years and has become a hot topic in the IT industry, finance, business, academia, and scientific research. In the digital world, the amount of generated data continues to grow rapidly. According to research by the International Data Corporation (IDC), 33 zettabytes of data were created in 2018, and the volume of data is estimated to scale up more than five times between 2018 and 2025. In addition, the advertising sector, healthcare industry, biomedical companies, private firms, and governmental agencies have made substantial investments in the collection, aggregation, and sharing of enormous amounts of data. Processing data at this scale requires specialized techniques rather than conventional methodologies. This chapter deals with the concepts, architectures, technologies, and techniques used to process big data.

Author(s):  
C. Infant Louis Richards ◽  
T. Yuva ◽  
J. Sylvester Britto

Cloud architectures address key challenges surrounding large-scale data processing. In traditional data processing, it is first difficult to acquire as many machines as an application needs. Second, it is difficult to obtain the machines exactly when one needs them. Third, it is difficult to distribute and coordinate a large-scale job across different machines, run processes on them, and provision a replacement machine to recover if one machine fails. Fourth, it is difficult to scale up and down automatically based on dynamic workloads. Fifth, it is difficult to release all those machines when the job is done. Cloud architectures solve these difficulties. Optical character recognition (OCR) of cursive scripts presents a number of challenging problems in both the segmentation and recognition stages, which attracts much research in the field of machine learning. This paper presents an approach based on a combination of OCR and cloud computing that satisfies Apple's requirements for publication in the App Store, with the goal of designing a high-quality OCR application for outdoor portable documents. Performance results on a comprehensive database show a high degree of accuracy that meets the requirements of commercial use.
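As a concrete illustration of the recognition stage only, the sketch below runs OCR on a single document photo. The pytesseract binding and the file name are assumptions made here for illustration; the paper's own OCR engine and cloud pipeline are not reproduced.

```python
# Minimal OCR sketch using pytesseract (an assumption; the paper's own
# engine is not specified). Requires the Tesseract binary to be installed.
from PIL import Image
import pytesseract

def recognize(path: str) -> str:
    """Run OCR on a single document photo and return the extracted text."""
    image = Image.open(path)
    # Convert to grayscale to reduce noise before recognition.
    return pytesseract.image_to_string(image.convert("L"))

if __name__ == "__main__":
    print(recognize("sample_document.jpg"))  # hypothetical input file
```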


2021 ◽  
Author(s):  
Yavuz Melih Özgüven ◽  
Utku Gönener ◽  
Süleyman Eken

Abstract The big data revolution has also reached the area of sports analytics. Many large companies have started to see the benefits of combining sports analytics and big data to make a profit. Aggregating and processing big sport data from different sources becomes challenging if we rely on central processing techniques, which hurts the accuracy and the timeliness of the information. Distributed systems come to the rescue as a solution to these problems, and the MapReduce paradigm is promising for large-scale data analytics. In this study, we present a big data architecture based on Docker containers running Apache Spark. We demonstrate the architecture on four data-intensive case studies in sports analytics, covering structured analysis, streaming, machine learning methods, and graph-based analysis, and show its ease of use.
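To make the structured-analysis case concrete, here is a minimal PySpark sketch in the spirit of the architecture described above. The input file, schema, and column names (match_events.csv, event_type, team) are hypothetical; the paper's actual jobs are not reproduced.

```python
# A minimal PySpark sketch of a structured sports-analytics job.
# Dataset path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sports-analytics-sketch")
         .getOrCreate())

# Load match events from a hypothetical CSV file.
events = spark.read.csv("match_events.csv", header=True, inferSchema=True)

# Aggregate goals per team -- a stand-in for the structured-analysis case study.
goals = (events.filter(F.col("event_type") == "goal")
               .groupBy("team")
               .count()
               .orderBy(F.desc("count")))
goals.show()
spark.stop()
```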


2014 ◽  
Vol 513-517 ◽  
pp. 1464-1469 ◽  
Author(s):  
Zhi Kun Chen ◽  
Shu Qiang Yang ◽  
Shuang Tan ◽  
Hui Zhao ◽  
Li He ◽  
...  

With the development of Internet technology and cloud computing, more and more applications are confronted with the challenges of big data. NoSQL databases are well suited to managing big data because of their high scalability, high availability, and high fault tolerance, and they have become one of the core technologies for big data management. We improve the performance of massive data processing in NoSQL databases through large-scale parallel data processing and data-local computation, so how data is allocated becomes a major challenge for NoSQL databases. In this paper we propose a data allocation strategy based on node load, which adjusts the allocation according to the execution status of the system and keeps data allocation balanced at a small cost. Finally, we verify the effectiveness of the proposed strategy through experiments, which show that it improves system performance compared with other allocation strategies.
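The following sketch illustrates the general idea of load-aware allocation, greedily placing each data block on the currently least-loaded node. It is an illustrative simplification, not the paper's exact strategy; the block sizes and node names are invented.

```python
# Illustrative load-aware allocation (not the paper's exact algorithm):
# each incoming data block is placed on the node with the lowest current load,
# approximating the balance the strategy aims for.
import heapq

def allocate(blocks, node_loads):
    """Assign each block to the least-loaded node; returns a block -> node map."""
    # Min-heap of (load, node) pairs so the lightest node is always on top.
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)
    placement = {}
    for block, size in blocks:
        load, node = heapq.heappop(heap)
        placement[block] = node
        heapq.heappush(heap, (load + size, node))  # node's load grows by block size
    return placement

print(allocate([("b1", 3), ("b2", 1), ("b3", 2)],
               {"node-a": 0, "node-b": 2}))
```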


2008 ◽  
Vol 25 (5) ◽  
pp. 287-300 ◽  
Author(s):  
B. Martin ◽  
A. Al‐Shabibi ◽  
S.M. Batraneanu ◽  
Ciobotaru ◽  
G.L. Darlea ◽  
...  

2014 ◽  
Vol 26 (6) ◽  
pp. 1316-1331 ◽  
Author(s):  
Gang Chen ◽  
Tianlei Hu ◽  
Dawei Jiang ◽  
Peng Lu ◽  
Kian-Lee Tan ◽  
...  

2021 ◽  
Author(s):  
Mohammad Hassan Almaspoor ◽  
Ali Safaei ◽  
Afshin Salajegheh ◽  
Behrouz Minaei-Bidgoli

Abstract Classification is one of the most important and widely used tasks in machine learning; its purpose is to learn, from a training set, a rule for assigning data to pre-existing categories. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising classification methods in machine learning. With the advent of big data, many machine learning methods have been challenged by big data characteristics. The standard SVM was proposed for batch learning, in which all data are available at the same time. The SVM also has a high time complexity: increasing the number of training samples sharply increases the demand for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and to the use of large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for adapting the SVM to online conditions and large-scale data. These methods can be employed to classify big data, and the paper proposes research areas for future studies. Considering its advantages, the SVM can be among the first options for classifying big data. For this purpose, appropriate techniques should be developed to preprocess data into a form suitable for learning. Existing frameworks for parallel and distributed processing should also be employed so that SVMs can be made scalable and properly online, enabling them to handle big data.
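One common way to make a linear SVM incremental, in the spirit of the methods surveyed above, is stochastic gradient descent on the hinge loss. The sketch below uses scikit-learn's SGDClassifier with partial_fit on streamed mini-batches; the synthetic data and batch sizes are invented for illustration and are not from the paper.

```python
# Minimal online linear-SVM sketch: SGDClassifier with hinge loss trained
# incrementally on mini-batches, so all data never needs to fit in memory.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="hinge")  # hinge loss => linear SVM objective

classes = np.array([0, 1])
for _ in range(100):  # stream mini-batches instead of loading all data at once
    X = rng.normal(size=(32, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic labels for illustration
    clf.partial_fit(X, y, classes=classes)

X_test = rng.normal(size=(8, 5))
print(clf.predict(X_test))
```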


2021 ◽  
Author(s):  
R. Salter ◽  
Quyen Dong ◽  
Cody Coleman ◽  
Maria Seale ◽  
Alicia Ruvinsky ◽  
...  

The Engineer Research and Development Center, Information Technology Laboratory's (ERDC-ITL's) Big Data Analytics team specializes in the analysis of large-scale datasets, with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers and highlighted that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals efficiently executing a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. The researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the resulting Data Lake Ecosystem Workflow, focusing on the six phases required to efficiently transfer large datasets to the supercomputing resources located at ERDC-ITL.


Big data is large-scale data collected for knowledge discovery and has been widely used in various applications. Big data often includes image data from these applications and requires effective techniques to process it. In this paper, a survey of big image data research is carried out to analyze the performance of existing methods. Deep learning techniques provide better performance than other methods, including wavelet-based methods. However, deep learning has the drawback of requiring more computational time, which can be mitigated by lightweight methods.
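As one example of a lightweight method, the sketch below shows a MobileNet-style depthwise-separable convolution block, a standard way to cut the parameter count and compute time of a deep network. PyTorch and the specific layer sizes are assumptions made for illustration; the survey does not prescribe a particular implementation.

```python
# A tiny depthwise-separable convolution block in PyTorch -- one common
# "lightweight" technique (MobileNet-style); layer sizes here are illustrative.
import torch
import torch.nn as nn

class SeparableConv(nn.Module):
    """Depthwise + pointwise convolution: far fewer parameters than a full conv."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch)
        # Pointwise: 1x1 convolution mixes channels cheaply.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return torch.relu(self.pointwise(self.depthwise(x)))

block = SeparableConv(16, 32)
print(block(torch.randn(1, 16, 64, 64)).shape)  # -> torch.Size([1, 32, 64, 64])
```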

