Implementation of MapReduce parallel computing framework based on multi-data fusion sensors and GPU cluster

Author(s): Dajun Chang, Li Li, Ying Chang, Zhangquan Qiao

Abstract: With the rapid growth of data volume, massive data has become one of the factors hampering enterprise development. How to process data effectively and reduce the concurrency pressure of data access has become the driving force behind the continuous development of big data solutions. This article studies a MapReduce parallel computing framework based on multi-data fusion sensors and GPU clusters. The experimental environment is a fully distributed Hadoop cluster, and the MapReduce-based single-source shortest path algorithm is implemented entirely in Java. Eight ordinary physical machines form the fully distributed cluster, with each node configured essentially identically. The MapReduce framework divides a submitted job into several map tasks and assigns them to different compute nodes. The map phase produces intermediate files consistent with the final file format; the system then generates several reduce tasks and distributes these files to different cluster nodes for execution. The experiment verifies how the running time of the PSON algorithm changes as the test data set grows, with the hardware and software configuration of the Hadoop platform held constant. When the number of compute nodes increases from 2 to 4, the running time drops significantly; as further nodes are added, the reduction in running time becomes less and less pronounced. The results show that NESTOR can complete the basic MapReduce workflow and simplifies user development of GPU programs, yielding a significant speedup for computation-intensive applications.
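The abstract does not reproduce the implementation itself, so the following is a minimal sketch of how one iteration of a MapReduce-based single-source shortest path (SSSP) step is conventionally written in Java against the Hadoop API. The input line format (node, current distance, comma-separated adjacency list), the unit edge weights, and the class names are assumptions for illustration, not the paper's code.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// One BFS-style relaxation iteration of single-source shortest path.
// Assumed input line format: nodeId<TAB>distance<TAB>neighbor1,neighbor2,...
public class SsspIteration {

    public static class SsspMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t");
            String node = parts[0];
            long dist = Long.parseLong(parts[1]);
            String adjacency = parts.length > 2 ? parts[2] : "";

            // Re-emit the node's structure so the reducer can rebuild the graph.
            ctx.write(new Text(node), new Text("NODE\t" + dist + "\t" + adjacency));

            // Propose a tentative distance (unit edge weights) to each neighbor.
            if (dist != Long.MAX_VALUE && !adjacency.isEmpty()) {
                for (String neighbor : adjacency.split(",")) {
                    ctx.write(new Text(neighbor), new Text("DIST\t" + (dist + 1)));
                }
            }
        }
    }

    public static class SsspReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text node, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            long best = Long.MAX_VALUE;
            String adjacency = "";
            for (Text v : values) {
                String[] parts = v.toString().split("\t");
                if (parts[0].equals("NODE") && parts.length > 2) {
                    adjacency = parts[2];
                }
                best = Math.min(best, Long.parseLong(parts[1]));
            }
            // Keep the minimum distance seen for this node.
            ctx.write(node, new Text(best + "\t" + adjacency));
        }
    }
}
```

A driver would run this job repeatedly, feeding each iteration's output back in as input until no distance changes. The diminishing returns the abstract reports beyond 4 nodes are typical of such iterative jobs, where fixed per-iteration costs such as job startup and shuffle do not shrink as nodes are added.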

Author(s): Bernhard Kittel, Sylvia Kritzinger, Hajo Boomgaarden, Barbara Prainsack, Jakob-Moritz Eberl, et al.

Abstract: Systematic and openly accessible data are vital to the scientific understanding of the social, political, and economic consequences of the COVID-19 pandemic. This article introduces the Austrian Corona Panel Project (ACPP), which has generated a unique, publicly available data set from late March 2020 onwards. ACPP has been designed to capture the social, political, and economic impact of the COVID-19 crisis on the Austrian population on a weekly basis. The thematic scope of the study covers several core dimensions related to the individual and societal impact of the COVID-19 crisis. The panel survey has a sample size of approximately 1500 respondents per wave. It contains questions that are asked every week, complemented by domain-specific modules to explore specific topics in more detail. The article presents details on the data collection process, data quality, the potential for analysis, and the modalities of data access pertaining to the first ten waves of the study.


2019, Vol. 6 (1)
Author(s): Mahdi Torabzadehkashi, Siavash Rezaei, Ali HeydariGorji, Hosein Bobarshad, Vladimir Alves, et al.

Abstract: In the era of big data applications, demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data originate in storage systems; to process them, application servers must fetch them from storage devices, which imposes a data-movement cost on the system. This cost grows with the distance between the processing engines and the data, which is the key motivation for the emergence of distributed processing platforms such as Hadoop that move processing closer to the data. Computational storage devices (CSDs) push the "move process to data" paradigm to its ultimate boundary by deploying embedded processing engines inside storage devices. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment to process data in place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for applications, so a vast spectrum of applications can be ported to run on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in place, without any modification to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks and investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to a 2.2× improvement in performance and a 4.3× reduction in energy consumption for Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms improve by up to 5.4× and 8.9×, respectively.
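The claim that Catalina runs distributed frameworks without modification can be illustrated with an entirely ordinary Hadoop job driver: nothing below is Catalina-specific, which is the point. This is a sketch using Hadoop's stock TokenCounterMapper and IntSumReducer classes as a stand-in workload; the paper's actual benchmarks are Intel HiBench workloads.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

// A completely ordinary Hadoop word-count driver. On a CSD-equipped cluster,
// the same job runs in place, with tasks scheduled onto the drives' embedded
// processors; the framework itself needs no changes.
public class UnmodifiedJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "wordcount-on-csd");
        job.setJarByClass(UnmodifiedJobDriver.class);
        job.setMapperClass(TokenCounterMapper.class);  // stock Hadoop mapper
        job.setCombinerClass(IntSumReducer.class);     // stock Hadoop reducer
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```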


2008, Vol. 37 (1), pp. 78-90
Author(s): Luke Tierney, A. J. Rossini, Na Li

2021, pp. 089443932110415
Author(s): Vanessa Russo, Emiliano del Gobbo

This research exploits Twitter's trending topic (TT) algorithm to identify the elements capable of guiding public opinion in the Italian panorama. The hypotheses that guide the article, confirmed by the research results, concern the existence of (a) a limited number of elements with very high viral power at the base of each popular hashtag and (b) hashtags that cut across the themes detected by the Twitter algorithm and define specific opinion polls. Through computational techniques, we extracted and processed data sets from six specific hashtags highlighted by TT. In a first step, using social network analysis, we analyzed the hashtag semantic network to identify the hashtags cutting across the six TTs. Subsequently, we selected the contents with high sharing power in each data set and created a "potential opinion leader" index to identify users with influencer characteristics. Finally, a cross-section of social actors able to guide public opinion in the Twittersphere emerged from the intersection between the potentially influential users and the viral contents.
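As a concrete illustration of the two computational steps described, the sketch below builds a hashtag co-occurrence (semantic) network and scores users with a simple influence index. The index formula here (engagement weighted by log follower count) is an assumption for illustration; the article does not spell out the construction of its "potential opinion leader" index.

```java
import java.util.*;

// Sketch of (1) a hashtag co-occurrence network and (2) a hypothetical
// per-user influence score. All field names and the scoring formula are
// illustrative assumptions, not the article's definitions.
public class TwitterAnalysisSketch {

    record Tweet(String user, long followers, long retweets, long likes, List<String> hashtags) {}

    // Step 1: count how often each pair of hashtags appears in the same tweet.
    static Map<String, Integer> cooccurrence(List<Tweet> tweets) {
        Map<String, Integer> edges = new HashMap<>();
        for (Tweet t : tweets) {
            List<String> tags = t.hashtags().stream().sorted().toList();
            for (int i = 0; i < tags.size(); i++)
                for (int j = i + 1; j < tags.size(); j++)
                    edges.merge(tags.get(i) + "--" + tags.get(j), 1, Integer::sum);
        }
        return edges;
    }

    // Step 2: a hypothetical "potential opinion leader" score per user.
    static Map<String, Double> influenceIndex(List<Tweet> tweets) {
        Map<String, Double> score = new HashMap<>();
        for (Tweet t : tweets) {
            double engagement = (t.retweets() + t.likes()) * Math.log1p(t.followers());
            score.merge(t.user(), engagement, Double::sum);
        }
        return score;
    }

    public static void main(String[] args) {
        List<Tweet> tweets = List.of(
            new Tweet("alice", 12000, 340, 910, List.of("#covid", "#lockdown")),
            new Tweet("bob", 800, 15, 40, List.of("#covid", "#vaccini")));
        System.out.println(cooccurrence(tweets));
        System.out.println(influenceIndex(tweets));
    }
}
```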


2021, Vol. 1 (2), pp. 89
Author(s): Lutfia Rizkyatul Akbar, Gunadi Gunadi

This study aims to assess the implementation of banking data access openness policies for improving tax compliance in Indonesia. The issue arises because tax collection is implemented through a self-assessment system, which requires taxpayer data and information from financial institutions, including banks. The researchers used a qualitative descriptive method. The results of this study are as follows. First, implementation of the banking data access openness policy for increasing tax compliance in Indonesia is supported by the issuance of Law Number 9 of 2017 concerning Access to Financial Information. Second, implementation of the policy includes the willingness of target groups to comply with policy outputs, in this case the reporting of customer data by banks to the DGT. Third, the open banking data access policy has not impeded or reduced the number of bank accounts and deposits. Fourth, there were technical obstacles for both the DGT and the banking sector, especially in the first year. Several factors have inhibited implementation of this policy: IT constraints; resistance from some circles when the regulations first emerged; limited financial resources to process data quickly, so implementation must proceed gradually; and shortfalls in both the quantity and quality of human resources.


2013, Vol. 753-755, pp. 3018-3024
Author(s): Fen Gyu Yang, Ying Chen, Ye Zhang

As ever more data are collected in many applications, record linkage must cope with millions of records. For traditional methods, this massive scale poses a serious performance challenge. Parallel computing frameworks such as MapReduce have become an efficient and practical way to address this problem. In this paper, we propose a practical 3-phase MapReduce approach that performs blocking, filtering, and linking in three consecutive processes on a Hadoop cluster. Experiments show that our approach is efficient and effective, maintaining high recall in contrast to the traditional method.
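The abstract does not detail the three phases, so the sketch below shows only the first (blocking) phase as it is conventionally written for MapReduce record linkage: the mapper emits a cheap blocking key per record, and the reducer groups records sharing a key into candidate blocks for the later filtering and linking phases. The CSV layout and the blocking key (first three letters of the surname) are illustrative assumptions, not the paper's design.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Blocking phase: group records by a cheap key so that later phases only
// compare records within the same block instead of all pairs.
public class BlockingPhase {

    public static class BlockingMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text record, Context ctx)
                throws IOException, InterruptedException {
            // Assumed CSV layout: id,surname,givenName,birthYear
            String[] f = record.toString().split(",");
            String surname = f[1].trim().toLowerCase();
            // Hypothetical blocking key: first three letters of the surname.
            String key = surname.substring(0, Math.min(3, surname.length()));
            ctx.write(new Text(key), record);
        }
    }

    public static class BlockingReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text blockKey, Iterable<Text> records, Context ctx)
                throws IOException, InterruptedException {
            // Emit each block as one line; the filtering phase consumes these
            // blocks and prunes obvious non-matches before pairwise linking.
            StringBuilder block = new StringBuilder();
            for (Text r : records) {
                if (block.length() > 0) block.append('|');
                block.append(r.toString());
            }
            ctx.write(blockKey, new Text(block.toString()));
        }
    }
}
```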


2021
Author(s): Zhuo Yang, Yan Lu, Simin Li, Jennifer Li, Yande Ndiaye, et al.

Abstract: To accelerate the adoption of Metal Additive Manufacturing (MAM) for production, an understanding of MAM process-structure-property (PSP) relationships is indispensable for quality control. The multitude of physical phenomena involved in MAM necessitates multi-modal, in-process sensing techniques to model, monitor, and control the process. The data generated by these sensors and process actuators are fused in various ways to advance our understanding of the process and to estimate both process status and part-in-progress states. This paper presents a hierarchical in-process data fusion framework for MAM, consisting of pointwise, trackwise, layerwise, and partwise data analytics. Data fusion can be performed at the raw-data, feature, decision, or mixed level. The multi-scale data fusion framework is illustrated in detail using a laser powder bed fusion process for anomaly detection, material defect isolation, and part quality prediction. It can be generally applied and integrated with real-time MAM process control, near-real-time layerwise repair, and buildwise decision making. The framework can be utilized by the AM research and standards community to rapidly develop and deploy interoperable tools and standards that analyze, process, and exploit two or more different types of AM data. Common engineering standards for AM data fusion systems will dramatically improve the ability to detect, identify, and locate part flaws, and then derive optimal policies for process control.
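To make the hierarchy concrete, the sketch below models the point → track → layer → part levels as nested aggregations, with feature-level fusion at the track scale and decision-level fusion above it. All type names, thresholds, and the toy scoring rule are illustrative assumptions, not the paper's framework.

```java
import java.util.List;

// Sketch of hierarchical data fusion: analytics at point, track, layer, and
// part scale, each level consuming the output of the level below.
public class FusionHierarchySketch {

    record PointSample(double x, double y, double meltPoolTemp, double photodiodeSignal) {}
    record TrackFeature(double meanTemp, double tempVariance, boolean anomalous) {}
    record LayerSummary(List<TrackFeature> tracks, int defectCount) {}
    record PartPrediction(double qualityScore) {}

    // Trackwise: feature-level fusion of two in-process sensor channels.
    // A fixed signal threshold stands in for a trained anomaly detector.
    static TrackFeature fuseTrack(List<PointSample> points) {
        double mean = points.stream().mapToDouble(PointSample::meltPoolTemp).average().orElse(0);
        double var = points.stream()
                .mapToDouble(p -> (p.meltPoolTemp() - mean) * (p.meltPoolTemp() - mean))
                .average().orElse(0);
        boolean anomalous = points.stream().anyMatch(p -> p.photodiodeSignal() > 0.9);
        return new TrackFeature(mean, var, anomalous);
    }

    // Layerwise: decision-level fusion of track verdicts into defect isolation.
    static LayerSummary fuseLayer(List<List<PointSample>> tracks) {
        List<TrackFeature> feats = tracks.stream().map(FusionHierarchySketch::fuseTrack).toList();
        int defects = (int) feats.stream().filter(TrackFeature::anomalous).count();
        return new LayerSummary(feats, defects);
    }

    // Partwise: aggregate layer summaries into a part quality prediction.
    static PartPrediction fusePart(List<LayerSummary> layers) {
        double totalDefects = layers.stream().mapToDouble(LayerSummary::defectCount).sum();
        return new PartPrediction(1.0 / (1.0 + totalDefects)); // toy scoring rule
    }
}
```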

