Scientific Data Processing Using Mapreduce in Cloud Environments

Kong Xiangsheng

doi:10.3923/itj.2013.7869.7873

SciDP: Support HPC and Big Data Applications via Integrated Scientific Data Processing

2018 IEEE International Conference on Cluster Computing (CLUSTER) ◽

10.1109/cluster.2018.00023 ◽

2018 ◽

Cited By ~ 4

Author(s):

Kun Feng ◽

Xian-He Sun ◽

Xi Yang ◽

Shujia Zhou

Keyword(s):

Big Data ◽

Data Processing ◽

Scientific Data ◽

Big Data Applications

Download Full-text

THE GWAC DATA PROCESSING AND MANAGEMENT SYSTEM

Revista Mexicana de Astronomía y Astrofísica Serie de Conferencias ◽

10.22201/ia.14052059p.2021.53.36 ◽

2021 ◽

Vol 53 ◽

pp. 174-179

Author(s):

Y. Xu ◽

L. P. Xin ◽

X. H. Han ◽

H. B. Cai ◽

L. Huang ◽

...

Keyword(s):

Data Processing ◽

User Interfaces ◽

Management System ◽

Scientific Data ◽

Data Archiving ◽

Processing Pipeline ◽

Advanced Technologies ◽

Single Frame ◽

Moonless Night ◽

Processing Requirement

GWAC will have been built an integrated FOV of 5,000 degree2 and have already built 1,800 square degree2. The limit magnitude of a 10-second exposure image in the moonless night is 16R. In each observation night, GWAC produces about 0.7TB of raw data, and the data processing pipeline generates millions of single frame alerts. We describe the GWAC Data Processing and Management System (GPMS), including hardware architecture, database, detection-filtering-validation of transient candidates, data archiving, and user interfaces for the check of transient and the monitor of the system. GPMS combines general technology and software in astronomy and computer field, and use some advanced technologies such as deep learning. Practical results show that GPMS can fully meet the scientific data processing requirement of GWAC. It can online accomplish the detection, filtering and validation of millions of transient candidates, and feedback the final results to the astronomer in real-time. During the observation from October of 2018 to December of 2019, we have already found 102 transients.

Download Full-text

Data Security and Validation Framework for a Scientific Data Processing SOA Based System

2011 Developments in E-systems Engineering ◽

10.1109/dese.2011.75 ◽

2011 ◽

Cited By ~ 1

Author(s):

Veska Gancheva

Keyword(s):

Data Processing ◽

Data Security ◽

Scientific Data ◽

Validation Framework

Download Full-text

A scientific data processing framework for time series NetCDF data

Environmental Modelling & Software ◽

10.1016/j.envsoft.2014.06.005 ◽

2014 ◽

Vol 60 ◽

pp. 241-249 ◽

Cited By ~ 4

Author(s):

Krista Gaustad ◽

Tim Shippert ◽

Brian Ermold ◽

Sherman Beus ◽

Jeff Daily ◽

...

Keyword(s):

Time Series ◽

Data Processing ◽

Scientific Data ◽

Processing Framework

Download Full-text

Gaia: Astrometric performance and current status of the project

Proceedings of the International Astronomical Union ◽

10.1017/s1743921309990548 ◽

2009 ◽

Vol 5 (S261) ◽

pp. 296-305 ◽

Cited By ~ 13

Author(s):

Lennart Lindegren

Keyword(s):

Solar System ◽

Data Processing ◽

Preliminary Data ◽

Scientific Data ◽

Current Status ◽

Fundamental Physics ◽

Implementation Phase ◽

Optical Reference ◽

Measurement Principle

AbstractThe scientific objectives of the Gaia mission cover areas of galactic structure and evolution, stellar astrophysics, exoplanets, solar system physics, and fundamental physics. Astrometrically, its main contribution will be the determination of millions of absolute stellar parallaxes and the establishment of a very accurate, dense and faint non-rotating optical reference frame. With a planned launch in spring 2012, the project is in its advanced implementation phase. In parallel, preparations for the scientific data processing are well under way within the Gaia Data Processing and Analysis Consortium. Final mission results are expected around 2021, but early releases of preliminary data are expected. This review summarizes the main science goals and overall organisation of the project, the measurement principle and core astrometric solution, and provide an updated overview of the expected astrometric performance.

Download Full-text

Protein Experiment: Scientific Data Processing Platform for On-Flight Experiment Tuning

Microgravity Science and Technology ◽

10.1007/s12217-012-9320-y ◽

2012 ◽

Vol 24 (5) ◽

pp. 327-334 ◽

Cited By ~ 1

Author(s):

Luis David Patiño-Lopez ◽

Klaas Decanniere ◽

Jose Antonio Gavira ◽

Dominique Maes ◽

Fermín Otalora

Keyword(s):

Data Processing ◽

Scientific Data ◽

Flight Experiment ◽

Processing Platform

Download Full-text

Employing Vertical Elasticity for Efficient Big Data Processing in Container-Based Cloud Environments

Applied Sciences ◽

10.3390/app11136200 ◽

2021 ◽

Vol 11 (13) ◽

pp. 6200

Author(s):

Jin-young Choi ◽

Minkyoung Cho ◽

Jik-Soo Kim

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Data Storage ◽

Resource Utilization ◽

System Throughput ◽

Big Data Processing ◽

Cloud Environments ◽

Utilization Scheme ◽

Adaptive Resource

Recently, “Big Data” platform technologies have become crucial for distributed processing of diverse unstructured or semi-structured data as the amount of data generated increases rapidly. In order to effectively manage these Big Data, Cloud Computing has been playing an important role by providing scalable data storage and computing resources for competitive and economical Big Data processing. Accordingly, server virtualization technologies that are the cornerstone of Cloud Computing have attracted a lot of research interests. However, conventional hypervisor-based virtualization can cause performance degradation problems due to its heavily loaded guest operating systems and rigid resource allocations. On the other hand, container-based virtualization technology can provide the same level of service faster with a lightweight capacity by effectively eliminating the guest OS layers. In addition, container-based virtualization enables efficient cloud resource management by dynamically adjusting the allocated computing resources (e.g., CPU and memory) during the runtime through “Vertical Elasticity”. In this paper, we present our practice and experience of employing an adaptive resource utilization scheme for Big Data workloads in container-based cloud environments by leveraging the vertical elasticity of Docker, a representative container-based virtualization technique. We perform extensive experiments running several Big Data workloads on representative Big Data platforms: Apache Hadoop and Spark. During the workload executions, our adaptive resource utilization scheme periodically monitors the resource usage patterns of running containers and dynamically adjusts allocated computing resources that could result in substantial improvements in the overall system throughput.

Download Full-text

Demonstration of the remote exploration and experimentation (REE) fault-tolerant parallel-processing supercomputer for spacecraft onboard scientific data processing

Proceeding International Conference on Dependable Systems and Networks. DSN 2000 ◽

10.1109/icdsn.2000.857562 ◽

2002 ◽

Cited By ~ 8

Author(s):

F. Chen ◽

L. Craymer ◽

J. Deifik ◽

A.J. Fogel ◽

D.S. Katz ◽

...

Keyword(s):

Parallel Processing ◽

Data Processing ◽

Fault Tolerant ◽

Scientific Data

Download Full-text

A Web Service-Enabled Distributed Workflow System for Scientific Data Processing

11th IEEE International Workshop on Future Trends of Distributed Computing Systems (FTDCS'07) ◽

10.1109/ftdcs.2007.9 ◽

2007 ◽

Cited By ~ 2

Author(s):

Rajesh Kalyanam ◽

Lan Zhao ◽

Taezoon Park ◽

Sebastien Goasguen

Keyword(s):

Data Processing ◽

Web Service ◽

Scientific Data ◽

Distributed Workflow ◽

Workflow System ◽

Distributed Workflow System

Download Full-text

THE SELF-JUDGMENT PRINCIPLE IN SCIENTIFIC DATA PROCESSING

Industrial & Engineering Chemistry ◽

10.1021/ie50640a004 ◽

1963 ◽

Vol 55 (4) ◽

pp. 29-32 ◽

Cited By ~ 8

Author(s):

P. A. D. de Maine ◽

R. D. Seawright

Keyword(s):

Data Processing ◽

Scientific Data ◽

The Self

Download Full-text