High Performance BLAST Over the Grid

Author(s):  
Vincent Breton ◽  
Eddy Caron ◽  
Frederic Desprez ◽  
Gael Le Mahec

As grids become more and more attractive for solving complex problems with high computational and storage requirements, bioinformatics applications are starting to be ported to large-scale platforms. The BLAST kernel, one of the cornerstones of high-performance genomics, was one of the first applications ported to such platforms. However, while a simple parallelization was enough for a first proof of concept, its use on production platforms requires more optimized algorithms. In this chapter, we review existing parallelization and “gridification” approaches as well as related issues such as data management and replication, and present a case study using the DIET middleware over the Grid’5000 experimental platform.
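The chapter's DIET/Grid'5000 deployment is not reproduced here, but the basic idea behind most BLAST parallelizations is database segmentation: split the reference database into fragments and search them independently. A minimal sketch follows, assuming NCBI BLAST+ is installed and the database has already been split into hypothetical fragments db_00 to db_03.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor

DB_FRAGMENTS = [f"db_{i:02d}" for i in range(4)]  # hypothetical fragment names

def run_blast(fragment, query="query.fa"):
    """Run blastn against one database fragment and return its tabular output."""
    result = subprocess.run(
        ["blastn", "-query", query, "-db", fragment, "-outfmt", "6"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Fragments are searched independently and the hits merged afterwards,
    # which is what makes the approach easy to distribute across grid nodes.
    with ProcessPoolExecutor() as pool:
        merged = "".join(pool.map(run_blast, DB_FRAGMENTS))
    with open("merged_hits.tsv", "w") as out:
        out.write(merged)
```

In practice, searches against fragments also need their statistics corrected for the full database size (BLAST+ exposes an effective database size option for this), which is one of the details that optimized, production "gridified" BLAST implementations have to handle.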

2020 ◽  
Vol 10 (7) ◽  
pp. 2634
Author(s):  
JunWeon Yoon ◽  
TaeYoung Hong ◽  
ChanYeol Park ◽  
Seo-Young Noh ◽  
HeonChang Yu

High-performance computing (HPC) uses many distributed computing resources to solve large computational science problems through parallel computation. Such an approach can reduce overall job execution time and increase the capacity for solving large-scale and complex problems. On a supercomputer, the job scheduler, the HPC platform’s flagship tool, is responsible for distributing and managing the resources of large systems. In this paper, we analyze the execution log of the job scheduler over a certain period of time and propose an optimization approach to reduce the idle time of jobs. Our experiments show that the main root cause of job delay is resource waiting: the execution time of the entire job is significantly delayed because idle resources must accumulate before a large-scale job can be started. A backfilling algorithm can exploit these idle resources and help to reduce job execution time. We therefore propose a backfilling algorithm that can be applied to the supercomputer, and our experimental results show that the overall execution time is reduced.
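The core idea of backfilling is that smaller jobs may jump ahead of a blocked head-of-queue job as long as they do not delay it. The sketch below is a minimal illustration of that rule, not the paper's scheduler or any production implementation (e.g., SLURM or PBS); jobs are simplified to (id, node count, walltime).

```python
from collections import namedtuple

Job = namedtuple("Job", "jid nodes walltime")

def backfill(queue, free_nodes, now, head_start_time):
    """Return waiting jobs that can start now without delaying the head job."""
    started = []
    for job in list(queue):
        fits = job.nodes <= free_nodes
        ends_in_time = now + job.walltime <= head_start_time
        if fits and ends_in_time:
            started.append(job)
            free_nodes -= job.nodes
            queue.remove(job)
    return started

# Example: 8 idle nodes, the 32-node head job can only start at t = 100.
waiting = [Job("j2", 4, 50), Job("j3", 16, 10), Job("j4", 2, 200)]
print(backfill(waiting, free_nodes=8, now=0, head_start_time=100))
# -> j2 is backfilled; j3 needs too many nodes, j4 would delay the head job.
```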


2013 ◽  
Vol 831 ◽  
pp. 276-281
Author(s):  
Ya Jie Ma ◽  
Zhi Jian Mei ◽  
Xiang Chuan Tian

Large-scale sensor networks are systems in which a large number of high-throughput autonomous sensor nodes are distributed over wide areas. Much attention has been paid to providing efficient data management in such systems. A sensor grid brings low-cost, high-performance computing to physical-world data perceived through sensors. This article analyses the challenges that large-scale air pollution data management poses to a real-time sensor grid. A sensor grid architecture for pollution data management is proposed, and the processing of the service-oriented grid management is described in pseudocode. A simulation experiment investigates the performance of data management in such a system.
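The paper's pseudocode is not reproduced here. As a hedged illustration of the kind of service-side processing such an architecture implies, the sketch below ingests pollution readings from sensor nodes and filters out faulty values with a simple median-absolute-deviation check before storage; the data layout and threshold are assumptions for the example.

```python
import statistics

def ingest(readings, k=10.0):
    """Drop outlier PM2.5 readings using a median absolute deviation filter."""
    values = [r["pm25"] for r in readings]
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1.0
    return [r for r in readings if abs(r["pm25"] - med) <= k * mad]

readings = [
    {"node": "n1", "pm25": 42.0}, {"node": "n2", "pm25": 44.5},
    {"node": "n3", "pm25": 41.3}, {"node": "n4", "pm25": 900.0},  # faulty sensor
]
print(ingest(readings))  # the 900.0 reading is filtered out before storage
```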


2014 ◽  
Vol 9 (2) ◽  
pp. 17-27 ◽  
Author(s):  
Ritu Arora ◽  
Maria Esteva ◽  
Jessica Trelogan

The process of developing a digital collection in the context of a research project often involves a pipeline pattern during which data growth, data types, and data authenticity need to be assessed iteratively in relation to the different research steps and in the interest of archiving. Throughout a project’s lifecycle, curators organize newly generated data while cleaning and integrating legacy data when it exists, and deciding what data will be preserved for the long term. Although these actions should be part of a well-oiled data management workflow, there are practical challenges in doing so if the collection is very large and heterogeneous, or is accessed by several researchers concurrently. There is a need for data management solutions that can help curators with efficient and on-demand analyses of their collection so that they remain well informed about its evolving characteristics. In this paper, we describe our efforts towards developing a workflow to leverage open science High Performance Computing (HPC) resources for routinely and efficiently conducting data management tasks on large collections. We demonstrate that HPC resources and techniques can significantly reduce the time for accomplishing critical data management tasks and enable dynamic archiving throughout the research process. We use a large archaeological data collection with a long and complex formation history as our test case. We share our experiences in adopting open science HPC resources for large-scale data management, which entails understanding usage of the open source HPC environment and training users. These experiences can be generalized to meet the needs of other data curators working with large collections.
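The authors' workflow is not reproduced here. As a hedged illustration of the kind of routine curation task that benefits from HPC parallelism, the sketch below computes fixity checksums (SHA-256) over a collection by fanning file hashing out across the cores of a node; the path and report format are assumptions for the example.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def checksum(path):
    """Stream a file through SHA-256 so large files never fully load in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return str(path), digest.hexdigest()

def fixity_report(root):
    """Checksum every file under root in parallel, one worker per core."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    with ProcessPoolExecutor() as pool:
        return dict(pool.map(checksum, files))

# fixity_report("/path/to/collection") -> {"path/to/file": "sha256...", ...}
```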


2019 ◽  
Vol 8 (10) ◽  
pp. 24851-24854
Author(s):  
Hewa Majeed Zangana

Nowadays, more and more organizations are realizing the importance of their data, because it is an important asset in nearly all business and organizational processes. The Information Technology Division (ITD) is the department of the International Islamic University Malaysia (IIUM) that consolidates efforts in providing IT services to the university. The university’s data management started with decentralized units, where each center or division had its own hardware and database system. It later became centralized, and ITD is now trying to apply one policy across the whole university, which should improve the performance of data management at the university. A visit was made to the ITD building, and a presentation was conducted discussing many issues concerning data management quality maturity in the IT division at IIUM. We noted issues such as the server room’s location, the power supply and backup, and the existence of redundant data. These issues are discussed in detail in the following sections of this paper, and some recommendations are suggested to improve data quality at the university. Data quality is very important in decision making, especially for a university that is trying to improve its strategy towards becoming a research university and raise its position in the World University Rankings.


2019 ◽  
Vol 8 (6) ◽  
pp. e12861023 ◽  
Author(s):  
Pedro Junior Zucatelli ◽  
Ana Paula Meneguelo ◽  
Gisele de Lorena Diniz Chaves ◽  
Marielce de Cassia Ribeiro Tosta

The integrity of natural systems is already at risk because of climate change caused by intense emissions of greenhouse gases into the atmosphere. The goal of geological carbon sequestration is to capture, transport, and store CO2 in appropriate geological formations. In this review, we address the geological environments suited to CCS (Carbon Capture and Storage) projects, the phases that make up these projects, and their associated investment and operating costs. Furthermore, we present calculations of the estimated financial profitability of different types of projects in Brazil. Using mathematical models, we conclude that the Roncador field presents the highest gross revenue when the amount of extra oil that can be retrieved is 9.3% (approximately US$ 48.55 billion in 2018). Additional calculations show that the Paraná saline aquifer has the highest gross revenue (US$ 6.90 trillion in 2018) compared to the Solimões (approximately US$ 3.76 trillion in 2018) and Santos saline aquifers (approximately US$ 2.21 trillion in 2018) if a CCS project were to be employed. Therefore, the Carbon Capture and Storage method proposed in this study is an important scientific contribution towards reliable large-scale CO2 storage in Brazil.
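The paper's economic model is not reproduced here. As a generic back-of-the-envelope sketch only, gross revenue from CO2-enhanced oil recovery is often estimated as the incremental oil recovered times the oil price; all inputs below are hypothetical placeholders, not the Roncador figures.

```python
def eor_gross_revenue(oil_in_place_bbl, extra_recovery_fraction, oil_price_usd_per_bbl):
    """Gross revenue = incremental barrels recovered * price per barrel."""
    return oil_in_place_bbl * extra_recovery_fraction * oil_price_usd_per_bbl

# Hypothetical example: 10 billion bbl in place, 9.3% extra recovery, US$ 65/bbl.
print(f"US$ {eor_gross_revenue(10e9, 0.093, 65.0):,.0f}")
```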


Author(s):  
Pankaj Lathar ◽  
K. G. Srinivasa ◽  
Abhishek Kumar ◽  
Nabeel Siddiqui

Advancements in web-based technology and the proliferation of sensors and mobile devices interacting with the internet have resulted in immense data management requirements. These requirements include the storage and processing of big data and the demand for high-performance read-write operations on it. Large-scale, high-concurrency applications such as social networking services (SNS) and search engines face challenges in using relational databases to store and query dynamic user data. NoSQL and cloud computing have emerged as paradigms that can meet these requirements. The diversity of existing NoSQL and cloud computing solutions makes it difficult to comprehend the domain and choose an appropriate solution for a specific business task. Therefore, this chapter reviews NoSQL and cloud-system-based solutions with the goal of providing a perspective on the field of data storage technologies and algorithms, offering guidance to researchers and practitioners in selecting the best-fit data store, and identifying challenges and opportunities of the paradigm.
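As a minimal sketch of the key-value side of the NoSQL landscape discussed above, the example below stores a schema-less user profile as a JSON document under a single key. It assumes a local Redis server and the redis-py client, chosen here only for illustration rather than being the chapter's own case study.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Dynamic, schema-less user data that is awkward to fit into a fixed
# relational schema can be stored as a JSON document under one key.
profile = {"user": "alice", "followers": 1204, "interests": ["hpc", "bio"]}
r.set("user:alice", json.dumps(profile))

print(json.loads(r.get("user:alice"))["followers"])  # fast read by key
```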


Author(s):  
Annu Priya ◽  
Sudip Kumar Sahana

Processor scheduling is one of the thrust areas in the field of computer science. Future technologies, such as large games, programming software, and quantum computing, require a huge amount of processing for the execution of their tasks, and many complex problems are now solved in real time by GPU programming. The primary concern of scheduling is to reduce time complexity and manpower. Several traditional techniques exist for processor scheduling, but their performance degrades when the volume of tasks to be processed is huge, and most scheduling problems are NP-hard in nature. GPU scheduling is itself a complex issue, as a GPU runs thousands of threads in parallel that need to be scheduled efficiently. For such large-scale scheduling problems, the performance of state-of-the-art algorithms is poor. It is observed that evolutionary and genetic-based algorithms exhibit better performance for large-scale combinatorial and internet of things (IoT) problems.
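As a hedged sketch of the evolutionary approach the chapter points to, the toy genetic algorithm below assigns tasks to processors to minimize makespan. Problem sizes, rates, and operators are illustrative assumptions, not the chapter's configuration.

```python
import random

TASKS = [random.randint(1, 20) for _ in range(30)]   # task costs
PROCESSORS = 4

def makespan(assignment):
    """Finishing time of the busiest processor under a task-to-processor map."""
    loads = [0] * PROCESSORS
    for task, proc in zip(TASKS, assignment):
        loads[proc] += task
    return max(loads)

def evolve(pop_size=60, generations=200, mutation_rate=0.05):
    pop = [[random.randrange(PROCESSORS) for _ in TASKS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=makespan)
        parents = pop[: pop_size // 2]               # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(len(TASKS))
            child = a[:cut] + b[cut:]                # one-point crossover
            child = [random.randrange(PROCESSORS) if random.random() < mutation_rate
                     else gene for gene in child]    # per-gene mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=makespan)

best = evolve()
print("best makespan:", makespan(best), "lower bound:", sum(TASKS) / PROCESSORS)
```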

