High Performance BLAST Over the Grid

Author(s):  
Vincent Breton ◽  
Eddy Caron ◽  
Frederic Desprez ◽  
Gael Le Mahec

As grids become more and more attractive for solving complex problems with high computational and storage requirements, bioinformatics applications are starting to be ported to large-scale platforms. The BLAST kernel, one of the cornerstones of high-performance genomics, was one of the first applications ported to such platforms. However, while a simple parallelization was enough for a first proof of concept, its use on production platforms requires more optimized algorithms. In this chapter, we review existing parallelization and “gridification” approaches as well as related issues such as data management and replication, and present a case study using the DIET middleware over the Grid’5000 experimental platform.
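The chapter's DIET/Grid'5000 deployment is not reproduced here, but the basic idea behind most BLAST parallelizations is database segmentation: split the reference database into fragments and search them independently. A minimal sketch follows, assuming NCBI BLAST+ is installed and the database has already been split into hypothetical fragments db_00 to db_03.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor

DB_FRAGMENTS = [f"db_{i:02d}" for i in range(4)]  # hypothetical fragment names

def run_blast(fragment, query="query.fa"):
    """Run blastn against one database fragment and return its tabular output."""
    result = subprocess.run(
        ["blastn", "-query", query, "-db", fragment, "-outfmt", "6"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Fragments are searched independently and the hits merged afterwards,
    # which is what makes the approach easy to distribute across grid nodes.
    with ProcessPoolExecutor() as pool:
        merged = "".join(pool.map(run_blast, DB_FRAGMENTS))
    with open("merged_hits.tsv", "w") as out:
        out.write(merged)
```

In practice, searches against fragments also need their statistics corrected for the full database size (BLAST+ exposes an effective database size option for this), which is one of the details that optimized, production "gridified" BLAST implementations have to handle.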

2020 ◽  
Vol 10 (7) ◽  
pp. 2634
Author(s):  
JunWeon Yoon ◽  
TaeYoung Hong ◽  
ChanYeol Park ◽  
Seo-Young Noh ◽  
HeonChang Yu

High-performance computing (HPC) uses many distributed computing resources to solve large computational science problems through parallel computation. Such an approach can reduce overall job execution time and increase the capacity for solving large-scale and complex problems. On a supercomputer, the job scheduler, the HPC platform’s flagship tool, is responsible for distributing and managing the resources of large systems. In this paper, we analyze the execution log of the job scheduler over a certain period of time and propose an optimization approach to reduce the idle time of jobs. Our experiments show that the main root cause of job delay is resource waiting: the execution time of the entire job is significantly delayed because idle resources must accumulate before a large-scale job can be started. A backfilling algorithm can exploit these idle resources and help to reduce job execution time. We therefore propose a backfilling algorithm that can be applied to the supercomputer, and our experimental results show that the overall execution time is reduced.
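The core idea of backfilling is that smaller jobs may jump ahead of a blocked head-of-queue job as long as they do not delay it. The sketch below is a minimal illustration of that rule, not the paper's scheduler or any production implementation (e.g., SLURM or PBS); jobs are simplified to (id, node count, walltime).

```python
from collections import namedtuple

Job = namedtuple("Job", "jid nodes walltime")

def backfill(queue, free_nodes, now, head_start_time):
    """Return waiting jobs that can start now without delaying the head job."""
    started = []
    for job in list(queue):
        fits = job.nodes <= free_nodes
        ends_in_time = now + job.walltime <= head_start_time
        if fits and ends_in_time:
            started.append(job)
            free_nodes -= job.nodes
            queue.remove(job)
    return started

# Example: 8 idle nodes, the 32-node head job can only start at t = 100.
waiting = [Job("j2", 4, 50), Job("j3", 16, 10), Job("j4", 2, 200)]
print(backfill(waiting, free_nodes=8, now=0, head_start_time=100))
# -> j2 is backfilled; j3 needs too many nodes, j4 would delay the head job.
```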


2013 ◽  
Vol 831 ◽  
pp. 276-281
Author(s):  
Ya Jie Ma ◽  
Zhi Jian Mei ◽  
Xiang Chuan Tian

Large-scale sensor networks are systems in which a large number of high-throughput autonomous sensor nodes are distributed over wide areas. Much attention has been paid to providing efficient data management in such systems. A sensor grid brings low-cost, high-performance computing to physical-world data perceived through sensors. This article analyses the challenges that large-scale air pollution data management poses to a real-time sensor grid. A sensor grid architecture for pollution data management is proposed, and the processing of the service-oriented grid management is described in pseudocode. A simulation experiment investigates the performance of data management in such a system.
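The paper's pseudocode is not reproduced here. As a hedged illustration of the kind of service-side processing such an architecture implies, the sketch below ingests pollution readings from sensor nodes and filters out faulty values with a simple median-absolute-deviation check before storage; the data layout and threshold are assumptions for the example.

```python
import statistics

def ingest(readings, k=10.0):
    """Drop outlier PM2.5 readings using a median absolute deviation filter."""
    values = [r["pm25"] for r in readings]
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1.0
    return [r for r in readings if abs(r["pm25"] - med) <= k * mad]

readings = [
    {"node": "n1", "pm25": 42.0}, {"node": "n2", "pm25": 44.5},
    {"node": "n3", "pm25": 41.3}, {"node": "n4", "pm25": 900.0},  # faulty sensor
]
print(ingest(readings))  # the 900.0 reading is filtered out before storage
```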


2014 ◽  
Vol 9 (2) ◽  
pp. 17-27 ◽  
Author(s):  
Ritu Arora ◽  
Maria Esteva ◽  
Jessica Trelogan

The process of developing a digital collection in the context of a research project often involves a pipeline pattern during which data growth, data types, and data authenticity need to be assessed iteratively in relation to the different research steps and in the interest of archiving. Throughout a project’s lifecycle, curators organize newly generated data while cleaning and integrating legacy data when it exists, and deciding what data will be preserved for the long term. Although these actions should be part of a well-oiled data management workflow, there are practical challenges in doing so if the collection is very large and heterogeneous, or is accessed by several researchers concurrently. There is a need for data management solutions that can help curators with efficient and on-demand analyses of their collection so that they remain well informed about its evolving characteristics. In this paper, we describe our efforts towards developing a workflow to leverage open science High Performance Computing (HPC) resources for routinely and efficiently conducting data management tasks on large collections. We demonstrate that HPC resources and techniques can significantly reduce the time for accomplishing critical data management tasks and enable dynamic archiving throughout the research process. We use a large archaeological data collection with a long and complex formation history as our test case. We share our experiences in adopting open science HPC resources for large-scale data management, which entails understanding usage of the open source HPC environment and training users. These experiences can be generalized to meet the needs of other data curators working with large collections.
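The authors' workflow is not reproduced here. As a hedged illustration of the kind of routine curation task that benefits from HPC parallelism, the sketch below computes fixity checksums (SHA-256) over a collection by fanning file hashing out across the cores of a node; the path and report format are assumptions for the example.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def checksum(path):
    """Stream a file through SHA-256 so large files never fully load in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return str(path), digest.hexdigest()

def fixity_report(root):
    """Checksum every file under root in parallel, one worker per core."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    with ProcessPoolExecutor() as pool:
        return dict(pool.map(checksum, files))

# fixity_report("/path/to/collection") -> {"path/to/file": "sha256...", ...}
```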


2019 ◽  
Vol 8 (10) ◽  
pp. 24851-24854
Author(s):  
Hewa Majeed Zangana

Nowadays, more and more organizations are realizing the importance of their data, because it is an important asset in nearly all business and organizational processes. The Information Technology Division (ITD) is the department of the International Islamic University Malaysia (IIUM) that consolidates efforts in providing IT services to the university. The university’s data management started with decentralized units, where each center or division had its own hardware and database system. It later became centralized, and ITD is now trying to apply one policy across the whole university, which should improve the performance of data management at the university. A visit was made to the ITD building, and a presentation was conducted discussing many issues concerning data management quality maturity in the IT division at IIUM. We noted issues such as the server room’s location, the power supply and backup, and the existence of redundant data. These issues are discussed in detail in the following sections of this paper, and some recommendations are suggested to improve data quality at the university. Data quality is very important in decision making, especially for a university that is trying to improve its strategy towards becoming a research university and raise its position in the World University Rankings.


2019 ◽  
Vol 8 (6) ◽  
pp. e12861023 ◽  
Author(s):  
Pedro Junior Zucatelli ◽  
Ana Paula Meneguelo ◽  
Gisele de Lorena Diniz Chaves ◽  
Marielce de Cassia Ribeiro Tosta

The integrity of natural systems is already at risk because of climate change caused by intense emissions of greenhouse gases into the atmosphere. The goal of geological carbon sequestration is to capture, transport, and store CO2 in appropriate geological formations. In this review, we address the geological environments suited to CCS (Carbon Capture and Storage) projects, the phases that make up these projects, and their associated investment and operating costs. Furthermore, we present calculations of the estimated financial profitability of different types of projects in Brazil. Using mathematical models, we conclude that the Roncador field presents the highest gross revenue when the amount of extra oil that can be retrieved is 9.3% (approximately US$ 48.55 billion in 2018). Additional calculations show that the Paraná saline aquifer has the highest gross revenue (US$ 6.90 trillion in 2018) compared to the Solimões (approximately US$ 3.76 trillion in 2018) and Santos saline aquifers (approximately US$ 2.21 trillion in 2018) if a CCS project were to be employed. Therefore, the Carbon Capture and Storage method proposed in this study is an important scientific contribution towards reliable large-scale CO2 storage in Brazil.
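The paper's economic model is not reproduced here. As a generic back-of-the-envelope sketch only, gross revenue from CO2-enhanced oil recovery is often estimated as the incremental oil recovered times the oil price; all inputs below are hypothetical placeholders, not the Roncador figures.

```python
def eor_gross_revenue(oil_in_place_bbl, extra_recovery_fraction, oil_price_usd_per_bbl):
    """Gross revenue = incremental barrels recovered * price per barrel."""
    return oil_in_place_bbl * extra_recovery_fraction * oil_price_usd_per_bbl

# Hypothetical example: 10 billion bbl in place, 9.3% extra recovery, US$ 65/bbl.
print(f"US$ {eor_gross_revenue(10e9, 0.093, 65.0):,.0f}")
```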


Author(s):  
Pankaj Lathar ◽  
K. G. Srinivasa ◽  
Abhishek Kumar ◽  
Nabeel Siddiqui

Advancements in web-based technology and the proliferation of sensors and mobile devices interacting with the internet have resulted in immense data management requirements. These requirements include the storage and processing of big data and the demand for high-performance read-write operations on it. Large-scale, high-concurrency applications such as social networking services (SNS) and search engines face challenges in using relational databases to store and query dynamic user data. NoSQL and cloud computing have emerged as paradigms that can meet these requirements. The diversity of existing NoSQL and cloud computing solutions makes it difficult to comprehend the domain and choose an appropriate solution for a specific business task. Therefore, this chapter reviews NoSQL and cloud-system-based solutions with the goal of providing a perspective on the field of data storage technologies and algorithms, offering guidance to researchers and practitioners in selecting the best-fit data store, and identifying challenges and opportunities of the paradigm.
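As a minimal sketch of the key-value side of the NoSQL landscape discussed above, the example below stores a schema-less user profile as a JSON document under a single key. It assumes a local Redis server and the redis-py client, chosen here only for illustration rather than being the chapter's own case study.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Dynamic, schema-less user data that is awkward to fit into a fixed
# relational schema can be stored as a JSON document under one key.
profile = {"user": "alice", "followers": 1204, "interests": ["hpc", "bio"]}
r.set("user:alice", json.dumps(profile))

print(json.loads(r.get("user:alice"))["followers"])  # fast read by key
```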


Author(s):  
Annu Priya ◽  
Sudip Kumar Sahana

Processor scheduling is one of the thrust areas in the field of computer science. Future technologies, such as large games, programming software, and quantum computing, require a huge amount of processing for the execution of their tasks, and many complex problems are now solved in real time by GPU programming. The primary concern of scheduling is to reduce time complexity and manpower. Several traditional techniques exist for processor scheduling, but their performance degrades when the volume of tasks to be processed is huge, and most scheduling problems are NP-hard in nature. GPU scheduling is itself a complex issue, as a GPU runs thousands of threads in parallel that need to be scheduled efficiently. For such large-scale scheduling problems, the performance of state-of-the-art algorithms is poor. It is observed that evolutionary and genetic-based algorithms exhibit better performance for large-scale combinatorial and internet of things (IoT) problems.
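As a hedged sketch of the evolutionary approach the chapter points to, the toy genetic algorithm below assigns tasks to processors to minimize makespan. Problem sizes, rates, and operators are illustrative assumptions, not the chapter's configuration.

```python
import random

TASKS = [random.randint(1, 20) for _ in range(30)]   # task costs
PROCESSORS = 4

def makespan(assignment):
    """Finishing time of the busiest processor under a task-to-processor map."""
    loads = [0] * PROCESSORS
    for task, proc in zip(TASKS, assignment):
        loads[proc] += task
    return max(loads)

def evolve(pop_size=60, generations=200, mutation_rate=0.05):
    pop = [[random.randrange(PROCESSORS) for _ in TASKS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=makespan)
        parents = pop[: pop_size // 2]               # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(len(TASKS))
            child = a[:cut] + b[cut:]                # one-point crossover
            child = [random.randrange(PROCESSORS) if random.random() < mutation_rate
                     else gene for gene in child]    # per-gene mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=makespan)

best = evolve()
print("best makespan:", makespan(best), "lower bound:", sum(TASKS) / PROCESSORS)
```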

