Using High Performance Computing for Detecting Duplicate, Similar and Related Images in a Large Data Collection

Author(s):  
Ritu Arora ◽  
Jessica Trelogan ◽  
Trung Nguyen Ba


Symmetry ◽
2020 ◽  
Vol 12 (6) ◽  
pp. 1029
Author(s):  
Anabi Hilary Kelechi ◽  
Mohammed H. Alsharif ◽  
Okpe Jonah Bameyi ◽  
Paul Joan Ezra ◽  
Iorshase Kator Joseph ◽  
...  

Power-consuming entities such as high-performance computing (HPC) sites and large data centers are growing with advances in information technology. In business, HPC is used to shorten product delivery times, reduce production costs, and decrease the time needed to develop new products. Today's high level of computing power from supercomputers, however, comes at the cost of large amounts of electric power. To minimize the energy used by HPC entities, it is necessary to reduce both the energy required by the computing systems themselves and the resources needed to operate them. System energy efficiency can be improved by sampling the power consumption of all components at regular intervals and storing the measurements in a database; the stored information then serves as input data for energy-efficiency optimization. Device workload information and other usage metrics are stored in the database as well. Artificial intelligence (AI) has gained strong momentum as a tool for optimization and process automation that leverages existing information. This paper discusses ideas for improving the energy efficiency of HPC using AI.
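
As a concrete illustration of the database-driven approach the abstract describes, below is a minimal Python sketch that samples per-component power readings at a fixed interval and appends them to a SQLite table for later analysis. The component names, the sampling interval, and the `read_power_watts` helper are all hypothetical placeholders, not part of the paper; a real HPC site would pull readings from vendor telemetry such as IPMI or RAPL.

```python
import sqlite3
import time
import random  # stands in for real sensor reads in this sketch

# Hypothetical components; a real deployment would enumerate actual hardware.
COMPONENTS = ["cpu0", "cpu1", "gpu0", "dram"]
SAMPLE_INTERVAL_S = 10  # sampling period; tune to the site's needs


def read_power_watts(component: str) -> float:
    """Placeholder sensor read; substitute IPMI/RAPL/vendor telemetry."""
    return random.uniform(50.0, 300.0)


def main(n_samples: int = 6) -> None:
    conn = sqlite3.connect("power_log.db")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS power_samples (
               ts REAL, component TEXT, watts REAL)"""
    )
    for _ in range(n_samples):
        now = time.time()
        rows = [(now, c, read_power_watts(c)) for c in COMPONENTS]
        conn.executemany("INSERT INTO power_samples VALUES (?, ?, ?)", rows)
        conn.commit()  # persist each sampling round immediately
        time.sleep(SAMPLE_INTERVAL_S)
    conn.close()


if __name__ == "__main__":
    main()
```

The accumulated table is exactly the kind of input data the abstract envisions for an AI-based optimizer: per-component time series that can be joined with workload metrics to model and reduce energy use.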


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 377 ◽  
Author(s):  
Adam P. Cribbs ◽  
Sebastian Luna-Valero ◽  
Charlotte George ◽  
Ian M. Sudbery ◽  
Antonio J. Berlanga-Taylor ◽  
...  

In the genomics era, computational biologists regularly need to process, analyse and integrate large and complex biomedical datasets. Analysis inevitably involves multiple dependent steps, resulting in complex pipelines or workflows, often with several branches. Large data volumes mean that processing needs to be quick and efficient, and scientific rigour requires that analysis be consistent and fully reproducible. We have developed CGAT-core, a Python package for the rapid construction of complex computational workflows. CGAT-core seamlessly handles parallelisation across high-performance computing clusters, integration of Conda environments, full parameterisation, database integration and logging. To illustrate our workflow framework, we present a pipeline for the analysis of RNA-seq data using pseudo-alignment.
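
To give a flavour of the framework, here is a minimal sketch of a CGAT-core pipeline following the pattern shown in the cgatcore documentation: a Ruffus-decorated task whose shell statement is dispatched through `P.run`, which handles cluster submission and logging. The file names and the line-counting task are illustrative assumptions, not the RNA-seq pipeline from the paper.

```python
"""Minimal CGAT-core pipeline sketch: count lines in gzipped FASTQ files."""
import sys

from ruffus import suffix, transform
from cgatcore import pipeline as P

# Load configuration (e.g. pipeline.yml) if present; enables parameterisation.
PARAMS = P.get_parameters(["pipeline.yml"])


@transform("*.fastq.gz",          # input files matched by glob
           suffix(".fastq.gz"),   # strip this suffix from each input...
           ".nreads")             # ...and write one output per input
def count_reads(infile, outfile):
    # The statement is rendered from local variables and submitted to the
    # cluster (or executed locally) by CGAT-core, with logging handled for us.
    statement = "zcat %(infile)s | wc -l > %(outfile)s"
    P.run(statement)


if __name__ == "__main__":
    sys.exit(P.main(sys.argv))
```

Assuming a script named pipeline_example.py, such a pipeline would typically be invoked with a command like `python pipeline_example.py make count_reads`, letting CGAT-core resolve task dependencies and parallelise the work.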


MRS Bulletin ◽  
1997 ◽  
Vol 22 (10) ◽  
pp. 5-6
Author(s):  
Horst D. Simon

Recent events in the high-performance computing industry have raised concern among scientists and the general public about a crisis, or a lack of leadership, in the field. That concern is understandable given the industry's history from 1993 to 1996. Cray Research, the historic leader in supercomputing technology, was unable to survive financially as an independent company and was acquired by Silicon Graphics. Thinking Machines and Kendall Square Research, two ambitious new companies that introduced new technologies in the late 1980s and early 1990s, were commercial failures and went out of business. And Intel, which introduced its Paragon supercomputer in 1994, discontinued production only two years later.

During the same time frame, scientists who had finished the laborious task of writing scientific codes to run on vector parallel supercomputers learned that those codes would have to be rewritten to run on the next-generation, highly parallel architectures. Scientists not yet involved in high-performance computing are understandably hesitant to commit their time and energy to such an apparently unstable enterprise.

However, beneath the commercial chaos of the last several years, a technological revolution has been taking place. The good news is that the revolution is over, ushering in five to ten years of predictable stability, steady improvements in system performance, and increased productivity for scientific applications. It is time for scientists who have been sitting on the fence to jump in and reap the benefits of the new technology.

