The carbon footprint of bioinformatics

2021 ◽  
Author(s):  
Jason Grealey ◽  
Loïc Lannelongue ◽  
Woei-Yuh Saw ◽  
Jonathan Marten ◽  
Guillaume Meric ◽  
...  

Abstract Bioinformatic research relies on large-scale computational infrastructures which have a non-zero carbon footprint. So far, no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this study, we estimate the bioinformatic carbon footprint (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org). We assess (i) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics and molecular simulations, as well as (ii) computation strategies, such as parallelisation, CPU (central processing unit) vs. GPU (graphics processing unit), cloud vs. local computing infrastructure and geography. In particular, for GWAS, we found that biobank-scale analyses emitted substantial kgCO2e and that simple software upgrades could make GWAS greener, e.g. upgrading from BOLT-LMM v1 to v2.3 reduced carbon footprint by 73%. Switching from the average data centre to a more efficient one can reduce carbon footprint by ~34%. Memory over-allocation can be a substantial contributor to an algorithm’s carbon footprint. The use of faster processors or greater parallelisation reduces run time but can lead to a greater, sometimes substantially greater, carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimise kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions which empower a move toward greener research.
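The relationship between run time, hardware, memory allocation and data-centre efficiency described above can be illustrated with a rough sketch. The model form below (run time × power draw × data-centre overhead × grid carbon intensity) is an assumption about how a calculator of this kind might work, not a description of the Green Algorithms internals. The `Job` class, its field names and every default value are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Job:
    """Hypothetical description of one compute job (all fields are assumptions)."""
    runtime_h: float                       # wall-clock run time in hours
    n_cores: int                           # number of CPU cores reserved
    core_power_w: float                    # power draw per core in watts
    core_usage: float                      # average core utilisation, 0..1
    memory_gb: float                       # memory *allocated*, not memory used
    memory_power_w_per_gb: float = 0.4     # placeholder value
    pue: float = 1.6                       # data-centre overhead factor, placeholder
    carbon_intensity_g_per_kwh: float = 500.0  # grid intensity, placeholder


def energy_kwh(job: Job) -> float:
    """Energy drawn by the job, including data-centre overhead."""
    cpu_w = job.n_cores * job.core_power_w * job.core_usage
    mem_w = job.memory_gb * job.memory_power_w_per_gb
    return job.runtime_h * (cpu_w + mem_w) * job.pue / 1000.0


def footprint_kgco2e(job: Job) -> float:
    """Carbon footprint in kgCO2e under the assumed model."""
    return energy_kwh(job) * job.carbon_intensity_g_per_kwh / 1000.0


if __name__ == "__main__":
    base = Job(runtime_h=24, n_cores=16, core_power_w=12, core_usage=1.0, memory_gb=64)
    over_allocated = replace(base, memory_gb=512)   # same work, more memory reserved
    efficient_site = replace(base, pue=1.1)         # same work, leaner data centre
    for name, job in [("base", base), ("over-allocated", over_allocated),
                      ("efficient site", efficient_site)]:
        print(f"{name}: {footprint_kgco2e(job):.2f} kgCO2e")
```

Under this model the footprint scales linearly with the overhead factor and with allocated memory, which is one way to see why over-allocation and data-centre choice matter even when the analysis itself is unchanged.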

2018 ◽  
Vol 7 (12) ◽  
pp. 472 ◽  
Author(s):  
Bo Wan ◽  
Lin Yang ◽  
Shunping Zhou ◽  
Run Wang ◽  
Dezhi Wang ◽  
...  

The road-network matching method is an effective tool for map integration, fusion, and update. Due to the complexity of road networks in the real world, matching methods often contain a series of complicated processes to identify homonymous roads and deal with their intricate relationships. However, traditional road-network matching algorithms, which are mainly central processing unit (CPU)-based approaches, may hit performance bottlenecks when facing big data. We developed a particle-swarm optimization (PSO)-based parallel road-network matching method on a graphics-processing unit (GPU). Based on the characteristics of the two main stages (similarity computation and matching-relationship identification), data-partition and task-partition strategies were utilized, respectively, to fully use GPU threads. Experiments were conducted on datasets with 14 different scales. Results indicate that the parallel PSO-based matching algorithm (PSOM) could correctly identify most matching relationships with an average accuracy of 84.44%, which was at the same level as the accuracy of a benchmark, the probability-relaxation-matching (PRM) method. The PSOM approach significantly reduced the road-network matching time in dealing with large amounts of data in comparison with the PRM method. This paper provides a common parallel algorithm framework for road-network matching algorithms and contributes to the integration and update of large-scale road networks.


2021 ◽  
Vol 8 ◽  
Author(s):  
Gabriela Canalli Kretzschmar ◽  
Nina Moura Alencar ◽  
Saritha Suellen Lopes da Silva ◽  
Carla Daniela Sulzbach ◽  
Caroline Grisbach Meissner ◽  
...  

Several genome-wide association studies (GWAS) have been carried out with late-onset Alzheimer’s disease (LOAD), mainly in European and Asian populations. Different polymorphisms were associated, but several of them lack a functional explanation. GWAS are fundamental for identifying loci associated with diseases, although they often do not point to causal polymorphisms. In this sense, functional investigations are a fundamental tool for discovering causality, although the failure of this validation does not necessarily indicate non-causality. Furthermore, the allele frequency of associated genetic variants may vary widely between populations, requiring replication of these associations in other ethnicities. Our study therefore sought to replicate, in 150 AD patients and 114 elderly controls from the South Brazilian population, 18 single-nucleotide polymorphisms (SNPs) associated with AD in European GWAS, with further functional investigation using bioinformatic tools for the associated SNPs. Of the 18 SNPs investigated, only four were associated in our population: rs769449 (APOE), rs10838725 (CELF1), rs6733839, and rs744373 (BIN1–CYP27C1). We identified 54 variants in linkage disequilibrium (LD) with the associated SNPs, most of which act as expression or splicing quantitative trait loci (eQTLs/sQTLs) in genes previously associated with AD or with a possible functional role in the disease, such as CELF1, MADD, MYBPC3, NR1H3, NUP160, SPI1, and TOMM40. Interestingly, eight of these variants are located within long non-coding RNA (lncRNA) genes that have not been previously investigated regarding AD. Some of these polymorphisms can result in changes in these lncRNAs’ secondary structures, leading to either loss or gain of microRNA (miRNA)-binding sites, deregulating downstream pathways.
Our pioneering work not only replicated LOAD association with polymorphisms not yet associated in the Brazilian population but also identified six possible lncRNAs that may interfere with LOAD development. The results lead us to emphasize the importance of functional exploration of associations found in large-scale association studies in different populations as a basis for personalized and inclusive medicine in the future.


2019 ◽  
Author(s):  
Roy Ben-Shalom ◽  
Nikhil S. Artherya ◽  
Alexander Ladd ◽  
Christopher Cross ◽  
Hersh Sanghevi ◽  
...  

Abstract The membrane potential of individual neurons depends on a large number of interacting biophysical processes operating on spatial-temporal scales spanning several orders of magnitude. The multi-scale nature of these processes dictates that accurate prediction of membrane potentials in specific neurons requires utilization of detailed simulations. Unfortunately, constraining parameters within biologically detailed neuron models can be difficult, leading to poor model fits. This obstacle can be overcome partially by numerical optimization or detailed exploration of parameter space. However, these processes, which currently rely on central processing unit (CPU) computation, often incur exponential increases in computing time for marginal improvements in model behavior. As a result, model quality is often compromised to accommodate compute resources. Here, we present a simulation environment, NeuroGPU, that takes advantage of the inherent parallelized structure of graphics processing units (GPUs) to accelerate neuronal simulation. NeuroGPU can simulate most biologically detailed models 800x faster than traditional simulators when using multiple GPU cores, and even 10-200 times faster when implemented on relatively inexpensive GPU systems. We demonstrate the power of NeuroGPU through large-scale parameter exploration to reveal the response landscape of a neuron. Finally, we accelerate numerical optimization of biophysically detailed neuron models to achieve highly accurate fitting of models to simulation and experimental data. Thus, NeuroGPU enables the rapid simulation of multi-compartment, biophysically detailed neuron models on commonly used computing systems accessible by many scientists.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
James M. Kunert-Graf ◽  
Nikita A. Sakhanenko ◽  
David J. Galas

Abstract Background Permutation testing is often considered the “gold standard” for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large. Results In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP–SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based on information theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10^3 for a typical permutation test compared to the naive approach. Additionally, this approach is insensitive to the number of samples, making it suitable for datasets with a large number of samples. Conclusions The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies. Moreover, our approach is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts.
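For orientation, the naive baseline that the abstract contrasts against can be sketched as follows. This is not the authors' count-table transformation and is not taken from their repository. It is a minimal illustration, assuming mutual information as the count-table-based measure, in which the count tables are rebuilt from scratch after every shuffle of the phenotype labels. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (bits) computed from the joint count table of xs and ys."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # building these tables is the costly step
    px = Counter(xs)
    py = Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def naive_permutation_pvalue(genotypes, phenotypes, n_perm=1000, seed=0):
    """Naive permutation test: recount the full table after every label shuffle.

    `genotypes` may hold single-SNP values or (snp1, snp2) tuples, so the same
    code covers a SNP-SNP-phenotype interaction measure.
    """
    rng = random.Random(seed)
    observed = mutual_information(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    pairs = [(0, 0)] * 10 + [(1, 1)] * 10   # toy two-SNP genotype combinations
    pheno = [0] * 10 + [1] * 10             # toy binary phenotype
    print(naive_permutation_pvalue(pairs, pheno, n_perm=200))
```

Each iteration here costs time proportional to the number of samples, which is the dependence the abstract reports removing by transforming the count tables directly.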


Author(s):  
Wisoot Sanhan ◽  
Kambiz Vafai ◽  
Niti Kammuang-Lue ◽  
Pradit Terdtoon ◽  
Phrut Sakulchangsatjatai

Abstract An investigation of the effect of the thermal performance of the flattened heat pipe on its double heat sources acting as central processing unit and graphics processing unit in laptop computers is presented in this work. A finite element method is used for predicting the flattening effect of the heat pipe. The cylindrical heat pipe with a diameter of 6 mm and a total length of 200 mm is flattened into three final thicknesses of 2, 3, and 4 mm. The heat pipe is placed in a horizontal configuration and heated with heater 1 and heater 2, 40 W in combination. The numerical model shows good agreement with the experimental data, with a standard deviation of 1.85%. The results also show that flattening the cylindrical heat pipe to 66.7 and 41.7% of its original diameter could reduce its normalized thermal resistance by 5.2%. The optimized final thickness or the best design final thickness for the heat pipe is found to be 2.5 mm.


2016 ◽  
Vol 27 (9) ◽  
pp. 2657-2673 ◽  
Author(s):  
Mathieu Emily

The Cochran-Armitage trend test (CA) has become a standard procedure for association testing in large-scale genome-wide association studies (GWAS). However, when the disease model is unknown, there is no consensus on the most powerful test to be used between CA, allelic, and genotypic tests. In this article, we tackle the question of whether CA is best suited to single-locus scanning in GWAS and propose a power comparison of CA against allelic and genotypic tests. Our approach relies on the evaluation of the Taylor decompositions of non-centrality parameters, thus allowing an analytical comparison of the power functions of the tests. Compared to simulation-based comparison, our approach offers the advantage of simultaneously accounting for the multidimensionality of the set of features involved in power functions. Although power for CA depends on the sample size, the case-to-control ratio and the minor allelic frequency (MAF), our results first show that it is largely influenced by the mode of inheritance and a deviation from Hardy–Weinberg Equilibrium (HWE). Furthermore, when compared to other tests, CA is shown to be the most powerful test under a multiplicative disease model or when the single-nucleotide polymorphism largely deviates from HWE. In all other situations, CA lacks in power and differences can be substantial, especially for the recessive mode of inheritance. Finally, our results are illustrated by the comparison of the performances of the statistics in two genome scans.
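As a concrete reference point for the CA test discussed above, the following sketch computes the standard textbook form of the trend statistic for a 2 × 3 genotype table. It is an illustration only, not the analytical power comparison developed in the article. The function name and the default additive weights (0, 1, 2) are assumptions made for this example.

```python
import math


def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts.

    cases, controls: counts per genotype (e.g. AA, Aa, aa).
    weights: scores per genotype; (0, 1, 2) is the usual additive coding.
    Returns (chi2, p) with p from a chi-square distribution with 1 df.
    """
    R = sum(cases)                      # total cases
    S = sum(controls)                   # total controls
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * n for w, n in zip(weights, totals))
    sum_w2n = sum(w * w * n for w, n in zip(weights, totals))
    denom = R * S * (N * sum_w2n - sum_wn ** 2)
    if denom == 0:
        raise ValueError("degenerate table: the trend statistic is undefined")
    chi2 = N * (N * sum_wr - R * sum_wn) ** 2 / denom
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


if __name__ == "__main__":
    chi2, p = cochran_armitage(cases=(10, 20, 30), controls=(30, 20, 10))
    print(f"chi2 = {chi2:.3f}, p = {p:.3g}")
```

Changing `weights` to (0, 0, 1) or (0, 1, 1) gives the scores conventionally used for recessive and dominant coding, which is a simple way to explore the sensitivity to the mode of inheritance that the abstract highlights.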


Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high performance computing systems achieve their status through use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and graphics processing unit-accelerated large scale supercomputers.


2018 ◽  
Vol 35 (14) ◽  
pp. 2512-2514 ◽  
Author(s):  
Bongsong Kim ◽  
Xinbin Dai ◽  
Wenchao Zhang ◽  
Zhaohong Zhuang ◽  
Darlene L Sanchez ◽  
...  

Abstract Summary We present GWASpro, a high-performance web server for the analyses of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data, coupled with complex replicated experimental designs such as found in plant science investigations and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times, can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators. Availability and implementation GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. Supplementary information Supplementary data are available at Bioinformatics online.


2012 ◽  
Vol 215 (1) ◽  
pp. 17-28 ◽  
Author(s):  
Georg Homuth ◽  
Alexander Teumer ◽  
Uwe Völker ◽  
Matthias Nauck

The metabolome, defined as the reflection of metabolic dynamics derived from parameters measured primarily in easily accessible body fluids such as serum, plasma, and urine, can be considered as the omics data pool that is closest to the phenotype because it integrates genetic influences as well as nongenetic factors. Metabolic traits can be related to genetic polymorphisms in genome-wide association studies, enabling the identification of underlying genetic factors, as well as to specific phenotypes, resulting in the identification of metabolome signatures primarily caused by nongenetic factors. Similarly, correlation of metabolome data with transcriptional and/or proteome profiles of blood cells also produces valuable data, by revealing associations between metabolic changes and mRNA and protein levels. In recent years, the progress in correlating genetic variation and metabolome profiles has been most impressive. This review will therefore try to summarize the most important of these studies and give an outlook on future developments.

