Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services

2020 ◽  
Vol 27 (9) ◽  
pp. 1425-1430
Author(s):  
Inès Krissaane ◽  
Carlos De Niz ◽  
Alba Gutiérrez-Sacristán ◽  
Gabor Korodi ◽  
Nneka Ede ◽  
...  

Abstract Objective: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. Methods: We provide a straightforward and innovative methodology to optimize cloud configuration for conducting genome-wide association studies. We used Spark clusters on both Google Cloud Platform and Amazon Web Services, together with Hail (http://doi.org/10.5281/zenodo.2646680), to analyze and explore genomic variant datasets. Results: Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics. Conclusions: We present a timely piece addressing one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost?
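
The speed-versus-cost question raised in this abstract can be illustrated with a minimal sketch. It assumes a simple billing model (workers × hourly price × runtime) and selects the configurations that no other configuration beats on both runtime and cost. The class `ClusterRun`, the function `pareto_front`, and every number below are hypothetical placeholders, not figures or methods from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRun:
    """One benchmarked cluster configuration (all values hypothetical)."""
    name: str
    workers: int
    price_per_worker_hour: float  # assumed flat hourly price in USD
    runtime_hours: float

    @property
    def cost(self) -> float:
        # Assumed billing model: every worker is billed for the full runtime.
        return self.workers * self.price_per_worker_hour * self.runtime_hours


def pareto_front(runs):
    """Return runs not dominated on (runtime, cost), fastest first."""
    front = []
    for r in runs:
        dominated = any(
            o.runtime_hours <= r.runtime_hours
            and o.cost <= r.cost
            and (o.runtime_hours < r.runtime_hours or o.cost < r.cost)
            for o in runs
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.runtime_hours)


# Made-up measurements for illustration only.
runs = [
    ClusterRun("small", 4, 0.20, 10.0),
    ClusterRun("medium", 16, 0.20, 3.0),
    ClusterRun("large", 64, 0.20, 1.5),
    ClusterRun("wasteful", 64, 0.20, 4.0),  # slower and pricier than "medium"
]

for run in pareto_front(runs):
    print(f"{run.name}: {run.runtime_hours:.1f} h, ${run.cost:.2f}")
```

Under these made-up numbers, the "wasteful" configuration drops out because "medium" is both faster and cheaper; the remaining configurations represent genuine trade-offs between runtime and cost.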

2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Gabriel Costa Monteiro Moreira ◽  
Clarissa Boschiero ◽  
Aline Silva Mello Cesar ◽  
James M. Reecy ◽  
Thaís Fernanda Godoy ◽  
...  

2020 ◽  
Author(s):  
Yixin An ◽  
Lin Chen ◽  
Yongxiang Li ◽  
Chunhui Li ◽  
Yunsu Shi ◽  
...  

Abstract Background: Kernel row number (KRN) is an important trait for the domestication and improvement of maize. Exploring the genetic basis of KRN has great research significance and can provide valuable information for molecular-assisted selection. Results: In this study, one single-locus method (MLM) and six multi-locus methods (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB and ISIS EM-BLASSO) of genome-wide association studies (GWASs) were used to identify significant quantitative trait nucleotides (QTNs) for KRN in an association panel of 639 maize inbred lines genotyped with the MaizeSNP50 BeadChip. Across three phenotyping environments and with best linear unbiased prediction (BLUP) values, the seven GWAS methods revealed different numbers of KRN-associated QTNs, ranging from 11 to 177. Based on these results, seven important regions for KRN, located on chromosomes 1, 2, 3, 5, 9, and 10, were identified by at least three methods and in at least two environments. Moreover, 49 genes from the seven regions were expressed in different maize tissues. Among the 49 genes, ARF29 (Zm00001d026540, encoding auxin response factor 29) and CKO4 (Zm00001d043293, encoding cytokinin oxidase protein) were significantly related to KRN based on expression analysis and candidate gene association mapping. Whole-genome prediction (WGP) for KRN was also performed, and we found that the KRN-associated tagSNPs achieved a high prediction accuracy. The best strategy was to integrate all KRN-associated tagSNPs identified by all GWAS models. Conclusions: These results aid our understanding of the genetic architecture of KRN and provide useful information for genomic selection for KRN in maize breeding.


2019 ◽  
Vol 51 (1) ◽  
Author(s):  
Sanne van den Berg ◽  
Jérémie Vandenplas ◽  
Fred A. van Eeuwijk ◽  
Aniek C. Bouwman ◽  
Marcos S. Lopes ◽  
...  

2016 ◽  
Author(s):  
Robert A. Power ◽  
Siva Davaniah ◽  
Anne Derache ◽  
Eduan Wilkinson ◽  
Frank Tanser ◽  
...  

Abstract Genome-wide association studies (GWAS) have considerably advanced our understanding of human traits and diseases. With the increasing availability of whole genome sequences (WGS) for pathogens, it is important to establish whether GWAS of viral genomes could reveal important biological insights. Here we perform the first proof-of-concept viral GWAS examining drug resistance (DR), a phenotype with well-understood genetics. We performed a GWAS of DR in a sample of 343 HIV subtype C patients failing first-line antiretroviral treatment in rural KwaZulu-Natal, South Africa. The majority and minority variants within each sequence were called using PILON, and GWAS was performed within PLINK. HIV WGS from patients failing on different antiretroviral treatments were compared to sequences derived from individuals naive to the respective treatment. GWAS methodology was validated by identifying five associations on a genetic level that led to amino acid changes known to cause DR. Further, we highlighted the ability of GWAS to identify epistatic effects, identifying two replicable variants within amino acid 68 of the reverse transcriptase protein previously described as potential fitness compensatory mutations. A possible additional DR variant within amino acid 91 of the matrix region of the Gag protein was associated with tenofovir failure, highlighting the ability of GWAS to identify variants outside classical candidate genes. Our results also suggest a polygenic component to DR. These results validate the applicability of GWAS to HIV WGS data even in relatively small samples, and emphasise how high-throughput sequencing can provide novel and clinically relevant insights. Further, they suggest that for viruses like HIV, population structure is only a minor concern compared with that seen in bacterial or parasite GWAS. Together with the small genome length and reduced multiple-testing burden, this makes HIV an ideal candidate for GWAS.
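
The point about a reduced multiple-testing burden can be made concrete with a short sketch of a Bonferroni correction, in which the per-test significance threshold is the family-wise error rate divided by the number of tests. The abstract does not state which correction or how many tests the authors used, so the test counts below are assumptions: roughly one test per nucleotide of an approximately 9,700-base HIV-1 genome, against the conventional figure of about one million independent tests behind the 5e-8 threshold in human GWAS. The function name `bonferroni_threshold` is illustrative.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: one association test per site of a ~9.7 kb HIV-1 genome.
HIV_TESTS = 9_700
# Assumption: ~1e6 independent tests, the convention behind p < 5e-8 in human GWAS.
HUMAN_TESTS = 1_000_000

hiv = bonferroni_threshold(HIV_TESTS)
human = bonferroni_threshold(HUMAN_TESTS)

print(f"HIV threshold:   {hiv:.2e}")
print(f"Human threshold: {human:.2e}")
print(f"HIV threshold is about {hiv / human:.0f}x less stringent")
```

With fewer tests, a weaker association signal can still clear the corrected threshold, which is one reason a sample of a few hundred sequences can be informative for a small viral genome.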

