genotype data
Recently Published Documents


TOTAL DOCUMENTS

383
(FIVE YEARS 102)

H-INDEX

43
(FIVE YEARS 5)

2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. 223-223
Author(s):  
Kamil Sicinski

Abstract Ever since releasing genotype data in 2017, the WLS has continually expanded the resources available to users interested in genetic research. Key advantages of the WLS data for genetics research include its sibling sample and nearly full life-course longitudinal study design. As of 2021, we have state-of-the-art polygenic scores available in multiple domains, such as health, cognition, fertility, personality, risk behaviors and attitudes, and life satisfaction. The scores cover phenotypes ranging from adventurousness, through educational attainment, to the age at which the voice deepened. Additionally, the genotype data were re-imputed in 2021 to the superior Haplotype Reference Consortium reference panel, and the WLS expects to obtain copy number variant data next year. In addition to genetic data, we have a set of novel microbiome data on a subset of participants that allows researchers to study relationships between environments and gut microbial composition.


2021 ◽  
Author(s):  
Ariel DH Gewirtz ◽  
F William Townes ◽  
Barbara E Engelhardt

Expression quantitative trait loci (eQTLs), or single nucleotide polymorphisms (SNPs) that affect average gene expression levels, provide important insights into context-specific gene regulation. Classic eQTL analyses use one-to-one association tests, which test gene-variant pairs individually and ignore correlations induced by gene regulatory networks and linkage disequilibrium. Probabilistic topic models, such as latent Dirichlet allocation, estimate latent topics for a collection of count observations. Prior multi-modal frameworks that bridge genotype and expression data assume matched sample numbers between modalities. However, many data sets have a nested structure where one individual has several associated gene expression samples and a single germline genotype vector. Here, we build a telescoping bimodal latent Dirichlet allocation (TBLDA) framework to learn shared topics across gene expression and genotype data that allows multiple RNA-sequencing samples to correspond to a single individual's genotype. By using raw count data, our model avoids possible adulteration via normalization procedures. Ancestral structure is captured in a genotype-specific latent space, effectively removing it from shared components. Using GTEx v8 expression data across ten tissues and genotype data, we show that the estimated topics capture meaningful and robust biological signal in both modalities, and identify associations within and across tissue types. We identify 53,358 cis-eQTLs and 1,173 trans-eQTLs by conducting eQTL mapping between the most informative features in each topic. Our TBLDA model is able to identify associations using raw sequencing count data when the samples in two separate data modalities are matched one-to-many, as is often the case in biological data. All software is available at: https://github.com/gewirtz/TBLDA
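The one-to-many layout described in this abstract (several RNA-sequencing samples per individual, one germline genotype vector per individual) can be illustrated with a minimal sketch. This is not code from the TBLDA repository; the function name, the sample and donor identifiers, and the toy counts below are all hypothetical and only show how samples might be paired with a shared genotype vector.

```python
from typing import Dict, List, Tuple

def pair_samples_with_genotypes(
    expression: Dict[str, List[int]],
    sample_owner: Dict[str, str],
    genotypes: Dict[str, List[int]],
) -> List[Tuple[str, List[int], List[int]]]:
    """Attach each sample's raw counts to its donor's single genotype vector."""
    paired = []
    for sample_id, counts in expression.items():
        donor = sample_owner[sample_id]  # many samples may map to one donor
        paired.append((sample_id, counts, genotypes[donor]))
    return paired

# Hypothetical toy data: two donors, three expression samples.
genotypes = {"donor_1": [0, 1, 2], "donor_2": [2, 2, 0]}
sample_owner = {
    "donor_1_lung": "donor_1",
    "donor_1_liver": "donor_1",
    "donor_2_lung": "donor_2",
}
expression = {
    "donor_1_lung": [12, 0, 7],
    "donor_1_liver": [3, 40, 1],
    "donor_2_lung": [9, 2, 5],
}
paired = pair_samples_with_genotypes(expression, sample_owner, genotypes)
```

Both samples from `donor_1` reference the same genotype list, so the genotype is stored once per individual rather than duplicated per sample.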


2021 ◽  
Author(s):  
Zachary A Szpiech

Haplotype-based scans to identify recent and ongoing positive selection have become commonplace in evolutionary genomics studies of numerous species across the tree of life. However, the most widely adopted approaches require phased haplotypes to compute the key statistics. Here we release a major update to the selscan software that re-defines popular haplotype-based statistics for use with unphased "multi-locus genotype" data. We provide unphased implementations of iHS, nSL, XP-EHH, and XP-nSL and evaluate their performance across a range of important parameters in a generic demographic history. Source code and executables are available at https://www.github.com/szpiech/selscan.
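To make the distinction between phased haplotypes and unphased multi-locus genotypes concrete, here is a minimal sketch. It is not selscan code; it assumes biallelic loci coded 0/1, and the function name is hypothetical.

```python
from typing import List, Sequence

def to_multilocus_genotypes(hap_a: Sequence[int], hap_b: Sequence[int]) -> List[int]:
    """Collapse two phased haplotypes (0/1 alleles) into unphased genotypes.

    Each locus becomes the count of derived alleles (0, 1, or 2), which
    discards the information about which chromosome carried each allele.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same loci")
    return [a + b for a, b in zip(hap_a, hap_b)]

# Two different phasings of the same individual give identical unphased data.
phasing_1 = to_multilocus_genotypes([1, 0, 1, 0], [0, 1, 0, 0])
phasing_2 = to_multilocus_genotypes([1, 1, 0, 0], [0, 0, 1, 0])
```

Because both phasings collapse to the same allele counts, any statistic defined on this representation cannot depend on phase.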


2021 ◽  
Author(s):  
Matt Carland ◽  
Madhuchanda Bose ◽  
Biljana Novković ◽  
Haley Pedersen ◽  
Charles Manson ◽  
...  

The vast majority of human traits, including many disease phenotypes, are affected by alleles at numerous genomic loci. With a continually increasing set of variants with published clinical disease or biomarker associations, an easy-to-use tool for non-programmers to rapidly screen VCF files for risk alleles is needed. We have developed EZTraits as a tool to quickly evaluate genotype data (e.g. from microarrays) against a set of rules defined by the user. These rules can be defined directly in the scripting language Lua, for genotype calls using variant ID (RS number) or chromosomal position. Alternatively, EZTraits can parse simple and intuitive text including concepts like 'any' or 'all'. Thus, EZTraits is designed to support rapid genetic analysis and hypothesis-testing by researchers, regardless of programming experience or technical background.
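The idea of combining genotype conditions with 'any' and 'all' can be sketched as follows. This is an illustration only and does not reproduce EZTraits rule syntax or its Lua interface; the helper names and the placeholder variant IDs (`rs0000001`, `rs0000002`) are hypothetical.

```python
from typing import Callable, Dict, Iterable

Genotypes = Dict[str, str]  # variant ID (RS number) -> genotype call, e.g. "AG"
Condition = Callable[[Genotypes], bool]

def carries(rsid: str, allele: str) -> Condition:
    """True when the call at rsid contains the allele; False if rsid is absent."""
    return lambda calls: allele in calls.get(rsid, "")

def any_of(conditions: Iterable[Condition]) -> Condition:
    """True when at least one condition holds."""
    conds = list(conditions)
    return lambda calls: any(c(calls) for c in conds)

def all_of(conditions: Iterable[Condition]) -> Condition:
    """True only when every condition holds."""
    conds = list(conditions)
    return lambda calls: all(c(calls) for c in conds)

# Placeholder variant IDs and calls, not real associations.
sample = {"rs0000001": "AG", "rs0000002": "CC"}
checks = [carries("rs0000001", "A"), carries("rs0000002", "T")]
matches_any = any_of(checks)(sample)
matches_all = all_of(checks)(sample)
```

Here the sample carries the first allele but not the second, so the 'any' rule matches and the 'all' rule does not.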


2021 ◽  
Author(s):  
Arif Ozgun Harmanci ◽  
Miran Kim ◽  
Su Wang ◽  
Wentao Li ◽  
Yongsoo Song ◽  
...  

As DNA sequencing data becomes available for personal use, genomic privacy is becoming a major challenge. Nevertheless, outsourced high-throughput genomic data analysis is performed using pipelines that tend to overlook these challenges. Results: We present a client-server-based outsourcing framework for genotype imputation, an important step in genomic data analyses. Genotype data is encrypted by the client, and the server operates on the encrypted data without ever observing it in plaintext. The cloud-based framework can benefit from virtually unlimited computational resources while providing provable confidentiality. Availability: The server is publicly available at https://www.secureomics.org/OpenImpute. Users can anonymously test and use the imputation server without registration.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Declan Bennett ◽  
Donal O’Shea ◽  
John Ferguson ◽  
Derek Morris ◽  
Cathal Seoighe

Abstract Ongoing increases in the size of human genotype and phenotype collections offer the promise of improved understanding of the genetics of complex diseases. In addition to the biological insights that can be gained from the nature of the variants that contribute to the genetic component of complex trait variability, these data bring forward the prospect of predicting complex traits and the risk of complex genetic diseases from genotype data. Here we show that advances in phenotype prediction can be applied to improve the power of genome-wide association studies. We demonstrate a simple and efficient method to model genetic background effects using polygenic scores derived from SNPs that are not on the same chromosome as the target SNP. Using simulated and real data we found that this can result in a substantial increase in the number of variants passing genome-wide significance thresholds. This increase in power to detect trait-associated variants also translates into an increase in the accuracy with which the resulting polygenic score predicts the phenotype from genotype data. Our results suggest that advances in methods for phenotype prediction can be exploited to improve the control of background genetic effects, leading to more accurate GWAS results and further improvements in phenotype prediction.
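A polygenic score restricted to SNPs off the target SNP's chromosome can be sketched as a weighted sum of allele counts. This is a minimal illustration, not the authors' implementation: it assumes per-SNP effect sizes are already available, and the function name and toy values are hypothetical. In an association test, such a score would typically enter the model as a covariate alongside the target SNP.

```python
from typing import List, Tuple

Snp = Tuple[str, float]  # (chromosome label, effect size)

def off_chromosome_score(
    allele_counts: List[int], snps: List[Snp], target_chrom: str
) -> float:
    """Weighted allele-count sum over SNPs NOT on the target SNP's chromosome."""
    if len(allele_counts) != len(snps):
        raise ValueError("one allele count is required per SNP")
    return sum(
        count * beta
        for count, (chrom, beta) in zip(allele_counts, snps)
        if chrom != target_chrom  # exclude the target's own chromosome
    )

# Hypothetical toy effect sizes and one individual's allele counts (0/1/2).
snps = [("1", 0.5), ("1", -0.25), ("2", 1.0), ("3", 0.25)]
allele_counts = [2, 1, 1, 0]
score_for_chr1_target = off_chromosome_score(allele_counts, snps, "1")
score_for_chr2_target = off_chromosome_score(allele_counts, snps, "2")
```

Excluding the target chromosome keeps the score from absorbing the target SNP's own signal through linkage disequilibrium with nearby variants.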


2021 ◽  
Vol 12 ◽  
Author(s):  
Liping Jiang ◽  
Zhuo Li ◽  
Jessica J. Hayward ◽  
Kei Hayashi ◽  
Ursula Krotscheck ◽  
...  

Canine hip dysplasia (CHD) and rupture of the cranial cruciate ligament (RCCL) are two complex inherited orthopedic traits of dogs. These two traits may occur concurrently in the same dog. Genomic prediction of these two diseases would benefit veterinary medicine, the dog’s owner, and dog breeders because of their high prevalence, and because both traits result in painful, debilitating osteoarthritis in affected joints. In this study, 842 unique dogs from 6 breeds with hip and stifle phenotypes were genotyped on a customized Illumina high-density 183 k single nucleotide polymorphism (SNP) array and also analyzed using an imputed dataset of 20,487,155 SNPs. To implement genomic prediction, two different statistical methods were employed: Genomic Best Linear Unbiased Prediction (GBLUP) and a Bayesian method called BayesC. The cross-validation results showed that the two methods gave similar prediction accuracy (r = 0.3–0.4) for CHD (measured as Norberg angle) and RCCL in the multi-breed population. For CHD, the average correlation of the AUC was 0.71 (BayesC) and 0.70 (GBLUP), which is a medium level of prediction accuracy and consistent with Pearson correlation results. For RCCL, the correlation of the AUC was slightly higher. The prediction accuracy of GBLUP from the imputed genotype data was similar to the accuracy from DNA array data. We demonstrated that the genomic prediction of CHD and RCCL with DNA array genotype data is feasible in a multiple-breed population if there is a genetic connection, such as breed, between the reference population and the validation population. Although these traits have a heritability of about one-third, higher accuracy is needed for implementation in a natural population, and predicting a complex phenotype will require a much larger number of dogs within a breed and across breeds.
It is possible that with higher accuracy, genomic prediction of these orthopedic traits could be implemented in a clinical setting for early diagnosis and treatment, and the selection of dogs for breeding. These results need continuous improvement in model prediction through ongoing genotyping and data sharing. When genomic prediction indicates that a dog is susceptible to one of these orthopedic traits, it should be accompanied by clinical and radiographic screening at an acceptable age with appropriate follow-up.
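GBLUP models relatedness between animals through a genomic relationship matrix computed from SNP genotypes. The abstract does not say how that matrix was built, so the sketch below assumes the commonly used VanRaden (method 1) construction, G = ZZ' / (2 Σ p_j (1 − p_j)), where Z holds allele counts centered by twice the allele frequency. The function name and toy genotypes are hypothetical.

```python
from typing import List

def genomic_relationship_matrix(counts: List[List[int]]) -> List[List[float]]:
    """VanRaden method-1 relationship matrix from 0/1/2 allele counts.

    Rows are animals, columns are SNPs. Allele frequencies are estimated
    from the same sample, which is an assumption of this sketch.
    """
    n, m = len(counts), len(counts[0])
    freqs = [sum(row[j] for row in counts) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # Center each allele count by its expectation, 2p.
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in counts]
    return [
        [sum(z[i][k] * z[j][k] for k in range(m)) / scale for j in range(n)]
        for i in range(n)
    ]

# Hypothetical toy genotypes: 3 animals by 3 SNPs.
toy_counts = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]
G = genomic_relationship_matrix(toy_counts)
```

The resulting matrix is symmetric, and because the centering uses in-sample frequencies each of its rows sums to zero.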


2021 ◽  
Author(s):  
Declan Bennett ◽  
Dónal O'Shea ◽  
John Ferguson ◽  
Derek Morris ◽  
Cathal Seoighe

Abstract Ongoing increases in the size of human genotype and phenotype collections offer the promise of improved understanding of the genetics of complex diseases. In addition to the biological insights that can be gained from the nature of the variants that contribute to the genetic component of complex trait variability, these data bring forward the prospect of predicting complex traits and the risk of complex genetic diseases from genotype data. Here we show that advances in phenotype prediction can be applied to improve the power of genome-wide association studies. We demonstrate a simple and efficient method to model genetic background effects using polygenic scores derived from SNPs that are not on the same chromosome as the target SNP. Using simulated and real data we found that this can result in a substantial increase in the number of variants passing genome-wide significance thresholds. This increase in power to detect trait-associated variants also translates into an increase in the accuracy with which the resulting polygenic score predicts the phenotype from genotype data. Our results suggest that advances in methods for phenotype prediction can be exploited to improve the control of background genetic effects, leading to more accurate GWAS results and further improvements in phenotype prediction.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sridevi Padakanti ◽  
Khong-Loon Tiong ◽  
Yan-Bin Chen ◽  
Chen-Hsiang Yeang

Abstract Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations. Ancestry Informative Markers (AIMs) are a small subset of SNPs capable of distinguishing populations. We integrate these two approaches by proposing an algorithm to identify necessary informative loci whose removal from the data deteriorates the PCA structure. Unlike classical AIMs, necessary informative loci densely cover the genome and hence can illuminate the evolution and mixing history of populations. We conduct a comprehensive analysis of the genotype data of the 1000 Genomes Project using necessary informative loci. Projections along the top seven principal components demarcate populations at distinct geographic levels. Millions of necessary informative loci along each PC are identified. Population identities along each PC are approximately determined by weighted sums of minor (or major) alleles over the informative loci. Variations of allele frequencies are aligned with the history and direction of population evolution. The population distribution of projections along the top three PCs is recapitulated by a simple demographic model based on several waves of founder population separation and mixing. Informative loci are concentrated at particular locations in the genome and are functionally enriched. Genes at two hot spots encompassing dense PC 7 informative loci exhibit differential expression among European populations. The mosaic of local ancestry in the genome of a mixed descendant from multiple populations can be inferred from partial PCA projections of informative loci. Finally, informative loci derived from the 1000 Genomes data predict well the projections of an independent genotype dataset of South Asians. These results demonstrate the utility and relevance of informative loci for investigating human evolution.
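How a principal component separates populations in genotype data can be shown with a small sketch. This is not the authors' algorithm for finding necessary informative loci; it only projects samples onto the first PC, using power iteration in place of a full eigendecomposition so that it runs on the standard library alone. The function name and the made-up toy genotypes are hypothetical and are not drawn from the 1000 Genomes Project.

```python
from typing import List

def first_pc_projection(genotypes: List[List[float]], iterations: int = 200) -> List[float]:
    """Project samples onto the first PC of column-centered genotype data."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Power iteration on X^T X without forming it: v <- X^T (X v), normalized.
    v = [1.0 / (j + 1) for j in range(m)]  # fixed start for reproducibility
    for _ in range(iterations):
        scores = [sum(a * w for a, w in zip(row, v)) for row in x]
        v = [sum(x[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = sum(w * w for w in v) ** 0.5
        if norm == 0:
            raise ValueError("genotype matrix has no variance")
        v = [w / norm for w in v]
    return [sum(a * w for a, w in zip(row, v)) for row in x]

# Made-up allele counts: three samples from each of two toy populations.
toy_genotypes = [
    [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 1],  # population A
    [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 1, 2],  # population B
]
projections = first_pc_projection(toy_genotypes)
pop_a, pop_b = projections[:3], projections[3:]
```

The sign of a principal component is arbitrary, so the two populations should be read as falling on opposite sides of zero rather than on a particular side.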


Author(s):  
Alexandra K Lobo ◽  
Lindsay L Traeger ◽  
Mark P Keller ◽  
Alan D Attie ◽  
Federico E Rey ◽  
...  

Abstract In a Diversity Outbred mouse project with genotype data on 500 mice, including 297 with microbiome data, we identified three sets of sample mix-ups (two pairs and one trio) as well as at least 15 microbiome samples that appear to be mixtures of pairs of mice. The microbiome data consisted of shotgun sequencing reads from fecal DNA, used to characterize the gut microbial communities present in these mice. These sequence reads included sufficient reads derived from the host mouse to identify the individual. A number of microbiome samples appeared to contain a mixture of DNA from two mice. We describe a method for identifying sample mix-ups in such microbiome data, as well as a method for evaluating sample mixtures in this context.
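The core idea of detecting sample mix-ups, matching each microbiome sample to the mouse whose genotypes it most resembles, can be sketched as below. This is a simplified illustration and not the authors' method: it assumes hard genotype calls have already been derived from the host reads (with `None` for loci lacking coverage), and the function names, mouse labels, and toy calls are hypothetical. It does not address mixtures of two mice.

```python
from typing import Dict, List, Optional

Calls = List[Optional[int]]  # allele counts per locus; None means no call

def discordance(a: Calls, b: Calls) -> float:
    """Fraction of loci called in both samples where the genotypes differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no overlapping calls")
    return sum(x != y for x, y in pairs) / len(pairs)

def find_mixups(array_calls: Dict[str, Calls], read_calls: Dict[str, Calls]) -> Dict[str, str]:
    """Map each mislabeled microbiome sample to its best-matching mouse."""
    mixups = {}
    for label, calls in read_calls.items():
        best = min(array_calls, key=lambda mouse: discordance(array_calls[mouse], calls))
        if best != label:  # the label disagrees with the closest genotype
            mixups[label] = best
    return mixups

# Toy data in which the microbiome samples labeled m1 and m2 are swapped.
array_calls = {"m1": [0, 1, 2, 0], "m2": [2, 1, 0, 0], "m3": [1, 1, 1, 2]}
read_calls = {"m1": [2, 1, 0, None], "m2": [0, 1, 2, 0], "m3": [1, None, 1, 2]}
mixups = find_mixups(array_calls, read_calls)
```

A sample whose best match is another mouse is flagged, and a reciprocal pair of flags, as in this toy data, points to a swap.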

