Compression for population genetic data through finite-state entropy

2021 ◽  
Author(s):  
Winfield Chen ◽  
Lloyd T. Elliott

Abstract: We improve the efficiency of population genetic file formats and GWAS computation by leveraging the distribution of sample ordering in population-level genetic data. We identify conditional exchangeability of these data, recommending finite state entropy algorithms as an arithmetic code naturally suited to population genetic data. We show speed and size improvements of between 10% and 40% over dictionary compression methods used for population genetic data, such as Zstd and Zlib, in computation and decompression tasks. We provide a prototype for genome-wide association studies with finite state entropy compression, demonstrating significant space savings and speed comparable to the state of the art.

Author(s):  
Winfield Chen ◽  
Lloyd T. Elliott

We improve the efficiency of population genetic file formats and GWAS computation by leveraging the distribution of samples in population-level genetic data. We identify conditional exchangeability of these data, recommending finite state entropy algorithms as an arithmetic code naturally suited for compression of population genetic data. We show speed and size improvements of between 10% and 40% over modern dictionary compression methods that are often used for population genetic data, such as Zstd and Zlib, in computation and decompression tasks. We provide open source prototype software for multi-phenotype GWAS with finite state entropy compression, demonstrating significant space savings and speed comparable to the state of the art.


Author(s):  
Andrei Semikhodskii ◽  
Yevgeniy Krassotkin ◽  
Tatiana Makarova ◽  
Vladislav Zavarin ◽  
Viktoria Ilina ◽  
...  

2020 ◽  
Author(s):  
Nandita Mukhopadhyay ◽  
Eleanor Feingold ◽  
Lina Moreno-Uribe ◽  
George Wehby ◽  
Luz Consuelo Valencia-Ramirez ◽  
...  

Abstract: Orofacial clefts (OFCs) are among the most prevalent craniofacial birth defects worldwide and create a significant public health burden. The majority of OFCs are non-syndromic and vary in prevalence by ethnicity. Africans have the lowest prevalence of OFCs (∼1/2,500), Asians have the highest (∼1/500), and Europeans and Latin Americans lie in between (∼1/800 and 1/900, respectively). Thus, ethnicity appears to be a major determinant of the risk of developing OFC. The Pittsburgh Orofacial Clefts Multiethnic study was designed to explore this ethnic variance, comprising a large number of families and individuals (∼12,000 individuals) from multiple populations worldwide: US and Europe, Asians, mixed Native American/Caucasians, and Africans. In the current study, we analyzed 2,915 OFC cases, 6,044 unaffected individuals related to the OFC cases, and 2,685 controls with no personal or family history of OFC. Participants were grouped by ancestry into African, Asian, European, and Central and South American subsets, and genome-wide association analyses were run on the combined sample as well as on the four ancestry-based groups. We observed 22 associations with cleft lip with or without cleft palate at 18 distinct loci with p-values < 1e-06, including 10 with genome-wide significance (< 5e-08), in the combined sample and within ancestry groups. Three loci are novel: 2p12 (rs62164740, p=6.27e-07), 10q22.2 (rs150952246, p=3.14e-07), and 10q24.32 (rs118107597, p=8.21e-07). Nine were in or near known OFC loci: PAX7, IRF6, FAM49A, DCAF4L2, 8q24.21, NTN1, WNT3-WNT9B, TANC2, and RHPN2. The majority of the associations were observed only in the combined sample and in the European and the Central and South American groups.
We investigated whether the observed differences in association strength were (a) purely due to sample sizes, (b) due to systematic allele frequency differences at the population level, or (c) due to certain OFC-causing variants conferring different amounts of risk depending on ancestral origin. To do so, we compared effect sizes with the observed frequencies of the effect allele in our ancestry-based groups. While some of the associations differ because of systematic differences in allele frequencies between groups, others show variation in effect size despite similar frequencies across ancestry groups.


2021 ◽  
pp. 1-6
Author(s):  
Safia A. Messaoudi ◽  
Saranya R. Babu ◽  
Abrar B. Alsaleh ◽  
Mohammed Albujja ◽  
Noora R. Al-Snan ◽  
...  

PLoS ONE ◽  
2019 ◽  
Vol 14 (8) ◽  
pp. e0220620 ◽  
Author(s):  
Noora R. Al-Snan ◽  
Safia Messaoudi ◽  
Saranya R. Babu ◽  
Moiz Bakhiet

2019 ◽  
Vol 19 (5) ◽  
pp. 1374-1377
Author(s):  
Mahmut Aydın ◽  
Igor S. Kryvoruchko ◽  
Muhammet Şakiroğlu
