Investigating the Evolutionary Importance of Denisovan Introgressions in Papua New Guineans and Australians

2015 ◽  
Author(s):  
Ya Hu ◽  
Qiliang Ding ◽  
Yi Wang ◽  
Shuhua Xu ◽  
Yungang He ◽  
...  

Previous research reported that Papua New Guineans (PNG) and Australians carry introgressions from Denisovans. Here we present a genome-wide analysis of Denisovan introgressions in PNG and Australians. We first developed a two-phase method to detect Denisovan introgressions from whole-genome sequencing data. Based on simulations, this method has relatively high detection power (79.74%) and a low false positive rate (2.44%). Using this method, we identified 1.34 Gb of Denisovan introgressions from sixteen PNG and four Australian genomes, in which we identified 38,877 Denisovan introgressive alleles (DIAs). We found that 78 Denisovan introgressions were under positive selection. Genes located in the 78 introgressions are related to evolutionarily important functions, such as spermatogenesis, fertilization, cold acclimation, circadian rhythm, development of the brain, neural tube, face, and olfactory pit, immunity, etc. We also found that 121 DIAs are missense. Genes harboring the 121 missense DIAs are also related to evolutionarily important functions, such as female pregnancy, development of the face, lung, heart, skin, nervous system, and male gonad, visual and smell perception, response to heat, pain, hypoxia, and UV, lipid transport, metabolism, blood coagulation, wound healing, aging, etc. Taken together, this study suggests that Denisovan introgressions in PNG and Australians are evolutionarily important and may have helped PNG and Australians in local adaptation. In this study, we also propose a method that can efficiently identify archaic hominin introgressions in modern non-African genomes.

2016 ◽  
Author(s):  
Thomas Willems ◽  
Dina Zielinski ◽  
Assaf Gordon ◽  
Melissa Gymrek ◽  
Yaniv Erlich

Abstract: Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, STRs have proven problematic to genotype from high-throughput sequencing data. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping, haplotyping, and phasing STRs from whole genome sequencing data and report a genome-wide analysis and validation of de novo STR mutations.


2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Irene M. Häfliger ◽  
Franz R. Seefried ◽  
Mirjam Spengeler ◽  
Cord Drögemüller

Background: This study was carried out on the two Braunvieh populations reared in Switzerland, the dairy Brown Swiss (BS) and the dual-purpose Original Braunvieh (OB). We performed a genome-wide analysis of array data of trios (sire, dam, and offspring) from the routine genomic selection to identify candidate regions showing missing homozygosity and phenotypic associations with five fertility, ten birth, and nine growth-related traits. In addition, genome-wide single SNP regression studies based on 114,890 single nucleotide polymorphisms (SNPs) for each of the two populations were performed. Furthermore, whole-genome sequencing data of 430 cattle including 70 putative haplotype carriers were mined to identify potential candidate variants that were validated by genotyping the current population using a custom array. Results: Using a trio-based approach, we identified 38 haplotype regions for BS and five for OB that segregated at low to moderate frequencies. For the BS population, we confirmed two known haplotypes, BH1 and BH2. Twenty-four variants that potentially explained the missing homozygosity and associated traits were detected, in addition to the previously reported TUBD1:p.His210Arg variant associated with BH2. For example, for BS we identified a stop-gain variant (p.Arg57*) in the MRPL55 gene in the haplotype region on chromosome 7. This region is associated with the ‘interval between first and last insemination’ trait in our data, and the MRPL55 gene is known to be associated with early pregnancy loss in mice. In addition, we discuss candidate missense variants in the CPT1C, MARS2, and ACSL5 genes for haplotypes mapped in BS. In OB, we highlight a haplotype region on chromosome 19, which is potentially caused by a frameshift variant (p.Lys828fs) in the LIG3 gene, which is reported to be associated with early embryonic lethality in mice. Furthermore, we propose another potential causal missense variant in the TUBGCP5 gene for a haplotype mapped in OB. Conclusions: We describe, for the first time, several haplotype regions that segregate at low to moderate frequencies and provide evidence of causality by trait associations in the two populations of Swiss Braunvieh. We propose a list of six protein-changing variants as potentially causing missing homozygosity. These variants need to be functionally validated and incorporated in the breeding program.
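The idea of "missing homozygosity" in the abstract above can be illustrated with a small calculation. The sketch below is not the authors' pipeline; it assumes the simplest Mendelian case, in which both parents of every counted trio are heterozygous haplotype carriers, so each offspring is homozygous with probability 1/4. The function names are hypothetical.

```python
def expected_homozygotes(n_carrier_matings: int) -> float:
    """Expected number of homozygous offspring from carrier x carrier matings.

    Assumption: each offspring inherits the haplotype from both parents
    with probability 1/4 (simple Mendelian segregation, no selection).
    """
    return 0.25 * n_carrier_matings


def prob_no_homozygotes(n_carrier_matings: int) -> float:
    """Probability of seeing zero homozygous offspring by chance alone.

    Under the same assumption, each offspring is non-homozygous with
    probability 3/4, independently of the others.
    """
    return 0.75 ** n_carrier_matings


if __name__ == "__main__":
    n = 40  # hypothetical number of carrier x carrier trios
    print(expected_homozygotes(n))   # expected homozygous offspring
    print(prob_no_homozygotes(n))    # chance of observing none
```

When the expected count is well above zero and the chance of observing none is very small, the absence of homozygous animals points to a haplotype that may be lethal or deleterious in the homozygous state.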


2016 ◽  
Author(s):  
Suyash S. Shringarpure ◽  
Rasika A. Mathias ◽  
Ryan D. Hernandez ◽  
Timothy D. O’Connor ◽  
Zachary A. Szpiech ◽  
...  

Abstract. Motivation: Variant calling from next-generation sequencing (NGS) data is susceptible to false positive calls due to sequencing, mapping and other errors. To better distinguish true from false positive calls, we present a method that uses genotype array data from the sequenced samples, rather than public data such as HapMap or dbSNP, to train an accurate classifier using Random Forests. We demonstrate our method on a set of variant calls obtained from 642 African-ancestry genomes from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA), sequenced to high depth (30X). Results: We have applied our classifier to compare call sets generated with different calling methods, including both single-sample and multi-sample callers. At a false positive rate of 5%, our method determines true positive rates of 97.5%, 95% and 99% on variant calls obtained using Illumina’s single-sample caller CASAVA, Real Time Genomics’ multisample variant caller, and the GATK Unified Genotyper, respectively. Since most NGS data is accompanied by genotype data for the same samples, our method can be trained on each dataset to provide a more accurate computational validation of site calls compared to generic methods. Moreover, our method allows for adjustment based on allele frequency (e.g., a different set of criteria to determine quality for rare vs. common variants) and thereby provides insight into sequencing characteristics that indicate data quality for variants of different frequencies. Availability: Code will be made available prior to publication on GitHub.
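The "true positive rate at a false positive rate of 5%" quoted above is a point on a ROC curve. As a minimal sketch of how such a figure can be computed from classifier scores, and not the authors' code, the hypothetical function below picks the most permissive score cutoff that keeps the fraction of accepted false calls at or below the target, then reports the fraction of true calls that pass. The input scores are assumed to come from any classifier in which a higher score means "more likely a true variant".

```python
from typing import Sequence


def tpr_at_fpr(scores_true: Sequence[float],
               scores_false: Sequence[float],
               max_fpr: float = 0.05) -> float:
    """True positive rate at a cutoff whose false positive rate is <= max_fpr.

    scores_true:  classifier scores of calls known to be real variants
    scores_false: classifier scores of calls known to be errors
    A call is accepted when its score is strictly above the cutoff.
    """
    if not scores_true or not scores_false:
        raise ValueError("both score lists must be non-empty")
    neg = sorted(scores_false, reverse=True)
    allowed = int(max_fpr * len(neg))  # false calls we may accept
    # Accepting only scores above neg[allowed] lets through at most
    # `allowed` false calls (fewer if there are ties at the cutoff).
    cutoff = neg[allowed] if allowed < len(neg) else float("-inf")
    accepted_true = sum(1 for s in scores_true if s > cutoff)
    return accepted_true / len(scores_true)


if __name__ == "__main__":
    false_scores = [i / 20 for i in range(20)]   # toy data: 0.00 ... 0.95
    true_scores = [0.99, 0.92, 0.5, 0.97]        # toy data
    print(tpr_at_fpr(true_scores, false_scores))
```

Comparing callers at a fixed false positive rate, as the abstract does, keeps the comparison fair even when the callers' raw quality scores are on different scales.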


2021 ◽  
Author(s):  
Jing Yu ◽  
Anita Szabo ◽  
Alistair T Pagnamenta ◽  
Ahmed Shalaby ◽  
Edoardo Giacopuzzi ◽  
...  

Discovery of disease-causing structural variants (dcSVs) from whole genome sequencing data is difficult due to the high number of false positives and the lack of an efficient way to estimate allele frequency. Here we introduce SVRare, an application that aggregates structural variants (SVs) called by other tools and efficiently annotates rare SVs to aid dcSV discovery. Applied in the Genomics England (GEL) research environment to data from the 100K Genomes Project, SVRare aggregated 554,060,126 SVs called by Manta and Canvas in all 71,408 participants in the rare-disease arm. From a pilot study of 4313 families, SVRare identified 36 novel protein-coding disrupting SVs in diagnostic-grade genes that may explain the proband's phenotype. It is estimated that SVRare can increase SV-based diagnostic yield by at least 4-fold. We also performed a genome-wide association study and uncovered clusters of dcSVs in genes with known pathogenicity, such as PKD1/2 (cystic kidney diseases) and LDLR (familial hypercholesterolaemia).


2021 ◽  
Author(s):  
Anthony D Long ◽  
Alan Barbour ◽  
Phillip N Long ◽  
Vanessa J Cook ◽  
Arundhati Majumder

Although Peromyscus leucopus (deermouse) is not considered a genetic model system, its genus is well suited for addressing several questions of biological interest, including the genetic bases of longevity, behavior, physiology, adaptation, and its ability to serve as a disease vector. Here we explore a diversity outbred approach for dissecting complex traits in Peromyscus leucopus, a non-traditional genetic model system. We take advantage of a closed colony of deermice founded from 38 individuals between 1982 and 1985 and subsequently maintained for 35+ years (~40-60 generations). From 405 low-pass (~1X) short-read sequenced deermice we accurately imputed genotypes at 17,751,882 SNPs. Conditional on observed genotypes for a subset of 297 individuals, simulations were conducted in which a QTL contributes 5% to a complex trait under three different genetic models. The power of either a haplotype- or marker-based statistical test to detect the hidden QTL was estimated to be 15-25%. Although modest, this power estimate is consistent with that of DO/HS mice and rat experiments for an experiment with ~300 individuals. This limitation in QTL detection is mostly associated with the stringent significance threshold required to hold the genome-wide false positive rate low, as in all cases we observe considerable linkage signal at the location of the simulated QTL, suggesting a larger panel would exhibit greater power. For the subset of cases where a QTL was detected, localization ability appeared very desirable at ~1-2 Mb. Finally, we carried out a GWAS on a demonstration trait, bleeding time. No tests exceeded the threshold for genome-wide significance, but one of four suggestive regions co-localizes with Von Willebrand factor. Our work suggests that complex traits can be dissected in founders-unknown P. leucopus colony mice in much the same manner as founders-known DO/HS mice and rats, with genotypes obtained from low-pass sequencing data. Our results further suggest that the DO/HS approach can be powerfully extended to any system in which a founders-unknown closed colony has been maintained for several dozen generations.


2022 ◽  
Author(s):  
Lars Wienbrandt ◽  
David Ellinghaus

Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods: We developed EagleImp, a software tool with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation in a single tool. Results: We compared accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS studies, EagleImp provides the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite their larger (not publicly available) reference panels. It has many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions: Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.


2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Jing Shen ◽  
Shuang Wang ◽  
Abby B. Siegel ◽  
Helen Remotti ◽  
Qiao Wang ◽  
...  

Background. Previous studies, including ours, have examined the regulation of microRNAs (miRNAs) by DNA methylation, but whether this regulation occurs at a genome-wide level in hepatocellular carcinoma (HCC) is unclear. Subjects/Methods. Using a two-phase study design, we conducted genome-wide screening for DNA methylation and miRNA expression to explore the potential role of methylation alterations in miRNA regulation. Results. We found that the expression of 25 miRNAs was statistically significantly different between tumor and nontumor tissues and perfectly differentiated HCC tumor from nontumor. Six miRNAs were overexpressed, and 19 were repressed in tumors. Among 133 miRNAs with inverse correlations between methylation and expression, 8 miRNAs (6%) showed statistically significant differences in expression between tumor and nontumor tissues. Six miRNAs were validated in 56 additional paired HCC tissues, and significant inverse correlations were observed for miR-125b and miR-199a, which is consistent with the inactive chromatin pattern found in HepG2 cells. Conclusion. These data suggest that the expression of miR-125b and miR-199a is dramatically regulated by DNA hypermethylation that plays a key role in hepatocarcinogenesis.


BMC Genetics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Liping Guan ◽  
Ke Cao ◽  
Yong Li ◽  
Jian Guo ◽  
Qiang Xu ◽  
...  

Background: Peach (Prunus persica L.) is a diploid species and model plant of the Rosaceae family. In the past decade, significant progress has been made in peach genetic research via DNA markers, but the number of these markers remains limited. Results: In this study, we performed genome-wide detection of DNA markers based on sequencing data of six distantly related peach accessions. A total of 650,693–1,053,547 single nucleotide polymorphisms (SNPs), 114,227–178,968 small insertions/deletions (InDels), 8386–12,298 structural variants (SVs), 2111–2581 copy number variants (CNVs) and 229,357–346,940 simple sequence repeats (SSRs) were detected and annotated. To demonstrate the application of DNA markers, 944 SNPs were filtered for an association study of fruit ripening time and 15 highly polymorphic SSRs were selected to analyze the genetic relationship among 221 accessions. Conclusions: The results showed that the use of high-throughput sequencing to develop DNA markers is fast and effective. Comprehensive identification of DNA markers, including SVs and SSRs, would benefit genetic diversity evaluation, genetic mapping, and molecular breeding of peach.


2019 ◽  
Vol 116 (12) ◽  
pp. 5653-5658 ◽  
Author(s):  
Lin Shao ◽  
Feng Xing ◽  
Conghao Xu ◽  
Qinghua Zhang ◽  
Jian Che ◽  
...  

Utilization of heterosis has greatly increased the productivity of many crops worldwide. Although tremendous progress has been made in characterizing the genetic basis of heterosis using genomic technologies, molecular mechanisms underlying the genetic components are much less understood. Allele-specific expression (ASE), or imbalance between the expression levels of two parental alleles in the hybrid, has been suggested as a mechanism of heterosis. Here, we performed a genome-wide analysis of ASE by comparing the read ratios of the parental alleles in RNA-sequencing data of an elite rice hybrid and its parents using three tissues from plants grown under four conditions. The analysis identified a total of 3,270 genes showing ASE (ASEGs) in various ways, which can be classified into two patterns: consistent ASEGs such that the ASE was biased toward one parental allele in all tissues/conditions, and inconsistent ASEGs such that ASE was found in some but not all tissues/conditions, including direction-shifting ASEGs in which the ASE was biased toward one parental allele in some tissues/conditions while toward the other parental allele in other tissues/conditions. The results suggested that these patterns may have distinct implications in the genetic basis of heterosis: The consistent ASEGs may cause partial to full dominance effects on the traits that they regulate, and direction-shifting ASEGs may cause overdominance. We also showed that ASEGs were significantly enriched in genomic regions that were differentially selected during rice breeding. These ASEGs provide an index of the genes for future pursuit of the genetic and molecular mechanism of heterosis.
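The abstract above detects allele-specific expression by comparing read ratios of the two parental alleles. One common way to test a single gene for such imbalance, shown here as a minimal sketch and not necessarily the test the authors used, is an exact two-sided binomial test against the null hypothesis that each allele contributes half of the reads. The function name and the read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(reads_a: int, reads_b: int) -> float:
    """Exact two-sided binomial p-value for a 50:50 allelic read ratio.

    reads_a, reads_b: RNA-seq reads assigned to each parental allele.
    Because the null distribution Binomial(n, 0.5) is symmetric, the
    two-sided p-value is twice the smaller tail, capped at 1.
    """
    if reads_a < 0 or reads_b < 0:
        raise ValueError("read counts must be non-negative")
    n = reads_a + reads_b
    if n == 0:
        raise ValueError("at least one read is required")
    k = min(reads_a, reads_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), computed with exact integers.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    print(allelic_imbalance_pvalue(50, 50))  # balanced expression
    print(allelic_imbalance_pvalue(90, 10))  # strongly biased expression
```

In a genome-wide setting the per-gene p-values would still need a multiple-testing correction, and the direction of the bias (which parent is favored) would be recorded separately in order to distinguish consistent from direction-shifting genes.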

