Ancestral Spectrum Analysis With Population-Specific Variants

2021 ◽  
Vol 12 ◽  
Author(s):  
Gang Shi ◽  
Qingmin Kuang

With the advance of sequencing technology, an increasing number of populations have been sequenced to study the histories of worldwide populations, including their divergence, admixtures, migration, and effective sizes. The variants detected in sequencing studies are largely rare and mostly population specific. Population-specific variants are often recent mutations and are informative for revealing substructures and admixtures in populations; however, computational methods and tools to analyze them are still lacking. In this work, we propose using reference populations and single nucleotide polymorphisms (SNPs) specific to the reference populations. Ancestral information, the best linear unbiased estimator (BLUE) of the ancestral proportion, is proposed, which can be used to infer ancestral proportions in recently admixed target populations and measure the extent to which reference populations serve as good proxies for the admixing sources. Based on the same panel of SNPs, the ancestral information is comparable across samples from different studies and is not affected by genetic outliers, related samples, or the sample sizes of the admixed target populations. In addition, ancestral spectrum is useful for detecting genetic outliers or exploring co-ancestry between study samples and the reference populations. The methods are implemented in a program, Ancestral Spectrum Analyzer (ASA), and are applied in analyzing high-coverage sequencing data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP). In the analyses of American populations from the 1000 Genomes Project, we demonstrate that recent admixtures can be dissected from ancient admixtures by comparing ancestral spectra with and without indigenous Americans being included in the reference populations.

2019 ◽  
Vol 2019 ◽  
pp. 1-11 ◽  
Author(s):  
Ying Wang ◽  
Jidong Ru ◽  
Tao Jin ◽  
Ming Sun ◽  
Lizhu Jia ◽  
...  

MicroRNAs (miRNAs) and single nucleotide polymorphisms (SNPs) play important roles in disease risk and development, especially cancer. Importantly, when SNPs are located in pre-miRNAs, they affect their splicing mechanism and change the function of miRNAs. To improve disease risk assessment, we propose an approach and developed a software tool, IsomiR_Find, to identify disease/phenotype-related SNPs and isomiRs in individuals. Our approach is based on the individual’s samples, with SNP information extracted from the 1000 Genomes Project. SNPs were mapped to pre-miRNAs based on whole-genome coordinates and then SNP-pre-miRNA sequences were constructed. Moreover, we developed matpred2, a software tool to identify the four splicing sites of mature miRNAs. Using matpred2, we identified isomiRs and then verified them by searching within individual miRNA sequencing data. Our approach yielded biomarkers for biological experiments, mined functions of miRNAs and SNPs, improved disease risk assessment, and provided a way to achieve individualized precision medicine.
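The coordinate-based mapping step described above can be illustrated with a minimal sketch. It is not the IsomiR_Find implementation: the interval list, the names `build_index`, `map_snp`, and `apply_snp`, and all coordinates are hypothetical, and pre-miRNA intervals are assumed not to overlap.

```python
from bisect import bisect_right

# Hypothetical pre-miRNA intervals: (chromosome, start, end, name), 1-based inclusive.
PRE_MIRNAS = [
    ("chr1", 100, 180, "pre-mir-A"),
    ("chr1", 500, 590, "pre-mir-B"),
    ("chr2", 50, 130, "pre-mir-C"),
]


def build_index(intervals):
    """Group intervals by chromosome and sort them by start coordinate."""
    index = {}
    for chrom, start, end, name in intervals:
        index.setdefault(chrom, []).append((start, end, name))
    for entries in index.values():
        entries.sort()
    return index


def map_snp(index, chrom, pos):
    """Return (pre-miRNA name, 0-based offset) for a SNP, or None if it falls outside."""
    entries = index.get(chrom, [])
    starts = [start for start, _, _ in entries]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    if i >= 0:
        start, end, name = entries[i]
        if start <= pos <= end:
            return name, pos - start
    return None


def apply_snp(sequence, offset, ref, alt):
    """Build the SNP-carrying sequence, checking the reference allele first."""
    if sequence[offset] != ref:
        raise ValueError("reference allele does not match the sequence")
    return sequence[:offset] + alt + sequence[offset + 1:]
```

A call such as `map_snp(build_index(PRE_MIRNAS), "chr1", 105)` locates the SNP inside the first interval, and `apply_snp` then produces the variant sequence that a downstream predictor would receive.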


2019 ◽  
Author(s):  
Mingrui Wang ◽  
Dapeng Wang ◽  
Jun Yu ◽  
Shi Huang

Abstract Proteins were first used in the early 1960s to discover the molecular clock dating method and remain in common usage today in phylogenetic inferences based on neutral variations. To avoid substitution saturation, it is necessary to use slow evolving genes. However, it remains unclear whether fixed and standing missense changes in such genes may qualify as neutral. Here, based on the evolutionary rates as inferred from identity scores between orthologs in human and Macaca monkey, we found that the fraction of conservative amino acid mismatches between species was significantly higher in slow evolving proteins. We also examined the single nucleotide polymorphisms (SNPs) by using the 1000 Genomes Project data and found that missense SNPs in slow evolving proteins also had a higher fraction of conservative changes, especially for common SNPs, consistent with more natural selection for SNPs, particularly rare ones, in fast evolving proteins. These results suggest that fixed and standing missense variations in slow evolving proteins are more likely to be neutral and hence better qualified for use in phylogenetic inferences.


2015 ◽  
Vol 32 (9) ◽  
pp. 1366-1372 ◽  
Author(s):  
Dmitry Prokopenko ◽  
Julian Hecker ◽  
Edwin K. Silverman ◽  
Marcello Pagano ◽  
Markus M. Nöthen ◽  
...  

2021 ◽  
Author(s):  
Scott T O’Donnell ◽  
Sorel T Fitz-Gibbon ◽  
Victoria L Sork

Abstract Ancient introgression can be an important source of genetic variation that shapes the evolution and diversification of many taxa. Here, we estimate the timing, direction and extent of gene flow between two distantly related oak species in the same section (Quercus sect. Quercus). We estimated these demographic events using genotyping by sequencing data (GBS), which generated 25,702 single nucleotide polymorphisms (SNPs) for 24 individuals of California scrub oak (Quercus berberidifolia) and 23 individuals of Engelmann oak (Q. engelmannii). We tested several scenarios involving gene flow between these species using the diffusion approximation-based population genetic inference framework and model-testing approach of the Python package DaDi. We found that the most likely demographic scenario includes a bottleneck in Q. engelmannii that coincides with asymmetric gene flow from Q. berberidifolia into Q. engelmannii. Given that the timing of this gene flow coincides with the advent of a Mediterranean-type climate in the California Floristic Province, we propose that changing precipitation patterns and seasonality may have favored the introgression of climate-associated genes from the endemic into the non-endemic California oak.


Viruses ◽  
2020 ◽  
Vol 12 (6) ◽  
pp. 625 ◽  
Author(s):  
Jörg T. Wennmann ◽  
Jiangbin Fan ◽  
Johannes A. Jehle

Natural isolates of baculoviruses (as well as other dsDNA viruses) generally consist of homogenous or heterogenous populations of genotypes. The number and positions of single nucleotide polymorphisms (SNPs) from sequencing data are often used as suitable markers to study their genotypic composition. Identifying and assigning the specificities and frequencies of SNPs from high-throughput genome sequencing data can be very challenging, especially when comparing between several sequenced isolates or samples. In this study, the new tool “bacsnp”, written in the R programming language, was developed as a downstream process, enabling the detection of SNP specificities across several virus isolates. The basis of this analysis is the use of a common, closely related reference to which the sequencing reads of an isolate are mapped. Thereby, the specificities of SNPs are linked and their frequencies can be used to analyze the genetic composition across the sequenced isolate. Here, the downstream process and analysis of detected SNP positions are demonstrated using the example of three baculovirus isolates, showing the fast and reliable detection of a mixed sequenced sample.


2018 ◽  
Vol 5 (suppl_1) ◽  
pp. S364-S364
Author(s):  
Roby Bhattacharyya ◽  
Alejandro Pironti ◽  
Bruce J Walker ◽  
Abigail Manson ◽  
Virginia Pierce ◽  
...  

Abstract Background Carbapenem-resistant Enterobacteriaceae (CRE) are a major public health threat. We report four clonally related Citrobacter freundii isolates harboring the blaKPC-3 carbapenemase in April–May 2017 that are nearly identical to a strain from 2014 at the same institution. Despite differing by ≤5 single nucleotide polymorphisms (SNPs), these isolates exhibited dramatic differences in carbapenemase plasmid architecture. Methods We sequenced four carbapenem-resistant C. freundii isolates from 2017 and compared them with an ongoing CRE surveillance project at our institution. SNPs were identified from Illumina MiSeq data aligned to a reference genome using the variant caller Pilon. Plasmids were assembled from Illumina and Oxford Nanopore sequencing data using Unicycler. Results The four 2017 isolates differed from one another by 0–5 chromosomal SNPs; two were identical. With one exception, these isolates differed by >38,000 SNPs from 25 C. freundii isolates sequenced from 2013 to 2017 at the same institution for CRE surveillance. The exception was a 2014 isolate that differed by 13–16 SNPs from each 2017 isolate, with 13 SNPs common to all four. Each C. freundii isolate harbored wild-type blaKPC-3. Despite the close relationship among the 2017 cluster, the plasmids harboring the blaKPC-3 genes differed dramatically: the carbapenemase occurred in one of the two different plasmids, with rearrangements between these plasmids across isolates. The related 2014 isolate harbored both plasmids, each with a separate copy of blaKPC-3. No transmission chains were found between any of the affected patients. Conclusion WGS confirmed clonality among four contemporaneous blaKPC-3-containing C. freundii isolates, and marked similarity with a 2014 isolate, within an institution. That only 13–16 SNPs varied between the 2014 and 2017 isolates suggests durable persistence of the blaKPC-3 gene within this lineage in a hospital ecosystem. 
The plasmids harboring these carbapenemase genes proved remarkably plastic, with plasmid loss and rearrangements occurring on the same time scale as two to three chromosomal point mutations. Combining short and long-read sequencing in a case cluster uniquely revealed unexpectedly rapid dynamics of carbapenemase plasmids, providing critical insight into their manner of spread. Disclosures M. J. Ferraro, SeLux Diagnostics: Scientific Advisor and Shareholder, Consulting fee. D. C. Hooper, SeLux Diagnostics: Scientific Advisor, Consulting fee.
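The SNP differences between isolates reported in the abstract above are pairwise counts of differing sites. As a rough illustration only, the sketch below counts differences between sequences that are assumed to be already aligned to the same reference; it does not reproduce the Pilon-based variant calling used in the study, and the isolate names and sequences are invented.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, missing="N-"):
    """Count sites where two aligned sequences differ, skipping ambiguous or gap sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in missing and b not in missing
    )


def pairwise_distances(isolates):
    """Return {(name1, name2): SNP distance} for every pair of isolates."""
    return {
        (name_a, name_b): snp_distance(seq_a, seq_b)
        for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(isolates.items()), 2)
    }


# Toy data (hypothetical): iso1 and iso2 are identical, iso3 has one SNP and one missing call.
toy_isolates = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGT",
    "iso3": "ACGAACNT",
}
distances = pairwise_distances(toy_isolates)
```

Sites with missing calls are skipped rather than counted as differences, which is one common convention; other pipelines treat them differently.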


2011 ◽  
Vol 300 (4) ◽  
pp. H1530-H1535 ◽  
Author(s):  
Carol Moreno ◽  
Jan M. Williams ◽  
Limin Lu ◽  
Mingyu Liang ◽  
Jozef Lazar ◽  
...  

Transfer of chromosome 13 from the Brown Norway (BN) rat onto the Dahl salt-sensitive (SS) genetic background attenuates the development of hypertension, but the genes involved remain to be identified. The purpose of the present study was to confirm by telemetry that a congenic strain [SS.BN-(D13Hmgc37-D13Got22)/Mcwi, line 5], carrying a 13.4-Mb segment of BN chromosome 13 from position 32.4 to 45.8 Mb, is protected from the development of hypertension and then to narrow the region of interest by creating and phenotyping 11 additional subcongenic strains. Mean arterial pressure (MAP) rose from 118 ± 1 to 186 ± 5 mmHg in SS rats fed a high-salt diet (8.0% NaCl) for 3 wk. Protein excretion increased from 56 ± 11 to 365 ± 37 mg/day. In contrast, MAP only increased to 152 ± 9 mmHg in the line 5 congenic strain. Six subcongenic strains carrying segments of BN chromosome 13 from 32.4 to 38.2 Mb and from 39.9 to 45.8 Mb were not protected from the development of hypertension. In contrast, MAP was reduced by ∼30 mmHg in five strains, carrying a 1.9-Mb common segment of BN chromosome 13 from 38.5 to 40.4 Mb. Proteinuria was reduced by ∼50% in these strains. Sequencing studies did not identify any nonsynonymous single nucleotide polymorphisms in the coding region of the genes in this region. RT-PCR studies indicated that 4 of the 13 genes in this region were differentially expressed in the kidney of two subcongenic strains that were partially protected from hypertension vs. those that were not. These results narrow the region of interest on chromosome 13 from 13.4 Mb (159 genes) to a 1.9-Mb segment containing only 13 genes, of which 4 are differentially expressed in strains partially protected from the development of hypertension.


2018 ◽  
Vol 78 (09) ◽  
pp. 866-870 ◽  
Author(s):  
Marlena Fejzo ◽  
Daria Arzy ◽  
Rayna Tian ◽  
Kimber MacGibbon ◽  
Patrick Mullin

Abstract Introduction Hyperemesis gravidarum (HG), a pregnancy complication characterized by severe nausea and vomiting in pregnancy, occurs in up to 2% of pregnancies. It is associated with both maternal and fetal morbidity. HG is highly heritable and recurs in approximately 80% of women. In a recent genome-wide association study, it was shown that placentation, appetite, and the cachexia gene GDF15 are linked to HG. The purpose of this study was to explore whether GDF15 alleles linked to overexpression of GDF15 protein segregate with the condition in families, and whether the GDF15 risk allele is associated with recurrence of HG. Methods We analyzed GDF15 overexpression alleles for segregation with disease using exome-sequencing data from 5 HG families. We compared the allele frequency of the GDF15 risk allele, rs16982345, in patients who had recurrence of HG with its frequency in those who did not have recurrence. Results Single nucleotide polymorphisms (SNPs) linked to higher levels of GDF15 segregated with disease in HG families. The GDF15 risk allele, rs16982345, was associated with an 8-fold higher risk of recurrence of HG. Conclusion The findings of this study support the hypothesis that GDF15 is involved in the pathogenesis of both familial and recurrent cases of HG. The findings may be applicable when counseling women with a familial history of HG or recurrent HG. The GDF15-GFRAL brainstem-activated pathway was recently identified and therapies to treat conditions of abnormal appetite are under development. Based on our findings, patients carrying GDF15 variants associated with GDF15 overexpression should be included in future studies of GDF15-GFRAL-based therapeutics. If safe, this approach could reduce maternal and fetal morbidity.


2020 ◽  
Vol 98 (6) ◽  
Author(s):  
Andre L S Garcia ◽  
Yutaka Masuda ◽  
Shogo Tsuruta ◽  
Stephen Miller ◽  
Ignacy Misztal ◽  
...  

Abstract Reliable single-nucleotide polymorphism (SNP) effects from genomic best linear unbiased prediction (GBLUP) and single-step GBLUP (ssGBLUP) are needed to calculate indirect predictions (IP) for young genotyped animals and animals not included in official evaluations. Obtaining reliable SNP effects and IP requires a minimum number of animals, and when a large number of genotyped animals are available, the algorithm for proven and young (APY) may be needed. Thus, the objectives of this study were to evaluate IP with an increasingly larger number of genotyped animals and to determine the minimum number of animals needed to compute reliable SNP effects and IP. Genotypes and phenotypes for birth weight, weaning weight, and postweaning gain were provided by the American Angus Association. The number of animals with phenotypes was more than 3.8 million. Genotyped animals were assigned to three cumulative year-classes: born until 2013 (N = 114,937), born until 2014 (N = 183,847), and born until 2015 (N = 280,506). A three-trait model was fitted using the APY algorithm with 19,021 core animals under two scenarios: 1) core 2013 (random sample of animals born until 2013) used for all year-classes and 2) core 2014 (random sample of animals born until 2014) used for year-class 2014 and core 2015 (random sample of animals born until 2015) used for year-class 2015. GBLUP used phenotypes from genotyped animals only, whereas ssGBLUP used all available phenotypes. SNP effects were predicted using genomic estimated breeding values (GEBV) from either all genotyped animals or only core animals. The correlations between GEBV from GBLUP and IP obtained using SNP effects from core 2013 were ≥0.99 for animals born in 2013 but as low as 0.07 for animals born in 2014 and 2015. Conversely, the correlations between GEBV from ssGBLUP and IP were ≥0.99 for animals born in all years.
IP predictive abilities computed with GEBV from ssGBLUP and SNP predictions based on only core animals were as high as those based on all genotyped animals. The correlations between GEBV and IP from ssGBLUP were ≥0.76, ≥0.90, and ≥0.98 when SNP effects were computed using 2k, 5k, and 15k core animals. Suitable IP based on GEBV from GBLUP can be obtained when SNP predictions are based on an appropriate number of core animals, but a considerable decline in IP accuracy can occur in subsequent years. Conversely, IP from ssGBLUP based on large numbers of phenotypes from non-genotyped animals have persistent accuracy over time.
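The abstract above does not state how SNP effects are derived from GEBV. One commonly used backsolving form is sketched here as an assumption, not as the authors' exact procedure. The symbols are introduced for illustration: $\mathbf{Z}$ is the centered genotype matrix, $\mathbf{D}$ a diagonal matrix of SNP weights, $\lambda$ a scaling constant, $\hat{\mathbf{u}}$ the vector of GEBV, and $\mathbf{z}_i$ the centered genotype vector of a new animal $i$.

```latex
\begin{align}
  \mathbf{G} &= \lambda\,\mathbf{Z}\mathbf{D}\mathbf{Z}' \\
  \hat{\mathbf{a}} &= \lambda\,\mathbf{D}\mathbf{Z}'\mathbf{G}^{-1}\hat{\mathbf{u}} \\
  \mathrm{IP}_i &= \mathbf{z}_i'\hat{\mathbf{a}}
\end{align}
```

Under this form, the indirect prediction for a young animal is the sum of its genotype codes weighted by the backsolved SNP effects $\hat{\mathbf{a}}$, so it can be computed without rerunning the full evaluation. In practice $\mathbf{G}$ is usually blended or otherwise adjusted so that it can be inverted.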

