The Identification of a 1916 Irish Rebel: New Approach for Estimating Relatedness From Low Coverage Homozygous Genomes

Mapping Intimacies ◽

10.1101/076992 ◽

2016 ◽

Author(s):

Daniel Fernandes ◽

Kendra Sirak ◽

Mario Novak ◽

John Finarelli ◽

John Byrne ◽

...

Keyword(s):

Ancient Dna ◽

Simulated Data ◽

Forensic Analysis ◽

Nucleotide Polymorphisms ◽

Traditional Methods ◽

Single Nucleotide ◽

Record Keeping ◽

Easter Rising ◽

Novel Approach ◽

Low Coverage

Thomas Kent was an Irish rebel who was executed by British forces in the aftermath of the Easter Rising armed insurrection of 1916 and buried in a shallow grave on the grounds of Cork prison. In 2015, ninety-nine years after his death, a state funeral was offered to his living family to honor his role in the struggle for Irish independence. However, inaccuracies in record keeping meant that the remains believed to be Kent's could not be identified with absolute certainty. Using a novel approach based on homozygous single nucleotide polymorphisms, we identified these remains as those of Kent by comparing his genetic data with those of two known living relatives. Because the degradation of Kent's DNA, characteristic of ancient DNA, rendered traditional methods of relatedness estimation unusable, we forced all loci to be homozygous, in a process we refer to as the “forced homozygote approach”. The results were confirmed using simulated data for different relatedness classes. We argue that this method provides a necessary alternative for relatedness estimation, not only in forensic analysis but also in ancient DNA studies, where reduced amounts of genetic information can limit the application of traditional methods.
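The general idea of forcing every locus to a homozygous call before comparing two individuals can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes biallelic SNPs with known population allele frequencies, a pseudo-haploid call made by sampling one read allele at random per site, and a Queller and Goodnight-style relatedness estimator averaged over both directions of the pair. The names `forced_homozygous_call` and `relatedness` are hypothetical.

```python
import random
from typing import List, Optional

def forced_homozygous_call(ref_reads: int, alt_reads: int,
                           rng: random.Random) -> Optional[int]:
    """Force a homozygous call at one SNP by sampling a single read allele.

    Returns 0 (reference), 1 (alternative) or None when the site has no reads.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return None
    return 0 if rng.random() < ref_reads / total else 1

def relatedness(calls_x: List[Optional[int]], calls_y: List[Optional[int]],
                alt_freqs: List[float]) -> float:
    """Symmetric Queller and Goodnight-style estimate from forced homozygous calls.

    alt_freqs holds the population frequency of the alternative allele per SNP
    (assumed strictly between 0 and 1). Only sites called in both individuals count.
    """
    def one_way(a, b):
        num = den = 0.0
        for ga, gb, q in zip(a, b, alt_freqs):
            if ga is None or gb is None:
                continue  # skip sites missing in either individual
            p_a = q if ga == 1 else 1.0 - q  # population frequency of a's allele
            num += (1.0 if gb == ga else 0.0) - p_a
            den += 1.0 - p_a
        return num / den if den > 0 else float("nan")
    # Average both directions so the estimate does not depend on pair order.
    return (one_way(calls_x, calls_y) + one_way(calls_y, calls_x)) / 2
```

Because each call keeps only one allele, two independent forced calls from the same heterozygous genome can disagree, so the estimate for a given relationship does not equal its usual diploid expectation; cut-offs per relatedness class would need calibrating, for example with simulated pairs.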

A NOTE ON PHASING LONG GENOMIC REGIONS USING LOCAL HAPLOTYPE PREDICTIONS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720006002272 ◽

2006 ◽

Vol 04 (03) ◽

pp. 639-647 ◽

Cited By ~ 6

Author(s):

ELEAZAR ESKIN ◽

RODED SHARAN ◽

ERAN HALPERIN

Keyword(s):

Large Scale ◽

Computational Cost ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Novel Approach ◽

Maximum Likelihood Criterion ◽

The Common ◽

Genomic Regions ◽

High Computational Cost ◽

Combining Information

The common approaches for haplotype inference from genotype data target the phasing of short genomic regions. Longer regions are often tackled heuristically because of the high computational cost. Here, we describe a novel approach for phasing genotypes over long regions, which combines information from local predictions on short, overlapping regions. The phasing is done so as to maximize a natural maximum likelihood criterion that, among other things, takes into account the physical distance between neighboring single nucleotide polymorphisms. The approach is very efficient, has been applied to several large-scale datasets, and was shown to be successful in two recent benchmarking studies (Zaitlen et al., in press; Marchini et al., in preparation). Our method is publicly available via a webserver at .
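To illustrate the idea of combining local predictions on overlapping regions, here is a minimal Python sketch. It swaps in a simpler greedy rule (orient each new window to agree best with the sequence built so far in the overlap) in place of the paper's maximum likelihood criterion, and it ignores physical distance between SNPs. Haplotypes are assumed to be 0/1 lists, and `stitch_windows` is a hypothetical name.

```python
from typing import List, Tuple

Haplotypes = Tuple[List[int], List[int]]

def stitch_windows(windows: List[Tuple[int, Haplotypes]]) -> Haplotypes:
    """Join locally phased, overlapping windows into one long-range phase.

    windows: (start_index, (hap1, hap2)) pairs sorted by start; the first window
    starts at SNP 0 and each later window must overlap the sequence built so far.
    """
    _, (h1, h2) = windows[0]
    out1, out2 = list(h1), list(h2)
    for start, (w1, w2) in windows[1:]:
        overlap = len(out1) - start
        if overlap <= 0:
            raise ValueError("windows must overlap")
        # Agreement in the overlap if the window is kept as-is vs. swapped;
        # homozygous sites match in both cases and so do not bias the choice.
        keep = sum(out1[start + i] == w1[i] and out2[start + i] == w2[i]
                   for i in range(overlap))
        flip = sum(out1[start + i] == w2[i] and out2[start + i] == w1[i]
                   for i in range(overlap))
        if flip > keep:
            w1, w2 = w2, w1
        # Append only the SNPs that lie beyond the overlap.
        out1.extend(w1[overlap:])
        out2.extend(w2[overlap:])
    return out1, out2
```

For example, a second window starting at SNP 2 whose first heterozygous site is phased opposite to the first window gets swapped before its remaining SNPs are appended.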

Comparison of single-nucleotide polymorphisms and microsatellite markers for linkage analysis in the COGA and simulated data sets for Genetic Analysis Workshop 14: Presentation Groups 1, 2, and 3

Genetic Epidemiology ◽

10.1002/gepi.20106 ◽

2005 ◽

Vol 29 (S1) ◽

pp. S7-S28 ◽

Cited By ~ 22

Author(s):

Marsha A. Wilcox ◽

Elizabeth W. Pugh ◽

Heping Zhang ◽

Xiaoyun Zhong ◽

Douglas F. Levinson ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Linkage Analysis ◽

Genetic Analysis ◽

Microsatellite Markers ◽

Genetic Analysis Workshop ◽

Simulated Data ◽

Data Sets ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Simulated Data Sets

Fabrication and Application of Single Nucleotide Polymorphisms Library on Magnetic Nanoparticles Using Adaptor PCR

Journal of Nanoscience and Nanotechnology ◽

10.1166/jnn.2008.18146 ◽

2008 ◽

Vol 8 (1) ◽

pp. 405-409 ◽

Cited By ~ 3

Author(s):

Hongna Liu ◽

Song Li ◽

Meiju Ji ◽

Libo Nie ◽

Jianrong Chen ◽

...

Keyword(s):

Magnetic Nanoparticles ◽

Single Nucleotide Polymorphisms ◽

Reliable Method ◽

Nucleotide Polymorphisms ◽

Probe Signal ◽

Single Nucleotide ◽

Novel Approach ◽

Mismatch Probe ◽

Allele Specific ◽

Agt Gene

We have developed a novel approach to fabricate a single nucleotide polymorphism (SNP) library on magnetic nanoparticles (MNPs) based on adaptor PCR. Each SNP locus in the library was genotyped by hybridization with a pair of allele-specific dual-color fluorescence (Cy3, Cy5) probes. Two SNP loci (M235T and A-6G) in the angiotensinogen (AGT) gene, associated with essential hypertension, were detected by this method and their fluorescent signals were quantified. The fluorescence ratios (match probe : mismatch probe signal) of homozygous genotypes were over 3.0, whereas heterozygous genotypes had ratios near 1.0. Requiring no complex multiplex PCR procedure, the method offers a simple, efficient and reliable way to detect multiple SNPs using limited amounts of DNA from individuals.
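The ratio rule described above can be turned into a simple genotype caller. This Python sketch is hypothetical: it assumes the Cy3 probe matches allele A and the Cy5 probe matches allele B, takes the 3.0 homozygous threshold from the abstract, and uses an assumed 0.5 to 2.0 band for calling heterozygotes; real thresholds would need to be set from control samples.

```python
def call_genotype(cy3: float, cy5: float, hom_ratio: float = 3.0,
                  het_band: tuple = (0.5, 2.0)) -> str:
    """Call AA, AB or BB from Cy3 (allele A) and Cy5 (allele B) probe signals."""
    if cy3 <= 0 or cy5 <= 0:
        return "no call"
    ratio = cy3 / cy5
    # Homozygous: the matching probe outshines the mismatch probe by > hom_ratio.
    if ratio > hom_ratio:
        return "AA"
    if ratio < 1.0 / hom_ratio:
        return "BB"  # here Cy5/Cy3, the match:mismatch ratio, exceeds hom_ratio
    # Heterozygous: both probes hybridize, so the ratio sits near 1.0.
    if het_band[0] <= ratio <= het_band[1]:
        return "AB"
    return "no call"  # ambiguous ratio between the bands
```

A sample with Cy3 = 4000 and Cy5 = 1000 would be called AA, while one with roughly equal signals would be called AB.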

TKGWV2: An ancient DNA relatedness pipeline for ultra-low coverage whole genome shotgun data

10.1101/2021.06.22.449449 ◽

2021 ◽

Author(s):

Daniel M Fernandes ◽

Olivia Cheronet ◽

Pere Gelabert ◽

Ron Pinhasi

Keyword(s):

Ancient Dna ◽

Simulated Data ◽

Error Rates ◽

Whole Genome Shotgun ◽

Published Data ◽

Whole Genome ◽

False Positive Error ◽

Kinship Analysis ◽

Methodological Improvement ◽

Low Coverage

Estimating genetic relatedness between individuals is playing an increasingly important role in the ancient DNA field. In recent years, the number of sequenced individuals from single sites has been increasing, reflecting a growing interest in understanding the familial and social organisation of ancient populations. Although a few methods have been developed specifically for ancient DNA, designed to tackle issues such as low-coverage homozygous data, they require a minimum average genomic coverage of 0.1-1x per analysed pair of individuals. Here we present an updated version of a method that enables estimates of 1st- and 2nd-degree relatedness with as little as 0.026x average coverage, or around 1.3 million aligned reads per sample, roughly four times less data than 0.1x. By using simulated data to estimate false positive error rates, we further show that even at a threshold as low as 0.012x, or around 600,000 reads, 1st-degree relationships are always detected as related. Lastly, by applying this method to published data, we identify previously undocumented relationships among individuals that had been excluded from kinship analysis because of their very low coverage. This methodological improvement has the potential to enable relatedness estimation on ancient whole genome shotgun data during routine low-coverage screening, and therefore to improve project management when deciding which individuals to sequence further.
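The coverage and read-count figures above are linked by coverage = reads × mean read length / genome size. The Python sketch below makes that arithmetic explicit. Both constants are assumptions not stated in the abstract: a 3.1 Gb genome and a mean aligned read length of 62 bp, the latter chosen because it makes the abstract's 0.026x ≈ 1.3 million and 0.012x ≈ 600,000 figures mutually consistent.

```python
GENOME_SIZE_BP = 3.1e9  # assumed haploid human genome size
MEAN_READ_LEN_BP = 62   # assumed mean aligned read length (hypothetical)

def coverage_from_reads(n_reads: float, read_len: float = MEAN_READ_LEN_BP,
                        genome: float = GENOME_SIZE_BP) -> float:
    """Average depth = total aligned bases / genome size."""
    return n_reads * read_len / genome

def reads_for_coverage(coverage: float, read_len: float = MEAN_READ_LEN_BP,
                       genome: float = GENOME_SIZE_BP) -> float:
    """Number of aligned reads needed to reach a given average depth."""
    return coverage * genome / read_len

for cov in (0.1, 0.026, 0.012):
    # roughly 5.0 million, 1.3 million and 0.6 million reads, respectively
    print(f"{cov}x ~ {reads_for_coverage(cov):,.0f} reads")
```

Under these assumptions 0.1x corresponds to about 5 million reads, and 0.1 / 0.026 ≈ 3.8, which is the "roughly four times less data" quoted above.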

A Novel Approach to Exploring Potential Interactions among Single-Nucleotide Polymorphisms of Inflammation Genes in Gliomagenesis: An Exploratory Case-Only Study

Cancer Epidemiology Biomarkers & Prevention ◽

10.1158/1055-9965.epi-11-0203 ◽

2011 ◽

Vol 20 (8) ◽

pp. 1683-1689 ◽

Cited By ~ 6

Author(s):

E. Susan Amirian ◽

Michael E. Scheurer ◽

Yanhong Liu ◽

Anthony M. D'Amelio ◽

Richard S. Houlston ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Novel Approach ◽

Potential Interactions

EpiGEN: an epistasis simulation pipeline

Bioinformatics ◽

10.1093/bioinformatics/btaa245 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4957-4959

Author(s):

David B Blumenthal ◽

Lorenzo Viola ◽

Markus List ◽

Jan Baumbach ◽

Paolo Tieri ◽

...

Keyword(s):

Arbitrary Order ◽

Association Studies ◽

Simulated Data ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Supplementary Data ◽

Single Nucleotide ◽

Genome Wide

Summary: Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies. Existing simulators are limited: they do not account for linkage disequilibrium (LD), support only limited interaction models of single nucleotide polymorphisms (SNPs) and only dichotomous phenotypes, or depend on proprietary software. In contrast, EpiGEN supports SNP interactions of arbitrary order, produces realistic LD patterns and generates both categorical and quantitative phenotypes. Availability and implementation: EpiGEN is implemented in Python 3 and is freely available at https://github.com/baumbachlab/epigen. Supplementary information: Supplementary data are available at Bioinformatics online.
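As a rough picture of what an epistasis simulator produces, the Python sketch below generates SNP genotypes and a quantitative phenotype driven by a pairwise interaction. It is not EpiGEN's API and is far simpler: SNPs are drawn independently under Hardy-Weinberg proportions (so there is no LD), only second-order interactions are modelled, and the names and parameters (`simulate_pairwise_epistasis`, `mafs`, `effect`, `noise_sd`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

def simulate_pairwise_epistasis(n_individuals: int, mafs: Sequence[float],
                                pair: Tuple[int, int] = (0, 1), effect: float = 1.0,
                                noise_sd: float = 1.0, seed: int = 0
                                ) -> Tuple[List[List[int]], List[float]]:
    """Simulate 0/1/2 genotypes and a phenotype driven by one SNP-SNP interaction."""
    rng = random.Random(seed)
    genotypes, phenotypes = [], []
    i, j = pair
    for _ in range(n_individuals):
        # Minor-allele count per SNP: two independent draws (no LD between SNPs).
        g = [sum(rng.random() < maf for _ in range(2)) for maf in mafs]
        # The only genetic term is the product of the two interacting SNPs' counts.
        y = effect * g[i] * g[j] + rng.gauss(0.0, noise_sd)
        genotypes.append(g)
        phenotypes.append(y)
    return genotypes, phenotypes
```

A categorical phenotype could be derived from the same model by thresholding `y`, which mirrors the distinction between categorical and quantitative phenotypes drawn above.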

A benchmark of transposon insertion detection tools using real data

Mobile DNA ◽

10.1186/s13100-019-0197-9 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 10

Author(s):

Pol Vendrell-Mir ◽

Fabio Barteri ◽

Miriam Merenciano ◽

Josefa González ◽

Josep M. Casacuberta ◽

...

Keyword(s):

Simulated Data ◽

Real Data ◽

Good Precision ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Bioinformatic Tools ◽

Transposon Insertion ◽

Manual Curation ◽

Tool Performance ◽

Transposon Insertions

Background: Transposable elements (TEs) are an important source of genomic variability in eukaryotic genomes. Their activity affects genome architecture and gene expression and can lead to drastic phenotypic changes. Therefore, identifying TE polymorphisms is key to better understanding the link between genotype and phenotype. However, most genotype-to-phenotype analyses have concentrated on single nucleotide polymorphisms, as these are easier to detect reliably using short-read data. Many bioinformatic tools have been developed to identify transposon insertions from short-read resequencing data. Nevertheless, the performance of most of these tools has been tested using simulated insertions, which do not accurately reproduce the complexity of natural insertions. Results: We have overcome this limitation by building a dataset of insertions from the comparison of two high-quality rice genomes, followed by extensive manual curation. This dataset contains validated insertions of two very different types of TEs, LTR-retrotransposons and MITEs. Using this dataset, we have benchmarked the sensitivity and precision of 12 commonly used tools, and our results suggest that their sensitivity was generally overestimated in previous evaluations based on simulated data. Our results also show that increasing coverage leads to better sensitivity, but at a cost in precision. Moreover, we found important differences in tool performance, with some tools performing better on specific types of TEs. We have also used two sets of experimentally validated insertions in Drosophila and humans and show that this trend holds in genomes of different size and complexity. Conclusions: We discuss the choice of tools depending on the goals of the study and show that an appropriate combination of tools could be an option for most approaches, increasing sensitivity while maintaining good precision.
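Benchmarking an insertion caller against a curated set comes down to matching predictions to validated insertions and computing sensitivity (fraction of validated insertions recovered) and precision (fraction of predictions that are correct). The Python sketch below assumes insertions are `(chromosome, position)` tuples and that a prediction within a tolerance window (a hypothetical 100 bp default) of an unmatched validated insertion is a true positive; the benchmark's actual matching rules may differ.

```python
from typing import List, Tuple

Insertion = Tuple[str, int]

def benchmark(predicted: List[Insertion], validated: List[Insertion],
              tolerance: int = 100) -> Tuple[float, float]:
    """Return (sensitivity, precision) of predicted vs. validated insertions."""
    unmatched = list(validated)
    tp = 0
    for chrom, pos in predicted:
        for k, (vchrom, vpos) in enumerate(unmatched):
            if chrom == vchrom and abs(pos - vpos) <= tolerance:
                tp += 1
                del unmatched[k]  # each validated insertion matches at most once
                break
    sensitivity = tp / len(validated) if validated else float("nan")
    precision = tp / len(predicted) if predicted else float("nan")
    return sensitivity, precision
```

Combining tools, as the conclusions suggest, could be evaluated the same way by passing the union (or a consensus) of several tools' predictions as `predicted`.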

Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism–calling pipelines

GigaScience ◽

10.1093/gigascience/giaa007 ◽

2020 ◽

Vol 9 (2) ◽

Cited By ~ 17

Author(s):

Stephen J Bush ◽

Dona Foster ◽

David W Eyre ◽

Emily L Clark ◽

Nicola De Maio ◽

...

Keyword(s):

Reference Genome ◽

Simulated Data ◽

Real Data ◽

Genomic Diversity ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Single Nucleotide ◽

Snp Calling ◽

Single Nucleotide Polymorphism Calling ◽

Nucleotide Divergence

Background: Accurately identifying single-nucleotide polymorphisms (SNPs) from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of the resulting set of SNP calls. This study evaluates the performance of 209 SNP-calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia, and Klebsiella. Results: We evaluated the performance of 209 SNP-calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, there was a strong inverse relationship between pipeline sensitivity and precision on the one hand and the Mash distance (a proxy for average nucleotide divergence) between reads and reference genome on the other. The effect was especially pronounced for diverse, recombinogenic bacteria such as Escherichia coli but less so for clonal species such as Mycobacterium tuberculosis. Conclusions: The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, among the highest-performing pipelines was Novoalign/GATK. By contrast, when reads were aligned to particularly divergent genomes, the highest-performing pipelines often used the aligners NextGenMap or SMALT, and/or the variant callers LoFreq, mpileup, or Strelka.

Adjusted likelihood-ratio test for variants with unknown genotypes

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720018400206 ◽

2018 ◽

Vol 16 (05) ◽

pp. 1840020 ◽

Cited By ~ 2

Author(s):

Ronald J. Nowling ◽

Scott J. Emrich

Keyword(s):

Likelihood Ratio ◽

Likelihood Ratio Test ◽

Malaria Vectors ◽

Similar Species ◽

Ratio Test ◽

Anopheles Coluzzii ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Novel Approach ◽

Lr Test

Association tests performed with the Likelihood-Ratio Test (LR Test) can be an alternative to F_ST, which is often used in population genetics to find variants of interest. Because the LR Test has several properties that could make it preferable to F_ST, we propose a novel approach for modeling unknown genotypes in highly similar species. To show the effectiveness of this LR Test approach, we apply it to single-nucleotide polymorphisms (SNPs) associated with the recent speciation of the malaria vectors Anopheles gambiae and Anopheles coluzzii and compare it to F_ST.
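For reference, a standard (unadjusted) likelihood-ratio test for an allele-frequency difference at one SNP between two populations can be sketched in Python as below. It compares a model with one pooled allele frequency against a model with a separate frequency per population and refers twice the log-likelihood difference to a chi-square distribution with 1 degree of freedom. This does not implement the paper's adjustment for unknown genotypes; `allele_lr_test` and its count-tuple input are hypothetical.

```python
import math
from typing import Tuple

def _binom_loglik(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood without the constant; 0 * log(0) is taken as 0."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def allele_lr_test(counts_a: Tuple[int, int],
                   counts_b: Tuple[int, int]) -> Tuple[float, float]:
    """LR test that two populations share one alternative-allele frequency.

    counts_*: (reference_count, alternative_count). Returns (statistic, p-value).
    """
    ka, na = counts_a[1], sum(counts_a)
    kb, nb = counts_b[1], sum(counts_b)
    pooled = (ka + kb) / (na + nb)
    # Alternative model: each population has its own maximum-likelihood frequency.
    ll_alt = _binom_loglik(ka, na, ka / na) + _binom_loglik(kb, nb, kb / nb)
    # Null model: both populations share the pooled frequency.
    ll_null = _binom_loglik(ka, na, pooled) + _binom_loglik(kb, nb, pooled)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))  # clamp tiny negative rounding
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

Unlike F_ST, which is a descriptive measure of differentiation, this yields a p-value per SNP, which is one reason an LR-based test can be preferable for finding variants of interest.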

Detecting Inversions with PCA in the Presence of Population Structure

10.1101/736900 ◽

2019 ◽

Author(s):

Ronald J. Nowling ◽

Krystal R. Manke ◽

Scott J. Emrich

Keyword(s):

Simulated Data ◽

Principal Component ◽

Real Data ◽

Malaria Vectors ◽

Anopheles Coluzzii ◽

Nucleotide Polymorphisms ◽

Data Set ◽

Single Nucleotide ◽

Closely Related Species ◽

Proper Analysis

Chromosomal inversions are associated with reproductive isolation and adaptation in insects such as Drosophila melanogaster and the malaria vectors Anopheles gambiae and Anopheles coluzzii. While methods based on read alignment have been useful for detecting inversions in humans, they are less successful in insects because of long repeated sequences at the breakpoints. Alternatively, inversions can be detected using principal component analysis (PCA) of single nucleotide polymorphisms (SNPs). We apply PCA-based inversion detection to a simulated data set and to real data from multiple insect species, ranging in complexity from a single inversion in samples drawn from one population to multiple overlapping inversions in closely related species sampled from multiple geographic locations. We show empirically that proper analysis of these data can be challenging when multiple inversions or populations are present, and that our alternative framework is more robust in these more difficult scenarios.
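The basic signal behind PCA-based inversion detection can be illustrated with the easiest case mentioned above, a single inversion in one population. In the Python/NumPy sketch below, samples carry 0, 1 or 2 inverted copies, SNPs inside the inversion have different allele frequencies on standard and inverted haplotypes (a hypothetical 0.1 vs. 0.9), and PC1 of the centered genotype matrix then separates the three karyotype groups. The simulation parameters and function names are assumptions, and population structure, the harder case the authors address, is deliberately left out.

```python
import numpy as np

def pca_scores(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Principal component scores of a samples x SNPs matrix of 0/1/2 counts."""
    centered = genotypes - genotypes.mean(axis=0)  # center each SNP
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def simulate_inversion(n_samples: int = 60, n_snps: int = 200, seed: int = 0):
    """Toy genotypes for SNPs inside one inversion (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    karyotype = rng.integers(0, 3, size=n_samples)  # inverted copies per sample
    geno = np.zeros((n_samples, n_snps))
    for i, k in enumerate(karyotype):
        for hap in range(2):
            p = 0.9 if hap < k else 0.1  # alt-allele frequency on this haplotype
            geno[i] += rng.random(n_snps) < p
    return geno, karyotype
```

Plotting PC1 of `pca_scores(geno)` would show three clusters (standard homozygotes, heterozygotes, inverted homozygotes); with several populations, PC1 may instead capture ancestry, which is why such data need more careful analysis.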