Estimating Genetic Kin Relationships in Prehistoric Populations

A likelihood ratio approach for identifying three-quarter siblings in genetic databases

Heredity ◽

10.1038/s41437-020-00392-8 ◽

2021 ◽

Author(s):

Iván Galván-Femenía ◽

Carles Barceló-Vidal ◽

Lauro Sumoy ◽

Victor Moreno ◽

Rafael de Cid ◽

...

Keyword(s):

Likelihood Ratio ◽

Family Relationships ◽

Genetic Research ◽

Data Set ◽

Scientific Disciplines ◽

Genome Wide ◽

Standard Quality ◽

Control Procedures ◽

Identity By State ◽

Second Degree

Abstract. The detection of family relationships in genetic databases is of interest in various scientific disciplines such as genetic epidemiology, population and conservation genetics, forensic science, and genealogical research. Nowadays, screening genetic databases for related individuals forms an important aspect of standard quality control procedures. Relatedness research is usually based on an allele sharing analysis of identity by state (IBS) or identity by descent (IBD) alleles. Existing IBS/IBD methods mainly aim to identify first-degree (parent–offspring or full sibling) and second-degree (half-sibling, avuncular, or grandparent–grandchild) pairs. Little attention has been paid to the detection of relationships that fall between the first and second degree, such as three-quarter siblings (3/4S), who share fewer alleles than first-degree relatives but more alleles than second-degree relatives. With the progressively increasing sample sizes used in genetic research, it becomes more likely that such relationships are present in the database under study. In this paper, we extend existing likelihood ratio (LR) methodology to accurately infer the existence of 3/4S, distinguishing them from full siblings and second-degree relatives. We use bootstrap confidence intervals to express uncertainty in the LRs. Our proposal accounts for linkage disequilibrium (LD) by using marker pruning, and we validate our methodology with a pedigree-based simulation study accounting for both LD and recombination. An empirical genome-wide array data set from the GCAT Genomes for Life cohort project is used to illustrate the method.
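To make the likelihood ratio idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes biallelic markers coded as allele counts (0, 1, 2), markers that are independent after LD pruning, no genotyping error, and the textbook IBD-sharing probabilities (k0, k1, k2) for each relationship; these values, and the names `K`, `joint_prob`, `log10_lr` and `bootstrap_ci`, are illustrative assumptions rather than anything quoted from the paper. The bootstrap here simply resamples markers and reports a percentile interval.

```python
import math
import random
from itertools import product

# Assumed probabilities of sharing 0, 1 or 2 alleles IBD (k0, k1, k2).
# Standard textbook values, not taken from the paper itself.
K = {
    "FS": (0.25, 0.5, 0.25),      # full siblings
    "3/4S": (0.375, 0.5, 0.125),  # three-quarter siblings
    "2nd": (0.5, 0.5, 0.0),       # half-sibling, avuncular, grandparent-grandchild
}


def joint_prob(g1, g2, p, k):
    """P(g1, g2 | relationship) at one biallelic marker, with 0 < p < 1.

    g1 and g2 are counts of the allele whose population frequency is p.
    The probability is a mixture over the three IBD states, computed by
    enumerating the distinct alleles involved in each state.
    """
    freq = {1: p, 0: 1.0 - p}
    total = 0.0
    # IBD = 0: four independent alleles
    for a, b, c, d in product((0, 1), repeat=4):
        if a + b == g1 and c + d == g2:
            total += k[0] * freq[a] * freq[b] * freq[c] * freq[d]
    # IBD = 1: one shared allele s plus one private allele per individual
    for s, a, b in product((0, 1), repeat=3):
        if s + a == g1 and s + b == g2:
            total += k[1] * freq[s] * freq[a] * freq[b]
    # IBD = 2: both individuals carry the same two alleles
    for a, b in product((0, 1), repeat=2):
        if a + b == g1 == g2:
            total += k[2] * freq[a] * freq[b]
    return total


def marker_terms(pairs, freqs, k_num, k_den):
    """Per-marker log10 likelihood ratios for one pair of individuals."""
    return [
        math.log10(joint_prob(g1, g2, p, k_num) / joint_prob(g1, g2, p, k_den))
        for (g1, g2), p in zip(pairs, freqs)
    ]


def log10_lr(pairs, freqs, k_num, k_den):
    """Genome-wide log10 LR, treating markers as independent."""
    return sum(marker_terms(pairs, freqs, k_num, k_den))


def bootstrap_ci(pairs, freqs, k_num, k_den, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval obtained by resampling markers."""
    rng = random.Random(seed)
    terms = marker_terms(pairs, freqs, k_num, k_den)
    stats = sorted(sum(rng.choices(terms, k=len(terms))) for _ in range(n_boot))
    low = stats[int(alpha / 2 * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Toy data: genotype pairs at five markers with assumed allele frequencies.
pairs = [(2, 2), (1, 1), (0, 1), (1, 2), (0, 0)]
freqs = [0.3, 0.5, 0.2, 0.4, 0.1]
print(log10_lr(pairs, freqs, K["3/4S"], K["FS"]))
print(bootstrap_ci(pairs, freqs, K["3/4S"], K["FS"]))
```

A positive log10 LR favours the numerator relationship (here 3/4S) over the denominator (full siblings). With only a handful of markers the interval is wide, which is consistent with the abstract's emphasis on genome-wide data.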

G2P: a Genome-Wide-Association-Study simulation tool for genotype simulation, phenotype simulation and power evaluation

Bioinformatics ◽

10.1093/bioinformatics/btz126 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3852-3854 ◽

Cited By ~ 3

Author(s):

You Tang ◽

Xiaolei Liu

Keyword(s):

Association Study ◽

Genome Wide Association Study ◽

Genome Wide Association ◽

Supplementary Information ◽

Maximum Efficiency ◽

Genotype Data ◽

Simulation Tool ◽

Genome Wide ◽

Phenotype Data ◽

Power Evaluation

Abstract. Motivation: Plenty of Genome-Wide-Association-Study (GWAS) methods have been developed for mapping genetic markers that are associated with human diseases and agricultural economic traits. Computer simulation is a useful tool for testing the performance of various GWAS methods under given scenarios. Existing tools are either inefficient in terms of computation and memory or inconvenient to use for simulating big, realistic genotype and phenotype data to evaluate available GWAS methods. Results: Here, we present a GWAS simulation tool named G2P that can be used to simulate genotype data and phenotype data and to perform power evaluation of GWAS methods. G2P is a user-friendly tool with all functions provided in both graphical user interface and pipeline manners, and it is available for Windows, Mac and Linux environments. Furthermore, G2P achieves maximum efficiency in terms of both memory usage and simulation speed; with G2P, the simulation of genotype data that includes 1 000 000 samples and 2 000 000 markers can be accomplished in 5 h. Availability and implementation: The G2P software, user manual, and example datasets are freely available at GitHub: https://github.com/XiaoleiLiuBio/G2P. Supplementary information: Supplementary data are available at Bioinformatics online.
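As a rough illustration of what genotype and phenotype simulation involves, here is a small sketch that is unrelated to G2P's actual code or interface. It assumes Hardy-Weinberg genotypes at independent markers, a purely additive trait with a few randomly chosen causal markers, and Gaussian noise scaled to reach a target heritability; the function name `simulate` and all parameters are hypothetical.

```python
import random
import statistics


def simulate(n_samples, n_markers, n_causal, h2, seed=0):
    """Simulate 0/1/2 genotypes and an additive phenotype (illustrative only)."""
    rng = random.Random(seed)
    # Assumed allele frequencies, drawn uniformly between 0.05 and 0.5
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_markers)]
    # Each genotype is the sum of two independent allele draws (Hardy-Weinberg)
    geno = [
        [(rng.random() < p) + (rng.random() < p) for p in freqs]
        for _ in range(n_samples)
    ]
    # Pick causal markers and give each a normally distributed effect
    causal = rng.sample(range(n_markers), n_causal)
    effects = {m: rng.gauss(0.0, 1.0) for m in causal}
    genetic = [sum(effects[m] * row[m] for m in causal) for row in geno]
    # Scale the noise so that var(genetic) / var(phenotype) is close to h2
    var_g = statistics.pvariance(genetic)
    sd_e = (var_g * (1.0 - h2) / h2) ** 0.5
    pheno = [g + rng.gauss(0.0, sd_e) for g in genetic]
    return geno, pheno, genetic


geno, pheno, genetic = simulate(n_samples=2000, n_markers=20, n_causal=5, h2=0.5, seed=1)
realized_h2 = statistics.pvariance(genetic) / statistics.pvariance(pheno)
print(round(realized_h2, 2))
```

Power evaluation would then amount to running a GWAS method on many such simulated data sets and counting how often the causal markers are recovered; that step is omitted here.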

Genome‐wide SNP typing of ancient DNA: Determination of hair and eye color of Bronze Age humans from their skeletal remains

American Journal of Physical Anthropology ◽

10.1002/ajpa.23996 ◽

2020 ◽

Vol 172 (1) ◽

pp. 99-109 ◽

Cited By ~ 1

Author(s):

Nicole Schmidt ◽

Katharina Schücker ◽

Ina Krause ◽

Thilo Dörk ◽

Michael Klintschar ◽

...

Keyword(s):

Bronze Age ◽

Ancient Dna ◽

Skeletal Remains ◽

Eye Color ◽

Snp Typing ◽

Genome Wide

OPTIMIR, a novel algorithm for integrating available genome-wide genotype data into miRNA sequence alignment analysis

10.1101/479097 ◽

2018 ◽

Author(s):

Florian Thibord ◽

Claire Perret ◽

Maguelonne Roux ◽

Pierre Suchon ◽

Marine Germain ◽

...

Keyword(s):

Mirna Sequence ◽

Biological Knowledge ◽

Genotype Data ◽

Link Type ◽

Genome Wide ◽

Heterozygous Carriers ◽

Alignment Analysis ◽

Mirna Editing ◽

Novel Algorithm ◽

Alignment Step

Abstract. Next-generation sequencing is an increasingly popular and efficient approach to characterize the full set of microRNAs (miRNAs) present in human biosamples. Detection and quantification of miRNAs remain a challenge, as miRNAs can undergo different post-transcriptional modifications and might harbor genetic variations (polymiRs) that may affect the alignment step. We present a novel algorithm, OPTIMIR, that incorporates biological knowledge on miRNA editing and genome-wide genotype data available in the processed samples to improve alignment accuracy. OPTIMIR was applied to 391 human plasma samples that had been typed with genome-wide genotyping arrays. OPTIMIR was able to detect genotyping errors, suggested the existence of novel miRNAs and highlighted the allelic imbalance expression of polymiRs in heterozygous carriers. OPTIMIR is written in Python and is freely available on the GENMED website (http://www.genmed.fr/index.php/fr/) and on GitHub (github.com/FlorianThibord/OptimiR).

Genome-wide genetic data on ~500,000 UK Biobank participants

10.1101/166298 ◽

2017 ◽

Cited By ~ 303

Author(s):

Clare Bycroft ◽

Colin Freeman ◽

Desislava Petkova ◽

Gavin Band ◽

Lloyd T. Elliott ◽

...

Keyword(s):

Quality Control ◽

Allelic Variation ◽

Association Studies ◽

Genetic Data ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genotype Data ◽

Uk Biobank ◽

Genome Wide ◽

Wide Range

Abstract. The UK Biobank project is a large prospective cohort study of ~500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Here we describe the genome-wide genotype data (~805,000 markers) collected on all individuals in the cohort and its quality control procedures. Genotype data on this scale offers novel opportunities for assessing quality issues, although the wide range of ancestries of the individuals in the cohort also creates particular challenges. We also conducted a set of analyses that reveal properties of the genetic data – such as population structure and relatedness – that can be important for downstream analyses. In addition, we phased and imputed genotypes into the dataset, using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resource. This increases the number of testable variants by over 100-fold to ~96 million variants. We also imputed classical allelic variation at 11 human leukocyte antigen (HLA) genes, and as a quality control check of this imputation, we replicate signals of known associations between HLA alleles and many common diseases. We describe tools that allow efficient genome-wide association studies (GWAS) of multiple traits and fast phenome-wide association studies (PheWAS), which work together with a new compressed file format that has been used to distribute the dataset. As a further check of the genotyped and imputed datasets, we performed a test-case genome-wide association scan on a well-studied human trait, standing height.

A genome-wide association study for host resistance to Ostreid Herpesvirus in Pacific oysters (Crassostrea gigas)

10.1101/223032 ◽

2017 ◽

Author(s):

Alejandro P. Gutierrez ◽

Tim P. Bean ◽

Chantelle Hooper ◽

Craig A. Stenton ◽

Matthew B. Sanders ◽

...

Keyword(s):

Host Resistance ◽

Selective Breeding ◽

Genome Wide Association Study ◽

Pacific Oyster ◽

Genetic Resistance ◽

Genome Wide Association ◽

Genotype Data ◽

Oyster Aquaculture ◽

Genome Wide ◽

A Genome

Abstract. Ostreid herpesvirus (OsHV) can cause mass mortality events in Pacific oyster aquaculture. While various factors impact on the severity of outbreaks, it is clear that genetic resistance of the host is an important determinant of mortality levels. This raises the possibility of selective breeding strategies to improve the genetic resistance of farmed oyster stocks, thereby contributing to disease control. Traditional selective breeding can be augmented by use of genetic markers, either via marker-assisted or genomic selection. The aim of the current study was to investigate the genetic architecture of resistance to OsHV in Pacific oyster, to identify genomic regions containing putative resistance genes, and to inform the use of genomics to enhance efforts to breed for resistance. To achieve this, a population of ~1,000 juvenile oysters was experimentally challenged with a virulent form of OsHV, with samples taken from mortalities and survivors for genotyping and qPCR measurement of viral load. The samples were genotyped using a recently developed SNP array, and the genotype data were used to reconstruct the pedigree. Using these pedigree and genotype data, the first high-density linkage map was constructed for Pacific oyster, containing 20,353 SNPs mapped to the ten pairs of chromosomes. Genetic parameters for resistance to OsHV were estimated, indicating a significant but low heritability for the binary trait of survival and also for viral load measures (h² 0.12–0.25). A genome-wide association study highlighted a region of linkage group 6 containing a significant QTL affecting host resistance. These results are an important step towards identification of genes underlying resistance to OsHV in oyster, and a step towards applying genomic data to enhance selective breeding for disease resistance in oyster aquaculture.

VCF2PopTree: a one-click client-side software to construct population phylogeny from genome-wide SNPs

10.7287/peerj.preprints.27682 ◽

2019 ◽

Author(s):

Sankar Subramanian ◽

Umayal Ramasamy ◽

David Chen

Keyword(s):

Phylogenetic Trees ◽

Large Scale ◽

Web Applications ◽

Third Party ◽

Genotype Data ◽

Whole Genome ◽

Genome Data ◽

Genome Wide ◽

Software Programs ◽

Computationally Intensive

In the past decades a number of software programs have been developed to deduce the phylogenetic relationship between populations. However, these programs are not suited for large-scale whole genome data. Recently, a few standalone or web applications have been developed to handle genome-wide data, but they were either computationally intensive, dependent on third-party software or required significant time and resources of a web server. In the post-genomic era, researchers are able to obtain bioinformatically processed, high-quality, publication-ready whole genome data for many individuals in a population from next generation sequencing companies, owing to the reduction in the cost of sequencing and analysis. Such genotype data is typically presented in the Variant Call Format (VCF), and there is no simple software available that uses this data to construct the phylogeny of populations in a short time. To address this limitation, we have developed a one-click, user-friendly software, VCF2PopTree, that uses genome-wide SNPs to construct and display phylogenetic trees in seconds to minutes. For example, it reads a 1 GB VCF file and draws a tree in less than 5 minutes. VCF2PopTree accepts genotype data from a local machine, constructs a tree using UPGMA and Neighbour-Joining algorithms and displays it in a web browser. It also produces a pairwise-diversity matrix in MEGA and PHYLIP file formats as well as trees in the Newick format, which can be used directly by other popular phylogenetic software programs. The software, including the source code, a test VCF input file and short documentation, is available at: https://github.com/sansubs/vcf2pop.
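The step from a pairwise-diversity matrix to a Newick tree can be sketched as follows. This is a generic, minimal UPGMA written for illustration, not VCF2PopTree's own code; the function name `upgma_newick`, the population labels and the toy distances are assumptions made for the example.

```python
def upgma_newick(names, matrix):
    """Build a UPGMA tree from a symmetric distance matrix and return Newick text."""
    # Each cluster stores (newick label, number of leaves, height of its root)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {
        (i, j): matrix[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        (i, j), dij = min(dist.items(), key=lambda item: item[1])
        (label_i, size_i, height_i) = clusters.pop(i)
        (label_j, size_j, height_j) = clusters.pop(j)
        height = dij / 2.0
        label = (
            f"({label_i}:{height - height_i:.3f},"
            f"{label_j}:{height - height_j:.3f})"
        )
        # Distance to every remaining cluster is the size-weighted average
        merged = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            merged[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(merged)
        clusters[next_id] = (label, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


# Toy pairwise distances between three hypothetical populations
names = ["PopA", "PopB", "PopC"]
matrix = [
    [0.0, 2.0, 6.0],
    [2.0, 0.0, 6.0],
    [6.0, 6.0, 0.0],
]
tree = upgma_newick(names, matrix)
print(tree)  # (PopC:3.000,(PopA:1.000,PopB:1.000):2.000);
```

UPGMA assumes a constant rate of divergence across lineages, which is one reason tools commonly offer Neighbour-Joining as an alternative.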

VCF2PopTree: a one-click client-side software to construct population phylogeny from genome-wide SNPs

10.7287/peerj.preprints.27682v1 ◽

2019 ◽

Author(s):

Sankar Subramanian ◽

Umayal Ramasamy ◽

David Chen

Keyword(s):

Phylogenetic Trees ◽

Large Scale ◽

Web Applications ◽

Third Party ◽

Genotype Data ◽

Whole Genome ◽

Genome Data ◽

Genome Wide ◽

Software Programs ◽

Computationally Intensive

In the past decades a number of software programs have been developed to deduce the phylogenetic relationship between populations. However, these programs are not suited for large-scale whole genome data. Recently, a few standalone or web applications have been developed to handle genome-wide data, but they were either computationally intensive, dependent on third-party software or required significant time and resources of a web server. In the post-genomic era, researchers are able to obtain bioinformatically processed, high-quality, publication-ready whole genome data for many individuals in a population from next generation sequencing companies, owing to the reduction in the cost of sequencing and analysis. Such genotype data is typically presented in the Variant Call Format (VCF), and there is no simple software available that uses this data to construct the phylogeny of populations in a short time. To address this limitation, we have developed a one-click, user-friendly software, VCF2PopTree, that uses genome-wide SNPs to construct and display phylogenetic trees in seconds to minutes. For example, it reads a 1 GB VCF file and draws a tree in less than 5 minutes. VCF2PopTree accepts genotype data from a local machine, constructs a tree using UPGMA and Neighbour-Joining algorithms and displays it in a web browser. It also produces a pairwise-diversity matrix in MEGA and PHYLIP file formats as well as trees in the Newick format, which can be used directly by other popular phylogenetic software programs. The software, including the source code, a test VCF input file and short documentation, is available at: http://sankarsubramanian.net/dat/index.html.

Towards a new history and geography of human genes informed by ancient DNA

10.1101/003517 ◽

2014 ◽

Cited By ~ 1

Author(s):

Joseph Pickrell ◽

David Reich

Keyword(s):

Natural Selection ◽

Ancient Dna ◽

Population Replacement ◽

Local Conditions ◽

Geographic Locations ◽

Technological Advances ◽

Genome Wide ◽

Human Genes ◽

Genome Wide Data ◽

History Of

Genetic information contains a record of the history of our species, and technological advances have transformed our ability to access this record. Many studies have used genome-wide data from populations today to learn about the peopling of the globe and subsequent adaptation to local conditions. Implicit in this research is the assumption that the geographic locations of people today are informative about the geographic locations of their ancestors in the distant past. However, it is now clear that long-range migration, admixture and population replacement have been the rule rather than the exception in human history. In light of this, we argue that it is time to critically re-evaluate current views of the peopling of the globe and the importance of natural selection in determining the geographic distribution of phenotypes. We specifically highlight the transformative potential of ancient DNA. By accessing the genetic make-up of populations living at archaeologically-known times and places, ancient DNA makes it possible to directly track migrations and responses to natural selection.

Polygenic Prediction of Complex Traits with Iterative Screen Regression Models

10.1101/2020.11.29.402180 ◽

2020 ◽

Author(s):

Meng Luo ◽

Shiliang Gu

Keyword(s):

Genetic Variants ◽

Complex Traits ◽

Regression Models ◽

Association Studies ◽

Prediction Methods ◽

Genome Wide Association Studies ◽

Genotype Data ◽

Genetic Prediction ◽

Genome Wide ◽

Genome Prediction

Abstract. Although genome-wide association studies have successfully identified thousands of markers associated with various complex traits and diseases, our ability to predict such phenotypes remains limited. A perhaps overlooked explanation lies in the limitations of the genetic models and statistical techniques commonly used in association studies. However, using individual genotype data to perform accurate genetic prediction of complex traits can promote genomic selection in animal and plant breeding and can lead to the development of personalized medicine in humans. Because most complex traits have a polygenic architecture, accurate genetic prediction often requires modeling genetic variants together via polygenic methods. Here, we apply our proposed polygenic method, which we refer to as the iterative screen regression model (ISR), to genome prediction. We compared ISR with several commonly used prediction methods in simulations. We further applied ISR to predicting 15 traits across five species: cattle, rice, wheat, maize, and mice. The results of the study indicate that the ISR method performs better than several commonly used polygenic methods and is more stable.