Detecting Genetic Ancestry and Adaptation in the Taiwanese Han People

Abstract The Taiwanese people are composed of diverse indigenous populations and the Taiwanese Han. About 95% of the Taiwanese identify themselves as Taiwanese Han, but this may not be a homogeneous population because they migrated to the island from various regions of continental East Asia over a period of 400 years. Little is known about the underlying patterns of genetic ancestry, population admixture, and evolutionary adaptation in the Taiwanese Han people. Here, we analyzed the whole-genome single-nucleotide polymorphism genotyping data from 14,401 individuals of Taiwanese Han collected by the Taiwan Biobank and the whole-genome sequencing data for a subset of 772 people. We detected four major genetic ancestries with distinct geographic distributions (i.e., Northern, Southeastern, Japonic, and Island Southeast Asian ancestries) and signatures of population mixture contributing to the genomes of Taiwanese Han. We further scanned for signatures of positive natural selection that caused unusually long-range haplotypes and elevations of hitchhiked variants. As a result, we identified 16 candidate loci in which selection signals can be unambiguously localized at five single genes: CTNNA2, LRP1B, CSNK1G3, ASTN2, and NEO1. Statistical associations were examined in 16 metabolic-related traits to further elucidate the functional effects of each candidate gene. All five genes appear to have pleiotropic connections to various types of disease susceptibility and significant associations with at least one metabolic-related trait. Together, our results provide critical insights for understanding the evolutionary history and adaption of the Taiwanese Han population.

Download Full-text

Risk prediction and marker selection in nonsynonymous single nucleotide polymorphisms using whole genome sequencing data

Animal Cells and Systems ◽

10.1080/19768354.2020.1860125 ◽

2020 ◽

Vol 24 (6) ◽

pp. 321-328

Author(s):

Young-Sup Lee ◽

KyeongHye Won ◽

Donghyun Shin ◽

Jae-Don Oh

Keyword(s):

Single Nucleotide Polymorphisms ◽

Whole Genome Sequencing ◽

Risk Prediction ◽

Genome Sequencing ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Single Nucleotide ◽

Marker Selection

Download Full-text

Improvement of genomic prediction by integrating additional single nucleotide polymorphisms selected from imputed whole genome sequencing data

Heredity ◽

10.1038/s41437-019-0246-7 ◽

2019 ◽

Vol 124 (1) ◽

pp. 37-49 ◽

Cited By ~ 9

Author(s):

Aoxing Liu ◽

Mogens Sandø Lund ◽

Didier Boichard ◽

Emre Karaman ◽

Sebastien Fritz ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genomic Prediction ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Single Nucleotide

Download Full-text

In silico detection of phylogenetic informative Y-chromosomal single nucleotide polymorphisms from whole genome sequencing data

Electrophoresis ◽

10.1002/elps.201300459 ◽

2014 ◽

Vol 35 (21-22) ◽

pp. 3102-3110 ◽

Cited By ~ 5

Author(s):

Anneleen Van Geystelen ◽

Tom Wenseleers ◽

Ronny Decorte ◽

Maarten J. L. Caspers ◽

Maarten H. D. Larmuseau

Keyword(s):

Single Nucleotide Polymorphisms ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

In Silico ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Single Nucleotide

Download Full-text

Genetic ancestry plays a central role in population pharmacogenomics

Communications Biology ◽

10.1038/s42003-021-01681-6 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Hsin-Chou Yang ◽

Chia-Wei Chen ◽

Yu-Ting Lin ◽

Shih-Kai Chu

Keyword(s):

Principal Component ◽

Genetic Ancestry ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Ancestry Informative Markers ◽

Sequencing Data ◽

Whole Genome Analysis ◽

Protein Coding ◽

Functional Variant ◽

Multiple Drugs

AbstractRecent studies have pointed out the essential role of genetic ancestry in population pharmacogenetics. In this study, we analyzed the whole-genome sequencing data from The 1000 Genomes Project (Phase 3) and the pharmacogenetic information from Drug Bank, PharmGKB, PharmaADME, and Biotransformation. Here we show that ancestry-informative markers are enriched in pharmacogenetic loci, suggesting that trans-ancestry differentiation must be carefully considered in population pharmacogenetics studies. Ancestry-informative pharmacogenetic loci are located in both protein-coding and non-protein-coding regions, illustrating that a whole-genome analysis is necessary for an unbiased examination over pharmacogenetic loci. Finally, those ancestry-informative pharmacogenetic loci that target multiple drugs are often a functional variant, which reflects their importance in biological functions and pathways. In summary, we develop an efficient algorithm for an ultrahigh-dimensional principal component analysis. We create genetic catalogs of ancestry-informative markers and genes. We explore pharmacogenetic patterns and establish a high-accuracy prediction panel of genetic ancestry. Moreover, we construct a genetic ancestry pharmacogenomic database Genetic Ancestry PhD (http://hcyang.stat.sinica.edu.tw/databases/genetic_ancestry_phd/).

Download Full-text

Assessment of software for somatic single nucleotide variant identification using simulated whole-genome sequencing data of cancer

10.18699/mm-hpc-bbb-2018-25 ◽

2018 ◽

pp. 34-34

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Single Nucleotide Variant ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Single Nucleotide ◽

Variant Identification

Download Full-text

From whole genome sequencing data toward a simple genotyping tool: application to the animal pathogen Mycobacterium bovis

10.26226/morressier.56d5ba2ad462b80296c965c0 ◽

2016 ◽

Author(s):

Lorraine Michelet

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Mycobacterium Bovis ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data

Download Full-text

Plasmids or no plasmids? A comparison between the agilent TapeStation and whole-genome sequencing data in a large-scale bacterial sequencing project

10.26226/morressier.56d5ba27d462b80296c95fe7 ◽

2016 ◽

Author(s):

Sarah Alexander

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Project

Download Full-text

Development and evaluation of an outbreak surveillance system integrating whole genome sequencing data for non-typhoidal Salmonella in London and South East of England, 2016-17

Epidemiology and Infection ◽

10.1017/s0950268821001400 ◽

2021 ◽

pp. 1-26

Author(s):

Karthik Paranthaman ◽

Piers Mook ◽

Daniele Curtis ◽

Edward-Wynne Evans ◽

Emma Crawley-Boevey ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Surveillance System ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data

Download Full-text

Whole genome sequencing reveals high differentiation, low levels of genetic diversity and short runs of homozygosity among Swedish wels catfish

Heredity ◽

10.1038/s41437-021-00438-5 ◽

2021 ◽

Author(s):

Axel Jensen ◽

Mette Lillie ◽

Kristofer Bergström ◽

Per Larsson ◽

Jacob Höglund

Keyword(s):

Genetic Diversity ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome Sequencing Data ◽

Peripheral Populations ◽

Whole Genome ◽

Runs Of Homozygosity ◽

Sequencing Data ◽

Isolated Populations ◽

Native Populations

AbstractThe use of genetic markers in the context of conservation is largely being outcompeted by whole-genome data. Comparative studies between the two are sparse, and the knowledge about potential effects of this methodology shift is limited. Here, we used whole-genome sequencing data to assess the genetic status of peripheral populations of the wels catfish (Silurus glanis), and discuss the results in light of a recent microsatellite study of the same populations. The Swedish populations of the wels catfish have suffered from severe declines during the last centuries and persists in only a few isolated water systems. Fragmented populations generally are at greater risk of extinction, for example due to loss of genetic diversity, and may thus require conservation actions. We sequenced individuals from the three remaining native populations (Båven, Emån, and Möckeln) and one reintroduced population of admixed origin (Helge å), and found that genetic diversity was highest in Emån but low overall, with strong differentiation among the populations. No signature of recent inbreeding was found, but a considerable number of short runs of homozygosity were present in all populations, likely linked to historically small population sizes and bottleneck events. Genetic substructure within any of the native populations was at best weak. Individuals from the admixed population Helge å shared most genetic ancestry with the Båven population (72%). Our results are largely in agreement with the microsatellite study, and stresses the need to protect these isolated populations at the northern edge of the distribution of the species.

Download Full-text

Comprehensive identification of transposable element insertions using multiple sequencing technologies

Nature Communications ◽

10.1038/s41467-021-24041-8 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Chong Chu ◽

Rebeca Borges-Monroy ◽

Vinayak V. Viswanadham ◽

Soohyun Lee ◽

Heng Li ◽

...

Keyword(s):

Transposable Element ◽

Structure And Function ◽

Endogenous Retroviruses ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Sequencing Technologies ◽

Long Read ◽

And Function

AbstractTransposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.

Download Full-text