Analysis of Five Deep-sequenced Trio-genomes of the Peninsular Malaysia Orang Asli and North Borneo Populations

2019 ◽  
Author(s):  
Lian Deng ◽  
Haiyi Lou ◽  
Xiaoxi Zhang ◽  
Bhooma Thiruvahindrapuram ◽  
Dongsheng Lu ◽  
...  

Abstract Background: Recent advances in genomic technologies have facilitated genome-wide investigation of human genetic variation. However, most efforts have focused on major populations, and trio genomes of indigenous populations from Southeast Asia remain under-investigated. Results: We analyzed whole-genome deep sequencing data (~30×) of five native trios from Malaysia and discovered approximately 6.9 million single nucleotide variants (SNVs), 1.2 million small insertions and deletions (indels), and 9,000 copy number variants (CNVs) in the 15 samples. We found that 2.7% of SNVs, 2.3% of indels and 22% of CNVs were novel, implying insufficient coverage of population diversity in existing databases. We identified a higher proportion of novel variants in the Orang Asli (OA) samples, i.e., the indigenous people of Peninsular Malaysia, than in the North Bornean (NB) samples, likely due to the more complex demographic history and long-term isolation of the OA groups. We used the pedigree information to identify autosomal de novo variants and estimated the mutation rates to be 0.81×10⁻⁸–1.33×10⁻⁸, 1.0×10⁻⁹–2.9×10⁻⁹, and ~0.001 per site per generation for SNVs, indels, and CNVs, respectively. The trio-genomes also allowed haplotype phasing with high accuracy, which serves as a reference for future genomic studies of OA and NB populations. In addition, high-frequency inherited CNVs specific to OA or NB were identified. One example was a 50-kb duplication in DEFA1B detected only in the Negrito trios, implying plausible effects on host defense against exposure to diverse microbes in the tropical rainforest environment of these hunter-gatherers. The CNVs shared between OA and NB groups were far fewer than those specific to each group. Nevertheless, we identified a 142-kb duplication in AMY1A in all 15 samples; this gene is associated with a high-starch diet. Moreover, novel insertions shared with archaic hominids were identified in our samples. Conclusion: Our study presents a full catalogue of the genome variants of the native Malaysian populations, which complements the known genome diversity of Southeast Asians. It implies a specific population history of the native inhabitants and demonstrates the need for more genome sequencing efforts on the multi-ethnic native groups of Malaysia and Southeast Asia.
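
The abstract reports per-site, per-generation mutation rates but does not state the estimator. A minimal sketch of the estimator commonly used for trio data is given here, assuming the rate is the number of de novo variants in the child divided by twice the number of callable autosomal sites (one haploid genome from each parent). The function name `de_novo_rate` and both input values are hypothetical and are not taken from the paper.

```python
def de_novo_rate(n_de_novo: int, callable_sites: int) -> float:
    """Per-site, per-generation mutation rate estimated from one trio.

    Assumption: the child carries two haploid genome copies, so the count
    of de novo variants is divided by 2 * callable autosomal sites.
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    return n_de_novo / (2 * callable_sites)


# Hypothetical inputs for illustration only (not values from the study).
rate = de_novo_rate(n_de_novo=60, callable_sites=2_600_000_000)
print(f"{rate:.2e}")  # prints 1.15e-08
```

With these illustrative inputs the estimate falls inside the SNV range quoted in the abstract, which is the only sense in which the numbers were chosen.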


2019 ◽  
Author(s):  
Lian Deng ◽  
Haiyi Lou ◽  
Xiaoxi Zhang ◽  
Bhooma Thiruvahindrapuram ◽  
Dongsheng Lu ◽  
...  

Abstract Background: Recent advances in genomic technologies have facilitated genome-wide investigation of human genetic variation. However, most efforts have focused on major populations, and trio genomes of indigenous populations from Southeast Asia remain under-investigated. Results: We analyzed whole-genome deep sequencing data (~30×) of five native trios from Malaysia and discovered approximately 6.9 million single nucleotide variants (SNVs), 1.2 million small insertions and deletions (indels), and 9,000 copy number variants (CNVs) in the 15 samples. We found that 3.9% of SNVs, 4.7% of indels and 22% of CNVs were novel, implying insufficient coverage of population diversity in existing databases. We identified a higher proportion of novel variants in the Orang Asli (OA) samples, i.e., the indigenous people of Peninsular Malaysia, than in the North Bornean (NB) samples, likely due to the more complex demographic history and long-term isolation of the OA groups. We used the pedigree information to identify de novo variants and estimated the mutation rates to be 0.81×10⁻⁸–1.33×10⁻⁸, 1.0×10⁻⁹–2.9×10⁻⁹, and ~0.001 per site per generation for SNVs, indels, and CNVs, respectively. The trio-genomes also allowed haplotype phasing with high accuracy, which serves as a reference for future genomic studies of OA and NB populations. In addition, high-frequency inherited CNVs specific to OA or NB were identified. One example was a 50-kb duplication in DEFA1B detected only in the Negrito trios, implying plausible effects on host defense against exposure to diverse microbes in the tropical rainforest environment of these hunter-gatherers. The CNVs shared between OA and NB groups were far fewer than those specific to each group. Nevertheless, we identified a 142-kb duplication in AMY1A in all 15 samples; this gene is associated with a high-starch diet. Moreover, novel insertions shared with archaic hominids were identified in our samples. Conclusion: Our study presents a full catalogue of the genome variants of the native Malaysian populations, which complements the known genome diversity of Southeast Asians. It implies a specific population history of the native inhabitants and demonstrates the need for more genome sequencing efforts on the multi-ethnic native groups of Malaysia and Southeast Asia.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Lian Deng ◽  
Haiyi Lou ◽  
Xiaoxi Zhang ◽  
Bhooma Thiruvahindrapuram ◽  
Dongsheng Lu ◽  
...  

Abstract Background: Recent advances in genomic technologies have facilitated genome-wide investigation of human genetic variation. However, most efforts have focused on major populations, and trio genomes of indigenous populations from Southeast Asia remain under-investigated. Results: We analyzed whole-genome deep sequencing data (~30×) of five native trios from Peninsular Malaysia and North Borneo, and characterized the genomic variants, including single nucleotide variants (SNVs), small insertions and deletions (indels) and copy number variants (CNVs). We discovered approximately 6.9 million SNVs, 1.2 million indels, and 9,000 CNVs in the 15 samples, of which 2.7% of SNVs, 2.3% of indels and 22% of CNVs were novel, implying insufficient coverage of population diversity in existing databases. We identified a higher proportion of novel variants in the Orang Asli (OA) samples, i.e., the indigenous people of Peninsular Malaysia, than in the North Bornean (NB) samples, likely due to the more complex demographic history and long-term isolation of the OA groups. We used the pedigree information to identify de novo variants and estimated the autosomal mutation rates to be 0.81×10⁻⁸–1.33×10⁻⁸, 1.0×10⁻⁹–2.9×10⁻⁹, and ~0.001 per site per generation for SNVs, indels, and CNVs, respectively. The trio-genomes also allowed haplotype phasing with high accuracy, which serves as a reference for future genomic studies of OA and NB populations. In addition, high-frequency inherited CNVs specific to OA or NB were identified. One example is a 50-kb duplication in DEFA1B detected only in the Negrito trios, implying plausible effects on host defense against exposure to diverse microbes in the tropical rainforest environment of these hunter-gatherers. The CNVs shared between OA and NB groups were far fewer than those specific to each group. Nevertheless, we identified a 142-kb duplication in AMY1A in all 15 samples; this gene is associated with a high-starch diet. Moreover, novel insertions shared with archaic hominids were identified in our samples. Conclusion: Our study presents a full catalogue of the genome variants of the native Malaysian populations, which complements the known genome diversity of Southeast Asians. It implies a specific population history of the native inhabitants and demonstrates the need for more genome sequencing efforts on the multi-ethnic native groups of Malaysia and Southeast Asia.


2018 ◽  
Vol 137 (2) ◽  
pp. 161-173 ◽  
Author(s):  
Chee-Wei Yew ◽  
Dongsheng Lu ◽  
Lian Deng ◽  
Lai-Ping Wong ◽  
Rick Twee-Hee Ong ◽  
...  

2019 ◽  
Vol 37 (4) ◽  
pp. 994-1006 ◽  
Author(s):  
María C Ávila-Arcos ◽  
Kimberly F McManus ◽  
Karla Sandoval ◽  
Juan Esteban Rodríguez-Rodríguez ◽  
Viridiana Villa-Islas ◽  
...  

Abstract Native American genetic variation remains underrepresented in most catalogs of human genome sequencing data. Previous genotyping efforts have revealed that Mexico’s Indigenous population is highly differentiated and substructured, thus potentially harboring higher proportions of private genetic variants of functional and biomedical relevance. Here we have targeted the coding fraction of the genome and characterized its full site frequency spectrum by sequencing 76 exomes from five Indigenous populations across Mexico. Using diffusion approximations, we modeled the demographic history of Indigenous populations from Mexico, with northern and southern ethnic groups splitting 7.2 KYA and subsequently diverging locally 6.5 and 5.7 KYA, respectively. Scans for positive selection revealed the BCL2L13 and KBTBD8 genes as potential candidates for adaptive evolution in Rarámuris and Triquis, respectively. BCL2L13 is highly expressed in skeletal muscle and could be related to physical endurance, a well-known phenotype of the northern Mexico Rarámuri. The KBTBD8 gene has been associated with idiopathic short stature, and we found it to be highly differentiated in Triqui, a southern Indigenous group from Oaxaca whose height is extremely low compared to other Native populations.
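
The site frequency spectrum mentioned in this abstract can be illustrated with a short sketch. It assumes the input is a list of derived-allele counts, one per segregating site, and tabulates how many sites carry each count from 1 to n − 1 among n sampled chromosomes. The function name and the toy data are hypothetical; this is not the authors' pipeline.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Unfolded SFS: entry k-1 is the number of sites whose derived
    allele is seen on exactly k of the n sampled chromosomes."""
    counts = Counter(derived_counts)
    # Only segregating classes 1 .. n-1 are reported; fixed sites are dropped.
    return [counts.get(k, 0) for k in range(1, n_chromosomes)]


# Toy data: six sites observed in a sample of 6 chromosomes (hypothetical).
print(site_frequency_spectrum([1, 1, 2, 5, 1, 3], 6))  # prints [3, 1, 1, 0, 1]
```

For diploid samples the number of chromosomes is twice the number of individuals, so 76 exomes would correspond to 152 chromosomes if every individual were called at a site.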


2020 ◽  
Author(s):  
Xiufeng Huang ◽  
Zi-Yang Xia ◽  
Xiaoyun Bin ◽  
Guanglin He ◽  
Jianxin Guo ◽  
...  

Abstract Southern China is the birthplace of rice-cultivating agriculture, of different language families, and of the human migrations that facilitated these cultural diffusions. The fine-scale demographic history in situ, however, remains unclear. To comprehensively cover the genetic diversity in East and Southeast Asia, we generated genome-wide SNP data from 211 present-day Southern Chinese and co-analyzed them with more than 1,200 ancient and modern genomes. We discover that the previously described ‘Southern East Asian’ or ‘Yangtze River Farmer’ lineage is monophyletic but not homogeneous, comprising four regionally differentiated sub-ancestries. These ancestries are respectively responsible for the transmission of Austronesian, Kra-Dai, Hmong-Mien, and Austroasiatic languages, and their original homelands were distributed successively from east to west in Southern China. Multiple phylogenetic analyses support that the earliest-branching living lineage among East Asian-related populations is First Americans (∼27,700 BP), followed by the pre-LGM differentiation between Northern and Southern East Asians (∼23,400 BP) and the pre-Neolithic split between Coastal and Inland Southern East Asians (∼16,400 BP). In North China, distinct coastal and inland routes of south-to-north gene flow had been established by the Holocene, and further migration and admixture formed the genetic profile of Sinitic speakers by ∼4,000 BP. Four subsequent massive migrations finalized the genetic structure of present-day Southern Chinese. First, a southward Sinitic migration and admixture with Kra-Dai speakers formed the ‘Sinitic Cline’. Second, a bi-directional admixture between Hmong-Mien and Kra-Dai speakers gave rise to the ‘Hmong-Mien Cline’ in the interior of South China between ∼2,000 and ∼1,000 BP. Third, a southwestward migration of Kra-Dai speakers in the last ∼2,000 years impacted the genetic profile of the majority of Mainland Southeast Asians. Finally, an admixture between Tibeto-Burman incomers and indigenous Austroasiatic speakers formed the Tibeto-Burman speakers in Southeast Asia by ∼2,000 BP.


2019 ◽  
Author(s):  
María C. Ávila-Arcos ◽  
Kimberly F. McManus ◽  
Karla Sandoval ◽  
Juan Esteban Rodríguez-Rodríguez ◽  
Alicia R. Martin ◽  
...  

Abstract Native American genetic variation remains underrepresented in most catalogs of human genome sequencing data. Previous genotyping efforts have revealed that Mexico’s indigenous population is highly differentiated and substructured, thus potentially harboring higher proportions of private genetic variants of functional and biomedical relevance. Here we have targeted the coding fraction of the genome and characterized its full site frequency spectrum by sequencing 76 exomes from five indigenous populations across Mexico. Using diffusion approximations, we modeled the demographic history of indigenous populations from Mexico, with northern and southern ethnic groups splitting 7.2 kya and subsequently diverging locally 6.5 kya and 5.7 kya, respectively. Scans for positive selection revealed the BCL2L13 and KBTBD8 genes as potential candidates for adaptive evolution in Rarámuris and Triquis, respectively. BCL2L13 is highly expressed in skeletal muscle and could be related to physical endurance, a well-known phenotype of the northern Mexico Rarámuri. The KBTBD8 gene has been associated with idiopathic short stature, and we found it to be highly differentiated in Triqui, a southern indigenous group from Oaxaca whose height is extremely low compared to other native populations.


Genetics ◽  
2000 ◽  
Vol 155 (3) ◽  
pp. 1429-1437
Author(s):  
Oliver G Pybus ◽  
Andrew Rambaut ◽  
Paul H Harvey

Abstract We describe a unified set of methods for the inference of demographic history using genealogies reconstructed from gene sequence data. We introduce the skyline plot, a graphical, nonparametric estimate of demographic history. We discuss both maximum-likelihood parameter estimation and demographic hypothesis testing. Simulations are carried out to investigate the statistical properties of maximum-likelihood estimates of demographic parameters. The simulations reveal that (i) the performance of exponential growth model estimates is determined by a simple function of the true parameter values and (ii) under some conditions, estimates from reconstructed trees perform as well as estimates from perfect trees. We apply our methods to HIV-1 sequence data and find strong evidence that subtypes A and B have different demographic histories. We also provide the first (albeit tentative) genetic evidence for a recent decrease in the growth rate of subtype B.
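
The abstract names the skyline plot without giving its formula. As a hedged illustration, the classic skyline estimate as it is usually stated multiplies each coalescent interval length by the number of lineage pairs present during that interval, k(k − 1)/2. The sketch below assumes contemporaneous sampling and interval lengths ordered from the tips (n lineages) toward the root (2 lineages); the function name and the example interval lengths are hypothetical.

```python
def classic_skyline(interval_lengths):
    """Piecewise-constant population size estimates, one per coalescent interval.

    interval_lengths[j] is the time during which n - j lineages exist,
    where n = len(interval_lengths) + 1 is the number of sampled tips.
    Each estimate is interval_length * k * (k - 1) / 2, in the same
    time units as the interval lengths.
    """
    n = len(interval_lengths) + 1
    estimates = []
    for j, gamma in enumerate(interval_lengths):
        k = n - j                    # lineages present in this interval
        pairs = k * (k - 1) // 2     # number of lineage pairs that could coalesce
        estimates.append(gamma * pairs)
    return estimates


# Hypothetical genealogy with 5 tips and four coalescent intervals.
for size in classic_skyline([0.1, 0.2, 0.5, 1.0]):
    print(round(size, 3))
```

Plotting these step values against cumulative time gives the graphical, nonparametric summary the abstract describes; the maximum-likelihood and hypothesis-testing parts of the method are not covered by this sketch.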

