The Origin of a Coastal Indigenous Horse Breed in China Revealed by Genome-Wide SNP Data

The Jinjiang horse is a unique Chinese indigenous horse breed distributed in the southern coastal areas, but its ancestry is not well understood. Here, we used Equine SNP70 Bead Array technology to genotype 301 horses representing 10 Chinese indigenous horse breeds, and we integrated the published genotype data of 352 individuals from 14 foreign horse breeds to study the relationships between Jinjiang horses and horse breeds from around the world. Principal component analysis (PCA), linkage disequilibrium (LD) analysis, runs of homozygosity (ROH) analysis, and ancestry estimation methods were used to study the population relationships, ancestral sources, and genetic structure of Jinjiang horses. The results showed that there is no close relationship between foreign horse breeds and Jinjiang horses, and that Jinjiang horses share a similar genetic background with Baise horses. TreeMix analysis revealed gene flow from Chakouyi horses to Jinjiang horses. The ancestry analysis showed that Baise horses and Chakouyi horses are the most closely related ancestors of Jinjiang horses. In conclusion, our results showed that Jinjiang horses have a native origin and that Baise horses and Chakouyi horses were key ancestral sources of Jinjiang horses. The study also suggested that ancient trade activities and human migration had important effects on indigenous horse breeds in China.

Genome-wide SNP data of Izumo and Makurazaki populations support inner-dual structure model for origin of Yamato people

Journal of Human Genetics ◽ 10.1038/s10038-020-00898-3 ◽ 2021

Author(s): Timothy Jinam ◽ Yosuke Kawai ◽ Yoichiro Kamatani ◽ Shunro Sonoda ◽ Kanro Makisumi ◽ ...

Keyword(s): Phylogenetic Network ◽ Principal Component ◽ Human Populations ◽ Structure Model ◽ Rice Farming ◽ Asian Continent ◽ Dual Structure ◽ SNP Data ◽ Genome Wide ◽ Kagoshima Prefecture

Abstract: The “Dual Structure” model on the formation of the modern Japanese population assumes that the indigenous hunter-gatherer population (symbolized as Jomon people) admixed with a rice-farming population (symbolized as Yayoi people) who migrated from the Asian continent after the Yayoi period started. The Jomon component remained high both in Ainu and Okinawa people, who mainly reside in northern and southern Japan, respectively, while the Yayoi component is higher in the mainland Japanese (Yamato people). The model has been well supported by genetic data, but the Yamato population was mostly represented by people from the Tokyo area. We generated new genome-wide SNP data using Japonica Array for 45 individuals in Izumo City of Shimane Prefecture and for 72 individuals in Makurazaki City of Kagoshima Prefecture in Southern Kyushu, and compared these data with those of other human populations in East Asia, including BioBank Japan data. Using principal component analysis, phylogenetic network, and f4 tests, we found that Izumo, Makurazaki, and Tohoku populations are slightly differentiated from Kanto (including Tokyo), Tokai, and Kinki regions. These results suggest that the substructure within Mainland Japanese may be caused by multiple migration events from the Asian continent following the Jomon period, and we propose a modified version of the “Dual Structure” model called the “Inner-Dual Structure” model.
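
The abstract names f4 tests without giving a formula. As a minimal sketch, assuming the standard definition of the statistic (the mean over SNPs of the product of two allele-frequency differences), it could be computed as below. The function name and inputs are illustrative and not taken from the paper; significance testing, which is usually done with a block jackknife, is omitted.

```python
import numpy as np


def f4_statistic(freq_a, freq_b, freq_c, freq_d):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD).

    Each argument is a 1-D sequence of allele frequencies, one per SNP,
    for one population (hypothetical inputs, same SNP order in all four).
    """
    a, b, c, d = (np.asarray(x, dtype=float)
                  for x in (freq_a, freq_b, freq_c, freq_d))
    return float(np.mean((a - b) * (c - d)))
```

A value near zero is what one would expect when populations A and B form a clade relative to C and D; a consistent deviation from zero suggests shared drift or gene flow across the two pairs.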

Genomic comparisons of Persian Kurdish, Persian Arabian and American Thoroughbred horse populations

PLoS ONE ◽ 10.1371/journal.pone.0247123 ◽ 2021 ◽ Vol 16 (2) ◽ pp. e0247123

Author(s): Navid Yousefi-Mashouf ◽ Hassan Mehrabani-Yeganeh ◽ Ardeshir Nejati-Javaremi ◽ Ernest Bailey ◽ Jessica L. Petersen

Keyword(s): Principal Component ◽ Conservation Strategies ◽ Horse Population ◽ SNP Data ◽ Thoroughbred Horses ◽ Genome Wide ◽ Distinct Cluster ◽ Shared Ancestry ◽ Genetic Clusters ◽ Cluster 2

The present research aimed to characterize the Persian Kurdish horse population relative to the Persian Arabian and American Thoroughbred populations using genome-wide SNP data. Fifty-eight Kurdish, 38 Persian Arabian and 83 Thoroughbred horses were genotyped across 670,796 markers. After quality control and pruning to eliminate linkage disequilibrium between loci which resulted in 13,554 SNPs in 52 Kurdish, 24 Persian Arabian and 58 Thoroughbred horses, the Kurdish horses were generally distinguished from the Persian Arabian samples by Principal Component Analyses, cluster analyses and calculation of pairwise FST. Both Persian breeds were discriminated from the Thoroughbred. Pairwise FST between the two Persian samples (0.013) was significantly greater than zero and several fold less than those found between the Thoroughbred and Kurdish (0.052) or Thoroughbred and Persian Arabian (0.057). Cluster analysis assuming three genetic clusters assigned the Kurdish horse and Thoroughbred to distinct clusters (0.942 in cluster 2 and 0.953 in cluster 3 respectively); the Persian Arabian was not in a distinct cluster (0.519 in cluster 1), demonstrating shared ancestry or recent admixture with the Kurdish breed. Diversity as quantified by expected heterozygosity was the highest in the Kurdish horse (0.342), followed by the Persian Arabian (0.328) and the Thoroughbred (0.326). Analysis of Molecular Variance showed that 4.47% of the genetic variation was present among populations (P<0.001). Population-specific inbreeding indices (FIS) were not significantly different from zero in any of the populations. Analysis of individual inbreeding based on runs of homozygosity using a larger SNP set suggested greater diversity in both the Kurdish and Persian Arabian than in the Thoroughbred. These results have implications for developing conservation strategies to achieve sound breeding goals while maintaining genetic diversity.
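
The abstract reports expected heterozygosity and pairwise FST but does not say which estimators were used. The sketch below is therefore only an illustration under simple assumptions: biallelic SNPs coded as 0/1/2 alternate-allele counts, no missing data, no sample-size correction, and Wright's FST computed as a ratio of sums, 1 - ΣH_S / ΣH_T. All function names are hypothetical.

```python
import numpy as np


def allele_freqs(genotypes):
    """Alternate-allele frequency per SNP from an (individuals x SNPs) 0/1/2 matrix."""
    return np.asarray(genotypes, dtype=float).mean(axis=0) / 2.0


def expected_heterozygosity(genotypes):
    """Mean of 2p(1 - p) over SNPs (no small-sample correction)."""
    p = allele_freqs(genotypes)
    return float(np.mean(2.0 * p * (1.0 - p)))


def pairwise_fst(geno_a, geno_b):
    """Wright's FST between two populations, equal weights per population."""
    p_a, p_b = allele_freqs(geno_a), allele_freqs(geno_b)
    # Average within-population expected heterozygosity per SNP.
    h_s = (2.0 * p_a * (1.0 - p_a) + 2.0 * p_b * (1.0 - p_b)) / 2.0
    # Expected heterozygosity of the pooled (mean) allele frequency per SNP.
    p_t = (p_a + p_b) / 2.0
    h_t = 2.0 * p_t * (1.0 - p_t)
    total = h_t.sum()
    if total == 0.0:
        # Every SNP is fixed for the same allele in both populations.
        return 0.0
    return float(1.0 - h_s.sum() / total)
```

Published analyses typically rely on estimators that correct for sample size (for example Weir and Cockerham's), so values from this sketch would not be expected to reproduce the figures quoted in the abstract.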

Fast Principal Component Analysis of Large-Scale Genome-Wide Data

10.1101/002238 ◽ 2014 ◽ Cited By ~ 2

Author(s): Gad Abraham ◽ Michael Inouye

Keyword(s): Principal Component Analysis ◽ Large Scale ◽ Principal Component ◽ Component Analysis ◽ Single Nucleotide ◽ SNP Data ◽ Genome Wide ◽ Genome Wide Data ◽ Eigen Decomposition ◽ Traditional Approaches

Principal component analysis (PCA) is routinely used to analyze genome-wide single-nucleotide polymorphism (SNP) data, for detecting population structure and potential outliers. However, the size of SNP datasets has increased immensely in recent years, and PCA of large datasets has become a time-consuming task. We have developed flashpca, a highly efficient PCA implementation based on randomized algorithms, which delivers identical accuracy in extracting the top principal components compared with existing tools, in substantially less time. We demonstrate the utility of flashpca on both HapMap3 and a large Immunochip dataset. For the latter, flashpca performed PCA of 15,000 individuals up to 125 times faster than existing tools, with identical results, and PCA of 150,000 individuals using flashpca completed in 4 hours. The increasing size of SNP datasets will make tools such as flashpca essential, as traditional approaches will not adequately scale. This approach will also help to scale other applications that leverage PCA or eigen-decomposition to substantially larger datasets.
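
To illustrate the general idea of randomized PCA mentioned in the abstract, here is a minimal NumPy sketch of a randomized range finder with power iterations. It is not flashpca's implementation: the function name, the parameters, and the choice to standardize each SNP by its sample standard deviation are all assumptions made for this example.

```python
import numpy as np


def randomized_pca(genotypes, n_components=2, n_oversamples=10,
                   n_power_iter=4, seed=0):
    """Approximate the top principal components of an (individuals x SNPs) matrix."""
    rng = np.random.default_rng(seed)
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                    # center each SNP
    std = X.std(axis=0)
    X = X[:, std > 0] / std[std > 0]          # drop monomorphic SNPs, then scale

    # Project onto a small random subspace instead of decomposing X directly.
    k = min(n_components + n_oversamples, min(X.shape))
    Y = X @ rng.standard_normal((X.shape[1], k))

    # Power iterations sharpen the subspace; QR keeps it numerically stable.
    for _ in range(n_power_iter):
        Y, _ = np.linalg.qr(Y)
        Z, _ = np.linalg.qr(X.T @ Y)
        Y = X @ Z
    Q, _ = np.linalg.qr(Y)

    # Exact SVD of the much smaller projected matrix.
    B = Q.T @ X
    U_small, S, _ = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    scores = U[:, :n_components] * S[:n_components]
    return scores, S[:n_components]
```

The saving comes from running the SVD on a k-row matrix rather than on the full data; the accuracy of the leading components depends on the oversampling and the number of power iterations.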

Systematic pathway-analysis of kinesin protein family (KIF) using genome-wide SNP data in patients with myocardial infarction: Genetic variation in KIFC3 gene associates with myocardial infarction

The Thoracic and Cardiovascular Surgeon ◽ 10.1055/s-0029-1191643 ◽ 2009 ◽ Vol 56 (S 01)

Author(s): S Eifert ◽ A Goetz ◽ P Linsel-Nitschke ◽ A Medack ◽ C Hengstenberg ◽ ...

Keyword(s): Myocardial Infarction ◽ Genetic Variation ◽ Pathway Analysis ◽ Protein Family ◽ SNP Data ◽ Genome Wide

Application of Machine Learning in Animal Disease Analysis and Prediction

Current Bioinformatics ◽ 10.2174/1574893615999200728195613 ◽ 2020 ◽ Vol 15

Author(s): Shuwen Zhang ◽ Qiang Su ◽ Qin Chen

Keyword(s): Machine Learning ◽ Unsupervised Learning ◽ Supervised Learning ◽ Clustering Algorithm ◽ Principal Component ◽ Support Vector ◽ Animal Disease ◽ Human Beings ◽ Animal Diseases ◽ Disease Analysis

Abstract: Major animal diseases pose a great threat to animal husbandry and human beings. With the deepening of globalization and the abundance of data resources, the prediction and analysis of animal diseases using big data are becoming increasingly important. The focus of machine learning is to make computers learn from data and use the learned experience to analyze and predict. First, this paper introduces the animal epidemic situation and machine learning. Then it briefly introduces the application of machine learning in animal disease analysis and prediction. Machine learning is mainly divided into supervised learning and unsupervised learning. Supervised learning includes support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. Unsupervised learning includes the maximum expectation algorithm, principal component analysis, the hierarchical clustering algorithm, and maxent. Through the discussion in this paper, readers can gain a clearer concept of machine learning and understand its application prospects in animal diseases.

Genome-wide SNPs redefines species boundaries and conservation units in the freshwater mussel genus Cyprogenia of North America

Scientific Reports ◽ 10.1038/s41598-021-90325-0 ◽ 2021 ◽ Vol 11 (1)

Author(s): Kyung Seok Kim ◽ Kevin J. Roe

Keyword(s): Phylogenetic Analyses ◽ Freshwater Mussel ◽ Conservation Strategies ◽ Nucleotide Polymorphisms ◽ Conservation Units ◽ Genetic Structuring ◽ The North ◽ SNP Data ◽ Genome Wide ◽ Significant Difference

Abstract: Detailed information on species delineation and population genetic structure is a prerequisite for designing effective restoration and conservation strategies for imperiled organisms. Phylogenomic and population genomic analyses based on genome-wide double digest restriction-site associated DNA sequencing (ddRAD-Seq) data have identified three allopatric lineages in the North American freshwater mussel genus Cyprogenia. Cyprogenia stegaria is restricted to the Eastern Highlands and displays little genetic structuring within this region. However, two allopatric lineages of C. aberti in the Ozark and Ouachita highlands exhibit substantial levels (mean uncorrected FST = 0.368) of genetic differentiation, and each warrants recognition as a distinct evolutionary lineage. Lineages of Cyprogenia in the Ouachita and Ozark highlands are further subdivided, reflecting structuring at the level of river systems. Species tree inference and species delimitation in a Bayesian framework using single nucleotide polymorphism (SNP) data supported results from phylogenetic analyses and support three species of Cyprogenia over the currently recognized two species. A comparison of SNPs generated from both destructively and non-destructively collected samples revealed no significant difference in the SNP error rate or in the quality and amount of ddRAD sequence reads, indicating that non-destructive or trace samples can be effectively utilized to generate SNP data for organisms for which destructive sampling is not permitted.

Genome-wide SNP data unravel the ancestry and signatures of divergent selection in Ghurrah pigs of India

Livestock Science ◽ 10.1016/j.livsci.2021.104587 ◽ 2021 ◽ pp. 104587

Author(s): Arnav Mehrotra ◽ Bharat Bhushan ◽ Karthikeyan A ◽ Akansha Singh ◽ Snehasmita Panda ◽ ...

Keyword(s): Divergent Selection ◽ SNP Data ◽ Genome Wide

Genetic variability and genome-wide association analysis of flavor and texture in cooked beans (Phaseolus vulgaris L.)

Theoretical and Applied Genetics ◽ 10.1007/s00122-020-03745-3 ◽ 2021

Author(s): Amber Bassett ◽ Kelvin Kamfwa ◽ Daniel Ambachew ◽ Karen Cichy

Keyword(s): Genetic Variability ◽ Seed Coat ◽ Principal Component ◽ Breeding Programs ◽ Flavor Intensity ◽ Genome Wide ◽ White Bean ◽ The USA ◽ New Varieties ◽ End Use

Key message: Cooked bean flavor and texture vary within and across 20 Andean seed types; SNPs are significantly associated with total flavor, beany, earthy, starchy, bitter, seed-coat perception, and cotyledon texture.

Abstract: Common dry beans are a nutritious food recognized as a staple globally, but their consumption is low in the USA. Improving bean flavor and texture through breeding has the potential to improve consumer acceptance and suitability for new end-use products. Little is known about genetic variability and inheritance of bean sensory characteristics. A total of 430 genotypes of the Andean Diversity Panel representing twenty seed types were grown in three locations, and cooked seeds were evaluated by a trained sensory panel for flavor and texture attribute intensities, including total flavor, beany, vegetative, earthy, starchy, sweet, bitter, seed-coat perception, and cotyledon texture. Extensive variation in sensory attributes was found across and within seed types. A set of genotypes was identified that exhibit extreme attribute intensities generally stable across all three environments. Seed-coat perception and total flavor intensity had the highest broad-sense heritability (0.39 and 0.38, respectively), while earthy and vegetative intensities exhibited the lowest (0.14 and 0.15, respectively). Starchy and sweet flavors were positively correlated and highest in white bean genotypes according to principal component analysis. SNPs associated with total flavor intensity (six SNPs across three chromosomes), beany (five SNPs across four chromosomes), earthy (three SNPs across two chromosomes), starchy (one SNP), bitter (one SNP), seed-coat perception (three SNPs across two chromosomes), and cotyledon texture (two SNPs across two chromosomes) were detected. These findings lay a foundation for incorporating flavor and texture in breeding programs for the development of new varieties that entice growers, consumers, and product developers alike.

On the origin and diversification of Podolian cattle breeds: testing scenarios of European colonization using genome-wide SNP data

Genetics Selection Evolution ◽ 10.1186/s12711-021-00639-w ◽ 2021 ◽ Vol 53 (1)

Author(s): Gabriele Senczuk ◽ Salvatore Mastrangelo ◽ Paolo Ajmone-Marsan ◽ Zsolt Becskei ◽ Paolo Colangelo ◽ ...

Keyword(s): Genetic Diversity ◽ Diversity Indices ◽ Melting Pot ◽ Cattle Breeds ◽ SNP Data ◽ Early Migration ◽ Genome Wide ◽ European Colonization ◽ Approximate Bayesian ◽ European Cattle

Abstract. Background: During the Neolithic expansion, cattle accompanied humans and spread from their domestication centres to colonize the ancient world. In addition, European cattle occasionally intermingled with both indicine cattle and local aurochs, resulting in an exclusive pattern of genetic diversity. Among the most ancient European cattle are breeds that belong to the so-called Podolian trunk, the history of which is still not well established. Here, we used genome-wide single nucleotide polymorphism (SNP) data on 806 individuals belonging to 36 breeds to reconstruct the origin and diversification of Podolian cattle and to provide a reliable scenario of the European colonization, through an approximate Bayesian computation random forest (ABC-RF) approach.

Results: Our results indicate that European Podolian cattle display higher values of genetic diversity indices than both African taurine and Asian indicine breeds. Clustering analyses show that Podolian breeds share close genomic relationships, which suggests a likely common genetic ancestry. Among the simulated and tested scenarios of the colonization of Europe from taurine cattle, the greatest support was obtained for the model assuming at least two waves of diffusion. Time estimates are in line with an early migration from the domestication centre of non-Podolian taurine breeds followed by a secondary migration of Podolian breeds. The best fitting model also suggests that the Italian Podolian breeds are the result of admixture between different genomic pools.

Conclusions: This comprehensive dataset, which includes most of the autochthonous cattle breeds belonging to the so-called Podolian trunk, allowed us not only to shed light on the origin and diversification of this group of cattle, but also to gain new insights into the diffusion of European cattle. The most well-supported scenario of colonization points to two main waves of migration: one that occurred alongside the Neolithic human expansion and gave rise to the non-Podolian taurine breeds, and a more recent one that favoured the diffusion of European Podolian. In this process, we highlight the importance of both the Mediterranean and Danube routes in promoting European cattle colonization. Moreover, we identified admixture as a driver of diversification in Italy, which could represent a melting pot for Podolian cattle.

Czechoslovakian Wolfdog Genomic Divergence from Its Ancestors Canis lupus, German Shepherd Dog, and Different Sheepdogs of European Origin

Genes ◽ 10.3390/genes12060832 ◽ 2021 ◽ Vol 12 (6) ◽ pp. 832

Author(s): Nina Moravčíková ◽ Radovan Kasarda ◽ Radoslav Židek ◽ Luboš Vostrý ◽ Hana Vostrá-Vydrová ◽ ...

Keyword(s): Demographic History ◽ Genome Structure ◽ Principal Component ◽ Genetic Distances ◽ Nucleotide Polymorphisms ◽ German Shepherd ◽ German Shepherd Dog ◽ Genome Wide ◽ Scale Population ◽ History Effect

This study focused on the genomic differences between the Czechoslovakian wolfdog (CWD) and its ancestors, the Grey wolf (GW) and German Shepherd dog. The Saarloos wolfdog and Belgian Shepherd dog were also included to study the level of GW genetics retained in the genome of domesticated breeds. The dataset consisted of 131 animals and 143,593 single nucleotide polymorphisms (SNPs). The effects of demographic history on the overall genome structure were determined by screening the distribution of the homozygous segments. The genetic variance distributed within and between groups was quantified by genetic distances, the FST index, and discriminant analysis of principal components. Fine-scale population stratification due to specific morphological and behavioural traits was assessed by principal component and factorial analyses. In the CWD, a demographic history effect was manifested mainly in a high genome-wide proportion of short homozygous segments corresponding to a historical load of inbreeding derived from founders. The observed proportion of long homozygous segments indicated that the inbreeding events shaped the CWD genome relatively recently compared to other groups. Even if there was a significant increase in genetic similarity among wolf-like breeds, they were genetically separated from each other. Moreover, this study showed that the CWD genome carries private alleles that are not found in either wolves or other dog breeds analysed in this study.
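
The abstract describes screening the distribution of homozygous segments but gives no procedure. As a rough sketch only, assuming 0/1/2 genotype calls for one individual on one chromosome with matching base-pair positions, consecutive homozygous SNPs could be grouped into runs as follows. The function name and thresholds are hypothetical; established tools additionally tolerate occasional heterozygous or missing calls and use sliding windows.

```python
def find_homozygous_runs(genotypes, positions, min_snps=3, min_length_bp=0):
    """Return (start_bp, end_bp, n_snps) for each run of consecutive homozygous SNPs.

    genotypes: 0/1/2 alternate-allele counts (1 = heterozygous), ordered by position.
    positions: base-pair coordinate of each SNP, same order and length.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        homozygous = g in (0, 2)
        if homozygous and start is None:
            start = i                      # a new run begins here
        elif not homozygous and start is not None:
            runs.append((start, i - 1))    # a heterozygous call ends the run
            start = None
    if start is not None:
        runs.append((start, len(genotypes) - 1))

    # Keep only runs that meet both the SNP-count and the length thresholds.
    return [
        (positions[s], positions[e], e - s + 1)
        for s, e in runs
        if e - s + 1 >= min_snps and positions[e] - positions[s] >= min_length_bp
    ]
```

Binning the resulting run lengths into short and long classes is one way to separate older from more recent inbreeding, in the spirit of the interpretation given in the abstract.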
