An approach using ddRADseq and machine learning for understanding speciation in Antarctic Antarctophilinidae gastropods

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Juan Moles ◽  
Shahan Derkarabetian ◽  
Stefano Schiaparelli ◽  
Michael Schrödl ◽  
Jesús S. Troncoso ◽  
...  

Abstract Sampling impediments and paucity of suitable material for molecular analyses have precluded the study of speciation and radiation of deep-sea species in Antarctica. We analyzed barcodes together with genome-wide single nucleotide polymorphisms obtained from double digestion restriction site-associated DNA sequencing (ddRADseq) for species in the family Antarctophilinidae. We also reevaluated the fossil record associated with this taxon to provide further insights into the origin of the group. Novel approaches to identify distinctive genetic lineages, including unsupervised machine learning variational autoencoder plots, were used to establish species hypothesis frameworks. In this sense, three undescribed species and a complex of cryptic species were identified, suggesting allopatric speciation connected to geographic or bathymetric isolation. We further observed that the shallow waters around the Scotia Arc and on the continental shelf in the Weddell Sea present high endemism and diversity. In contrast, likely due to the glacial pressure during the Cenozoic, a deep-sea group with fewer species emerged expanding over great areas in the South-Atlantic Antarctic Ridge. Our study agrees on how diachronic paleoclimatic and current environmental factors shaped Antarctic communities both at the shallow and deep-sea levels, promoting Antarctica as the center of origin for numerous taxa such as gastropod mollusks.

2021 ◽  
Author(s):  
F. Gözde Çilingir ◽  
Dennis Hansen ◽  
Arpat Ozgul ◽  
Christine Grossen

Abstract The Aldabra giant tortoise (Aldabrachelys gigantea) is one of only two remaining giant tortoise species worldwide. Captive-bred A. gigantea are being used in rewilding projects in the Western Indian Ocean to functionally replace the extinct endemic giant tortoise species and restore degraded island ecosystems. Furthermore, large-scale translocations may become necessary as rising sea levels threaten the only wild population on the low-lying Aldabra Atoll. Critical management decisions would be greatly facilitated by insights on the genetic structure of breeding populations. We used a double-digest restriction-associated DNA sequencing (ddRAD-seq) approach to identify single nucleotide polymorphisms (SNPs) among the wild population and two additional captive populations of A. gigantea. A total of 149 unlinked, putatively neutral genome-wide SNPs were identified. The values of expected heterozygosity ranged from 0.32 to 0.5, whereas the minor allele frequency ranged from 0.20 to 0.5. These novel SNP markers will serve as useful tools for informing the conservation of A. gigantea.
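
The two ranges reported above are consistent with each other under a simple assumption. As a minimal sketch, assuming biallelic SNPs and Hardy-Weinberg proportions (the abstract does not state how expected heterozygosity was computed), expected heterozygosity is 2p(1 − p) for minor allele frequency p, so a frequency of 0.20 corresponds to 0.32 and a frequency of 0.5 to 0.5. The function name below is illustrative, not taken from the study.

```python
def expected_heterozygosity(maf: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic locus.

    Assumes Hardy-Weinberg proportions; `maf` is the minor allele
    frequency p, which by definition lies in [0, 0.5].
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    return 2.0 * maf * (1.0 - maf)


# The endpoints of the reported MAF range, plus one value in between.
for maf in (0.20, 0.35, 0.50):
    print(f"MAF = {maf:.2f}  ->  He = {expected_heterozygosity(maf):.3f}")
```

Because 2p(1 − p) increases monotonically on [0, 0.5], the lowest and highest minor allele frequencies map directly onto the lowest and highest expected heterozygosity values.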


Author(s):  
Shanwen Sun ◽  
Benzhi Dong ◽  
Quan Zou

Abstract Over the last decade, genome-wide association studies (GWAS) have discovered thousands of genetic variants underlying complex human diseases and agriculturally important traits. These findings have been utilized to dissect the biological basis of diseases, to develop new drugs, to advance precision medicine and to boost breeding. However, the potential of GWAS is still underexploited due to methodological limitations. Many challenges have emerged, including detecting epistasis and single-nucleotide polymorphisms (SNPs) with small effects and distinguishing causal variants from other SNPs associated through linkage disequilibrium. These issues have motivated advancements in GWAS analyses in two contrasting cultures—statistical modelling and machine learning. In this review, we systematically present the basic concepts and the benefits and limitations in both methods. We further discuss recent efforts to mitigate their weaknesses. Additionally, we summarize the state-of-the-art tools for detecting the missed signals, ultrarare mutations and gene–gene interactions and for prioritizing SNPs. Our work can offer both theoretical and practical guidelines for performing GWAS analyses and for developing further new robust methods to fully exploit the potential of GWAS.


2020 ◽  
Vol 4 (5) ◽  
Author(s):  
Sangkyu Lee ◽  
Joseph O Deasy ◽  
Jung Hun Oh ◽  
Antonio Di Meglio ◽  
Agnes Dumas ◽  
...  

Abstract Background We aimed at predicting fatigue after breast cancer treatment using machine learning on clinical covariates and germline genome-wide data. Methods We accessed germline genome-wide data of 2799 early-stage breast cancer patients from the Cancer Toxicity study (NCT01993498). The primary endpoint was defined as scoring zero at diagnosis and higher than quartile 3 at 1 year after primary treatment completion on European Organization for Research and Treatment of Cancer quality-of-life questionnaires for Overall Fatigue and on the multidimensional questionnaire for Physical, Emotional, and Cognitive fatigue. First, we tested univariate associations of each endpoint with clinical variables and genome-wide variants. Then, using preselected clinical (false discovery rate < 0.05) and genomic (P < .001) variables, a multivariable preconditioned random-forest regression model was built and validated on a hold-out subset to predict fatigue. Gene set enrichment analysis identified key biological correlates (MetaCore). All statistical tests were 2-sided. Results Statistically significant clinical associations were found only with Emotional and Cognitive Fatigue, including receipt of chemotherapy, anxiety, and pain. Some single nucleotide polymorphisms had some degree of association (P < .001) with the different fatigue endpoints, although there were no genome-wide statistically significant (P < 5.00 × 10⁻⁸) associations. Only for Cognitive Fatigue, the predictive ability of the genomic multivariable model was statistically significantly better than random (area under the curve = 0.59, P = .01) and marginally improved with clinical variables (area under the curve = 0.60, P = .005). Single nucleotide polymorphisms found to be associated (P < .001) with Cognitive Fatigue belonged to genes linked to inflammation (false discovery rate adjusted P = .03), cognitive disorders (P = 1.51 × 10⁻¹²), and synaptic transmission (P = 6.28 × 10⁻⁸).
Conclusions Genomic analyses in this large cohort of breast cancer survivors suggest a possible genetic role for severe Cognitive Fatigue that warrants further exploration.
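
To put the reported area under the curve in context, the sketch below computes it as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case, with ties counted as one half. This is the standard rank-based definition, not the study's own pipeline; the function name and the toy scores are hypothetical. Under this definition 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.59 sits only slightly above chance.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive has the higher score; tied pairs count as 0.5.
    `labels` must contain 1 for positives and 0 for negatives.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores for six patients (1 = fatigue endpoint met).
toy_scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]
toy_labels = [1, 1, 0, 0, 1, 0]
print(f"toy AUC = {auc(toy_scores, toy_labels):.3f}")
```

The pairwise form is quadratic in sample size, which is fine for illustration; a rank-sum formulation gives the same value more efficiently on large cohorts.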


2020 ◽  
Vol 11 ◽  
Author(s):  
Waldiodio Seck ◽  
Davoud Torkamaneh ◽  
François Belzile

Increasing the understanding of the genetic basis of the variability in root system architecture (RSA) is essential to improve resource-use efficiency in agricultural systems and to develop climate-resilient crop cultivars. Because roots are underground, their direct observation and detailed characterization are challenging. Here, we characterized twelve RSA-related traits in a panel of 137 early maturing soybean lines (Canadian soybean core collection) using rhizoboxes and two-dimensional imaging. Significant phenotypic variation (P < 0.001) was observed among these lines for different RSA-related traits. This panel was genotyped with 2.18 million genome-wide single-nucleotide polymorphisms (SNPs) using a combination of genotyping-by-sequencing and whole-genome sequencing. A total of 10 quantitative trait locus (QTL) regions were detected for root total length and primary root diameter through a comprehensive genome-wide association study. These QTL regions explained from 15 to 25% of the phenotypic variation and contained two putative candidate genes with homology to genes previously reported to play a role in RSA in other species. These genes can serve to accelerate future efforts aimed at dissecting the genetic architecture of RSA and breeding more resilient varieties.


2014 ◽  
Vol 17 (4) ◽  
Author(s):  
Raymond K. Walters ◽  
Charles Laurin ◽  
Gitta H. Lubke

Epistasis is a growing area of research in genome-wide studies, but the differences between alternative definitions of epistasis remain a source of confusion for many researchers. One problem is that models for epistasis are presented in a number of formats, some of which have difficult-to-interpret parameters. In addition, the relation between the different models is rarely explained. Existing software for testing epistatic interactions between single-nucleotide polymorphisms (SNPs) does not provide the flexibility to compare the available model parameterizations. For that reason we have developed an R package for investigating epistatic and penetrance models, EpiPen, to aid users who wish to easily compare, interpret, and utilize models for two-locus epistatic interactions. EpiPen facilitates research on SNP-SNP interactions by allowing the R user to easily convert between common parametric forms for two-locus interactions, generate data for simulation studies, and perform power analyses for the selected model with a continuous or dichotomous phenotype. The usefulness of the package for model interpretation and power analysis is illustrated using data on rheumatoid arthritis.


Genes ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 686
Author(s):  
Alireza Nazarian ◽  
Alexander M. Kulminski

Almost all complex disorders have manifested epidemiological and clinical sex disparities which might partially arise from sex-specific genetic mechanisms. Addressing such differences can be important from a precision medicine perspective which aims to make medical interventions more personalized and effective. We investigated sex-specific genetic associations with colorectal (CRCa) and lung (LCa) cancers using genome-wide single-nucleotide polymorphism (SNP) data from three independent datasets. The genome-wide association analyses revealed that 33 SNPs were associated with CRCa/LCa at P < 5.0 × 10⁻⁶ in either males or females. Of these, 26 SNPs had sex-specific effects as their effect sizes were statistically different between the two sexes at a Bonferroni-adjusted significance level of 0.0015. None had proxy SNPs within their ±1 Mb regions and the closest genes to 32 SNPs were not previously associated with the corresponding cancers. The pathway enrichment analyses demonstrated the associations of 35 pathways with CRCa or LCa which were mostly implicated in immune system responses, cell cycle, and chromosome stability. The significant pathways were mostly enriched in either males or females. Our findings provided novel insights into the potential sex-specific genetic heterogeneity of CRCa and LCa at SNP and pathway levels.
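
The Bonferroni-adjusted level of 0.0015 quoted above is what one would get by dividing a conventional 0.05 level across the 33 SNPs tested (0.05 / 33 ≈ 0.0015); the abstract does not spell this out, so treat it as an inference. The sketch below shows that arithmetic together with one common way to compare effect sizes between two independent groups, a z-test on the difference of the estimates. The abstract does not say which test the authors used, and the effect sizes and standard errors in the example are hypothetical.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def effect_difference_test(beta_m, se_m, beta_f, se_f):
    """z statistic and two-sided p-value for beta_m - beta_f.

    Assumes the male and female estimates are independent and
    approximately normal, so the variance of the difference is the
    sum of the squared standard errors.
    """
    z = (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


threshold = bonferroni_threshold(0.05, 33)
print(f"Bonferroni threshold for 33 tests: {threshold:.4f}")

# Hypothetical log-odds-ratio estimates for one SNP in each sex.
z, p = effect_difference_test(beta_m=0.40, se_m=0.10, beta_f=-0.10, se_f=0.10)
print(f"z = {z:.2f}, p = {p:.2e}, sex-specific at this level: {p < threshold}")
```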


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kyung Seok Kim ◽  
Kevin J. Roe

Abstract Detailed information on species delineation and population genetic structure is a prerequisite for designing effective restoration and conservation strategies for imperiled organisms. Phylogenomic and population genomic analyses based on genome-wide double digest restriction-site associated DNA sequencing (ddRAD-Seq) data have identified three allopatric lineages in the North American freshwater mussel genus Cyprogenia. Cyprogenia stegaria is restricted to the Eastern Highlands and displays little genetic structuring within this region. However, two allopatric lineages of C. aberti in the Ozark and Ouachita highlands exhibit substantial levels (mean uncorrected FST = 0.368) of genetic differentiation and each warrants recognition as a distinct evolutionary lineage. Lineages of Cyprogenia in the Ouachita and Ozark highlands are further subdivided, reflecting structuring at the level of river systems. Species tree inference and species delimitation in a Bayesian framework using single nucleotide polymorphism (SNP) data agreed with the results of the phylogenetic analyses and supported three species of Cyprogenia rather than the two currently recognized. A comparison of SNPs generated from both destructively and non-destructively collected samples revealed no significant difference in the SNP error rate or in the quality and amount of ddRAD sequence reads, indicating that nondestructive or trace samples can be effectively utilized to generate SNP data for organisms for which destructive sampling is not permitted.

