Genomic Landscape of Mutational Biases in the Pacific Oyster Crassostrea gigas

2020, Vol 12 (11), pp. 1943-1952
Author(s): Kai Song

Abstract Mutation is a driving force of evolution; it has itself been shaped by natural selection and is universally biased. Previous studies determined genome-wide mutational patterns for several species and investigated the heterogeneity of mutational patterns at fine scales. However, little evidence has been reported for heterogeneity of mutation rates over large genomic regions. Hence, the mutational patterns of different large-scale genomic regions and their association with selective pressures still need to be explored. Although Mollusca is the second most species-rich animal phylum, little is known about its mutational patterns, especially in oysters. In this study, I characterize mutational bias patterns across the Crassostrea gigas genome using whole-genome resequencing data. I studied the genome-wide relative rates of pairwise mutation types and found that the predominant mutation is GC -> AT, irrespective of genomic region. The analysis reveals that mutational biases are associated with gene expression levels across the C. gigas genome. Genes with higher expression levels, broader expression patterns, longer coding sequences, and more exons had relatively higher GC -> AT rates. I also found that genes with larger dN/dS values had relatively higher GC -> AT rates. This work represents the first comprehensive study of mutational biases in Mollusca. I investigated the relationships between mutational biases and several intrinsic genetic factors and evolutionary indicators, and propose that selective pressures are important forces shaping mutational biases across the C. gigas genome.
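The idea of a relative GC -> AT rate can be illustrated with a short sketch. It assumes SNVs are supplied as (reference, alternate) base pairs and that the number of G/C and A/T sites surveyed is known; it collapses complementary changes (for example C>T and G>A) into strong (G/C) versus weak (A/T) base classes and normalizes each direction by the number of sites at risk. The helper names and this particular normalization are illustrative assumptions, not the study's pipeline.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

STRONG = "GC"  # G/C bases ("strong", three hydrogen bonds)
WEAK = "AT"    # A/T bases ("weak", two hydrogen bonds)


def classify(ref: str, alt: str) -> str:
    """Label an SNV by base class, e.g. C>T and G>T both become 'GC->AT'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in "ACGT" or alt not in "ACGT":
        raise ValueError(f"not a single-nucleotide change: {ref}>{alt}")
    src = STRONG if ref in STRONG else WEAK
    dst = STRONG if alt in STRONG else WEAK
    return f"{src}->{dst}"


def gc_to_at_bias(snvs: Iterable[Tuple[str, str]], gc_sites: int, at_sites: int) -> float:
    """Per-site GC->AT rate divided by per-site AT->GC rate (hypothetical helper).

    A value above 1 means G/C positions mutate towards A/T more often than
    A/T positions mutate towards G/C, after correcting for base composition.
    """
    counts = Counter(classify(ref, alt) for ref, alt in snvs)
    gc_to_at_rate = counts["GC->AT"] / gc_sites
    at_to_gc_rate = counts["AT->GC"] / at_sites
    if at_to_gc_rate == 0:
        return math.inf  # no AT->GC changes observed
    return gc_to_at_rate / at_to_gc_rate


# Toy input: three GC->AT changes and one AT->GC change
snvs = [("C", "T"), ("G", "A"), ("G", "T"), ("A", "G")]
print(gc_to_at_bias(snvs, gc_sites=300, at_sites=100))
```

With 300 G/C sites and 100 A/T sites, the three GC->AT changes and the single AT->GC change correspond to equal per-site rates, so the bias is 1; the raw 3:1 count ratio alone would overstate it, which is why the sketch normalizes by base composition.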

2021, Vol 22 (1)
Author(s): Ruifeng Cui, Xiaoge Wang, Waqar Afzal Malik, Xuke Lu, Xiugui Chen, ...

Abstract Background The raffinose synthetase (RAFS) gene superfamily is critical for the synthesis of raffinose, which accumulates in plant leaves under abiotic stress. However, it remains unclear whether RAFS contributes to abiotic stress resistance in plants, specifically in Gossypium species. Results In this study, we identified 74 RAFS genes from G. hirsutum, G. barbadense, G. arboreum and G. raimondii using a series of bioinformatic methods. Phylogenetic analysis showed that the RAFS gene family in the four Gossypium species could be divided into four major clades. The number of RAFS genes per species was relatively uniform, ranging from 12 to 25 according to species ploidy, most likely as a result of ancient whole-genome polyploidization. Gene motif analysis showed that RAFS gene structure was relatively conserved. Promoter analysis of cis-regulatory elements showed that some RAFS genes might be regulated by gibberellins and abscisic acid, which might influence their expression levels. We further examined the functions of RAFS genes under cold, heat, salt and drought stress, based on their expression profiles and co-expression network in Gossypium species. Transcriptome analysis suggested that RAFS genes in clade III are highly expressed in organs such as seed, root, cotyledon, ovule and fiber, and particularly under abiotic stress, indicating that clade III genes are involved in abiotic stress resistance. Gene co-expression network analysis showed that GhRFS2A-GhRFS6A, GhRFS6D, GhRFS7D and GhRFS8A-GhRFS11A were key genes, with high expression levels under salt, drought, cold and heat stress. Conclusion These findings may provide insights into the evolutionary relationships and expression patterns of RAFS genes in Gossypium species and a theoretical basis for identifying stress-resistant materials in cotton.


2019, Vol 48 (D1), pp. D659-D667
Author(s): Wenqian Yang, Yanbo Yang, Cecheng Zhao, Kun Yang, Dongyang Wang, ...

Abstract Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is the process of estimating missing genotypes from the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and can therefore be widely used in large-scale genome-wide association studies (GWASs) that rely on relatively inexpensive, low-density SNP arrays. However, unlike humans, most animals lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of nonhuman animals from various studies and databases. A computational pipeline was developed to process different types of raw data to construct reference panels. Finally, 13 high-quality reference panels including ∼400 million SNPs from 2265 samples were constructed. In Animal-ImputeDB, an easy-to-use online tool integrating two popular imputation tools was designed for genotype imputation. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.
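To make the definition of genotype imputation concrete, here is a deliberately simplified sketch on phased haplotypes coded as 0 (reference allele) and 1 (alternate allele). It fills untyped SNPs by copying from the reference-panel haplotypes that best match the typed SNPs. Production imputation software typically relies on hidden Markov models of haplotype copying (in the style of the Li and Stephens model) rather than this nearest-match rule, and the sketch does not represent the tools used by Animal-ImputeDB; the function name `impute` and the toy panel are hypothetical.

```python
from typing import List, Optional

Haplotype = List[int]  # 0 = reference allele, 1 = alternate allele at each SNP


def impute(target: List[Optional[int]], panel: List[Haplotype]) -> Haplotype:
    """Fill untyped sites (None) by copying from the best-matching panel haplotypes.

    Match quality is the number of mismatches at the sites typed in the target.
    When several panel haplotypes tie, each missing allele is a majority vote
    among them (ties in the vote default to the reference allele, 0).
    """
    if not panel:
        raise ValueError("reference panel is empty")
    typed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(hap: Haplotype) -> int:
        return sum(hap[i] != target[i] for i in typed)

    best = min(mismatches(hap) for hap in panel)
    donors = [hap for hap in panel if mismatches(hap) == best]

    imputed = []
    for i, allele in enumerate(target):
        if allele is not None:
            imputed.append(allele)  # typed alleles are kept as observed
        else:
            alt_votes = sum(hap[i] for hap in donors)
            imputed.append(1 if 2 * alt_votes > len(donors) else 0)
    return imputed


panel = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
]
# Low-density array: only SNPs 0, 2 and 4 were genotyped
print(impute([0, None, 1, None, 1], panel))  # prints [0, 1, 1, 0, 1]
```

The second panel haplotype matches all three typed SNPs, so its alleles at SNPs 1 and 3 are copied in; this is the sense in which a dense reference panel raises the SNP density of a cheap, sparse array.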


mBio, 2020, Vol 11 (4)
Author(s): José Luis López, Mauricio Javier Lozano, María Laura Fabre, Antonio Lagares

ABSTRACT Prokaryote genomes exhibit a wide range of GC contents and codon usages, both resulting from the interaction between mutational bias and natural selection. To investigate the basis of specific codon changes, we performed a comprehensive analysis of 29 prokaryote families. Analysis of core gene sets of increasing ancestry in each family lineage revealed that codon usage became progressively more adapted to the tRNA pools. While, as previously reported, highly expressed genes presented the most optimized codon usage, singletons contained the least selectively favored codons. The results showed that codons with the highest translational adaptation were usually preferentially enriched. In agreement with previous reports, a C bias in 2- to 3-fold pyrimidine-ending codons and a U bias in 4-fold codons occurred in all families, irrespective of the global genomic GC content. Furthermore, the U biases suggested that U3-mRNA–U34-tRNA interactions were responsible for a prominent codon optimization in both the most ancestral core genes and the highly expressed genes. A comparative analysis of sequences that encode conserved (cr) or variable (vr) translated products, each under high (HEP) or low (LEP) expression levels, demonstrated that efficiency was more relevant than accuracy (by a factor of 2) in shaping codon usage. Finally, analysis of GC content at the third codon position (GC3) revealed that in genomes with global GC contents higher than 35 to 40%, selection favored an increase in GC3, whereas in genomes with very low GC contents, GC3 decreased. A comprehensive final model is presented in which all patterns of codon usage variation are condensed into four distinct behavioral groups.

IMPORTANCE The prokaryotic genomes, the current heritage of the most ancient life forms on Earth, comprise diverse gene sets, all characterized by varied origins, ancestries, and spatial-temporal expression patterns. Such genetic diversity has long raised the question of how cells shape their coding strategies to optimize protein demands (i.e., product abundance) and accuracy (i.e., translation fidelity) using the same genetic code in genomes with GC contents ranging from less than 20 to more than 80%. Here, we present evidence on how codon usage is adjusted across the prokaryotic tree of life and on how specific biases have operated to improve translation. Using proteome data, we characterized conserved and variable sequence domains in genes of either high or low expression level and quantified the relative weight of efficiency and accuracy (as well as their interaction) in shaping codon usage in prokaryotes.
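GC3 is the GC content at the third position of codons. A minimal sketch of how it can be computed alongside overall GC content is given below; it assumes an in-frame coding sequence and counts every codon, including start and stop codons, whereas published analyses often exclude those or restrict the statistic to particular codon families. The function names are illustrative.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among all bases."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


cds = "ATGGCCAAATTTTAA"  # toy coding sequence: ATG GCC AAA TTT TAA
print(gc_content(cds), gc3(cds))
```

For the toy sequence, GC3 (2 of 5 third positions) exceeds overall GC content (4 of 15 bases); comparing the two across genes is one way to see the GC3 shifts described above.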


2018, Vol 116 (3), pp. 900-908
Author(s): Hamutal Arbel, Sumanta Basu, William W. Fisher, Ann S. Hammonds, Kenneth H. Wan, ...

Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula patterning network of Drosophila melanogaster to demonstrate that the loss of accuracy on held-out data results from heterogeneity of functional signatures among enhancer elements. We show that at least two classes of enhancers are active during early Drosophila embryogenesis and that, by focusing on a single, relatively homogeneous class of elements, greater than 98% prediction accuracy can be achieved on a balanced, completely held-out test set. The class of well-predicted elements consists predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation driving enhancers (SDEs). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that the improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method in a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome. Whole-mount embryonic imaging of stably integrated reporter constructs for 32 SDEs, chosen from throughout our prediction rank list, showed that >90% drove expression patterns. We achieved 86.7% precision on the genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.
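The figures quoted above measure different things: accuracy counts all correct calls, precision asks what fraction of predicted enhancers are real, and recall asks what fraction of real enhancers were found. The sketch below computes them from a confusion matrix; the `Confusion` class and its counts are invented for illustration and are not the study's data. Accuracy is informative on a balanced test set, but in a genome-wide scan true enhancers are rare, so precision and recall are the more telling pair.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # predicted enhancers that are real
    fp: int  # predicted enhancers that are not
    tn: int  # correctly rejected non-enhancers
    fn: int  # real enhancers that were missed

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


# Hypothetical counts, not taken from the study
c = Confusion(tp=90, fp=10, tn=95, fn=5)
print(f"precision={c.precision():.3f} recall={c.recall():.3f} accuracy={c.accuracy():.3f}")
```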


2021
Author(s): Yang Yang, Amo Aduragbemi, Di Wei, Yongmao Chai, Jie Zheng, ...

Abstract Improving yield and yield-related traits is a key goal of wheat breeding programs. The integration of accumulated wheat genetic resources provides an opportunity to uncover important genomic regions and candidate genes that affect wheat yield. Here, a comprehensive Meta-QTL analysis was conducted on 2230 QTLs for yield-related traits obtained from 119 QTL studies. These QTLs were refined into 145 Meta-QTLs (MQTLs), 89 of which were verified by GWAS in different natural populations. The average confidence interval (CI) of these MQTLs was 2.92 times narrower than that of the initial QTLs. Furthermore, 76 core MQTL regions with physical intervals of less than 25 Mb were detected. Based on homology analysis and expression patterns, 237 candidate genes in the MQTLs were identified, involved in photoperiod response, grain development, multiple plant growth regulator pathways, carbon and nitrogen metabolism, and spike and flower organ development. A novel candidate gene, TaKAO-4A, was confirmed to be significantly associated with grain size, and a CAPS marker was developed based on its dominant haplotype. In summary, this study established an approach that integrates Meta-QTL analysis, GWAS and homology comparison to reveal the genomic regions and candidate genes that affect important yield-related traits in wheat. This work will help lay a foundation for the identification, transfer and aggregation of these important QTLs or candidate genes in high-yield wheat breeding.
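The narrowing of confidence intervals reported above is what one expects when several QTL estimates of the same locus are pooled. A minimal sketch follows, assuming the widely used inverse-variance weighting for meta-QTL analysis, in which each study's standard deviation is recovered from its 95% CI width (about 3.92 SD). The helper `consensus_qtl` and the toy positions are hypothetical, and the sketch is not a description of the software used in this study. Pooling n estimates of equal precision shrinks the CI by a factor of sqrt(n).

```python
import math
from typing import List, Tuple

Z95 = 1.96  # two-sided 95% normal quantile; a 95% CI spans about 2 * 1.96 = 3.92 SD


def consensus_qtl(qtls: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (position_cM, ci95_width_cM) estimates for one QTL cluster.

    Each study's standard deviation is recovered from its CI width, and the
    positions are averaged with inverse-variance weights. Returns the
    consensus position and the 95% CI width of the consensus estimate.
    """
    if not qtls:
        raise ValueError("no QTLs to combine")
    weights = [1.0 / (width / (2 * Z95)) ** 2 for _, width in qtls]
    total = sum(weights)
    position = sum(w * pos for w, (pos, _) in zip(weights, qtls)) / total
    sd = math.sqrt(1.0 / total)
    return position, 2 * Z95 * sd


# Hypothetical input: two QTLs from different studies, each with a 20 cM CI
pos, width = consensus_qtl([(50.0, 20.0), (54.0, 20.0)])
print(round(pos, 2), round(width, 2))  # prints 52.0 14.14
```

More precise studies (narrower CIs) receive larger weights, so the consensus position is pulled towards them and the pooled CI is always narrower than the narrowest input CI.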


2019, Vol 21 (4), pp. 407-416

Schizophrenia is a debilitating psychiatric disorder with a complex genetic architecture and limited understanding of its neuropathology, reflected by the lack of diagnostic measures and effective pharmacological treatments. Geneticists have recently identified more than 145 risk loci comprising hundreds of common variants of small effect size, most of which lie in noncoding genomic regions. This review discusses how the epigenetic toolbox can be applied to contextualize genetic findings in schizophrenia. Progress in next-generation sequencing, along with increasing methodological complexity, has led to the compilation of genome-wide maps of DNA methylation, histone modifications, gene expression, and more. Integration of chromatin conformation datasets is one of the latest efforts in deciphering schizophrenia risk, allowing the identification of genes in contact with regulatory variants across hundreds of kilobases. Large-scale multiomics studies will facilitate the prioritization of putative causal risk variants and gene networks that contribute to schizophrenia etiology, informing clinical diagnostics and treatment downstream.


2018
Author(s): Akdes Serin Harmancı, Arif O. Harmanci, Xiaobo Zhou

Abstract RNA sequencing experiments generate large amounts of information about gene expression levels. Although they are mainly used for quantifying expression levels, they also contain other biologically important information, such as copy number variants (CNVs). Here, we propose CaSpER, a signal processing approach for identification, visualization, and integrative analysis of focal and large-scale CNV events at multiple resolutions using either bulk or single-cell RNA sequencing data. CaSpER smooths genome-wide RNA sequencing signal profiles at multiple resolutions, identifying CNV events at different length scales. CaSpER also employs a novel method for generating a genome-wide B-allele frequency (BAF) signal profile from the reads and uses it in a multiscale fashion to correct CNV calls. The shift in allelic signal is used to quantify loss of heterozygosity (LOH), which is valuable for CNV identification. CaSpER uses hidden Markov models (HMMs) to assign copy number states to regions. The multiscale nature of CaSpER enables comprehensive analysis of focal and large-scale CNVs and LOH segments. CaSpER's accuracy compares well with gold-standard SNP genotyping arrays. In particular, analysis of single-cell glioblastoma (GBM) RNA sequencing data with CaSpER reveals novel mutually exclusive and co-occurring CNV subclones at different length scales. Moreover, CaSpER discovers gene expression signatures of CNV subclones, performs gene ontology (GO) enrichment analysis and identifies potential therapeutic targets for the subclones. CaSpER increases the utility of RNA sequencing datasets and complements other tools for the complete characterization and visualization of the genomic and transcriptomic landscape of single-cell and bulk RNA sequencing data, especially in cancer research.
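CaSpER assigns copy number states with hidden Markov models. The sketch below is a generic three-state HMM decoded with the Viterbi algorithm, not CaSpER's implementation: the state set, the Gaussian emission model, and all parameter values (`MEANS`, `SD`, `STAY`) are hypothetical choices for illustration. The high self-transition probability penalizes state changes, so isolated noisy points do not trigger calls while sustained shifts in the smoothed signal do.

```python
import math
from typing import List, Sequence

STATES = ("loss", "neutral", "gain")
MEANS = (-0.5, 0.0, 0.5)  # hypothetical mean smoothed log expression ratio per state
SD = 0.2                  # hypothetical emission standard deviation shared by all states
STAY = 0.98               # hypothetical probability of remaining in the same state


def log_normal_pdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi(signal: Sequence[float]) -> List[str]:
    """Most probable copy number state path for a smoothed expression signal."""
    if not signal:
        raise ValueError("empty signal")
    n = len(STATES)
    log_stay = math.log(STAY)
    log_switch = math.log((1 - STAY) / (n - 1))
    # Uniform initial state distribution
    scores = [math.log(1 / n) + log_normal_pdf(signal[0], MEANS[s], SD) for s in range(n)]
    backpointers = []
    for x in signal[1:]:
        best_prev, new_scores = [], []
        for s in range(n):
            candidates = [scores[p] + (log_stay if p == s else log_switch) for p in range(n)]
            p_best = max(range(n), key=candidates.__getitem__)
            best_prev.append(p_best)
            new_scores.append(candidates[p_best] + log_normal_pdf(x, MEANS[s], SD))
        backpointers.append(best_prev)
        scores = new_scores
    # Trace back from the best final state
    state = max(range(n), key=scores.__getitem__)
    path = [state]
    for best_prev in reversed(backpointers):
        state = best_prev[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


signal = [0.0] * 4 + [0.5] * 4 + [0.0] * 4 + [-0.5] * 4
print(viterbi(signal))
```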


2020
Author(s): Nadav Brandes, Nathan Linial, Michal Linial

Abstract The characterization of germline genetic variation affecting cancer risk, known as cancer predisposition, is fundamental to preventive and personalized medicine. Current attempts to detect genomic regions of cancer predisposition are typically based on small-scale familial studies or on genome-wide association studies (GWAS) of dedicated case-control cohorts. In this study, we used the UK Biobank as a large-scale prospective cohort to conduct a comprehensive analysis of cancer predisposition using both GWAS and proteome-wide association study (PWAS), a method that highlights genetic associations mediated by functional alterations to protein-coding genes. We discovered 137 unique genomic loci implicated in cancer risk in the white British population across nine cancer types and pan-cancer. While most of these genomic regions are supported by external evidence, our results also highlight novel loci. We performed a comparative analysis of cancer predisposition across cancer types, finding that most of the implicated regions are cancer-type specific. We further analyzed the role of recessive genetic effects in cancer predisposition and found that 30 of the 137 cancer regions were recovered only by a recessive model, highlighting the importance of recessive inheritance outside of familial studies. Finally, we show that many of the cancer associations confer substantial cancer risk in the studied cohort, suggesting their clinical relevance.
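A recessive model differs from the default additive GWAS model mainly in how each genotype is coded before testing. The sketch below, on invented toy data, codes alternate-allele counts under additive, dominant and recessive models and uses a plain Pearson correlation with case status as a stand-in for a proper association test; real GWAS use regression models with covariates and formal significance testing. The helpers `encode` and `pearson` are hypothetical. When only homozygotes carry the risk, the recessive coding captures the signal fully while the additive coding dilutes it, which is how a locus can be recovered only by a recessive model.

```python
import math
from typing import List, Sequence


def encode(genotypes: Sequence[int], model: str) -> List[int]:
    """Code alternate-allele counts (0, 1, 2) under a given genetic model."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1 or 2 alternate alleles")
    if model == "additive":
        return list(genotypes)  # effect grows with each allele copy
    if model == "dominant":
        return [1 if g >= 1 else 0 for g in genotypes]  # one copy is enough
    if model == "recessive":
        return [1 if g == 2 else 0 for g in genotypes]  # two copies are required
    raise ValueError(f"unknown model: {model}")


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data (hypothetical): only homozygous carriers are cases
genotypes = [0, 0, 0, 1, 1, 1, 2, 2]
is_case = [0, 0, 0, 0, 0, 0, 1, 1]
for model in ("additive", "dominant", "recessive"):
    print(model, round(pearson(encode(genotypes, model), is_case), 3))
```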


2021, Vol 22 (24), pp. 13568
Author(s): Zhengfu Yang, Hongmiao Jin, Junhao Chen, Caiyun Li, Jiani Wang, ...

The AP2 transcription factors (TFs) belong to the APETALA2/ethylene-responsive factor (AP2/ERF) superfamily and regulate various biological processes of plant growth and development, as well as responses to biotic and abiotic stresses. However, genome-wide research on AP2 subfamily TFs in pecan (Carya illinoinensis) has rarely been reported. In this paper, we identified 30 AP2 subfamily genes in pecan through a genome-wide search and found them to be unevenly distributed across the pecan chromosomes. We then analyzed their phylogeny, gene structures and conserved motifs. The 30 AP2 genes were divided into three clades: euAP2, euANT and basalANT. Cis-acting element analysis revealed many light-responsive, plant hormone-responsive and abiotic stress-responsive elements in CiAP2 promoters. Furthermore, qPCR analysis showed that genes clustered together in the euAP2 and basalANT clades usually shared similar expression patterns, whereas expression patterns in the euANT clade varied greatly. In developing pecan fruits, CiAP2-5, CiANT1 and CiANT2 shared similar expression patterns, and their expression levels decreased as the fruit developed. CiANT5 displayed the highest expression levels in developing fruits. Subcellular localization and transcriptional activation assays demonstrated that CiANT5 localizes to the nucleus and functions as a transcription factor with transcriptional activation activity. These results contribute to a comprehensive understanding of pecan AP2 subfamily TFs and lay the foundation for further functional research on pecan AP2 family genes.

