Exploration of the Germline Genome of the Ciliate Chilodonella uncinata through Single-Cell Omics (Transcriptomics and Genomics)

mBio ◽  
2018 ◽  
Vol 9 (1) ◽  
Author(s):  
Xyrus X. Maurer-Alcalá ◽  
Rob Knight ◽  
Laura A. Katz

ABSTRACT Separate germline and somatic genomes are found in numerous lineages across the eukaryotic tree of life, often separated into distinct tissues (e.g., in plants, animals, and fungi) or distinct nuclei sharing a common cytoplasm (e.g., in ciliates and some foraminifera). In ciliates, germline-limited (i.e., micronuclear-specific) DNA is eliminated during the development of a new somatic (i.e., macronuclear) genome in a process that is tightly linked to large-scale genome rearrangements, such as deletions and reordering of protein-coding sequences. Most studies of germline genome architecture in ciliates have focused on the model ciliates Oxytricha trifallax, Paramecium tetraurelia, and Tetrahymena thermophila, for which the complete germline genome sequences are known. Outside of these model taxa, only a few dozen germline loci have been characterized from a limited number of cultivable species, which is likely due to difficulties in obtaining sufficient quantities of “purified” germline DNA in these taxa. Combining single-cell transcriptomics and genomics, we have overcome these limitations and provide the first insights into the structure of the germline genome of the ciliate Chilodonella uncinata, a member of the understudied class Phyllopharyngea. Our analyses reveal the following: (i) large gene families contain a disproportionate number of genes from scrambled germline loci; (ii) germline-soma boundaries in the germline genome are demarcated by substantial shifts in GC content; (iii) single-cell omics techniques provide large-scale quality germline genome data with limited effort, at least for ciliates with extensively fragmented somatic genomes. Our approach provides an efficient means to understand better the evolution of genome rearrangements between germline and soma in ciliates.
IMPORTANCE Our understanding of the distinctions between germline and somatic genomes in ciliates has largely relied on studies of a few model genera (e.g., Oxytricha, Paramecium, Tetrahymena). We have used single-cell omics to explore germline-soma distinctions in the ciliate Chilodonella uncinata, which likely diverged from the better-studied ciliates ~700 million years ago. The analyses presented here indicate that developmentally regulated genome rearrangements between germline and soma are demarcated by rapid transitions in local GC composition and lead to diversification of protein families. The approaches used here provide the basis for future work aimed at discerning the evolutionary impacts of germline-soma distinctions among diverse ciliates.
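The abstract reports that germline-soma boundaries coincide with sharp shifts in local GC content. As an illustration only, the sketch below shows one generic way such a shift could be located with a sliding window. The function names, the window and step sizes, and the toy sequence are all assumptions made for this example; the abstract does not describe the authors' actual procedure.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def windowed_gc(seq, window, step):
    """Return a list of (start, GC fraction) for each full window along seq."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def largest_gc_shift(profile):
    """Return (window start, delta) for the largest change between adjacent windows."""
    best = (0, 0.0)
    for (_, prev), (start, cur) in zip(profile, profile[1:]):
        if abs(cur - prev) > abs(best[1]):
            best = (start, cur - prev)
    return best


if __name__ == "__main__":
    # Made-up toy sequence: an AT-rich flank followed by a GC-rich block.
    toy = "AT" * 20 + "GC" * 20
    profile = windowed_gc(toy, window=10, step=10)
    print(largest_gc_shift(profile))
```

On real data the window size would need tuning, since short windows are noisy and long windows blur the boundary.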

2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Paula Moolhuijzen ◽  
Pao Theen See ◽  
Caroline S. Moffat

Abstract Objectives The necrotrophic fungal pathogen Pyrenophora tritici-repentis (Ptr) is the causal agent of tan spot, a major disease of wheat. We have generated a new genome resource for an Australian Ptr race 1 isolate V1 to support comparative ‘omics analyses. In particular, the V1 Pacific Biosciences (PacBio) long-read sequence assembly was generated to confirm the stability of large-scale genome rearrangements of the Australian race 1 isolate M4 when compared to the North American race 1 isolate Pt-1C-BFP. Results Over 1.3 million reads were sequenced on a PacBio Sequel single-molecule real-time (SMRT) sequencing cell to yield 11.4 Gb for the genome assembly of V1 (285X coverage), with median and maximum read lengths of 8959 bp and 72,292 bp, respectively. The V1 genome was assembled into 33 contiguous sequences with a total length of 40.4 Mb and a GC content of 50.44%. A total of 14,050 protein-coding genes were predicted and annotated for V1. Of these, 11,519 genes were orthologous to both Pt-1C-BFP and M4. Whole-genome alignment of the Australian long-read assemblies (V1 to M4) confirmed previously identified large-scale genome rearrangements between M4 and Pt-1C-BFP and revealed small-scale variations, which included a sequence break within a race-specific region for ToxA, a well-known necrotrophic effector gene.


2018 ◽  
Author(s):  
Guangyu Wang ◽  
Hongyan Yin ◽  
Boyang Li ◽  
Chunlei Yu ◽  
Fan Wang ◽  
...  

ABSTRACT The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging; it requires prior knowledge of well-established sequences and annotations or species-specific training data, but the reality is that only a limited number of species have high-quality sequences and annotations. Here we first characterize lncRNAs in contrast to protein-coding RNAs based on feature relationship and find that the feature relationship between ORF (open reading frame) length and GC content presents universally substantial divergence in lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on this feature relationship, we further present LGC, a novel algorithm for identifying lncRNAs that is able to accurately distinguish lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms by achieving higher accuracy and well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study, for the first time, differentially characterizes lncRNAs and protein-coding RNAs based on feature relationship, which is further applied in computational identification of lncRNAs. Taken together, our study represents a significant advance in characterization and identification of lncRNAs, and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species.
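The two features named in this abstract, ORF length and GC content, are straightforward to compute for a single transcript. The sketch below does only that and is not the LGC algorithm, whose scoring model is not described here. The ORF definition used (forward strand only, ATG to the first in-frame stop, stop codon included in the length) and the function names are assumptions chosen for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq):
    """Length in nt of the longest forward-strand ATG..stop ORF (stop included).

    Returns 0 if no complete ORF is found.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None                   # close it and look for the next
    return best


def gc_content(seq):
    """Fraction of G or C bases over the full sequence length."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def transcript_features(seq):
    """Return (longest ORF length, GC content) for one transcript."""
    return longest_orf_length(seq), gc_content(seq)


if __name__ == "__main__":
    # Made-up toy transcript containing the ORF ATG AAA TAG.
    print(transcript_features("CCATGAAATAGCC"))
```

A classifier built on these features would still need a decision rule relating the two, which is the part the abstract attributes to LGC itself.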


Author(s):  
Diego P. Rubert ◽  
Daniel Doerr ◽  
Marília D. V. Braga

Recently, we proposed an efficient ILP formulation [Rubert DP, Martinez FV, Braga MDV, Natural family-free genomic distance, Algorithms Mol Biol 16:4, 2021] for exactly computing the rearrangement distance of two genomes in a family-free setting. In such a setting, neither a prior classification of genes into families nor further restrictions on the genomes are imposed. Given two genomes, this ILP computes an optimal matching of the genes, simultaneously taking into account local mutations, given by gene similarities, and large-scale genome rearrangements. Here, we explore the potential of using this ILP for inferring groups of orthologs across several species. More precisely, given a set of genomes, our method first computes all pairwise optimal gene matchings, which are then integrated into gene families in the second step. Our approach is implemented in a pipeline incorporating the pre-computation of gene similarities. It can be downloaded from gitlab.ub.uni-bielefeld.de/gi/FFGC. We obtained promising results with experiments on both simulated and real data.
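The abstract does not say how the pairwise matchings are integrated into families. One simple possibility, shown here purely as an assumption, is to treat each matched gene pair as an edge and take connected components with a union-find structure. The gene labels and the function name `build_families` are hypothetical, and the actual FFGC pipeline may use a different integration rule.

```python
def build_families(pairwise_matchings):
    """Group genes into families as connected components of matched pairs.

    pairwise_matchings: iterable of (gene_a, gene_b) pairs, one per matched
    gene pair from any pairwise genome comparison.
    Returns a sorted list of sorted gene lists.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for gene_a, gene_b in pairwise_matchings:
        union(gene_a, gene_b)

    families = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in families.values())


if __name__ == "__main__":
    # Hypothetical matchings among genomes A, B, and C.
    matches = [("A:g1", "B:g7"), ("B:g7", "C:g2"), ("A:g3", "C:g9")]
    print(build_families(matches))
```

Connected components are permissive: a single spurious match merges two families, so a real pipeline would likely add consistency checks.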


Author(s):  
Yun-Xia Luan ◽  
Yingying Cui ◽  
Wan-Jun Chen ◽  
Jianfeng Jin ◽  
Ai-Min Liu ◽  
...  

The collembolan Folsomia candida Willem, 1902, is an important representative soil arthropod that is widely distributed throughout the world and has been frequently used as a test organism in soil ecology and ecotoxicology studies. However, its suitability as an ideal “standard” has been questioned because of differences in reproductive modes and cryptic genetic diversity between strains from various geographical origins. In this study, we present two high-quality chromosome-level genomes of F. candida, for the parthenogenetic Danish strain (FCDK, 219.08 Mb, N50 of 38.47 Mb, 25,139 protein-coding genes) and the sexual Shanghai strain (FCSH, 153.09 Mb, N50 of 25.75 Mb, 21,609 protein-coding genes). The seven chromosomes of FCDK are each 25–54% larger than the corresponding chromosomes of FCSH, showing obvious repetitive element expansions and large-scale inversions and translocations but no whole-genome duplication. The strain-specific genes, expanded gene families and genes in nonsyntenic chromosomal regions identified in FCDK are highly related to its broader environmental adaptation. In addition, the overall sequence identity of the two mitogenomes is only 78.2%, and FCDK has fewer strain-specific microRNAs than FCSH. In conclusion, FCDK and FCSH have accumulated independent genetic changes and evolved into distinct species since diverging 10 Mya. Our work shows that F. candida represents a good model of rapid cryptic speciation. Moreover, it provides important genomic resources for studying the mechanisms of species differentiation, soil arthropod adaptation to soil ecosystems, and Wolbachia-induced parthenogenesis as well as the evolution of Collembola, a pivotal phylogenetic clade between Crustacea and Insecta.


Author(s):  
Yun Hu ◽  
Liang Lu ◽  
Tao Zhou ◽  
Kishor Kumar Sarker ◽  
Junman Huang ◽  
...  

Abstract Rhinogobius similis is distributed in East and Southeast Asia. It is an amphidromous species found mostly in freshwater and sometimes in brackish waters. We have obtained a high-resolution assembly of the R. similis genome using nanopore sequencing, high-throughput chromosome conformation capture (Hi-C) and transcriptomic data. The assembled genome was 890.10 Mb in size with a GC content of 40.15%. It comprised 1,373 contigs, with a contig N50 of 1.54 Mb and a scaffold N50 of 41.51 Mb. All of the 1,373 contigs were anchored on 22 pairs of chromosomes. The BUSCO evaluation score was 93.02%, indicating a high-quality genome assembly. Repeat sequences accounted for 34.92% of the whole genome, including retroelements (30.13%), DNA transposons (1.64%), and simple repeats (2.34%), among others. A total of 31,089 protein-coding genes were predicted in the genome and functionally annotated using Maker; of those genes, 26,893 (86.50%) were found in InterProScan5. In R. similis, 1,910 gene families were expanded, 1,171 were contracted and 170 were rapidly evolving. We compared one rapidly evolving gene family (PF05970) shared by four species (Boleophthalmus pectinirostris, Neogobius melanostomus, Periophthalmus magnuspinnatus and R. similis), which is probably related to the lifespan of those species. Between 400 ka and 10 ka, the period of the Guxiang Ice Age, the population of R. similis decreased drastically, and then increased gradually following the last interglacial period. A high-resolution genome of R. similis should be useful for studying the taxonomy, biogeography, comparative genomics and adaptive evolution of the most speciose freshwater goby genus, Rhinogobius.
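The contig and scaffold N50 values quoted above follow the standard definition of that statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of the calculation follows. The example lengths are made up and are not taken from the R. similis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest until their cumulative
    length reaches at least half of the total; the length of the last
    sequence added is the N50.
    """
    if not lengths:
        raise ValueError("n50() requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # integer-safe test for running >= total / 2
            return length


if __name__ == "__main__":
    # Made-up contig lengths (arbitrary units), total 32.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 depends only on the length distribution, it says nothing about assembly correctness, which is why it is reported alongside measures such as BUSCO.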


1998 ◽  
Vol 06 (01) ◽  
pp. 49-70 ◽  
Author(s):  
Julius H. Jackson ◽  
Roy George ◽  
Hezekiah O. Adeyemi ◽  
Michael A. Winrow ◽  
Patricia A. Herring ◽  
...  

A Fourier Transform of Equal Symbols (FTES) was applied as a spectral density analysis method to identify DNA bases that repeat at any frequency in selected protein-coding genes. The analysis especially focused on identification of bases responsible for the dominant signal at frequency f=1/3 found in all protein-coding genes. The study included homologous sequences from two gene families and multiple unrelated sequences from single organisms. No signal pattern or spectrum specifically characterized either gene family. However, the patterns of bases comprising the signal at f=1/3 suggested the presence of a genome-specific label for protein-coding genes from the same genome. Data suggest that three factors form the informational basis for the signal structure at f=1/3: (1) codon base positional bias; (2) codon preference; and (3) codon arrangement. Quantitative measure of the contribution of each base to the period-3 signal suggests a basis to distinguish protein-coding genes from different organisms. Application of the FTES analysis characterized genes from Escherichia coli as different from the genes from Pseudomonas aeruginosa. Preliminary analyses of genes from these and three other bacteria by artificial neural nets, using FTES parameters, support our suggestion that the period-3 informational structure contains labels for the genomic origins of protein-coding genes. FTES analysis alone or in combination with other informational measures may reveal pathways and processes of gene flow into and through natural systems of microbial cell populations.
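The period-3 signal discussed above can be illustrated with the common indicator-sequence approach: each base gets a 0/1 sequence marking its positions, and the discrete Fourier transform of that sequence is evaluated at f = 1/3. This sketch may differ in detail from the FTES method, whose exact formulation is not given in the abstract. The function name and the toy sequences are assumptions.

```python
import cmath


def base_power_at(seq, freq=1 / 3):
    """Spectral power contributed by each base at one frequency.

    For base b, the indicator u_b[n] is 1 where seq[n] == b and 0 elsewhere.
    The returned value is |sum_n u_b[n] * exp(-2*pi*i*freq*n)|**2.
    """
    seq = seq.upper()
    power = {}
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * freq * n)
            for n, symbol in enumerate(seq)
            if symbol == base
        )
        power[base] = abs(coeff) ** 2
    return power


if __name__ == "__main__":
    # Made-up sequences: a strict period-3 repeat and a period-4 repeat.
    print(base_power_at("ATG" * 10))   # strong signal for A, T, and G
    print(base_power_at("ACGT" * 6))   # essentially no signal at f = 1/3
```

In a real coding sequence the per-base powers are unequal, reflecting codon position bias, and that imbalance is the kind of quantity the abstract proposes as a genome-specific label.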


Author(s):  
Shanshan Liu ◽  
Shiyin Feng ◽  
Yuying Huang ◽  
Wenli An ◽  
Zerui Yang ◽  
...  

Abstract Background Buddleja lindleyana Fort., which belongs to the Loganiaceae with a distribution throughout the tropics, is widely used as an ornamental plant in China. Buddleja contains several morphologically similar species, which need to be distinguished by molecular methods. However, there has been little molecular research on the genus Buddleja. Objective To sequence and analyze the complete chloroplast (cp) genome of B. lindleyana using molecular biology techniques. Methods The genome was sequenced by next-generation sequencing, and a series of bioinformatics tools were used to assemble the cp genome of B. lindleyana and analyze its molecular structure. Results The complete cp genome of B. lindleyana is a circular 154,487-bp-long molecule with a GC content of 38.1%. It has a familiar quadripartite structure, including a large single-copy region (LSC; 85,489 bp), a small single-copy region (SSC; 17,898 bp) and a pair of inverted repeats (IRs; 25,550 bp each). A total of 133 genes were identified in the genome, including 86 protein-coding genes, 37 tRNA genes, 8 rRNA genes and 2 pseudogenes. Conclusions These results suggest that the B. lindleyana cp genome could be used as a genomic resource to resolve the phylogenetic positions and relationships of Loganiaceae, and that it will offer valuable information for future research on the identification of Buddleja species and contribute to genomic investigations of these species.


2018 ◽  
Author(s):  
Seung Gu Park ◽  
Victor Luria ◽  
Jessica A. Weber ◽  
Sungwon Jeon ◽  
Hak-Min Kim ◽  
...  

Abstract The endangered whale shark (Rhincodon typus) is the largest fish on Earth and is a long-lived member of the ancient Elasmobranchii clade. To characterize the relationship between genome features and biological traits, we sequenced and assembled the genome of the whale shark and compared its genomic and physiological features to those of 81 animals and yeast. We examined scaling relationships between body size, temperature, metabolic rates, and genomic features and found both general correlations across the animal kingdom and features specific to the whale shark genome. Among animals, increased lifespan is positively correlated with body size and metabolic rate. Several genomic features were also significantly correlated with body size, including intron and gene length. Our large-scale comparative genomic analysis uncovered general features of metazoan genome architecture: GC content and codon adaptation index are negatively correlated, and neural connectivity genes are longer than average genes in most genomes. Focusing on the whale shark genome, we identified multiple features that significantly correlate with lifespan. Among these were very long gene length, due to large introns highly enriched in repetitive elements such as CR1-like LINEs, and considerably longer neural genes of several types, including connectivity, activity, and neurodegeneration genes. The whale shark’s genome had an expansion of gene families related to fatty acid metabolism and neurogenesis, with the slowest evolutionary rate observed in vertebrates to date. Our comparative genomics approach uncovered multiple genetic features associated with body size, metabolic rate, and lifespan, and showed that the whale shark is a promising model for studies of neural architecture and lifespan.


2019 ◽  
Vol 35 (17) ◽  
pp. 2949-2956 ◽  
Author(s):  
Guangyu Wang ◽  
Hongyan Yin ◽  
Boyang Li ◽  
Chunlei Yu ◽  
Fan Wang ◽  
...  

Abstract Motivation The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging; it requires prior knowledge of well-established sequences and annotations or species-specific training data, but the reality is that only a limited number of species have high-quality sequences and annotations. Results Here we first characterize lncRNAs in contrast to protein-coding RNAs based on feature relationship and find that the feature relationship between open reading frame length and guanine-cytosine (GC) content presents universally substantial divergence in lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on this feature relationship, we further present LGC, a novel algorithm for identifying lncRNAs that is able to accurately distinguish lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms by achieving higher accuracy and well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study, for the first time, differentially characterizes lncRNAs and protein-coding RNAs based on feature relationship, which is further applied in computational identification of lncRNAs. Taken together, our study represents a significant advance in characterization and identification of lncRNAs, and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species. Availability and implementation LGC web server is publicly available at http://bigd.big.ac.cn/lgc/calculator. The scripts and data can be downloaded at http://bigd.big.ac.cn/biocode/tools/BT000004. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 40 (4) ◽  
pp. 390-400 ◽  
Author(s):  
Werner P Veldsman ◽  
Yaqin Wang ◽  
Jiaojiao Niu ◽  
J Antonio Baeza ◽  
Ka Hou Chu

Abstract We present a full description and analysis of the complete mitochondrial genome of a Pacific Ocean specimen of the coconut crab Birgus latro (Linnaeus, 1767), the largest extant terrestrial arthropod in the world. Our de novo-assembled mitogenome has a massive 16,161 times organelle read coverage, a length of 16,411 bp, contains 22 tDNAs (20 unique), 13 protein-coding genes, two rDNAs, and a putative control region of length 1,381 bp. The control region contains three microsatellites and two pairs of inverted repeats. Contrary to the mitochondrial sentinel gene concept, two-dimensional nucleotide analysis reveals higher GC-content in cox gene families than in nadh gene families. Moreover, cox gene families are more conserved than nadh gene families among the species of Coenobitidae selected for comparison. Secondary structure prediction of the 22 tDNAs shows major deviations from the cloverleaf pattern, which points to a relatively high rate of mutation in these genes. We also present a repertoire of mitochondrial variation between our male Okinawan coconut crab and an Indian Ocean specimen that consists of one insertion, one deletion, 135 SNPs, three MNPs and nine complex polymorphisms. We provide confirmatory evidence that the superfamily Paguroidea, to which the coconut crab belongs, is polyphyletic, that all the protein-coding genes of B. latro are under purifying selection, and that a Pacific versus Indian Ocean coconut crab population divergence occurred during the Pleistocene.

