HOMOLOGOUS SYNTENY BLOCK DETECTION BASED ON SUFFIX TREE ALGORITHMS

2013 ◽  
Vol 11 (06) ◽  
pp. 1343004 ◽  
Author(s):  
YU-LUN CHEN ◽  
CHIEN-MING CHEN ◽  
TUN-WEN PAI ◽  
HON-WAI LEONG ◽  
KET-FAH CHONG

A synteny block represents a set of contiguous genes located within the same chromosome and well conserved among various species. Through long evolutionary processes and genome rearrangement events, large numbers of synteny blocks remain highly conserved across multiple species. Understanding the distribution of conserved gene blocks helps evolutionary biologists trace the diversity of life, and it also plays an important role in orthologous gene detection and gene annotation in the genomic era. In this work, we focus on collinear synteny detection, in which gene order must be conserved among multiple species. To achieve this goal, suffix tree-based algorithms for efficiently identifying homologous synteny blocks were proposed. The traditional suffix tree algorithm was modified by treating each chromosome as a string in which each gene is encoded as a symbol character. Hence, a suffix tree can be built for different query chromosomes from various species. We can then efficiently search for conserved synteny blocks, which are modeled as overlapping contiguous edges in our suffix tree. In addition, we defined a novel Synteny Block Conserved Index (SBCI) to evaluate the relationship of synteny block distribution between two species, which could be applied as an evolutionary indicator for constructing a phylogenetic tree of multiple species without the large computational requirements of whole-genome sequence alignment.
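The gene-as-symbol idea can be illustrated with a minimal Python sketch. It assumes each gene has already been mapped to an ortholog label (the strings `g1`, `g2` and so on are hypothetical) and builds an uncompressed generalized suffix trie, which needs quadratic space; a production implementation would use a compressed suffix tree built in linear time (for example with Ukkonen's algorithm). Only same-orientation (collinear) blocks are reported, the function names are illustrative rather than the authors' implementation, and SBCI is not computed because its formula is not given here.

```python
from typing import Dict, List, Sequence, Set, Tuple


class TrieNode:
    """Node of an uncompressed generalized suffix trie whose edge labels are gene symbols."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.sources: Set[int] = set()  # indices of chromosomes whose suffixes pass through this node


def build_suffix_trie(chromosomes: Sequence[Sequence[str]]) -> TrieNode:
    """Insert every suffix of every chromosome, one symbol per gene (O(n^2) space)."""
    root = TrieNode()
    for cid, genes in enumerate(chromosomes):
        for start in range(len(genes)):
            node = root
            for gene in genes[start:]:
                if gene not in node.children:
                    node.children[gene] = TrieNode()
                node = node.children[gene]
                node.sources.add(cid)
    return root


def _contains(block: Tuple[str, ...], other: Tuple[str, ...]) -> bool:
    """True if `block` occurs as a contiguous run inside `other`."""
    n = len(block)
    return any(other[i:i + n] == block for i in range(len(other) - n + 1))


def collinear_blocks(chromosomes: Sequence[Sequence[str]], min_len: int = 2) -> List[Tuple[str, ...]]:
    """Return maximal gene runs that occur, in the same order, in every chromosome."""
    root = build_suffix_trie(chromosomes)
    everyone = set(range(len(chromosomes)))
    candidates: List[Tuple[str, ...]] = []
    stack: List[Tuple[TrieNode, Tuple[str, ...]]] = [(root, ())]
    while stack:
        node, path = stack.pop()
        # Children reached by suffixes of every chromosome extend a shared run.
        shared = [(g, c) for g, c in node.children.items() if c.sources == everyone]
        if not shared and len(path) >= min_len:
            candidates.append(path)  # cannot be extended to the right in all chromosomes
        for gene, child in shared:
            stack.append((child, path + (gene,)))
    # Drop runs contained in a longer candidate: this removes non-left-maximal runs
    # (and, as a simplification, any shorter run repeated inside a longer block).
    maximal = [b for b in candidates
               if not any(len(o) > len(b) and _contains(b, o) for o in candidates)]
    return sorted(maximal)
```

For example, with `A = ["g1", "g2", "g3", "g4", "g5"]` and `B = ["g9", "g2", "g3", "g4", "g7", "g1", "g2"]`, `collinear_blocks([A, B])` returns `[("g1", "g2"), ("g2", "g3", "g4")]`; the shared run `("g3", "g4")` is dropped because it lies inside the longer block.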

2018 ◽  
Author(s):  
Alexander J. Hart ◽  
Samuel Ginzburg ◽  
Muyang (Sam) Xu ◽  
Cera R. Fisher ◽  
Nasim Rahmatpour ◽  
...  

ABSTRACT EnTAP (Eukaryotic Non-Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes. This software package addresses the fragmentation and related assembly issues that result in inflated transcript estimates and poor annotation rates, while focusing primarily on protein-coding transcripts. After filters based on the assessment of true expression and frame selection are applied, open-source tools are leveraged to functionally annotate the translated proteins. Downstream features include fast similarity search across three repositories, protein domain assignment, orthologous gene family assessment, and Gene Ontology term assignment. The final annotation integrates across multiple databases and selects an optimal assignment from a combination of weighted metrics describing similarity search score, taxonomic relationship, and informativeness. Researchers have the option to include additional filters to identify and remove contaminants, identify associated pathways, and prepare the transcripts for enrichment analysis. This fully featured pipeline is easy to install and configure, and runs significantly faster than comparable annotation packages. EnTAP is optimized to generate extensive functional information for the gene space of organisms with limited or poorly characterized genomic resources.
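Selecting one annotation from several weighted criteria can be sketched as a weighted sum over normalized metrics. In this minimal Python sketch the `Hit` fields, the 0 to 1 taxonomic and informativeness scores, and the weights are all hypothetical; EnTAP's actual scoring scheme is not described in the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    subject: str
    bitscore: float         # similarity search score
    taxonomic_score: float  # hypothetical: 0..1, closeness of the subject's taxon to the target
    informativeness: float  # hypothetical: 0..1, how descriptive the subject's annotation is


# Hypothetical weights for illustration only.
WEIGHTS: Dict[str, float] = {"similarity": 0.5, "taxonomy": 0.3, "informativeness": 0.2}


def select_best_hit(hits: List[Hit], weights: Dict[str, float] = WEIGHTS) -> Hit:
    """Return the hit with the highest weighted score across the three metrics."""
    if not hits:
        raise ValueError("no hits to choose from")
    top = max(h.bitscore for h in hits)

    def score(h: Hit) -> float:
        # Scale bitscores relative to the best hit so all metrics lie in 0..1.
        similarity = h.bitscore / top if top > 0 else 0.0
        return (weights["similarity"] * similarity
                + weights["taxonomy"] * h.taxonomic_score
                + weights["informativeness"] * h.informativeness)

    return max(hits, key=score)
```

With these weights, a hit with a slightly lower bitscore can still win if it comes from a closer taxon and carries a more informative description, which is the trade-off a weighted combination is meant to capture.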


2017 ◽  
Author(s):  
Jackson Z Lee ◽  
R Craig Everroad ◽  
Ulas Karaoz ◽  
Angela M Detweiler ◽  
Jennifer Pett-Ridge ◽  
...  

Abstract Hypersaline photosynthetic microbial mats are stratified microbial communities known for their taxonomic and metabolic diversity and strong light-driven day-night environmental gradients. In this study of the upper photosynthetic zone of hypersaline microbial mats of Elkhorn Slough, California (USA), we show how reference-based and reference-free methods can be used to meaningfully assess microbial ecology and genetic partitioning in these complex microbial systems. Mapping of metagenome reads to the dominant Cyanobacteria observed in the system, Coleofasciculus (Microcoleus) chthonoplastes, was used to examine strain variants within these metagenomes. Highly conserved gene subsystems indicate a core genome for the species, and a number of variant genes and subsystems suggest strain-level differentiation, especially for carbohydrate utilization. Metagenome sequence coverage binning was used to assess ecosystem partitioning of the remaining microbes. Functional gene annotation of these bins (primarily of Proteobacteria, Bacteroidetes, and Cyanobacteria) recapitulated the known biogeochemical functions in microbial mats on a genetic basis, and also revealed evidence of novel functional diversity within the Gemmatimonadetes and Gammaproteobacteria. Combined, these two approaches show how genetic partitioning can inform biogeochemical partitioning of the metabolic diversity within microbial ecosystems.
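Coverage binning groups assembled contigs whose read depth varies in the same way across samples, on the premise that contigs from one genome share an abundance profile. The minimal Python sketch below assumes per-sample coverage values for each contig and a hypothetical distance `tolerance`, and uses a simple greedy seed-based grouping in log space; it is not the procedure used in the study, and dedicated binning tools usually combine coverage with sequence composition.

```python
import math
from typing import Dict, List, Sequence, Tuple


def bin_by_coverage(coverage: Dict[str, Sequence[float]], tolerance: float = 0.5) -> List[List[str]]:
    """Greedily group contigs whose log2 coverage profiles lie within `tolerance` of a bin seed."""
    bins: List[Tuple[List[float], List[str]]] = []  # (seed profile, member contigs)
    # Seed bins with the most deeply covered contigs first.
    for contig in sorted(coverage, key=lambda c: -sum(coverage[c])):
        profile = [math.log2(x + 1) for x in coverage[contig]]  # +1 avoids log(0)
        for seed, members in bins:
            if math.dist(seed, profile) <= tolerance:  # Euclidean distance in log space
                members.append(contig)
                break
        else:
            bins.append((profile, [contig]))
    return [members for _, members in bins]
```

For instance, contigs with coverage `[100, 10]` and `[98, 11]` across two samples fall into one bin, while a contig with `[5, 50]` starts its own.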


2002 ◽  
Vol 277 (27) ◽  
pp. 24618-24624 ◽  
Author(s):  
Jason P. Salter ◽  
Youngchool Choe ◽  
Hugo Albrecht ◽  
Christopher Franklin ◽  
Kee-Chong Lim ◽  
...  

Plant Disease ◽  
2020 ◽  
Author(s):  
Xue Wang ◽  
Xian Wu ◽  
Shilong Jiang ◽  
Qiaoxiu Yin ◽  
Dongxue Li ◽  
...  

Didymella bellidis is a phytopathogenic fungus that causes leaf spot on tea plants (Camellia sinensis), which negatively affects the productivity and quality of tea leaves in Guizhou Province, China. D. bellidis isolate GZYQYQX2B was sequenced using Pacific Biosciences and Illumina technologies and assembled into a whole genome of 35.5 Mbp. Transcripts of D. bellidis isolate GZYQYQX2B were predicted from the assembled genome and further validated with RNA sequencing data. In total, 10,731 genes were predicted by integrating three approaches, namely ab initio prediction, homology-based prediction, and transcriptomic evidence. The whole-genome sequence of D. bellidis will provide a resource for future research on trait-specific genes of the pathogen and host-pathogen interactions.


2019 ◽  
Vol 20 (S24) ◽  
Author(s):  
Yilin Liu ◽  
Jiao Xu ◽  
Miaoxia Chen ◽  
Changfa Wang ◽  
Shuaicheng Li

Abstract Background Short tandem repeats (STRs) serve as genetic markers in forensic settings because of their high polymorphism in eukaryotic genomes. A variety of STR profiling systems have been developed for species including human, dog, cat, and cattle. Maintaining these systems simultaneously can be costly. These mammals share many highly similar regions along their genomes. With the massive amount of whole-genome data now available for these species, it is possible to develop a unified STR profiling system. In this study, our objective is to propose and develop a unified set of STR loci that could be applied simultaneously to multiple species. Result To find a unified STR set, we collected the whole-genome sequence data of the species concerned and mapped them to the human reference genome. We then extracted the STR loci across the species. From these loci, we proposed an algorithm that selects a subset of loci by optimizing the combined power of discrimination. Our results show that the unified set of loci has a high combined power of discrimination (> 1 − 10⁻⁹) for both the individual species and the mixed population, as well as a random-match probability below 10⁻⁷ for all the involved species, indicating that the identified set of STR loci could be applied to multiple species. Conclusions We identified a set of STR loci that are shared by multiple species. This implies that a unified STR profiling system is possible for these species in forensic settings. The system can be applied to individual identification or paternity testing in each of ten common species: Sus scrofa (pig), Bos taurus (cattle), Capra hircus (goat), Equus caballus (horse), Canis lupus familiaris (dog), Felis catus (cat), Ovis aries (sheep), Oryctolagus cuniculus (rabbit), Bos grunniens (yak), and Homo sapiens (human). Our loci selection algorithm employs a greedy approach. The algorithm can generate the loci under different forensic parameters and for a specific combination of species.
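The selection criterion can be made concrete with the standard per-locus quantities: for genotype frequencies P_g, the match probability is Σ P_g², the power of discrimination is PD = 1 − Σ P_g², and over independent loci the combined power of discrimination is CPD = 1 − ∏ (1 − PD_i). The Python sketch below assumes genotype frequencies are already estimated per species and that loci are independent, and greedily adds the locus that most raises the worst-case CPD across species. This maximin objective, the function names, and the toy frequencies are illustrative assumptions, not the authors' exact algorithm.

```python
from math import prod
from typing import Dict, List, Sequence

Freqs = Dict[str, Sequence[float]]  # locus -> genotype frequencies for one species


def match_probability(genotype_freqs: Sequence[float]) -> float:
    """Chance that two random individuals share a genotype at one locus: sum of P_g^2."""
    return sum(p * p for p in genotype_freqs)


def power_of_discrimination(genotype_freqs: Sequence[float]) -> float:
    """PD = 1 - sum of P_g^2."""
    return 1.0 - match_probability(genotype_freqs)


def combined_pd(loci: Sequence[str], freqs: Freqs) -> float:
    """CPD = 1 - prod(1 - PD_i); each (1 - PD_i) equals that locus's match probability."""
    return 1.0 - prod(match_probability(freqs[locus]) for locus in loci)


def greedy_select(candidates: Sequence[str], freqs_by_species: Dict[str, Freqs], k: int) -> List[str]:
    """Pick up to k loci; each step adds the locus maximizing the lowest CPD across species."""
    chosen: List[str] = []
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda locus: min(combined_pd(chosen + [locus], f) for f in freqs_by_species.values()),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

The product of the chosen loci's match probabilities is the combined random-match probability, so maximizing CPD and minimizing that probability are the same goal for a single species; the maximin step is one way to balance several species at once.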


2006 ◽  
Vol 188 (4) ◽  
pp. 1540-1550 ◽  
Author(s):  
Ricardo Del Sol ◽  
Jonathan G. L. Mullins ◽  
Nina Grantcharova ◽  
Klas Flärdh ◽  
Paul Dyson

ABSTRACT The product of the crgA gene of Streptomyces coelicolor represents a novel family of small proteins. A single orthologous gene is located close to the origin of replication of all fully sequenced actinomycete genomes and borders a conserved gene cluster implicated in cell growth and division. In S. coelicolor, CrgA is important for coordinating growth and cell division in sporogenic hyphae. In this study, we demonstrate that CrgA is an integral membrane protein whose peak expression is coordinated with the onset of development of aerial hyphae. The protein localizes to discrete foci away from growing hyphal tips. Upon overexpression, CrgA localizes to apical syncytial cells of aerial hyphae and inhibits the formation of productive cytokinetic rings of the bacterial tubulin homolog FtsZ, leading to proteolytic turnover of this major cell division determinant. In the absence of known prokaryotic cell division inhibitors in actinomycetes, CrgA may have an important conserved function influencing Z-ring formation in these bacteria.


GigaScience ◽  
2020 ◽  
Vol 9 (12) ◽  
Author(s):  
Qionghua Gao ◽  
Zijun Xiong ◽  
Rasmus Stenbak Larsen ◽  
Long Zhou ◽  
Jie Zhao ◽  
...  

Abstract Background Ants with complex societies have fascinated scientists for centuries. Comparative genomic and transcriptomic analyses across ant species and castes have revealed important insights into the molecular mechanisms underlying ant caste differentiation. However, most current ant genomes and transcriptomes are highly fragmented and incomplete, which hinders our understanding of the molecular basis for complex ant societies. Findings By combining Illumina, Pacific Biosciences, and Hi-C sequencing technologies, we de novo assembled a chromosome-level genome for Monomorium pharaonis, with a scaffold N50 of 27.2 Mb. Our new assembly provides better resolution for the discovery of genome rearrangement events at the chromosome level. Analysis of full-length isoform sequencing (ISO-seq) suggested that ∼15 Gb of ISO-seq data were sufficient to cover most expressed genes, but the number of transcript isoforms steadily increased with sequencing data coverage. Our high-depth ISO-seq data greatly improved the quality of gene annotation and enabled the accurate detection of alternative splicing isoforms in different castes of M. pharaonis. Comparative transcriptome analysis across castes based on the ISO-seq data revealed an unprecedented number of transcript isoforms, including many caste-specific isoforms. We also identified a number of conserved long non-coding RNAs that evolved specifically in ant lineages and several that were conserved across insect lineages. Conclusions We produced a high-quality chromosome-level genome for M. pharaonis, which significantly improves on previous short-read assemblies. Together with full-length transcriptomes for all castes, we generated a highly accurate annotation for this ant species. These long-read sequencing results provide a useful resource for future functional studies on the genetic mechanisms underlying the evolution of social behaviors and organization in ants.
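Scaffold N50 is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of the calculation, taking scaffold lengths as plain integers (the example values are made up):

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that scaffolds of length >= L cover at least half of the assembly."""
    if not lengths:
        raise ValueError("empty assembly")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest scaffold down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for a non-empty assembly")
```

For scaffolds of 10, 20, 30, and 40 Mb (100 Mb total), the two longest already cover 70 Mb, so `n50([10, 20, 30, 40])` is 30.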


2018 ◽  
Author(s):  
Jim Clauwaerts ◽  
Gerben Menschaert ◽  
Willem Waegeman

Abstract Annotation of gene expression in prokaryotes is often corrected because of small variations in the annotated gene regions observed between different (sub)species. It has become apparent that traditional sequence alignment algorithms, used for the curation of genomes, are not able to map the full complexity of the genomic landscape. We present DeepRibo, a novel neural network that uses ribosome profiling data and proves to be a precise tool for the delineation and annotation of expressed genes in prokaryotes. The neural network combines recurrent memory cells and convolutional layers, integrating the information from both the high-throughput ribosome profiling data and the Shine-Dalgarno region into one model. DeepRibo is designed as a single model trained on a variety of ribosome profiling experiments, and is therefore evaluated on independent datasets. Extensive validation of the model, including the use of multi-species sequence similarity and mass spectrometry, highlights its effectiveness.
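A model that takes both a sequence region and a ribosome profiling signal needs both encoded numerically. DeepRibo's actual input representation is not described in the abstract, so the following Python sketch is only an assumption-laden illustration: it one-hot encodes a hypothetical 20-nt window upstream of a forward-strand start codon (where a Shine-Dalgarno motif would sit) and scales per-position read counts over the candidate ORF to sum to 1.

```python
from typing import Dict, List, Sequence

NUCLEOTIDES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as rows of [A, C, G, T] indicators; other symbols (e.g. N) become all zeros."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq.upper()]


def encode_candidate(genome: str, start: int, coverage: Sequence[float],
                     sd_window: int = 20) -> Dict[str, list]:
    """Hypothetical input for one forward-strand candidate ORF.

    start: 0-based index of the start codon; sd_window: number of upstream bases
    assumed to contain the Shine-Dalgarno region; coverage: reads per ORF position.
    """
    upstream = genome[max(0, start - sd_window):start]
    upstream = upstream.rjust(sd_window, "N")  # left-pad when the ORF is near the genome start
    total = sum(coverage)
    # Scale the ribosome profiling signal to sum to 1 (all zeros if there are no reads).
    profile = [c / total if total > 0 else 0.0 for c in coverage]
    return {"sd_one_hot": one_hot(upstream), "riboseq_profile": profile}
```

A convolutional layer could then scan the one-hot matrix for motif-like patterns, while a recurrent layer reads the coverage profile position by position; how DeepRibo actually wires these together is not specified here.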


2021 ◽  
Vol 12 ◽  
Author(s):  
Qunhao Niu ◽  
Tianliu Zhang ◽  
Ling Xu ◽  
Tianzhen Wang ◽  
Zezhao Wang ◽  
...  

Bone weight is a critical determinant of body conformation and stature in cattle. In this study, we conducted a genome-wide association study for bone weight in Chinese Simmental beef cattle based on imputed sequence variants. We identified 364 variants associated with bone weight, 350 of which were not included in the Illumina BovineHD SNP array, as well as several candidate genes and GO terms associated with bone weight. Remarkably, we identified four potential variants in a candidate region on BTA6 using Bayesian fine-mapping. Several important candidate genes were highlighted, including LAP3, MED28, NCAPG, LCORL, SLIT2, and IBSP, which have been previously reported to be associated with carcass traits, body measurements, and growth traits. Notably, we found that the transcription factors related to MED28 and LCORL showed high conservation across multiple species. Our findings provide valuable information for understanding the genetic basis of body stature in beef cattle.
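The abstract does not specify which Bayesian fine-mapping method was used. One widely used approach, shown here as a minimal Python sketch, assumes a single causal variant in the region with equal prior probability for each variant, converts each variant's effect estimate and standard error into a Wakefield approximate Bayes factor, normalizes these into posterior inclusion probabilities (PIPs), and takes the smallest set of variants reaching 95% cumulative PIP as the credible set. The prior standard deviation `prior_sd = 0.2` and the example inputs are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def log_wakefield_abf(beta: float, se: float, prior_sd: float) -> float:
    """log Bayes factor (association vs. none) under a N(0, prior_sd^2) effect prior."""
    v = se ** 2          # sampling variance of the estimate
    w = prior_sd ** 2    # prior variance of the true effect
    z2 = (beta / se) ** 2
    return 0.5 * math.log(v / (v + w)) + z2 * w / (2.0 * (v + w))


def fine_map(betas: Sequence[float], ses: Sequence[float], prior_sd: float = 0.2,
             coverage: float = 0.95) -> Tuple[List[float], List[int]]:
    """Return posterior inclusion probabilities and a credible set of variant indices."""
    log_bfs = [log_wakefield_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(log_bfs)
    scaled = [math.exp(x - top) for x in log_bfs]  # subtract the max to avoid overflow
    total = sum(scaled)
    pips = [x / total for x in scaled]
    credible: List[int] = []
    cumulative = 0.0
    # Add variants from highest to lowest PIP until the target coverage is reached.
    for i in sorted(range(len(pips)), key=lambda j: pips[j], reverse=True):
        credible.append(i)
        cumulative += pips[i]
        if cumulative >= coverage:
            break
    return pips, credible
```

With one strongly associated variant (effect 0.5, SE 0.05) among weak ones, nearly all posterior mass falls on that variant and the credible set contains it alone; two equally strong variants in tight linkage would instead share the mass and both enter the set.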

