HSDFinder: A BLAST-Based Strategy for Identifying Highly Similar Duplicated Genes in Eukaryotic Genomes

2021 ◽  
Vol 1 ◽  
Author(s):  
Xi Zhang ◽  
Yining Hu ◽  
David Roy Smith

Gene duplication is an important evolutionary mechanism capable of providing new genetic material for adaptive and nonadaptive evolution. However, bioinformatics tools for identifying duplicate genes are often limited to the detection of paralogs in multiple species or to specific types of gene duplicates, such as retrocopies. Here, we present a user-friendly, BLAST-based web tool, called HSDFinder, which can identify, annotate, categorize, and visualize highly similar duplicate genes (HSDs) in eukaryotic nuclear genomes. HSDFinder includes an online heatmap plotting option, allowing users to compare HSDs among different species and visualize the results in different Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway functional categories. The external software requirements are BLAST, InterProScan, and KEGG. The utility of HSDFinder was tested on various model eukaryotic species, including Chlamydomonas reinhardtii, Arabidopsis thaliana, Oryza sativa, and Zea mays, as well as the psychrophilic green alga Chlamydomonas sp. UWO241, and it proved to be a practical and accurate tool for gene duplication analyses. The web tool is free to use at http://hsdfinder.com. Documentation and tutorials are available on GitHub: https://github.com/zx0223winner/HSDFinder.
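The abstract describes HSDFinder only as BLAST-based. As a rough illustration of what such a strategy can look like, the sketch below groups proteins from an all-against-all BLASTP search (tabular `-outfmt 6`, whose first three columns are query ID, subject ID and percent identity) into candidate duplicate families. The 90% identity cutoff, the 10-residue length-difference cutoff, the input layout and the function names are assumptions for illustration, not HSDFinder's documented behaviour or code.

```python
"""Sketch: group highly similar duplicates from all-vs-all BLASTP tabular output.

Thresholds and function names are hypothetical, not taken from HSDFinder.
"""


def parse_outfmt6(lines):
    """Yield (query, subject, percent_identity) from BLAST -outfmt 6 rows."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[1], float(cols[2])


def group_duplicates(hits, protein_lengths, min_identity=90.0, max_len_diff=10):
    """Union-find grouping of proteins linked by hits passing both assumed cutoffs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, subject, pident in hits:
        if query == subject:
            continue  # ignore self-hits
        if pident < min_identity:
            continue
        if abs(protein_lengths[query] - protein_lengths[subject]) > max_len_diff:
            continue
        union(query, subject)

    groups = {}
    for protein in parent:
        groups.setdefault(find(protein), set()).add(protein)
    # Every stored protein was linked to at least one other, so groups have >= 2 members.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Linking hits with union-find means that A-B and B-C hits place A, B and C in one family even if A and C never hit each other directly; a stricter approach might require all-pairs similarity instead.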

2020 ◽  
Vol 89 (1) ◽  
pp. 637-666
Author(s):  
Emma J. Fenech ◽  
Shifra Ben-Dor ◽  
Maya Schuldiner

The evolution of eukaryotic genomes has been propelled by a series of gene duplication events, leading to an expansion in new functions and pathways. While duplicate genes may retain some functional redundancy, it is clear that to survive selection they cannot simply serve as a backup but rather must acquire distinct functions required for cellular processes to work accurately and efficiently. Understanding these differences and characterizing gene-specific functions is complex. Here we explore different gene pairs and families within the context of the endoplasmic reticulum (ER), the main cellular hub of lipid biosynthesis and the entry site for the secretory pathway. Focusing on each of the ER functions, we highlight specificities of related proteins and the capabilities conferred to cells through their conservation. More generally, these examples suggest why related genes have been maintained by evolutionary forces and provide a conceptual framework to experimentally determine why they have survived selection.


1990 ◽  
Vol 3 (1) ◽  
pp. 145
Author(s):  
DJ Colgan

This paper is a review of the use of information regarding the presence of duplicate genes and their regulation in systematics. The review concentrates on data derived from protein electrophoresis and restriction fragment length polymorphism analysis. The appearance of a duplication in a subset of a group of species implies that the members of the subset belong to the same clade. Suppression of the duplication may render this clade apparently paraphyletic, but may itself be informative of relationships within the lineage through patterns of loss of expression in all or some tissues, or through restrictions on the formation of functional heteropolymers in polymeric enzymes. Examples are given of studies which have used such information to establish phylogenetic hypotheses at the family level, to identify an auto- or allo-polyploid origin of polyploid species, and to determine whether there have been single or multiple origins of such species. The likelihood of homoplasy in the patterns of appearance and regulation of duplicates depends on the molecular basis of the duplication. In particular, the contrast between the expected consequences of tandem duplication and the expression of pseudogenes emphasises the value of determining the mechanism of the original duplication. Many instances of sporadic gene duplication are now known, and polyploidisation is a common event in the evolutionary history of both plants and animals. The opportunities to discover duplication-related characters will therefore arise in many systematic studies. A program is presented to increase the chances that such useful information will be recognisable during the studies.


2016 ◽  
Author(s):  
Stephen G. Gaffney ◽  
Jeffrey P. Townsend

Abstract Summary: PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphical interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and Implementation: Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 license. Supplementary Information: Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.
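The abstract does not spell out PathScore's statistics. One plausible way to score a pathway "across patients", assumed here rather than taken from the source, is: if patient i carries m_i mutations placed uniformly over a coding length G, the chance that at least one lands in a pathway of total gene length L is p_i = 1 - (1 - L/G)^m_i; the number of patients with a pathway mutation is then Poisson-binomial, and an enrichment p-value is its upper tail at the observed count. The uniform-placement model and all function names are hypothetical.

```python
"""Sketch: pathway enrichment across patients via a Poisson-binomial tail.

The uniform mutation-placement model is an assumption, not PathScore's documented method.
"""


def hit_probability(n_mutations, pathway_length, genome_length):
    """P(at least one of n uniformly placed mutations falls inside the pathway)."""
    frac = pathway_length / genome_length
    return 1.0 - (1.0 - frac) ** n_mutations


def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent Bernoulli(p_i) trials (dynamic programming)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this patient has no pathway mutation
            nxt[k + 1] += mass * p      # this patient has at least one
        pmf = nxt
    return pmf


def pathway_enrichment_pvalue(mutation_counts, observed_hits, pathway_length, genome_length):
    """Upper-tail P(X >= observed_hits), X = number of patients with a pathway mutation."""
    probs = [hit_probability(m, pathway_length, genome_length) for m in mutation_counts]
    pmf = poisson_binomial_pmf(probs)
    return sum(pmf[observed_hits:])
```

Treating each patient as a separate Bernoulli trial lets heavily mutated (hypermutated) patients contribute high hit probabilities without dominating the count, which is one reason a per-patient formulation can be preferable to pooling all mutations.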


2019 ◽  
Vol 47 (W1) ◽  
pp. W52-W58 ◽  
Author(s):  
Ling Xu ◽  
Zhaobin Dong ◽  
Lu Fang ◽  
Yongjiang Luo ◽  
Zhaoyuan Wei ◽  
...  

Abstract OrthoVenn is a powerful web platform for the comparison and analysis of whole-genome orthologous clusters. Here we present an updated version, OrthoVenn2, which provides new features that facilitate the comparative analysis of orthologous clusters among up to 12 species. Additionally, this update offers improvements to data visualization and interpretation, including an occurrence pattern table for interrogating the overlap of each orthologous group for the queried species. Within the occurrence table, the functional annotations and summaries of the disjunctions and intersections of clusters between the chosen species can be displayed through an interactive Venn diagram. To facilitate a broader range of comparisons, a larger number of species, including vertebrates, metazoa, protists, fungi, plants and bacteria, have been added in OrthoVenn2. Finally, a stand-alone version is available to perform large dataset comparisons and to visualize results locally without limitation of species number. In summary, OrthoVenn2 is an efficient and user-friendly web server freely accessible at https://orthovenn2.bioinfotoolkits.net.
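An occurrence pattern table of the kind described for OrthoVenn2 can be thought of as counting clusters by the exact combination of species they contain. The sketch below assumes a hypothetical input mapping each cluster ID to its member species; it does not use OrthoVenn2's own file formats or code.

```python
"""Sketch: occurrence-pattern table for orthologous clusters (hypothetical input format)."""
from collections import Counter


def occurrence_patterns(clusters, species):
    """Count clusters by the exact set of species they contain.

    clusters: mapping cluster_id -> iterable of species names present in that cluster.
    Returns a Counter keyed by a presence/absence tuple aligned with `species`.
    """
    table = Counter()
    for members in clusters.values():
        present = set(members)
        pattern = tuple(int(sp in present) for sp in species)
        table[pattern] += 1
    return table


def shared_by_all(clusters, species):
    """Number of clusters containing every queried species (the central Venn region)."""
    wanted = set(species)
    return sum(1 for members in clusters.values() if wanted <= set(members))
```

Each presence/absence tuple corresponds to one region of a Venn diagram, so the table carries the same information as the diagram while also scaling to species counts where drawing a Venn diagram becomes impractical.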


2016 ◽  
Vol 2016 ◽  
pp. 1-4 ◽  
Author(s):  
Na Han ◽  
Weiwen Yu ◽  
Yujun Qiang ◽  
Wen Zhang

Type IV secretion systems (T4SS) can mediate the passage of macromolecules across cellular membranes and are essential for virulence and for the exchange of genetic material among bacterial species. The Type IV Secretion Project 2.0 (T4SP 2.0) database is an improved and extended version of the platform released in 2013, aimed at assisting with the detection of T4SS in bacterial genomes. This advanced version provides users with web server tools for detecting the presence and variations of T4SS genes online. The new genome-browser interface provides user-friendly access to the most complete and accurate resource of T4SS gene information (e.g., gene number, name, type, position, sequence, related articles, and quick links to other websites). Currently, this online database includes T4SS information for 5239 bacterial strains. Conclusions. T4SS is one of the most versatile secretion systems, necessary for the virulence and survival of bacteria and for the secretion of protein and/or DNA substrates from a donor to a recipient cell. This database of T4SS virB/D genes will help scientists worldwide to improve their knowledge of secretion systems and to identify potential pathogenic mechanisms of various microbial species.


2018 ◽  
Vol 399 (9) ◽  
pp. 983-995
Author(s):  
Chenwei Wang ◽  
Leire Moya ◽  
Judith A. Clements ◽  
Colleen C. Nelson ◽  
Jyotsna Batra

Abstract The dysregulation of the kallikrein (KLK) serine-protease family, comprising 15 genes, has reportedly been associated with cancer. Their expression in several tissues and physiological fluids makes them potential candidates as biomarkers and therapeutic targets. Several databases are available to mine gene expression in cancer, and they often include clinical and pathological data. However, these platforms have some limitations when comparing a specific set of genes and can generate considerable unwanted data. Here, datasets that showed significant differential expression (p < 0.01) in cancer vs. normal (n = 118), metastasis vs. primary (n = 15) and association with cancer survival (n = 21) have been compiled for the 15 KLKs in a user-friendly format from two open and/or publicly available databases, Oncomine and OncoLnc. The data have been included in a free web application tool, KLK-CANMAP (https://cancerbioinformatics.shinyapps.io/klk-canmap/). This tool integrates, analyses and visualises data and was developed with the R Shiny framework. Using KLK-CANMAP, box plots, heatmaps and Kaplan-Meier graphs can be generated for the KLKs of interest. We believe this new cancer-focused KLK web tool will benefit the KLK community by narrowing data visualisation to only the genes of interest.
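KLK-CANMAP itself is an R Shiny application; the Python sketch below only illustrates the standard Kaplan-Meier estimator that underlies such survival curves, S(t) = product over event times t_j <= t of (1 - d_j / n_j), where d_j is the number of events and n_j the number of patients still at risk at t_j. The function name and input layout are hypothetical, not part of KLK-CANMAP.

```python
"""Sketch: Kaplan-Meier survival estimate from (time, event) observations."""


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up times; events: 1 if the event (e.g. death) occurred, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every observation tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both events and censorings leave the risk set
    return curve
```

Censored patients never produce a drop in the curve, but they shrink the risk set, so later events weigh more heavily; comparing two such curves (for example, high vs. low expression of a KLK) would additionally need a test such as the log-rank test, which this sketch omits.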


2020 ◽  
Vol 36 (10) ◽  
pp. 3246-3247
Author(s):  
Vaclav Brazda ◽  
Jan Kolomaznik ◽  
Jean-Louis Mergny ◽  
Jiri Stastny

Abstract Motivation: G-quadruplexes (G4) are important regulatory non-B DNA structures with therapeutic potential. A tool for rational design of mutations leading to decreased propensity for G4 formation should be useful in studying G4 functions. Although tools exist for G4 prediction, no easily accessible tool for the rational design of G4 mutations has been available. Results: We developed a web-based tool termed G4Killer that is based on the G4Hunter algorithm. This new tool is a platform-independent and user-friendly application to design mutations crippling G4 propensity in a parsimonious way (i.e., keeping the primary sequence as close as possible to the original one). The tool is integrated into our DNA analyzer server and allows for generating mutated DNA sequences having the desired lowered G4Hunter score with minimal mutation steps. Availability and implementation: The G4Killer web tool can be accessed at: http://bioinformatics.ibp.cz. Supplementary information: Supplementary data are available at Bioinformatics online.
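G4Hunter, on which G4Killer is based, scores each G by the length of the G-run it belongs to (capped at 4), each C by the negative of its C-run length (also capped), and A/T as 0, then averages the scores. The sketch below implements that scoring over the whole sequence (rather than in sliding windows) and pairs it with a simple greedy loop that substitutes G with A or T until the score drops to a target. The greedy strategy, the restriction to G substitutions and the function names are illustrative assumptions; the abstract does not describe G4Killer's actual mutation search.

```python
"""Sketch: G4Hunter-style scoring plus a hypothetical greedy mutation loop."""
import re


def g4hunter_score(seq):
    """Mean per-base score: each G in a G-run scores +min(run, 4), each C in a C-run -min(run, 4)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    total = 0
    for match in re.finditer(r"G+|C+", seq):
        run = match.group()
        value = min(len(run), 4)
        sign = 1 if run[0] == "G" else -1
        total += sign * value * len(run)
    return total / len(seq)  # A and T contribute 0


def greedy_lower_score(seq, target, max_steps=50):
    """Hypothetical greedy search: apply the single G->A/T substitution that lowers the score most,
    repeating until the score is <= target. Returns (mutated_sequence, number_of_substitutions)."""
    seq = seq.upper()
    steps = 0
    while g4hunter_score(seq) > target and steps < max_steps:
        best = None
        for i, base in enumerate(seq):
            if base != "G":
                continue
            for new in "AT":
                candidate = seq[:i] + new + seq[i + 1:]
                score = g4hunter_score(candidate)
                if best is None or score < best[0]:
                    best = (score, candidate)
        if best is None:
            break  # no G left to mutate
        seq = best[1]
        steps += 1
    return seq, steps
```

Because breaking a run in the middle (GGG to GAG) lowers the score more than trimming its end, a greedy rule like this tends to hit central guanines first, which fits the parsimony goal of reaching the target in few substitutions, though it is not guaranteed to be minimal.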


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Jong-Heon Kim ◽  
Su-Hyeong Park ◽  
Jin Han ◽  
Pan-Woo Ko ◽  
Dongseop Kwon ◽  
...  

Abstract Glial cells are phenotypically heterogeneous non-neuronal components of the central and peripheral nervous systems. These cells are endowed with diverse functions and molecular machineries to detect and regulate neuronal or their own activities by various secreted mediators, such as proteinaceous factors. In particular, glia-secreted proteins form a basis of a complex network of glia–neuron or glia–glia interactions in health and diseases. In recent years, the analysis and profiling of glial secretomes have raised new expectations for the diagnosis and treatment of neurological disorders due to the vital role of glia in numerous physiological or pathological processes of the nervous system. However, there is no online database of glia-secreted proteins available to facilitate glial research. Here, we developed a user-friendly ‘Gliome’ database (available at www.gliome.org), a web-based tool to access and analyze glia-secreted proteins. The database provides a vast collection of information on 3293 proteins that are released from glia of multiple species and have been reported to have differential functions under diverse experimental conditions. It contains a web-based interface with the following four key features regarding glia-secreted proteins: (i) fundamental information, such as signal peptide, SecretomeP value, functions and Gene Ontology category; (ii) differential expression patterns under distinct experimental conditions; (iii) disease association; and (iv) interacting proteins. In conclusion, the Gliome database is a comprehensive web-based tool to access and analyze glia-secretome data obtained from diverse experimental settings, whereby it may facilitate the integration of bioinformatics into glial research.


Genome ◽  
1991 ◽  
Vol 34 (1) ◽  
pp. 151-155 ◽  
Author(s):  
P. M. Gaur ◽  
A. E. Slinkard

Fructose-bisphosphate aldolase (ALD, EC 4.1.2.13) was analysed in Cicer arietinum L. (the cultivated chickpea) and all eight annual wild Cicer species, C. bijugum Rech., C. chorassanicum (Bge.) M. Pop., C. cuneatum Rich., C. echinospermum Davis, C. judaicum Boiss., C. pinnatifidum J. & S., C. reticulatum Lad., and C. yamashitae Kit. Duplicate genes were identified for the plastid-specific isozyme of ALD in C. arietinum and all wild species except C. yamashitae and one accession of C. reticulatum. Gene duplication was indicated by the presence of a true-breeding five-banded zymotype of the tetrameric plastid ALD in these species. Monogenic inheritance was confirmed for the alleles of one of the loci. The occurrence of ALD gene duplication in most of the annual Cicer species suggests that this duplication is of ancient origin. However, this duplication must have occurred after divergence of Cicer from the closely related genera Pisum and Lens because the plastid ALD is controlled monogenically in these latter two genera. Key words: Cicer, isozymes, aldolase, gene duplication.
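The inference from a five-banded pattern rests on simple counting. ALD is a tetramer; if two loci encode electrophoretically distinct subunits (A and B) that assemble at random, five subunit compositions are possible (A4, A3B, A2B2, AB3, B4), whereas a single homozygous locus in a true-breeding line would give only one band. If the two subunits are also made in equal amounts (an assumption not stated in the abstract), band intensities follow the binomial coefficients 1:4:6:4:1. The sketch below simply enumerates these combinations; the function names are hypothetical.

```python
"""Sketch: why two loci encoding subunits of a tetramer give a five-banded zymotype."""
from itertools import combinations_with_replacement
from math import comb


def tetramer_isozymes(subunits=("A", "B"), size=4):
    """All distinct subunit compositions (A4, A3B, ..., B4) of a polymer with `size` subunits."""
    return ["".join(c) for c in combinations_with_replacement(subunits, size)]


def expected_band_ratios(size=4):
    """Binomial band intensities, assuming two subunit types in equal amounts assembling at random."""
    return [comb(size, k) for k in range(size + 1)]
```

With one subunit type the same enumeration yields a single composition, which is why a one-band pattern is expected in genera such as Pisum and Lens, where the abstract reports monogenic control.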


2016 ◽  
Author(s):  
Kousuke Hanada ◽  
Ayumi Tezuka ◽  
Masafumi Nozawa ◽  
Yutaka Suzuki ◽  
Sumio Sugano ◽  
...  

Abstract Lineage-specifically duplicated genes likely contribute to phenotypic divergence among closely related species. However, neither the frequency of duplication events nor the degree of selective pressure immediately after gene duplication is clear in the speciation process. Plants have substantially higher gene duplication rates than most other eukaryotes. Here, using Illumina short reads from Arabidopsis halleri, whose close relatives (Brassica rapa, A. thaliana and A. lyrata) have high-quality genome sequences, we succeeded in generating orthologous gene groups among B. rapa, A. thaliana, A. lyrata and A. halleri. The frequency of duplication events in the Arabidopsis lineage was approximately 10 times higher than the frequency inferred by comparative genomics of Arabidopsis, poplar, rice and moss. Of the currently retained genes in A. halleri, 11–24% had undergone gene duplication in the Arabidopsis lineage. To examine the degree of selective pressure on duplicated genes, we calculated the ratios of nonsynonymous to synonymous substitution rates (KA/KS) in the A. halleri-lyrata and A. halleri lineages. Using a maximum-likelihood framework, we tested for positive (KA/KS > 1) and purifying (KA/KS < 1) selection at a significance level of P < 0.01. Duplicate genes tended to show a higher proportion of positive selection than non-duplicated genes. More interestingly, we found that functional divergence of duplicated genes was accelerated several million years after gene duplication, at a higher proportion than immediately after gene duplication.
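The abstract reports testing KA/KS (omega) at P < 0.01 within a maximum-likelihood framework but does not name the test. A common choice, assumed here, is a likelihood-ratio test comparing a model with omega fixed at 1 against one with omega estimated freely, so that 2 × (lnL_alt - lnL_null) is compared with a chi-square distribution with one degree of freedom. The log-likelihoods would come from a codon-model program (for example, PAML's codeml); the functions and the decision rule below are a hypothetical sketch.

```python
"""Sketch: likelihood-ratio test of omega = KA/KS against neutrality (omega = 1)."""
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """P-value for 2*(lnL_alt - lnL_null) under a chi-square distribution with 1 degree of freedom."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # clamp tiny negative values from optimizer noise
    # Survival function of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def classify_selection(omega, lnl_null, lnl_alt, alpha=0.01):
    """Call positive or purifying selection only when omega differs significantly from 1."""
    if lrt_pvalue(lnl_null, lnl_alt) >= alpha:
        return "not significant"
    return "positive" if omega > 1 else "purifying"
```

At alpha = 0.01 this rule rejects neutrality when 2 × delta-lnL exceeds about 6.63, the 99th percentile of the one-degree-of-freedom chi-square distribution.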

