VirClust, a tool for hierarchical clustering, core gene detection and annotation of (prokaryotic) viruses

2021 ◽  
Author(s):  
Cristina Moraru

Recent years have seen major changes in the classification criteria and taxonomy of viruses. The current classification scheme, also called the megataxonomy of viruses, recognizes five different viral realms, defined based on the presence of viral hallmark genes. Within the realms, viruses are classified into hierarchical taxa, ideally defined by their shared genes. Therefore, there is currently a need for virus classification tools based on such shared genes / proteins. Here, VirClust is presented: a novel tool capable of performing i) hierarchical clustering of viruses based on intergenomic distances calculated from their protein cluster content, ii) identification of core proteins and iii) annotation of viral proteins. VirClust groups proteins into clusters based both on BLASTP sequence similarity, which identifies more closely related proteins, and on hidden Markov models (HMM), which identify more distantly related proteins. Furthermore, VirClust provides an integrated visualization of the hierarchical clustering tree and of the distribution of the protein content, which allows the identification of the genomic features responsible for the respective clustering. By using different intergenomic distances, the hierarchical trees produced by VirClust can be split into viral genome clusters of different taxonomic ranks. VirClust is freely available as a web service (virclust.icbm.de) and as a stand-alone tool.
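
The idea of clustering genomes by their protein cluster content can be illustrated with a minimal Python sketch. This is not VirClust's implementation: the abstract does not state which distance formula or linkage method the tool uses, so the Jaccard distance, the single-linkage grouping, the genome names and the protein cluster IDs (`PC1`, `PC2`, ...) below are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical input: genome name -> set of protein cluster IDs it encodes.
genomes = {
    "phageA": {"PC1", "PC2", "PC3", "PC4"},
    "phageB": {"PC1", "PC2", "PC3", "PC5"},
    "phageC": {"PC1", "PC6", "PC7"},
    "phageD": {"PC1", "PC6", "PC7", "PC8"},
}


def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| (an assumed stand-in for the intergenomic distance)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes):
    """Pairwise distances, keyed by the unordered pair of genome names."""
    return {
        frozenset((g, h)): jaccard_distance(genomes[g], genomes[h])
        for g, h in combinations(sorted(genomes), 2)
    }


def split_by_threshold(genomes, threshold):
    """Single-linkage grouping: merge two groups whenever any cross pair
    of genomes has a distance at or below the threshold."""
    dist = distance_matrix(genomes)
    clusters = [{g} for g in sorted(genomes)]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if any(dist[frozenset((g, h))] <= threshold for g in c1 for h in c2):
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # restart the scan on the updated cluster list
    return sorted(sorted(c) for c in clusters)


def core_clusters(genomes, members):
    """Protein clusters shared by every genome in `members`."""
    return set.intersection(*(genomes[m] for m in members))


if __name__ == "__main__":
    for threshold in (0.3, 0.5, 0.9):
        print(threshold, split_by_threshold(genomes, threshold))
    print(sorted(core_clusters(genomes, ["phageA", "phageB"])))
```

Raising the threshold merges groups into progressively broader clusters, which mirrors the abstract's point that different intergenomic distances yield genome clusters of different ranks. The intersection in `core_clusters` corresponds to the core proteins of a chosen group.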

2021 ◽  
Vol 13 (15) ◽  
pp. 2909
Author(s):  
Chuanpeng Zhao ◽  
Cheng-Zhi Qin

Accurate large-area mangrove classification is a challenging task due to the complexity of mangroves, such as abundant species within the mangrove category, and various appearances resulting from a large latitudinal span and varied habitats. Existing studies have improved mangrove classifications by introducing time series images, constructing new indices sensitive to mangroves, and correcting classifications by empirical constraints and visual inspections. However, false positive misclassifications are still prevalent in current classification results before corrections, and the key reason for false positive misclassification in large-area mangrove classifications is unknown. To address this knowledge gap, this paper proposes the hypothesis that an inadequate classification scheme (i.e., the choice of categories) is the key reason for such false positive misclassification. To validate this hypothesis, new categories covering non-mangrove vegetation near water (i.e., within one pixel of water bodies), which tends to be misclassified as mangroves, were introduced into a commonly used standard classification scheme to form a new scheme. Two experiments were conducted under controlled conditions. The first experiment used the same total set of features to derive direct mangrove classification results in China for the year 2018 on the Google Earth Engine with the standard scheme and the new scheme, respectively. The second experiment used the optimal features to balance the probability of a selected feature being effective for the scheme. A comparison shows that the inclusion of the new categories reduced false positive pixels by 71.3% in the first experiment and by 66.3% in the second experiment. Local characteristics of false positive pixels within 1 × 1 km cells, and direct classification results in two selected subset areas, were also analyzed for quantitative and qualitative validation. All the validation results from the two experiments support the hypothesis. The validated hypothesis can be easily applied to other studies to alleviate the prevalence of false positive misclassifications.


FEBS Letters ◽  
1994 ◽  
Vol 338 (3) ◽  
pp. 251-256 ◽  
Author(s):  
Michael Arand ◽  
David F. Grant ◽  
Jeffrey K. Beetham ◽  
Thomas Friedberg ◽  
Franz Oesch ◽  
...  

Author(s):  
Soon Dong Lee ◽  
In Seop Kim ◽  
Hanna Choe ◽  
Ji-Sun Kim

A Gram-negative, facultatively anaerobic bacterium, designated SAP-6T, was isolated from sap extracted from Acer pictum in Mt. Halla in Jeju, Republic of Korea, and its precise taxonomic status was determined by a polyphasic approach. Cells were non-sporulating, motile, short rods and showed growth at 4–37 °C, pH 6.0–8.0 and 0–4% NaCl. Phylogenomic analysis based on 92 core gene sequences showed that strain SAP-6T belonged to the family Pectobacteriaceae and formed a distinct clade between members of the genera Sodalis and Biostraticola with a gene support index of 89. The closest phylogenetic neighbours were Biostraticola tofi DSM 19580T (97.3% 16S rRNA gene sequence similarity) and Sodalis praecaptivus HS1T (96.8%), with average amino acid identity values of 75.3% and 74.0%, respectively. The major polar lipids were diphosphatidylglycerol, phosphatidylcholine, phosphatidylethanolamine, phosphatidylglycerol and an unidentified aminophospholipid. The major isoprenoid quinones were Q-7 and Q-8. The predominant fatty acids were C16:0, C17:0 cyclo and summed feature 3. The DNA G+C content was 57.0%. On the basis of the data presented here, strain SAP-6T (=KCTC 52622T=DSM 104038T) represents a novel species of a new genus in the family Pectobacteriaceae, for which the name Acerihabitans arboris gen. nov., sp. nov. is proposed.


Author(s):  
Lingmin Jiang ◽  
Won Yong Jung ◽  
Zhun Li ◽  
Mi-Kyung Lee ◽  
Seung-Hwan Park ◽  
...  

A Gram-stain-positive, facultatively anaerobic, endospore-forming, rod-shaped strain, AGMB 02131T, which grew at 20–40 °C (optimum 30 °C), pH 3.0–11.0 (optimum pH 4.0) and in the presence of 0–18 % (w/v) NaCl (optimum 10 %), was isolated from a cow faecal sample and identified as a novel strain using a polyphasic taxonomic approach. The phylogenetic analysis based on 16S rRNA gene sequences along with the whole genome (92 core gene sets) revealed that AGMB 02131T formed a group within the genus Peribacillus, and showed the highest sequence similarity with Peribacillus endoradicis DSM 28131T (96.9 %), followed by Peribacillus butanolivorans DSM 18926T (96.6 %). The genome of AGMB 02131T comprised 70 contigs, the chromosome length was 4 038 965 bp and it had a 38.5 % DNA G+C content. Digital DNA–DNA hybridization revealed that AGMB 02131T displayed 21.4 % genomic DNA relatedness with the most closely related strain, P. butanolivorans DSM 18926T. AGMB 02131T contains all of the conserved signature indels that are specific for members of the genus Peribacillus. The major cellular fatty acids (>10 %) of AGMB 02131T were C18:1ω9c, C18:0 and C16:0. The major polar lipids present were diphosphatidylglycerol, phosphatidylglycerol and phosphatidylethanolamine. On the basis of the phenotypic, phylogenetic, genomic and chemotaxonomic features, AGMB 02131T represents a novel species of the genus Peribacillus, for which the name Peribacillus faecalis sp. nov. is proposed. The type strain is AGMB 02131T (=KCTC 43221T=CCTCC AB 2020077T).


1999 ◽  
Vol 276 (3) ◽  
pp. F398-F408 ◽  
Author(s):  
John C. Edwards

Several closely related proteins that have been implicated as chloride channels of intracellular membranes have recently been described. We report here the molecular cloning and characterization of a new member of this family from human cells. On the basis of sequence similarity, we conclude that this new protein represents the human version of a previously described protein from rat brain named p64H1. The human version of p64H1 (huH1) is a 28.7-kDa protein that shows an apparent molecular mass of 31 kDa by SDS-PAGE. A single 4.5-kb message is detected on Northern blots and is present in all tissues probed. The protein is expressed in an intracellular vesicular pattern in Panc-1 cells that is distinct from the endoplasmic reticulum, fluid-phase endocytic, and transferrin-recycling compartments, but which does colocalize with caveolin. In human kidney, huH1 is highly expressed in a diffuse pattern in the apical domain of proximal tubule cells. huH1 is expressed less abundantly in a vesicular pattern in glomeruli and distal nephron.


2020 ◽  
Vol 49 (D1) ◽  
pp. D452-D457
Author(s):  
Lisanna Paladin ◽  
Martina Bevilacqua ◽  
Sara Errigo ◽  
Damiano Piovesan ◽  
Ivan Mičetić ◽  
...  

The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.


2012 ◽  
Vol 140 (10) ◽  
pp. 1823-1829 ◽  
Author(s):  
S. KHARE ◽  
S. S. NEGI ◽  
S. SINGH ◽  
M. SINGHAL ◽  
S. KUMAR ◽  
...  

We investigated an unprecedented outbreak of fulminant hepatitis B virus (HBV) that occurred in Modasa, Gujarat (India) in 2009. Genomic analysis of all fulminant hepatic failure cases confirmed exclusive predominance of subgenotype D1. A1762T, G1764A basal core promoter (BCP) mutations, insertion of isoleucine after nt 1843, stop codon mutation G1896A, G1862T transversion plus seven other mutations in the core gene caused inhibition of HBeAg expression, implicating them as circulating precore/BCP mutant virus. Two rare mutations at amino acids 89 (Ile→Ala) and 119 (Leu→Ser), in addition to other mutations in the polymerase (pol) gene, may have caused some alteration in one of the four pol gene domains to affect encapsidation of pregenomic RNA and enhance pathogenicity. Sequence similarity among patients' sequences suggested the involvement of a single hepatitis B mutant strain/source, corroborating the finding of gross and continued usage of HBV mutant-contaminated syringes/needles by a physician, which resulted in this unprecedented outbreak of fulminant hepatitis B. The fulminant exacerbation of the disease might be attributed to mutations in the BCP/precore/core and pol genes that may have occurred due to selection pressure during rapid spread/mutation of the virus.


2020 ◽  
Author(s):  
Julia Koehler Leman ◽  
Richard Bonneau

Structures of membrane proteins are challenging to determine experimentally and currently represent only about 2% of the structures in the Protein Data Bank. Because of this disparity, methods for modeling membrane proteins are fewer and of lower quality than those for modeling soluble proteins. However, better expression, crystallization, and cryo-EM techniques have prompted a recent increase in experimental structures of membrane proteins, which can act as templates to predict the structure of closely related proteins through homology modeling. Because homology modeling relies on a structural template, it is easier and more accurate than fold recognition methods or de novo modeling, which are used when the sequence similarity between the query sequence and the sequence of related proteins in structural databases is below 25%. In homology modeling, a query sequence is mapped onto the coordinates of a single template and refined. With the increase in available templates, several templates often cover overlapping segments of the query sequence. Multi-template modeling can be used to identify the best template for local segments and join them into a single model. Here we provide a protocol for modeling membrane proteins from multiple templates in the Rosetta software suite. This approach takes advantage of several integrated frameworks, namely RosettaScripts, RosettaCM, and RosettaMP with the membrane scoring function.

