sequence similarities
Recently Published Documents



2022 ◽  
Vol 12 ◽  
Author(s):  
Aaron J. Robinson ◽  
Hajnalka E. Daligault ◽  
Julia M. Kelliher ◽  
Erick S. LeBrun ◽  
Patrick S. G. Chain

Public sequencing databases are invaluable resources to biological researchers, but assessing data veracity as well as the curation and maintenance of such large collections of data can be challenging. Genomes of eukaryotic organelles, such as chloroplasts and other plastids, are particularly susceptible to assembly errors and misrepresentations in these databases due to their close evolutionary relationships with bacteria, which may co-occur within the same environment, as can be the case when sequencing plants. Here, based on sequence similarities with bacterial genomes, we identified several suspicious chloroplast assemblies present in the National Institutes of Health (NIH) Reference Sequence (RefSeq) collection. Investigations into these chloroplast assemblies reveal examples of erroneous integration of bacterial sequences into chloroplast ribosomal RNA (rRNA) loci, often within the rRNA genes, presumably due to the high similarity between plastid and bacterial rRNAs. The bacterial lineages identified within the examined chloroplasts as the most likely source of contamination are either known associates of plants, or co-occur in the same environmental niches as the examined plants. Modifications to the methods used to process untargeted ‘raw’ shotgun sequencing data from whole genome sequencing efforts, such as the identification and removal of bacterial reads prior to plastome assembly, could eliminate similar errors in the future.


2021 ◽  
Author(s):  
Ying Yang ◽  
Yue Zhu ◽  
Wei Li ◽  
Yajing Ren ◽  
Shengxiong Huang ◽  
...  

A novel diazotrophic bacterium, designated CCTCC AB 2021101T, was isolated from fresh roots of kiwifruit. Cells of strain CCTCC AB 2021101T were Gram-negative, aerobic and rod-shaped, with motility provided by peritrichous flagella. The 16S rRNA analysis showed that strain CCTCC AB 2021101T belongs to the genus Azospirillum and is closely related to Azospirillum melinis (98.32%), Azospirillum oryzae (97.73%), Azospirillum lipoferum (96.98%), Azospirillum humicireducens (96.49%) and Azospirillum largimobile (96.01%), with lower sequence similarity (<96.0%) to all other species of the genus Azospirillum. Strain CCTCC AB 2021101T grew well at 35–40 °C and pH 6.0–7.0, and tolerated up to 3.0% (w/v) NaCl. The major saturated fatty acids were C14:0, C16:0 and C18:0. C18:1 ω7c and C16:0 3-OH were the major unsaturated and hydroxylated fatty acids, respectively. The G+C content was 67.8 mol%. Strain CCTCC AB 2021101T gave positive amplification for dinitrogen reductase (nifH gene). The highest nifH gene sequence similarities were obtained with Azospirillum brasilense AWB14T (95.9%), Azospirillum zeae Gr24T (95.56%), Azospirillum picis DSM 19922T (96.79%), Azospirillum lipoferum B22T (94.88%) and Azospirillum oryzae COC8T (94.88%). The nitrogenase activity of the strain was further confirmed by an acetylene-reduction assay, which recorded 81 nmol ethylene h⁻¹. Based on these data, strain CCTCC AB 2021101T is considered to represent a novel endophytic diazotrophic species in the genus Azospirillum, for which the name Azospirillum actinidiae sp. nov. is proposed. The type strain is CCTCC AB 2021101T.
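
The percentages quoted in this abstract are pairwise sequence similarities. The abstract does not say which tool or gap convention was used, so the following is only a minimal sketch of one common convention: percent identity over the ungapped columns of an existing pairwise alignment. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use '-' as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (one convention among several)
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 5 ungapped columns, 4 of them identical.
print(percent_identity("ACGT-A", "ACCTTA"))
```

Other tools count gapped columns in the denominator, which yields lower values for the same alignment, so figures from different pipelines are not directly comparable.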


Author(s):  
Jarosław Król ◽  
Aneta Nowakiewicz ◽  
Alicja Błaszków ◽  
Maria Brodala ◽  
Adrianna Domagała ◽  
...  

The aim of the present study was to characterize bacteria of the genus Streptococcus isolated from the oral cavity of the guinea pig and to assess the significance of these microorganisms as potential veterinary and human pathogens. Sixty-two streptococcal isolates recovered from 27 clinically healthy guinea pigs were examined genotypically by sequencing the 16S rRNA and groEL genes. Among these isolates, only 13 could be assigned to a previously described species (mainly Streptococcus parasanguinis, S. mitis and S. suis), and the majority of the remaining ones differed considerably from the streptococcal species known to date (16S rRNA and groEL sequence similarities were < 97% and < 87%, respectively). Based on 16S rRNA sequences, these unidentified isolates were divided into seven groups (clades), of which clades I through III comprised most of the isolates examined and also had the widest distribution among guinea pig colonies. Upon groEL gene sequence analysis, however, members of the three clades grouped together without forming such distinct clusters. The remaining clades distinguished by 16S rRNA sequencing could also be discerned by the second gene, and they contained only a few isolates, often restricted to one or a few animal colonies. The present work reveals that the guinea pig mouth is inhabited by a vast number of phylogenetically diverse, so far unrecognized populations of streptococci, most of them apparently being host-specific genomospecies. In contrast, S. parasanguinis and S. mitis are also common human commensals, and S. suis is a well-recognized zoonotic pathogen.


Biomolecules ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 1862
Author(s):  
Luciana Esposito ◽  
Nicole Balasco ◽  
Giovanni Smaldone ◽  
Rita Berisio ◽  
Alessia Ruggiero ◽  
...  

One of the most striking features of KCTD proteins is their involvement in apparently unrelated yet fundamental physio-pathological processes. Unfortunately, comprehensive structure–function relationships for this protein family have been hampered by the scarcity of the structural data available. This scenario is rapidly changing due to the release of the protein three-dimensional models predicted by AlphaFold (AF). Here, we exploited the structural information contained in the AF database to gain insights into the relationships among the members of the KCTD family with the aim of facilitating the definition of the structural and molecular basis of key roles that these proteins play in many biological processes. The most important finding that emerged from this investigation is the discovery that, in addition to the BTB domain, the vast majority of these proteins also share a structurally similar domain in the C-terminal region despite the absence of general sequence similarities detectable in this region. Using this domain as reference, we generated a novel and comprehensive structure-based pseudo-phylogenetic tree that unraveled previously undetected similarities among the protein family. In particular, we generated a new clustering of the KCTD proteins that will represent a solid ground for interpreting their many functions.


2021 ◽  
Author(s):  
Guang Zheng ◽  
Xiaohu Lu ◽  
Yarong Shi ◽  
Junnan Chen ◽  
Yajie Gao ◽  
...  

2021 ◽  
Vol 8 ◽  
Author(s):  
Kazuyoshi Ikeda ◽  
Takuo Doi ◽  
Masami Ikeda ◽  
Kentaro Tomii

Given the abundant computational resources and the huge amount of data of compound–protein interactions (CPIs), constructing appropriate datasets for learning and evaluating prediction models for CPIs is not always easy. For this study, we have developed a web server to facilitate the development and evaluation of prediction models by providing an appropriate dataset according to the task. Our web server provides an environment and dataset that aid model developers and evaluators in obtaining a suitable dataset for both proteins and compounds, in addition to attributes necessary for deep learning. With the web server interface, users can customize the CPI dataset derived from ChEMBL by setting positive and negative thresholds to be adjusted according to the user’s definitions. We have also implemented a function for graphic display of the distribution of activity values in the dataset as a histogram to set appropriate thresholds for positive and negative examples. These functions enable effective development and evaluation of models. Furthermore, users can prepare their task-specific datasets by selecting a set of target proteins based on various criteria such as Pfam families, ChEMBL’s classification, and sequence similarities. The accuracy and efficiency of in silico screening and drug design using machine learning including deep learning can therefore be improved by facilitating access to an appropriate dataset prepared using our web server (https://binds.lifematics.work/).
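
The positive and negative thresholds described above amount to a three-way split of activity values. The sketch below illustrates the idea only and is not the web server's implementation: it assumes activities on a scale where larger values mean stronger activity, and the names `label_activity` and `build_dataset`, the thresholds 6.0 and 5.0, and the compound and target identifiers are all hypothetical.

```python
from typing import List, Optional, Tuple


def label_activity(value: float, positive_min: float, negative_max: float) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None (ambiguous, to be dropped).

    Assumption: higher activity values indicate stronger interaction.
    """
    if negative_max > positive_min:
        raise ValueError("negative_max must not exceed positive_min")
    if value >= positive_min:
        return 1
    if value <= negative_max:
        return 0
    return None  # falls between the two thresholds


def build_dataset(
    pairs: List[Tuple[str, str, float]],
    positive_min: float,
    negative_max: float,
) -> List[Tuple[str, str, int]]:
    """Keep only (compound, protein) pairs that receive an unambiguous label."""
    labelled = []
    for compound, protein, value in pairs:
        label = label_activity(value, positive_min, negative_max)
        if label is not None:
            labelled.append((compound, protein, label))
    return labelled


# Hypothetical records: (compound id, protein id, activity value).
records = [("cmpd1", "protA", 7.2), ("cmpd2", "protA", 5.5), ("cmpd3", "protB", 4.1)]
print(build_dataset(records, positive_min=6.0, negative_max=5.0))
```

Leaving a gap between the two thresholds discards borderline measurements, which is one reason a histogram of activity values is useful when choosing them.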


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0260360
Author(s):  
Ehsan Ahmadi ◽  
Mohammad Reza Zabihi ◽  
Ramin Hosseinzadeh ◽  
Leila Mohamed Khosroshahi ◽  
Farshid Noorbakhsh

The recent emergence of SARS-CoV-2 and the associated COVID-19 pandemic have posed a great challenge for the scientific community. In this study, we performed bioinformatic analyses on SARS-CoV-2 protein sequences to uncover potential molecular similarities between this newly emerged pathogen and non-coronavirus ssRNA viruses. Comparing the proteins of SARS-CoV-2 with those of non-coronavirus positive- and negative-strand ssRNA viruses revealed multiple sequence similarities, including similarities between RNA-dependent RNA polymerases and helicases (two highly conserved proteins). We also observed similarities between the SARS-CoV-2 surface (i.e. spike) protein and paramyxovirus fusion proteins. This similarity was restricted to a segment of the spike protein S2 subunit that is involved in cell fusion. We next analyzed spike proteins from SARS-CoV-2 “variants of concern” (VOCs) and “variants of interest” (VOIs) and found that some of these variants show considerably higher spike-fusion similarity with paramyxoviruses. The spike-fusion similarity was also observed for some pathogenic coronaviruses other than SARS-CoV-2. Epitope analysis using experimentally verified data deposited in the Immune Epitope Database (IEDB) revealed that several B cell epitopes, as well as T cell and MHC binding epitopes, map within the spike-fusion similarity region. These data indicate that there might be a degree of convergent evolution between SARS-CoV-2 and paramyxovirus surface proteins, which could be of pathogenic and immunological importance.


2021 ◽  
Vol 948 (1) ◽  
pp. 012016
Author(s):  
S Akram ◽  
N I Ab Ghani ◽  
S Khamis ◽  
S Zulkifly

Flavonoids are secondary metabolites. To date, 2000 naturally occurring flavonoids are known to be present in plants. These diverse groups of antioxidants are abundant in the rhizomes and leaves of Zingiberaceae species. Many genes are involved in flavonoid production; the most studied is the chalcone synthase (CHS) gene. However, no study has examined the CHS gene in four endemic and pharmacologically known Zingiberaceae species: Alpinia mutica, Alpinia rafflesiana, Hornstedtia leonurus and Scaphochlamys kunstleri. Furthermore, A. rafflesiana and S. kunstleri are threatened species. Thus, this study aimed to develop new CHS primers for these selected species. A total of 43 CHS gene sequences belonging to Zingiberaceae and Costaceae were retrieved from the NCBI database. BLASTN was then used to check the sequence similarities of the retrieved CHS sequences to these four studied species, other Zingiberaceae and Costaceae. Subsequently, all redundant sequences were excluded and 15 sequences were retained as the final dataset. These 15 sequences were used to design genic primers with Primer3 software, and the primers were analysed in silico using the OligoAnalyzer™ Tool. This study successfully designed 12 new CHS genic primers. All the primers can be used in future studies to determine the presence and expression of the CHS gene in these four species.
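
The abstract does not define how redundancy among the retrieved sequences was judged. As a minimal sketch under the simplest possible assumption (redundant means an exact, case-insensitive duplicate), the filtering step could look as follows; `drop_exact_duplicates` and the toy records are hypothetical, and a real workflow might instead cluster by a similarity cutoff.

```python
from typing import List, Tuple


def drop_exact_duplicates(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record for each distinct sequence, preserving input order.

    Each record is (identifier, sequence). Comparison ignores letter case.
    """
    seen = set()
    unique = []
    for identifier, sequence in records:
        key = sequence.upper()
        if key in seen:
            continue  # an identical sequence was already kept
        seen.add(key)
        unique.append((identifier, sequence))
    return unique


# Toy records: the second is a case variant of the first and is dropped.
toy = [("seq1", "ATGGCC"), ("seq2", "atggcc"), ("seq3", "ATGGCA")]
print(drop_exact_duplicates(toy))
```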


2021 ◽  
Author(s):  
Mrinalini Mrinalini ◽  
Nalini Puniamoorthy

Background: Oxford Nanopore Technologies (ONT) long-read transcriptomes offer many advantages, including long reads (>10 kbp), end-to-end transcripts, structural variants, and isoform-level resolution of genes and expression. However, uptake of ONT transcriptomics is still low, largely due to high error rates (2 to 13%) and reliance on reference databases that are unavailable for many non-model species. Additionally, bioinformatics tools and pipelines for de novo ONT transcriptomics are still in early stages of development. Results: Here, we use de novo ONT GridION transcriptomics to discover novel genes from the male accessory glands (AG) of a widespread, non-model dung fly, Sepsis punctum. Insect AGs are of particular interest because they are hotspots for rapid evolution of novel reproductive genes, and they synthesize seminal fluid proteins that lack homology to any other known proteins. We implement a completely de novo ONT GridION transcriptome pipeline, incorporating quality-filtering and rigorous error-correction procedures, to characterize this novel gene set and to quantify its expression. Specifically, we compare these ONT genes and their expression against de novo Illumina HiSeq transcriptome data. We find 40 high-quality and high-confidence ONT genes that cross-verify against Illumina genes, 26 of which are novel and specific to S. punctum. Read-count-based expression quantification in ONT samples is highly congruent with Illumina’s transcripts per million (TPM), both in overall pattern and within functional categories. Novel genes account for an average of 81% of total gene expression, underscoring their functional importance in S. punctum AGs. Eighty percent of these genes are secretory in nature and are responsible for 74% of total gene expression. Notably, median sequence similarities of ONT nucleotide and protein sequences match within-Illumina sequence similarities, indicating that our de novo ONT transcriptome pipeline successfully mitigated sequencing errors. Conclusions: This is the first study to adapt ONT transcriptomics for completely de novo characterization of novel genes in animals. Our study demonstrates that ONT long reads, constituting a quarter of the number of bases sequenced at less than a third the cost of Illumina reads, can be a resource-friendly and cost-effective solution for end-to-end sequencing of unknown genes even in the absence of a reference database.
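
The abstract compares ONT read counts with TPM values without spelling out the normalisation. The sketch below shows the standard TPM definition (length-normalised counts rescaled to sum to one million), assuming per-transcript read counts and transcript lengths are available; the function name `tpm` and the toy numbers are hypothetical and are not taken from the study's pipeline.

```python
from typing import List, Sequence


def tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Transcripts per million from read counts and transcript lengths (bp)."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("transcript lengths must be positive")

    # Reads per base for each transcript.
    rates = [count / length for count, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)  # no reads at all

    # Rescale so that all values sum to one million.
    return [rate / total * 1_000_000 for rate in rates]


# Toy example: the first two transcripts have the same reads-per-base rate.
values = tpm([10, 20, 0], [1000, 2000, 500])
print(values)
```

Because TPM divides by transcript length, a longer transcript with proportionally more reads receives the same value as a shorter one, whereas raw read counts do not correct for length.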

