MINTIA: a metagenomic INserT integrated assembly and annotation tool

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11885
Author(s):  
Philippe Bardou ◽  
Sandrine Laguerre ◽  
Sarah Maman Haddad ◽  
Sabrina Legoueix Rodriguez ◽  
Elisabeth Laville ◽  
...  

The earth harbors trillions of bacterial species, adapted to very diverse ecosystems through the acquisition of specific metabolic functions. Most of the genes responsible for these functions belong to uncultured bacteria and have yet to be discovered. Functional metagenomics based on activity screening is a classical way to retrieve these genes from microbiomes. This approach relies on inserting large metagenomic DNA fragments into a vector and transforming a host to express the heterologous genes. Metagenomic libraries are then screened for activities of interest, and the metagenomic DNA inserts of active clones are extracted, sequenced and analyzed to identify the genes responsible for the detected activity. Hundreds of metagenomic sequences found using this strategy have already been published in public databases. Here we present the MINTIA software package, which enables biologists to easily generate and analyze large sets of metagenomic sequences retrieved after activity-based screening. It filters reads, performs assembly, removes cloning vector sequences, annotates open reading frames, and generates user-friendly reports as well as files ready for submission to international sequence repositories. The software package can be downloaded from https://github.com/Bios4Biol/MINTIA.
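The abstract names vector removal as a pipeline step but does not describe how MINTIA performs it. As a rough illustration of what removing the cloning vector from a fully assembled clone involves, the Python sketch below assumes the contig is a linearized circle containing one exact, full-length copy of the vector, so the insert is the sequence after the vector followed by the sequence before it. The function names are hypothetical, and real tools align against the vector and tolerate mismatches and partial matches rather than relying on exact string search.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extract_insert(contig: str, vector: str) -> Optional[str]:
    """Return the insert from a circular clone contig, or None if no vector is found.

    Hypothetical helper. Assumes the contig holds exactly one full, exact copy of
    the vector. Because the clone is circular, the insert is the sequence after
    the vector followed by the sequence before it. If the vector only matches the
    opposite strand, the contig is reverse-complemented first, so the insert is
    reported in the vector's orientation.
    """
    contig, vector = contig.upper(), vector.upper()
    pos = contig.find(vector)
    if pos == -1:
        contig = reverse_complement(contig)
        pos = contig.find(vector)
        if pos == -1:
            return None
    return contig[pos + len(vector):] + contig[:pos]


# Toy example: insert halves "TTGCG" and "GTAC" flank a 6-nt stand-in "vector".
print(extract_insert("TTGCGAAACCCGTAC", "AAACCC"))  # GTACTTGCG
```

Rotating around the vector, rather than simply deleting it, keeps the insert contiguous when the assembler happened to start the circular contig inside the insert.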

2020 ◽  
Vol 17 (3) ◽  
pp. 537-544
Author(s):  
Nguyen Thi Thao ◽  
Do Thi Huyen ◽  
Truong Nam Hai

In lower termites such as Coptotermes gestroi, cellulose and hemicellulose are hydrolysed by cellulases and hemicellulases secreted by bacteria, archaea, protozoa and fungi in the hindgut, with the majority of these enzymes contributed by protozoa. From metagenomic DNA data (125,423 open reading frames, ORFs) of free-living bacteria in the gut of C. gestroi collected in southern Vietnam, 100,340 ORFs were classified with MEGA 4.0 software into 1,368 species, 628 genera, 217 families, 97 orders, 41 classes and 22 phyla (Do et al., 2014). Among these, 2,131 ORFs (2.12%), belonging to 24 bacterial species (1.75% of the bacterial species), 11 families, 9 orders, 8 classes and 5 phyla, were predicted to be able to produce cellulases; 679 ORFs, belonging to 18 bacterial species, 8 families, 6 orders, 5 classes and 4 phyla, were predicted to be able to produce hemicellulases. Most cellulase producers were Firmicutes (15 of 24 species), concentrated in the class Clostridia, order Clostridiales. The most abundant cellulase producer was Pseudomonas fluorescens (1,258 ORFs), of the family Pseudomonadaceae. Of the 18 hemicellulase producers, the most abundant species was Clostridium thermocellum (113 ORFs) of the phylum Firmicutes, followed by 3 species of the phylum Bacteroidetes. The species predicted to produce both cellulases and hemicellulases were C. thermocellum, Ruminococcus flavefaciens and Bacillus subtilis. Our study provides data on the composition of the cellulose- and hemicellulose-degrading bacteria in the gut of C. gestroi.
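The two percentages above can be reproduced by dividing by the 100,340 classified ORFs and the 1,368 classified species, respectively. These denominators are inferred because they reproduce the reported values; the abstract does not state them explicitly. A short Python check:

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts taken from the abstract; the choice of denominators is an inference.
classified_orfs = 100_340
cellulase_orfs = 2_131
classified_species = 1_368
cellulase_species = 24

print(percent(cellulase_orfs, classified_orfs))        # 2.12
print(percent(cellulase_species, classified_species))  # 1.75
```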


2020 ◽  
Author(s):  
Sebastien A. Choteau ◽  
Audrey Wagner ◽  
Philippe Pierre ◽  
Lionel Spinelli ◽  
Christine Brun

ABSTRACT The development of high-throughput technologies has revealed the existence of non-canonical short open reading frames (sORFs) on most eukaryotic RNAs. They are ubiquitous genetic elements, highly conserved across species and suspected to be involved in numerous cellular processes. MetamORF (http://metamorf.hb.univ-amu.fr/) aims to provide a repository of unique sORFs identified in the human and mouse genomes by both experimental and computational approaches. By gathering publicly available sORF data, normalizing it and summarizing redundant information, we identified a total of 1,162,675 unique sORFs. Although ORFs are commonly characterized as short, upstream or downstream, there is currently no clear consensus on the definitions of these categories; the data have therefore been reprocessed using a normalized nomenclature. MetamORF enables new analyses at the locus, gene, transcript and ORF levels, which should make it possible to address new questions about sORF functions in the future. The repository is available through a user-friendly web interface that allows easy browsing, visualization, filtering on multiple criteria and export. sORFs can be searched starting from a gene, a transcript or an ORF ID, or by looking in a genomic region. The database content has also been made available through track hubs at the UCSC Genome Browser.
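The abstract says redundant sORF records from several sources were summarized into unique sORFs, but does not give the matching rule. The Python sketch below assumes, purely for illustration, that two records describe the same sORF when they share chromosome (after dropping a "chr" prefix), strand, start and end; the record schema and source names are hypothetical, and multi-exon sORFs are not handled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SORFRecord:
    """One sORF entry as reported by a single source (hypothetical schema)."""
    source: str
    chromosome: str
    strand: str
    start: int  # 1-based genomic start
    end: int    # 1-based genomic end, inclusive


def merge_sorfs(records):
    """Collapse records sharing the same normalized location into unique sORFs.

    Returns a dict mapping (chromosome, strand, start, end) to the sorted list
    of sources that reported that sORF.
    """
    merged = defaultdict(set)
    for rec in records:
        # Normalize chromosome names so "chr1" and "1" refer to the same locus.
        chrom = rec.chromosome.removeprefix("chr")
        merged[(chrom, rec.strand, rec.start, rec.end)].add(rec.source)
    return {key: sorted(sources) for key, sources in merged.items()}


records = [
    SORFRecord("sourceA", "chr1", "+", 100, 180),
    SORFRecord("sourceB", "1", "+", 100, 180),
    SORFRecord("sourceA", "chr2", "-", 500, 560),
]
unique = merge_sorfs(records)
print(len(unique))  # 2
```

Keeping the list of contributing sources per merged entry preserves provenance, which is the kind of summarized information a repository can then display per sORF.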


2020 ◽  
Vol 49 (D1) ◽  
pp. D236-D242 ◽  
Author(s):  
Wendi Huang ◽  
Yunchao Ling ◽  
Sirui Zhang ◽  
Qiguang Xia ◽  
Ruifang Cao ◽  
...  

Abstract TransCirc (https://www.biosino.org/transcirc/) is a specialized database that provides comprehensive evidence supporting the translation potential of circular RNAs (circRNAs). The database was generated by integrating various direct and indirect lines of evidence to predict the coding potential of each human circRNA and its putative translation products. Seven types of evidence for circRNA translation are included: (i) ribosome/polysome binding evidence supporting ribosome occupancy on circRNAs; (ii) experimentally mapped translation initiation sites on circRNAs; (iii) internal ribosome entry sites on circRNAs; (iv) published N6-methyladenosine (m6A) modification data for circRNAs, a modification that promotes translation initiation; (v) lengths of circRNA-specific open reading frames; (vi) sequence composition scores from a machine learning prediction of all potential open reading frames; and (vii) mass spectrometry data that directly support circRNA-encoded peptides spanning back-splice junctions. TransCirc provides a user-friendly search/browse interface and independent lines of evidence to predict how likely a circRNA is to be translated. In addition, several flexible tools have been developed to aid data retrieval and analysis. TransCirc can serve as an important resource for investigating the translation capacity of circRNAs and potential circRNA-encoded peptides, and it can be expanded to include new evidence or additional species in the future.
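Evidence types (v) and (vii) both depend on open reading frames that can run across the back-splice junction, which has no counterpart in the linear transcript. The Python sketch below is not TransCirc's method; it only illustrates one simple way to scan a circRNA for such ORFs, assuming AUG-only initiation, the standard genetic code, and a sequence written starting just after the junction. The function names and the `min_codons` default are hypothetical.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def codon_at(seq: str, pos: int) -> str:
    """Read three nucleotides starting at pos, wrapping around the circle."""
    n = len(seq)
    return "".join(seq[(pos + k) % n] for k in range(3))


def circular_orfs(circ_seq: str, min_codons: int = 20) -> Iterator[Tuple[int, int, bool]]:
    """Yield (start, length_nt, crosses_junction) for AUG-initiated ORFs on a circRNA.

    circ_seq is written 5'->3' starting right after the back-splice junction,
    so the junction sits between the last and the first nucleotide. An ORF that
    runs past the end wraps around and therefore crosses the junction.
    length_nt includes the stop codon; min_codons counts sense codons only.
    ORFs without a stop codon within one turn of the circle are skipped.
    """
    seq = circ_seq.upper().replace("T", "U")
    n = len(seq)
    for start in range(n):
        if codon_at(seq, start) != "AUG":
            continue
        length = 0
        while length + 3 <= n:  # never read more than one full turn
            codon = codon_at(seq, start + length)
            length += 3
            if codon in STOP_CODONS:
                if length // 3 - 1 >= min_codons:
                    yield start, length, start + length > n
                break


# Toy circRNA: the ORF AUG-CCC-UAA starts at index 4 and its stop codon
# straddles the back-splice junction.
print(list(circular_orfs("AAGGAUGCCCU", min_codons=1)))  # [(4, 9, True)]
```

ORFs flagged with `crosses_junction=True` are the ones whose peptides could only come from the circular form, which is why junction-spanning peptides in mass spectrometry data count as direct evidence.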


2001 ◽  
Vol 183 (6) ◽  
pp. 1909-1920 ◽  
Author(s):  
Jesús Mercado-Blanco ◽  
Koen M. G. M. van der Drift ◽  
Per E. Olsson ◽  
Jane E. Thomas-Oates ◽  
Leendert C. van Loon ◽  
...  

ABSTRACT Mutants of Pseudomonas fluorescens WCS374 defective in biosynthesis of the fluorescent siderophore pseudobactin still display siderophore activity, indicating the production of a second siderophore. A recombinant cosmid clone (pMB374-07) from a WCS374 gene library, harboring loci necessary for the biosynthesis of salicylic acid (SA) and of this second siderophore, pseudomonine, was isolated. The salicylate biosynthesis region of WCS374 was localized in a 5-kb EcoRI fragment of pMB374-07. The SA and pseudomonine biosynthesis region was identified by transfer of cosmid pMB374-07 to a pseudobactin-deficient strain of P. putida. Sequence analysis of the 5-kb subclone revealed the presence of four open reading frames (ORFs). The products of two ORFs (pmsC and pmsB) showed homology with chorismate-utilizing enzymes; a third ORF (pmsE) encoded a protein with strong similarity to enzymes involved in siderophore biosynthesis in other bacterial species. The region also contained a putative histidine decarboxylase gene (pmsA). A putative promoter region and two predicted iron boxes were localized upstream of pmsC. We determined by reverse transcriptase-mediated PCR that the pmsCEAB genes are cotranscribed and that their expression is iron regulated. In vivo expression of the SA genes was achieved in P. putida and Escherichia coli cells. In E. coli, deletions affecting the first ORF (pmsC) diminished SA production, whereas deletion of pmsB abolished it completely. The pmsB gene induced low levels of SA production in E. coli when expressed under the control of the lacZ promoter. Several lines of evidence indicate that SA and pseudomonine biosynthesis are related. Moreover, we isolated a Tn5 mutant (374-05) that is simultaneously impaired in SA and pseudomonine production.


2014 ◽  
Vol 2014 ◽  
pp. 1-7
Author(s):  
Vicki Ann Luna ◽  
Kimmy Nguyen ◽  
Damian H. Gilling

The distribution of pBC210, a virulent plasmid of B. cereus that carries several B. anthracis genes and has been implicated in lethal anthrax-like pulmonary disease, is unknown. We screened our collection of 103 B. cereus isolates and 256 soil samples using a quantitative PCR (qPCR) assay targeting three open reading frames putatively unique to pBC210. When tested with DNA from 2 B. cereus strains carrying pBC210 and from 64 Gram-positive and 55 Gram-negative bacterial species, the assay had 100% sensitivity and specificity. None of the B. cereus isolates yielded amplicons, but DNA extracted from five soils collected in Florida gave positive results for all three pBC210 target sequences. While the screening confirms that pBC210 is uncommon in B. cereus, this study is the first to report its presence in Florida soils. It improves our knowledge of the distribution of pBC210 in soils and, of public health importance, of the potential threat posed by B. cereus isolates carrying this toxin-encoding plasmid. We demonstrated that pBC210 sequences occur over a larger geographical area than previously thought, and that more B. cereus isolates carrying the virulent plasmid may be found in the future.
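The reported 100% sensitivity and specificity follow from the standard definitions once the validation panel is tallied. The Python sketch below assumes that each pBC210-carrying strain counts as one true positive and each of the 119 non-target species (64 Gram-positive plus 55 Gram-negative) as one true negative; the abstract does not say how replicates were counted.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true targets the assay detects: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-targets the assay correctly rejects: TN / (TN + FP)."""
    return tn / (tn + fp)


# Assumed tallies: both pBC210-carrying strains detected, and no false
# positives among the 64 Gram-positive and 55 Gram-negative species.
tp, fn = 2, 0
tn, fp = 64 + 55, 0
print(f"sensitivity = {sensitivity(tp, fn):.0%}")  # sensitivity = 100%
print(f"specificity = {specificity(tn, fp):.0%}")  # specificity = 100%
```

Under these assumptions the sensitivity estimate rests on only two positive controls, whereas specificity is estimated over 119 negatives.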


2009 ◽  
Vol 191 (7) ◽  
pp. 2257-2265 ◽  
Author(s):  
Mark R. Davies ◽  
Josephine Shera ◽  
Gary H. Van Domselaar ◽  
Kadaba S. Sriprakash ◽  
David J. McMillan

ABSTRACT Lateral gene transfer is a significant contributor to the ongoing evolution of many bacterial pathogens, including β-hemolytic streptococci. Here we provide the first characterization of a novel integrative conjugative element (ICE), ICESde3396, from Streptococcus dysgalactiae subsp. equisimilis (group G streptococcus [GGS]), a bacterium commonly found in the throat and on the skin of humans. ICESde3396 is 64 kb in size and encodes 66 putative open reading frames. ICESde3396 shares 38 open reading frames with ICESa2603, a putative ICE from Streptococcus agalactiae (group B streptococcus [GBS]). In addition to genes involved in conjugal processes, ICESde3396 also carries genes predicted to be involved in virulence and in resistance to various metals. A major feature distinguishing ICESde3396 from ICESa2603 is an 18-kb internal recombinogenic region containing four unique gene clusters, which appear to have been acquired from streptococcal and nonstreptococcal bacterial species. These clusters include two cadmium resistance operons, an arsenic resistance operon, and genes with orthologues in a group A streptococcus (GAS) prophage. Streptococci that naturally harbor ICESde3396 have increased resistance to cadmium and arsenate, indicating that the genes in the 18-kb recombinogenic region are functional. By marking ICESde3396 with a kanamycin resistance gene, we demonstrate that the ICE is transferable to other GGS isolates as well as to GBS and GAS. To investigate the presence of the ICE in clinical streptococcal isolates, we screened 69 isolates (30 GGS, 19 GBS, and 20 GAS isolates) for three separate regions of ICESde3396. Eleven isolates possessed all three regions, suggesting that they harbored ICESde3396-like elements. Another four isolates possessed ICESa2603-like elements. We propose that ICESde3396 is a mobile genetic element that is capable of acquiring DNA from multiple bacterial sources and is a vehicle for disseminating this DNA through the wider β-hemolytic streptococcal population.


2021 ◽  
Vol 75 (1) ◽  
pp. 649-672
Author(s):  
Eduardo A. Groisman ◽  
Carissa Chan

Mg²⁺ is the most abundant divalent cation in living cells. It is essential for charge neutralization, macromolecule stabilization, and ribosome assembly and activity, and it serves as a cofactor for enzymatic reactions. When experiencing low cytoplasmic Mg²⁺, bacteria adopt two main strategies: they increase the abundance and activity of Mg²⁺ importers, and they decrease the abundance of the Mg²⁺-chelating molecules ATP and rRNA. These changes reduce regulated proteolysis by ATP-dependent proteases and protein synthesis in a systemic fashion. In many bacterial species, the transcriptional regulator PhoP controls the expression of proteins mediating these changes. The 5′ leader regions of some mRNAs respond to low cytoplasmic Mg²⁺, or to disruptions in the translation of open reading frames within the leader regions, by promoting expression of the associated coding regions, which specify proteins mediating survival when the cytoplasmic Mg²⁺ concentration is low. Microbial species often use similar adaptation strategies to cope with low cytoplasmic Mg²⁺ despite relying on different genes to do so.


2018 ◽  
Author(s):  
Erik JJ Eppenhof ◽  
Lourdes Peña-Castillo

Bacterial small non-coding RNAs (sRNAs) are involved in the control of several cellular processes. Hundreds of putative sRNAs have been identified in many bacterial species through RNA sequencing. The existence of putative sRNAs is usually validated by Northern blot analysis. However, the large number of novel putative sRNAs reported in the literature makes it impractical to validate each of them in the wet lab. In this work, we applied five machine learning approaches to construct twenty models to discriminate bona fide sRNAs from random genomic sequences in five bacterial species. Sequences were represented using seven features, including the free energy of their predicted secondary structure, their distances to the closest predicted promoter site and Rho-independent terminator, and their distance to the closest open reading frame (ORF). To calculate these features automatically, we developed an sRNA Characterization Pipeline (sRNACharP). All seven features used in the classification task contributed positively to the performance of the predictive models. The five best-performing models obtained a median precision of 100% at 10% recall and of 60% at 40% recall across all five bacterial species. Our results suggest that even though there is limited sRNA sequence conservation across bacterial species, there are intrinsic features of sRNAs that are conserved across taxa. We show that these features are exploited by machine learning approaches to learn a species-independent model to prioritize bona fide bacterial sRNAs.
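Results such as "100% precision at 10% recall" are read off a ranked list of predictions. The abstract does not spell out the evaluation protocol, so the Python sketch below uses one common definition, assumed here: rank sequences by model score and report precision at the first rank where recall reaches the target. The function name and toy data are hypothetical; a median across species would be taken over per-species values computed this way.

```python
from typing import Sequence


def precision_at_recall(scores: Sequence[float], labels: Sequence[int],
                        target_recall: float) -> float:
    """Precision at the first rank where recall reaches target_recall.

    scores: model scores, higher meaning "more likely a bona fide sRNA"
    labels: 1 for a known sRNA, 0 for a random genomic sequence
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= target_recall:
            return true_pos / rank
    # Recall is exactly 1.0 at the last rank, so the loop always returns.
    raise AssertionError("unreachable")


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 1]
print(precision_at_recall(scores, labels, 0.3))  # 1.0
print(precision_at_recall(scores, labels, 1.0))  # 0.6
```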


2021 ◽  
Vol 19 (3) ◽  
pp. 519-528
Author(s):  
Dao Trong Khoa ◽  
Do Thi Huyen ◽  
Truong Nam Hai

Based on the CAZy database, endo-1,4-beta-xylanases (xylanases) are classified into 9 glycoside hydrolase (GH) families: GH5, 8, 10, 11, 30, 43, 51, 98 and 141. The probe sequences representing these enzymes were constructed from sequences published in experimental studies that demonstrated xylan-degrading activity. In online databases, we found one sequence belonging to the GH5 family, 6 sequences belonging to the GH8 family and 5 sequences belonging to the GH30 family that exhibited xylanase activity. Specific probes for the GH8 and GH30 xylanase families were therefore designed, with lengths of 351 and 425 amino acids, respectively. For the GH8 probe, the reference values were a maximum score greater than 168, a minimum coverage of 84% and a minimum similarity of 36%; for the GH30 probe, they were a maximum score greater than 316, coverage greater than 98% and similarity greater than 41%. Using these probes together with the probes for the GH10 and GH11 families, we found 41 xylanase-encoding sequences in the metagenomic DNA data of bacteria from the rumen of Vietnamese goats. Of these 41 sequences, 19 were identical to BGI's annotation results based on the KEGG database, whereas 16 had not been annotated by BGI. In total, 28 of the 41 sequences were complete open reading frames, whose predicted tertiary structures were highly similar to published xylanase structures.
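The GH8 and GH30 reference values amount to a per-family filter on alignment hits. The Python sketch below applies them, assuming each hit has already been parsed (for example from a protein similarity search) into a hypothetical `ProbeHit` record with coverage and similarity given as percentages. Inclusive versus strict comparisons follow the abstract's wording, reading "lowest coverage was 84%" as an inclusive minimum; whether coverage is measured on the probe or the query is not stated and is left as an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProbeHit:
    """One alignment of a metagenomic ORF against a probe (hypothetical fields)."""
    orf_id: str
    probe_family: str  # "GH8" or "GH30"
    max_score: float
    coverage: float    # percent
    similarity: float  # percent


def passes_probe_thresholds(hit: ProbeHit) -> bool:
    """Apply the reference values reported for the GH8 and GH30 probes."""
    if hit.probe_family == "GH8":
        return hit.max_score > 168 and hit.coverage >= 84 and hit.similarity >= 36
    if hit.probe_family == "GH30":
        return hit.max_score > 316 and hit.coverage > 98 and hit.similarity > 41
    raise ValueError(f"no thresholds defined for family {hit.probe_family!r}")


hits = [
    ProbeHit("orf_a", "GH8", 210.0, 91.0, 44.0),
    ProbeHit("orf_b", "GH30", 400.0, 98.0, 55.0),  # fails: coverage must exceed 98%
]
print([h.orf_id for h in hits if passes_probe_thresholds(h)])  # ['orf_a']
```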

