sPepFinder expedites genome-wide identification of small proteins in bacteria

Mapping Intimacies ◽

10.1101/2020.05.05.079178 ◽

2020 ◽

Author(s):

Lei Li ◽

Yanjie Chao

Keyword(s):

De Novo ◽

Bacterial Species ◽

Computational Prediction ◽

Ribosome Profiling ◽

Support Vector ◽

Initiation Rate ◽

E Coli ◽

Small Proteins ◽

Genome Wide ◽

A Genome

ABSTRACTSmall proteins shorter than 50 amino acids have been long overlooked. A number of small proteins have been identified in several model bacteria using experimental approaches and assigned important functions in diverse cellular processes. The recent development of ribosome profiling technologies has allowed a genome-wide identification of small proteins and small ORFs (smORFs), but our incomplete understanding of small proteins hinders de novo computational prediction of smORFs in non-model bacterial species. Here, we have identified several sequence features for smORFs by a systematic analysis of all the known small proteins in E. coli, among which the translation initiation rate is the strongest determinant. By integrating these features into a support vector machine learning model, we have developed a novel sPepFinder algorithm that can predict conserved smORFs in bacterial genomes with a high accuracy of 92.8%. De novo prediction in E. coli has revealed several novel smORFs with evidence of translation supported by ribosome profiling. Further application of sPepFinder in 549 bacterial species has led to the identification of > 100,000 novel smORFs, many of which are conserved at the amino acid and nucleotide levels under purifying selection. Overall, we have established sPepFinder as a valuable tool to identify novel smORFs in both model and non-model bacterial organisms, and provided a large resource of small proteins for functional characterizations.

Download Full-text

Translational initiation in E. coli occurs at the correct sites genome-wide in the absence of mRNA-rRNA base-pairing

eLife ◽

10.7554/elife.55002 ◽

2020 ◽

Vol 9 ◽

Cited By ~ 14

Author(s):

Kazuki Saito ◽

Rachel Green ◽

Allen R Buskirk

Keyword(s):

Ribosome Profiling ◽

Base Pairing ◽

Contributing Factors ◽

Translational Initiation ◽

E Coli ◽

Genome Wide ◽

A Genome ◽

Genome Wide Study

Shine-Dalgarno (SD) motifs are thought to play an important role in translational initiation in bacteria. Paradoxically, ribosome profiling studies in E. coli show no correlation between the strength of an mRNA’s SD motif and how efficiently it is translated. Performing profiling on ribosomes with altered anti-Shine-Dalgarno sequences, we reveal a genome-wide correlation between SD strength and ribosome occupancy that was previously masked by other contributing factors. Using the antibiotic retapamulin to trap initiation complexes at start codons, we find that the mutant ribosomes select start sites correctly, arguing that start sites are hard-wired for initiation through the action of other mRNA features. We show that A-rich sequences upstream of start codons promote initiation. Taken together, our genome-wide study reveals that SD motifs are not necessary for ribosomes to determine where initiation occurs, though they do affect how efficiently initiation occurs.

Download Full-text

Discovery of novel community-relevant small proteins in a simplified human intestinal microbiome

Microbiome ◽

10.1186/s40168-020-00981-z ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Hannes Petruschke ◽

Christian Schori ◽

Sebastian Canzler ◽

Sarah Riesbeck ◽

Anja Poehlein ◽

...

Keyword(s):

Microbial Communities ◽

Intestinal Microbiota ◽

De Novo ◽

Bacterial Species ◽

Intestinal Microbiome ◽

Single Strain ◽

Small Proteins ◽

Human Intestinal Microbiota ◽

Wide Range

Abstract Background The intestinal microbiota plays a crucial role in protecting the host from pathogenic microbes, modulating immunity and regulating metabolic processes. We studied the simplified human intestinal microbiota (SIHUMIx) consisting of eight bacterial species with a particular focus on the discovery of novel small proteins with less than 100 amino acids (= sProteins), some of which may contribute to shape the simplified human intestinal microbiota. Although sProteins carry out a wide range of important functions, they are still often missed in genome annotations, and little is known about their structure and function in individual microbes and especially in microbial communities. Results We created a multi-species integrated proteogenomics search database (iPtgxDB) to enable a comprehensive identification of novel sProteins. Six of the eight SIHUMIx species, for which no complete genomes were available, were sequenced and de novo assembled. Several proteomics approaches including two earlier optimized sProtein enrichment strategies were applied to specifically increase the chances for novel sProtein discovery. The search of tandem mass spectrometry (MS/MS) data against the multi-species iPtgxDB enabled the identification of 31 novel sProteins, of which the expression of 30 was supported by metatranscriptomics data. Using synthetic peptides, we were able to validate the expression of 25 novel sProteins. The comparison of sProtein expression in each single strain versus a multi-species community cultivation showed that six of these sProteins were only identified in the SIHUMIx community indicating a potentially important role of sProteins in the organization of microbial communities. Two of these novel sProteins have a potential antimicrobial function. Metabolic modelling revealed that a third sProtein is located in a genomic region encoding several enzymes relevant for the community metabolism within SIHUMIx. Conclusions We outline an integrated experimental and bioinformatics workflow for the discovery of novel sProteins in a simplified intestinal model system that can be generically applied to other microbial communities. The further analysis of novel sProteins uniquely expressed in the SIHUMIx multi-species community is expected to enable new insights into the role of sProteins on the functionality of bacterial communities such as those of the human intestinal tract.

Download Full-text

Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph

Algorithms for Molecular Biology ◽

10.1186/s13015-021-00182-9 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Kingshuk Mukherjee ◽

Massimiliano Rossi ◽

Leena Salmela ◽

Christina Boucher

Keyword(s):

Single Molecule ◽

De Bruijn Graph ◽

Anabas Testudineus ◽

E Coli ◽

Genome Wide ◽

A Genome ◽

De Bruijn ◽

Optical Maps ◽

Definition Of ◽

Numeric Representation

AbstractGenome wide optical maps are high resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. There exists only one publicly-available non-proprietary method for assembly and one proprietary software that is available via an executable. Furthermore, the publicly-available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm, and therefore, is unable to scale for relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics’ Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas Testudineus). Our method was able to successfully run on all three genomes. The method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) only successfully ran on E. coli. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper is written in C++ and is publicly available under GNU General Public License at https://github.com/kingufl/Rmapper.

Download Full-text

A genome-wide study of de novo deletions identifies a candidate locus for non-syndromic isolated cleft lip/palate risk

BMC Genetics ◽

10.1186/1471-2156-15-24 ◽

2014 ◽

Vol 15 (1) ◽

pp. 24 ◽

Cited By ~ 18

Author(s):

Samuel G Younkin ◽

Robert B Scharpf ◽

Holger Schwender ◽

Margaret M Parker ◽

Alan F Scott ◽

...

Keyword(s):

Cleft Lip ◽

De Novo ◽

Candidate Locus ◽

Genome Wide ◽

A Genome ◽

Cleft Lip Palate ◽

Genome Wide Study

Download Full-text

Elucidating acetate tolerance in E. coli using a genome-wide approach

Metabolic Engineering ◽

10.1016/j.ymben.2010.12.001 ◽

2011 ◽

Vol 13 (2) ◽

pp. 214-224 ◽

Cited By ~ 49

Author(s):

Nicholas R. Sandoval ◽

Tirzah Y. Mills ◽

Min Zhang ◽

Ryan T. Gill

Keyword(s):

E Coli ◽

Genome Wide ◽

A Genome

Download Full-text

Energetic Evolution of Cellular Transportomes

10.1101/218396 ◽

2017 ◽

Author(s):

Behrooz Darbani ◽

Douglas B. Kell ◽

Irina Borodina

Keyword(s):

Ion Channels ◽

Bacterial Species ◽

Living Cells ◽

Energetic Efficiency ◽

Cellular Functions ◽

Transporter Proteins ◽

Genome Wide Analysis ◽

Solute Carriers ◽

Genome Wide ◽

A Genome

ABSTRACTTransporter proteins mediate the translocation of substances across the membranes of living cells. We performed a genome-wide analysis of the compositional reshaping of cellular transporters (the transportome) across the kingdoms of bacteria, archaea, and eukarya. We show that the transportomes of eukaryotes evolved strongly towards a higher energetic efficiency, as ATP-dependent transporters diminished and secondary transporters and ion channels proliferated. This change has likely been important in the development of tissues performing energetically costly cellular functions. The transportome analysis also indicated seven bacterial species, includingNeorickettsia risticiiandNeorickettsia sennetsu, as likely origins of the mitochondrion in eukaryotes, due to the restricted presence therein of clear homologues of modern mitochondrial solute carriers.

Download Full-text

Genome assembly of the JD17 soybean provides a new reference genome for Comparative genomics

10.1101/2021.11.23.469778 ◽

2021 ◽

Author(s):

Xinxin Yi ◽

Jing Liu ◽

Shengcai Chen ◽

Hao Wu ◽

Min Liu ◽

...

Keyword(s):

Nitrogen Fixation ◽

Genome Assembly ◽

Reference Genome ◽

De Novo ◽

Genomic Analysis ◽

Comparative Genomic ◽

High Quality ◽

Genome Wide ◽

A Genome ◽

Cultivated Soybean

Cultivated soybean (Glycine max) is an important source for protein and oil. Many elite cultivars with different traits have been developed for different conditions. Each soybean strain has its own genetic diversity, and the availability of more high-quality soybean genomes can enhance comparative genomic analysis for identifying genetic underpinnings for its unique traits. In this study, we constructed a high-quality de novo assembly of an elite soybean cultivar Jidou 17 (JD17) with chromsome contiguity and high accuracy. We annotated 52,840 gene models and reconstructed 74,054 high-quality full-length transcripts. We performed a genome-wide comparative analysis based on the reference genome of JD17 with three published soybeans (WM82, ZH13 and W05) , which identified five large inversions and two large translocations specific to JD17, 20,984 - 46,912 PAVs spanning 13.1 - 46.9 Mb in size, and 5 - 53 large PAV clusters larger than 500kb. 1,695,741 - 3,664,629 SNPs and 446,689 - 800,489 Indels were identified and annotated between JD17 and them. Symbiotic nitrogen fixation (SNF) genes were identified and the effects from these variants were further evaluated. It was found that the coding sequences of 9 nitrogen fixation-related genes were greatly affected. The high-quality genome assembly of JD17 can serve as a valuable reference for soybean functional genomics research.

Download Full-text

Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs

BMC Genomics ◽

10.1186/1471-2164-14-648 ◽

2013 ◽

Vol 14 (1) ◽

pp. 648 ◽

Cited By ~ 58

Author(s):

Jeroen Crappé ◽

Wim Van Criekinge ◽

Geert Trooskens ◽

Eisuke Hayakawa ◽

Walter Luyten ◽

...

Keyword(s):

In Silico ◽

Ribosome Profiling ◽

In Silico Prediction ◽

Genome Wide ◽

A Genome ◽

Genome Wide Search

Download Full-text

Each of 3,323 metabolic innovations in the evolution ofE. coliarose through the horizontal transfer of a single DNA segment

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1718997115 ◽

2018 ◽

Vol 116 (1) ◽

pp. 187-192 ◽

Cited By ~ 11

Author(s):

Tin Yau Pang ◽

Martin J. Lercher

Keyword(s):

E Coli ◽

New Genes ◽

Metabolic Adaptations ◽

Genome Wide ◽

A Genome ◽

Individual Strain ◽

History Of ◽

Phenotype Space ◽

Genome Content ◽

Metabolic Models

Even closely related prokaryotes often show an astounding diversity in their ability to grow in different nutritional environments. It has been hypothesized that complex metabolic adaptations—those requiring the independent acquisition of multiple new genes—can evolve via selectively neutral intermediates. However, it is unclear whether this neutral exploration of phenotype space occurs in nature, or what fraction of metabolic adaptations is indeed complex. Here, we reconstruct metabolic models for the ancestors of a phylogeny of 53Escherichia colistrains, linking genotypes to phenotypes on a genome-wide, macroevolutionary scale. Based on the ancestral and extant metabolic models, we identify 3,323 phenotypic innovations in the history of theE. coliclade that arose through changes in accessory genome content. Of these innovations, 1,998 allow growth in previously inaccessible environments, while 1,325 increase biomass yield. Strikingly, every observed innovation arose through the horizontal acquisition of a single DNA segment less than 30 kb long. Although we found no evidence for the contribution of selectively neutral processes, 10.6% of metabolic innovations were facilitated by horizontal gene transfers on earlier phylogenetic branches, consistent with a stepwise adaptation to successive environments. Ninety-eight percent of metabolic phenotypes accessible to the combinedE. colipangenome can be bestowed on any individual strain by transferring a single DNA segment from one of the extant strains. These results demonstrate an amazing ability of theE. colilineage to adapt to novel environments through single horizontal gene transfers (followed by regulatory adaptations), an ability likely mirrored in other clades of generalist bacteria.

Download Full-text

Next generation sequencing allows deeper analysis and understanding of genomes and transcriptomes including aspects to fertility

Reproduction Fertility and Development ◽

10.1071/rd10247 ◽

2011 ◽

Vol 23 (1) ◽

pp. 75 ◽

Cited By ~ 7

Author(s):

Thomas Werner

Keyword(s):

Next Generation Sequencing ◽

Transcriptional Control ◽

Target Genes ◽

De Novo ◽

Alternative Promoters ◽

Next Generation ◽

Sequencing Data ◽

Genome Wide ◽

A Genome ◽

Generation Sequencing

Reproduction and fertility are controlled by specific events naturally linked to oocytes, testes and early embryonal tissues. A significant part of these events involves gene expression, especially transcriptional control and alternative transcription (alternative promoters and alternative splicing). While methods to analyse such events for carefully predetermined target genes are well established, until recently no methodology existed to extend such analyses into a genome-wide de novo discovery process. With the arrival of next generation sequencing (NGS) it becomes possible to attempt genome-wide discovery in genomic sequences as well as whole transcriptomes at a single nucleotide level. This does not only allow identification of the primary changes (e.g. alternative transcripts) but also helps to elucidate the regulatory context that leads to the induction of transcriptional changes. This review discusses the basics of the new technological and scientific concepts arising from NGS, prominent differences from microarray-based approaches and several aspects of its application to reproduction and fertility research. These concepts will then be illustrated in an application example of NGS sequencing data analysis involving postimplantation endometrium tissue from cows.

Download Full-text