effectR: An expandable R package to predict candidate effectors

ABSTRACT Effectors are, by one definition, small, secreted proteins that facilitate infection of host plants by all major groups of plant pathogens. Effector protein identification in oomycetes relies on identification of open reading frames with certain amino acid motifs among additional minor criteria. To date, identification of effectors relies on custom scripts to identify motifs in candidate open reading frames. Here, we developed the R package effectR that provides a convenient tool for rapid prediction of effectors in oomycete genomes, or with custom scripts for any genome, in a reproducible way. The effectR package relies on a combination of regular expression statements and hidden Markov model approaches to predict candidate RxLR and CRN effectors. Other custom motifs for novel effectors can easily be implemented and added to package updates. The effectR package has been validated with published oomycete genomes. This package provides a convenient tool for reproducible identification of candidate effectors in oomycete genomes.

effectR: An Expandable R Package to Predict Candidate RxLR and CRN Effectors in Oomycetes Using Motif Searches

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-10-18-0279-ta ◽

2019 ◽

Vol 32 (9) ◽

pp. 1067-1076 ◽

Cited By ~ 5

Author(s):

Javier F. Tabima ◽

Niklaus J. Grünwald

Keyword(s):

Protein Identification ◽

Plant Pathogens ◽

R Package ◽

Open Reading Frames ◽

Regular Expressions ◽

Convenient Tool ◽

Rapid Prediction ◽

Wet Lab ◽

Minor Criteria ◽

Effectors are small, secreted proteins that facilitate infection of host plants by all major groups of plant pathogens. Effector protein identification in oomycetes relies on identification of open reading frames with certain amino acid motifs among additional minor criteria. To date, identification of effectors relies on custom scripts to identify motifs in candidate open reading frames. Here, we developed the R package effectR, which provides a convenient tool for rapid prediction of effectors in oomycete genomes, or with custom scripts for any genome, in a reproducible way. The effectR package relies on a combination of regular expression statements and hidden Markov model approaches to predict candidate RxLR and crinkler effectors. Other custom motifs for novel effectors can easily be implemented and added to package updates. The effectR package has been validated with published oomycete genomes. This package provides a convenient tool for wet lab researchers interested in reproducible identification of candidate effectors in oomycete genomes.
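The regular expression step described in this abstract can be illustrated with a minimal Python sketch. It is not the effectR implementation or its API: the abstract does not give the exact expression, so the pattern `R.LR`, the function name `find_rxlr_candidates`, and the position window are assumptions chosen for illustration only, and the hidden Markov model step is not reproduced.

```python
import re

# Illustrative pattern only: Arg, any residue, Leu, Arg (the "RxLR" motif).
# A lookahead is used so that overlapping occurrences are all reported.
RXLR = re.compile(r"(?=R.LR)")


def find_rxlr_candidates(proteins, window=(30, 60)):
    """Return IDs of sequences with an RxLR motif starting inside `window`.

    `proteins` maps sequence IDs to amino acid strings. The window bounds
    (1-based, inclusive) are an assumption of this sketch, not values
    taken from the abstract.
    """
    first, last = window
    hits = []
    for seq_id, seq in proteins.items():
        for match in RXLR.finditer(seq.upper()):
            position = match.start() + 1  # convert to a 1-based position
            if first <= position <= last:
                hits.append(seq_id)
                break  # one qualifying motif is enough for this sequence
    return hits
```

Calling the function on a dictionary of translated open reading frames returns the IDs whose motif falls inside the window; sequences with no motif, or with a motif outside the window, are skipped.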

ORFhunteR: an accurate approach for the automatic identification and annotation of open reading frames in human mRNA molecules

10.1101/2021.02.05.429963 ◽

2021 ◽

Author(s):

Vasily V. Grinev ◽

Mikalai M. Yatskou ◽

Victor V. Skakun ◽

Maryna K. Chepeleva ◽

Petr V. Nazarov

Keyword(s):

Single Molecule ◽

Web Application ◽

R Package ◽

Nucleotide Sequences ◽

Open Reading Frames ◽

Classification Model ◽

Automatic Identification ◽

Large Set ◽

Link Type ◽

Abstract. Motivation: Modern methods of whole transcriptome sequencing accurately recover nucleotide sequences of RNA molecules present in cells and allow for determining their quantitative abundances. The coding potential of such molecules can be estimated using open reading frame (ORF) finding algorithms, implemented in a number of software packages. However, these algorithms show somewhat limited accuracy, are intended for single-molecule analysis and do not allow selecting proper ORFs in the case of long mRNAs containing multiple ORF candidates. Results: We developed a computational approach, corresponding machine learning model and a package, dedicated to automatic identification of the ORFs in large sets of human mRNA molecules. It is based on vectorization of nucleotide sequences into features, followed by classification using a random forest. The predictive model was validated on sets of human mRNA molecules from the NCBI RefSeq and Ensembl databases and demonstrated almost 95% accuracy in detecting true ORFs. The developed methods and pre-trained classification model were implemented in a powerful ORFhunteR computational tool that performs an automatic identification of true ORFs among large sets of human mRNA molecules. Availability and implementation: The developed open-source R package ORFhunteR is available for the community at GitHub repository (https://github.com/rfctbio-bsu/ORFhunteR), from Bioconductor (https://bioconductor.org/packages/devel/bioc/html/ORFhunteR.html) and as a web application (http://orfhunter.bsu.by).
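The multiple-candidate problem mentioned in this abstract can be made concrete with a short Python sketch that enumerates every ATG-to-stop candidate in an mRNA. It is not ORFhunteR code: the function name `find_orf_candidates` and the `min_codons` parameter are hypothetical, and the vectorization and random forest classification that select the true ORF are not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf_candidates(mrna, min_codons=2):
    """Return (start, end) pairs for every ATG-to-stop candidate ORF.

    Indices are 0-based and `end` is exclusive, so mrna[start:end] is the
    candidate including its stop codon. `min_codons` counts the stop codon
    and is an assumption of this sketch.
    """
    seq = mrna.upper().replace("U", "T")
    candidates = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk downstream codon by codon, staying in the frame of this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    candidates.append((start, pos + 3))
                break  # the first in-frame stop closes this candidate
    return candidates
```

A long transcript typically yields many such pairs, which is why a downstream classifier (a random forest, in the abstract's description) is needed to pick the true ORF among them.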

Identification of Open Reading Frames Unique to a Select Agent: Ralstonia solanacearum Race 3 Biovar 2

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-19-0069 ◽

2006 ◽

Vol 19 (1) ◽

pp. 69-79 ◽

Cited By ~ 90

Author(s):

Dean W. Gabriel ◽

Caitilyn Allen ◽

Mark Schell ◽

Timothy P. Denny ◽

Jean T. Greenberg ◽

...

Keyword(s):

Ralstonia Solanacearum ◽

Plant Pathogens ◽

Draft Genome ◽

Gc Content ◽

Gene Organization ◽

Open Reading Frames ◽

United States Department ◽

Detection And Identification ◽

Reading Frames ◽

Unique Genes

An 8× draft genome was obtained and annotated for Ralstonia solanacearum race 3 biovar 2 (R3B2) strain UW551, a United States Department of Agriculture Select Agent isolated from geranium. The draft UW551 genome consisted of 80,169 reads resulting in 582 contigs containing 5,925,491 base pairs, with an average 64.5% GC content. Annotation revealed a predicted 4,454 protein coding open reading frames (ORFs), 43 tRNAs, and 5 rRNAs; 2,793 (or 62%) of the ORFs had a functional assignment. The UW551 genome was compared with the published genome of R. solanacearum race 1 biovar 3 tropical tomato strain GMI1000. The two phylogenetically distinct strains were at least 71% syntenic in gene organization. Most genes encoding known pathogenicity determinants, including predicted type III secreted effectors, appeared to be common to both strains. A total of 402 unique UW551 ORFs were identified, none of which had a best hit or >45% amino acid sequence identity with any R. solanacearum predicted protein; 16 had strong (E < 10^-13) best hits to ORFs found in other bacterial plant pathogens. Many of the 402 unique genes were clustered, including 5 found in the hrp region and 38 contiguous, potential prophage genes. Conservation of some UW551 unique genes among R3B2 strains was examined by polymerase chain reaction among a group of 58 strains from different races and biovars, resulting in the identification of genes that may be potentially useful for diagnostic detection and identification of R3B2 strains. One 22-kb region that appears to be present in GMI1000 as a result of horizontal gene transfer is absent from UW551 and encodes enzymes that likely are essential for utilization of the three sugar alcohols that distinguish biovars 3 and 4 from biovars 1 and 2.

Identification and Characterization of a Gene Cluster for Synthesis of the Polyketide Antibiotic 2,4-Diacetylphloroglucinol from Pseudomonas fluorescens Q2-87

10.1128/jb.181.10.3155-3163.1999 ◽

1999 ◽

Vol 181 (10) ◽

pp. 3155-3163 ◽

Cited By ~ 229

Author(s):

M. Gita Bangera ◽

Linda S. Thomashow

Keyword(s):

Genomic Dna ◽

Plant Pathogens ◽

Polyketide Synthase ◽

Open Reading Frames ◽

Fungal Plant Pathogens ◽

Repressor Proteins ◽

Flanking Regions ◽

Identification And Characterization ◽

The polyketide metabolite 2,4-diacetylphloroglucinol (2,4-DAPG) is produced by many strains of fluorescent Pseudomonas spp. with biocontrol activity against soilborne fungal plant pathogens. Genes required for 2,4-DAPG synthesis by P. fluorescens Q2-87 are encoded by a 6.5-kb fragment of genomic DNA that can transfer production of 2,4-DAPG to 2,4-DAPG-nonproducing recipient Pseudomonas strains. In this study the nucleotide sequence was determined for the 6.5-kb fragment and flanking regions of genomic DNA from strain Q2-87. Six open reading frames were identified, four of which (phlACBD) comprise an operon that includes a set of three genes (phlACB) conserved between eubacteria and archaebacteria and a gene (phlD) encoding a polyketide synthase with homology to chalcone and stilbene synthases from plants. The biosynthetic operon is flanked on either side by phlE and phlF, which code respectively for putative efflux and regulatory (repressor) proteins. Expression in Escherichia coli of phlA, phlC, phlB, and phlD, individually or in combination, identified a novel polyketide biosynthetic pathway in which PhlD is responsible for the production of monoacetylphloroglucinol (MAPG). PhlA, PhlC, and PhlB are necessary to convert MAPG to 2,4-DAPG, and they also may function in the synthesis of MAPG.

Identification and Characterization of Phytoplasmal Genes, Employing a Novel Method of Isolating Phytoplasmal Genomic DNA

10.1128/jb.185.22.6513-6521.2003 ◽

2003 ◽

Vol 185 (22) ◽

pp. 6513-6521 ◽

Cited By ~ 8

Author(s):

Sharon Melamed ◽

Edna Tanne ◽

Raz Ben-Haim ◽

Orit Edelbaum ◽

David Yogev ◽

...

Keyword(s):

Genomic Dna ◽

Plant Pathogens ◽

Open Reading Frames ◽

Rrna Genes ◽

Unique Method ◽

Novel Method ◽

Identification And Characterization ◽

Reading Frames ◽

Large Background

ABSTRACT Phytoplasmas are unculturable, insect-transmissible plant pathogens belonging to the class Mollicutes. To be transmitted, the phytoplasmas replicate in the insect body and are delivered to the insect's salivary glands, from where they are injected into the recipient plant. Because phytoplasmas cannot be cultured, any attempt to recover phytoplasmal DNA from infected plants or insects has resulted in preparations with a large background of host DNA. Thus, studies of the phytoplasmal genome have been greatly hampered, and aside from the rRNA genes, only a few genes have hitherto been isolated and characterized. We developed a unique method to obtain host-free phytoplasmal genomic DNA from the insect vector's saliva, and we demonstrated the feasibility of this method by isolating and characterizing 78 new putative phytoplasmal open reading frames and their deduced proteins. Based on the newly accumulated information on phytoplasmal genes, preliminary characteristics of the phytoplasmal genome are discussed.

Identification of GtgE, a Novel Virulence Factor Encoded on the Gifsy-2 Bacteriophage of Salmonella enterica Serovar Typhimurium

10.1128/jb.184.19.5234-5239.2002 ◽

2002 ◽

Vol 184 (19) ◽

pp. 5234-5239 ◽

Cited By ~ 64

Author(s):

Theresa D. Ho ◽

Nara Figueroa-Bossi ◽

Minhua Wang ◽

Sergio Uzzau ◽

Lionello Bossi ◽

...

Keyword(s):

Virulence Factor ◽

Salmonella Enterica ◽

Virulence Genes ◽

Salmonella Enterica Serovar Typhimurium ◽

Systemic Infection ◽

Open Reading Frames ◽

Effector Protein ◽

Protein Sequence Analysis ◽

Serovar Typhimurium ◽

ABSTRACT The Gifsy-2 temperate bacteriophage of Salmonella enterica serovar Typhimurium contributes significantly to the pathogenicity of strains that carry it as a prophage. Previous studies have shown that Gifsy-2 encodes SodCI, a periplasmic Cu/Zn superoxide dismutase, and at least one additional virulence factor. Gifsy-2 encodes a Salmonella pathogenicity island 2 type III secreted effector protein. Sequence analysis of the Gifsy-2 genome also identifies several open reading frames with homology to those of known virulence genes. However, we found that null mutations in these genes did not individually have a significant effect on the ability of S. enterica serovar Typhimurium to establish a systemic infection in mice. Using deletion analysis, we have identified a gene, gtgE, which is necessary for the full virulence of S. enterica serovar Typhimurium Gifsy-2 lysogens. Together, GtgE and SodCI account for the contribution of Gifsy-2 to S. enterica serovar Typhimurium virulence in the murine model.

The Shiga Toxin 1-Converting Bacteriophage BP-4795 Encodes an NleA-Like Type III Effector Protein

10.1128/jb.187.24.8494-8498.2005 ◽

2005 ◽

Vol 187 (24) ◽

pp. 8494-8498 ◽

Cited By ~ 32

Author(s):

Kristina Creuzburg ◽

Jürgen Recktenwald ◽

Volker Kuhle ◽

Sylvia Herold ◽

Michael Hensel ◽

...

Keyword(s):

Shiga Toxin ◽

Type Iii Secretion ◽

Regulatory Region ◽

Secretion System ◽

Open Reading Frames ◽

Effector Protein ◽

Type Iii Effector ◽

Type Iii ◽

Enterocyte Effacement ◽

ABSTRACT In this study, the complete DNA sequence of Shiga toxin 1-converting bacteriophage BP-4795 was determined. The genome of BP-4795 consists of 85 open reading frames, including two complete IS629 elements and three morons at the end of its late regulatory region. One of these morons encodes a type III effector that is translocated by the locus of enterocyte effacement-encoded type III secretion system into HeLa cells, where it localizes with the Golgi apparatus.

utr.annotation: a tool for annotating genomic variants that could influence post-transcriptional regulation

Bioinformatics ◽

10.1093/bioinformatics/btab635 ◽

2021 ◽

Author(s):

Yating Liu ◽

Joseph D Dougherty

Keyword(s):

R Package ◽

Open Reading Frames ◽

Supplementary Information ◽

Translation Start ◽

Upstream Open Reading Frames ◽

Mouse Species ◽

Translational Regulators ◽

Post Transcriptional Regulation ◽

Annotated Translation ◽

Abstract. Summary: Whole genome sequencing of patient populations is identifying thousands of new variants in UnTranslated Regions (UTRs). While the consequences of UTR mutations are not as easily predicted from primary sequence as coding mutations are, there are some known features of UTRs that modulate their function. utr.annotation is an R package that can be used to annotate potentially deleterious variants in the UTR regions for both human and mouse species. Given a CSV or VCF format variant file, utr.annotation provides information for each variant on whether and how it alters known translational regulators including: upstream Open Reading Frames (uORFs), upstream Kozak sequences, polyA signals, Kozak sequences at the annotated translation start site, start codons, and stop codons, conservation scores in the variant position, and whether and how it changes ribosome loading based on a model derived from empirical data. Availability: utr.annotation is freely available on Bitbucket (https://bitbucket.org/jdlabteam/utr.annotation/src/master/) and CRAN (https://cran.r-project.org/web/packages/utr.annotation/index.html). Supplementary information: Supplementary data are available at https://wustl.box.com/s/yye99bryfin89nav45gv91l5k35fxo7z.
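One of the checks listed in this abstract, whether a variant alters an upstream start codon, can be sketched in Python as follows. This is not the utr.annotation API: the function name `gains_upstream_atg` and its arguments are hypothetical, and the sketch only handles a single-base substitution in a 5' UTR given as a plain string. It does not cover Kozak context, polyA signals, conservation scores, or ribosome loading.

```python
def gains_upstream_atg(utr5, position, alt_base):
    """Return True if a substitution creates a new ATG in the 5' UTR.

    `utr5` is the reference 5' UTR sequence, `position` is the 0-based
    index of the substituted base, and `alt_base` is the variant base.
    All three names are assumptions of this sketch.
    """
    ref = utr5.upper()
    alt = ref[:position] + alt_base.upper() + ref[position + 1:]
    # Only the trinucleotides that overlap the changed base can differ.
    first = max(0, position - 2)
    last = min(len(ref) - 3, position)
    for i in range(first, last + 1):
        if alt[i:i + 3] == "ATG" and ref[i:i + 3] != "ATG":
            return True
    return False
```

A gained ATG is only a candidate uORF start; a fuller check would also look for an in-frame stop codon and the surrounding Kozak sequence, which the abstract lists as separate annotations.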