Genome re-sequencing and reannotation of the Escherichia coli ER2566 strain and transcriptome sequencing under overexpression conditions

Abstract Background: The Escherichia coli ER2566 strain (NC_CP014268.2) was developed as a BL21 (DE3) derivative strain and has been widely used in recombinant protein expression. However, like many other current RefSeq annotations, the annotation of the ER2566 strain is incomplete, with missing gene names and miscellaneous RNAs, as well as uncorrected annotations of some pseudogenes. Here, we performed a systematic reannotation of the ER2566 genome by combining multiple annotation tools with manual revision to provide a comprehensive understanding of the E. coli ER2566 strain, and used high-throughput sequencing to explore how the strain adapts under external pressure. Results: The reannotation included noteworthy corrections to all protein-coding genes, led to the exclusion of 120 hypothetical genes or pseudogenes, and resulted in the addition of 65 coding sequences, 230 miscellaneous noncoding RNAs, and 2 tRNAs. In addition, we manually examined all 194 pseudogenes in the RefSeq annotation and directly identified 144 (74%) as coding genes. The remaining pseudogenes without explicit function were removed. We then used whole-genome sequencing and high-throughput RNA sequencing to assess mutational adaptations under consecutive subculture or overexpression burden. Whereas no mutations were detected in response to consecutive subculture, overexpression of the human papillomavirus type 16 capsid led to the identification of a mutation (position 1,094,824 within the 3’ non-coding region) positioned 19 bp away from the lacI gene in the transcribed RNA, which was not detected at the genomic level by Sanger sequencing. Conclusion: The ER2566 strain is used by both the general scientific community and the biotechnology industry. Reannotation of the E. coli ER2566 strain not only improved the RefSeq data but also uncovered a key site that might be involved in the transcription and translation of genes encoding the lactose operon repressor. We propose that our pipeline might offer a universal method for the reannotation of other bacterial genomes with high speed and accuracy. This study may facilitate a better understanding of gene function for the ER2566 strain under external burden and provide more clues for engineering bacteria for biotechnological applications.

Genome re-sequencing and reannotation of the Escherichia coli ER2566 strain and transcriptome sequencing under overexpression conditions

10.21203/rs.2.21231/v2 ◽

2020 ◽

Author(s):

Lizhi Zhou ◽

Kaihang Wang ◽

Tingting Chen ◽

Yue Ma ◽

Yang Huang ◽

...

Keyword(s):

Escherichia Coli ◽

High Throughput ◽

High Speed ◽

High Throughput Sequencing ◽

Recombinant Protein Expression ◽

External Pressure ◽

Universal Method ◽

Coding Region ◽

Protein Coding ◽

Human Papillomavirus 16


Genome re-sequencing and reannotation of the Escherichia coli ER2566 strain and transcriptome sequencing under overexpression conditions

10.21203/rs.2.21231/v3 ◽

2020 ◽

Author(s):

Lizhi Zhou ◽

Hai Yu ◽

Kaihang Wang ◽

Tingting Chen ◽

Yue Ma ◽

...

Keyword(s):

Escherichia Coli ◽

High Throughput ◽

High Speed ◽

High Throughput Sequencing ◽

Recombinant Protein Expression ◽

External Pressure ◽

Universal Method ◽

Coding Region ◽

Human Papillomavirus 16 ◽

E Coli

Abstract Background: The Escherichia coli ER2566 strain (NC_CP014268.2) was developed as a BL21 (DE3) derivative strain and has been widely used in recombinant protein expression. However, like many other current RefSeq annotations, the annotation of the ER2566 strain is incomplete, with missing gene names and miscellaneous RNAs, as well as uncorrected annotations of some pseudogenes. Here, we performed a systematic reannotation of the ER2566 genome by combining multiple annotation tools with manual revision to provide a comprehensive understanding of the E. coli ER2566 strain, and used high-throughput sequencing to explore how the strain adapts under external pressure. Results: The reannotation included noteworthy corrections to all protein-coding genes, led to the exclusion of 190 hypothetical genes or pseudogenes, and resulted in the addition of 237 coding sequences, 230 miscellaneous noncoding RNAs, and 2 tRNAs. In addition, we manually examined all 194 pseudogenes in the RefSeq annotation and directly identified 123 (63%) as coding genes. We then used whole-genome sequencing and high-throughput RNA sequencing to assess mutational adaptations under consecutive subculture or overexpression burden. Whereas no mutations were detected in response to consecutive subculture, overexpression of the human papillomavirus type 16 capsid led to the identification of a mutation (position 1,094,824 within the 3’ non-coding region) positioned 19 bp away from the lacI gene in the transcribed RNA, which was not detected at the genomic level by Sanger sequencing. Conclusion: The ER2566 strain is used by both the general scientific community and the biotechnology industry. Reannotation of the E. coli ER2566 strain not only improved the RefSeq data but also uncovered a key site that might be involved in the transcription and translation of genes encoding the lactose operon repressor. We propose that our pipeline might offer a universal method for the reannotation of other bacterial genomes with high speed and accuracy. This study may facilitate a better understanding of gene function for the ER2566 strain under external burden and provide more clues for engineering bacteria for biotechnological applications.
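
As a quick sanity check on the reclassification figures quoted in this abstract, the percentage can be recomputed from the two counts. This is only an illustrative sketch: the counts come from the abstract, while the helper name `percent` is hypothetical and not part of the authors' pipeline.

```python
# Counts taken from the abstract above; the helper name is hypothetical.
TOTAL_PSEUDOGENES = 194        # pseudogenes examined in the RefSeq annotation
RECLASSIFIED_AS_CODING = 123   # pseudogenes directly identified as coding genes


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole)


if __name__ == "__main__":
    share = percent(RECLASSIFIED_AS_CODING, TOTAL_PSEUDOGENES)
    print(f"{RECLASSIFIED_AS_CODING}/{TOTAL_PSEUDOGENES} = {share}%")
```

The same helper applied to the earlier preprint version's counts (144 of 194) gives the 74% reported there.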

A High-Throughput Strategy for Recombinant Protein Expression and Solubility Screen in Escherichia coli: A Case of Sensor Histidine Kinase

Methods in Molecular Biology - Histidine Phosphorylation ◽

10.1007/978-1-4939-9884-5_2 ◽

2019 ◽

pp. 19-36 ◽

Cited By ~ 1

Author(s):

Agnieszka Szmitkowska ◽

Blanka Pekárová ◽

Jan Hejátko

Keyword(s):

Escherichia Coli ◽

Recombinant Protein ◽

Protein Expression ◽

High Throughput ◽

Histidine Kinase ◽

Recombinant Protein Expression ◽

Sensor Histidine Kinase

High-Throughput Identification of MiR-145 Targets in Human Articular Chondrocytes

Life ◽

10.3390/life10050058 ◽

2020 ◽

Vol 10 (5) ◽

pp. 58

Author(s):

Aida Martinez-Sanchez ◽

Stefano Lazzarano ◽

Eshita Sharma ◽

Helen Lockstone ◽

Christopher L. Murphy

Keyword(s):

Cartilage Repair ◽

High Throughput ◽

High Throughput Sequencing ◽

Articular Chondrocytes ◽

Articular Chondrocyte ◽

Coding Region ◽

Cartilage Development ◽

Rna Immunoprecipitation ◽

Important Regulator ◽

Mmp13 Expression

MicroRNAs (miRNAs) play key roles in cartilage development and homeostasis and are dysregulated in osteoarthritis. MiR-145 modulation induces profound changes in the human articular chondrocyte (HAC) phenotype, partially through direct repression of SOX9. Since miRNAs can simultaneously silence multiple targets, we aimed to identify the whole targetome of miR-145 in HACs, critical if miR-145 is to be considered a target for cartilage repair. We performed RIP-seq (RNA-immunoprecipitation and high-throughput sequencing) of miRISC (miRNA-induced silencing complex) in HACs overexpressing miR-145 to identify miR-145 direct targets and used cWords to assess enrichment of miR-145 seed matches in the identified targets. Further validations were performed by RT-qPCR, Western immunoblot, and luciferase assays. MiR-145 affects the expression of over 350 genes and directly targets more than 50 mRNAs through the 3′UTR or, more commonly, the coding region. MiR-145 targets DUSP6, involved in cartilage organization and development, at the translational level. DUSP6 depletion leads to MMP13 upregulation, suggesting a contribution towards the effect of miR-145 on MMP13 expression. In conclusion, miR-145 directly targets several genes involved in the expression of the extracellular matrix and inflammation in primary chondrocytes. Thus, we propose miR-145 as an important regulator of chondrocyte function and a new target for cartilage repair.

Multilocus Characterization, Gene Expression Analysis of Putative Immunodominant Protein Coding Regions, and Development of Recombinase Polymerase Amplification Assay for Detection of ‘Candidatus Phytoplasma Pruni’ in Prunus avium

Phytopathology ◽

10.1094/phyto-09-18-0326-r ◽

2019 ◽

Vol 109 (6) ◽

pp. 983-992 ◽

Cited By ~ 4

Author(s):

Dan Edward V. Villamor ◽

Kenneth C. Eastwell

Keyword(s):

High Throughput Sequencing ◽

Sweet Cherry ◽

Prunus Avium ◽

Protein A ◽

Recombinase Polymerase Amplification ◽

Sequencing Data ◽

Coding Region ◽

Protein Coding ◽

Coding Regions ◽

Reverse Transcription Pcr

Western X (WX) disease, caused by ‘Candidatus Phytoplasma pruni’, is a devastating disease of sweet cherry resulting in the production of small, bitter-flavored fruits that are unmarketable. Escalation of WX disease in Washington State prompted the development of a rapid detection assay based on recombinase polymerase amplification (RPA) to facilitate timely removal and replacement of diseased trees. Here, we report on a reliable RPA assay targeting putative immunodominant protein coding regions that showed comparable sensitivity to polymerase chain reaction (PCR) in detecting ‘Ca. Phytoplasma pruni’ from crude sap of sweet cherry tissues. Apart from the predominant strain of ‘Ca. Phytoplasma pruni’, the RPA assay also detected a novel strain of phytoplasma from several WX-affected trees. Multilocus sequence analyses using the immunodominant protein A (idpA), imp, rpoE, secY, and 16S ribosomal RNA regions from several ‘Ca. Phytoplasma pruni’ isolates from WX-affected trees showed that this novel phytoplasma strain represents a new subgroup within the 16SrIII group. Examination of high-throughput sequencing data from total RNA of WX-affected trees revealed that the imp coding region is highly expressed, and as supported by quantitative reverse transcription PCR data, it showed higher RNA transcript levels than the previously proposed idpA coding region of ‘Ca. Phytoplasma pruni’.

High-throughput sequencing screen reveals novel, transforming RAS mutations in myeloid leukemia patients

Blood ◽

10.1182/blood-2008-04-152157 ◽

2009 ◽

Vol 113 (8) ◽

pp. 1749-1755 ◽

Cited By ~ 84

Author(s):

Jeffrey W. Tyner ◽

Heidi Erickson ◽

Michael W. N. Deininger ◽

Stephanie G. Willis ◽

Christopher A. Eide ◽

...

Keyword(s):

High Throughput ◽

High Throughput Screening ◽

Myeloid Leukemia ◽

High Throughput Sequencing ◽

Point Mutations ◽

Causative Role ◽

Coding Region ◽

Ras Mutations ◽

Oncogenic Ras ◽

Frequent Mutations

Abstract Transforming mutations in NRAS and KRAS are thought to play a causative role in the development of numerous cancers, including myeloid malignancies. Although mutations at amino acids 12, 13, or 61 account for the majority of oncogenic Ras variants, we hypothesized that less frequent mutations at alternate residues may account for disease in some patients with cancer of unexplained genetic etiology. To search for additional, novel RAS mutations, we sequenced all coding exons in NRAS, KRAS, and HRAS in 329 acute myeloid leukemia (AML) patients, 32 chronic myelomonocytic leukemia (CMML) patients, and 96 healthy individuals. We detected 4 “noncanonical” point mutations in 7 patients: N-RasG60E, K-RasV14I, K-RasT74P, and K-RasA146T. All 4 Ras mutants exhibited oncogenic properties in comparison with wild-type Ras in biochemical and functional assays. The presence of transforming RAS mutations outside of positions 12, 13, and 61 reveals that alternate mechanisms of transformation by RAS may be overlooked in screens designed to detect only the most common RAS mutations. Our results suggest that RAS mutations may play a greater role in leukemogenesis than currently believed and indicate that high-throughput screening for mutant RAS alleles in cancer should include analysis of the entire RAS coding region.

Probiotic potential of Lactobacillus on the intestinal microflora against Escherichia coli induced mice model through high-throughput sequencing

Microbial Pathogenesis ◽

10.1016/j.micpath.2019.103760 ◽

2019 ◽

Vol 137 ◽

pp. 103760 ◽

Cited By ~ 4

Author(s):

Yaping Wang ◽

Aoyun Li ◽

Lihong Zhang ◽

Muhammad Waqas ◽

Khalid Mehmood ◽

...

Keyword(s):

Escherichia Coli ◽

High Throughput ◽

Intestinal Microflora ◽

High Throughput Sequencing ◽

Mice Model ◽

Probiotic Potential

Ribosomal stalling landscapes revealed by high-throughput inverse toeprinting of mRNA libraries

Life Science Alliance ◽

10.26508/lsa.201800148 ◽

2018 ◽

Vol 1 (5) ◽

pp. e201800148 ◽

Cited By ~ 3

Author(s):

Britta Seip ◽

Guénaël Sacheau ◽

Denis Dupuy ◽

C Axel Innis

Keyword(s):

Escherichia Coli ◽

Amino Acid ◽

Amino Acid Sequence ◽

High Throughput ◽

Quantitative Measure ◽

Ribosome Profiling ◽

Coding Region ◽

Versatile Tool

Although it is known that the amino acid sequence of a nascent polypeptide can impact its rate of translation, dedicated tools to systematically investigate this process are lacking. Here, we present high-throughput inverse toeprinting, a method to identify peptide-encoding transcripts that induce ribosomal stalling in vitro. Unlike ribosome profiling, inverse toeprinting protects the entire coding region upstream of a stalled ribosome, making it possible to work with random or focused transcript libraries that efficiently sample the sequence space. We used inverse toeprinting to characterize the stalling landscapes of free and drug-bound Escherichia coli ribosomes, obtaining a comprehensive list of arrest motifs that were validated in vivo, along with a quantitative measure of their pause strength. Thanks to the modest sequencing depth and small amounts of material required, inverse toeprinting provides a highly scalable and versatile tool to study sequence-dependent translational processes.

High-throughput recombinant protein expression in Escherichia coli: current status and future perspectives

Open Biology ◽

10.1098/rsob.160196 ◽

2016 ◽

Vol 6 (8) ◽

pp. 160196 ◽

Cited By ~ 103

Author(s):

Baolei Jia ◽

Che Ok Jeon

Keyword(s):

Escherichia Coli ◽

Protein Expression ◽

High Throughput ◽

Rational Design ◽

Genetic Manipulation ◽

Recombinant Protein Expression ◽

Current System ◽

Low Cost ◽

Current Status ◽

Post Translational Modification

The ease of genetic manipulation, low cost, rapid growth and number of previous studies have made Escherichia coli one of the most widely used microorganism species for producing recombinant proteins. In this post-genomic era, challenges remain to rapidly express and purify large numbers of proteins for academic and commercial purposes in a high-throughput manner. In this review, we describe several state-of-the-art approaches that are suitable for the cloning, expression and purification, conducted in parallel, of numerous molecules, and we discuss recent progress related to soluble protein expression, mRNA folding, fusion tags, post-translational modification and production of membrane proteins. Moreover, we address the ongoing efforts to overcome various challenges faced in protein expression in E. coli, which could lead to an improvement of the current system from trial and error to a predictable and rational design.

SALTS – SURFR (sncRNA) And LAGOOn (lncRNA) Transcriptomics Suite

10.1101/2021.02.08.430280 ◽

2021 ◽

Author(s):

Mohan V Kasukurthi ◽

Dominika Houserova ◽

Yulong Huang ◽

Addison A. Barchie ◽

Justin T. Roberts ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Protein Coding ◽

Web Based ◽

Functional Roles ◽

Sequencing Technologies ◽

Active Research ◽

The Cost ◽

Analysis Platform ◽

Transcriptional Output

Abstract: The widespread utilization of high-throughput sequencing technologies has unequivocally demonstrated that eukaryotic transcriptomes consist primarily (>98%) of non-coding RNA (ncRNA) transcripts significantly more diverse than their protein-coding counterparts. ncRNAs are typically divided into two categories based on their length. (1) ncRNAs less than 200 nucleotides (nt) long are referred to as small non-coding RNAs (sncRNAs) and include microRNAs (miRNAs), piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), transfer RNAs (tRNAs), etc., and the majority of these are thought to function primarily in controlling gene expression. That said, the full repertoire of sncRNAs remains fairly poorly defined, as evidenced by two entirely new classes of sncRNAs only recently being reported, i.e., snoRNA-derived RNAs (sdRNAs) and tRNA-derived fragments (tRFs). (2) ncRNAs longer than 200 nt are known as long ncRNAs (lncRNAs). lncRNAs represent the second largest transcriptional output of the cell (behind only ribosomal RNAs), and although functional roles for several lncRNAs have been reported, most lncRNAs remain largely uncharacterized due to a lack of predictive tools aimed at guiding functional characterizations. Importantly, whereas the cost of high-throughput transcriptome sequencing is now feasible for most active research programs, tools necessary for the interpretation of these sequencing data typically require significant computational expertise and resources, markedly hindering widespread utilization of these datasets. In light of this, we have developed a powerful new ncRNA transcriptomics suite, SALTS, which is highly accurate, markedly efficient, and extremely user-friendly. SALTS stands for SURFR (sncRNA) And LAGOOn (lncRNA) Transcriptomics Suite and offers platforms for comprehensive sncRNA and lncRNA profiling and discovery, ncRNA functional prediction, and the identification of significant differential expression among datasets. Notably, SALTS is accessed through an intuitive Web-based interface, can be used to analyze either user-generated, standard next-generation sequencing (NGS) output file uploads (e.g., FASTQ) or existing NCBI Sequence Read Archive (SRA) data, and requires absolutely no dataset pre-processing or knowledge of library adapters/oligonucleotides. SALTS constitutes the first publicly available, Web-based, comprehensive ncRNA transcriptomic NGS analysis platform designed specifically for users with no computational background, providing a much needed, powerful new resource capable of enabling more widespread ncRNA transcriptomic analyses. The SALTS WebServer is freely available online at http://salts.soc.southalabama.edu.
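
The length-based split between sncRNAs and lncRNAs described in this abstract can be illustrated with a small sketch. It is not part of SALTS: the function name `classify_ncrna` is hypothetical, and because the abstract does not say how a transcript of exactly 200 nt is classed, assigning it to the lncRNA category here is an assumption.

```python
# Illustrative only; not the SALTS implementation.
SNCRNA_LENGTH_CUTOFF_NT = 200  # threshold stated in the abstract


def classify_ncrna(length_nt: int) -> str:
    """Classify a non-coding transcript by length alone.

    Transcripts shorter than 200 nt are labelled sncRNA; everything else is
    labelled lncRNA. Treating exactly 200 nt as lncRNA is an assumption.
    """
    if length_nt <= 0:
        raise ValueError("transcript length must be a positive number of nt")
    return "sncRNA" if length_nt < SNCRNA_LENGTH_CUTOFF_NT else "lncRNA"


if __name__ == "__main__":
    # Example lengths are arbitrary illustrations, not data from the paper.
    for length in (22, 150, 200, 1500):
        print(length, classify_ncrna(length))
```

Length is only the first-pass criterion; the subclasses named in the abstract (miRNAs, piRNAs, snoRNAs, tRNAs, sdRNAs, tRFs) cannot be told apart by length alone.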
