Deep profiling of protease substrate specificity enabled by dual random and scanned human proteome substrate phage libraries

Proteolysis is a major posttranslational regulator of biology inside and outside of cells. Broad identification of optimal cleavage sites and natural substrates of proteases is critical for drug discovery and for understanding protease biology. Here, we present a method that employs two genetically encoded substrate phage display libraries coupled with next generation sequencing (SPD-NGS) that allows up to 10,000-fold deeper sequence coverage of the typical six- to eight-residue protease cleavage sites compared to state-of-the-art synthetic peptide libraries or proteomics. We applied SPD-NGS to two classes of proteases: the intracellular caspases and the ectodomains of the sheddases ADAMs 10 and 17. The first library (Lib 10AA) allowed us to identify 10⁴ to 10⁵ unique cleavage sites over a 1,000-fold dynamic range of NGS counts and produced consensus and optimal cleavage motifs based on position-specific scoring matrices. A second SPD-NGS library (Lib hP), which displayed virtually the entire human proteome tiled in contiguous 49-amino-acid sequences with 25-amino-acid overlaps, enabled us to identify candidate human proteome sequences. We identified up to 10⁴ natural linear cut sites, depending on the protease, captured most of the examples previously identified by proteomics, and predicted 10- to 100-fold more. Structural bioinformatics was used to facilitate the identification of candidate natural protein substrates. SPD-NGS is rapid, reproducible, simple to perform and analyze, inexpensive, and renewable, with unprecedented depth of coverage for substrate sequences. It is an important tool for protease biologists interested in protease specificity for developing specific assays and inhibitors, and for identifying natural protein substrates.
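To make the position-specific scoring matrix step concrete, here is a minimal Python sketch; it is not the SPD-NGS authors' pipeline. It assumes the cleavage sites have already been aligned on the scissile bond into equal-length windows (for example P4-P4'), weights each site by its NGS read count, and uses a uniform background with a pseudocount of 1. The function names, toy sites, and read counts are hypothetical.

```python
import math
from collections import Counter

# The 20 standard residues; any other symbol (e.g. "X") is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(site_counts, pseudocount=1.0):
    """Build a log2-odds position-specific scoring matrix.

    site_counts: dict mapping an aligned cleavage-site peptide (all of the
    same length) to its NGS read count, which is used as a weight.
    Assumes a uniform background frequency of 1/20 per residue.
    """
    lengths = {len(seq) for seq in site_counts}
    if len(lengths) != 1:
        raise ValueError("need at least one site, all of the same length")
    width = lengths.pop()
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(width):
        # Read-count-weighted residue counts at this position.
        counts = Counter()
        for seq, reads in site_counts.items():
            counts[seq[pos]] += reads
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def consensus(pssm):
    """Highest-scoring residue at each position."""
    return "".join(max(column, key=column.get) for column in pssm)


# Toy, made-up caspase-like sites with hypothetical read counts.
sites = {"DEVDGA": 10, "DETDGS": 5, "DEVDAA": 3}
pssm = build_pssm(sites)
print(consensus(pssm))
```

Weighting by read count is one design choice among several; counting each unique site once would instead treat a site seen 3 times the same as one seen 4,000 times.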

Deep profiling of protease substrate specificity enabled by dual random and scanned human proteome substrate phage libraries

10.1101/2020.05.09.086264 · 2020 · Cited by ~1

Author(s): Jie Zhou, Shantao Li, Kevin K. Leung, Brian O’Donovan, James Y. Zou, ...

Keyword(s): Next Generation Sequencing; Positive Selection; Synthetic Peptide; Human Proteome; Cleavage Sites; Next Generation; Protein Substrates; Natural Protein; Substrate Peptide; Generation Sequencing

Abstract Proteolysis is a major post-translational regulator of biology both inside and outside of cells. Broad identification of optimal cleavage sites and natural substrates of proteases is critical for drug discovery and for understanding protease biology. Here we present a method that employs two genetically encoded substrate phage display libraries coupled with next generation sequencing (SPD-NGS) that allows up to 10,000-fold deeper sequence coverage of the typical 6- to 8-residue protease cleavage sites compared to state-of-the-art synthetic peptide libraries or proteomics. We applied SPD-NGS to two classes of proteases: the intracellular caspases 2, 3, 6, 7 and 8, and the ectodomains of the membrane sheddases ADAMs 10 and 17. The first library (Lib 10AA) was used to determine substrate cleavage motifs. Lib 10AA contains a highly diverse set of randomized 10-mer substrate peptide sequences (10⁹ unique members) displayed monovalently on filamentous phage and bound to magnetic beads via an N-terminal biotin. The protease was allowed to cleave the substrates displayed on the SPD beads, and the released phage were subjected to up to three total rounds of positive selection followed by next generation sequencing (NGS). This allowed us to identify 10⁴ to 10⁵ unique cleavage sites over a 1,000-fold dynamic range of NGS counts (ranging from 3 to 4,000) and produced consensus and optimal cleavage motifs based on position-specific scoring matrices that closely matched synthetic peptide data. A second SPD-NGS library (Lib hP) was constructed that allowed us to identify candidate human proteome sequences. Lib hP displayed virtually the entire human proteome tiled in contiguous 49-amino-acid sequences with 25-amino-acid overlaps (nearly 1 million members). After three rounds of positive selection, we identified up to 10⁴ natural linear cut sites, depending on the protease, captured most of the examples previously identified by proteomics (ranging from 30 to 1,500), and predicted 10- to 100-fold more.
Structural bioinformatics was used to facilitate the identification of candidate natural protein substrates. SPD-NGS is rapid, reproducible, simple to perform and analyze, inexpensive, and renewable, with unprecedented depth of coverage for substrate sequences. SPD-NGS is an important tool for protease biologists interested in protease specificity for developing specific assays and inhibitors, and for identifying natural protein substrates.
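The Lib hP design (49-residue tiles with 25-residue overlaps, i.e. a 24-residue step) can be sketched as a simple tiling function. This is an illustration under stated assumptions, not the authors' library-construction code: the last tile is simply truncated at the C terminus (a real design might pad or right-align it), and the function names and toy sequence are hypothetical.

```python
def tile_protein(protein, window=49, overlap=25):
    """Split a protein into contiguous tiles of `window` residues in which
    consecutive tiles share `overlap` residues.

    Returns (start, tile) pairs with 0-based start offsets. The final tile
    is truncated at the C terminus (an assumption of this sketch).
    """
    step = window - overlap  # 24 residues for 49-residue tiles with 25-residue overlaps
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    tiles = []
    start = 0
    while start < len(protein):
        end = min(start + window, len(protein))
        tiles.append((start, protein[start:end]))
        if end == len(protein):
            break
        start += step
    return tiles


def tiles_covering(tiles, residue):
    """Start offsets of the tiles containing a 0-based residue index,
    e.g. to map a cut observed in a tile back to the full-length protein."""
    return [start for start, seq in tiles if start <= residue < start + len(seq)]


protein = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(100))  # toy sequence
tiles = tile_protein(protein)
print([start for start, _ in tiles])
```

In this sketch, because the 25-residue overlap is longer than a typical 6- to 8-residue cleavage site, a site that crosses the end of one tile is still displayed intact in the next one.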

Role of Amino Acid Sequences Flanking Dibasic Cleavage Sites in Precursor Proteolytic Processing. The Importance of the First Residue C-Terminal of the Cleavage Site

European Journal of Biochemistry · 10.1111/j.1432-1033.1995.tb20192.x · 1995 · Vol 227 (3) · pp. 707-714 · Cited by ~51

Author(s): Mohamed Rholam, Noureddine Brakch, Doris Germain, David Y. Thomas, Christine Fahy, ...

Keyword(s): Amino Acid; Cleavage Site; Amino Acid Sequences; Proteolytic Processing; Cleavage Sites

Improved Coreceptor Usage Prediction and Genotypic Monitoring of R5-to-X4 Transition by Motif Analysis of Human Immunodeficiency Virus Type 1 env V3 Loop Sequences

Journal of Virology · 10.1128/jvi.77.24.13376-13388.2003 · 2003 · Vol 77 (24) · pp. 13376-13388 · Cited by ~313

Author(s): Mark A. Jensen, Fu-Sheng Li, Angélique B. van ’t Wout, David C. Nickle, Daniel Shriner, ...

Keyword(s): Amino Acid; Virus Type; Amino Acid Sequences; Coreceptor Usage; Motif Analysis; Scoring Method; CCR5 Chemokine Receptor; Scoring Matrices; HIV-1

ABSTRACT Early in infection, human immunodeficiency virus type 1 (HIV-1) generally uses the CCR5 chemokine receptor (along with CD4) for cellular entry. In many HIV-1-infected individuals, viral genotypic changes arise that allow the virus to use CXCR4 (either in addition to CCR5 or alone) as an entry coreceptor. This switch has been associated with an acceleration of both CD4+ T-cell decline and progression to AIDS. While it is well known that the V3 loop of gp120 largely determines coreceptor usage and that positively charged residues in V3 play an important role, the process of genetic change in V3 leading to altered coreceptor usage is not well understood. Further, the methods for biological phenotyping of virus for research or clinical purposes are laborious, depend on sample availability, and present biosafety concerns, so reliable methods for sequence-based “virtual phenotyping” are desirable. We introduce a simple bioinformatic method of scoring V3 amino acid sequences that reliably predicts CXCR4 usage (sensitivity, 84%; specificity, 96%). This score (as determined on the basis of position-specific scoring matrices [PSSM]) can be interpreted as revealing a propensity to use CXCR4: known R5 viruses had low scores, R5X4 viruses had intermediate scores, and X4 viruses had high scores. Application of the PSSM scoring method to reconstructed virus phylogenies of 11 longitudinally sampled individuals revealed that the development of X4 viruses was generally gradual and involved the accumulation of multiple amino acid changes in V3. We found that X4 viruses were lost in two ways: by the dying off of an established X4 lineage or by mutation back to low-scoring V3 loops.
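PSSM scoring of the kind described above reduces to summing per-position scores of an aligned sequence and calling it positive above a threshold; sensitivity and specificity then follow from labelled examples. The sketch below is illustrative only: the three-position matrix, sequences, and X4/R5 labels are invented (real matrices are built from aligned V3 sequences), and the function names are hypothetical.

```python
def pssm_score(sequence, pssm):
    """Sum the per-position scores of an aligned sequence.

    pssm is a list of dicts (one per aligned position) mapping residue to
    score; residues absent from a column score 0 in this sketch.
    """
    return sum(column.get(residue, 0.0) for column, residue in zip(pssm, sequence))


def sensitivity_specificity(scores, is_positive, threshold):
    """Call a sequence positive (e.g. 'uses CXCR4') when score >= threshold."""
    tp = fn = tn = fp = 0
    for score, positive in zip(scores, is_positive):
        called = score >= threshold
        if positive and called:
            tp += 1
        elif positive:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical 3-position matrix that rewards basic residues.
toy_pssm = [
    {"R": 1.0, "K": 0.8, "D": -1.0},
    {"R": 0.5, "K": 1.0, "G": -0.5},
    {"R": 1.2, "E": -1.0},
]
sequences = ["RKR", "KRR", "DGE", "GGE"]
labels = [True, True, False, False]  # made-up X4 (True) / R5 (False) labels
scores = [pssm_score(s, toy_pssm) for s in sequences]
print(sensitivity_specificity(scores, labels, threshold=0.0))
```

Raising the threshold trades sensitivity for specificity; choosing it on separate training data avoids overestimating both.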

Amino acid sequences around exofacial proteolytic cleavage sites of band 3 from bovine and porcine erythrocytes

International Journal of Biochemistry · 10.1016/0020-711x(94)90206-2 · 1994 · Vol 26 (1) · pp. 133-137 · Cited by ~1

Author(s): Moriyama Ryuichi, Nagatomi Yuji, Hoshino Fumihiko, Makino Shio

Keyword(s): Amino Acid; Proteolytic Cleavage; Band 3; Amino Acid Sequences; Cleavage Sites

Analysis of retroviral protease cleavage sites reveals two types of cleavage sites and the structural requirements of the P1 amino acid

Journal of Biological Chemistry · 10.1016/s0021-9258(18)98720-x · 1991 · Vol 266 (22) · pp. 14539-14547

Author(s): S.C. Pettit, J. Simsic, D.D. Loeb, L. Everitt, C.A. Hutchison, ...

Keyword(s): Amino Acid; Cleavage Sites; Structural Requirements; Protease Cleavage; Protease Cleavage Sites; Retroviral Protease

Role of Amino Acid Sequences Flanking Dibasic Cleavage Sites in Precursor Proteolytic Processing

European Journal of Biochemistry · 10.1111/j.1432-1033.1995.0707p.x · 2008 · Vol 227 (3) · pp. 707-714 · Cited by ~2

Author(s): Mohamed Rholam, Noureddine Brakch, Doris Germain, David Y. Thomas, Christine Fahy, ...

Keyword(s): Amino Acid; Amino Acid Sequences; Proteolytic Processing; Cleavage Sites

Terminal amino acid sequences and proteolytic cleavage sites of mouse mammary tumor virus env gene products.

Journal of Virology · 10.1128/jvi.48.1.314-319.1983 · 1983 · Vol 48 (1) · pp. 314-319 · Cited by ~8

Author(s): L E Henderson, R Sowder, G Smythers, S Oroszlan

Keyword(s): Amino Acid; Mouse Mammary Tumor Virus; Amino Acid Sequences; Gene Products; Cleavage Sites; Terminal Amino Acid; Mammary Tumor Virus; Mouse Mammary Tumor; Tumor Virus; Terminal Amino

Validating amino acid variants in proteogenomics using sequence coverage by multiple reads

10.1101/2022.01.08.475497 · 2022

Author(s): Lev I. Levitsky, Ksenia Kuznetsova, Anna A. Kliuchnikova, Irina Y. Ilina, Anton O. Goncharov, ...

Keyword(s): Amino Acid; Nucleic Acid; Cell Line; Protein Sequence; Proteome Analysis; Amino Acid Sequences; Single Nucleotide Variants; Sequence Coverage; HEK 293; HEK 293 Cell Line

Mass spectrometry-based proteome analysis usually involves matching mass spectra of proteolytic peptides to amino acid sequences predicted from nucleic acid sequences. At the same time, because proteome-wide analysis is stochastic and only a fraction of peptides are selected for sequencing, the completeness of protein sequence identification is undermined. Likewise, the reliability of peptide variant identification in proteogenomic studies suffers. We propose a way to interpret shotgun proteomics results, specifically in data-dependent acquisition mode, as protein sequence coverage by multiple reads, just as is done in nucleic acid sequencing for calling single nucleotide variants. Multiple reads for each position in a sequence could be provided by overlapping distinct peptides, thus confirming the presence of certain amino acid residues in the overlapping stretch with a much lower false discovery rate than the conventional 1%. The sources of overlapping distinct peptides are, first, miscleaved tryptic peptides in combination with their properly cleaved counterparts and, second, peptides generated by several proteases with different specificities when the same specimen is subjected to parallel digestion and analyzed separately. We illustrate this approach using publicly available multiprotease proteomic datasets and our own data generated for HEK-293 cell line digests obtained using trypsin, LysC and GluC proteases. From 5,000 to 8,000 protein groups were identified for each digest, corresponding to up to 30% sequence coverage of the whole proteome. Most of this coverage was provided by a single read, while up to 7% of the observed protein sequences were covered two-fold or more. The proteogenomic analysis of the HEK-293 cell line revealed 36 peptide variants associated with SNPs, seven of which were supported by multiple reads.
The efficiency of the multiple reads approach depends strongly on the depth of proteome analysis and on digestion features such as the level of miscleavages, and it will increase with the number of different proteases used in parallel proteome digestion.
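The “coverage by multiple reads” idea can be illustrated by counting, for each residue, how many distinct identified peptides overlap it. This is a minimal sketch, not the authors' software: it uses exact string matching, ignores isoleucine/leucine ambiguity and modifications, and the function names, protein, and peptides are hypothetical.

```python
def coverage_depth(protein, peptides):
    """Per-residue count of distinct identified peptides covering each position."""
    depth = [0] * len(protein)
    for peptide in {p for p in peptides if p}:  # distinct, non-empty peptides
        start = protein.find(peptide)
        while start != -1:  # count every occurrence of the peptide
            for i in range(start, start + len(peptide)):
                depth[i] += 1
            start = protein.find(peptide, start + 1)
    return depth


def fraction_multiply_covered(depth, min_reads=2):
    """Fraction of residues supported by at least `min_reads` distinct peptides."""
    return sum(d >= min_reads for d in depth) / len(depth) if depth else 0.0


protein = "MKTAYIAKQR"  # toy sequence
peptides = ["MK", "TAYIAK", "QR", "TAYIAKQR"]  # the last one mimics a missed cleavage
depth = coverage_depth(protein, peptides)
print(depth, fraction_multiply_covered(depth))
```

Here the missed-cleavage peptide is what lifts eight of the ten residues to two-fold coverage, which mirrors the first source of overlapping peptides named in the abstract.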

Multiple Sites on SARS-CoV-2 Spike Protein are Susceptible to Proteolysis by Cathepsins B, K, L, S, and V

10.1101/2021.02.17.431617 · 2021

Author(s): Keval Bollavaram, Tiffanie H. Leeman, Maggie W. Lee, Akhil Kulkarni, Sophia G. Upshaw, ...

Keyword(s): Molecular Docking; Amino Acid; Viral Entry; Infection Process; Amino Acid Sequences; Spike Protein; Cleavage Sites; Docking Analysis; Computational Platform; Molecular Docking Analysis

Abstract SARS-CoV-2 is the coronavirus responsible for the COVID-19 pandemic. Proteases are central to the infection process of SARS-CoV-2. Cleavage of the spike protein on the viral envelope causes the conformational change that leads to membrane fusion and viral entry into the target cell. Since inhibiting one protease, even a dominant protease such as TMPRSS2, may not be sufficient to block SARS-CoV-2 entry into cells, other proteases that may play an activating role and hydrolyze the spike protein must be identified. Using PACMANS, a computational platform that identifies and ranks preferred sites of proteolytic cleavage on substrates, we identified amino acid sequences in all regions of the spike protein, including the S1/S2 region critical for activation and viral entry, that are susceptible to cleavage by furin and cathepsins B, K, L, S, and V. We verified these predictions with molecular docking analysis and immunoblotting to determine whether these proteases can bind the spike protein at the sites identified as possible cleavage sites. Together, this study highlights cathepsins B, K, L, S, and V for consideration in SARS-CoV-2 infection and presents methodologies by which other proteases can be screened for a role in viral entry. These additional proteases should be considered in COVID-19 studies, particularly regarding exacerbated damage in inflammatory preconditions where they are generally upregulated.
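PACMANS's internals are not described here, so the following is only a generic stand-in for “identifying and ranking preferred cleavage sites”: a sliding-window scan that scores every candidate scissile bond with a position-specific matrix and sorts the results. The toy matrix, the four-column P2-P1-P1'-P2' window, and the function names are hypothetical and do not reflect any real protease's specificity.

```python
def rank_cleavage_sites(protein, pssm, bond_offset=2):
    """Score every window of len(pssm) residues and rank candidate bonds.

    The scissile bond is assumed to lie after the first `bond_offset`
    residues of each window (P2-P1 | P1'-P2' for a 4-column matrix).
    Returns (score, cut, window) tuples, best first, where `cut` is the
    number of residues N-terminal to the bond. Unlisted residues score 0.
    """
    width = len(pssm)
    hits = []
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        score = sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))
        hits.append((score, start + bond_offset, window))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


# Hypothetical matrix: Arg rewarded at P2 and P1, Ser at P1'.
toy_pssm = [{"R": 1.0}, {"R": 2.0}, {"S": 1.0}, {}]
protein = "AARRSVAA"
hits = rank_cleavage_sites(protein, toy_pssm)
for score, cut, window in hits[:2]:
    print(f"{protein[:cut]}|{protein[cut:]}  score={score}  window={window}")
```

A scan like this only ranks sequence windows; the abstract's docking and immunoblotting steps address whether a protease can actually reach those sites in the folded spike protein.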

Molecular Characterization of Mycoplasma arthritidis Membrane Lipoprotein MAA1

Infection and Immunity · 10.1128/iai.68.2.437-442.2000 · 2000 · Vol 68 (2) · pp. 437-442 · Cited by ~15

Author(s): Leigh Rice Washburn, Elizabeth J. Miller, Keith E. Weaver

Keyword(s): Amino Acid; Stop Codon; Premature Stop Codon; Amino Acid Sequences; Signal Peptidase; Cleavage Sites; Mycoplasma Arthritidis; Genes Encoding; Amino Acid Signal

ABSTRACT Genes encoding the Mycoplasma arthritidis surface-exposed lipoprotein MAA1 were cloned and sequenced from MAA1-expressing strains 158p10p9 and PG6, from a low-adherence (LA) variant derived from 158p10p9 that expresses a truncated version of MAA1 (MAA1Δ), and from two MAA1-negative strains, 158 and H39. The deduced amino acid sequences of maa1 from 158p10p9 and PG6 predicted, respectively, 86.5- and 86.4-kDa basic, largely hydrophilic lipoproteins with 29-amino-acid signal peptides and predicted cleavage sites for signal peptidase II (Ala-Ala-Ala↓Cys). The truncation in the LA variant resulted from a G→T substitution at nucleotide 695, which created a premature stop codon. This, in turn, generated a predicted 26.6-kDa prolipoprotein (23.6 kDa after processing), consistent with an Mr of ∼24,000 calculated for MAA1Δ. Similarly, absence of MAA1 expression in H39 and 158 resulted from C→A substitutions at nucleotide 208, generating premature stop codons at that site in both strains.
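The mechanism described above, a single substitution creating an in-frame stop codon that truncates the protein, can be checked with a short sketch on toy sequences. One assumption worth flagging: Mycoplasma species use NCBI translation table 4, in which TGA is read as tryptophan, so only TAA and TAG terminate translation. The toy open reading frame and the function names are hypothetical; this is not the maa1 sequence or the authors' analysis.

```python
STANDARD_STOPS = {"TAA", "TAG", "TGA"}
# NCBI translation table 4 (used by Mycoplasma): TGA encodes Trp, not stop.
MYCOPLASMA_STOPS = {"TAA", "TAG"}


def substitute(cds, position, ref, alt):
    """Apply a point substitution at a 1-based nucleotide position."""
    index = position - 1
    if cds[index] != ref:
        raise ValueError(f"expected {ref} at position {position}, found {cds[index]}")
    return cds[:index] + alt + cds[index + 1:]


def first_stop_codon(cds, stops=MYCOPLASMA_STOPS):
    """1-based number of the first in-frame stop codon, or None if absent."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3].upper() in stops:
            return i // 3 + 1
    return None


cds = "ATGGAAAAATGGTAA"  # toy ORF: Met-Glu-Lys-Trp-stop
mutant = substitute(cds, 4, "G", "T")  # G->T turns codon 2 (GAA) into TAA
print(first_stop_codon(cds), first_stop_codon(mutant))
```

Passing `STANDARD_STOPS` instead shows why the genetic code matters: a TGA codon that Mycoplasma reads through would be called a premature stop under the standard code.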
