Deep profiling of protease substrate specificity enabled by dual random and scanned human proteome substrate phage libraries

2020 ◽  
Vol 117 (41) ◽  
pp. 25464-25475
Author(s):  
Jie Zhou ◽  
Shantao Li ◽  
Kevin K. Leung ◽  
Brian O’Donovan ◽  
James Y. Zou ◽  
...  

Proteolysis is a major posttranslational regulator of biology inside and outside of cells. Broad identification of optimal cleavage sites and natural substrates of proteases is critical for drug discovery and for understanding protease biology. Here, we present a method that couples two genetically encoded substrate phage display libraries with next-generation sequencing (SPD-NGS), allowing up to 10,000-fold deeper sequence coverage of the typical six- to eight-residue protease cleavage sites than state-of-the-art synthetic peptide libraries or proteomics. We applied SPD-NGS to two classes of proteases: the intracellular caspases and the ectodomains of the sheddases ADAMs 10 and 17. The first library (Lib 10AA) allowed us to identify 10⁴ to 10⁵ unique cleavage sites over a 1,000-fold dynamic range of NGS counts and produced consensus and optimal cleavage motifs based on position-specific scoring matrices. A second SPD-NGS library (Lib hP), which displayed virtually the entire human proteome tiled in contiguous 49-amino-acid sequences with 25-amino-acid overlaps, enabled us to identify candidate human proteome sequences. We identified up to 10⁴ natural linear cut sites, depending on the protease, captured most of the examples previously identified by proteomics, and predicted 10- to 100-fold more. Structural bioinformatics was used to facilitate the identification of candidate natural protein substrates. SPD-NGS is rapid, reproducible, simple to perform and analyze, inexpensive, and renewable, with unprecedented depth of coverage for substrate sequences. It is an important tool for protease biologists who need protease specificity data to design specific assays and inhibitors and to identify natural protein substrates.
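The tiling scheme described for Lib hP (49-residue windows sharing 25 residues) can be illustrated with a minimal Python sketch. The function name `tile_sequence` is hypothetical, and the handling of short proteins and of the C-terminal tail is an assumption, since the abstract does not say how the authors treated them.

```python
def tile_sequence(seq: str, window: int = 49, overlap: int = 25) -> list[tuple[int, str]]:
    """Split a protein sequence into overlapping tiles.

    Returns (start_index, tile) pairs. Consecutive tiles share `overlap`
    residues, so the step between tile starts is window - overlap.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap  # 24 residues for 49-mers with 25-residue overlaps
    if len(seq) <= window:
        # Assumption: proteins shorter than one window become a single tile.
        return [(0, seq)]
    starts = list(range(0, len(seq) - window + 1, step))
    # Assumption: add a final tile flush with the C-terminus so the tail is kept.
    if starts[-1] != len(seq) - window:
        starts.append(len(seq) - window)
    return [(s, seq[s:s + window]) for s in starts]


# Toy 100-residue sequence built from the 20 standard amino acid letters.
toy_protein = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(100))
tiles = tile_sequence(toy_protein)
```

With the default parameters, any cleavage site of six to eight residues falls entirely inside at least one tile, because neighbouring tiles overlap by more than the site length.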

Author(s):  
Jie Zhou ◽  
Shantao Li ◽  
Kevin K. Leung ◽  
Brian O’Donovan ◽  
James Y. Zou ◽  
...  

Proteolysis is a major post-translational regulator of biology both inside and outside of cells. Broad identification of optimal cleavage sites and natural substrates of proteases is critical for drug discovery and for understanding protease biology. Here we present a method that couples two genetically encoded substrate phage display libraries with next-generation sequencing (SPD-NGS), allowing up to 10,000-fold deeper sequence coverage of the typical 6- to 8-residue protease cleavage sites than state-of-the-art synthetic peptide libraries or proteomics. We applied SPD-NGS to two classes of proteases: the intracellular caspases 2, 3, 6, 7 and 8, and the ectodomains of the membrane sheddases ADAMs 10 and 17. The first library (Lib 10AA) was used to determine substrate cleavage motifs. Lib 10AA contains highly diverse randomized 10-mer substrate peptide sequences (10⁹ unique members) that were displayed monovalently on filamentous phage and bound to magnetic beads via an N-terminal biotin. The protease was allowed to cleave the SPD beads, and the released phage were subjected to up to three rounds of positive selection followed by next-generation sequencing (NGS). This allowed us to identify from 10⁴ to 10⁵ unique cleavage sites over a 1,000-fold dynamic range of NGS counts (ranging from 3 to 4,000) and produced consensus and optimal cleavage motifs, based on position-specific scoring matrices, that closely matched synthetic peptide data. A second SPD-NGS library (Lib hP) was constructed to identify candidate human proteome sequences. Lib hP displayed virtually the entire human proteome tiled in contiguous 49AA sequences with 25AA overlaps (nearly 1 million members). After three rounds of positive selection, we identified up to 10⁴ natural linear cut sites, depending on the protease, captured most of the examples previously identified by proteomics (ranging from 30 to 1,500), and predicted 10- to 100-fold more. Structural bioinformatics was used to facilitate the identification of candidate natural protein substrates. SPD-NGS is rapid, reproducible, simple to perform and analyze, inexpensive, and renewable, with unprecedented depth of coverage for substrate sequences. SPD-NGS is an important tool for protease biologists interested in protease specificity for specific assays and inhibitors, and it facilitates identification of natural protein substrates.
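To make the idea of deriving a cleavage motif from selected peptides concrete, the following Python sketch builds a generic log-odds position-specific scoring matrix from aligned peptides, optionally weighted by NGS counts. It is not the authors' pipeline: the uniform background, the pseudocount, the function names `build_pssm` and `consensus`, and the toy peptides are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, counts=None, pseudocount=1.0):
    """Return one dict per position mapping residue -> log2-odds score.

    Scores compare the (pseudocount-smoothed) residue frequency at each
    position with a uniform background of 1/20 (an assumption).
    """
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned and of equal length")
    weights = counts if counts is not None else [1] * len(peptides)
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        column = Counter()
        for pep, weight in zip(peptides, weights):
            column[pep[pos]] += weight
        # Only standard residues contribute to the column total.
        total = sum(column[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((column[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def consensus(pssm):
    """Pick the highest-scoring residue at each position."""
    return "".join(max(col, key=col.get) for col in pssm)


# Toy aligned peptides (invented for illustration, not data from the study).
toy_peptides = ["DEVDG", "DEVDS", "DQVDG", "AEVDG"]
toy_pssm = build_pssm(toy_peptides)
toy_consensus = consensus(toy_pssm)
```

Passing NGS read counts as `counts` weights abundant sequences more heavily; whether and how the original analysis weighted by counts is not stated in the abstract.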


2003 ◽  
Vol 77 (24) ◽  
pp. 13376-13388 ◽  
Author(s):  
Mark A. Jensen ◽  
Fu-Sheng Li ◽  
Angélique B. van ’t Wout ◽  
David C. Nickle ◽  
Daniel Shriner ◽  
...  

ABSTRACT Early in infection, human immunodeficiency virus type 1 (HIV-1) generally uses the CCR5 chemokine receptor (along with CD4) for cellular entry. In many HIV-1-infected individuals, viral genotypic changes arise that allow the virus to use CXCR4 (either in addition to CCR5 or alone) as an entry coreceptor. This switch has been associated with an acceleration of both CD3+ T-cell decline and progression to AIDS. While it is well known that the V3 loop of gp120 largely determines coreceptor usage and that positively charged residues in V3 play an important role, the process of genetic change in V3 leading to altered coreceptor usage is not well understood. Further, the methods for biological phenotyping of virus for research or clinical purposes are laborious, depend on sample availability, and present biosafety concerns, so reliable methods for sequence-based "virtual phenotyping" are desirable. We introduce a simple bioinformatic method of scoring V3 amino acid sequences that reliably predicts CXCR4 usage (sensitivity, 84%; specificity, 96%). This score (as determined on the basis of position-specific scoring matrices [PSSM]) can be interpreted as revealing a propensity to use CXCR4 as follows: known R5 viruses had low scores, R5X4 viruses had intermediate scores, and X4 viruses had high scores. Application of the PSSM scoring method to reconstructed virus phylogenies of 11 longitudinally sampled individuals revealed that the development of X4 viruses was generally gradual and involved the accumulation of multiple amino acid changes in V3. We found that X4 viruses were lost in two ways: by the dying off of an established X4 lineage or by mutation back to low-scoring V3 loops.
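How a PSSM score turns into a prediction with a sensitivity and a specificity can be sketched in Python as follows. This is only an illustration of the general scoring-and-thresholding idea: the two-position matrix, the toy sequences, the threshold, and the function names are all invented here and are not the published V3 matrix or data (real V3 loops are roughly 35 residues long).

```python
def pssm_score(seq, pssm, default=0.0):
    """Sum the per-position scores of a sequence; unseen residues score `default`."""
    return sum(col.get(aa, default) for aa, col in zip(seq, pssm))


def sensitivity_specificity(scores, is_positive, threshold):
    """Call a sequence positive when its score is >= threshold, then compare to labels."""
    tp = fp = tn = fn = 0
    for score, positive in zip(scores, is_positive):
        predicted = score >= threshold
        if predicted and positive:
            tp += 1
        elif predicted and not positive:
            fp += 1
        elif not predicted and positive:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical two-position matrix that rewards positively charged residues.
toy_matrix = [
    {"R": 1.0, "K": 1.0, "S": -1.0, "G": -0.5},
    {"R": 1.0, "K": 1.0, "D": -1.0, "E": -1.0},
]
# Hypothetical labelled sequences: True means the virus uses CXCR4.
toy_data = [("RK", True), ("KR", True), ("SK", True), ("SD", False), ("GE", False)]

toy_scores = [pssm_score(seq, toy_matrix) for seq, _ in toy_data]
toy_labels = [label for _, label in toy_data]
sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=1.0)
```

In this toy set the sequence "SK" scores 0.0 and is missed at a threshold of 1.0, which is the kind of intermediate-scoring case the abstract associates with R5X4 viruses.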


1994 ◽  
Vol 26 (1) ◽  
pp. 133-137 ◽  
Author(s):  
Moriyama Ryuichi ◽  
Nagatomi Yuji ◽  
Hoshino Fumihiko ◽  
Making Shio

2008 ◽  
Vol 227 (3) ◽  
pp. 707-714 ◽  
Author(s):  
Mohamed Rholam ◽  
Noureddine Brakch ◽  
Doris Germain ◽  
David Y. Thomas ◽  
Christine Fahy ◽  
...  

2022 ◽  
Author(s):  
Lev I. Levitsky ◽  
Ksenia Kuznetsova ◽  
Anna A. Kliuchnikova ◽  
Irina Y. Ilina ◽  
Anton O. Goncharov ◽  
...  

Mass spectrometry-based proteome analysis usually involves matching mass spectra of proteolytic peptides to amino acid sequences predicted from nucleic acid sequences. However, because the method is stochastic in proteome-wide analysis, where only a fraction of peptides are selected for sequencing, the completeness of protein sequence identification is undermined. The reliability of peptide variant identification in proteogenomic studies suffers likewise. We propose a way to interpret shotgun proteomics results, specifically in data-dependent acquisition mode, as protein sequence coverage by multiple reads, just as is done in nucleic acid sequencing for the calling of single nucleotide variants. Multiple reads for each position in a sequence can be provided by overlapping distinct peptides, thus confirming the presence of certain amino acid residues in the overlapping stretch with a much lower false discovery rate than the conventional 1%. The sources of overlapping distinct peptides are, first, miscleaved tryptic peptides in combination with their properly cleaved counterparts and, second, peptides generated by several proteases with different specificities when the same specimen is subjected to parallel digestions that are analyzed separately. We illustrate this approach using publicly available multiprotease proteomic datasets and our own data generated for HEK-293 cell line digests obtained using trypsin, LysC and GluC proteases. Between 5,000 and 8,000 protein groups were identified for each digest, corresponding to up to 30% coverage of the whole proteome. Most of this coverage was provided by a single read, while up to 7% of the observed protein sequences were covered twofold or more. The proteogenomic analysis of the HEK-293 cell line revealed 36 peptide variants associated with SNPs, seven of which were supported by multiple reads. The efficiency of the multiple reads approach depends strongly on the depth of proteome analysis and on digestion features such as the level of miscleavage, and it will increase with the number of different proteases used in parallel proteome digestion.
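The notion of covering a protein sequence with multiple reads can be sketched in Python by counting, for every residue, how many distinct identified peptides span it. This is a simplified illustration under stated assumptions: peptides are located by exact substring matching, the function names `coverage_depth` and `fraction_covered` are hypothetical, and the toy protein and peptides are invented rather than taken from the datasets described above.

```python
def coverage_depth(protein, peptides):
    """Return a per-residue list counting the distinct peptides covering each position."""
    depth = [0] * len(protein)
    for pep in set(peptides):  # each distinct peptide counts as one read
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                depth[i] += 1
            start = protein.find(pep, start + 1)
    return depth


def fraction_covered(depth, min_reads):
    """Fraction of residues covered by at least `min_reads` distinct peptides."""
    if not depth:
        return 0.0
    return sum(d >= min_reads for d in depth) / len(depth)


# Toy example: a properly cleaved tryptic peptide, its miscleaved counterpart,
# and a third non-overlapping peptide (all invented for illustration).
toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"
toy_peptides = ["TAYIAK", "TAYIAKQR", "QISFVK"]

toy_depth = coverage_depth(toy_protein, toy_peptides)
single_read = fraction_covered(toy_depth, 1)    # covered at least once
multiple_reads = fraction_covered(toy_depth, 2)  # covered twofold or more
```

Here the miscleaved peptide "TAYIAKQR" supplies a second read for the six residues of "TAYIAK", which mirrors the first source of overlapping peptides named in the abstract.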


2021 ◽  
Author(s):  
Keval Bollavaram ◽  
Tiffanie H. Leeman ◽  
Maggie W. Lee ◽  
Akhil Kulkarni ◽  
Sophia G. Upshaw ◽  
...  

SARS-CoV-2 is the coronavirus responsible for the COVID-19 pandemic. Proteases are central to the infection process of SARS-CoV-2. Cleavage of the spike protein on the virus's capsid causes the conformational change that leads to membrane fusion and viral entry into the target cell. Since inhibition of one protease, even a dominant protease like TMPRSS2, may not be sufficient to block SARS-CoV-2 entry into cells, other proteases that may play an activating role and hydrolyze the spike protein must be identified. Using PACMANS, a computational platform that identifies and ranks preferred sites of proteolytic cleavage on substrates, we identified amino acid sequences in all regions of the spike protein, including the S1/S2 region critical for activation and viral entry, that are susceptible to cleavage by furin and cathepsins B, K, L, S, and V. We verified the predictions with molecular docking analysis and immunoblotting to determine whether these proteases can bind the spike protein at the sites identified as possible cleavage sites. Together, this study highlights cathepsins B, K, L, S, and V for consideration in SARS-CoV-2 infection and presents methodologies by which other proteases can be screened to determine a role in viral entry. These are additional proteases to consider in COVID-19 studies, particularly regarding exacerbated damage in inflammatory preconditions where these proteases are generally upregulated.


2000 ◽  
Vol 68 (2) ◽  
pp. 437-442 ◽  
Author(s):  
Leigh Rice Washburn ◽  
Elizabeth J. Miller ◽  
Keith E. Weaver

ABSTRACT Genes encoding the Mycoplasma arthritidis surface-exposed lipoprotein MAA1 were cloned and sequenced from MAA1-expressing strains 158p10p9 and PG6, from a low-adherence (LA) variant derived from 158p10p9 that expresses a truncated version of MAA1 (MAA1Δ), and from two MAA1-negative strains, 158 and H39. The deduced amino acid sequences of maa1 from 158p10p9 and PG6 predicted, respectively, 86.5- and 86.4-kDa basic, largely hydrophilic lipoproteins with 29-amino-acid signal peptides and predicted cleavage sites for signal peptidase II (Ala-Ala-Ala↓Cys). The truncation in the LA variant resulted from a G→T substitution at nucleotide 695, which created a premature stop codon. This, in turn, generated a predicted 26.6-kDa prolipoprotein (23.6 kDa after processing), consistent with an Mr of ∼24,000 calculated for MAA1Δ. Similarly, absence of MAA1 expression in H39 and 158 resulted from C→A substitutions at nucleotide 208, generating premature stop codons at that site in both strains.

