sequence coverage
Recently Published Documents





2022 ◽  
Fred Lee ◽  
Xinhao Shao ◽  
Yu Gao ◽  
Alexandra Naba

The extracellular matrix (ECM) is a complex and dynamic meshwork of proteins providing structural support to cells. It also provides biochemical signals governing cellular processes including proliferation and migration. Alterations of ECM structure and/or composition have been shown to lead to, or accompany, many pathological processes including cancer and fibrosis. To understand how the ECM contributes to diseases, we first need to obtain a comprehensive characterization of the ECM of tissues and of its changes during disease progression. Over the past decade, mass-spectrometry-based proteomics has become the state-of-the-art method to profile the protein composition of ECMs. However, existing methods do not fully capture the broad dynamic range of protein abundance in the ECM, nor do they achieve the high coverage needed to gain finer biochemical information, including the presence of isoforms or post-translational modifications. In addition, broadly adopted proteomic methods relying on extended trypsin digestion do not provide structural information on ECM proteins, yet gaining insight into ECM protein structure is critical to better understanding protein function. Here, we present the optimization of a time-lapsed proteomic method using limited proteolysis of partially denatured samples and the sequential release of peptides to achieve superior sequence coverage as compared to a standard ECM proteomic workflow. Exploiting the spatio-temporal resolution of this method, we further demonstrate how 3-dimensional time-lapsed peptide mapping can identify protein regions differentially susceptible to trypsin and can thus identify sites of post-translational modifications, including protein-protein interactions. We further illustrate how this approach can be leveraged to gain insight into the role of the novel ECM protein SNED1 in ECM homeostasis.
We found that the expression of SNED1 by mouse embryonic fibroblasts results in the alteration of overall ECM composition and of the sequence coverage of certain ECM proteins, raising the possibility that SNED1 could modify accessibility to trypsin by engaging in protein-protein interactions.
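
The time-lapsed peptide mapping described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy protein sequence, the peptides, and the time points (in minutes) are all hypothetical, and peptides are placed by exact string matching only. For each residue, the sketch records the earliest time point at which a released peptide covers it, so that early-covered regions can be read as more accessible to trypsin.

```python
from __future__ import annotations


def first_covered(protein: str, peptides_by_time: dict[float, list[str]]) -> list[float | None]:
    """For each residue, return the earliest time point at which any peptide covers it."""
    first: list[float | None] = [None] * len(protein)
    for t in sorted(peptides_by_time):
        for pep in peptides_by_time[t]:
            if not pep:
                continue
            start = protein.find(pep)
            while start != -1:  # mark every occurrence of the peptide
                for i in range(start, start + len(pep)):
                    if first[i] is None:
                        first[i] = t
                start = protein.find(pep, start + 1)
    return first


def cumulative_coverage(first: list[float | None], t: float) -> float:
    """Fraction of residues covered by peptides released at or before time t."""
    if not first:
        return 0.0
    return sum(x is not None and x <= t for x in first) / len(first)


# Hypothetical toy data: a 16-residue sequence and three digestion time points.
toy_protein = "MKTAYIAKQRQISFVK"
toy_release = {0.5: ["TAYIAK"], 2.0: ["QISFVK"], 18.0: ["TAYIAKQR"]}
toy_first = first_covered(toy_protein, toy_release)
for t in sorted(toy_release):
    print(t, cumulative_coverage(toy_first, t))
```

With this toy input the cumulative coverage rises from 37.5% to 75% to 87.5% across the three time points; the two N-terminal residues are never covered.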

2022 ◽  
Xinhao Shao ◽  
Christopher Grams ◽  
Yu Gao

Protein structure is connected with function and interaction and plays an extremely important role in protein characterization. As one of the most important analytical methods for protein characterization, proteomics is widely used to determine protein composition, quantitation, interaction, and even structures. However, because of the gap between the proteins identified by proteomics and the available 3D structures, it has been very challenging, if not impossible, to visualize proteomics results in 3D and further explore the structural aspects of proteomics experiments. Recently, two groups of researchers from DeepMind and the Baker lab have independently published protein structure prediction tools that can help us obtain predicted protein structures for the whole human proteome. Although there is still debate on the validity of some of the predicted structures, there is no doubt that these represent the most accurate predictions to date. More importantly, this enabled us to visualize the majority of human proteins for the first time. To help other researchers best utilize these protein structure predictions, we present the Sequence Coverage Visualizer (SCV), a web application for 3D visualization of protein sequence coverage. Here we show a few possible uses of the SCV, including the labeling of post-translational modifications and isotope labeling experiments. These results highlight the usefulness of such 3D visualization for proteomics experiments and how SCV can turn a regular result list into structural insights. Furthermore, when used together with limited proteolysis, we demonstrate that SCV can help validate and compare different protein structures, including predicted ones and existing PDB entries. By performing limited proteolysis on native proteins at various time points, SCV can visualize the progress of the digestion. This time-series data further allowed us to compare the predicted structure and existing PDB entries.
Although not deterministic, these comparisons could be used to refine current predictions further and represent an important step towards a complete and correct protein structure database. Overall, SCV is a convenient and powerful tool for visualizing proteomics results.

2022 ◽  
Lev I. Levitsky ◽  
Ksenia Kuznetsova ◽  
Anna A. Kliuchnikova ◽  
Irina Y. Ilina ◽  
Anton O. Goncharov ◽  

Mass spectrometry-based proteome analysis usually implies matching mass spectra of proteolytic peptides to amino acid sequences predicted from nucleic acid sequences. At the same time, due to the stochastic nature of the method when it comes to proteome-wide analysis, in which only a fraction of peptides are selected for sequencing, the completeness of protein sequence identification is undermined. Likewise, the reliability of peptide variant identification in proteogenomic studies suffers. We propose a way to interpret shotgun proteomics results, specifically in data-dependent acquisition mode, as protein sequence coverage by multiple reads, just as is done in the field of nucleic acid sequencing for the calling of single nucleotide variants. Multiple reads for each position in a sequence could be provided by overlapping distinct peptides, thus confirming the presence of certain amino acid residues in the overlapping stretch with a much lower false discovery rate than the conventional 1%. The sources of overlapping distinct peptides are, first, miscleaved tryptic peptides in combination with their properly cleaved counterparts, and, second, peptides generated by several proteases with different specificities after the same specimen is subjected to parallel digestion and analyzed separately. We illustrate this approach using publicly available multiprotease proteomic datasets and our own data generated for HEK-293 cell line digests obtained using trypsin, LysC and GluC proteases. From 5000 to 8000 protein groups are identified for each digest, corresponding to up to 30% of the whole proteome coverage. Most of this coverage was provided by a single read, while up to 7% of the observed protein sequences were covered two-fold or more. The proteogenomic analysis of the HEK-293 cell line revealed 36 peptide variants associated with SNPs, seven of which were supported by multiple reads.
The efficiency of the multiple reads approach depends strongly on the depth of proteome analysis and on digestion features such as the level of miscleavage, and will increase with the number of different proteases used in parallel proteome digestion.
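
The idea of treating overlapping distinct peptides as multiple reads can be sketched as a per-residue depth count. This is an illustrative sketch only, not the authors' software: the function names and the toy sequence are hypothetical, peptides are located by exact string matching, and no scoring or false discovery rate estimation is attempted.

```python
from __future__ import annotations


def coverage_depth(protein: str, peptides: list[str]) -> list[int]:
    """Count, for every residue, how many distinct peptides cover it."""
    depth = [0] * len(protein)
    for pep in set(peptides):  # identical peptide sequences count as one read
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:  # handle peptides that occur more than once
            for i in range(start, start + len(pep)):
                depth[i] += 1
            start = protein.find(pep, start + 1)
    return depth


def fraction_covered(depth: list[int], min_reads: int = 1) -> float:
    """Fraction of residues supported by at least min_reads distinct peptides."""
    if not depth:
        return 0.0
    return sum(d >= min_reads for d in depth) / len(depth)


# Hypothetical toy data: a properly cleaved tryptic peptide, its miscleaved
# counterpart, and a second tryptic peptide.
toy_protein = "MKTAYIAKQRQISFVK"
toy_peptides = ["TAYIAK", "TAYIAKQR", "QISFVK"]
toy_depth = coverage_depth(toy_protein, toy_peptides)
print(toy_depth)                       # per-residue read depth
print(fraction_covered(toy_depth, 1))  # covered at least once
print(fraction_covered(toy_depth, 2))  # covered two-fold or more
```

In this toy case 87.5% of residues are covered at least once, but only the stretch shared by the cleaved and miscleaved peptides (37.5%) is covered two-fold.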

2022 ◽  
Hosoon Choi ◽  
Munok Hwang ◽  
Dhammika Navarathna ◽  
Jing Xu ◽  
Janell Lukey ◽  

Whole-genome sequencing (WGS) of SARS-CoV-2 has been performed extensively and is playing a crucial role in the fight against the COVID-19 pandemic. Obtaining sufficient WGS data from clinical samples is often challenging, especially from samples with low viral load. We evaluated two SARS-CoV-2 sequencing protocols for their efficiency, accuracy, and limitations. Sequence coverage of >95% was obtained by the Swift normalase amplicon SARS-CoV-2 panels (SNAP) protocol for all samples with Ct ≤ 35 and by the COVIDSeq protocol for 97% of samples with Ct ≤ 30. Sample RNA quantitation obtained using digital PCR provided more precise cutoff values. The quantitative digital PCR cutoff values for obtaining 95% coverage are 10.5 copies/μL for the SNAP protocol and 147 copies/μL for the COVIDSeq protocol. Combining the FASTQ files obtained from the two protocols improved the outcome of sequence analysis by compensating for missing amplicon regions. This process resulted in an increase in sequencing coverage and lineage call precision.
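
The benefit of pooling reads from two protocols can be sketched with per-base depth vectors. The sketch assumes that depth has already been computed per genome position for each protocol; the function names, the toy depth values, and the minimum depth threshold of 10 are assumptions for illustration and are not taken from the study.

```python
from __future__ import annotations


def breadth(depth: list[int], min_depth: int = 10) -> float:
    """Fraction of positions with at least min_depth reads (assumed threshold)."""
    if not depth:
        return 0.0
    return sum(d >= min_depth for d in depth) / len(depth)


def combine(depth_a: list[int], depth_b: list[int]) -> list[int]:
    """Per-position depth after pooling reads from two protocols."""
    if len(depth_a) != len(depth_b):
        raise ValueError("depth vectors must cover the same reference positions")
    return [a + b for a, b in zip(depth_a, depth_b)]


# Hypothetical depths over ten positions; zeros mimic amplicon dropouts.
toy_protocol_a = [50, 40, 0, 0, 30, 25, 60, 45, 20, 15]
toy_protocol_b = [30, 0, 20, 35, 0, 40, 10, 0, 25, 5]
toy_pooled = combine(toy_protocol_a, toy_protocol_b)
print(breadth(toy_protocol_a), breadth(toy_protocol_b), breadth(toy_pooled))
```

Here each protocol alone leaves gaps (80% and 60% breadth), while the pooled depth covers every position because the dropouts fall in different regions.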

2021 ◽  
Florian Pfaff ◽  
Angele Breithaupt ◽  
Dennis Rubbenstroth ◽  
Sina Nippert ◽  
Christina Baumbach ◽  

Rustrela virus (RusV, species Rubivirus strelense) is a recently discovered relative of rubella virus (RuV) that has been detected in cases of encephalitis across a wide spectrum of mammals, including placental and marsupial animals. Here we diagnosed two additional cases of fatal RusV-associated meningoencephalitis in a South American coati (Nasua nasua) and a Eurasian otter (Lutra lutra) that were detected in a zoological garden with a history of prior RusV infections. Both animals showed abnormal movement or unusual behaviour and their brains tested positive for RusV using specific RT-qPCR and RNA in situ hybridization. As previous sequencing of RusV proved to be very challenging, we employed a sophisticated target-specific capture enrichment with specifically designed RNA baits to generate complete RusV genome sequences from both detected encephalitic animals and apparently healthy wild yellow-necked field mice (Apodemus flavicollis). Furthermore, the technique was used to revise three previously published RusV genomes from two encephalitic animals and a wild yellow-necked field mouse. Virus-to-host sequence ratio and thereby sequence coverage improved markedly using the enrichment method as compared to standard procedures. When comparing the newly generated RusV sequences to the previously published RusV genomes, we identified a previously undetected stretch of 309 nucleotides predicted to represent the intergenic region and the sequence encoding the N-terminus of the capsid protein. This indicated that the original RusV sequence was likely incomplete due to misassembly of the genome at a region with an exceptionally high G+C content of >80 mol%, which could not be resolved even by enormous sequencing efforts with standard methods. The updated capsid protein amino acid sequence now resembles those of RuV and ruhugu virus in size and harbours a predicted RNA binding domain that was not encoded in the original RusV genome version.
The new sequence data indicate that RusV has the largest overall genome (9,631 nucleotides), intergenic region (290 nucleotides) and capsid protein-encoding sequence (331 codons) within the genus Rubivirus.

2021 ◽  
Vol 5 (2) ◽  
pp. 24
Dino Pećar ◽  
Ivana Čeko ◽  
Lana Salihefendić ◽  
Rijad Konjhodžić

Monitoring of SARS-CoV-2 lineages is as important in the fight against the COVID-19 epidemic as regular RT-PCR testing. The Ion AmpliSeq Library Kit Plus is a robust and validated protocol for library preparation, but certain optimizations were required for better sequencing results. Clinical SARS-CoV-2 samples were transported in three different viral transport media (VTMs); on arrival at the testing lab, samples were stored at -20 °C. Viral RNA isolation was done on an automatic extractor using a magnetic bead-based protocol. Screening for positive SARS-CoV-2 samples was performed by RT-PCR with an IVD-certified detection kit. This study presents the following results: the impact of first PCR cycle variation on library quantity, a comparison of VTMs with a quantified library, the maximum storage time of the virus, and the correlation between the cDNA synthesis kit used and the generated target base coverage. Our results confirmed the adequacy of the three tested VTMs for SARS-CoV-2 whole-genome sequencing. The tested cDNA synthesis kits are valid for NGS library preparation, and all kits give good-quality cDNA that is uniform in viral sequence coverage. The results of this report are useful for applied scientists who work on SARS-CoV-2 whole-genome sequencing to compare and apply good laboratory practice for optimal preparation of the NGS library.

Scientifica ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-17
Tessa Sjahriani ◽  
Eddy Bagus Wasito ◽  
Wiwiek Tyasningsih

Bacteriophages could be a good strategy against food-borne disease caused by Escherichia coli. Porins are a type of β-barrel protein with diffusion channels, and OmpA, which has a role in hydrophilic transport, is the most frequent porin in E. coli; it was also chosen as the potential receptor of the phage. Rz/Rz1 is involved in the disruption of the host bacterial outer membrane. This study aimed to analyze the amino acid composition of OmpA and Rz/Rz1 of lytic bacteriophages from Surabaya, Indonesia. This study used a sample of 8 bacteriophages from a previous study. OmpA was analyzed by mass spectrometry. Rz/Rz1 was analyzed using PCR, DNA sequencing, Expasy Translation, and Expasy ProtParam. The analysis yielded 10% to 29% sequence coverage of OmpA, carrying the ligand-binding site. The Rz/Rz1 gene shares a high percentage of identity, 97.04% to 98.89%, with the Siphoviridae isolate ctTwQ4, partial genome, and the Myoviridae isolate cthRA4, partial genome. Mann–Whitney tests indicate significant differences between OmpA and Rz/Rz1 in Alanine, Aspartate, Glycine, Proline, Serine (p = 0.011), Asparagine, Cysteine (p = 0.009), Isoleucine (p = 0.043), Lysine (p = 0.034), Methionine (p = 0.001), Threonine (p = 0.018), and Tryptophan (p = 0.007). The conclusion of this study is that OmpA acts as the receptor for Phage 1, Phage 2, Phage 3, Phage 5, and Phage 6, as its peptide composition comprises the ligand-binding site, and that Rz/Rz1 participates in host bacterial lysis.

2021 ◽  
István Csabai ◽  
Krisztián Papp ◽  
Dávid Visontai ◽  
József Stéger ◽  
Norbert Solymosi

The COVID-19 pandemic has been going on for two years now and although many hypotheses have been put forward, its origin remains obscure. We investigated whether samples in the huge public sequencing data archives that were collected earlier than the earliest known cases of the pandemic might contain traces of SARS-CoV-2. Here we report the bioinformatic analysis of a metagenome sample set collected from soil on King George Island, Antarctica between 2018-12-24 and 2019-01-13. It contains sequence fragments matching the SARS-CoV-2 reference genome with altogether more than half a million nucleotides, covering the complete genome on average 17×. Preliminary phylogeny analysis places the sample close to the known earliest cases. The high sequence coverage rules out chance alignments from other species but possible laboratory contamination cannot be excluded. The sequence harbours a unique combination of mutations, unseen in other samples, so whatever its origin, it can add an important piece of information to the puzzle of the ongoing pandemic.
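
The relation between the total matched nucleotides and the reported average depth can be checked with simple arithmetic: mean depth is the number of aligned bases divided by the genome length. The sketch below assumes the 29,903-nucleotide length of the SARS-CoV-2 reference genome (NC_045512.2), a figure that is not stated in the abstract; the function name is hypothetical.

```python
from __future__ import annotations

# Assumption: length of the SARS-CoV-2 reference genome NC_045512.2.
REFERENCE_LENGTH_NT = 29_903


def mean_depth(total_aligned_nt: int, genome_length_nt: int = REFERENCE_LENGTH_NT) -> float:
    """Average number of times each genome position is covered."""
    if genome_length_nt <= 0:
        raise ValueError("genome length must be positive")
    return total_aligned_nt / genome_length_nt


# Half a million aligned nucleotides correspond to roughly 16.7x,
# and a 17x average requires about 508,000 aligned nucleotides.
print(round(mean_depth(500_000), 1))
print(17 * REFERENCE_LENGTH_NT)
```

Both numbers are consistent with the abstract's statement of "more than half a million nucleotides" and an average of 17×.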

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 961
Kevin McKernan ◽  
Liam Kane ◽  
Yvonne Helbert ◽  
Lei Zhang ◽  
Nathan Houde ◽  

The Psilocybe genus is well known for the synthesis of valuable psychoactive compounds such as psilocybin, psilocin, baeocystin and aeruginascin. The ubiquity of psilocybin synthesis in Psilocybe has been attributed to a horizontal gene transfer mechanism of a ~20 kb gene cluster. A recently published highly contiguous reference genome derived from long-read single-molecule sequencing has underscored interesting variation in this psilocybin synthesis gene cluster. This reference genome has also enabled the shotgun sequencing of spores from many Psilocybe strains to better catalog the genomic diversity in the psilocybin synthesis pathway. Here we present the de novo assembly of 81 Psilocybe genomes compared to the P. envy reference genome. Surprisingly, the genomes of Psilocybe galindoi, Psilocybe tampanensis and Psilocybe azurescens lack sequence coverage over the previously described psilocybin synthesis pathway but do demonstrate amino acid sequence homology to a less contiguous gene cluster and may illuminate the previously proposed evolution of psilocybin synthesis.

Lin Wang ◽  
Jixuan Yang ◽  
Hong Zhang ◽  
Qin Tao ◽  
Yuxin Zhang ◽  
