sequence bias

2022 ◽  
Author(s):  
Artis Linārs ◽  
Ivars Silamikelis ◽  
Dita Gudra ◽  
Ance Roga ◽  
Dāvids Fridmanis

For decades, improving naturally occurring proteins and creating novel ones has been the primary goal of many practical biotechnology researchers, and it is widely recognized that randomization of protein sequences, coupled with various effect-screening methodologies, is one of the most powerful techniques for acquiring desired improvements quickly, efficiently, and purposefully. Considerable advances have been made in this field over the years; however, the development of PCR-based or template-guided methodologies has been hampered by the resulting template sequence bias. In this article we present a novel approach based on whole-plasmid amplification, which we named OverFlap PCR, for randomizing virtually any region of plasmid DNA without introducing this bias.


2021 ◽  
Vol 8 ◽  
Author(s):  
Paul Riggs ◽  
George Blundell-Hunter ◽  
Joanna Hagelberger ◽  
Guoping Ren ◽  
Laurence Ettwiller ◽  
...  

Transposable elements (TEs) are mobile genetic elements present in all domains of life. They commonly encode a single transposase enzyme that performs the excision and reintegration reactions, and these enzymes have been used in mutagenesis and in the creation of next-generation sequencing libraries. All transposases have some bias in the DNA sequence they bind to when reintegrating the TE DNA. We sought to identify a transposase that showed minimal sequence bias and could be produced recombinantly, using information from the literature and a novel bioinformatic analysis, resulting in the selection of the hATx-6 transposase from Hydra vulgaris (aka Hydra magnipapillata) for further study. This transposase was tested and shown to be active both in vitro and in vivo, and we were able to demonstrate very low sequence bias in its integration preference. This transposase could be an excellent candidate for use in biotechnology, such as the creation of next-generation sequencing libraries.


2021 ◽  
Author(s):  
Saulius Vainauskas ◽  
Hélène Guntz ◽  
Elizabeth McLeod ◽  
Colleen McClung ◽  
Cristian Ruse ◽  
...  

Abstract Analysis of mucin-type O-glycans linked to serine/threonine of glycoproteins is technically challenging, in part due to a lack of effective enzymatic tools that enable their analysis. Recently, several O-glycan-specific endoproteases that can cleave the protein adjacent to the appended glycan have been described. Despite significant progress in understanding the biochemistry of these enzymes, known O-glycoproteases have specificity constraints, such as inefficient cleavage of glycoproteins bearing sialylated O-glycans, high selectivity for certain types of glycoproteins, or protein sequence bias, that limit their analytical application. In this study, we examined the capabilities of an immunomodulating metalloprotease (IMPa) from Pseudomonas aeruginosa. The peptide substrate sequence selectivity and its impact on IMPa activity were interrogated using an array of synthetic peptides and their glycoforms. We show that IMPa has no specific P1 residue preference and can tolerate most amino acids at the P1 position, except aspartic acid. The enzyme does not cleave between two adjacent O-glycosites, indicating that O-glycosylated serine/threonine is not allowed at position P1. Glycopeptides with as few as two amino acids on either side of an O-glycosite were specifically cleaved by IMPa. Finally, IMPa efficiently cleaved peptides and proteins carrying sialylated and asialylated O-glycans of varying complexity. We present the use of IMPa in a one-step O-glycoproteomics workflow for glycoprofiling of the individual purified glycoproteins granulocyte colony-stimulating factor (G-CSF) and receptor-type tyrosine-protein phosphatase C (CD45), without the need for glycopeptide enrichment. In these examples, IMPa enabled identification of O-glycosites and the range of complex O-glycan structures at each site.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Nicholas Stoler ◽  
Anton Nekrutenko

Abstract Sequencing technology has achieved great advances in the past decade. Studies have previously shown the quality of specific instruments in controlled conditions. Here, we developed a method able to retroactively determine the error rate of most public sequencing datasets. To do this, we utilized the overlaps between reads that are a feature of many sequencing libraries. With this method, we surveyed 1943 different datasets from seven different sequencing instruments produced by Illumina. We show that among public datasets, the more expensive platforms like HiSeq and NovaSeq have a lower error rate and less variation. But we also discovered that there is great variation within each platform, with the accuracy of a sequencing experiment depending greatly on the experimenter. We show the importance of sequence context, especially the phenomenon where preceding bases bias the following bases toward the same identity. We also show the difference in patterns of sequence bias between instruments. Contrary to expectations based on the underlying chemistry, HiSeq X Ten and NovaSeq 6000 share notable exceptions to the preceding-base bias. Our results demonstrate the importance of the specific circumstances of every sequencing experiment, and the importance of evaluating the quality of each one.
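The overlap idea in this abstract can be illustrated with a short sketch. It is not the authors' implementation: the function names, the assumption that the overlap length is already known, and the simple mismatch-fraction metric are all hypothetical choices made for illustration. When the two mates of a read pair cover the same fragment positions, any disagreement between them implies that at least one read contains an error.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def overlap_mismatch_rate(read1, read2, overlap_len):
    """Fraction of overlapping positions where the two mates disagree.

    Assumption (illustrative only): the overlap length is known in advance,
    and the last `overlap_len` bases of read1 cover the same fragment
    positions as the first `overlap_len` bases of read2's reverse complement.
    """
    if overlap_len <= 0:
        raise ValueError("overlap_len must be positive")
    if overlap_len > min(len(read1), len(read2)):
        raise ValueError("overlap_len exceeds read length")
    tail = read1.upper()[-overlap_len:]
    head = reverse_complement(read2)[:overlap_len]
    mismatches = sum(a != b for a, b in zip(tail, head))
    return mismatches / overlap_len


# Toy fragment shorter than twice the read length, so the mates overlap.
fragment = "ACGTTGCAAGGCT"
read_len = 8
read1 = fragment[:read_len]
read2 = reverse_complement(fragment)[:read_len]
overlap = 2 * read_len - len(fragment)
rate = overlap_mismatch_rate(read1, read2, overlap)
```

A mismatch only shows that one of the two mates is wrong, not which one, so this fraction is a rough proxy rather than a per-base error rate.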


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Nathaniel P. Delos Santos ◽  
Lorane Texari ◽  
Christopher Benner

Abstract Background Motif enrichment analysis (MEA) identifies over-represented transcription factor (TF) binding motifs in the DNA sequence of regulatory regions, enabling researchers to infer which transcription factors can regulate transcriptional response to a stimulus, or identify sequence features found near a target protein in a ChIP-seq experiment. Score-based MEA determines motifs enriched in regions exhibiting extreme differences in regulatory activity, but existing methods do not control for biases in GC content or dinucleotide composition. This lack of control for sequence bias, such as that often found in CpG islands, can obscure the enrichment of biologically relevant motifs. Results We developed Motif Enrichment In Ranked Lists of Peaks (MEIRLOP), a novel MEA method that determines enrichment of TF binding motifs in a list of scored regulatory regions, while controlling for sequence bias. In this study, we compare MEIRLOP against other MEA methods in identifying binding motifs found enriched in differentially active regulatory regions after interferon-beta stimulus, finding that using logistic regression and covariates improves the ability to call enrichment of ISGF3 binding motifs from differential acetylation ChIP-seq data compared to other methods. Our method achieves similar or better performance compared to other methods when quantifying the enrichment of TF binding motifs from ENCODE TF ChIP-seq datasets. We also demonstrate how MEIRLOP is broadly applicable to the analysis of numerous types of NGS assays and experimental designs. Conclusions Our results demonstrate the importance of controlling for sequence bias when accurately identifying enriched DNA sequence motifs using score-based MEA. MEIRLOP is available for download from https://github.com/npdeloss/meirlop under the MIT license.
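The sequence-bias covariates named in this abstract, GC content and dinucleotide composition, can be computed per region with a few lines of code. The sketch below is illustrative only and does not use MEIRLOP's API; the function names are hypothetical, and the regression step that would consume these covariates is left out.

```python
def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def dinucleotide_frequency(seq, dinuc="CG"):
    """Fraction of adjacent base pairs (overlapping windows) equal to `dinuc`."""
    seq = seq.upper()
    windows = len(seq) - 1
    if windows <= 0:
        return 0.0
    return sum(seq[i:i + 2] == dinuc for i in range(windows)) / windows


# Hypothetical regulatory regions; each row of covariates could be passed,
# alongside a motif-presence indicator, to a regression model of the scores.
regions = ["ACGCGTTA", "TTATAAAT", "GCGCGCGC"]
covariates = [(gc_content(r), dinucleotide_frequency(r, "CG")) for r in regions]
```

Including such covariates lets a model attribute score differences to base composition before crediting a motif, which is the sense in which the abstract speaks of controlling for sequence bias.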


2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i276-i284 ◽  
Author(s):  
Zichao Yan ◽  
William L Hamilton ◽  
Mathieu Blanchette

Abstract Motivation RNA-protein interactions are key effectors of post-transcriptional regulation. Significant experimental and bioinformatics efforts have been expended on characterizing protein binding mechanisms on the molecular level, and on highlighting the sequence and structural traits of RNA that impact the binding specificity for different proteins. Yet our ability to predict these interactions in silico remains relatively poor. Results In this study, we introduce RPI-Net, a graph neural network approach for RNA-protein interaction prediction. RPI-Net learns and exploits a graph representation of RNA molecules, yielding significant performance gains over existing state-of-the-art approaches. We also introduce an approach to rectify an important type of sequence bias caused by the RNase T1 enzyme used in many CLIP-Seq experiments, and we show that correcting this bias is essential in order to learn meaningful predictors and properly evaluate their accuracy. Finally, we provide new approaches to interpret the trained models and extract simple, biologically interpretable representations of the learned sequence and structural motifs. Availability and implementation Source code can be accessed at https://www.github.com/HarveyYan/RNAonGraph. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Zichao Yan ◽  
William L. Hamilton ◽  
Mathieu Blanchette

Abstract Motivation RNA-protein interactions are key effectors of post-transcriptional regulation. Significant experimental and bioinformatics efforts have been expended on characterizing protein binding mechanisms on the molecular level, and on highlighting the sequence and structural traits of RNA that impact the binding specificity for different proteins. Yet our ability to predict these interactions in silico remains relatively poor. Results In this study, we introduce RPI-Net, a graph neural network approach for RNA-protein interaction prediction. RPI-Net learns and exploits a graph representation of RNA molecules, yielding significant performance gains over existing state-of-the-art approaches. We also introduce an approach to rectify a particular type of sequence bias present in many CLIP-Seq data sets, and we show that correcting this bias is essential in order to learn meaningful predictors and properly evaluate their accuracy. Finally, we provide new approaches to interpret the trained models and extract simple, biologically interpretable representations of the learned sequence and structural motifs. Availability Source code can be accessed at https://www.github.com/HarveyYan/[email protected], [email protected]


2019 ◽  
Vol 48 (2) ◽  
pp. e11-e11 ◽  
Author(s):  
Willow Coyote-Maestas ◽  
David Nedrud ◽  
Steffan Okorafor ◽  
Yungui He ◽  
Daniel Schmidt

Abstract Domain recombination is a key principle in protein evolution and protein engineering, but inserting a donor domain into every position of a target protein is not easily experimentally accessible. Most contemporary domain insertion profiling approaches rely on DNA transposons, which are constrained by sequence bias. Here, we establish Saturated Programmable Insertion Engineering (SPINE), an unbiased, comprehensive, and targeted domain insertion library generation technique using oligo library synthesis and multi-step Golden Gate cloning. Through benchmarking to MuA transposon-mediated library generation on four ion channel genes, we demonstrate that SPINE-generated libraries are enriched for in-frame insertions, have drastically reduced sequence bias as well as near-complete and highly-redundant coverage. Unlike transposon-mediated domain insertion that was severely biased and sparse for some genes, SPINE generated high-quality libraries for all genes tested. Using the Inward Rectifier K+ channel Kir2.1, we validate the practical utility of SPINE by constructing and comparing domain insertion permissibility maps. SPINE is the first technology to enable saturated domain insertion profiling. SPINE could help explore the relationship between domain insertions and protein function, and how this relationship is shaped by evolutionary forces and can be engineered for biomedical applications.

