Bioinf-PHP: Bioinformatics Pipeline for Protein Homology and Phylogeny

Author(s): Michael Zhou, Yongsheng Bai

2021, Vol 3 (1)
Author(s): Gundula Povysil, Monika Heinzl, Renato Salazar, Nicholas Stoler, Anton Nekrutenko, ...

Abstract: Duplex sequencing is currently the most reliable method for identifying ultra-low-frequency DNA variants. It groups sequence reads derived from the same DNA molecule into families that carry information on both the forward and the reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information, especially reads without a family, are discarded at different steps of the bioinformatics pipeline. We developed a bioinformatics toolset that analyses tag and family composition in order to understand this data loss and to implement modifications that maximize the data output for variant calling. Specifically, our tools show that tags contain polymerase chain reaction (PCR) and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we developed a tool that re-examines variant calls from raw reads and provides summary data that categorize the confidence level of each variant call in a tier-based system. With this tool, reads without a family can be included and the reliability of the call can be checked, which substantially increases the sequencing depth available for variant calling, a particularly important advantage for low-input samples or low-coverage regions.
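The family-grouping step described above can be illustrated with a minimal Python sketch. It is not the authors' toolset: it assumes each read is summarized by one tag string made of two equal-length barcode halves, that the opposite strand of the same molecule carries the halves in swapped order, and it uses a hypothetical minimum family size of 3. All function names are illustrative only.

```python
from collections import Counter
from typing import List, Set, Tuple


def family_sizes(tags: List[str]) -> Counter:
    """Count reads per tag; each distinct tag defines one single-strand family."""
    return Counter(tags)


def partner_tag(tag: str) -> str:
    """Swap the two barcode halves: the assumed tag of the opposite strand."""
    half = len(tag) // 2
    return tag[half:] + tag[:half]


def duplex_tags(tags: List[str], min_size: int = 3) -> Set[str]:
    """One representative tag per duplex: both strand families must reach
    the (hypothetical) minimum family size."""
    sizes = family_sizes(tags)
    duplexes = set()
    for tag, n in sizes.items():
        mate = partner_tag(tag)
        if n >= min_size and sizes.get(mate, 0) >= min_size:
            duplexes.add(min(tag, mate))  # store each duplex only once
    return duplexes


def singleton_tags(tags: List[str]) -> List[str]:
    """Tags observed in exactly one read, i.e. reads without a family."""
    return sorted(t for t, n in family_sizes(tags).items() if n == 1)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def near_duplicate_tags(tags: List[str], max_dist: int = 1) -> List[Tuple[str, str]]:
    """Pairs of distinct tags within max_dist mismatches: candidate tag errors."""
    distinct = sorted(family_sizes(tags))
    return [
        (a, b)
        for i, a in enumerate(distinct)
        for b in distinct[i + 1:]
        if hamming(a, b) <= max_dist
    ]


if __name__ == "__main__":
    reads = ["AAAACCCC"] * 3 + ["CCCCAAAA"] * 3 + ["AAAACCCG", "GGGGTTTT"]
    print(duplex_tags(reads))
    print(singleton_tags(reads))
    print(near_duplicate_tags(reads))
```

With this toy input, the two size-3 families form one duplex, the two size-1 families are singletons, and the one-mismatch tag AAAACCCG is flagged as a possible PCR or sequencing error of AAAACCCC, the kind of tag error described above.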


2021, Vol 22 (15), pp. 8012
Author(s): Rongxin Zhang, Yajun Liu, Xingxing Zhang, Ke Xiao, Yue Hou, ...

G-quadruplexes are non-canonical nucleic acid structures that form preferentially in G-rich regions. These structures have been associated with many biological functions. Despite broad efforts on DNA G-quadruplexes, knowledge of RNA G-quadruplexes remains limited, especially at the transcriptome-wide scale. Here, by integrating DMS-seq with a bioinformatics pipeline, we profiled and characterized RNA G-quadruplexes in the human transcriptome. Genes that contain RNA G-quadruplexes in specific regions are significantly associated with immune pathways and COVID-19-related gene sets. Bioinformatics analysis reveals potential regulatory functions of G-quadruplexes in miRNA targeting across the whole transcriptome. In addition, G-quadruplexes are depleted in putative (as opposed to real) PAS-strong poly(A) sites, which may weaken the possibility of such sites being real cleavage sites. In brief, our study provides insight into the potential post-transcriptional functions of RNA G-quadruplexes.
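The preference of G-quadruplexes for G-rich regions is often screened computationally with a sequence pattern: four runs of at least three G separated by short loops. The sketch below assumes that widely used heuristic (G-runs of 3 or more, loops of 1 to 7 nucleotides); it is a swapped-in, sequence-only technique, not the DMS-seq-based profiling used in the study, and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed heuristic: four runs of >= 3 G separated by loops of 1-7 nucleotides
# (loops may themselves contain G).
G4_PATTERN = re.compile(r"(?:G{3,}[ACGU]{1,7}){3}G{3,}")


def putative_g4s(seq: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for non-overlapping putative G4 motifs.

    Input is upper-cased and DNA input is converted to RNA (T -> U).
    """
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(rna)]


if __name__ == "__main__":
    print(putative_g4s("aaGGGaGGGaGGGaGGGaa"))
    print(putative_g4s("AUGCAUGC"))
```

A pattern match only indicates sequence potential; whether a motif actually folds in cells is what experimental structure probing, such as the DMS-seq approach described above, is used to assess.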


2019
Author(s): Yu Liu, Paul W Bible, Bin Zou, Qiaoxing Liang, Cong Dong, ...

Abstract
Motivation: Microbiome analyses of clinical samples with low microbial biomass are challenging because of the very small quantities of microbial DNA relative to the human host, ubiquitous contaminating DNA in sequencing experiments and the large and rapidly growing microbial reference databases.
Results: We present computational subtraction-based microbiome discovery (CSMD), a bioinformatics pipeline specifically developed to generate accurate species-level microbiome profiles for clinical samples with low microbial loads. CSMD applies strategies for the maximal elimination of host sequences with minimal loss of microbial signal and effectively detects microorganisms present in the sample with minimal false positives using a stepwise convergent solution. CSMD was benchmarked in a comparative evaluation with other classic tools on previously published well-characterized datasets. It showed higher sensitivity and specificity in host sequence removal and higher specificity in microbial identification, which led to more accurate abundance estimation. All these features are integrated into a free and easy-to-use tool. Additionally, CSMD applied to cell-free plasma DNA showed that microbial diversity within these samples is substantially broader than previously believed.
Availability and implementation: CSMD is freely available at https://github.com/liuyu8721/csmd.
Supplementary information: Supplementary data are available at Bioinformatics online.
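As a rough illustration of computational subtraction, the sketch below discards reads whose k-mers mostly occur in a host reference and keeps the rest as candidate microbial reads. It swaps in a simple exact k-mer screen and is not CSMD's implementation; the value of k, the 0.5 host-fraction cutoff and all function names are hypothetical, and unlike real read mappers it ignores reverse complements and sequencing errors.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_host_index(host_seqs: Iterable[str], k: int) -> Set[str]:
    """Set of every k-mer occurring in the host reference sequences."""
    index: Set[str] = set()
    for s in host_seqs:
        index.update(kmers(s, k))
    return index


def subtract_host(reads: Iterable[str], host_index: Set[str], k: int,
                  max_host_fraction: float = 0.5) -> List[str]:
    """Keep reads whose fraction of host k-mers is at most max_host_fraction."""
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # too short to screen; this sketch simply drops it
        host_fraction = sum(km in host_index for km in read_kmers) / len(read_kmers)
        if host_fraction <= max_host_fraction:
            kept.append(read)
    return kept


if __name__ == "__main__":
    index = build_host_index(["ACGTACGTAC"], k=4)
    print(subtract_host(["ACGTACG", "TTTTGGGG", "ACGTTTTT", "ACG"], index, k=4))
```

Lowering max_host_fraction removes more host-like reads but also risks discarding microbial reads that happen to share k-mers with the host, which is the trade-off between host elimination and loss of microbial signal described above.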


RNA Biology, 2021, pp. 1-6
Author(s): Bhaskar Shukla, Sanchita Gupta, Gaurava Srivastava, Ashok Sharma, Ashutosh K. Shukla, ...

2020, Vol 9 (1), pp. 2
Author(s): Tal Domanovich-Asor, Yair Motro, Boris Khalfin, Hillary A. Craddock, Avi Peretz, ...

Antimicrobial resistance (AMR) in Helicobacter pylori is increasing and can result in treatment failure and inappropriate antibiotic usage. This study used whole-genome sequencing (WGS) to comprehensively analyze the H. pylori resistome and phylogeny in order to characterize H. pylori isolates from Israel. The isolates (n = 48) underwent antimicrobial susceptibility testing (AST) against five antimicrobials and WGS analysis. A literature review identified 111 mutations reported to correlate with phenotypic resistance to these antimicrobials. Analysis was conducted with our in-house bioinformatics pipeline, targeting point mutations in the relevant genes (pbp1A, 23S rRNA, gyrA, rdxA, frxA, and rpoB), in order to assess genotype-to-phenotype correlation. Resistance rates of the study isolates were as follows: clarithromycin 54%, metronidazole 31%, amoxicillin 10%, rifampicin 4%, and levofloxacin 2%. Genotype-to-phenotype correlation was inconsistent: for every analyzed gene, at least one phenotypically susceptible isolate carried a mutation previously associated with resistance. The same was observed for mutations commonly used in commercial kits to diagnose AMR in H. pylori. Furthermore, 11 novel point mutations associated with a resistant phenotype were detected. Analysis of this unique set of H. pylori isolates demonstrates that inferring resistance phenotypes from WGS in H. pylori remains challenging and requires further optimization.
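The genotype-to-phenotype comparison described above can be framed as a 2x2 table per mutation: mutation present or absent versus phenotypically resistant or susceptible. The sketch below is a hypothetical illustration of that bookkeeping, not the study's in-house pipeline; the Isolate record, the mutation label "geneA:mut1" and the four toy isolates are invented for illustration and are not study data.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Isolate:
    isolate_id: str
    resistant: bool                          # phenotype from susceptibility testing
    mutations: FrozenSet[str] = frozenset()  # genotype calls from WGS


def concordance(isolates: List[Isolate], mutation: str) -> Dict[str, int]:
    """2x2 counts of one mutation versus the resistance phenotype."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for iso in isolates:
        has = mutation in iso.mutations
        if has and iso.resistant:
            counts["tp"] += 1  # mutation present, resistant
        elif has:
            counts["fp"] += 1  # mutation present, susceptible (discordant)
        elif iso.resistant:
            counts["fn"] += 1  # resistant without this mutation
        else:
            counts["tn"] += 1  # no mutation, susceptible
    return counts


def sensitivity_specificity(c: Dict[str, int]) -> Tuple[float, float]:
    """Sensitivity and specificity of the mutation as a resistance predictor."""
    pos, neg = c["tp"] + c["fn"], c["tn"] + c["fp"]
    sens = c["tp"] / pos if pos else float("nan")
    spec = c["tn"] / neg if neg else float("nan")
    return sens, spec


def discordant_susceptible(isolates: List[Isolate], mutation: str) -> List[str]:
    """IDs of susceptible isolates that nevertheless carry the mutation."""
    return [i.isolate_id for i in isolates
            if mutation in i.mutations and not i.resistant]


if __name__ == "__main__":
    toy = [  # hypothetical isolates, not study data
        Isolate("I1", True, frozenset({"geneA:mut1"})),
        Isolate("I2", True),
        Isolate("I3", False, frozenset({"geneA:mut1"})),
        Isolate("I4", False),
    ]
    c = concordance(toy, "geneA:mut1")
    print(c, sensitivity_specificity(c), discordant_susceptible(toy, "geneA:mut1"))
```

Each susceptible isolate carrying a resistance-associated mutation appears as an "fp" count and lowers specificity; the inconsistency reported above corresponds to at least one such isolate for every analyzed gene.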


2021
Author(s): H. Serhat Tetikol, Kubra Narci, Deniz Turgut, Gungor Budak, Ozem Kalay, ...

Abstract: Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference for capturing the diverse genetic information of different human populations and by its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based bioinformatics toolkits, how to curate genomic variants and subsequently construct genome graphs remains an understudied problem, one that ultimately determines the effectiveness of the end-to-end bioinformatics pipeline. In this study, we discuss major obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants, and resolution of graph reference ambiguity caused by information overload. Moreover, we make the case for iteratively augmenting tailored genome graphs for targeted populations and test the proposed approach on whole-genome samples of African ancestry. Our results show that, as more representative alternatives to linear or generic graph references, population-specific graphs can achieve significantly lower read mapping errors and increased variant calling sensitivity, and can provide the improvements of joint variant calling without the need for computationally intensive post-processing steps.
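To make the idea of augmenting a linear reference concrete, the sketch below builds a toy variation graph in which each variant adds a two-branch bubble, then enumerates the sequences spelled by its source-to-sink paths. It is a hypothetical illustration, not the study's graph construction method: variants are assumed to be simple, sorted, non-overlapping (0-based position, reference allele, alternate allele) records, and structural variants and phasing are ignored. Because bubbles in series multiply, n independent biallelic bubbles spell 2^n paths, one way to see how adding ever more variants can lead to the reference ambiguity described above.

```python
from typing import Dict, List, Tuple

Variant = Tuple[int, str, str]  # (0-based position, ref allele, alt allele)


def build_graph(ref: str, variants: List[Variant]):
    """Return (nodes, edges): node id -> sequence, node id -> successor ids."""
    nodes: Dict[int, str] = {}
    edges: Dict[int, List[int]] = {}

    def add_node(seq: str, preds: List[int]) -> int:
        nid = len(nodes)
        nodes[nid] = seq
        edges[nid] = []
        for p in preds:
            edges[p].append(nid)
        return nid

    frontier: List[int] = []  # nodes that the next added node will follow
    cursor = 0
    for pos, ref_allele, alt_allele in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants are not supported in this sketch")
        if ref[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference allele mismatch at position {pos}")
        if pos > cursor:  # shared reference segment before the variant
            frontier = [add_node(ref[cursor:pos], frontier)]
        # bubble: reference and alternate alleles both follow the same frontier
        frontier = [add_node(ref_allele, frontier), add_node(alt_allele, frontier)]
        cursor = pos + len(ref_allele)
    if cursor < len(ref):  # trailing shared segment
        add_node(ref[cursor:], frontier)
    return nodes, edges


def path_sequences(nodes: Dict[int, str], edges: Dict[int, List[int]]) -> List[str]:
    """Sorted sequences spelled by every path from a source to a sink."""
    incoming = {m for succ in edges.values() for m in succ}
    out: List[str] = []

    def walk(n: int, acc: str) -> None:
        acc += nodes[n]
        if not edges[n]:
            out.append(acc)
        for m in edges[n]:
            walk(m, acc)

    for source in (n for n in nodes if n not in incoming):
        walk(source, "")
    return sorted(out)


if __name__ == "__main__":
    nodes, edges = build_graph("ACGTACGT", [(2, "G", "T"), (5, "C", "A")])
    print(path_sequences(nodes, edges))
```

In the toy example, two bubbles yield four path sequences, one of which is the original linear reference.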

