Tally-2.0: upgraded validator of tandem repeat detection in protein sequences

2020 ◽  
Vol 36 (10) ◽  
pp. 3260-3262 ◽  
Author(s):  
Vladimir Perovic ◽  
Jeremy Y Leclercq ◽  
Neven Sumonja ◽  
Francois D Richard ◽  
Nevena Veljkovic ◽  
...  

Abstract. Motivation: Proteins containing tandem repeats (TRs) are abundant, frequently fold into elongated non-globular structures and perform vital functions. A number of computational tools have been developed to detect TRs in protein sequences. The blurred boundary between imperfect TR motifs and non-repetitive sequences created a need to validate the detected TRs. Results: Tally-2.0 is a scoring tool based on a machine learning (ML) approach that validates the results of TR detection. It was upgraded by using improved training datasets and additional ML features. Tally-2.0 achieves 93% sensitivity, 83% specificity and an area under the receiver operating characteristic curve of 95%. Availability and implementation: Tally-2.0 is available as a web tool and as a standalone application, published under Apache License 2.0, at https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=27. It is supported on Linux. Source code is available upon request. Supplementary information: Supplementary data are available at Bioinformatics online.
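The three figures quoted above (sensitivity, specificity and area under the ROC curve) can be computed from any set of scored predictions. The following is a minimal sketch of those standard definitions, not Tally-2.0's own code: the function names, the toy labels and scores, and the 0.45 threshold are all hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called positive.

    labels: 1 for a true repeat, 0 for a non-repeat (hypothetical encoding).
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
sens, spec = sensitivity_specificity(labels, scores, threshold=0.45)
auc = roc_auc(labels, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes performance over all thresholds, which is why abstracts usually report both.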

PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0239154
Author(s):  
Pablo Mier ◽  
Miguel A. Andrade-Navarro

Background: Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition deviates from the proteome-wide expectation, and they do not follow the rules of structural folding that prevail in globular regions. One way to characterize these regions is by assessing the repeatability of a sequence, that is, calculating the local propensity of a region to be part of a repeat. Results: We combine two local measures of low complexity, repeatability (using the RES algorithm) and fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual cases of proteins with extreme compositions. We apply a representation called the ‘low complexity triangle’ as a proof of concept to display the measured low complexity values. Results show that proteomes have distinct signatures in the low complexity triangle, and that these signatures are associated with complexity features of the sequences. We developed a web tool called LCT (http://cbdm-01.zdv.uni-mainz.de/~munoz/lct/) to allow users to calculate the low complexity triangle of a given protein or region of interest. Conclusions: The low complexity triangle proves to be a suitable way to represent the general low complexity of a sequence or protein dataset. Homorepeats, direpeats, compositionally biased regions and globular regions occupy characteristic positions in the triangle. The described pipeline can be used to characterize LCRs and may help in quantifying the content of degenerated tandem repeats in proteins and proteomes.
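Of the two local measures named above, the fraction of the most frequent amino acid is simple enough to sketch directly. The code below is an illustration under stated assumptions, not the LCT implementation: the function names and the sliding-window scheme with a default window of 20 residues are hypothetical, and the RES repeatability measure is not reproduced because the abstract does not describe it.

```python
from collections import Counter


def most_frequent_fraction(region):
    """Fraction of positions in `region` occupied by its most common residue."""
    if not region:
        raise ValueError("region must not be empty")
    counts = Counter(region)
    return max(counts.values()) / len(region)


def sliding_fraction(sequence, window=20):
    """Most-frequent-residue fraction for every full window along the sequence.

    The window size is an assumption made for this sketch.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        most_frequent_fraction(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


# A homorepeat scores 1.0; a two-residue alternation scores 0.5.
homorepeat = most_frequent_fraction("QQQQQQQQQQ")
direpeat = most_frequent_fraction("AGAGAGAG")
profile = sliding_fraction("AAAAG", window=4)
```

Values close to 1.0 indicate a homorepeat-like window, while globular-like windows with mixed composition give much lower values.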


2019 ◽  
Vol 35 (22) ◽  
pp. 4596-4606 ◽  
Author(s):  
Sijie Chen ◽  
Yixin Chen ◽  
Fengzhu Sun ◽  
Michael S Waterman ◽  
Xuegong Zhang

Abstract. Motivation: Detecting sequences containing repetitive regions is a basic bioinformatics task with many applications. Several methods have been developed for various types of repeat detection tasks. An efficient generic method for detecting most types of repetitive sequences is still desirable. Inspired by the excellent properties and successful applications of the D2 family of statistics in comparative analyses of genomic sequences, we developed a new statistic, D2R, that can efficiently discriminate sequences with or without repetitive regions. Results: Using the statistic, we developed an algorithm of linear time and space complexity for detecting most types of repetitive sequences in multiple scenarios, including finding candidate clustered regularly interspaced short palindromic repeats regions from bacterial genomic or metagenomic sequences. Simulation and real data experiments show that the method works well on both assembled sequences and unassembled short reads. Availability and implementation: The code is available at https://github.com/XuegongLab/D2R_codes under the GPL 3.0 license. Supplementary information: Supplementary data are available at Bioinformatics online.
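The abstract does not define D2R, so it is not reproduced here. As background only, the sketch below shows the classic D2 statistic from the family it builds on: the inner product of the k-mer count vectors of two sequences. The function names and the example sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Classic D2: sum over k-mers w of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for k-mers absent from seq_b, so they contribute nothing.
    return sum(n * counts_b[w] for w, n in counts_a.items())


# Toy sequences, invented for illustration only.
same = d2("ACGT", "ACGT", k=2)        # AC, CG, GT each occur once in both
repetitive = d2("AAAA", "AAAA", k=2)  # AA occurs three times in both
disjoint = d2("ACGT", "TTTT", k=2)    # no shared 2-mers
```

Counting k-mers takes a single pass over each sequence, which is in keeping with the linear time and space complexity the abstract reports for its own algorithm.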


2020 ◽  
Author(s):  
Diederik Cames van Batenburg ◽  
Jasper Linthorst ◽  
Henne Holstege ◽  
Marcel Reinders

Abstract. Tandem repeats (TRs) are contiguously repetitive sequences with a high mutation rate. Several human diseases have been associated with an expansion of TRs, a mutation that changes their number of repetitions. Nevertheless, these Variable Number Tandem Repeats (VNTRs) have not been included in many genome-wide studies. The reason is that VNTR genotyping is inaccurate with short-read sequencing, while newer technology such as long-read sequencing is expensive and lacks throughput. Here, we propose a sequence-based random forest classifier that is able to predict variable expansion of TR regions, given incomplete VNTR annotation from long-read sequencing of 5 haplotypes. The classifier mainly predicted VNTRs using the feature TR length. The second most used feature is a novel finding: the Mfold-predicted likelihood of self-folding, for which more stable foldings are correlated with VNTRs. We validated VNTR candidates predicted by this classifier by clustering short-read pileup patterns compared across 17 genomes. TRs labeled VNTR by the classifier showed similar local variance in their pileup [...]. Supplementary information: Supplementary data are available at bioRxiv.


MicroRNA ◽  
2018 ◽  
Vol 8 (1) ◽  
pp. 86-92 ◽  
Author(s):  
Shili Jiang ◽  
Wei Jiang ◽  
Ying Xu ◽  
Xiaoning Wang ◽  
Yongping Mu ◽  
...  

Background and Objective: Accurately evaluating the severity of liver cirrhosis is essential for clinical decision making and disease management. This study aimed to evaluate the value of circulating levels of microRNA (miR)-26a and miR-21 as novel noninvasive biomarkers in detecting severity of cirrhosis in patients with chronic hepatitis B. Methods: Thirty patients with clinically diagnosed chronic hepatitis B-related cirrhosis and 30 healthy individuals were selected. The serum levels of miR-26a and miR-21 were quantified by qRT-PCR. Receiver operating characteristic curve analysis was performed to evaluate the sensitivity and specificity of the miRNAs for detecting the severity of cirrhosis. Results: Serum miR-26a and miR-21 levels were found to be significantly downregulated in patients with severe cirrhosis scored at Child-Pugh class C in comparison to healthy controls (miR-26a p<0.01, and miR-21 p<0.001, respectively). The circulating miR-26a and miR-21 levels in patients were positively correlated with serum albumin concentration but negatively correlated with serum total bilirubin concentration and prothrombin time. Receiver operating characteristic curve analysis revealed that both serum miR-26a and miR-21 levels were associated with a high diagnostic accuracy for patients with cirrhosis scored at Child-Pugh class C (miR-26a cut-off fold change at ≤0.4, sensitivity: 84.62%, specificity: 89.36%, P<0.0001; miR-21 cut-off fold change at ≤0.6, sensitivity: 84.62%, specificity: 78.72%, P<0.0001). Our results indicate that the circulating levels of miR-26a and miR-21 are closely related to the extent of liver decompensation, and the decreased levels are capable of discriminating patients with cirrhosis at Child-Pugh class C from the full set of cirrhosis cases.


2019 ◽  
Vol 30 (7-8) ◽  
pp. 221-228
Author(s):  
Shahab Hajibandeh ◽  
Shahin Hajibandeh ◽  
Nicholas Hobbs ◽  
Jigar Shah ◽  
Matthew Harris ◽  
...  

Aims: To investigate whether an intraperitoneal contamination index (ICI) derived from combined preoperative levels of C-reactive protein, lactate, neutrophils, lymphocytes and albumin could predict the extent of intraperitoneal contamination in patients with acute abdominal pathology. Methods: Patients aged over 18 who underwent emergency laparotomy for acute abdominal pathology between January 2014 and October 2018 were randomly divided into primary and validation cohorts. The proposed intraperitoneal contamination index was calculated for each patient in each cohort. Receiver operating characteristic curve analysis was performed to determine discrimination of the index and cut-off values of preoperative intraperitoneal contamination index that could predict the extent of intraperitoneal contamination. Results: Overall, 468 patients were included in this study; 234 in the primary cohort and 234 in the validation cohort. The analyses identified intraperitoneal contamination index of 24.77 and 24.32 as cut-off values for purulent contamination in the primary cohort (area under the curve (AUC): 0.73, P < 0.0001; sensitivity: 84%, specificity: 60%) and validation cohort (AUC: 0.83, P < 0.0001; sensitivity: 91%, specificity: 69%), respectively. Receiver operating characteristic curve analysis also identified intraperitoneal contamination index of 33.70 and 33.41 as cut-off values for feculent contamination in the primary cohort (AUC: 0.78, P < 0.0001; sensitivity: 87%, specificity: 64%) and validation cohort (AUC: 0.79, P < 0.0001; sensitivity: 86%, specificity: 73%), respectively. Conclusions: As a predictive measure which is derived purely from biomarkers, intraperitoneal contamination index may be accurate enough to predict the extent of intraperitoneal contamination in patients with acute abdominal pathology and to facilitate decision-making together with clinical and radiological findings.


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 949
Author(s):  
Cecil J. Weale ◽  
Don M. Matshazi ◽  
Saarah F. G. Davids ◽  
Shanel Raghubeer ◽  
Rajiv T. Erasmus ◽  
...  

This cross-sectional study investigated the association of miR-1299, -126-3p and -30e-3p with dysglycaemia, and their diagnostic capability for it, in 1273 (men, n = 345) South Africans aged >20 years. Glycaemic status was assessed by oral glucose tolerance test (OGTT). Whole blood microRNA (miRNA) expressions were assessed using TaqMan-based reverse transcription quantitative-PCR (RT-qPCR). Receiver operating characteristic (ROC) curves assessed the ability of each miRNA to discriminate dysglycaemia, while multivariable logistic regression analyses linked expression with dysglycaemia. In all, 207 (16.2%) and 94 (7.4%) participants had prediabetes and type 2 diabetes mellitus (T2DM), respectively. All three miRNAs were significantly more highly expressed in individuals with prediabetes than in normotolerant participants, p < 0.001. miR-30e-3p and miR-126-3p were also significantly more highly expressed in T2DM than in normotolerant participants, p < 0.001. In multivariable logistic regressions, the three miRNAs were consistently and continuously associated with prediabetes, while only miR-126-3p was associated with T2DM. The ROC analysis indicated all three miRNAs had a significant overall predictive ability to diagnose prediabetes, diabetes and the combination of both (dysglycaemia), with the area under the receiver operating characteristic curve (AUC) being significantly higher for miR-126-3p in prediabetes. For prediabetes diagnosis, miR-126-3p (AUC = 0.760) outperformed HbA1c (AUC = 0.695), p = 0.042. These results suggest that miR-1299, -126-3p and -30e-3p are associated with prediabetes, and measuring miR-126-3p could potentially contribute to diabetes risk screening strategies.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yali Feng ◽  
Jiaqi Zhang ◽  
Yi Zhou ◽  
Bo Chen ◽  
Ying Yin

Abstract. The aim of the present study was to examine the concurrent validity of two Chinese short versions of the Montreal Cognitive Assessment (MoCA) in patients with stroke, i.e., the MoCA 5-minute protocol and the National Institute for Neurological Disorders and Stroke and Canadian Stroke Network (NINDS-CSN) 5-minute protocol. A total of 54 patients and 27 healthy controls were enrolled. The Neurobehavioural Cognitive Status Examination (NCSE) was used as an external criterion of cognitive impairment. We found that the 5-min protocol did not differ from the MoCA in differentiating patients with cognitive impairments from those without (area under the receiver operating characteristic curve, AUC, of 0.948 for the MoCA 5-min protocol vs. 0.984 for MoCA, P = 0.097). These three assessments demonstrated equal performance in differentiating patients with stroke from controls. The Chinese version of the MoCA 5-min protocol can be used as a valid screening tool for patients with stroke.


2021 ◽  
pp. 1-12
Author(s):  
Xingchen Fan ◽  
Minmin Cao ◽  
Cheng Liu ◽  
Cheng Zhang ◽  
Chunyu Li ◽  
...  

BACKGROUND: MicroRNAs (miRNAs), with noticeable stability and unique expression patterns in the plasma of patients with various diseases, are powerful non-invasive biomarkers for cancer detection, including endometrial cancer (EC). OBJECTIVE: The objective of this study was to identify promising miRNA biomarkers in plasma to assist the clinical screening of EC. METHODS: A total of 93 EC and 79 normal control (NC) plasma samples were analyzed using Quantitative Real-time Polymerase Chain Reaction (qRT-PCR) in this four-stage experiment. Receiver operating characteristic (ROC) curve analysis was conducted to evaluate the diagnostic value. Additionally, the expression features of the identified miRNAs were further explored in tissues and plasma exosome samples. RESULTS: miR-142-3p, miR-146a-5p, and miR-151a-5p were significantly overexpressed in the plasma of EC patients compared with NCs. Areas under the ROC curve of the 3-miRNA signature were 0.729, 0.751, and 0.789 for the training, testing, and external validation phases, respectively. The diagnostic performance of the identified signature proved to be stable in the three public datasets and superior to the other miRNA biomarkers in EC diagnosis. Moreover, the expression of miR-151a-5p was significantly elevated in EC plasma exosomes. CONCLUSIONS: A signature consisting of 3 plasma miRNAs was identified and showed potential for the non-invasive diagnosis of EC.


Cancers ◽  
2021 ◽  
Vol 13 (14) ◽  
pp. 3546
Author(s):  
Katarzyna Sylwia Dobruch-Sobczak ◽  
Hanna Piotrzkowska-Wróblewska ◽  
Piotr Karwat ◽  
Ziemowit Klimonda ◽  
Ewa Markiewicz-Grodzicka ◽  
...  

The aim of the study was to improve monitoring of the treatment response in breast cancer patients undergoing neoadjuvant chemotherapy (NAC). The IRB approved this prospective study. Ultrasound examinations were performed prior to treatment and 7 days after each of four consecutive NAC cycles. Residual malignant cell (RMC) measurement at surgery was the standard of reference. Alteration in B-mode ultrasound (tumor echogenicity and volume) and the Kullback-Leibler divergence (KLD), as a quantitative measure of amplitude difference, were used. Correlations of these parameters with RMC were assessed and Receiver Operating Characteristic (ROC) curve analysis was performed. Thirty-nine patients (mean age 57 y.) with 50 tumors were included. There was a significant correlation between RMC and changes in the quantitative parameter (KLD) after the second, third and fourth course of NAC, and alteration in echogenicity after the third and fourth course. Multivariate analysis of the echogenicity and KLD after the third NAC course revealed a sensitivity of 91%, specificity of 92%, PPV = 77%, NPV = 97%, accuracy = 91%, and AUC of 0.92 for non-responding tumors (RMC ≥ 70%). In conclusion, monitoring the echogenicity and KLD parameters made it possible to accurately predict the treatment response from the second course of NAC.
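For readers unfamiliar with the Kullback-Leibler divergence used above, the following minimal sketch computes it for two discrete distributions. It illustrates the general definition only: how the study estimated its amplitude distributions (for example, histogram binning) is not stated in the abstract, so the histogram inputs, the function names and the use of the natural logarithm are assumptions.

```python
import math


def normalize(counts):
    """Turn non-negative histogram counts into probabilities that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * ln(p_i / q_i) for discrete distributions.

    Bins with p_i == 0 contribute nothing; a bin with p_i > 0 and q_i == 0
    makes the divergence infinite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical amplitude histograms before and after a treatment course.
before = normalize([10, 30, 40, 20])
after = normalize([25, 35, 30, 10])
change = kl_divergence(after, before)
```

The divergence is zero only when the two distributions are identical and it is not symmetric, so the order of the arguments matters when comparing pre-treatment and post-treatment data.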

