OperonSEQer: A set of machine-learning algorithms with threshold voting for detection of operon pairs using short-read RNA-sequencing data

2022 ◽  
Vol 18 (1) ◽  
pp. e1009731
Author(s):  
Raga Krishnakumar ◽  
Anne M. Ruffing

Operon prediction in prokaryotes is critical not only for understanding the regulation of endogenous gene expression, but also for exogenous targeting of genes using newly developed tools such as CRISPR-based gene modulation. A number of methods have used transcriptomics data to predict operons, based on the premise that contiguous genes in an operon will be expressed at similar levels. While promising results have been observed using these methods, most of them do not address uncertainty caused by technical variability between experiments, which is especially relevant when the amount of data available is small. In addition, many existing methods do not provide the flexibility to determine the stringency with which genes should be evaluated for being in an operon pair. We present OperonSEQer, a set of machine learning algorithms that uses the statistic and p-value from a non-parametric analysis of variance test (Kruskal-Wallis) to determine the likelihood that two adjacent genes are expressed from the same RNA molecule. We implement a voting system to allow users to choose the stringency of operon calls depending on whether their priority is high recall or high specificity. In addition, we provide the code so that users can retrain the algorithm and re-establish hyperparameters based on any data they choose, allowing this method to be expanded as additional data is generated. By comparing our predictions to publicly available long-read sequencing data, we show that our approach detects operon pairs that are missed by current methods. OperonSEQer therefore improves on existing methods in terms of accuracy, flexibility, and adaptability.
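The threshold voting idea described above can be illustrated with a minimal Python sketch. It is not the OperonSEQer code: the function name `call_operon_pair`, the number of classifiers, and the example votes are all hypothetical. The sketch assumes each trained classifier has already cast a yes/no vote for one pair of adjacent genes.

```python
from typing import Sequence


def call_operon_pair(votes: Sequence[bool], threshold: int) -> bool:
    """Return True when at least `threshold` classifiers vote 'operon pair'.

    `votes` holds one boolean per classifier (hypothetical inputs; the
    classifiers themselves are not modeled here).
    """
    if not 1 <= threshold <= len(votes):
        raise ValueError("threshold must be between 1 and the number of votes")
    # True counts as 1, so the sum is the number of 'yes' votes.
    return sum(votes) >= threshold


# Illustrative votes from six hypothetical classifiers for one gene pair.
votes = [True, True, False, True, False, False]

lenient_call = call_operon_pair(votes, threshold=2)  # favors recall
strict_call = call_operon_pair(votes, threshold=5)   # favors specificity
```

A low threshold accepts a pair when only a few classifiers agree, which favors recall. A high threshold demands near-consensus, which favors specificity.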

2021 ◽  
Author(s):  
Raga Krishnakumar ◽  
Anne M Ruffing

Operon prediction in prokaryotes is critical not only for understanding the regulation of endogenous gene expression, but also for exogenous targeting of genes using newly developed tools such as CRISPR-based gene modulation. A number of methods have used transcriptomics data to predict operons, based on the premise that contiguous genes in an operon will be expressed at similar levels. While promising results have been observed using these methods, most of them do not address uncertainty caused by technical variability between experiments, which is especially relevant when the amount of data available is small. In addition, many existing methods do not provide the flexibility to determine the stringency with which genes should be evaluated for being in an operon pair. We present OperonSEQer, a set of machine learning algorithms that uses the statistic and p-value from a non-parametric analysis of variance test (Kruskal-Wallis) to determine the likelihood that two adjacent genes are expressed from the same RNA molecule. We implement a voting system to allow users to choose the stringency of operon calls depending on whether their priority is high coverage of operons or high accuracy of the calls. In addition, we provide the code so that users can retrain the algorithm and re-establish hyperparameters based on any data they choose, allowing this method to be expanded on as additional data is generated and incorporated. By comparing our predictions to publicly available long-read sequencing data, we show that our approach detects operon pairs that are missed by current methods. OperonSEQer therefore improves on existing methods in terms of accuracy, flexibility and adaptability.


2021 ◽  
Vol 11 (13) ◽  
pp. 6225
Author(s):  
Seongkeun Park ◽  
Jieun Byun

Background: Post-prostatectomy incontinence (PPI) is a major complication that can significantly decrease quality of life. Approximately 20% of patients experience consistent PPI as long as 1 year after radical prostatectomy (RP). This study develops a preoperative predictive model and compares its diagnostic performance with conventional tools. Methods: A total of 166 prostate cancer patients who underwent magnetic resonance imaging (MRI) and RP were evaluated. According to the date of the RP, patients were divided into a development cohort (n = 109) and a test cohort (n = 57). Patients were classified as PPI early-recovery or consistent on the basis of pad usage for incontinence at 3 months after RP. Uni- and multi-variable logistic regression analyses were performed to identify factors associated with PPI early recovery. Four well-known machine learning algorithms (k-nearest neighbor, decision tree, support-vector machine (SVM), and random forest) and a logistic regression model were used to build prediction models for recovery from PPI using preoperative clinical and imaging data. The performances of the prediction models were assessed internally and externally using sensitivity, specificity, accuracy, and area-under-the-curve (AUC) values, and the estimated probabilities and the actual proportion of cases of recovery from PPI within 3 months were compared using a chi-squared test. Results: Clinical and imaging findings revealed that age (70.1 years old for the PPI early-recovery group vs. 72.8 years old for the PPI consistent group), membranous urethral length (MUL; 15.7 mm for the PPI early-recovery group vs. 13.9 mm for the PPI consistent group), and obturator internal muscle (18.2 mm for the PPI early-recovery group vs. 17.5 mm for the PPI consistent group) were significantly different between the PPI early-recovery and consistent groups (all p-values < 0.05). Multivariate analysis confirmed that age (odds ratio = 1.07, 95% confidence interval = 1.02–1.14, p-value = 0.007) and MUL (odds ratio = 0.87, 95% confidence interval = 0.80–0.95, p-value = 0.002) were significant independent factors for early recovery. The prediction models using machine learning algorithms, especially SVM (AUC = 0.65 ± 0.07), showed superior diagnostic performance compared with conventional logistic regression (AUC = 0.59 ± 0.07). Moreover, all models showed good calibration between the estimated probability and actual observed proportion of cases of recovery from PPI within 3 months. Conclusions: Preoperative clinical data and anatomic features on preoperative MRI can be used to predict early recovery from PPI after RP, and machine learning algorithms provide greater diagnostic accuracy compared with conventional statistical approaches.
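For readers unfamiliar with the evaluation metrics named in this abstract, the following Python sketch shows how sensitivity, specificity, and accuracy follow from a binary confusion matrix. The `ConfusionCounts` type and the counts used are purely illustrative assumptions. They are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary classifier (hypothetical container type)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def sensitivity(c: ConfusionCounts) -> float:
    # Fraction of actual positives that were predicted positive.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of actual negatives that were predicted negative.
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all cases that were classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Toy counts chosen only for illustration; both classes must be present,
# otherwise the denominators above would be zero.
toy = ConfusionCounts(tp=8, fp=4, tn=6, fn=2)
toy_sensitivity = sensitivity(toy)
toy_specificity = specificity(toy)
toy_accuracy = accuracy(toy)
```

The area under the curve is not covered here because it needs the models' continuous scores, not just a single confusion matrix.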


2019 ◽  
Vol 19 (1) ◽  
pp. 40-48 ◽  
Author(s):  
Shanwen Sun ◽  
Chunyu Wang ◽  
Hui Ding ◽  
Quan Zou

Abstract The advent of high-throughput genomic technologies has resulted in the accumulation of massive amounts of genomic information. However, biologists are challenged with how to effectively analyze these data. Machine learning can provide tools for better and more efficient data analysis. Unfortunately, because many plant biologists are unfamiliar with machine learning, its application in plant molecular studies has been restricted to a few species and a limited set of algorithms. Thus, in this study, we provide the basic steps for developing machine learning frameworks and present a comprehensive overview of machine learning algorithms and various evaluation metrics. Furthermore, we introduce sources of important curated plant genomic data and R packages to enable plant biologists to easily and quickly apply appropriate machine learning algorithms in their research. Finally, we discuss current applications of machine learning algorithms for identifying various genes related to resistance to biotic and abiotic stress. Broad application of machine learning and the accumulation of plant sequencing data will advance plant molecular studies.


2020 ◽  
Vol 12 ◽  
Author(s):  
Ibrahim Almubark ◽  
Lin-Ching Chang ◽  
Kyle F. Shattuck ◽  
Thanh Nguyen ◽  
Raymond Scott Turner ◽  
...  

Introduction: The goal of this study was to investigate and compare the classification performance of machine learning with behavioral data from standard neuropsychological tests, a cognitive task, or both. Methods: A neuropsychological battery and a simple 5-min cognitive task were administered to eight individuals with mild cognitive impairment (MCI), eight individuals with mild Alzheimer's disease (AD), and 41 demographically matched controls (CN). A fully connected multilayer perceptron (MLP) network and four supervised traditional machine learning algorithms were used. Results: Traditional machine learning algorithms achieved similar classification performances with neuropsychological or cognitive data. MLP outperformed traditional algorithms with the cognitive data (either alone or together with neuropsychological data), but not with neuropsychological data alone. In particular, MLP with a combination of summarized scores from neuropsychological tests and the cognitive task achieved ~90% sensitivity and ~90% specificity. When the models were applied to an independent dataset, in which the participants were demographically different from those in the main dataset, high specificity was maintained (100%), but sensitivity dropped to 66.67%. Discussion: Deep learning with data from specific cognitive task(s) holds promise for assisting in the early diagnosis of Alzheimer's disease, but future work with a large and diverse sample is necessary to validate and improve this approach.


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. First, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting trees, and random forest) are employed, and the impact of each group is observed by comparing the performance of the models. Then the number of target classes is increased to five and eight, and the results are compared with previously developed models in the literature.


2020 ◽  
pp. 1-11
Author(s):  
Jie Liu ◽  
Lin Lin ◽  
Xiufang Liang

The online English teaching system has certain requirements for the intelligent scoring system, and the most difficult stage of intelligent scoring in the English test is scoring the English composition with an intelligent model. In order to improve the intelligence of English composition scoring, this study combines machine learning algorithms with intelligent image recognition technology and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, in order to verify whether the algorithm model proposed in this paper meets the requirements of the group text, that is, to verify the feasibility of the algorithm, the performance of the model proposed in this study is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as constraints. The research results show that the algorithm proposed in this paper has a certain practical effect and that it can be applied to English assessment systems and online homework evaluation systems.


2019 ◽  
Vol 1 (2) ◽  
pp. 78-80
Author(s):  
Eric Holloway

Detecting some patterns is a simple task for humans, but nearly impossible for current machine learning algorithms. Here, the "checkerboard" pattern is examined, where human prediction nears 100% and machine prediction drops significantly below 50%.


Diabetes ◽  
2020 ◽  
Vol 69 (Supplement 1) ◽  
pp. 1290-P
Author(s):  
GIUSEPPE D’ANNUNZIO ◽  
ROBERTO BIASSONI ◽  
MARGHERITA SQUILLARIO ◽  
ELISABETTA UGOLOTTI ◽  
ANNALISA BARLA ◽  
...  
