Target-Decoy MineR for determining the biological relevance of variables in noisy data sets

2020
Author(s): Cesaré Ovando-Vázquez, Daniel Cázarez-García, Robert Winkler

Abstract Machine learning algorithms extract important variables from biological big data. However, deciding on the biological relevance of the identified variables is challenging. Adding artificial noise ('decoy' variables) to the raw data ('target' variables) makes it possible to calculate a false-positive rate (FPR) and a biological relevance probability (BRp) for each variable rank. These scores allow a cut-off for informative variables to be defined, depending on the sensitivity/specificity required by a given scientific question. We demonstrate the function of the Target-Decoy MineR (TDM) with synthetic data and with experimental metabolomics results. The Target-Decoy MineR is suitable for different types of quantitative data in tabular format. An implementation of the algorithm in R is freely available from https://bitbucket.org/cesaremov/targetdecoy_mining/.
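The Python sketch below illustrates the target-decoy idea. It makes two assumptions:

- Every variable already has an importance score, for example from a random forest.
- The decoys are known pure-noise variables.

Targets and decoys are ranked together. At each rank, the sketch reports the share of all decoys ranked at or above that position, which is the false-positive rate of a cut-off placed there. This is a generic estimate, not TDM's own FPR or BRp formula, and both function names are hypothetical.

```python
def target_decoy_fpr(target_scores, decoy_scores):
    """Rank targets and decoys together by importance (highest first).

    Returns (score, is_decoy, fpr) tuples. fpr is the fraction of all decoys
    ranked at or above that position; decoys are known noise, so this is the
    false-positive rate of a cut-off placed at that rank.
    Hypothetical helper, not part of the TDM package.
    """
    labelled = [(s, False) for s in target_scores] + [(s, True) for s in decoy_scores]
    labelled.sort(key=lambda item: item[0], reverse=True)
    n_decoys = len(decoy_scores)
    ranked, decoys_seen = [], 0
    for score, is_decoy in labelled:
        decoys_seen += is_decoy  # bool adds as 0 or 1
        ranked.append((score, is_decoy, decoys_seen / n_decoys))
    return ranked


def targets_within_fpr(ranked, max_fpr):
    """Scores of targets ranked before the FPR first exceeds max_fpr."""
    selected = []
    for score, is_decoy, fpr in ranked:
        if fpr > max_fpr:
            break
        if not is_decoy:
            selected.append(score)
    return selected


ranked = target_decoy_fpr([0.9, 0.8, 0.3], [0.5, 0.2])
print(targets_within_fpr(ranked, max_fpr=0.0))  # [0.9, 0.8]
```

A stricter `max_fpr` keeps fewer targets. This mirrors the sensitivity/specificity trade-off the abstract describes.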

2021
pp. bjophthalmol-2020-318188
Author(s): Shotaro Asano, Hiroshi Murata, Yuri Fujino, Takehiro Yamashita, Atsuya Miki, ...

Background/Aim To investigate the clinical validity of the Guided Progression Analysis definition (GPAD) and the cluster-based definition (CBD) with the Humphrey Field Analyzer 10-2 test in diagnosing glaucomatous visual field (VF) progression, and to introduce a novel definition with optimised specificity by combining the 'any-location' and 'cluster-based' approaches (hybrid definition). Methods 66 400 stable glaucomatous VFs were simulated from 664 pairs of 10-2 tests (10 sets × 10 VF series × 664 eyes; data set 1). Using these simulated VFs, the specificity for detecting progression and the effects of changing the parameters (number of test locations or consecutive VF tests, and percentile cut-off values) were investigated. The hybrid definition was designed as the combination whose specificity was closest to 95.0%. Subsequently, another 5000 actual glaucomatous 10-2 tests from 500 eyes (10 VFs each) were collected (data set 2), and their accuracy (sensitivity, specificity and false positive rate) and the time needed to detect VF progression were evaluated. Results The specificity values calculated using data set 1 with GPAD and CBD were 99.6% and 99.8%, respectively. Using data set 2, the hybrid definition had a higher sensitivity than GPAD and CBD, without detriment to the specificity or false positive rate. The hybrid definition also detected progression significantly earlier than GPAD and CBD (at 3.1 years vs 4.2 years and 4.1 years, respectively). Conclusions GPAD and CBD had specificities of 99.6% and 99.8%, respectively. A novel hybrid definition (with a specificity of 95.5%) had higher sensitivity and enabled earlier detection of progression.
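A Python sketch can show how specificity is estimated on simulated stable series and how a parameter combination near 95% is chosen. The flagging rule is a simplified, hypothetical 'any-location' criterion. Progression is flagged when at least `n_locations` of the same test points are worse than baseline in `n_consecutive` consecutive tests. It is not the exact GPAD, CBD or hybrid definition. Every simulated series is stable, so any flag is a false positive and specificity is one minus the flagged fraction.

```python
def flags_progression(series, n_locations, n_consecutive):
    """series: one set per follow-up test, holding the IDs of test locations
    flagged as significantly worse than baseline in that test."""
    for start in range(len(series) - n_consecutive + 1):
        # Locations flagged in every test of this window of consecutive tests.
        common = set.intersection(*series[start:start + n_consecutive])
        if len(common) >= n_locations:
            return True
    return False


def specificity_on_stable(stable_series, n_locations, n_consecutive):
    """All series are stable, so every flagged series is a false positive."""
    flagged = sum(flags_progression(s, n_locations, n_consecutive) for s in stable_series)
    return 1 - flagged / len(stable_series)


def closest_to_target(stable_series, combos, target=0.95):
    """Pick the (n_locations, n_consecutive) pair whose specificity is closest to target."""
    return min(combos, key=lambda c: abs(specificity_on_stable(stable_series, *c) - target))


# Toy stable series (invented): each inner set lists flagged locations per test.
stable = [
    [{1, 2, 3}, {1, 2, 3}, {4}],
    [{1, 2}, {1, 2, 3}, {1, 2, 3}],
    [{1}, {2}, {3}],
    [set(), set(), set()],
]
print(closest_to_target(stable, [(3, 2), (3, 3)]))
```

Requiring more consecutive tests raises specificity but delays detection. This is the trade-off the hybrid definition tunes.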


2014
Author(s): Andreas Tuerk, Gregor Wiktorin, Serhat Güler

Quantification of RNA transcripts with RNA-Seq is inaccurate due to positional fragment bias, which is not represented appropriately by current statistical models of RNA-Seq data. This article introduces the Mix² (read "mixquare") model, which uses a mixture of probability distributions to model the transcript-specific positional fragment bias. The parameters of the Mix² model can be trained efficiently with the Expectation Maximization (EM) algorithm, yielding simultaneous estimates of the transcript abundances and transcript-specific positional biases. Experiments are conducted on synthetic data and on the Universal Human Reference (UHR) and Human Brain Reference (HBR) samples from the Microarray Quality Control (MAQC) data set. Comparing the correlation between qPCR and FPKM values with that of the state-of-the-art methods Cufflinks and PennSeq, we obtain an increase in R² value from 0.44 to 0.6 and from 0.34 to 0.54. In the detection of differential expression between UHR and HBR, the true positive rate increases from 0.44 to 0.71 at a false positive rate of 0.1. Finally, the Mix² model is used to investigate biases present in the MAQC data. This reveals 5 dominant biases that deviate from the common assumption of a uniform fragment distribution. The Mix² software is available at http://www.lexogen.com/fileadmin/uploads/bioinfo/mix2model.tgz.
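The EM training step can be illustrated in simplified form. The Python sketch below fits a one-dimensional Gaussian mixture to relative fragment positions, where 0 is the transcript start and 1 is the end. It is a generic EM sketch that assumes Gaussian components. It does not reproduce the Mix² model's actual component distributions or its joint estimation of transcript abundances.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_gaussian_mixture(xs, means, sigmas, weights, n_iter=100):
    """Fit a K-component 1-D Gaussian mixture to xs by expectation maximization."""
    means, sigmas, weights = list(means), list(sigmas), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observed position.
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, means[j], sigmas[j]) for j in range(k)]
            total = max(sum(p), 1e-300)  # guard against underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # component lost all support; keep its old parameters
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # avoid collapse to zero width
    return means, sigmas, weights


# Synthetic positions with one bias near the 5' end and one near the 3' end.
rng = random.Random(0)
positions = [rng.gauss(0.2, 0.05) for _ in range(200)] + [rng.gauss(0.8, 0.05) for _ in range(200)]
means, sigmas, weights = em_gaussian_mixture(positions, [0.4, 0.6], [0.2, 0.2], [0.5, 0.5])
print(means, weights)
```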


2021
Vol 0 (0)
Author(s): Çiğdem Karakükcü, Mehmet Zahid Çıracı, Derya Kocer, Mine Yüce Faydalı, Muhittin Abdulkadir Serdar

Abstract Objectives To obtain optimal immunoassay screening and LC-MS/MS confirmation cut-offs for opiate group tests in order to reduce false positive (FP) and false negative (FN) rates. Methods A total of 126 urine samples (50 opiate screening negative and 76 positive according to the 300 ng/mL threshold of the CEDIA method) were confirmed by a fully validated in-house LC-MS/MS method. Sensitivity, specificity, FP, and FN rates were determined at cut-off concentrations of both 300 and 2,000 ng/mL for morphine and codeine, and 10 ng/mL for the heroin metabolite 6-mono-acetyl-morphine (6-MAM). Results All CEDIA opiate negative urine samples were negative for morphine, codeine and 6-MAM. Although sensitivity was 100% at each cut-off, specificity was 54.9% at CEDIA cut-off 300 ng/mL vs. LC-MS/MS cut-off 300 ng/mL, and 75% at CEDIA cut-off 2,000 ng/mL vs. LC-MS/MS cut-off 2,000 ng/mL. The false positive rate was highest (45.1%) at CEDIA cut-off 300 ng/mL. At CEDIA cut-off 2,000 ng/mL vs. LC-MS/MS cut-off 300 ng/mL, specificity increased to 82.4% and the FP rate decreased to 17.6%. All 6-MAM positive samples had CEDIA concentrations ≥2,000 ng/mL. Conclusions Cut-offs of 2,000 ng/mL for screening and 300 ng/mL for confirmation are the most efficient thresholds, giving the lowest rate of FP opiate results.
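A small Python sketch shows how sensitivity, specificity and the FP rate shift with the chosen cut-offs. Each sample is a pair of (screening concentration, confirmation concentration) in ng/mL. A sample counts as truly positive when its LC-MS/MS value reaches the confirmation cut-off. The sample values are invented for illustration and are not the study's data.

```python
def cutoff_metrics(samples, screen_cutoff, confirm_cutoff):
    """samples: (screening_ng_per_ml, confirmation_ng_per_ml) pairs.

    A sample is screen-positive when the immunoassay value reaches
    screen_cutoff and truly positive when the LC-MS/MS value reaches
    confirm_cutoff.
    """
    tp = fp = tn = fn = 0
    for screen_conc, confirm_conc in samples:
        screen_pos = screen_conc >= screen_cutoff
        truly_pos = confirm_conc >= confirm_cutoff
        if screen_pos and truly_pos:
            tp += 1
        elif screen_pos:
            fp += 1
        elif truly_pos:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "fp_rate": ratio(fp, fp + tn),  # equals 1 - specificity
    }


# Invented (screening, confirmation) values in ng/mL, for illustration only.
samples = [(2500, 800), (2100, 100), (500, 50), (400, 20), (100, 0)]
print(cutoff_metrics(samples, screen_cutoff=300, confirm_cutoff=300))
print(cutoff_metrics(samples, screen_cutoff=2000, confirm_cutoff=300))
```

In this toy set, raising the screening cut-off while keeping the confirmation cut-off at 300 ng/mL moves weakly positive screens into the true-negative cell. Specificity rises and sensitivity is unchanged.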


2012
pp. 830-850
Author(s): Abhilash Alexander Miranda, Olivier Caelen, Gianluca Bontempi

This chapter presents a comprehensive scheme for the automated detection of colorectal polyps in computed tomography colonography (CTC), with particular emphasis on robust learning algorithms that differentiate polyps from non-polyp shapes. The authors' automated CTC scheme introduces two orientation-independent features that encode the shape characteristics. These features allow polyps and non-polyps to be classified with high accuracy, a low false positive rate and low computational cost, which makes the scheme suitable for colorectal cancer screening initiatives. Experiments were run with state-of-the-art machine learning algorithms, viz. lazy learning, support vector machines and naïve Bayes classifiers, on 58 CTC datasets. They reveal the robustness of the two features in detecting polyps with a diameter greater than 10 mm at 100% sensitivity. The respective false positive rates were 3.05, 3.47 and 0.71 per CTC dataset, at specificities above 99%. The results were validated using colonoscopy reports provided by expert radiologists.
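Lazy learning defers all computation to prediction time; its simplest form is the k-nearest-neighbour classifier. The Python sketch below classifies candidates described by two shape features and reports false positives per CTC dataset, the unit used above. The feature values are invented stand-ins for the chapter's orientation-independent features, and the sketch is a generic illustration, not the authors' pipeline.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training candidates closest to query.

    train: list of ((feature_1, feature_2), label) pairs. Nothing is learned
    in advance; distances are computed only when a prediction is requested.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def false_positives_per_dataset(predicted, actual, n_datasets):
    """Non-polyp candidates labelled 'polyp', averaged over CTC datasets."""
    fp = sum(1 for p, a in zip(predicted, actual) if p == "polyp" and a != "polyp")
    return fp / n_datasets


# Invented two-feature training candidates.
train = [((0.90, 0.80), "polyp"), ((0.85, 0.90), "polyp"), ((0.80, 0.85), "polyp"),
         ((0.10, 0.20), "non-polyp"), ((0.20, 0.10), "non-polyp"), ((0.15, 0.15), "non-polyp")]
print(knn_predict(train, (0.88, 0.82)))  # polyp
```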


2019
Vol 09 (03)
pp. e262-e267
Author(s): Henry Alexander Easley, Todd Michael Beste

Objectives To evaluate the diagnostic accuracy of a multivariable prediction model, the Shoulder Screen (Perigen, Inc.), and compare it with the American College of Obstetricians and Gynecologists (ACOG) guidelines to prevent harm from shoulder dystocia. Study Design The model was applied to two groups of 199 patients each who delivered during a 4-year period. One group experienced shoulder dystocia and the other group delivered without shoulder dystocia. The model's accuracy was analyzed. The performance of the model was compared with the ACOG guideline. Results The sensitivity, specificity, positive, and negative predictive values of the model were 23.1, 99.5, 97.9, and 56.4%, respectively. The sensitivity of the ACOG guideline was 10.1%. The false-positive rate of the model was 0.5%. The accuracy of the model was 61.3%. Conclusion A multivariable prediction model can predict shoulder dystocia and is more accurate than ACOG guidelines.
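A short Python check shows that the reported figures are internally consistent with the 199 + 199 design. The confusion-matrix counts are back-calculated from the stated sensitivity and specificity; they are our reconstruction and are not given in the abstract. Because the two groups are equal in size, prevalence is fixed at 50%. The predictive values therefore describe this case-control sample rather than a general delivery population.

```python
def back_calculate(sensitivity, specificity, n_cases, n_controls):
    """Rebuild confusion-matrix counts from rates and group sizes."""
    tp = round(sensitivity * n_cases)
    tn = round(specificity * n_controls)
    fn, fp = n_cases - tp, n_controls - tn
    return {
        "counts": (tp, fp, tn, fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (n_cases + n_controls),
        "fpr": fp / (fp + tn),
    }


r = back_calculate(0.231, 0.995, n_cases=199, n_controls=199)
print(r["counts"], {k: round(v * 100, 1) for k, v in r.items() if k != "counts"})
```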


Author(s): Harikrishna Mulam, Malini Mudigonda

Much research is under way on classifying eye movements from electrooculography signals and using them to control human–computer interface systems. This article introduces a new model for recognizing various eye movements from electrooculography signals with the help of empirical mean curve decomposition and multiwavelet transformation. In addition, a principal component analysis algorithm is adopted to reduce the dimension of the electrooculography signals. The dimensionally reduced decomposed signal is then passed to a neural network classifier that classifies the electrooculography signals. The weights of this network are fine-tuned with the Levenberg–Marquardt algorithm. Finally, the proposed method is compared with existing methods and is observed to perform better in terms of accuracy, sensitivity, specificity, precision, false positive rate, false negative rate, negative predictive value, false discovery rate, F1 score and Matthews correlation coefficient.
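All of the evaluation measures listed above derive from the four cells of a binary confusion matrix. The Python sketch below computes them for one class. For multi-class eye-movement recognition they would typically be computed one-vs-rest per class and then averaged; that is an assumption about the setup, not something the abstract states. The counts in the usage line are invented.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),        # recall, true positive rate
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),          # positive predictive value
        "fpr": fp / (fp + tn),                # 1 - specificity
        "fnr": fn / (fn + tp),                # 1 - sensitivity
        "npv": tn / (tn + fn),
        "fdr": fp / (fp + tp),                # 1 - precision
        "f1": 2 * tp / (2 * tp + fp + fn),
        # Matthews correlation coefficient; 0 by convention if a margin is empty.
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


print(confusion_metrics(tp=40, fp=10, tn=45, fn=5))
```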


2021
Vol 2021
pp. 1-13
Author(s): Yizhen Sun, Jianjiang Yu, Jianwei Tian, Zhongwei Chen, Weiping Wang, ...

Security issues related to the Internet of Things (IoT) have attracted much attention in many fields in recent years. One important problem in IoT security is recognizing the type of each IoT device, on the basis of which different strategies can be designed to enhance the security of IoT applications. However, existing IoT device recognition approaches rarely consider traffic attacks. Such attacks may change traffic patterns and consequently reduce the accuracy with which different IoT devices are recognized. In this work, we first verify experimentally that traffic attacks do decrease the recognition accuracy of existing IoT device recognition approaches. We then propose an approach called IoT-IE that combines the information entropy of different traffic features to detect traffic anomalies. We further enhance the robustness of IoT device recognition by ignoring the abnormal traffic that our approach detects. Experimental evaluations show that IoT-IE can effectively detect abnormal behaviors of IoT devices in traffic under eight different types of attacks, achieving a high accuracy of 0.977 and a low false positive rate of 0.011. It also achieves an accuracy of 0.969 in a multiclassification experiment with 7 different types of attacks.
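The entropy idea can be sketched in Python as a generic detector. It computes the Shannon entropy of each traffic feature over a window of packets and flags the window when those entropies drift from an attack-free baseline. The feature names, the summed-deviation score and the tolerance are hypothetical choices. They are not IoT-IE's actual feature set or combination rule.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of values."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def window_entropies(packets, features):
    """Entropy of each traffic feature over one time window of packets."""
    return {f: shannon_entropy([p[f] for p in packets]) for f in features}


def is_anomalous(packets, baseline, tolerance):
    """Flag a window whose feature entropies deviate from the baseline.

    baseline maps feature name -> entropy observed in attack-free traffic.
    The score is the summed absolute deviation (a hypothetical rule).
    """
    current = window_entropies(packets, baseline)
    score = sum(abs(current[f] - baseline[f]) for f in baseline)
    return score > tolerance, score


# Invented traffic: a device that always talks to port 443, then a port scan.
normal = [{"dst_port": 443, "proto": "tcp"} for _ in range(8)]
scan = [{"dst_port": port, "proto": "tcp"} for port in range(1, 9)]
baseline = window_entropies(normal, ["dst_port", "proto"])
print(is_anomalous(scan, baseline, tolerance=1.0))  # (True, 3.0)
```

A scan spreads traffic over many destination ports, so the entropy of `dst_port` jumps. Other attacks, such as a flood towards one target, can instead lower the entropy of some features, which is why deviations in either direction are counted.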


2021
Vol 2021
pp. 1-7
Author(s): Wanaporn Burivong, Thanatorn Sricharoen, Apichart Thachang, Sunsiree Soodchuen, Panitpong Maroongroge, ...

Objective. The purpose of this study is to compare early radiologic diagnosis of pulmonary infection by serial chest radiography (chest film) and by single chest computed tomography (CT chest) in the first seven days of febrile neutropenia. Methods. This study included 78 patients with hematologic malignancies who developed 107 episodes of febrile neutropenia from January 2012 to October 2017 and had a chest film performed within the first seven days. Demographic and radiographic data were retrospectively reviewed. Three radiologists independently and blindly evaluated the chest films and CT chests. The sensitivity and specificity of chest film, and its correlation with absolute neutrophil count, were assessed. Results. A total of 222 chest films were performed during this period, revealing radiographically active pulmonary infection in thirty-nine episodes (36.4%). Pulmonary infection was diagnosed clinically in 44.8% (48/107) of episodes. Sensitivity, specificity, positive predictive value, and negative predictive value of serial chest film in the early radiologic diagnosis of pulmonary infection are 50%, 74%, 61%, and 64%, respectively. The false-positive rate was 14%, and the false-negative rate was 22%. For single CT chest examinations, twenty-six studies were assessed, and 42.3% were indicative of radiographically active pulmonary infection. Sensitivity, specificity, positive predictive value, and negative predictive value of CT chest in the early radiologic diagnosis of pulmonary infection are 91%, 40%, 53%, and 86%, respectively. The false-positive rate was 60%. The absolute neutrophil count was not useful for predicting radiographically active pulmonary infection. Conclusion. Serial chest film for early radiologic diagnosis of pulmonary infection within the first seven days of febrile neutropenia has lower sensitivity but higher specificity than a single CT chest.
Conversely, CT chest may have a higher sensitivity for detecting early pulmonary infection, but it also has a higher rate of false positives.


2020
Vol 38 (4_suppl)
pp. 288-288
Author(s): Takeyuki Wada, Takaki Yoshikawa, Ayako Kamiya, Keichi Date, Tsutomu Hayashi, ...

Background: D2 surgery is required for clinical T1 gastric cancer with nodal swelling; however, D2 carries a higher risk of morbidity than D1/D1+. Moreover, a previous study demonstrated that the false positive rate of nodal diagnosis in clinical T1 disease was very high. To select the optimal surgery more reliably, we explored risk factors for false positivity in clinical T1 disease. Methods: Patients who underwent radical gastrectomy for clinical T1 gastric cancer between April 2015 and June 2019 were enrolled. Accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for nodal diagnosis were retrospectively investigated. Risk factors for false positivity were also analyzed using the following factors: age, sex, histological type, tumor size, tumor depth, location, tumor type, presence of ulcer, and timing of CT. Timing of CT distinguished two groups. The delayed CT group underwent primary endoscopic submucosal dissection (ESD) that resulted in non-curative resection and then received CT before proceeding to surgery. The primary CT group received CT before primary surgery or before non-curative ESD. Results: A total of 679 patients were examined in the present study. The accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were 83.5% (567/679), 14.3% (13/91), 94.2% (554/588), 27.7% (13/47), and 87.7% (554/632), respectively. The false positive rate was 72.3% (34/47). In univariate analysis, differentiated tumor (p = 0.012) and delayed CT (p < 0.001) were associated with false positivity. Multivariate analysis revealed that delayed CT (OR, 4.534; p < 0.001) was the sole significant risk factor for false positivity. The false positive rate was 100% (13/13) in the delayed CT group and 61.8% (21/34) in the primary CT group (p = 0.009). Conclusions: The false positive rate was high in clinical T1 disease, especially when patients received delayed CT after non-curative ESD.
D2 surgery would be unnecessary even if nodal swelling was detected on CT after non-curative ESD.
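A brief Python sketch reproduces the reported percentages from the four confusion-matrix counts. TP = 13, FP = 34 and TN = 554 appear in the abstract. FN = 78 is derived here as 632 - 554, which equals 91 - 13. On our reading, the reported 72.3% false positive rate equals FP/(TP + FP), the false-positive share among CT-positive patients. This quantity is often called the false discovery rate. The conventional definition, FP/(FP + TN), gives about 5.8% on the same counts.

```python
def nodal_diagnosis_metrics(tp, fp, tn, fn):
    """Diagnostic accuracy measures for CT nodal diagnosis from raw counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Share of CT-positive patients who were node-negative (1 - PPV);
        # matches the reported 72.3%, often called the false discovery rate.
        "fp_among_positives": fp / (tp + fp),
        # Conventional false positive rate, FP / (FP + TN) = 1 - specificity.
        "fpr": fp / (fp + tn),
    }


m = nodal_diagnosis_metrics(tp=13, fp=34, tn=554, fn=78)
print({k: round(v * 100, 1) for k, v in m.items()})
```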


2021
Vol 2 (2)
pp. 40-47
Author(s): Sunil Kumar, Vaibhav Bhatnagar

Machine learning (ML) is one of the most active fields and one of the main technologies for realizing artificial intelligence (AI). The complexity of ML algorithms makes it difficult to predict which algorithm is best for a given problem. ML offers many complex algorithms for finding regression trends, so choosing the appropriate method for establishing the association between variables is very difficult. We therefore review the different types of regression used in machine learning. There are six main types of regression model: linear, logistic, polynomial, ridge, Bayesian linear and lasso. This paper gives an overview of these regression models and compares their suitability for machine learning. Data analysis requires establishing associations among the many variables in a data set, because such associations are essential for forecasting and for data exploration. Regression analysis is one procedure for establishing these associations. This paper focuses mainly on the different regression analysis models and how they can be applied to different data sets in machine learning. Selecting the right model is the most challenging task, and these models are therefore considered thoroughly in this study. When these models are applied appropriately to accurate data sets, data exploration and forecasting can produce the most precise results.
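Ridge and lasso differ from ordinary linear regression only in the penalty added to the squared error. With a single predictor, all three have closed forms, which makes the contrast easy to see. The Python sketch below assumes the objective Σ(y - b0 - b·x)² + penalty, with the intercept b0 left unpenalized; λ = 0 recovers ordinary least squares. It covers only three of the six model types reviewed.

```python
import math


def fit_single_predictor(xs, ys, penalty="none", lam=0.0):
    """Fit y ~ b0 + b*x for one predictor; returns (b0, b).

    Objective: sum((y - b0 - b*x)**2) + P(b), where P(b) = lam*b**2 for ridge,
    lam*abs(b) for lasso and 0 for ordinary least squares. b0 is not penalized.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if penalty == "none":
        b = sxy / sxx
    elif penalty == "ridge":
        b = sxy / (sxx + lam)  # shrinks the slope towards 0, never exactly 0
    elif penalty == "lasso":
        # Soft-thresholding: the slope becomes exactly 0 once |sxy| <= lam / 2.
        b = math.copysign(max(abs(sxy) - lam / 2, 0.0), sxy) / sxx
    else:
        raise ValueError(f"unknown penalty: {penalty}")
    return y_bar - b * x_bar, b


xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]
print(fit_single_predictor(xs, ys))                 # (0.0, 2.0)
print(fit_single_predictor(xs, ys, "ridge", 5.0))   # (2.5, 1.0)
print(fit_single_predictor(xs, ys, "lasso", 20.0))  # (5.0, 0.0)
```

The last line shows the property that makes lasso useful for variable selection. A large enough penalty drives a coefficient exactly to zero, whereas ridge only shrinks it.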

