Gene Selection using a High-Dimensional Regression Model with Microarrays in Cancer Prognostic Studies

2012 ◽  
Vol 11 ◽  
pp. CIN.S9048 ◽  
Author(s):  
Shuhei Kaneko ◽  
Akihiro Hirakawa ◽  
Chikuma Hamada

Identifying genes associated with patient survival from gene expression data is an ongoing problem in cancer prognostic studies using microarrays; such genes could be used to achieve more accurate prognoses. The least absolute shrinkage and selection operator (lasso) is often used for gene selection and parameter estimation in high-dimensional microarray data. The lasso shrinks some of the coefficients to zero, and the amount of shrinkage is controlled by a tuning parameter that is often chosen by cross-validation. The model chosen by cross-validation contains many false positives, that is, selected genes whose true coefficients are zero. We propose a method for estimating the false positive rate (FPR) for lasso estimates in a high-dimensional Cox model. We performed a simulation study to examine the precision of the FPR estimate obtained by the proposed method. We applied the proposed method to real data and illustrated the identification of false positive genes.
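The shrinkage behaviour described above can be illustrated with a minimal sketch. It is not the authors' method: it assumes a linear model with an orthonormal design, where the lasso solution reduces to soft-thresholding of the least-squares estimates, rather than the high-dimensional Cox model of the paper. The coefficient values and the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Lasso estimate of one coefficient under an orthonormal design (assumption)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Apply soft-thresholding to every least-squares coefficient."""
    return [soft_threshold(z, lam) for z in ols_coefs]


def count_selected(coefs):
    """Number of coefficients that remain non-zero, i.e. selected genes."""
    return sum(1 for c in coefs if c != 0.0)


# Hypothetical least-squares estimates for six genes (illustrative only).
ols = [2.5, -1.8, 0.4, -0.3, 0.1, 0.05]

if __name__ == "__main__":
    # A larger tuning parameter sets more coefficients exactly to zero.
    for lam in (0.0, 0.2, 0.5, 2.0):
        fit = lasso_orthonormal(ols, lam)
        print(lam, count_selected(fit), fit)
```

In this toy setting a small tuning parameter keeps weak coefficients in the model, which is the kind of over-selection the abstract attributes to cross-validated lasso fits.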

2018 ◽  
Author(s):  
Cox Lwaka Tamba ◽  
Yuan-Ming Zhang

Abstract Background: Recent developments in technology result in the generation of big data. In genome-wide association studies (GWAS), we can get tens of millions of SNPs that need to be tested for association with a trait of interest. Indeed, this poses a great computational challenge. There is a need for developing fast algorithms in GWAS methodologies. These algorithms must ensure high power in QTN detection, high accuracy in QTN estimation and low false positive rate. Results: Here, we accelerated the mrMLM algorithm by using the GEMMA idea, matrix transformations and identities. The target functions and derivatives in vector/matrix forms for each marker scanning are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. All potentially associated QTNs with P-values ≤ 0.01 are evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes. We call the algorithm FASTmrMLM. Numerical simulation studies and real data analysis validated FASTmrMLM. FASTmrMLM reduces the running time of mrMLM by more than 50%. FASTmrMLM also shows high statistical power in QTN detection, high accuracy in QTN estimation and low false positive rate as compared to GEMMA, FarmCPU and mrMLM. Real data analysis shows that FASTmrMLM was able to detect more previously reported genes than all the other methods: GEMMA/EMMA, FarmCPU and mrMLM. Conclusions: FASTmrMLM is a fast and reliable algorithm in multi-locus GWAS and ensures high statistical power, high accuracy of estimates and low false positive rate. Author Summary: The current developments in technology result in the generation of a vast amount of data. In genome-wide association studies, we can get tens of millions of markers that need to be tested for association with a trait of interest. Due to the computational challenge faced, we developed a fast algorithm for genome-wide association studies. Our approach is a two-stage method.
In the first step, we used matrix transformations and identities to quicken the testing of each random marker effect. The target functions and derivatives which are in vector/matrix forms for each marker scanning are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. In the second step, we selected all potentially associated SNPs and evaluated them in a multi-locus model. From simulation studies, our algorithm significantly reduces the computing time. The new method also shows high statistical power in detecting significant markers, high accuracy in marker effect estimation and low false positive rate. We also used the new method to identify relevant genes in real data analysis. We recommend our approach as a fast and reliable method for carrying out a multi-locus genome-wide association study.


Author(s):  
Srinivas Gutta ◽  
Ibrahim F. Imam ◽  
Harry Wechsler

Hand gestures are a natural form of communication among people, yet human-computer interaction is still largely limited to mouse movements. The use of hand gestures in the field of human-computer interaction has attracted renewed interest in the past several years. Special glove-based devices have been developed to analyze finger and hand motion and use them to manipulate and explore virtual worlds. To further enrich the naturalness of the interaction, different computer vision-based techniques have been developed. At the same time, the need for more efficient systems has resulted in new gesture recognition approaches. In this paper we present a hybrid intelligent system for hand gesture recognition. The hybrid approach consists of an ensemble of connectionist networks (radial basis functions, RBF) and inductive decision trees (AQDT). Cross Validation (CV) experimental results yield a false negative rate of 1.7% and a false positive rate of 1% when the evaluation takes place on a database of 150 images corresponding to 15 gestures of 5 subjects. In order to assess the robustness of the system, the vocabulary of gestures was increased from 15 to 25 and the size of the database from 150 to 750 images, now corresponding to 15 subjects. In this setting, Cross Validation (CV) experimental results yield a false negative rate of 3.6% and a false positive rate of 1.8%. The benefits of our hybrid architecture include (i) robustness via query by consensus as provided by ensembles of networks when facing the inherent variability of the image formation and data acquisition process, (ii) classifications made using decision trees, (iii) flexible and adaptive thresholds as opposed to ad hoc and hard thresholds and (iv) interpretability of the way classification and retrieval is eventually achieved.


2015 ◽  
Author(s):  
David M Rocke ◽  
Luyao Ruan ◽  
Yilun Zhang ◽  
J. Jared Gossett ◽  
Blythe Durbin-Johnson ◽  
...  

Motivation: An important property of a valid method for testing for differential expression is that the false positive rate should at least roughly correspond to the p-value cutoff, so that if 10,000 genes are tested at a p-value cutoff of 10^-4, and if all the null hypotheses are true, then there should be only about 1 gene declared to be significantly differentially expressed. We tested this by resampling from existing RNA-Seq data sets and also by matched negative binomial simulations. Results: Methods we examined, which rely strongly on a negative binomial model, such as edgeR, DESeq, and DESeq2, show large numbers of false positives in both the resampled real-data case and in the simulated negative binomial case. This also occurs with a negative binomial generalized linear model function in R. Methods that use only the variance function, such as limma-voom, do not show excessive false positives, as is also the case with a variance stabilizing transformation followed by linear model analysis with limma. The excess false positives are likely caused by apparently small biases in estimation of negative binomial dispersion and, perhaps surprisingly, occur mostly when the mean and/or the dispersion is high, rather than for low-count genes.
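The calibration argument in the Motivation can be checked with a small sketch. It assumes an idealized, perfectly calibrated test whose p-values are independent and uniform on [0, 1] when every null hypothesis is true; it does not reproduce the resampling or negative binomial simulations of the study. The function names are hypothetical.

```python
import random


def expected_false_positives(n_tests, cutoff):
    """Expected number of rejections when every null hypothesis is true."""
    return n_tests * cutoff


def simulate_false_positives(n_tests, cutoff, n_reps, seed=0):
    """Average number of null p-values below the cutoff over n_reps replicates.

    Assumes p-values are independent and uniform on [0, 1], which holds for a
    well-calibrated test under the null hypothesis.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_reps):
        total += sum(1 for _ in range(n_tests) if rng.random() < cutoff)
    return total / n_reps


if __name__ == "__main__":
    # 10,000 genes tested at a cutoff of 1e-4: about one false positive expected.
    print(expected_false_positives(10_000, 1e-4))
    print(simulate_false_positives(10_000, 1e-4, n_reps=200))
```

A method whose simulated average is far above the expected value under a true null would show the excess false positives that the abstract reports for the negative binomial based methods.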


Author(s):  
ZIQIANG SHI ◽  
BOYANG GAO ◽  
TIERAN ZHENG ◽  
JIQING HAN

In this paper, a novel method based on audio features (recognition of pornographic sounds) is proposed to detect adult video sequences automatically; it may serve as a verification step, a supplementary method or an independent detector. A feature analysis addressing the specific characteristics of erotic sound is given. Building on popular audio features, histograms and contours are introduced as new sets of features. At the same time, because of the complexity of out-of-class data, a general framework called in-class clustering is proposed, which selects the most representative subclass for training and classification. All these efforts increase the recall rate and decrease the false positive rate. Experiments on real data from the Internet indicate that the proposed method yields superior performance, achieving an 89.17% recall rate and a 10.78% false positive rate.


2002 ◽  
Vol 41 (01) ◽  
pp. 37-41 ◽  
Author(s):  
S. Shung-Shung ◽  
S. Yu-Chien ◽  
Y. Mei-Due ◽  
W. Hwei-Chung ◽  
A. Kao

Summary Aim: Even with careful observation, the overall false-positive rate of laparotomy remains 10-15% when acute appendicitis is suspected. Therefore, the clinical efficacy of the Tc-99m HMPAO labeled leukocyte (TC-WBC) scan for the diagnosis of acute appendicitis in patients presenting with atypical clinical findings is assessed. Patients and Methods: Eighty patients presenting with acute abdominal pain and possible acute appendicitis but atypical findings were included in this study. After intravenous injection of TC-WBC, serial anterior abdominal/pelvic images at 30, 60, 120 and 240 min with 800k counts were obtained with a gamma camera. Any abnormal localization of radioactivity in the right lower quadrant of the abdomen, equal to or greater than bone marrow activity, was considered a positive scan. Results: 36 out of 49 patients showing positive TC-WBC scans received appendectomy. They all proved to have positive pathological findings. Five positive TC-WBC scans were not related to acute appendicitis but were due to other pathological lesions. Eight patients were not operated on, and clinical follow-up after one month revealed no acute abdominal condition. Three of 31 patients with negative TC-WBC scans received appendectomy. They also presented positive pathological findings. The remaining 28 patients did not receive operations and showed no evidence of appendicitis after at least one month of follow-up. The overall sensitivity, specificity, accuracy, positive and negative predictive values for the TC-WBC scan to diagnose acute appendicitis were 92, 78, 86, 82, and 90%, respectively. Conclusion: The TC-WBC scan provides a rapid and highly accurate method for the diagnosis of acute appendicitis in patients with equivocal clinical examination. It proved useful in reducing the false-positive rate of laparotomy and in shortening the time necessary for clinical observation.


1993 ◽  
Vol 32 (02) ◽  
pp. 175-179 ◽  
Author(s):  
B. Brambati ◽  
T. Chard ◽  
J. G. Grudzinskas ◽  
M. C. M. Macintosh

Abstract: The analysis of the clinical efficiency of a biochemical parameter in the prediction of chromosome anomalies is described, using a database of 475 cases including 30 abnormalities. A comparison was made of two different approaches to the statistical analysis: the use of Gaussian frequency distributions and likelihood ratios, and logistic regression. Both methods computed that for a 5% false-positive rate approximately 60% of anomalies are detected on the basis of maternal age and serum PAPP-A. The logistic regression analysis is appropriate where the outcome variable (chromosome anomaly) is binary and the detection rates refer to the original data only. The likelihood ratio method is used to predict the outcome in the general population. The latter method depends on the data or some transformation of the data fitting a known frequency distribution (Gaussian in this case). The precision of the predicted detection rates is limited by the small sample of abnormals (30 cases). Varying the means and standard deviations (to the limits of their 95% confidence intervals) of the fitted log Gaussian distributions resulted in a detection rate varying between 42% and 79% for a 5% false-positive rate. Thus, although the likelihood ratio method is potentially the better method in determining the usefulness of a test in the general population, larger numbers of abnormal cases are required to stabilise the means and standard deviations of the fitted log Gaussian distributions.
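The Gaussian likelihood ratio approach can be sketched for a single marker as follows. This is an illustration under stated assumptions, not the analysis in the paper: the distribution parameters are hypothetical, the marker is treated as already log-transformed and standardized, low values are assumed to indicate an anomaly, and maternal age is ignored.

```python
from statistics import NormalDist


def detection_rate_at_fpr(unaffected, affected, fpr):
    """Return the screening threshold and detection rate for a fixed false-positive rate.

    Assumes a result is screen-positive when the marker falls below the threshold.
    """
    threshold = unaffected.inv_cdf(fpr)      # fpr of unaffected cases fall below this
    detection = affected.cdf(threshold)      # share of affected cases below it
    return threshold, detection


def likelihood_ratio(x, unaffected, affected):
    """Ratio of the affected to the unaffected density at marker value x."""
    return affected.pdf(x) / unaffected.pdf(x)


# Hypothetical fitted distributions on a standardized log scale.
unaffected = NormalDist(mu=0.0, sigma=1.0)
affected = NormalDist(mu=-2.0, sigma=1.0)

if __name__ == "__main__":
    threshold, detection = detection_rate_at_fpr(unaffected, affected, 0.05)
    print(threshold, detection)
    # Shifting the affected mean shows how sensitive the detection rate is
    # to the fitted parameters when few abnormal cases are available.
    for mu in (-1.5, -2.0, -2.5):
        print(mu, detection_rate_at_fpr(unaffected, NormalDist(mu, 1.0), 0.05)[1])
```

Because the detection rate depends directly on the fitted means and standard deviations, small changes in those estimates move it considerably, which is consistent with the wide 42% to 79% range reported in the abstract.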


2019 ◽  
Author(s):  
Amanda Kvarven ◽  
Eirik Strømland ◽  
Magnus Johannesson

Andrews & Kasy (2019) propose an approach for adjusting effect sizes in meta-analysis for publication bias. We use the Andrews-Kasy estimator to adjust the result of 15 meta-analyses and compare the adjusted results to 15 large-scale multiple labs replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes, which do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false positive rate of between 57% and 100%.


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1894
Author(s):  
Chun Guo ◽  
Zihua Song ◽  
Yuan Ping ◽  
Guowei Shen ◽  
Yuhei Cui ◽  
...  

Remote Access Trojans (RATs) are among the most serious security threats that organizations face today. At present, the two major RAT detection approaches are host-based and network-based detection. To combine their complementary strengths, this article proposes a phased RATs detection method by combining double-side features (PRATD). In PRATD, both host-side and network-side features are combined to build detection models, which helps to distinguish RATs from benign programs because RATs not only generate traffic on the network but also leave traces on the host at run time. In addition, PRATD trains two different detection models for the two runtime states of RATs to improve the True Positive Rate (TPR). Experiments on the network and host records collected from five kinds of benign programs and 20 well-known RATs show that PRATD can effectively detect RATs: it achieves a TPR as high as 93.609% with a False Positive Rate (FPR) as low as 0.407% for known RATs, and a TPR of 81.928% with an FPR of 0.185% for unknown RATs, which suggests it is a competitive candidate for RAT detection.


2020 ◽  
Vol 154 (Supplement_1) ◽  
pp. S5-S5
Author(s):  
Ridin Balakrishnan ◽  
Daniel Casa ◽  
Morayma Reyes Gil

Abstract The diagnostic approach for ruling out suspected acute pulmonary embolism (PE) in the ED setting includes several tests: ultrasound, plasma d-dimer assays, ventilation-perfusion scans and computed tomography pulmonary angiography (CTPA). Importantly, a pretest probability scoring algorithm is highly recommended to triage high risk cases while also preventing unnecessary testing and harm to low/moderate risk patients. The d-dimer assay (both ELISA and immunoturbidometric) has been shown to be extremely sensitive to rule out PE in conjunction with clinical probability. In particular, d-dimer testing is recommended for low/moderate risk patients, in whom a negative d-dimer essentially rules out PE, sparing these patients from CTPA radiation exposure, longer hospital stay and anticoagulation. However, a nonspecific increase in fibrin-degradation related products has been seen with increasing age, resulting in a higher false positive rate in the older population. This study analyzed patient visits to the ED of a large academic institution for five years and looked at the relationship between d-dimer values, age and CTPA results to better understand the value of age-adjusted d-dimer cut-offs in ruling out PE in the older population. A total of 7660 ED visits had a CTPA done to rule out PE; out of which 1875 cases had a d-dimer done in conjunction with the CT and 5875 had only CTPA done. Out of the 1875 cases, 1591 had positive d-dimer results (>0.50 µg/ml (FEU)), of which 910 (57%) were from patients older than or equal to fifty years of age. In these older patients, 779 (86%) had a negative CT result. The following were the statistical measures of the d-dimer test before adjusting for age: sensitivity (98%), specificity (12%); negative predictive value (98%) and false positive rate (88%). 
After adjusting for age in people older than 50 years (d-dimer cut off = age/100), 138 patients eventually turned out to be d-dimer negative and every case but four had a CT result that was also negative for a PE. The four cases included two non-diagnostic results and two with subacute/chronic/subsegmental PE on imaging. None of these four patients were prescribed anticoagulation. The statistical measures of the d-dimer test after adjusting for age showed: sensitivity (96%), specificity (20%); negative predictive value (98%) and a decrease in the false positive rate (80%). Therefore, imaging could have been potentially avoided in 138/779 (18%) of the patients who were part of this older population and had eventual negative or not clinically significant findings on CTPA if age-adjusted d-dimers were used. This data very strongly advocates for the clinical usefulness of an age-adjusted cut-off of d-dimer to rule out PE.
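The percentages quoted in this abstract can be re-derived from the reported counts with a short sketch. The helper names are hypothetical, and the cut-off function encodes only the rule as stated in the abstract (age/100 in µg/ml FEU for patients older than 50, otherwise the standard 0.50); it is an illustration, not clinical guidance.

```python
def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


def false_positive_rate(specificity_pct):
    """False positive rate as the complement of specificity, in percent."""
    return 100 - specificity_pct


def age_adjusted_cutoff(age_years, base_cutoff=0.50):
    """Cut-off in µg/ml (FEU): age/100 above 50 years, else the standard cut-off."""
    return age_years / 100 if age_years > 50 else base_cutoff


if __name__ == "__main__":
    print(pct(910, 1591))           # share of positive d-dimers from patients aged 50 or over
    print(pct(779, 910))            # share of those older patients with a negative CT
    print(pct(138, 779))            # imaging potentially avoided with the adjusted cut-off
    print(false_positive_rate(12))  # before age adjustment
    print(false_positive_rate(20))  # after age adjustment
    print(age_adjusted_cutoff(70))
```

The reported false positive rates of 88% and 80% are simply the complements of the reported specificities of 12% and 20%.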

