Feature selection by higher criticism thresholding achieves the optimal phase diagram

Author(s):  
David Donoho ◽  
Jiashun Jin

We consider two-class linear classification in a high-dimensional, small-sample-size setting. Only a small fraction of the features are useful, these being unknown to us, and each useful feature contributes weakly to the classification decision. This was called the rare/weak (RW) model in our previous study (Donoho, D. & Jin, J. 2008 Proc. Natl Acad. Sci. USA 105, 14790–14795). We select features by thresholding feature Z-scores. The threshold is set by higher criticism (HC). For 1 ≤ i ≤ N, let π_i denote the p-value associated with the ith Z-score and π_(i) denote the ith order statistic of the collection of p-values. The HC threshold (HCT) is the order statistic of the Z-score corresponding to the index i maximizing (i/N − π_(i)) / √((i/N)(1 − i/N)). The ideal threshold optimizes the classification error. In that previous study, we showed that HCT was numerically close to the ideal threshold. We formalize an asymptotic framework for studying the RW model, considering a sequence of problems with increasingly many features and relatively fewer observations. We show that, along this sequence, the limiting performance of ideal HCT is essentially just as good as the limiting performance of ideal thresholding. Our results describe a two-dimensional phase space, a diagram with coordinates quantifying ‘rare’ and ‘weak’ in the RW model. The phase space can be partitioned into two regions: one where ideal threshold classification is successful, and one where the features are so weak and so rare that it must fail. Surprisingly, the regions where ideal HCT succeeds and fails make exactly the same partition of the phase diagram. Other threshold methods, such as false (feature) discovery rate (FDR) threshold selection, are successful in a substantially smaller region of the phase space than either HCT or ideal thresholding. The FDR and local FDR of the ideal and HC threshold selectors have surprising phase diagrams, which are also described.
Results showing the asymptotic equivalence of HCT with ideal HCT can be found in a forthcoming paper (Donoho, D. & Jin, J. In preparation).
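
The HC threshold selection described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes two-sided p-values computed from standard normal Z-scores, and it uses the objective (i/N − π_(i)) / √((i/N)(1 − i/N)). The function name `hc_threshold` and the parameter `max_fraction` (which restricts the search to the smallest p-values) are hypothetical names introduced here for illustration.

```python
import math


def hc_threshold(z_scores, max_fraction=1.0):
    """Return (threshold, index, hc_value) for a list of feature Z-scores.

    Assumption: each Z-score is N(0, 1) under the null, so the two-sided
    p-value is P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    """
    n = len(z_scores)
    # Sorting |z| in decreasing order sorts the p-values in increasing order.
    abs_z = sorted((abs(z) for z in z_scores), reverse=True)
    p_sorted = [math.erfc(a / math.sqrt(2.0)) for a in abs_z]

    # i = n is skipped because the denominator is zero there.
    last = min(n - 1, max(1, int(max_fraction * n)))
    best_i, best_hc = 1, -math.inf
    for i in range(1, last + 1):
        frac = i / n
        hc = (frac - p_sorted[i - 1]) / math.sqrt(frac * (1.0 - frac))
        if hc > best_hc:
            best_i, best_hc = i, hc

    # The threshold is the |Z|-score at the maximizing index.
    return abs_z[best_i - 1], best_i, best_hc


# Toy data: 3 strong features among 20, the rest close to zero.
z = [5.0, -4.5, 4.8] + [0.05 * k for k in range(1, 18)]
threshold, index, hc_value = hc_threshold(z)
selected = [j for j, zj in enumerate(z) if abs(zj) >= threshold]
print(threshold, index, selected)
```

Features whose absolute Z-score is at least the returned threshold are kept. Restricting the search range with `max_fraction` is a common practical choice, but the value to use is not stated in the abstract.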

2011 ◽  
Vol 5 ◽  
pp. BCBCR.S6263 ◽  
Author(s):  
Olena Zakharchenko ◽  
Christina Greenwood ◽  
Louise Alldridge ◽  
Serhiy Souchelnytskyi

Proteomics is a highly informative approach for analyzing cancer-associated transformation in tissues. The main challenges in using tissue for proteomics studies are the small sample size and the difficulty of extracting and preserving proteins. The choice of a buffer compatible with proteomics applications is also a challenge. Here we describe a protocol optimized for the most efficient extraction of proteins from human breast tissue in a buffer compatible with two-dimensional gel electrophoresis (2D-GE). This protocol is based on mechanically assisted disintegration of tissues directly in the 2D-GE buffer. Our method is simple, robust and easy to apply in clinical practice. We demonstrate high-quality separation of proteins prepared according to the protocol reported here.


2013 ◽  
Vol 753-755 ◽  
pp. 3064-3067
Author(s):  
Ju Zhong ◽  
Ye Zi Sheng ◽  
Chun Li Lin ◽  
Nai Dong Cui

Double-direction two-dimensional Maximum Scatter Difference (2D2MSD), based on Maximum Scatter Difference (MSD), was proposed. It overcame the small sample size problem of LDA and yielded more concise data. On the Weizmann human action database, experimental results showed that the algorithm was fast, with an average recognition rate of 92% and a highest recognition rate of 100%.
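
To show why a scatter-difference criterion sidesteps the small sample size problem, here is a minimal sketch of the basic vector-based MSD criterion, not the 2D2MSD method of this paper, whose details are not given in the abstract. It maximizes wᵀ(S_b − c·S_w)w, so the within-class scatter S_w never has to be inverted. The function name `msd_projection` and the balance parameter `c` are assumptions made for illustration.

```python
import numpy as np


def msd_projection(X, y, c=1.0, n_components=1):
    """Return projection directions maximizing w^T (S_b - c * S_w) w.

    X: (n_samples, n_features) array, y: class labels.
    Unlike LDA, no inverse of S_w is needed, so a singular S_w
    (fewer samples than features) causes no difficulty.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_b = np.zeros((d, d))  # between-class scatter
    S_w = np.zeros((d, d))  # within-class scatter
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
        centered = Xc - mean_c
        S_w += centered.T @ centered
    # S_b - c * S_w is symmetric, so eigh applies; keep the top eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(S_b - c * S_w)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]


# Four samples in five dimensions: S_w is singular, yet the criterion is usable.
X = [[0, 0.0, 0, 0, 0], [0, 0.2, 0, 0, 0],
     [3, 0.0, 0, 0, 0], [3, 0.2, 0, 0, 0]]
y = [0, 0, 1, 1]
W = msd_projection(X, y)
print(W.ravel())
```

In this toy example the classes differ only along the first coordinate, so the leading direction aligns with that axis (up to sign).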


2020 ◽  
Vol 31 (1) ◽  
pp. 131-138
Author(s):  
Ronald Booij ◽  
Marcel van Straten ◽  
Andreas Wimmer ◽  
Ricardo P.J. Budde

Abstract
Objective: To assess the accuracy of a 3D camera for body contour detection in pediatric patient positioning in CT compared with routine manual positioning by radiographers.
Methods and materials: One hundred and ninety-one patients, with and without fixation aid, who underwent CT of the head, thorax, and/or abdomen on a scanner with manual table height selection and with table height suggestion by a 3D camera were retrospectively included. The ideal table height was defined as the position at which the scanner isocenter coincides with the patient’s isocenter. Table heights suggested by the camera and selected by the radiographer were compared with the ideal height.
Results: For pediatric patients without a fixation aid such as a baby cradle or vacuum cushion and positioned by radiographers, the median (interquartile range) absolute table height deviation in mm was 10.2 (16.8) for abdomen, 16.4 (16.6) for head, 4.1 (5.1) for thorax-abdomen, and 9.7 (9.7) for thorax CT scans. The deviation was smaller for the 3D camera: 3.1 (4.7) for abdomen, 3.9 (6.3) for head, 2.2 (4.3) for thorax-abdomen, and 4.8 (6.7) for thorax CT scans (p < 0.05 for all body parts combined).
Conclusion: A 3D camera for body contour detection allows for automated and more accurate pediatric patient positioning than manual positioning done by radiographers, resulting in overall significantly smaller deviations from the ideal table height. The 3D camera may also be useful in the positioning of patients with fixation aid; however, evaluation of possible improvements in positioning accuracy was limited by the small sample size.
Key Points
• A 3D camera for body contour detection allows for automated and accurate pediatric patient positioning in CT.
• A 3D camera outperformed radiographers in positioning pediatric patients without a fixation aid in CT.
• Positioning of pediatric patients with fixation aid was feasible using the 3D camera, but no definite conclusions were drawn regarding the positioning accuracy due to the small sample size.


2012 ◽  
Vol 542-543 ◽  
pp. 1343-1346
Author(s):  
Xing Zhu Liang ◽  
Yu E Lin ◽  
Jing Zhao Li

Unsupervised Discriminant Projection (UDP) is one of the most promising feature extraction methods. However, UDP suffers from the small sample size problem, and the optimal basis vectors obtained by UDP are nonorthogonal. In this paper, we present a new method called Two-dimensional Orthogonal Unsupervised Discriminant Projection (2DOUDP), which does not need to convert the image matrix into a high-dimensional image vector and does not suffer from the small sample size problem. To further improve the recognition performance, an orthogonal projection matrix is obtained by Gram–Schmidt orthogonalization. Experimental results on the ORL database indicate that the proposed 2DOUDP method achieves a better recognition rate than the UDP method and some other orthogonal feature extraction algorithms.
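
The Gram–Schmidt step mentioned in this abstract can be sketched generically as follows. This is a plain (modified) Gram–Schmidt routine under the assumption that the nonorthogonal basis vectors are given as lists of numbers. It is not the 2DOUDP procedure itself, and the function name `gram_schmidt` and the tolerance `tol` are hypothetical.

```python
import math


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize a list of vectors with modified Gram-Schmidt.

    Vectors that are (numerically) linear combinations of earlier ones
    are dropped, so the result may be shorter than the input.
    """
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for q in basis:
            # Remove the component of the running residual along q.
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


# Three nonorthogonal vectors plus one that depends on the first two.
vectors = [[1, 1, 0], [1, 0, 1], [2, 1, 1], [0, 1, 1]]
Q = gram_schmidt(vectors)
print(len(Q))
```

The third input vector is the sum of the first two, so it is discarded and three orthonormal vectors remain.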


Author(s):  
Stephen E. Fienberg ◽  
Jiashun Jin

We focus on the problem of multi-party data sharing in high-dimensional data settings where the number of measured features (or the dimension) p is frequently much larger than the number of subjects (or the sample size) n, the so-called p >> n scenario that has been the focus of much recent statistical research. Here, we consider data sharing for two interconnected problems in high-dimensional data analysis, namely feature selection and classification. We characterize the notions of "cautious", "regular", and "generous" data sharing in terms of their privacy-preserving implications for the parties and their share of data, with a focus on "feature privacy" rather than "sample privacy", though the violation of the former may lead to the latter. We evaluate the data sharing methods using the phase diagram from the statistical literature on multiplicity and Higher Criticism thresholding. In the two-dimensional phase space calibrated by the signal sparsity and signal strength, a phase diagram is a partition of the phase space and contains three distinguished regions, where we have no (feature-)privacy violation, relatively rare privacy violations, and an overwhelming amount of privacy violation.


1966 ◽  
Vol 25 ◽  
pp. 46-48 ◽  
Author(s):  
M. Lecar

“Dynamical mixing”, i.e. relaxation of a stellar phase space distribution through interaction with the mean gravitational field, is numerically investigated for a one-dimensional self-gravitating stellar gas. Qualitative results are presented in the form of a motion picture of the flow of phase points (representing homogeneous slabs of stars) in two-dimensional phase space.


Author(s):  
Conly L. Rieder ◽  
S. Bowser ◽  
R. Nowogrodzki ◽  
K. Ross ◽  
G. Sluder

Eggs have long been a favorite material for studying the mechanism of karyokinesis in vivo and in vitro. They can be obtained in great numbers and, when fertilized, divide synchronously over many cell cycles. However, they are not considered to be a practical system for ultrastructural studies on the mitotic apparatus (MA) for several reasons, the most obvious of which is that sectioning them is a formidable task: over 1000 ultra-thin sections need to be cut from a single 80-100 μm diameter egg and of these sections only a small percentage will contain the area or structure of interest. Thus it is difficult and time consuming to obtain reliable ultrastructural data concerning the MA of eggs; and when it is obtained it is necessarily based on a small sample size. We have recently developed a procedure which will facilitate many studies concerned with the ultrastructure of the MA in eggs. It is based on the availability of biological HVEMs and on the observation that 0.25 μm thick serial sections can be screened at high resolution for content (after mounting on slot grids and staining with uranyl and lead) by phase contrast light microscopy (LM; Figs 1-2).

