large p small n
Recently Published Documents

TOTAL DOCUMENTS: 27 (FIVE YEARS: 10)
H-INDEX: 7 (FIVE YEARS: 1)

Genes ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 1814
Author(s):  
Yuanyuan Han ◽  
Lan Huang ◽  
Fengfeng Zhou

Biological omics data such as transcriptomes and methylomes exhibit the inherent “large p small n” paradigm, i.e., the number of features is much larger than the number of samples. A feature selection (FS) algorithm selects a subset of the transcriptomic or methylomic biomarkers in order to build a better prediction model. The hidden patterns in the FS solution space make it challenging to achieve a feature subset with satisfactory prediction performance. Swarm intelligence (SI) algorithms mimic the target-searching behaviors of various animals and have demonstrated promising capabilities in selecting features with good machine learning performance. Our study revealed that different SI-based feature selection algorithms contribute complementary searching capabilities in the FS solution space, and that their collaboration generates a better feature subset than any individual SI feature selection algorithm. Nine SI-based feature selection algorithms were integrated to vote for the selected features, which were further refined by a dynamic recursive feature elimination framework. In most cases, the proposed Zoo algorithm outperformed the existing feature selection algorithms on transcriptomics and methylomics datasets.
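The vote-then-refine scheme can be sketched in a few lines. This is a hypothetical illustration, not the Zoo implementation: three simple filter rankers stand in for the nine SI algorithms, features kept by majority vote are then pruned by a crude stand-in for the dynamic recursive feature elimination step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 80, 300, 20                 # "large p small n": 300 features, 80 samples
X = rng.normal(size=(n, p))
y = X[:, 0] + X[:, 1] - X[:, 2] + 0.3 * rng.normal(size=n)  # features 0-2 carry signal

def rank_by_correlation(X, y):
    r = np.abs(np.corrcoef(X.T, y)[-1, :-1])
    return np.argsort(r)[::-1]

def rank_by_group_gap(X, y):
    hi = y > np.median(y)             # crude two-group split
    gap = np.abs(X[hi].mean(axis=0) - X[~hi].mean(axis=0))
    return np.argsort(gap)[::-1]

def rank_by_variance(X, y):
    return np.argsort(X.var(axis=0))[::-1]

# Each "selector" nominates its top-k features and casts one vote apiece.
votes = np.zeros(p, dtype=int)
for sel in (rank_by_correlation, rank_by_group_gap, rank_by_variance):
    votes[sel(X, y)[:k]] += 1
candidates = list(np.flatnonzero(votes >= 2))   # keep majority-voted features

# Stand-in for the recursive-elimination refinement: repeatedly drop the
# candidate least correlated with the outcome until 5 features remain.
corr = np.abs(np.corrcoef(X.T, y)[-1, :-1])
subset = list(candidates)
while len(subset) > 5:
    subset.remove(min(subset, key=lambda j: corr[j]))
```

The point of the ensemble is that rankers with different inductive biases rarely agree on noise features, so the vote filters them out before the more expensive refinement pass.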


2021 ◽  
Author(s):  
Yun Zhang ◽  
Hao Sun ◽  
Aishwary Mandava ◽  
Brian Aevermann ◽  
Tobias Kollmann ◽  
...  

We developed a novel analytic pipeline - FastMix - to integrate flow cytometry, bulk transcriptomics, and clinical covariates for statistical inference of cell type-specific gene expression signatures. FastMix addresses the “large p, small n” problem via a carefully designed linear mixed effects model (LMER), which is applicable to both cross-sectional and longitudinal studies. With a novel moment-based estimator, FastMix runs and converges much faster than competing methods for big data analytics. The pipeline also includes a cutting-edge flow cytometry data analysis method for identifying cell population proportions. Simulation studies showed that FastMix produced smaller type I/II errors and more accurate parameter estimates than competing methods. When applied to real transcriptomics and flow cytometry data in two vaccine studies, the FastMix-identified cell type-specific signatures were largely consistent with those obtained from single cell RNA-seq data, along with some unique findings of interest.
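A minimal sketch of the moment-based idea, assuming a balanced random-intercept model y_ij = x_ij'β + u_i + e_ij rather than FastMix's full LMER: fit OLS, estimate the variance components from within- and between-subject residual variation, then re-estimate β by GLS via the standard random-effects transformation. All numbers are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
m, q, p = 40, 5, 3                    # 40 subjects, 5 repeats each, 3 covariates
g = np.repeat(np.arange(m), q)        # subject index per observation
X = rng.normal(size=(m * q, p))
beta = np.array([1.0, -2.0, 0.5])
u = rng.normal(scale=1.5, size=m)     # random intercepts, var_u = 2.25
y = X @ beta + u[g] + rng.normal(scale=1.0, size=m * q)

# Step 1: OLS gives a consistent (but inefficient) estimate of beta.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b_ols

# Step 2: moment estimators of the variance components. Within-subject
# scatter identifies var_e; the variance of subject means identifies var_u.
rbar = np.array([resid[g == i].mean() for i in range(m)])
var_e = sum(((resid[g == i] - rbar[i]) ** 2).sum() for i in range(m)) / (m * (q - 1))
var_u = max(rbar.var(ddof=1) - var_e / q, 0.0)

# Step 3: GLS via the balanced random-effects transformation
# z* = z - lam * (subject mean of z), which whitens the compound-symmetric
# covariance, so OLS on the transformed data equals GLS.
lam = 1 - np.sqrt(var_e / (var_e + q * var_u))
ybar = np.array([y[g == i].mean() for i in range(m)])
Xbar = np.array([X[g == i].mean(axis=0) for i in range(m)])
b_gls, *_ = np.linalg.lstsq(X - lam * Xbar[g], y - lam * ybar[g], rcond=None)
```

No iterative likelihood maximization is needed, which is the source of the speed advantage claimed for moment-based estimation.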


Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1493
Author(s):  
Y-h. Taguchi ◽  
Turki Turki

The large p small n problem is a challenge for which no de facto standard method is available. In this study, we propose a tensor-decomposition (TD)-based unsupervised feature extraction (FE) formalism applied to multiomics datasets, in which the number of features exceeds 100,000 whereas the number of samples is as small as about 100, hence constituting a typical large p small n problem. On synthetic datasets, the proposed TD-based unsupervised FE outperformed three conventional supervised feature selection methods (random forest; categorical regression, also known as analysis of variance, or ANOVA; and penalized linear discriminant analysis) as well as two unsupervised methods (multiple non-negative matrix factorization and principal component analysis (PCA)-based unsupervised FE); on the multiomics datasets, it outperformed the four methods other than PCA-based unsupervised FE. The genes selected by TD-based unsupervised FE were enriched in genes known to be related to the tissues and transcription factors measured. TD-based unsupervised FE was thus demonstrated to be not only a superior feature selection method but also one that selects biologically reliable genes. To our knowledge, this is the first study in which TD-based unsupervised FE has been successfully applied to the integration of this variety of multiomics measurements.
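A bare-bones version of the TD-based FE idea, here reduced to one higher-order SVD step on a synthetic three-way tensor (the published method additionally attaches statistical significance to the loadings), might look like:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, k = 1000, 10, 4                 # 1000 features, 10 samples, 4 omics layers
T = rng.normal(size=(p, n, k))
# Plant a sample-wise trend into the first 50 features, shared across layers.
T[:50] += np.linspace(-2, 2, n)[None, :, None]

# Mode-1 (feature) unfolding followed by SVD: the basic HOSVD building block.
M = T.reshape(p, n * k)
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Unsupervised FE: rank features by the magnitude of their loading on the
# leading feature singular vector and keep the top 50.
selected = np.argsort(np.abs(U[:, 0]))[::-1][:50]
```

No labels are used anywhere, which is what makes the approach viable when samples are too few to train a supervised selector.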


2020 ◽  
Author(s):  
Y-h. Taguchi ◽  
Turki Turki

In this work, we extended the recently developed tensor decomposition (TD) based unsupervised feature extraction (FE) to a kernel-based method through a mathematical formulation. Subsequently, the kernel TD (KTD) based unsupervised FE was applied to two synthetic examples as well as real data sets, and the findings were compared with those obtained previously using the TD-based unsupervised FE approaches. The KTD-based unsupervised FE outperformed or performed comparably with the TD-based unsupervised FE in large p small n situations, i.e., those involving a limited number of samples with many variables. Moreover, the KTD-based unsupervised FE outperformed the TD-based unsupervised FE in situations that are not of the large p small n type. In general, although the kernel trick can help the TD-based unsupervised FE capture more variation, it may also introduce a wider range of problems. Given that the KTD-based unsupervised FE performs at least as well as the TD-based unsupervised FE on large p small n problems, it is expected to be applicable in the genomic science domain, which involves many large p small n problems and in which the TD-based unsupervised FE approach has already been applied effectively.
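The kernel trick at the heart of KTD can be illustrated with plain kernel PCA over the sample mode; this is only an analogy, not the paper's formulation, which kernelizes the tensor decomposition itself. The appeal in large p small n settings is that every computation happens in the n × n Gram matrix, never in the p-dimensional feature space.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 2000                       # large p small n: 2000 features, 30 samples
X = rng.normal(size=(n, p))

# RBF Gram matrix over samples; bandwidth set to the median squared distance
# (a common heuristic, used here as an illustrative choice).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq / (2 * np.median(sq)))

# Double centering, then eigendecomposition: kernel PCA in sample space.
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H
w, V = np.linalg.eigh(Kc)
idx = np.argsort(w)[::-1][:2]         # two leading components
scores = V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```

Swapping the kernel changes which nonlinear variation the components capture, which is the extra flexibility (and the extra tuning burden) the abstract alludes to.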


Biometrika ◽  
2020 ◽  
Author(s):  
J E Griffin ◽  
K G Łatuszyński ◽  
M F J Steel

Summary The availability of datasets with large numbers of variables is rapidly increasing. The effective application of Bayesian variable selection methods for regression with these datasets has proved difficult since available Markov chain Monte Carlo methods do not perform well in typical problem sizes of interest. We propose new adaptive Markov chain Monte Carlo algorithms to address this shortcoming. The adaptive design of these algorithms exploits the observation that in large-$p$, small-$n$ settings, the majority of the $p$ variables will be approximately uncorrelated a posteriori. The algorithms adaptively build suitable nonlocal proposals that result in moves with squared jumping distance significantly larger than standard methods. Their performance is studied empirically in high-dimensional problems and speed-ups of up to four orders of magnitude are observed.
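A non-adaptive baseline helps to see what the adaptive algorithms improve on. The sketch below runs single-flip Metropolis-Hastings over inclusion indicators with a Zellner g-prior marginal likelihood; the hyperparameters (g = 100, prior inclusion weight 0.1) and the synthetic data are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, g_prior, w_prior = 100, 50, 100.0, 0.1
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -2.0, 1.5]           # variables 0-2 form the true model
y = X @ beta + rng.normal(size=n)
Xc, yc = X - X.mean(axis=0), y - y.mean()

def log_score(gamma):
    """g-prior log marginal likelihood (up to a constant) + Bernoulli prior."""
    q = int(gamma.sum())
    if q == 0:
        r2 = 0.0
    else:
        b, *_ = np.linalg.lstsq(Xc[:, gamma], yc, rcond=None)
        r2 = 1.0 - ((yc - Xc[:, gamma] @ b) ** 2).sum() / (yc ** 2).sum()
    return (0.5 * (n - 1 - q) * np.log(1 + g_prior)
            - 0.5 * (n - 1) * np.log(1 + g_prior * (1 - r2))
            + q * np.log(w_prior / (1 - w_prior)))

# Single-flip Metropolis-Hastings over inclusion vectors. This is the kind of
# standard sampler whose proposals the adaptive algorithms redesign.
gamma = np.zeros(p, dtype=bool)
cur = log_score(gamma)
counts = np.zeros(p)
steps = 2000
for _ in range(steps):
    j = rng.integers(p)
    prop = gamma.copy()
    prop[j] = not prop[j]
    new = log_score(prop)
    if np.log(rng.random()) < new - cur:
        gamma, cur = prop, new
    counts += gamma
incl_prob = counts / steps
```

With p in the thousands, one-variable-at-a-time flips mix very slowly; exploiting approximate posterior independence to propose many coordinated changes at once is what yields the reported speed-ups.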


2020 ◽  
Vol 57 (5) ◽  
pp. 831-852
Author(s):  
Sungjin Kim ◽  
Clarence Lee ◽  
Sachin Gupta

The authors propose a new Bayesian synthetic control framework to overcome limitations of extant synthetic control methods (SCMs). The proposed Bayesian synthetic control methods (BSCMs) do not impose any restrictive constraints on the parameter space a priori. Moreover, they provide statistical inference in a straightforward manner as well as a natural mechanism to deal with the “large p, small n” and sparsity problems through Markov chain Monte Carlo procedures. Using simulations, the authors find that for a variety of data-generating processes, the proposed BSCMs almost always provide better predictive accuracy and parameter precision than extant SCMs. They demonstrate an application of the proposed BSCMs to a real-world context of a tax imposed on soda sales in Washington state in 2010. As in the simulations, the proposed models outperform extant models, as measured by predictive accuracy in the posttreatment periods. The authors find that the tax led to an increase of 5.7% in retail price and a decrease of 5.5%∼5.8% in sales. They also find that retailers in Washington overshifted the tax to consumers, leading to a pass-through rate of approximately 121%.
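For contrast with the BSCM, the classical constrained SCM estimator can be written as a projected-gradient solve: the simplex constraint (nonnegative donor weights summing to one) is exactly the restriction the Bayesian models replace with priors and MCMC. All numbers below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)
T0, T1, J = 40, 10, 8                 # 40 pre-treatment periods, 10 post, 8 donors
Y0 = rng.normal(size=(T0 + T1, J)).cumsum(axis=0)   # donor outcome paths
true_w = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
y1 = Y0 @ true_w + 0.1 * rng.normal(size=T0 + T1)
y1[T0:] += 3.0                        # treatment effect of +3 in the post-period

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

# Fit the weights on the pre-period only, then measure the post-period gap.
A, b = Y0[:T0], y1[:T0]
L = 2 * np.linalg.eigvalsh(A.T @ A).max() / T0      # gradient Lipschitz constant
w = np.full(J, 1.0 / J)
for _ in range(5000):
    grad = 2 * A.T @ (A @ w - b) / T0
    w = project_simplex(w - grad / L)

effect = (y1[T0:] - Y0[T0:] @ w).mean()
```

The hard constraint makes the classical point estimate easy, but it leaves no direct route to interval estimates, which is the gap the posterior sampling in the BSCMs fills.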


2020 ◽  
Vol 1 (4) ◽  
Author(s):  
Phuoc-Hai Huynh ◽  
Van Hoa Nguyen ◽  
Thanh-Nghi Do

Biometrika ◽  
2020 ◽  
Vol 107 (2) ◽  
pp. 415-431
Author(s):  
Xinghao Qiao ◽  
Cheng Qian ◽  
Gareth M James ◽  
Shaojun Guo

Summary We consider estimating a functional graphical model from multivariate functional observations. In functional data analysis, the classical assumption is that each function has been measured over a densely sampled grid. However, in practice the functions have often been observed, with measurement error, at a relatively small number of points. We propose a class of doubly functional graphical models to capture the evolving conditional dependence relationship among a large number of sparsely or densely sampled functions. Our approach first implements a nonparametric smoother to perform functional principal components analysis for each curve, then estimates a functional covariance matrix and finally computes sparse precision matrices, which in turn provide the doubly functional graphical model. We derive some novel concentration bounds, uniform convergence rates and model selection properties of our estimator for both sparsely and densely sampled functional data in the high-dimensional large-$p$, small-$n$ regime. We demonstrate via simulations that the proposed method significantly outperforms possible competitors. Our proposed method is applied to a brain imaging dataset.
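The three-stage pipeline can be caricatured in a few lines. The sketch below substitutes a plain per-node SVD for the nonparametric smoother plus functional principal components analysis, and a shrink-invert-threshold rule for the sparse precision estimator; both are crude stand-ins, not the authors' estimators.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, t = 50, 6, 30                   # 50 subjects, 6 curves (nodes), 30 grid points
grid = np.linspace(0, 1, t)
basis = np.sin(np.pi * grid)          # one shared smooth pattern
Z = rng.normal(size=(n, p))
Z[:, 1] = Z[:, 0] + 0.3 * rng.normal(size=n)   # nodes 0 and 1 are dependent
curves = Z[..., None] * basis + 0.2 * rng.normal(size=(n, p, t))  # noisy curves

# Stage 1: leading principal-component score of each node's n x t curve matrix.
scores = np.empty((n, p))
for j in range(p):
    U, s, Vt = np.linalg.svd(curves[:, j, :], full_matrices=False)
    scores[:, j] = s[0] * U[:, 0]

# Stages 2-3: covariance of the scores, shrunk toward the identity, inverted,
# with small partial correlations thresholded to zero to read off the graph.
S = np.cov(scores, rowvar=False) + 0.1 * np.eye(p)
P = np.linalg.inv(S)
d = np.sqrt(np.diag(P))
pcorr = -P / np.outer(d, d)
edges = (np.abs(pcorr) > 0.3) & ~np.eye(p, dtype=bool)
```

Zeros in the precision matrix encode conditional independence, so the thresholded partial correlations play the role of the sparse graphical model; the paper's contribution is making this chain work for sparsely observed curves with theoretical guarantees in the large-p, small-n regime.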


Author(s):  
Wolfgang C. Muller ◽  
Torbjorn Bergman ◽  
Gabriella Ilonszki

This chapter makes the case for studying coalition politics in Central Eastern Europe (CEE). A focus on CEE not only fills a research gap in terms of geographic coverage but also opens up the opportunity for out-of-sample theory testing, mitigating the notorious large-p-small-N problem in coalition research, and extending the theoretical framework by incorporating explanatory factors particularly relevant in CEE countries. Within the coalition life cycle, coalition governance, the central stage between coalition formation and termination, constitutes the greatest lacuna in coalition research. The main problem of coalition governance is the multi-party nature of governments, with the coalition parties often having conflicting policy preferences, desiring the same offices as their partners, and competing with each other in the next elections. This constellation may lead to conflict within and between the coalition parties, cabinet instability, and policy stalemate. Coalition builders can contain these dangers by choosing the right partners, dividing the spoils wisely, and by employing various mechanisms to manage intra-party politics and, in particular, inter-party relations (giving credibility to commitments, providing mutual information, and making decisions jointly). The resulting modes of coalition governance take three ideal-typical forms: the Ministerial Government Model, the Coalition Compromise Model, and the Dominant Prime Minister Model. Turning to the coalition cycle in CEE, the chapter explains how the country chapters are organized, which research questions they ask, and how this relates to the extant literature on CEE coalition politics. The chapter concludes by highlighting some of the book's main findings.

