subset selection
Recently Published Documents

2022. Meelad Amouzgar, David R Glass, Reema Baskar, Inna Averbukh, Samuel C Kimmey

Single-cell technologies generate large, high-dimensional datasets encompassing a diversity of omics. Dimensionality reduction enables visualization of data by representing cells in two-dimensional plots that capture the structure and heterogeneity of the original dataset. Visualizations contribute to human understanding of data and are useful for guiding both quantitative and qualitative analysis of cellular relationships. Existing algorithms are typically unsupervised, utilizing only measured features to generate manifolds, disregarding known biological labels such as cell type or experimental timepoint. Here, we repurpose the classification algorithm, linear discriminant analysis (LDA), for supervised dimensionality reduction of single-cell data. LDA identifies linear combinations of predictors that optimally separate a priori classes, enabling users to tailor visualizations to separate specific aspects of cellular heterogeneity. We implement feature selection by hybrid subset selection (HSS) and demonstrate that this flexible, computationally efficient approach generates non-stochastic, interpretable axes amenable to diverse biological processes, such as differentiation over time and cell cycle. We benchmark HSS-LDA against several popular dimensionality reduction algorithms and illustrate its utility and versatility for exploration of single-cell mass cytometry, transcriptomics and chromatin accessibility data.
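The abstract describes LDA with hybrid subset selection (HSS) only at a high level. As a minimal sketch under stated assumptions, the code below implements two-class Fisher LDA in plain Python and reads "hybrid subset selection" as forward stepwise selection with a backward-elimination check after each addition. That reading, the scoring rule (the Fisher criterion of the projection minus a hypothetical per-feature `penalty`), the `ridge` term, the toy data, and all function names are assumptions for illustration. This is not the authors' HSS-LDA implementation, which also handles more than two classes.

```python
from statistics import mean, pvariance


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_lda(X, y, features, ridge=1e-6):
    """Two-class LDA axis w = (S_W + ridge*I)^-1 (m1 - m0) on the chosen feature indices."""
    c0, c1 = sorted(set(y))  # this sketch handles exactly two classes
    d = len(features)
    sub = [[row[j] for j in features] for row in X]
    means = {c: [mean(r[k] for r, t in zip(sub, y) if t == c) for k in range(d)]
             for c in (c0, c1)}
    # Within-class scatter matrix, plus a small ridge so it is always invertible.
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for r, t in zip(sub, y):
        diff = [r[k] - means[t][k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                sw[i][j] += diff[i] * diff[j]
    return _solve(sw, [means[c1][k] - means[c0][k] for k in range(d)])


def separation(X, y, features):
    """Fisher criterion (between-class / within-class variance) of the 1-D LDA projection."""
    w = fisher_lda(X, y, features)
    z = [sum(wk * row[j] for wk, j in zip(w, features)) for row in X]
    c0, c1 = sorted(set(y))
    g0 = [zi for zi, t in zip(z, y) if t == c0]
    g1 = [zi for zi, t in zip(z, y) if t == c1]
    return (mean(g1) - mean(g0)) ** 2 / (pvariance(g0) + pvariance(g1) + 1e-12)


def hybrid_subset_selection(X, y, penalty=0.1):
    """Forward selection with a backward-elimination check after each accepted addition.

    Score = Fisher criterion - penalty * (number of features). A move is kept only
    if it strictly raises the score, so the search always terminates.
    """
    def score(feats):
        return separation(X, y, feats) - penalty * len(feats)

    selected, best = [], float("-inf")
    while True:
        candidates = [j for j in range(len(X[0])) if j not in selected]
        if not candidates:
            break
        add = max(candidates, key=lambda j: score(selected + [j]))
        new = score(selected + [add])
        if new <= best:
            break  # forward step: no remaining feature improves the score
        selected, best = selected + [add], new
        # Backward step: drop a previously chosen feature if that strictly helps.
        while len(selected) > 1:
            trial = max(([k for k in selected if k != drop] for drop in selected), key=score)
            if score(trial) <= best:
                break
            selected, best = trial, score(trial)
    return selected, best


# Hypothetical toy data: feature 0 separates the classes, feature 2 is constant.
X = [[0.0, 1.0, 5.0], [1.0, -1.0, 5.0], [0.5, 0.5, 5.0], [0.2, -0.3, 5.0],
     [4.0, 0.8, 5.0], [5.0, -0.9, 5.0], [4.5, 0.4, 5.0], [4.2, -0.2, 5.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
features, crit = hybrid_subset_selection(X, y)
w = fisher_lda(X, y, features)  # deterministic weights define one interpretable axis
axis = [sum(wk * row[j] for wk, j in zip(w, features)) for row in X]
```

The projection is a fixed linear combination of the selected features, so rerunning it on the same data gives the same axis. This is the non-stochastic property the abstract highlights, and the entries of `w` show how much each selected feature contributes.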

2022. Vol 302, pp. 103597. Vahid Roostapour, Aneta Neumann, Frank Neumann, Tobias Friedrich

2021. Vol 6 (3), pp. 177. Muhamad Arief Hidayat

In health science, the Poedji Rochyati score technique is used to determine the level of pregnancy risk. In this technique, the pregnancy risk level is calculated from the values of 22 parameters obtained from the pregnant woman. Under certain conditions, some parameter values are unknown, so the pregnancy risk level cannot be calculated. A method is therefore needed to predict pregnancy risk status when attribute values are incomplete. Several studies have tried to address this problem. The study "classification of pregnancy risk using cost sensitive learning" [3] applies cost-sensitive learning to classifying the level of pregnancy risk; its best classification accuracy was 73% and its best recall was 77.9%. To increase the accuracy and recall of pregnancy risk prediction, this study proposes two improvements: (1) ensemble learning based on classification trees, and (2) the SVMattributeEvaluator evaluator to optimize the feature subset selection stage. In trials combining classification tree-based ensemble learning with the SVMattributeEvaluator at the feature subset selection stage, the best accuracy reached 76% and the best recall reached 89.5%.
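The abstract does not say how the tree ensemble was built or configured. As a rough illustration only, the sketch below bags depth-one classification trees (decision stumps) in plain Python and computes the two metrics the study reports, accuracy and recall. All function names, the toy data, the ensemble size, and the missing-value rule are hypothetical. Under that rule, a stump whose split attribute is `None` falls back to the majority class of its bootstrap sample. The sketch does not reproduce the study's setup or its SVMattributeEvaluator feature selection.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Depth-1 classification tree: choose the (feature, threshold) split with the fewest errors.

    Rows whose split attribute is missing (None) are assigned the overall majority class.
    """
    default = majority(labels)
    best = (sum(lab != default for lab in labels), None, None, default, default)
    for f in range(len(rows[0])):
        known = [(r[f], lab) for r, lab in zip(rows, labels) if r[f] is not None]
        missing_err = sum(lab != default for r, lab in zip(rows, labels) if r[f] is None)
        vals = sorted({v for v, _ in known})
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2
            left = [lab for v, lab in known if v <= t]
            right = [lab for v, lab in known if v > t]
            lc, rc = majority(left), majority(right)
            err = sum(lab != lc for lab in left) + sum(lab != rc for lab in right) + missing_err
            if err < best[0]:
                best = (err, f, t, lc, rc)
    _, f, t, lc, rc = best
    return {"feature": f, "threshold": t, "left": lc, "right": rc, "default": default}


def predict_stump(stump, row):
    f = stump["feature"]
    if f is None or row[f] is None:
        return stump["default"]  # no useful split, or the split attribute is unknown
    return stump["left"] if row[f] <= stump["threshold"] else stump["right"]


def fit_bagged_stumps(rows, labels, n_estimators=25, seed=0):
    """Bagging: train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return ensemble


def predict(ensemble, row):
    return majority(predict_stump(s, row) for s in ensemble)  # majority vote


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of truly positive cases that the model labels positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / sum(t == positive for t in y_true)


# Hypothetical toy data with two attributes (the real task uses 22 parameters).
rows = [[0, 3], [1, 7], [2, 1], [3, 5], [4, 2], [10, 6], [11, 1], [12, 4], [13, 8], [14, 2]]
labels = ["low"] * 5 + ["high"] * 5
model = fit_bagged_stumps(rows, labels)
preds = [predict(model, r) for r in rows]
print(accuracy(labels, preds), recall(labels, preds, positive="high"))
```

Plain majority voting over stumps is the simplest tree ensemble. In practice, a library ensemble such as bagged trees or a random forest in Weka or scikit-learn would be the usual choice.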
