A Compendium to Ensure Computational Reproducibility in High-Dimensional Classification Tasks

2004 ◽  
Vol 3 (1) ◽  
pp. 1-24 ◽  
Author(s):  
Markus Ruschhaupt ◽  
Wolfgang Huber ◽  
Annemarie Poustka ◽  
Ulrich Mansmann

We demonstrate the concept and implementation of a compendium for the classification of high-dimensional data from microarray gene expression profiles. A compendium is an interactive document that bundles primary data, statistical processing methods, figures, and derived data together with the textual documentation and conclusions. Interactivity allows the reader to modify and extend these components. We address the following questions: how much does the discriminatory power of a classifier depend on the choice of the algorithm that was used to identify it; what alternative classifiers could be used just as well; and how robust is the result? The answers to these questions are essential prerequisites for validation and biological interpretation of the classifiers. We show how to use this approach by examining these questions for a specific breast cancer microarray data set that was first studied by Huang et al. (2003).
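The first two questions above (how much the result depends on the algorithm, and which alternatives do as well) are commonly approached by running several classifiers through the same cross-validation splits. The following is a minimal sketch of that idea, not the compendium's own code: the synthetic data, the two toy classifiers, and all function names (`make_data`, `nearest_centroid`, `one_nearest_neighbour`, `cv_accuracy`) are assumptions made for illustration.

```python
import random
from statistics import mean

def make_data(n_per_class=30, n_features=50, n_informative=5, shift=1.5, seed=0):
    """Synthetic two-class 'expression' data (hypothetical): only the
    first n_informative features differ between the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
                   for j in range(n_features)]
            X.append(row)
            y.append(label)
    return X, y

def nearest_centroid(train_X, train_y, x):
    """Predict the class whose mean training profile is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def one_nearest_neighbour(train_X, train_y, x):
    """Predict the class of the single closest training sample."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, r)) for r in train_X]
    return train_y[dists.index(min(dists))]

def cv_accuracy(classifier, X, y, k=5, seed=0):
    """k-fold cross-validated accuracy; each sample is held out exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in idx if i not in held_out]
        train_y = [y[i] for i in idx if i not in held_out]
        correct += sum(classifier(train_X, train_y, X[i]) == y[i] for i in fold)
    return correct / len(X)

X, y = make_data()
# Same data and same splits for both classifiers, so the comparison is paired.
results = {
    "nearest_centroid": cv_accuracy(nearest_centroid, X, y),
    "1-NN": cv_accuracy(one_nearest_neighbour, X, y),
}
for name, acc in results.items():
    print(f"{name}: cross-validated accuracy = {acc:.2f}")
```

Repeating `cv_accuracy` with different `seed` values gives a rough view of how stable each estimate is across splits, which is one simple way to probe the robustness question.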

2021 ◽  
Author(s):  
Jiahui Zhong ◽  
Minjie Lyu ◽  
Huan Jin ◽  
Zhiwei Cao ◽  
Lou T Chitkushev ◽  
...  

Background: Single-cell transcriptome (SCT) sequencing has become a high-throughput technology in which gene expression can be measured concurrently from large numbers of cells. The results of gene expression studies are highly reproducible when strict protocols and standard operating procedures (SOP) are followed. However, differences in sample processing conditions result in significant changes in gene expression profiles, making direct comparison of different studies difficult. Unsupervised machine learning (ML) approaches use clustering algorithms combined with semi-automated cell labeling and manual annotation of individual cells. These approaches do not scale up well, and a workflow used on a specific dataset will not perform well with other studies. Supervised ML classification shows superior classification accuracy and generalization properties compared to unsupervised ML methods. We describe a supervised ML method that deploys artificial neural networks (ANN) for 5-class classification of healthy peripheral blood mononuclear cells (PBMC) from multiple diverse studies. Results: We used 58 data sets to train the ANN incrementally over ten cycles of training and testing. The sample processing involved four protocols: separation of PBMC, separation of PBMC + enrichment (by negative selection), separation of PBMC + FACS, and separation of PBMC + MACS. The training data set included between 85 and 110 thousand cells, and the test set had approximately 13 thousand cells. Training and testing were done with various combinations of data sets from four principal data sources. Overall 5-class classification accuracy on independent data sets reached 94%. Classification accuracy for B cells, monocytes, and T cells exceeded 95%. Classification accuracy for natural killer (NK) cells was 75% because of the similarity between NK cells and T cell subsets. Classification accuracy for dendritic cells (DC) was low because the training sets contained very few DC. Conclusions: The incremental learning ANN model can accurately classify the main types of PBMC. With the inclusion of more DC and the resolution of ambiguities between T cell and NK cell gene expression profiles, we will enable high-accuracy supervised ML classification of PBMC. We assembled a reference data set for healthy PBMC and demonstrated a proof of concept for a supervised ANN method in the classification of previously unseen SCT data. The classification shows high accuracy that is consistent across different studies and sample processing methods.
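The abstract reports both an overall accuracy and separate accuracies per cell type. The sketch below shows one way such figures can be computed and why a rare class can score poorly while the overall number stays high. It assumes "accuracy for a class" means the fraction of cells of that class that were labeled correctly; the abstract does not state its exact definition. The function name `per_class_accuracy` and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter

def per_class_accuracy(true_labels, predicted_labels):
    """Return (overall accuracy, {class: fraction of that class predicted correctly})."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(true_labels)
    # Count correct predictions, keyed by the true class.
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    overall = sum(hits.values()) / len(true_labels)
    by_class = {c: hits[c] / totals[c] for c in totals}
    return overall, by_class

# Hypothetical toy labels: 100 cells, with NK and DC as the rare classes.
true = ["T"] * 50 + ["B"] * 20 + ["Mono"] * 20 + ["NK"] * 8 + ["DC"] * 2
# Two NK cells and both DC are mislabeled as T cells.
pred = ["T"] * 50 + ["B"] * 20 + ["Mono"] * 20 + ["NK"] * 6 + ["T"] * 2 + ["T"] * 2

overall, by_class = per_class_accuracy(true, pred)
print(f"overall: {overall:.2f}")
for cell_type, acc in by_class.items():
    print(f"{cell_type}: {acc:.2f}")
```

In this toy case the overall accuracy is 0.96 even though every DC is misclassified, because DC make up only 2% of the cells.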


Author(s):  
Edward C. Emery ◽  
Patrik Ernfors

Primary sensory neurons of the dorsal root ganglion (DRG) respond to and relay sensations such as touch, pain, temperature, and itch, among others. The ability to discriminate between the various types of stimuli is reflected by the existence of specialized DRG neurons tuned to respond to specific stimuli. Because of this, a comprehensive classification of DRG neurons is critical for determining exactly how somatosensation works and for providing insights into cell types involved during chronic pain. This article reviews the recent advances in unbiased classification of molecular types of DRG neurons from the perspective of known functions as well as predicted functions based on gene expression profiles. The data show that sensory neurons are organized in a basal structure of three cold-sensitive neuron types, five mechano-heat sensitive nociceptor types, four A-Low threshold mechanoreceptor types, five itch-mechano-heat–sensitive nociceptor types and a single C–low-threshold mechanoreceptor type, with a strong relation between molecular neuron types and functional types. As a general feature, each neuron type displays a unique and predictable response profile; at the same time, most neuron types convey multiple modalities and intensities. Therefore, sensation is likely determined by the summation of ensembles of active primary afferent types. The new classification scheme will be instructive in determining the exact cellular and molecular mechanisms underlying somatosensation, facilitating the development of rational strategies to identify causes for chronic pain.


2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Szilárd Nemes ◽  
Toshima Z. Parris ◽  
Anna Danielsson ◽  
Zakaria Einbeigi ◽  
Gunnar Steineck ◽  
...  

DNA copy number aberrations (DCNA) and subsequent altered gene expression profiles may have a major impact on tumor initiation, on development, and eventually on recurrence and cancer-specific mortality. However, most methods employed in integrative genomic analysis of the two biological levels, DNA and RNA, do not consider survival time. In the present note, we propose the adoption of a survival analysis-based framework for the integrative analysis of DCNA and mRNA levels to reveal their implications for patient clinical outcome, with the prerequisite that the effect of DCNA on survival is mediated by mRNA levels. The specific aim of the paper is to offer a feasible framework to test the DCNA-mRNA-survival pathway. We provide statistical inference algorithms for mediation based on asymptotic results. Furthermore, we illustrate the applicability of the method in an integrative genomic analysis setting by using a breast cancer data set consisting of 141 invasive breast tumors. In addition, we provide an implementation in R.
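As a reading aid, the DCNA-mRNA-survival pathway can be written in the generic product-of-coefficients form used in textbook mediation analysis. This is a sketch under stated assumptions, not necessarily the estimator used in the paper: it assumes a linear model for the mediator and a proportional hazards model for survival, and the symbols $X$ (DCNA), $M$ (mRNA level), $a$, $b$, $c'$, and the standard errors $s_a$, $s_b$ are notation introduced here.

```latex
% Mediator model: mRNA level M regressed on copy number X
M = \alpha_0 + a\,X + \varepsilon

% Outcome model (assumed proportional hazards): hazard given X and M
h(t \mid X, M) = h_0(t)\,\exp\!\left(c'\,X + b\,M\right)

% Indirect (mediated) effect and its first-order asymptotic test statistic
\widehat{\mathrm{IE}} = \hat{a}\,\hat{b},
\qquad
z = \frac{\hat{a}\,\hat{b}}{\sqrt{\hat{b}^{2}\,s_a^{2} + \hat{a}^{2}\,s_b^{2}}}
```

Under these assumptions, $a$ captures the effect of DCNA on mRNA, $b$ the effect of mRNA on the hazard after adjusting for DCNA, and $c'$ the remaining direct effect of DCNA; a large $|z|$ suggests that part of the effect on survival runs through mRNA levels.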


2016 ◽  
Vol 32 (1) ◽  
pp. 70-79 ◽  
Author(s):  
S. A. Babichev ◽  
A. I. Kornelyuk ◽  
V. I. Lytvynenko ◽  
V. V. Osypenko
