Convolutional Neural Networks for Breast Density Classification: Performance and Explanation Insights

Selecting feature genes with high recognition ability from gene expression profiles has gained great significance in biology. However, most existing methods have high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman’s rank correlation coefficient (SLLE-SC2), is proposed. It is based on locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes class label information into account and improves classification performance. Furthermore, Spearman’s rank correlation coefficient is used to remove coexpressed genes. Experimental results on four public tumor microarray datasets illustrate that the method is valid and feasible.
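
The redundancy-removal step described in this abstract can be illustrated with a minimal sketch. It is not the SLLE-SC2 implementation: the function names, the greedy keep-or-skip rule, the 0.9 threshold, and the toy expression values are all assumptions made for illustration. The sketch computes Spearman’s rank correlation as the Pearson correlation of average ranks and skips any gene that is strongly rank-correlated with a gene already kept.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector has no defined rank correlation
    return cov / (vx * vy) ** 0.5


def drop_coexpressed(
    features: Dict[str, Sequence[float]],
    ranked_names: List[str],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Keep genes in ranked order, skipping any with |rho| > threshold
    against a gene that has already been kept."""
    kept: List[str] = []
    for name in ranked_names:
        if all(abs(spearman(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy expression values (hypothetical): g2 rises monotonically with g1.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 11.0],
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_coexpressed(expression, ["g1", "g2", "g3"])
print(selected)
```

Because the correlation is computed on ranks, g2 is treated as redundant with g1 even though the relationship between their raw values is not exactly linear.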

Feature Manipulation with Genetic Programming

10.26686/wgtn.17009945 ◽

2021 ◽

Author(s):

Kourosh Neshatian

Keyword(s):

Feature Selection ◽

Classification Performance ◽

Feature Ranking ◽

Feature Construction ◽

Classification Models ◽

Classification Problems ◽

Target Class ◽

Decision Variables ◽

Complex Relationships ◽

Class Labels

Feature manipulation refers to the process by which the input space of a machine learning task is altered in order to improve the learning quality and performance. Three major aspects of feature manipulation are feature construction, feature ranking and feature selection. This thesis proposes a new filter-based methodology for feature manipulation in classification problems using genetic programming (GP). The goal is to modify the input representation of classification problems in order to improve classification performance and reduce the complexity of classification models. The thesis regards classification problems as a collection of variables including conditional variables (input features) and decision variables (target class labels). GP is used to discover the relationships between these variables. The types of relationship and the ways in which they are discovered vary with the three aspects of feature manipulation.

In feature construction, the thesis proposes a GP-based method to construct high-level features in the form of functions of original input features. The functions are evolved by GP using an entropy-based fitness function that maximises the purity of class intervals. Unlike existing algorithms, the proposed GP-based method constructs multiple features and it can effectively perform transformational dimensionality reduction, using only a small number of GP-constructed features while preserving good classification performance.

In feature ranking, the thesis proposes two GP-based methods for ranking single features and subsets of features. In single-feature ranking, the proposed method measures the influence of individual features on the classification performance by using GP to evolve a collection of weak classification models, and then measures the contribution of input features to the making of good models. In ranking of subsets of features, a virtual structure for GP trees and a new binary relevance function are proposed to measure the relationship between a subset of features and the target class labels. It is observed that the proposed method can discover complex relationships, such as multi-modal class distributions and multivariate correlations, that cannot be detected by traditional methods.

In feature selection, the thesis provides a novel multi-objective GP-based approach to measuring the goodness of subsets of features. The subsets are evaluated based on their cardinality and their relationship to target class labels. The selection is performed by choosing a subset of features from a GP-discovered Pareto front containing suboptimal solutions (subsets). The thesis also proposes a novel method for measuring the redundancy between input features. It is used to select a subset of relevant features that do not exhibit redundancy with respect to each other.

It is found that in all three aspects of feature manipulation, the proposed GP-based methodology is effective in discovering relationships between the features of a classification task. In the case of feature construction, the proposed GP-based methods evolve functions of conditional variables that can significantly improve the classification performance and reduce the complexity of the learned classifiers. In the case of feature ranking, the proposed GP-based methods can find complex relationships between conditional variables and decision variables. The resulting ranking shows a strong linear correlation with the actual classification performance. In the case of feature selection, the proposed GP-based method can find a set of suboptimal subsets of features that provides a trade-off between the number of features and their relevance to the classification task. The proposed redundancy removal method can remove redundant features from a set of features. Both proposed feature selection methods can find an optimal subset of features that yields significantly better classification performance with a much smaller number of features than conventional classification methods.
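
The trade-off between subset cardinality and relevance described in this abstract can be illustrated with a small Pareto-front filter. This is only a sketch under stated assumptions: the relevance scores below are made-up numbers standing in for whatever GP-based measure the thesis uses, and the names `dominates` and `pareto_front` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# A candidate is (feature subset, relevance score). The scores are
# hypothetical placeholders, not values from the thesis.
Candidate = Tuple[FrozenSet[str], float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no larger and no less relevant than b,
    and strictly better on at least one of the two objectives."""
    size_a, rel_a = len(a[0]), a[1]
    size_b, rel_b = len(b[0]), b[1]
    return (
        size_a <= size_b
        and rel_a >= rel_b
        and (size_a < size_b or rel_a > rel_b)
    )


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates: List[Candidate] = [
    (frozenset({"f1"}), 0.60),
    (frozenset({"f2"}), 0.55),
    (frozenset({"f1", "f3"}), 0.80),
    (frozenset({"f2", "f3"}), 0.70),
    (frozenset({"f1", "f2", "f3"}), 0.82),
    (frozenset({"f1", "f2", "f3", "f4"}), 0.81),
]

front = pareto_front(candidates)
for subset, relevance in front:
    print(sorted(subset), relevance)
```

Each subset on the front is the most relevant one at its size among the candidates supplied; choosing one of them fixes how many features are traded for how much relevance.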

A Study of the Relationships between Nursing Students’ Meanings of Life, Positive Beliefs, and Well-Being

International Journal of Innovative Research in Medical Science ◽

10.23958/ijirms/vol03-i01/02 ◽

2018 ◽

Vol 3 (01) ◽

Author(s):

Fu-Ju Tsai ◽

Cheng-Yu Chen ◽

Gwo-Liang Yeh ◽

Yih-Jin Hu ◽

Chie-Chien Tseng ◽

...

Keyword(s):

Health Promotion ◽

Nursing Students ◽

Analysis Of Variance ◽

Rank Correlation ◽

Meaning Of Life ◽

Well Being ◽

Social Health ◽

Cross Sectional ◽

Nursing Educators ◽

Spearman’s Rank Correlation

Background: Nursing educators should train nursing students to pursue physical, psychological, spiritual, and social health promotion. The purpose of this study was to explore relationships between nursing students’ meaning of life, positive beliefs, and well-being. Methods: A cross-sectional correlational study with a quantitative approach was adopted. Purposive sampling was used. A total of 170 nursing students voluntarily participated in this study. A 56-item questionnaire was used to examine nursing students’ meaning of life (1-25 items), positive beliefs (1-11 items), and well-being (1-20 items). The content validity index (CVI) of the study questionnaire was established as 0.95 by seven expert scholars. The reliability values for the three parts of the measure were as follows: meaning of life, Cronbach’s α 0.96; positive beliefs, Cronbach’s α 0.93; and well-being, Cronbach’s α 0.95. Percentages, frequencies, means, SDs, Kruskal-Wallis one-way analysis of variance by rank, Spearman’s rank correlation, one-way analysis of variance, Spearman’s rho correlation, and regression analysis were used for the data analysis. Results: Nursing students had the following mean scores: meaning of life, 4.02 (SD 0.56); positive beliefs, 3.92 (SD 0.62); and well-being, 3.95 (SD 0.57). The results indicate that for all nursing students, meaning of life was positively correlated with positive beliefs, r=0.83 (P<.01); similarly, all nursing students had positive beliefs that were positively correlated with meaning of life, r=0.83 (P<.01). The nursing students’ background, meaning of life, and positive beliefs explained 63% of the variance in well-being (adjusted R² = 0.63, F=33.41, P<.001). Conclusions: Nursing students’ sense of meaning of life and positive beliefs may impact their well-being. Therefore, nursing educators can promote meaning of life and positive beliefs to nursing students as a way to increase their well-being for physical, psychological, spiritual, and social health promotion.

Binary Spectrum Feature for Improved Classifier Performance

10.36227/techrxiv.12993122 ◽

2020 ◽

Author(s):

Nalika Ulapane ◽

Karthick Thiyagarajan ◽

Sarath Kodagoda

Keyword(s):

Machine Learning ◽

Classification Performance ◽

Feature Reduction ◽

Sensor Data ◽

Machine Learning Techniques ◽

Support Vector ◽

SVM Classifier ◽

Monitoring Task ◽

Classifier Performance ◽

Spectrum Feature

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.

Deregulation of lncRNA HIST1H2AG-6 and AIM1-3 in peripheral blood mononuclear cells is associated with newly diagnosed type 2 diabetes

BMC Medical Genomics ◽

10.1186/s12920-021-00994-z ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Hui Jiang ◽

Peian Lou ◽

Xiaoluo Chen ◽

Chenguang Wu ◽

Shihe Shao

Keyword(s):

ROC Curve ◽

Multivariate Logistic Regression Analysis ◽

Rank Correlation ◽

Differentially Expressed ◽

Healthy Controls ◽

Biological Functions ◽

Multivariate Logistic Regression ◽

Spearman’s Rank Correlation ◽

Spearman’s Rank Correlation Coefficient

Background: Type 2 diabetes mellitus (T2DM) is mainly affected by genetic and environmental factors; however, the correlation of long noncoding RNAs (lncRNAs) with T2DM remains largely unknown. Methods: Microarray analysis was performed to identify the differentially expressed lncRNAs and messenger RNAs (mRNAs) in patients with T2DM and healthy controls, and the expression of two candidate lncRNAs (lnc-HIST1H2AG-6 and lnc-AIM1-3) was further validated using quantitative real-time polymerase chain reaction (qRT-PCR). Spearman’s rank correlation coefficient was used to measure the degree of association between the two candidate lncRNAs and differentially expressed mRNAs. Furthermore, the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and GO (Gene Ontology) enrichment analysis were used to reveal the biological functions of the two candidate lncRNAs. Additionally, multivariate logistic regression analysis and receiver operating characteristic (ROC) curve analysis were performed. Results: The microarray analysis revealed that there were 55 lncRNAs and 36 mRNAs differentially expressed in patients with T2DM compared with healthy controls. Notably, lnc-HIST1H2AG-6 was significantly upregulated and lnc-AIM1-3 was significantly downregulated in patients with T2DM, which was validated in a large-scale qRT-PCR examination (90 controls and 100 patients with T2DM). Spearman’s rank correlation coefficient revealed that both lncRNAs were correlated with 36 differentially expressed mRNAs. Furthermore, functional enrichment (KEGG and GO) analysis demonstrated that the two lncRNA-related mRNAs might be involved in multiple biological functions, including cell programmed death, negative regulation of insulin receptor signal, and starch and sucrose metabolism. Multivariate logistic regression analysis revealed that lnc-HIST1H2AG-6 and lnc-AIM1-3 were significantly correlated with T2DM (OR = 5.791 and 0.071, respectively, both P = 0.000). Furthermore, the ROC curve showed that the expression of lnc-HIST1H2AG-6 and lnc-AIM1-3 might be used to differentiate patients with T2DM from healthy controls (area under the ROC curve = 0.664 and 0.769, respectively). Conclusion: The profiles of lncRNA and mRNA were significantly changed in patients with T2DM. The expression levels of lnc-HIST1H2AG-6 and lnc-AIM1-3 genes were significantly correlated with some features of T2DM, which may be used to distinguish patients with T2DM from healthy controls and may serve as potential novel biomarkers for diagnosis in the future.

Magnetic resonance cisternography imaging findings related to the leakage of Gadolinium into the subarachnoid space

Japanese Journal of Radiology ◽

10.1007/s11604-021-01137-1 ◽

2021 ◽

Author(s):

Rei Nakamichi ◽

Toshiaki Taoka ◽

Hisashi Kawai ◽

Tadao Yoshida ◽

Michihiko Sone ◽

...

Keyword(s):

Magnetic Resonance ◽

Subarachnoid Space ◽

Area Under The Curve ◽

Rank Correlation ◽

Predictive Performance ◽

ROC Curves ◽

Imaging Findings ◽

Spearman’s Rank Correlation ◽

Sagittal Sinus ◽

Magnetic Resonance Cisternography

Purpose: To identify magnetic resonance cisternography (MRC) imaging findings related to Gadolinium-based contrast agent (GBCA) leakage into the subarachnoid space. Materials and methods: The number of voxels of GBCA leakage (V-leak) on 3D-real inversion recovery images was measured in 56 patients scanned 4 h post-intravenous GBCA injection. Bridging veins (BVs) were identified on MRC. The numbers of BVs with surrounding cystic structures (BV-cyst), with arachnoid granulations protruding into the superior sagittal sinus (BV-AG-SSS) and the skull (BV-AG-skull), and including any of these factors (BV-incl) were recorded. Correlations between these variables and V-leak were examined using Spearman’s rank correlation coefficient. Receiver-operating characteristic (ROC) curves were generated to investigate the predictive performance of GBCA leakage. Results: V-leak and the number of BV-incl were strongly correlated (r = 0.609, p < 0.0001). The numbers of BV-cyst and BV-AG-skull had weaker correlations with V-leak (r = 0.364, p = 0.006; r = 0.311, p = 0.020, respectively). The number of BV-AG-SSS was not correlated with V-leak. The ROC curve for contrast leakage exceeding 1000 voxels and the number of BV-incl had moderate accuracy, with an area under the curve of 0.871. Conclusion: The number of BV-incl may be a predictor of GBCA leakage and a biomarker for waste drainage function without using GBCA.
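
As a hedged illustration of the area-under-the-curve figure reported in this abstract, the sketch below computes ROC AUC from its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The counts and labels are invented toy values, not the study’s data, and `roc_auc` is a hypothetical helper name.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives
    against negatives (equivalent to the Mann-Whitney U statistic
    divided by the number of positive-negative pairs)."""
    positives: List[float] = [s for s, y in zip(scores, labels) if y == 1]
    negatives: List[float] = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): a per-patient count used as the predictor,
# and whether leakage exceeded some chosen threshold (1) or not (0).
bv_counts = [0, 1, 1, 2, 3, 3, 4, 5]
leak_over_threshold = [0, 0, 0, 0, 1, 0, 1, 1]

auc = roc_auc(bv_counts, leak_over_threshold)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to a predictor no better than chance and 1.0 to perfect separation of the two groups.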

Predicting presenteeism using measures of health status

Quality of Life Research ◽

10.1007/s11136-021-02936-9 ◽

2021 ◽

Author(s):

Cheryl Jones ◽

Katherine Payne ◽

Alexander Thompson ◽

Suzanne M. M. Verstappen

Keyword(s):

Health Status ◽

Online Survey ◽

Activity Index ◽

Work Productivity ◽

Rank Correlation ◽

Ordinary Least Squares ◽

Statistical Correlation ◽

Least Squares Regression ◽

Mapping Algorithm ◽

Spearman’s Rank Correlation

Objectives: To identify whether it is feasible to develop a mapping algorithm to predict presenteeism using multiattribute measures of health status. Methods: Data were collected using a bespoke online survey in a purposive sample (n = 472) of working individuals with a self-reported diagnosis of rheumatoid arthritis (RA). Survey respondents were recruited using an online panel company (ResearchNow). This study used data captured using two multiattribute measures of health status (EQ5D-5 level; SF6D) and a measure of presenteeism (WPAI, Work Productivity Activity Index). Statistical correlation between the WPAI and the two measures of health status (EQ5D-5 level; SF6D) was assessed using Spearman’s rank correlation. Five regression models were estimated to quantify the relationship between WPAI and health status and to predict presenteeism from health status. The models were specified based on index and domain scores and included covariates (age; gender). Estimated and observed presenteeism were compared using tenfold cross-validation and evaluated using root mean square error (RMSE). Results: A strong negative correlation was found between WPAI and both EQ5D-5 level (r = −0.64) and SF6D (r = −0.60). Two models using ordinary least squares regression were identified as the best performing, specifying health status using: SF6D domains with age interacted with gender (RMSE = 1.7858); EQ5D-5 level domains and age interacted with gender (RMSE = 1.7859). Conclusions: This study provides indicative evidence that two existing measures of health status (SF6D and EQ5D-5L) have a quantifiable relationship with a measure of presenteeism (WPAI) for an exemplar application of working individuals with RA. A future study should assess the external validity of the proposed mapping algorithms.
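
The evaluation procedure named in this abstract, ordinary least squares regression scored by tenfold cross-validated RMSE, can be sketched as follows. This is an illustration only: it uses a single predictor rather than the study’s domain scores and covariates, the data are synthetic, the fold assignment and the pooling of errors across folds are assumptions, and the names `fit_ols` and `kfold_rmse` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def kfold_rmse(x: Sequence[float], y: Sequence[float], k: int = 10) -> float:
    """Cross-validated RMSE: each observation is predicted by a model
    fitted without its fold, and squared errors are pooled over folds."""
    n = len(x)
    squared_errors: List[float] = []
    for fold in range(k):
        # Deterministic interleaved folds: fold, fold + k, fold + 2k, ...
        test_idx = set(range(fold, n, k))
        train_x = [x[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        intercept, slope = fit_ols(train_x, train_y)
        for i in sorted(test_idx):
            squared_errors.append((y[i] - (intercept + slope * x[i])) ** 2)
    return (sum(squared_errors) / n) ** 0.5


# Synthetic data (hypothetical): a health-status index in [0, 1] and a
# presenteeism score that falls as health improves, plus alternating noise.
health_index = [i / 49 for i in range(50)]
presenteeism = [
    8.0 - 6.0 * h + (0.5 if i % 2 == 0 else -0.5)
    for i, h in enumerate(health_index)
]

rmse = kfold_rmse(health_index, presenteeism, k=10)
print(round(rmse, 4))
```

Because every prediction comes from a model that never saw the observation being predicted, the resulting RMSE estimates out-of-sample error rather than goodness of fit.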

Tag N’ Train: a technique to train improved classifiers on unlabeled data

Journal of High Energy Physics ◽

10.1007/jhep01(2021)153 ◽

2021 ◽

Vol 2021 (1) ◽

Cited By ~ 2

Author(s):

Oz Amram ◽

Cristina Mantilla Suarez

Keyword(s):

Real Data ◽

Unlabeled Data ◽

Machine Learning Techniques ◽

Jet Physics ◽

Classification Problems ◽

Weak Classifier ◽

Potential Applications ◽

Substantial Progress ◽

Resonance Search ◽

Class Labels

There has been substantial progress in applying machine learning techniques to classification problems in collider and jet physics. But as these techniques grow in sophistication, they are becoming more sensitive to subtle features of jets that may not be well modeled in simulation. Therefore, relying on simulations for training will lead to sub-optimal performance in data, but the lack of true class labels makes it difficult to train on real data. To address this challenge we introduce a new approach, called Tag N’ Train (TNT), that can be applied to unlabeled data that has two distinct sub-objects. The technique uses a weak classifier for one of the objects to tag signal-rich and background-rich samples. These samples are then used to train a stronger classifier for the other object. We demonstrate the power of this method by applying it to a dijet resonance search. By starting with autoencoders trained directly on data as the weak classifiers, we use TNT to train substantially improved classifiers. We show that Tag N’ Train can be a powerful tool in model-agnostic searches and discuss other potential applications.

Multi-Method Vs Single Method Appraisal of Clinical Quality Indicators for the Emergency Medical Services

International Journal for Quality in Health Care ◽

10.1093/intqhc/mzaa171 ◽

2020 ◽

Author(s):

Ian Howard ◽

Peter Cameron ◽

Maaret Castrén ◽

Lee Wallis ◽

Veronica Lindström

Keyword(s):

Group Discussion ◽

Rank Correlation ◽

Healthcare Setting ◽

Prehospital Emergency Care ◽

Spearman’s Rank Correlation ◽

Prehospital Emergency ◽

Single Method ◽

Multi Method Approach ◽

The Individual ◽

Method Approach

Background: Quality Indicator (QI) appraisal protocols are a novel methodology that combines multiple appraisal methods to comprehensively assess the "appropriateness" of QIs for a particular healthcare setting. However, they remain inadequately explored compared to the single appraisal method approach. This paper aimed to describe and test a QI appraisal protocol versus the single method approach, against a series of QIs potentially relevant to the South African Prehospital Emergency Care setting. Methods: An appraisal protocol was developed consisting of two categorical-based appraisal methods, combined with the qualitative analysis of the discussion generated during the consensus application of each method. The output of the protocol was assessed and compared with the application and output of each method. Inter-rater reliability (IRR) of each particular method was evaluated prior to group consensus rating. Variation in the number of non-valid QIs and the proportion of non-valid QIs identified between each method and the protocol were compared and assessed. Results: The IRR of the individual methods was mixed. There was similarly low to moderate correlation of the results obtained between the particular methods (Spearman’s rank correlation = 0.42, p < 0.001). From a series of 104 QIs, 11 non-valid QIs were identified that were shared between the individual methods. A further 19 non-valid QIs were identified and not shared by each method, highlighting the benefits of a multi-method approach. The outcomes were additionally evident in the group discussion analysis, which in and of itself added further input that would not have otherwise been captured by the individual methods alone. Conclusion: The utilization of a multi-method appraisal protocol offers multiple benefits when compared to the single appraisal approach, and can provide the confidence that the outcomes of the appraisal will ensure a strong foundation on which the QI framework can be successfully implemented.
