Identification of Chronic Hypersensitivity Pneumonitis Biomarkers with Machine Learning and Differential Co-expression Analysis

2020 ◽  
Vol 20 ◽  
Author(s):  
Hongwei Zhang ◽  
Steven Wang ◽  
Tao Huang

Aims: We aim to identify biomarkers for chronic hypersensitivity pneumonitis (CHP) and to facilitate precise gene therapy for CHP. Background: Chronic hypersensitivity pneumonitis (CHP) is an interstitial lung disease caused by hypersensitivity reactions to inhaled antigens. Clinically, differentiating CHP from other interstitial lung diseases, especially idiopathic pulmonary fibrosis (IPF), is challenging. Objective: In this study, we analyzed the publicly available gene expression profiles of 82 CHP patients, 103 IPF patients, and 103 control samples to identify CHP biomarkers. Method: The CHP biomarkers were selected with two feature selection methods, Monte Carlo Feature Selection (MCFS) and Incremental Feature Selection (IFS), and a Support Vector Machine (SVM) classifier was built. We then analyzed these biomarkers through functional enrichment analysis and differential co-expression analysis. Result: We identified 674 CHP biomarkers. The co-expression network of these biomarkers in CHP included more negative regulations, and its structure differed markedly from the networks of IPF and control samples. Conclusion: The SVM classifier may serve as an important clinical tool to address the challenging task of differentiating between CHP and IPF. Many of the biomarker genes in the differential co-expression network showed great promise in revealing the underlying mechanisms of CHP.
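
The Incremental Feature Selection step named in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the feature ranking is assumed to come from an earlier step such as MCFS, a leave-one-out nearest-centroid classifier stands in for the SVM so the example needs only the standard library, and the data and all function names are hypothetical.

```python
from statistics import mean

def nearest_centroid_loo_accuracy(X, y, feature_idx):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen features."""
    correct = 0
    for i in range(len(X)):
        # Build one centroid per class from every sample except sample i.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                centroids[label] = [mean(r[f] for r in rows) for f in feature_idx]
        point = [X[i][f] for f in feature_idx]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)

def incremental_feature_selection(X, y, ranked_features):
    """Score the top-1, top-2, ... feature subsets and return the best-scoring one."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        acc = nearest_centroid_loo_accuracy(X, y, ranked_features[:k])
        if acc > best_acc:  # strict '>' keeps the smallest subset on ties
            best_k, best_acc = k, acc
    return ranked_features[:best_k], best_acc

# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[0.1, 5.0, 1.0], [0.2, 1.0, 9.0], [0.0, 9.0, 3.0],
     [1.0, 5.2, 2.0], [0.9, 0.8, 8.0], [1.1, 9.1, 4.0]]
y = [0, 0, 0, 1, 1, 1]
ranked = [0, 2, 1]  # assumed output of an upstream ranking step
selected, accuracy = incremental_feature_selection(X, y, ranked)
```

In practice the inner evaluator would be the cross-validated classifier of interest; the loop over growing prefixes of the ranking is the part that defines IFS.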

2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Shuai Zhang ◽  
Renliang Qu ◽  
Pengyan Wang ◽  
Shenghan Wang

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has resulted in a global pandemic since it was first reported in December 2019. So far, SARS-CoV-2 nucleic acid detection has been regarded as the gold standard of COVID-19 diagnosis. However, this detection method often produces false negatives, leading to missed COVID-19 diagnoses. Therefore, new biomarkers are urgently needed to increase the accuracy of COVID-19 diagnosis. To explore new biomarkers of COVID-19 in this study, expression profiles were first obtained from the GEO database. On this basis, 500 feature genes were screened by the minimum-redundancy maximum-relevancy (mRMR) feature selection method. Afterwards, the incremental feature selection (IFS) method was used to choose the best-performing classifier among support vector machine (SVM) classifiers built on different numbers of feature genes. The corresponding 66 feature genes were taken as the optimal feature genes. Lastly, the optimal feature genes were subjected to GO functional enrichment analysis, principal component analysis (PCA), and protein-protein interaction (PPI) network analysis. Overall, the 66 feature genes were posited to effectively distinguish COVID-19-positive from COVID-19-negative samples and to serve as new biomarkers of the disease.
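
As a rough illustration of the mRMR idea mentioned above, the sketch below greedily picks features that share high mutual information with the class label and low mutual information with features already chosen. It assumes discretised feature values and uses the difference form of the criterion (relevance minus mean redundancy); the abstract does not state which variant or discretisation the authors used, and the data and names are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )

def mrmr(columns, target, k):
    """Greedily pick k column indices maximising relevance minus mean redundancy."""
    relevance = [mutual_information(col, target) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < k:
        def score(j):
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            )
            return relevance[j] - redundancy / len(selected)
        remaining = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(remaining, key=score))
    return selected

# Hypothetical toy data: column 1 duplicates column 0; column 2 adds new information.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
target = [2 * ai + bi for ai, bi in zip(a, b)]
columns = [a, list(a), b]
chosen = mrmr(columns, target, 2)
```

All three columns are equally relevant here, so the second pick is decided entirely by the redundancy term, which is what distinguishes mRMR from ranking by relevance alone.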


2020 ◽  
Vol 23 (8) ◽  
pp. 805-813
Author(s):  
Ai Jiang ◽  
Peng Xu ◽  
Zhenda Zhao ◽  
Qizhao Tan ◽  
Shang Sun ◽  
...  

Background: Osteoarthritis (OA) is a joint disease that leads to a high disability rate and a low quality of life. With the development of modern molecular biology techniques, some key genes and diagnostic markers have been reported. However, the etiology and pathogenesis of OA are still unknown. Objective: To develop a gene signature in OA. Method: In this study, five microarray data sets were integrated to conduct a comprehensive network and pathway analysis of the biological functions of OA-related genes, which can provide valuable information for further exploring the etiology and pathogenesis of OA. Results and Discussion: Differential expression analysis identified 180 genes that were significantly differentially expressed in OA. Functional enrichment analysis showed that the up-regulated genes were associated with rheumatoid arthritis (p < 0.01). Down-regulated genes were involved in the negative regulation of kinase activity and in signaling pathways such as the MAPK signaling pathway (p < 0.001) and the IL-17 signaling pathway (p < 0.001). In addition, an OA-specific protein-protein interaction (PPI) network was constructed based on the differentially expressed genes. Analysis of the network's topological attributes showed that the differentially up-regulated genes VEGFA, MYC, ATF3, and JUN were hub genes of the network, which may influence the occurrence and development of OA by regulating the cell cycle or apoptosis, and were potential biomarkers of OA. Finally, the support vector machine (SVM) method was used to establish a diagnostic model of OA, which had excellent predictive power in internal and external data sets (AUC > 0.9), high predictive performance across different chip platforms (AUC > 0.9), and effective performance in blood samples (AUC > 0.8). Conclusion: The 4-gene diagnostic model may be of great help in the early diagnosis and prediction of OA.
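
The AUC figures quoted above summarise how well a classifier's scores rank positive samples above negative ones. A minimal sketch of that computation, using the pairwise-comparison definition and hypothetical labels and scores (not data from the study), might look like this:

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Each positive/negative pair counts 1 for a win, 0.5 for a tie, 0 for a loss.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical decision scores for six samples (1 = OA, 0 = control).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)  # 8 of 9 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; library implementations typically use a sort-based equivalent.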


Author(s):  
B. Venkatesh ◽  
J. Anuradha

In microarray data, high classification accuracy is difficult to achieve because of high dimensionality and irrelevant, noisy data; such data also contain many gene expression features but few samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases: a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks of the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods, using Fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used to select the optimal features, and an RBF kernel-based Support Vector Machine (SVM) classifier is used as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets, using Accuracy, Recall, Precision, and F1-Score as performance metrics. The experimental results show that the proposed method outperforms the other feature selection methods.


Author(s):  
Gang Liu ◽  
Chunlei Yang ◽  
Sen Liu ◽  
Chunbao Xiao ◽  
Bin Song

A feature selection method based on mutual information and support vector machine (SVM) is proposed in order to eliminate redundant features and improve classification accuracy. First, the local correlation between features and the overall correlation are calculated by mutual information. The correlation reflects the information inclusion relationship between features, so the features are evaluated and redundant features are eliminated by analyzing the correlation. Subsequently, the concept of mean impact value (MIV) is defined, and the degree of influence of input variables on output variables of the SVM network is calculated based on MIV. The importance weights of the features described by MIV are sorted in descending order. Finally, the SVM classifier is used to perform feature selection according to the classification accuracy of feature combinations, taking the MIV order of the features as a reference. Simulation experiments are carried out on three standard UCI data sets, and the results show that this method not only effectively reduces the feature dimension and achieves high classification accuracy, but also ensures good robustness.
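
The abstract does not give the authors' exact MIV definition. One common formulation perturbs each input feature up and down by a fixed fraction and averages the resulting change in the model output; the sketch below follows that formulation as an assumption, with a hypothetical linear function standing in for the trained SVM.

```python
def mean_impact_values(model, X, delta=0.1):
    """MIV per feature: mean output change when the feature is scaled by (1 +/- delta)."""
    n_features = len(X[0])
    mivs = []
    for f in range(n_features):
        diffs = []
        for row in X:
            up = list(row)
            up[f] = row[f] * (1 + delta)
            down = list(row)
            down[f] = row[f] * (1 - delta)
            diffs.append(model(up) - model(down))
        mivs.append(sum(diffs) / len(diffs))
    return mivs

def toy_model(row):
    """Hypothetical stand-in for a trained model's decision function."""
    return 3.0 * row[0] - 0.5 * row[1] + 0.0 * row[2]

X = [[1.0, 2.0, 5.0], [2.0, 1.0, 4.0], [3.0, 3.0, 6.0]]
mivs = mean_impact_values(toy_model, X)
# Rank features by the magnitude of their impact, largest first.
ranking = sorted(range(len(mivs)), key=lambda f: abs(mivs[f]), reverse=True)
```

The sign of each value indicates the direction of influence, while the absolute value gives the ordering that the subsequent accuracy-driven selection would follow.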


2020 ◽  
Vol 10 (9) ◽  
pp. 3282
Author(s):  
Angela Shin-Yu Lien ◽  
Yi-Der Jiang ◽  
Jia-Ling Tsai ◽  
Jawl-Shan Hwang ◽  
Wei-Chao Lin

Fatigue and poor sleep quality are the most common clinical complaints of people with diabetes mellitus (DM). These complaints are early signs of DM and are closely related to diabetic control and the presence of complications, which lead to a decline in the quality of life. Therefore, an accurate measurement of the relationship between fatigue, sleep status, and the complication of DM nephropathy could lead to a specific definition of fatigue and an appropriate medical treatment. This study recruited 307 people with Type 2 diabetes from two medical centers in Northern Taiwan through a questionnaire survey and a retrospective investigation of medical records. In an attempt to identify the related factors and accurately predict diabetic nephropathy, we applied hybrid research methods, integrated biostatistics, and feature selection methods in data mining and machine learning to compare and verify the results. The results demonstrated that patients with diabetic nephropathy have a higher fatigue level and Charlson comorbidity index (CCI) score than those without neuropathy, and that the presence of neuropathy leads to poor sleep quality, lower quality of life, and poor metabolism. Furthermore, by using feature selection to choose representative features or variables, we achieved consistent results with a support vector machine (SVM) classifier using merely ten representative factors, with a prediction accuracy as high as 74% in predicting the presence of diabetic nephropathy.


2020 ◽  
pp. 3397-3407
Author(s):  
Nur Syafiqah Mohd Nafis ◽  
Suryanti Awang

Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant features from the sparse feature space. Thus, this paper proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high-dimensional text classification. This technique can measure a feature's importance in a high-dimensional text document. In addition, it aims to increase the efficiency of feature selection and thereby obtain promising text classification accuracy. In the first stage, TF-IDF acts as a filter approach that measures the importance of features in the text documents. In the second stage, SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets. This research runs sets of experiments using text documents retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing is applied to extract relevant features. The pre-processed features are then divided into training and testing datasets. Next, feature selection is performed on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is then applied for feature ranking as the next feature selection step. Only top-ranked features are selected for text classification using the SVM classifier. The experiments show that the proposed technique achieves 98% accuracy, outperforming other existing techniques. In conclusion, the proposed technique is able to select the significant features in unstructured and high-dimensional text documents.
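
The first-stage TF-IDF filter described above can be sketched as below. This is only an illustration under stated assumptions: it uses the classic tf x log(N/df) weighting and keeps each term's maximum score across documents, whereas the paper's exact weighting variant and cut-off are not given in the abstract. The tokenised posts and function names are hypothetical, and the SVM-RFE stage is not shown.

```python
from collections import Counter
from math import log

def tf_idf(docs):
    """Return one {term: score} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

def top_terms(scores, k):
    """Keep the k terms with the highest TF-IDF score in any document."""
    best = {}
    for doc_scores in scores:
        for term, s in doc_scores.items():
            best[term] = max(best.get(term, 0.0), s)
    return sorted(best, key=best.get, reverse=True)[:k]

# Hypothetical tokenised posts.
docs = [["good", "phone", "good"], ["bad", "phone"], ["good", "battery", "phone"]]
scores = tf_idf(docs)
kept = top_terms(scores, 2)
```

A term that occurs in every document, such as "phone" here, scores zero and is filtered out first, which is the behaviour that makes TF-IDF usable as a cheap pre-filter before a wrapper or embedded method.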


2020 ◽  
Vol 2020 ◽  
pp. 1-10 ◽  
Author(s):  
Xiuzhi Sang ◽  
Wanyue Xiao ◽  
Huiwen Zheng ◽  
Yang Yang ◽  
Taigang Liu

Prediction of DNA-binding proteins (DBPs) has become a popular research topic in protein science due to its crucial role in all aspects of biological activities. Even though considerable efforts have been devoted to developing powerful computational methods to solve this problem, it is still a challenging task in the field of bioinformatics. A hidden Markov model (HMM) profile has been proved to provide important clues for improving the prediction performance of DBPs. In this paper, we propose a method, called HMMPred, which extracts the features of amino acid composition and auto- and cross-covariance transformation from the HMM profiles, to help train a machine learning model for identification of DBPs. Then, a feature selection technique is performed based on the extreme gradient boosting (XGBoost) algorithm. Finally, the selected optimal features are fed into a support vector machine (SVM) classifier to predict DBPs. The experimental results tested on two benchmark datasets show that the proposed method is superior to most of the existing methods and could serve as an alternative tool to identify DBPs.


2020 ◽  
Vol 14 (3) ◽  
pp. 269-279
Author(s):  
Hayet Djellali ◽  
Nacira Ghoualmi-Zine ◽  
Souad Guessoum

This paper investigates feature selection methods based on a hybrid architecture, using a feature selection algorithm called Adapted Fast Correlation Based Feature selection and Support Vector Machine Recursive Feature Elimination (AFCBF-SVMRFE). AFCBF-SVMRFE has three stages and combines the SVMRFE embedded method with Correlation based Feature Selection. The first stage is a relevance analysis, the second is a redundancy analysis, and the third is a performance evaluation and feature restoration stage. Experiments show that the proposed method, tested with two classifiers, Support Vector Machine (SVM) and K nearest neighbors (KNN), provides the best accuracy on various datasets. The SVM classifier outperforms the KNN classifier on these data. AFCBF-SVMRFE outperforms the FCBF multivariate filter, SVMRFE, Particle Swarm Optimization (PSO), and Artificial Bee Colony (ABC).
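
The original FCBF algorithm measures both relevance (feature versus class) and redundancy (feature versus feature) with symmetric uncertainty. Whether the adapted variant above keeps this measure unchanged is not stated in the abstract, so the following is a sketch of the standard quantity only, on hypothetical discrete data.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)

# Hypothetical discrete features and class labels.
labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]   # identical to the labels
uninformative = [0, 1, 0, 1]  # independent of the labels
su_high = symmetric_uncertainty(informative, labels)
su_low = symmetric_uncertainty(uninformative, labels)
```

In a relevance analysis, features whose SU with the class falls below a threshold are dropped; in a redundancy analysis, a feature is dropped when its SU with an already kept feature is at least as large as its SU with the class.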


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0246668
Author(s):  
Lihua Cai ◽  
Honglong Wu ◽  
Ke Zhou

Identifying biomarkers that are associated with different types of cancer is an important goal in the field of bioinformatics. Different research groups have analyzed the expression profiles of many genes and found certain genetic patterns that can promote the improvement of targeted therapies, but the significance of some genes is still ambiguous. More reliable and effective biomarker identification methods are therefore needed to detect candidate cancer-related genes. In this paper, we propose a novel method that combines the infinite latent feature selection (ILFS) method with the functional interaction (FI) network to rank the biomarkers. We applied the proposed method to the expression data of five cancer types. The experiments indicated that our network-constrained ILFS (NCILFS) provides an improved prediction of the diagnosis of the samples and locates many more known oncogenes than the original ILFS and some other existing methods. We also performed functional enrichment analysis by inspecting the over-represented gene ontology (GO) biological process (BP) terms and applying the gene set enrichment analysis (GSEA) method to the biomarkers selected by each feature selection method. The enrichment analysis reports show that our network-constrained ILFS can produce more biologically significant gene sets than other methods. The results suggest that network-constrained ILFS can identify cancer-related genes with higher discriminative power and biological significance.


Twitter sentiment analysis is a vital tool for determining public opinion about products, services, events, or personalities. Analyzing medical tweets on a specific topic can provide immense benefits to the medical industry. However, medical tweets require an efficient feature selection approach to produce accurate results. The Penguin search optimization algorithm (PeSOA) has the ability to resolve NP-hard problems. This paper aims to develop an automated opinion mining framework by modeling the feature selection problem as an NP-hard optimization problem and using a PeSOA-based feature selection approach to solve it. Initially, medical tweets based on cancer and drug keywords are extracted and pre-processed to filter the relevant informative tweets. Then features are extracted based on Natural Language Processing (NLP) concepts, and the optimal features are selected using PeSOA, whose results are fed as input to three baseline classifiers to achieve accurate sentiment classification. The experimental results, obtained through MATLAB simulations on cancer and drug tweets using k-Nearest Neighbor (KNN), Naïve Bayes (NB), and Support Vector Machine (SVM) classifiers, indicate that the proposed PeSOA feature selection-based tweet opinion mining improves classification performance significantly. They also show that PeSOA feature selection with the SVM classifier provides better sentiment classification than the other classifiers.

