A Computational Method to Assist the Diagnosis of Breast Disease Using Dynamic Thermography

Breast cancer has been the second leading cause of cancer death among women. New techniques that enhance early diagnosis are very important for improving cure rates. This paper proposes and evaluates an image analysis method to automatically detect patients with benign and malignant breast changes (tumors). The method exploits differences in the Dynamic Infrared Thermography (DIT) patterns observed on patients’ skin. After the sequential DIT images of each patient are obtained, their temperature arrays are computed and new grayscale images are generated. The regions of interest (ROIs) of those images are then segmented and, from them, arrays of ROI temperatures are computed. Features are extracted from the arrays, including ones based on statistics, clustering, histogram comparison, fractal geometry, diversity indices, and spatial statistics. From these features, time series are generated and broken down into subsets of different cardinalities. Automatic feature selection methods are applied, and the selected features are used in a Support Vector Machine (SVM) classifier. In our tests, using a dataset of 68 images, 100% accuracy was achieved.


Using the Relevance Vector Machine Model Combined with Local Phase Quantization to Predict Protein-Protein Interactions from Protein Sequences

BioMed Research International ◽

10.1155/2016/4783801 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 13

Author(s):

Ji-Yong An ◽

Fan-Rong Meng ◽

Zhu-Hong You ◽

Yu-Hong Fang ◽

Yu-Jun Zhao ◽

...

Keyword(s):

Protein Sequences ◽

Relevance Vector Machine ◽

Experimental Results ◽

Computational Method ◽

Support Vector ◽

Svm Classifier ◽

Local Phase ◽

Local Phase Quantization ◽

Phase Quantization ◽

Better Than

We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements come from representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which are significantly better than those of previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is clearly better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.
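
The 5-fold cross-validation protocol mentioned in this abstract can be illustrated with a minimal sketch. The function name `k_fold_indices`, the shuffling seed, and the round-robin fold assignment are assumptions made for illustration; the abstract does not say how the folds were built.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Split sample indices into k shuffled folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    The seed and the round-robin assignment are illustrative choices.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment keeps fold sizes within one sample of each other.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    for train, test in k_fold_indices(10, k=5):
        print(sorted(test), len(train))
```

Each sample appears in exactly one test fold, so every sample is used for testing once and for training four times.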


Using Weighted Sparse Representation Model Combined with Discrete Cosine Transformation to Predict Protein-Protein Interactions from Protein Sequence

BioMed Research International ◽

10.1155/2015/902198 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10 ◽

Cited By ~ 42

Author(s):

Yu-An Huang ◽

Zhu-Hong You ◽

Xin Gao ◽

Leon Wong ◽

Lirong Wang

Keyword(s):

Sparse Representation ◽

Protein Interactions ◽

Protein Sequence ◽

False Positive Rate ◽

Computational Method ◽

Substitution Matrix ◽

Support Vector ◽

Svm Classifier ◽

Protein Protein Interactions ◽

Discrete Cosine Transformation

Increasing demand for knowledge about protein-protein interactions (PPIs) is promoting the development of methods for predicting protein interaction networks. Although high-throughput technologies have generated considerable PPI data for various organisms, they have inevitable drawbacks such as high cost, time consumption, and an inherently high false positive rate. For this reason, computational methods for predicting PPIs are drawing more and more attention. In this study, we report a computational method for predicting PPIs using the information of protein sequences. The main improvements come from adopting a novel protein sequence representation that applies the discrete cosine transform (DCT) to the substitution matrix representation (SMR) and from using a weighted sparse representation based classifier (WSRC). On the PPI datasets of Yeast, Human, and H. pylori, we obtained excellent results with average accuracies as high as 96.28%, 96.30%, and 86.74%, respectively, significantly better than previous methods. These promising results show that the proposed method is feasible, robust, and powerful. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier. Extensive experiments were also performed in which we used Yeast PPI samples as the training set to predict PPIs in datasets from five other species.
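
To make the DCT step more concrete, here is a minimal sketch of an unnormalized two-dimensional DCT-II applied to a small matrix. The toy input matrix and the function names are assumptions; the abstract does not specify the normalization, the matrix size, or which coefficients are kept as features.

```python
import math


def dct2(signal):
    """Unnormalized 1D DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def dct2_2d(matrix):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct2(row) for row in matrix]
    cols = [dct2(list(col)) for col in zip(*rows)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # hypothetical stand-in for an SMR matrix
    for row in dct2_2d(toy):
        print([round(v, 3) for v in row])
```

For a constant matrix, all of the energy ends up in the top-left coefficient, which is why DCT-based representations are often described as compacting information into low-frequency terms.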


Binary Spectrum Feature for Improved Classifier Performance

10.36227/techrxiv.12993122 ◽

2020 ◽

Author(s):

Nalika Ulapane ◽

Karthick Thiyagarajan ◽

Sarath Kodagoda

Keyword(s):

Machine Learning ◽

Classification Performance ◽

Feature Reduction ◽

Sensor Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Svm Classifier ◽

Monitoring Task ◽

Classifier Performance ◽

Spectrum Feature

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.


A Computational Method for the Identification of Endolysins and Autolysins

Protein and Peptide Letters ◽

10.2174/0929866526666191002104735 ◽

2020 ◽

Vol 27 (4) ◽

pp. 329-336 ◽

Cited By ~ 1

Author(s):

Lei Xu ◽

Guangmin Liang ◽

Baowen Chen ◽

Xu Tan ◽

Huaikun Xiang ◽

...

Keyword(s):

Support Vector Machine ◽

Cell Wall ◽

Experimental Results ◽

Computational Method ◽

Lytic Enzyme ◽

Support Vector ◽

Lytic Enzymes ◽

Data Set ◽

Optimal Feature ◽

Better Than

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria. The study of cell wall lytic enzymes therefore aims at finding an efficient way to cure bacterial infections. With antibiotics, the problem of drug resistance is becoming more serious, so cell lytic enzymes are a good choice for curing bacterial infections. Cell lytic enzymes include endolysins and autolysins, and the difference between them is the purpose of the cell wall break. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: In this article, our motivation is to predict the type of cell lytic enzyme. Cell lytic enzymes are helpful for killing bacteria, so it is meaningful to study their type. However, detecting the type of a cell lytic enzyme by experimental methods is time consuming. Thus, an efficient computational method for predicting the type of cell lytic enzyme is proposed in our work. Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Each protein is then represented by its tripeptide composition. The features with larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors. The learned classifier is used to predict the type of cell lytic enzyme. Results: Following the proposed method, the experimental results show that the overall accuracy reaches 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06-92.9)/92.9). The performance of our proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptide optimal feature set. Conclusion: The paper proposes an efficient method for identifying endolysins and autolysins. In this paper, a support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy for identification of the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
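
As a rough illustration of the tripeptide composition representation, the sketch below counts overlapping tripeptides over the 20 standard amino acids, giving 20^3 = 8000 features per protein. The overlapping windows, the frequency normalization, and the handling of non-standard residues are assumptions, not details given in the abstract. A small helper also reproduces the relative-improvement arithmetic quoted in the Results.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20^3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return an 8000-dimensional frequency vector of overlapping tripeptides."""
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(sequence) - 2):
        tri = sequence[i:i + 3]
        if tri in INDEX:  # assumption: skip windows with non-standard residues
            counts[INDEX[tri]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def relative_improvement(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


if __name__ == "__main__":
    vec = tripeptide_composition("ACDAC")  # toy sequence, not from the data set
    print(len(vec), sum(1 for v in vec if v > 0))
    print(round(relative_improvement(97.06, 92.9), 2))
```

With only 68 proteins and 8000 candidate features, most entries of each vector are zero, which is consistent with the abstract's emphasis on selecting a small feature subset before training the classifier.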


Prediction of Disease Comorbidity Using HeteSim Scores based on Multiple Heterogeneous Networks

Current Gene Therapy ◽

10.2174/1566523219666190917155959 ◽

2019 ◽

Vol 19 (4) ◽

pp. 232-241 ◽

Cited By ~ 5

Author(s):

Xuegong Chen ◽

Wanwan Shi ◽

Lei Deng

Keyword(s):

Protein Interactions ◽

Experimental Studies ◽

Treatment Strategies ◽

Computational Method ◽

Biological Information ◽

Support Vector ◽

Protein Protein Interactions ◽

Efficient Treatment ◽

Disease Associations ◽

Previous State

Background: Accumulating experimental studies have indicated that disease comorbidity causes additional pain to patients and leads to the failure of standard treatments compared to patients who have a single disease. Therefore, accurate prediction of potential comorbidity is essential to design more efficient treatment strategies. However, only a few disease comorbidities have been discovered in the clinic. Objective: In this work, we propose PCHS, an effective computational method for predicting disease comorbidity. Materials and Methods: We utilized the HeteSim measure to calculate the relatedness score for different disease pairs in the global heterogeneous network, which integrates six networks based on biological information, including disease-disease associations, drug-drug interactions, protein-protein interactions, and associations among them. We built the prediction model using the Support Vector Machine (SVM) based on the HeteSim scores. Results and Conclusion: The results showed that PCHS performed significantly better than previous state-of-the-art approaches and achieved an AUC score of 0.90 in 10-fold cross-validation. Furthermore, some of our predictions have been verified in the literature, indicating the effectiveness of our method.
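
The AUC reported here can be read as the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. A minimal sketch of that pairwise definition follows; the function name and the toy labels and scores are hypothetical, and the quadratic-time loop is chosen for clarity rather than efficiency.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values only; not data from the study.
    print(auc_score([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```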


Identification of Chronic Hypersensitivity Pneumonitis Biomarkers with Machine Learning and Differential Co-expression Analysis

Current Gene Therapy ◽

10.2174/1566523220666201208093325 ◽

2020 ◽

Vol 20 ◽

Author(s):

Hongwei Zhang ◽

Steven Wang ◽

Tao Huang

Keyword(s):

Feature Selection ◽

Expression Analysis ◽

Hypersensitivity Pneumonitis ◽

Enrichment Analysis ◽

Functional Enrichment ◽

Great Promise ◽

Support Vector ◽

Svm Classifier ◽

Clinical Tool ◽

Chronic Hypersensitivity Pneumonitis

Aims: We aim to identify biomarkers for chronic hypersensitivity pneumonitis (CHP) and to facilitate precise gene therapy for CHP. Background: Chronic hypersensitivity pneumonitis (CHP) is an interstitial lung disease caused by hypersensitive reactions to inhaled antigens. Clinically, differentiating between CHP and other interstitial lung diseases, especially idiopathic pulmonary fibrosis (IPF), is challenging. Objective: In this study, we analyzed the publicly available gene expression profiles of 82 CHP patients, 103 IPF patients, and 103 control samples to identify the CHP biomarkers. Method: The CHP biomarkers were selected with advanced feature selection methods: Monte Carlo Feature Selection (MCFS) and Incremental Feature Selection (IFS). A Support Vector Machine (SVM) classifier was built. Then, we analyzed these CHP biomarkers through functional enrichment analysis and differential co-expression analysis. Result: There were 674 identified CHP biomarkers. The co-expression network of these biomarkers in CHP included more negative regulations, and the network structure of CHP was quite different from the networks of IPF and control. Conclusion: The SVM classifier may serve as an important clinical tool to address the challenging task of differentiating between CHP and IPF. Many of the biomarker genes in the differential co-expression network showed great promise in revealing the underlying mechanisms of CHP.
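
Incremental Feature Selection is commonly described as evaluating nested prefixes of a ranked feature list and keeping the prefix that scores best. The sketch below follows that common description; it is not taken from the paper. The `evaluate` callback is a hypothetical stand-in for a cross-validated classifier score, and the gene names and toy scoring rule are invented for illustration.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Evaluate the top-1, top-2, ... feature prefixes and keep the best one.

    `ranked_features` is ordered from most to least relevant (for example by
    a ranking method such as MCFS). `evaluate` maps a feature list to a score
    where higher is better. Ties are resolved in favor of the smaller subset.
    """
    best_score = float("-inf")
    best_subset = []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score


if __name__ == "__main__":
    informative = {"g1", "g2", "g3"}  # hypothetical genes

    def toy_score(subset):
        # Hypothetical score: reward informative genes, penalize subset size.
        return sum(1 for g in subset if g in informative) - 0.1 * len(subset)

    print(incremental_feature_selection(["g1", "g2", "g3", "g4", "g5"], toy_score))
```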


A Fuzzy Gaussian Rank Aggregation Ensemble Feature Selection Method for Microarray Data

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-190134 ◽

2021 ◽

Vol 24 (4) ◽

pp. 289-301

Author(s):

B. Venkatesh ◽

J. Anuradha

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Classification Accuracy ◽

Performance Metrics ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Binary Particle Swarm Optimization ◽

Selection Methods

In microarray data, it is difficult to achieve high classification accuracy because of high dimensionality and irrelevant, noisy data. Such data also contain many gene expression values but few samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases: a filter phase and a wrapper phase. In the filter phase, an ensemble technique is used to aggregate the feature ranks of the Relief, minimum redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses Fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used to select the optimal features, and an RBF kernel-based Support Vector Machine (SVM) classifier is used as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods using five benchmark datasets. Various performance metrics, such as Accuracy, Recall, Precision, and F1-Score, are used for evaluation. The experimental results show that the proposed method outperforms the other feature selection methods.
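
The four evaluation metrics named in this abstract have standard definitions for binary classification, sketched below. The function name, the choice of `1` as the positive label, and the convention of returning 0.0 when a denominator is zero are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels only; not results from the paper.
    print(classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
```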


Possibility of Human Gender Recognition Using Raman Spectra of Teeth

Molecules ◽

10.3390/molecules26133983 ◽

2021 ◽

Vol 26 (13) ◽

pp. 3983

Author(s):

Ozren Gamulin ◽

Marko Škrabić ◽

Kristina Serec ◽

Matej Par ◽

Marija Baković ◽

...

Keyword(s):

Raman Spectra ◽

Principal Component ◽

Support Vector ◽

Gender Recognition ◽

Proof Of Concept ◽

Male And Female ◽

Tooth Type ◽

Tooth Apex ◽

The Difference

Gender determination of human remains can be very challenging, especially when the remains are incomplete. Herein, we report a proof-of-concept experiment in which the possibility of gender recognition using Raman spectroscopy of teeth is investigated. Raman spectra were recorded from male and female molars and premolars at two distinct sites, the tooth apex and the anatomical neck. Recorded spectra were sorted into suitable datasets and initially analyzed with principal component analysis, which showed a distinction between spectra of male and female teeth. Then, reduced datasets with scores of the first 20 principal components were formed, and two classification algorithms, support vector machine and artificial neural networks, were applied to form classification models for gender recognition. The obtained results showed that gender recognition with Raman spectra of teeth is possible but strongly depends on both the tooth type and the spectrum recording site. The differences in classification accuracy between tooth types and recording sites are discussed in terms of the molecular structure differences caused by the influence of masticatory loading or gender-dependent life events.


Ablation Analysis to Select Wearable Sensors for Classifying Standing, Walking, and Running

Sensors ◽

10.3390/s21010194 ◽

2020 ◽

Vol 21 (1) ◽

pp. 194

Author(s):

Sarah Gonzalez ◽

Paul Stegall ◽

Harvey Edwards ◽

Leia Stirling ◽

Ho Chit Siu

Keyword(s):

Activity Recognition ◽

Principal Components ◽

Classification Accuracy ◽

Wearable Sensors ◽

Sensor Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Techniques ◽

Measurement Units ◽

The Difference

The field of human activity recognition (HAR) often utilizes wearable sensors and machine learning techniques in order to identify the actions of the subject. This paper considers the activity recognition of walking and running using a support vector machine (SVM) that was trained on principal components derived from wearable sensor data. An ablation analysis is performed in order to select the subset of sensors that yields the highest classification accuracy. The paper also compares principal components across trials to assess the similarity of the trials. Five subjects were instructed to perform standing, walking, running, and sprinting on a self-paced treadmill, and the data were recorded using surface electromyography sensors (sEMGs), inertial measurement units (IMUs), and force plates. When all of the sensors were included, the SVM had over 90% classification accuracy using only the first three principal components of the data with the classes of stand, walk, and run/sprint (combined run and sprint class). It was found that sensors placed only on the lower leg produced higher accuracies than sensors placed on the upper leg. There was a small decrease in accuracy when the force plates were ablated, but the difference may not be operationally relevant. Using only accelerometers without sEMGs was shown to decrease the accuracy of the SVM.
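
An ablation analysis of this kind can be sketched as a leave-one-out loop over sensor groups: score the full set, remove one group at a time, and record how much the score drops. The sketch below is an assumption about the general procedure, not the authors' protocol. The `score` callback stands in for training and evaluating the classifier, and the weights in the toy example are invented placeholders, not accuracies from the study.

```python
def ablation_analysis(sensors, score):
    """Measure the drop in score when each sensor group is removed in turn.

    `score` maps a frozenset of sensor names to a number where higher is
    better. Returns the baseline score and a dict of per-sensor drops.
    """
    full = frozenset(sensors)
    baseline = score(full)
    drops = {}
    for sensor in sensors:
        drops[sensor] = baseline - score(full - {sensor})
    return baseline, drops


if __name__ == "__main__":
    # Hypothetical contribution weights, for illustration only.
    weights = {"IMU": 5, "sEMG": 3, "force_plate": 1}

    def toy_score(subset):
        return sum(weights[s] for s in subset)

    baseline, drops = ablation_analysis(list(weights), toy_score)
    print(baseline, drops)
```

A sensor whose removal causes only a small drop is a candidate for exclusion, which matches the abstract's remark that ablating the force plates produced a small decrease that may not be operationally relevant.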


The Primacy of High B-Value 3T-DWI Radiomics in the Prediction of Clinically Significant Prostate Cancer

Diagnostics ◽

10.3390/diagnostics11050739 ◽

2021 ◽

Vol 11 (5) ◽

pp. 739

Author(s):

Alessandro Bevilacqua ◽

Margherita Mottola ◽

Fabio Ferroni ◽

Alice Rossi ◽

Giampaolo Gavelli ◽

...

Keyword(s):

Prostate Cancer ◽

Quantitative Imaging ◽

B Value ◽

Support Vector ◽

Svm Classifier ◽

Primary Role ◽

High B Value ◽

Wilcoxon Rank Sum Test ◽

Apparent Diffusion ◽

Clinically Significant

Predicting clinically significant prostate cancer (csPCa) is crucial in PCa management. Accordingly, 3T-magnetic resonance (MR) systems may have a novel role in quantitative imaging and early csPCa prediction. In this study, we develop a radiomic model for predicting csPCa based solely on native b2000 diffusion weighted imaging (DWIb2000) and debate the effectiveness of apparent diffusion coefficient (ADC) in the same task. In total, 105 patients were retrospectively enrolled between January and November 2020, with csPCa or ncsPCa confirmed by biopsy. DWIb2000 and ADC images acquired with a 3T-MRI were analyzed by computing 84 local first-order radiomic features (RFs). Two predictive models were built based on DWIb2000 and ADC, separately. Relevant RFs were selected through LASSO, and a support vector machine (SVM) classifier was trained using repeated 3-fold cross validation (CV) and validated on a holdout set. The SVM models rely on a single pair of uncorrelated RFs (ρ < 0.15) selected through the Wilcoxon rank-sum test (p ≤ 0.05) with Holm–Bonferroni correction. On the holdout set, while the ADC model yielded AUC = 0.76 (95% CI, 0.63–0.96), the DWIb2000 model reached AUC = 0.84 (95% CI, 0.63–0.90), with specificity = 75%, sensitivity = 90%, and informedness = 0.65. This study establishes the primary role of 3T-DWIb2000 in PCa quantitative analyses, whilst ADC can remain the leading sequence for detection.
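
Two of the quantities in this abstract have simple closed forms. Informedness (Youden's J) is sensitivity + specificity − 1, which reproduces the reported 0.65 from 0.90 and 0.75. The Holm–Bonferroni step-down procedure compares the sorted p-values against progressively looser thresholds. The sketch below shows both; the function names and the toy p-values are assumptions for illustration.

```python
def informedness(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down correction.

    Returns a list of booleans, in the original order, marking which
    hypotheses are rejected at family-wise error rate `alpha`.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The smallest p-value is tested at alpha/m, the next at alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


if __name__ == "__main__":
    print(round(informedness(0.90, 0.75), 2))
    print(holm_bonferroni([0.01, 0.04, 0.03]))  # toy p-values, not from the study
```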
