Matthews correlation coefficient
Recently Published Documents


Total documents: 47 (five years: 38)
H-index: 6 (five years: 5)

2022 · Vol 12
Author(s): Yue Gong, Benzhi Dong, Zixiao Zhang, Yixiao Zhai, Bo Gao, ...

Vesicular transport proteins are related to many human diseases and threaten human health when they undergo pathological changes. Protein function prediction has been one of the most intensively studied topics in bioinformatics. In this work, we developed VTP-Identifier, a tool for identifying vesicular transport proteins. Our strategy extracts transition probability composition, autocovariance transformation and other information from the position-specific scoring matrix (PSSM) as feature vectors. EditedNearestNeighbours (ENN) is used to address the class imbalance of the data set, and the Max-Relevance-Max-Distance (MRMD) algorithm is adopted to reduce the dimension of the feature vector. We evaluated the model using 5-fold cross-validation and an independent test set. On the test set, VTP-Identifier outperformed a GRU (gated recurrent unit) model, achieving an accuracy of 83.6%, a Matthews correlation coefficient (MCC) of 0.531 and an area under the ROC curve (AUC) of 0.873.
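The abstract names EditedNearestNeighbours (ENN) as its class-balancing step without describing it. As a rough illustration, assuming Euclidean distance between feature vectors, k = 3 neighbours and majority-vote editing applied only to the majority class, ENN can be sketched in plain Python. The function name and parameters are hypothetical; this is not the VTP-Identifier code, and library implementations (such as imbalanced-learn's `EditedNearestNeighbours`) may use different defaults, for example requiring all neighbours to agree.

```python
import math
from collections import Counter

def edited_nearest_neighbours(X, y, k=3, majority_label=None):
    """Simplified ENN undersampling (hypothetical helper, not a library API).

    A majority-class sample is removed when most of its k nearest neighbours
    (Euclidean distance, excluding itself) carry a different label.
    Minority-class samples are always kept.
    """
    if majority_label is None:
        majority_label = Counter(y).most_common(1)[0][0]
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # never drop minority samples
            continue
        # Sort all other samples by distance to sample i (index breaks ties)
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbour_labels = [y[j] for _, j in dists[:k]]
        agree = sum(1 for lab in neighbour_labels if lab == yi)
        if agree > k // 2:  # keep only if a strict majority of neighbours agree
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (10.5, 10.5)]
y = [0, 0, 0, 0, 1, 1, 1, 0]  # last point: a majority-class sample among minority points
X_res, y_res = edited_nearest_neighbours(X, y, k=3)
print(len(y), "->", len(y_res))  # 8 -> 7
```

Because ENN only deletes majority-class samples that look like noise near the class boundary, it reduces imbalance without synthesizing new data.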


Biology · 2021 · Vol 11 (1) · pp. 5
Author(s): Onkar Singh, Wen-Lian Hsu, Emily Chia-Yu Su

Interleukin (IL)-10 is a homodimeric cytokine that plays a crucial role in suppressing inflammatory responses and regulating the growth or differentiation of various immune cells. However, the molecular mechanism of IL-10 regulation is only partially understood because its regulation is environment- or cell-type-specific. In this study, we developed a computational approach, ILeukin10Pred (interleukin-10 prediction), that uses amino acid sequence-based features to predict and identify potential immunosuppressive IL-10-inducing peptides. The dataset comprises 394 experimentally validated IL-10-inducing and 848 non-inducing peptides, which we split into a training set (80%) and a test set (20%). To train and validate the model, we applied stratified five-fold cross-validation, and the final model was then evaluated on the holdout set. An extra trees classifier (ETC)-based model using hybrid features achieved an accuracy of 87.5% and a Matthews correlation coefficient (MCC) of 0.755. It outperformed an existing state-of-the-art method based on dipeptide composition, which achieved an accuracy of 81.24% and an MCC of 0.59. Our results show that combining various feature types improves predictive performance.
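The evaluation protocol above (an 80/20 stratified holdout split, then stratified five-fold cross-validation on the training portion) can be sketched in plain Python using the reported class counts. The function names and the fixed random seed are hypothetical, and the abstract does not say which tooling the authors used (scikit-learn's `StratifiedKFold` is a common choice).

```python
import random
from collections import defaultdict

def stratified_holdout(labels, test_frac=0.2, seed=0):
    """Split sample indices into train/test lists, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)  # per-class share of the test set
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

def stratified_kfold(indices, labels, k=5, seed=0):
    """Yield (train, validation) index lists for stratified k-fold CV."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)  # deal each class round-robin into folds
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val

labels = [1] * 394 + [0] * 848  # 1 = IL-10-inducing, 0 = non-inducing
train_idx, test_idx = stratified_holdout(labels, test_frac=0.2)
for fold, (tr, va) in enumerate(stratified_kfold(train_idx, labels, k=5)):
    print(fold, len(tr), len(va))
```

The holdout set is carved out first and never enters the cross-validation loop, so it stays an unbiased final check.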


2021 · Vol 12
Author(s): Zixiao Zhang, Yue Gong, Bo Gao, Hongfei Li, Wentao Gao, ...

Soluble N-ethylmaleimide-sensitive factor attachment protein receptor (SNARE) proteins are a large family of transmembrane proteins located in organelles and vesicles. Their important roles include initiating vesicle fusion and activating and fusing proteins during exocytosis, and they are also vital for regulating the transport of membrane proteins and non-regulatory vesicles. Establishing a method to identify SNARE proteins efficiently is therefore of great significance. However, the identification accuracy of existing methods such as SNARE CNN is unsatisfactory. In this study, we developed a method based on a support vector machine (SVM) that can effectively recognize SNARE proteins. We used the position-specific scoring matrix (PSSM) to extract features from SNARE protein sequences, ranked the features by importance with the support vector machine recursive feature elimination with correlation bias reduction (SVM-RFE-CBR) algorithm, and selected the optimal feature subset based on this ranking. The selected features were used to build the model, which was trained with 10-fold cross-validation and tested on an independent dataset. In independent tests, our method identified SNARE proteins with a sensitivity of 68%, specificity of 94%, accuracy of 92%, area under the curve (AUC) of 84% and Matthews correlation coefficient (MCC) of 0.48. These results indicate that our method performs better than other existing classification methods in identifying SNARE proteins.
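The SNARE results pair 92% accuracy with an MCC of only 0.48. Since 92% sits much closer to the 94% specificity than to the 68% sensitivity, negatives must make up most of the independent test set, and MCC is designed to expose exactly this situation. The counts below are hypothetical, not the authors' data. They show how accuracy can stay high while MCC, which uses all four cells of the confusion matrix, stays moderate or drops to zero.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any row or column of the matrix is empty;
    # returning 0.0 in that case is a common convention.
    return num / den if den else 0.0

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical test set with 10 positives and 90 negatives.
always_negative = (0, 0, 10, 90)  # predicts "negative" for every sample
finds_six = (6, 4, 4, 86)         # finds 6 of 10 positives, 4 false alarms
for name, counts in [("always_negative", always_negative), ("finds_six", finds_six)]:
    print(name, round(accuracy(*counts), 3), round(mcc(*counts), 3))
# always_negative 0.9 0.0
# finds_six 0.92 0.556
```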


2021 · Vol 2128 (1) · pp. 012012
Author(s): Mohamed R. Shoaib, Mohamed R. Elshamy, Taha E. Taha, Adel S. El-Fishawy, Fathi E. Abd El-Samie

Abstract A brain tumor is an acute cancerous disease that results from abnormal and uncontrollable cell division. Brain tumors are classified via biopsy, which is not normally performed until definitive brain surgery. Recent advances in deep learning have helped the health-care industry achieve accurate disease diagnosis. In this paper, a convolutional neural network (CNN) with image pre-processing is adopted to classify brain magnetic resonance (MR) images into four classes: glioma tumor, meningioma tumor, pituitary tumor and normal (no tumor). We compare four models: a transfer learning model, a CNN-based model designed from scratch, a pre-trained InceptionResNetV2 model and a pre-trained InceptionV3 model. Their performance is assessed with accuracy, sensitivity, specificity, precision, F1 score, Matthews correlation coefficient, error, kappa and false positive rate. The results show that the two proposed models are very effective: the transfer learning model and the CNN-based BRAIN-TUMOR-net achieve accuracies of 93.15% and 91.24%, respectively. The InceptionResNetV2 model achieves an accuracy of 86.80% and the InceptionV3 model an accuracy of 85.34%. A practical implementation of the proposed models is also presented.
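This study reports MCC and kappa for a four-class problem. One common multiclass generalization of MCC is Gorodkin's R_K statistic, the variant scikit-learn documents for multiclass `matthews_corrcoef`. The sketch below computes it together with Cohen's kappa from a K × K confusion matrix. The abstract does not say which multiclass variant the authors used (per-class one-vs-rest averaging is another option), and the example matrix is invented.

```python
import math

def multiclass_mcc_and_kappa(cm):
    """MCC (Gorodkin's R_K) and Cohen's kappa from a K x K confusion matrix.

    cm[i][j] = number of samples whose true class is i and predicted class is j.
    """
    k = len(cm)
    s = sum(sum(row) for row in cm)                          # total samples
    c = sum(cm[i][i] for i in range(k))                      # correctly classified
    t = [sum(cm[i]) for i in range(k)]                       # true count per class
    p = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    pt = sum(p[j] * t[j] for j in range(k))
    mcc_den = math.sqrt((s * s - sum(x * x for x in p)) * (s * s - sum(x * x for x in t)))
    mcc = (c * s - pt) / mcc_den if mcc_den else 0.0
    p_o = c / s           # observed agreement (accuracy)
    p_e = pt / (s * s)    # agreement expected by chance
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return mcc, kappa

# Hypothetical matrix; class order: glioma, meningioma, pituitary, no tumor
cm = [[45, 3, 1, 1],
      [4, 40, 3, 3],
      [0, 2, 47, 1],
      [1, 2, 0, 47]]
mcc, kappa = multiclass_mcc_and_kappa(cm)
print(f"MCC = {mcc:.3f}, kappa = {kappa:.3f}")
```

For a 2 × 2 matrix this formula reduces to the familiar binary MCC.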


Life · 2021 · Vol 11 (9) · pp. 940
Author(s): Guo-Hua Huang, Yu-Hang Zhang, Lei Chen, You Li, Tao Huang, ...

Non-small cell lung cancer is a major lethal subtype of epithelial lung cancer, with high morbidity and mortality. Single-cell sequencing plays a key role in exploring the pathogenesis of non-small cell lung cancer. We proposed a computational method that distinguishes cell subtypes from different pathological regions of non-small cell lung cancer on the basis of transcriptomic profiles and produces a group of qualitative classification criteria (biomarkers) and various rules. The random forest classifier reached a Matthews correlation coefficient (MCC) of 0.922 using 720 features, and the decision tree reached an MCC of 0.786 using 1880 features. The obtained biomarkers and rules are analyzed at the end of this study.


2021 · Vol 22 (1)
Author(s): Tim W. McInerney, Brian Fulton-Howard, Christopher Patterson, Devashi Paliwal, Lars S. Jermiin, ...

Abstract
Background: Variation in mitochondrial DNA (mtDNA) identified by genotyping microarrays, or by sequencing only the hypervariable regions of the genome, may be insufficient to reliably assign mitochondrial genomes to phylogenetic lineages or haplogroups. This lack of resolution can limit the functional and clinical interpretation of a substantial body of existing mtDNA data. To address this limitation, we developed and evaluated a large, curated reference alignment of complete mtDNA sequences as part of a pipeline for imputing missing mtDNA single nucleotide variants (mtSNVs). We call our reference alignment and pipeline MitoImpute.
Results: We aligned the sequences of 36,960 complete human mitochondrial genomes downloaded from GenBank, then filtered them and applied quality control. These sequences were reformatted for use in the imputation software IMPUTE2. We assessed the imputation accuracy of MitoImpute by measuring haplogroup and genotype concordance in data from the 1000 Genomes Project and the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The mean improvement in haplogroup assignment in the 1000 Genomes samples was 42.7% (Matthews correlation coefficient = 0.64). In the ADNI cohort, we imputed missing single nucleotide variants.
Conclusion: These results show that our reference alignment and panel can be used to impute missing mtSNVs in existing data obtained using microarrays, thereby broadening the scope of functional and clinical investigation of mtDNA. This improvement may be particularly useful in studies where participants have been recruited over time and mtDNA data obtained using different methods, enabling better integration of early data collected with less accurate methods with more recent sequence data.


2021
Author(s): Yunzhuo Zhou, Raghad Al-Jarf, Azadeh Alavi, Thanh Binh Nguyen, Carlos H. M. Rodrigues, ...

Abstract Protein phosphorylation acts as an essential on/off switch in many cellular signalling pathways, regulating protein function. This has led to ongoing interest in targeting kinases for therapeutic intervention. Computer-aided drug discovery has proven to be a useful and cost-effective approach for prioritising and enriching screening libraries. Limited effort, however, has been devoted to developing and tailoring in silico tools that assist the development of kinase inhibitors and provide insights into what makes inhibitors potent. To fill this gap, we developed kinCSM, an integrative computational tool capable of accurately identifying potent cyclin-dependent kinase 2 (CDK2) inhibitors, quantitatively predicting CDK2 ligand-kinase inhibition constants (pKi) and classifying inhibition modes without kinase information. kinCSM's predictive models were built using supervised learning and leverage graph-based signatures to capture both the physicochemical and the geometric properties of small molecules. CDK2 inhibitors were accurately identified with Matthews correlation coefficients (MCC) of up to 0.74, and inhibition constants were predicted with a Pearson's correlation of up to 0.76, with consistent performance on non-redundant blind tests (0.66 and 0.68, respectively). kinCSM was also able to identify the potential type of inhibition for a given molecule, achieving an MCC of up to 0.80 on cross-validation and 0.73 on the blind test. Analysis of the molecular composition of kinase inhibitors revealed chemical fragments enriched in potent CDK2 inhibitors and in different types of inhibitors, providing insights into the molecular mechanisms behind ligand-kinase interactions. We believe kinCSM will be an invaluable tool to guide future kinase drug discovery.
To aid the fast and accurate screening of potent CDK2 kinase inhibitors, we made kinCSM freely available online at http://biosig.unimelb.edu.au/kin_csm/.
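kinCSM reports MCC for its classifiers and Pearson's correlation for its regression model. The two are closely related: for binary labels, MCC equals the Pearson correlation (also called the phi coefficient) between the true and predicted 0/1 vectors. The toy labels below are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mcc(tp, fp, fn, tn):
    """Binary MCC from confusion-matrix counts (assumes no empty row/column)."""
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den

# Hypothetical labels (1 = potent inhibitor) and binary predictions:
# TP = 2, FN = 1, FP = 1, TN = 5
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0]
print(pearson(y_true, y_pred), mcc(2, 1, 1, 5))  # both 0.5, up to float rounding
```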


2021
Author(s): Michael Easter, Angi Christensen, Michelle Miller

Locating clandestine graves is often a significant challenge for law enforcement and other investigators. A number of search techniques can be employed, including visual assessment, canines, geophysical techniques and imaging, often depending on the location and terrain, case information and available resources. Dowsing is believed by some to be a reliable method for locating underground items of interest, including water, oil, ore and even graves; others, however, consider the practice controversial or even pseudoscience. Here we assess the ability of dowsing rods, wielded by participants with no previous dowsing experience, to locate buried bones in a controlled blind test. Assemblages of bones were buried in three of nine holes. A control group of participants was asked to identify which holes they believed contained bones by visual inspection, and a test group was asked to do the same using dowsing rods. Results indicate that neither method had a significant relationship with the true location of the bones (Matthews correlation coefficient -0.19 for the control group and 0.00 for the test group), and that there was no significant difference between the two groups (p = 0.36). In this study, dowsing was not found to be a reliable method of detecting buried bones. Some practitioners continue to advocate dowsing or other scientifically questionable search methods, even charging investigators or families substantial fees for these services. It is therefore important that such techniques are well understood and rigorously tested, and that investigators seek and employ methods that are appropriate and valid.
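To make the MCC values in the dowsing study concrete: if a participant marks exactly three of the nine holes, the picks can be scored as a 2 × 2 confusion matrix (bones present vs. hole marked). The hole numbers and picks below are hypothetical, and the abstract does not say how per-participant results were pooled into the group MCC values. The sketch only shows that getting one of three picks right is exactly chance level (MCC = 0), since three random picks are expected to hit 3 × 3 / 9 = 1 bone hole.

```python
import math

def mcc_for_picks(picked, holes_with_bones, n_holes=9):
    """MCC for one participant's picks among n_holes (hypothetical scoring)."""
    tp = len(picked & holes_with_bones)   # marked holes that contain bones
    fp = len(picked - holes_with_bones)   # marked holes that are empty
    fn = len(holes_with_bones - picked)   # bone holes that were missed
    tn = n_holes - tp - fp - fn           # empty holes left unmarked
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

bones = {1, 4, 7}  # hypothetical: 3 of 9 holes contain bones
for picks in ({2, 3, 5}, {1, 2, 3}, {1, 4, 2}, {1, 4, 7}):
    print(sorted(picks), round(mcc_for_picks(picks, bones), 2))
# [2, 3, 5] -0.5
# [1, 2, 3] 0.0
# [1, 2, 4] 0.5
# [1, 4, 7] 1.0
```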


Author(s): Meenal Chaudhari, Niraj Thapa, Hamid Ismail, Sandhya Chopade, Doina Caragea, ...

Phosphorylation, which is mediated by protein kinases and opposed by protein phosphatases, is an important post-translational modification that regulates many cellular processes, including cellular metabolism, cell migration and cell division. Because of its essential role in cellular physiology, a great deal of attention has been devoted to identifying sites of phosphorylation on cellular proteins and understanding how modification of these sites affects their cellular functions. This has led to the development of several computational methods designed to predict sites of phosphorylation from a protein’s primary amino acid sequence. In contrast, much less attention has been paid to dephosphorylation and its role in regulating the phosphorylation status of proteins inside cells. Indeed, to date, dephosphorylation site prediction tools have been restricted to a few tyrosine phosphatases. To fill this knowledge gap, we employed a transfer learning strategy to develop a deep learning-based model that predicts sites likely to be dephosphorylated. On independent tests, our model, DTL-DephosSite, achieved a sensitivity (SN) of 84%, a specificity (SP) of 84% and a Matthews correlation coefficient (MCC) of 0.68 for phosphoserine/phosphothreonine residues. For phosphotyrosine residues, it achieved an SN of 75%, an SP of 88% and an MCC of 0.64.
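The DephosSite figures can be cross-checked under one assumption the abstract does not state: that each independent test set contained equally many positive and negative sites (P = N). Writing the confusion-matrix counts in terms of SN and SP then gives a closed form for MCC.

```latex
\begin{aligned}
\mathrm{TP} &= \mathrm{SN}\,P, \quad \mathrm{FN} = (1-\mathrm{SN})\,P, \quad
\mathrm{TN} = \mathrm{SP}\,P, \quad \mathrm{FP} = (1-\mathrm{SP})\,P \\
\mathrm{MCC} &= \frac{\mathrm{TP}\cdot\mathrm{TN} - \mathrm{FP}\cdot\mathrm{FN}}
  {\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})(\mathrm{TN}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}} \\
&= \frac{P^2\,[\mathrm{SN}\,\mathrm{SP} - (1-\mathrm{SP})(1-\mathrm{SN})]}
  {\sqrt{P(1+\mathrm{SN}-\mathrm{SP})\cdot P\cdot P\cdot P(1+\mathrm{SP}-\mathrm{SN})}} \\
&= \frac{\mathrm{SN}+\mathrm{SP}-1}{\sqrt{(1+\mathrm{SN}-\mathrm{SP})(1+\mathrm{SP}-\mathrm{SN})}} \\
\text{Ser/Thr: } & \frac{0.84+0.84-1}{\sqrt{1\cdot 1}} = 0.68, \qquad
\text{Tyr: } \frac{0.75+0.88-1}{\sqrt{0.87\cdot 1.13}} \approx 0.635
\end{aligned}
```

Under that assumption the serine/threonine figures give exactly 0.68 and the tyrosine figures about 0.635, consistent with the reported 0.68 and 0.64 after rounding. This is only a consistency check, not a statement about how the test sets were built.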


Author(s): Jalpa J. Patel, S. K. Hadia

Breast cancer is a leading cause of death among women in both developed and developing nations. To design an effective classification system, a feature selection method must be used to reduce the irregular part of the information in mammogram images. In the proposed approach, regions of interest (ROIs) are cropped manually and a number of features are extracted from them. A novel hybrid optimum feature selection (HOFS) method is then used to find the significant features that maximize classification accuracy, and the selected features are used to train a neural network. The publicly available mini Mammographic Image Analysis Society (mini-MIAS) database was used. On this binary classification task, the neural network classifier attained an accuracy of 99.7%, a sensitivity of 99.5%, a specificity of 100%, an area under the curve (AUC) of 0.9975 and a Matthews correlation coefficient (MCC) of 0.9931. The method can be useful in a computer-aided diagnosis (CAD) framework to help radiologists analyze breast cancer. The results achieved with the proposed method are better than those of recent work.
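Several of these abstracts (VTP-Identifier, the SNARE model and this mammogram classifier) report the area under the ROC curve alongside MCC. Unlike MCC, which is computed at a single decision threshold, AUC summarizes how well a model ranks positives above negatives across all thresholds. A minimal sketch uses the Mann-Whitney formulation; the scores are invented and "1 = abnormal" is an assumed labeling.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (Mann-Whitney U formulation). O(n_pos * n_neg).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores (1 = abnormal mammogram, 0 = normal)
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(roc_auc(scores, labels))  # 0.75
```

The pairwise loop is quadratic; for large test sets a sort-based rank computation gives the same value faster.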

