Screening of key biomarkers of tendinopathy based on bioinformatics and machine learning algorithms

PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0259475
Author(s):  
Ya xi Zhu ◽  
Jia qiang Huang ◽  
Yu yang Ming ◽  
Zhao Zhuang ◽  
Hong Xia

Tendinopathy is a complex, multifaceted tendon disease that is often associated with overuse, and its high prevalence results in significant health care costs. At present, the pathogenesis and effective treatment of tendinopathy have not been sufficiently elucidated. The purpose of this research is to explore in depth the genes, functional pathways, and immune infiltration characteristics involved in the occurrence and development of tendinopathy. The gene expression profiles GSE106292, GSE26051, and GSE167226 were downloaded from GEO (the NCBI Gene Expression Omnibus database) and analyzed with the WGCNA package in R; the GSE26051 and GSE167226 data sets were combined for differential gene expression analysis. We subsequently performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis and immune cell infiltration analysis. LASSO regression, support vector machine recursive feature elimination (SVM-RFE), and Gaussian mixture model (GMM) algorithms were then used to screen for and identify early diagnostic genes. We obtained a total of 171 differentially expressed genes (DEGs) through WGCNA and DEG screening. GO and KEGG enrichment analysis showed that these dysregulated genes were related to the mTOR, HIF-1, MAPK, NF-κB, and VEGF signaling pathways. Immune infiltration analysis showed that infiltration of M1 macrophages, activated mast cells, and activated NK cells was significant. After analysis with the LASSO, SVM-RFE, and GMM algorithms, we found that MACROD1 may be a gene for early diagnosis. Based on comprehensive bioinformatics analysis, we identified MACROD1 as a potential early diagnostic gene, together with key regulatory pathways and immune infiltration characteristics of tendinopathy. These hub genes and functional pathways may serve as early biomarkers of tendon injury and as molecular therapeutic targets to guide drug development and basic research.
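
GO and KEGG enrichment analyses of this kind are commonly based on an over-representation test. The following is a minimal sketch of a one-sided hypergeometric test, not the authors' pipeline. Only the count of 171 DEGs comes from the abstract; the background size, pathway size, and overlap are hypothetical values chosen for illustration.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value: the probability of seeing at least
    n_overlap pathway genes among n_selected genes drawn at random, without
    replacement, from a background of n_background genes."""
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# 171 DEGs is taken from the abstract; every other number is hypothetical.
p = enrichment_p_value(n_background=20000, n_pathway=100,
                       n_selected=171, n_overlap=6)
print(f"p = {p:.3g}")
```

In practice a p-value like this is computed for every GO term or KEGG pathway and then corrected for multiple testing.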

2021 ◽  
Author(s):  
Yaxi Zhu ◽  
Jia Qiang Huang ◽  
Yun Dong Zhou ◽  
Yu Yang Ming ◽  
Zhao Zhuang ◽  
...  

Abstract: Tendinopathy is a complex, multifaceted tendon disease that is often associated with overuse, and its high prevalence causes significant health care costs. At present, the pathogenesis and effective treatment of tendinopathy have not been fully elucidated. This study aims to explore and analyze in depth the key genes, functional pathways, and immune infiltration characteristics involved in the occurrence and development of tendinopathy. Methods: The gene expression profiles GSE106292, GSE26051, and GSE167226 were downloaded from the GEO database. WGCNA was performed on the GSE106292 data set using the corresponding R package, and differential gene expression analysis was performed on the combined GSE26051 and GSE167226 data sets. GO and KEGG enrichment analysis and immune cell infiltration analysis were performed. LASSO logistic regression, support vector machine recursive feature elimination (SVM-RFE), and Gaussian mixture model (GMM) algorithms were used to screen for and identify early diagnostic genes. Results: We obtained a total of 171 DEGs through WGCNA and screening of differentially expressed genes. GO and KEGG enrichment analysis showed that these dysregulated genes were related to the mTOR, HIF-1, MAPK, NF-κB, and VEGF signaling pathways. Immune infiltration analysis showed that infiltration of M1 macrophages, activated mast cells, and activated NK cells was significant. The LASSO, SVM-RFE, and GMM algorithms indicated that MACROD1 may be an early diagnostic gene. Conclusions: Based on comprehensive bioinformatics analysis, we identified the potential early diagnostic gene MACROD1, key regulatory pathways, and immune infiltration characteristics of tendinopathy. These key genes and pathways may be used as biomarkers and molecular therapeutic targets for early tendon injury to guide drug development and basic research.


2021 ◽  
Vol 11 ◽  
Author(s):  
Wei Yan ◽  
Hua Shi ◽  
Tao He ◽  
Jian Chen ◽  
Chen Wang ◽  
...  

Objective: To enhance the detection rate of multiple myeloma and enable earlier and more precise disease management, an artificial intelligence-assisted diagnosis system was developed. Methods: A total of 4,187 routine blood and biochemical examination records were collected from Shengjing Hospital affiliated to China Medical University from January 2010 to January 2020, including 1,741 records of multiple myeloma (MM) and 2,446 records of non-myeloma conditions (infectious diseases, rheumatic immune system diseases, hepatic diseases, and renal diseases). The data set was split into training and test subsets at a ratio of 4:1, combining hemoglobin, serum creatinine, serum calcium, immunoglobulin (A, G, and M), albumin, total protein, and albumin-to-globulin ratio data. An early assisted diagnostic model of MM was established using Gradient Boosting Decision Tree (GBDT), Support Vector Machine (SVM), Deep Neural Networks (DNN), and Random Forest (RF). Our team calculated the precision and recall of the system. The performance of the diagnostic model was evaluated using the receiver operating characteristic (ROC) curve. Results: With properly designed features, the typical machine learning algorithms SVM, DNN, RF, and GBDT all performed well. GBDT had the highest precision (92.9%), recall (90.0%), and F1 score (0.915) for the myeloma group. The maximized area under the ROC curve (AUROC) was calculated, and GBDT (AUC: 0.975; 95% confidence interval (CI): 0.963–0.986) outperformed SVM, DNN, and RF. Conclusion: The artificial intelligence model derived from routine laboratory results can accurately diagnose MM, which can boost the rate of early diagnosis.
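
The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the rounded figures from the abstract, so a small rounding difference from the published 0.915 is expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded GBDT values for the myeloma group, as given in the abstract.
gbdt_f1 = f1_score(precision=0.929, recall=0.900)
print(f"F1 = {gbdt_f1:.4f}")
```

The result is about 0.914, which agrees with the published 0.915 once the rounding of the two inputs is taken into account.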


2020 ◽  
Vol 19 ◽  
pp. 153303382090982
Author(s):  
Melek Akcay ◽  
Durmus Etiz ◽  
Ozer Celik ◽  
Alaattin Ozen

Background and Aim: Although the prognosis of nasopharyngeal cancer largely depends on a classification based on the tumor-lymph node-metastasis staging system, patients at the same stage may have different clinical outcomes. This study aimed to evaluate the survival prognosis of nasopharyngeal cancer using machine learning. Settings and Design: Original, retrospective. Materials and Methods: A total of 72 patients with a diagnosis of nasopharyngeal cancer who received radiotherapy ± chemotherapy were included in the study. The contribution of patient, tumor, and treatment characteristics to the survival prognosis was evaluated by machine learning using the following techniques: logistic regression, artificial neural network, XGBoost, support-vector clustering, random forest, and Gaussian Naive Bayes. Results: In the analysis of the data set, correlation analysis and binary logistic regression analysis were applied. Of the 18 independent variables, 10 were found to be effective in predicting nasopharyngeal cancer-related mortality: age, weight loss, initial neutrophil/lymphocyte ratio, initial lactate dehydrogenase, initial hemoglobin, radiotherapy duration, tumor diameter, number of concurrent chemotherapy cycles, and T and N stages. Among the machine learning techniques, Gaussian Naive Bayes was determined to be the best algorithm for evaluating prognosis (accuracy rate: 88%, area under the curve score: 0.91, confidence interval: 0.68-1, sensitivity: 75%, specificity: 100%). Conclusion: Many factors affect prognosis in cancer, and machine learning algorithms can be used to determine which factors have a greater effect on survival prognosis, which then allows further research into these factors. In the current study, Gaussian Naive Bayes was identified as the best algorithm for evaluating the prognosis of nasopharyngeal cancer.


Diagnostics ◽  
2019 ◽  
Vol 9 (3) ◽  
pp. 104 ◽  
Author(s):  
Ahmed ◽  
Yigit ◽  
Isik ◽  
Alpkocak

Leukemia is a fatal cancer with two main types: acute and chronic. Each type has two subtypes: lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosing all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which require a large training data set. Therefore, we also investigated the effects of data augmentation for synthetically increasing the number of training samples. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a series of experiments and used 5-fold cross-validation. The experimental results showed that our CNN model achieved 88.25% and 81.74% accuracy in leukemia versus healthy classification and in multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model performs better than the other well-known machine learning algorithms.
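
As an illustration of the 5-fold cross-validation mentioned above, the following is a minimal sketch of how sample indices can be partitioned so that each sample is held out exactly once. The function name `k_fold_indices`, the sample count, and the seed are hypothetical and do not come from the study.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds)
                     if f != held_out for i in fold]
        yield train_idx, test_idx


# Hypothetical sample count, chosen only to show uneven fold sizes.
splits = list(k_fold_indices(n_samples=23, k=5))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

When data augmentation is used, transformed copies of an image are usually kept in the same fold as the original so that the held-out fold stays unseen during training.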


2012 ◽  
Vol 30 (15_suppl) ◽  
pp. e21011-e21011
Author(s):  
Song Tian ◽  
John DiCarlo ◽  
Jiaye Yu ◽  
George J Quellhorst ◽  
Raymond K Blanchard ◽  
...  

e21011 Background: Thyroid nodules can be detected in as many as 67% of the population. Distinguishing thyroid cancers from benign lesions is crucial for determining an appropriate treatment plan. For years, clinicians have sought a gene expression signature for discriminating malignant from benign thyroid nodules. In this study, multivariate bioinformatics tools were used to generate a qPCR-based gene expression signature for determining malignancy in thyroid nodules. Methods: Multiple mathematical models, such as Random Forest, Support Vector Machine (SVM), and Nearest Shrunken Centroid (NSC), were used to analyze published microarray data sets and select 366 putative classifier (biomarker) mRNA targets. The expression patterns of the selected 366 genes were further evaluated by real-time PCR using a panel of 49 pathology-assessed thyroid nodule samples (fresh frozen, 23 malignant and 26 benign). Results: Using the qPCR data set, Random Forest was compared with the SVM and NSC classifier methods and was found to be more successful in finding genes with better discriminative power. The Random Forest method identified a panel of 7 genes together with 5 reference genes as a gene expression signature for thyroid malignancy, which led to the development of a companion classifying algorithm that provides a probability score to assess malignancy of thyroid nodules. In our limited sample set, this signature was shown to distinguish malignant and benign thyroid nodules with 92% accuracy and 100% specificity. Conclusions: Our results suggest that a combination of multiple bioinformatics analysis tools is the proper approach for biomarker candidate selection from high-throughput gene expression data. As demonstrated here, a panel of 12 genes and a companion classification algorithm has the potential to successfully discriminate malignant thyroid nodules with high accuracy and specificity. This panel of twelve genes is for molecular biology applications only.
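
The reported accuracy and specificity imply a particular confusion matrix if one assumes that both figures were computed over all 49 samples (23 malignant, 26 benign). That assumption is not stated in the abstract, so the sketch below is a consistency check only, and the sensitivity it produces is inferred rather than reported.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Assumption: all 49 samples were scored.
# 100% specificity means none of the 26 benign nodules was called malignant.
tn, fp = 26, 0
# 45 correct calls is the only count out of 49 that rounds to 92% accuracy,
# which leaves 19 of the 23 malignant nodules correctly identified.
tp, fn = 19, 4

result = classification_metrics(tp, fn, tn, fp)
print({name: round(value, 3) for name, value in result.items()})
```

Under this assumption the four misclassified samples would all be malignant nodules called benign, which corresponds to a sensitivity of about 83%.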


Author(s):  
RYO INOKUCHI ◽  
SADAAKI MIYAMOTO

Recently, kernel methods in support vector machines have been widely used in machine learning algorithms to obtain nonlinear models. Clustering is an unsupervised learning method that divides a whole data set into subgroups, and popular clustering algorithms such as c-means employ kernel methods. Other kernel-based clustering algorithms have been inspired by kernel c-means. However, the formulation of kernel c-means has high computational complexity. This paper gives an alternative formulation of kernel-based clustering algorithms derived from competitive learning clustering. This new formulation uses sequential updating, or on-line learning, to avoid high computational complexity. We apply kernel methods to related algorithms: learning vector quantization and the self-organizing map. We moreover consider kernel methods for sequential c-means and its fuzzy version using the proposed formulation.


Vehicle price prediction has become an increasingly popular research topic, and it requires considerable effort and expert knowledge of the field. A number of different attributes are considered in order to make the prediction more reliable and accurate. To predict the price of used vehicles, a model was developed using three machine learning techniques: Artificial Neural Network, Support Vector Machine, and Random Forest. These techniques were applied not to individual items but to the whole group of data items. The data were taken from a web portal and collected using a web scraper written in the PHP programming language. Machine learning algorithms of varying performance were compared to obtain the best result for the given data set. The final prediction model was integrated into a Java application.


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Jianyi Li ◽  
Xiaojie Tang ◽  
Yukun Du ◽  
Jun Dong ◽  
Zheng Zhao ◽  
...  

Purpose. Osteosarcoma is the most common primary, highly invasive bone tumor in children and adolescents. The purpose of this study is to construct a multi-gene expression signature related to autophagy that can be used to predict the prognosis of patients with osteosarcoma. Materials and methods. The clinical and gene expression data of patients with osteosarcoma were obtained from the TARGET database. Enrichment analysis of autophagy-related genes associated with overall survival (OS-related ARGs), screened by univariate Cox regression, was used to determine the functions and signaling pathways of the OS-related ARGs. In addition, the selected OS-related ARGs were incorporated into multivariate Cox regression to construct a prognostic signature for the overall survival (OS) of osteosarcoma. A data set obtained from the GEO database was used to validate the signature. In addition, gene set enrichment analysis (GSEA) was applied to further elucidate the molecular mechanisms. Finally, a nomogram was established by combining the risk signature with the clinical characteristics. Results. Our study eventually included 85 patients. Survival analysis showed that patients with a low riskScore had better OS. In addition, 16 genes were included in the OS-related ARGs. We also generated a prognostic signature based on two OS-related ARGs. The signature significantly divides patients into low-risk and high-risk groups and was validated in the GEO data set. Subsequently, the riskScore, primary tumor site, and metastasis status were identified as independent prognostic factors for OS, and a nomogram was generated. The C-index of the nomogram is 0.789 (95% CI: 0.703–0.875), and the ROC curve and calibration chart show good consistency between the nomogram's predictions and patient observations. Conclusions. ARGs were related to the prognosis of osteosarcoma and can be used as prognostic biomarkers in patients with osteosarcoma. The nomogram can be used to predict the OS of patients and improve treatment strategies.
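
The C-index quoted above measures how often a model ranks patients correctly by risk. The following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is not the authors' code, and the toy times, event flags, and risk scores are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: among comparable pairs, the fraction in which the
    patient with the shorter observed survival has the higher predicted risk.
    A pair is comparable only if the shorter time ended in an observed event."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: survival time, event flag (1 = death observed),
# and model risk score for five patients.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.6, 0.5, 0.1]
print(round(concordance_index(times, events, risks), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.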


Diabetes is one of the most common diseases today. Machine learning techniques are proposed for predicting this disease. With this method, the risk factors of the disease can be identified and prevented from increasing. Early prediction helps control the disease and can save lives. For early prediction, we collected a data set with 8 attributes from 200 diabetic patients. The patient's blood sugar level is assessed from features such as the glucose content in the body and the patient's age. The main machine learning algorithms are Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Decision Tree (DT). In the existing system, the accuracy of Naive Bayes is 66% and that of the Decision Tree is 70 to 71%. These accuracy levels are not in an adequate range. With XGBoost classifiers, however, the accuracy is 74% for Naive Bayes and 89 to 90% for the Decision Tree. In the proposed system the accuracy ranges are adequate, and this approach is the one mostly used. A data set of 729 patients is stored in MongoDB; the reports of 129 patients are used for prediction and the remaining ones are used for training. The training data set is used to build the prediction model.


2020 ◽  
Vol 10 (2) ◽  
Author(s):  
Mahmood Umar ◽  
Nor Bahiah Ahmad ◽  
Anazida Zainal

This study investigates the performance of machine learning algorithms for sentiment analysis of students' opinions on programming assessment. Previous research shows that Support Vector Machines (SVM) perform best among all techniques in sentiment analysis, followed by Naïve Bayes (NB). This study proposes a framework for classifying sentiments as positive or negative using the NB algorithm and a Lexicon-based approach on a small data set. The performance of the NB algorithm was evaluated against SVM. NB and SVM outperform the Lexicon-based (opinion lexicon) technique in terms of accuracy in the specific domain for which they are trained. The Lexicon-based technique, on the other hand, avoids the difficult steps needed to train a classifier. Data were analyzed from 75 first-year undergraduate students taking a programming subject in the School of Computing, Universiti Teknologi Malaysia. The students' sentiments were gathered from their opinions on the zero-score policy for unsuccessful compilation of a program during a skill-based test. The results of the study reveal that the students tend to have negative sentiments about programming assessment because it frightens them. Applying the NB algorithm yields a prediction accuracy of 85%, which outperforms both SVM, with 70%, and the Lexicon-based approach, with 60% accuracy. The results show that NB works better than SVM and the Lexicon-based approach on a small data set.
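
A lexicon-based approach needs no training step: it scores a text by counting words from fixed positive and negative word lists. The sketch below is a minimal illustration of that idea. The word lists and the example sentences are hypothetical, and the "neutral" outcome for ties is an addition made here, since the study itself uses only positive and negative classes.

```python
import re

# Hypothetical opinion lexicon; a real one would contain thousands of entries.
POSITIVE = {"good", "fair", "helpful", "easy", "clear"}
NEGATIVE = {"bad", "scary", "unfair", "hard", "stressful"}


def lexicon_sentiment(text):
    """Classify text by counting lexicon hits; no training is required."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(token in POSITIVE for token in tokens)
    score -= sum(token in NEGATIVE for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("The zero-score policy is scary and unfair"))
print(lexicon_sentiment("The test was fair and the instructions were clear"))
```

A fixed lexicon cannot adapt to domain-specific wording, which is one plausible reason a trained classifier can be more accurate within the domain it was trained on.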

