Predicting Hub Genes of Glioblastomas Based on Support Vector Machine Combined with CFS algorithms

2020 ◽  
Vol 15 ◽  
Author(s):  
Chun Qiu ◽  
Sai Li ◽  
Shenghui Yang ◽  
Lin Wang ◽  
Aihui Zeng ◽  
...  

Aim: To identify genes related to the mechanisms underlying the occurrence of glioma and to build a prediction model for glioblastomas. Background: The morbidity and mortality of glioblastomas are very high, which seriously endangers human health. At present, many investigations of gliomas aim mainly to understand the cause and mechanism of these tumors at the molecular level and to explore clinical diagnosis and treatment methods. However, there is still no effective early diagnosis method for this disease, and there are no effective prevention, diagnosis, or treatment measures. Methods: First, gene expression profiles were downloaded from GEO. Then, differentially expressed genes (DEGs) between the disease samples and the control samples were identified. After that, GO and KEGG enrichment analyses of the DEGs were performed with DAVID. Furthermore, the correlation-based feature subset (CFS) method was applied to select key DEGs. In addition, a classification model separating the glioblastoma samples from the controls was built with a support vector machine (SVM) based on the selected key genes. Results and Discussion: Thirty-six DEGs, including 17 upregulated and 19 downregulated genes, were selected by the CFS method as the feature genes for building the classification model between the glioma samples and the control samples. The accuracy of the classification model was 76.25% in a 10-fold cross-validation test and 70.3% in an independent set test. In addition, PPP2R2B and CYBB were also among the top 5 hub genes screened by the protein-protein interaction (PPI) network. Conclusions: This study indicated that the CFS method is a useful tool for identifying key genes in glioblastomas. In addition, we predicted that genes such as PPP2R2B and CYBB might be potential biomarkers for the diagnosis of glioblastomas.
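The abstract does not say how CFS scores a candidate gene subset. A minimal sketch, assuming the commonly used merit heuristic k·r_cf / sqrt(k + k(k−1)·r_ff), Pearson correlation as the correlation measure, and a greedy forward search, is shown below. The gene names, expression values, and the `min_gain` tolerance are hypothetical and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 for a constant column)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cfs_merit(columns, labels):
    """Merit = k * mean|r_cf| / sqrt(k + k*(k-1) * mean|r_ff|)."""
    k = len(columns)
    r_cf = sum(abs(pearson(c, labels)) for c in columns) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = (sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(genes, labels, min_gain=1e-9):
    """Forward selection: keep adding the gene that raises the merit the most."""
    selected, best = [], 0.0
    remaining = sorted(genes)
    while remaining:
        merit, name = max(
            (cfs_merit([genes[g] for g in selected + [cand]], labels), cand)
            for cand in remaining
        )
        if merit <= best + min_gain:  # no real improvement: stop
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best

# Hypothetical toy data: 3 control samples (0) and 3 tumor samples (1).
labels = [0, 0, 0, 1, 1, 1]
informative = [1.0, 1.2, 0.9, 3.1, 2.9, 3.3]
genes = {
    "gene_a": informative,
    "gene_b": [2 * v for v in informative],      # redundant copy of gene_a
    "gene_c": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],    # weakly related to the class
}
selected, merit = greedy_cfs(genes, labels)
print(selected, round(merit, 4))
```

In this toy case the search keeps a single informative gene: the redundant copy adds no merit because its correlation with the already selected gene is as high as its correlation with the class, and the weakly related gene lowers the merit.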

2020 ◽  
Vol 16 (5) ◽  
pp. 654-663 ◽  
Author(s):  
Yina Wang ◽  
Benrong Zheng ◽  
Manbin Xu ◽  
Shaoping Cai ◽  
Jeong Younseo ◽  
...  

Background: Renal cell carcinoma (RCC) is the most common malignant tumor of the adult kidney. Objective: The aim of this study was to identify key gene signatures in RCC and uncover their potential mechanisms. Methods: First, the gene expression profiles of GSE53757, which contained 144 samples (72 kidney cancer samples and 72 controls), were downloaded from the GEO database. Then, differentially expressed genes (DEGs) between the kidney cancer samples and the controls were identified. After that, GO and KEGG enrichment analyses of the DEGs were performed with DAVID. Furthermore, the correlation-based feature subset (CFS) method was applied to select key genes from the DEGs. In addition, a classification model separating the kidney cancer samples from the controls was built with AdaBoost based on the selected key genes. Results: A total of 213 DEGs, including 80 up-regulated and 133 down-regulated genes, were selected by the CFS method as the feature genes for building the classification model between the kidney cancer samples and the controls. The accuracy of the classification model was 84.4% in a 5-fold cross-validation test and 83.3% in an independent set test. In addition, TYROBP, CD4163, CAV1, CXCL9, CXCL11 and CXCL13 were also among the top 20 hub genes screened by the protein-protein interaction (PPI) network. Conclusion: This study indicated that CFS is a useful tool for identifying key genes in kidney cancer. In addition, we predicted that genes such as TYROBP, CD4163, CAV1, CXCL9, CXCL11 and CXCL13 might be target genes for diagnosing kidney cancer.


2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Gaoteng Yuan ◽  
Yihui Liu ◽  
Wei Huang ◽  
Bing Hu

Purpose. The objective of this study is to investigate the use of texture analysis (TA) of enhanced magnetic resonance imaging (MRI) scans together with machine learning methods for distinguishing different grades of breast invasive ductal carcinoma (IDC). Preoperative prediction of the IDC grade can inform the choice among clinical treatments, so it has important practical value in the clinic. Methods. First, a breast cancer segmentation model based on the discrete wavelet transform (DWT) and the K-means algorithm was proposed. Second, TA was performed and Gabor wavelet analysis was used to extract the texture features of the tumor in the MRI. Then, key features were ranked according to the distance relationships between features, and feature subsets were selected. Finally, the feature subset was classified using a support vector machine whose parameters were adjusted to achieve the best classification performance. Results. By selecting key features for classification prediction, the classification model reached an accuracy of 81.33%. Under 3-, 4-, and 5-fold cross-validation, the prediction accuracy of the support vector machine model ranged from 77.79% to 81.94%. Conclusion. The pathological grade of IDC can be predicted and evaluated by texture analysis and feature extraction of breast tumors. This method can provide valuable information for doctors' clinical diagnosis. With further development, the model shows high potential for practical clinical use.


2019 ◽  
Vol 17 ◽  
Author(s):  
Yanqiu Yao ◽  
Xiaosa Zhao ◽  
Qiao Ning ◽  
Junping Zhou

Background: Glycation is a nonenzymatic post-translational modification in which a sugar molecule attaches to a protein or lipid molecule. It may impair the function and change the characteristics of proteins, which may lead to some metabolic diseases. To understand the underlying molecular mechanisms of glycation, computational prediction methods have been developed because of their convenience and high speed. However, developing a more effective computational tool remains a challenging task in computational biology. Methods: In this study, we present an accurate identification tool named ABC-Gly for predicting lysine glycation sites. First, we used three informative features to encode the peptides: position-specific amino acid propensity, secondary structure, and the composition of k-spaced amino acid pairs. Moreover, to fully exploit discriminative features and thereby improve the prediction and generalization ability of the model, we developed a two-step feature selection method that combines the Fisher score with an improved binary artificial bee colony algorithm based on a support vector machine. Finally, based on the optimal feature subset, we constructed the model using a support vector machine on the training dataset. Results: In 10-fold cross-validation on the training dataset, the proposed predictor ABC-Gly achieved a sensitivity of 76.43%, a specificity of 91.10%, a balanced accuracy of 83.76%, an area under the receiver operating characteristic curve (AUC) of 0.9313, and a Matthews correlation coefficient (MCC) of 0.6861; on the independent dataset, it achieved a balanced accuracy of 59.05%. Compared with state-of-the-art predictors on the training dataset, the proposed predictor improved the AUC by 0.156 and the MCC by 0.336. Conclusion: The detailed analysis results indicated that our predictor may serve as a powerful complementary tool to existing methods for predicting protein lysine glycation. The source code and datasets of ABC-Gly are provided in Supplementary File 1.
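The reported balanced accuracy is the mean of the reported sensitivity and specificity: (76.43% + 91.10%) / 2 = 83.765%, which agrees with the stated 83.76%. A minimal sketch of how these metrics and the MCC follow from a binary confusion matrix is given below. The counts passed to `binary_metrics` are hypothetical, since the abstract does not give the underlying confusion matrix.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, balanced accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }

# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=8, fn=2, tn=9, fp=1)
print(metrics)

# Consistency check on the figures quoted in the abstract.
reported_balanced_accuracy = (76.43 + 91.10) / 2
print(reported_balanced_accuracy)
```

Balanced accuracy is often preferred over plain accuracy when the positive and negative classes differ greatly in size, because each class contributes equally to the score.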


Molecules ◽  
2012 ◽  
Vol 17 (4) ◽  
pp. 4560-4582 ◽  
Author(s):  
Khac-Minh Thai ◽  
Thuy-Quyen Nguyen ◽  
Trieu-Du Ngo ◽  
Thanh-Dao Tran ◽  
Thi-Ngoc-Phuong Huynh

2019 ◽  
Vol 2 (2) ◽  
pp. 43
Author(s):  
Lalu Mutawalli ◽  
Mohammad Taufan Asri Zaen ◽  
Wire Bagye

In the era of technological disruption in mass communication, social media has become a reference point for gauging public opinion. Social media users produce digital data very rapidly, because posting is an attempt to express their feelings. The data in question are the statuses and comments that users post on social media. This public data production gives rise to very large data sets, referred to as big data. Big data is a collection of data sets that is very large, complex, and generated relatively quickly, which makes it difficult to handle. Big data is analyzed with data mining methods to extract the knowledge patterns it contains. This study analyzes the sentiment of netizens on Twitter regarding the stabbing of Mr. Wiranto. The sentiment analysis showed that 41% of comments on the event were positive, 29% neutral, and 29% negative. In addition, the data were modeled with a support vector machine algorithm to create a system capable of classifying positive, neutral, and negative connotations. The resulting classification model was then tested using a confusion matrix, yielding a precision of 83%, a recall of 80%, and an accuracy of 80%.


Author(s):  
Noviah Dwi Putranti ◽  
Edi Winarko

Abstract. Sentiment analysis in this study is the process of classifying textual documents into two classes, positive and negative sentiment. Opinion data were obtained from the social network Twitter using queries in Indonesian. The study aims to determine public sentiment toward a particular object as expressed on Twitter in Indonesian, thereby supporting market research on public opinion. The collected data were preprocessed and POS-tagged to produce a classification model through a training process. Sentiment-bearing words were collected with a dictionary-based approach, which yielded 18,069 words in this study. The Maximum Entropy algorithm was used for the POS tagger, and the Support Vector Machine algorithm was used to build the classification model from the training data. The features were unigrams with TF-IDF weighting. The classifier achieved an accuracy of 86.81% in 7-fold cross-validation with the sigmoid kernel. Manual class labeling with the POS tagger yielded an accuracy of 81.67%. Keywords: sentiment analysis, classification, maximum entropy POS tagger, support vector machine, Twitter.
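The abstract names unigram features with TF-IDF weighting but does not state which TF-IDF variant was used. A minimal sketch, assuming term frequency normalized by document length and an unsmoothed inverse document frequency log(N / df), is shown below. The example tweets are hypothetical, and whitespace tokenization stands in for the preprocessing the study applied.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each unigram appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical tweets, for illustration only.
tweets = [
    "film ini bagus",
    "film ini buruk",
    "pelayanan bagus sekali",
]
vectors = tfidf(tweets)
for vec in vectors:
    print(vec)
```

Terms that occur in few tweets, such as "buruk" here, receive a higher weight than terms shared across tweets, which is what makes the weighted unigrams useful as classifier input.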


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10441
Author(s):  
Hui Bi ◽  
Min Zhang ◽  
Jialin Wang ◽  
Gang Long

Background: This study aims to identify potential biomarkers associated with acute kidney injury (AKI) after kidney transplantation. Material and Methods: Two mRNA expression profiles, comprising 20 delayed graft function (DGF) and 68 immediate graft function (IGF) samples, were downloaded from the Gene Expression Omnibus repository. Differentially expressed genes (DEGs) were identified between the DGF and IGF groups. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analyses of the DEGs were performed. Then, a protein-protein interaction (PPI) analysis was performed to extract hub genes. Key genes were identified through literature retrieval and cross-validated on the training dataset. An external dataset was used to validate the expression levels of the key genes. Receiver operating characteristic curve analyses were performed to evaluate the diagnostic performance of the key genes for AKI. Results: A total of 330 DEGs were identified between DGF and IGF samples, including 179 up-regulated and 151 down-regulated genes. Of these, OLIG3, EBF3 and ETV1 were transcription factor genes. Moreover, LEP, EIF4A3, WDR3, MC4R, PPP2CB, DDX21 and GPT served as hub genes in the PPI network. EBF3 was significantly up-regulated in the validation dataset GSE139061, which was consistent with our initial differential expression analysis. Finally, we found that LEP had good diagnostic value for AKI (AUC = 0.740). Conclusion: EBF3 may be associated with the development of AKI following kidney transplantation. Furthermore, LEP had good diagnostic value for AKI. These findings provide deeper insights into the diagnosis and management of AKI after renal transplantation.
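An AUC such as the 0.740 reported for LEP can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. A minimal sketch of that pairwise definition follows. The expression scores and labels are hypothetical, and the study's actual ROC software is not stated in the abstract.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # correctly ordered pair
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))

# Hypothetical expression scores: label 1 = DGF, label 0 = IGF.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of the size described here; a rank-based formulation gives the same value for larger data.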


2016 ◽  
Vol 36 (suppl_1) ◽  
Author(s):  
Hua Tang ◽  
Hao Lin

Objective: Apolipoproteins are of great physiological importance and are associated with different diseases such as dyslipidemia, thrombogenesis and angiocardiopathy. Apolipoproteins have therefore emerged as key risk markers and important research targets, yet the types of apolipoproteins have not been fully elucidated. Accurate identification of apolipoproteins is crucial to understanding cardiovascular diseases and to drug design. The aim of this study is to develop a powerful model to precisely identify apolipoproteins. Approach and Results: We manually collected from UniProt a non-redundant dataset of 53 apolipoproteins and 136 non-apolipoproteins with a sequence identity of less than 40%. After formulating the protein sequence samples with the g-gap dipeptide composition (here g = 1 to 10), analysis of variance (ANOVA) was adopted to find the feature subset that achieves the best accuracy. A support vector machine (SVM) was then used to perform classification. The predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 96.2%, a specificity of 99.3%, and an accuracy of 98.4%. The study indicated that the proposed method could be a feasible means of conducting preliminary analyses of apolipoproteins. Conclusion: We demonstrated that apolipoproteins can be predicted from their primary sequences. We also discovered a special dipeptide distribution in apolipoproteins. These findings open new perspectives for improving apolipoprotein prediction by considering the specific dipeptides. We expect that these findings will help to improve drug development for angiocardiopathy. Key words: apolipoproteins, angiocardiopathy, support vector machine
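The g-gap dipeptide composition can be sketched as follows. This is a minimal illustration under an assumed convention, namely that a g-gap dipeptide pairs the residues at positions i and i + g + 1 and that frequencies are normalized by the number of counted pairs; the abstract does not spell out the exact definition. The example sequence is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature vector.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def g_gap_dipeptide_composition(sequence, g):
    """Frequencies of residue pairs separated by g intervening residues."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:           # skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]

# Hypothetical peptide: with g = 1 the counted pairs are "AD" and "CA".
features = g_gap_dipeptide_composition("ACDA", g=1)
print(len(features), features[PAIRS.index("AD")], features[PAIRS.index("CA")])
```

Each value of g yields a separate 400-dimensional vector, so a feature-ranking step such as the ANOVA mentioned above is a natural way to reduce the dimensionality before training the SVM.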


2021 ◽  
Vol 39 (11) ◽  
Author(s):  
Sahar Zolfaghari ◽  
Mohammad Hamiruce Marhaban ◽  
Siti Anom Ahmad ◽  
Asnor Juraiza Ishak ◽  
Pegah Khosropanah ◽  
...  

Motor-imagery brain-computer interfaces, as rehabilitation tools for motor-disabled individuals, could inherently enrich neuroplasticity and subsequently restore mobility. However, a significant challenge in this endeavour is classifying left and right leg motor imagery tasks from non-stationary EEG signals. A subject-independent feature extraction method is essential in a BCI system, and this work develops a subject-independent algorithm to classify left/right leg motion intention. Multivariate Empirical Mode Decomposition was used to decompose EEG recorded during imagery of left and right foot movements. We validated the proposed algorithm using open-access motor imagery data to detect the user's mental intention from EEG. Five subjects from various performance categories, each with almost 150 trials per left/right leg MI task from the hand/leg/tongue (HaLT) paradigm, were examined using the C3, C4, and Cz channels to generalize this study to all subjects. A set of statistical features was extracted from the intrinsic mode functions, and the most relevant features were selected for classification using Sequential Floating Feature Selection. Different classifiers were trained using the extracted features, and their performances were evaluated. The findings suggest that the non-linear support vector machine is the best classification model, resulting in the mean classification sensitivity, specificity, precision, negative predictive value, F-measure, 98.15%, 90.74%, 91.97%, 98.33%, 94.72%, 94.44%, respectively. The proposed subject-independent signal processing method significantly improved the offline calibration mode by eliminating the frequency selection step, making it a common method applicable to different types of MI-based BCI participants. Offline evaluations suggest that it can lead to significant increases in classification accuracy compared with current approaches.

