Prediction of functional microexons by transfer learning

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Qi Cheng ◽  
Bo He ◽  
Chengkui Zhao ◽  
Hongyuan Bi ◽  
Duojiao Chen ◽  
...  

Abstract. Background: Microexons are exons shorter than 30 nucleotides. More than 60% of annotated human microexons have been found to show high levels of sequence conservation, which suggests that they may be functional. There is thus a need for a method to predict functional microexons. Results: Because no functional labels for microexons are publicly available, we employed a transfer learning technique called Transfer Component Analysis (TCA) to transfer knowledge obtained through feature mapping to the prediction of functional microexons. Microindels were chosen as the source of reference knowledge because of their similarities to microexons. A Support Vector Machine (SVM) was then trained as a classification model for functional microindels in the newly built feature space, and the trained model was used to predict functional microexons. We also built a tool based on this model to predict other functional microexons, and applied it to a total of 19 functional microexons reported in the literature. The approach correctly predicted 16 of the 19 samples, an accuracy greater than 80% (16/19, about 84%). Conclusions: In this study, we proposed a method for predicting functional microexons and applied it; the predictions were largely consistent with records in the literature.
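The abstract does not give the TCA formulation it used, so the following is only a sketch of the standard TCA objective as it is usually stated, with all symbols introduced here as assumptions: $K$ is the kernel matrix over the $n_s$ source samples (microindels) and $n_t$ target samples (microexons), $\mu$ is a regularization weight, and $m$ is the number of transfer components.

```latex
% Standard TCA objective (assumed form; the abstract does not specify it).
\min_{W}\ \operatorname{tr}\!\left(W^{\top} K L K W\right) + \mu\,\operatorname{tr}\!\left(W^{\top} W\right)
\quad \text{s.t.} \quad W^{\top} K H K W = I_m ,

% MMD coefficient matrix L and centering matrix H:
L_{ij} =
\begin{cases}
  \dfrac{1}{n_s^{2}}      & x_i, x_j \text{ both in the source domain},\\[6pt]
  \dfrac{1}{n_t^{2}}      & x_i, x_j \text{ both in the target domain},\\[6pt]
  -\dfrac{1}{n_s n_t}     & \text{otherwise},
\end{cases}
\qquad
H = I - \frac{1}{n_s + n_t}\,\mathbf{1}\mathbf{1}^{\top} .

% W is given by the m leading eigenvectors of (K L K + \mu I)^{-1} K H K,
% and the samples are embedded in the shared feature space as W^{\top} K.
```

Under this formulation, a classifier such as the SVM would be trained on the source columns of $W^{\top} K$ and applied to the target columns.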

2021 ◽  
Vol 11 (3) ◽  
pp. 948-954
Author(s):  
Xiang Chen ◽  
Lijun Xu ◽  
Ming Cao ◽  
Tinghua Zhang ◽  
Zhongan Shang ◽  
...  

The demand for more intelligent human-computer interaction systems (HCIS) has become increasingly prominent. The ability to recognize the emotions of users is a distinguishing feature of an intelligent interactive system. An intelligent HCIS can analyze the emotional changes of patients with depression and interact with them in a more appropriate manner, and the recognition results can help family members or medical personnel respond to the patient's emotional changes. Against this background, this paper proposes an emotion recognition (ER) method based on transfer support vector machines (TSVM) and EEG signals. The ER results obtained with this method are applied to an HCIS that is mainly used for interaction with patients with depression. When a new domain related to an existing one appears, labeling samples from the new domain is expensive, and discarding all data from the old domain is wasteful. The main innovation of this research is the use of TSVM as the classification model. TSVM is a transfer learning strategy based on SVM. Transfer learning aims to solve problems in a related but different target domain by using a large amount of labeled source domain data. The transfer support vector machine can therefore use a small amount of labeled data from the target domain together with a large amount of old data from the related domain to build a high-quality classification model for the target domain, which can effectively improve classification accuracy. A comparison with other classification models shows that TSVM can effectively improve the accuracy of ER in patients with depression. The HCIS based on this classification model has higher accuracy and better stability.


In the present decade, the identification of brain abnormalities has gained significant attention in medical diagnosis. Although numerous models exist, only a few methods have been proposed that classify a set of different kinds of brain defects. This paper introduces an efficient hybrid model for classifying a given MR brain image as normal or abnormal. The model first uses the discrete wavelet transform (DWT) to extract features and principal component analysis (PCA) to reduce the feature space. Next, a kernel support vector machine (KSVM) with a radial basis function (RBF) kernel is built, with the artificial bee colony (ABC) algorithm optimizing the parameters C and σ. The experiments use 5-fold cross-validation, and the results are compared in detail with existing models. For parameter selection, the ABC algorithm is compared with a random selection approach. The model is tested on a benchmark MR brain dataset. The experimental results indicate that ABC is highly efficient for constructing an optimal KSVM.
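As a minimal sketch of two ingredients named in this abstract, the snippet below shows an RBF kernel parameterized by σ and a plain 5-fold index split. The parameterization exp(-||x - y||² / (2σ²)) is a common convention and an assumption here, since the abstract does not state it; the function names are hypothetical, and the ABC search over C and σ is not shown.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel, assumed form: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous folds over range(n_samples)."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop
```

A parameter search such as ABC would evaluate each candidate (C, σ) pair by training on every `train` split and scoring on the matching `test` split, then averaging the five scores.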


2020 ◽  
Vol 15 ◽  
Author(s):  
Chun Qiu ◽  
Sai Li ◽  
Shenghui Yang ◽  
Lin Wang ◽  
Aihui Zeng ◽  
...  

Aim: To search for genes related to the mechanisms by which glioma occurs and to build a prediction model for glioblastomas. Background: The morbidity and mortality of glioblastomas are very high, which seriously endangers human health. At present, the goals of many investigations of gliomas are mainly to understand the cause and mechanism of these tumors at the molecular level and to explore clinical diagnosis and treatment methods. However, there is no effective early diagnosis method for this disease, and there are no effective prevention, diagnosis, or treatment measures. Methods: First, gene expression profiles were downloaded from GEO. Then, differentially expressed genes (DEGs) between the disease samples and the control samples were identified. After that, GO and KEGG enrichment analyses of the DEGs were performed with DAVID. Furthermore, the correlation-based feature subset (CFS) method was applied to select key DEGs. In addition, a classification model separating the glioblastoma samples from the controls was built with a Support Vector Machine (SVM) based on the selected key genes. Results and Discussion: Thirty-six DEGs, including 17 upregulated and 19 downregulated genes, were selected by the CFS method as the feature genes for building the classification model between the glioma samples and the control samples. The accuracy of the classification model was 76.25% in a 10-fold cross-validation test and 70.3% in an independent set test. In addition, PPP2R2B and CYBB were also among the top 5 hub genes identified from the protein-protein interaction (PPI) network. Conclusions: This study indicated that the CFS method is a useful tool for identifying key genes in glioblastomas. In addition, we predicted that genes such as PPP2R2B and CYBB might be potential biomarkers for the diagnosis of glioblastomas.
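The abstract does not spell out how the CFS method scores a gene subset. As a hedged illustration, the sketch below uses the merit heuristic commonly associated with correlation-based feature selection, which rewards subsets whose features correlate with the class but not with one another. Whether the study used exactly this score is an assumption, and the function name and arguments are hypothetical.

```python
import math


def cfs_merit(n_features, mean_feature_class_corr, mean_feature_feature_corr):
    """Merit of a feature subset under the usual CFS heuristic (assumed here).

    merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)
    where k is the subset size, r_cf the mean feature-class correlation,
    and r_ff the mean pairwise feature-feature correlation.
    """
    k = n_features
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return k * mean_feature_class_corr / denominator
```

With a single feature the merit reduces to its feature-class correlation, and for a fixed subset size higher redundancy among features lowers the score.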


Agriculture ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 371
Author(s):  
Yu Jin ◽  
Jiawei Guo ◽  
Huichun Ye ◽  
Jinling Zhao ◽  
Wenjiang Huang ◽  
...  

The remote sensing extraction of large areas of arecanut (Areca catechu L.) planting plays an important role in investigating the distribution of arecanut planting area and the subsequent adjustment and optimization of regional planting structures. Satellite imagery has previously been used to investigate and monitor the agricultural and forestry vegetation in Hainan. However, the monitoring accuracy is affected by the cloudy and rainy climate of this region, as well as the high level of land fragmentation. In this paper, we used PlanetScope imagery at a 3 m spatial resolution over the Hainan arecanut planting area to investigate the high-precision extraction of the arecanut planting distribution based on feature space optimization. First, spectral and textural feature variables were selected to form the initial feature space, followed by the implementation of the random forest algorithm to optimize the feature space. Arecanut planting area extraction models based on the support vector machine (SVM), BP neural network (BPNN), and random forest (RF) classification algorithms were then constructed. The overall classification accuracies of the SVM, BPNN, and RF models optimized by the RF features were determined as 74.82%, 83.67%, and 88.30%, with Kappa coefficients of 0.680, 0.795, and 0.853, respectively. The RF model with optimized features exhibited the highest overall classification accuracy and kappa coefficient. The overall accuracy of the SVM, BPNN, and RF models following feature optimization was improved by 3.90%, 7.77%, and 7.45%, respectively, compared with the corresponding unoptimized classification model. The kappa coefficient also improved. The results demonstrate the ability of PlanetScope satellite imagery to extract the planting distribution of arecanut. 
Furthermore, the RF is proven to effectively optimize the initial feature space, composed of spectral and textural feature variables, further improving the extraction accuracy of the arecanut planting distribution. This work can act as a theoretical and technical reference for the agricultural and forestry industries.
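The overall accuracy and Kappa coefficient reported above are both derived from a confusion matrix. The sketch below shows the usual computation; the function name and the example matrix in the usage note are hypothetical and are not data from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

For a made-up two-class matrix `[[45, 5], [10, 40]]`, the overall accuracy is 0.85 and the chance agreement is 0.5, so kappa is about 0.7. Kappa is lower than accuracy because it discounts agreement expected by chance.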


2016 ◽  
Vol 25 (3) ◽  
pp. 417-429
Author(s):  
Chong Wu ◽  
Lu Wang ◽  
Zhe Shi

Abstract: For financial distress prediction models based on the support vector machine, there is no theory on how to choose a proper kernel function in a data-dependent way. This paper proposes a modified kernel function that can effectively enhance classification accuracy. We apply an information-geometric method to modify a kernel, based on the structure of the Riemannian geometry induced in the input space by the kernel. A conformal transformation of the kernel from the input space to a higher-dimensional feature space enlarges volume elements locally near the support vectors situated around the classification boundary and reduces the number of support vectors. This paper takes the Gaussian radial basis function as the internal kernel. Additionally, this paper combines the above method with the theories of standard regularization and non-dimensionalization to construct the new model. In the empirical analysis section, the paper uses the financial data of Chinese listed companies and runs five groups of experiments with different parameters to compare classification accuracy. We conclude that the modified kernel function can effectively reduce the number of support vectors and improve classification accuracy.
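A conformal transformation of a kernel is typically written as K'(x, y) = c(x) c(y) K(x, y), where c is a positive factor that is large near the support vectors. The sketch below is a minimal illustration of that idea with a Gaussian internal kernel. The specific form of c(x), a sum of Gaussian bumps centred on the support vectors with width `tau`, is one common choice and an assumption here, and all function and parameter names are hypothetical.

```python
import math


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def gaussian_kernel(x, y, sigma=1.0):
    """Internal Gaussian RBF kernel."""
    return math.exp(-squared_distance(x, y) / (2.0 * sigma ** 2))


def conformal_factor(x, support_vectors, tau=1.0):
    """c(x): sum of Gaussian bumps centred on the support vectors (assumed form)."""
    return sum(
        math.exp(-squared_distance(x, sv) / (2.0 * tau ** 2))
        for sv in support_vectors
    )


def modified_kernel(x, y, support_vectors, sigma=1.0, tau=1.0):
    """Conformally transformed kernel: c(x) * c(y) * K(x, y)."""
    return (
        conformal_factor(x, support_vectors, tau)
        * conformal_factor(y, support_vectors, tau)
        * gaussian_kernel(x, y, sigma)
    )
```

In this kind of scheme, a first SVM is trained with the internal kernel to find the support vectors, and a second SVM is then trained with the modified kernel built from them.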


Molecules ◽  
2012 ◽  
Vol 17 (4) ◽  
pp. 4560-4582 ◽  
Author(s):  
Khac-Minh Thai ◽  
Thuy-Quyen Nguyen ◽  
Trieu-Du Ngo ◽  
Thanh-Dao Tran ◽  
Thi-Ngoc-Phuong Huynh

2015 ◽  
Author(s):  
Lisa M. Breckels ◽  
Sean Holden ◽  
David Wojnar ◽  
Claire M. Mulvey ◽  
Andy Christoforou ◽  
...  

Abstract: Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet, there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system, to integrate heterogeneous data sources to considerably improve on the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms through evaluation of five experimental datasets, from four different species in conjunction with four different auxiliary data sources to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis. Abbreviations: LOPIT: Localisation of Organelle Proteins by Isotope Tagging; PCP: Protein Correlation Profiling; ML: Machine learning; TL: Transfer learning; SVM: Support vector machine; PCA: Principal component analysis; GO: Gene Ontology; CC: Cellular compartment; iTRAQ: Isobaric tags for relative and absolute quantitation; TMT: Tandem mass tags; MS: Mass spectrometry.


2019 ◽  
Vol 2 (2) ◽  
pp. 43
Author(s):  
Lalu Mutawalli ◽  
Mohammad Taufan Asri Zaen ◽  
Wire Bagye

In the era of technological disruption of mass communication, social media has become a reference point for gauging public opinion. Social media users produce digital data very rapidly as they express their feelings; this data consists of the status updates and comments that users post. Such public data production gives rise to very large data sets, referred to as big data. Big data is a collection of data sets that is very large, complex, and generated quickly, which makes it difficult to handle. Big data can be analyzed with data mining methods to extract patterns of knowledge. This study analyzes the sentiment of Twitter users regarding the stabbing of Mr. Wiranto. The sentiment analysis showed that 41% of comments were positive, 29% neutral, and 29% negative. In addition, the data were modeled with a support vector machine algorithm to build a system capable of classifying positive, neutral, and negative sentiment. The resulting classification model was then evaluated using a confusion matrix, yielding a precision of 83%, a recall of 80%, and an accuracy of 80%.
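The precision, recall, and accuracy figures above come from a confusion matrix over the three sentiment classes. The sketch below shows one way to derive them, assuming macro-averaging across classes; the abstract does not say which averaging was used, and the function name is hypothetical.

```python
def macro_metrics(confusion):
    """Return (macro precision, macro recall, accuracy) for a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(k)) / total
    precisions, recalls = [], []
    for c in range(k):
        true_positive = confusion[c][c]
        predicted_as_c = sum(confusion[r][c] for r in range(k))  # column sum
        actually_c = sum(confusion[c])                           # row sum
        precisions.append(true_positive / predicted_as_c if predicted_as_c else 0.0)
        recalls.append(true_positive / actually_c if actually_c else 0.0)
    # Macro-averaging: unweighted mean over classes (an assumption).
    return sum(precisions) / k, sum(recalls) / k, accuracy
```

Macro-averaging weights the positive, neutral, and negative classes equally regardless of how many tweets fall in each; a support-weighted average would give different numbers when the classes are imbalanced.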


Author(s):  
Noviah Dwi Putranti ◽  
Edi Winarko

Abstract: Sentiment analysis in this study is the classification of textual documents into two classes, positive and negative sentiment. Opinion data were collected from the social networking site Twitter using queries in Indonesian. The study aims to determine public sentiment toward a particular object as expressed on Twitter in Indonesian, thereby helping businesses conduct market research on public opinion. The collected data were preprocessed and POS-tagged to generate a classification model through a training process. Sentiment-bearing words were collected with a dictionary-based approach, which yielded 18,069 words in this study. The Maximum Entropy algorithm was used for the POS tagger, and a Support Vector Machine was used to build the classification model from the training data. The features were unigrams with TF-IDF weighting. The classifier achieved an accuracy of 86.81% in 7-fold cross-validation with the sigmoid kernel. Manual class labeling with the POS tagger yielded an accuracy of 81.67%. Keywords: sentiment analysis, classification, maximum entropy POS tagger, support vector machine, Twitter.
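To illustrate the unigram features with TF-IDF weighting mentioned above, here is a minimal sketch. It assumes whitespace tokenization, term frequency normalized by document length, and an unsmoothed logarithmic inverse document frequency; the study's exact weighting variant is not stated in the abstract, and the function name is hypothetical.

```python
import math
from collections import Counter


def tfidf_unigrams(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets weight zero under this variant, so only words that distinguish tweets from one another contribute to the feature vector passed to the classifier.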

