Support Vector Machine Classifier for Accurate Identification of piRNA

2018 ◽  
Vol 8 (11) ◽  
pp. 2204 ◽  
Author(s):  
Taoying Li ◽  
Mingyue Gao ◽  
Runyu Song ◽  
Qian Yin ◽  
Yan Chen

Piwi-interacting RNA (piRNA) is a newly identified class of small non-coding RNAs. It combines with PIWI proteins to regulate transcriptional gene silencing and heterochromatin modification, and to maintain germline and stem cell function in animals. To better understand the function of piRNA, it is imperative to improve the accuracy of piRNA identification. In this study, the feature vector was constructed from sequence information: the single nucleotide composition, the 16 dinucleotide compositions, six physicochemical properties of RNA, the position specificities of nucleotides at both the N-terminal and the C-terminal, and the proportions of similar peptide sequences at both the N-terminal and the C-terminal in positive and negative samples. Then, the F-Score was applied to select the optimal features within each single feature type. By combining these selected features, we achieved the best results on the jackknife test and on 5-fold cross-validation repeated 10 times, based on the support vector machine algorithm. Moreover, we further evaluated the stability and robustness of our new method.
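As an illustration of the F-Score ranking step mentioned above, here is a minimal sketch using the commonly cited F-score definition (between-class separation of the feature means divided by the sum of the within-class sample variances). The abstract does not spell out the formula the authors used, so this definition, the helper names `f_score` and `rank_features`, and the toy data are all assumptions made for illustration.

```python
from statistics import mean, variance

def f_score(pos, neg):
    """F-score of one feature from its values in positive and negative samples.

    Assumed definition: squared distances of the two class means from the
    overall mean, divided by the sum of the two sample variances.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    return between / within if within > 0 else float("inf")

def rank_features(X_pos, X_neg):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(X_pos[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row in X_pos]
        neg = [row[j] for row in X_neg]
        scores.append((f_score(pos, neg), j))
    return [j for _, j in sorted(scores, reverse=True)]

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X_pos = [[0.9, 0.5], [1.0, 0.4], [1.1, 0.6]]
X_neg = [[0.1, 0.5], [0.0, 0.6], [0.2, 0.4]]
ranking = rank_features(X_pos, X_neg)
print(ranking)
```

In a pipeline like the one described, the top-ranked features of each type would then be concatenated and passed to the classifier; how many features to keep is a tuning choice that the abstract does not specify.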

Agriculture ◽  
2021 ◽  
Vol 11 (9) ◽  
pp. 869
Author(s):  
Yun Peng ◽  
Shenyi Zhao ◽  
Jizhan Liu

Proper identification of different grape varieties by smart machinery is of great importance to modern agricultural production. In this paper, a fast and accurate identification method is proposed that uses Canonical Correlation Analysis (CCA) to fuse different deep features extracted from Convolutional Neural Networks (CNNs) and classifies the fused features with a Support Vector Machine (SVM). Based on an open dataset, three types of state-of-the-art CNNs, seven kinds of deep features, and a multi-class SVM classifier were studied. First, the images were resized to meet the input requirements of a CNN. Then, the deep features of the input images were extracted by a specific deep feature layer of the CNN. Next, two kinds of deep features from different networks were fused by CCA to increase the effective classification feature information. Finally, a multi-class SVM classifier was trained with the fused features. When applied to the open dataset, the fused deep features in any combination obtained better identification performance than a single type of deep feature. The fusion of the fc6 (AlexNet) and Fc1000 (ResNet50) deep features obtained the best identification performance. The average F1 Score of 96.9% was 8.7 percentage points higher than the best performance of a single deep feature, i.e., Fc1000 of ResNet101, which was 88.2%. Furthermore, the F1 Score of the proposed method is 2.7% higher than the best performance obtained by using a CNN directly. The experimental results show that the proposed method can achieve fast and accurate identification of grape varieties. Based on the proposed algorithm, smart agricultural machinery can take more targeted measures according to the characteristics of different grape varieties to further improve the yield and quality of grape production.
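The "average F1 Score" reported above can be read as the unweighted mean of the per-class F1 scores (macro-averaged F1). The abstract does not state which averaging was used, so that reading is an assumption, as are the function name `macro_f1` and the toy labels in this sketch.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores for a multi-class problem."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical labels for three varieties "a", "b", "c".
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "b", "c", "c", "c"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```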


2016 ◽  
Vol 36 (suppl_1) ◽  
Author(s):  
Hua Tang ◽  
Hao Lin

Objective: Apolipoproteins are of great physiological importance and are associated with diseases such as dyslipidemia, thrombogenesis and angiocardiopathy. Apolipoproteins have therefore emerged as key risk markers and important research targets, yet the types of apolipoproteins have not been fully elucidated. Accurate identification of apolipoproteins is crucial to the comprehension of cardiovascular diseases and to drug design. The aim of this study is to develop a powerful model to precisely identify apolipoproteins. Approach and Results: We manually collected from UniProt a non-redundant dataset of 53 apolipoproteins and 136 non-apolipoproteins with sequence identity of less than 40%. After formulating the protein sequence samples with the g-gap dipeptide composition (here g = 1~10), analysis of variance (ANOVA) was adopted to find the feature subset that achieves the best accuracy. A Support Vector Machine (SVM) was then used to perform classification. The predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 96.2%, a specificity of 99.3%, and an accuracy of 98.4%. The study indicated that the proposed method could be a feasible means of conducting preliminary analyses of apolipoproteins. Conclusion: We demonstrated that apolipoproteins can be predicted from their primary sequences. We also discovered a special dipeptide distribution in apolipoproteins. These findings open new perspectives for improving apolipoprotein prediction by considering the specific dipeptides. We expect that these findings will help to improve drug development against angiocardiopathy. Keywords: apolipoproteins, angiocardiopathy, support vector machine
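The g-gap dipeptide composition used above can be sketched as follows, assuming the usual definition: for a gap g, count each ordered pair of residues separated by g intervening positions and normalize by the number of such pairs, giving a 400-dimensional vector. The function name, the handling of non-standard residues, and the toy sequence are assumptions for illustration only.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def g_gap_dipeptide_composition(seq, g):
    """Frequencies of residue pairs (seq[i], seq[i + g + 1]) over a sequence.

    Pairs containing a non-standard residue are skipped (an assumption).
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]

# Hypothetical toy sequence; with g = 1 the pairs are AD, CA and DC.
features = g_gap_dipeptide_composition("ACDAC", g=1)
print(len(features), features[PAIRS.index("AD")])
```

Each value of g yields its own 400-dimensional vector, so a feature selection step such as the ANOVA ranking mentioned in the abstract is what keeps the final feature set small.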


Author(s):  
Ren-Xiang Yan ◽  
Jing Liu ◽  
Yi-Min Tao

Profile-profile alignment may be the most sensitive and useful computational resource for identifying remote homologies and recognizing protein folds. However, profile-profile alignment is usually much more complex and slower than sequence-sequence or profile-sequence alignment. The profile, or PSSM (position-specific scoring matrix), represents the mutational variability at each sequence position of a protein as a vector of amino acid substitution frequencies, and it is a much richer encoding of a protein sequence. The consensus sequence, which can be considered a simplified profile, was used in early work to improve sequence alignment accuracy. Recently, several studies were carried out to improve PSI-BLAST's fold recognition performance by using consensus sequence information. There are several ways to compute a consensus sequence. Based on these considerations, in this chapter we propose a method that combines the information of different types of consensus sequences with the assistance of support vector machine learning. Benchmark results suggest that our method can further improve PSI-BLAST's fold recognition performance.


2011 ◽  
Vol 291-294 ◽  
pp. 2746-2749
Author(s):  
Yun Fei Wang ◽  
Li Ping Wang ◽  
Fu Ping Zhong ◽  
Huai Bao Chu

Slope stability is a complex system affected by many factors and characterized by randomness and fuzziness. This paper establishes a support vector machine model that accounts for the multiple factors affecting slope stability, using indicators that are common and easy to obtain. Verification against actual inspections shows that the model is valid and can be applied well to the stability analysis of similar slopes, and it may provide an important basis for slope engineering construction.


2018 ◽  
Vol 18 (1) ◽  
pp. 123-142 ◽  
Author(s):  
Yang Yu ◽  
Ulrike Dackermann ◽  
Jianchun Li ◽  
Ernst Niederleithinger

This article presents a novel assessment framework to identify the health condition of wood utility poles. The approach is based on the integration of data mining and machine learning methods and combines advanced signal processing, multi-sensor data fusion and decision ensembles to classify different damage condition types of wood poles. In the proposed framework, wavelet packet analysis is employed to transform captured multi-channel stress wave signals into energy information, which is subsequently compressed by principal component analysis to extract a feature vector. Furthermore, a support vector machine multi-classifier, optimized by a genetic algorithm, is designed to identify the pole condition type. Finally, evidence theory is applied to fuse the assessment results from different sensors into a final decision. To validate the proposed approach, wood pole specimens with three common damage condition types are tested in the laboratory using a novel multi-sensor narrow-band frequency-excitation non-destructive testing system. The final experimental analysis results confirm that the proposed approach is capable of making full use of multi-sensor information and providing an effective and accurate identification of condition types in wood poles.
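To make the wavelet packet energy step concrete, here is a minimal sketch that decomposes a signal with a Haar wavelet packet tree and returns the energy of each terminal node. The abstract does not name the wavelet, the decomposition depth, or the signal length, so the Haar filter, the three levels, and the synthetic signal below are all assumptions.

```python
import math

def haar_split(x):
    """One orthonormal Haar step: return (approximation, detail) halves."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet_energies(signal, levels):
    """Energy (sum of squares) of each of the 2**levels terminal nodes.

    Nodes are returned in natural (tree) order, not frequency order.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return [sum(v * v for v in node) for node in nodes]

# Hypothetical stand-in for one channel of a stress wave recording.
signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.7 * n) for n in range(64)]
energies = wavelet_packet_energies(signal, levels=3)
print(len(energies))
```

Because the Haar transform is orthonormal, the node energies sum to the energy of the original signal, which makes them a natural normalized feature vector before a compression step such as principal component analysis.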


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Fengnong Chen ◽  
Pulan Chen ◽  
Hamed Hamid Muhammed ◽  
Juan Zhang

The aim of the paper is to distinguish malignant from benign breast lesions using the apparent diffusion coefficient (ADC), perfusion fraction f, pseudodiffusion coefficient D*, and true diffusion coefficient D from intravoxel incoherent motion (IVIM). There were 69 malignant cases (including 9 early malignant cases) and 35 benign breast cases, all of whom underwent diffusion-weighted MRI at 3.0 T with 8 b-values (0~1000 s/mm²). ADC and IVIM parameters were determined in the lesions. The early malignant cases were treated in turn as advanced malignant tumors and as benign tumors to assess the effect on the results. Predictive models were constructed using Support Vector Machine Binary Classification (SVMBC, also known as Support Vector Machine Discriminant Analysis (SVMDA)) and Partial Least Squares Discriminant Analysis (PLSDA), and the two were compared. The D value and ADC provide accurate identification of malignant lesions with b=300 if early malignant tumors are considered advanced malignant (cancer). The classification accuracy is 93.5% for cross-validation using SVMBC with ADC and tissue diffusivity only. The sensitivity and specificity are 100% and 87.0%, respectively, r²cv = 0.8163, and the root mean square error of cross-validation (RMSECV) is 0.043. ADC and IVIM provide quantitative measurements of tissue diffusivity for cellularity and, combined with SVMBC, yield comprehensive and complementary information for differentiating between benign and malignant breast lesions.
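For readers less familiar with the metrics quoted above, this short sketch shows how sensitivity, specificity, and accuracy follow from the four cells of a binary confusion matrix. The abstract does not report a confusion matrix, so the counts used here are purely hypothetical and are not the study's data; the function name is also an assumption.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn count malignant cases predicted malignant/benign;
    tn/fp count benign cases predicted benign/malignant.
    """
    sensitivity = tp / (tp + fn)  # fraction of malignant cases detected
    specificity = tn / (tn + fp)  # fraction of benign cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, acc = binary_metrics(tp=45, fn=5, tn=40, fp=10)
print(sens, spec, acc)
```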


Author(s):  
Phasit Charoenkwan ◽  
Nuttapat Anuwongcharoen ◽  
Chanin Nantasenamat ◽  
Md. Mehedi Hasan ◽  
Watshara Shoombuatong

In light of the growing resistance toward current antiviral drugs, the discovery of novel and effective antiviral therapeutic agents remains a pressing scientific effort. Antiviral peptides (AVPs) represent promising therapeutic agents due to their extraordinary advantages in terms of potency, efficacy and pharmacokinetic properties. The growing volume of newly discovered peptide sequences in the post-genomic era requires computational approaches for timely and accurate identification of AVPs. Machine learning (ML) methods such as random forest and support vector machine represent robust learning algorithms that are instrumental in successful peptide-based drug discovery. Therefore, this review summarizes the current state of the art in the application of ML methods for identifying AVPs directly from sequence information. We compare the efficiency of these methods in terms of the underlying characteristics of the dataset used, along with feature encoding methods, ML algorithms, cross-validation methods and prediction performance. Finally, guidelines for the development of robust AVP models are also discussed. It is anticipated that this review will serve as a useful guide for the design and development of robust AVP and related therapeutic peptide predictors in the future.


2020 ◽  
Vol 15 (6) ◽  
pp. 563-573
Author(s):  
Chengyan Wu ◽  
Qianzhong Li ◽  
Ru Xing ◽  
Guo-Liang Fan

Background: Identifying non-coding RNA (ncRNA) at the organelle genome level is a challenging task. In our previous work, an ncRNA dataset with less than 80% sequence identity was built, and a method combining the increment of diversity with a support vector machine was proposed. Objective: Based on the ncRNA_361 dataset, a novel decision-making method, an improved KNN (iKNN) classifier, was proposed. Methods: In this paper, based on the iKNN algorithm, the physicochemical features of nucleotides, the degeneracy of genetic codons, and the topological secondary structure were selected to represent the effective ncRNA characters. Then, the incremental feature selection method was utilized to optimize the feature set. Results: The results of iKNN indicated that the decision-making method of mean value is distinctly superior to the traditional decision-making method of majority vote and to the Increment of Diversity Combining Support Vector Machine (ID-SVM). The iKNN algorithm achieved an overall accuracy of 97.368% in the jackknife test when k=3. Conclusion: It should be noted that the triplets of the structure-sequence mode under reading frames not only contain the entire sequence information but also reflect whether each base is paired, and the secondary structural topological parameters further describe the ncRNA secondary structure at the spatial level. The ncRNA dataset and the iKNN classifier are freely available at http://202.207.14.87:8032/fuwu/iKNN/index.asp.
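The contrast between a majority-vote decision and a mean-value decision in k-nearest-neighbor classification can be sketched as below. The abstract does not define the iKNN rule, so the version here (pick the class whose neighbors among the k nearest have the smallest mean distance to the query) is only one plausible reading and is an assumption, as are the function names and the toy points.

```python
import math
from collections import Counter, defaultdict

def k_nearest(train, query, k):
    """Return the k closest (distance, label) pairs, using Euclidean distance."""
    return sorted((math.dist(x, query), label) for x, label in train)[:k]

def predict_majority(train, query, k):
    """Classic KNN: the most frequent label among the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def predict_mean_distance(train, query, k):
    """Assumed mean-value rule: the label whose neighbors are closest on average."""
    by_class = defaultdict(list)
    for distance, label in k_nearest(train, query, k):
        by_class[label].append(distance)
    return min(by_class, key=lambda c: sum(by_class[c]) / len(by_class[c]))

# Hypothetical toy data: one very close "A" point and two farther "B" points.
train = [((0.0, 0.0), "A"), ((3.0, 0.0), "B"), ((3.2, 0.0), "B")]
query = (1.0, 0.0)
print(predict_majority(train, query, k=3), predict_mean_distance(train, query, k=3))
```

With k=3 the majority vote follows the two distant "B" points, while the mean-distance rule follows the single nearby "A" point, which shows how the two decision rules can disagree on the same neighbors.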

