Accurate identification of alternatively spliced exons using support vector machine

2004 ◽  
Vol 21 (7) ◽  
pp. 897-901 ◽  
Author(s):  
G. Dror ◽  
R. Sorek ◽  
R. Shamir
2016 ◽  
Vol 36 (suppl_1) ◽  
Author(s):  
Hua Tang ◽  
Hao Lin

Objective: Apolipoproteins are of great physiological importance and are associated with diseases such as dyslipidemia, thrombogenesis and angiocardiopathy. They have therefore emerged as key risk markers and important research targets, yet the types of apolipoproteins have not been fully elucidated. Accurate identification of apolipoproteins is crucial to understanding cardiovascular diseases and to drug design. The aim of this study is to develop a powerful model to precisely identify apolipoproteins. Approach and Results: We manually collected from UniProt a non-redundant dataset of 53 apolipoproteins and 136 non-apolipoproteins with sequence identity below 40%. After formulating the protein sequence samples with g-gap dipeptide composition (here g = 1~10), analysis of variance (ANOVA) was adopted to find the feature subset that achieves the best accuracy. A Support Vector Machine (SVM) was then used to perform classification. The predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 96.2%, a specificity of 99.3%, and an accuracy of 98.4%. The study indicated that the proposed method could be a feasible means of conducting preliminary analyses of apolipoproteins. Conclusion: We demonstrated that apolipoproteins can be predicted from their primary sequences. We also discovered a distinctive dipeptide distribution in apolipoproteins. These findings open new perspectives for improving apolipoprotein prediction by considering the specific dipeptides. We expect that these findings will help to improve drug development for angiocardiopathy. Key words: Apolipoproteins, Angiocardiopathy, Support Vector Machine
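The feature encoding and the reported metrics can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: that "g-gap" means two residues separated by g intervening positions, that frequencies are normalized by the number of such pairs, and that the confusion counts (51 of 53 positives and 135 of 136 negatives correct) are the ones implied by the reported percentages. The function names are hypothetical, and the ANOVA selection and SVM training steps are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(sequence, g):
    """Return a 400-dimensional frequency vector of residue pairs
    separated by g intervening positions (assumed convention)."""
    n_pairs = len(sequence) - g - 1
    if n_pairs <= 0:
        raise ValueError("sequence too short for this gap")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[d] / n_pairs for d in DIPEPTIDES]


def sensitivity_specificity_accuracy(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts inferred from the reported percentages (an assumption).
sn, sp, acc = sensitivity_specificity_accuracy(tp=51, fn=2, tn=135, fp=1)
```

With these inferred counts, the three values round to the sensitivity, specificity and accuracy quoted in the abstract, which suggests the figures are mutually consistent.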


2018 ◽  
Vol 18 (1) ◽  
pp. 123-142 ◽  
Author(s):  
Yang Yu ◽  
Ulrike Dackermann ◽  
Jianchun Li ◽  
Ernst Niederleithinger

This article presents a novel assessment framework to identify the health condition of wood utility poles. The approach integrates data mining and machine learning methods, combining advanced signal processing, multi-sensor data fusion and decision ensembles to classify different damage condition types of wood poles. In the proposed framework, wavelet packet analysis is employed to transform captured multi-channel stress wave signals into energy information, which is subsequently compressed by principal component analysis to extract a feature vector. A support vector machine multi-classifier, optimized by a genetic algorithm, is then designed to identify the pole condition type. Finally, evidence theory is applied to fuse the assessment results from different sensors into a final decision. To validate the proposed approach, wood pole specimens with three common damage condition types are tested in the laboratory using a novel multi-sensor narrow-band frequency-excitation non-destructive testing system. The experimental results confirm that the proposed approach is capable of making full use of multi-sensor information and of providing effective and accurate identification of the condition types of wood poles.
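The final fusion step can be sketched with Dempster's rule of combination, the usual combination rule in evidence theory. Whether the authors use this exact rule is an assumption; the two condition labels and the mass values below are hypothetical and only illustrate how the outputs of two sensors might be merged.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Mass assigned to conflicting (disjoint) pairs is discarded and the
    remainder is renormalized.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = (
                    combined.get(intersection, 0.0) + mass_a * mass_b
                )
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical labels and masses for two sensors.
HEALTHY = frozenset({"healthy"})
DAMAGED = frozenset({"damaged"})
EITHER = HEALTHY | DAMAGED  # mass left on the whole frame (uncertainty)

sensor_1 = {HEALTHY: 0.7, DAMAGED: 0.1, EITHER: 0.2}
sensor_2 = {HEALTHY: 0.6, DAMAGED: 0.2, EITHER: 0.2}
fused = dempster_combine(sensor_1, sensor_2)
```

In this toy case the fused belief in the "healthy" hypothesis is higher than either sensor's individual belief, which is the behavior one would hope for when two sensors largely agree.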


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Fengnong Chen ◽  
Pulan Chen ◽  
Hamed Hamid Muhammed ◽  
Juan Zhang

The aim of the paper is to distinguish malignant from benign breast lesions using the apparent diffusion coefficient (ADC) together with the perfusion fraction f, pseudodiffusion coefficient D*, and true diffusion coefficient D from intravoxel incoherent motion (IVIM). The study included 69 malignant cases (including 9 early malignant cases) and 35 benign breast cases, all of whom underwent diffusion-weighted MRI at 3.0 T with 8 b-values (0~1000 s/mm²). ADC and IVIM parameters were determined in lesions. The early malignant cases were treated in turn as advanced malignant and as benign tumors to assess the effect on the result. Predictive models were constructed using Support Vector Machine Binary Classification (SVMBC, also known as Support Vector Machine Discriminant Analysis (SVMDA)) and Partial Least Squares Discriminant Analysis (PLSDA), and the two were compared. The D value and ADC provide accurate identification of malignant lesions with b = 300 if early malignant tumors are considered advanced malignant (cancer). The classification accuracy is 93.5% for cross-validation using SVMBC with ADC and tissue diffusivity only. The sensitivity and specificity are 100% and 87.0%, respectively, r²cv = 0.8163, and the root mean square error of cross-validation (RMSECV) is 0.043. ADC and IVIM provide quantitative measurement of tissue diffusivity for cellularity and, combined with SVMBC, yield comprehensive and complementary information for differentiating between benign and malignant breast lesions.
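For readers unfamiliar with the parameters, the commonly used bi-exponential IVIM signal model and a two-point ADC estimate can be written as a short Python sketch. The abstract does not say which fitting procedure was used, so this only illustrates how f, D*, D and ADC relate to the measured signal; the function names and the numeric values are hypothetical.

```python
import math


def ivim_signal(b, s0, f, d_star, d):
    """Bi-exponential IVIM model:
    S(b) = S0 * (f * exp(-b * D*) + (1 - f) * exp(-b * D))."""
    return s0 * (f * math.exp(-b * d_star) + (1.0 - f) * math.exp(-b * d))


def adc_two_point(s_low, s_high, b_low, b_high):
    """Mono-exponential ADC estimated from signals at two b-values:
    ADC = ln(S_low / S_high) / (b_high - b_low)."""
    return math.log(s_low / s_high) / (b_high - b_low)


# Illustrative values only (b in s/mm^2, D and D* in mm^2/s).
s_b0 = ivim_signal(0.0, s0=100.0, f=0.1, d_star=0.01, d=0.001)
s_b1000 = ivim_signal(1000.0, s0=100.0, f=0.1, d_star=0.01, d=0.001)
adc = adc_two_point(s_b0, s_b1000, 0.0, 1000.0)
```

Because the perfusion term decays quickly, the two-point ADC in this example comes out slightly higher than the true diffusion coefficient D, which is one reason the two quantities can carry complementary information.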


Author(s):  
Sunil Kumar ◽  
Maninder Singh

Breast cancer is the leading cause of high fatality among women. Timely identification of benign and malignant tumors plays a critical role in the diagnosis of breast cancer. In this paper, an attempt has been made to extract valuable information by selecting the relevant features using our proposed EGWO-SVM (enhanced grey wolf optimization-support vector machine) approach. The grey wolf optimizer (GWO) has gained considerable popularity among swarm intelligence methods because it has few tuning parameters, is simple to use and scalable, and, most importantly, converges quickly by maintaining the right balance between exploration and exploitation during the search. Therefore, an enhanced GWO is proposed in combination with SVM to determine the optimum subset of tumor features for accurate identification of benign and malignant tumors. The proposed approach has been tested on the standard benchmark Wisconsin Diagnostic Breast Cancer (WDBC) database and compared with numerous existing, state-of-the-art, and recently published breast cancer classification approaches. It outperforms all the compared approaches, improving the classification accuracy to 98.24% and demonstrating its effectiveness in identifying breast cancer.
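The search mechanism can be sketched with the standard grey wolf optimizer update. This is not the enhanced variant proposed in the paper, whose details the abstract does not give. The example minimizes a simple continuous test function (the sphere function, chosen here as an assumption) rather than selecting features for an SVM; all names and parameter values are illustrative.

```python
import random


def sphere(x):
    """Toy objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def grey_wolf_optimize(objective, dim, n_wolves=20, n_iter=50,
                       lower=-5.0, upper=5.0, seed=0):
    """Standard GWO: each wolf moves toward the three best wolves
    (alpha, beta, delta) while the coefficient a decays from 2 to 0."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    best_pos, best_score = None, float("inf")
    history = []  # best score found so far, recorded once per iteration
    for t in range(n_iter):
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]  # copies of alpha, beta, delta
        score = objective(leaders[0])
        if score < best_score:
            best_pos, best_score = list(leaders[0]), score
        history.append(best_score)
        a = 2.0 * (1.0 - t / n_iter)  # exploration shrinks over time
        for w in wolves:
            for j in range(dim):
                estimate = 0.0
                for leader in leaders:
                    big_a = 2.0 * a * rng.random() - a
                    big_c = 2.0 * rng.random()
                    distance = abs(big_c * leader[j] - w[j])
                    estimate += leader[j] - big_a * distance
                # Average the three leader-guided moves and clip to bounds.
                w[j] = min(upper, max(lower, estimate / 3.0))
    # Evaluate the final population once more.
    final = min(wolves, key=objective)
    if objective(final) < best_score:
        best_pos, best_score = list(final), objective(final)
    return best_pos, best_score, history


position, value, trace = grey_wolf_optimize(sphere, dim=3)
```

For feature selection, a typical adaptation would threshold each coordinate to decide whether a feature is kept and use cross-validated classifier error as the objective; that adaptation is an assumption here, not something described in the abstract.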


2019 ◽  
Vol 267 ◽  
pp. 01008
Author(s):  
Yan Dou ◽  
Lanzhong Guo ◽  
Yunbo Li ◽  
Yunfei Zhu

Elevators are an important category of special equipment, and their operational reliability bears directly on the safety of passengers. When an elevator fails, timely and accurate identification of the fault type and judgment of the fault cause are key to both engineering application and theoretical research. This paper mainly introduces the application of the support vector machine to fault diagnosis of key elevator structures.


2018 ◽  
Vol 8 (11) ◽  
pp. 2204 ◽  
Author(s):  
Taoying Li ◽  
Mingyue Gao ◽  
Runyu Song ◽  
Qian Yin ◽  
Yan Chen

Piwi-interacting RNA (piRNA) is a newly identified class of small non-coding RNAs. It can combine with PIWI proteins to regulate transcriptional gene silencing and heterochromatin modifications and to maintain germline and stem cell function in animals. To better understand the function of piRNA, it is imperative to improve the accuracy of identifying piRNAs. In this study, the feature vector was constructed from sequence information comprising the single nucleotide composition, the 16 dinucleotide compositions, six physicochemical properties of RNA, the position specificities of nucleotides at both the N-terminal and the C-terminal, and the proportions of similar peptide sequences at both the N-terminal and the C-terminal in positive and negative samples. The F-Score was then applied to choose the optimal features within each single feature type. By combining these selected features, we achieved the best results on the jackknife test and on 5-fold cross-validation repeated 10 times, based on the support vector machine algorithm. Moreover, we further evaluated the stability and robustness of our new method.
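The F-Score ranking step can be illustrated with the widely used definition of the per-feature F-score for two classes: between-class separation of the means divided by the sum of the within-class sample variances. The abstract does not give its formula, so treating it as this definition is an assumption; the function name and the toy data are hypothetical.

```python
def f_scores(positives, negatives):
    """Per-feature F-score for two classes.

    positives, negatives: lists of equal-length feature vectors
    (each class needs at least two samples).
    A higher score suggests a more discriminative feature.
    """
    n_features = len(positives[0])
    scores = []
    for i in range(n_features):
        pos = [row[i] for row in positives]
        neg = [row[i] for row in negatives]
        mean_pos = sum(pos) / len(pos)
        mean_neg = sum(neg) / len(neg)
        mean_all = (sum(pos) + sum(neg)) / (len(pos) + len(neg))
        between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
        within = (
            sum((v - mean_pos) ** 2 for v in pos) / (len(pos) - 1)
            + sum((v - mean_neg) ** 2 for v in neg) / (len(neg) - 1)
        )
        # Constant features get score 0.0 (a choice made for this sketch).
        scores.append(between / within if within > 0 else 0.0)
    return scores


# Toy data: feature 0 separates the classes, feature 1 does not.
toy_pos = [[1.0, 0.5], [1.2, 0.1], [0.8, 0.9]]
toy_neg = [[0.0, 0.4], [0.2, 0.6], [-0.2, 0.5]]
toy_scores = f_scores(toy_pos, toy_neg)
```

Features would then be sorted by score and added one at a time to the classifier input until cross-validated accuracy stops improving, which is a common way to use such a ranking.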


2020 ◽  
Vol 27 (4) ◽  
pp. 337-345 ◽  
Author(s):  
Ying Wang ◽  
Juanjuan Kang ◽  
Ning Li ◽  
Yuwei Zhou ◽  
Zhongjie Tang ◽  
...  

Background: Neuropeptides are a class of bioactive peptides produced from neuropeptide precursors through a series of extremely complex processes, mediating many aspects of neuronal regulation. Accurate identification of cleavage sites of neuropeptide precursors is of great significance for the development of neuroscience and brain science. Objective: With the explosive growth of neuropeptide precursor data, there is a pressing need to develop bioinformatics methods for predicting neuropeptide precursors’ cleavage sites quickly and efficiently. Method: We started by processing the neuropeptide precursor data from SwissProt and NeuroPedia into two sets, a training dataset and a testing dataset. Subsequently, six feature extraction schemes were applied to generate different feature sets, and feature selection methods were then used to find the optimal feature subset of each. Thereafter, the support vector machine was utilized to build models for the different feature types. Finally, the performance of the models was evaluated with the independent testing dataset. Results: Six models were built with the support vector machine. Among them, the enhanced amino acid composition-based model reached the highest accuracy of 91.60% in 5-fold cross-validation. When evaluated with the independent testing dataset, it also performed well, with an accuracy of 90.37% and an area under the Receiver Operating Characteristic curve of 0.9576. Conclusion: The developed model performed well. Moreover, for users’ convenience, an online web server called NeuroCS was built, which is freely available at http://i.uestc.edu.cn/NeuroCS/dist/index.html#/. NeuroCS can be used to predict neuropeptide precursors’ cleavage sites effectively.
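The area under the ROC curve quoted above can be computed directly from classifier scores as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The sketch below uses that rank-based definition; the function name and the score values are hypothetical and do not come from the study.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all positive/negative pairs, how often the positive
    sample receives the higher score; ties contribute 0.5.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical decision scores for cleavage and non-cleavage sites.
cleavage_scores = [0.9, 0.8, 0.4]
non_cleavage_scores = [0.7, 0.3, 0.1]
auc = roc_auc(cleavage_scores, non_cleavage_scores)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable for large test sets.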


Agriculture ◽  
2021 ◽  
Vol 11 (9) ◽  
pp. 869
Author(s):  
Yun Peng ◽  
Shenyi Zhao ◽  
Jizhan Liu

Proper identification of different grape varieties by smart machinery is of great importance to modern agricultural production. In this paper, a fast and accurate identification method is proposed that combines Canonical Correlation Analysis (CCA), used to fuse different deep features extracted from Convolutional Neural Networks (CNNs), with a Support Vector Machine (SVM). Based on an open dataset, three types of state-of-the-art CNNs, seven types of deep features, and a multi-class SVM classifier were studied. First, the images were resized to meet the input requirements of a CNN. Then, the deep features of the input images were extracted from a specific deep feature layer of the CNN. Next, two kinds of deep features from different networks were fused by CCA to increase the effective classification feature information. Finally, a multi-class SVM classifier was trained with the fused features. When applied to the open dataset, the model shows that any combination of fused deep features obtains better identification performance than a single type of deep feature. The fusion of the fc6 (in AlexNet) and Fc1000 (in ResNet50) deep features obtained the best identification performance. Its average F1 Score of 96.9% was 8.7 percentage points higher than the best performance of a single deep feature, i.e., Fc1000 of ResNet101, which was 88.2%. Furthermore, the F1 Score of the proposed method is 2.7% higher than the best performance obtained by using a CNN directly. The experimental results show that the proposed method can achieve fast and accurate identification of grape varieties. Based on the proposed algorithm, smart agricultural machinery can take more targeted measures suited to the characteristics of different grape varieties, further improving the yield and quality of grape production.
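The "average F1 Score" reported for the multi-class classifier can be illustrated with a macro-averaged F1 computation, in which the F1 score is computed per class and then averaged with equal weight. Whether the paper uses macro averaging is an assumption, and the class labels and predictions below are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Per class: F1 = 2*TP / (2*TP + FP + FN); a class with no true or
    predicted samples contributes 0.0 in this sketch.
    """
    labels = sorted(set(y_true) | set(y_pred))
    f1_values = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denominator = 2 * tp + fp + fn
        f1_values.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_values) / len(f1_values)


# Hypothetical variety labels and classifier outputs.
true_varieties = ["a", "a", "b", "b", "c", "c"]
predicted_varieties = ["a", "b", "b", "b", "c", "a"]
score = macro_f1(true_varieties, predicted_varieties)
```

Macro averaging treats every variety equally regardless of how many images it has, which is usually the fairer summary when class sizes differ.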

