Improving secretory proteins prediction in Mycobacterium tuberculosis using the unbiased dipeptide composition with support vector machine

2018 ◽  
Vol 21 (3) ◽  
pp. 212 ◽  
Author(s):  
Saeed Ahmed ◽  
Muhammad Kabir ◽  
Muhammad Arif ◽  
Zakir Ali ◽  
Farman Ali ◽  
...  
2016 ◽  
Vol 36 (suppl_1) ◽  
Author(s):  
Hua Tang ◽  
Hao Lin

Objective: Apolipoproteins are of great physiological importance and are associated with diseases such as dyslipidemia, thrombogenesis and angiocardiopathy. They have therefore emerged as key risk markers and important research targets, yet the types of apolipoproteins have not been fully elucidated. Accurate identification of apolipoproteins is crucial to understanding cardiovascular diseases and to drug design. The aim of this study is to develop a powerful model to precisely identify apolipoproteins. Approach and Results: We manually collected from UniProt a non-redundant dataset of 53 apolipoproteins and 136 non-apolipoproteins with sequence identity below 40%. After formulating the protein sequence samples with g-gap dipeptide composition (here g = 1~10), analysis of variance (ANOVA) was used to find the feature subset that achieves the best accuracy. A Support Vector Machine (SVM) was then used to perform classification. The predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 96.2%, a specificity of 99.3%, and an accuracy of 98.4%. The study indicated that the proposed method could be a feasible means of conducting preliminary analyses of apolipoproteins. Conclusion: We demonstrated that apolipoproteins can be predicted from their primary sequences. We also discovered a distinctive dipeptide distribution in apolipoproteins. These findings open new perspectives for improving apolipoprotein prediction by considering the specific dipeptides. We expect that these findings will help to improve drug development against angiocardiopathy. Key words: Apolipoproteins, Angiocardiopathy, Support Vector Machine
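The g-gap dipeptide composition mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the convention that g = 0 means adjacent residues, and the choice to skip pairs containing non-standard letters are all assumptions made here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def g_gap_dipeptide_composition(sequence, g):
    """Return a 400-dimensional frequency vector of residue pairs separated by g residues.

    Assumption: g = 0 means adjacent residues, so the paired residues sit at
    positions i and i + g + 1. Pairs with non-standard letters are skipped.
    """
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short for this gap (or no valid pairs): return all zeros.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

Calling `g_gap_dipeptide_composition(seq, g)` for each g in the range under study gives one 400-dimensional vector per gap; each vector can then be passed to feature selection and a classifier.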


Author(s):  
Vipul Nilkanth ◽  
Shekhar Mande

Elucidating signalling events in a pathogen is potentially important for tackling the infection it causes. Events mediated by protein phosphorylation play important roles in infection. To predict the phosphosites and substrates of serine/threonine protein kinases (STPKs), we developed a machine learning based approach and predicted the phosphosites for Mycobacterium tuberculosis STPKs using kinase-peptide structure-sequence data. The approach uses features derived from the kinase 3D-structure environment and known phosphosite sequences to generate Support Vector Machine based, kinase-specific predictions of phosphosites, making it suitable for STPKs with no or scarce phosphosite data. Of the four machine learning algorithms we tried (random forest, logistic regression, support vector machine and k-nearest neighbours), support vector machine performed best, with an aucROC value of 0.88 on the independent testing dataset and a ten-fold cross-validation accuracy of ~81.6% for the final model. Our predicted phosphosites of M. tuberculosis STPKs form a useful resource for experimental biologists, enabling elucidation of STPK-mediated post-translational regulation of important cellular processes. The training features file and model files, together with a usage instructions file, are available at: https://github.com/vipulbiocoder/Mtb-KSPP
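For readers unfamiliar with the aucROC metric reported above, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the 0/1 label encoding are assumptions for illustration; this is not code from the linked repository.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive); scores: classifier outputs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for small test sets; a rank-based formulation would be preferable for large ones.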


2019 ◽  
Vol 16 (4) ◽  
pp. 325-331 ◽  
Author(s):  
Xianfang Wang ◽  
Hongfei Li ◽  
Peng Gao ◽  
Yifeng Liu ◽  
Wenjing Zeng

The catalytic activity of an enzyme differs from that of an inorganic catalyst. In a high-temperature, strongly acidic or strongly alkaline environment, the structure of the enzyme is destroyed and it loses its activity. Although biochemical experiments can measure the optimal pH environment of an enzyme, these methods are inefficient and costly. To address these problems, a computational model can be established to determine the optimal acidic or alkaline environment of the enzyme. In this paper, we first introduce a new feature called dual g-gap dipeptide composition to formulate enzyme samples. The best features were then selected using the F value calculated from analysis of variance. Finally, a support vector machine was used to build a prediction model for distinguishing acidic from alkaline enzymes. An overall accuracy of 95.9% was achieved with jackknife cross-validation, which indicates that our method is effective and efficient for predicting acidic and alkaline enzymes. The feature proposed in this paper could also be applied in other fields of bioinformatics.
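The feature-selection step described above, ranking features by their ANOVA F value, can be illustrated with a short sketch. It computes the standard one-way ANOVA F statistic per feature and sorts features by it. The function names and the data layout (a list of feature vectors plus a list of class labels) are assumptions made for this example, not details from the paper.

```python
def anova_f_value(groups):
    """One-way ANOVA F statistic for one feature; groups holds one list of values per class."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(samples, labels):
    """Return feature indices sorted from highest to lowest F value."""
    classes = sorted(set(labels))
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        groups = [[x[j] for x, y in zip(samples, labels) if y == c] for c in classes]
        scores.append(anova_f_value(groups))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

A larger F value means the feature's class means differ more relative to the spread within each class, so the top-ranked features are the usual candidates to feed to the classifier.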


2020 ◽  
Vol 7 (3) ◽  
pp. 320
Author(s):  
Favorisen R. Lumbanraja ◽  
Ira Hariati Br Sitepu ◽  
Didik Kurniawan ◽  
Aristoteles Aristoteles

Tuberculosis (TB / TBC) is an infectious disease caused by the bacterium Mycobacterium tuberculosis. The bacterium is very resilient, so treatment takes a long time: tuberculosis is treated routinely for 6-9 months with at least 3 types of drugs. At present, most people regard a cough lasting for months as an ordinary cough, although a long-lasting cough is one of the symptoms of tuberculosis. This research used data on tuberculosis patients in the city of Bandar Lampung, weather data, and a matrix of distances between tuberculosis cases within a district. The dataset contains 600 records with 44 variables. Three kernels were used with the SVM method: Linear, Gaussian, and Polynomial. SVM with the Linear kernel obtained an average R² of 51.43%, SVM with the Gaussian kernel obtained an average R² of 58.53%, and SVM with the Polynomial kernel obtained an average R² of 36.03%.

Keywords: Prediction of tuberculosis sufferers, tuberculosis, Machine Learning, Support Vector Machine.
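The R² values quoted in this abstract are coefficients of determination expressed as percentages (so 58.53% corresponds to 0.5853). The sketch below shows the standard definition, R² = 1 - SS_res / SS_tot. The function name is chosen for illustration, and whether the authors computed R² in exactly this way is an assumption.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean of the observed values scores 0, and a perfect model scores 1; averaging the value over repeated train/test splits gives a figure comparable to the per-kernel averages reported above.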


2020 ◽  
Author(s):  
V Vasilevska ◽  
K Schlaaf ◽  
H Dobrowolny ◽  
G Meyer-Lotz ◽  
HG Bernstein ◽  
...  
