Deep learning on chaos game representation for proteins

2019 ◽  
Vol 36 (1) ◽  
pp. 272-279 ◽  
Author(s):  
Hannah F Löchel ◽  
Dominic Eger ◽  
Theodor Sperlea ◽  
Dominik Heider

Abstract
Motivation: Classification of protein sequences is a major task in bioinformatics and has many applications. Different machine learning methods, such as support vector machines (SVMs), random forests (RFs) and neural networks (NNs), exist and are applied to these problems. All of these methods require, as a first step, that protein sequences be made machine-readable and comparable, for which different encodings exist. These encodings are typically based on physical or chemical properties of the sequence. However, motivated by the outstanding performance of deep neural networks (DNNs) on image recognition, we used frequency matrix chaos game representation (FCGR) to encode protein sequences as images. In this study, we compare the performance of SVMs, RFs and DNNs trained on FCGR-encoded protein sequences. While the original chaos game representation (CGR) has been used mainly for genome sequence encoding and classification, we modified it to also work for protein sequences, resulting in an n-flakes representation, an image composed of several icosagons.
Results: All applied machine learning techniques (RF, SVM and DNN) show promising results compared with state-of-the-art methods on our benchmark datasets, with DNNs outperforming the other methods, demonstrating that FCGR is a promising new encoding method for protein sequences.
Availability and implementation: https://cran.r-project.org/.
Supplementary information: Supplementary data are available at Bioinformatics online.
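The chaos game representation underlying FCGR can be sketched compactly. The classic construction, shown here for DNA, assigns each alphabet symbol to a polygon vertex and repeatedly moves a point halfway toward the vertex of the next symbol; FCGR then counts how many points land in each cell of a grid, yielding an image-like frequency matrix. The paper generalizes this to 20-vertex polygons (icosagons) for the amino acid alphabet; the grid size and corner layout below are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of frequency-matrix chaos game representation (FCGR),
# illustrated for DNA. The paper's n-flakes variant uses icosagon vertices
# for the 20 amino acids instead of the four square corners assumed here.

CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def fcgr(seq, k=3):
    """Return a 2^k x 2^k matrix of k-mer frequencies via the chaos game."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    x, y = 0.5, 0.5                        # start at the centre
    for i, base in enumerate(seq):
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway toward the corner
        if i >= k - 1:                     # point now encodes the last k bases
            grid[min(int(y * n), n - 1)][min(int(x * n), n - 1)] += 1
    return grid

m = fcgr("ACGTACGTACGT", k=2)
total = sum(sum(row) for row in m)
print(total)  # one count per k-mer window: len(seq) - k + 1 = 11
```

Each cell of the resulting matrix corresponds to one k-mer, so the matrix can be fed to image classifiers such as CNNs directly.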



Author(s):  
Hesham M. Al-Ammal

Detection of anomalies in a given dataset is a vital step in several cybersecurity applications, including intrusion detection, fraud detection, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships and communities, as well as anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent research advances have applied machine learning methods to anomaly detection over graphs. This chapter concentrates on static graphs (both labeled and unlabeled) and summarizes some of these recent studies in machine learning for anomaly detection in graphs, covering methods such as support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter reflects on the successes and challenges of using these methods in the context of graph-based anomaly detection.


An Intrusion Detection System (IDS) is a system that checks a network or data for abnormal actions and issues an alert when such activity is discovered. Numerous IDS techniques are in use today, but a major problem shared by all of them is performance. Various works have addressed this issue using support vector machines and multilayer perceptrons. Supervised learning models such as support vector machines, with their associated learning algorithms, are used to analyze data for regression analysis as well as classification. Because an IDS applied to big data must analyze huge volumes of traffic for suspicious activity, an efficient and fast classification algorithm is required. Machine learning techniques such as neural networks and extreme learning machines are used; both are highly regarded and considered among the best techniques. Extreme learning machines (ELMs) are feed-forward neural networks with a single hidden layer and no backpropagation, used for classification. Once an intrusion is detected by the IDS through the ELM, the type of intrusion is also determined using the Random Forest technique (multi-class classification) with a high rate of accuracy and precision. The well-known NSL-KDD dataset is used for training as well as testing of these IDS algorithms. This work determines that, compared with artificial neural networks and logistic regression, extreme learning machines provide a much better intrusion detection rate of 93.96% and are also more efficient, with an execution time of 38 seconds.
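The extreme learning machine described above has a particularly simple training procedure: the hidden-layer weights are drawn at random and frozen, and only the output weights are solved in closed form by least squares, which is why no backpropagation is needed. A minimal sketch, assuming a toy binary task rather than the NSL-KDD setup:

```python
import numpy as np

# Minimal ELM sketch: random fixed hidden layer, closed-form output weights.
# Hidden-layer size, activation, and the toy data are illustrative assumptions.

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=50):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (frozen)
    b = rng.normal(size=n_hidden)                 # random biases (frozen)
    H = np.tanh(X @ W + b)                        # hidden activations
    beta = np.linalg.pinv(H) @ y                  # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy binary task: label is 1 when the two features share a sign.
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)
W, b, beta = elm_fit(X, y)
acc = ((elm_predict(X, W, b, beta) > 0.5) == y).mean()
print(acc)
```

The single pseudoinverse solve replaces iterative gradient descent, which is the source of the fast training times the abstract reports.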


Author(s):  
Divya Choudhary ◽  
Siripong Malasri

This paper implements and compares machine learning algorithms to predict the amount of coolant required during transportation of temperature-sensitive products. The machine learning models use trip duration, product threshold temperature, and ambient temperature as independent variables to predict the weight of gel packs needed to keep the temperature of the product below its threshold value. The weight of the gel packs can be translated into the number of gel packs required. Regression using neural networks, support vector regression, gradient boosted regression, and elastic net regression are compared. The neural network based model performs best in terms of its mean absolute error and r-squared values. A neural network model is then deployed as a web service, allowing client applications to make REST calls to estimate gel pack weights.
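The pipeline described above, mapping the three inputs to a gel-pack weight and then converting weight to a pack count, can be sketched as follows. A plain least-squares linear model stands in for the neural network the paper deploys, and the data, coefficients, and per-pack weight are synthetic assumptions for illustration only.

```python
import math
import numpy as np

# Conceptual sketch: regress gel-pack weight on (trip duration, threshold
# temperature, ambient temperature), then convert predicted weight to a pack
# count. The ground-truth relation and 0.5 kg pack size are assumptions.

rng = np.random.default_rng(1)
n = 500
duration = rng.uniform(4, 72, n)      # trip duration, hours
threshold = rng.uniform(2, 8, n)      # product threshold temperature, deg C
ambient = rng.uniform(15, 40, n)      # ambient temperature, deg C
# Assumed truth: more coolant for longer trips and larger temperature gaps.
weight = 0.05 * duration * (ambient - threshold) + rng.normal(0, 1, n)

X = np.column_stack([duration * (ambient - threshold), np.ones(n)])
coef, *_ = np.linalg.lstsq(X, weight, rcond=None)

def gel_packs(dur, thr, amb, pack_kg=0.5):
    """Predict coolant weight, then translate it into a whole pack count."""
    w = coef[0] * dur * (amb - thr) + coef[1]
    return math.ceil(max(w, 0.0) / pack_kg)

print(gel_packs(24, 5, 30))
```

The ceiling division at the end is what turns a continuous weight prediction into the discrete number of gel packs a shipper actually loads.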


2020 ◽  
Vol 36 (17) ◽  
pp. 4544-4550 ◽  
Author(s):  
Divya Sharma ◽  
Andrew D Paterson ◽  
Wei Xu

Abstract
Motivation: Research supports the potential use of the microbiome as a predictor of some diseases. Motivated by the findings that microbiome data are complex in nature and exhibit inherent correlation due to the hierarchical taxonomy of microbial Operational Taxonomic Units (OTUs), we propose a novel machine learning method incorporating a stratified approach that groups OTUs into phylum clusters. Convolutional neural networks (CNNs) were trained within each of the clusters individually. Then, through an ensemble learning approach, the features obtained from each cluster were concatenated to improve prediction accuracy. Our two-step approach, comprising stratification prior to combining multiple CNNs, captures the relationships between OTUs sharing a phylum more efficiently than a single CNN that ignores OTU correlations.
Results: We used simulated datasets containing 168 OTUs in 200 cases and 200 controls for model testing. Thirty-two OTUs potentially associated with disease risk were randomly selected, and interactions between three OTUs were used to introduce non-linearity. We also implemented this novel method in two human microbiome studies: (i) cirrhosis, with 118 cases and 114 controls; and (ii) type 2 diabetes (T2D), with 170 cases and 174 controls; to demonstrate the model's effectiveness. Extensive experimentation and comparison against conventional machine learning techniques yielded encouraging results. We obtained mean AUC values of 0.88, 0.92 and 0.75, showing a consistent increment (5%, 3% and 7%) in simulations, cirrhosis and T2D data, respectively, over the next best performing method, random forest.
Availability and implementation: https://github.com/divya031090/TaxoNN_OTU.
Supplementary information: Supplementary data are available at Bioinformatics online.
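The stratify-then-ensemble structure, grouping OTU columns by phylum, extracting features per cluster, and concatenating them for a final classifier, can be sketched in a few lines. Simple per-cluster summary encoders and a linear probe stand in for the per-cluster CNNs used in TaxoNN; the phylum assignments and data below are synthetic assumptions.

```python
import numpy as np

# Conceptual sketch of TaxoNN's two-step idea: (1) stratify OTUs into phylum
# clusters and encode each cluster separately, (2) concatenate the per-cluster
# features and train one classifier on the ensemble. Encoders here are
# mean/std summaries, not CNNs; data and cluster labels are synthetic.

rng = np.random.default_rng(2)
n, p = 200, 30
phylum = rng.integers(0, 3, p)               # assumed phylum label per OTU
X = rng.normal(size=(n, p))
y = (X[:, phylum == 0].sum(axis=1) > 0).astype(float)  # signal in cluster 0

def cluster_features(X, phylum):
    feats = []
    for c in np.unique(phylum):
        Xc = X[:, phylum == c]
        # stand-in encoder: per-cluster mean and std as summary features
        feats.append(np.column_stack([Xc.mean(axis=1), Xc.std(axis=1)]))
    return np.hstack(feats)                  # concatenate across clusters

F = cluster_features(X, phylum)
Fb = np.column_stack([F, np.ones(n)])        # final linear probe (least squares)
w, *_ = np.linalg.lstsq(Fb, y, rcond=None)
acc = ((Fb @ w > 0.5) == y).mean()
print(F.shape, acc)
```

The key point the sketch preserves is that correlated OTUs sharing a phylum are encoded together before the ensemble step, rather than being fed as one undifferentiated block to a single model.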

