Deep learning on chaos game representation for proteins

2019 ◽  
Vol 36 (1) ◽  
pp. 272-279 ◽  
Author(s):  
Hannah F Löchel ◽  
Dominic Eger ◽  
Theodor Sperlea ◽  
Dominik Heider

Abstract
Motivation: Classification of protein sequences is a major task in bioinformatics and has many applications. Different machine learning methods, such as support vector machines (SVMs), random forests (RFs) and neural networks (NNs), exist and are applied to these problems. All of these methods require, as a first step, that protein sequences be made machine-readable and comparable, for which different encodings exist. These encodings are typically based on physical or chemical properties of the sequence. However, motivated by the outstanding performance of deep neural networks (DNNs) on image recognition, we used frequency matrix chaos game representation (FCGR) to encode protein sequences as images. In this study, we compare the performance of SVMs, RFs and DNNs trained on FCGR-encoded protein sequences. While the original chaos game representation (CGR) has been used mainly for genome sequence encoding and classification, we modified it to also work for protein sequences, resulting in an n-flakes representation, an image composed of several icosagons.
Results: All applied machine learning techniques (RF, SVM and DNN) show promising results compared with state-of-the-art methods on our benchmark datasets, with DNNs outperforming the other methods, demonstrating that FCGR is a promising new encoding method for protein sequences.
Availability and implementation: https://cran.r-project.org/.
Supplementary information: Supplementary data are available at Bioinformatics online.
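The chaos game representation underlying FCGR can be sketched compactly. The classic construction, shown here for DNA, assigns each alphabet symbol to a polygon vertex and repeatedly moves a point halfway toward the vertex of the next symbol; FCGR then counts how many points land in each cell of a grid, yielding an image-like frequency matrix. The paper generalizes this to 20-vertex polygons (icosagons) for the amino acid alphabet; the grid size and corner layout below are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of frequency-matrix chaos game representation (FCGR),
# illustrated for DNA. The paper's n-flakes variant uses icosagon vertices
# for the 20 amino acids instead of the four square corners assumed here.

CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def fcgr(seq, k=3):
    """Return a 2^k x 2^k matrix of k-mer frequencies via the chaos game."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    x, y = 0.5, 0.5                        # start at the centre
    for i, base in enumerate(seq):
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway toward the corner
        if i >= k - 1:                     # point now encodes the last k bases
            grid[min(int(y * n), n - 1)][min(int(x * n), n - 1)] += 1
    return grid

m = fcgr("ACGTACGTACGT", k=2)
total = sum(sum(row) for row in m)
print(total)  # one count per k-mer window: len(seq) - k + 1 = 11
```

Each cell of the resulting matrix corresponds to one k-mer, so the matrix can be fed to image classifiers such as CNNs directly.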



Author(s):  
Hesham M. Al-Ammal

Detection of anomalies in a given dataset is a vital step in several cybersecurity applications, including intrusion detection, fraud detection, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships and communities, as well as anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent research advances have applied machine learning methods to anomaly detection over graphs. This chapter concentrates on static graphs (both labeled and unlabeled) and summarizes some of these recent studies in machine learning for anomaly detection in graphs, covering methods such as support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter reflects on the successes and challenges of using these methods in the context of graph-based anomaly detection.


An Intrusion Detection System (IDS) is a system that checks a network or data for abnormal actions and issues an alert when such activity is discovered. Numerous IDS techniques are in use today, but a major problem shared by all of them is performance. Various works have addressed this issue using support vector machines and multilayer perceptrons. Supervised learning models such as support vector machines, with their associated learning algorithms, are used to analyze data for regression analysis as well as classification. Because an IDS applied to big data must analyze huge volumes of traffic for suspicious activity, an efficient and fast classification algorithm is required. Machine learning techniques such as neural networks and extreme learning machines are used; both are highly regarded and considered among the best techniques. Extreme learning machines (ELMs) are feed-forward neural networks with a single hidden layer and no backpropagation, used for classification. Once an intrusion is detected by the IDS through the ELM, the type of intrusion is also determined using the Random Forest technique (multi-class classification) with a high rate of accuracy and precision. The well-known NSL-KDD dataset is used for training as well as testing of these IDS algorithms. This work determines that, compared with artificial neural networks and logistic regression, extreme learning machines provide a much better intrusion detection rate of 93.96% and are also more efficient, with an execution time of 38 seconds.
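The extreme learning machine described above has a particularly simple training procedure: the hidden-layer weights are drawn at random and frozen, and only the output weights are solved in closed form by least squares, which is why no backpropagation is needed. A minimal sketch, assuming a toy binary task rather than the NSL-KDD setup:

```python
import numpy as np

# Minimal ELM sketch: random fixed hidden layer, closed-form output weights.
# Hidden-layer size, activation, and the toy data are illustrative assumptions.

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=50):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (frozen)
    b = rng.normal(size=n_hidden)                 # random biases (frozen)
    H = np.tanh(X @ W + b)                        # hidden activations
    beta = np.linalg.pinv(H) @ y                  # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy binary task: label is 1 when the two features share a sign.
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)
W, b, beta = elm_fit(X, y)
acc = ((elm_predict(X, W, b, beta) > 0.5) == y).mean()
print(acc)
```

The single pseudoinverse solve replaces iterative gradient descent, which is the source of the fast training times the abstract reports.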


Author(s):  
Divya Choudhary ◽  
Siripong Malasri

This paper implements and compares machine learning algorithms to predict the amount of coolant required during transportation of temperature-sensitive products. The machine learning models use trip duration, product threshold temperature, and ambient temperature as independent variables to predict the weight of gel packs needed to keep the temperature of the product below its threshold value. The weight of the gel packs can be translated into the number of gel packs required. Regression using neural networks, support vector regression, gradient boosted regression, and elastic net regression are compared. The neural network based model performs best in terms of its mean absolute error and r-squared values. A neural network model is then deployed as a web service, allowing client applications to make REST calls to estimate gel pack weights.
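The pipeline described above, mapping the three inputs to a gel-pack weight and then converting weight to a pack count, can be sketched as follows. A plain least-squares linear model stands in for the neural network the paper deploys, and the data, coefficients, and per-pack weight are synthetic assumptions for illustration only.

```python
import math
import numpy as np

# Conceptual sketch: regress gel-pack weight on (trip duration, threshold
# temperature, ambient temperature), then convert predicted weight to a pack
# count. The ground-truth relation and 0.5 kg pack size are assumptions.

rng = np.random.default_rng(1)
n = 500
duration = rng.uniform(4, 72, n)      # trip duration, hours
threshold = rng.uniform(2, 8, n)      # product threshold temperature, deg C
ambient = rng.uniform(15, 40, n)      # ambient temperature, deg C
# Assumed truth: more coolant for longer trips and larger temperature gaps.
weight = 0.05 * duration * (ambient - threshold) + rng.normal(0, 1, n)

X = np.column_stack([duration * (ambient - threshold), np.ones(n)])
coef, *_ = np.linalg.lstsq(X, weight, rcond=None)

def gel_packs(dur, thr, amb, pack_kg=0.5):
    """Predict coolant weight, then translate it into a whole pack count."""
    w = coef[0] * dur * (amb - thr) + coef[1]
    return math.ceil(max(w, 0.0) / pack_kg)

print(gel_packs(24, 5, 30))
```

The ceiling division at the end is what turns a continuous weight prediction into the discrete number of gel packs a shipper actually loads.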


2020 ◽  
Vol 36 (17) ◽  
pp. 4544-4550 ◽  
Author(s):  
Divya Sharma ◽  
Andrew D Paterson ◽  
Wei Xu

Abstract
Motivation: Research supports the potential use of the microbiome as a predictor of some diseases. Motivated by the findings that microbiome data are complex in nature and exhibit inherent correlation due to the hierarchical taxonomy of microbial Operational Taxonomic Units (OTUs), we propose a novel machine learning method incorporating a stratified approach that groups OTUs into phylum clusters. Convolutional neural networks (CNNs) were trained within each of the clusters individually. Then, through an ensemble learning approach, the features obtained from each cluster were concatenated to improve prediction accuracy. Our two-step approach, comprising stratification prior to combining multiple CNNs, captures the relationships between OTUs sharing a phylum more efficiently than a single CNN that ignores OTU correlations.
Results: We used simulated datasets containing 168 OTUs in 200 cases and 200 controls for model testing. Thirty-two OTUs potentially associated with disease risk were randomly selected, and interactions between three OTUs were used to introduce non-linearity. We also implemented this novel method in two human microbiome studies: (i) cirrhosis, with 118 cases and 114 controls; and (ii) type 2 diabetes (T2D), with 170 cases and 174 controls; to demonstrate the model's effectiveness. Extensive experimentation and comparison against conventional machine learning techniques yielded encouraging results. We obtained mean AUC values of 0.88, 0.92 and 0.75, showing a consistent increment (5%, 3% and 7%) in simulations, cirrhosis and T2D data, respectively, over the next best performing method, random forest.
Availability and implementation: https://github.com/divya031090/TaxoNN_OTU.
Supplementary information: Supplementary data are available at Bioinformatics online.
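The stratify-then-ensemble structure, grouping OTU columns by phylum, extracting features per cluster, and concatenating them for a final classifier, can be sketched in a few lines. Simple per-cluster summary encoders and a linear probe stand in for the per-cluster CNNs used in TaxoNN; the phylum assignments and data below are synthetic assumptions.

```python
import numpy as np

# Conceptual sketch of TaxoNN's two-step idea: (1) stratify OTUs into phylum
# clusters and encode each cluster separately, (2) concatenate the per-cluster
# features and train one classifier on the ensemble. Encoders here are
# mean/std summaries, not CNNs; data and cluster labels are synthetic.

rng = np.random.default_rng(2)
n, p = 200, 30
phylum = rng.integers(0, 3, p)               # assumed phylum label per OTU
X = rng.normal(size=(n, p))
y = (X[:, phylum == 0].sum(axis=1) > 0).astype(float)  # signal in cluster 0

def cluster_features(X, phylum):
    feats = []
    for c in np.unique(phylum):
        Xc = X[:, phylum == c]
        # stand-in encoder: per-cluster mean and std as summary features
        feats.append(np.column_stack([Xc.mean(axis=1), Xc.std(axis=1)]))
    return np.hstack(feats)                  # concatenate across clusters

F = cluster_features(X, phylum)
Fb = np.column_stack([F, np.ones(n)])        # final linear probe (least squares)
w, *_ = np.linalg.lstsq(Fb, y, rcond=None)
acc = ((Fb @ w > 0.5) == y).mean()
print(F.shape, acc)
```

The key point the sketch preserves is that correlated OTUs sharing a phylum are encoded together before the ensemble step, rather than being fed as one undifferentiated block to a single model.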

