scholarly journals DeepGS: Predicting phenotypes from genotypes using Deep Learning

2017 ◽  
Author(s):  
Wenlong Ma ◽  
Zhixu Qiu ◽  
Jie Song ◽  
Qian Cheng ◽  
Chuang Ma

AbstractMotivationGenomic selection (GS) is a new breeding strategy by which the phenotypes of quantitative traits are usually predicted based on genome-wide markers of genotypes using conventional statistical models. However, the GS prediction models typically make strong assumptions and perform linear regression analysis, limiting their accuracies since they do not capture the complex, non-linear relationships within genotypes, and between genotypes and phenotypes.ResultsWe present a deep learning method, named DeepGS, to predict phenotypes from genotypes. Using a deep convolutional neural network, DeepGS uses hidden variables that jointly represent features in genotypic markers when making predictions; it also employs convolution, sampling and dropout strategies to reduce the complexity of high-dimensional marker data. We used a large GS dataset to train DeepGS and compare its performance with other methods. In terms of mean normalized discounted cumulative gain value, DeepGS achieves an increase of 27.70%~246.34% over a conventional neural network in selecting top-ranked 1% individuals with high phenotypic values for the eight tested traits. Additionally, compared with the widely used method RR-BLUP, DeepGS still yields a relative improvement ranging from 1.44% to 65.24%. Through extensive simulation experiments, we also demonstrated the effectiveness and robustness of DeepGS for the absent of outlier individuals and subsets of genotypic markers. Finally, we illustrated the complementarity of DeepGS and RR-BLUP with an ensemble learning approach for further improving prediction performance.AvailabilityDeepGS is provided as an open source R package available at https://github.com/cma2015/DeepGS.

Author(s):  
Tahani Aljohani ◽  
Alexandra I. Cristea

Massive Open Online Courses (MOOCs) have become universal learning resources, and the COVID-19 pandemic is rendering these platforms even more necessary. In this paper, we seek to improve Learner Profiling (LP), i.e. estimating the demographic characteristics of learners in MOOC platforms. We have focused on examining models which show promise elsewhere, but were never examined in the LP area (deep learning models) based on effective textual representations. As LP characteristics, we predict here the employment status of learners. We compare sequential and parallel ensemble deep learning architectures based on Convolutional Neural Networks and Recurrent Neural Networks, obtaining an average high accuracy of 96.3% for our best method. Next, we predict the gender of learners based on syntactic knowledge from the text. We compare different tree-structured Long-Short-Term Memory models (as state-of-the-art candidates) and provide our novel version of a Bi-directional composition function for existing architectures. In addition, we evaluate 18 different combinations of word-level encoding and sentence-level encoding functions. Based on these results, we show that our Bi-directional model outperforms all other models and the highest accuracy result among our models is the one based on the combination of FeedForward Neural Network and the Stack-augmented Parser-Interpreter Neural Network (82.60% prediction accuracy). We argue that our prediction models recommended for both demographics characteristics examined in this study can achieve high accuracy. This is additionally also the first time a sound methodological approach toward improving accuracy for learner demographics classification on MOOCs was proposed.


Mathematics ◽  
2019 ◽  
Vol 7 (10) ◽  
pp. 898 ◽  
Author(s):  
Suhwan Ji ◽  
Jongmin Kim ◽  
Hyeonseung Im

Bitcoin has recently received a lot of attention from the media and the public due to its recent price surge and crash. Correspondingly, many researchers have investigated various factors that affect the Bitcoin price and the patterns behind its fluctuations, in particular, using various machine learning methods. In this paper, we study and compare various state-of-the-art deep learning methods such as a deep neural network (DNN), a long short-term memory (LSTM) model, a convolutional neural network, a deep residual network, and their combinations for Bitcoin price prediction. Experimental results showed that although LSTM-based prediction models slightly outperformed the other prediction models for Bitcoin price prediction (regression), DNN-based models performed the best for price ups and downs prediction (classification). In addition, a simple profitability analysis showed that classification models were more effective than regression models for algorithmic trading. Overall, the performances of the proposed deep learning-based prediction models were comparable.


Author(s):  
Shaun C. D'Souza

Cognitive neuroscience is the study of how the human brain functions on tasks like decision making, language, perception and reasoning. Deep learning is a class of machine learning algorithms that use neural networks. They are designed to model the responses of neurons in the human brain. Learning can be supervised or unsupervised. Ngram token models are used extensively in language prediction. Ngrams are probabilistic models that are used in predicting the next word or token. They are a statistical model of word sequences or tokens and are called Language Models or Lms. Ngrams are essential in creating language prediction models. We are exploring a broader sandbox ecosystems enabling for AI. Specifically, around Deep learning applications on unstructured content form on the web.


2022 ◽  
Vol 12 ◽  
Author(s):  
Wei Lu ◽  
Rongting Du ◽  
Pengshuai Niu ◽  
Guangnan Xing ◽  
Hui Luo ◽  
...  

Soybean yield is a highly complex trait determined by multiple factors such as genotype, environment, and their interactions. The earlier the prediction during the growing season the better. Accurate soybean yield prediction is important for germplasm innovation and planting environment factor improvement. But until now, soybean yield has been determined by weight measurement manually after soybean plant harvest which is time-consuming, has high cost and low precision. This paper proposed a soybean yield in-field prediction method based on bean pods and leaves image recognition using a deep learning algorithm combined with a generalized regression neural network (GRNN). A faster region-convolutional neural network (Faster R-CNN), feature pyramid network (FPN), single shot multibox detector (SSD), and You Only Look Once (YOLOv3) were employed for bean pods recognition in which recognition precision and speed were 86.2, 89.8, 80.1, 87.4%, and 13 frames per second (FPS), 7 FPS, 24 FPS, and 39 FPS, respectively. Therefore, YOLOv3 was selected considering both recognition precision and speed. For enhancing detection performance, YOLOv3 was improved by changing IoU loss function, using the anchor frame clustering algorithm, and utilizing the partial neural network structure with which recognition precision increased to 90.3%. In order to improve soybean yield prediction precision, leaves were identified and counted, moreover, pods were further classified as single, double, treble, four, and five seeds types by improved YOLOv3 because each type seed weight varies. In addition, soybean seed number prediction models of each soybean planter were built using PLSR, BP, and GRNN with the input of different type pod numbers and leaf numbers with which prediction results were 96.24, 96.97, and 97.5%, respectively. Finally, the soybean yield of each planter was obtained by accumulating the weight of all soybean pod types and the average accuracy was up to 97.43%. The results show that it is feasible to predict the soybean yield of plants in situ with high precision by fusing the number of leaves and different type soybean pods recognized by a deep neural network combined with GRNN which can speed up germplasm innovation and planting environmental factor optimization.


2018 ◽  
Author(s):  
Shaun C. D'Souza

Cognitive neuroscience is the study of how the human brain functions on tasks like decision making, language, perception and reasoning. Deep learning is a class of machine learning algorithms that use neural networks. They are designed to model the responses of neurons in the human brain. Learning can be supervised or unsupervised. Ngram token models are used extensively in language prediction. Ngrams are probabilistic models that are used in predicting the next word or token. They are a statistical model of word sequences or tokens and are called Language Models or Lms. Ngrams are essential in creating language prediction models. We are exploring a broader sandbox ecosystems enabling for AI. Specifically, around Deep learning applications on unstructured content form on the web.


Computers ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 99
Author(s):  
Sultan Daud Khan ◽  
Louai Alarabi ◽  
Saleh Basalamah

COVID-19 caused the largest economic recession in the history by placing more than one third of world’s population in lockdown. The prolonged restrictions on economic and business activities caused huge economic turmoil that significantly affected the financial markets. To ease the growing pressure on the economy, scientists proposed intermittent lockdowns commonly known as “smart lockdowns”. Under smart lockdown, areas that contain infected clusters of population, namely hotspots, are placed on lockdown, while economic activities are allowed to operate in un-infected areas. In this study, we proposed a novel deep learning prediction framework for the accurate prediction of hotpots. We exploit the benefits of two deep learning models, i.e., Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) and propose a hybrid framework that has the ability to extract multi time-scale features from convolutional layers of CNN. The multi time-scale features are then concatenated and provide as input to 2-layers LSTM model. The LSTM model identifies short, medium and long-term dependencies by learning the representation of time-series data. We perform a series of experiments and compare the proposed framework with other state-of-the-art statistical and machine learning based prediction models. From the experimental results, we demonstrate that the proposed framework beats other existing methods with a clear margin.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Lei Wang ◽  
Juhua Zhang

Abstract Background One of the main challenges for the CRISPR-Cas9 system is selecting optimal single-guide RNAs (sgRNAs). Recently, deep learning has enhanced sgRNA prediction in eukaryotes. However, the prokaryotic chromatin structure is different from eukaryotes, so models trained on eukaryotes may not apply to prokaryotes. Results We designed and implemented a convolutional neural network to predict sgRNA activity in Escherichia coli. The network was trained and tested on the recently-released sgRNA activity dataset. Our convolutional neural network achieved excellent performance, yielding average Spearman correlation coefficients of 0.5817, 0.7105, and 0.3602, respectively for Cas9, eSpCas9 and Cas9 with a recA coding region deletion. We confirmed that the sgRNA prediction models trained on prokaryotes do not apply to eukaryotes and vice versa. We adopted perturbation-based approaches to analyze distinct biological patterns between prokaryotic and eukaryotic editing. Then, we improved the predictive performance of the prokaryotic Cas9 system by transfer learning. Finally, we determined that potential off-target scores accumulated on a genome-wide scale affect on-target activity, which could slightly improve on-target predictive performance. Conclusions We developed convolutional neural networks to predict sgRNA activity for wild type and mutant Cas9 in prokaryotes. Our results show that the prediction accuracy of our method is improved over state-of-the-art models.


Genes ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 862
Author(s):  
Tong Liu ◽  
Zheng Wang

We present a deep-learning package named HiCNN2 to learn the mapping between low-resolution and high-resolution Hi-C (a technique for capturing genome-wide chromatin interactions) data, which can enhance the resolution of Hi-C interaction matrices. The HiCNN2 package includes three methods each with a different deep learning architecture: HiCNN2-1 is based on one single convolutional neural network (ConvNet); HiCNN2-2 consists of an ensemble of two different ConvNets; and HiCNN2-3 is an ensemble of three different ConvNets. Our evaluation results indicate that HiCNN2-enhanced high-resolution Hi-C data achieve smaller mean squared error and higher Pearson’s correlation coefficients with experimental high-resolution Hi-C data compared with existing methods HiCPlus and HiCNN. Moreover, all of the three HiCNN2 methods can recover more significant interactions detected by Fit-Hi-C compared to HiCPlus and HiCNN. Based on our evaluation results, we would recommend using HiCNN2-1 and HiCNN2-3 if recovering more significant interactions from Hi-C data is of interest, and HiCNN2-2 and HiCNN if the goal is to achieve higher reproducibility scores between the enhanced Hi-C matrix and the real high-resolution Hi-C matrix.


2020 ◽  
Vol 34 (08) ◽  
pp. 13294-13299
Author(s):  
Hangzhi Guo ◽  
Alexander Woodruff ◽  
Amulya Yadav

Farmer suicides have become an urgent social problem which governments around the world are trying hard to solve. Most farmers are driven to suicide due to an inability to sell their produce at desired profit levels, which is caused by the widespread uncertainty/fluctuation in produce prices resulting from varying market conditions. To prevent farmer suicides, this paper takes the first step towards resolving the issue of produce price uncertainty by presenting PECAD, a deep learning algorithm for accurate prediction of future produce prices based on past pricing and volume patterns. While previous work presents machine learning algorithms for prediction of produce prices, they suffer from two limitations: (i) they do not explicitly consider the spatio-temporal dependence of future prices on past data; and as a result, (ii) they rely on classical ML prediction models which often perform poorly when applied to spatio-temporal datasets. PECAD addresses these limitations via three major contributions: (i) we gather real-world daily price and (produced) volume data of different crops over a period of 11 years from an official Indian government administered website; (ii) we pre-process this raw dataset via state-of-the-art imputation techniques to account for missing data entries; and (iii) PECAD proposes a novel wide and deep neural network architecture which consists of two separate convolutional neural network models (trained for pricing and volume data respectively). Our simulation results show that PECAD outperforms existing state-of-the-art baseline methods by achieving significantly lesser root mean squared error (RMSE) - PECAD achieves ∼25% lesser coefficient of variance than state-of-the-art baselines. Our work is done in collaboration with a non-profit agency that works on preventing farmer suicides in the Indian state of Jharkhand, and PECAD is currently being reviewed by them for potential deployment.


2019 ◽  
Vol 20 (5) ◽  
pp. 1070 ◽  
Author(s):  
Cheng Peng ◽  
Siyu Han ◽  
Hui Zhang ◽  
Ying Li

Non-coding RNAs (ncRNAs) play crucial roles in multiple fundamental biological processes, such as post-transcriptional gene regulation, and are implicated in many complex human diseases. Mostly ncRNAs function by interacting with corresponding RNA-binding proteins. The research on ncRNA–protein interaction is the key to understanding the function of ncRNA. However, the biological experiment techniques for identifying RNA–protein interactions (RPIs) are currently still expensive and time-consuming. Due to the complex molecular mechanism of ncRNA–protein interaction and the lack of conservation for ncRNA, especially for long ncRNA (lncRNA), the prediction of ncRNA–protein interaction is still a challenge. Deep learning-based models have become the state-of-the-art in a range of biological sequence analysis problems due to their strong power of feature learning. In this study, we proposed a hierarchical deep learning framework RPITER to predict RNA–protein interaction. For sequence coding, we improved the conjoint triad feature (CTF) coding method by complementing more primary sequence information and adding sequence structure information. For model design, RPITER employed two basic neural network architectures of convolution neural network (CNN) and stacked auto-encoder (SAE). Comprehensive experiments were performed on five benchmark datasets from PDB and NPInter databases to analyze and compare the performances of different sequence coding methods and prediction models. We found that CNN and SAE deep learning architectures have powerful fitting abilities for the k-mer features of RNA and protein sequence. The improved CTF coding method showed performance gain compared with the original CTF method. Moreover, our designed RPITER performed well in predicting RNA–protein interaction (RPI) and could outperform most of the previous methods. On five widely used RPI datasets, RPI369, RPI488, RPI1807, RPI2241 and NPInter, RPITER obtained A U C of 0.821, 0.911, 0.990, 0.957 and 0.985, respectively. The proposed RPITER could be a complementary method for predicting RPI and constructing RPI network, which would help push forward the related biological research on ncRNAs and lncRNAs.


Sign in / Sign up

Export Citation Format

Share Document