Identification of natural selection in genomic data with deep convolutional neural network

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Arnaud Nguembang Fadja ◽  
Fabrizio Riguzzi ◽  
Giorgio Bertorelle ◽  
Emiliano Trucchi

Abstract Background With the increase in the size of genomic datasets describing variability in populations, extracting relevant information becomes increasingly useful as well as complex. Recently, computational methodologies such as Supervised Machine Learning and, specifically, Convolutional Neural Networks have been proposed to make inferences on demographic and adaptive processes using genomic data. Although it has already proven powerful and efficient in other fields of investigation, Supervised Machine Learning has yet to unfold its enormous potential in evolutionary genomics. Results The paper proposes a method based on Supervised Machine Learning for classifying genomic data, represented as windows of genomic sequences from a sample of individuals belonging to the same population. A Convolutional Neural Network is used to test whether a genomic window shows the signature of natural selection. Training performed on simulated data shows that the proposed model can accurately predict neutral and selection processes on portions of genomes taken from real populations with almost 90% accuracy.
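To illustrate the kind of input such a network consumes, the sketch below encodes a window of haplotypes as a fixed-width binary matrix. The function name, padding scheme, and default width are illustrative assumptions, not the authors' code.

```python
import numpy as np

def encode_window(haplotypes, width=128):
    """Encode a genomic window as a fixed-size binary image.

    `haplotypes` is a list of equal-length '0'/'1' strings, one per
    sampled chromosome (0 = ancestral, 1 = derived allele). Columns
    are zero-padded or truncated to `width` SNPs so that windows of
    different lengths share one CNN input shape.
    """
    mat = np.array([[int(c) for c in h] for h in haplotypes], dtype=np.float32)
    n, m = mat.shape
    if m >= width:
        return mat[:, :width]
    out = np.zeros((n, width), dtype=np.float32)
    out[:, :m] = mat
    return out
```

Each encoded window can then be fed to a small binary classifier (selection vs. neutral) trained on coalescent simulations.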


2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Luis Torada ◽  
Lucrezia Lorenzon ◽  
Alice Beddis ◽  
Ulas Isildak ◽  
Linda Pattini ◽  
...  

Abstract Background The genetic bases of many complex phenotypes are still largely unknown, mostly due to the polygenic nature of the traits and the small effect of each associated mutation. An alternative approach to classic association studies for determining such genetic bases is an evolutionary framework. As sites targeted by natural selection are likely to harbor important functionalities for the carrier, the identification of selection signatures in the genome has the potential to unveil the genetic mechanisms underpinning human phenotypes. Popular methods of detecting such signals rely on compressing genomic information into summary statistics, resulting in the loss of information. Furthermore, few methods are able to quantify the strength of selection. Here we explored the use of deep learning in evolutionary biology and implemented a program, called ImaGene, to apply convolutional neural networks on population genomic data for the detection and quantification of natural selection. Results ImaGene enables genomic information from multiple individuals to be represented as abstract images. Each image is created by stacking aligned genomic data and encoding distinct alleles into separate colors. To detect and quantify signatures of positive selection, ImaGene implements a convolutional neural network which is trained using simulations. We show how the method implemented in ImaGene can be affected by data manipulation and learning strategies. In particular, we show how sorting images by row and column leads to accurate predictions. We also demonstrate how misspecification of the demographic model used to produce training data can influence the quantification of positive selection. We finally illustrate an approach to estimating the selection coefficient, a continuous variable, using multiclass classification techniques. 
Conclusions While the use of deep learning in evolutionary genomics is in its infancy, here we demonstrated its potential to detect informative patterns from large-scale genomic data. We implemented methods to process genomic data for deep learning in a user-friendly program called ImaGene. The joint inference of the evolutionary history of mutations and their functional impact will facilitate mapping studies and provide novel insights into the molecular mechanisms associated with human phenotypes.
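The row-and-column sorting reported above as beneficial can be sketched in a few lines. This is a minimal reimplementation of the idea (sorting haplotype rows and SNP columns by derived-allele count), not the program's actual code.

```python
import numpy as np

def sort_image(mat):
    """Sort rows (haplotypes) and columns (SNPs) of a binary genotype
    image by decreasing derived-allele count, so that similar
    haplotypes cluster together regardless of sampling order."""
    row_order = np.argsort(-mat.sum(axis=1), kind="stable")
    col_order = np.argsort(-mat.sum(axis=0), kind="stable")
    return mat[row_order][:, col_order]
```

Sorting makes the image representation invariant to the arbitrary order of sampled individuals, which is why it helps the CNN learn consistent patterns.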


Soft Matter ◽  
2020 ◽  
Vol 16 (7) ◽  
pp. 1751-1759 ◽  
Author(s):  
Eric N. Minor ◽  
Stian D. Howard ◽  
Adam A. S. Green ◽  
Matthew A. Glaser ◽  
Cheol S. Park ◽  
...  

We demonstrate a method for training a convolutional neural network with simulated images for usage on real-world experimental data.


2019 ◽  
Author(s):  
Po-Ting Lai ◽  
Wei-Liang Lu ◽  
Ting-Rung Kuo ◽  
Chia-Ru Chung ◽  
Jen-Chieh Han ◽  
...  

BACKGROUND Research on disease-disease associations, such as comorbidities and complications, provides important insights into disease treatment and drug discovery, and a large body of literature has been published in the field. However, using current search tools, it is not easy for researchers to retrieve information on the latest disease association findings. First, comorbidity and complication keywords pull up large numbers of PubMed studies. Second, disease is not highlighted in search results. Third, disease-disease associations (DDAs) are not identified, as no DDA extraction dataset or tools are currently available. OBJECTIVE Since there are no available disease-disease association extraction (DDAE) datasets or tools, we aim to develop (1) a DDAE dataset and (2) a neural network model for extracting DDAs from the literature. METHODS In this study, we formulate DDAE as a supervised machine-learning classification problem. To develop the system, we first build a DDAE dataset. We then employ two machine-learning models, a support vector machine (SVM) and a convolutional neural network (CNN), to extract DDAs. Furthermore, we evaluate the effect of using the CNN output layer as features for the SVM-based model. Finally, we implement a large-margin context-aware convolutional neural network (LC-CNN) architecture that integrates context features and the CNN through a large-margin function. RESULTS Our DDAE dataset consists of 521 PubMed abstracts. Experimental results show that the SVM-based approach achieves an F1-measure of 80.32%, which is higher than the CNN-based approach (73.32%). Using the output layer of the CNN as a feature for the SVM does not further improve its performance. However, our LC-CNN achieves the highest F1-measure of 84.18%, demonstrating that combining the hinge loss function of the SVM with the CNN in a single NN architecture outperforms the other approaches. 
CONCLUSIONS To facilitate the development of text-mining research for DDAE, we develop the first publicly available DDAE dataset consisting of disease mentions, MeSH IDs, and relation annotations. We develop different conventional ML models and NN architectures and evaluate their effects on our DDAE dataset. To further improve DDAE performance, we propose an LC-CNN model for DDAE that outperforms the other approaches.
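The large-margin idea at the heart of LC-CNN, replacing a cross-entropy objective with an SVM-style hinge loss on the network's class scores, can be sketched as follows. This is a generic multiclass hinge loss, not the authors' implementation.

```python
import numpy as np

def multiclass_hinge(scores, y, margin=1.0):
    """Mean multiclass hinge loss on raw class scores.

    scores: (n, k) array of network outputs; y: (n,) true class ids.
    Each wrong class is penalized whenever its score comes within
    `margin` of the true class's score -- the large-margin criterion
    an SVM optimizes.
    """
    n = len(y)
    correct = scores[np.arange(n), y][:, None]
    losses = np.maximum(0.0, scores - correct + margin)
    losses[np.arange(n), y] = 0.0  # the true class incurs no loss
    return losses.sum(axis=1).mean()
```

Plugged in as the training objective, this pushes the network to separate classes by a fixed margin rather than merely maximizing the true-class probability.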


2020 ◽  
Vol 11 (6) ◽  
pp. 17-30
Author(s):  
Imene Elloumi Zitouna

This paper presents an overview of our learning-based orchestrator for intelligent Open vSwitch (OVS), built with Machine Learning on Software-Defined Networking (SDN) technology. The first task consists of extracting relevant information from the data flows generated by an SDN and using it to learn, predict, and accurately identify the optimal destination OVS with Reinforcement Learning and the Q-Learning algorithm. The second task consists of selecting, through our hybrid orchestrator, the optimal intelligent SDN controller with Supervised Learning. As a solution, we therefore propose intelligent SDN controller frameworks, OpenFlow deployments, and a new intelligent hybrid orchestration for multiple SDN controllers. We then fed these features to a Convolutional Neural Network model to separate the classes under study. The results were promising: the model achieved an accuracy of 72.7% on a database of 16 classes. In any case, this paper sheds light for researchers looking into the trade-offs between SDN performance and AI customization.
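The Q-Learning component described above can be sketched as a tabular update. The states, actions, and rewards below are placeholders for illustration, not the paper's actual SDN state space.

```python
from collections import defaultdict

def make_q():
    # Tabular action-value store: Q[state][action] -> float, 0.0 by default.
    return defaultdict(lambda: defaultdict(float))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-Learning step: move Q(s, a) toward the bootstrapped
    target r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def best_action(Q, s):
    # Greedy policy: pick the destination with the highest learned value.
    return max(Q[s], key=Q[s].get)
```

In the orchestrator setting, a state would summarize the observed flow, an action would be the candidate destination OVS, and the reward would reflect the resulting forwarding performance.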


Geophysics ◽  
2021 ◽  
pp. 1-48
Author(s):  
Jan-Willem Vrolijk ◽  
Gerrit Blacquiere

It is well known that source deghosting is best applied to common-receiver gathers, while receiver deghosting is best applied to common-shot records. The source-ghost wavefield observed in the common-shot domain contains the imprint of the subsurface, which complicates source deghosting in the common-shot domain, in particular when the subsurface is complex. Unfortunately, the alternative, i.e., the common-receiver domain, is often coarsely sampled, which complicates source deghosting in this domain as well. To solve the latter issue, we propose to train a convolutional neural network to apply source deghosting in this domain. We subsample all shot records with and without the receiver-ghost wavefield to obtain the training data. Owing to reciprocity, this training data set is representative for source deghosting in the coarse common-receiver domain. We validate the machine-learning approach on simulated data and on field data. On the simulated data, it gives a significant uplift compared to conventional source deghosting. The field-data results confirm that the proposed machine-learning approach is able to remove the source-ghost wavefield from the coarsely sampled common-receiver gathers.
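The training-data construction described above, decimating shot records along the receiver axis so that (by reciprocity) they stand in for coarse common-receiver gathers, might be sketched as below. The array layout (time samples by receivers) and the decimation factor are assumptions for illustration.

```python
import numpy as np

def make_training_pairs(shots_in, shots_target, factor=4):
    """Decimate each shot record (time samples x receivers) along the
    receiver axis. The resulting input/target pairs mimic the coarse
    sampling of the common-receiver domain, where the trained network
    is ultimately applied."""
    x = [s[:, ::factor] for s in shots_in]
    y = [s[:, ::factor] for s in shots_target]
    return x, y
```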


Author(s):  
Satoru Tsuiki ◽  
Takuya Nagaoka ◽  
Tatsuya Fukuda ◽  
Yuki Sakamoto ◽  
Fernanda R. Almeida ◽  
...  

Abstract Purpose In 2-dimensional lateral cephalometric radiographs, patients with severe obstructive sleep apnea (OSA) exhibit a more crowded oropharynx in comparison with non-OSA subjects. We tested the hypothesis that machine learning, an application of artificial intelligence (AI), could be used to detect patients with severe OSA based on 2-dimensional images. Methods A deep convolutional neural network was developed (n = 1258; 90%) and tested (n = 131; 10%) using data from 1389 (100%) lateral cephalometric radiographs obtained from individuals diagnosed with severe OSA (n = 867; apnea-hypopnea index > 30 events/h sleep) or non-OSA (n = 522; apnea-hypopnea index < 5 events/h sleep) at a single center for sleep disorders. Three kinds of data sets were prepared by changing the area of interest within a single image: the original image without any modification (full image), an image containing the facial profile, upper airway, and craniofacial soft/hard tissues (main region), and an image containing part of the occipital region (head only). A radiologist also performed a conventional manual cephalometric analysis of the full image for comparison. Results The sensitivity/specificity was 0.87/0.82 for the full image, 0.88/0.75 for the main region, 0.71/0.63 for head only, and 0.54/0.80 for the manual analysis. The area under the receiver-operating characteristic curve was highest for the main region (0.92), followed by the full image (0.89), the manual cephalometric analysis (0.75), and head only (0.70). Conclusions A deep convolutional neural network identified individuals with severe OSA with high accuracy. These findings encourage further research on using AI and images for the triage of OSA.
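The sensitivity/specificity pairs reported above come from a standard confusion-matrix computation, which can be sketched generically as follows (the label convention, 1 = severe OSA, is an assumption for illustration).

```python
def sensitivity_specificity(y_true, y_pred):
    """Confusion-matrix metrics for a binary screen:
    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    Labels: 1 = positive (e.g. severe OSA), 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

Sweeping the decision threshold of the network's output and recomputing these two numbers traces the receiver-operating characteristic curve whose area is reported per region.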


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Peter M. Maloca ◽  
Philipp L. Müller ◽  
Aaron Y. Lee ◽  
Adnan Tufail ◽  
Konstantinos Balaskas ◽  
...  

Abstract Machine learning has greatly facilitated the analysis of medical data, while its internal operations usually remain opaque. To better comprehend these procedures, a convolutional neural network for optical coherence tomography image segmentation was enhanced with a Traceable Relevance Explainability (T-REX) technique. The proposed application was based on three components: ground-truth generation by multiple graders, calculation of Hamming distances among the graders and the machine-learning algorithm, and a smart data visualization (‘neural recording’). An overall average variability of 1.75% between the human graders and the algorithm was found, slightly lower than the 2.02% among the human graders themselves. The ambiguity in the ground truth had a noteworthy impact on the machine-learning results, which could be visualized. The convolutional neural network balanced between the graders and allowed for modifiable predictions dependent on the compartment. Using the proposed T-REX setup, machine-learning processes could be rendered more transparent and understandable, possibly leading to optimized applications.
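The pairwise Hamming-distance comparison among graders and the algorithm can be sketched on binary segmentation masks. The data layout (flattened binary masks) is an assumption for illustration, not the paper's pipeline.

```python
import numpy as np
from itertools import combinations

def pairwise_hamming(masks):
    """Fraction of disagreeing pixels for every pair of binary
    segmentation masks (e.g. several graders plus the algorithm).
    Returns {(i, j): normalized Hamming distance}."""
    return {(i, j): float(np.mean(a != b))
            for (i, a), (j, b) in combinations(enumerate(masks), 2)}
```

Averaging the distances involving the algorithm versus those among the graders yields the kind of variability comparison (1.75% vs. 2.02%) reported above.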


Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 949
Author(s):  
Jiangyi Wang ◽  
Min Liu ◽  
Xinwu Zeng ◽  
Xiaoqiang Hua

Convolutional neural networks deliver powerful performance in many visual tasks because of their hierarchical structures and strong feature-extraction capabilities. The SPD (symmetric positive definite) matrix has attracted attention in visual classification because of its excellent ability to learn proper statistical representations and to distinguish samples carrying different information. In this paper, a deep neural network signal-detection method based on spectral convolution features is proposed. In this method, local features extracted by a convolutional neural network are used to construct the SPD matrix, and a deep learning algorithm for SPD matrices is used to detect target signals. Feature maps extracted by two kinds of convolutional neural network models are applied in this study. With this method, signal detection becomes a binary classification problem on the samples. To demonstrate the effectiveness and superiority of the method, simulated and semi-physical simulated data sets are used. The results show that, under low SCR (signal-to-clutter ratio), compared with the spectral signal-detection method based on a deep neural network, this method obtains a gain of 0.5–2 dB on both simulated and semi-physical simulated data sets.
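One common way to build an SPD matrix from CNN feature maps is the channel covariance of the local descriptors. The sketch below shows this generic construction; the shapes and the `eps` regularizer are assumptions, not necessarily the paper's exact method.

```python
import numpy as np

def spd_from_features(fmap, eps=1e-6):
    """Build an SPD matrix from a (C, H, W) CNN feature map: treat the
    H*W spatial positions as local C-dimensional descriptors and take
    their channel covariance; eps * I keeps the matrix strictly
    positive definite even for degenerate feature maps."""
    c = fmap.shape[0]
    x = fmap.reshape(c, -1)
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / (x.shape[1] - 1)
    return cov + eps * np.eye(c)
```

The resulting C x C matrix is symmetric with strictly positive eigenvalues, so it lives on the SPD manifold that SPD-matrix learning algorithms operate on.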


Sensors ◽  
2019 ◽  
Vol 19 (1) ◽  
pp. 210 ◽  
Author(s):  
Zied Tayeb ◽  
Juri Fedjaev ◽  
Nejla Ghaboosi ◽  
Christoph Richter ◽  
Lukas Everding ◽  
...  

Non-invasive, electroencephalography (EEG)-based brain-computer interfaces (BCIs) for motor imagery translate the subject’s motor intention into control signals by classifying the EEG patterns caused by different imagination tasks, e.g., hand movements. This type of BCI has been widely studied and used as an alternative mode of communication and environmental control for disabled patients, such as those suffering from a brainstem stroke or a spinal cord injury (SCI). Notwithstanding the success of traditional machine learning methods in classifying EEG signals, these methods still rely on hand-crafted features. The extraction of such features is a difficult task due to the high non-stationarity of EEG signals, which is a major cause of the stagnating progress in classification performance. Remarkable advances in deep learning methods allow end-to-end learning without any feature engineering, which could benefit BCI motor imagery applications. We developed three deep learning models: (1) a long short-term memory network (LSTM); (2) a spectrogram-based convolutional neural network (CNN); and (3) a recurrent convolutional neural network (RCNN), for decoding motor imagery movements directly from raw EEG signals without any manual feature engineering. Results were evaluated on our own publicly available EEG dataset collected from 20 subjects and on an existing dataset known as the 2b EEG dataset from “BCI Competition IV”. Overall, better classification performance was achieved with the deep learning models than with state-of-the-art machine learning techniques, which could chart a route ahead for developing new robust techniques for EEG signal decoding. We underpin this point by demonstrating the successful real-time control of a robotic arm using our CNN-based BCI.
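A spectrogram-based CNN input like the one in model (2) starts from a short-time Fourier transform of the raw signal. A naive single-channel version (the Hann window and the window/hop sizes are illustrative choices) might look like:

```python
import numpy as np

def eeg_spectrogram(signal, win=64, hop=32):
    """Naive short-time Fourier magnitude for one EEG channel: slide a
    Hann window over the signal and stack |rFFT| frames into the 2-D
    (freq_bins, time_frames) image a spectrogram CNN expects."""
    w = np.hanning(win)
    frames = [np.abs(np.fft.rfft(signal[i:i + win] * w))
              for i in range(0, len(signal) - win + 1, hop)]
    return np.array(frames).T
```

Stacking one such image per electrode yields a multi-channel tensor that a standard 2-D CNN can classify into the motor imagery tasks.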

