Detecting sequence signals in targeting peptides using deep learning

2019 ◽  
Vol 2 (5) ◽  
pp. e201900429 ◽  
Author(s):  
Jose Juan Almagro Armenteros ◽  
Marco Salvatore ◽  
Olof Emanuelsson ◽  
Ole Winther ◽  
Gunnar von Heijne ◽  
...  

In bioinformatics, machine learning methods have been used to predict features embedded in the sequences. In contrast to what is generally assumed, machine learning approaches can also provide new insights into the underlying biology. Here, we demonstrate this by presenting TargetP 2.0, a novel state-of-the-art method to identify N-terminal sorting signals, which direct proteins to the secretory pathway, mitochondria, and chloroplasts or other plastids. By examining the strongest signals from the attention layer in the network, we find that the second residue in the protein, that is, the one following the initial methionine, has a strong influence on the classification. We observe that two-thirds of chloroplast and thylakoid transit peptides have an alanine in position 2, compared with 20% in other plant proteins. We also note that in fungi and single-celled eukaryotes, less than 30% of the targeting peptides have an amino acid that allows the removal of the N-terminal methionine compared with 60% for the proteins without targeting peptide. The importance of this feature for predictions has not been highlighted before.
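The position-2 statistics quoted in the abstract above can be reproduced in principle with a simple count over a set of sequences. The following is a minimal sketch, not the TargetP 2.0 code: the function names are hypothetical, the toy sequences are made up, and the set of residues assumed to permit removal of the initiator methionine (small side chains: A, C, G, P, S, T, V) is an assumption that the abstract does not spell out.

```python
# Residues at position 2 that are commonly reported to permit removal of the
# initiator methionine. This set is an assumption, not taken from the abstract.
MET_REMOVAL_RESIDUES = frozenset("ACGPSTV")


def second_residue(seq):
    """Return the residue at position 2 (1-based), or None if the sequence is too short."""
    seq = seq.strip().upper()
    return seq[1] if len(seq) >= 2 else None


def fraction_with_second_residue(seqs, allowed):
    """Fraction of sequences whose second residue is in `allowed`."""
    residues = [second_residue(s) for s in seqs]
    residues = [r for r in residues if r is not None]
    if not residues:
        return 0.0
    return sum(r in allowed for r in residues) / len(residues)


# Made-up toy sequences, for illustration only.
toy_peptides = ["MASSML", "MAAKLT", "MKTLLA"]

alanine_fraction = fraction_with_second_residue(toy_peptides, {"A"})
removal_fraction = fraction_with_second_residue(toy_peptides, MET_REMOVAL_RESIDUES)
```

Running the same count separately on sequences with and without targeting peptides would give the kind of comparison the abstract reports.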

2019 ◽  
Author(s):  
J.J. Almagro Armenteros ◽  
M. Salvatore ◽  
O. Emanuelsson ◽  
O. Winther ◽  
G. von Heijne ◽  
...  

In bioinformatics, machine learning methods have been used to predict features embedded in the sequences. In contrast to what is generally assumed, machine learning approaches can also provide new insights into the underlying biology. Here, we demonstrate this by presenting TargetP 2.0, a novel state-of-the-art method to identify N-terminal sorting signals, which direct proteins to the secretory pathway, mitochondria, and chloroplasts or other plastids. By examining the strongest signals from the attention layer in the network, we find that the second residue in the protein, i.e. the one following the initial methionine, has a strong influence on the classification. When subsequently examining all targeting peptides, we observe that two-thirds of chloroplast and thylakoid transit peptides have an alanine in position two, compared with only 20% of other plant proteins. Further highlighting the importance of the second residue, we also note that in fungi and single-celled eukaryotes, less than 30% of the targeting peptides have an amino acid that allows the removal of the N-terminal methionine, compared with 60% for the proteins without a targeting peptide. TargetP 2.0 is available at http://www.cbs.dtu.dk/services/TargetP-2.0/index.php


2021 ◽  
Vol 10 (4) ◽  
pp. 199
Author(s):  
Francisco M. Bellas Aláez ◽  
Jesus M. Torres Palenzuela ◽  
Evangelos Spyrakos ◽  
Luis González Vilas

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost predict better than SVMs and NNs, with improved performance metrics and a better balance between sensitivity and specificity. The classical machine learning approaches show higher sensitivities, but at the cost of lower specificity and a higher percentage of false alarms (lower precision). These results suggest that the newer algorithms (RF and AdaBoost) are better suited to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
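The trade-off described above, where higher sensitivity is paid for with lower specificity and lower precision, follows directly from the confusion-matrix definitions. The sketch below is illustrative only: the function name is hypothetical and the labels are made-up toy data on an unbalanced set, not results from the study.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and precision for binary labels (1 = bloom, 0 = no bloom)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false alarms
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # missed events

    def safe_div(num, den):
        # Avoid division by zero when a class is absent.
        return num / den if den else 0.0

    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "precision": safe_div(tp, tp + fp),
    }


# Toy unbalanced data: 2 positives, 6 negatives. The classifier catches both
# positives but raises two false alarms.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

In this toy case sensitivity is perfect while half of the positive predictions are false alarms, which is the pattern the abstract attributes to the classical methods.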


2021 ◽  
Vol 11 (5) ◽  
pp. 603
Author(s):  
Chunlei Shi ◽  
Xianwei Xin ◽  
Jiacai Zhang

Machine learning methods are widely used in autism spectrum disorder (ASD) diagnosis. Due to the lack of labelled ASD data, multisite data are often pooled together to expand the sample size. However, the heterogeneity that exists among different sites degrades the performance of machine learning models. Herein, three-way decision theory is introduced into unsupervised domain adaptation for the first time and applied to optimize the pseudolabels of the target domain/site from functional magnetic resonance imaging (fMRI) features related to ASD patients. The experimental results using multisite fMRI data show that our method not only narrows the gap of the sample distribution among domains but is also superior to the state-of-the-art domain adaptation methods in ASD recognition. Specifically, the ASD recognition accuracy of the proposed method is improved on all the six tasks, by 70.80%, 75.41%, 69.91%, 72.13%, 71.01% and 68.85%, respectively, compared with the existing methods.


2020 ◽  
pp. 1-21 ◽  
Author(s):  
Clément Dalloux ◽  
Vincent Claveau ◽  
Natalia Grabar ◽  
Lucas Emanuel Silva Oliveira ◽  
Claudia Maria Cabral Moro ◽  
...  

Automatic detection of negated content is often a prerequisite in information extraction systems in various domains. In the biomedical domain especially, this task matters because negation plays an important role. In this work, two main contributions are proposed. First, we work with languages which have been poorly addressed up to now: Brazilian Portuguese and French. Thus, we developed new corpora for these two languages which have been manually annotated for marking up the negation cues and their scope. Second, we propose automatic methods based on supervised machine learning approaches for the automatic detection of negation marks and of their scopes. The methods prove robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical languages) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. In addition, the application is accessible and usable online. We assume that, through these contributions (new annotated corpora, application accessible online, and cross-domain robustness), the reproducibility of the results and the robustness of the NLP applications will be augmented.


2021 ◽  
Vol 11 (8) ◽  
pp. 785
Author(s):  
Quentin Miagoux ◽  
Vidisha Singh ◽  
Dereck de Mézquita ◽  
Valerie Chaudru ◽  
Mohamed Elati ◽  
...  

Rheumatoid arthritis (RA) is a multifactorial, complex autoimmune disease that involves various genetic, environmental, and epigenetic factors. Systems biology approaches provide the means to study complex diseases by integrating different layers of biological information. Combining multiple data types can help compensate for missing or conflicting information and limit the possibility of false positives. In this work, we aim to unravel mechanisms governing the regulation of key transcription factors in RA and derive patient-specific models to gain more insights into the disease heterogeneity and the response to treatment. We first use publicly available transcriptomic datasets (peripheral blood) related to RA and machine learning to create an RA-specific transcription factor (TF) co-regulatory network. The TF cooperativity network is subsequently enriched in signalling cascades and upstream regulators using a state-of-the-art, RA-specific molecular map. Then, the integrative network is used as a template to analyse patients’ data regarding their response to anti-TNF treatment and identify master regulators and upstream cascades affected by the treatment. Finally, we use the Boolean formalism to simulate in silico subparts of the integrated network and identify combinations and conditions that can switch on or off the identified TFs, mimicking the effects of single and combined perturbations.
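The Boolean formalism mentioned above can be illustrated with a very small synchronous network. This is a sketch under stated assumptions, not the authors' model: the node names (`signal`, `inhibitor`, `tf`), the update rule, and the function names are all hypothetical, and a synchronous update scheme is assumed.

```python
def step(state, rules):
    """Apply every rule to the current state at once (synchronous update)."""
    return {node: bool(rule(state)) for node, rule in rules.items()}


def run_to_fixed_point(state, rules, max_steps=50):
    """Iterate until the state stops changing; return (state, reached_fixed_point)."""
    current = dict(state)
    for _ in range(max_steps):
        nxt = step(current, rules)
        if nxt == current:
            return current, True
        current = nxt
    return current, False


# Hypothetical three-node network: inputs hold their value, and the
# transcription factor is ON only if the signal is present and not inhibited.
rules = {
    "signal": lambda s: s["signal"],
    "inhibitor": lambda s: s["inhibitor"],
    "tf": lambda s: s["signal"] and not s["inhibitor"],
}

# Single perturbation: signal on, no inhibitor.
on_state, on_fixed = run_to_fixed_point(
    {"signal": True, "inhibitor": False, "tf": False}, rules
)
# Combined perturbation: signal on together with the inhibitor.
off_state, off_fixed = run_to_fixed_point(
    {"signal": True, "inhibitor": True, "tf": False}, rules
)
```

Comparing the fixed points reached under different input combinations is one simple way to see which perturbations switch a node on or off.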


2012 ◽  
pp. 704-723
Author(s):  
Albert Ali Salah

Biometrics aims at reliable and robust identification of humans from their personal traits, mainly for security and authentication purposes, but also for identifying and tracking the users of smarter applications. Frequently considered modalities are fingerprint, face, iris, palmprint and voice, but there are many other possible biometrics, including gait, ear image, retina, DNA, and even behaviours. This chapter presents a survey of machine learning methods used for biometrics applications, and identifies relevant research issues. The author focuses on three areas of interest: offline methods for biometric template construction and recognition, information fusion methods for integrating multiple biometrics to obtain robust results, and methods for dealing with temporal information. By introducing exemplary and influential machine learning approaches in the context of specific biometrics applications, the author hopes to provide the reader with the means to create novel machine learning solutions to challenging biometrics problems.


Big Data ◽  
2016 ◽  
pp. 1917-1933
Author(s):  
Basant Agarwal ◽  
Namita Mittal

Opinion Mining or Sentiment Analysis is the study of people's opinions or sentiments, as expressed in text, towards entities such as products and services. It has always been important to know what other people think. With the rapid growth in the availability and popularity of online review sites, blogs, forums, and social networking sites, the need to analyse and understand these reviews has arisen. The main approaches for sentiment analysis can be categorized into semantic orientation-based approaches, knowledge-based approaches, and machine-learning algorithms. This chapter surveys the machine learning approaches applied to sentiment analysis-based applications. The main emphasis of this chapter is to discuss the research involved in applying machine learning methods, mostly for sentiment classification at document level. Machine learning-based approaches work in the following phases, which are discussed in detail in this chapter for sentiment classification: (1) feature extraction, (2) feature weighting schemes, (3) feature selection, and (4) machine-learning methods. This chapter also discusses the standard free benchmark datasets and evaluation methods for sentiment analysis. The authors conclude the chapter with a comparative study of some state-of-the-art methods for sentiment analysis and some possible future research directions in opinion mining and sentiment analysis.


Author(s):  
Derya Yiltas-Kaplan

This chapter focuses on machine learning in the context of the architecture of software-defined networks (SDNs) and their security mechanisms. Machine learning has been studied widely for traditional network problems, but only a limited number of recent studies connect SDN security with machine learning approaches. The main reason is that the SDN structure has emerged only recently and differs from that of traditional networks. These structural differences are also summarized and compared in this chapter. After the main properties of the network architectures are presented, several intrusion detection studies on SDN are introduced and analyzed according to their advantages and disadvantages. On this basis, the chapter also aims to be the first organized guide that presents the referenced studies on SDN security and artificial intelligence together.


2022 ◽  
pp. 171-195
Author(s):  
Jale Bektaş

Conducting NLP for Turkish is considerably harder than for other Latin-based languages such as English. In this study, text mining techniques are used to build a pre-processing framework in which TF-IDF values are calculated in accordance with a linguistic approach on 7,731 tweets shared by 13 famous economists in Turkey, retrieved from Twitter. Then, the classification results of four common machine learning methods (SVM, Naive Bayes, LR, and an integration of LR with SVM) are compared. The features represented by TF-IDF are tested with different N-grams. The findings show that the success of a text classification problem depends on the feature representation method, and that SVM outperforms the other ML methods with unigram feature representation. The best results are obtained via the integration of SVM with LR, with an accuracy of 82.9%. These results show that these methodologies are satisfactory for the Turkish language.
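To make the feature representation concrete, the sketch below computes TF-IDF weights over word N-grams using only the standard library. It is an illustration under assumptions, not the study's pipeline: the function names are hypothetical, the toy documents are made up, whitespace tokenization stands in for the linguistic pre-processing, and the weighting used (term frequency times log(N/df)) is only one common TF-IDF variant.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """TF-IDF weights per document, using tf * log(N / df) over word n-grams."""
    tokenized = [ngrams(doc.lower().split(), n) for doc in docs]

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for terms in tokenized:
        df.update(set(terms))

    num_docs = len(docs)
    weights = []
    for terms in tokenized:
        counts = Counter(terms)
        total = len(terms)
        weights.append(
            {t: (c / total) * math.log(num_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Made-up toy documents, for illustration only.
toy_docs = ["rates rise", "rates fall", "prices rise"]
unigram_weights = tfidf(toy_docs, n=1)
bigram_weights = tfidf(toy_docs, n=2)
```

The resulting per-document dictionaries could then be turned into feature vectors for any of the classifiers named in the abstract; changing `n` switches between unigram and higher-order N-gram features.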

