An Accurate Bioinformatics Tool For Anti-Cancer Peptide Generation Through Deep Learning Omics

ABSTRACT: Anti-cancer targets play a crucial role in the signalling processes of cells. We have developed an Anti-Cancer Scanner (ACS) tool for the identification of anti-cancer targets in the form of peptides. The ACS tool also allows fast fingerprinting of the anti-cancer targets of significance in current bioinformatics research. There are tools currently available which predict the above-mentioned features in a single platform. In the present work, we have compared the features predicted by ACS with other methods available online and evaluated the performance of the ACS tool. ACS scans the anti-cancer target protein sequences provided by the user against the anti-cancer target data sets. It has been developed in Perl and is scalable and extensible for bioinformatics applications, with a robust coding architecture. It achieves a prediction accuracy of 95%, which is much higher than that of the existing tools.

BGFE: A Deep Learning Model for ncRNA-Protein Interaction Predictions Based on Improved Sequence Information

International Journal of Molecular Sciences ◽

10.3390/ijms20040978 ◽

2019 ◽

Vol 20 (4) ◽

pp. 978 ◽

Cited By ~ 5

Author(s):

Zhao-Hui Zhan ◽

Li-Na Jia ◽

Yong Zhou ◽

Li-Ping Li ◽

Hai-Cheng Yi

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

Prediction Accuracy ◽

Sparse Matrices ◽

Protein Sequences ◽

Biological Research ◽

Sequence Information ◽

Feature Extraction Method ◽

Cellular Processes ◽

High Level

The interactions between ncRNAs and proteins are critical for regulating various cellular processes in organisms, such as gene expression. However, because current experimental methods for determining ncRNA-protein interactions are costly in money and materials, a practical computational approach with convincing prediction accuracy is needed. In this study, based on the protein sequences from a biological perspective, we put forward an effective deep learning method, named BGFE, to predict ncRNA-protein interactions. Protein sequences are represented by a bi-gram probability feature extraction method applied to the Position Specific Scoring Matrix (PSSM), and ncRNA sequences are represented by k-mer sparse matrices. Furthermore, to extract hidden high-level feature information, a stacked auto-encoder network is employed with a stacked ensemble integration strategy. We evaluate the performance of the proposed method on three datasets using five-fold cross-validation, classifying the features with a random forest classifier. The experimental results clearly demonstrate the effectiveness and the prediction accuracy of our approach. In general, the proposed method is helpful for predicting ncRNA-protein interactions and provides useful guidance for future biological research.
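
The k-mer representation of ncRNA sequences mentioned in this abstract can be illustrated with a minimal sketch. It is not the BGFE authors' code: the function names `kmer_frequencies` and `to_dense`, the default `k = 3`, and the choice to store only the k-mers that occur (a dictionary acting as a sparse vector) are assumptions made for illustration, and the paper's exact matrix layout may differ.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies for an RNA sequence.

    Only k-mers that actually occur are stored, so the result behaves
    like a sparse vector. (Hypothetical helper, not from the paper.)
    """
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip windows containing ambiguous symbols such as N.
    valid = [w for w in windows if set(w) <= set(alphabet)]
    counts = Counter(valid)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def to_dense(freqs, k=3, alphabet="ACGU"):
    """Expand the sparse dictionary into a fixed-length vector of size 4**k."""
    return [freqs.get("".join(p), 0.0) for p in product(alphabet, repeat=k)]


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGUACGU", k=2)
    print(freqs)
    print(len(to_dense(freqs, k=2)))
```

A fixed-length dense vector of this kind is what a downstream model such as an auto-encoder or random forest would typically consume; the sparse form is convenient when k is large and most k-mers never occur.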

Generative Adversarial Domain Adaptation for Nucleus Quantification in Images of Tissue Immunohistochemically Stained for Ki-67

JCO Clinical Cancer Informatics ◽

10.1200/cci.19.00108 ◽

2020 ◽

pp. 666-679 ◽

Cited By ~ 2

Author(s):

Xuhong Zhang ◽

Toby C. Cornish ◽

Lin Yang ◽

Tellen D. Bennett ◽

Debashis Ghosh ◽

...

Keyword(s):

Deep Learning ◽

Domain Adaptation ◽

Training Data ◽

Data Sets ◽

Learning Models ◽

Convolutional Network ◽

Ki 67 ◽

Data Set ◽

Target Data ◽

Real Target

PURPOSE We focus on the problem of scarcity of annotated training data for nucleus recognition in Ki-67 immunohistochemistry (IHC)–stained pancreatic neuroendocrine tumor (NET) images. We hypothesize that deep learning–based domain adaptation is helpful for nucleus recognition when image annotations are unavailable in target data sets. METHODS We considered 2 different institutional pancreatic NET data sets: one (ie, source) containing 38 cases with 114 annotated images and the other (ie, target) containing 72 cases with 20 annotated images. The gold standards were manually annotated by 1 pathologist. We developed a novel deep learning–based domain adaptation framework to count different types of nuclei (ie, immunopositive tumor, immunonegative tumor, nontumor nuclei). We compared the proposed method with several recent fully supervised deep learning models, such as fully convolutional network-8s (FCN-8s), U-Net, fully convolutional regression network (FCRN) A, FCRNB, and fully residual convolutional network (FRCN). We also evaluated the proposed method by learning with a mixture of converted source images and real target annotations. RESULTS Our method achieved an F1 score of 81.3% and 62.3% for nucleus detection and classification in the target data set, respectively. Our method outperformed FCN-8s (53.6% and 43.6% for nucleus detection and classification, respectively), U-Net (61.1% and 47.6%), FCRNA (63.4% and 55.8%), and FCRNB (68.2% and 60.6%) in terms of F1 score and was competitive with FRCN (81.7% and 70.7%). In addition, learning with a mixture of converted source images and only a small set of real target labels could further boost the performance. CONCLUSION This study demonstrates that deep learning–based domain adaptation is helpful for nucleus recognition in Ki-67 IHC stained images when target data annotations are not available. It would improve the applicability of deep learning models designed for downstream supervised learning tasks on different data sets.
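
For readers unfamiliar with the F1 score reported in this abstract, the following sketch shows how it is computed from detection counts. The function name `f1_score` and the example counts are hypothetical; how the study matched predicted nuclei to annotated ones is not stated here and is not modelled.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 is the harmonic mean of precision and recall.

    Written in the equivalent count form 2*TP / (2*TP + FP + FN).
    Returns 0.0 when there are no positives at all (a convention
    chosen for this sketch).
    """
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0
    return 2 * true_pos / denominator


if __name__ == "__main__":
    # Hypothetical counts: 8 nuclei found correctly, 2 spurious, 2 missed.
    print(f1_score(8, 2, 2))
```

Because F1 ignores true negatives, it suits detection tasks such as nucleus counting, where the background has no meaningful count of correctly rejected locations.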

In-Pero: Exploiting deep learning embeddings of protein sequences to predict the localisation of peroxisomal proteins

10.1101/2021.01.18.427146 ◽

2021 ◽

Author(s):

Marco Anteghini ◽

Vitor AP Martins dos Santos ◽

Edoardo Saccenti

Keyword(s):

Deep Learning ◽

Mitochondrial Protein ◽

Protein Sequences ◽

Data Sets ◽

Learning Approaches ◽

Protein Amino Acid ◽

Data Set ◽

Link Type ◽

Cellular Localisation ◽

Membrane Bound

Abstract: Peroxisomes are ubiquitous membrane-bound organelles, and aberrant localisation of peroxisomal proteins contributes to the pathogenesis of several disorders. Many computational methods focus on assigning protein sequences to subcellular compartments, but there are no specific tools tailored for the sub-localisation (matrix vs membrane) of peroxisomal proteins. We present here In-Pero, a new method for predicting protein sub-peroxisomal cellular localisation. In-Pero combines standard machine learning approaches with recently proposed multi-dimensional deep-learning representations of the protein amino-acid sequence. It showed a classification accuracy above 0.9 in predicting peroxisomal matrix and membrane proteins. The method is trained and tested using a double cross-validation approach on a curated data set comprising 160 peroxisomal proteins with experimental evidence for sub-peroxisomal localisation. We further show that the proposed approach can be easily adapted (In-Mito) to the prediction of mitochondrial protein localisation, obtaining performance for certain classes of proteins (matrix and inner-membrane) superior to existing tools. All data sets and code are available at https://github.com/MarcoAnteghini and at www.systemsbiology.nl

Confidentiality of Statistical Records: A Threat-Monitoring Scheme for On Line Dialogue

Methods of Information in Medicine ◽

10.1055/s-0038-1635718 ◽

1976 ◽

Vol 15 (01) ◽

pp. 36-42 ◽

Cited By ~ 14

Author(s):

J. Schlörer

Keyword(s):

Statistical Data ◽

Cost Benefit ◽

Data Bank ◽

High Ratio ◽

Point Of View ◽

Data Sets ◽

Monitoring Scheme ◽

Access Controls ◽

On Line ◽

Bona Fide

From a statistical data bank containing only anonymous records, the records sometimes may be identified and then retrieved, as personal records, by on line dialogue. The risk mainly applies to statistical data sets representing populations, or samples with a high ratio n/N. On the other hand, access controls are unsatisfactory as a general means of protection for statistical data banks, which should be open to large user communities. A threat monitoring scheme is proposed, which will largely block the techniques for retrieval of complete records. If combined with additional measures (e.g., slight modifications of output), it may be expected to render, from a cost-benefit point of view, intrusion attempts by dialogue valueless, if not absolutely impossible. The bona fide user has to pay by some loss of information, but considerable flexibility in evaluation is retained. The proposal of controlled classification included in the scheme may also be useful for off line dialogue systems.

Selection of one-dimensional sedimentation: models for on-line use

Water Science & Technology ◽

10.2166/wst.1995.0100 ◽

1995 ◽

Vol 31 (2) ◽

pp. 193-204 ◽

Cited By ~ 7

Author(s):

Koen Grijspeerdt ◽

Peter Vanrolleghem ◽

Willy Verstraete

Keyword(s):

Steady State ◽

Selection Criteria ◽

Data Sets ◽

Concentration Profiles ◽

A Posteriori ◽

One Dimensional ◽

On Line ◽

Dynamic Concentration ◽

Selection Of ◽

Modelling Task

A comparative study of several recently proposed one-dimensional sedimentation models has been made. This has been achieved by fitting these models to steady-state and dynamic concentration profiles obtained in a down-scaled secondary decanter. The models were evaluated with several a posteriori model selection criteria. Since the purpose of the modelling task is to do on-line simulations, the calculation time was used as one of the selection criteria. Finally, the practical identifiability of the models for the available data sets was also investigated. It could be concluded that the model of Takács et al. (1991) gave the most reliable results.

Human Activity Recognition using Fourier Transform Inspired Deep Learning Combination Model

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327908666180727123657 ◽

2019 ◽

Vol 9 (1) ◽

pp. 16-31

Author(s):

Kyungkoo Jun

Keyword(s):

Fourier Transform ◽

Deep Learning ◽

Short Term Memory ◽

Window Size ◽

Sensor Data ◽

Data Sets ◽

Data Set ◽

Proposed Model ◽

Testing Data ◽

Labeling Scheme

Background & Objective: This paper proposes a Fourier transform inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing a 1D input signal into 2D patterns, which is motivated by the Fourier conversion. The decomposition is performed by a Long Short-Term Memory (LSTM) network, which captures the temporal dependency in the signal and then produces encoded sequences. The sequences, once arranged into a 2D array, can represent the fingerprints of the signals. The benefit of such a transformation is that we can exploit recent advances in deep learning models for image classification, such as the Convolutional Neural Network (CNN). Results: The proposed model, as a result, is a combination of LSTM and CNN. We evaluate the model on two data sets. For the first data set, which is more standardized than the other, our model outperforms or at least equals previous works. In the case of the second data set, we devise schemes to generate training and testing data by changing the parameters of the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% for some cases. We also analyze the effect of the parameters on the performance.
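
The three parameters named in this abstract (window size, sliding size, labeling scheme) can be made concrete with a small sketch of how a labelled sensor stream is cut into training examples. It is an illustration under assumptions, not the paper's procedure: the function name `sliding_windows` is hypothetical, and majority voting is only one possible labeling scheme, chosen here because the abstract does not specify which schemes were used.

```python
from collections import Counter


def sliding_windows(samples, labels, window_size, step):
    """Cut a labelled time series into (window, label) pairs.

    window_size: number of samples per window.
    step:        how far the window slides between consecutive cuts.
    The label of each window is the most frequent per-sample label
    inside it (majority vote, an assumed labeling scheme).
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")

    examples = []
    for start in range(0, len(samples) - window_size + 1, step):
        end = start + window_size
        majority = Counter(labels[start:end]).most_common(1)[0][0]
        examples.append((samples[start:end], majority))
    return examples


if __name__ == "__main__":
    signal = [0, 1, 2, 3, 4, 5]
    activity = ["walk", "walk", "walk", "sit", "sit", "sit"]
    for window, label in sliding_windows(signal, activity, window_size=4, step=2):
        print(window, label)
```

A smaller step produces more, heavily overlapping windows, and a larger window captures more temporal context at the cost of mixing activities near transitions, which is why such parameters can affect accuracy.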

Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media

Journal Of Big Data ◽

10.1186/s40537-021-00488-w ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Yahya Albalawi ◽

Jim Buckley ◽

Nikola S. Nikolov

Keyword(s):

Social Media ◽

Deep Learning ◽

Comprehensive Evaluation ◽

Classification Problem ◽

Data Sets ◽

Word Embeddings ◽

Data Set ◽

Lower Accuracy ◽

Health Related ◽

The Impact

Abstract: This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another only for testing the generality of our models. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifiers we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.

Deep Learning Based Cardiac MRI Segmentation: Do We Need Experts?

Algorithms ◽

10.3390/a14070212 ◽

2021 ◽

Vol 14 (7) ◽

pp. 212

Author(s):

Youssef Skandarani ◽

Pierre-Marc Jodoin ◽

Alain Lalande

Keyword(s):

Deep Learning ◽

Cardiac Mri ◽

Expert Knowledge ◽

Medical Image Analysis ◽

Ground Truth ◽

Cine Mri ◽

Data Sets ◽

Mri Segmentation ◽

Segmentation Evaluation ◽

Ground Truth Data

Deep learning methods are the de facto solutions to a multitude of medical image analysis tasks. Cardiac MRI segmentation is one such application, which, like many others, requires a large amount of annotated data so that a trained network can generalize well. Unfortunately, having a large number of images manually curated by medical experts is both slow and very expensive. In this paper, we set out to explore whether expert knowledge is a strict requirement for the creation of annotated data sets on which machine learning can successfully be trained. To do so, we gauged the performance of three segmentation models, namely U-Net, Attention U-Net, and ENet, trained with different loss functions on expert and non-expert ground truth for cardiac cine–MRI segmentation. Evaluation was done with classic segmentation metrics (Dice index and Hausdorff distance) as well as clinical measurements, such as the ventricular ejection fractions and the myocardial mass. The results reveal that the generalization performance of a segmentation neural network trained on non-expert ground truth data is, to all practical purposes, as good as that of one trained on expert ground truth data, particularly when the non-expert receives a decent level of training, highlighting an opportunity for the efficient and cost-effective creation of annotations for cardiac data sets.
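
The Dice index used for evaluation in this abstract measures the overlap between a predicted mask and a ground-truth mask. The sketch below computes it for flattened binary masks; the function name `dice_index` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the paper.

```python
def dice_index(predicted, truth):
    """Dice index = 2 * |A intersect B| / (|A| + |B|) for binary masks.

    Both arguments are flat sequences of 0/1 (or False/True) values
    of equal length, one entry per pixel.
    """
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    overlap = sum(1 for p, t in zip(predicted, truth) if p and t)
    size_sum = sum(1 for p in predicted if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * overlap / size_sum


if __name__ == "__main__":
    # One shared foreground pixel; each mask has two foreground pixels.
    print(dice_index([1, 1, 0, 0], [1, 0, 1, 0]))
```

A value of 1 means the masks coincide and 0 means they share no foreground pixel. Unlike the Hausdorff distance, Dice is insensitive to how far apart the mismatched pixels lie, which is one reason the two metrics are often reported together.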

443 An immunotherapy trio in advanced HNSCC for coordinated B and T cell antigen response

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0443 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A469-A469

Author(s):

Bernard Fox ◽

Tarsem Moudgil ◽

Traci Hilton ◽

Noriko Iwamoto ◽

Christopher Paustian ◽

...

Keyword(s):

Clinical Trial ◽

Immune Response ◽

T Cells ◽

T Cell ◽

T Cell Responses ◽

Data Sets ◽

Cell Responses ◽

Anti Cancer ◽

Bead Arrays ◽

Hnscc Cell

Background: Outcomes for recurrent or metastatic (R/M) head and neck squamous cell carcinoma (HNSCC) are dismal and responses to anti-PD-1 appear best in tumors with PD-1+ T cells in proximity to PD-L1+ cells, arguing that improved outcome is associated with a pre-existing anti-cancer immune response. Based on this, we hypothesize that vaccines which prime and/or expand T cells to a spectrum of antigens overexpressed by HNSCC combined with T cell agonists, like anti-GITR, that provide costimulatory signals will improve the anti-PD-1 response rates. We have developed a cancer vaccine, DPV-001, that contains more than 300 proteins for genes overexpressed by HNSCC, encapsulated in a CLEC9A-targeted microvesicle and containing TLR/NOD agonists and DAMPs. Recently, we reported that combining anti-GITR + vaccine + anti-PD-1 augmented therapeutic efficacy in a preclinical model and now plan a phase 1b trial of this combination in patients with advanced HNSCC. Methods: Sera from patients receiving DPV-001 as adjuvant therapy for definitively treated NSCLC were analyzed for IgG responses to human proteins by MAP bead arrays and results compared to TCGA gene expression data sets for HNSCC. HNSCC cell lines were evaluated by RNASeq and peptides were eluted from HLA, analyzed by mass spectroscopy and correlated against MAP bead arrays and TCGA data sets. Tumor-reactive T cells from a vaccinated patient were enriched and expanded, and used in a cytokine release assay (CRA) against autologous NSCLC and partially HLA-matched allogeneic HNSCC cell lines. Results: Patients receiving DPV-001 (N=13) made 147 IgG responses to at least 70 proteins for genes overexpressed by HNSCC. Preliminary evaluation of the HNSCC peptidome against the results of the MAP bead array identifies antigens that are targets of a humoral immune response. Additionally, tumor-reactive T cells from a DPV-001-vaccinated patient recognize two partially HLA-matched HNSCC targets, but not a mismatched target. Conclusions: Recent observations from our lab and others have correlated IgG Ab responses with T cell responses to epitopes of the same protein. Based on the data summarized above, we hypothesize that we have induced T cell responses against a broad spectrum of shared cancer antigens that are common among adenocarcinomas and squamous cell cancers. Our planned clinical trial will vaccinate and boost the induced responses by costimulation with anti-GITR and then sequence in delayed anti-PD-1 to relieve checkpoint inhibition. MAP bead arrays and the peptidome library generated above will be used to assess anti-cancer B and T cell responses. Trial Registration: NCT04470024. Ethics Approval: The original clinical trial was approved by the Providence Portland Medical Center IRB, approval # 13-046. The proposed clinical trial has not yet been reviewed by the IRB.

VISUAL APPROACH TO SUPERVISED VARIABLE SELECTION BY SELF-ORGANIZING MAP

International Journal of Neural Systems ◽

10.1142/s0129065705000098 ◽

2005 ◽

Vol 15 (01n02) ◽

pp. 101-110 ◽

Cited By ~ 1

Author(s):

TIMO SIMILÄ ◽

SAMPSA LAINE

Keyword(s):

Variable Selection ◽

The Self ◽

Data Sets ◽

Self Organizing Map ◽

Robust Method ◽

Relevant Variables ◽

Visual Approach ◽

Predefined Criterion ◽

Target Data ◽

Self Organizing

Practical data analysis often encounters data sets with both relevant and useless variables. Supervised variable selection is the task of selecting the relevant variables based on some predefined criterion. We propose a robust method for this task. The user manually selects a set of target variables and trains a Self-Organizing Map with these data. This sets a criterion for variable selection and is an illustrative description of the user's problem, even for multivariate target data. The user also defines another set of variables that are potentially related to the problem. Our method returns a subset of these variables, which best corresponds to the description provided by the Self-Organizing Map and, thus, agrees with the user's understanding of the problem. The method is conceptually simple and, based on experiments, allows an accessible approach to supervised variable selection.
