sequence profile
Recently Published Documents


TOTAL DOCUMENTS

75
(FIVE YEARS 23)

H-INDEX

20
(FIVE YEARS 5)

2021 ◽  
Vol 42 (2) ◽  
pp. 242-250
Author(s):  
S.I. Oyedeji ◽  
I.M. Odoh ◽  
A.O. Ojerinde ◽  
H.O. Awobode

The gold standard for malaria diagnosis is parasitological confirmation, but the traditional method, light microscopy, and the routinely used rapid diagnostic tests (RDTs) have limitations. Molecular assays are known to have higher sensitivity and specificity, but there are indications that they may also be compromised by genetic variability of the target sequence. The aim of this study, therefore, was to evaluate the DNA sequence profile of the diagnostic target of the P. falciparum 18S rRNA PCR assay in field isolates from North-Central Nigeria. Blood samples were collected from 324 children presenting with acute febrile illness suspected to be uncomplicated malaria. Light microscopy and the 18S rRNA PCR assay were employed to determine the presence of P. falciparum parasites. The sequence profile of the diagnostic target was evaluated by Sanger sequencing of the PCR products on an ABI PRISM® 3100 DNA sequencer (PE Applied Biosystems). Of the 324 children enrolled in this study, 134 (41.4%) were positive for P. falciparum by microscopy while 218 (67.3%) were positive by PCR. The sensitivity of microscopy was 61.47% (95% CI = 57.88%-69.64%) using the PCR assay as the reference standard. The degree of agreement between microscopy and PCR, as measured by Cohen's kappa, was moderate (κ = 0.502, 95% CI = 0.463-0.715). Sequence analysis showed that the DNA target of the P. falciparum 18S rRNA PCR in the field isolates was highly conserved. Only one A>T single nucleotide polymorphism was found within the target sequence among the isolates in this study. This study showed that the DNA target sequence of the 18S rRNA PCR assay is highly conserved in field isolates in the study region, suggesting little or no selective pressure acting on the locus, which has implications for the enhanced sensitivity of the molecular assay.
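
The two agreement statistics quoted in this abstract can be computed from a 2x2 table, as in the minimal Python sketch below. The abstract reports only the marginal totals (134 microscopy-positive, 218 PCR-positive, 324 samples), so the cell counts used here are an assumption: they suppose that no sample was microscopy-positive and PCR-negative. That assumption reproduces the reported sensitivity of 134/218, but the kappa it yields need not equal the published value. The function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of reference-positive samples that the index test also calls positive."""
    return tp / (tp + fn)


def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for two binary raters summarised as a 2x2 table."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of each method.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


# ASSUMED cell counts (not given in the abstract): microscopy vs PCR reference.
tp, fp, fn, tn = 134, 0, 84, 106

print(f"sensitivity = {sensitivity(tp, fn):.2%}")
print(f"kappa       = {cohens_kappa(tp, fp, fn, tn):.3f}")
```

Under these assumed counts the kappa comes out near 0.51, close to but not identical with the reported 0.502, which suggests the true table differs slightly from the assumed one.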


2021 ◽  
Author(s):  
Negin Sadat Babaiha ◽  
Rosa Aghdam ◽  
Changiz Eslahchi

Abstract Localization of messenger RNAs (mRNAs) is a widely observed phenomenon that is considered an efficient way to target proteins to a specific region of a cell and is also known as a strategy for gene regulation. The importance of correct intracellular RNA placement in the development of embryonic and neural dendrites has long been demonstrated in former studies. Improper localization of RNA in the cell, which has been shown to occur for a variety of reasons, including mutations in trans-regulatory elements, is also associated with some neuromuscular diseases as well as cancer. We propose NN-RNALoc, a neural network-based model to predict the cellular location of mRNAs. The features extracted from mRNA sequences, along with the information gathered from their proteins, are fed to this prediction model. We introduce a novel distance-based sub-sequence profile for the representation of RNA sequences, which is more memory- and time-efficient and, compared with k-mer frequencies, can possibly encode sequences better as the distance k increases. The performance of NN-RNALoc on the benchmark datasets CeFra-seq and RNALocate is compared with the results achieved by two powerful prediction models proposed in former studies, mRNALoc and RNATracker. The results reveal that the use of protein-protein interaction information, which plays a crucial role in many biological functions, together with the novel distance-based sub-sequence profiles of mRNA sequences, leads to a more accurate prediction model. In addition, NN-RNALoc significantly reduces the required computing time compared with previous studies. Source code and data used in this study are available at: https://github.com/NeginBabaiha/NN-RNALoc
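
To illustrate why a distance-based encoding can be cheaper than k-mer frequencies, the Python sketch below contrasts the two. The abstract does not define the NN-RNALoc profile, so `distance_pair_profile` is only one plausible reading (frequencies of nucleotide pairs separated by a fixed distance) and should not be taken as the authors' definition. The function names and the toy sequence are assumptions.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"


def kmer_frequencies(seq: str, k: int) -> dict:
    """Relative frequency of every possible k-mer: 4**k features."""
    windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {"".join(p): counts["".join(p)] / windows
            for p in product(ALPHABET, repeat=k)}


def distance_pair_profile(seq: str, max_distance: int) -> dict:
    """ASSUMED encoding: frequency of each ordered nucleotide pair found
    d positions apart, for d = 1..max_distance: 16 * max_distance features."""
    profile = {}
    for d in range(1, max_distance + 1):
        pairs = max(len(seq) - d, 1)
        counts = Counter((seq[i], seq[i + d]) for i in range(len(seq) - d))
        for a, b in product(ALPHABET, repeat=2):
            profile[(a, b, d)] = counts[(a, b)] / pairs
    return profile


toy = "ACGUACGU"  # toy sequence, not taken from any benchmark
print(len(kmer_frequencies(toy, 5)))       # feature count grows as 4**k
print(len(distance_pair_profile(toy, 5)))  # feature count grows as 16*k
```

In this reading the feature vector grows linearly with the distance rather than exponentially, which is consistent with the memory and time savings the abstract claims.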


2021 ◽  
Vol 2068 (1) ◽  
pp. 012011
Author(s):  
Duoduo Hang ◽  
Ji Zhang ◽  
Chuanwen Chang ◽  
Wei Zhu ◽  
Beibei Wu ◽  
...  

Abstract Ship target classification is of great significance in both military and civilian fields. We propose a ship target classification algorithm for low-resolution radars with echo sequence profile images. This algorithm can be realized in the following steps. First, we collect radar profile image data. We use five perspectives of a radar target, including target shape, Radar Cross Section (RCS), echo amplitude, motion attribute, and features of two-dimensional grayscale maps, to extract eight-dimensional feature vectors. The proposed algorithm uses the Support Vector Machine (SVM) as the classifier, and the parameters of the classifier are optimized by either grid search or the Particle Swarm Optimization (PSO) algorithm. The proposed algorithm is verified through real data classification tests.


2021 ◽  
Author(s):  
Fatemeh Zare-Mirakabad ◽  
Armin Behjati ◽  
Seyed Shahriar Arab ◽  
Abbas Nowzari-Dalini

Protein sequences can be viewed as a language; therefore, we benefit from using models initially developed for natural languages, such as transformers. ProtAlbert is one of the best pre-trained transformers on protein sequences, and its efficiency enables us to run the model on longer sequences with less computation power while performing similarly to the other pre-trained transformers. This paper includes two main parts: transformer analysis and profile prediction. In the first part, we propose five algorithms to assess the attention heads in different layers of ProtAlbert for five protein characteristics: nearest-neighbor interactions, type of amino acids, biochemical and biophysical properties of amino acids, protein secondary structure, and protein tertiary structure. These algorithms are performed on 55 proteins extracted from CASP13 and three case study proteins whose sequences, experimental tertiary structures, and HSSP profiles are available. This assessment shows that although the model is only pre-trained on protein sequences, attention heads in the layers of ProtAlbert are representative of some protein family characteristics. This conclusion leads to the second part of our work. We propose an algorithm called PA_SPP for protein sequence profile prediction by pre-trained ProtAlbert using masked-language modeling. The PA_SPP algorithm can help researchers predict an HSSP profile when the database contains no sequences similar to the query sequence from which to build the profile.


2021 ◽  
Author(s):  
Emidio Capriotti ◽  
Piero Fariselli

Abstract Evolutionary information is the primary tool for detecting functional conservation in nucleic acids and proteins. This information has been extensively used to predict structure, interactions and functions in macromolecules. Pathogenicity prediction models rely on multiple sequence alignment information at different levels. However, the most accurate genome-wide variant deleteriousness ranking algorithms consider different features to assess the impact of variants. Here, we analyze three different ways of extracting evolutionary information from sequence alignments in the context of pathogenicity predictions at DNA and protein levels. We showed that protein sequence-based information is slightly more informative in the annotation of ClinVar missense variants than that obtained at the DNA level. Furthermore, to achieve the performance of state-of-the-art methods, such as CADD, the conservation of reference and variant, encoded as frequencies of reference/alternate alleles or wild-type/mutant residues, should be included. Our results on a large set of missense variants show that a basic method based on three input features derived from the protein sequence profile performs similarly to the CADD algorithm, which uses hundreds of genomic features. This observation indicates that for missense variants, evolutionary information, when properly encoded, plays the primary role in ranking pathogenicity.
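
As a rough illustration of encoding conservation as wild-type and mutant residue frequencies, the Python sketch below derives three numbers from one column of a multiple sequence alignment. The abstract does not say which three features the authors use, so the third one here (Shannon entropy of the column) is purely an illustrative assumption, as are the function name and the toy alignment.

```python
import math
from collections import Counter


def column_features(alignment, position, wild_type, mutant):
    """Return (wild-type frequency, mutant frequency, column entropy in bits)
    for one alignment column, ignoring gap characters."""
    column = [seq[position] for seq in alignment if seq[position] != "-"]
    if not column:
        raise ValueError("alignment column contains only gaps")
    counts = Counter(column)
    n = len(column)
    freq_wt = counts[wild_type] / n
    freq_mut = counts[mutant] / n
    # ASSUMED third feature: Shannon entropy as a simple conservation measure.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return freq_wt, freq_mut, entropy


# Toy alignment (hypothetical), scoring a K -> R substitution at column 1.
toy_alignment = ["MKV", "MRV", "MKV", "M-V"]
print(column_features(toy_alignment, 1, "K", "R"))
```

A mutant residue that is already common in homologous sequences gets a high mutant frequency, which is the kind of signal the abstract describes as informative for ranking pathogenicity.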


2021 ◽  
Author(s):  
Marc Creixell ◽  
Aaron Samuel Meyer

Cell signaling is orchestrated in part through a network of protein kinases and phosphatases. Dysregulation of kinase signaling is widespread in diseases such as cancer and is readily targetable through inhibitors of kinase enzymatic activity. Mass spectrometry-based analysis of kinase signaling can provide a global view of kinase signaling regulation but making sense of these data is complicated by its stochastic coverage of the proteome, measurement of substrates rather than kinase signaling itself, and the scale of the data collected. Here, we implement a dual data and motif clustering strategy (DDMC) that simultaneously clusters substrate peptides into similarly regulated groups based on their variation within an experiment and their sequence profile. We show that this can help to identify putative upstream kinases and supply more robust clustering. We apply this clustering to large-scale clinical proteomic profiling of lung cancer and identify conserved proteomic signatures of tumorigenicity, genetic mutations, and tumor immune infiltration. We propose that DDMC provides a general and flexible clustering strategy for the analysis of phosphoproteomic data.


2021 ◽  
Vol 12 ◽  
Author(s):  
Igor B. Rogozin ◽  
Abiel Roche-Lima ◽  
Kathrin Tyryshkin ◽  
Kelvin Carrasquillo-Carrión ◽  
Artem G. Lada ◽  
...  

Cancer genomes harbor numerous genomic alterations and many cancers accumulate thousands of nucleotide sequence variations. A prominent fraction of these mutations arises as a consequence of the off-target activity of DNA/RNA editing cytosine deaminases followed by the replication/repair of edited sites by DNA polymerases (pol), as deduced from the analysis of the DNA sequence context of mutations in different tumor tissues. We have used the weight matrix (sequence profile) approach to analyze mutagenesis due to Activation Induced Deaminase (AID) and two error-prone DNA polymerases. Control experiments using shuffled weight matrices and somatic mutations in immunoglobulin genes confirmed the power of this method. Analysis of somatic mutations in various cancers suggested that AID and DNA polymerases η and θ contribute to mutagenesis in contexts that almost universally correlate with the context of mutations in A:T and G:C sites during the affinity maturation of immunoglobulin genes. Previously, we demonstrated that AID contributes to mutagenesis in (de)methylated genomic DNA in various cancers. Our current analysis of methylation data from malignant lymphomas suggests that driver genes are subject to different (de)methylation processes than non-driver genes and, in addition to AID, the activity of pols η and θ contributes to the establishment of methylation-dependent mutation profiles. This may reflect the functional importance of interplay between mutagenesis in cancer and (de)methylation processes in different groups of genes. The resulting changes in CpG methylation levels and chromatin modifications are likely to cause changes in the expression levels of driver genes that may affect cancer initiation and/or progression.
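
The weight matrix (sequence profile) approach mentioned in this abstract can be sketched in Python as follows: build a log-odds matrix from aligned example sites, then score every window of a sequence against it. The example sites are toy instances of the WRC motif (W = A/T, R = A/G) commonly associated with AID, and the pseudocount, uniform background, and function names are assumptions; this is not the matrix or the scoring scheme used by the authors.

```python
import math

BASES = "ACGT"


def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Log2-odds weight matrix from equal-length aligned sites."""
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return matrix


def score(matrix, window):
    """Sum of per-position log-odds for one window of matching width."""
    return sum(col[base] for col, base in zip(matrix, window))


def best_match(matrix, sequence):
    """Return (score, start index) of the highest-scoring window."""
    width = len(matrix)
    return max((score(matrix, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


toy_sites = ["AGC", "TGC", "AAC", "TAC"]  # toy WRC instances, not real data
wm = build_weight_matrix(toy_sites)
print(best_match(wm, "GGTAGCTT"))
```

A control in the spirit of the shuffled weight matrices mentioned in the abstract could permute the matrix columns and rescore, to check that high scores depend on the motif rather than on base composition.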


2021 ◽  
Vol 8 (3) ◽  
pp. 40
Author(s):  
Yuma Takei ◽  
Takashi Ishida

Model quality assessment (MQA), which selects near-native structures from structure models, is an important process in protein tertiary structure prediction. The three-dimensional convolution neural network (3DCNN) was applied to the task, but the performance was comparable to existing methods because it used only atom-type features as the input. Thus, we added sequence profile-based features, which are also used in other methods, to improve the performance. We developed a single-model MQA method for protein structures based on 3DCNN using sequence profile-based features, namely, P3CMQA. Performance evaluation using a CASP13 dataset showed that profile-based features improved the assessment performance, and the proposed method was better than currently available single-model MQA methods, including the previous 3DCNN-based method. We also implemented a web-interface of the method to make it more user-friendly.


2021 ◽  
Author(s):  
Xiaodi Yang ◽  
Shiping Yang ◽  
Xianyi Lian ◽  
Stefan Wuchty ◽  
Ziding Zhang

Abstract To predict interactions between human and viral proteins, we combine evolutionary sequence profile features with a Siamese convolutional neural network (CNN) architecture and a multi-layer perceptron (MLP). Our architecture outperforms various feature encoding-based machine learning and state-of-the-art prediction methods. As our main contribution, we introduce two types of transfer learning methods (i.e., ‘frozen’ type and ‘fine-tuning’ type) that reliably predict interactions in a target human-virus domain based on training in a source human-virus domain, by retraining CNN layers. Our transfer learning strategies can effectively transfer prior knowledge from a large source dataset/task to a small target dataset/task to improve prediction performance. Finally, we utilize the ‘frozen’ type of transfer learning to predict human-SARS-CoV-2 PPIs, indicating that our predictions are topologically and functionally similar to experimentally known interactions. Source code and datasets are available at https://github.com/XiaodiYangCAU/TransPPI/.

