conservation score
Recently Published Documents



2021 ◽  
Vol 17 (10) ◽  
pp. e1009541
Author(s):  
Petar I. Penev ◽  
Claudia Alvarez-Carreño ◽  
Eric Smith ◽  
Anton S. Petrov ◽  
Loren Dean Williams

We have developed the program TwinCons to detect noisy signals of deep ancestry of proteins or nucleic acids. As input, the program uses a composite alignment containing pre-defined groups, and mathematically determines a ‘cost’ of transforming one group to the other at each position of the alignment. The output distinguishes conserved, variable and signature positions. A signature is conserved within groups but differs between groups. The method automatically detects continuous characteristic stretches (segments) within alignments. TwinCons provides a convenient representation of conserved, variable and signature positions as a single score, enabling the structural mapping and visualization of these characteristics. Structure is more conserved than sequence, and TwinCons highlights alternative sequences of conserved structures. Using TwinCons, we detected highly similar segments between proteins from the translation and transcription systems. TwinCons detects conserved residues within regions of high functional importance for the ribosomal RNA (rRNA) and demonstrates that signatures are not confined to specific regions but are distributed across the rRNA structure. The ability to evaluate both nucleic acid and protein alignments allows TwinCons to be used in combined sequence and structural analysis of signatures and conservation in rRNA and in ribosomal proteins (rProteins). TwinCons detects a strong sequence conservation signal between bacterial and archaeal rProteins related by circular permutation. This conserved sequence is structurally colocalized with conserved rRNA, as indicated by TwinCons scores of rRNA alignments of bacterial and archaeal groups. This combined analysis revealed deep co-evolution of rRNA and rProtein buried within the deepest branching points in the tree of life.
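The distinction the abstract draws between conserved, variable and signature positions can be illustrated with a toy classifier. This is a minimal sketch and not the TwinCons algorithm: the abstract describes a transformation cost between groups, whereas the sketch below only compares each group's most common residue. The function names and the 0.8 threshold are assumptions made for illustration.

```python
from collections import Counter


def classify_column(group_a, group_b, threshold=0.8):
    """Label one alignment column given the residues seen in two groups.

    Hypothetical rule (not the TwinCons cost): a group counts as conserved
    when its most common residue reaches `threshold` of the group.
    """
    def consensus(residues):
        residue, count = Counter(residues).most_common(1)[0]
        return residue, count / len(residues)

    res_a, frac_a = consensus(group_a)
    res_b, frac_b = consensus(group_b)
    if frac_a >= threshold and frac_b >= threshold:
        # Both groups are internally conserved: same residue means a
        # conserved position, different residues mean a signature.
        return "conserved" if res_a == res_b else "signature"
    return "variable"


def classify_alignment(seqs_a, seqs_b, threshold=0.8):
    """Classify every column of two equal-length, pre-aligned groups."""
    columns_a = zip(*seqs_a)
    columns_b = zip(*seqs_b)
    return [classify_column(a, b, threshold)
            for a, b in zip(columns_a, columns_b)]


# Made-up three-column alignment with two groups of three sequences.
labels = classify_alignment(["GKA", "GKC", "GKD"], ["GRA", "GRC", "GRE"])
print(labels)
```

In this made-up alignment the first column is shared by both groups, the second is fixed within each group but differs between them, and the third varies everywhere.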


2021 ◽  
Vol 8 ◽  
Author(s):  
Pâmella Borges ◽  
Gabriela Pasqualim ◽  
Ursula Matte

Mucopolysaccharidosis type I (MPS I) is an autosomal recessive disease characterized by the deficiency of alpha-L-iduronidase (IDUA), an enzyme involved in glycosaminoglycan degradation. More than 200 disease-causing variants have been reported and characterized in the IDUA gene. The gene also has several variants of unknown significance (VUS) and variants with conflicting interpretations of pathogenicity in the literature. This study evaluated 586 variants obtained from a literature review and five population databases, in addition to dbSNP, the Human Gene Mutation Database (HGMD), and ClinVar. For the variants described in the literature, two datasets were created based on the strength of the criteria. The stricter criteria subset had 108 variants with an expression study, analysis of healthy controls, and/or complete gene sequence. The less stringent criteria subset had an additional 52 variants found in the literature review, HGMD or ClinVar, and dbSNP with an allele frequency higher than 0.001. The other 426 variants were considered VUS. The two criteria datasets were used to evaluate 33 programs plus a conservation score. The BayesDel (addAF and noAF), PON-P2 (genome and protein), and ClinPred algorithms showed the best sensitivity, specificity, accuracy, and kappa value for both criteria subsets. The VUS were evaluated with these five algorithms. Based on the results, 122 variants had total consensus among the five predictors, with 57 classified as predicted deleterious and 65 as predicted neutral. For variants not included in PON-P2, 88 variants were considered deleterious and 92 neutral by all other predictors. The remaining 124 did not obtain a consensus among predictors.
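The four metrics used above to rank predictors all follow from a 2×2 confusion matrix. The sketch below shows the standard definitions, including Cohen's kappa; the function name and the counts in the example are invented for illustration and are not taken from the study. It assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and Cohen's kappa from counts.

    tp/fn: known deleterious variants predicted deleterious/neutral.
    tn/fp: known neutral variants predicted neutral/deleterious.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the marginal frequencies of
    # predicted and true labels.
    chance_pos = ((tp + fp) / total) * ((tp + fn) / total)
    chance_neg = ((tn + fn) / total) * ((tn + fp) / total)
    expected = chance_pos + chance_neg
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts for one predictor on one benchmark subset.
metrics = binary_metrics(tp=90, fp=10, tn=40, fn=20)
print(metrics)
```

Kappa is useful alongside accuracy when the benchmark is unbalanced, because it discounts the agreement a predictor would reach by chance alone.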


Author(s):  
Kevin A Murgas ◽  
Yanlin Ma ◽  
Lidea K Shahidi ◽  
Sayan Mukherjee ◽  
Andrew S Allen ◽  
...  

Motivation: Conservation is broadly used to identify biologically important (epi)genomic regions. In the case of tumor growth, preferential conservation of DNA methylation can be used to identify areas of particular functional importance to the tumor. However, reliable assessment of methylation conservation based on multiple tissue samples per patient requires the decomposition of methylation variation at multiple levels. Results: We developed a Bayesian hierarchical model that allows for variance decomposition of methylation on three levels: between-patient normal tissue variation, between-patient tumor-effect variation, and within-patient tumor variation. We then defined a model-based conservation score to identify loci of reduced within-tumor methylation variation relative to between-patient variation. We fit the model to multi-sample methylation array data from 21 colorectal cancer (CRC) patients using a Markov chain Monte Carlo algorithm (Stan). Sets of genes implicated in CRC tumorigenesis exhibited preferential conservation, demonstrating the model’s ability to identify functionally relevant genes based on methylation conservation. A pathway analysis of preferentially conserved genes implicated several CRC-relevant pathways and pathways related to neoantigen presentation and immune evasion. Conclusions: Our findings suggest that preferential methylation conservation may be used to identify novel gene targets that are not consistently mutated in CRC. The flexible structure makes the model amenable to the analysis of more complex multi-sample data structures. Availability: The data underlying this article are available in the NCBI GEO Database, under accession code GSE166212. The R analysis code is available at https://github.com/kevin-murgas/DNAmethylation-hierarchicalmodel. Supplementary information: Supplementary data are available at Bioinformatics online.
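The idea of scoring a locus by how small its within-tumor variation is relative to between-patient variation can be illustrated with plain sample variances. This is only a rough stand-in under stated assumptions: the paper's score is derived from a Bayesian hierarchical model fit in Stan, and its exact definition is not given in the abstract. The function name, the log-ratio form and the `eps` guard are all hypothetical.

```python
import math
from statistics import mean, variance


def variance_ratio_score(tumor_values_by_patient, eps=1e-12):
    """Log ratio of between-patient to within-patient variance at one locus.

    tumor_values_by_patient: one list of methylation values per patient,
    each with at least two tumor samples. Positive scores mean the tumor
    samples of a patient agree more than patients agree with each other.
    """
    # Average sample variance among tumor samples of the same patient.
    within = mean(variance(samples) for samples in tumor_values_by_patient)
    # Sample variance of the per-patient mean methylation.
    between = variance([mean(samples) for samples in tumor_values_by_patient])
    # eps (an assumption of this sketch) avoids log(0) for constant data.
    return math.log((between + eps) / (within + eps))


# Made-up beta values: samples agree within patients, patients differ.
conserved_locus = [[0.10, 0.12], [0.50, 0.52], [0.90, 0.88]]
# Made-up beta values: samples disagree within every patient.
variable_locus = [[0.10, 0.90], [0.30, 0.80], [0.20, 0.85]]
print(variance_ratio_score(conserved_locus))
print(variance_ratio_score(variable_locus))
```

A moment-based ratio like this ignores the normal-tissue level and the uncertainty in each variance estimate, which is the part a hierarchical model is meant to handle.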


2020 ◽  
Author(s):  
Asma Ali Hassan Ali ◽  
Muntaser Ibrahim

Introduction: Chemokines are small transmembrane proteins with immune surveillance and immune cell recruitment functions. The expression of the CCR5 gene affects virus production and viral load (1). The CCR5 gene contains two introns, three exons, and two promoters, and it is necessary as a co-receptor for the entry of macrophage-tropic HIV strains. Mutations in the coding region of CCR5 affect the protein structure, which in turn affects production, chemokine binding, transport, signaling and expression of the CCR5 receptor. Methods: SNPs within the CCR5 gene were retrieved from the Ensembl database. Coding SNPs were analyzed using SNPnexus. Coding non-synonymous SNPs in the CCR5 domains that bind viral gp120 were analyzed using SIFT, PolyPhen and I-Mutant. Project HOPE was then used to model the 3D structure of the protein resulting from these SNPs. Non-coding SNPs that affect miRNAs in the 3′ region were analyzed using PolymiRTS. SNPs that affect transcription factor binding were analyzed using RegulomeDB. Results: A total of 178 non-synonymous missense SNPs were found to have a deleterious and damaging effect on the structure and function of the protein. In the CCR5 domains that bind viral gp120, three SNPs (rs145061115, rs199824195 and rs201797884) were found to affect the structure, function and stability of the chemokine protein. Two SNPs, rs185691679 and rs199722070, have a role in the disruption and creation of target sites in miRNA seeds due to their high conservation score. Conclusion: Mutations in the CCR5 gene may explain and represent the molecular basis of resistance to HIV infection.


Genes ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 1076
Author(s):  
Victor Jaravine ◽  
James Balmford ◽  
Patrick Metzger ◽  
Melanie Boerries ◽  
Harald Binder ◽  
...  

A novel approach is developed to address the challenge of annotating with phenotypic effects those exome variants for which relevant empirical data are lacking or minimal. The predictive annotation method is implemented as a stacked ensemble of supervised base-learners, including distributed random forest and gradient boosting machines. Ensemble models were trained and cross-validated on evidence-based categorical variant effect annotations from the ClinVar database, and were applied to 84 million non-synonymous single nucleotide variants (SNVs). The consensus model combined 39 functional mutation impacts, a cross-species conservation score, and a gene indispensability score. The indispensability score, which accounts for differences in variant pathogenicity between essential and mutation-tolerant genes, considerably improved the predictions. The consensus combination is consistent with as many input scores as possible while minimizing false predictions. The input scores are ranked based on their ability to predict effects. The score rankings and categorical phenotypic variant effect predictions are intended for direct use in clinical and biological applications to prioritize human exome variants and mutations.


2020 ◽  
Vol 36 (11) ◽  
pp. 3605-3606
Author(s):  
Pumin Li ◽  
Qi Xu ◽  
Xu Hua ◽  
Zhongwei Xie ◽  
Jie Li ◽  
...  

Summary: The R/Bioconductor package primirTSS is a fast and convenient tool that implements an analytical method to identify transcription start sites of microRNAs by integrating ChIP-seq data of H3K4me3 and Pol II. It further improves precision by employing the conservation score and sequence features. The tool performed well when using H3K4me3 or Pol II ChIP-seq data alone as input, which is convenient for applications where multiple datasets are hard to acquire. This flexible package is provided with both R programming interfaces and graphical web interfaces. Availability and implementation: primirTSS is available at http://bioconductor.org/packages/primirTSS. The documentation of the package, including an accompanying tutorial, was deposited at https://bioconductor.org/packages/release/bioc/vignettes/primirTSS/inst/doc/primirTSS.html. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Jose Luis Cabrera Alarcon ◽  
Jose Antonio Enriquez ◽  
Fátima Sánchez-Cabo

Background: Prediction of pathogenic variants is one of the biggest challenges for researchers and clinicians in the time of next-generation sequencing technologies. Stratification of individuals based on truly pathogenic variants might lead to improved, personalized treatments. Results: We present Frequency Conservation Score (FCS) and Frequency Conservation Score for Mitochondrial DNA (FCSMt), two methods for the detection of pathogenic single nucleotide variants in nuclear and mitochondrial DNA, respectively. These scores are based on a random forest model trained on a set of potentially relevant predictors: (i) conservation scores (PhastCons and phyloP); (ii) locus variability at each genomic position, built from the gnomAD database; and (iii) physicochemical distance for amino acid substitutions and the impact/consequence on the canonical transcript. FCS showed an AUC of 98% for deleteriousness in an independent validation dataset, outperforming other scores such as metaLR, metaSVM, REVEL, DANN, CADD, SIFT, PROVEAN or FATHMM-MKL. Moreover, FCSMt presented an AUC of 0.92 for the detection of pathogenic mitochondrial SNVs. The tool is available at http://bioinfo.cnic.es/FCS. Conclusions: FCS and FCS-Mt improve pathogenic mutation detection, allowing the prioritization of relevant variants in whole-exome and whole-genome sequencing analysis.
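The AUC figures quoted above summarize how well a score ranks pathogenic variants above benign ones. A minimal sketch of that quantity, computed as the fraction of pathogenic/benign pairs that are ordered correctly (ties count as half), is given below. The function name and the example scores are made up for illustration and have no connection to FCS output.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted deleteriousness, higher means more likely pathogenic.
    labels: 1 for pathogenic, 0 for benign; both classes must be present.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Count pairs where the pathogenic variant outranks the benign one.
    wins = sum((p > n) + 0.5 * (p == n)
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for five variants.
auc = roc_auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0])
print(auc)
```

The pairwise form is quadratic in the number of variants, so it suits small validation sets; a rank-based formula gives the same value more cheaply on large ones.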


2018 ◽  
Vol 19 (10) ◽  
pp. 3273 ◽  
Author(s):  
Cindy Ulloa-Guerrero ◽  
Maria Delgado ◽  
Carlos Jaramillo

Helicobacter pylori cytotoxin-associated gene A protein (CagA) has been associated with the increase in virulence and risk of cancer. It has been demonstrated that CagA’s translocation is dependent on its interaction with phosphatidylserine. We evaluated the variability of the N-terminal CagA in 127 sequences reported in NCBI, by referring to molecular interaction forces with the phosphatidylserine and the docking of three mutations chosen from variations in specific positions. The major sites of conservation of the residues involved in CagA–Phosphatidylserine interaction were 617, 621 and 626 which had no amino acid variation. Position 636 had the lowest conservation score; mutations in this position were evaluated to observe the differences in intermolecular forces for the CagA–Phosphatidylserine complex. We evaluated the docking of three mutations: K636A, K636R and K636N. The crystal and mutation models presented a ΔG of −8.919907, −8.665261, −8.701923, −8.515097 kcal/mol, respectively, while mutations K636A, K636R, K636N and the crystal structure presented 0, 3, 4 and 1 H-bonds, respectively. Likewise, the bulk effect of the ΔG and amount of H-bonds was estimated in all of the docking models. The type of mutation affected both the ΔG (χ²(1) = 93.82, p-value < 2.2 × 10⁻¹⁶) and the H-bonds (χ²(1) = 91.93, p-value < 2.2 × 10⁻¹⁶). Overall, 76.9% of the strains that exhibit the K636N mutation produced a severe pathology. The average H-bond count diminished when comparing the mutations with the crystal structure of all the docking models, which means that other molecular forces are involved in the CagA–Phosphatidylserine complex interaction.


Genes ◽  
2018 ◽  
Vol 9 (9) ◽  
pp. 456 ◽  
Author(s):  
Wanjing Zheng ◽  
Yoko Satta

RIG-I-like receptors (retinoic acid-inducible gene-I-like receptors, or RLRs) are a family of pattern-recognition receptors for RNA viruses, consisting of three members: retinoic acid-inducible gene I (RIG-I), melanoma differentiation-associated gene 5 (MDA5) and laboratory of genetics and physiology 2 (LGP2). To understand the role of RLRs in bird evolution, we performed molecular evolutionary analyses on the coding genes of avian RLRs using filtered predicted coding sequences from 62 bird species. Among the three RLRs, conservation score and dN/dS (ratio of nonsynonymous substitution rate over synonymous substitution rate) analyses indicate that avian MDA5 has the highest conservation level in the helicase domain but a lower level in the caspase recruitment domains (CARDs) region, which differs from mammals; LGP2, as a whole gene, has a lower conservation level than RIG-I or MDA5. We found evidence of positive selection across all bird lineages in RIG-I and MDA5 but only on the stem lineage of Galliformes in LGP2, which could be related to the loss of RIG-I in Galliformes. Analyses also suggest that selection relaxation may have occurred in LGP2 during the middle of bird evolution and that the CARDs region of MDA5 contains many positively selected sites, which might explain its conservation level. Spearman’s correlation test indicates that species-to-ancestor dN/dS of RIG-I shows a negative correlation with endogenous retroviral abundance in bird genomes, suggesting the possibility of interaction between immunity and endogenous retroviruses during bird evolution.
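Spearman's correlation, used above to relate dN/dS to endogenous retroviral abundance, is the Pearson correlation of the ranks of the two variables. The sketch below implements it with average ranks for ties; the function names and the example values are invented for illustration and are not data from the study. It assumes neither variable is constant.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up values: per-species dN/dS and retroviral element counts.
dnds = [0.10, 0.20, 0.30, 0.40]
erv_counts = [40, 30, 20, 10]
print(spearman_rho(dnds, erv_counts))
```

Because only ranks enter the calculation, the statistic is insensitive to the scale of either variable, which suits ratios such as dN/dS paired with raw counts.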


Author(s):  
Cindy P. Ulloa Guerrero ◽  
Maria del Pilar Delgado ◽  
Carlos A. Jaramillo

Helicobacter pylori cytotoxin-associated gene A protein (CagA) has been associated with the increase in virulence and risk of cancer. It has been demonstrated that CagA's translocation is dependent on its interaction with phosphatidylserine. We evaluated the variability of the N-terminal CagA in 127 sequences reported in NCBI, by referring to molecular interaction forces with the Phosphatidylserine and the docking of 3 mutations chosen from variations in specific positions. The major sites of conservation of the residues involved in CagA-Phosphatidylserine interaction were 617, 621 and 626 which had no amino acid variation. Position 636 had the lowest conservation score, so mutations in this position were evaluated to observe the differences in intermolecular forces of the CagA-Phosphatidylserine complex. We evaluated the docking of 3 mutations: K636A, K636R and K636N. The models of the crystal and mutations presented a ΔG of −8.919907, −8.665261, −8.701923, −8.515097 kcal/mol, respectively, while mutations K636A, K636R, K636N and the crystal structure presented 0, 3, 4 and 1 H-bonds, respectively. Likewise, the bulk effect of the ΔG and amount of H-bonds was estimated in all of the docking models. The type of mutation affected both the ΔG (χ²(1) = 93.82, p-value < 2.2 × 10⁻¹⁶) and the H-bonds (χ²(1) = 91.93, p-value < 2.2 × 10⁻¹⁶). In all the data, 76.9% of the strains that exhibit the K636N mutation produced a severe pathology. The average H-bond count diminished when comparing the mutations with the crystal structure of all the docking models, which means that other molecular forces are involved in the CagA-Phosphatidylserine complex interaction.

