Estimating statistical significance of local protein profile-profile alignments

2018 ◽  
Author(s):  
Mindaugas Margelevičius

Alignment of sequence families described by profiles provides a sensitive means for establishing homology between proteins and is important in protein evolutionary, structural, and functional studies. In the context of a steadily growing amount of sequence data, estimating the statistical significance of alignments, including profile-profile alignments, plays a key role in alignment-based homology search algorithms. Still, it is an open question whether a single type of distribution governs profile-profile alignment scores, and if so which one, especially when profile-profile substitution scores involve such terms as secondary structure predictions. This study presents a methodology for estimating the statistical significance of alignments of this type. The methodology rests on a new algorithm developed for generating random profiles such that their alignment scores are distributed similarly to those obtained for real unrelated profiles. We show that improvements in statistical accuracy, sensitivity, and high-quality alignment rate result from statistically characterizing alignments by establishing the dependence of statistical parameters on various measures associated with both individual and pairwise profile characteristics. Implemented in the COMER software, the proposed methodology yielded an increase of up to 34.2% in the number of true positives and up to 61.8% in the number of high-quality alignments with respect to the previous version of the COMER method. A new version (v1.5.1) of the COMER software is available at https://sourceforge.net/projects/comer. The COMER software is also available on GitHub at https://github.com/minmarg/comer and as a Docker image (https://hub.docker.com/r/minmar/comer).


2001 ◽  
Vol 11 (4) ◽  
pp. 519-530 ◽  
Author(s):  
Ruth M. Younger ◽  
Claire Amadou ◽  
Graeme Bethel ◽  
Anke Ehlers ◽  
Kirsten Fischer Lindahl ◽  
...  

Olfactory receptor (OR) loci frequently cluster and are present on most human chromosomes. They are members of the seven transmembrane receptor (7-TM) superfamily and, as such, are part of one of the largest mammalian multigene families, with an estimated copy number of up to 1000 ORs per haploid genome. As their name implies, ORs are known to be involved in the perception of odors and possibly also in other, nonolfaction-related, functions. Here, we report the characterization of ORs that are part of the MHC-linked OR clusters in human and mouse (partial sequence only). These clusters are of particular interest because of their possible involvement in olfaction-driven mate selection. In total, we describe 50 novel OR loci (36 human, 14 murine), making the human MHC-linked cluster the largest sequenced OR cluster in any organism so far. Comparative and phylogenetic analyses confirm the cluster to be MHC-linked but divergent in both species and allow the identification of at least one ortholog that will be useful for future regulatory and functional studies. Quantitative feature analysis shows clear evidence of duplications of blocks of OR genes and reveals the entire cluster to have a genomic environment that is very different from its neighboring regions. Based on in silico transcript analysis, we also present evidence of extensive long-distance splicing in the 5′-untranslated regions and, for the first time, of alternative splicing within the single coding exon of ORs. Taken together with our previous finding that ORs are also polymorphic, the presented data indicate that the expression, function, and evolution of these interesting genes might be more complex than previously thought. [The sequence data described in this paper have been submitted to the EMBL nucleotide data library under accession nos. Z84475, Z98744, Z98745, AL021807, AL021808, AL022723, AL022727, AL031893, AL035402, AL035542, AL050328, AL050339, AL078630, AL096770, AL121944, AL133160, and AL133267.]



2014 ◽  
Vol 64 (Pt_2) ◽  
pp. 316-324 ◽  
Author(s):  
Jongsik Chun ◽  
Fred A. Rainey

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. It has allowed in many cases for the demarcation of taxa into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are of actual type strains, limiting the use of genomic data in comparative taxonomic studies when there are nearly 11 000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea coupled with computational advances will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.



2016 ◽  
Vol 2 ◽  
pp. e90 ◽  
Author(s):  
Ranko Gacesa ◽  
David J. Barlow ◽  
Paul F. Long

Ascribing function to sequence in the absence of biological data is an ongoing challenge in bioinformatics. Differentiating the toxins of venomous animals from homologues having other physiological functions is particularly problematic as there are no universally accepted methods by which to attribute toxin function using sequence data alone. Bioinformatics tools that do exist are difficult to implement for researchers with little bioinformatics training. Here we announce a machine learning tool called ‘ToxClassifier’ that enables simple and consistent discrimination of toxins from non-toxin sequences with >99% accuracy and compare it to commonly used toxin annotation methods. ‘ToxClassifier’ also reports the best-hit annotation, allowing placement of a toxin into the most appropriate toxin protein family, or relates it to a non-toxic protein having the closest homology, giving enhanced curation of existing biological databases and new venomics projects. ‘ToxClassifier’ is available for free, either to download (https://github.com/rgacesa/ToxClassifier) or to use on a web-based server (http://bioserv7.bioinfo.pbf.hr/ToxClassifier/).



2020 ◽  
Author(s):  
David Curtis

Rare genetic variants in LDLR, APOB and PCSK9 are known causes of familial hypercholesterolaemia and it is expected that rare variants in other genes will also have effects on hyperlipidaemia risk although such genes remain to be identified. The UK Biobank consists of a sample of 500,000 volunteers and exome sequence data is available for 50,000 of them. 11,490 of these were classified as hyperlipidaemia cases on the basis of having a relevant diagnosis recorded and/or taking lipid-lowering medication while the remaining 38,463 were treated as controls. Variants in each gene were assigned weights according to rarity and predicted impact and overall weighted burden scores were compared between cases and controls, including population principal components as covariates. One biologically plausible gene, HUWE1, produced statistically significant evidence for association after correction for testing 22,028 genes with a signed log10 p value (SLP) of -6.15, suggesting a protective effect of variants in this gene. Other genes with uncorrected p<0.001 are arguably also of interest, including LDLR (SLP=3.67), RBP2 (SLP=3.14), NPFFR1 (SLP=3.02) and ACOT9 (SLP=-3.19). Gene set analysis indicated that rare variants in genes involved in metabolism and energy can influence hyperlipidaemia risk. Overall, the results provide some leads which might be followed up with functional studies and which could be tested in additional data sets as these become available. This research has been conducted using the UK Biobank Resource.
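The significance claim in the abstract above can be checked with a short calculation. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: that the correction for 22,028 genes is a Bonferroni correction, and that the family-wise error rate is 0.05. The function names are hypothetical; the SLP values are the ones quoted in the abstract.

```python
import math

ALPHA = 0.05      # assumed family-wise error rate; not stated in the abstract
N_GENES = 22028   # number of genes tested, taken from the abstract


def bonferroni_slp_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Return the |SLP| a gene must exceed under an assumed Bonferroni correction."""
    return -math.log10(alpha / n_tests)


def is_significant(slp: float, n_tests: int, alpha: float = ALPHA) -> bool:
    """Compare the magnitude of a signed log10 p value with the threshold."""
    return abs(slp) > bonferroni_slp_threshold(n_tests, alpha)


# SLP values quoted in the abstract.
reported = {"HUWE1": -6.15, "LDLR": 3.67, "RBP2": 3.14, "NPFFR1": 3.02, "ACOT9": -3.19}

threshold = bonferroni_slp_threshold(N_GENES)
print(f"Assumed Bonferroni threshold on |SLP|: {threshold:.2f}")
for gene, slp in reported.items():
    print(f"{gene}: SLP={slp:+.2f}, passes correction: {is_significant(slp, N_GENES)}")
```

Under these assumptions the threshold on |SLP| is about 5.64, so only HUWE1 passes the correction, which agrees with the abstract's description of the other genes as having uncorrected p<0.001.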



2000 ◽  
Vol 10 (12) ◽  
pp. 1968-1978 ◽  
Author(s):  
Anke Ehlers ◽  
Stephan Beck ◽  
Simon A. Forbes ◽  
John Trowsdale ◽  
Armin Volz ◽  
...  

Clusters of olfactory receptor (OR) genes are found on most human chromosomes. They are one of the largest mammalian multigene families. Here, we report a systematic study of polymorphism of OR genes belonging to the largest fully sequenced OR cluster. The cluster contains 36 OR genes, of which two belong to the vomeronasal 1 (V1-OR) family. The cluster is divided into a major and a minor region at the telomeric end of the HLA complex on chromosome 6. These OR genes could be involved in MHC-related mate preferences. The polymorphism screen was carried out with 13 genes from the HLA-linked OR cluster and three genes from chromosomes 7, 17, and 19 as controls. Ten human cell lines, representing 18 different chromosome 6s, were analyzed. They were from various ethnic origins and exhibited different HLA haplotypes. All OR genes tested, including those not linked to the HLA complex, were polymorphic. These polymorphisms were dispersed along the coding region and resulted in up to seven alleles for a given OR gene. Three polymorphisms resulted either in stop codons (genes hs6M1-4P, hs6M1-17) or in a 16-bp deletion (gene hs6M1-19P), possibly leading to lack of ligand recognition by the respective receptors in the cell line donors. In total, 13 HLA-linked OR haplotypes could be defined. Therefore, allelic variation appears to be a general feature of human OR genes. [The sequence data reported in this paper have been submitted to EMBL under accession nos. AC006137, AC004178, AJ132194, AL022727, AL031983, AL035402, AL035542, Z98744, CAB55431, AL050339, AL035402, AL096770, AL133267, AL121944, Z98745, AL021808, and AL021807.]



Author(s):  
Stanley S Levinson

Background. Classical statistics were developed in a time when small sample sizes were the norm; thus, statistical significance typically ensured large clinical effects. Over the past 10–20 years, computational techniques have allowed studies with modest effects to reach statistical significance (usually P < 0.05) by analyzing very large numbers of patients. In this review, I discuss how this came about and provide an intuitive understanding of the strengths and weaknesses of various statistical parameters that provide insight into clinical effect sizes. Content. In this review of the literature, a simple web-based program was used for calculations. Examples are shown. Odds and risk ratios are compared with ROC curves to allow better understanding of their predictive value. Summary. In these complex times, an intuitive understanding of statistical procedures is increasingly important. This review will attempt to advance the reader’s knowledge so that one can calculate the number needed to treat and its confidence interval, understand the meaning of a modest association, and determine when a study is likely to be accurate but with questionable clinical utility.
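As a rough illustration of the calculation the abstract above refers to, the sketch below computes the number needed to treat (NNT) as the reciprocal of the absolute risk reduction, with a confidence interval obtained by inverting a Wald interval for the risk difference. This is one common textbook approach and is not necessarily the method used in the review or by its web-based program. The function name and the trial counts are hypothetical.

```python
import math
from typing import Optional, Tuple


def nnt_with_ci(
    events_control: int,
    n_control: int,
    events_treated: int,
    n_treated: int,
    z: float = 1.96,  # normal quantile for an approximate 95% interval
) -> Tuple[float, Optional[Tuple[float, float]]]:
    """Return (NNT, CI) from event counts in a two-arm trial.

    The CI is None when the interval for the absolute risk reduction
    includes zero, because the NNT interval is then not a single range.
    """
    cer = events_control / n_control   # control event rate
    eer = events_treated / n_treated   # experimental event rate
    arr = cer - eer                    # absolute risk reduction
    se = math.sqrt(cer * (1 - cer) / n_control + eer * (1 - eer) / n_treated)
    lower, upper = arr - z * se, arr + z * se

    nnt = 1 / arr if arr != 0 else math.inf
    if lower <= 0 <= upper:
        return nnt, None
    low_nnt, high_nnt = sorted((1 / upper, 1 / lower))
    return nnt, (low_nnt, high_nnt)


# Hypothetical trials with identical event rates (20% vs 15%) but different sizes.
print(nnt_with_ci(200, 1000, 150, 1000))  # large trial
print(nnt_with_ci(20, 100, 15, 100))      # small trial
```

With these made-up counts the point estimate of the NNT is the same in both trials, but only the larger trial yields a bounded interval. This reflects the abstract's point that large samples can make a modest effect statistically significant without making it larger.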



2019 ◽  
Vol 11 (1) ◽  
pp. 57-67 ◽  
Author(s):  
Noé Vázquez ◽  
Hugo López-Fernández ◽  
Cristina P. Vieira ◽  
Florentino Fdez-Riverola ◽  
Jorge Vieira ◽  
...  


2018 ◽  
Vol 11 (3) ◽  
pp. 265-270 ◽  
Author(s):  
Justin F Fraser ◽  
Lisa A Collier ◽  
Amy A Gorman ◽  
Sarah R Martha ◽  
Kathleen E Salmeron ◽  
...  

Background. Ischemic stroke research faces difficulties in translating pathology between animal models and human patients to develop treatments. Mechanical thrombectomy, for the first time, offers a momentary window into the changes occurring in ischemia. We developed a tissue banking protocol to capture intracranial thrombi and the blood immediately proximal and distal to it. Objective. To develop and share a reproducible protocol to bank these specimens for future analysis. Methods. We established a protocol approved by the institutional review board for tissue processing during thrombectomy (www.clinicaltrials.gov, NCT03153683). The protocol was a joint clinical/basic science effort among multiple laboratories and the NeuroInterventional Radiology service line. We constructed a workspace in the angiography suite, and developed a step-by-step process for specimen retrieval and processing. Results. Our protocol successfully yielded samples for analysis in all but one case. In our preliminary dataset, the process produced adequate amounts of tissue from distal blood, proximal blood, and thrombi for gene expression and proteomics analyses. We describe the tissue banking protocol, and highlight training protocols and mechanics of on-call research staffing. In addition, preliminary integrity analyses demonstrated high-quality yields for RNA and protein. Conclusions. We have developed a novel tissue banking protocol using mechanical thrombectomy to capture thrombus along with arterial blood proximal and distal to it. The protocol provides high-quality specimens, facilitating analysis of the initial molecular response to ischemic stroke in the human condition for the first time. This approach will permit reverse translation to animal models for treatment development.



2002 ◽  
Vol 12 (12) ◽  
pp. 1854-1859
Author(s):  
Esther Betrán ◽  
Kevin Thornton ◽  
Manyuan Long

New genes that originated by various molecular mechanisms are an essential component in understanding the evolution of genetic systems. We investigated the pattern of origin of the genes created by retroposition in Drosophila. We surveyed the whole Drosophila melanogaster genome for such new retrogenes and experimentally analyzed their functionality and evolutionary process. These retrogenes, functional as revealed by the analysis of expression, substitution, and population genetics, show a surprisingly asymmetric pattern in their origin. There is a significant excess of retrogenes that originate from the X chromosome and retropose to autosomes; new genes retroposed from autosomes are scarce. Further, we found that most of these X-derived autosomal retrogenes had evolved a testis expression pattern. These observations may be explained by natural selection favoring those new retrogenes that moved to autosomes and avoided the spermatogenesis X inactivation, and suggest the important role of genome position for the origin of new genes. [The sequence data from this study have been submitted to GenBank under accession nos. AY150701–AY150797. The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: M.-L. Wu, F. Lemeunier, and P. Gibert.]



2019 ◽  
Vol 2019 ◽  
pp. 1-21
Author(s):  
Meng-Qi Yang ◽  
Yong-Mei Song ◽  
Huan-Yu Gao ◽  
Yi-Tao Xue

Objective. Heart failure is a major public health problem worldwide. However, its morbidity, mortality, and public awareness remain unsatisfactory, as does the status of current treatments. Alongside the standard treatment for chronic heart failure (CHFST), Fuzi (the seminal root of Aconitum carmichaelii Debx.) formulae have long been widely used in clinical practice as a complementary treatment for heart failure. We aimed to assess the efficacy and safety of Fuzi formulae (FZF) for the treatment of heart failure on the basis of high-quality randomized controlled trials (RCTs). Methods. RCTs in PubMed, Cochrane Library, China National Knowledge Infrastructure (CNKI), Chinese Scientific Journals Database (VIP), and Wanfang Database were searched from their inception until June 2019. In addition, the U.S. National Library of Medicine (clinicaltrials.gov) and the Chinese Clinical Trial Registry (http://www.chictr.org.cn) were searched. We included RCTs that tested the efficacy and safety of FZF for the treatment of heart failure, compared with placebo, CHFST, or placebo plus CHFST. The methodological quality of the included studies was evaluated with the Cochrane Collaboration’s tool for assessing risk of bias. RCTs with a Cochrane risk of bias (RoB) score ≥4 were included in the analysis. The meta-analysis was conducted with RevMan 5.2 software. The GRADE approach was used to assess the quality of the evidence. Results. Twelve RCTs with 1490 participants were identified. The studies compared FZF plus CHFST vs placebo plus CHFST (n = 4), FZF plus CHFST vs CHFST (n = 6), FZF plus digoxin tablets (DT) plus CHFST vs placebo plus DT plus CHFST (n = 1), and FZF plus placebo plus CHFST vs placebo plus DT plus CHFST (n = 1). Meta-analysis indicated that FZF provided additional benefits on top of CHFST in reducing plasma NT-proBNP level, MLHFQ scores, Lee’s heart failure scores (LHFs), and composite cardiac events (CCEs). FZF also improved efficacy on TCM symptoms (TCMs), NYHA functional classification (NYHAfc), 6MWD, and LVEF. Adverse events were reported in 6 of the 12 studies, without a statistically significant difference. However, after the strength of evidence was assessed, only the quality of evidence for CCEs was found to be high; that for the other outcomes was moderate, low, or very low. We therefore could not draw firm conclusions about additional benefits other than for CCEs. Further clinical trials should be designed to avoid the issues identified in this study. Conclusion. The efficacy and additional benefits of FZF for CCEs are supported by high-quality evidence as assessed through GRADE. However, the efficacy and additional benefits for the other outcomes remain uncertain on the basis of current studies. In addition, the safety assessment leaves considerable room for improvement. Further research is therefore needed to provide more convincing evidence.


