Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors

Genes ◽  
2011 ◽  
Vol 2 (3) ◽  
pp. 449-501 ◽  
Author(s):  
Alinda Nagy ◽  
György Szláma ◽  
Eszter Szarka ◽  
Mária Trexler ◽  
László Bányai ◽  
...  

Because the appearance of novel protein domain architectures (DA) is closely associated with biological innovations, there is growing interest in genome-scale reconstruction of the evolutionary history of the domain architectures of multidomain proteins. Such analyses, however, usually ignore the fact that a significant proportion of the Metazoan sequences analyzed are mispredicted, which may seriously affect the validity of their conclusions. To estimate the contribution of gene prediction errors to differences in the DA of predicted proteins, we used the high-quality, manually curated UniProtKB/Swiss-Prot database as a reference. For genome-scale analysis of the domain architectures of predicted proteins, we focused on RefSeq, EnsEMBL and NCBI’s GNOMON predicted sequences of Metazoan species with completely sequenced genomes. Comparison of the DA of UniProtKB/Swiss-Prot sequences of worm, fly, zebrafish, frog, chick, mouse, rat and orangutan with those of human Swiss-Prot entries identified relatively few cases where orthologs had different DA, although the percentage with different DA increased with evolutionary distance. In contrast, comparison of the DA of human, orangutan, rat, mouse, chicken, frog, zebrafish, worm and fly RefSeq, EnsEMBL and NCBI’s GNOMON predicted protein sequences with those of the corresponding/orthologous human Swiss-Prot entries identified a significantly higher proportion of domain architecture differences than the comparison of Swiss-Prot entries. Analysis of RefSeq, EnsEMBL and NCBI’s GNOMON predicted protein sequences whose DA differed from those of their Swiss-Prot orthologs confirmed that the higher rate of domain architecture differences is due to errors in gene prediction, the majority of which could be corrected with our FixPred protocol.
We have also demonstrated that contamination of databases with incomplete, abnormal or mispredicted sequences introduces a bias in DA differences, inasmuch as it increases the proportion of terminal over internal DA differences. Here we have shown that, for RefSeq, EnsEMBL and NCBI’s GNOMON predicted protein sequences of Metazoan species, the contribution of gene prediction errors to domain architecture differences of orthologs is comparable to or greater than that of true gene rearrangements. We have also demonstrated that domain architecture comparison may serve as a useful tool for quality control of gene predictions and may thus guide the correction of sequence errors. Our findings caution that earlier genome-scale studies based on comparison of predicted (frequently mispredicted) protein sequences may have reached some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. A reassessment of the DA evolution of orthologous and paralogous proteins is presented in an accompanying paper [1].
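A minimal sketch of how a DA difference between a predicted protein and its reference ortholog might be classified as terminal or internal, assuming each DA is represented as an ordered list of domain names from N- to C-terminus. The function name `classify_da_difference`, its labels and the domain names in the usage are hypothetical; this is not the FixPred protocol or the authors' exact definition.

```python
from typing import Sequence


def classify_da_difference(query: Sequence[str], reference: Sequence[str]) -> str:
    """Classify how a query domain architecture differs from a reference one.

    Hypothetical scheme: a difference is 'N-terminal' or 'C-terminal' if it
    touches only that end, 'internal' if both ends are shared, and
    'both termini' if neither end is shared.
    """
    q, r = tuple(query), tuple(reference)
    if q == r:
        return "identical"
    limit = min(len(q), len(r))
    # Length of the shared N-terminal prefix.
    prefix = 0
    while prefix < limit and q[prefix] == r[prefix]:
        prefix += 1
    # Length of the shared C-terminal suffix, not overlapping the prefix.
    suffix = 0
    while suffix < limit - prefix and q[-1 - suffix] == r[-1 - suffix]:
        suffix += 1
    n_terminal = prefix == 0
    c_terminal = suffix == 0
    if n_terminal and c_terminal:
        return "both termini"
    if n_terminal:
        return "N-terminal"
    if c_terminal:
        return "C-terminal"
    return "internal"


reference = ["SP", "EGF", "EGF", "Kringle"]
print(classify_da_difference(["EGF", "EGF", "Kringle"], reference))  # N-terminal
print(classify_da_difference(["SP", "EGF", "EGF"], reference))       # C-terminal
print(classify_da_difference(["SP", "EGF", "Kringle"], reference))   # internal
```

Tallying these labels over many ortholog pairs gives the terminal-to-internal ratio that, according to the abstract, is inflated by incomplete or mispredicted sequences.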

Genes ◽  
2011 ◽  
Vol 2 (3) ◽  
pp. 599-607
Author(s):  
Alinda Nagy ◽  
György Szláma ◽  
Eszter Szarka ◽  
Mária Trexler ◽  
László Bányai ◽  
...  

2021 ◽  
Author(s):  
Nathan Jawadi Chadi ◽  
Paul Saighi ◽  
Fabio Rocha Jimenez Vieira ◽  
Juliana Silva Bernardes

The characterization of protein function is one of the main challenges in bioinformatics. Proteins are often composed of individual units, termed domains, that can evolve independently. The domain architecture of a protein is the particular order and content of its domains. Some computational approaches predict the most likely domain architecture for a set of proteins. However, few visualization tools exist, and most of them are unavailable. Here we present DAVI, an efficient and user-friendly web server for protein domain architecture clustering and visualization. DAVI accepts the output of the most widely used domain architecture prediction tools and can also produce domain architectures for a set of protein sequences. It provides rich visualizations for comparing and analyzing domain architectures.
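To illustrate the difference between domain order and domain content, here is a minimal sketch that groups proteins by identical architecture (same domains in the same order) and, separately, by content alone. The protein IDs and domain names are hypothetical, and this is not DAVI's clustering algorithm.

```python
from collections import Counter, defaultdict


def cluster_by_architecture(proteins):
    """Group protein IDs whose domains occur in exactly the same order."""
    clusters = defaultdict(list)
    for pid, domains in proteins.items():
        clusters[tuple(domains)].append(pid)
    return dict(clusters)


def cluster_by_content(proteins):
    """Group protein IDs with the same domains and copy numbers, ignoring order."""
    clusters = defaultdict(list)
    for pid, domains in proteins.items():
        key = tuple(sorted(Counter(domains).items()))
        clusters[key].append(pid)
    return dict(clusters)


proteins = {
    "protA": ["SH3", "SH2", "Kinase"],
    "protB": ["SH3", "SH2", "Kinase"],
    "protC": ["SH2", "SH3", "Kinase"],
}
print(len(cluster_by_architecture(proteins)))  # 2
print(len(cluster_by_content(proteins)))       # 1
```

Here protC shares its domain content with protA and protB but not their domain order, so it forms a separate architecture cluster.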


2021 ◽  
Vol 149 ◽  
Author(s):  
Leeberk Raja Inbaraj ◽  
Sindhulina Chandrasingh ◽  
Nalini Arun Kumar ◽  
Jothi Suchitra ◽  
Abi Manesh

Varicella infection during pregnancy can have serious implications and, in some cases, a lethal outcome. Although epidemiological studies in developing countries reveal that a significant proportion of women may remain susceptible during pregnancy, no such estimate of susceptible women is available for India. We designed this study to estimate the prevalence of, and factors associated with, susceptibility to varicella among rural and urban pregnant women in South India. We prospectively recruited 430 pregnant women and analysed their serum varicella IgG antibodies as a surrogate for protection. We estimated seroprevalence, the validity of self-reported history of chickenpox and factors associated with varicella susceptibility. We found that 23% (95% CI 19.1–27.3%) of women were susceptible. Nearly a quarter (22.2%) of the susceptible women had a history of exposure to chickenpox at some time in the past or during the current pregnancy. Self-reported history of varicella had a positive predictive value of 82.4%. Negative history of chickenpox (adjusted prevalence ratio (PR) 1.85, 95% CI 1.15–3.0) and receiving antenatal care from a rural secondary hospital (adjusted PR 4.08, 95% CI 2.1–7.65) were significantly associated with susceptibility. We conclude that varicella susceptibility rates during pregnancy were high and that self-reported history of varicella may not be a reliable surrogate for protection.


Insects ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 652
Author(s):  
Hongwei Tan ◽  
Muhammad Naeem ◽  
Hussain Ali ◽  
Muhammad Shakeel ◽  
Haiou Kuang ◽  
...  

In Pakistan, Apis cerana, the Asian honeybee, has been used for honey production and pollination services. However, its genomic makeup and its phylogenetic relationship to populations in other countries remain unknown. We collected A. cerana samples from the main cerana-keeping region in Pakistan and performed whole genome sequencing. A total of 28 Gb of Illumina shotgun reads were generated and used to assemble the genome. The resulting genome assembly had a total length of 214 Mb, with a GC content of 32.77%. The assembly had a scaffold N50 of 2.85 Mb and a BUSCO completeness score of 99%, suggesting a remarkably complete genome sequence for A. cerana in Pakistan. The MAKER pipeline was used to annotate the genome sequence, and a total of 11,864 protein-coding genes were identified. Of these, 6750 genes were assigned at least one GO term, and 8813 genes were annotated with at least one protein domain. Genome-scale phylogenetic analysis indicated an unexpectedly close relationship between A. cerana in Pakistan and that in China, suggesting a potential human introduction of the species between the two countries. Our results will facilitate the genetic improvement and conservation of A. cerana in Pakistan.
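The scaffold N50 and GC content reported above are standard assembly statistics. A minimal sketch of both follows, assuming N50 is the length L such that scaffolds of length ≥ L together cover at least half of the total assembly length, and that GC content is computed over unambiguous A/C/G/T bases only (conventions for ambiguous bases vary). The toy inputs are hypothetical.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    covered = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            break
    return length


def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


print(n50([2, 3, 4, 5, 6, 7, 8]))  # 6
print(gc_content("ATGCNN"))        # 50.0
```

In the toy example the total is 35; the three longest scaffolds (8 + 7 + 6 = 21) are the first to cover at least half of it, so the N50 is 6.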


2006 ◽  
Vol 27 (5) ◽  
pp. 436-441 ◽  
Author(s):  
Lloyd N. Friedman ◽  
Esther R. Nash ◽  
June Bryant ◽  
Susan Henry ◽  
Julia Shi ◽  
...  

Objectives. To evaluate individuals at high risk for tuberculosis exposure who had a history of a positive tuberculin skin test (TST) result in order to determine the prevalence of unsuspected negative TST results. To confirm these findings with the QuantiFERON-TB test (QFT), an in vitro whole-blood assay that measures tuberculin-induced secretion of interferon-γ. Methods. This survey was conducted from November 2001 through December 2003 at 3 sites where TST screening is regularly done. Detailed histories and reviews of medical records were performed. TSTs were placed and read by 2 experienced healthcare workers, and blood was drawn for QFT. Any subject with a negative result of an initial TST during the study (induration diameter, <10 mm) underwent a second TST and a second QFT. The TST-negative group comprised individuals for whom both TSTs had an induration diameter of <10 mm. The confirmed-negative group comprised individuals for whom both TSTs yielded no detectable induration and results of both QFTs were negative. Results. A total of 67 immunocompetent subjects with positive results of a previous TST were enrolled in the study. Of 56 subjects who completed the TST protocol, 25 (44.6%; 95% confidence interval [CI], 31.6%-57.6%) were TST negative (P<.001). Of 31 subjects who completed the TST protocol and the QFT protocol, 8 (25.8%; 95% CI, 10.4%-41.2%) were confirmed negative (P<.005). Conclusions. A significant proportion of subjects with positive results of a previous TST were TST negative in this study, and a subset of these were confirmed negative. These individuals' TST status may have reverted or may never have been positive. It will be important in future studies to determine whether such individuals lack immunity to tuberculosis and whether they should be considered for reentry into tuberculosis screening programs.
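The abstract does not say how its confidence intervals were computed. As a sketch, the normal-approximation (Wald) interval p ± 1.96·sqrt(p(1−p)/n) reproduces the reported 10.4%–41.2% for 8 of 31 subjects; for 25 of 56 it gives 31.6%–57.7%, so the reported upper bound of 57.6% may reflect different rounding or a different method.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Proportion and normal-approximation (Wald) confidence interval, clipped to [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


for k, n in [(8, 31), (25, 56)]:
    p, lo, hi = wald_interval(k, n)
    print(f"{k}/{n}: {100 * p:.1f}% (95% CI {100 * lo:.1f}%-{100 * hi:.1f}%)")
```

The Wald interval is the simplest choice but behaves poorly for small n or proportions near 0 or 1, where Wilson or exact intervals are usually preferred.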


2018 ◽  
Vol 115 (26) ◽  
pp. 6703-6708 ◽  
Author(s):  
Andrea Scaiewicz ◽  
Michael Levitt

Between 2009 and 2016, the number of protein sequences from known species increased 10-fold, from 8 million to 85 million. About 80% of these sequences contain at least one region recognized by the conserved domain architecture retrieval tool (CDART) as a sequence motif. Motifs provide clues to biological function, but CDART often matches the same region of a protein with two or more profiles. Such synonyms complicate estimates of functional complexity. We perform full-linkage clustering of redundant profiles by finding maximum disjoint cliques; each cluster is replaced by a single representative profile to give what we term a unique function word (UFW). From 2009 to 2016, the number of sequence profiles used by CDART increased by 80%; the number of UFWs increased more slowly, by 30%, indicating that the number of UFWs may be saturating. The number of sequences matched by a single UFW (sequences with single domain architectures) increased as slowly as the number of different words, whereas the number of sequences matched by a combination of two or more UFWs (sequences with multiple domain architectures, MDAs) increased at the same rate as the total number of sequences. This combinatorial arrangement of a limited number of UFWs in MDAs accounts for the genomic diversity of protein sequences. Although eukaryotes and prokaryotes use very similar sets of “words” or UFWs (57% shared), their “sentences” (MDAs) are different (1.3% shared).
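Finding maximum cliques is NP-hard, and the abstract does not describe the exact procedure. Below is a minimal sketch of one greedy interpretation, assuming an undirected graph in which two profiles are linked when they match the same sequence region. It repeatedly extracts a maximum clique (found here with the basic Bron–Kerbosch algorithm), turns each clique into one UFW represented by its alphabetically first profile, and then labels a sequence as single (SDA) or multiple (MDA) domain architecture by how many distinct UFWs it hits. Profile and sequence IDs are hypothetical.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Collect all maximal cliques of the graph `adj` (basic Bron-Kerbosch, no pivot)."""
    if not p and not x:
        out.append(r)
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def greedy_disjoint_cliques(nodes, edges):
    """Repeatedly remove a maximum clique; each clique becomes one cluster."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    remaining = set(nodes)
    clusters = []
    while remaining:
        # Restrict the graph to profiles not yet assigned to a cluster.
        sub = {n: adj[n] & remaining for n in remaining}
        cliques = []
        bron_kerbosch(set(), set(remaining), set(), sub, cliques)
        # Largest clique first; ties broken by sorted member names.
        best = min(cliques, key=lambda c: (-len(c), sorted(c)))
        clusters.append(sorted(best))
        remaining -= best
    return clusters


def classify_sequences(hits, clusters):
    """Label each sequence 'SDA' (one UFW) or 'MDA' (two or more UFWs)."""
    ufw_of = {profile: cluster[0] for cluster in clusters for profile in cluster}
    return {
        seq: "MDA" if len({ufw_of[p] for p in profiles}) > 1 else "SDA"
        for seq, profiles in hits.items()
    }


profiles = ["cd1", "cd2", "cd3", "cd4", "cd5"]
overlaps = [("cd1", "cd2"), ("cd2", "cd3"), ("cd1", "cd3"), ("cd4", "cd5")]
clusters = greedy_disjoint_cliques(profiles, overlaps)
print(clusters)  # [['cd1', 'cd2', 'cd3'], ['cd4', 'cd5']]
print(classify_sequences({"seqA": ["cd1", "cd2"], "seqB": ["cd2", "cd5"]}, clusters))
# {'seqA': 'SDA', 'seqB': 'MDA'}
```

Because cd1 and cd2 are synonyms collapsed into one UFW, seqA counts as a single domain architecture even though two profiles matched it.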


Circulation ◽  
2008 ◽  
Vol 118 (suppl_18) ◽  
Author(s):  
Anne G Rosenfeld ◽  
Mohamud Daya ◽  
Vivian Christensen ◽  
Rebecca Rawson

Sudden cardiac death (SCD) is preceded by symptoms in a significant proportion of victims, with a median duration of up to 2 hours in some cases. The purpose of this study was to describe the characteristics of SCD victims with heralding symptoms who refused medical care. We conducted a secondary analysis of interview data from witnesses of 99 cases of out-of-hospital presumed myocardial infarction death with known symptoms. Qualitative description methods were used to analyze qualitative data. Logistic regression was used to test the influence of symptom type (chest pain vs. non-chest pain), history of heart disease, and age on refusal of medical care. A case was categorized as refusal of medical care only if the victim verbally expressed refusal in conversation with someone. Nineteen victims (19%) refused medical care; their mean age was 72 years. The majority were male (16/19, 84%). Fifteen cases involved persistent refusal, defined as refusing care until collapse (range <15 minutes to 60 hours). Four victims initially refused care and then permitted access to medical care. The suggestion to seek medical care came from someone else in all but one case and usually involved multiple attempts. The care options offered but refused included calling 911 or a doctor, as well as going to the hospital, emergency department or a doctor’s office. Reasons for refusing medical care (more than one reason in some cases) included attributing the symptoms to something not urgent (n=10), other obligations (n=3), expressed dislike of hospitals or doctors (n=4), and recent medical reassurance of health status (n=7). Controlling for age, victims with chest pain (vs. non-chest pain) symptoms were more likely to refuse medical care (OR = 3.56, p = .036, 95% CI = 1.09–11.66), and those with a history of heart disease were less likely to refuse it (OR = .16, p = .004, 95% CI = .05–.55).
Patients with chest pain and no history of heart disease are more likely to refuse advice to seek medical care. Public health messages about how to respond to cardiac symptoms should include strategies to overcome the reasons people refuse medical care. This research has received full or partial funding support from the American Heart Association, AHA Pacific/Mountain Affiliate (Alaska, Arizona, Colorado, Hawaii, Idaho, Montana, Oregon, Washington & Wyoming).


2021 ◽  
pp. 002580242110454
Author(s):  
Laureen Adewusi ◽  
Isabel Mark ◽  
Paige Wells ◽  
Aileen O’Brien

Individuals repeatedly detained under Section 136 (S136) of the Mental Health Act account for a significant proportion of all detentions. This study provides a detailed analysis of those repeatedly detained (‘repeat attenders’) at a London Mental Health Trust, identifying key demographic profiles compared with non-repeat attenders, describing core clinical characteristics and determining the degree to which a past history of abuse might be associated with these. All detentions to the S136 suite at South West London and St George's Mental Health NHS Trust over a 5-year period (2015–2020) were examined. Data were collected retrospectively from electronic records. A total of 1767 patients had been detained, of whom 81 were identified as ‘repeat attenders’ (having had ≥3 detentions to the S136 suite during the study period). Repeat attenders accounted for 400 detentions, 17.7% of all detentions. Compared with non-repeat attenders, repeat attenders included a higher proportion of females (49.4%, p = 0.0001) and a higher proportion of people of white ethnicity (85.2%, p = 0.001). Fifty-two repeat attenders (64%) reported being a victim of past abuse or trauma. Of repeat attenders who reported past abuse or trauma, a high proportion had diagnoses of personality disorders, with deliberate self-harm as the most common reason for detention. They were more commonly discharged home with community support than considered for hospital admission. In light of these findings, this paper discusses potential support strategies for those most vulnerable to repeated S136 detention, with the aim of reducing the ever-growing number of S136 detentions in the UK.


2019 ◽  
Vol 50 (8) ◽  
pp. 1390-1397 ◽  
Author(s):  
Joshua T. Jordan ◽  
Dale E. McNiel

Background. Much suicide research focuses on suicide attempt (SA) survivors. Given that more than half of the suicide decedent population dies on their first attempt, a significant proportion of the population that dies by suicide is overlooked in research. Little is known about persons who die by suicide on their first attempt, and characterizing this understudied population may improve efforts to identify more individuals at risk for suicide. Methods. Data were derived from the National Violent Death Reporting System, from 2005 to 2013. Suicide cases were included if the decedents were 18–89 years old and had a known circumstance leading to their death based on law enforcement and/or medical examiner reports. Decedents with and without a history of SA were compared on demographic, clinical, and suicide characteristics, and on circumstances that contributed to their suicide. Results. A total of 73 490 cases met criteria, and 57 920 (79%) died on their first SA. First attempt decedents were more likely to be male, married, African-American, and over 64. Demographic-adjusted models showed that first attempt decedents were more likely to use highly lethal methods, less likely to have a known mental health problem or to have disclosed their intent to others, and more likely to die in the context of physical health or criminal/legal problems. Conclusions. First attempt suicide decedents are demographically different from decedents with a history of SA, are more likely to use lethal methods and are more likely to die in the context of specific stressful life circumstances.


2020 ◽  
Vol 48 (W1) ◽  
pp. W72-W76 ◽  
Author(s):  
Vadim M Gumerov ◽  
Igor B Zhulin

Key steps in a computational study of protein function involve analysis of (i) relationships between homologous proteins, (ii) protein domain architecture and (iii) the gene neighborhoods in which the corresponding proteins are encoded. Each of these steps requires a separate computational task and set of tools. Currently, to relate protein features and gene neighborhood information to phylogeny, researchers need to prepare all the necessary data and combine them by hand, which is time-consuming and error-prone. Here, we present a new platform, TREND (tree-based exploration of neighborhoods and domains), which can perform all the necessary steps in an automated fashion and put the derived information into a phylogenomic context, thus making evolution-based protein function analysis more efficient. A rich set of adjustable components allows users to run the computational steps specific to their task. TREND is freely available at http://trend.zhulinlab.org.
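A minimal sketch of the gene-neighborhood step, assuming genes are given with a contig and a start coordinate and that a neighborhood is simply the target gene plus up to `flank` genes on each side on the same contig. The `Gene` class and `gene_neighborhood` function are hypothetical and do not reflect TREND's implementation or options; real analyses often also consider strand and intergenic distance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    locus: str
    contig: str
    start: int


def gene_neighborhood(genes, target, flank=2):
    """Return the target locus plus up to `flank` genes on each side, same contig, in order."""
    by_locus = {g.locus: g for g in genes}
    contig = by_locus[target].contig
    # Order the genes on the target's contig by start coordinate.
    ordered = sorted((g for g in genes if g.contig == contig), key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.locus == target)
    window = ordered[max(0, idx - flank): idx + flank + 1]
    return [g.locus for g in window]


genes = [Gene(f"g{i}", "contig1", 1000 * i) for i in range(1, 7)]
genes.append(Gene("g7", "contig2", 500))
print(gene_neighborhood(genes, "g3", flank=1))  # ['g2', 'g3', 'g4']
print(gene_neighborhood(genes, "g1"))           # ['g1', 'g2', 'g3']
```

Windows are truncated at contig ends, and genes on other contigs (g7 here) are never included.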

