compositional bias
Recently Published Documents


Total documents: 68 (five years: 13)

H-index: 20 (five years: 2)

Molecules ◽  
2022 ◽  
Vol 27 (2) ◽  
pp. 423
Author(s):  
Broto Chakrabarty ◽  
Nita Parekh

Ankyrin is one of the most abundant protein repeat families found across all forms of life. Ankyrin repeats occur in a variety of multi-domain and single-domain proteins in humans, with a diverse number of repeating units. They are found in several functionally diverse proteins, such as transcriptional initiators, cell cycle regulators, cytoskeletal organizers, ion transporters, signal transducers, developmental regulators, and toxins; consequently, defects in ankyrin repeat proteins have been associated with a number of human diseases. In this study, we have classified the human ankyrin proteins into clusters based on the sequence similarity in their ankyrin repeat domains. We analyzed the amino acid compositional bias and consensus ankyrin motif sequence of the clusters to understand the diversity of the human ankyrin proteins. We carried out network-based structural analysis of human ankyrin proteins across different clusters and showed the association of conserved residues with topologically important residues identified by network centrality measures. The analysis of conserved and structurally important residues helps in understanding their role in structural stability and function of these proteins. In this paper, we also discuss the significance of these conserved residues in disease association across the human ankyrin protein clusters.
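
As an illustration of what an amino acid compositional bias measurement can look like, the sketch below compares residue frequencies in one group of sequences against a background group. It is a minimal example under stated assumptions, not the method used in the study: the function names, the log2 enrichment score, and the toy sequences are all hypothetical.

```python
from collections import Counter
import math


def residue_frequencies(seqs):
    """Pooled relative frequency of each residue across a list of sequences."""
    counts = Counter("".join(seqs).upper())
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def log2_enrichment(cluster_seqs, background_seqs):
    """log2(f_cluster / f_background) for residues present in both sets.

    Positive values mean the residue is over-represented in the cluster.
    The choice of background is an assumption left to the caller.
    """
    f_cluster = residue_frequencies(cluster_seqs)
    f_background = residue_frequencies(background_seqs)
    return {
        aa: math.log2(f_cluster[aa] / f_background[aa])
        for aa in f_cluster
        if aa in f_background
    }


if __name__ == "__main__":
    # Toy sequences, not real ankyrin repeats.
    scores = log2_enrichment(["AAAL", "AALG"], ["AALL", "GGAL"])
    for aa, score in sorted(scores.items()):
        print(f"{aa}: {score:+.2f}")
```

Residues absent from either set are skipped rather than given an infinite score; a pseudocount would be an alternative design choice.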


2021 ◽  
Vol 22 (15) ◽  
pp. 7912
Author(s):  
Rambon Shamilov ◽  
Victoria L. Robinson ◽  
Brian J. Aneskievich

Epidermal keratinocyte proteins include many with an eccentric amino acid content (compositional bias), atypical ultrastructural fate (built-in protease sensitivity), or assembly visible at the light microscope level (cytoplasmic granules). However, when considered through the looking glass of intrinsic disorder (ID), these apparent oddities seem quite expected. Keratinocyte proteins with highly repetitive motifs are of low complexity but high adaptation, providing polymers (e.g., profilaggrin) for proteolysis into bioactive derivatives, or monomers (e.g., loricrin) repeatedly cross-linked to self and other proteins to shield underlying tissue. Keratohyalin granules developing from liquid–liquid phase separation (LLPS) show that unique biomolecular condensates (BMC) and proteinaceous membraneless organelles (PMLO) occur in these highly customized cells. We conducted bioinformatic and in silico assessments of representative keratinocyte differentiation-dependent proteins. These assessments focused on proteins with demonstrated potential for ID, given the prospect that this characteristic drives formation of distinctive keratinocyte structures. Intriguingly, while ID is characteristic of many of these proteins, it does not appear to guarantee LLPS, nor is it required for incorporation into certain keratinocyte protein condensates. Further examination of keratinocyte-specific proteins will provide variations on the theme of PMLO, possibly recognizing new BMC for advancements in understanding intrinsically disordered proteins as reflected by keratinocyte biology.


2021 ◽  
Author(s):  
Jadranka Rota ◽  
Victoria Gwendoline Twort ◽  
Andrea Chiocchio ◽  
Carlos Pena ◽  
Christopher W. Wheat ◽  
...  

The field of molecular phylogenetics is being revolutionised by next-generation sequencing technologies, which make it possible to sequence large numbers of genomes for non-model organisms and usher us into the era of phylogenomics. The current challenge is no longer how to get enough data, but rather how to analyse the data and how to assess the support for the inferred phylogeny. We focus on one of the largest animal groups on the planet: butterflies and moths (order Lepidoptera). We clearly demonstrate that there are unresolved issues in the inferred phylogenetic relationships of the major lineages, despite several recent phylogenomic studies of the group. We assess the potential causes and consequences of the conflicting phylogenetic hypotheses. With a dataset consisting of 331 protein-coding genes and an alignment length of over 290 000 base pairs, including 200 taxa representing 81% of lepidopteran superfamilies, we compare phylogenetic hypotheses inferred from amino acid and nucleotide alignments. The resulting two phylogenies are discordant, especially with respect to the placement of the superfamily Gelechioidea, which is likely due to compositional bias of both the nucleotide and amino acid sequences. With a series of analyses, we dissect our dataset and demonstrate that there is sufficient phylogenetic signal to resolve much of the lepidopteran tree of life. Overall, the results from the nucleotide alignment are more robust to the various perturbations of the data that we carried out. However, the lack of support for much of the backbone within Ditrysia makes the current butterfly and moth tree of life still unresolved. We conclude that taxon sampling remains an issue even in phylogenomic analyses, and recommend that poorly sampled highly diverse groups, such as Gelechioidea in Lepidoptera, should receive extra attention in the future.


2021 ◽  
Vol 12 ◽  
Author(s):  
Ayan Roy ◽  
Fucheng Guo ◽  
Bhupender Singh ◽  
Shelly Gupta ◽  
Karan Paul ◽  
...  

The novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been spreading rapidly all over the world and has raised grave concern globally. The present research aims to conduct a robust base compositional analysis of SARS-CoV-2 to reveal adaptive intricacies to the human host. Multivariate statistical analysis revealed a complex interplay of various factors that shape codon usage patterns, including compositional constraint, natural selection, length of viral coding sequences, hydropathicity, and aromaticity of the viral gene products, with compositional bias being the most crucial determinant. UpG and CpA dinucleotides were found to be highly preferred, whereas the CpG dinucleotide was mostly avoided in SARS-CoV-2, a pattern consistent with the human host. Strict avoidance of the CpG dinucleotide might be attributed to a strategy for evading a human immune response. A lower degree of adaptation of SARS-CoV-2 to the human host, compared to Middle East respiratory syndrome (MERS) coronavirus and SARS-CoV, might be indicative of its milder clinical severity and progression compared with SARS and MERS. Similar patterns of enhanced adaptation between viral isolates from intermediate and human hosts, contrasted with those isolated from the natural bat reservoir, signify an indispensable role of the intermediate host in transmission dynamics and spillover events of the virus to human populations. The information regarding avoided codon pairs in SARS-CoV-2, as conferred by the present analysis, promises to be useful for the design of vaccines employing codon pair deoptimization based synthetic attenuated virus engineering.
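
Dinucleotide preference or avoidance of the kind described above is commonly summarised as an observed/expected ratio, rho_xy = f_xy / (f_x * f_y), where values well below 1 indicate avoidance. The sketch below computes that ratio for a single sequence. It is an illustrative assumption about how such a statistic can be calculated, not the pipeline used in the study; the function name and the toy input are hypothetical.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq):
    """Observed/expected ratio for every dinucleotide that occurs in seq.

    rho_xy = f_xy / (f_x * f_y), with f_xy taken over overlapping pairs.
    T is converted to U so DNA and RNA input give the same keys.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    total_pairs = n - 1
    return {
        pair: (count / total_pairs) / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, count in pair_counts.items()
    }


if __name__ == "__main__":
    # Toy sequence, not a SARS-CoV-2 genome.
    ratios = dinucleotide_odds_ratio("AUGCAUGUUGCACAUUGUGCA")
    for pair in ("UG", "CA", "CG"):
        print(pair, ratios.get(pair, 0.0))
```

Dinucleotides that never occur are simply absent from the result, so the caller decides whether to treat them as a ratio of zero.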


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jennifer L. Spillane ◽  
Troy M. LaPolice ◽  
Matthew D. MacManes ◽  
David C. Plachetzki

Abstract Background: Phylogenomic approaches have great power to reconstruct evolutionary histories; however, they rely on multi-step processes in which each stage has the potential to affect the accuracy of the final result. Many studies have empirically tested and established methodology for resolving robust phylogenies, including selecting appropriate evolutionary models, identifying orthologs, or isolating partitions with strong phylogenetic signal. However, few have investigated errors that may be initiated at earlier stages of the analysis. Biases introduced during the generation of the phylogenomic dataset itself could produce downstream effects on analyses of evolutionary history. Transcriptomes are widely used in phylogenomics studies, though there is little understanding of how a poor-quality assembly of these datasets could impact the accuracy of phylogenomic hypotheses. Here we examined how transcriptome assembly quality affects phylogenomic inferences by creating independent datasets from the same input data representing high-quality and low-quality transcriptome assembly outcomes. Results: By studying the performance of phylogenomic datasets derived from alternative high- and low-quality assembly inputs in a controlled experiment, we show that high-quality transcriptomes produce richer phylogenomic datasets with a greater number of unique partitions than low-quality assemblies. High-quality assemblies also give rise to partitions that have lower alignment ambiguity and less compositional bias. In addition, high-quality partitions hold stronger phylogenetic signal than their low-quality transcriptome assembly counterparts in both concatenation- and coalescent-based analyses. Conclusions: Our findings demonstrate the importance of transcriptome assembly quality in phylogenomic analyses and suggest that a portion of the uncertainty observed in such studies could be alleviated at the assembly stage.


Author(s):  
Chiara Papetti ◽  
Massimiliano Babbucci ◽  
Agnes Dettai ◽  
Andrea Basso ◽  
Magnus Lucassen ◽  
...  

Abstract The vertebrate mitochondrial genomes generally present a typical gene order. Exceptions are uncommon and are important for studying the genetic mechanisms of gene order rearrangements and their consequences for phylogenetic output and mitochondrial function. Antarctic notothenioid fish carry some peculiar rearrangements of the mitochondrial gene order. In this first systematic study of 28 species, we analysed known and undescribed mitochondrial genome rearrangements for a total of eight different gene orders within the notothenioid fish. Our reconstructions suggest that transpositions, duplications and inversions of multiple genes are the most likely mechanisms of rearrangement in notothenioid mitochondrial genomes. In Trematominae, we documented an extremely rare inversion of a large genomic segment of 5300 bp that partially affected the gene compositional bias but not the phylogenetic output. The genomic region delimited by nad5 and trnF, close to the area of the Control Region, was identified as the hot spot of variation in Antarctic fish mitochondrial genomes. Analysing the sequence of several intergenic spacers and mapping the arrangements on a newly generated phylogeny showed that the entire history of the Antarctic notothenioids is characterized by multiple, relatively rapid, events of disruption of the gene order. We hypothesised that a pre-existing genomic flexibility of the ancestor of the Antarctic notothenioids may have generated a precondition for gene order rearrangement, and the pressure of purifying selection could have worked for a rapid restoration of the mitochondrial functionality and compactness after each event of rearrangement.


Author(s):  
Marco Necci ◽  
Damiano Piovesan ◽  
Damiano Clementel ◽  
Zsuzsanna Dosztányi ◽  
Silvio C E Tosatto

Abstract Motivation: The earlier version of MobiDB-lite is currently used in large-scale proteome annotation platforms to detect intrinsic disorder. However, new theoretical models allow for the classification of intrinsically disordered regions into subtypes from sequence features associated with specific polymeric properties or compositional bias. Results: MobiDB-lite 3.0 maintains its previous speed and performance but also provides a finer classification of disorder by identifying regions with characteristics of polyampholytes, positive or negative polyelectrolytes, low-complexity regions, or regions enriched in cysteine, proline, glycine, or polar residues. Subregions are abundantly detected in IDRs of the human proteome. The new version of MobiDB-lite represents a new step for the proteome level analysis of protein disorder. Availability and implementation: Both the MobiDB-lite 3.0 source code and a docker container are available from the GitHub repository: https://github.com/BioComputingUP/MobiDB-lite
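
To make the charge-based subtypes named above concrete, the sketch below labels a region from its fraction of charged residues and its net charge per residue. This is not MobiDB-lite's implementation and does not use its API: the function names and both cutoff values are hypothetical assumptions chosen only for illustration.

```python
def charge_profile(region):
    """Fractions of positive (K, R) and negative (D, E) residues in a region."""
    region = region.upper()
    if not region:
        raise ValueError("region must not be empty")
    n = len(region)
    f_pos = sum(region.count(aa) for aa in "KR") / n
    f_neg = sum(region.count(aa) for aa in "DE") / n
    return {
        "f_pos": f_pos,
        "f_neg": f_neg,
        "fcr": f_pos + f_neg,   # fraction of charged residues
        "ncpr": f_pos - f_neg,  # net charge per residue
    }


def classify_region(region, fcr_cutoff=0.35, ncpr_cutoff=0.2):
    """Label a region by charge pattern; both cutoffs are illustrative assumptions."""
    profile = charge_profile(region)
    if profile["fcr"] < fcr_cutoff:
        return "weakly charged"
    if profile["ncpr"] >= ncpr_cutoff:
        return "positive polyelectrolyte"
    if profile["ncpr"] <= -ncpr_cutoff:
        return "negative polyelectrolyte"
    return "polyampholyte"


if __name__ == "__main__":
    # Toy regions, not taken from any proteome.
    for region in ("KRKRKRKR", "DEDEDEDE", "KEKEKEKE", "GSGSGSGS"):
        print(region, classify_region(region))
```

A region is called a polyampholyte here only when it is highly charged and close to neutral overall; histidine is ignored for simplicity.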


Author(s):  
Robert S de Moya ◽  
Kazunori Yoshizawa ◽  
Kimberly K O Walden ◽  
Andrew D Sweet ◽  
Christopher H Dietrich ◽  
...  

Abstract The insect order Psocodea is a diverse lineage comprising both parasitic (Phthiraptera) and non-parasitic members (Psocoptera). The extreme age and ecological diversity of the group may be associated with major genomic changes, such as base compositional biases expected to affect phylogenetic inference. Divergent morphology between parasitic and non-parasitic members has also obscured the origins of parasitism within the order. We conducted a phylogenomic analysis on the order Psocodea utilizing both transcriptome and genome sequencing to obtain a data set of 2,370 orthologous genes. All phylogenomic analyses, including both concatenated and coalescent methods, suggest a single origin of parasitism within the order Psocodea, resolving conflicting results from previous studies. This phylogeny allows us to propose a stable ordinal level classification scheme that retains significant taxonomic names present in historical scientific literature and reflects the evolution of the group as a whole. A dating analysis, with internal nodes calibrated by fossil evidence, suggests an origin of parasitism that predates the K-Pg boundary. Nucleotide compositional biases are detected in third and first codon positions and result in the anomalous placement of the Amphientometae as sister to Psocomorpha when all nucleotide sites are analyzed. Likelihood-mapping and quartet sampling methods demonstrate that base compositional biases can also have an effect on quartet-based methods.
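
Codon-position compositional bias of the kind mentioned above can be screened with a simple per-position GC content. The sketch below is a minimal example under stated assumptions, not the authors' analysis: it assumes an in-frame coding sequence, and the function name and toy input are hypothetical.

```python
def gc_by_codon_position(cds):
    """GC fraction at codon positions 1, 2 and 3 of an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this position
        fractions.append(sum(base in "GC" for base in bases) / len(bases))
    return tuple(fractions)


if __name__ == "__main__":
    # Toy coding sequence, not from any Psocodea taxon.
    gc1, gc2, gc3 = gc_by_codon_position("ATGGCCAAGTTTGGC")
    print(f"GC1={gc1:.2f} GC2={gc2:.2f} GC3={gc3:.2f}")
```

Comparing these three values across taxa is one way to spot heterogeneous composition before deciding whether to exclude or recode particular codon positions.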


2020 ◽  
Author(s):  
Jennifer L Spillane ◽  
Troy M LaPolice ◽  
Matthew D MacManes ◽  
David C Plachetzki

Abstract The empirical details of whole transcriptome sequencing and assembly have been thoroughly evaluated, but few studies have addressed how user-defined aspects of the assembly process may influence performance in phylogenomic analyses. Errors in transcriptome assembly could affect ortholog prediction, alignment quality, and phylogenetic signal. Here we investigate the impacts of transcriptome assembly quality in phylogenomic studies by constructing phylogenomic data matrices from alternative transcriptome assemblies representing high-quality and intentionally low-quality assembly outcomes. We leveraged a well-resolved topology for craniates to apply a topological constraint to our analyses, providing a way to quantify phylogenetic signal. Craniates are amply represented in publicly available raw RNA-seq repositories, allowing us to control for transcriptome tissue type as well. By studying the performance of phylogenomic datasets derived from these alternative high- and low-quality inputs in a controlled experiment, we show that high-quality transcriptomes produce richer phylogenomic datasets with partitions that have lower alignment ambiguity, less compositional bias, and stronger phylogenetic signal than low-quality transcriptome assemblies. Our findings demonstrate the importance of transcriptome assembly in phylogenomic analyses and suggest that a portion of the uncertainty observed in phylogenomic studies could be alleviated at the assembly stage.

