Comparing orthology methods and their performance by recapitulating patterns of eukaryotic genome evolution

2020 ◽  
Author(s):  
Eva S. Deutekom ◽  
Berend Snel ◽  
Teunis J.P. van Dam

Abstract
Insights into the evolution of ancestral complexes and pathways are generally achieved through careful and time-intensive manual analysis, often using phylogenetic profiles of the constituent proteins. This manual analysis limits the possibility of including more protein-complex components, repeating the analyses for updated genome sets, or expanding the analyses to larger scales. Automated orthology inference should allow such large-scale analyses, but substantial differences between orthologous groups generated by different approaches are observed.
We evaluate orthology methods for their ability to recapitulate a number of observations that have been made with regard to genome evolution in eukaryotes. Specifically, we investigate phylogenetic profile similarity (co-occurrence of complexes), the Last Eukaryotic Common Ancestor’s gene content, pervasiveness of gene loss, and the overlap with manually determined orthologous groups. Moreover, we compare the inferred orthologies to each other.
We find that most orthology methods reconstruct a large Last Eukaryotic Common Ancestor, with substantial gene loss, and can predict interacting proteins reasonably well when applying phylogenetic co-occurrence. At the same time, derived orthologous groups show imperfect overlap with manually curated orthologous groups. There is no strong indication of which orthology method performs better than another on individual or all of these aspects. Counterintuitively, despite the orthology methods behaving similarly regarding large-scale evaluation, the obtained orthologous groups differ vastly from one another.
Availability and implementation
The data and code underlying this article are available in github and/or upon reasonable request to the corresponding author: https://github.com/ESDeutekom/ComparingOrthologies.
Summary
We compared multiple orthology inference methods by looking at how well they perform in recapitulating multiple observations made in eukaryotic genome evolution.
Co-occurrence of proteins is predicted fairly well by most methods and all show similar behaviour when looking at loss numbers and dynamics.
All the methods show imperfect overlap when compared to manually curated orthologous groups and when compared to orthologous groups of the other methods.
Differences are compared between methods by looking at how the inferred orthologies represent a high-quality set of manually curated orthologous groups.
We conclude that all methods behave similarly when describing general patterns in eukaryotic genome evolution. However, there are large differences within the orthologies themselves, arising from how a method can differentiate between distant homology, recent duplications, or classifying orthologous groups.

Author(s):  
Eva S Deutekom ◽  
Berend Snel ◽  
Teunis J P van Dam

Abstract Insights into the evolution of ancestral complexes and pathways are generally achieved through careful and time-intensive manual analysis often using phylogenetic profiles of the constituent proteins. This manual analysis limits the possibility of including more protein-complex components, repeating the analyses for updated genome sets or expanding the analyses to larger scales. Automated orthology inference should allow such large-scale analyses, but substantial differences between orthologous groups generated by different approaches are observed. We evaluate orthology methods for their ability to recapitulate a number of observations that have been made with regard to genome evolution in eukaryotes. Specifically, we investigate phylogenetic profile similarity (co-occurrence of complexes), the last eukaryotic common ancestor’s gene content, pervasiveness of gene loss and the overlap with manually determined orthologous groups. Moreover, we compare the inferred orthologies to each other. We find that most orthology methods reconstruct a large last eukaryotic common ancestor, with substantial gene loss, and can predict interacting proteins reasonably well when applying phylogenetic co-occurrence. At the same time, derived orthologous groups show imperfect overlap with manually curated orthologous groups. There is no strong indication of which orthology method performs better than another on individual or all of these aspects. Counterintuitively, despite the orthology methods behaving similarly regarding large-scale evaluation, the obtained orthologous groups differ vastly from one another. Availability and implementation The data and code underlying this article are available in github and/or upon reasonable request to the corresponding author: https://github.com/ESDeutekom/ComparingOrthologies.
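
The phylogenetic profile similarity mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use, so the Jaccard index below is an assumption, and the species names, group names and helper functions (`build_profiles`, `jaccard`) are hypothetical placeholders rather than anything from the paper's repository.

```python
from itertools import combinations

def build_profiles(orthologous_groups, species):
    """Turn each orthologous group into a presence/absence tuple over `species`.

    `orthologous_groups` maps a group name to the set of species in which
    at least one member of the group was found (toy input, not real data).
    """
    return {
        og: tuple(1 if sp in members else 0 for sp in species)
        for og, members in orthologous_groups.items()
    }

def jaccard(p, q):
    """Jaccard index of two presence/absence profiles (assumed measure)."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0

# Hypothetical toy data: five species, three orthologous groups.
species = ["sp1", "sp2", "sp3", "sp4", "sp5"]
ogs = {
    "OG_A": {"sp1", "sp2", "sp4"},
    "OG_B": {"sp1", "sp2", "sp4", "sp5"},
    "OG_C": {"sp3", "sp5"},
}

profiles = build_profiles(ogs, species)

# Pairwise similarity: groups that are gained and lost together score high,
# which is the signal used to predict interacting proteins.
scores = {
    (a, b): jaccard(profiles[a], profiles[b])
    for a, b in combinations(sorted(profiles), 2)
}

for pair, score in scores.items():
    print(pair, round(score, 2))
```

In this toy example OG_A and OG_B share most of their presences and so score highest, while OG_A and OG_C never co-occur. Any other profile distance could be substituted for `jaccard` without changing the rest of the sketch.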


2021 ◽  
Vol 13 (5) ◽  
pp. 1010
Author(s):  
Lehui Wei ◽  
Chunhua Jiang ◽  
Yaogai Hu ◽  
Ercha Aa ◽  
Wengeng Huang ◽  
...  

This study presents observations of nighttime spread F/ionospheric irregularities and spread Es at low and middle latitudes in the South East Asia longitude of China sectors during the recovery phase of the 7–9 September 2017 geomagnetic storm. Multiple observations, including a chain of three ionosondes located at a longitude of about 100°E, Swarm satellites, and Global Navigation Satellite System (GNSS) ROTI maps, were used to study the development process and evolution characteristics of the nighttime spread F/ionospheric irregularities at low and middle latitudes. Interestingly, spread F and intense spread Es were simultaneously observed by the three ionosondes during the recovery phase. Moreover, associated ionospheric irregularities could be observed by Swarm satellites and ground-based GNSS ionospheric TEC. Nighttime spread F and spread Es at low and middle latitudes might be due to multiple off-vertical reflection echoes from large-scale tilts in the bottom ionosphere. In addition, we found that the periods of the ionospheric disturbances are ~1 h at ZHY station, ~1.5 h at LSH station and ~1 h at PUR station. This suggests that the large-scale tilts in the bottom ionosphere might be produced by large-scale traveling ionospheric disturbances (LSTIDs), which might be induced by the high-latitude energy inputs during the recovery phase of this storm. Furthermore, the associated ionospheric irregularities observed by satellites and ground-based GNSS receivers might be caused by the local electric field induced by LSTIDs.


2020 ◽  
Author(s):  
Xiaoqing Wang ◽  
Collin Tokheim ◽  
Binbin Wang ◽  
Shengqing Stan Gu ◽  
Qin Tang ◽  
...  

SUMMARY
Despite remarkable clinical efficacies of immune checkpoint blockade (ICB) in cancer treatment, ICB benefits in triple-negative breast cancer (TNBC) remain limited. Through pooled in vivo CRISPR knockout (KO) screens in syngeneic TNBC mouse models, we found that inhibition of the E3 ubiquitin ligase Cop1 in cancer cells decreases the secretion of macrophage-associated chemokines, reduces tumor macrophage infiltration, and shows synergy in anti-tumor immunity with ICB. Transcriptomics, epigenomics, and proteomics analyses revealed Cop1 functions through proteasomal degradation of the C/ebpδ protein. Cop1 substrate Trib2 functions as a scaffold linking Cop1 and C/ebpδ, which leads to polyubiquitination of C/ebpδ. Cop1 inhibition stabilizes C/ebpδ to suppress the expression of macrophage chemoattractant genes. Our integrated approach implicates Cop1 as a target for improving cancer immunotherapy efficacy by regulating chemokine secretion and macrophage levels in the TNBC tumor microenvironment.
Highlights
Large-scale in vivo CRISPR screens identify new immune targets regulating the tumor microenvironment
Cop1 knockout in cancer cells enhances anti-tumor immunity
Cop1 modulates chemokine secretion and macrophage infiltration into tumors
Cop1 targets C/ebpδ degradation via Trib2 and influences ICB response


2020 ◽  
Author(s):  
Anik Banik ◽  
Md. Fuad Mondal ◽  
Md. Mostafigur Rahman Khan ◽  
Sheikh Rashel Ahmed ◽  
Md. Mehedi Hasan

Abstract
The locust problem is a global threat to food security. Locusts can fly and migrate overseas very quickly, causing large-scale devastation to diverse agro-ecosystems. GIS-based analysis showed the recent movement of locusts; among them, Schistocerca gregaria and Locusta migratoria are predominant in the Indian subcontinent and are the most notorious and devastating. This devastation needs to be stopped to save the human race from food deprivation. In our study, we screened some commonly used agricultural pesticides and strongly recommend three of them, viz. biphenthrin, diafenthiuron and silafluofen, which might have the potential to control desert locusts based on their binding affinity towards the locust’s survival proteins. Our phylogenetic analysis suggests that these three pesticides might also be potent against other locust species, and they are also considerably safer than other commercially available pesticides. Bioactive analogs of the proposed pesticides from fungi and bacteria may also be effective as next-generation control measures against locusts and other kinds of pests. The recommended pesticides are expected to be highly effective against locusts and need to be taken forward by entomologists through experimental field trials.
Highlights
GIS map unmasked the 2020 migratory pattern of locusts, which is now predominantly towards the Indian subcontinent.
Biphenthrin, diafenthiuron and silafluofen showed maximum binding affinity.
Biphenthrin and diafenthiuron were relatively safer than silafluofen.
Bioactive analogs from fungi and bacteria could be an alternative to control locusts.
Pesticide inhibition hotspots for desert locusts were uncovered.


2019 ◽  
Author(s):  
Wojciech Michalak ◽  
Vasileios Tsiamis ◽  
Veit Schwämmle ◽  
Adelina Rogowska-Wrzesińska

Abstract
We have developed ComplexBrowser, an open source, online platform for supervised analysis of quantitative proteomics data that focuses on protein complexes. The software uses information from CORUM and Complex Portal databases to identify protein complex components. Based on the expression changes of individual complex subunits across the proteomics experiment it calculates Complex Fold Change (CFC) factor that characterises the overall protein complex expression trend and the level of subunit co-regulation. Thus up- and down-regulated complexes can be identified. It provides interactive visualisation of protein complexes composition and expression for exploratory analysis. It also incorporates a quality control step that includes normalisation and statistical analysis based on Limma test. ComplexBrowser performance was tested on two previously published proteomics studies identifying changes in protein expression in human adenocarcinoma tissue and during activation of mouse T-cells. The analysis revealed 1519 and 332 protein complexes, of which 233 and 41 were found co-ordinately regulated in the respective studies. The adopted approach provided evidence for a shift to glucose-based metabolism and high proliferation in adenocarcinoma tissues and identification of chromatin remodelling complexes involved in mouse T-cell activation. The results correlate with the original interpretation of the experiments and also provide novel biological details about protein complexes affected. ComplexBrowser is, to our knowledge, the first tool to automate quantitative protein complex analysis for high-throughput studies, providing insights into protein complex regulation within minutes of analysis.
A fully functional demo version of ComplexBrowser v1.0 is available online via http://computproteomics.bmb.sdu.dk/Apps/ComplexBrowser/
The source code can be downloaded from: https://bitbucket.org/michalakw/complexbrowser
Highlights
Automated analysis of protein complexes in proteomics experiments
Quantitative measure of the coordinated changes in protein complex components
Interactive visualisations for exploratory analysis of proteomics results
In brief
ComplexBrowser is capable of identifying protein complexes in datasets obtained from large scale quantitative proteomics experiments. It provides, in the form of the CFC factor, a quantitative measure of the coordinated changes in complex components. This facilitates assessing the overall trends in the processes governed by the identified protein complexes providing a new and complementary way of interpreting proteomics experiments.


2021 ◽  
Author(s):  
Tanima Arora ◽  
Michael Simonov ◽  
Jameel Alausa ◽  
Labeebah Subair ◽  
Brett Gerber ◽  
...  

ABSTRACT
Background
The COVID-19 pandemic has led to an explosion of research publications spanning epidemiology, basic and clinical science. While a digital revolution has allowed for open access to large datasets enabling real-time tracking of the epidemic, detailed, locally-specific clinical data has been less readily accessible to a broad range of academic faculty and their trainees. This perpetuates the separation of the primary missions of clinically-focused and primary research faculty resulting in lost opportunities for improved understanding of the local epidemic; expansion of the scope of scholarship; limitation of the diversity of the research pool; lack of creation of initiatives for growth and dissemination of research skills needed for the training of the next generation of clinicians and faculty.
Objectives
Create a common, easily accessible and up-to-date database that would promote access to local COVID-19 clinical data, thereby increasing efficiency, streamlining and democratizing the research enterprise. By providing a robust dataset, a broad range of researchers (faculty, trainees) and clinicians are encouraged to explore and collaborate on novel clinically relevant research questions.
Methods
We constructed a research platform called the Yale Department of Medicine COVID-19 Explorer and Repository (DOM-CovX), to house cleaned, highly granular, de-identified, continually-updated data from over 7,000 patients hospitalized with COVID-19 (1/2020-present) across the Yale New Haven Health System. This included a front-end user interface for simple data visualization of aggregate data and more detailed clinical datasets for researchers after a review board process. The goal is to promote access to local COVID-19 clinical data, thereby increasing efficiency, streamlining and democratizing the research enterprise.
Expected Outcomes
Accelerate generation of new knowledge and increase scholarly productivity with particular local relevance
Improve the institutional academic climate by:
  Broadening research scope
  Expanding research capability to more diverse group of stakeholders including clinical and research-based faculty and trainees
  Enhancing interdepartmental collaborations
Conclusions
The DOM-CovX Data Explorer and Repository have great potential to increase academic productivity. By providing an accessible tool for simple data analysis and access to a consistently updated, standardized and large-scale dataset, it overcomes barriers for a wide variety of researchers. Beyond academic productivity, this innovative approach represents an opportunity to improve the institutional climate by fostering collaboration, diversity of scholarly pursuits and expanding medical education. It provides a novel approach that can be expanded to other diseases beyond COVID 19.


Author(s):  
Denali Molitor ◽  
Deanna Needell

Abstract In today’s data-driven world, storing, processing and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference methods for analyzing compressed data are necessary. Building on a recently designed simple framework for classification using binary data, we demonstrate that one can improve classification accuracy of this approach through iterative applications whose output serves as input to the next application. As a side consequence, we show that the original framework can be used as a data preprocessing step to improve the performance of other methods, such as support vector machines. For several simple settings, we showcase the ability to obtain theoretical guarantees for the accuracy of the iterative classification method. The simplicity of the underlying classification framework makes it amenable to theoretical analysis.


2002 ◽  
Vol 29 (2) ◽  
pp. 449-488 ◽  
Author(s):  
DOUGLAS BIBER ◽  
RANDI REPPEN ◽  
SUSAN CONRAD

In their conceptual framework for linguistic literacy development, Ravid & Tolchinsky synthesize research studies from several perspectives. One of these is corpus-based research, which has been used for several large-scale research studies of spoken and written registers over the past 20 years. In this approach, a large, principled collection of natural texts (a ‘corpus’) is analysed using computational and interactive techniques, to identify the salient linguistic characteristics of each register or text variety. Three characteristics of corpus-based analysis are particularly important (see Biber, Conrad & Reppen 1998):
• a special concern for the representativeness of the text sample being analysed, and for the generalizability of findings;
• overt recognition of the interactions among linguistic features: the ways in which features co-occur and alternate;
• a focus on register as the most important parameter of linguistic variation: strong patterns of use in one register often represent only weak patterns in other registers.

