Unraveling Genome Evolution Throughout Visual Analysis: The XCout Portal

2021 ◽  
Vol 15 ◽  
pp. 117793222110214
Author(s):  
Sergio Diaz-del-Pino ◽  
Esteban Perez-Wohlfeil ◽  
Oswaldo Trelles

Due to major breakthroughs in sequencing technologies over the last decades, the time and cost per sequencing experiment have dropped drastically, overcoming the data generation barrier of the early genomic era. Such a shift has encouraged the scientific community to develop new computational methods that are able to compare large genomic sequences, thus enabling large-scale studies of genome evolution. The field of comparative genomics has proven itself invaluable for studying the evolutionary mechanisms and the forces driving genome evolution. In this line, a full genome comparison study between 2 species requires a quadratic number of comparisons in terms of the number of sequences (around 400 chromosome comparisons in the case of mammalian genomes); however, when studying conserved syntenies or evolutionary rearrangements, many sequence comparisons can be skipped because not all of them will contain significant signals. Subsequently, the scientific community has developed fast heuristics to perform multiple pairwise comparisons between large sequences to determine whether significant sets of conserved similarities exist. The data generation problem is no longer an issue, yet the limitations have shifted toward the analysis of such massive data. Therefore, we present XCout, a Web-based visual analytics application for multiple genome comparisons designed to improve the analysis of large-scale evolutionary studies using novel techniques in Web visualization. XCout enables users to work on hundreds of comparisons at once, thus reducing the time of the analysis by identifying significant signals between chromosomes across multiple species.
Among others, XCout introduces several techniques to aid in the analysis of large-scale genome rearrangements, particularly (1) an interactive heatmap interface to display comparisons using automatic color scales based on similarity thresholds to ease detection at first sight, (2) an overlay system to detect individual signal contributions between chromosomes, (3) a tracking tool to trace conserved blocks across different species to perform evolutionary studies, and (4) a search engine to search annotations throughout different species.
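The quadratic comparison count mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the chromosome names, the similarity scores, the threshold value, and the helper functions `chromosome_pairs` and `significant_pairs` are hypothetical and are not part of XCout.

```python
from itertools import product


def chromosome_pairs(chroms_a, chroms_b):
    """Return every cross-species chromosome pair (len(a) * len(b) pairs)."""
    return list(product(chroms_a, chroms_b))


def significant_pairs(scores, threshold):
    """Keep only the pairs whose similarity score reaches the threshold."""
    return {pair: s for pair, s in scores.items() if s >= threshold}


# Two hypothetical genomes with 20 chromosomes each (assumed sizes).
species_a = [f"A_chr{i}" for i in range(1, 21)]
species_b = [f"B_chr{i}" for i in range(1, 21)]

pairs = chromosome_pairs(species_a, species_b)
print(len(pairs))  # 400

# Made-up similarity scores for three pairs; the rest are assumed to be noise.
scores = {
    ("A_chr1", "B_chr1"): 0.82,
    ("A_chr1", "B_chr2"): 0.03,
    ("A_chr2", "B_chr5"): 0.41,
}
kept = significant_pairs(scores, threshold=0.4)
print(sorted(kept))
```

With about 20 chromosomes per genome, the pair count reaches the roughly 400 comparisons cited for mammalian genomes, and a simple threshold shows how most pairs could be skipped before any detailed inspection.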

2019 ◽  
Vol 13 ◽  
pp. 117793221882512 ◽  
Author(s):  
Sergio Diaz-del-Pino ◽  
Pablo Rodriguez-Brazzarola ◽  
Esteban Perez-Wohlfeil ◽  
Oswaldo Trelles

The emergence of data acquisition technologies has shifted the bottleneck in molecular biology research from data acquisition to data analysis. Such is the case in Comparative Genomics, where sequence analysis has transitioned from genes to genomes several orders of magnitude larger. This fact has revealed the need to adapt software to work with huge experiments efficiently and to incorporate new data-analysis strategies to manage results from such studies. In previous works, we presented GECKO, a software tool for comparing large sequences; now we address the representation, browsing, data exploration, and post-processing of the massive amount of information derived from such comparisons. GECKO-MGV is a web-based application organized as a client-server architecture. It is aimed at visual analysis of the results from both pairwise and multiple sequence comparison studies, combining a set of common commands for image exploration with improved state-of-the-art solutions. In addition, GECKO-MGV integrates different visualization analysis tools while exploiting the concept of layers to display multiple genome comparison datasets. Moreover, the software is endowed with capabilities for contacting external-proprietary and third-party services for further data post-processing and also presents a method to display a timeline of large-scale evolutionary events. As proof-of-concept, we present 2 exercises using bacterial and mammalian genomes which depict the capabilities of GECKO-MGV to perform in-depth, customizable analyses on the fly using web technologies. The first exercise is mainly descriptive and is carried out over bacterial genomes, whereas the second one aims to show the ability to deal with large sequence comparisons. In this case, we display results from the comparison of the first Homo sapiens chromosome against the first 5 chromosomes of Mus musculus.


2016 ◽  
Vol 16 (3) ◽  
pp. 205-216 ◽  
Author(s):  
Lorne Leonard ◽  
Alan M MacEachren ◽  
Kamesh Madduri

This article reports on the development and application of a visual analytics approach to big data cleaning and integration focused on very large graphs, constructed in support of national-scale hydrological modeling. We explain why large graphs are required for hydrological modeling and describe how we create two graphs using continental United States heterogeneous national data products. The first smaller graph is constructed by assigning level-12 hydrological unit code watersheds as nodes. Creating and cleaning graphs at this scale highlight the issues that cannot be addressed without high-resolution datasets and expert intervention. Expert intervention, aided by visual analytics tools, is necessary to address edge directions at the second graph scale: subdividing continental United States streams as edges (851,265,305) and nodes (683,298,991) for large-scale hydrological modeling. We demonstrate how large graph workflows are created and are used for automated analysis to prepare the user interface for visual analytics. We explain the design of the visual interface using a watershed case study and then discuss how the visual interface is used to engage the expert user to resolve data and graph issues.


Obesity Facts ◽  
2021 ◽  
pp. 1-11
Author(s):  
Marijn Marthe Georgine van Berckel ◽  
Saskia L.M. van Loon ◽  
Arjen-Kars Boer ◽  
Volkher Scharnhorst ◽  
Simon W. Nienhuijs

Introduction: Bariatric surgery results in both intentional and unintentional metabolic changes. In a high-volume bariatric center, extensive laboratory panels are used to monitor these changes pre- and postoperatively. Consecutive measurements of relevant biochemical markers allow exploration of the health state of bariatric patients and comparison of different patient groups. Objective: The objective of this study is to compare biomarker distributions over time between 2 common bariatric procedures, i.e., sleeve gastrectomy (SG) and gastric bypass (RYGB), using visual analytics. Methods: Both pre- and postsurgical (6, 12, and 24 months) data of all patients who underwent primary bariatric surgery were collected retrospectively. The distribution and evolution of different biochemical markers were compared before and after surgery using asymmetric beanplots in order to evaluate the effect of primary SG and RYGB. A beanplot is an alternative to the boxplot that allows an easy and thorough visual comparison of univariate data. Results: In total, 1,237 patients (659 SG and 578 RYGB) were included. The sleeve and bypass groups were comparable in terms of age and the prevalence of comorbidities. The mean presurgical BMI and the percentage of males were higher in the sleeve group. The effect of surgery on lowering of glycated hemoglobin was similar for both surgery types. After RYGB surgery, the decrease in the cholesterol concentration was larger than after SG. The enzymatic activity of aspartate aminotransferase, alanine aminotransferase, and alkaline phosphatase in sleeve patients was higher presurgically but lower postsurgically compared to bypass values. Conclusions: Beanplots allow intuitive visualization of population distributions. Analysis of this large population-based data set using beanplots suggests comparable efficacies of both types of surgery in reducing diabetes. RYGB surgery reduced dyslipidemia more effectively than SG. The trend toward a larger decrease in liver enzyme activities following SG is a subject for further investigation.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ratanond Koonchanok ◽  
Swapna Vidhur Daulatabad ◽  
Quoseena Mir ◽  
Khairi Reda ◽  
Sarath Chandra Janga

Background: Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at a single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Results: Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in a Fast5 format, cluster sequences based on electric-current similarities, and drill-down onto signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that characterize these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions: Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.


2021 ◽  
Vol 11 (11) ◽  
pp. 4751
Author(s):  
Jorge-Félix Rodríguez-Quintero ◽  
Alexander Sánchez-Díaz ◽  
Leonel Iriarte-Navarro ◽  
Alejandro Maté ◽  
Manuel Marco-Such ◽  
...  

Among the knowledge areas in which process mining has had an impact, the audit domain is particularly striking. Traditionally, audits seek evidence in a data sample that allows making inferences about a population. Mistakes are usually committed when generalizing the results, and anomalies therefore appear in unprocessed sets; however, there are some efforts to address these limitations using process-mining-based approaches for fraud detection. To the best of our knowledge, no fraud audit method exists that combines process mining techniques and visual analytics to identify relevant patterns. This paper presents a fraud audit approach based on the combination of process mining techniques and visual analytics. The main advantages are: (i) a method is included that guides the use of the visual capabilities of process mining to detect fraud data patterns during an audit; (ii) the approach can be generalized to any business domain; (iii) well-known process mining techniques are used (dotted chart, trace alignment, fuzzy miner…). The techniques were selected by a group of experts and were extended to enable filtering for contextual analysis, to handle levels of process abstraction, and to facilitate implementation in the area of fraud audits. Based on the proposed approach, we developed a software solution that is currently being used in the financial sector as well as in the telecommunications and hospitality sectors. Finally, for demonstration purposes, we present a real hotel management use case in which we detected suspected fraud behaviors, thus validating the effectiveness of the approach.


2019 ◽  
Vol 19 (1) ◽  
pp. 3-23
Author(s):  
Aurea Soriano-Vargas ◽  
Bernd Hamann ◽  
Maria Cristina F de Oliveira

We present an integrated interactive framework for the visual analysis of time-varying multivariate data sets. As part of our research, we performed in-depth studies concerning the applicability of visualization techniques to obtain valuable insights. We consolidated the considered analysis and visualization methods in one framework, called TV-MV Analytics. TV-MV Analytics effectively combines visualization and data mining algorithms providing the following capabilities: (1) visual exploration of multivariate data at different temporal scales, and (2) a hierarchical small multiples visualization combined with interactive clustering and multidimensional projection to detect temporal relationships in the data. We demonstrate the value of our framework for specific scenarios, by studying three use cases that were validated and discussed with domain experts.


2014 ◽  
Vol 53 (1) ◽  
pp. 191-200 ◽  
Author(s):  
Walter Demczuk ◽  
Tarah Lynch ◽  
Irene Martin ◽  
Gary Van Domselaar ◽  
Morag Graham ◽  
...  

A large-scale, whole-genome comparison of Canadian Neisseria gonorrhoeae isolates with high-level cephalosporin MICs was used to demonstrate a genomic epidemiology approach to investigate strain relatedness and dynamics. Although current typing methods have been very successful in tracing short-chain transmission of gonorrheal disease, investigating the temporal evolutionary relationships and geographical dissemination of highly clonal lineages requires enhanced resolution only available through whole-genome sequencing (WGS). Phylogenomic cluster analysis grouped 169 Canadian strains into 12 distinct clades. While some N. gonorrhoeae multiantigen sequence types (NG-MAST) agreed with specific phylogenomic clades or subclades, other sequence types (ST) and closely related groups of ST were widely distributed among clades. Decreased susceptibility to extended-spectrum cephalosporins (ESC-DS) emerged among a group of diverse strains in Canada during the 1990s with a variety of nonmosaic penA alleles, followed in 2000/2001 with the penA mosaic X allele and then in 2007 with ST1407 strains with the penA mosaic XXXIV allele. Five genetically distinct ESC-DS lineages were associated with penA mosaic X, XXXV, and XXXIV alleles and nonmosaic XII and XIII alleles. ESC-DS with coresistance to azithromycin was observed in 5 strains with 23S rRNA C2599T or A2143G mutations. As the costs associated with WGS decline and analysis tools are streamlined, WGS can provide a more thorough understanding of strain dynamics, facilitate epidemiological studies to better resolve social networks, and improve surveillance to optimize treatment for gonorrheal infections.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Smritikana Dutta ◽  
Anwesha Deb ◽  
Prasun Biswas ◽  
Sukanya Chakraborty ◽  
Suman Guha ◽  
...  

Bamboos, members of the family Poaceae, present many interesting features with respect to their fast and extended vegetative growth, unusual yet divergent flowering time across species, and the impact of sudden, large-scale flowering on forest ecology. However, not many studies have been conducted at the molecular level to characterize important genes that regulate vegetative and flowering habit in bamboo. In this study, two bamboo FD genes, BtFD1 and BtFD2, which are members of the florigen activation complex (FAC), have been identified by sequence and phylogenetic analyses. Sequence comparisons identified one important amino acid, which was located in the DNA-binding basic region and was altered between BtFD1 and BtFD2 (Ala146 of BtFD1 vs. Leu100 of BtFD2). Electrophoretic mobility shift assay revealed that this alteration resulted in ten times higher binding efficiency of BtFD1 than BtFD2 to its target ACGT motif present at the promoter of the APETALA1 gene. Expression analyses in different tissues and seasons indicated the involvement of BtFD1 in flower and vegetative development, while BtFD2 was expressed at very low levels throughout all the tissues and conditions studied. Finally, a tenfold increase of the AtAP1 transcript level in p35S::BtFD1 Arabidopsis plants compared to wild type confirms a positive regulatory role of BtFD1 towards flowering. However, constitutive expression of BtFD1 led to dwarfism and an apparent reduction in the length of the flowering stalk and the number of flowers per plant, whereas no visible phenotype was observed for BtFD2 overexpression. This signifies that timely expression of BtFD1 may be critical to perform its programmed developmental role in planta.


Author(s):  
Jianping Kelvin Li ◽  
Misbah Mubarak ◽  
Robert B. Ross ◽  
Christopher D. Carothers ◽  
Kwan-Liu Ma
