Combining Strengths for Multi-genome Visual Analytics Comparison

2019 ◽  
Vol 13 ◽  
pp. 117793221882512 ◽  
Author(s):  
Sergio Diaz-del-Pino ◽  
Pablo Rodriguez-Brazzarola ◽  
Esteban Perez-Wohlfeil ◽  
Oswaldo Trelles

The rapid emergence of data acquisition technologies has shifted the bottleneck in molecular biology research from data acquisition to data analysis. Such is the case in comparative genomics, where sequence analysis has transitioned from genes to genomes several orders of magnitude larger. This fact has revealed the need to adapt software to work efficiently with huge experiments and to incorporate new data-analysis strategies to manage the results of such studies. In previous works, we presented GECKO, a software package to compare large sequences; now we address the representation, browsing, data exploration, and post-processing of the massive amount of information derived from such comparisons. GECKO-MGV is a web-based application organized as a client-server architecture. It is aimed at visual analysis of the results from both pairwise and multiple sequence comparison studies, combining a set of common commands for image exploration with improved state-of-the-art solutions. In addition, GECKO-MGV integrates different visualization analysis tools while exploiting the concept of layers to display multiple genome comparison datasets. Moreover, the software is endowed with capabilities for contacting external proprietary and third-party services for further data post-processing and also presents a method to display a timeline of large-scale evolutionary events. As proof of concept, we present 2 exercises using bacterial and mammalian genomes which depict the capabilities of GECKO-MGV to perform in-depth, customizable analyses on the fly using web technologies. The first exercise is mainly descriptive and is carried out over bacterial genomes, whereas the second aims to show the ability to deal with large sequence comparisons. In this case, we display results from the comparison of the first Homo sapiens chromosome against the first 5 chromosomes of Mus musculus.
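As a rough illustration of the layered dot-plot view this abstract describes (not GECKO-MGV's actual code; the data and comparison labels below are made up), a minimal Python sketch might overlay several pairwise comparisons as coloured layers:

```python
# A minimal dot-plot "layer" sketch (illustrative only; the data and labels
# below are made up and not GECKO-MGV's actual data model).
import matplotlib.pyplot as plt

def plot_layers(layers):
    """Overlay several pairwise comparisons as coloured layers on one dot plot."""
    fig, ax = plt.subplots()
    colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    for color, (label, hits) in zip(colors, layers.items()):
        for i, (x0, x1, y0, y1) in enumerate(hits):
            # Each hit is a high-scoring segment pair: a diagonal on the dot plot.
            ax.plot([x0, x1], [y0, y1], color=color, alpha=0.6,
                    label=label if i == 0 else None)
    ax.set_xlabel("Reference sequence position (bp)")
    ax.set_ylabel("Query sequence position (bp)")
    ax.legend()
    plt.show()

if __name__ == "__main__":
    layers = {
        "comparison A": [(0, 1_000_000, 0, 980_000),
                         (1_200_000, 2_000_000, 1_150_000, 1_900_000)],
        "comparison B": [(0, 900_000, 50_000, 940_000)],
    }
    plot_layers(layers)
```

Each entry corresponds to one comparison, mirroring the idea of stacking multiple genome comparison datasets as layers in a single view.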

2021 ◽  
Vol 15 ◽  
pp. 117793222110214
Author(s):  
Sergio Diaz-del-Pino ◽  
Esteban Perez-Wohlfeil ◽  
Oswaldo Trelles

Due to major breakthroughs in sequencing technologies throughout the last decades, the time and cost per sequencing experiment have dropped drastically, overcoming the data generation barrier of the early genomic era. Such a shift has encouraged the scientific community to develop new computational methods that are able to compare large genomic sequences, thus enabling large-scale studies of genome evolution. The field of comparative genomics has proven itself invaluable for studying the evolutionary mechanisms and the forces driving genome evolution. In this line, a full genome comparison study between 2 species requires a quadratic number of comparisons in terms of the number of sequences (around 400 chromosome comparisons in the case of mammalian genomes); however, when studying conserved syntenies or evolutionary rearrangements, many sequence comparisons can be skipped, since not all of them will contain significant signals. Consequently, the scientific community has developed fast heuristics to perform multiple pairwise comparisons between large sequences and determine whether significant sets of conserved similarities exist. The data generation problem is no longer an issue, yet the limitations have shifted toward the analysis of such massive data. Therefore, we present XCout, a web-based visual analytics application for multiple genome comparisons designed to improve the analysis of large-scale evolutionary studies using novel techniques in web visualization. XCout enables working on hundreds of comparisons at once, thus reducing analysis time by identifying significant signals between chromosomes across multiple species. Among others, XCout introduces several techniques to aid in the analysis of large-scale genome rearrangements, particularly (1) an interactive heatmap interface that displays comparisons using automatic color scales based on similarity thresholds to ease detection at first sight, (2) an overlay system to detect individual signal contributions between chromosomes, (3) a tracking tool to trace conserved blocks across different species for evolutionary studies, and (4) a search engine to query annotations across species.
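A minimal sketch of the thresholded heatmap idea, using synthetic similarity scores rather than XCout's real data model:

```python
# Illustrative sketch of a thresholded chromosome-vs-chromosome heatmap,
# in the spirit of XCout's heatmap interface (not its actual implementation).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical similarity scores: human chromosomes (rows) vs mouse chromosomes (cols).
scores = rng.random((23, 20))

threshold = 0.7                                 # similarity cut-off chosen by the analyst
masked = np.ma.masked_less(scores, threshold)   # hide weak signals at first sight

fig, ax = plt.subplots()
im = ax.imshow(masked, cmap="viridis", vmin=threshold, vmax=1.0)
ax.set_xlabel("Mus musculus chromosome")
ax.set_ylabel("Homo sapiens chromosome")
fig.colorbar(im, label="similarity score")
plt.show()
```

Masking cells below an analyst-chosen threshold is one simple way to make significant chromosome pairs stand out at first sight.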


SIMULATION ◽  
2020 ◽  
Vol 96 (7) ◽  
pp. 567-581
Author(s):  
John P Morrissey ◽  
Prabhat Totoo ◽  
Kevin J Hanley ◽  
Stefanos-Aldo Papanicolopulos ◽  
Jin Y Ooi ◽  
...  

Regardless of its origin, in the near future the challenge will not be how to generate data, but rather how to manage big and highly distributed data to make it more easily handled and more accessible by users on their personal devices. VELaSSCo (Visualization for Extremely Large-Scale Scientific Computing) is a platform developed to provide new visual analysis methods for large-scale simulations serving the petabyte era. The platform adopts Big Data tools/architectures to enable in-situ processing for analytics of engineering and scientific data and hardware-accelerated interactive visualization. In large-scale simulations, the domain is partitioned across several thousand nodes, and the data (mesh and results) are stored on those nodes in a distributed manner. The VELaSSCo platform accesses this distributed information, processes the raw data, and returns the results to the users for local visualization by their specific visualization clients and tools. The global goal of VELaSSCo is to provide Big Data tools for the engineering and scientific community, in order to better manipulate simulations with billions of distributed records. The ability to easily handle large amounts of data will also enable larger, higher resolution simulations, which will allow the scientific and engineering communities to garner new knowledge from simulations previously considered too large to handle. This paper shows, by means of selected Discrete Element Method (DEM) simulation use cases, that the VELaSSCo platform facilitates distributed post-processing and visualization of large engineering datasets.
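The in-situ, distributed post-processing idea can be approximated locally with a toy map-reduce sketch; the partitioning, DEM quantity, and process pool below are illustrative assumptions, not the VELaSSCo architecture:

```python
# A toy sketch of distributed post-processing: each partition computes a partial
# result locally and only the small summaries travel back for visualization.
# (Illustrative only; VELaSSCo itself builds on Big Data frameworks, not this code.)
from multiprocessing import Pool
import numpy as np

def partial_histogram(partition):
    """Per-node work: histogram of particle speeds stored on that partition."""
    counts, _ = np.histogram(partition, bins=20, range=(0.0, 10.0))
    return counts

def merge(partials):
    """Reduce step: combine per-partition histograms into a global one."""
    return np.sum(partials, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical DEM result: particle speeds scattered over 8 "nodes".
    partitions = [rng.uniform(0, 10, size=100_000) for _ in range(8)]
    with Pool() as pool:
        global_hist = merge(pool.map(partial_histogram, partitions))
    print(global_hist)
```

Moving only the per-partition summaries, rather than the raw records, is the property that keeps very large simulation results tractable for interactive visualization on a client.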


2012 ◽  
Vol 11 (3) ◽  
pp. 190-204 ◽  
Author(s):  
Narges Mahyar ◽  
Ali Sarvghad ◽  
Melanie Tory

In an observational study, we noticed that record-keeping plays a critical role in the overall process of collaborative visual data analysis. Record-keeping involves recording material for later use, ranging from data about the visual analysis processes and visualization states to notes and annotations that externalize user insights, findings, and hypotheses. In our study, co-located teams worked on collaborative visual analytics tasks using large interactive wall and tabletop displays. Part of our findings is a collaborative data analysis framework that encompasses record-keeping as one of the main activities. In this paper, our primary focus is on note-taking activity. Based on our observations, we characterize notes according to their content, scope, and usage, and describe how they fit into a process of collaborative data analysis. We then discuss suggestions to improve the design of note-taking functionality for co-located collaborative visual analytics tools.
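As an illustrative (and entirely hypothetical) data structure, a note record capturing the content/scope/usage characterization and a link to the originating visualization state might look like this:

```python
# A minimal, hypothetical note record reflecting the characterization of notes by
# content, scope, and usage; the field names are our own, not the authors'.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Note:
    content: str                    # externalized finding, insight, or hypothesis
    scope: str                      # e.g. "personal" or "shared with the team"
    usage: str                      # e.g. "reminder", "evidence", "to-do"
    linked_state: dict = field(default_factory=dict)  # visualization state snapshot
    tags: List[str] = field(default_factory=list)

note = Note(content="Cluster of outliers in the Q3 subset",
            scope="shared", usage="evidence",
            linked_state={"view": "scatterplot", "filter": "region == 'EU'"})
```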


2019 ◽  
Author(s):  
Anton Korsakov ◽  
Dmitriy Lagerev ◽  
Leonid Pugach ◽  
...  

The relevance of the research is due to the complexity of the exploratory data analysis stage, in which hypotheses are advanced for further verification by statistical and/or data mining methods. Objective: to apply methods of visual analysis and cognitive visualization for exploratory analysis and to advance preliminary hypotheses while analyzing the dynamics of stillbirth among boys and girls across all areas of the Bryansk region with differing densities of radioactive contamination by the long-lived radionuclides Cesium-137 (137Cs) and Strontium-90 (90Sr), on the basis of official statistics for the long-term period 1986-2016. Research methods: visual analytics and cognitive visualization; mathematical statistics (Shapiro-Wilk test, Student's t-test, homoscedasticity test, linear regression). Research results: the results confirm the feasibility of using visual analytics and cognitive visualization methods for exploratory analysis and the advancement of preliminary hypotheses. The use of cognitive visualization during exploratory data analysis allows the researcher to better understand the main trends and patterns in the analyzed data. This makes it possible to reduce the time required to form hypotheses by a factor of two to three and to improve the quality of the hypotheses put forward.
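For readers unfamiliar with the named tests, a hedged Python sketch with synthetic area-level rates (not the study's data) shows how the same checks can be run with scipy:

```python
# A hedged sketch of the named statistical checks on two hypothetical groups of
# area-level stillbirth rates (synthetic numbers, not the study's data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
low_contamination = rng.normal(5.0, 1.0, size=30)   # stillbirths per 1000 births
high_contamination = rng.normal(6.0, 1.2, size=30)

# Normality (Shapiro-Wilk) and equality of variances (homoscedasticity, Levene).
print(stats.shapiro(low_contamination))
print(stats.levene(low_contamination, high_contamination))

# Student's t-test for a difference between the two groups.
print(stats.ttest_ind(low_contamination, high_contamination, equal_var=True))

# Linear trend over the observation period (1986-2016) for one synthetic series.
years = np.arange(1986, 2017)
rates = 5.0 + 0.03 * (years - 1986) + rng.normal(0, 0.3, size=years.size)
print(stats.linregress(years, rates))
```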


2020 ◽  
Vol 7 (4) ◽  
pp. 200128 ◽  
Author(s):  
Gavin Minty ◽  
Alex Hoppen ◽  
Ines Boehm ◽  
Abrar Alhindi ◽  
Larissa Gibb ◽  
...  

Large-scale data analysis of synaptic morphology is becoming increasingly important to the field of neurobiological research (e.g. ‘connectomics’). In particular, a detailed knowledge of neuromuscular junction (NMJ) morphology has proven to be important for understanding the form and function of synapses in both health and disease. The recent introduction of a standardized approach to the morphometric analysis of the NMJ—‘NMJ-morph’—has provided the first common software platform with which to analyse and integrate NMJ data from different research laboratories. Here, we describe the design and development of a novel macro—‘automated NMJ-morph’ or ‘aNMJ-morph’—to update and streamline the original NMJ-morph methodology. ImageJ macro language was used to encode the complete NMJ-morph workflow into seven navigation windows that generate robust data for 19 individual pre-/post-synaptic variables. The aNMJ-morph scripting was first validated against reference data generated by the parent workflow to confirm data reproducibility. aNMJ-morph was then compared with the parent workflow in large-scale data analysis of original NMJ images (240 NMJs) by multiple independent investigators. aNMJ-morph conferred a fourfold increase in data acquisition rate compared with the parent workflow, with average analysis times reduced to approximately 1 min per NMJ. Strong concordance was demonstrated between the two approaches for all 19 morphological variables, confirming the robust nature of aNMJ-morph. aNMJ-morph is a freely available and easy-to-use macro for the rapid and robust analysis of NMJ morphology and offers significant improvements in data acquisition and learning curve compared to the original NMJ-morph workflow.
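As a rough illustration of the concordance check reported here, and not the authors' validation code, paired measurements from the two workflows can be correlated per variable; the variable names below are placeholders, not the 19 real ones:

```python
# A simple sketch of a per-variable concordance check between two workflows
# (synthetic data; placeholder variable names).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(7)
n_nmjs = 240
variables = ["axon_diameter", "endplate_area", "nerve_terminal_area"]

for var in variables:
    manual = rng.normal(100, 15, size=n_nmjs)            # parent NMJ-morph values
    automated = manual + rng.normal(0, 3, size=n_nmjs)   # aNMJ-morph values
    r, p = pearsonr(manual, automated)
    print(f"{var}: r = {r:.3f}, p = {p:.2e}")
```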


2016 ◽  
Vol 16 (3) ◽  
pp. 205-216 ◽  
Author(s):  
Lorne Leonard ◽  
Alan M MacEachren ◽  
Kamesh Madduri

This article reports on the development and application of a visual analytics approach to big data cleaning and integration focused on very large graphs, constructed in support of national-scale hydrological modeling. We explain why large graphs are required for hydrology modeling and describe how we create two graphs using heterogeneous national data products for the continental United States. The first, smaller graph is constructed by assigning level-12 hydrological unit code watersheds as nodes. Creating and cleaning graphs at this scale highlights issues that cannot be addressed without high-resolution datasets and expert intervention. Expert intervention, aided by visual analytics tools, is necessary to resolve edge directions at the second graph scale, in which continental United States streams are subdivided into edges (851,265,305) and nodes (683,298,991) for large-scale hydrological modeling. We demonstrate how large graph workflows are created and used for automated analysis to prepare the user interface for visual analytics. We explain the design of the visual interface using a watershed case study and then discuss how the visual interface engages the expert user to resolve data and graph issues.
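A toy sketch of the graph construction step, using made-up HUC-12 identifiers and the networkx library (the real workflow operates on hundreds of millions of edges, not on a small in-memory graph like this):

```python
# An illustrative sketch: watersheds as nodes, stream flow as directed edges,
# with a simple automated check for ambiguous directions (toy identifiers).
import networkx as nx

G = nx.DiGraph()
# Each edge points downstream: (upstream HUC-12 code, downstream HUC-12 code).
flow_pairs = [
    ("020401010101", "020401010102"),
    ("020401010102", "020401010103"),
    ("020401010104", "020401010103"),
]
G.add_edges_from(flow_pairs)

# A conflicting pair like this (both directions present) marks an edge whose
# direction the expert must resolve in the visual interface.
G.add_edge("020401010103", "020401010105")
G.add_edge("020401010105", "020401010103")

ambiguous = [(u, v) for u, v in G.edges if G.has_edge(v, u)]
sinks = [n for n in G.nodes if G.out_degree(n) == 0]   # terminal watersheds
print("ambiguous edges:", ambiguous)
print("outlet candidates:", sinks)
```

Flagging reciprocal edges and terminal watersheds is the kind of automated analysis that can then be handed to the expert for resolution in the visual interface.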


2019 ◽  
Author(s):  
Rumen Manolov

The lack of consensus regarding the most appropriate analytical techniques for single-case experimental design data requires justifying the choice of any specific analytical option. The current text mentions some of the arguments, provided by methodologists and statisticians, in favor of several analytical techniques. Additionally, a small-scale literature review is performed in order to explore if and how applied researchers justify the analytical choices that they make. The review suggests that certain practices are not sufficiently explained. In order to improve the reporting of data analytical decisions, it is proposed to choose and justify the data analytical approach prior to gathering the data. As a possible justification for the data analysis plan, we propose using the expected data pattern as a basis (specifically, the expectation about an improving baseline trend and about the immediate or progressive nature of the intervention effect). Although there are multiple alternatives for single-case data analysis, the current text focuses on visual analysis and multilevel models and illustrates an application of these analytical options with real data. User-friendly software is also developed.
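A hedged sketch of the multilevel-model option on synthetic multiple-baseline data (the formula and random-effects structure are simplified assumptions, not the text's recommended specification):

```python
# A simple multilevel model for single-case data: sessions nested within cases,
# with a random intercept per case (synthetic data for illustration).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
rows = []
for case in range(1, 5):                       # four hypothetical participants
    for session in range(1, 21):
        phase = 0 if session <= 10 else 1      # 0 = baseline, 1 = intervention
        score = 10 + 0.1 * session + 3 * phase + rng.normal(0, 1)
        rows.append({"case": case, "session": session, "phase": phase, "score": score})
data = pd.DataFrame(rows)

model = smf.mixedlm("score ~ session + phase", data, groups=data["case"])
print(model.fit().summary())
```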


Author(s):  
Eun-Young Mun ◽  
Anne E. Ray

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, it is also recognized that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.
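A minimal sketch of the pooling step that IDA requires, with illustrative column names and an assumed rescaling rule for harmonizing a drinking measure across two hypothetical studies:

```python
# Harmonize a measure across studies, stack participant-level records, and keep
# a study identifier for later modeling of between-study heterogeneity.
# (Column names and the rescaling rule are illustrative assumptions.)
import pandas as pd

def harmonize(df, study_id, drinks_col, scale_factor=1.0):
    """Bring one study's drinking measure onto a common weekly-drinks metric."""
    out = df.copy()
    out["drinks_per_week"] = out[drinks_col] * scale_factor
    out["study"] = study_id
    return out[["study", "drinks_per_week"]]

study_a = pd.DataFrame({"weekly_drinks": [4, 10, 2]})
study_b = pd.DataFrame({"daily_drinks": [1.0, 0.5, 2.0]})

pooled = pd.concat([
    harmonize(study_a, "A", "weekly_drinks"),
    harmonize(study_b, "B", "daily_drinks", scale_factor=7.0),
], ignore_index=True)
print(pooled)
```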


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1670
Author(s):  
Waheeb Abu-Ulbeh ◽  
Maryam Altalhi ◽  
Laith Abualigah ◽  
Abdulwahab Ali Almazroi ◽  
Putra Sumari ◽  
...  

Cyberstalking is a growing anti-social problem being transformed on a large scale and in various forms. Cyberstalking detection has become increasingly popular in recent years and has been investigated technically by many researchers. However, cyberstalking victimization, an essential part of cyberstalking, has received less empirical attention from the research community. This paper attempts to address this gap and develops a model to understand and estimate the prevalence of cyberstalking victimization. The model is grounded in routine activities and lifestyle exposure theories and includes eight hypotheses. The data for this paper were collected from 757 respondents at Jordanian universities. The paper utilizes a quantitative approach and uses structural equation modeling for data analysis. The results revealed a modest prevalence range that depends on the type of cyberstalking. The results also indicated that proximity to motivated offenders, suitable targets, and digital guardians significantly influences cyberstalking victimization. The outcome of the moderation hypothesis testing demonstrated that age and residence have a significant effect on cyberstalking victimization. The proposed model is an essential element for assessing cyberstalking victimization in societies, providing a valuable understanding of its prevalence. This can assist researchers and practitioners in future research on cyberstalking victimization.
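As a loose illustration only (our own simplified specification on synthetic data, using the semopy package rather than whatever software the authors employed), a structural-equation-style regression of victimization on the named constructs might be specified as:

```python
# A hedged SEM-style sketch for the victimization constructs named above;
# this is a simplified, hypothetical specification, not the paper's model.
import numpy as np
import pandas as pd
from semopy import Model

rng = np.random.default_rng(11)
n = 757
data = pd.DataFrame({
    "proximity":    rng.normal(0, 1, n),
    "suitability":  rng.normal(0, 1, n),
    "guardianship": rng.normal(0, 1, n),
})
data["victimization"] = (0.4 * data["proximity"] + 0.3 * data["suitability"]
                         - 0.2 * data["guardianship"] + rng.normal(0, 1, n))

desc = "victimization ~ proximity + suitability + guardianship"
model = Model(desc)
model.fit(data)
print(model.inspect())     # path coefficients and significance
```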


1983 ◽  
Vol 38 ◽  
pp. 1-9
Author(s):  
Herbert F. Weisberg

We are now entering a new era of computing in political science. The first era was marked by punched-card technology. Initially, the most sophisticated analyses possible were frequency counts and tables produced on a counter-sorter, a machine that specialized in chewing up data cards. By the early 1960s, batch processing on large mainframe computers became the predominant mode of data analysis, with turnaround time of up to a week. By the late 1960s, turnaround time was cut down to a matter of a few minutes and OSIRIS and then SPSS (and more recently SAS) were developed as general-purpose data analysis packages for the social sciences. Even today, use of these packages in batch mode remains one of the most efficient means of processing large-scale data analysis.

