Graph-based visual analysis for large-scale hydrological modeling

2016 ◽  
Vol 16 (3) ◽  
pp. 205-216 ◽  
Author(s):  
Lorne Leonard ◽  
Alan M MacEachren ◽  
Kamesh Madduri

This article reports on the development and application of a visual analytics approach to big data cleaning and integration, focused on very large graphs constructed in support of national-scale hydrological modeling. We explain why large graphs are required for hydrological modeling and describe how we create two graphs from heterogeneous national data products covering the continental United States. The first, smaller graph is constructed by assigning level-12 hydrological unit code watersheds as nodes. Creating and cleaning graphs at this scale highlights issues that cannot be addressed without high-resolution datasets and expert intervention. Expert intervention, aided by visual analytics tools, is necessary to resolve edge directions at the second, finer graph scale, in which continental United States streams are subdivided into edges (851,265,305) and nodes (683,298,991) for large-scale hydrological modeling. We demonstrate how large graph workflows are created and used for automated analysis to prepare the user interface for visual analytics. We explain the design of the visual interface using a watershed case study and then discuss how the interface engages the expert user to resolve data and graph issues.
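
As an illustration of the kind of graph construction and automated consistency checking described above, here is a minimal sketch in Python using networkx. It is not the authors' pipeline, and the HUC-12 codes and flow table are hypothetical:

```python
import networkx as nx

# Each record: (upstream HUC-12 watershed, downstream HUC-12 watershed).
# In the article's setting these pairs would come from national
# hydrography data products; the codes below are made up.
flow_table = [
    ("180500020905", "180500020906"),
    ("180500020904", "180500020906"),
    ("180500020906", "180500020907"),
]

G = nx.DiGraph()
G.add_edges_from(flow_table)

# Automated checks of the sort that flag cases for expert review:
# a well-formed drainage network is acyclic, and each watershed
# should drain to at most one downstream neighbor.
assert nx.is_directed_acyclic_graph(G)
suspect = [n for n in G.nodes if G.out_degree(n) > 1]
print("nodes needing expert review:", suspect)
```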

2019 ◽  
Author(s):  
Robert Krueger ◽  
Johanna Beyer ◽  
Won-Dong Jang ◽  
Nam Wook Kim ◽  
Artem Sokolov ◽  
...  

Facetto is a scalable visual analytics application that is used to discover single-cell phenotypes in high-dimensional multi-channel microscopy images of human tumors and tissues. Such images represent the cutting edge of digital histology and promise to revolutionize how diseases such as cancer are studied, diagnosed, and treated. Highly multiplexed tissue images are complex, comprising 10⁹ or more pixels, 60-plus channels, and millions of individual cells. This makes manual analysis challenging and error-prone. Existing automated approaches are also inadequate, in large part, because they are unable to effectively exploit the deep knowledge of human tissue biology available to anatomic pathologists. To overcome these challenges, Facetto enables a semi-automated analysis of cell types and states. It integrates unsupervised and supervised learning into the image and feature exploration process and offers tools for analytical provenance. Experts can cluster the data to discover new types of cancer and immune cells and use clustering results to train a convolutional neural network that classifies new cells accordingly. Likewise, the output of classifiers can be clustered to discover aggregate patterns and phenotype subsets. We also introduce a new hierarchical approach to keep track of analysis steps and data subsets created by users; this assists in the identification of cell types. Users can build phenotype trees and interact with the resulting hierarchical structures of both high-dimensional feature and image spaces. We report on use-cases in which domain scientists explore various large-scale fluorescence imaging datasets. We demonstrate how Facetto assists users in steering the clustering and classification process, inspecting analysis results, and gaining new scientific insights into cancer biology.
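
A minimal sketch of the cluster-then-classify loop described above, with scikit-learn components standing in for Facetto's actual ones (the paper trains a convolutional neural network; a random forest is substituted here only to keep the sketch self-contained, and the feature matrix is synthetic):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical per-cell feature matrix: rows are cells, columns are
# per-channel intensities and morphology features from the images.
features = rng.normal(size=(10_000, 60))

# Unsupervised step: cluster cells into candidate phenotypes, which
# the expert then inspects, merges, or splits.
phenotypes = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(features)

# Supervised step: train a classifier on the curated cluster labels
# so that new cells can be assigned to phenotypes automatically.
clf = RandomForestClassifier(random_state=0).fit(features, phenotypes)
new_cells = rng.normal(size=(100, 60))
print(clf.predict(new_cells)[:10])
```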


2019 ◽  
Vol 13 ◽  
pp. 117793221882512 ◽  
Author(s):  
Sergio Diaz-del-Pino ◽  
Pablo Rodriguez-Brazzarola ◽  
Esteban Perez-Wohlfeil ◽  
Oswaldo Trelles

The rapid emergence of data acquisition technologies has shifted the bottleneck in molecular biology research from data acquisition to data analysis. Such is the case in comparative genomics, where sequence analysis has transitioned from genes to genomes several orders of magnitude larger. This shift has revealed the need to adapt software to work with very large experiments efficiently and to incorporate new data-analysis strategies for managing the results of such studies. In previous work, we presented GECKO, a software tool for comparing large sequences; here we address the representation, browsing, exploration, and post-processing of the massive amount of information derived from such comparisons. GECKO-MGV is a web-based application organized as a client-server architecture. It is aimed at visual analysis of the results of both pairwise and multiple sequence comparison studies, combining a set of common commands for image exploration with improved state-of-the-art solutions. In addition, GECKO-MGV integrates different visualization analysis tools while exploiting the concept of layers to display multiple genome comparison datasets. Moreover, the software is endowed with capabilities for contacting external proprietary and third-party services for further data post-processing, and it also presents a method to display a timeline of large-scale evolutionary events. As proof of concept, we present 2 exercises using bacterial and mammalian genomes which depict the capabilities of GECKO-MGV to perform in-depth, customizable analyses on the fly using web technologies. The first exercise is mainly descriptive and is carried out over bacterial genomes, whereas the second one aims to show the ability to deal with large sequence comparisons. In this case, we display results from the comparison of the first Homo sapiens chromosome against the first 5 chromosomes of Mus musculus.
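
A minimal sketch of the layer concept described above: each pairwise comparison is drawn as a separate, individually labeled layer on a shared dot-plot canvas. This is an illustration in Python/matplotlib, not GECKO-MGV's implementation, and the hit coordinates and layer names are synthetic:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Each "hit" is the (x, y) position of a conserved fragment in the
# normalized coordinate space of the two sequences being compared.
layers = {
    "genomeA_vs_genomeB": rng.uniform(0, 1, size=(50, 2)),
    "genomeA_vs_genomeC": rng.uniform(0, 1, size=(50, 2)),
}

fig, ax = plt.subplots()
for name, hits in layers.items():
    ax.scatter(hits[:, 0], hits[:, 1], s=5, label=name)
ax.set_xlabel("position in reference sequence (fraction)")
ax.set_ylabel("position in query sequence (fraction)")
ax.legend()
plt.show()
```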


2021 ◽  
Vol 15 ◽  
pp. 117793222110214
Author(s):  
Sergio Diaz-del-Pino ◽  
Esteban Perez-Wohlfeil ◽  
Oswaldo Trelles

Due to major breakthroughs in sequencing technologies over the last decades, the time and cost per sequencing experiment have fallen drastically, overcoming the data generation barrier of the early genomic era. This shift has encouraged the scientific community to develop new computational methods that can compare large genomic sequences, thus enabling large-scale studies of genome evolution. The field of comparative genomics has proven invaluable for studying the evolutionary mechanisms and forces driving genome evolution. Along these lines, a full genome comparison between 2 species requires a number of sequence comparisons that is quadratic in the number of sequences (around 400 chromosome comparisons in the case of mammalian genomes); however, when studying conserved synteny or evolutionary rearrangements, many sequence comparisons can be skipped, since not all of them contain significant signals. Consequently, the scientific community has developed fast heuristics to perform multiple pairwise comparisons between large sequences and determine whether significant sets of conserved similarities exist. Data generation is no longer an issue; the limitations have shifted toward the analysis of such massive data. We therefore present XCout, a web-based visual analytics application for multiple genome comparisons, designed to improve the analysis of large-scale evolutionary studies using novel techniques in web visualization. XCout makes it possible to work on hundreds of comparisons at once, reducing analysis time by identifying significant signals between chromosomes across multiple species. Among others, XCout introduces several techniques to aid in the analysis of large-scale genome rearrangements, particularly (1) an interactive heatmap interface that displays comparisons using automatic color scales based on similarity thresholds, so that significant signals stand out at first sight, (2) an overlay system to detect individual signal contributions between chromosomes, (3) a tracking tool to trace conserved blocks across different species for evolutionary studies, and (4) a search engine for annotations across different species.
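
A minimal sketch of feature (1) above: a chromosome-by-chromosome heatmap whose color scale is driven by a similarity threshold, so sub-threshold comparisons recede into the background. The similarity matrix and threshold are synthetic; this is not XCout's code:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Hypothetical similarity scores between 20 chromosomes of one
# species and 5 of another (e.g. fraction of aligned bases).
similarity = rng.uniform(0, 1, size=(20, 5))

threshold = 0.7  # comparisons above this are considered significant
masked = np.ma.masked_less(similarity, threshold)

fig, ax = plt.subplots()
# Sub-threshold cells are drawn in muted grey; significant cells are
# overlaid in color so they stand out at first sight.
ax.imshow(similarity, cmap="Greys", vmin=0, vmax=1)
im = ax.imshow(masked, cmap="viridis", vmin=threshold, vmax=1)
fig.colorbar(im, ax=ax, label="similarity (significant range)")
ax.set_xlabel("query chromosome")
ax.set_ylabel("reference chromosome")
plt.show()
```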


Author(s):  
Karsten Klein ◽  
Sabrina Jaeger ◽  
Jörg Melzheimer ◽  
Bettina Wachter ◽  
Heribert Hofer ◽  
...  

Current tracking technology such as GPS data loggers allows biologists to remotely collect large amounts of movement data for a wide variety of species. Extending, and often replacing, interpretation based on direct observation, analysis of the collected data supports research on animal behaviour, on impact factors such as climate change and human intervention, and on conservation programs. This analysis is difficult, however, due to the nature of the research questions and the complexity of the data sets. It requires both automated analysis, for example for the detection of behavioural patterns, and human inspection, for example for interpretation, the inclusion of previous knowledge, and conclusions about future actions and decision making. For this analysis and inspection, the movement data needs to be put into the context of environmental data, which helps to interpret the behaviour. A major challenge is thus to design and develop methods and intuitive interfaces that integrate the data for analysis by biologists. We present a concept and implementation for the visual analysis of cheetah movement data in a web-based fashion that allows usage both in the field and in office environments.
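
A minimal sketch, in pandas, of the environmental-context step described above: each GPS fix is joined to a habitat value sampled at its location. The fixes and the habitat lookup are hypothetical stand-ins, not the authors' system:

```python
import pandas as pd

fixes = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-01-01 06:00", "2020-01-01 07:00",
                                 "2020-01-01 08:00"]),
    "lat": [-22.51, -22.50, -22.48],
    "lon": [17.05, 17.07, 17.10],
})

def habitat_at(lat, lon):
    # Stand-in for sampling an environmental raster (land cover,
    # vegetation index, distance to water, ...) at a fix location.
    return "grassland" if lon < 17.08 else "shrubland"

fixes["habitat"] = [habitat_at(la, lo) for la, lo in zip(fixes.lat, fixes.lon)]
print(fixes)
```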


1966 ◽  
Vol 05 (02) ◽  
pp. 67-74 ◽  
Author(s):  
W. I. Lourie ◽  
W. Haenszel

Quality control of data collected in the United States by the Cancer End Results Program, utilizing punchcards prepared by participating registries in accordance with a Uniform Punchcard Code, is discussed. Existing arrangements decentralize responsibility for editing and related data processing to the local registries, with centralization of tabulating and statistical services in the End Results Section, National Cancer Institute. The most recent deck of punchcards represented over 600,000 cancer patients; approximately 50,000 newly diagnosed cases are added annually. Mechanical editing and inspection of punchcards and field audits are the principal tools for quality control. Mechanical editing of the punchcards includes testing for blank entries and detection of inadmissible or inconsistent codes. Highly improbable codes are subjected to special scrutiny. Field audits include the drawing of a 1-10 percent random sample of punchcards submitted by a registry; the charts are then reabstracted and recoded by an NCI staff member, and differences between the punchcard and the results of independent review are noted.
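
Translated into modern terms, the mechanical-editing checks described above amount to per-record validation plus a random audit sample. The field names and code lists in this sketch are hypothetical stand-ins for the Uniform Punchcard Code:

```python
import random

VALID_SITE_CODES = {"150", "151", "162", "174"}  # hypothetical code list

def edit_record(rec):
    """Return quality-control flags for one case record."""
    flags = []
    for field in ("site", "sex", "year_dx"):
        if not rec.get(field):                       # blank entry
            flags.append(f"blank:{field}")
    if rec.get("site") and rec["site"] not in VALID_SITE_CODES:
        flags.append(f"inadmissible:site={rec['site']}")
    if rec.get("sex") == "M" and rec.get("site") == "174":
        flags.append("inconsistent:sex/site")        # cross-field check
    return flags

records = [{"site": "150", "sex": "F", "year_dx": "1964"},
           {"site": "999", "sex": "M", "year_dx": ""}]
for rec in records:
    print(rec, edit_record(rec))

# Field audit: draw a 1-10 percent random sample for independent
# re-abstraction and recoding, then compare against the originals.
audit_sample = random.sample(records, max(1, len(records) // 20))
```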


2020 ◽  
pp. 68-72
Author(s):  
V.G. Nikitaev ◽  
A.N. Pronichev ◽  
V.V. Dmitrieva ◽  
E.V. Polyakov ◽  
A.D. Samsonova ◽  
...  

The use of information and measurement systems based on the processing of digital images of microscopic preparations for large-scale automation of acute leukemia diagnosis is considered. A feature of microscopic images of bone marrow preparations is the high density of leukocyte cells in the preparation (hypercellularity), which causes cells to lie close to each other and to touch, forming conglomerates. Measuring the characteristics of bone marrow cells under such conditions leads to unacceptable errors (more than 50%). This work addresses the segmentation of touching cells in images of bone marrow preparations. A method for separating cells during white blood cell segmentation under hypercellular conditions has been developed. Its distinguishing feature is a segmentation approach based on the marker-controlled watershed method. The key stages of the method are the formation of initial markers, construction of watershed lines, threshold binarization, and filling inside the contours. The parameters for separating touching cells are determined. An experiment confirmed the effectiveness of the proposed method: the relative segmentation error was 5%. Using the proposed method in information and measurement systems for computer microscopy, for automated analysis of bone marrow preparations, should improve the accuracy of acute leukemia diagnosis.
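
A minimal sketch of marker-controlled watershed separation of touching cells, in the spirit of the stages listed above, using scikit-image; the paper's actual parameters and pipeline are not reproduced, and the input is a synthetic pair of overlapping disks:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def separate_cells(binary_mask):
    """Split touching cells in a binary mask into labeled regions."""
    # Distance transform: cell centers become local maxima.
    distance = ndi.distance_transform_edt(binary_mask)
    # Initial markers: one seed per local maximum of the distance map.
    coords = peak_local_max(distance, min_distance=5, labels=binary_mask)
    peaks = np.zeros(distance.shape, dtype=bool)
    peaks[tuple(coords.T)] = True
    markers, _ = ndi.label(peaks)
    # Watershed lines drawn on the inverted distance map separate
    # the touching cells.
    return watershed(-distance, markers, mask=binary_mask)

# Two overlapping disks as a stand-in for touching leukocytes.
yy, xx = np.mgrid[:80, :80]
mask = ((yy - 40) ** 2 + (xx - 30) ** 2 < 15 ** 2) | \
       ((yy - 40) ** 2 + (xx - 50) ** 2 < 15 ** 2)
labels = separate_cells(mask)
print("cells found:", labels.max())
```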


Author(s):  
Joshua Kotin

This book is a new account of utopian writing. It examines how eight writers—Henry David Thoreau, W. E. B. Du Bois, Osip and Nadezhda Mandel'shtam, Anna Akhmatova, Wallace Stevens, Ezra Pound, and J. H. Prynne—construct utopias of one within and against modernity's two large-scale attempts to harmonize individual and collective interests: liberalism and communism. The book begins in the United States between the buildup to the Civil War and the end of Jim Crow; continues in the Soviet Union between Stalinism and the late Soviet period; and concludes in England and the United States between World War I and the end of the Cold War. In this way it captures how writers from disparate geopolitical contexts resist state and normative power to construct perfect worlds—for themselves alone. The book contributes to debates about literature and politics, presenting innovative arguments about aesthetic difficulty, personal autonomy, and complicity and dissent. It models a new approach to transnational and comparative scholarship, combining original research in English and Russian to illuminate more than a century and a half of literary and political history.


Author(s):  
Anne Nassauer

This book provides an account of how and why routine interactions break down and how such situational breakdowns lead to protest violence and other types of surprising social outcomes. It takes a close-up look at the dynamic processes of how situations unfold and compares their role to that of motivations, strategies, and other contextual factors. The book discusses factors that can draw us into violent situations and describes how and why we make uncommon individual and collective decisions. Covering different types of surprise outcomes from protest marches and uprisings turning violent to robbers failing to rob a store at gunpoint, it shows how unfolding situations can override our motivations and strategies and how emotions and culture, as well as rational thinking, still play a part in these events. The first chapters study protest violence in Germany and the United States from 1960 until 2010, taking a detailed look at what happens between the start of a protest and the eruption of violence or its peaceful conclusion. They compare the impact of such dynamics to the role of police strategies and culture, protesters’ claims and violent motivations, the black bloc and agents provocateurs. The analysis shows how violence is triggered, what determines its intensity, and which measures can avoid its outbreak. The book explores whether we find similar situational patterns leading to surprising outcomes in other types of small- and large-scale events: uprisings turning violent, such as Ferguson in 2014 and Baltimore in 2015, and failed armed store robberies.


Author(s):  
Richard Gowan

During Ban Ki-moon’s tenure, the Security Council was shaken by P5 divisions over Kosovo, Georgia, Libya, Syria, and Ukraine. Yet it also continued to mandate and sustain large-scale peacekeeping operations in Africa, placing major burdens on the UN Secretariat. The chapter argues that Ban initially took a cautious approach to controversies with the Council and earned a reputation for excessive passivity in the face of crisis and deference to the United States. The second half of the chapter suggests that Ban shifted to a more activist posture as his tenure went on, pressing the Council to act in cases including Côte d’Ivoire, Libya, and Syria. The chapter concludes that Ban had only a marginal impact on Council decision-making, even though he made a creditable effort to speak truth to power over cases such as the Central African Republic (CAR), challenging Council members to live up to their responsibilities.


2021 ◽  
Vol 15 (5) ◽  
pp. 1-52
Author(s):  
Lorenzo De Stefani ◽  
Erisa Terolli ◽  
Eli Upfal

We introduce Tiered Sampling, a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass over the data and uses a memory of fixed size M, which can be orders of magnitude smaller than the number of edges. Our methods address the challenging task of counting sparse motifs—sub-graph patterns—that have a low probability of appearing in a sample of M edges, the maximum amount of data available to the algorithms at each step. To obtain an unbiased and low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, the other layers are reservoir samples of sub-structures of the desired motif. By storing the more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. While we focus on the design and analysis of algorithms for counting 4-cliques, we present a method for generalizing Tiered Sampling to obtain high-quality estimates of the number of occurrences of any sub-graph of interest, while reducing the analysis effort thanks to specific properties of the pattern of interest. We present a complete theoretical analysis and an extensive experimental evaluation of our proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations of the number of 4- and 5-cliques in large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large-scale graphs.
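
A structural sketch of the tiered memory layout described above: a base reservoir of edges plus a second-tier reservoir of triangles (sub-structures of the 4-clique). The paper's unbiasing weights, and its normalization for a clique being detected via more than one stored triangle, are omitted; this shows only how the fixed memory is split into tiers and how detections arise:

```python
import random

class TieredSample:
    def __init__(self, m_edges, m_triangles):
        self.edges, self.m_edges, self.edges_seen = [], m_edges, 0
        self.tris, self.m_tris, self.tris_seen = [], m_triangles, 0
        self.detections = 0  # raw 4-clique detections, unweighted

    @staticmethod
    def _reservoir_insert(store, capacity, item, seen):
        # Standard reservoir sampling: keep each item w.p. capacity/seen.
        if len(store) < capacity:
            store.append(item)
        else:
            j = random.randrange(seen)
            if j < capacity:
                store[j] = item

    def add_edge(self, u, v):
        self.edges_seen += 1
        edge_set = {frozenset(e) for e in self.edges}
        nbrs = lambda x: {y for e in edge_set if x in e for y in e - {x}}

        # Detection: the new edge (u, v), a stored triangle containing
        # one endpoint, and two stored edges to the other endpoint
        # together complete a 4-clique.
        for tri in self.tris:
            for a, b in ((u, v), (v, u)):
                if a in tri and b not in tri:
                    w, x = tri - {a}
                    if frozenset((b, w)) in edge_set and \
                       frozenset((b, x)) in edge_set:
                        self.detections += 1

        # Tier-2 promotion: triangles closed by (u, v) within the edge
        # sample are themselves reservoir-sampled.
        for w in nbrs(u) & nbrs(v):
            self.tris_seen += 1
            self._reservoir_insert(self.tris, self.m_tris,
                                   frozenset((u, v, w)), self.tris_seen)
        self._reservoir_insert(self.edges, self.m_edges, (u, v),
                               self.edges_seen)

ts = TieredSample(m_edges=1000, m_triangles=500)
for u, v in [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]:
    ts.add_edge(u, v)
print(ts.detections)  # 2: the clique on {1,2,3,4}, once per stored triangle
```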

