neuPrint: Analysis Tools for EM Connectomics

Author(s):  
Jody Clements ◽  
Tom Dolafi ◽  
Lowell Umayam ◽  
Nicole L. Neubarth ◽  
Stuart Berg ◽  
...  

Due to technological advances in electron microscopy (EM) and deep learning, it is now practical to reconstruct a connectome, a description of neurons and the connections between them, for significant volumes of neural tissue. The limited scope of past reconstructions meant they were primarily used by domain experts, and performance was not a serious problem. But the new reconstructions, of common laboratory creatures such as the fruit fly Drosophila melanogaster, upend these assumptions. These natural neural networks now contain tens of thousands of neurons and tens of millions of connections between them, with yet larger reconstructions pending, and are of interest to a large community of non-specialists. This requires new tools that are easy to use and handle large data efficiently. We introduce neuPrint to address these data analysis challenges. neuPrint is a database and analysis ecosystem that organizes connectome data in a manner conducive to biological discovery. In particular, we propose a data model that allows users to access the connectome at different levels of abstraction, primarily through a graph database, neo4j, and its powerfully expressive query language, Cypher. neuPrint is compatible with modern connectome reconstruction workflows, providing tools for assessing reconstruction quality and supporting both batch and incremental updates. Finally, we introduce a web interface and programmer API that target a diverse range of user skills. We demonstrate the effectiveness and efficiency of neuPrint through example database queries.
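
To give a flavor of how such a graph data model is queried, the sketch below runs a Cypher query through the official neo4j Python driver. It is a minimal illustration only: the connection details are placeholders, and the node label (Neuron), relationship type (ConnectsTo), and property names (bodyId, weight) are assumptions modeled on the data model described above rather than a guaranteed schema.

```python
# Minimal sketch: querying a connectome graph with Cypher via the neo4j Python driver.
# The label Neuron, relationship ConnectsTo, and the property names below are
# illustrative of the data model described in the abstract, not an exact schema.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (a:Neuron)-[c:ConnectsTo]->(b:Neuron)
WHERE c.weight >= $min_weight
RETURN a.bodyId AS pre, b.bodyId AS post, c.weight AS weight
ORDER BY weight DESC
LIMIT 10
"""

with driver.session() as session:
    # Find the ten strongest connections above a weight threshold.
    for record in session.run(query, min_weight=10):
        print(record["pre"], "->", record["post"], record["weight"])

driver.close()
```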

2016 ◽  
Vol 4 (3) ◽  
pp. 22-37 ◽  
Author(s):  
Nayem Rahman

Scorecard-based measurement techniques are used by organizations to measure the performance of their business operations. A scorecard approach can also be applied to a database system to measure the performance of the SQL (Structured Query Language) being executed and the extent of the resources it uses. In a large data warehouse, thousands of jobs run daily via batch cycles to refresh different subject areas. Simultaneously, thousands of queries from business intelligence tools and ad-hoc queries are executed around the clock. A controlling mechanism is needed to make sure these batch jobs and queries are efficient and do not consume more database system resources than necessary. The authors propose measuring SQL query performance via a scorecard tool. The motivation behind using a scorecard tool is to make sure that the resource consumption of SQL queries is predictable and the database system environment is stable. The experimental results show that queries that pass the scorecard evaluation criteria tend to utilize an optimal level of database system computing resources. These queries also show improved parallel efficiency (PE) in using computing resources (CPU, I/O and spool space), which demonstrates the usefulness of the SQL scorecard.
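
As a rough illustration of what a scorecard check on query resource consumption might look like, here is a small sketch in Python. The thresholds, metric names, and the parallel-efficiency formula used (mean per-unit CPU divided by the busiest unit's CPU) are assumptions for demonstration, not the authors' actual evaluation criteria or tool.

```python
# Illustrative sketch of a scorecard-style check on a query's resource profile.
# Thresholds and the parallel-efficiency formula (mean per-unit CPU / max per-unit CPU)
# are assumptions for demonstration, not the authors' actual evaluation criteria.
from dataclasses import dataclass

@dataclass
class QueryStats:
    cpu_seconds_per_unit: list[float]  # CPU used on each parallel processing unit
    io_count: int                      # logical I/Os performed by the query
    spool_gb: float                    # temporary (spool) space consumed

def parallel_efficiency(stats: QueryStats) -> float:
    """Skew-sensitive efficiency: 1.0 means perfectly even work across units."""
    cpu = stats.cpu_seconds_per_unit
    return (sum(cpu) / len(cpu)) / max(cpu) if max(cpu) > 0 else 1.0

def scorecard_pass(stats: QueryStats,
                   min_pe: float = 0.8,
                   max_io: int = 1_000_000,
                   max_spool_gb: float = 50.0) -> bool:
    """A query 'passes' if its work is well parallelized and I/O and spool stay bounded."""
    return (parallel_efficiency(stats) >= min_pe
            and stats.io_count <= max_io
            and stats.spool_gb <= max_spool_gb)

print(scorecard_pass(QueryStats([12.0, 11.5, 12.3, 11.9], io_count=40_000, spool_gb=8.2)))
```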


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Peter Baumann ◽  
Dimitar Misev ◽  
Vlad Merticariu ◽  
Bang Pham Huu

Multi-dimensional arrays (also known as raster data or gridded data) play a key role in many, if not all, science and engineering domains, where they typically represent spatio-temporal sensor, image, simulation-output, or statistics “datacubes”. As classic database technology does not support arrays adequately, such data today are maintained mostly in silo solutions, with architectures that tend to erode and not keep up with the increasing requirements on performance and service quality. Array Database systems attempt to close this gap by providing declarative query support for flexible ad-hoc analytics on large n-D arrays, similar to what SQL offers on set-oriented data, XQuery on hierarchical data, and SPARQL and Cypher on graph data. Today, Petascale Array Database installations exist, employing massive parallelism and distributed processing. Hence, questions arise about the technology and standards available, usability, and overall maturity. Several papers have compared models and formalisms, and benchmarks have been undertaken as well, typically comparing two systems against each other. While each of these represents valuable research, to the best of our knowledge there is no comprehensive survey combining model, query language, architecture, practical usability, and performance aspects. The scale of this comparison also differentiates our study: 19 systems are compared and four benchmarked, to an extent and depth clearly exceeding previous papers in the field; for example, the subsetting tests were designed so that systems cannot be tuned specifically for these queries. It is hoped that this gives a representative overview to all who want to immerse themselves in the field, as well as clear guidance to those who need to choose the best-suited datacube tool for their application. This article presents results of the Research Data Alliance (RDA) Array Database Assessment Working Group (ADA:WG), a subgroup of the Big Data Interest Group. It has elicited the state of the art in Array Databases, technically supported by IEEE GRSS and CODATA Germany, to answer the question: how can data scientists and engineers benefit from Array Database technology? As it turns out, Array Databases can offer significant advantages in terms of flexibility, functionality, extensibility, as well as performance and scalability—in total, the database approach of offering analysis-ready “datacubes” heralds a new level of service quality. Investigation shows that there is a lively ecosystem of technology with increasing uptake, and proven array analytics standards are in place. Consequently, such approaches have to be considered a serious option for datacube services in science, engineering and beyond. Tools, though, vary greatly in functionality and performance.
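
To make the “datacube” idea concrete, the sketch below places a declarative array query (shown only as a string, with syntax loosely modeled on Array DBMS query languages) next to the same subsetting-plus-aggregation operation written out in NumPy. The collection name, axis order, and slice bounds are invented for illustration.

```python
# Sketch: what a declarative datacube query expresses, next to the equivalent
# hand-written array code. The query string is loosely modeled on Array DBMS
# languages; the collection name, axis order, and slice bounds are invented.
import numpy as np

# Declarative form an Array DBMS might accept (illustrative only):
datacube_query = """
SELECT avg_cells( c[ 0:99, 0:99, 42 ] )
FROM   temperature_cube AS c
"""

# The same computation spelled out imperatively over an in-memory array
# standing in for the stored datacube (x, y, time):
rng = np.random.default_rng(0)
temperature_cube = rng.random((200, 200, 365), dtype=np.float32)

subset = temperature_cube[0:100, 0:100, 42]   # spatio-temporal subsetting
print(float(subset.mean()))                   # aggregation over the subset
```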


2014 ◽  
Vol 67 (5) ◽  
pp. 791-809 ◽  
Author(s):  
Philipp Last ◽  
Christian Bahlke ◽  
Martin Hering-Bertram ◽  
Lars Linsen

AIS was primarily developed to exchange vessel-related data among vessels or AIS stations using very high frequency (VHF) radio technology, in order to increase safety at sea. This study evaluates the formal integrity, availability, and reporting intervals of AIS data with a focus on vessel movement prediction. In contrast to former studies, this study is based on a large data collection of over 85 million AIS messages, which were received continuously over a period of two months. Thus, the evaluated data represent a comprehensive and up-to-date view of the current usage of AIS systems installed on vessels. Results of previous studies concerning the availability of AIS data are confirmed and extended, and new aspects such as reporting intervals are additionally evaluated. Received messages are stored in a database, which allows database queries to evaluate the obtained data automatically. This study shows that, almost ten years after becoming mandatory for professionally operated vessels, AIS still lacks availability for both static and dynamic data and that the reporting intervals are not as reliable as specified in the technical AIS standard.
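
A hedged sketch of how reporting intervals could be derived from the stored messages in such an automatic, database-driven evaluation is shown below, here using pandas. The column names (mmsi, received_at), MMSI values, and timestamps are assumptions for illustration, not the study's actual schema or data.

```python
# Sketch: deriving per-vessel reporting intervals from stored AIS position reports.
# Column names (mmsi, received_at) and the sample rows are illustrative assumptions.
import pandas as pd

# In practice this frame would come from a database query over the stored messages.
messages = pd.DataFrame({
    "mmsi": [211000001, 211000001, 211000001, 244000002, 244000002],
    "received_at": pd.to_datetime([
        "2013-05-01 12:00:02", "2013-05-01 12:00:12", "2013-05-01 12:00:31",
        "2013-05-01 12:00:05", "2013-05-01 12:00:45",
    ]),
})

messages = messages.sort_values(["mmsi", "received_at"])
messages["interval_s"] = (
    messages.groupby("mmsi")["received_at"].diff().dt.total_seconds()
)

# Observed intervals can then be compared against the nominal reporting interval
# required by the AIS standard for a vessel's speed and status.
print(messages.groupby("mmsi")["interval_s"].describe())
```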


2017 ◽  
pp. 99
Author(s):  
Pamela S. Soltis ◽  
Douglas E. Soltis

Technological advances in molecular biology have greatly increased the speed and efficiency of DNA sequencing, making it possible to construct large molecular data sets for phylogeny reconstruction relatively quickly. Despite their potential for improving our understanding of phylogeny, these large data sets also provide many challenges. In this paper, we discuss several of these challenges, including 1) the failure of a search to find the most parsimonious trees (the local optimum) in a reasonable amount of time, 2) the difference between a local optimum and the global optimum, and 3) the existence of multiple classes (islands) of most parsimonious trees. We also discuss possible strategies to improve the likelihood of finding the most parsimonious tree(s) and present two examples from our work on angiosperm phylogeny. We conclude with a discussion of two alternatives to analyses of entire large data sets, the exemplar approach and compartmentalization, and suggest that additional consideration must be given to issues of data analysis for large data sets, whether morphological or molecular.


Author(s):  
Suchith Reddy Arukala ◽  
Rathish Kumar Pancharathi

The construction sector is a resource-driven and resource-dependent industry. Rising global interest in incorporating sustainability principles into policy-making requires carefully balancing economic growth with sustainability. To achieve this end in the Indian building sector, triple-bottom-line-based building assessment tools such as GRIHA and IGBC were introduced for assessing building sustainability. However, to revitalize the ideas of Reduce, Replace, Reuse, Recycle and Renovate (the ‘5Rs’) into implementable solutions, a technological dimension is introduced to form a quadruple bottom line (QBL) approach, i.e., social, environmental, economic and technological (SEET), for achieving sustainable construction. This study aims to address the need to add this new dimension, namely technological advances, to the sustainability arena of the construction industry. The objective of the study is to include technological advances in building materials, construction processes and techniques, and design philosophies in the developed SBAT framework. In the extended and upgraded SBAT 2.0, the advances in sustainability (AS) criterion accounts for 11.5 per cent, showing its significance in achieving building sustainability. The use of discrete reinforcement, additive manufacturing, 3D printing, design based on the packing density and rheological properties of concrete, the use of alkali-activated materials in mix design, and performance-based design concepts that affect future sustainability are successfully brought into the fold of the SBAT framework.


Author(s):  
Kate Leader

The live presence of a defendant at trial is a long-standing feature of adversarial criminal trial. So much of what constitutes the adversarial method of adjudication is dependent on qualities that arise from this presence: confrontation and demeanor assessment, among other factors, play important roles in how truth is constructed. As such, performative matters—how a defendant enacts and inhabits her role, how she is positioned or silenced—have long been of concern to legal scholars. These performative concerns are also centrally implicated in defendant rights, such as the right to a fair trial. But today we face new challenges that call into question fundamental beliefs around trials, defendant presence, and fairness. First, technological advances have led to defendants appearing remotely in hearings from the prison in which they are held. Second, the trial itself is arguably vanishing in most adversarial jurisdictions. Third, the use of trials in absentia means that criminal trials may take place in a defendant’s absence; in England and Wales, for less serious offenses, this can be done without inquiring why a defendant isn’t there. This chapter therefore seeks to understand the performative implications of these challenges by shifting the conversation from presence to absence. What difference does it make if a defendant is no longer there? Does being there facilitate greater fairness, despite the obvious issues of constraint and silencing? Drawing on sociolegal, political, and performance theory, the chapter considers the implications of absence in the criminal trial, asking what happens when the defendant disappears.


2021 ◽  
pp. 2141001
Author(s):  
Sanqiang Wei ◽  
Hongxia Hou ◽  
Hua Sun ◽  
Wei Li ◽  
Wenxia Song

The plots of certain literary works are very complicated and hinder readers from understanding them. Tools are therefore needed to support readers’ comprehension of complex literary works by providing them with the most important information. A human reader must capture multiple levels of abstraction and meaning to formulate an understanding of a document. Hence, in this paper, an improved K-means clustering algorithm (IKCA) is proposed for literary word classification. For text data, words that express the exact semantics of a class are generally better features. The proposed technique captures multiple cluster centroids for every class and then selects the high-frequency words in those centroids as text features for classification. Furthermore, neural networks are used to classify text documents and K-means to cluster them, so that the model combines unsupervised and supervised techniques to identify similarity between documents. The numerical results show that the suggested model improves on the existing K-means algorithm: in the accuracy comparison of ALA and IKCA, IKCA reaches 95.2%, the time taken for clustering is less than 2 hours, the success rate is 97.4%, and the performance ratio is 98.1%.
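
As a point of reference for the general idea of taking centroid-dominant terms as class features, here is a minimal sketch using plain scikit-learn K-means; it is not the paper's IKCA variant, and the toy documents, cluster counts, and term counts are invented for illustration.

```python
# Sketch of the general idea: cluster each class's documents with K-means and take
# the terms weighted most heavily in the cluster centroids as class features.
# This is a plain scikit-learn baseline for illustration, not the paper's IKCA variant.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs_per_class = {
    "romance": ["her heart raced as the letter arrived", "love and longing in the garden"],
    "mystery": ["the detective examined the locked room", "a clue hidden beneath the floor"],
}

def centroid_terms(docs, n_clusters=1, top_k=5):
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    terms = vec.get_feature_names_out()
    features = set()
    for centroid in km.cluster_centers_:
        top = centroid.argsort()[::-1][:top_k]     # highest-weight terms in this centroid
        features.update(terms[i] for i in top)
    return features

for label, docs in docs_per_class.items():
    print(label, centroid_terms(docs))
```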


Author(s):  
Janusz Kacprzyk ◽  
Slawomir Zadrozny

We consider linguistic database summaries in the sense of Yager (1982), in an implementable form proposed by Kacprzyk & Yager (2001) and Kacprzyk, Yager & Zadrozny (2000), exemplified, for a personnel database, by “most employees are young and well paid” (with some degree of truth), and their extensions as a very general tool for human-consistent summarization of large data sets. We advocate the use of the concept of a protoform (prototypical form), vividly advocated by Zadeh and shown by Kacprzyk & Zadrozny (2005) to be a general form of a linguistic data summary. Then, we present an extension of our interactive approach to fuzzy linguistic summaries, based on fuzzy logic and fuzzy database queries with linguistic quantifiers. We show how fuzzy queries are related to linguistic summaries, and that one can introduce a hierarchy of protoforms, or abstract summaries in the sense of Zadeh’s latest (2002) ideas, meant mainly to increase the deduction capabilities of search engines. We show an implementation for the summarization of Web server logs.
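
The degree of truth of a summary such as “most employees are young and well paid” can be computed with Zadeh's calculus of linguistically quantified propositions, which underlies Yager-style summaries. The sketch below is a minimal illustration; the membership functions, the definition of the quantifier “most”, and the toy records are all invented.

```python
# Sketch: degree of truth of the linguistic summary "most employees are young and
# well paid" in the style of Yager's calculus. The membership functions and the
# definition of the quantifier "most" below are illustrative choices only.
def mu_young(age):            # fully young below 30, fading out linearly by 45
    return max(0.0, min(1.0, (45 - age) / 15))

def mu_well_paid(salary):     # 0 below 30k, fully satisfied at 60k
    return max(0.0, min(1.0, (salary - 30_000) / 30_000))

def mu_most(proportion):      # fuzzy quantifier "most": starts above 0.3, certain above 0.8
    return max(0.0, min(1.0, (proportion - 0.3) / 0.5))

employees = [(28, 55_000), (34, 62_000), (52, 48_000), (41, 35_000), (25, 70_000)]

# t-norm min for "young AND well paid", averaged over the records, then quantified.
satisfaction = sum(min(mu_young(a), mu_well_paid(s)) for a, s in employees) / len(employees)
print(f"degree of truth: {mu_most(satisfaction):.2f}")
```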


2019 ◽  
Vol 18 ◽  
pp. 160940691988069 ◽  
Author(s):  
Rebecca L. Brower ◽  
Tamara Bertrand Jones ◽  
La’Tara Osborne-Lampkin ◽  
Shouping Hu ◽  
Toby J. Park-Gaghan

Big qualitative data (Big Qual), or research involving large qualitative data sets, has introduced many newly evolving conventions that have begun to change the fundamental nature of some qualitative research. In this methodological essay, we first distinguish big data from big qual. We define big qual as data sets containing either primary or secondary qualitative data from at least 100 participants analyzed by teams of researchers, often funded by a government agency or private foundation, conducted either as a stand-alone project or in conjunction with a large quantitative study. We then present a broad debate about the extent to which big qual may be transforming some forms of qualitative inquiry. We present three questions, which examine the extent to which large qualitative data sets offer both constraints and opportunities for innovation related to funded research, sampling strategies, team-based analysis, and computer-assisted qualitative data analysis software (CAQDAS). The debate is framed by four related trends to which we attribute the rise of big qual: the rise of big quantitative data, the growing legitimacy of qualitative and mixed methods work in the research community, technological advances in CAQDAS, and the willingness of government and private foundations to fund large qualitative projects.


2020 ◽  
Vol 6 (6) ◽  
pp. 55
Author(s):  
Gerasimos Arvanitis ◽  
Aris S. Lalos ◽  
Konstantinos Moustakas

Recently, spectral methods have been extensively used in the processing of 3D meshes. They usually take advantage of some unique properties of the eigenvalues and eigenvectors of the decomposed Laplacian matrix. However, despite their superior behavior and performance, they suffer from computational complexity, especially as the number of vertices of the model increases. In this work, we suggest the use of a fast and efficient spectral processing approach applied to dense static and dynamic 3D meshes, which is ideally suited for real-time denoising and compression applications. To increase the computational efficiency of the method, we exploit potential spectral coherence between adjacent parts of a mesh and then apply an orthogonal iteration approach for tracking the graph Laplacian eigenspaces. Additionally, we present a dynamic version that automatically identifies the optimal subspace size that satisfies a given reconstruction quality threshold. In this way, we overcome the perceptual distortions caused by using a fixed subspace size for all the separated parts. Extensive simulations carried out using different 3D models in different use cases (i.e., compression and denoising) showed that the proposed approach is very fast, especially in comparison with SVD-based spectral processing approaches, while the reconstructed models are of similar or even better quality. The experimental analysis also showed that the proposed approach could be used by other denoising methods as a preprocessing step, in order to optimize the reconstruction quality of their results and decrease their computational complexity, since they then need fewer iterations to converge.
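
A compact NumPy sketch of orthogonal (subspace) iteration for tracking a k-dimensional Laplacian eigenspace is given below. The warm start from a previous basis stands in for the spectral-coherence idea between adjacent mesh parts; the random graph, shift, subspace size, and iteration count are illustrative assumptions, not the authors' implementation.

```python
# Sketch of orthogonal (subspace) iteration for tracking a k-dimensional Laplacian
# eigenspace, warm-started from a previous basis as a stand-in for the spectral-coherence
# idea in the abstract. Illustrative only, not the authors' code.
import numpy as np

def orthogonal_iteration(A, Q0, iters=20):
    """Iterate Q <- orth(A @ Q); converges to the dominant eigenspace of symmetric A."""
    Q = Q0
    for _ in range(iters):
        Q, _ = np.linalg.qr(A @ Q)
    return Q

rng = np.random.default_rng(0)
n, k = 200, 8

# Random sparse symmetric weight matrix and its graph Laplacian (mesh-like graph stand-in).
W = rng.random((n, n)) * (rng.random((n, n)) < 0.05)
W = np.triu(W, 1); W = W + W.T
L = np.diag(W.sum(axis=1)) - W

# Shift so that the LOW-frequency eigenspace of L becomes the dominant eigenspace
# (Gershgorin bound: eigenvalues of L lie below twice the maximum weighted degree).
s = 2.0 * W.sum(axis=1).max() + 1.0
B = s * np.eye(n) - L

Q_prev = np.linalg.qr(rng.standard_normal((n, k)))[0]   # basis from the "previous part"
Q = orthogonal_iteration(B, Q_prev)                     # tracked low-frequency basis

x = rng.standard_normal(n)          # a vertex signal (e.g., one coordinate channel)
x_hat = Q @ (Q.T @ x)               # projection onto the subspace: smoothing/compression
print(np.linalg.norm(x - x_hat))
```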

