data intensive Latest Research Papers

The Complexity and Expressive Power of Limit Datalog

Journal of the ACM ◽

10.1145/3495009 ◽

2022 ◽

Vol 69 (1) ◽

pp. 1-83

Author(s):

Mark Kaminski ◽

Egor V. Kostylev ◽

Bernardo Cuenca Grau ◽

Boris Motik ◽

Ian Horrocks

Keyword(s):

Data Analysis ◽

Expressive Power ◽

Arithmetic Functions ◽

Linear Programs ◽

Data Complexity ◽

Descriptive Complexity ◽

Data Intensive ◽

Additional Stability ◽

Decidability And Complexity ◽

The Impact

Motivated by applications in declarative data analysis, in this article, we study Datalog Z —an extension of Datalog with stratified negation and arithmetic functions over integers. This language is known to be undecidable, so we present the fragment of limit Datalog Z programs, which is powerful enough to naturally capture many important data analysis tasks. In limit Datalog Z , all intensional predicates with a numeric argument are limit predicates that keep maximal or minimal bounds on numeric values. We show that reasoning in limit Datalog Z is decidable if a linearity condition restricting the use of multiplication is satisfied. In particular, limit-linear Datalog Z is complete for Δ 2 EXP and captures Δ 2 P over ordered datasets in the sense of descriptive complexity. We also provide a comprehensive study of several fragments of limit-linear Datalog Z . We show that semi-positive limit-linear programs (i.e., programs where negation is allowed only in front of extensional atoms) capture coNP over ordered datasets; furthermore, reasoning becomes coNEXP-complete in combined and coNP-complete in data complexity, where the lower bounds hold already for negation-free programs. In order to satisfy the requirements of data-intensive applications, we also propose an additional stability requirement, which causes the complexity of reasoning to drop to EXP in combined and to P in data complexity, thus obtaining the same bounds as for usual Datalog. Finally, we compare our formalisms with the languages underpinning existing Datalog-based approaches for data analysis and show that core fragments of these languages can be encoded as limit programs; this allows us to transfer decidability and complexity upper bounds from limit programs to other formalisms. Therefore, our article provides a unified logical framework for declarative data analysis which can be used as a basis for understanding the impact on expressive power and computational complexity of the key constructs available in existing languages.

Scalable Phylogeny Reconstruction with Disaggregated Near-memory Processing

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3484983 ◽

2022 ◽

Vol 15 (3) ◽

pp. 1-32

Author(s):

Nikolaos Alachiotis ◽

Panagiotis Skrimponis ◽

Manolis Pissadakis ◽

Dionisios Pnevmatikatos

Keyword(s):

Virtual Machines ◽

Likelihood Function ◽

Hardware Acceleration ◽

Operation Performance ◽

Data Movement ◽

Memory Processing ◽

Data Intensive ◽

Maximum Likelihood Methods ◽

Time And Energy ◽

Computational Kernel

Disaggregated computer architectures eliminate resource fragmentation in next-generation datacenters by enabling virtual machines to employ resources such as CPUs, memory, and accelerators that are physically located on different servers. While this paves the way for highly compute- and/or memory-intensive applications to potentially deploy all CPUs and/or memory resources in a datacenter, it poses a major challenge to the efficient deployment of hardware accelerators: input/output data can reside on different servers than the ones hosting accelerator resources, thereby requiring time- and energy-consuming remote data transfers that diminish the gains of hardware acceleration. Targeting a disaggregated datacenter architecture similar to the IBM dReDBox disaggregated datacenter prototype, the present work explores the potential of deploying custom acceleration units adjacently to the disaggregated-memory controller on memory bricks (in dReDBox terminology), which is implemented on FPGA technology, to reduce data movement and improve performance and energy efficiency when reconstructing large phylogenies (evolutionary relationships among organisms). A fundamental computational kernel is the Phylogenetic Likelihood Function (PLF), which dominates the total execution time (up to 95%) of widely used maximum-likelihood methods. Numerous efforts to boost PLF performance over the years focused on accelerating computation; since the PLF is a data-intensive, memory-bound operation, performance remains limited by data movement, and memory disaggregation only exacerbates the problem. We describe two near-memory processing models, one that addresses the problem of workload distribution to memory bricks, which is particularly tailored toward larger genomes (e.g., plants and mammals), and one that reduces overall memory requirements through memory-side data interpolation transparently to the application, thereby allowing the phylogeny size to scale to a larger number of organisms without requiring additional memory.

A Data-Intensive Numerical Modeling Method for Large-Scale Rock Strata and Its Application in Mining Subsidence Prediction

Rock Mechanics and Rock Engineering ◽

10.1007/s00603-021-02745-z ◽

2022 ◽

Author(s):

Ya-Qiang Gong ◽

Guang-Li Guo ◽

Li-Ping Wang ◽

Huai-Zhan Li ◽

Guang-Xue Zhang ◽

...

Keyword(s):

Numerical Modeling ◽

Large Scale ◽

Modeling Method ◽

Mining Subsidence ◽

Data Intensive ◽

Subsidence Prediction ◽

Rock Strata

CyVerse for Reproducible Research: RNA-Seq Analysis

Plant Bioinformatics - Methods in Molecular Biology ◽

10.1007/978-1-0716-2067-0_3 ◽

2022 ◽

pp. 57-79

Author(s):

Jason Williams

Keyword(s):

Data Storage ◽

High Performance ◽

Lessons Learned ◽

Data Availability ◽

Reproducible Research ◽

Rna Seq ◽

Data Intensive ◽

Interactive Computing ◽

Computing Environments ◽

Performance Computing

AbstractPosing complex research questions poses complex reproducibility challenges. Datasets may need to be managed over long periods of time. Reliable and secure repositories are needed for data storage. Sharing big data requires advance planning and becomes complex when collaborators are spread across institutions and countries. Many complex analyses require the larger compute resources only provided by cloud and high-performance computing infrastructure. Finally at publication, funder and publisher requirements must be met for data availability and accessibility and computational reproducibility. For all of these reasons, cloud-based cyberinfrastructures are an important component for satisfying the needs of data-intensive research. Learning how to incorporate these technologies into your research skill set will allow you to work with data analysis challenges that are often beyond the resources of individual research institutions. One of the advantages of CyVerse is that there are many solutions for high-powered analyses that do not require knowledge of command line (i.e., Linux) computing. In this chapter we will highlight CyVerse capabilities by analyzing RNA-Seq data. The lessons learned will translate to doing RNA-Seq in other computing environments and will focus on how CyVerse infrastructure supports reproducibility goals (e.g., metadata management, containers), team science (e.g., data sharing features), and flexible computing environments (e.g., interactive computing, scaling).

Big Data Technologies and Management

10.4018/978-1-6684-3662-2.ch084 ◽

2022 ◽

pp. 1734-1744

Author(s):

Jayashree K. ◽

Abirami R.

Keyword(s):

Big Data ◽

Large Scale ◽

New Paradigm ◽

It Industry ◽

Data Intensive ◽

Big Data Technologies ◽

Data Explosion ◽

Business Engineering ◽

Future Direction ◽

Big Data Computing

Developments in information technology and its prevalent growth in several areas of business, engineering, medical, and scientific studies are resulting in information as well as data explosion. Knowledge discovery and decision making from such rapidly growing voluminous data are a challenging task in terms of data organization and processing, which is an emerging trend known as big data computing. Big data has gained much attention from the academia and the IT industry. A new paradigm that combines large-scale compute, new data-intensive techniques, and mathematical models to build data analytics. Thus, this chapter discusses the background of big data. It also discusses the various application of big data in detail. The various related work and the future direction would be addressed in this chapter.

A High-throughput Parallel Viterbi Algorithm via Bitslicing

ACM Transactions on Parallel Computing ◽

10.1145/3470642 ◽

2021 ◽

Vol 8 (4) ◽

pp. 1-25

Author(s):

Saleh Khalaj Monfared ◽

Omid Hajihassani ◽

Vahid Mohsseni ◽

Dara Rahmati ◽

Saeid Gorgin

Keyword(s):

High Throughput ◽

High Performance ◽

Viterbi Algorithm ◽

Data Representation ◽

Processing Unit ◽

Viterbi Decoder ◽

Soft Decision ◽

Content Type ◽

Data Intensive ◽

Representation Scheme

In this work, we present a novel bitsliced high-performance Viterbi algorithm suitable for high-throughput and data-intensive communication. A new column-major data representation scheme coupled with the bitsliced architecture is employed in our proposed Viterbi decoder that enables the maximum utilization of the parallel processing units in modern parallel accelerators. With the help of the proposed alteration of the data scheme, instead of the conventional bit-by-bit operations, 32-bit chunks of data are processed by each processing unit. This means that a single bitsliced parallel Viterbi decoder is capable of decoding 32 different chunks of data simultaneously. Here, the Viterbi’s Add-Compare-Select procedure is implemented with our proposed bitslicing technique, where it is shown that the bitsliced operations for the Viterbi internal functionalities are efficient in terms of their performance and complexity. We have achieved this level of high parallelism while keeping an acceptable bit error rate performance for our proposed methodology. Our suggested hard and soft-decision Viterbi decoder implementations on GPU platforms outperform the fastest previously proposed works by 4.3{\times } and 2.3{\times } , achieving 21.41 and 8.24 Gbps on Tesla V100, respectively.

Implementation and Application of Artificial Intelligence in Selected Public Services

Hrvatska i komparativna javna uprava ◽

10.31297/hkju.21.4.2 ◽

2021 ◽

Vol 21 (4) ◽

pp. 601-622

Author(s):

Nikola Štefanišinová ◽

Nikoleta Jakuš Muthová ◽

Jana Štrangfeldová ◽

Katarína Šulajová

Keyword(s):

Quality Of Life ◽

Artificial Intelligence ◽

Social Services ◽

Public Services ◽

Delivery Of Healthcare ◽

The Public ◽

Artificial Intelligence Techniques ◽

Data Intensive ◽

Realistic Assessment

Data-intensive technologies, such as artificial intelligence, imply huge opportunities for transforming the delivery of healthcare and social services, improving people’s quality of life and working in the health and welfare system. The aim of this paper is to present examples of the implementation of artificial intelligence techniques in healthcare and social services and to sketch the trends and challenges in the adoption of artificial intelligence techniques, with an emphasis on the public sector and selected public services. Analysis is based on a realistic assessment of current artificial intelligence technologies and their anticipated development. Besides the benefits and potential opportunities for healthcare and social services, there are also challenges for governments. Understanding the huge potential of artificial intelligence as well as its limitations will be a key step forward, but it is essential to avoid the trap of an overestimation of artificial intelligence potential.

A Data-intensive Approach to Allocating Owner vs. NFIP portion of Average Annual Flood Losses

10.1002/essoar.10509884.1 ◽

2021 ◽

Author(s):

Md Adilur Rahim ◽

Carol Freidland ◽

Robert Rohli ◽

Nazla Bushra ◽

Rubayet Bin Mostafiz

Keyword(s):

Data Intensive ◽

Flood Losses ◽

Intensive Approach

Innovations in clinical documentation integrity practice: Continual adaptation in a data-intensive healthcare organisation

Health Information Management Journal ◽

10.1177/18333583211067845 ◽

2021 ◽

pp. 183335832110678

Author(s):

Kathleen H Pine ◽

Lee Anne Landon ◽

Claus Bossen ◽

ME VanGelder

Keyword(s):

Relationship Building ◽

Healthcare Organisation ◽

Clinical Documentation ◽

Payment Systems ◽

Clinical Coding ◽

Hospital System ◽

Data Intensive ◽

Patient Admissions ◽

Vital Component

Background Numbers of clinical documentation integrity specialists (CDIS) and CDI programs have increased rapidly. CDIS review patient records concurrently with patient admissions and visits to ensure that information is accurate, complete and non-ambiguous, and query clinicians when they see opportunities for improving data. The occupation was initially focused on improving data for reimbursement, but rapid changes to clinical coding requirements, technologies and payment systems led to a quickly evolving role for CDI programs and changes in CDIS practice. Objective This case study seeks to uncover the ongoing innovation and adaptation occurring in a CDI program by tracing the evolution of a single CDI program over time. Method We present a case study of the CDI program at the HonorHealth hospital system in Arizona. Results The HonorHealth CDI program holds a unique hybrid expertise and role within the healthcare organisation that allows it to rapidly adapt to support emergent demands both internal and external to the organisation, such as supporting accurate data collection for the COVID-19 pandemic. Conclusion CDIS are a vital component in present data-intensive resourcing efforts. The hybrid expertise of CDIS and capacity for adaption and relationship building has enabled the HonorHealth CDI program to adapt rapidly to meet a growing array of clinical documentation integrity needs, including emergent needs during the COVID-19 pandemic. Implications The HonorHealth case study can guide other CDI programs in adaptation of the CDI role and practices in response to changing organisational needs.

Are multiscale Water-Energy-Land-Food nexus studies effective in assessing agricultural sustainability?

Environmental Research Letters ◽

10.1088/1748-9326/ac435f ◽

2021 ◽

Author(s):

Sai Jagadeesh Gaddam ◽

Prasanna Venkatesh Sampath

Keyword(s):

Energy Consumption ◽

Energy Security ◽

Multiple Scales ◽

Spatial Scales ◽

Critical Role ◽

Vulnerability Index ◽

Sustainable Food ◽

Data Intensive ◽

Agricultural Efficiency ◽

Requirement Model

Abstract Several studies have highlighted the need for multiscale Water-Energy-Land-Food (WELF) nexus studies to ensure sustainable food production without endangering water and energy security. However, a systematic attempt to evaluate the efficiency of such multiscale studies has not yet been made. In this study, we used a data-intensive crop water requirement model to study the multiscale WELF nexus in southern India. In particular, we estimated the groundwater and energy consumption for cultivating five major crops between 2017 and 2019 at three distinct spatial scales ranging from 160,000 km2 (state) to 11,000 km2 (district) to 87 km2 (block). A two-at-one-time approach was used to develop six WELF interactions for each crop, which was used to evaluate the performance of each region. A Gross Vulnerability Index (GVI) was developed at multiple scales that integrated the WELF interactions to identify vulnerable hotspots from a nexus perspective. Results from this nexus study identified the regions that accounted for the largest groundwater and energy consumption, which were also adjudged to be vulnerable hotspots. Our results indicate that while a finer analysis may be necessary for drought-resistant crops like groundnut, a coarser scale analysis may be sufficient to evaluate the agricultural efficiency of water-intensive crops like paddy and sugarcane. We identified that vulnerable hotspots at local scales were often dependent on the crop under consideration, i.e., a hotspot for one crop may not necessarily be a hotspot for another. Clearly, policymaking decisions for improving irrigation efficiency through interventions such as crop-shifting would benefit from such insights. It is evident that such approaches will play a critical role in ensuring food-water-energy security in the coming decades.

data intensive
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

The Complexity and Expressive Power of Limit Datalog

Scalable Phylogeny Reconstruction with Disaggregated Near-memory Processing

A Data-Intensive Numerical Modeling Method for Large-Scale Rock Strata and Its Application in Mining Subsidence Prediction

CyVerse for Reproducible Research: RNA-Seq Analysis

Big Data Technologies and Management

A High-throughput Parallel Viterbi Algorithm via Bitslicing

Implementation and Application of Artificial Intelligence in Selected Public Services

A Data-intensive Approach to Allocating Owner vs. NFIP portion of Average Annual Flood Losses

Innovations in clinical documentation integrity practice: Continual adaptation in a data-intensive healthcare organisation

Are multiscale Water-Energy-Land-Food nexus studies effective in assessing agricultural sustainability?

Export Citation Format

data intensiveRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

The Complexity and Expressive Power of Limit Datalog

Scalable Phylogeny Reconstruction with Disaggregated Near-memory Processing

A Data-Intensive Numerical Modeling Method for Large-Scale Rock Strata and Its Application in Mining Subsidence Prediction

CyVerse for Reproducible Research: RNA-Seq Analysis

Big Data Technologies and Management

A High-throughput Parallel Viterbi Algorithm via Bitslicing

Implementation and Application of Artificial Intelligence in Selected Public Services

A Data-intensive Approach to Allocating Owner vs. NFIP portion of Average Annual Flood Losses

Innovations in clinical documentation integrity practice: Continual adaptation in a data-intensive healthcare organisation

Are multiscale Water-Energy-Land-Food nexus studies effective in assessing agricultural sustainability?

data intensive
Recently Published Documents