Evaluation of Optimization Strategies for Incremental Graph Queries

Author(s):  
Gábor Szárnyas ◽  
János Maginecz ◽  
Dániel Varró

The last decade brought considerable improvements in distributed storage and query technologies, known as NoSQL systems. These systems provide quick evaluation of simple retrieval operations and are able to answer certain complex queries in a scalable way, albeit not instantly. Providing both scalability and quick response times for querying large data sets is still a challenging task. Evaluating complex graph queries is particularly difficult, as it requires many join, antijoin, and filtering operations. This paper presents optimization techniques used in relational database systems and applies them to graph queries. We evaluate various query plans on multiple datasets and discuss the effect of different optimization techniques.
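As a hedged illustration of the kind of plan choice evaluated here (the relations, plans, and helper below are invented for this sketch, not taken from the paper): a graph pattern query compiles to a sequence of joins, and the join order determines the size of the intermediate results even though every order returns the same answer.

```python
# A minimal sketch of join ordering for a graph pattern query. The toy graph,
# the pattern, and both plans are illustrative assumptions, not the paper's.

def hash_join(left, right, left_key, right_key):
    """Join two lists of tuples on the given key positions."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [l + r for l in left for r in index.get(l[left_key], [])]

# Toy graph: 'knows' and 'likes' edges as (source, target) relations.
knows = [(1, 2), (1, 3), (2, 3), (3, 4)]
likes = [(3, 9), (4, 9)]

# Pattern: ?a knows ?b, ?b knows ?c, ?c likes ?d.
# Plan A: join the two large 'knows' relations first, filter with 'likes' last.
plan_a = hash_join(hash_join(knows, knows, 1, 0), likes, 3, 0)

# Plan B: start from the selective 'likes' relation, keeping intermediates small.
plan_b = hash_join(knows, hash_join(knows, likes, 1, 0), 1, 0)

assert sorted(plan_a) == sorted(plan_b)  # same answers, different intermediate sizes
```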

Machine learning is a technology that uses accumulated data to make better decisions for future applications. It is the scientific study of algorithms that perform a specific task efficiently without explicit instructions. It may also be viewed as a subset of artificial intelligence concerned with the ability to learn and improve from experience automatically, without being explicitly programmed. Its primary aim is to allow computers to learn automatically and produce more accurate results in order to identify profitable opportunities. Combining machine learning with AI and cognitive technologies can make it even more effective at processing large volumes of data without human intervention or assistance, and at adjusting actions accordingly. It may also be regarded as an algorithm-driven study aimed at improving the performance of tasks. In such scenarios, these techniques can be applied to assess and make predictions over large data sets. The paper concerns the mechanism of supervised learning in database systems, which would be both self-driven and secure. A case study of an organization dealing with student loans is also presented. The paper ends with a discussion, future directions, and a conclusion.


Author(s):  
Rojalina Priyadarshini ◽  
Rabindra K. Barik ◽  
Chhabi Panigrahi ◽  
Harishchandra Dubey ◽  
Brojo Kishore Mishra

This article describes how machine learning (ML) algorithms are very useful for analyzing data and extracting meaningful information from it, which can then be used in various other applications. In the last few years, explosive growth has been seen in the dimension and structure of data, and conventional ML algorithms face several difficulties when dealing with such highly voluminous and unstructured big data. Modern ML tools are designed to handle these complexities. Deep learning (DL) is one such modern ML tool, commonly used to find the hidden structure and cohesion in large data sets by training on parallel platforms with intelligent optimization techniques, so that the data can be further analyzed and interpreted for future prediction and classification. This article focuses on the DL tools and software used in the past couple of years in various areas, and especially in healthcare applications.


Geophysics ◽  
2010 ◽  
Vol 75 (4) ◽  
pp. WA27-WA41 ◽  
Author(s):  
Timothy C. Johnson ◽  
Roelof J. Versteeg ◽  
Andy Ward ◽  
Frederick D. Day-Lewis ◽  
André Revil

Electrical geophysical methods have found wide use in the growing discipline of hydrogeophysics for characterizing the electrical properties of the subsurface and for monitoring subsurface processes in terms of the spatiotemporal changes in subsurface conductivity, chargeability, and source currents they govern. Presently, multichannel and multielectrode data collection systems can collect large data sets in relatively short periods of time. Practitioners, however, often are unable to fully utilize these large data sets and the information they contain because of standard desktop-computer processing limitations. These limitations can be addressed by utilizing the storage and processing capabilities of parallel computing environments. We have developed a parallel distributed-memory forward and inverse modeling algorithm for analyzing resistivity and time-domain induced polarization (IP) data. The primary components of the parallel computations include distributed computation of the pole solutions in forward mode, distributed storage and computation of the Jacobian matrix in inverse mode, and parallel execution of the inverse equation solver. We have tested the corresponding parallel code in three efforts: (1) resistivity characterization of the Hanford 300 Area Integrated Field Research Challenge site in Hanford, Washington, U.S.A., (2) resistivity characterization of a volcanic island in the southern Tyrrhenian Sea in Italy, and (3) resistivity and IP monitoring of biostimulation at a Superfund site in Brandywine, Maryland, U.S.A. Inverse analysis of each of these data sets would be limited or impossible in a standard serial computing environment, which underscores the need for parallel high-performance computing to fully utilize the potential of electrical geophysical methods in hydrogeophysical applications.
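The distributed-Jacobian pattern named above is generic enough to sketch. The following is a minimal illustration of that pattern (our assumption of the general shape, not the authors' code): each MPI rank stores and fills only its own block of Jacobian rows, and the pieces of the Gauss-Newton normal equations are combined with a reduction.

```python
# A generic sketch (assumed, not the authors' implementation) of distributing
# Jacobian rows across MPI ranks. Requires mpi4py and numpy; run with mpiexec.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_data, n_model = 10_000, 500           # illustrative problem dimensions
rows = range(rank, n_data, size)        # this rank's share of the data rows

# Stand-in for the physics: each rank fills only its own Jacobian block.
rng = np.random.default_rng(rank)
J_local = rng.standard_normal((len(rows), n_model))
r_local = rng.standard_normal(len(rows))            # local data residuals

# Local contributions to the normal equations J^T J dm = J^T r.
JtJ_local = J_local.T @ J_local
Jtr_local = J_local.T @ r_local

# Sum the per-rank pieces; every rank ends up with the full system.
JtJ = comm.allreduce(JtJ_local, op=MPI.SUM)
Jtr = comm.allreduce(Jtr_local, op=MPI.SUM)

if rank == 0:
    dm = np.linalg.solve(JtJ + 1e-3 * np.eye(n_model), Jtr)  # damped step
    print("model update norm:", np.linalg.norm(dm))
```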


2018 ◽  
Author(s):  
Jean-Michel Claverie ◽  
TA Thi Ngan

Abstract

Motivation: More than 20 years ago, our laboratory published an original statistical test (referred to as the Audic-Claverie (AC) test in the literature) to identify differentially expressed genes from the pairwise comparison of counts of cognate RNA-seq reads (then called "expressed sequence tags") determined in different conditions. Despite its age and the publication of more sophisticated software packages, the original article continues to gather more than 200 citations per year, indicating the persistent usefulness of the simple AC test for the community. This prompted us to propose a fully revamped version of the AC test with a user interface adapted to the diverse and much larger datasets produced by contemporary omics techniques.

Results: We implemented ACDtool as an interactive, freely accessible web service proposing three types of analyses: (1) the pairwise comparison of individual counts, (2) pairwise comparisons of arbitrarily large lists of counts, and (3) all-at-once pairwise comparisons of multiple datasets. Statistical computations are implemented using standard R functions and mathematically reformulated to accommodate all practical ranges of count values. ACDtool can thus analyze datasets from transcriptomics, proteomics, metagenomics, barcoding, ChIP-seq, population genetics, etc., using the same mathematical approach. ACDtool is particularly well suited for comparisons of large datasets without replicates.

Availability: ACDtool is at URL: www.igs.cnrs-mrs.fr/acdtool/

Contact: [email protected]

Supplementary information: none.
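The equal-depth form of the AC test is compact enough to sketch. The probability of observing y reads of a gene in one library given x reads in another of equal size is p(y|x) = (x+y)! / (x! y! 2^(x+y+1)), which is exactly the negative binomial pmf with r = x+1 and p = 1/2. Below is a minimal Python paraphrase (not ACDtool's R implementation; the two-sided doubling convention is our assumption):

```python
# A minimal sketch of the equal-depth Audic-Claverie test. p(y | x) =
# C(x+y, y) / 2^(x+y+1) equals nbinom(x+1, 0.5).pmf(y), which lets scipy
# evaluate tails without overflowing factorials at large counts.
from scipy.stats import nbinom

def ac_pvalue(x: int, y: int) -> float:
    """Two-sided p-value for counts x and y of the same gene in two equally
    sized libraries; doubling the smaller tail is one common convention."""
    dist = nbinom(x + 1, 0.5)
    tail = min(dist.cdf(y), dist.sf(y - 1))   # lower and upper tails
    return min(1.0, 2.0 * tail)

# Example: 5 reads vs. 25 reads looks differential; 5 vs. 8 does not.
print(ac_pvalue(5, 25))   # small p-value
print(ac_pvalue(5, 8))    # large p-value
```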


2008 ◽  
pp. 1590-1605
Author(s):  
Kurt Stockinger ◽  
Kesheng Wu

In this chapter we discuss various bitmap index technologies for efficient query processing in data warehousing applications. We review the existing literature and organize the technology into three categories, namely bitmap encoding, compression and binning. We introduce an efficient bitmap compression algorithm and examine the space and time complexity of the compressed bitmap index on large data sets from real applications. According to the conventional wisdom, bitmap indices are only efficient for low-cardinality attributes. However, we show that the compressed bitmap indices are also efficient for high-cardinality attributes. Timing results demonstrate that the bitmap indices significantly outperform the projection index, which is often considered to be the most efficient access method for multi-dimensional queries. Finally, we review the bitmap index technology currently supported by commonly used commercial database systems and discuss open issues for future research and development.
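Of the three categories above, equality encoding is simple enough to demonstrate directly. The toy index below (an illustration of the idea, not the chapter's compression algorithm; compression and binning are omitted) keeps one bitmap per distinct value and answers a multi-dimensional predicate with pure bitwise operations:

```python
# A toy equality-encoded bitmap index. Python ints serve as uncompressed
# bit vectors; the example columns and query are invented for this sketch.
from collections import defaultdict

def build_index(column):
    """Map each distinct value to a bitmap with bit i set iff column[i] == value."""
    bitmaps = defaultdict(int)
    for i, value in enumerate(column):
        bitmaps[value] |= 1 << i
    return bitmaps

city = ["NYC", "SFO", "NYC", "BER", "SFO", "NYC"]
year = [2020, 2021, 2021, 2020, 2020, 2021]

city_idx, year_idx = build_index(city), build_index(year)

# Predicate: city IN ('NYC', 'SFO') AND year = 2021.
# OR within an attribute, AND across attributes -- all bitwise operations.
hits = (city_idx["NYC"] | city_idx["SFO"]) & year_idx[2021]
rows = [i for i in range(len(city)) if hits >> i & 1]
print(rows)   # -> [1, 2, 5]
```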


2021 ◽  
Vol 14 (6) ◽  
pp. 929-942
Author(s):  
Wangda Zhang ◽  
Junyoung Kim ◽  
Kenneth A. Ross ◽  
Eric Sedlar ◽  
Lukas Stadler

Modern database management systems employ sophisticated query optimization techniques that enable the generation of efficient plans for queries over very large data sets. A variety of other applications also process large data sets, but cannot leverage database-style query optimization for their code. We therefore identify an opportunity to enhance an open-source programming language compiler with database-style query optimization. Our system dynamically generates execution plans at query time and runs those plans on the data one chunk at a time. Based on feedback from earlier chunks, alternative plans may be used for later chunks. The compiler extension could be used for a variety of data-intensive applications, allowing all of them to benefit from this class of performance optimizations.
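The chunk-feedback loop described above can be sketched in a few lines. The following is a hedged illustration (invented here, not the paper's system): both plans compute the same answer but have data-dependent costs, so the executor times each plan on early chunks and routes later chunks to the current winner.

```python
# A sketch of chunk-at-a-time execution with plan feedback. The predicates,
# plans, and probe/exploit policy are illustrative assumptions.
import time

def expensive_pred(r):          # stand-in for a costly predicate
    return sum(i * i for i in range(200)) >= 0 and r % 3 == 0

def cheap_pred(r):              # stand-in for a cheap, selective predicate
    return r % 100 == 0

PLANS = {
    "cheap_first":     lambda c: [r for r in c if cheap_pred(r) and expensive_pred(r)],
    "expensive_first": lambda c: [r for r in c if expensive_pred(r) and cheap_pred(r)],
}

def run_adaptive(chunks):
    results, costs = [], {name: [] for name in PLANS}
    for i, chunk in enumerate(chunks):
        if i < len(PLANS):                        # probe phase: try each plan once
            name = list(PLANS)[i]
        else:                                     # exploit: pick the cheapest so far
            name = min(costs, key=lambda n: min(costs[n]))
        start = time.perf_counter()
        results.extend(PLANS[name](chunk))
        costs[name].append(time.perf_counter() - start)
    return results

data = list(range(1_000_000))
chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
print(len(run_adaptive(chunks)), "matching rows")
```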


Author(s):  
John A. Hunt

Spectrum-imaging is a useful technique for comparing different processing methods on very large data sets that are identical for each method. This paper is concerned with comparing methods of electron energy-loss spectroscopy (EELS) quantitative analysis on the Al-Li system. The spectrum-image analyzed here was obtained from an Al-10at%Li foil aged to produce δ' precipitates that can span the foil thickness. Two 1024-channel EELS spectra offset in energy by 1 eV were recorded and stored at each pixel in the 80x80 spectrum-image (25 Mbytes). An energy range of 39-89 eV (20 channels/eV) is represented. During processing, the spectra are either subtracted to create an artifact-corrected difference spectrum, or the energy offset is numerically removed and the spectra are added to create a normal spectrum. The spectrum-images are processed into 2D floating-point images using methods and software described in [1].
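The two per-pixel operations described above are simple to write down. A schematic numpy sketch follows (ours, not the software of [1]; the synthetic spectra and the direction of the shift are assumptions): at 20 channels/eV, the 1 eV offset corresponds to exactly 20 channels.

```python
# A schematic sketch of the two per-pixel processing modes for a pair of
# energy-offset EELS spectra. Stand-in Poisson spectra replace real data.
import numpy as np

CHANNELS, SHIFT = 1024, 20            # 1 eV offset * 20 channels/eV

rng = np.random.default_rng(0)
spec_0 = rng.poisson(100, CHANNELS).astype(float)   # stand-in spectra for
spec_1 = rng.poisson(100, CHANNELS).astype(float)   # one pixel of the image

# Mode 1: subtract the offset pair to form an artifact-corrected difference
# spectrum (fixed channel-to-channel gain variation cancels).
difference = spec_1 - spec_0

# Mode 2: numerically remove the offset, then add, to form a normal spectrum
# over the overlapping channels.
normal = spec_0[SHIFT:] + spec_1[:-SHIFT]

print(difference.shape, normal.shape)   # (1024,) (1004,)
```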


Author(s):  
Thomas W. Shattuck ◽  
James R. Anderson ◽  
Neil W. Tindale ◽  
Peter R. Buseck

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive x-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multivariate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are used in the study of particles sampled during the FeLine cruise to the mid-Pacific Ocean in the summer of 1990. The mid-Pacific aerosol provides information on long-range particle transport, iron deposition, sea salt ageing, and halogen chemistry.

Aerosol particle data sets present a number of difficulties for pattern recognition using cluster analysis. There is a great disparity in the number of observations per cluster and in the range of the variables in each cluster. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero because of finite detection limits. Many of the clusters show considerable overlap because of natural variability, agglomeration, and chemical reactivity.
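The multivariate workflow named above follows a standard shape. The sketch below is an illustration of that shape (not the study's actual pipeline; the element set and the synthetic two-population data are assumptions): standardize per-particle element intensities, reduce with PCA, then cluster.

```python
# An illustrative PCA + clustering sketch for per-particle EDS data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Rows: particles; columns: relative intensities for Na, Cl, Fe, Si (assumed).
sea_salt = rng.normal([40, 45, 1, 2], [6, 6, 1, 1], size=(300, 4))
mineral = rng.normal([3, 1, 25, 50], [1, 1, 5, 8], size=(40, 4))
X = np.clip(np.vstack([sea_salt, mineral]), 0, None)  # zeros at detection limits

X_std = StandardScaler().fit_transform(X)             # per-element z-scores
scores = PCA(n_components=2).fit_transform(X_std)     # principal components
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)

# Unequal cluster sizes (300 vs. 40) mirror the disparity noted above.
print(np.bincount(labels))
```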

