A Parallel FPGA Implementation of the CCSDS-123 Compression Algorithm

Satellite onboard processing for hyperspectral imaging applications is characterized by large data sets, limited processing resources and limited bandwidth of communication links. The CCSDS-123 algorithm is a specialized compression standard assembled for space-related applications. In this paper, a parallel FPGA implementation of CCSDS-123 compression algorithm is presented. The proposed design can compress any number of samples in parallel allowed by resource and I/O bandwidth constraints. The CCSDS-123 processing core has been placed on Zynq-7035 SoC and verified against the existing reference software. The estimated power use scales approximately linearly with the number of samples processed in parallel. Finally, the proposed implementation outperforms the state-of-the-art implementations in terms of both throughput and power.

Download Full-text

A review of forecasting techniques for large datasets

National Institute Economic Review ◽

10.1177/0027950108089682 ◽

2008 ◽

Vol 203 ◽

pp. 109-115 ◽

Cited By ~ 4

Author(s):

Jana Eklund ◽

George Kapetanios

Keyword(s):

State Of The Art ◽

Large Data ◽

Large Datasets ◽

Large Data Sets ◽

Multiple Models ◽

Small Subset ◽

Data Sets ◽

Single Model ◽

Data Set ◽

Forecasting Techniques

This paper aims to provide a brief and relatively non-technical overview of state-of-the-art forecasting with large data sets. We classify existing methods into four groups depending on whether data sets are used wholly or partly, whether a single model or multiple models are used and whether a small subset or the whole data set is being forecast. In particular, we provide brief descriptions of the methods and short recommendations where appropriate, without going into detailed discussions of their merits or demerits.

Download Full-text

Modular Grammatical Evolution for the Generation of Artificial Neural Networks

Evolutionary Computation ◽

10.1162/evco_a_00302 ◽

2021 ◽

pp. 1-36

Author(s):

Khabat Soltanian ◽

Ali Ebnenasir ◽

Mohsen Afsharchi

Keyword(s):

Neural Networks ◽

State Of The Art ◽

Solution Space ◽

Single Layer ◽

Large Data ◽

Grammatical Evolution ◽

Large Data Sets ◽

Data Sets ◽

Novel Method ◽

Weak Locality

Abstract This paper presents a novel method, called Modular Grammatical Evolution (MGE), towards validating the hypothesis that restricting the solution space of NeuroEvolution to modular and simple neural networks enables the efficient generation of smaller and more structured neural networks while providing acceptable (and in some cases superior) accuracy on large data sets. MGE also enhances the state-of-the-art Grammatical Evolution (GE) methods in two directions. First, MGE's representation is modular in that each individual has a set of genes, and each gene is mapped to a neuron by grammatical rules. Second, the proposed representation mitigates two important drawbacks of GE, namely the low scalability and weak locality of representation, towards generating modular and multi-layer networks with a high number of neurons. We define and evaluate five different forms of structures with and without modularity using MGE and find single-layer modules with no coupling more productive. Our experiments demonstrate that modularity helps in finding better neural networks faster. We have validated the proposed method using ten well-known classification benchmarks with different sizes, feature counts, and output class counts. Our experimental results indicate that MGE provides superior accuracy with respect to existing NeuroEvolution methods and returns classifiers that are significantly simpler than other machine learning generated classifiers. Finally, we empirically demonstrate that MGE outperforms other GE methods in terms of locality and scalability properties.

Download Full-text

Native Language Identification With Classifier Stacking and Ensembles

Computational Linguistics ◽

10.1162/coli_a_00323 ◽

2018 ◽

Vol 44 (3) ◽

pp. 403-446 ◽

Cited By ~ 7

Author(s):

Shervin Malmasi ◽

Mark Dras

Keyword(s):

State Of The Art ◽

Native Language ◽

Ensemble Methods ◽

Large Data ◽

Language Identification ◽

Large Data Sets ◽

Data Sets ◽

Classification Models ◽

Multiple Classifiers ◽

Current State

Ensemble methods using multiple classifiers have proven to be among the most successful approaches for the task of Native Language Identification (NLI), achieving the current state of the art. However, a systematic examination of ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble architectures such as classifier stacking have not been closely evaluated. We present a set of experiments using three ensemble-based models, testing each with multiple configurations and algorithms. This includes a rigorous application of meta-classification models for NLI, achieving state-of-the-art results on several large data sets, evaluated in both intra-corpus and cross-corpus modes.

Download Full-text

Numerical methods for accelerating the PCA of large data sets applied to hyperspectral imaging

10.1117/12.456960 ◽

2002 ◽

Cited By ~ 5

Author(s):

Frank Vogt ◽

Boris Mizaikoff ◽

Maurus Tacke

Keyword(s):

Numerical Methods ◽

Hyperspectral Imaging ◽

Large Data ◽

Large Data Sets ◽

Data Sets

Download Full-text

From Horn-SRIQ to Datalog: A Data-Independent Transformation That Preserves Assertion Entailment

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33012736 ◽

2019 ◽

Vol 33 ◽

pp. 2736-2743 ◽

Cited By ~ 4

Author(s):

David Carral ◽

Larry González ◽

Patrick Koopmann

Keyword(s):

Real World ◽

State Of The Art ◽

Description Logics ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Data Intensive ◽

One Step ◽

Access Data ◽

Independent Transformation

Ontology-based access to large data-sets has recently gained a lot of attention. To access data efficiently, one approach is to rewrite the ontology into Datalog, and then use powerful Datalog engines to compute implicit entailments. Existing rewriting techniques support Description Logics (DLs) from ELH to Horn-SHIQ. We go one step further and present one such data-independent rewriting technique for Horn-SRIQ⊓, the extension of Horn-SHIQ that supports role chain axioms, an expressive feature prominently used in many real-world ontologies. We evaluated our rewriting technique on a large known corpus of ontologies. Our experiments show that the resulting rewritings are of moderate size, and that our approach is more efficient than state-of-the-art DL reasoners when reasoning with data-intensive ontologies.

Download Full-text

Multidimensional Business Benchmarking Analysis on Data Warehouses

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2017010103 ◽

2017 ◽

Vol 13 (1) ◽

pp. 51-75 ◽

Cited By ~ 1

Author(s):

Akiko Campbell ◽

Xiangbo Mao ◽

Jian Pei ◽

Abdullah Al-Barakati

Keyword(s):

Business Intelligence ◽

State Of The Art ◽

Technical Problem ◽

Large Data ◽

Data Cube ◽

Large Data Sets ◽

Weather Data ◽

Data Sets ◽

Business Analytics ◽

Data Set

Benchmarking analysis has been used extensively in industry for business analytics. Surprisingly, how to conduct benchmarking analysis efficiently over large data sets remains a technical problem untouched. In this paper, the authors formulate benchmark queries in the context of data warehousing and business intelligence, and develop a series of algorithms to answer benchmark queries efficiently. Their methods employ several interesting ideas and the state-of-the-art data cube computation techniques to reduce the number of aggregate cells that need to be computed and indexed. An empirical study using the TPC-H data sets and the Weather data set demonstrates the efficiency and scalability of their methods.

Download Full-text

An example of spectrum imaging used for comparison of EELS quantitative analysis techniques on Al-Li

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s042482010008794x ◽

1991 ◽

Vol 49 ◽

pp. 726-727

Author(s):

John A. Hunt

Keyword(s):

Quantitative Analysis ◽

Large Data ◽

Difference Spectrum ◽

Large Data Sets ◽

Foil Thickness ◽

Data Sets ◽

Analysis Techniques ◽

Spectrum Imaging ◽

Normal Spectrum ◽

Electron Energy Loss

Spectrum-imaging is a useful technique for comparing different processing methods on very large data sets which are identical for each method. This paper is concerned with comparing methods of electron energy-loss spectroscopy (EELS) quantitative analysis on the Al-Li system. The spectrum-image analyzed here was obtained from an Al-10at%Li foil aged to produce δ' precipitates that can span the foil thickness. Two 1024 channel EELS spectra offset in energy by 1 eV were recorded and stored at each pixel in the 80x80 spectrum-image (25 Mbytes). An energy range of 39-89eV (20 channels/eV) are represented. During processing the spectra are either subtracted to create an artifact corrected difference spectrum, or the energy offset is numerically removed and the spectra are added to create a normal spectrum. The spectrum-images are processed into 2D floating-point images using methods and software described in [1].

Download Full-text

Cluster analysis for large data sets: applications to individual aerosol particles from the mid-pacific

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100132078 ◽

1992 ◽

Vol 50 (2) ◽

pp. 1488-1489

Author(s):

Thomas W. Shattuck ◽

James R. Anderson ◽

Neil W. Tindale ◽

Peter R. Buseck

Keyword(s):

Cluster Analysis ◽

Chemical Reactivity ◽

Large Data ◽

Large Data Sets ◽

Particle Analysis ◽

Data Sets ◽

Halogen Chemistry ◽

Complete Study ◽

Components Analysis ◽

Automated Scanning

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive, x-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multi-variate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are used in the study of particles sampled during the FeLine cruise to the mid-Pacific ocean in the summer of 1990. The mid-Pacific aerosol provides information on long range particle transport, iron deposition, sea salt ageing, and halogen chemistry.Aerosol particle data sets suffer from a number of difficulties for pattern recognition using cluster analysis. There is a great disparity in the number of observations per cluster and the range of the variables in each cluster. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero, because of finite detection limits. Many of the clusters show considerable overlap, because of natural variability, agglomeration, and chemical reactivity.

Download Full-text

Faculty Opinions recommendation of Detecting novel associations in large data sets.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13805958.793484294 ◽

2014 ◽

Author(s):

Daniel Lee

Keyword(s):

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Novel Associations

Download Full-text

NVESTIGATION OF THE EFFICIENCY OF DISTRIBUTED INFORMATION SYSTEMS BASED ON THE PROCESSING OF LARGE AMOUNTS OF DATA

Visnyk Universytetu “Ukraina” ◽

10.36994/2707-4110-2019-2-23-03 ◽

2019 ◽

Author(s):

Mykhajlo Klymash ◽

Olena Hordiichuk — Bublivska ◽

Ihor Tchaikovskyi ◽

Oksana Urikova

Keyword(s):

Distributed Systems ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Data Decomposition ◽

Distributed Information ◽

Software Model ◽

Computing Performance ◽

Mapreduce Model ◽

Singular Data

In this article investigated the features of processing large arrays of information for distributed systems. A method of singular data decomposition is used to reduce the amount of data processed, eliminating redundancy. Dependencies of computational efficiency on distributed systems were obtained using the MPI messaging protocol and MapReduce node interaction software model. Were analyzed the efficiency of the application of each technology for the processing of different sizes of data: Non — distributed systems are inefficient for large volumes of information due to low computing performance. It is proposed to use distributed systems that use the method of singular data decomposition, which will reduce the amount of information processed. The study of systems using the MPI protocol and MapReduce model obtained the dependence of the duration calculations time on the number of processes, which testify to the expediency of using distributed computing when processing large data sets. It is also found that distributed systems using MapReduce model work much more efficiently than MPI, especially with large amounts of data. MPI makes it possible to perform calculations more efficiently for small amounts of information. When increased the data sets, advisable to use the Map Reduce model.

Download Full-text