Adaptive data reduction for large-scale transaction data

Xiao-Bai Li; Varghese S. Jacob

doi:10.1016/j.ejor.2007.08.008

Error-Controlled Data Reduction Approach for Large-Scale Structured Datasets

Journal of Computer-Aided Design & Computer Graphics ◽

10.3724/sp.j.1089.2021.19263 ◽

2021 ◽

Vol 33 (12) ◽

pp. 1795-1802

Author(s):

Zhiwei Ai ◽

Juelin Leng ◽

Fang Xia ◽

Huawei Wang ◽

Yi Cao

Keyword(s):

Data Reduction ◽

Large Scale ◽

Reduction Approach

Download Full-text

Large-scale distributed and scalable SOM-based architecture for high-dimensional data reduction

AI for Emerging Verticals: Human-robot computing, sensing and networking ◽

10.1049/pbpc034e_ch16 ◽

2020 ◽

pp. 315-336

Keyword(s):

Data Reduction ◽

Large Scale ◽

High Dimensional Data ◽

High Dimensional

Download Full-text

Efficient data reduction for large-scale genetic mapping

Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics - BCB '15 ◽

10.1145/2808719.2808732 ◽

2015 ◽

Cited By ~ 1

Author(s):

Veronika Strnadová-Neeley ◽

Aydın Buluç ◽

Jarrod Chapman ◽

John R. Gilbert ◽

Joseph Gonzalez ◽

...

Keyword(s):

Genetic Mapping ◽

Data Reduction ◽

Large Scale ◽

Efficient Data

Download Full-text

Nika: software for two-dimensional data reduction

Journal of Applied Crystallography ◽

10.1107/s0021889812004037 ◽

2012 ◽

Vol 45 (2) ◽

pp. 324-328 ◽

Cited By ~ 495

Author(s):

Jan Ilavsky

Keyword(s):

Small Angle ◽

Data Reduction ◽

Large Scale ◽

Grazing Incidence ◽

Scattering Data ◽

Two Dimensional ◽

Saxs Data ◽

One Dimensional ◽

Area Detector ◽

X Ray Scattering

Nikais anIgor Pro-based package for correction, calibration and reduction of two-dimensional area-detector data into one-dimensional data (`lineouts'). It is free (although the user needs a paid license forIgor Pro), open source and highly flexible. While typically used for small-angle X-ray scattering (SAXS) data, it can also be used for grazing-incidence SAXS data, wide-angle diffraction data and even small-angle neutron scattering data. It has been widely available to the user community since about 2005, and it is currently used at the SAXS instruments of selected large-scale facilities as their main data reduction package. It is, however, also suitable for desktop instruments when the manufacturer's software is not available or appropriate. Since it is distributed as source code, it can be scrutinized, verified and modified by users to suit their needs.

Download Full-text

Wall Street and the Housing Bubble

The American Economic Review ◽

10.1257/aer.104.9.2797 ◽

2014 ◽

Vol 104 (9) ◽

pp. 2797-2829 ◽

Cited By ~ 105

Author(s):

Ing-Haw Cheng ◽

Sahil Raina ◽

Wei Xiong

Keyword(s):

Large Scale ◽

Housing Markets ◽

Housing Bubble ◽

Wall Street ◽

Transaction Data ◽

Midlevel Managers ◽

Average Person

We analyze whether midlevel managers in securitized finance were aware of a large-scale housing bubble and a looming crisis in 2004–2006 using their personal home transaction data. We find that the average person in our sample neither timed the market nor were cautious in their home transactions, and did not exhibit awareness of problems in overall housing markets. Certain groups of securitization agents were particularly aggressive in increasing their exposure to housing during this period, suggesting the need to expand the incentives-based view of the crisis to incorporate a role for beliefs. (JEL D14, D83, E32, E44, G01, G21, R31)

Download Full-text

Real-time data reduction at 100 Tbps: Challenge and opportunity for AI-based data reduction for next-generation large-scale nuclear physics collider experiment

10.2172/1833247 ◽

2021 ◽

Author(s):

Ming Liu ◽

Jin Huang ◽

Sandeep Miryala ◽

Yihui Ren

Keyword(s):

Real Time ◽

Data Reduction ◽

Nuclear Physics ◽

Large Scale ◽

Next Generation ◽

Time Data ◽

Real Time Data ◽

Collider Experiment

Download Full-text

Fuzzy Soft Matrices Entropy

International Journal of Fuzzy System Applications ◽

10.4018/ijfsa.2018070104 ◽

2018 ◽

Vol 7 (3) ◽

pp. 56-75 ◽

Cited By ~ 1

Author(s):

Omdutt Sharma ◽

Pratiksha Tiwari ◽

Priti Gupta

Keyword(s):

Information Technology ◽

Data Reduction ◽

Large Scale ◽

Reduction Technique ◽

Soft Information ◽

Information Measure ◽

Fuzzy Soft Set ◽

Huge Amount ◽

Data Reduction Technique ◽

Extract Information

This article describes how information technology and internet together infused organizations with huge amount of data. Consequently, accumulating, storing, understanding and analyzing data at a large scale is equally important and complex. Out of this data not all is information data, in order to extract information, one needs to discard redundant, irrelevant and unnecessary data. This article aims to introduce a data reduction technique which will be useful to discard irrelevant data. Here in data-reduction, the authors have used fuzzy-soft set techniques, namely fuzzy-soft information matrixes. Further, they have introduced a new fuzzy-soft information measure of fuzzy-soft matrixes.

Download Full-text

Design patterns and software techniques for large-scale, open and reproducible data reduction

10.21504/10962/172169 ◽

2021 ◽

Author(s):

◽

Gijs Jan Molenaar

Keyword(s):

Deep Learning ◽

Data Reduction ◽

Large Scale ◽

Expert Knowledge ◽

Lessons Learned ◽

Computational Time ◽

Generative Adversarial Network ◽

Software Packages ◽

Starting Point ◽

Wide Range

The preparation for the construction of the Square Kilometre Array, and the introduction of its operational precursors, such as LOFAR and MeerKAT, mark the beginning of an exciting era for astronomy. Impressive new data containing valuable science just waiting for discovery is already being generated, and these devices will produce far more data than has ever been collected before. However, with every new data instrument, the data rates grow to unprecedented quantities of data, requiring novel new data-processing tools. In addition, creating science grade data from the raw data still requires significant expert knowledge for processing this data. The software used is often developed by a scientist who lacks proper training in software development skills, resulting in the software not progressing beyond a prototype stage in quality. In the first chapter, we explore various organisational and technical approaches to address these issues by providing a historical overview of the development of radioastronomy pipelines since the inception of the field in the 1940s. In that, the steps required to create a radio image are investigated. We used the lessons-learned to identify patterns in the challenges experienced, and the solutions created to address these over the years. The second chapter describes the mathematical foundations that are essential for radio imaging. In the third chapter, we discuss the production of the KERN Linux distribution, which is a set of software packages containing most radio astronomy software currently in use. Considerable effort was put into making sure that the contained software installs appropriately, all items next to one other on the same system. Where required and possible, bugs and portability fixes were solved and reported with the upstream maintainers. The KERN project also has a website, and issue tracker, where users can report bugs and maintainers can coordinate the packaging effort and new releases. The software packages can be used inside Docker and Singularity containers, enabling the installation of these packages on a wide variety of platforms. In the fourth and fifth chapters, we discuss methods and frameworks for combining the available data reduction tools into recomposable pipelines and introduce the Kliko specification and software. This framework was created to enable end-user astronomers to chain and containerise operations of software in KERN packages. Next, we discuss the Common Workflow Language (CommonWL), a similar but more advanced and mature pipeline framework invented by bio-informatics scientists. CommonWL is supported by a wide range of tools already; among other schedulers, visualisers and editors. Consequently, when a pipeline is made with CommonWL, it can be deployed and manipulated with a wide range of tools. In the final chapter, we attempt something unconventional, applying a generative adversarial network based on deep learning techniques to perform the task of sky brightness reconstruction. Since deep learning methods often require a large number of training samples, we constructed a CommonWL simulation pipeline for creating dirty images and corresponding sky models. This simulated dataset has been made publicly available as the ASTRODECONV2019 dataset. It is shown that this method is useful to perform the restoration and matches the performance of a single clean cycle. In addition, we incorporated domain knowledge by adding the point spread function to the network and by utilising a custom loss function during training. Although it was not possible to improve the cleaning performance of commonly used existing tools, the computational time performance of the approach looks very promising. We suggest that a smaller scope should be the starting point for further studies and optimising of the training of the neural network could produce the desired results.

Download Full-text

Large-scale support vector machine classification with redundant data reduction

Neurocomputing ◽

10.1016/j.neucom.2014.10.102 ◽

2016 ◽

Vol 172 ◽

pp. 189-197 ◽

Cited By ~ 17

Author(s):

Xiang-Jun Shen ◽

Lei Mu ◽

Zhen Li ◽

Hao-Xiang Wu ◽

Jian-Ping Gou ◽

...

Keyword(s):

Support Vector Machine ◽

Data Reduction ◽

Large Scale ◽

Support Vector ◽

Support Vector Machine Classification ◽

Redundant Data

Download Full-text

Data Reduction and Dynamic p-y Curves of Helical Piles from Large-Scale Shake Table Tests

Journal of Geotechnical and Geoenvironmental Engineering ◽

10.1061/(asce)gt.1943-5606.0002146 ◽

2019 ◽

Vol 145 (10) ◽

pp. 04019075 ◽

Cited By ~ 1

Author(s):

M. K. ElSawy ◽

M. H. El Naggar ◽

A. B. Cerato ◽

A. W. Elgamal

Keyword(s):

Data Reduction ◽

Large Scale ◽

Shake Table ◽

Shake Table Tests ◽

Helical Piles

Download Full-text