Towards Information Discovery On Large Scale Data: state-of-the-art

2021 ◽

Vol 9 (11) ◽

pp. 911-917

Author(s):

Surabhi Kumari

Keyword(s):

Large Scale ◽

State Of The Art ◽

System Perspective ◽

Large Scale Data ◽

Fundamental Cause ◽

Performance Limitation ◽

Computing Node ◽

Large Scale Data Processing ◽

And Performance ◽

Scale Data

Abstract: Multi-party computation (MPC) is a general cryptographic concept for performing computations while maintaining anonymity. MPC allows a group of parties to jointly evaluate a function without revealing the plaintext inputs or outputs. Privacy-preserving voting, arithmetic calculation, and large-scale data processing are just a few of the applications of MPC. From a system perspective, each MPC party can run on a single computing node. The computing nodes of multiple parties may be homogeneous or heterogeneous; nevertheless, the distributed workloads of MPC protocols are always homogeneous (symmetric). In this paper, we investigate the system performance of a representative MPC framework and a collection of MPC applications. We describe the complete online computation workflow of a state-of-the-art MPC protocol on homogeneous and heterogeneous compute nodes and examine the fundamental cause of its stall time and performance limitation. Keywords: Cloud Computing, IoT, MPC, Amazon Service, Virtualization.
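The abstract does not name the protocol it studies, so the following is only a minimal sketch of additive secret sharing, a common MPC building block, used here to compute a sum without any party seeing another party's input. The modulus `PRIME` and the function names are illustrative assumptions, not part of the paper. Every party performs the same steps, which matches the symmetric workload the abstract describes.

```python
import secrets

# Illustrative field modulus (an assumption; real frameworks pick their own).
PRIME = 2**61 - 1


def share(secret, n_parties):
    """Split `secret` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the value they encode."""
    return sum(shares) % PRIME


def secure_sum(private_inputs):
    """Simulate n parties summing their inputs via additive shares."""
    n = len(private_inputs)
    # Each party i splits its input and sends share j to party j.
    all_shares = [share(x, n) for x in private_inputs]
    # Party j locally adds the shares it received (column j).
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published; together they reveal the total.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    print(secure_sum([3, 5, 10]))
```

In this simulation all parties live in one process; in a deployment each party would run on its own computing node and exchange shares over the network, which is where the stall time discussed in the abstract would arise.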


A Deep Multiview Active Learning for Large-Scale Image Classification

Mathematical Problems in Engineering ◽

10.1155/2020/6639503 ◽

2020 ◽

Vol 2020 ◽

pp. 1-7

Author(s):

Tuozhong Yao ◽

Wenfeng Wang ◽

Yuhong Gu

Keyword(s):

Active Learning ◽

Large Scale ◽

State Of The Art ◽

Feature Representation ◽

Deep Convolutional Neural Networks ◽

Large Scale Data ◽

Version Space ◽

Potential Applications ◽

Effort Reduction ◽

Scale Data

Multiview active learning (MAL) is a technique that can achieve a larger decrease in the size of the version space than traditional active learning and has great potential for applications in large-scale data analysis. In this paper, we present a new deep multiview active learning (DMAL) framework, which is the first to combine multiview active learning and deep learning for annotation effort reduction. Our approach advances existing active learning methods in two respects. First, we incorporate two different deep convolutional neural networks into active learning, using multiview complementary information to improve feature learning. Second, through the properly designed framework, the feature representation and the classifier can be updated simultaneously with progressively annotated informative samples. Experiments on two challenging image datasets demonstrate that our proposed DMAL algorithm achieves more promising results than several state-of-the-art active learning algorithms.
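The abstract does not state how DMAL selects samples for annotation. As a hedged illustration of the general multiview idea only, the sketch below ranks unlabeled samples by how strongly two views disagree on their class probabilities and picks the top ones for labeling. The L1 disagreement score and the names `disagreement` and `select_for_annotation` are assumptions made for this example.

```python
def disagreement(probs_a, probs_b):
    """L1 distance between the class probabilities predicted by two views."""
    return sum(abs(a - b) for a, b in zip(probs_a, probs_b))


def select_for_annotation(view_a, view_b, budget):
    """Return indices of the `budget` samples on which the views disagree most.

    view_a, view_b: per-sample class-probability lists, one list per view
    (for example, the softmax outputs of two different networks).
    """
    scores = [disagreement(a, b) for a, b in zip(view_a, view_b)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    view_a = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
    view_b = [[0.8, 0.2], [0.1, 0.9], [0.3, 0.7]]
    print(select_for_annotation(view_a, view_b, budget=1))
```

Samples chosen this way would be sent to an annotator, and both the feature extractors and the classifier would then be retrained on the enlarged labeled set.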


GlyFN: A Glyph-Aware Fusion Network for Distributed Chinese Event Detection

10.5121/csit.2021.110114 ◽

2021 ◽

Author(s):

Qi Zhai ◽

Zhigang Kan ◽

Linhui Feng ◽

Linbo Qiao ◽

Feng Liu

Keyword(s):

Event Detection ◽

Large Scale ◽

State Of The Art ◽

Language Model ◽

Special Kind ◽

Detection Task ◽

Experimental Results ◽

Large Scale Data ◽

Unstructured Text ◽

Scale Data

Recently, Chinese event detection has attracted more and more attention. As a special kind of hieroglyph, Chinese glyphs are semantically useful but still unexplored in this task. In this paper, we propose a novel Glyph-Aware Fusion Network, named GlyFN, which introduces glyph information into the pre-trained language model representation. To obtain a better representation, we design a Vector Linear Fusion mechanism to fuse the two. Specifically, it first uses max-pooling to capture salient information and then uses linear operations on vectors to retain unique information. Moreover, for large-scale unstructured text, we distribute the data across different clusters in parallel. Finally, we conduct extensive experiments on ACE2005 and large-scale data. Experimental results show that GlyFN obtains increases of 7.48 (10.18%) and 6.17 (8.7%) in F1-score for trigger identification and classification, respectively, over the state-of-the-art methods. Furthermore, the event detection task for large-scale unstructured text can be accomplished efficiently through distribution.


Mask R-CNN with data augmentation for food detection and recognition

10.36227/techrxiv.11974362 ◽

2020 ◽

Author(s):

Than Le

Keyword(s):

Large Scale ◽

Data Augmentation ◽

State Of The Art ◽

Projective Representation ◽

Large Scale Data ◽

Food Detection ◽

Food Recognition ◽

Data Driven Approach ◽

Scale Data ◽

Detection And Recognition

In this paper, we focus on a simple data-driven approach to deep learning, implementing the Mask R-CNN module and analyzing deeper manipulation of the datasets. We first apply affine transformations and projective representations for data augmentation in order to manually enlarge large-scale data, building on the state of the art in computer vision. We then evaluate our method concretely by visualizing our datasets and by testing against many methods, to understand intelligent data analysis in object detection and segmentation, using more than 5000 images containing many similar objects. The results illustrate the efficiency of the approach for small applications such as food recognition and grasping and manipulation in robotics.
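The abstract names affine transformation as an augmentation step but gives no parameters. The sketch below is an assumption-laden illustration of the idea: it applies rotation, scaling, and translation to the vertices of a polygon annotation (such as a segmentation mask outline), producing several augmented copies from one labeled shape. The function names and parameter values are hypothetical, and the same matrix would also have to be applied to the image pixels in a real pipeline.

```python
import math


def affine_matrix(angle_deg=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Build a 2x3 affine matrix: rotate and scale about the origin, then translate."""
    a = math.radians(angle_deg)
    c, s = math.cos(a) * scale, math.sin(a) * scale
    return [[c, -s, tx], [s, c, ty]]


def apply_affine(points, m):
    """Apply a 2x3 affine matrix to a list of (x, y) points."""
    return [
        (m[0][0] * x + m[0][1] * y + m[0][2], m[1][0] * x + m[1][1] * y + m[1][2])
        for x, y in points
    ]


def augment_polygon(points, param_sets):
    """Return one transformed copy of the polygon per parameter set."""
    return [apply_affine(points, affine_matrix(**p)) for p in param_sets]


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    # Hypothetical augmentation settings chosen only for illustration.
    settings = [
        {"angle_deg": 90},
        {"scale": 2.0},
        {"tx": 5.0, "ty": -3.0},
    ]
    for polygon in augment_polygon(square, settings):
        print(polygon)
```

A projective (perspective) transformation would use a 3x3 matrix with a final division by the third homogeneous coordinate; it is omitted here to keep the example short.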


Optimizations for filter-based join algorithms in MapReduce

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201220 ◽

2021 ◽

pp. 1-18

Author(s):

Salahaldeen Rababa ◽

Amer Al-Badarneh

Keyword(s):

Cost Analysis ◽

Execution Time ◽

Large Scale ◽

Programming Model ◽

State Of The Art ◽

Total Execution Time ◽

Large Scale Data ◽

Heterogeneous Datasets ◽

Join Algorithms ◽

Scale Data

Large-scale datasets collected from heterogeneous sources often require a join operation to extract valuable information. MapReduce is an efficient programming model for processing large-scale data. However, it has some limitations in processing heterogeneous datasets, because a large number of redundant intermediate records are transferred through the network. Several filtering techniques have been developed to improve join performance, but they require multiple MapReduce jobs to process the input datasets. To address this issue, adaptive filter-based join algorithms are presented in this paper. Specifically, three join algorithms are introduced that perform both filter creation and redundant-record elimination within a single MapReduce job. A cost analysis of the introduced join algorithms shows that the I/O cost is reduced compared to the state-of-the-art filter-based join algorithms. The performance of the join algorithms was evaluated in terms of the total execution time and the total amount of I/O data transferred. The experimental results show that the adaptive Bloom join, semi-adaptive intersection Bloom join, and adaptive intersection Bloom join decrease the total execution time by 30%, 25%, and 35%, respectively, and reduce the total amount of I/O data transferred by 18%, 25%, and 50%, respectively.
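To make the filtering idea concrete, here is a minimal single-process sketch of a plain Bloom-filter join. It is not one of the paper's three adaptive algorithms, whose details the abstract does not give: it only shows how a Bloom filter built from one dataset's join keys lets the other dataset drop records that cannot match before they are shuffled. The filter size, hash count, and all names are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter backed by a Python integer used as a bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        # May return a false positive, never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def bloom_join(left, right):
    """Join two lists of (key, value) records, filtering `right` first.

    Returns the joined rows and the number of `right` records that
    survived the filter (a stand-in for records sent over the network).
    """
    bloom = BloomFilter()
    for key, _ in left:
        bloom.add(key)
    # "Map" side: discard right records whose key cannot appear in left.
    survivors = [(k, v) for k, v in right if k in bloom]
    # "Reduce" side: an exact hash join removes any false positives.
    index = {}
    for k, v in left:
        index.setdefault(k, []).append(v)
    joined = [(k, lv, rv) for k, rv in survivors for lv in index.get(k, [])]
    return joined, len(survivors)


if __name__ == "__main__":
    left = [(1, "a"), (2, "b")]
    right = [(2, "x"), (3, "y"), (2, "z")]
    rows, shipped = bloom_join(left, right)
    print(rows, shipped)
```

Because a Bloom filter can report false positives, the filtered count is a lower-cost approximation rather than an exact match count; the final hash join guarantees that the output is still correct.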


A short survey on the state of the art in architectures and platforms for large scale data analysis and knowledge discovery from data

Proceedings of the WICSA/ECSA 2012 Companion Volume on - WICSA/ECSA '12 ◽

10.1145/2361999.2362039 ◽

2012 ◽

Cited By ~ 8

Author(s):

Edmon Begoli

Keyword(s):

Data Analysis ◽

Knowledge Discovery ◽

Large Scale ◽

State Of The Art ◽

The State ◽

Large Scale Data ◽

Short Survey ◽

Scale Data


Mask R-CNN with data augmentation for food detection and recognition

10.36227/techrxiv.11974362.v1 ◽

2020 ◽

Author(s):

Than Le

Keyword(s):

Large Scale ◽

Data Augmentation ◽

State Of The Art ◽

Projective Representation ◽

Large Scale Data ◽

Food Detection ◽

Food Recognition ◽

Data Driven Approach ◽

Scale Data ◽

Detection And Recognition

In this paper, we focus on a simple data-driven approach to deep learning, implementing the Mask R-CNN module and analyzing deeper manipulation of the datasets. We first apply affine transformations and projective representations for data augmentation in order to manually enlarge large-scale data, building on the state of the art in computer vision. We then evaluate our method concretely by visualizing our datasets and by testing against many methods, to understand intelligent data analysis in object detection and segmentation, using more than 5000 images containing many similar objects. The results illustrate the efficiency of the approach for small applications such as food recognition and grasping and manipulation in robotics.
