A Tour of Lattice-Based Skyline Algorithms

Author(s):  
Markus Endres ◽  
Lena Rudenko

A skyline query retrieves all objects in a dataset that are not dominated by other objects according to some given criteria. There exist many skyline algorithms, which can be classified into generic, index-based, and lattice-based algorithms. This chapter takes a tour through lattice-based skyline algorithms. It summarizes the basic concepts and properties, presents high-performance parallel approaches, shows how to overcome the low-cardinality restriction of lattice structures, and finally presents an application on data streams for real-time skyline computation. Experimental results on synthetic and real datasets show that lattice-based algorithms outperform state-of-the-art skyline techniques and additionally have linear runtime complexity.
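To make the dominance criterion and the lattice idea concrete, the following minimal Python sketch computes a skyline over low-cardinality integer attributes, assuming smaller values are preferred in every dimension. It enumerates the lattice of attribute-value combinations and propagates dominance from better to worse nodes; it only illustrates the principle and is not the chapter's actual algorithm.

```python
from itertools import product

def lattice_skyline(tuples, domain_sizes):
    """Minimal lattice-based skyline sketch for low-cardinality domains.

    Each tuple is a vector of small integers; smaller is better in every
    dimension. A lattice node is dominated if some occupied node is <= in
    every dimension and strictly < in at least one.
    """
    occupied = set(map(tuple, tuples))          # nodes that occur in the data
    dominated = set()

    # Visit nodes in increasing order of their coordinate sum so that all
    # immediate predecessors are processed before the node itself.
    nodes = sorted(product(*(range(s) for s in domain_sizes)), key=sum)
    for node in nodes:
        for d in range(len(node)):
            if node[d] > 0:
                # Predecessor: one step better in dimension d.
                pred = node[:d] + (node[d] - 1,) + node[d + 1:]
                if pred in occupied or pred in dominated:
                    dominated.add(node)
                    break

    return [t for t in tuples if tuple(t) not in dominated]

if __name__ == "__main__":
    data = [(1, 2), (0, 3), (2, 0), (2, 2), (0, 3)]
    print(lattice_skyline(data, domain_sizes=(3, 4)))  # (2, 2) is dominated
```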

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Mehdi Srifi ◽  
Ahmed Oussous ◽  
Ayoub Ait Lahcen ◽  
Salma Mouline

Various recommender systems (RSs) have been developed over recent years, and many of them have concentrated on English content. Thus, the majority of RSs in the literature have been compared on English content. However, research on RSs for content in other languages, such as Arabic, is minimal; the field of Arabic RSs remains largely neglected. Therefore, we aim through this study to fill this research gap by leveraging recent advances in the English RS field. Our main goal is to investigate recent RSs in an Arabic context. To that end, we first selected five state-of-the-art RSs originally devoted to English content, and then we empirically evaluated their performance on Arabic content. As a result of this work, we first built four publicly available large-scale Arabic datasets for recommendation purposes. Second, we provide various text preprocessing techniques for preparing the constructed datasets. Third, our investigation derives well-argued conclusions about the usage of modern RSs in the Arabic context. The experimental results show that these systems achieve high performance when applied to Arabic content.
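As an illustration of the kind of text preprocessing commonly applied to Arabic content before it is fed into a recommender, the sketch below shows typical normalization steps (diacritic removal, letter-form unification). These are standard techniques and not necessarily the exact pipeline used to prepare the constructed datasets.

```python
import re

# Common Arabic normalization steps (illustrative, not the paper's pipeline).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")  # tashkeel and dagger alef
_TATWEEL = "\u0640"                                  # elongation character

def normalize_arabic(text: str) -> str:
    """Apply typical Arabic normalization used before indexing or modeling."""
    text = _DIACRITICS.sub("", text)                         # strip diacritics
    text = text.replace(_TATWEEL, "")                        # remove tatweel
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)    # alef variants -> bare alef
    text = text.replace("\u0629", "\u0647")                  # ta marbuta -> ha
    text = text.replace("\u0649", "\u064A")                  # alef maqsura -> ya
    return text

print(normalize_arabic("الْكِتَابُ"))  # -> "الكتاب"
```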


Author(s):  
Obed Appiah ◽  
James Benjamin Hayfron-Acquah ◽  
Michael Asante

For computer vision systems to effectively perform diagnosis, identification, tracking, monitoring, and surveillance, image data must be devoid of noise. Various types of noise, such as salt-and-pepper (impulse), Gaussian, shot, quantization, anisotropic, and periodic noise, corrupt images, making it difficult to extract relevant information from them. This has led to many proposed algorithms to address the problem. Among them, the median filter has been successful in handling salt-and-pepper noise and preserving edges in images. However, its moderate-to-high running time and poor performance when images are corrupted with high noise densities have led to various proposed modifications of the median filter. The challenge observed with all these modifications is the trade-off between efficient running time and the quality of denoised images. This paper proposes an algorithm that delivers quality denoised images with low running time. Two state-of-the-art algorithms are combined into one, and a technique called Mid-Value-Decision-Median is introduced into the proposed algorithm to deliver high-quality denoised images in real time. The proposed algorithm, the High-Performance Modified Decision Based Median Filter (HPMDBMF), runs about 200 times faster than the state-of-the-art Modified Decision Based Median Filter (MDBMF) and still generates equivalent output.
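For context, a minimal decision-based median filter is sketched below in Python/NumPy: only pixels detected as salt-and-pepper extremes (0 or 255) are replaced, using the median of their noise-free neighbours, so clean pixels and edges are left untouched. This illustrates the general decision-based idea and is not the HPMDBMF algorithm itself.

```python
import numpy as np

def decision_based_median(img: np.ndarray, window: int = 3) -> np.ndarray:
    """Replace only salt-and-pepper pixels with the median of clean neighbours."""
    pad = window // 2
    padded = np.pad(img, pad, mode="edge")
    out = img.copy()
    noisy = (img == 0) | (img == 255)            # decision step: detect extremes
    for y, x in zip(*np.nonzero(noisy)):
        patch = padded[y:y + window, x:x + window]
        clean = patch[(patch != 0) & (patch != 255)]
        if clean.size:                            # median of noise-free neighbours
            out[y, x] = np.median(clean)
        else:                                     # all neighbours noisy:
            out[y, x] = np.median(patch)          # fall back to the plain median
    return out
```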


2020 ◽  
Vol 12 (8) ◽  
pp. 3068 ◽  
Author(s):  
Chenglong Li ◽  
Tao Li ◽  
Junnan Li ◽  
Zilin Shi ◽  
Baosheng Wang

Field Programmable Gate Arrays (FPGAs) are widely used in real-time network processing, such as Software-Defined Networking (SDN) switches, due to their high performance and programmability. Bit-Vector (BV)-based approaches can implement high-performance multi-field packet classification, the core function of an SDN switch, on FPGA. However, the SDN switch requires not only high performance but also low update latency to avoid controller failure. Unfortunately, the update latency of BV-based approaches grows with the number of rules, which means they can hardly support the SDN switch effectively. It is reasonable to split the ruleset into sub-rulesets that can be processed in parallel, thereby reducing update latency. We thus present SplitBV, which enables efficient updates by using several distinguishable exact bits to split the ruleset. SplitBV consists of a constrained recursive algorithm for selecting the bit positions that minimize the latency and a hybrid lookup pipeline. It achieves a significant reduction in update latency with negligible memory growth and comparably high performance. We implement SplitBV and evaluate its performance through simulation and an FPGA prototype. Experimental results show that our approach reduces update latency by 73% and 36% on average for synthetic 5-tuple rules and OpenFlow rules, respectively.
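The sketch below illustrates the ruleset-splitting idea in Python: rules whose chosen bit positions are exact (not wildcarded) are grouped by the values of those bits, so each sub-ruleset can be searched and updated independently. The rule encoding and function name are illustrative assumptions, not the paper's constrained recursive selection algorithm or hardware pipeline.

```python
from collections import defaultdict

def split_ruleset(rules, bit_positions):
    """Group rules by the values of selected exact-bit positions.

    Each rule is (value_bits, mask_bits) as integers; a mask bit of 1 means
    the corresponding value bit is exact, 0 means it is wildcarded.
    """
    groups = defaultdict(list)
    wildcard_group = []                  # rules not exact at every chosen bit
    for value, mask in rules:
        key = []
        for pos in bit_positions:
            if (mask >> pos) & 1:
                key.append((value >> pos) & 1)
            else:
                key = None               # wildcard at a chosen position
                break
        if key is None:
            wildcard_group.append((value, mask))
        else:
            groups[tuple(key)].append((value, mask))
    return groups, wildcard_group
```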


2021 ◽  
Vol 14 (10) ◽  
pp. 1818-1831
Author(s):  
Rudi Poepsel-Lemaitre ◽  
Martin Kiefer ◽  
Joscha von Hein ◽  
Jorge-Arnulfo Quiané-Ruiz ◽  
Volker Markl

In pursuit of real-time data analysis, approximate summarization structures, i.e., synopses, have gained importance over the years. However, existing stream processing systems, such as Flink, Spark, and Storm, do not support synopses as first-class citizens, i.e., as pipeline operators; implementing synopses is left to users. This is mainly because of the diversity of synopses, which makes a unified implementation difficult. We present Condor, a framework that supports synopses as first-class citizens. Condor facilitates the specification and processing of synopsis-based streaming jobs while hiding all internal processing details. Condor's key component is its model, which represents synopses as a particular case of windowed aggregate functions. An inherent divide-and-conquer strategy allows Condor to distribute the computation efficiently, allowing for high performance and linear scalability. Our evaluation shows that Condor outperforms existing approaches by up to 75x and that it scales linearly with the number of cores.
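To illustrate why synopses fit a windowed-aggregate, divide-and-conquer model, the Python sketch below implements a Count-Min sketch with per-element updates and a merge operation for combining partial synopses computed on different partitions. The class and its interface are illustrative assumptions, not Condor's API.

```python
import hashlib
import numpy as np

class CountMinSketch:
    """A mergeable frequency synopsis: add() per element, merge() per partition."""

    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = np.zeros((depth, width), dtype=np.int64)

    def _hash(self, item, row):
        h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def add(self, item):                       # per-element update
        for row in range(self.depth):
            self.table[row, self._hash(item, row)] += 1

    def merge(self, other):                    # combine two partial synopses
        assert self.table.shape == other.table.shape
        self.table += other.table
        return self

    def estimate(self, item):                  # point-frequency query
        return min(self.table[row, self._hash(item, row)]
                   for row in range(self.depth))
```

Because `merge` is commutative and associative, partial synopses built on separate partitions or window panes can be combined in any order, which is the property a divide-and-conquer windowed aggregation exploits.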


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 589
Author(s):  
Sibghat Ullah Bazai ◽  
Julian Jang-Jaccard ◽  
Hooman Alavizadeh

Data anonymization strategies such as subtree generalization have been hailed as techniques that provide a more efficient generalization strategy than their full-tree generalization counterparts. Many subtree-based generalization strategies (e.g., top-down, bottom-up, and hybrid) have been implemented on the MapReduce platform to take advantage of its scalability and parallelism. However, MapReduce inherently lacks support for iteration-intensive algorithms such as subtree generalization. This paper proposes a Resilient Distributed Dataset (RDD)-based implementation of a subtree-based data anonymization technique for Apache Spark to address the issues associated with MapReduce-based counterparts. We describe our RDD-based approach, which offers effective partition management, improved memory usage through caching of frequently referenced intermediate values, and enhanced iteration support. Our experimental results show high performance compared to existing state-of-the-art privacy-preserving approaches while ensuring the data utility and privacy levels required of any competitive data anonymization technique.
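The PySpark sketch below shows the iterative pattern such an approach targets: the record RDD is cached because every generalization round re-scans it to check whether the current equivalence classes satisfy k-anonymity. The file path, the toy taxonomy, and helper names such as `generalize` are hypothetical illustrations, not the authors' code.

```python
from pyspark import SparkContext

sc = SparkContext(appName="subtree-generalization-sketch")

# Cache the records: each generalization round re-scans the same RDD.
records = sc.textFile("hdfs:///data/records.csv").map(lambda l: l.split(","))
records = records.cache()

def generalize(value, level, taxonomy):
    """Climb `level` steps up the taxonomy for one quasi-identifier value."""
    for _ in range(level):
        value = taxonomy.get(value, value)
    return value

k = 10                                    # k-anonymity threshold
level = 0
taxonomy = {"31": "30-39", "30-39": "*"}  # toy taxonomy for an age attribute

while True:
    # Count equivalence-class sizes at the current generalization level
    # (assuming the quasi-identifier is the first column).
    classes = (records
               .map(lambda r: (generalize(r[0], level, taxonomy), 1))
               .reduceByKey(lambda a, b: a + b))
    smallest = classes.map(lambda kv: kv[1]).min()
    if smallest >= k or level >= 2:       # stop when k-anonymous (or at the root)
        break
    level += 1                            # otherwise generalize one step further
```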


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 61
Author(s):  
Boning He ◽  
Guoli Zhu ◽  
Lei Han ◽  
Dailin Zhang

In a tunnel boring machine (TBM), obtaining the attitude in real time is very important for the driver. However, the current laser targeting system has a large delay before the attitude is obtained. Therefore, an adaptive-neuro-fuzzy-based information fusion method is proposed to predict the attitude of a laser targeting system in real time. In the proposed method, dual-rate information fusion is used to fuse the information of a laser targeting system and a two-axis inclinometer, obtaining roll and pitch angles at a higher rate and providing a smoother attitude prediction. Because measurement error exists, an adaptive neuro-fuzzy inference system (ANFIS) is used to model the measurement error, and the ANFIS-based model is then combined with the dual-rate information fusion to achieve high performance. Experimental results show that the ANFIS-based information fusion provides better real-time performance and accuracy in attitude prediction. Experimental results also verify that the ANFIS-based information fusion can handle the laser targeting system losing its signal.
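The Python sketch below shows the dual-rate structure only: the slow laser targeting system supplies an absolute attitude every few seconds, while the fast two-axis inclinometer updates roll and pitch in between. The ANFIS measurement-error model from the paper is replaced here by a simple running bias estimate, so this is an illustrative stand-in rather than the proposed method.

```python
class DualRateAttitudeFusion:
    """Toy dual-rate fusion of a slow absolute sensor and a fast relative one."""

    def __init__(self):
        self.attitude = {"roll": 0.0, "pitch": 0.0, "yaw": 0.0}
        self.bias = {"roll": 0.0, "pitch": 0.0}   # stand-in for the ANFIS error model

    def on_inclinometer(self, roll, pitch):
        # High-rate path: correct the raw inclinometer reading with the bias estimate.
        self.attitude["roll"] = roll - self.bias["roll"]
        self.attitude["pitch"] = pitch - self.bias["pitch"]
        return dict(self.attitude)

    def on_laser(self, roll, pitch, yaw):
        # Low-rate path: treat the laser attitude as reference and refresh the
        # bias estimate from the observed prediction error.
        self.bias["roll"] += self.attitude["roll"] - roll
        self.bias["pitch"] += self.attitude["pitch"] - pitch
        self.attitude.update(roll=roll, pitch=pitch, yaw=yaw)
        return dict(self.attitude)
```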


2020 ◽  
Author(s):  
Huan Liu ◽  
Zhiliang Qiu ◽  
Weitao Pan ◽  
Jun Li ◽  
Ling Zheng ◽  
...  

Cyclic redundancy check (CRC) is a well-known error detection code that is widely used in Ethernet, PCIe, and other transmission protocols. Existing FPGA-based implementations face the problem of excessive resource utilization in high-performance scenarios. The padding-zeros problem and the introduction of programmability further exacerbate this problem. In this brief, the stride-by-5 algorithm is proposed to achieve optimal utilization of FPGA resources. The pipelining go-back algorithm is proposed to solve the padding-zeros problem. Reprogramming via HWICAP is proposed to realize programmability with a small and constant resource utilization. The experimental results show that the resource utilization of the proposed non-segmented architecture is 80.7%-87.5% and 25.1%-46.2% lower than those of two state-of-the-art FPGA-based CRC implementations, and the proposed segmented architecture has 81.7%-85.9% and 2.9%-20.8% lower resource utilization compared with the two state-of-the-art architectures; meanwhile, throughput and programmability are guaranteed. We made the source code available on GitHub.
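For reference, the standard table-driven CRC-32 used by Ethernet can be expressed in a few lines of Python. This is a plain software reference model of the checksum itself; the paper's contribution is the FPGA architecture (multi-byte strides, pipelining, and HWICAP-based reprogramming), which is not reproduced here.

```python
def make_crc32_table(poly: int = 0xEDB88320):
    """Byte-wise lookup table for the reflected CRC-32 (Ethernet polynomial)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

_TABLE = make_crc32_table()

def crc32(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:                         # one table lookup per input byte
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF

# Sanity check against Python's built-in implementation.
import zlib
assert crc32(b"123456789") == zlib.crc32(b"123456789") == 0xCBF43926
```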


2020 ◽  
Author(s):  
Huan Liu ◽  
Zhiliang Qiu ◽  
Weitao Pan ◽  
Jun Li ◽  
Ling Zheng ◽  
...  

Cyclic redundancy check (CRC) is a well-known error detection code that is widely used in Ethernet, PCIe, and other transmission protocols. Existing FPGA-based implementations face the problem of excessive resource utilization in high-performance scenarios. The padding-zeros problem and the introduction of programmability further exacerbate this problem. In this brief, the stride-by-5 algorithm is proposed to achieve optimal utilization of FPGA resources. The pipelining go-back algorithm is proposed to solve the padding-zeros problem. Reprogramming via HWICAP is proposed to realize programmability with a small and constant resource utilization. The experimental results show that the resource utilization of the proposed non-segmented architecture is 84.1% and 37.6% lower than those of two state-of-the-art FPGA-based CRC implementations, and the proposed segmented architecture has 83.9% and 8.9% lower resource utilization compared with the two state-of-the-art architectures; meanwhile, throughput and programmability are guaranteed. We made the source code available on GitHub.


2020 ◽  
Vol 34 (05) ◽  
pp. 9122-9129
Author(s):  
Hai Wan ◽  
Yufei Yang ◽  
Jianfeng Du ◽  
Yanan Liu ◽  
Kunxun Qi ◽  
...  

Aspect-based sentiment analysis (ABSA) aims to detect the targets (which are composed of contiguous words), aspects, and sentiment polarities in text. Published datasets from SemEval-2015 and SemEval-2016 reveal that a sentiment polarity depends on both the target and the aspect. However, most existing methods predict sentiment polarities from either targets or aspects but not from both, so they easily make wrong predictions. In particular, when the target is implicit, i.e., it does not appear in the given text, methods that predict sentiment polarities from targets do not work. To tackle these limitations in ABSA, this paper proposes a novel method for target-aspect-sentiment joint detection. It relies on a pre-trained language model and can capture the dependence on both targets and aspects for sentiment prediction. Experimental results on the SemEval-2015 and SemEval-2016 restaurant datasets show that the proposed method achieves high performance in detecting target-aspect-sentiment triples even for implicit-target cases; moreover, it even outperforms state-of-the-art methods on those subtasks of target-aspect-sentiment detection that they are able to handle.
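The sketch below shows, in simplified form, how a pre-trained language model can condition sentiment prediction on an aspect: the sentence and the aspect category are encoded as a pair, so a polarity (or "no opinion") can be predicted even when the target is implicit. The model name, label set, and the omission of target-span tagging are assumptions for illustration; the classification head here is untrained, so the output is meaningless until fine-tuned on ABSA data.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"                            # illustrative choice
LABELS = ["negative", "neutral", "positive", "none"]   # "none": no opinion for this aspect

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=len(LABELS))                     # head is randomly initialized

def predict_polarity(sentence: str, aspect: str) -> str:
    """Classify the sentiment toward one aspect category of one sentence."""
    inputs = tokenizer(sentence, aspect, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(predict_polarity("The fish was superb.", "FOOD#QUALITY"))
```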

