A novel approach for big data classification based on hybrid parallel dimensionality reduction using Spark cluster

2019
Vol 20 (4)
Author(s):
Ahmed Hussein Ali
Mahmood Zaki Abdullah

The big data concept has elicited studies on how to accurately and efficiently extract valuable information from such huge datasets. A major problem in big data mining is high dimensionality, that is, the large number of features in such datasets. High dimensionality degrades the accuracy of machine learning (ML) classifiers and wastes time because of the many redundant features in the dataset. A fast feature reduction method can mitigate this problem. Hence, this study presents HP-PL, a new hybrid parallel feature reduction framework that uses Apache Spark to perform fast feature reduction on shared/distributed-memory clusters. Evaluation of the proposed HP-PL on the KDD99 dataset showed the algorithm to be significantly faster than conventional feature reduction techniques. The proposed technique required 1 minute to select 4 features from over 79 features and 3,000,000 samples on a 3-node cluster (21 cores in total), whereas the comparative algorithm required more than 2 hours for the same task. In the proposed system, the Hadoop Distributed File System (HDFS) provides distributed storage, and Apache Spark serves as the computing engine. The model was developed as a parallel design with full consideration of the high performance and throughput of distributed computing. In conclusion, the proposed HP-PL method achieves good accuracy with less memory and time than conventional feature reduction methods. The tool is publicly available at https://github.com/ahmed/Fast-HP-PL.
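
As a minimal illustrative sketch of Spark-based feature reduction (not the authors' HP-PL implementation), the same end-to-end flow can be expressed in PySpark with its built-in chi-squared selector; the HDFS path and label column below are assumptions:

```python
# Illustrative sketch only: Spark-based parallel feature selection,
# not the authors' HP-PL algorithm. Assumes a running Spark cluster
# and a labeled CSV dataset on HDFS with a numeric "label" column
# (string labels would first need a StringIndexer).
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler, ChiSqSelector

spark = SparkSession.builder.appName("feature-reduction-sketch").getOrCreate()

# Hypothetical HDFS path; replace with the real dataset location.
df = spark.read.csv("hdfs:///data/kdd99.csv", header=True, inferSchema=True)

feature_cols = [c for c in df.columns if c != "label"]
assembled = VectorAssembler(inputCols=feature_cols,
                            outputCol="features").transform(df)

# Keep the 4 most informative features, mirroring the paper's setting.
selector = ChiSqSelector(numTopFeatures=4, featuresCol="features",
                         outputCol="selected", labelCol="label")
reduced = selector.fit(assembled).transform(assembled)
reduced.select("selected", "label").show(5)
```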

2019
Vol 6 (1)
Author(s):
Saad Ahmed Dheyab
Mohammed Najm Abdullah
Buthainah Fahran Abed

The analysis and processing of big data is one of the most important challenges researchers face in the search for approaches that combine high performance, low cost, and high accuracy. In this paper, a novel approach to big data processing and management is proposed that differs from existing ones: the proposed method employs not only main memory to read and handle big data but also memory-mapped space that extends main memory onto storage. From a methodological viewpoint, the novelty of this paper is the segmentation stage, which partitions big data using memory mapping and broadcasts all segments to a number of processors through a parallel message-passing interface. From an application viewpoint, the paper presents a high-performance approach based on a homogeneous network that encrypts and decrypts big data in parallel using the AES algorithm. The approach is implemented on the Windows operating system using .NET libraries.
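
The paper's implementation is in .NET; as a hedged sketch of the same idea (memory-mapped segmentation plus parallel AES), the following Python fragment uses mmap, multiprocessing, and the pycryptodome library. The segment size, key handling, and file name are illustrative assumptions:

```python
# Sketch of memory-mapped segmentation with parallel AES encryption,
# not the authors' .NET implementation. Requires pycryptodome.
import mmap
import os
from multiprocessing import Pool
from Crypto.Cipher import AES

SEGMENT = 16 * 1024 * 1024  # 16 MiB per segment (illustrative choice)

def encrypt_segment(args):
    path, key, offset, length, index = args
    # Memory-map the file read-only and slice out this worker's segment.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            data = mm[offset:offset + length]
    # CTR mode needs no padding; a distinct nonce per segment keeps
    # counter streams from overlapping.
    cipher = AES.new(key, AES.MODE_CTR, nonce=index.to_bytes(8, "big"))
    return index, cipher.encrypt(data)

def encrypt_file(path, key):
    size = os.path.getsize(path)
    tasks = [(path, key, off, min(SEGMENT, size - off), i)
             for i, off in enumerate(range(0, size, SEGMENT))]
    with Pool() as pool:                     # one worker per CPU core
        results = pool.map(encrypt_segment, tasks)
    return [ct for _, ct in sorted(results)]

if __name__ == "__main__":
    key = os.urandom(32)  # AES-256 key; real systems need key management
    segments = encrypt_file("bigdata.bin", key)  # hypothetical input file
```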


Sensors
2020
Vol 20 (7)
pp. 2039
Author(s):
Hwajeong Seo
Hyeokdong Kwon
Yongbeen Kwon
Kyungho Kim
Seungju Choi
...  

In this paper, we optimize Number Theoretic Transform (NTT) and random sampling operations on low-end 8-bit AVR microcontrollers. We focus on optimized modular multiplication with a secure countermeasure (i.e., constant timing), which ensures high performance and prevents timing attacks and simple power analysis. In particular, we present a combined Look-Up Table (LUT)-based fast reduction technique that operates in a regular fashion. This novel approach requires only two LUT accesses to perform the whole modular reduction routine. The implementation is carefully written in assembly language, which reduces the number of memory accesses and function calls. With the LUT-based optimization, the proposed NTT implementations outperform the previous best results by 9.0% and 14.6% for the 128-bit and 256-bit security levels, respectively. Furthermore, we adopt the most optimized AES software implementation to accelerate the pseudorandom number generation used for random sampling. AES-256 in counter (CTR) mode, used as the random number generator, requires only 3,184 clock cycles per 128-bit input, which is 9.5% faster than the previous state-of-the-art result. Finally, the proposed methods are applied to the whole Ring-LWE key scheduling and encryption operations, which require only 524,211 and 659,603 clock cycles, respectively, at the 128-bit security level. At the 256-bit security level, key generation requires 1,325,171 and 1,775,475 clock cycles for the hardware- and software-based AES implementations, respectively, while encryption requires 1,430,601 and 2,042,474 clock cycles.
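
To illustrate the two-lookup idea functionally (the paper's version is hand-written AVR assembly, and its exact table layout is not reproduced here), a reduction for the common Ring-LWE modulus q = 7681 can be sketched as follows; the table split at 16 and 8 bits is an assumption for readability:

```python
# Functional sketch of a two-lookup modular reduction, not the
# authors' AVR assembly. Uses q = 7681, a common Ring-LWE modulus.
Q = 7681

# Precomputed tables: residues of the high parts shifted back in.
LUT16 = [(h << 16) % Q for h in range(1 << 10)]  # products a*b < 2^26
LUT8  = [(h << 8)  % Q for h in range(1 << 9)]   # intermediate < 2^17

def mod_reduce(x):
    """Reduce x = a*b (a, b < Q) modulo Q with two table lookups."""
    r = LUT16[x >> 16] + (x & 0xFFFF)  # first lookup: r < Q + 2^16
    r = LUT8[r >> 8] + (r & 0xFF)      # second lookup: r < 2*Q
    return r - Q * (r >= Q)            # constant-time final correction

# Quick self-check against Python's built-in modulo.
for a, b in [(1234, 4567), (7680, 7680), (0, 5)]:
    assert mod_reduce(a * b) == (a * b) % Q
```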


Author(s):  
Shangzhu Jin
Jun Peng

Big data and its applications have become prominent topics. To deal with uncertainty in data sets, fuzzy system-based models have been explored and stand out in many applications. However, in classical fuzzy inference, when a given observation does not overlap with any rule antecedent, no rule can be invoked; rules invoked with missing values, which can also occur in big data environments, likewise yield no consequence. Fortunately, fuzzy rule interpolation techniques can support inference in such cases, and combining traditional fuzzy reasoning with fuzzy interpolation may improve the accuracy of the inferred conclusions. Therefore, this chapter reports an initial investigation into a framework that combines MapReduce with dynamic fuzzy inference/interpolation for big data applications (BigData-DFRI). The results of an experimental investigation of the method are presented, demonstrating the potential and efficacy of the proposed approach.
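
As a minimal sketch of the interpolation step (not the BigData-DFRI implementation), a KH-style linear interpolation between the two rules nearest to an observation, with triangular fuzzy sets encoded as (left, peak, right) triples, might look like this:

```python
# Minimal sketch of linear (KH-style) fuzzy rule interpolation for
# triangular fuzzy sets; illustrative only, not the chapter's code.
# A triangle is a (left, peak, right) tuple on the variable's domain.

def interpolate(a1, b1, a2, b2, obs):
    """Given rules A1 -> B1 and A2 -> B2 and an observation lying
    between A1 and A2, interpolate the consequent triangle B*."""
    # Relative position of the observation between the two antecedents,
    # measured at the peaks (representative points).
    lam = (obs[1] - a1[1]) / (a2[1] - a1[1])
    # Interpolate each defining point of the consequent linearly.
    return tuple((1 - lam) * p1 + lam * p2 for p1, p2 in zip(b1, b2))

# Example: rules "low -> slow" and "high -> fast" with a gap between
# them that no classical rule covers.
A1, B1 = (0, 1, 2), (0, 10, 20)
A2, B2 = (6, 7, 8), (60, 70, 80)
print(interpolate(A1, B1, A2, B2, obs=(3, 4, 5)))  # -> (30.0, 40.0, 50.0)
```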


Author(s):  
Mohammed R. Elkobaisi
Fadi Al Machot

The use of IoT-based Emotion Recognition (ER) systems is in increasing demand in many domains, such as active and assisted living (AAL), health care, and industry. Combining emotion and context in a unified system could widen the scope of human support, but this is currently challenging due to the lack of a common interface capable of providing such a combination. We therefore aim to provide a novel approach based on a modeling language that can be used even by caregivers or non-experts to model human emotion with respect to context for human support services. The proposed approach is based on a Domain-Specific Modeling Language (DSML), which helps integrate different IoT data sources in an AAL environment. Consequently, it provides a conceptual support level related to the current emotional state of the observed subject. For the evaluation, we apply the well-validated System Usability Scale (SUS) and show that the proposed modeling language achieves high usability and learnability. Furthermore, we evaluate the runtime performance of model instantiation by measuring execution time on well-known IoT services.
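
For reference, the standard SUS computation behind such usability figures is straightforward; a sketch (not code from the paper) is:

```python
# Standard System Usability Scale (SUS) scoring; illustrative sketch,
# not code from the paper. Responses are ten Likert ratings from
# 1 (strongly disagree) to 5 (strongly agree).

def sus_score(responses):
    """Compute the 0-100 SUS score from ten 1-5 Likert responses."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded, even-numbered
        # items negatively worded, hence the two normalizations.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

print(sus_score([5, 1, 5, 2, 4, 1, 5, 1, 4, 2]))  # -> 90.0
```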


Author(s):  
Denys Rozumnyi
Jan Kotera
Filip Šroubek
Jiří Matas

Objects moving at high speed along complex trajectories often appear in videos, especially videos of sports. Such objects travel a considerable distance during the exposure time of a single frame, and therefore their position in the frame is not well defined. They appear as semi-transparent streaks due to motion blur and cannot be reliably tracked by general trackers. We propose a novel approach called Tracking by Deblatting, based on the observation that motion blur is directly related to the intra-frame trajectory of an object. Blur is estimated by solving two intertwined inverse problems, blind deblurring and image matting, which we call deblatting. By postprocessing, non-causal Tracking by Deblatting estimates continuous, complete, and accurate object trajectories for the whole sequence; tracked objects are precisely localized with higher temporal resolution than by conventional trackers. Energy minimization by dynamic programming is used to detect abrupt changes of motion, called bounces. High-order polynomials are then fitted to the smooth trajectory segments between bounces. The output is a continuous trajectory function that assigns a location to every real-valued time stamp from zero to the number of frames. The proposed algorithm was evaluated on a newly created dataset of videos from a high-speed camera, using a novel Trajectory-IoU metric that generalizes the traditional Intersection over Union and measures the accuracy of the intra-frame trajectory. The proposed method outperforms the baselines in both recall and trajectory accuracy. Additionally, we show that precise physical quantities, such as object radius, gravity, and sub-frame velocity, can be computed from the trajectory function. Velocity estimates are compared with high-speed camera and radar measurements, and the results show high performance of the proposed method in terms of Trajectory-IoU, recall, and velocity estimation.
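
As an illustrative sketch of the trajectory-fitting step (not the authors' code), per-axis polynomials can be fitted to a segment between bounces and differentiated to obtain sub-frame velocity; the detections below are hypothetical:

```python
# Sketch of piecewise polynomial trajectory fitting and sub-frame
# velocity estimation; illustrative only, not the paper's pipeline.
import numpy as np

def fit_segment(t, x, y, degree=3):
    """Fit per-axis polynomials to one smooth segment between bounces."""
    return np.polyfit(t, x, degree), np.polyfit(t, y, degree)

def speed(cx, cy, t):
    """Speed at a real-valued time stamp from the fitted derivatives."""
    vx = np.polyval(np.polyder(cx), t)
    vy = np.polyval(np.polyder(cy), t)
    return np.hypot(vx, vy)

# Hypothetical detections (frame index, x, y) within one segment.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x = np.array([10.0, 30.0, 48.0, 64.0, 78.0])
y = np.array([5.0, 9.0, 17.0, 29.0, 45.0])
cx, cy = fit_segment(t, x, y)
print(speed(cx, cy, 2.5))  # velocity at a sub-frame time stamp
```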

