MapReduce Framework
Recently Published Documents

TOTAL DOCUMENTS: 379 (five years: 84)
H-INDEX: 23 (five years: 3)

2021
Author(s): SUDHANDRADEVI P

Abstract With the evolution of technology and the influence of smart gadgets, which produce data of very complex structure, the volume of data held by organizations, e-commerce, and ERP systems is exploding. Processed well, this data becomes an engine for every individual. Projections for 2025 suggest that social media, IoT, streaming data, and geodata will account for 80% of unstructured data, and that there will be 4.8 billion tech enthusiasts. Popular social media platforms make user data publicly accessible, and hackers operate skillfully on both the open web and the dark web; the growing complexity and digitization of this public access exposes loopholes in legislation. The major goal of this study is to extract information about cyber vulnerabilities from electronic news. The first step comprised data collection, text standardization, and feature extraction. In the second step, MapReduce was used to obtain demographic insights through a multi-layered categorization strategy. Cybercrime reports were then classified with a classifier technique, and the model achieved 53 percent accuracy. Phishing, a product of these cyber weaknesses, was found to be most prevalent in metropolitan cities, and men rather than women make up the majority of crime victims. The study concludes that individuals should be made aware of secure access to websites and media, as well as of cyber vulnerabilities and the cyber laws enacted under the IPC, the IT Act 2000, and CERT-In.
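
The study's code is not published, but the MapReduce aggregation step it describes can be illustrated with a minimal pure-Python sketch: mapping pre-processed news records to (city, crime category) pairs and reducing by key to obtain the demographic counts. The record schema and category names below are hypothetical.

```python
from collections import defaultdict

# Minimal sketch of the described MapReduce aggregation step.
# Record fields and category names are hypothetical; the paper
# does not publish its schema.

def mapper(record):
    """Emit a ((city, crime_category), 1) pair for one pre-processed news record."""
    yield (record["city"], record["category"]), 1

def reducer(key, counts):
    """Sum the occurrence counts for one (city, category) key."""
    return key, sum(counts)

def run(records):
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: aggregate each group.
    return dict(reducer(k, v) for k, v in groups.items())

if __name__ == "__main__":
    news = [
        {"city": "Chennai", "category": "phishing"},
        {"city": "Chennai", "category": "phishing"},
        {"city": "Mumbai", "category": "identity_theft"},
    ]
    print(run(news))  # {('Chennai', 'phishing'): 2, ('Mumbai', 'identity_theft'): 1}
```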


2021
Vol 22 (4), pp. 401-412
Author(s): Hrachya Astsatryan, Arthur Lalayan, Aram Kocharyan, Daniel Hagimont

The MapReduce framework manages Big Data sets by splitting large datasets into distributed blocks and processing them in parallel. Data compression and in-memory file systems are widely used methods in Big Data processing to reduce resource-intensive I/O operations and correspondingly improve the I/O rate. The article presents a modular, configurable, and robust decision-making service that relies on data compression and in-memory data storage indicators to improve performance. The service consists of Prediction and Recommendation modules: it predicts the execution time of a given job from collected metrics and recommends the configuration parameters that best improve the performance of the Hadoop and Spark frameworks. Several CPU- and data-intensive applications and micro-benchmarks, including Log Analyzer, WordCount, and K-Means, have been evaluated to demonstrate the performance improvement.
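
The abstract does not reproduce the prediction model or the parameter space, but the Recommendation module's decision loop can be sketched as a search over candidate configurations scored by a runtime predictor. In the illustrative Python below, predict_runtime is a hypothetical stand-in for the trained model, and the codec and storage-tier lists are assumptions.

```python
from itertools import product

# Hypothetical sketch of a Recommendation-module search loop: enumerate
# candidate (codec, storage) configurations and keep the one with the
# lowest predicted runtime. The cost model here is a toy placeholder.

CODECS = ["none", "snappy", "lz4", "gzip"]   # assumed candidate codecs
STORAGE = ["hdfs", "in-memory"]              # assumed storage tiers

def predict_runtime(job_metrics, codec, storage):
    """Placeholder cost model: trade the CPU cost of compression against I/O savings."""
    io_cost = job_metrics["input_gb"] * (0.4 if storage == "in-memory" else 1.0)
    cpu_cost = {"none": 0.0, "snappy": 0.05, "lz4": 0.04, "gzip": 0.20}[codec]
    io_saving = {"none": 1.0, "snappy": 0.6, "lz4": 0.65, "gzip": 0.45}[codec]
    return io_cost * io_saving + job_metrics["input_gb"] * cpu_cost

def recommend(job_metrics):
    """Return the (codec, storage) pair with the lowest predicted runtime."""
    return min(product(CODECS, STORAGE),
               key=lambda cfg: predict_runtime(job_metrics, *cfg))

print(recommend({"input_gb": 100}))  # ('snappy', 'in-memory') under this toy model
```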


2021
Vol 2 (2), pp. 53-60
Author(s): Ajibade Lukuman Saheed, Abu Bakar Kamalrulnizam, Ahmed Aliyu, Tasneem Darwish

Processing huge and complex data to obtain useful information is challenging, even though several big data processing frameworks have been proposed and further enhanced. One of the most prominent is MapReduce, whose core concept relies on distributed and parallel processing. However, the MapReduce framework suffers serious performance degradation from the slow execution of certain tasks, called stragglers. Failing to handle stragglers delays and lengthens the overall job execution time, and several straggler mitigation techniques have therefore been proposed to improve MapReduce performance. This study provides a comprehensive, qualitative review of the existing straggler mitigation solutions and presents a taxonomy of them. Critical research issues and future research directions are identified and discussed to guide researchers and scholars.
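
As background for the surveyed techniques, here is a minimal sketch of the classic straggler-detection heuristic behind speculative execution, the baseline most mitigation schemes refine: a task is flagged when its progress rate falls well below the mean rate of its peers. The threshold and task representation are illustrative, not taken from any particular surveyed solution.

```python
import statistics

# Toy straggler detection: compare each task's progress rate to the
# mean rate across tasks; the scheduler would launch speculative
# duplicates of the flagged tasks. The 0.5 threshold is illustrative.

def straggler_candidates(tasks, slowdown=0.5):
    """tasks: list of dicts with 'id', 'progress' (0..1), and 'elapsed_s'."""
    rates = {t["id"]: t["progress"] / max(t["elapsed_s"], 1e-9) for t in tasks}
    mean_rate = statistics.mean(rates.values())
    return [tid for tid, r in rates.items() if r < slowdown * mean_rate]

tasks = [
    {"id": "m1", "progress": 0.9, "elapsed_s": 60},
    {"id": "m2", "progress": 0.8, "elapsed_s": 60},
    {"id": "m3", "progress": 0.2, "elapsed_s": 60},  # lagging task
]
print(straggler_candidates(tasks))  # ['m3']
```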


2021

Abstract The full text of this preprint has been withdrawn by the authors as it was submitted and made public without the full consent of all the authors. Therefore, the authors do not wish this work to be cited as a reference. Questions should be directed to the corresponding author.


2021
Author(s): Hemn Barzan Abdalla

Abstract The increasing demand for information and the rapid growth of big data have dramatically increased the volume of textual data, and the variety of data has led to information overload. To obtain useful information from text, classification is an imperative task. This paper develops a technique for text classification in big data using the MapReduce model, with the goal of designing a hybrid optimization algorithm for classifying text. Pre-processing is done with stemming and stop-word removal, after which imperative features are extracted: SentiWordNet features, contextual features, and thematic features. Optimal features are then selected using Tanimoto similarity, which estimates the similarity between features and retains the relevant ones with higher feature-selection accuracy. After that, a deep residual network, trained with the Adam algorithm, is utilized for dynamic text classification. Dynamic learning is performed with the proposed Rider Invasive Weed Optimization (RIWO)-based deep residual network together with fuzzy theory, where the RIWO algorithm combines Invasive Weed Optimization (IWO) and the Rider Optimization Algorithm (ROA). The whole method is solved under the MapReduce framework. The proposed RIWO-based deep residual network outperformed other techniques with the highest true positive rate (TPR) of 85%, true negative rate (TNR) of 94%, and accuracy of 88.7%.
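
The Tanimoto-based selection step can be illustrated with a short sketch: compute the Tanimoto coefficient between feature vectors and keep only features that are not nearly redundant with ones already selected. The vector encoding and the 0.9 redundancy threshold are assumptions, as the abstract does not specify them.

```python
# Sketch of Tanimoto-based feature selection: a feature is kept only
# if it is not too similar to a feature already selected. Threshold
# and encoding are assumptions, not taken from the paper.

def tanimoto(a, b):
    """Tanimoto coefficient for real-valued feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)

def select_features(features, threshold=0.9):
    """features: dict mapping feature name -> vector of values over documents."""
    selected = {}
    for name, vec in features.items():
        # Discard a feature that is nearly redundant with a kept one.
        if all(tanimoto(vec, kept) < threshold for kept in selected.values()):
            selected[name] = vec
    return list(selected)

features = {
    "senti_pos":  [0.9, 0.1, 0.8],
    "senti_pos2": [0.9, 0.1, 0.8],   # duplicate, should be dropped
    "thematic":   [0.1, 0.9, 0.2],
}
print(select_features(features))  # ['senti_pos', 'thematic']
```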


2021
Vol 2021, pp. 1-12
Author(s): Jerry Chun-Wei Lin, Youcef Djenouri, Gautam Srivastava, Philippe Fournier-Viger

In recent years, high-utility itemset mining (HUIM) has been investigated extensively and applied in many domains, especially basket-market analysis. Since current basket-market scenarios also involve IoT equipment such as sensors and smart devices to collect information, it is necessary to mine high-utility itemsets (HUIs) in large-scale databases, particularly in IoT settings. This work presents a GA-based MapReduce model, GMR-Miner, for mining closed patterns with high utilization in large-scale databases. A k-means model first groups transactions by their correlation, based on the frequency factor. A genetic algorithm (GA) is then utilized within the developed MapReduce framework to explore potential candidates in limited time. Moreover, the developed 3-tier MapReduce model can easily be deployed in Spark to handle large-scale databases for the knowledge discovery of closed patterns with high utilization. We created extensive experimental environments to evaluate GMR-Miner against the well-known, state-of-the-art CLS-Miner, and our in-depth results show that GMR-Miner outperforms CLS-Miner on several criteria, including memory usage, scalability, and runtime.
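
To make the GA layer concrete, here is a toy sketch in the spirit of GMR-Miner's candidate exploration: chromosomes are item bit-vectors and fitness is the itemset's total utility over a transaction database. The utility model (quantity times unit profit) is the standard HUIM formulation; the operators and parameters below are illustrative, not GMR-Miner's actual encoding.

```python
import random

# Toy GA over itemset candidates. Chromosomes are bit-vectors over
# ITEMS; fitness is the itemset's utility summed over the transactions
# that contain all of its items. Parameters are illustrative.

ITEMS = ["a", "b", "c", "d"]
# transaction: {item: (quantity, unit_profit)} -- standard HUIM utility model
DB = [
    {"a": (2, 5), "b": (1, 3)},
    {"a": (1, 5), "c": (3, 1)},
    {"b": (2, 3), "c": (1, 1), "d": (1, 8)},
]

def utility(chromosome):
    """Sum the itemset's utility over transactions containing all its items."""
    itemset = {item for item, bit in zip(ITEMS, chromosome) if bit}
    total = 0
    for tx in DB:
        if itemset and itemset <= tx.keys():
            total += sum(q * p for q, p in (tx[i] for i in itemset))
    return total

def evolve(pop, mutation=0.1):
    """One generation: utility-ranked selection, one-point crossover, bit-flip mutation."""
    pop = sorted(pop, key=utility, reverse=True)
    parents = pop[: len(pop) // 2]
    children = []
    while len(children) < len(pop):
        p1, p2 = random.sample(parents, 2)
        cut = random.randrange(1, len(ITEMS))
        child = [b ^ (random.random() < mutation) for b in p1[:cut] + p2[cut:]]
        children.append(child)
    return children

population = [[random.randint(0, 1) for _ in ITEMS] for _ in range(8)]
for _ in range(20):
    population = evolve(population)
best = max(population, key=utility)
print(best, utility(best))
```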

