Novel Framework for Quality Retention on Educational Big Data

2020 ◽  
Vol 17 (9) ◽  
pp. 4248-4254
Author(s):  
Ganeshayya Shidaganti ◽  
Prakash Sheelvanthmath

Adopting big data concepts to manage the growing volume of data from the educational sector offers various benefits for the knowledge sharing process. Existing research approaches to managing educational data with high-end analytics are either missing or too narrow to connect big data implementation with practical problems. Therefore, this paper presents a distributed framework capable of transforming raw educational data arriving from distributed sources and then applying a novel mining approach to ensure that the final data are of the highest quality. Here, quality means that the dynamic educational big data are highly structured and free from any form of artifact. Result analysis shows that the proposed system offers faster transformation time and better data purity on synthetic educational big data.
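
A loosely hedged sketch of the transform-then-clean pattern the paper describes; all field names and quality rules here are hypothetical rather than taken from the proposed system:

```python
# Hypothetical sketch only: the paper's actual transformation and mining
# stages are not specified at this level of detail.
def transform_record(raw):
    """Map a raw, loosely structured record onto a fixed schema."""
    score = raw.get("score")
    return {
        "student_id": str(raw.get("id", "")).strip(),
        "course": str(raw.get("course", "")).strip().upper(),
        "score": float(score) if score not in (None, "") else None,
    }

def is_clean(rec):
    """Reject artifact records: missing identifiers or out-of-range scores."""
    return bool(rec["student_id"]) and rec["score"] is not None \
        and 0.0 <= rec["score"] <= 100.0

def quality_pipeline(raw_records):
    """Transform, then keep only records that pass the quality checks."""
    return [rec for rec in map(transform_record, raw_records) if is_clean(rec)]
```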

Author(s):  
Aakriti Shukla ◽  
Dr Damodar Prasad Tiwari

Dimension reduction, or feature selection, is considered the backbone of big data applications for improving performance. Many scholars have shifted their attention in recent years to data science and analysis for real-time applications using big data integration. Manual interaction with big data is prohibitively slow; as a result, when handling high workloads in a distributed system, feature selection must be made elastic and scalable. In this study, a survey of alternative optimization techniques for feature selection is presented, along with an analysis of their results and limitations. This study contributes to the development of a method for improving the efficiency of feature selection in large, complex data sets.
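
As a concrete reference point for the class of techniques the survey compares, an illustrative scikit-learn baseline (not a method from the study):

```python
# Illustrative baseline, not a method proposed by the study: keep the k
# features sharing the most mutual information with the class label.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=500, n_features=40, n_informative=8,
                           random_state=0)
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)  # (500, 10)
```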


Author(s):  
Nancy Victor ◽  
Daphne Lopez

Data privacy plays a noteworthy part in today's digital world, where information is gathered at exceptional rates from different sources. Privacy preserving data publishing (PPDP) refers to the process of publishing personal data without compromising the privacy of individuals in any manner. A variety of approaches have been devised to protect consumer privacy by applying traditional anonymization mechanisms. But these mechanisms are not well suited for Big Data, as the data generated nowadays is not just structured in nature. Data generated at very high velocities from various sources includes unstructured and semi-structured information, which is very difficult to process using traditional mechanisms. This chapter focuses on the various challenges with Big Data, privacy preserving data mining (PPDM) and PPDP techniques for Big Data, and how well these techniques can be scaled to process both historical and real-time data together using the Lambda architecture. A distributed framework for privacy preservation in Big Data that incorporates natural language processing techniques is also proposed in this chapter.
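
The chapter's framework is not reproduced here; a minimal, hypothetical illustration of applying text-processing rules to mask direct identifiers in unstructured records before publication:

```python
# Minimal, hypothetical illustration: mask direct identifiers in free text.
# A real PPDP pipeline would add formal anonymization guarantees (e.g.,
# k-anonymity) over the structured fields as well.
import re

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact(text):
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text

print(redact("Contact jane.doe@example.com or 555-123-4567."))
# -> Contact [EMAIL] or [PHONE].
```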


2020 ◽  
Vol 11 (10) ◽  
pp. 32-51

Virtual Community (VC) is regarded as the best platform for professionals in various fields to share their expertise and knowledge. Since the escalation of Web 2.0 and the internet within the last decade, the booming interest in big data, and the expansion of Industry 4.0, VC has been deemed an ideal proxy for practitioners to share and gain instant knowledge that can be implemented within business activities and day-to-day applications. Despite this emerging interest, there has been no comprehensive study of the overall antecedents of knowledge sharing (KS) in VC. Applying a systematic review, a total of 68 relevant articles that discuss KS via VC are evaluated. Several central themes of the theories applied in this field are discussed in terms of their importance and relevance. Important antecedents are also reviewed for their practicality and implementation in understanding the role of KS in VC. The implications of this review would benefit stakeholders in maintaining the sustainability of VC as the platform for a knowledge-based society.


Algorithms ◽  
2019 ◽  
Vol 12 (8) ◽  
pp. 166
Author(s):  
Md. Anisuzzaman Siddique ◽  
Hao Tian ◽  
Mahboob Qaosar ◽  
Yasuhiko Morimoto

The skyline query and its variant queries are useful functions in the early stages of a knowledge-discovery process. The skyline query and its variants select a set of important objects that are better than the other, common objects in the dataset. In order to handle big data, such knowledge-discovery queries must be computed in parallel distributed environments. In this paper, we consider an efficient parallel algorithm for the “K-skyband query” and the “top-k dominating query”, which are popular variants of the skyline query. We propose a method for computing both queries simultaneously in a parallel distributed framework called MapReduce, a popular framework for processing “big data” problems. Our extensive evaluation results validate the effectiveness and efficiency of the proposed algorithm on both real and synthetic datasets.
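
For intuition about the query semantics, a naive single-node sketch (the paper's contribution, the parallel MapReduce computation, is not reproduced here):

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (smaller values are assumed better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def k_skyband(points, k):
    """Objects dominated by fewer than k others; k = 1 is the classic skyline."""
    return [p for p in points
            if sum(dominates(q, p) for q in points if q is not p) < k]

def top_k_dominating(points, k):
    """The k objects that dominate the greatest number of other objects."""
    return sorted(points,
                  key=lambda p: sum(dominates(p, q) for q in points if q is not p),
                  reverse=True)[:k]

pts = [(1, 4), (2, 2), (4, 1), (3, 3)]
print(k_skyband(pts, 2))         # every point here is dominated by < 2 others
print(top_k_dominating(pts, 1))  # [(2, 2)]
```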


2020 ◽  
Vol 34 (5) ◽  
pp. 845-858
Author(s):  
Johannes C. Eichstaedt ◽  
Aaron C. Weidman

Personality psychologists are increasingly documenting dynamic, within–person processes. Big data methodologies can augment this endeavour by allowing for the collection of naturalistic and personality–relevant digital traces from online environments. Whereas big data methods have primarily been used to catalogue static personality dimensions, here we present a case study in how they can be used to track dynamic fluctuations in psychological states. We apply a text–based, machine learning prediction model to Facebook status updates to compute weekly trajectories of emotional valence and arousal. We train this model on 2895 human–annotated Facebook statuses and apply the resulting model to 303 575 Facebook statuses posted by 640 US Facebook users who had previously self–reported their Big Five traits, yielding an average of 28 weekly estimates per user. We examine the correlations between model–predicted emotion and self–reported personality, providing a test of the robustness of these links when using weekly aggregated data, rather than momentary data as in prior work. We further present dynamic visualizations of weekly valence and arousal for every user, while making the final data set of 17 937 weeks openly available. We discuss the strengths and drawbacks of this method in the context of personality psychology's evolution into a dynamic science. © 2020 European Association of Personality Psychology
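
A hedged sketch of the general text-to-affect pattern (library choices, features, and hyperparameters are assumptions; the authors' actual model is not reproduced here):

```python
# Illustrative text-to-valence regression and weekly aggregation in the
# spirit of the paper; the actual model and features differ.
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

def train_valence_model(statuses, valence_ratings):
    """Fit a bag-of-words regressor on human-annotated statuses."""
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                          Ridge(alpha=1.0))
    model.fit(statuses, valence_ratings)
    return model

def weekly_trajectories(model, posts):
    """posts: iterable of (user_id, date, text); returns the mean predicted
    valence per (user, ISO year-week) bucket."""
    buckets = defaultdict(list)
    for user_id, date, text in posts:
        week = date.isocalendar()[:2]  # (ISO year, ISO week)
        buckets[(user_id, week)].append(model.predict([text])[0])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}
```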


2017 ◽  
Vol 27 (4) ◽  
pp. 737-748
Author(s):  
Abraham Itzhak Weinberg ◽  
Mark Last

When running data-mining algorithms on big data platforms, a parallel, distributed framework such as MapReduce may be used. However, in a parallel framework, each individual model fits the data allocated to its own computing node without necessarily fitting the entire dataset. In order to induce a single consistent model, ensemble algorithms, such as majority voting, aggregate the local models rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple, locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by parallel nodes and chooses the model which is most similar to the others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in 43.75% of the experiments, it is significantly lower; in one case, the results are identical; and in the remaining 35.41% of cases, the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
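
A sketch of the selection step only; the paper's syntactic similarity between trees is specific and is left here as a placeholder callable:

```python
# Sketch of the SySM selection idea: among decision-tree models induced on
# different data partitions, return the one most similar, on average, to all
# the others. `tree_similarity(a, b)` is a placeholder for the paper's
# syntactic similarity measure.
def select_representative(models, tree_similarity):
    def mean_similarity(model):
        others = [m for m in models if m is not model]
        return sum(tree_similarity(model, o) for o in others) / len(others)
    return max(models, key=mean_similarity)
```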


Author(s):  
V. A. Ayma ◽  
R. S. Ferreira ◽  
P. Happ ◽  
D. Oliveira ◽  
R. Feitosa ◽  
...  

For many years, the scientific community has been concerned with how to increase the accuracy of different classification methods, and major achievements have been made so far. Besides this issue, the increasing amount of data generated every day by remote sensors raises further challenges to be overcome. In this work, a tool within the scope of the InterIMAGE Cloud Platform (ICP), an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred to as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA's machine learning library, namely: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The results of an experimental analysis using an SVM classifier on data sets of different sizes and different cluster configurations demonstrate the potential of the tool, as well as the aspects that affect its performance.
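
The ICP package itself is not shown here; a hedged sketch of the underlying pattern, a Hadoop-streaming-style mapper in which each worker labels its share of the records with a pre-trained classifier (file and model names are hypothetical):

```python
# Hypothetical Hadoop-streaming mapper, not the ICP tool's code: each mapper
# loads a pre-trained classifier and labels the CSV records it receives.
import pickle
import sys

with open("model.pkl", "rb") as f:  # hypothetical model file shipped to nodes
    clf = pickle.load(f)            # assumed to expose a predict() method

for line in sys.stdin:
    features = [float(x) for x in line.strip().split(",")]
    label = clf.predict([features])[0]
    print(",".join(map(str, features)), label, sep="\t")
```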

