Frontiers in Big Data
Latest Publications


TOTAL DOCUMENTS

234
(FIVE YEARS 228)

H-INDEX

5
(FIVE YEARS 5)

Published By Frontiers Media SA

2624-909X

2022 ◽  
Vol 4 ◽  
Author(s):  
Teng Guo ◽  
Xiaomei Bai ◽  
Xue Tian ◽  
Selena Firmin ◽  
Feng Xia

Anomalies in education affect the personal careers of students and universities' retention rates. Understanding the laws behind educational anomalies promotes the development of individual students and improves the overall quality of education. However, the inaccessibility of educational data hinders the development of the field. Previous research in this field relied on questionnaires, which are time-consuming and costly and are hardly applicable to large-scale student cohorts. With the popularity of educational management systems and the rise of online education during the COVID-19 pandemic, a large amount of educational data is now available online and offline, providing an unprecedented opportunity to explore educational anomalies from a data-driven perspective. As an emerging field, educational anomaly analytics is rapidly attracting scholars from a variety of fields, including education, psychology, sociology, and computer science. This paper intends to provide a comprehensive review of data-driven analytics of educational anomalies from a methodological standpoint. We focus on the following five types of research that have received the most attention: course failure prediction, dropout prediction, detection of mental health problems, prediction of difficulty in graduation, and prediction of difficulty in employment. Then, we discuss the challenges of current related research. This study aims to provide references for educational policymaking while promoting the development of educational anomaly analytics as a growing field.


2022 ◽  
Vol 4 ◽  
Author(s):  
Michael Rapp ◽  
Moritz Kulessa ◽  
Eneldo Loza Mencía ◽  
Johannes Fürnkranz

Early outbreak detection is a key aspect in the containment of infectious diseases, as it enables the identification and isolation of infected individuals before the disease can spread to a larger population. Instead of detecting unexpected increases of infections by monitoring confirmed cases, syndromic surveillance aims at the detection of cases with early symptoms, which allows a more timely disclosure of outbreaks. However, the definition of these disease patterns is often challenging, as early symptoms are usually shared among many diseases and a particular disease can have several clinical pictures in the early phase of an infection. As a first step toward supporting epidemiologists in the process of defining reliable disease patterns, we present a novel, data-driven approach to discover such patterns in historic data. The key idea is to take into account the correlation between indicators in a health-related data source and the reported number of infections in the respective geographic region. In a preliminary experimental study, we use data from several emergency departments to discover disease patterns for three infectious diseases. Our results show the potential of the proposed approach to find patterns that correlate with the reported infections and to identify indicators that are related to the respective diseases. They also motivate the need for additional measures to overcome practical limitations, such as the requirement to deal with noisy and unbalanced data, and demonstrate the importance of incorporating feedback from domain experts into the learning procedure.
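As a rough illustration of the key idea stated in this abstract (relating indicators in a health data source to reported infection counts), the sketch below ranks indicators by their Pearson correlation with weekly reported infections. It is a minimal sketch under stated assumptions, not the authors' method: the function names, the indicator names, and all numbers are hypothetical, and the paper's actual pattern-discovery procedure is not described in the abstract.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series carries no signal; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_indicators(indicator_counts, reported_infections):
    """Sort indicators from most to least correlated with reported infections."""
    scores = {
        name: pearson(counts, reported_infections)
        for name, counts in indicator_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weekly counts of emergency-department visits per indicator.
weekly_indicator_counts = {
    "fever": [2, 4, 6, 8, 10],
    "cough": [5, 4, 3, 2, 1],
    "fracture": [5, 5, 5, 5, 5],
}
# Hypothetical weekly reported infections for the same region.
weekly_reported_infections = [1, 2, 3, 4, 5]

for name, score in rank_indicators(weekly_indicator_counts, weekly_reported_infections):
    print(f"{name}: {score:+.2f}")
```

A plain correlation ranking like this ignores the noise and class imbalance that the abstract names as practical limitations, so it should be read only as a starting point.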


2022 ◽  
Vol 4 ◽  
Author(s):  
Qasem Abu Al-Haija

With the rapid emergence of smart, self-reliant, and low-power devices, the Internet of Things (IoT) has expanded enormously and now affects almost every real-life application. For example, machines and devices such as cars, unmanned aerial vehicles (UAVs), and medical devices now rely on computer control and expose their own programmable interfaces. As the use of IoT has grown, attack capabilities have grown in response, making it imperative to develop new methods for securing these systems and detecting attacks launched against IoT devices and gateways. These attacks usually aim at accessing, changing, or destroying sensitive information; extorting money from users; or interrupting normal business processes. In this article, we propose a new, efficient, and generic top-down architecture for intrusion detection and classification in IoT networks using non-traditional machine learning. The proposed architecture can be customized and used for intrusion detection/classification with any IoT cyber-attack dataset, such as the CICIDS dataset, the MQTT dataset, and others. Specifically, the proposed system is composed of three subsystems: a feature engineering (FE) subsystem, a feature learning (FL) subsystem, and a detection and classification (DC) subsystem. All subsystems are thoroughly described and analyzed in this article. Accordingly, the proposed architecture employs deep learning models to detect slightly mutated attacks on IoT networks with high detection/classification accuracy, for IoT traffic obtained from either a real-time system or a pre-collected dataset.
Because this work combines systems engineering (SE) techniques, machine learning technology, and the cybersecurity of IoT systems, the collective cooperation of the three fields has yielded a systematically engineered system that can be implemented with high performance.


2022 ◽  
Vol 4 ◽  
Author(s):  
Yijun Tian ◽  
Chuxu Zhang ◽  
Ronald Metoyer ◽  
Nitesh V. Chawla

Recipe recommendation systems play an important role in helping people find recipes that interest them and fit their eating habits. While content-based and collaborative filtering approaches to recipe recommendation are well developed, the relational information among users, recipes, and food items is less explored. In this paper, we incorporate this relational information into recipe recommendation and propose a graph learning approach to exploit it. In particular, we propose HGAT, a novel hierarchical graph attention network for recipe recommendation. The proposed model can capture user history behavior, recipe content, and relational information through several neural network modules, including type-specific transformation, node-level attention, and relation-level attention. We further introduce a ranking-based objective function to optimize the model. Thorough experiments demonstrate that HGAT outperforms numerous baseline methods.
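The abstract mentions a ranking-based objective but does not give its form. The sketch below shows one common choice, a pairwise (BPR-style) ranking loss, purely as an assumption about what such an objective can look like. The embedding vectors, the dot-product scoring, and the function names are hypothetical and are not taken from HGAT.

```python
import math


def dot(u, v):
    """Dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_ranking_loss(user, pos_recipe, neg_recipe):
    """BPR-style loss: -log(sigmoid(score_pos - score_neg)).

    The loss is small when the recipe the user interacted with (pos_recipe)
    scores higher than a recipe the user did not interact with (neg_recipe).
    """
    diff = dot(user, pos_recipe) - dot(user, neg_recipe)
    # -log(sigmoid(diff)) equals softplus(-diff); this form avoids overflow.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Hypothetical 2-dimensional embeddings.
user = [1.0, 0.5]
liked = [0.9, 0.8]   # recipe the user interacted with
other = [0.1, 0.2]   # recipe sampled as a negative example

print(pairwise_ranking_loss(user, liked, other))  # correct ordering: low loss
print(pairwise_ranking_loss(user, other, liked))  # wrong ordering: high loss
```

In a full model the embeddings would come from the network (for instance after the attention modules named in the abstract) and the loss would be averaged over many sampled triples.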


2022 ◽  
Vol 4 ◽  
Author(s):  
Sandipan Sikdar ◽  
Rachneet Sachdeva ◽  
Johannes Wachs ◽  
Florian Lemmerich ◽  
Markus Strohmaier

This work quantifies the effects of signaling gender through gender-specific user names on the success of reviews written on the popular amazon.com shopping platform. Highly rated reviews play an important role in e-commerce since they are prominently displayed next to products. Differences in reviews, perceived consciously or unconsciously with respect to gender signals, can lead to crucial biases in determining what content and perspectives are represented among top reviews. To investigate this, we extract signals of author gender from user names to select reviews where the author’s likely gender can be inferred. Using reviews authored by these gender-signaling authors, we train a deep learning classifier to quantify the gendered writing style (i.e., gendered performance) of reviews written by authors who do not send clear gender signals via their user name. We contrast the effects of gender signaling and performance on the review helpfulness ratings using matching experiments. This is aimed at understanding if an advantage is to be gained by (not) signaling one’s gender when posting reviews. While we find no general trend that gendered signals or performances influence overall review success, we find strong context-specific effects. For example, reviews in product categories such as Electronics or Computers are perceived as less helpful when authors signal that they are likely women, but are perceived as more helpful in categories such as Beauty or Clothing. In addition to these findings, we believe this general chain of tools could be deployed across various social media platforms.


2022 ◽  
Vol 4 ◽  
Author(s):  
Alessandro Di Girolamo ◽  
Federica Legger ◽  
Panos Paparrigopoulos ◽  
Jaroslava Schovancová ◽  
Thomas Beermann ◽  
...  

As a joint effort from various communities involved in the Worldwide LHC Computing Grid, the Operational Intelligence project aims at increasing the level of automation in computing operations and reducing human interventions. The distributed computing systems currently deployed by the LHC experiments have proven to be mature and capable of meeting the experimental goals by allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters, and operational teams is needed to efficiently manage such heterogeneous infrastructures. Under the scope of the Operational Intelligence project, experts from several areas have gathered to propose and work on “smart” solutions. Machine learning, data mining, log analysis, and anomaly detection are only some of the tools we have evaluated for our use cases. In this community study contribution, we report on the development of a suite of operational intelligence services to cover various use cases: workload management, data management, and site operations.


2022 ◽  
Vol 4 ◽  
Author(s):  
Ying-Ying Zhang ◽  
Teng-Zhong Rong ◽  
Man-Man Li

For the normal model with a known mean, the Bayes estimation of the variance parameter under the conjugate prior is studied in Lehmann and Casella (1998) and Mao and Tang (2012). However, they only calculate the Bayes estimator with respect to a conjugate prior under the squared error loss function. Zhang (2017) calculates the Bayes estimator of the variance parameter of the normal model with a known mean with respect to the conjugate prior under Stein’s loss function, which penalizes gross overestimation and gross underestimation equally, and the corresponding Posterior Expected Stein’s Loss (PESL). Motivated by these works, we have calculated the Bayes estimators of the variance parameter with respect to the noninformative (Jeffreys’s, reference, and matching) priors under Stein’s loss function, and the corresponding PESLs. Moreover, we have calculated the Bayes estimators of the scale parameter with respect to the conjugate and noninformative priors under Stein’s loss function, and the corresponding PESLs. The quantities (prior, posterior, three posterior expectations, two Bayes estimators, and two PESLs) and expressions of the variance and scale parameters of the model for the conjugate and noninformative priors are summarized in two tables. Numerical simulations are then carried out to exemplify the theoretical findings. Finally, we calculate the Bayes estimators and the PESLs of the variance and scale parameters of the S&P 500 monthly simple returns for the conjugate and noninformative priors.
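For readers unfamiliar with Stein's loss, the standard derivation of the Bayes estimator and its PESL can be sketched as follows. The notation is an assumption introduced here for illustration ($\theta > 0$ is the parameter, $a > 0$ the action, $x$ the data); the specific posterior expectations for each prior are in the paper's tables and are not reproduced.

```latex
% Stein's loss for a positive parameter \theta and action a > 0
L_S(\theta, a) = \frac{a}{\theta} - 1 - \log\frac{a}{\theta}

% Posterior expected loss given data x
E\left[ L_S(\theta, a) \mid x \right]
  = a \, E\left[ \tfrac{1}{\theta} \mid x \right] - 1 - \log a
    + E\left[ \log\theta \mid x \right]

% Setting the derivative with respect to a to zero gives the Bayes estimator
\delta_S(x) = \frac{1}{E\left[ \tfrac{1}{\theta} \mid x \right]}

% Substituting \delta_S(x) back gives the Posterior Expected Stein's Loss
\mathrm{PESL}_S(x)
  = \log E\left[ \tfrac{1}{\theta} \mid x \right]
    + E\left[ \log\theta \mid x \right]

% For comparison, the Bayes estimator under squared error loss
\delta_2(x) = E\left[ \theta \mid x \right]
```

Because $L_S(\theta, a)$ tends to infinity both as $a \to 0$ and as $a \to \infty$, it penalizes gross underestimation and gross overestimation alike, whereas squared error loss stays bounded as $a \to 0$.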


2021 ◽  
Vol 4 ◽  
Author(s):  
Cornelia Herbert ◽  
Verena Marschin ◽  
Benjamin Erb ◽  
Dominik Meißner ◽  
Maria Aufheimer ◽  
...  

Digital interactions via the internet have become the norm rather than the exception in our global society. Concerns have been raised about human-centered privacy and the often unreflected self-disclosure behavior of internet users. This study on human-centered privacy follows two major aims: first, to investigate the willingness of university students (as digital natives) to disclose private data and information about themselves, their social and academic life, their mental health, and their health behavior habits when taking part as volunteers in a scientific online survey; second, to examine to what extent the participants’ self-disclosure behavior can be modulated by experimental induction of privacy awareness (PA), trust in privacy (TIP), or a combination of both (PA and TIP). In addition, the role of human factors such as personality traits, gender, or mental health (e.g., self-reported depressive symptoms) in self-disclosure behavior was explored. Participants were randomly assigned to four experimental groups. In group A (n = 50, 7 males), privacy awareness (PA) was induced implicitly by the inclusion of privacy concern items. In group B (n = 43, 6 males), trust in privacy (TIP) was experimentally induced by buzzwords and by visual TIP primes promising safe data storage. Group C (n = 79, 12 males) received both PA and TIP induction, while group D (n = 55, 9 males) served as the control group. Participants could answer the survey items by choosing one of a number of possible answers, including the options to refrain from self-disclosure by choosing “don’t know” or “no answer.” Self-disclosure among participants was high irrespective of experimental group and irrespective of the psychological domain of the information provided.
The results of this study suggest that the willingness of volunteers to self-disclose private data in a scientific online study cannot simply be overruled or changed by any of the chosen experimental privacy manipulations. The present results extend the previous literature on human-centered privacy and, despite limitations, can give important insights into the self-disclosure behavior of young people and the privacy paradox.

