Performance Evaluation of Policy-Based SQL Query Classification for Data-Privacy Compliance

Author(s):  
Peter K. Schwab ◽  
Jonas Röckl ◽  
Maximilian S. Langohr ◽  
Klaus Meyer-Wegener

Abstract Data science must respect privacy in many situations. We have built a query repository with automatic SQL query classification according to data-privacy directives. It can intercept queries that violate the directives: a JDBC proxy driver inserted between the end-users’ SQL tooling and the target data consults the repository about the compliance of each query. However, this slows down query processing. This paper presents two optimizations implemented to increase classification performance and describes a measurement environment that allows quantifying the induced performance overhead. We present measurement results and show that our optimized implementation significantly reduces classification latency. The query metadata (QM) is stored in both relational and graph-based databases. Whereas query classification can be done in a few milliseconds on average using relational QM, graph-based classification is orders of magnitude more expensive at 137 ms on average. However, the graphs contain more precise information, so in some cases the final decision requires checking them, too. Our optimizations considerably reduce the number of graph-based classifications and thus decrease the latency to 0.35 ms in 87% of the classification cases.
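The two-stage design described above can be illustrated with a minimal sketch: a fast lookup over relational-style query metadata decides most cases, and only ambiguous queries fall through to the expensive graph-based check. All names and the toy policy store below are illustrative assumptions, not the authors' actual repository API.

```python
# Minimal sketch of a two-stage, policy-based query classifier:
# the cheap relational-QM lookup handles most queries, and only
# queries it cannot decide trigger the costly graph-based check.

def classify_query(sql, relational_qm, graph_check):
    """Return 'allow' or 'deny' for a query, preferring the fast path."""
    verdict = relational_qm.get(sql.strip().lower())
    if verdict in ("allow", "deny"):
        return verdict              # fast path: decided from relational QM
    return graph_check(sql)         # slow path: precise graph-based check

# Toy policy store: cached verdicts per query; None = ambiguous,
# i.e. the relational metadata alone cannot decide compliance.
relational_qm = {
    "select name from patients": "deny",
    "select count(*) from visits": "allow",
    "select diagnosis from visits": None,  # needs graph-level inspection
}

def graph_check(sql):
    # Stand-in for the graph traversal; here we simply deny any query
    # touching a column we treat as privacy-sensitive.
    return "deny" if "diagnosis" in sql.lower() else "allow"
```

In this sketch the latency asymmetry from the abstract maps directly onto the two branches: the dictionary lookup is the sub-millisecond path, and `graph_check` stands in for the 137 ms graph traversal that the optimizations try to avoid.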

2021 ◽  
Vol 3 (6) ◽  
Author(s):  
César de Oliveira Ferreira Silva ◽  
Mariana Matulovic ◽  
Rodrigo Lilla Manzione

Abstract Groundwater governance uses modeling to support decision making. Therefore, data science techniques are essential. Specific difficulties arise because variables must be used that cannot be directly measured, such as aquifer recharge and groundwater flow. However, such techniques involve dealing with (often not very explicitly stated) ethical questions. To support groundwater governance, these ethical questions cannot be solved in a straightforward way. In this study, we propose an approach called the “open-minded roadmap” to guide data analytics and modeling for groundwater governance decision making. To frame the ethical questions, we use the concept of geoethical thinking, a method to combine geoscience expertise with the societal responsibility of the geoscientist. We present a case study: a groundwater-monitoring modeling experiment using data analytics methods in southeast Brazil. A model based on fuzzy logic (with high expert intervention) and three data-driven models (with low expert intervention) are tested and evaluated for aquifer recharge in watersheds. The roadmap approach consists of three issues: (a) data acquisition, (b) modeling and (c) the open-minded (geo)ethical attitude. The level of expert intervention in the modeling stage and model validation are discussed. A search for gaps in the model use is made, anticipating issues through the development of application scenarios, to reach a final decision. When the model is validated in one watershed and then extrapolated to neighboring watersheds, we find large asymmetries in the recharge estimates. Hence, we can show that more information (data, expertise etc.) is needed to improve the models’ predictive skill. In the resulting iterative approach, new questions will arise (as new information becomes available), and therefore, steady recourse to the open-minded roadmap is recommended.


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1714
Author(s):  
Mohamed Marey ◽  
Hala Mostafa

In this work, we propose a general framework to design a signal classification algorithm over time-selective channels for wireless communications applications. We derive an upper bound on the maximum number of observation samples over which the channel response is essentially invariant. The proposed framework relies on dividing the received signal into blocks, each with a length below this bound. These blocks are then fed into a number of classifiers in a parallel fashion, and a final decision is made through a well-designed combiner and detector. As a case study, we apply the proposed framework to a space-time block-code classification problem by developing two combiners and detectors. Monte Carlo simulations show that the proposed framework is capable of achieving excellent classification performance over time-selective channels compared to conventional algorithms.
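The block-then-combine structure can be sketched in a few lines: split the received samples into blocks shorter than the invariance bound, classify each block independently, and merge the per-block decisions. The majority-vote combiner and the toy per-block classifier below are stand-ins for illustration, not the paper's actual combiner/detector designs.

```python
import numpy as np

def classify_blocks(signal, block_len, block_classifier):
    """Split the signal into blocks no longer than block_len (the
    channel-invariance bound), classify each block independently,
    and combine the per-block labels by majority vote."""
    n_blocks = len(signal) // block_len
    votes = [block_classifier(signal[i * block_len:(i + 1) * block_len])
             for i in range(n_blocks)]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

def toy_classifier(block):
    # Stand-in per-block classifier: decide by the sign of the block mean.
    return "A" if np.mean(block) > 0 else "B"
```

Because each block is short enough for the channel to look static, the per-block classifiers see a (nearly) time-invariant problem; the combiner then recovers robustness across the whole observation window.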


2021 ◽  
Vol 11 (2) ◽  
pp. 807
Author(s):  
Llanos Tobarra ◽  
Alejandro Utrilla ◽  
Antonio Robles-Gómez ◽  
Rafael Pastor-Vargas ◽  
Roberto Hernández

The employment of modern technologies is widespread in our society, so the inclusion of practical activities in education has become both essential and useful. These activities are most noticeable in Engineering, in areas such as cybersecurity, data science, artificial intelligence, etc. Additionally, these activities acquire even more relevance with a distance education methodology, as is our case. The inclusion of these practical activities has clear advantages, such as (1) promoting critical thinking and (2) improving students’ abilities and skills for their professional careers. There are several options, such as the use of remote and virtual laboratories, virtual reality and game-based platforms, among others. This work addresses the development of a new cloud game-based educational platform, which defines a modular and flexible architecture (using light containers). This architecture provides interactive and monitoring services and data storage in a transparent way. The platform uses gamification to integrate the game as part of the instructional process. The CyberScratch project is a particular implementation of this architecture focused on cybersecurity game-based activities. Data privacy management is a critical issue for these kinds of platforms, so the architecture is designed with this feature integrated into the platform components. To achieve this goal, we first focus on all the privacy aspects of the data generated by our cloud game-based platform, considering the European legal context for data privacy following GDPR and the ISO/IEC TR 20748-1:2016 recommendations for Learning Analytics (LA). Our second objective is to provide implementation guidelines for efficient data privacy management for our cloud game-based educational platform. These contributions are not found in current related works.
The CyberScratch project, which was approved by UNED for the year 2020, considers using the xAPI standard for data handling and services for the game editor, game engine and game monitor modules of CyberScratch. Therefore, apart from considering GDPR privacy and LA recommendations, our cloud game-based architecture covers all phases from game creation to the final users’ interactions with the game.


2021 ◽  
Author(s):  
Daisuke Matsuoka

Abstract Image data classification using machine learning is one of the effective methods for detecting atmospheric phenomena. However, extreme weather events with a small number of cases cause a decrease in classification prediction accuracy owing to the imbalance of data between the target class and the other classes. In order to build a highly accurate classification model, we held a data analysis competition to determine the best classification performance for two classes of cloud image data: tropical cyclones (including their precursors) and all other phenomena. For the top models in the competition, minority-data oversampling, majority-data undersampling, ensemble learning, deep neural networks, and cost-sensitive loss functions were used to improve the imbalanced classification performance. In particular, the best model out of 209 submissions succeeded in improving the classification capability by 65.4% over comparable conventional methods, measured by false alarm ratio.
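One of the rebalancing techniques named above, minority-data oversampling, has a very simple random variant that can be sketched directly; this is a generic illustration of the idea, not the competition entrants' actual pipelines (which may use SMOTE-style synthesis or other schemes).

```python
import numpy as np

def oversample_minority(X, y, minority_class, seed=0):
    """Random oversampling: duplicate randomly chosen minority-class
    rows until both classes have equal counts, so a downstream
    classifier no longer sees a skewed class prior."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_class)
    majority = np.flatnonzero(y != minority_class)
    extra = rng.choice(minority, size=len(majority) - len(minority),
                       replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]
```

Undersampling is the mirror image (randomly dropping majority rows), and cost-sensitive losses achieve a similar effect without touching the data, by up-weighting minority-class errors in the training objective.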


2019 ◽  
Vol 11 (1) ◽  
pp. 36-40 ◽  
Author(s):  
Venky Shankar

Abstract Big data are taking center stage for decision-making in many retail organizations. Customer data on attitudes and behavior across channels, touchpoints, devices and platforms are often readily available and constantly recorded. These data are integrated from multiple sources and stored or warehoused, often in a cloud-based environment. Statistical, econometric and data science models are developed for enabling appropriate decisions. Computer algorithms and programs are created for these models. Machine-learning-based models are particularly useful for learning from the data and making predictive decisions. These machine learning models form the backbone for the generation and development of AI-assisted decisions. In many cases, such decisions are automated using systems such as chatbots and robots. Of special interest are issues such as omnichannel shopping behavior, resource allocation across channels, the effects of the mobile channel and mobile apps on shopper behavior, dynamic pricing, data privacy and security. Research on these issues reveals several interesting insights on which retailers can build. To fully leverage big data in today’s retailing environment, CRM strategies must be location specific, time specific and channel specific in addition to being customer specific.


2019 ◽  
Vol 11 (3) ◽  
pp. 352 ◽  
Author(s):  
Baofeng Guo

To better classify remotely sensed hyperspectral imagery, we study hyperspectral signatures from a different view, in which the discriminatory information is divided into reflectance features and absorption features. Based on this categorization, we put forward an information fusion approach, where the reflectance features and the absorption features are processed by different algorithms. Their outputs are treated as initial decisions and then fused by a decision-level algorithm, where the entropy of the classification output is used to balance between the two decisions. The final decision is reached by modifying the decision of the reflectance features via the results of the absorption features. Simulations are carried out to assess the classification performance on two AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) hyperspectral datasets. The results show that the proposed method improves classification accuracy over state-of-the-art methods.
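The entropy-based balancing can be illustrated with a small sketch: each branch outputs a class-probability vector, its entropy measures how uncertain that decision is, and the fused decision down-weights the less confident branch. This is one plausible reading of the abstract's fusion rule, used here for illustration; the paper's exact combination formula may differ.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (natural log)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def fuse(p_reflect, p_absorb):
    """Fuse the reflectance and absorption decisions, weighting each
    branch by its confidence 1 - H(p)/H_max, so a near-uniform
    (high-entropy) output contributes little to the final decision."""
    h_max = np.log(len(p_reflect))          # entropy of the uniform output
    w_r = 1.0 - entropy(p_reflect) / h_max
    w_a = 1.0 - entropy(p_absorb) / h_max
    fused = w_r * p_reflect + w_a * p_absorb
    return int(np.argmax(fused))
```

With this weighting, a confident reflectance decision survives an uncertain absorption output unchanged, while a confident absorption output can override an uncertain reflectance decision, matching the "modify the reflectance decision via the absorption results" behavior described above.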


Electronics ◽  
2019 ◽  
Vol 8 (8) ◽  
pp. 860 ◽  
Author(s):  
Saeed ◽  
Mustafa ◽  
Sheikh ◽  
Jumani ◽  
Mirjat

Non-technical losses (NTLs) have been a major concern for power distribution companies (PDCs). Billions of dollars are lost each year due to fraud in billing, metering, and illegal consumer activities. Various studies have explored different methodologies for efficiently identifying fraudulent consumers. This study proposes a new approach for NTL detection in PDCs using the ensemble bagged tree (EBT) algorithm. The bagged tree is an ensemble of many decision trees which considerably improves on the classification performance of individual decision trees by combining their predictions to reach a final decision. This approach relies on consumer energy usage data to identify any abnormality in consumption which could be associated with NTL behavior. The key motive of the current study is to assist the Multan Electric Power Company (MEPCO) in Punjab, Pakistan in its campaign against energy stealers. The model developed in this study generates a list of suspicious consumers with irregularities in consumption data to be further examined on-site. The accuracy of the EBT algorithm for NTL detection is found to be 93.1%, which is considerably higher than conventional techniques such as the support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), and random forest (RF) algorithms.
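The bagging idea behind the EBT can be sketched in miniature: train each base learner on a bootstrap resample of the consumption data and let the ensemble vote. As a self-contained illustration the sketch uses depth-1 threshold "stumps" instead of full decision trees; a real EBT would use full trees (e.g. via a library such as scikit-learn's `BaggingClassifier`), and the feature here is a made-up stand-in for the consumption features.

```python
import numpy as np

def fit_stump(X, y):
    """Fit a one-feature threshold classifier (a depth-1 'tree')."""
    best, best_acc = (0, 0.0, 0, 1), -1.0
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for left, right in ((0, 1), (1, 0)):
                acc = np.mean(np.where(X[:, f] <= t, left, right) == y)
                if acc > best_acc:
                    best_acc, best = acc, (f, t, left, right)
    return best

def fit_bagged(X, y, n_estimators=51, seed=0):
    """Bagging: train each stump on a bootstrap resample of the data."""
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(y), size=len(y))
        stumps.append(fit_stump(X[idx], y[idx]))
    return stumps

def bagged_predict(stumps, X):
    """Majority vote over the ensemble's per-stump predictions."""
    votes = np.array([np.where(X[:, f] <= t, l, r)
                      for f, t, l, r in stumps])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

The bootstrap resampling decorrelates the base learners, which is what lets the averaged vote outperform any single tree, the property the abstract attributes to the EBT.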


2018 ◽  
Vol 6 (4) ◽  
pp. 84
Author(s):  
Panagiotis Serdaris ◽  
Konstantinos Spinthiropoulos ◽  
Michael Agrafiotis ◽  
Athanasios Zisopoulos

The poor data science support of agriculture led us to the main idea of this research: to analyze all micro-works for every plant or tree, and then to specify targeted actions for harvest collection, micro-spraying and hundreds of similar simple actions. Initially we collect data from the farm. Airborne, land and underwater unmanned vehicles scan the field area with various customized sensors and cameras in multispectral modes. The result is a minimal four-dimensional "agro-chunk" model. The unmanned vehicle on the field area receives target data. It is equipped with a general-purpose robotic arm, an absorbing bellows, a robotic pruner, a liquid-spraying pipe, an underwater robotic arm and hundreds of other tools. It moves to the target and performs the commanded action, such as flower or nut collection, insect suction, pruning and hundreds more. All operations are highly trainable through human intervention, and the system stores its approach and logic for future action correction.


2021 ◽  
Vol 6 (Suppl 5) ◽  
pp. e005057
Author(s):  
Nivedita Saksena ◽  
Rahul Matthan ◽  
Anant Bhan ◽  
Satchit Balsari

In August 2020, India announced its vision for the National Digital Health Mission (NDHM), a federated national digital health exchange where digitised data generated by healthcare providers will be exported via application programme interfaces to the patient’s electronic personal health record. The NDHM architecture is initially expected to be a claims platform for the national health insurance programme ‘Ayushman Bharat’ that serves 500 million people. Such large-scale digitisation and mobility of health data will have significant ramifications on care delivery, population health planning, as well as on the rights and privacy of individuals. Traditional mechanisms that seek to protect individual autonomy through patient consent will be inadequate in a digitised ecosystem where processed data can travel near instantaneously across various nodes in the system and be combined, aggregated, or even re-identified. In this paper we explore the limitations of ‘informed’ consent that is sought either when data are collected or when they are ported across the system. We examine the merits and limitations of proposed alternatives like the fiduciary framework that imposes accountability on those that use the data; privacy-by-design principles that rely on technological safeguards against abuse; or regulations. Our recommendations combine complementary approaches in light of the evolving jurisprudence in India and provide a generalisable framework for health data exchange that balances individual rights with advances in data science.

