Performance Evaluation of Policy-Based SQL Query Classification for Data-Privacy Compliance

Author(s):  
Peter K. Schwab ◽  
Jonas Röckl ◽  
Maximilian S. Langohr ◽  
Klaus Meyer-Wegener

Abstract Data science must respect privacy in many situations. We have built a query repository with automatic SQL query classification according to data-privacy directives. It can intercept queries that violate the directives: a JDBC proxy driver inserted between the end-users’ SQL tooling and the target data consults the repository about the compliance of each query. However, this slows down query processing. This paper presents two optimizations implemented to increase classification performance and describes a measurement environment that allows quantifying the induced performance overhead. We present measurement results and show that our optimized implementation significantly reduces classification latency. The query metadata (QM) is stored in both relational and graph-based databases. Whereas query classification can be done in a few milliseconds on average using relational QM, graph-based classification is orders of magnitude more expensive at 137 ms on average. However, the graphs contain more precise information, so in some cases the final decision requires checking them, too. Our optimizations considerably reduce the number of graph-based classifications and thus decrease the latency to 0.35 ms in 87% of the classification cases.
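The two-stage design described above can be illustrated with a minimal sketch: a fast lookup over relational-style query metadata decides most cases, and only ambiguous queries fall through to the expensive graph-based check. All names and the toy policy store below are illustrative assumptions, not the authors' actual repository API.

```python
# Minimal sketch of a two-stage, policy-based query classifier:
# the cheap relational-QM lookup handles most queries, and only
# queries it cannot decide trigger the costly graph-based check.

def classify_query(sql, relational_qm, graph_check):
    """Return 'allow' or 'deny' for a query, preferring the fast path."""
    verdict = relational_qm.get(sql.strip().lower())
    if verdict in ("allow", "deny"):
        return verdict              # fast path: decided from relational QM
    return graph_check(sql)         # slow path: precise graph-based check

# Toy policy store: cached verdicts per query; None = ambiguous,
# i.e. the relational metadata alone cannot decide compliance.
relational_qm = {
    "select name from patients": "deny",
    "select count(*) from visits": "allow",
    "select diagnosis from visits": None,  # needs graph-level inspection
}

def graph_check(sql):
    # Stand-in for the graph traversal; here we simply deny any query
    # touching a column we treat as privacy-sensitive.
    return "deny" if "diagnosis" in sql.lower() else "allow"
```

In this sketch the latency asymmetry from the abstract maps directly onto the two branches: the dictionary lookup is the sub-millisecond path, and `graph_check` stands in for the 137 ms graph traversal that the optimizations try to avoid.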

2021 ◽  
Vol 3 (6) ◽  
Author(s):  
César de Oliveira Ferreira Silva ◽  
Mariana Matulovic ◽  
Rodrigo Lilla Manzione

Abstract Groundwater governance uses modeling to support decision making. Therefore, data science techniques are essential. Specific difficulties arise because variables must be used that cannot be directly measured, such as aquifer recharge and groundwater flow. However, such techniques involve dealing with (often not very explicitly stated) ethical questions. To support groundwater governance, these ethical questions cannot be solved in a straightforward way. In this study, we propose an approach called the “open-minded roadmap” to guide data analytics and modeling for groundwater governance decision making. To frame the ethical questions, we use the concept of geoethical thinking, a method to combine geoscience expertise with the societal responsibility of the geoscientist. We present a case study: a groundwater-monitoring modeling experiment using data analytics methods in southeast Brazil. A model based on fuzzy logic (with high expert intervention) and three data-driven models (with low expert intervention) are tested and evaluated for aquifer recharge in watersheds. The roadmap approach consists of three issues: (a) data acquisition, (b) modeling and (c) the open-minded (geo)ethical attitude. The level of expert intervention in the modeling stage and model validation are discussed. A search for gaps in the model use is made, anticipating issues through the development of application scenarios, to reach a final decision. When the model is validated in one watershed and then extrapolated to neighboring watersheds, we find large asymmetries in the recharge estimates. Hence, we can show that more information (data, expertise etc.) is needed to improve the models’ predictive skill. In the resulting iterative approach, new questions will arise (as new information becomes available), and therefore, steady recourse to the open-minded roadmap is recommended.


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1714
Author(s):  
Mohamed Marey ◽  
Hala Mostafa

In this work, we propose a general framework to design a signal classification algorithm over time-selective channels for wireless communications applications. We derive an upper bound on the maximum number of observation samples over which the channel response is essentially invariant. The proposed framework relies on dividing the received signal into blocks, each with a length below this bound. These blocks are then fed into a number of classifiers in a parallel fashion, and a final decision is made through a well-designed combiner and detector. As a case study, we apply the proposed framework to a space-time block-code classification problem by developing two combiners and detectors. Monte Carlo simulations show that the proposed framework is capable of achieving excellent classification performance over time-selective channels compared to conventional algorithms.
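The block-then-combine structure can be sketched in a few lines: split the received samples into blocks shorter than the invariance bound, classify each block independently, and merge the per-block decisions. The majority-vote combiner and the toy per-block classifier below are stand-ins for illustration, not the paper's actual combiner/detector designs.

```python
import numpy as np

def classify_blocks(signal, block_len, block_classifier):
    """Split the signal into blocks no longer than block_len (the
    channel-invariance bound), classify each block independently,
    and combine the per-block labels by majority vote."""
    n_blocks = len(signal) // block_len
    votes = [block_classifier(signal[i * block_len:(i + 1) * block_len])
             for i in range(n_blocks)]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

def toy_classifier(block):
    # Stand-in per-block classifier: decide by the sign of the block mean.
    return "A" if np.mean(block) > 0 else "B"
```

Because each block is short enough for the channel to look static, the per-block classifiers see a (nearly) time-invariant problem; the combiner then recovers robustness across the whole observation window.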


2021 ◽  
Vol 11 (2) ◽  
pp. 807
Author(s):  
Llanos Tobarra ◽  
Alejandro Utrilla ◽  
Antonio Robles-Gómez ◽  
Rafael Pastor-Vargas ◽  
Roberto Hernández

The employment of modern technologies is widespread in our society, so the inclusion of practical activities in education has become both essential and useful. These activities are most noticeable in Engineering, in areas such as cybersecurity, data science, artificial intelligence, etc. Additionally, these activities acquire even more relevance with a distance education methodology, as is our case. The inclusion of these practical activities has clear advantages, such as (1) promoting critical thinking and (2) improving students’ abilities and skills for their professional careers. There are several options, such as the use of remote and virtual laboratories, virtual reality and game-based platforms, among others. This work addresses the development of a new cloud game-based educational platform, which defines a modular and flexible architecture (using light containers). This architecture provides interactive and monitoring services and data storage in a transparent way. The platform uses gamification to integrate the game as part of the instructional process. The CyberScratch project is a particular implementation of this architecture focused on cybersecurity game-based activities. Data privacy management is a critical issue for these kinds of platforms, so the architecture is designed with this feature integrated into the platform components. To achieve this goal, we first focus on all the privacy aspects of the data generated by our cloud game-based platform, considering the European legal context for data privacy following GDPR and the ISO/IEC TR 20748-1:2016 recommendations for Learning Analytics (LA). Our second objective is to provide implementation guidelines for efficient data privacy management for our cloud game-based educational platform. These contributions are not found in current related works.
The CyberScratch project, which was approved by UNED for the year 2020, considers using the xAPI standard for data handling and services for the game editor, game engine and game monitor modules of CyberScratch. Therefore, apart from considering GDPR privacy and LA recommendations, our cloud game-based architecture covers all phases from game creation to the final users’ interactions with the game.


2021 ◽  
Author(s):  
Daisuke Matsuoka

Abstract Image data classification using machine learning is one of the effective methods for detecting atmospheric phenomena. However, extreme weather events with a small number of cases cause a decrease in classification prediction accuracy owing to the imbalance of data between the target class and the other classes. In order to build a highly accurate classification model, we held a data analysis competition to determine the best classification performance for two classes of cloud image data: tropical cyclones (including their precursors) and all other phenomena. For the top models in the competition, minority-data oversampling, majority-data undersampling, ensemble learning, deep neural networks, and cost-sensitive loss functions were used to improve the imbalanced classification performance. In particular, the best model out of 209 submissions succeeded in improving the classification capability by 65.4% over comparable conventional methods, measured by false alarm ratio.
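One of the rebalancing techniques named above, minority-data oversampling, has a very simple random variant that can be sketched directly; this is a generic illustration of the idea, not the competition entrants' actual pipelines (which may use SMOTE-style synthesis or other schemes).

```python
import numpy as np

def oversample_minority(X, y, minority_class, seed=0):
    """Random oversampling: duplicate randomly chosen minority-class
    rows until both classes have equal counts, so a downstream
    classifier no longer sees a skewed class prior."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_class)
    majority = np.flatnonzero(y != minority_class)
    extra = rng.choice(minority, size=len(majority) - len(minority),
                       replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]
```

Undersampling is the mirror image (randomly dropping majority rows), and cost-sensitive losses achieve a similar effect without touching the data, by up-weighting minority-class errors in the training objective.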


2019 ◽  
Vol 11 (1) ◽  
pp. 36-40 ◽  
Author(s):  
Venky Shankar

Abstract Big data are taking center stage for decision-making in many retail organizations. Customer data on attitudes and behavior across channels, touchpoints, devices and platforms are often readily available and constantly recorded. These data are integrated from multiple sources and stored or warehoused, often in a cloud-based environment. Statistical, econometric and data science models are developed for enabling appropriate decisions. Computer algorithms and programs are created for these models. Machine-learning-based models are particularly useful for learning from the data and making predictive decisions. These machine learning models form the backbone for the generation and development of AI-assisted decisions. In many cases, such decisions are automated using systems such as chatbots and robots. Of special interest are issues such as omnichannel shopping behavior, resource allocation across channels, the effects of the mobile channel and mobile apps on shopper behavior, dynamic pricing, data privacy and security. Research on these issues reveals several interesting insights on which retailers can build. To fully leverage big data in today’s retailing environment, CRM strategies must be location specific, time specific and channel specific in addition to being customer specific.


2019 ◽  
Vol 11 (3) ◽  
pp. 352 ◽  
Author(s):  
Baofeng Guo

To better classify remotely sensed hyperspectral imagery, we study hyperspectral signatures from a different view, in which the discriminatory information is divided into reflectance features and absorption features. Based on this categorization, we put forward an information fusion approach, where the reflectance features and the absorption features are processed by different algorithms. Their outputs are treated as initial decisions and then fused by a decision-level algorithm, where the entropy of the classification output is used to balance between the two decisions. The final decision is reached by modifying the decision of the reflectance features via the results of the absorption features. Simulations are carried out to assess the classification performance on two AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) hyperspectral datasets. The results show that the proposed method improves classification accuracy over state-of-the-art methods.
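The entropy-based balancing can be illustrated with a small sketch: each branch outputs a class-probability vector, its entropy measures how uncertain that decision is, and the fused decision down-weights the less confident branch. This is one plausible reading of the abstract's fusion rule, used here for illustration; the paper's exact combination formula may differ.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (natural log)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def fuse(p_reflect, p_absorb):
    """Fuse the reflectance and absorption decisions, weighting each
    branch by its confidence 1 - H(p)/H_max, so a near-uniform
    (high-entropy) output contributes little to the final decision."""
    h_max = np.log(len(p_reflect))          # entropy of the uniform output
    w_r = 1.0 - entropy(p_reflect) / h_max
    w_a = 1.0 - entropy(p_absorb) / h_max
    fused = w_r * p_reflect + w_a * p_absorb
    return int(np.argmax(fused))
```

With this weighting, a confident reflectance decision survives an uncertain absorption output unchanged, while a confident absorption output can override an uncertain reflectance decision, matching the "modify the reflectance decision via the absorption results" behavior described above.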


Electronics ◽  
2019 ◽  
Vol 8 (8) ◽  
pp. 860 ◽  
Author(s):  
Saeed ◽  
Mustafa ◽  
Sheikh ◽  
Jumani ◽  
Mirjat

Non-technical losses (NTLs) have been a major concern for power distribution companies (PDCs). Billions of dollars are lost each year due to fraud in billing, metering, and illegal consumer activities. Various studies have explored different methodologies for efficiently identifying fraudulent consumers. This study proposes a new approach for NTL detection in PDCs using the ensemble bagged tree (EBT) algorithm. The bagged tree is an ensemble of many decision trees which considerably improves on the classification performance of individual decision trees by combining their predictions to reach a final decision. This approach relies on consumer energy usage data to identify any abnormality in consumption which could be associated with NTL behavior. The key motive of the current study is to assist the Multan Electric Power Company (MEPCO) in Punjab, Pakistan in its campaign against energy stealers. The model developed in this study generates a list of suspicious consumers with irregularities in consumption data to be further examined on-site. The accuracy of the EBT algorithm for NTL detection is found to be 93.1%, which is considerably higher than conventional techniques such as the support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), and random forest (RF) algorithms.
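The bagging idea behind the EBT can be sketched in miniature: train each base learner on a bootstrap resample of the consumption data and let the ensemble vote. As a self-contained illustration the sketch uses depth-1 threshold "stumps" instead of full decision trees; a real EBT would use full trees (e.g. via a library such as scikit-learn's `BaggingClassifier`), and the feature here is a made-up stand-in for the consumption features.

```python
import numpy as np

def fit_stump(X, y):
    """Fit a one-feature threshold classifier (a depth-1 'tree')."""
    best, best_acc = (0, 0.0, 0, 1), -1.0
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for left, right in ((0, 1), (1, 0)):
                acc = np.mean(np.where(X[:, f] <= t, left, right) == y)
                if acc > best_acc:
                    best_acc, best = acc, (f, t, left, right)
    return best

def fit_bagged(X, y, n_estimators=51, seed=0):
    """Bagging: train each stump on a bootstrap resample of the data."""
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(y), size=len(y))
        stumps.append(fit_stump(X[idx], y[idx]))
    return stumps

def bagged_predict(stumps, X):
    """Majority vote over the ensemble's per-stump predictions."""
    votes = np.array([np.where(X[:, f] <= t, l, r)
                      for f, t, l, r in stumps])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

The bootstrap resampling decorrelates the base learners, which is what lets the averaged vote outperform any single tree, the property the abstract attributes to the EBT.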


2018 ◽  
Vol 6 (4) ◽  
pp. 84
Author(s):  
Panagiotis Serdaris ◽  
Konstantinos Spinthiropoulos ◽  
Michael Agrafiotis ◽  
Athanasios Zisopoulos

The poor data science support of agriculture led us to the main idea of this research: to analyze all micro-works for every plant or tree, and then to specify targeted actions for harvest collection, micro-spraying and hundreds of similar simple actions. Initially we collect data from the farm. Airborne, land and underwater unmanned vehicles scan the field area with various customized sensors and cameras in multispectral modes. The result is a minimal four-dimensional "agro-chunk" model. The unmanned vehicle on the field area receives target data. It is equipped with a general-purpose robotic arm, an absorbing bellows, a robotic pruner, a liquid-spraying pipe, an underwater robotic arm and hundreds of other tools. It moves to the target and performs the commanded action, such as flower or nut collection, insect suction, pruning and hundreds more. All operations are highly trainable through human intervention, and the system stores its approach and logic for future action correction.


2021 ◽  
Vol 6 (Suppl 5) ◽  
pp. e005057
Author(s):  
Nivedita Saksena ◽  
Rahul Matthan ◽  
Anant Bhan ◽  
Satchit Balsari

In August 2020, India announced its vision for the National Digital Health Mission (NDHM), a federated national digital health exchange where digitised data generated by healthcare providers will be exported via application programme interfaces to the patient’s electronic personal health record. The NDHM architecture is initially expected to be a claims platform for the national health insurance programme ‘Ayushman Bharat’ that serves 500 million people. Such large-scale digitisation and mobility of health data will have significant ramifications on care delivery, population health planning, as well as on the rights and privacy of individuals. Traditional mechanisms that seek to protect individual autonomy through patient consent will be inadequate in a digitised ecosystem where processed data can travel near instantaneously across various nodes in the system and be combined, aggregated, or even re-identified. In this paper we explore the limitations of ‘informed’ consent that is sought either when data are collected or when they are ported across the system. We examine the merits and limitations of proposed alternatives like the fiduciary framework that imposes accountability on those that use the data; privacy-by-design principles that rely on technological safeguards against abuse; or regulations. Our recommendations combine complementary approaches in light of the evolving jurisprudence in India and provide a generalisable framework for health data exchange that balances individual rights with advances in data science.

