Preserving Privacy in the Face of Clients with Different Data Needs Using the NGLMS

Author(s):  
Chris Radbone ◽  
James Farrow

ABSTRACT

Objectives
SA NT DataLink's Next Generation Linkage Management System (NGLMS) provides a novel approach to handling privileged or sensitive data for certain projects without having to replicate or duplicate databases and work in order to protect privacy. The NGLMS is a collection of records (nodes) and relationships (edges) that forms a graph (in the computer science sense) and is designed to support a mix-and-match, layered approach to data linkage projects. The NGLMS allows the needs of different clients to be managed from the one graph data set while preserving privacy and honouring the requirement to protect sensitive information, without having to relink or duplicate data.

Approach
The NGLMS uses a layer-based approach to project description and design. Projects, a specific data linkage request for example, are composed of various data layers. Data layers consist of data sets and linkage information in the form of pairwise relationships. These layers are coupled with quality information, e.g. acceptable similarity thresholds and/or the types of relationships to consider as 'linking' two records, to construct an effective virtual data set which may be different for each project. A project can be constructed by composing existing linkage data (where it already exists) without having to perform new linkage comparisons.

Results
A case study is discussed in which a data set containing extremely sensitive information (record pairings revealing name changes due to family court proceedings and protection orders) was received for incorporation into the data pool. The data custodian who supplied this sensitive information required that it be incorporated only into approved analyses and excluded from all non-authorised analyses. By placing these data into a separate layer, to be included in some projects and not others, the sensitive nature of the data can be accommodated and its effects 'turned on and off' at will.

Conclusion
The flexible, on-demand nature of data extraction and late clustering in the NGLMS's graph-based approach to linkage allows for ad-hoc project construction and the dynamic inclusion and exclusion of data without the overhead of relinking.
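
The layered composition described above lends itself to a compact illustration. Below is a minimal Python sketch, with hypothetical record IDs and layer names rather than anything from the NGLMS codebase: each pairwise link carries the layer it came from, a project selects only the layers and similarity threshold it is approved for, and clusters are formed on demand (late clustering) from the surviving edges, so a sensitive layer can be switched on or off per project.

```python
# Minimal sketch of layered, late-clustered linkage; record IDs and layer
# names are hypothetical, not the NGLMS implementation.
from collections import defaultdict

class DisjointSet:
    """Union-find used for 'late clustering': groups are only formed when a
    project asks for them, from whichever edges that project includes."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

# Pairwise links: (record_a, record_b, similarity, layer).
# The 'family_court' layer holds the sensitive name-change pairings.
edges = [
    ("r1", "r2", 0.97, "hospital"),
    ("r2", "r3", 0.88, "births"),
    ("r3", "r4", 0.99, "family_court"),  # sensitive layer
]

def cluster(edges, layers, threshold):
    """Build clusters from only the layers a project is approved to see."""
    ds = DisjointSet()
    included = [(a, b) for a, b, sim, layer in edges
                if layer in layers and sim >= threshold]
    for a, b in included:
        ds.union(a, b)
    groups = defaultdict(set)
    for a, b in included:
        groups[ds.find(a)].update((a, b))
    return list(groups.values())

# An approved project includes the sensitive layer; others simply omit it.
print(cluster(edges, {"hospital", "births", "family_court"}, 0.85))
print(cluster(edges, {"hospital", "births"}, 0.85))
```

Excluding the family_court layer leaves r4 out of the second result entirely, mirroring how the custodian's records drop out of non-authorised analyses without any relinking.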

Author(s):  
Fahad E. Salamh

The adoption of Internet of Things (IoT) devices is rapidly increasing with the advancement of network technology. These devices carry sensitive data that require adherence to minimum security practices. The adoption of smart devices to migrate homeowners from traditional homes to smart homes has also been noticeable. These smart devices hold value for, and are of potential interest to, digital forensic investigators as well. Therefore, in this paper, we conduct a comprehensive security and forensic analysis to contribute to both fields, targeting a security enhancement of the selected IoT devices and assisting current IoT forensics approaches. Our work follows several techniques, such as forensic analysis of identifiable information, including connected devices and sensor data. Furthermore, we perform a security assessment exploring insecure communication protocols, plain-text credentials, and sensitive information. This includes reverse engineering of some binary files and manual analysis techniques. The analysis covers a data set of home automation devices provided by VTO Labs: (1) the eufy floodlight camera, and (2) the Kasa smart light bulb. The main goal of the technical experiment in this research is to support the proposed model.
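
As a concrete illustration of the kind of manual binary analysis mentioned above, the Python sketch below pulls printable strings out of a firmware image and flags candidate plain-text credentials and insecure http:// endpoints. It is a generic first pass of our own, not the paper's tooling, and the file name is hypothetical.

```python
# Illustrative first-pass binary triage: extract printable strings from a
# firmware image and flag plain-text credentials and unencrypted endpoints.
# "firmware.bin" is a hypothetical file name.
import re

def strings(data: bytes, min_len: int = 6):
    """Yield runs of printable ASCII, like the Unix `strings` utility."""
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield match.group().decode("ascii")

SUSPICIOUS = [
    re.compile(r"http://", re.I),                          # unencrypted endpoint
    re.compile(r"(passw(or)?d|secret|api[_-]?key)\s*[=:]", re.I),
]

with open("firmware.bin", "rb") as f:
    blob = f.read()

for s in strings(blob):
    if any(p.search(s) for p in SUSPICIOUS):
        print("FLAG:", s)
```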


2015 ◽  
Vol 17 (5) ◽  
pp. 719-732
Author(s):  
Dulakshi Santhusitha Kumari Karunasingha ◽  
Shie-Yui Liong

A simple clustering method is proposed for extracting representative subsets from lengthy data sets. The main purpose of the extracted subset is to build prediction models (in the form of approximating functional relationships) instead of using the entire large data set. Such smaller subsets of data are often required in the exploratory analysis stages of studies that involve resource-consuming investigations. A few recent studies have used a subtractive clustering method (SCM) for such data extraction, in the absence of clustering methods designed for function approximation. SCM, however, requires several parameters to be specified. This study proposes a clustering method which requires only a single parameter to be specified, yet is shown to be as effective as SCM. A method to find suitable values for the parameter is also proposed. Owing to having only a single parameter, the proposed clustering method is shown to be orders of magnitude more efficient than SCM. The effectiveness of the proposed method is demonstrated on phase-space prediction of three univariate time series and prediction of two multivariate data sets. Some drawbacks of SCM when applied to data extraction are identified, and the proposed method is shown to address them.
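
For context, a minimal sketch of the SCM baseline follows (the proposed single-parameter method itself is not reproduced here). Note how SCM exposes several parameters, a radius r_a, a squash factor and an acceptance ratio, which is precisely the drawback the study targets.

```python
# Hedged sketch of the subtractive clustering method (SCM, Chiu 1994) used
# as the paper's baseline; parameter values below are common defaults, not
# the paper's settings.
import numpy as np

def scm_subset(X, r_a=0.5, squash=1.5, accept_ratio=0.15):
    """Return indices of representative points chosen by SCM."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    potential = np.exp(-4.0 * d2 / r_a**2).sum(axis=1)   # initial potentials
    r_b = squash * r_a
    centers, first = [], None
    while True:
        c = int(np.argmax(potential))
        if first is None:
            first = potential[c]
        if potential[c] < accept_ratio * first:
            break                                # remaining points add little
        centers.append(c)
        # Revise potentials: suppress points near the newly chosen centre.
        potential = potential - potential[c] * np.exp(-4.0 * d2[c] / r_b**2)
    return centers

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # stand-in for a lengthy data set
idx = scm_subset(X)
print(f"kept {len(idx)} of {len(X)} points as the representative subset")
```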


2020 ◽  
pp. 5-9
Author(s):  
Manasvi Srivastava ◽  
Vikas Yadav ◽  
Swati Singh ◽  
...  

The Internet is the largest source of information created by humanity. It contains a variety of materials available in various formats such as text, audio, video and much more. Web scraping is one way of gathering it: a set of strategies for obtaining information from a website automatically instead of copying the data manually. Many web-based data extraction methods are designed to solve specific problems and work on ad-hoc domains. Various tools and technologies have been developed to facilitate web scraping. Unfortunately, the appropriateness and ethics of using these web scraping tools are often overlooked. There are hundreds of web scraping software packages available today, most of them designed for Java, Python and Ruby; both open-source and commercial software exist. Web-based software such as Yahoo Pipes, Google Web Scrapers and the Outwit extension for Firefox are among the best tools for beginners in web scraping. Web scraping is essentially used to replace the manual extraction and editing process, providing an easy and better way to collect data from a web page, convert it into the desired format and save it to a local or archive directory. In this paper, among the kinds of scraping, we focus on those techniques that extract the content of a web page. In particular, we use scraping techniques to collect information on a variety of diseases, with their symptoms and precautions.
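
As a hedged, generic example of such content extraction (not the authors' tool; the URL and CSS selectors are hypothetical), the following Python script fetches a listing page, extracts disease names and symptoms, and saves them locally as CSV:

```python
# Generic scraping sketch using requests and BeautifulSoup
# (pip install requests beautifulsoup4). URL and markup are hypothetical.
import csv
import requests
from bs4 import BeautifulSoup

URL = "https://example.org/diseases"   # hypothetical listing page

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

rows = []
for card in soup.select("div.disease"):          # hypothetical markup
    name = card.select_one("h2")
    symptoms = [li.get_text(strip=True)
                for li in card.select("ul.symptoms li")]
    if name:
        rows.append({"disease": name.get_text(strip=True),
                     "symptoms": "; ".join(symptoms)})

# Convert to the desired format and save locally, as described above.
with open("diseases.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["disease", "symptoms"])
    writer.writeheader()
    writer.writerows(rows)
```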


BMJ Open ◽  
2016 ◽  
Vol 6 (6) ◽  
pp. e011947 ◽  
Author(s):  
Razi Zaidi ◽  
Alexander MacGregor ◽  
Suzie Cro ◽  
Andy Goldberg

2019 ◽  
Author(s):  
Aimee R. Taylor ◽  
Pierre E. Jacob ◽  
Daniel E. Neafsey ◽  
Caroline O. Buckee

1. Abstract

Understanding the relatedness of individuals within or between populations is a common goal in biology. Increasingly, relatedness features in genetic epidemiology studies of pathogens. These studies are relatively new compared to those in humans and other organisms, but are important for designing interventions and understanding pathogen transmission. Only recently have researchers begun to routinely apply relatedness to apicomplexan eukaryotic malaria parasites, and to date they have used a range of different approaches on an ad hoc basis. It therefore remains unclear how to compare different studies and which measures to use. Here, we systematically compare measures based on identity-by-state (IBS) and identity-by-descent (IBD) using a globally diverse data set of malaria parasites, Plasmodium falciparum and Plasmodium vivax, and provide marker requirements for estimates based on identity-by-descent. We formally show that the informativeness of polyallelic markers for relatedness inference is maximised when alleles are equifrequent. Estimates based on identity-by-state are sensitive to allele frequencies, which vary across populations and by experimental design. For portability across studies, we thus recommend estimates based on identity-by-descent. To generate reliable estimates, we recommend approximately 200 biallelic or 100 polyallelic markers. Confidence intervals illuminate inference across studies based on different sets of markers. These marker requirements, unlike many thus far reported, are immediately applicable to haploid malaria parasites and other haploid eukaryotes. This is the first attempt to provide a rigorous analysis of the reliability of, and requirements for, relatedness inference in malaria genetic epidemiology, and it will provide a basis for statistically informed prospective study design and surveillance strategies.
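
To make the IBS/IBD distinction concrete, here is a hedged Python illustration, not the authors' estimator: IBS is the raw fraction of matching markers, while a simple method-of-moments IBD estimate corrects for chance matches using allele frequencies, which is why it travels better across populations.

```python
# Hedged illustration of IBS versus IBD for haploid genotypes, using the
# standard model P(match at m) = r + (1 - r) * sum_a p_{m,a}^2. This is a
# method-of-moments sketch, not the estimator used in the paper.
import numpy as np

def ibs(g1, g2):
    """Identity-by-state: fraction of markers with the same allele."""
    return float((np.asarray(g1) == np.asarray(g2)).mean())

def ibd_mom(g1, g2, freqs):
    """Method-of-moments relatedness r from biallelic haploid calls.

    freqs[m] is the population frequency of allele 1 at marker m.
    """
    p = np.asarray(freqs, dtype=float)
    same = (np.asarray(g1) == np.asarray(g2)).astype(float)
    chance = p**2 + (1.0 - p)**2          # P(match | not IBD) per marker
    # E[same_m] = r + (1 - r) * chance_m; average over markers, solve for r.
    r = (same.mean() - chance.mean()) / (1.0 - chance.mean())
    return float(np.clip(r, 0.0, 1.0))

rng = np.random.default_rng(1)
M = 200                                 # ~200 biallelic markers, as recommended
p = rng.uniform(0.1, 0.9, size=M)
g1 = rng.random(M) < p
truly_ibd = rng.random(M) < 0.5         # simulate true relatedness r = 0.5
g2 = np.where(truly_ibd, g1, rng.random(M) < p)
print(f"IBS = {ibs(g1, g2):.2f}, IBD estimate = {ibd_mom(g1, g2, p):.2f}")
```

The IBS value lands well above 0.5 because unrelated alleles still match by chance at rates set by the allele frequencies; the IBD estimate recovers the simulated relatedness of about 0.5.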


With the improvement of information processing and memory capacity, vast amounts of data are collected for various data analysis purposes. Data mining techniques are used to extract knowledgeable information. When data are extracted using data mining techniques, they can become publicly discoverable, which leads to breaches of privacy for specific sensitive data. Privacy-preserving data mining is used to protect sensitive information from unwanted or unsanctioned disclosure. In this paper, we analyse the problem of discovering similarity checks for functional dependencies from a given dataset such that applying the (l, d) inference algorithm with generalization can anonymise the microdata without loss of utility. [8] This work presents a functional-dependency-based perturbation approach which hides sensitive information from the user by applying the (l, d) inference model to the dependency attributes, selected on the basis of information gain. The approach works on both categorical and numerical attributes. The perturbed data set does not affect the original dataset; it maintains the same or very similar patterns as the original data set. Hence the utility of the application is high compared with other data mining techniques. The accuracy of the original and perturbed datasets is compared and analysed using data mining classification algorithms.
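
The (l, d) inference model itself is not reproduced here; as a small hedged sketch of the information-gain step mentioned above, the following Python ranks candidate dependency attributes by how much they reveal about a sensitive attribute (column names are invented):

```python
# Information-gain ranking of candidate attributes, a hedged sketch of the
# attribute-selection step only; the rows and column names are made up.
import math
from collections import Counter

def entropy(values):
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def information_gain(rows, attr, target):
    """IG(target; attr) = H(target) - H(target | attr)."""
    base = entropy([r[target] for r in rows])
    cond = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        cond += len(subset) / len(rows) * entropy(subset)
    return base - cond

rows = [
    {"zip": "5000", "age_band": "20-30", "disease": "flu"},
    {"zip": "5000", "age_band": "30-40", "disease": "flu"},
    {"zip": "5001", "age_band": "20-30", "disease": "asthma"},
    {"zip": "5001", "age_band": "30-40", "disease": "asthma"},
]

# Rank candidate attributes by how much they reveal about the sensitive one:
for attr in ("zip", "age_band"):
    print(attr, round(information_gain(rows, attr, "disease"), 3))
# 'zip' fully determines 'disease' here, so it is the attribute to perturb.
```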


2018 ◽  
Author(s):  
Jérémie Decouchant ◽  
Maria Fernandes ◽  
Marcus Völp ◽  
Francisco M Couto ◽  
Paulo Esteves-Veríssimo

Abstract

Sequencing thousands of human genomes has enabled breakthroughs in many areas, among them precision medicine, the study of rare diseases, and forensics. However, mass collection of such sensitive data entails enormous risks if it is not protected to the highest standards. In this article, we argue that post-alignment privacy is not enough and that data should be automatically protected as early as possible in the genomics workflow, ideally immediately after the data are produced. We show that a previous approach for filtering short reads cannot extend to long reads, and we present a novel filtering approach that classifies raw genomic data (i.e., data whose location and content are not yet determined) into privacy-sensitive (i.e., more affected by a successful privacy attack) and non-privacy-sensitive information. Such a classification allows the fine-grained and automated adjustment of protective measures to mitigate the possible consequences of exposure, in particular when relying on public clouds. We present the first filter that can be applied to reads of any length, making it usable with any recent or future sequencing technology. The filter is accurate, in the sense that it detects all known sensitive nucleotides except those located in highly variable regions (fewer than 10 nucleotides remain undetected per genome, instead of 100,000 in previous works). It has far fewer false positives than previously known methods (10% instead of 60%) and can detect sensitive nucleotides despite sequencing errors (86% detected instead of 56%, with 2% of mutations). Finally, practical experiments demonstrate high performance, both in terms of throughput and memory consumption.
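
A minimal sketch of the dictionary-style filtering idea, not the authors' exact classifier: a read is flagged as privacy-sensitive if any of its k-mers occurs in a precomputed set of sensitive k-mers (e.g., those overlapping known variant positions). Because the check slides over k-mers, it applies to reads of any length. The k-mer size and the toy dictionary below are assumptions.

```python
# Dictionary-based read filter sketch; K and the dictionary are assumptions,
# not the paper's parameters.
K = 24  # k-mer length

def kmers(seq, k=K):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def is_sensitive(read, sensitive_kmers):
    """Flag a read as privacy-sensitive if any k-mer hits the dictionary."""
    return any(km in sensitive_kmers for km in kmers(read))

# Toy dictionary: in practice this would be built from known variants.
sensitive_kmers = {"ACGTACGTACGTACGTACGTACGT"}

reads = [
    "TTTTACGTACGTACGTACGTACGTACGTTTTT",   # contains a sensitive k-mer
    "GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG",   # does not
]
for r in reads:
    route = "private store" if is_sensitive(r, sensitive_kmers) else "public cloud"
    print(route, ":", r[:16], "...")
```

Routing flagged reads to a protected store while releasing the rest is the kind of fine-grained, automated adjustment of protective measures the abstract describes.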


Author(s):  
Brian Stokes

Background with rationale
Business Intelligence (BI) software applications collect and process large amounts of data from one or more sources, and for a variety of purposes. These can include generating operational or sales reports, developing dashboards and data visualisations, and ad-hoc analysis and querying of enterprise databases.

Main Aim
To critically examine existing techniques for extracting data from the Master Linkage Map (MLM) maintained by the TDLU, and to apply a BI tool so that these data can be made quickly available, in easy-to-use formats, to both technical and non-technical staff.

Methods/Approach
In deciding to develop a series of dashboards to visually represent data stored in its MLM, the TDLU identified routine requests for these data and critically examined existing techniques for extracting data from the MLM. Traditionally, Structured Query Language (SQL) queries were developed and used for a single purpose. By critically analysing the limitations of this approach, the TDLU identified the power of BI tools and their ease of use for both technical and non-technical staff.

Results
Implementing a BI tool is enabling quick and accurate production of a comprehensive array of information. Such information assists with cohort size estimation, producing data for routine and ad-hoc reporting, identifying data quality issues, and answering questions from prospective users of linked data services, including instantly producing estimates of links stored across disparate datasets.

Conclusion
BI tools are not traditionally considered integral to the operations of data linkage units. However, the TDLU has successfully applied a BI tool to make a rich set of data locked in its MLM quickly available in multiple, easy-to-use formats and to technical and non-technical staff.
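
As a hedged illustration of the sort of single-purpose query a BI dashboard replaces, the Python/pandas sketch below estimates how many persons two datasets share via a master linkage map; the table and column names are hypothetical, not the TDLU schema.

```python
# Hypothetical master linkage map: one row per (person, dataset, record).
import pandas as pd

mlm = pd.DataFrame({
    "person_key": [1, 1, 2, 2, 3, 3, 4],
    "dataset":    ["hospital", "deaths", "hospital", "births",
                   "hospital", "deaths", "births"],
    "record_id":  ["h1", "d1", "h2", "b1", "h3", "d2", "b2"],
})

def overlap(mlm, ds_a, ds_b):
    """Count persons with at least one record in both datasets."""
    in_a = set(mlm.loc[mlm["dataset"] == ds_a, "person_key"])
    in_b = set(mlm.loc[mlm["dataset"] == ds_b, "person_key"])
    return len(in_a & in_b)

# The kind of instant estimate the abstract mentions for prospective users:
print("hospital & deaths:", overlap(mlm, "hospital", "deaths"))  # -> 2
print("hospital & births:", overlap(mlm, "hospital", "births"))  # -> 1
```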


The number of deaths resulting from road accidents and mishaps has increased at an alarming rate over the years. Road transportation is the most popular means of transportation in developing countries like Nigeria, and most road accidents there are associated with reckless driving habits. Context-aware systems provide intelligent recommendations, allowing digital devices to make correct and timely recommendations when required. Furthermore, in a Vehicular Ad-hoc Network (VANET), communication links between vehicles and roadside units are improved, thus enhancing vehicle and road safety. Hence, a non-intrusive driver behaviour detection system that incorporates context-aware monitoring features in a VANET is proposed in this study. Using a one-dimensional (1D) highway with one-way traffic movement and incorporating GSM technology, irregular actions exhibited by drivers (high speed, alcohol use while driving, and pressure) are monitored, and alerts are sent to other nearby vehicles and roadside units to avoid accidents. The proposed system adopts a real-time VANET prototype with three entities involved in the context-aware driver behaviour monitoring system, namely the driver, the vehicle, and the environment. Analytical tests with an actual data set indicate that, when irregular behaviour is detected, the model measures the speed of the vehicle, the level of alcohol in the breath, and the driver's heart rate in beats per minute (BPM). Therefore, it can serve as an appropriate model for a context-aware driver monitoring system in a VANET.
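
A hedged sketch of the threshold checks such a system might run on-board before broadcasting an alert follows; the limits and the print stand-in for the GSM/VANET broadcast are illustrative assumptions, not the paper's implementation.

```python
# Threshold-style context checks for the three monitored readings; limits
# are illustrative assumptions, not values from the study.
from dataclasses import dataclass

@dataclass
class DriverContext:
    speed_kmh: float        # vehicle pace
    breath_alcohol: float   # blood-alcohol equivalent from breath sensor
    heart_rate_bpm: float   # driver's heart rate in BPM

LIMITS = {"speed_kmh": 100.0, "breath_alcohol": 0.05, "heart_rate_bpm": 120.0}

def irregular_actions(ctx: DriverContext):
    """Return the list of threshold violations for this reading."""
    return [name for name, limit in LIMITS.items()
            if getattr(ctx, name) > limit]

def broadcast_alert(violations):
    # Stand-in for the GSM broadcast to nearby vehicles and roadside units.
    print("ALERT to nearby vehicles/RSUs:", ", ".join(violations))

reading = DriverContext(speed_kmh=128.0, breath_alcohol=0.02,
                        heart_rate_bpm=135.0)
problems = irregular_actions(reading)
if problems:
    broadcast_alert(problems)   # -> speed_kmh, heart_rate_bpm
```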

