Internal and External Threat Analysis of Anonymized Dataset

Handbook of Research on Intrusion Detection Systems - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-7998-2242-4.ch009 ◽

2020 ◽

pp. 172-185

Author(s):

Saurav Jindal ◽

Poonam Saini

Keyword(s):

Data Mining ◽

Data Collection ◽

Information Gain ◽

Threat Analysis ◽

External Threat ◽

Utility Of Information ◽

Cost Penalty ◽

Computational Processes ◽

Different Sources ◽

Unusual Situation

In recent years, data collection and data mining have emerged as fast-paced computational processes as the amount of data from different sources has increased manifold. With the advent of such technologies, a major concern is the exposure of an individual's personal information. To confront this situation, a dataset is anonymized before it is released to the public for further use. The chapter discusses various existing anonymization techniques. Thereafter, a novel redaction technique is proposed for generalization to minimize the overall cost (penalty) of the process, which is inversely proportional to the utility of the generated dataset. To validate the proposed work, the authors assume a pre-processed dataset and compare their algorithm with existing techniques. Lastly, the proposed technique is made scalable, thus ensuring further minimization of generalization cost and improving the overall utility of information gain.

Crash data reporting systems in fourteen Arab countries: challenges and improvement

Archives of Transport ◽

10.5604/01.3001.0014.5628 ◽

2020 ◽

Vol 56 (4) ◽

pp. 73-88

Author(s):

Zahira Abounoas ◽

Wassim Raphael ◽

Yarob Badr ◽

Rafic Faddoul ◽

Anne Guillaume

Keyword(s):

Data Mining ◽

Data Collection ◽

Information Gain ◽

Arab Countries ◽

Road User ◽

Handheld Devices ◽

Data Reporting ◽

Electronic Data Collection ◽

Crash Data ◽

Reporting Systems

Traffic crash fatalities and serious injuries still represent a heavy burden for most Arab countries because current policies, strategies, and interventions are based on poorly collected data. In this paper, we assessed the crash data reporting systems in fourteen Arab countries via a survey conducted to identify the fundamental dysfunctions at the management and data collection levels. Then, to address some of the dataset problems, we applied data mining techniques to select a minimum set of variables (crash, vehicle, and road user) that should be collected for a better understanding of crash circumstances. For this reason, three selection methods (correlation, information gain, and gain ratio) and seven classifiers (naive Bayes, nearest neighbour, random forest, random tree, J48, reduced error pruning tree, and bagging) were tested and compared to identify the variables that significantly affect crash severity. The decision tree family of classifiers showed the best performance based on the analysis of the area under the curve. The explanatory variables obtained from the data mining process were combined with other descriptive variables to maintain traceability. As a result, we produced hybrid lists of variables for the crash, vehicle, and road user, each containing 25 variables. Finally, in order to propose a cost-effective solution for switching from manual to electronic data collection, we drew inspiration from a tool used to track animals to create and customize a unified e-form for handheld devices, ensuring easy entry of the harmonized data for the entire region based on our selected lists of variables. The tool met the countries' requirements, especially by enabling data collection and transfer with and without the internet, and by allowing data analysis through its built-in Geographic Information System (GIS) capabilities.
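
As a rough illustration of two of the selection methods named in this abstract, the sketch below computes information gain and gain ratio for a categorical variable against a severity label. It is a minimal sketch, not the authors' pipeline: the function names and the toy records (`road_type`, `weekday`, `severity`) are hypothetical and chosen only to show how an informative variable scores higher than an uninformative one.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a categorical variable."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # a constant variable carries no information
    return information_gain(values, labels) / split_info


# Hypothetical toy records, not taken from the surveyed crash data.
road_type = ["urban", "urban", "rural", "rural"]
weekday = ["mon", "tue", "mon", "tue"]
severity = ["minor", "minor", "severe", "severe"]

print(information_gain(road_type, severity))  # separates the classes fully
print(information_gain(weekday, severity))    # tells us nothing about severity
print(gain_ratio(road_type, severity))
```

Ranking variables by either score and keeping the top entries is one simple way such a minimum variable list could be shortlisted; gain ratio additionally penalises variables with many distinct values.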

Privacy Preservation using (L, D) Inference Model Based on Dependency Identification Information Gain

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1196.0986s319 ◽

2019 ◽

Vol 8 (6S3) ◽

pp. 1170-1173

Keyword(s):

Data Mining ◽

Information Gain ◽

Original Data ◽

Perturbation Approach ◽

Sensitive Information ◽

Functional Dependencies ◽

Inference Model ◽

Data Set ◽

Data Mining Techniques ◽

Original Dataset

With improvements in information processing and memory capacity, vast amounts of data are collected for various data analysis purposes. Data mining techniques are used to extract useful knowledge. In the process of extracting data with data mining techniques, the data can become publicly exposed, which leads to breaches of private data. Privacy-preserving data mining is used to protect sensitive information from unwanted or unsanctioned disclosure. In this paper, we analyse the problem of discovering similarity checks for functional dependencies from a given dataset, such that applying the (l, d) inference algorithm with generalization can anonymise the microdata without loss in utility. [8] This work presents a functional-dependency-based perturbation approach that hides sensitive information from the user by applying the (l, d) inference model to the dependency attributes based on information gain. This approach works on both categorical and numerical attributes. The perturbed dataset does not affect the original dataset; it maintains the same or very similar patterns as the original dataset. Hence the utility of the application remains high when compared to other data mining techniques. The accuracy of the original and perturbed datasets is compared and analysed using tools and a data mining classification algorithm.

Perceptions of Auditor Negligence: The Effects of Big Data Visualisations on Jurors’ Decisions

10.26686/wgtn.17144102 ◽

2021 ◽

Author(s):

Travis Christensen

Keyword(s):

Social Media ◽

Big Data ◽

Data Collection ◽

Jury Verdicts ◽

Word Clouds ◽

Audit Litigation ◽

Different Types ◽

Different Sources ◽

Bar Graphs

This study analyses the effects of Big Data visualisations on jurors’ decisions in audit litigation cases. Specifically, the study investigates the effects of different types of Big Data visualisations (word clouds or bar graphs) and different sources of Big Data (emails or social media posts) on jurors’ perceptions of auditors’ work and the size of the negligence awards that jurors recommend. The study theorises that the emotions elicited and the reliability of the data used to create visualisations such as word clouds will have dramatic effects on jury verdicts in audit negligence trials. There is considerable literature to support this assertion. However, after data collection, it was discovered that jurors are not influenced by the emotions elicited by visualisations. Rather, participants were very sceptical of more novel types of visualisations, such as word clouds, but could be persuaded by the inherent emotions elicited and the reliability of the data if they found the visualisation useful.

Behavioral Targeting Online Advertising

Advances in Multimedia and Interactive Technologies - Online Multimedia Advertising ◽

10.4018/978-1-60960-189-8.ch012 ◽

2011 ◽

pp. 213-232 ◽

Cited By ~ 2

Author(s):

Jun Yan ◽

Dou Shen ◽

Teresa Mah ◽

Ning Liu ◽

Zheng Chen ◽

...

Keyword(s):

Data Mining ◽

Data Collection ◽

Predictive Modeling ◽

Rapid Growth ◽

Online Advertising ◽

Research Challenges ◽

Behavioral Targeting ◽

The US ◽

Collection Data ◽

Complex Technology

With the rapid growth of the online advertising market, Behavioral Targeting (BT), which delivers advertisements to users based on an understanding of their needs inferred from their behaviors, is attracting more attention. Spending on behaviorally targeted advertising in the US is projected to reach $4.4 billion in 2012 (Hallerman, 2008). BT is a complex technology that involves data collection, data mining, audience segmentation, contextual page analysis, predictive modeling, and so on. This chapter gives an overview of Behavioral Targeting by introducing the Behavioral Targeting business, followed by classic BT research challenges and solution proposals. We also point out BT research challenges that are currently under-explored in both industry and academia.

Prediction of Skin Diseases Using Machine Learning

10.4018/978-1-7998-7888-9.ch008 ◽

2022 ◽

pp. 154-178

Author(s):

Siddhartha Kumar Arjaria ◽

Vikas Raj ◽

Sunil Kumar ◽

Priyanshu Shrivastava ◽

Monu Kumar ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Skin Disease ◽

Skin Diseases ◽

Information Gain ◽

Machine Learning Algorithms ◽

Ensemble Method ◽

Chi Square ◽

Data Mining Techniques ◽

Disease Rates

Skin disease rates have been increasing over the past few decades. This has led to both fatal and non-fatal disabilities all around the world, especially in areas where medical resources are inadequate. Early diagnosis of skin diseases significantly increases the chances of a cure. Therefore, this work compares six machine learning algorithms, namely KNN, random forest, neural network, naïve Bayes, logistic regression, and SVM, for the prediction of skin diseases. Information gain, gain ratio, Gini decrease, chi-square, and ReliefF are used to rank the features. This work comprises an introduction, a literature review, and the proposed methodology. A new method of analyzing skin disease is proposed in which six different data mining techniques are combined into a single ensemble method. Applied to the dermatology dataset, the ensemble method gives an improved result, with 94% accuracy, in comparison to the other classifier algorithms and hence is more effective in this area.

Agricultural Data Mining in the 21st Century

Social Implications of Data Mining and Information Privacy ◽

10.4018/978-1-60566-196-4.ch013 ◽

2010 ◽

pp. 229-246

Author(s):

E. Arlin Torbett ◽

Tanya M. Candia

Keyword(s):

Public Health ◽

Data Mining ◽

Data Collection ◽

Radio Frequency Identification ◽

Fresh Produce ◽

Online Marketing ◽

Current State ◽

Working Together ◽

Frequency Identification ◽

Rich Information

Data on the production, sale, repackaging, and transportation of fresh produce is scarce, yet with recent threats to national safety and security, forward and backward traceability of produce is mandatory. Recent advances in online marketing of fresh produce, a new international codification system and use of advanced technologies such as Radio Frequency Identification (RFID) and bar coding are working together to fill the gap, building a solid database of rich information that can be mined. While agricultural data mining holds much promise for farmers, with better indications of what and when to plant, and for buyers, giving them access to improved food quality and availability information, it is the world’s health organizations and governments who stand to be the biggest beneficiaries. This chapter describes the current state of fresh produce data collection and access, new trends that fill important gaps, and emerging methods of mining fresh produce data for improved production, product safety and public health through traceability.

PhycoMine: A Microalgae Data Warehouse

10.1101/2021.09.27.462046 ◽

2021 ◽

Author(s):

Rodrigo R. D. Goitia ◽

Diego M. Riaño-Pachón ◽

Alexandre Victor Fassio ◽

Flavia V. Winck

Keyword(s):

Gene Expression ◽

Data Mining ◽

Data Warehouse ◽

RNA-Seq ◽

Metabolic Pathway Analysis ◽

Data Repositories ◽

Computational Environment ◽

Biological Network Analysis ◽

Group Data ◽

Different Sources

PhycoMine is a data warehouse system created to foster the analysis of complex and integrated data from microalgae species in a single computational environment. PhycoMine was developed on top of the InterMine software system and implements an extended database model, containing a series of tools that help users analyze and mine individual data and group data. The platform has widgets created to facilitate simultaneous data mining of different datasets. Among the widgets implemented in PhycoMine, there are options for mining chromosome distribution, gene expression variation via transcriptomics, proteomics sets, Gene Ontology enrichment, KEGG enrichment, publication enrichment, EggNOG, transcription factor and transcriptional regulator enrichment, and phenotypical data. These widgets were created to facilitate visualization of gene expression levels in different experimental setups for which RNA-seq experimental data is available in data repositories. For comparative purposes, we have reanalyzed 200 RNA-seq datasets from Chlamydomonas reinhardtii, a model unicellular microalga, to optimize the performance and accuracy of data comparisons. We have also implemented widgets for metabolic pathway analysis of selected genes and proteins and options for biological network analysis. An option for the analysis of orthologous genes was also included. With this platform, users can perform data mining for a list of genes or proteins of interest in an integrated way by accessing data from different sources, visualizing them in graphics, and exporting the data into table formats. The PhycoMine platform is freely available and can be visited through the URL https://PhycoMine.iq.usp.br.

Entropy based C4.5-SHO algorithm with information gain optimization in data mining

PeerJ Computer Science ◽

10.7717/peerj-cs.424 ◽

2021 ◽

Vol 7 ◽

pp. e424

Author(s):

G Sekhar Reddy ◽

Suneetha Chittineni

Keyword(s):

Data Mining ◽

Decision Tree ◽

Information Gain ◽

Characteristic Curve ◽

Cuckoo Search ◽

Computer Assisted ◽

Quadratic Entropy ◽

C4.5 Decision Tree ◽

Data Investigation ◽

Gain Optimization

Information efficiency is gaining more importance in the development as well as application sectors of information technology. Data mining is a computer-assisted process of massive data investigation that extracts meaningful information from the datasets. The mined information is used in decision-making to understand the behavior of each attribute. Therefore, a new classification algorithm is introduced in this paper to improve information management. The classical C4.5 decision tree approach is combined with the Selfish Herd Optimization (SHO) algorithm to tune the gain of given datasets. The optimal weights for the information gain will be updated based on SHO. Further, the dataset is partitioned into two classes based on quadratic entropy calculation and information gain. Decision tree gain optimization is the main aim of our proposed C4.5-SHO method. The robustness of the proposed method is evaluated on various datasets and compared with classifiers, such as ID3 and CART. The accuracy and area under the receiver operating characteristic curve parameters are estimated and compared with existing algorithms like ant colony optimization, particle swarm optimization and cuckoo search.
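
To make the "quadratic entropy" mentioned in this abstract more concrete, the sketch below scores a two-way partition with one common definition, the sum of p(1 - p) over class proportions. This is an assumption: the abstract does not state the exact formula the paper uses, and the SHO weight tuning is not shown. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def quadratic_entropy(labels):
    """Quadratic entropy, assumed here to be sum of p * (1 - p) per class."""
    total = len(labels)
    return sum((n / total) * (1 - n / total)
               for n in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted quadratic entropy of a two-way partition."""
    total = len(left) + len(right)
    return (len(left) / total * quadratic_entropy(left)
            + len(right) / total * quadratic_entropy(right))


# Hypothetical labels: a balanced parent node and two candidate partitions.
parent = ["a", "a", "b", "b"]
clean_split = (["a", "a"], ["b", "b"])
mixed_split = (["a", "b"], ["a", "b"])

print(quadratic_entropy(parent))     # impurity before splitting
print(split_impurity(*clean_split))  # pure children
print(split_impurity(*mixed_split))  # children as mixed as the parent
```

A tree builder would prefer the partition with the lower weighted impurity; under this definition the measure avoids logarithms, which is one reason it is sometimes used in place of Shannon entropy.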

Application Of Pizza Sales Data Mining Using Apriori Method

SinkrOn ◽

10.33395/sinkron.v4i2.10500 ◽

2020 ◽

Vol 4 (2) ◽

pp. 1 ◽

Cited By ~ 1

Author(s):

Rusdiansyah Rusdiansyah ◽

Nining Suharyanti ◽

Triningsih Triningsih ◽

Muhammad Darussalam

Keyword(s):

Data Mining ◽

Data Collection ◽

Association Rules ◽

A Priori ◽

Apriori Algorithm ◽

Mining Method ◽

Processed Food ◽

Sales Data ◽

Business Opportunity ◽

Use Of Data

Pizza is a processed food that originated in Italy and has spread to many other countries, including Indonesia. It is currently sought after by various groups of people, which makes pizza a very profitable business opportunity in the food sector. The pizza business currently has very favorable prospects compared with other businesses. Moreover, the target market spans all walks of life, from children to adults. Pizza sales transactions produce sales data every day, but the use of that data has not been maximized. Sales data is only stored as an archive, so it becomes a pile of unused data. Data mining is therefore used to solve this problem. The Apriori algorithm is a data mining method that uses minimum support and minimum confidence parameters; here it is applied to each month of sales transactions. This study produces association rules from the collected sales transaction data. From the association rules it can be concluded that, in the pattern of pizza sales, consumers more often buy Meatzza and Cheese Mania, as evidenced by calculations using the Apriori algorithm and RapidMiner 5.3, with 30% support and 60% confidence.
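
The two parameters this abstract relies on, support and confidence, can be sketched in a few lines. The example below is illustrative only: the five baskets are hypothetical and are not the study's sales data, so the printed values are not meant to reproduce the reported 30% support and 60% confidence. The full Apriori candidate-generation step is not shown, and "Veggie" is a made-up item name.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical baskets for illustration.
baskets = [
    {"Meatzza", "Cheese Mania"},
    {"Meatzza", "Cheese Mania"},
    {"Meatzza"},
    {"Cheese Mania"},
    {"Veggie"},
]

print(support(baskets, {"Meatzza", "Cheese Mania"}))
print(confidence(baskets, {"Meatzza"}, {"Cheese Mania"}))
```

A rule is kept only when both values meet the chosen minimum thresholds, which is how the minimum support and minimum confidence parameters prune the candidate rules.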

Problematika Penerapan Metodologi Barat pada Pendidikan Dasar dalam Perspektif Islam

AL-ADABIYAH: Jurnal Pendidikan Agama Islam ◽

10.35719/adabiyah.v1i1.8 ◽

2020 ◽

Vol 1 (1) ◽

pp. 17-32

Author(s):

Muhamad Parhan ◽

Adilla Tieky I. D ◽

Ajeng Irma H. S ◽

Arnis Susnita ◽

Eva Fauziah K

Keyword(s):

Qualitative Research ◽

Data Collection ◽

Basic Education ◽

Islamic Education ◽

Research Approach ◽

Literature Study ◽

Educational Methods ◽

Quantitative And Qualitative Research ◽

The Right ◽

Different Sources

This research examines the problems of applying Western methodology to basic education in Indonesia from an Islamic perspective. It is conducted through a literature study that aims to identify educational methods that are appropriate and consistent with the objectives of education in Indonesia. The research approach combines quantitative and qualitative research (mixed research), with data collected through questionnaires, literature study, and interviews with several different sources in order to obtain answers from which sound conclusions can be drawn. The findings are drawn from the results of the questionnaires and interviews. The problems of applying Western methodology to basic education in Indonesia can be overcome by applying several ways of educating children exemplified by the Messenger of Allah, which are based on the Qur'an and Hadith, and by adopting Western methodologies after first selecting content that is appropriate and relevant to Islamic education. Keywords: problems, Western methodology, basic education
