Big Data Analysis and Perturbation using Data Mining Algorithm

The advancement of computing technologies has proven highly effective and has resulted in the production of large amounts of data that must be analyzed. However, there is considerable concern about protecting the privacy of the gathered data, which may be exploited or exposed to the public. Although many methods exist for preserving this information, they are not completely scalable or efficient, and they also have issues with privacy or data utility. This proposed work addresses these issues with an effective perturbation algorithm for big data based on optimal geometric transformation. The proposed work has been tested for accuracy, attack resistance, scalability, and efficiency using 5 classification algorithms and 9 datasets. Experimental analysis indicates that the proposed work outperforms other privacy-preservation algorithms in terms of attack resistance, scalability, execution speed, and accuracy.
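To make the idea of geometric perturbation concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes records with two numeric attributes and applies a random rotation plus a translation, a transformation that preserves pairwise Euclidean distances, so distance-based classifiers would see the same geometry in the masked data. The names `perturb` and `dist` and the sample records are hypothetical, and the abstract's "optimal" selection of the transformation is not specified there and is omitted here.

```python
import math
import random

def perturb(points, angle, shift):
    """Rotate each 2-D point by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = shift
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

rng = random.Random(0)  # fixed seed so the sketch is reproducible
records = [(1.0, 2.0), (4.0, 6.0), (-3.0, 0.5)]  # made-up data
angle = rng.uniform(0, 2 * math.pi)                   # secret rotation angle
shift = (rng.uniform(-10, 10), rng.uniform(-10, 10))  # secret translation
masked = perturb(records, angle, shift)
```

The masked coordinates differ from the originals, yet the distance between any two records is unchanged, which is why this family of transformations can keep classification accuracy close to that of the raw data.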


An Improvised Framework for Privacy Preservation in IoT

Research Anthology on Privatizing and Securing Data ◽

10.4018/978-1-7998-8954-0.ch022 ◽

2021 ◽

pp. 475-491

Author(s):

Muzzammil Hussain ◽

Neha Kaliya

Keyword(s):

Big Data ◽

Internet Of Things ◽

Execution Time ◽

Data Privacy ◽

Privacy Preservation ◽

Encryption Algorithm ◽

The Public ◽

Attribute Based Encryption ◽

Key Length ◽

Different Levels

Data privacy is a pressing issue in the era of the Internet of Things (IoT) because of the big data stored and transmitted by public and private devices. Different types and levels of privacy can be provided at different layers of the IoT architecture, and different mechanisms operate at each layer. This article presents work toward the design of a generic framework that integrates these privacy-preserving mechanisms across the layers of the IoT architecture and can ensure privacy preservation in a heterogeneous IoT environment. The data is classified into different levels of secrecy, and appropriate rules and mechanisms are applied to ensure this privacy. The proposed framework is implemented and its performance evaluated with security and execution time as the primary parameters. Various scenarios are also evaluated, and a comparison is made with an existing mechanism, Attribute Based Encryption (ABE). The proposed work was found to take less time and to be more secure, owing to the short key length and the randomness of the parameters used in the encryption algorithm.


Anonymization Based Fisher–Yates Shuffle Method for Streaming of Twitter Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1397.0882s819 ◽

2019 ◽

Vol 8 (2S8) ◽

pp. 408-411

Keyword(s):

Big Data ◽

Real Time ◽

Privacy Preservation ◽

Personal Data ◽

Large Data ◽

Information Loss ◽

Streaming Data ◽

Privacy Concern ◽

Data Utility ◽

Data Anonymization

In this era of Big Data, many organizations work with personal data that must be protected for privacy reasons. Individuals are at risk of being identified through quasi-identifiers (QIs). To preserve privacy, anonymization converts personal data into unidentifiable data. Many organizations produce large volumes of data in real time. With Hadoop components such as HDFS and MapReduce and the surrounding ecosystem, large volumes of data can be processed in real time. Basic data anonymization techniques include cryptographic methods, substitution, character masking, shuffling, nulling out, date variance, and number variance. Here, privacy preservation for streaming data is achieved by applying one of these techniques, shuffling, in a Big Data setting. K-anonymity, t-closeness, and l-diversity are commonly used privacy techniques, but none of them handles information loss and data utility well. The Dynamically Anonymizing Data Shuffling (DADS) technique is used to reduce this information loss and to improve data utility in streaming data.
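The shuffling step can be illustrated with the Fisher–Yates algorithm named in the title. This is a minimal sketch, not the DADS technique itself: it assumes each record is a dictionary and shuffles one sensitive column across a batch, so that values no longer line up with their original rows. The names `fisher_yates` and `shuffle_column` and the sample records are hypothetical.

```python
import random

def fisher_yates(items, rng):
    """Return a uniformly shuffled copy of `items` (Fisher-Yates shuffle)."""
    out = list(items)
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        out[i], out[j] = out[j], out[i]
    return out

def shuffle_column(batch, column, rng):
    """Shuffle the values of `column` across the records in `batch`."""
    values = fisher_yates([rec[column] for rec in batch], rng)
    return [{**rec, column: v} for rec, v in zip(batch, values)]

# Made-up batch of streamed records; "city" is treated as the quasi-identifier.
batch = [
    {"user": "u1", "city": "Pune"},
    {"user": "u2", "city": "Delhi"},
    {"user": "u3", "city": "Chennai"},
    {"user": "u4", "city": "Mumbai"},
]
rng = random.Random(42)
anonymized = shuffle_column(batch, "city", rng)
```

Because shuffling only permutes values within the column, the set of values and their frequencies stay the same while the link between a value and its original record is broken. A uniformly random permutation can still leave some values in their original rows.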


Big Data and the Economics of TV Broadcasting. Where is the Public Value?

MedienJournal ◽

10.24989/medienjournal.v41i3.1492 ◽

2017 ◽

Vol 41 (3) ◽

pp. 15-28 ◽

Cited By ~ 2

Author(s):

Paul Clemens Murschetz

Keyword(s):

Big Data ◽

Public Value ◽

The Public ◽

Tv Broadcasting

This article examines the potential and risks of Big Data for television as a leading medium. It takes a decidedly critical-normative perspective from the standpoint of media economics and analyzes them using the example of convergent television. One of the many dimensions of Big Data is the analysis of the usage behavior of a large number of consumers. Big Data services use the results of this analysis not only to make individual film recommendations but also to decide which content is included in a provider's portfolio, or produced, in the first place. Even though these services lead to an optimization of TV marketing, it remains disputed to this day to what extent Big Data also generates added value for users. On the debit side are surveillance, the question of the individualization and rationalization of consumption, and, more generally, the commodification of the medium.


Performance analysis of classification Algorithms: A case study of Naïve Bayes and J48 in Big Data

Journal of Applied Mathematics and Computation ◽

10.26855/jamc.2018.02.003 ◽

2018 ◽

Vol 2 (2) ◽

Keyword(s):

Big Data ◽

Performance Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Algorithms


Big Data Privacy Preservation Using Two Phase Top-Down Specialization Algorithm with Multidimensional Map Reduce Framework on Hadoop

International Journal of Distributed and Cloud Computing ◽

10.21863/ijdcc/2015.3.2.009 ◽

2015 ◽

Vol 3 (2) ◽

Author(s):

Shalin Eliabeth S. ◽

Sarju S.

Keyword(s):

Big Data ◽

Data Privacy ◽

Privacy Preservation ◽

Experimental Result ◽

Map Reduce ◽

Distributed Environment ◽

Top Down ◽

Two Phase ◽

Data Anonymization ◽

Big Data Privacy

Big data privacy preservation is one of the most troubling issues in industry today. Data privacy problems sometimes go unidentified when input data is published to a cloud environment. Data privacy preservation in Hadoop deals with hiding the input dataset and publishing it to the distributed environment. This paper investigates the problem of big data anonymization for privacy preservation from the perspectives of scalability, execution time, and related factors. At present, many cloud applications that use big data anonymization face the same kinds of problems. To address them, the paper introduces a data anonymization algorithm called Two-Phase Top-Down Specialization (TPTDS), implemented in Hadoop. For the anonymization, 45,222 records of adult information with 15 attribute values were taken as the input big data. The proposed TPTDS algorithm is implemented in Hadoop using multidimensional anonymization in the MapReduce framework, which increases the efficiency of the big data processing system. Experiments with the TPTDS algorithm on Hadoop, in both one-dimensional and multidimensional MapReduce frameworks, show better results for multidimensional anonymization on the input adult dataset. Data sets are generalized in a top-down manner, and the better result in the multidimensional MapReduce framework is shown by the better IGPL values generated by the algorithm. The anonymization is performed with specialization operations on a taxonomy tree. The experiments show that the solution improves the IGPL values and the anonymity parameter and decreases the execution time of big data privacy preservation compared with the existing algorithm. These experimental results should be widely applicable in distributed environments.
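The top-down idea can be sketched on a single attribute. This is a simplified illustration under stated assumptions, not the TPTDS algorithm: the taxonomy, the helper names (`generalize`, `is_k_anonymous`, `top_down_specialize`), and the sample values are hypothetical; every value is specialized one whole level at a time; and the IGPL-based scoring of candidate specializations and the MapReduce distribution described in the abstract are omitted.

```python
from collections import Counter

# Hypothetical taxonomy: leaf value -> path from the root to the leaf.
TAXONOMY = {
    "9th": ["Any", "Secondary", "9th"],
    "10th": ["Any", "Secondary", "10th"],
    "Bachelors": ["Any", "University", "Bachelors"],
    "Masters": ["Any", "University", "Masters"],
}
DEPTH = 3  # number of levels in every path above

def generalize(values, level):
    """Map each leaf value to its ancestor at depth `level` (0 = root)."""
    return [TAXONOMY[v][level] for v in values]

def is_k_anonymous(values, k):
    """True if every distinct value occurs at least k times."""
    return all(n >= k for n in Counter(values).values())

def top_down_specialize(values, k):
    """Start fully generalized; specialize while k-anonymity still holds."""
    level = 0
    while level + 1 < DEPTH and is_k_anonymous(generalize(values, level + 1), k):
        level += 1
    return generalize(values, level)

education = ["9th", "10th", "10th", "Bachelors", "Masters", "Masters"]
result = top_down_specialize(education, k=2)
# With k=2 the process stops at the middle level, because specializing
# further would leave "9th" in a group of size one.
```

Starting from the most general value and specializing only while the anonymity requirement holds means the released data is always k-anonymous, at whatever level of detail the requirement allows.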


Navigating the Ethics of Big Data in Public Health

The Oxford Handbook of Public Health Ethics ◽

10.1093/oxfordhb/9780190245191.013.31 ◽

2019 ◽

pp. 353-367

Author(s):

Effy Vayena ◽

Lawrence Madoff

Keyword(s):

Public Health ◽

Big Data ◽

Credit Card ◽

Health Sector ◽

Public Health Research ◽

Ethical Challenges ◽

The Public ◽

Health Community ◽

Digital Disease Detection ◽

The Public Sphere

“Big data,” which encompasses massive amounts of information from both within the health sector (such as electronic health records) and outside the health sector (social media, search queries, cell phone metadata, credit card expenditures), is increasingly envisioned as a rich source to inform public health research and practice. This chapter examines the enormous range of sources, the highly varied nature of these data, and the differing motivations for their collection, which together challenge the public health community in ethically mining and exploiting big data. Ethical challenges revolve around the blurring of three previously clearer boundaries: between personal health data and nonhealth data; between the private and the public sphere in the online world; and, finally, between the powers and responsibilities of state and nonstate actors in relation to big data. Considerations include the implications for privacy, control and sharing of data, fair distribution of benefits and burdens, civic empowerment, accountability, and digital disease detection.


Estimation of the Actual Incidence of Coronavirus Disease (COVID-19) in Emergent Hotspots: The Example of Hokkaido, Japan during February–March 2020

Journal of Clinical Medicine ◽

10.3390/jcm10112392 ◽

2021 ◽

Vol 10 (11) ◽

pp. 2392

Author(s):

Andrei R. Akhmetzhanov ◽

Kenji Mizumoto ◽

Sung-Mok Jung ◽

Natalie M. Linton ◽

Ryosuke Omori ◽

...

Keyword(s):

Cumulative Incidence ◽

Surveillance System ◽

Negative Binomial ◽

Actual Number ◽

Public Health Response ◽

The Public ◽

Health Response ◽

Using Data ◽

Insight Into ◽

Actual Incidence

Following the first report of the coronavirus disease 2019 (COVID-19) in Sapporo city, Hokkaido Prefecture, Japan, on 14 February 2020, a surge of cases was observed in Hokkaido during February and March. As of 6 March, 90 cases were diagnosed in Hokkaido. Unfortunately, many infected persons may not have been recognized due to having mild or no symptoms during the initial months of the outbreak. We therefore aimed to predict the actual number of COVID-19 cases in (i) Hokkaido Prefecture and (ii) Sapporo city using data on cases diagnosed outside these areas. Two statistical frameworks involving a balance equation and an extrapolated linear regression model with a negative binomial link were used for deriving both estimates, respectively. The estimated cumulative incidence in Hokkaido as of 27 February was 2297 cases (95% confidence interval (CI): 382–7091) based on data on travelers outbound from Hokkaido. The cumulative incidence in Sapporo city as of 28 February was estimated at 2233 cases (95% CI: 0–4893) based on the count of confirmed cases within Hokkaido. Both approaches resulted in similar estimates, indicating a higher incidence of infections in Hokkaido than were detected by the surveillance system. This quantification of the gap between detected and estimated cases helped to inform the public health response at the beginning of the pandemic and provided insight into the possible scope of undetected transmission for future assessments.


Punishment Strategies across Societies: Conventional Wisdoms Reconsidered

Games ◽

10.3390/g12030063 ◽

2021 ◽

Vol 12 (3) ◽

pp. 63

Author(s):

Ramzi Suleiman ◽

Yuval Samid

Keyword(s):

Public Goods ◽

Strong Predictor ◽

Social Environments ◽

Public Goods Game ◽

The Public ◽

Free Riders ◽

The World ◽

Emergence Of Cooperation ◽

Using Data ◽

The Cost

Experiments using the public goods game have repeatedly shown that in cooperative social environments, punishment makes cooperation flourish, and withholding punishment makes cooperation collapse. In less cooperative social environments, where antisocial punishment has been detected, punishment was detrimental to cooperation. The success of punishment in enhancing cooperation was explained as deterrence of free riders by cooperative strong reciprocators, who were willing to pay the cost of punishing them, whereas in environments in which punishment diminished cooperation, antisocial punishment was explained as revenge by low cooperators against high cooperators suspected of punishing them in previous rounds. The present paper reconsiders the generality of both explanations. Using data from a public goods experiment with punishment, conducted by the authors on Israeli subjects (Study 1), and from a study published in Science using sixteen participant pools from cities around the world (Study 2), we found that: 1. The effect of punishment on the emergence of cooperation was mainly due to contributors increasing their cooperation, rather than from free riders being deterred. 2. Participants adhered to different contribution and punishment strategies. Some cooperated and did not punish (‘cooperators’); others cooperated and punished free riders (‘strong reciprocators’); a third subgroup punished upward and downward relative to their own contribution (‘norm-keepers’); and a small subgroup punished only cooperators (‘antisocial punishers’). 3. Clear societal differences emerged in the mix of the four participant types, with high-contributing pools characterized by higher ratios of ‘strong reciprocators’ and ‘cooperators’, and low-contributing pools characterized by a higher ratio of ‘norm-keepers’. 4. The fraction of ‘strong reciprocators’ out of the total punishers emerged as a strong predictor of the groups’ level of cooperation and success in providing the public goods.


Paranoid styles and innumeracy: implications of a conspiracy mindset on Europeans' misperceptions about immigrants

Italian Political Science Review/Rivista Italiana di Scienza Politica ◽

10.1017/ipo.2021.26 ◽

2021 ◽

pp. 1-17

Author(s):

Sergio Martini ◽

Mattia Guidi ◽

Francesco Olmastroni ◽

Linda Basile ◽

Rossella Borri ◽

...

Keyword(s):

Demographic Factors ◽

European Countries ◽

Democratic Accountability ◽

Common Phenomenon ◽

Political Issues ◽

The Public ◽

Using Data ◽

Socio Demographic Factors ◽

Influence Perceptions ◽

Western Democracies

Innumeracy, that is, the inability to deal with numbers and provide correct estimates about political issues, is reported to be widespread among the public. Yet, despite the recognition that a conspiracy mindset is an increasingly common phenomenon in Western democracies, this has not been considered as a potential correlate of innumeracy. Using data from an online sample of respondents across 10 European countries, we show that those with a higher propensity to hold a conspiracy worldview tend to overestimate the actual share of the immigrant population living in their own country. This association holds true when accounting for country heterogeneity and other cognitive, affective and socio-demographic factors. Employing a comparative design and refined measurements, the article contributes to our understanding of how a conspiracy mentality may influence perceptions of relevant political facts, questioning basic processes of democratic accountability.


Big Data in the Philippines: How do we actually use them?

Statistical Journal of the IAOS ◽

10.3233/sji-210826 ◽

2021 ◽

pp. 1-30

Author(s):

Lisa Grace S. Bersales ◽

Josefina V. Almeda ◽

Sabrina O. Romasoc ◽

Marie Nadeen R. Martinez ◽

Dannela Jann B. Galias

Keyword(s):

Big Data ◽

High Speed ◽

The Philippines ◽

Complex Data ◽

Official Statistics ◽

Group Discussions ◽

The Public ◽

The Government ◽

The Many ◽

Current Utilization

With the advancement of technology, digitalization, and the internet of things, large amounts of complex data are being produced daily. This vast quantity of various data produced at high speed is referred to as Big Data. The utilization of Big Data is being implemented with success in the private sector, yet the public sector seems to be falling behind despite the many potentials Big Data has already presented. In this regard, this paper explores ways in which the government can recognize the use of Big Data for official statistics. It begins by gathering and presenting Big Data-related initiatives and projects across the globe for various types and sources of Big Data implemented. Further, this paper discusses the opportunities, challenges, and risks associated with using Big Data, particularly in official statistics. This paper also aims to assess the current utilization of Big Data in the country through focus group discussions and key informant interviews. Based on desk review, discussions, and interviews, the paper then concludes with a proposed framework that provides ways in which Big Data may be utilized by the government to augment official statistics.
