Big Data and Privacy State of the Art

Author(s):  
Amine Rahmani

The phenomenon of big data (massive data mining) refers to the exponential growth of the volume of data available on the web. This new concept has become widely used in recent years, enabling scalable, efficient, and fast access to data anytime, anywhere, and helping the scientific community and companies identify the most subtle behaviors of users. However, big data has its share of limits, ethical issues, and risks that cannot be ignored. Indeed, new risks in terms of privacy are just beginning to be perceived. Sometimes simply annoying, these risks can also be really harmful. In the medium term, the issue of privacy could become one of the biggest obstacles to the growth of big data solutions. It is in this context that a great deal of research is under way to enhance security and develop mechanisms for protecting the privacy of users. Although this area is still in its infancy, the list of possibilities continues to grow.


Author(s):  
Sunny Sharma ◽  
Manisha Malhotra

Web usage mining is the use of data mining techniques to analyze user behavior in order to better serve the needs of the user. This process of personalization uses a set of techniques and methods for discovering the linking structure of information on the web. The goal of web personalization is to improve the user experience by mining meaningful information and presenting the retrieved information in the way the user intends. The arrival of big data has raised novel issues for the personalization community. This chapter provides an overview of personalization and big data, and identifies challenges related to web personalization with respect to big data. It also presents some approaches and models to fill the gap between big data and web personalization. Further, this research brings additional opportunities to web personalization from the perspective of big data.


2017 ◽  
Vol 12 (01) ◽  
Author(s):  
Shweta Kaushik

The Internet plays an essential role in providing different knowledge sources to the world, which helps numerous applications provide quality service to customers. As the years go on, the web has become overloaded with data, and it has become difficult to extract the relevant data from it. This has given way to the development of Big Data, and the volume of data continues to grow rapidly day by day. Big Data has gained much attention from academia and the IT industry. In the digital and computing world, data is generated and collected at a rate that quickly exceeds existing limits. Data mining techniques are used to find hidden information in large data sets. These techniques are used to store, manage, and analyze high-velocity data, which can be in any form, structured or unstructured. It is difficult to handle large volumes of data using database techniques such as RDBMS. On the one hand, Big Data is extremely valuable for improving productivity in businesses and enabling evolutionary breakthroughs in scientific disciplines, which gives us many opportunities to make great progress in many fields. There is no doubt that future competition in business productivity and technology will converge on Big Data exploration. On the other hand, Big Data also comes with many challenges, such as difficulties in data capture, data storage, data analysis, and data visualization. In this paper we focus on a review of Big Data, its data classification methods, and the ways it can be mined using various mining techniques.


2015 ◽  
Vol 4 (3) ◽  
pp. 143-152
Author(s):  
Lidong Wang ◽  
Guanghui Wang

Data mining is a process of extracting hidden, unknown, but potentially useful information from massive data. Big Data has a great impact on scientific discoveries and value creation. This paper introduces methods in data mining and technologies in Big Data. Challenges of data mining and of data mining with big data are discussed. Some technological progress in data mining and in data mining with big data is also presented.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jui-Chan Huang ◽  
Po-Chang Ko ◽  
Cher-Min Fong ◽  
Sn-Man Lai ◽  
Hsin-Hung Chen ◽  
...  

With the increase in the number of online shopping users, customer loyalty is directly related to product sales. This research explores the statistical modeling and simulation of online shopping customer loyalty based on machine learning and big data analysis, mainly using a machine learning clustering algorithm to simulate customer loyalty. The k-means interactive mining algorithm based on the Hash structure is called to perform data mining on the multidimensional hierarchical tree of corporate credit risk; the support thresholds for different levels of data mining are continuously adjusted according to specific requirements, and effective association rules are selected until satisfactory results are obtained. After credit risk assessment and early warning modeling are conducted for the enterprise, the initial preselected model is obtained. The information to be collected is first fetched by the web crawler from the target website into a temporary web page database, where it goes through a series of preprocessing steps such as completion, deduplication, analysis, and extraction to ensure that the crawled web pages are correctly analyzed and to avoid incorrect data caused by network errors during crawling. The correctly parsed data is stored for the next step of data cleaning or data analysis. To write a Java program that parses HTML documents, first set the subject keyword and URL and, after analyzing the structure of the website, parse the HTML from the obtained file or string. Second, use a CSS selector to find the web page list information, retrieve the data, and store it in Elements. In the overall fit test of the model, the root mean square error of approximation (RMSEA) value is 0.053, which falls between 0.05 and 0.08. The results show that the model designed in this study achieves a relatively good fit and strengthens customers’ perception of shopping websites, and that relationship trust plays a greater role in maintaining customer loyalty.
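
The clustering step mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain k-means (Lloyd's algorithm), not the Hash-structure-based interactive variant the authors name, whose details the abstract does not give. The two features (purchases per month, average rating) and the sample values are hypothetical and serve only as an illustration.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k input points as initial centroids
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return centroids, assignments


# Hypothetical customers: (purchases per month, average rating given).
customers = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0),
             (8.0, 4.5), (9.0, 5.0), (8.5, 4.0)]
centroids, labels = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> cluster", label)
```

On data like this, the two clusters could be read as low-loyalty and high-loyalty groups, although the interpretation depends entirely on which features are chosen.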


Author(s):  
Mohammad Hossein Tekieh ◽  
Bijan Raahemi ◽  
Eric I. Benchimol

Big data analytics has been introduced as a set of scalable, distributed algorithms optimized for analysis of massive data in parallel. There are many prospective applications of data mining in healthcare. In this chapter, the authors investigate whether health data exhibits characteristics of big data, and accordingly, whether big data analytics can leverage the data mining applications in healthcare. To answer this interesting question, potential applications are divided into four categories, and each category into sub-categories in a tree structure. The available types of health data are specified, with a discussion of the applicable dimensions of big data for each sub-category. The authors conclude that big data analytics can provide more advantages for the quality of analysis in particular categories of applications of data mining in healthcare, while having less efficacy for other categories.


2015 ◽  
Vol 14 (03) ◽  
pp. 1550019
Author(s):  
Amina Madani ◽  
Omar Boussaid ◽  
Djamel Eddine Zegour

Twitter is a popular micro-blogging service, and one of the main means of spreading ideas and information throughout the web. In this system, participants post short status messages called tweets that are often available publicly. Recently, the exponential growth of tweets has started to draw the attention of researchers from various disciplines. Numerous research approaches in the data mining field have examined Twitter. How to automatically extract useful information from tweets has therefore become an important research topic. The aim of this paper is to present "what's up", a new approach to tweet mining. It is a more general approach that discovers many different trending topics from tweets in real time. Trending topics have generated great interest not only among the users of Twitter but also among information seekers. Our trending topics are detected for a specific geographic town and compared with the top trending topics shown on Twitter. They are presented as labelled clusters that constitute an accurate description of each trending topic. Each cluster is labelled by an emerging trending topic and is composed of keywords that represent the properties of the trending topic.


2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Jiajia Chen ◽  
Fuliang Qian ◽  
Wenying Yan ◽  
Bairong Shen

Next generation sequencing and other high-throughput experimental techniques of recent decades have driven the exponential growth in publicly available molecular and clinical data. This information explosion has prepared the ground for the development of translational bioinformatics. The scale and dimensionality of the data, however, pose obvious challenges in data mining, storage, and integration. In this paper we demonstrate the utility and promise of cloud computing for tackling big data problems. We also outline our vision that cloud computing could be an enabling tool to facilitate translational bioinformatics research.


2019 ◽  
pp. 703-717
Author(s):  
Mohammad Hossein Tekieh ◽  
Bijan Raahemi ◽  
Eric I. Benchimol

Big data analytics has been introduced as a set of scalable, distributed algorithms optimized for analysis of massive data in parallel. There are many prospective applications of data mining in healthcare. In this chapter, the authors investigate whether health data exhibits characteristics of big data, and accordingly, whether big data analytics can leverage the data mining applications in healthcare. To answer this interesting question, potential applications are divided into four categories, and each category into sub-categories in a tree structure. The available types of health data are specified, with a discussion of the applicable dimensions of big data for each sub-category. The authors conclude that big data analytics can provide more advantages for the quality of analysis in particular categories of applications of data mining in healthcare, while having less efficacy for other categories.

