distributed data mining Latest Research Papers

An ensemble random forest algorithm for privacy preserving distributed medical data mining

International Journal of E-Health and Medical Communications ◽

10.4018/ijehmc.20211101oa06 ◽

2021 ◽

Vol 12 (6) ◽

pp. 0-0

Keyword(s):

Data Mining ◽

Random Forest ◽

Imbalanced Data ◽

Distributed Data Mining ◽

Sensitive Information ◽

Distributed Data ◽

Sensitive Data ◽

Health Records ◽

Data Mining Approach ◽

Proposed Model

A voluminous amount of data is generated by the widespread proliferation of electronic data maintained in Electronic Health Records (EHRs). Medical health facilities have great potential to discern patterns in this data and use them to diagnose a specific disease or predict the outbreak of an epidemic. Discerning these patterns might reveal sensitive information about individuals, and this information is vulnerable to misuse. Sharing such sensitive data is therefore a challenging task, as it compromises the privacy of patients. In this paper, a random forest-based distributed data mining approach is proposed. Performance of the proposed model is evaluated using accuracy, F-measure, and kappa statistics. Experimental results reveal that the proposed model is efficient and scalable in both performance and accuracy on imbalanced data, and that it maintains privacy by sharing only useful healthcare knowledge in the form of local models, without revealing or sharing sensitive data.
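The idea of sharing local models instead of patient records can be illustrated with a minimal sketch. It is not the authors' algorithm: the one-feature threshold rule `train_local_stump` is a hypothetical stand-in for a locally trained tree or forest, binary labels 0/1 are assumed, and `majority_vote` is an assumed way of combining the shared models.

```python
from collections import Counter


def train_local_stump(rows, labels):
    """Hypothetical local learner: best single-feature threshold rule.

    Stands in for a tree or forest trained at one site. Assumes labels 0/1.
    Only the returned predictor leaves the site, never `rows` or `labels`.
    """
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({r[feature] for r in rows}):
            for above in (0, 1):
                preds = [above if r[feature] > threshold else 1 - above
                         for r in rows]
                correct = sum(p == y for p, y in zip(preds, labels))
                if best is None or correct > best[0]:
                    best = (correct, feature, threshold, above)
    _, feature, threshold, above = best
    return lambda row: above if row[feature] > threshold else 1 - above


def majority_vote(models, row):
    """Combine the shared local models by simple majority voting."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Three sites with made-up data; each trains locally and shares only its model.
sites = [
    ([[1], [2], [8], [9]], [0, 0, 1, 1]),
    ([[0], [3], [7], [10]], [0, 0, 1, 1]),
    ([[2], [4], [6], [9]], [0, 0, 1, 1]),
]
models = [train_local_stump(rows, labels) for rows, labels in sites]
print(majority_vote(models, [10]), majority_vote(models, [0]))
```

Majority voting is chosen here only because it needs nothing beyond each model's predictions; weighted or adaptive combination schemes would follow the same pattern.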

An Ensemble Random Forest Algorithm for Privacy Preserving Distributed Medical Data Mining

International Journal of E-Health and Medical Communications ◽

10.4018/ijehmc.20211101.oa8 ◽

2021 ◽

Vol 12 (6) ◽

pp. 1-23

Author(s):

Musavir Hassan ◽

Muheet Ahmed Butt ◽

Majid Zaman

Keyword(s):

Data Mining ◽

Random Forest ◽

Imbalanced Data ◽

Distributed Data Mining ◽

Sensitive Information ◽

Distributed Data ◽

Sensitive Data ◽

Health Records ◽

Data Mining Approach ◽

Proposed Model

A voluminous amount of data is generated by the widespread proliferation of electronic data maintained in Electronic Health Records (EHRs). Medical health facilities have great potential to discern patterns in this data and use them to diagnose a specific disease or predict the outbreak of an epidemic. Discerning these patterns might reveal sensitive information about individuals, and this information is vulnerable to misuse. Sharing such sensitive data is therefore a challenging task, as it compromises the privacy of patients. In this paper, a random forest-based distributed data mining approach is proposed. Performance of the proposed model is evaluated using accuracy, F-measure, and kappa statistics. Experimental results reveal that the proposed model is efficient and scalable in both performance and accuracy on imbalanced data, and that it maintains privacy by sharing only useful healthcare knowledge in the form of local models, without revealing or sharing sensitive data.
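The three evaluation measures named in the abstract can be computed from true and predicted labels as in the sketch below. The function names and the toy labels are assumptions for illustration; the formulas are the standard definitions of accuracy, F-measure (F1), and Cohen's kappa, which may differ in detail from the exact variants used in the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    if expected == 1.0:
        return 0.0  # degenerate case: only one class on both sides
    return (observed - expected) / (1.0 - expected)


y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(accuracy(y_true, y_pred), f_measure(y_true, y_pred), cohen_kappa(y_true, y_pred))
```

On imbalanced data, accuracy alone can look high for a model that ignores the minority class, which is why F-measure and kappa are usually reported alongside it.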

A systematic review on privacy-preserving distributed data mining

Data Science ◽

10.3233/ds-210036 ◽

2021 ◽

Vol 4 (2) ◽

pp. 121-150

Author(s):

Chang Sun ◽

Lianne Ippel ◽

Andre Dekker ◽

Michel Dumontier ◽

Johan van Soest

Keyword(s):

Systematic Review ◽

Data Mining ◽

Real Life ◽

Past Research ◽

Privacy Preserving ◽

Distributed Data Mining ◽

Sensitive Information ◽

Distributed Data ◽

Multiple Sources ◽

Privacy And Security

Combining and analysing sensitive data from multiple sources offers considerable potential for knowledge discovery. However, there are a number of issues that pose problems for such analyses, including technical barriers, privacy restrictions, security concerns, and trust issues. Privacy-preserving distributed data mining techniques (PPDDM) aim to overcome these challenges by extracting knowledge from partitioned data while minimizing the release of sensitive information. This paper reports the results and findings of a systematic review of PPDDM techniques from 231 scientific articles published in the past 20 years. We summarize the state of the art, compare the problems they address, and identify the outstanding challenges in the field. This review identifies the consequence of the lack of standard criteria to evaluate new PPDDM methods and proposes comprehensive evaluation criteria with 10 key factors. We discuss the ambiguous definitions of privacy and confusion between privacy and security in the field, and provide suggestions of how to make a clear and applicable privacy description for new PPDDM techniques. The findings from our review enhance the understanding of the challenges of applying theoretical PPDDM methods to real-life use cases, and the importance of involving legal-ethical and social experts in implementing PPDDM methods. This comprehensive review will serve as a helpful guide to past research and future opportunities in the area of PPDDM.

A Survey of Data Mining Activities in Distributed Systems

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2021/v11i430267 ◽

2021 ◽

pp. 1-18

Author(s):

Waleed A. Mohammad ◽

Hajar Maseeh Yasin ◽

Azar Abid Salih ◽

Adel AL-Zebari ◽

Naaman Omar ◽

...

Keyword(s):

Data Mining ◽

Distributed Systems ◽

Distributed Data Mining ◽

Distributed Data ◽

Sequence Mining ◽

Distributed Clustering ◽

Usable Information ◽

Single Location ◽

Processing Ability ◽

Mining Methods

Distributed systems, which may be used to perform computations, are being developed as a result of the fast growth of resource sharing. Data mining, which has a huge range of real applications, provides significant techniques for extracting meaningful and usable information from massive amounts of data. Traditional data mining methods, on the other hand, suppose that the data is gathered centrally, stored in memory, and is static. Managing massive amounts of data and processing them with limited resources is difficult. Large volumes of data, for instance, are swiftly generated and stored in many locations, and centralizing them at a single location becomes increasingly costly. Furthermore, traditional data mining methods typically have several issues and limitations, such as memory restrictions, limited processing ability, and insufficient hard drive space, among others. To overcome these issues, distributed data mining methods have emerged as a beneficial option in several applications. Drawing on several authors, this research provides a study of state-of-the-art distributed data mining methods, such as distributed common item-set mining, distributed frequent sequence mining, technical difficulties with distributed systems, distributed clustering, as well as privacy-protection distributed data mining. Furthermore, each work is evaluated and compared to the others.

Performance analysis of privacy preserving distributed data mining based on cryptographic techniques

2021 7th International Conference on Electrical Energy Systems (ICEES) ◽

10.1109/icees51510.2021.9383673 ◽

2021 ◽

Author(s):

Venkatesh Kumar Marimuthu ◽

C. Lakshmi

Keyword(s):

Data Mining ◽

Performance Analysis ◽

Privacy Preserving ◽

Distributed Data Mining ◽

Distributed Data ◽

Cryptographic Techniques

A differentially private distributed data mining scheme with high efficiency for edge computing

Journal of Cloud Computing Advances Systems and Applications ◽

10.1186/s13677-020-00225-3 ◽

2021 ◽

Vol 10 (1) ◽

Author(s):

Xianwen Sun ◽

Ruzhi Xu ◽

Longfei Wu ◽

Zhitao Guan

Keyword(s):

Data Mining ◽

Differential Privacy ◽

Edge Computing ◽

Distributed Data Mining ◽

Good Prediction ◽

Distributed Data ◽

Good Prediction Accuracy ◽

Wide Range ◽

Ensemble Strategy ◽

Mining Scheme

A wide range of data mining applications benefit from the low latency offered by edge computing. However, edge computing suffers from limited computing resources, which inhibits the application of computationally expensive data mining methods. In the edge-cloud environment, participants usually turn to collaboratively training machine-learning models that yield more accurate prediction results. However, data owners may not be willing to share their own data because of privacy concerns. To handle such disparate goals, we focus on a tree-based distributed data mining scheme with differential privacy, which is computationally friendly. The basic idea of our approach is based on a distributed ensemble strategy. Each participant builds an elegant decision model based on their own data, which has a good tradeoff between the computation and the accuracy of the data distribution, and shares it with other participants after injecting it with elaborate noise. Then the useful knowledge transferred from the decision models is acquired by other participants in an adaptive ensemble strategy. Both the theoretical analysis and the experiments show that our scheme provides an efficient data mining method that achieves good prediction accuracy while providing a rigorous privacy guarantee over the distributed data.
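The abstract does not say how the noise is injected, so the following is only a sketch of one common option, the Laplace mechanism applied to the class counts stored in a decision-tree leaf. The function names, the example counts, and the choice of sensitivity 1 (adding or removing one record changes one count by at most 1) are assumptions, not details taken from the paper.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_leaf_counts(class_counts, epsilon, rng):
    """Perturb per-class counts of one leaf before the model is shared.

    Assumes sensitivity 1 for the count histogram, so the Laplace
    scale is 1 / epsilon. Smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    return {label: count + laplace_noise(scale, rng)
            for label, count in class_counts.items()}


def predict_from_counts(noisy_counts):
    """Label with the largest (noisy) count wins."""
    return max(noisy_counts, key=noisy_counts.get)


rng = random.Random(42)
leaf = {"sick": 40, "healthy": 5}  # made-up counts for illustration
shared = noisy_leaf_counts(leaf, epsilon=1.0, rng=rng)
print(shared, predict_from_counts(shared))
```

A participant would share only the perturbed counts; how the privacy budget is split across leaves and trees is left out of this sketch.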

Multi-Agent System Combined with Distributed Data Mining for Mutual Collaboration Classification

IEEE Access ◽

10.1109/access.2021.3074125 ◽

2021 ◽

pp. 1-1

Author(s):

Mais Haj Qasem ◽

Nadim Obeid ◽

Amjad Hudaib ◽

Mohammed Amin Almaiah ◽

Ali Al-Zahrani ◽

...

Keyword(s):

Data Mining ◽

Distributed Data Mining ◽

Distributed Data ◽

Multi Agent System ◽

Agent System ◽

Multi Agent

A hybrid-security model for privacy-enhanced distributed data mining

Journal of King Saud University - Computer and Information Sciences ◽

10.1016/j.jksuci.2020.06.010 ◽

2020 ◽

Author(s):

Tanzeela Javid ◽

Manoj Kumar Gupta ◽

Abhishek Gupta

Keyword(s):

Data Mining ◽

Distributed Data Mining ◽

Distributed Data ◽

Security Model

Windowing as a Sub-Sampling Method for Distributed Data Mining

Mathematical and Computational Applications ◽

10.3390/mca25030039 ◽

2020 ◽

Vol 25 (3) ◽

pp. 39

Author(s):

David Martínez-Galicia ◽

Alejandro Guerra-Hernández ◽

Nicandro Cruz-Ramírez ◽

Xavier Limón ◽

Francisco Grimaldo

Keyword(s):

Data Mining ◽

Decision Trees ◽

Negative Correlation ◽

Sampling Method ◽

Minimum Description Length ◽

Distributed Data Mining ◽

Distributed Data ◽

Strong Negative Correlation ◽

Original Dataset ◽

Leibler Divergence

Windowing is a sub-sampling method, originally proposed to cope with large datasets when inducing decision trees with the ID3 and C4.5 algorithms. The method exhibits a strong negative correlation between the accuracy of the learned models and the number of examples used to induce them, i.e., the higher the accuracy of the obtained model, the fewer examples used to induce it. This paper contributes to a better understanding of this behavior in order to promote windowing as a sub-sampling method for Distributed Data Mining. For this, the generalization of the behavior of windowing beyond decision trees is established, by corroborating the observed negative correlation when adopting inductive algorithms of different nature. Then, focusing on decision trees, the windows (samples) and the obtained models are analyzed in terms of Minimum Description Length (MDL), Area Under the ROC Curve (AUC), Kullback–Leibler divergence, and the similitude metric Sim1; and compared to those obtained when using traditional methods: random, balanced, and stratified samplings. It is shown that the aggressive sampling performed by windowing, up to 3% of the original dataset, induces models that are significantly more accurate than those obtained from the traditional sampling methods, among which only the balanced sampling is comparable in terms of AUC. Although the considered informational properties did not correlate with the obtained accuracy, they provide clues about the behavior of windowing and suggest further experiments to enhance such understanding and the performance of the method, i.e., studying the evolution of the windows over time.
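The basic windowing loop can be sketched as follows: train on a small random window, test on the remaining examples, move some of the misclassified ones into the window, and repeat until none are misclassified. This is a simplified illustration rather than the paper's implementation. The 1-nearest-neighbour learner `train_1nn` is a hypothetical stand-in for ID3 or C4.5, and the parameters `initial_size` and `max_added` are assumed names.

```python
import random


def train_1nn(window):
    """Hypothetical stand-in learner: 1-nearest neighbour over (x, label) pairs."""
    examples = list(window)  # copy so later changes to the window do not leak in

    def predict(x):
        nearest = min(examples, key=lambda ex: abs(ex[0] - x))
        return nearest[1]

    return predict


def windowing(data, train, initial_size, max_added, rng):
    """Grow a training window with counter-examples until none remain.

    Requires initial_size >= 1 and max_added >= 1.
    """
    indices = list(range(len(data)))
    rng.shuffle(indices)
    window_idx = indices[:initial_size]
    rest_idx = indices[initial_size:]
    while True:
        model = train([data[i] for i in window_idx])
        # Examples outside the window that the current model gets wrong.
        wrong = [i for i in rest_idx if model(data[i][0]) != data[i][1]]
        if not wrong:
            return model, [data[i] for i in window_idx]
        moved = set(wrong[:max_added])
        window_idx = window_idx + sorted(moved)
        rest_idx = [i for i in rest_idx if i not in moved]


# Toy one-dimensional dataset: label is 1 when x >= 10.
data = [(float(x), int(x >= 10)) for x in range(20)]
model, window = windowing(data, train_1nn, initial_size=2, max_added=3,
                          rng=random.Random(0))
print(len(window), "of", len(data), "examples used")
```

In a distributed setting the window is what would travel between nodes, so a small final window translates directly into less data moved.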

Deep learning with LSTM based distributed data mining model for energy efficient wireless sensor networks

Physical Communication ◽

10.1016/j.phycom.2020.101097 ◽

2020 ◽

Vol 40 ◽

pp. 101097 ◽

Cited By ~ 6

Author(s):

Sachi Nandan Mohanty ◽

E. Laxmi Lydia ◽

Mohamed Elhoseny ◽

Majid M. Gethami Al Otaibi ◽

K. Shankar

Keyword(s):

Data Mining ◽

Wireless Sensor Networks ◽

Deep Learning ◽

Sensor Networks ◽

Energy Efficient ◽

Wireless Sensor ◽

Distributed Data Mining ◽

Distributed Data ◽

Mining Model
