Differential privacy based classification model for mining medical data stream using adaptive random forest

2021 ◽  
Vol 13 (1) ◽  
pp. 1-20
Author(s):  
Hayder K. Fatlawi ◽  
Attila Kiss

Abstract: Most typical data mining techniques are developed for training on batch data, which makes mining a data stream a significant challenge. At the same time, mechanisms that perform data mining operations without revealing the patient's identity are increasingly important in the data mining field. In this work, a classification model with differential privacy is proposed for mining medical data streams using Adaptive Random Forest (ARF). The experimental results of applying the proposed model to four medical datasets show that ARF mostly has more stable performance than the other six techniques.

Symmetry ◽  
2020 ◽  
Vol 12 (6) ◽  
pp. 984
Author(s):  
Sheenam Jain ◽  
Vijay Kumar

The apparel industry houses a huge amount and variety of data. At every step of the supply chain, data is collected and stored by each supply chain actor. This data, when used intelligently, can help with solving a good deal of problems for the industry. In this regard, this article is devoted to the application of data mining on the industry’s product data, i.e., data related to a garment, such as fabric, trim, print, shape, and form. The purpose of this article is to use data mining and symmetry-based learning techniques on product data to create a classification model that consists of two subsystems: (1) for predicting the garment category and (2) for predicting the garment sub-category. Classification techniques, such as Decision Trees, Naïve Bayes, Random Forest, and Bayesian Forest were applied to the ‘Deep Fashion’ open-source database. The data contain three garment categories, 50 garment sub-categories, and 1000 garment attributes. The two subsystems were first trained individually and then integrated using soft classification. It was observed that the performance of the random forest classifier was comparatively better, with an accuracy of 86%, 73%, 82%, and 90%, respectively, for the garment category, and sub-categories of upper body garment, lower body garment, and whole-body garment.


2020 ◽  
Author(s):  
Huanhuan Wang ◽  
Xiang Wu ◽  
Yongqi Tan ◽  
Hongsheng Yin ◽  
Xiaochun Cheng ◽  
...  

BACKGROUND Medical data mining and sharing is an important process for realizing the value of medical big data in E-Health applications. However, medical data contains a large amount of patients' private personal information, so there is a risk of privacy disclosure during sharing and mining. Therefore, how to ensure the security of medical big data during publishing, sharing and mining has become a focus of current research. OBJECTIVE The objective of our study is to design a framework based on a differential privacy protection mechanism to ensure the secure sharing of medical data. We developed a privacy Protection Query Language (PQL) that can integrate multiple machine mining methods and provide secure sharing functions for medical data. METHODS This paper adopts a modular design method with three sub-modules: a parsing module, a mining module and a noising module. Each module encapsulates different computing devices, such as a composite parser, a noise jammer, etc. In the PQL framework, we apply the differential privacy mechanism to the results of the modules' collaborative calculation to optimize the security of various mining algorithms. These computing devices operate independently, but the mining results depend on their cooperation. RESULTS We designed and developed a query language framework that provides medical data mining, sharing and privacy-preserving functions. We theoretically proved the performance of the PQL framework. The experimental results showed that the PQL framework can ensure the security of each mining result, and the average usefulness of the output results is above 97%. CONCLUSIONS We presented a security framework that enables medical data providers to securely share health data or treatment data, and developed a usable query language based on a differential privacy mechanism that enables researchers to mine potential information securely using data mining algorithms.


2020 ◽  
Vol 9 (1) ◽  
pp. 37-49
Author(s):  
Hugo Peixoto ◽  
Lara Silva ◽  
Soraia Pereira ◽  
Tiago Jesus ◽  
Vitor Neves Lopes ◽  
...  

Peptic ulcers are not the most common complication in gastrointestinal mucosa, but these defects stand out as the complication with the highest mortality rate. Several scoring systems based on clinical and biochemical parameters, such as the Boey and PULP scoring systems, have been developed to predict the probability of mortality. In this study, a data mining process is performed on the available medical data in order to evaluate how the scoring systems perform when trying to predict mortality and patients' state complication. Furthermore, the paper compares the two scoring systems to determine which one outperforms the other. On the one hand, PULP scoring allows better mortality prediction, achieving an accuracy above 90%. On the other hand, regarding complications, the Boey system achieves better results when predicting patients' state complication.


Author(s):  
Hanane Menad ◽  
Abdelmalek Amine

Medical data mining has great potential for exploring the hidden patterns in the data sets of the medical domain. These patterns can be utilized for clinical diagnosis. Bio-inspired algorithms are a new field of research. Its main advantage is knitting together subfields related to the topics of connectionism, social behavior, and emergence. Briefly put, it is the use of computers to model living phenomena and, simultaneously, the study of life to improve the usage of computers. In this chapter, the authors present an application of four bio-inspired algorithms and metaheuristics to the classification of seven different real medical data sets. Two of these algorithms are based on similarity calculation between training and test data, while the other two are based on random generation of a population to construct classification rules. The results showed very good efficiency of bio-inspired algorithms for supervised classification of medical data.


Author(s):  
Alice Constance Mensah ◽  
Isaac Ofori Asare

Breast cancer is the most common of all cancers and is the leading cause of cancer deaths in women worldwide. The classification of breast cancer data can be useful to predict the outcome of some diseases or discover the genetic behavior of tumors. Data mining technology helps in classifying cancer patients and in identifying potential cancer patients by analyzing the data. This study examines the determinant factors of breast cancer and uses breast cancer patient data to build a useful classification model with a data mining approach. In this study of 2397 women, 1022 (42.64%) were diagnosed with breast cancer. Four main learning techniques were used for the study: Random Forest, Naive Bayes, Classification and Regression Trees (CART), and Boosted Tree. The Random Forest technique had the best accuracy, 0.9892 (95% CI, 0.9832-0.9935), and a sensitivity of about 92%. This suggests that Random Forest is the best of the four models for classifying and predicting breast cancer based on associated factors.


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Zhe Ding ◽  
Zhen Qin ◽  
Zhiguang Qin

Data mining techniques are applied to identify hidden patterns in large amounts of patient data. These patterns can assist physicians in making more accurate diagnoses. Because patients differ in physical condition, the same physiological index corresponds to a different symptom association probability for each patient. Data mining technologies designed for certain (deterministic) data cannot be directly applied to such uncertain patient data. Patient data are also sensitive. An adversary with sufficient background information can make use of the patterns mined from uncertain medical data to obtain the sensitive information of patients. In this paper, a new algorithm is presented to determine the top K most frequent itemsets from uncertain medical data while protecting data privacy. Building on traditional algorithms for mining frequent itemsets from uncertain data, our algorithm applies the sparse vector algorithm and the Laplace mechanism to ensure differential privacy for the top K most frequent itemsets in uncertain medical data and for the expected supports of these frequent itemsets. We prove that our algorithm can guarantee differential privacy in theory. Moreover, we carry out experiments with four real-world scenario datasets and two synthetic datasets. The experimental results demonstrate the performance of our algorithm.
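The abstract above names the Laplace mechanism and expected supports over uncertain data. As a rough illustration only, the sketch below computes the expected support of an itemset and releases it with Laplace noise. It is not the authors' algorithm: the sparse vector step used to select the top K itemsets is omitted, and the function names, the dict-based transaction format, and the even split of the privacy budget across itemsets are all assumptions made for this example.

```python
import random
from math import prod


def expected_support(transactions, itemset):
    """Expected support of an itemset in uncertain data.

    Each transaction is assumed to be a dict mapping an item to its
    existence probability; absent items have probability 0.
    """
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in transactions)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_supports(transactions, itemsets, epsilon, seed=None):
    """Release expected supports with Laplace noise (hypothetical helper).

    Adding or removing one transaction changes an expected support by at
    most 1, so the sensitivity is 1. The budget epsilon is split evenly
    over the itemsets (sequential composition), giving a noise scale of
    len(itemsets) / epsilon per released value.
    """
    rng = random.Random(seed)
    scale = len(itemsets) / epsilon
    return {
        s: expected_support(transactions, s) + laplace_noise(scale, rng)
        for s in itemsets
    }


if __name__ == "__main__":
    data = [{"a": 0.9, "b": 0.5}, {"a": 0.4}, {"a": 1.0, "b": 1.0}]
    print(noisy_supports(data, [("a",), ("a", "b")], epsilon=1.0, seed=0))
```

A smaller `epsilon` gives stronger privacy but noisier supports. In this sketch the candidate itemsets are passed in directly, whereas a full top K method would also have to select them privately.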


Author(s):  
Yuto Omae ◽  
Tatsuro Furuya ◽  
Kazutaka Mizukoshi ◽  
Takayuki Oshima ◽  
Norihisa Sakakibara ◽  
...  

We aim to develop a real-time feedback system of learning strategies during lesson time to improve academic achievement. Mutual viewing-based learning is known to be an effective educational method. However, even though mutual viewing is an effective lesson style, learners' individual activities include both effective and ineffective learning strategies. In general, learning strategies are evaluated with a questionnaire survey, but a questionnaire cannot measure learning strategies in real time. Thus, it is difficult to detect, during lesson time, the students who use ineffective learning strategies. Recently, a system that can measure learning strategies in real time has been developed. Using this system, it is possible to detect students who use ineffective learning strategies during lesson time in mutual viewing-based learning. From this point of view, we aim to develop a real-time recommendation system of learning strategies for teachers and students to achieve a high educational effect. For this purpose, we must know the features of effective and ineffective learning strategies via a system that can measure learning strategies. In this paper, we report the discovery of features of effective and ineffective learning strategies based on a data-mining approach using the k-means method, a transition diagram, and random forest. We classified the time-series learning strategies over 40 min into 216 strategies and surveyed the improvement probability of academic achievement via a random-forest-based classification model. By embedding our results into the system, we may be able to automatically detect students who use ineffective learning strategies and recommend effective learning strategies.


2021 ◽  
Vol 11 (18) ◽  
pp. 8596
Author(s):  
Swetha Chittam ◽  
Balakrishna Gokaraju ◽  
Zhigang Xu ◽  
Jagannathan Sankar ◽  
Kaushik Roy

There is a strong need in the material science community for a big data repository of material compositions and their derived analytics of metal strength. Currently, many researchers maintain their own Excel sheets, prepared manually by their team by tabulating the experimental data collected from scientific journals, and analyze the data by performing manual calculations using formulas to determine the strength of the material. In this study, we propose a big data storage for material science data and its processing parameter information to address the laborious process of data tabulation from scientific articles, data mining techniques to retrieve the information from databases to perform big data analytics, and a machine learning prediction model to determine material strength insights. Three models are proposed, based on the Logistic Regression, Support Vector Machine (SVM) and Random Forest algorithms. These models are trained and tested using a 10-fold cross-validation approach. The Random Forest classification model performed better on the independent dataset, with 87% accuracy, compared to Logistic Regression and SVM with 72% and 78%, respectively.
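The 10-fold cross-validation mentioned above can be sketched in a few lines. This is a generic, dependency-free illustration, not the authors' pipeline: the helper names, the `fit` callable convention (it takes training data and returns a predict function), and the majority-class baseline are assumptions made for this example. Shuffling and stratification are left out for brevity.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds.

    Assumes n_samples >= k so that no test fold is empty.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_accuracy(fit, X, y, k=10):
    """Mean test accuracy over k folds for a model built by `fit`."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / k


def fit_majority(X_train, y_train):
    """Hypothetical baseline: always predict the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label


if __name__ == "__main__":
    X = [[i] for i in range(25)]
    y = [0] * 20 + [1] * 5
    print(cross_val_accuracy(fit_majority, X, y, k=10))
```

Any classifier can be compared under the same folds by wrapping it in a `fit` function of this shape. In practice the data would normally be shuffled before splitting.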


2018 ◽  
Vol 5 (1) ◽  
pp. 47-55
Author(s):  
Florensia Unggul Damayanti

Data mining helps industries make intelligent decisions on complex problems. Data mining algorithms can be applied to data in order to forecast, identify patterns, make rules and recommendations, analyze sequences in complex data sets, and retrieve fresh insights. Advances in technology and the variety of available data mining techniques give industries the opportunity to explore and gain valuable information from their data and to use that information to support business decision making. This paper implements classification data mining in order to retrieve knowledge from customer databases to support the marketing department in planning a strategy for predicting plan premium. The dataset is decomposed through conceptual analysis to identify data characteristics that can be used as input parameters of the data mining model. Business decisions and applications are characterized by processing step, processing characteristic and processing outcome (Seng, J.L., Chen T.C. 2010). This paper sets up a data mining experiment based on the J48 and Random Forest classifiers and sheds light on the performance evaluation between J48 and Random Forest in the context of an insurance industry dataset. The experimental results concern the classification accuracy and efficiency of J48 and Random Forest, and also identify the attributes most useful for predicting plan premium in the context of strategic planning to support business strategy.

