An Intellectual Methodology for Secure Health Record Mining and Risk Forecasting Using Clustering and Graph-Based Classification

Author(s):  
D. Shiny Irene ◽  
V. Surya ◽  
D. Kavitha ◽  
R. Shankar ◽  
S. John Justin Thangaraj

The objective of this research is to analyze and validate health records; securing patients' personal information is a challenging issue in health record mining. The risk prediction task is formulated as a multi-class classification problem with Cause of Death (COD) as the label, treating health-related death as the “biggest risk.” The unlabeled data describe the participants' health conditions during health examinations, which can range from healthy to severely ill. In addition, the problem of distributed, secure, privacy-preserving data management is considered. The proposed health record mining proceeds in five stages. In the first stage, features such as Fisher score, Pearson correlation, and information gain are calculated from the patient's health records, and average values are then computed for the extracted features. In the second stage, features are selected from the averaged features using the Euclidean distance measure. In the third stage, the chosen features are clustered using the distance adaptive fuzzy c-means clustering algorithm (DAFCM). In the fourth stage, an entropy-based graph is constructed to classify the data and categorize each patient's record. In the last stage, privacy preservation is applied to the patient's personal information for security. The method is compared against existing methods and is reported to perform better.
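The clustering stage can be illustrated with a minimal sketch of standard fuzzy c-means. This is an assumption-laden illustration, not the authors' DAFCM: the distance-adaptive modification is not described in the abstract, so plain Euclidean distance is used, and the function names and the fuzzifier default `m=2.0` are hypothetical.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update (not the DAFCM variant)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on one or more centers: share full
            # membership among those centers and give zero to the rest.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
        )
    return memberships


def update_centers(points, memberships, m=2.0):
    """Recompute each center as the membership-weighted mean of the points."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append(
            tuple(sum(w * p[d] for w, p in zip(weights, points)) / total
                  for d in range(dim))
        )
    return centers


def fuzzy_c_means(points, centers, m=2.0, iterations=20):
    """Alternate membership and center updates for a fixed number of rounds."""
    for _ in range(iterations):
        memberships = fcm_memberships(points, centers, m)
        centers = update_centers(points, memberships, m)
    return centers, fcm_memberships(points, centers, m)
```

Each row of the returned membership matrix sums to 1, so a record can belong partly to several risk clusters rather than being forced into one.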

In general, gene selection methods are applied to a large gene bank to identify genes that are indicative of, say, diseases with their own set of classifications. The rapid growth of DNA microarray datasets and their influence on science has helped fields such as ecology, bioinformatics, and computer science make major advances. DNA microarray research opened up scope for new gene selection methods aimed at identifying informative genes. Classifying gene expression data is challenging for data miners because the data are large and heterogeneous. The work presented here is a hybrid approach that pairs each filter method (Information Gain / Pearson Correlation Coefficient / Relief-F) with each wrapper method (Genetic Algorithm / Forward Selection Backward Elimination / Particle Swarm Optimization). Across all combinations, the classification accuracy of the selected gene data, measured with a Support Vector Machine (SVM) classifier, is optimized and validated. All filter, wrapper, and hybrid methods are compared by applying them to three cancer microarray datasets.
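One filter-plus-wrapper pairing can be sketched as follows, assuming a Pearson correlation filter followed by greedy forward selection. The abstract scores subsets with an SVM; to stay dependency-free, this sketch substitutes a nearest-centroid scorer, which is a stand-in and not the authors' classifier. All function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def filter_top_k(columns, labels, k):
    """Filter stage: keep the k genes with the largest |r| against the label."""
    ranked = sorted(columns, key=lambda g: abs(pearson(columns[g], labels)),
                    reverse=True)
    return ranked[:k]


def centroid_accuracy(columns, labels, subset):
    """Stand-in scorer (not an SVM): nearest-centroid training accuracy."""
    if not subset:
        return 0.0
    rows = [[columns[g][i] for g in subset] for i in range(len(labels))]
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    correct = 0
    for row, y in zip(rows, labels):
        predicted = min(centroids, key=lambda c: math.dist(row, centroids[c]))
        correct += predicted == y
    return correct / len(labels)


def forward_selection(candidates, score):
    """Wrapper stage: greedily add the gene that most improves score(subset)."""
    selected, best = [], score([])
    improved = True
    while improved:
        improved = False
        for gene in candidates:
            if gene in selected:
                continue
            trial = score(selected + [gene])
            if trial > best:
                best, best_gene, improved = trial, gene, True
        if improved:
            selected.append(best_gene)
    return selected, best
```

The filter cheaply narrows the candidate pool, so the costlier wrapper only has to evaluate subsets of the genes that survive it.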


Author(s):  
Ahmed Faeq Hussein ◽  
Abbas K. AlZubaidi ◽  
Qais Ahmed Habash ◽  
Mustafa Musa Jaber

Personal biomedical data play a crucial role in giving patients and health professionals efficient access to health records. However, it is difficult to obtain a unified view of health data that are scattered across various sections of health centers and hospitals. Specifically, health records are distributed across many places and cannot easily be integrated. In recent years, blockchain has been regarded as a promising solution for sharing individual biomedical information securely and with privacy preservation, because of its immutability. This research work puts forward a blockchain-based management scheme that helps to establish interpretation improvements pertaining to electronic biomedical systems. The scheme is built on two blockchains, where the second blockchain algorithm generates a secure sequence for the hash key produced by the first blockchain algorithm. The adaptivity feature enables the algorithm to handle multiple data types and to combine various biomedical images and text records. All data, including keywords, digital records, and patient identities, are encrypted with a private key and support keyword search, so as to maintain data privacy, access control, and protected search. The results show low latency (less than 750 ms) at 400 requests per second, which indicates that the scheme could be used within health care units such as hospitals and clinics.
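The immutability property that the abstract relies on can be shown with a minimal single hash chain. This is a generic sketch under stated assumptions, not the authors' two-blockchain scheme: the block layout, the SHA-256 choice, and all function names are hypothetical, and encryption and keyword search are omitted.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, prev_hash, payload):
    """Hash the block contents together with the previous block's hash."""
    body = json.dumps({"index": index, "prev": prev_hash, "payload": payload},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a record, linking it to the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "prev": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block


def verify_chain(chain):
    """Return True only if every block still matches its stored hash and link."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        expected = block_hash(index, prev_hash, block["payload"])
        if (block["index"] != index or block["prev"] != prev_hash
                or block["hash"] != expected):
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous block's hash, altering any stored record invalidates that block's hash and so fails verification.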


2020 ◽  
Author(s):  
Andrea Sottani ◽  
Mara Meggiorin ◽  
Luís Ribeiro ◽  
Andrea Rinaldo

Where a groundwater monitoring network (GMN) of sensors measures the hydraulic head in a given domain, statistical analysis of the time series not only provides insight into general aquifer behaviour but can also return parameters useful for optimizing and enhancing the GMN's efficiency.

Several methods are available for designing new GMNs, but few of them are useful for optimizing existing networks. This study compares two methods in order to identify the pros and cons of their applicability and effectiveness.

Both are applied to the case study of the alluvial basin of the Bacchiglione river, near Vicenza (Veneto, Italy). The existing network comprises 92 groundwater data-loggers, installed in wells that mostly screen the unconfined aquifer.

The first, simple method, proposed here, is based on the Pearson correlation coefficient and the microscale parameter, which gives the time interval over which data are perfectly correlated. The coefficients were calculated between detrended time series. First, areas of intercorrelated pairs are defined using a correlation coefficient threshold of 0.95. These areas are characterized by similar hydrological behaviour, so it is sufficient to monitor only one location in each area continuously, while other correlated points of interest can be measured manually at a longer sampling interval. The microscale can be used to estimate the sampling interval needed to capture the water table trend (between 7 and 78 days in this domain), although shorter oscillations are missed and some peaks could remain unseen. In this way, surplus sensors can be moved to other critical areas to improve knowledge of the system.

The second method applies the seasonal Mann-Kendall (sMK) test to detect monotonic trends, which are then fed into a Principal Component Analysis (PCA). Finally, a Hierarchical Clustering Analysis groups sensors with similar PCA factors. This method is more elaborate than the first and requires informed choices about the distance measure and the clustering algorithm. The sMK test and the PCA give considerable insight into the system; however, the clustering result may vary strongly depending on the expert's knowledge and expectations.

The two proposed statistical analyses of hydrogeological data provide integrative decision support for improving the representativeness and effectiveness of monitoring networks aimed at both qualitative and quantitative groundwater control.
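The first method's grouping step can be sketched as follows, assuming linear detrending and evenly sampled series. The abstract does not say how intercorrelated pairs are merged into areas; joining them as connected components is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def detrend(series):
    """Remove a least-squares linear trend from an evenly sampled series."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlated_groups(sensors, threshold=0.95):
    """Group sensors whose detrended series correlate at or above threshold."""
    names = sorted(sensors)
    residuals = {name: detrend(sensors[name]) for name in names}
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(residuals[a], residuals[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())
```

Under this sketch, one sensor per returned group would stay on continuous logging, and the others become candidates for manual measurement or relocation.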


2020 ◽  
Author(s):  
Tamadur Shudayfat ◽  
Çağdaş Akyürek ◽  
Noha Al-Shdayfat ◽  
Hatem Alsaqqa

BACKGROUND Acceptance of Electronic Health Record systems by healthcare providers is considered an essential factor for effective implementation. Understanding healthcare providers' perceptions of Electronic Health Record system implementation requires evaluating the factors that influence their acceptance of Electronic Health Records. OBJECTIVE The current research examines the effects of individual (user) context factors and organizational context factors, using the Technology Acceptance Model. METHODS A quantitative cross-sectional survey design was used, in which 319 healthcare providers from five public hospitals participated. Data were collected using a self-administered questionnaire based on the Technology Acceptance Model. RESULTS Jordanian healthcare providers demonstrated positive perceptions of the usefulness and ease of use of Electronic Health Record systems and, subsequently, accepted the technology. The results indicated that these factors had a significant effect on the perceived usefulness and perceived ease of use of Electronic Health Records, which in turn were related to positive attitudes towards Electronic Health Record systems and to the intention to use them. CONCLUSIONS User attributes, organizational competency, management support, and training and education are essential variables in predicting healthcare providers' acceptance of Electronic Health Records. Healthcare organization administrators should consider these findings when introducing such systems to other healthcare organizations.


2021 ◽  
pp. 016555152110184
Author(s):  
Gunjan Chandwani ◽  
Anil Ahlawat ◽  
Gaurav Dubey

Document retrieval plays an important role in knowledge management because it helps us discover relevant information in existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, pre-processing removes unnecessary and redundant words from the documents. Then, the documents are indexed by the cluster-based inverted indexing algorithm, which is developed by integrating the piecewise fuzzy C-means (piFCM) clustering algorithm with inverted indexing. Once the documents are indexed, user queries are matched using the Bhattacharyya distance. Finally, the query is optimised using the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is analysed on the WebKB data set and the Twenty Newsgroups data set. The analysis shows that the proposed algorithm offers high performance, with a precision of 1, recall of 0.70 and F-measure of 0.8235. The proposed document retrieval system retrieves the most relevant documents and speeds up the storing and retrieval of information.
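The reported F-measure is consistent with the reported precision and recall under the usual harmonic-mean definition, as the short sketch below checks. It also shows the Bhattacharyya distance in its discrete form; treating query and document as normalized term distributions is an assumption here, since the abstract does not say how the distance is applied.

```python
import math


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete probability distributions."""
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if coefficient == 0:
        return math.inf  # no overlap between the distributions
    return -math.log(coefficient)


# Check the figures quoted in the abstract: 2 * 1 * 0.70 / 1.70
print(round(f_measure(1.0, 0.70), 4))  # 0.8235
```

A distance of zero means the two distributions are identical, and the distance grows as their overlap shrinks.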


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Irene Pérez-Díez ◽  
Raúl Pérez-Moraga ◽  
Adolfo López-Cerdán ◽  
Jose-Maria Salinas-Serrano ◽  
María de la Iglesia-Vayá

Abstract Background Medical texts such as radiology reports or electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information from both patients and medical staff. Although several anonymization strategies currently exist for the English language, they are language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts, translatable to other languages. Results We tested 4 neural networks on our radiology reports dataset, achieving a recall of 97.18% of the identifying entities. In addition, we developed a randomization algorithm to substitute the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data. The three best architectures were tested with the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%. Conclusions The proposed strategy, combining named entity recognition tasks with randomization of entities, is suitable for Spanish radiology reports. It does not require a big training corpus, so it could easily be extended to other languages and medical texts, such as electronic health records.
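The substitution step can be sketched as below, assuming the recognizer has already produced non-overlapping character spans with a category label. The surrogate pools, category names, and function name are hypothetical and are not taken from the paper; a real system would draw replacements from much larger lists.

```python
import random

# Hypothetical surrogate pools, one per entity category.
SURROGATES = {
    "NAME": ["Ana Ruiz", "Luis Romero", "Marta Gil"],
    "HOSPITAL": ["Hospital Central", "Clinica Norte", "Hospital del Sur"],
}


def randomize_entities(text, entities, rng=None):
    """Replace each (start, end, category) span with a same-category surrogate.

    Spans are assumed not to overlap; they are processed left to right.
    """
    rng = rng or random.Random()
    pieces, cursor = [], 0
    for start, end, category in sorted(entities):
        pieces.append(text[cursor:start])
        original = text[start:end]
        # Never pick the original value as its own replacement.
        pool = [s for s in SURROGATES[category] if s != original]
        pieces.append(rng.choice(pool))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Replacing entities with plausible values of the same category, instead of masking them, keeps the text readable and hides which entities the recognizer missed.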


2020 ◽  
Vol 17 (1) ◽  
Author(s):  
Vânia Rodrigues ◽  
Sérgio Deusdado

Abstract The discovery of diagnostic or prognostic biomarkers is fundamental to optimizing therapeutics for patients. By enhancing the interpretability of the prediction model, this work aims to optimize Leukemia diagnosis while retaining high performance in the identification of informative genes. For this purpose, we used an optimal parameterization of the Kernel Logistic Regression method for Leukemia microarray gene expression data classification, applying metalearners to select attributes and reduce the data dimensionality before passing the data to the classifier. Pearson correlation and the chi-squared statistic were the attribute evaluators applied on the metalearners, with information gain as the single-attribute evaluator. The implemented models relied on 10-fold cross-validation. The metalearners approach identified 12 common genes, with the highest average merit of 0.999. The practical work was developed using the public data mining software WEKA.


2021 ◽  
Vol 15 (10) ◽  
pp. 2977-2981
Author(s):  
Merve Uca ◽  
Kenan Sivrikaya ◽  
Canatan Taşdemir

Aim: The purpose of this study was to explore the effects of the exercise and smoking history of COVID-19 patients on their recovery course and time. Methods: We observed a total of 310 patients, 176 males and 134 females, who tested positive for COVID-19, had no chronic disease, and received inpatient or outpatient treatment. The patients also filled out a personal information form covering their demographic background, including smoking and exercise history. All participants received favipiravir as the standard medication, and their symptoms and the durations of these symptoms were evaluated using the focus group interview method. We analyzed the data in SPSS 17.0 using Independent T-Test, one-way ANOVA, Chi-Square, and Pearson Correlation tests. Results: The results revealed significant differences in recovery time between former smokers and those who never smoked and quit smoking (p<0.01). There were also significant differences between those who exercised actively and those who never exercised or had quit exercising (p<0.01). Again with regard to recovery time, we found significant differences between groups that quit exercising in different periods (p<0.05) and between those with different weights (p<0.05). In addition, we found that smoking cessation time and exercise history had positive relationships with recovery time. Conclusion: Considering the results, we concluded that non-smoking and exercise had a positive impact on avoiding adverse effects of the COVID-19 disease. Keywords: Covid-19, exercise, smoking, sports, acute respiratory syndrome


2021 ◽  
Vol 13 (1) ◽  
pp. 20-39
Author(s):  
Ahmed Aloui ◽  
Okba Kazar

In mobile business (m-business), a client sends its exact location to service providers. This data may involve sensitive and private personal information. As a result, misuse of location information by third-party location servers creates privacy issues for clients. This paper provides an overview of the privacy protection techniques currently applied in location-based mobile business. The authors first identify different system architectures and different protection goals. Second, the article gives an overview of the basic principles and mechanisms that exist to protect these privacy goals. Third, the authors present existing privacy protection measures.

