Efficient Learning From Two-Class Categorical Imbalanced Healthcare Data

Author(s):  
Lincy Mathews ◽  
Hari Seetha

When the two classes of a dataset to be mined are unequally represented, the result is the imbalanced two-class data challenge. Many health-related datasets comprising categorical data face this class imbalance challenge. This paper aims to address the limitations of imbalanced two-class categorical data and presents a re-sampling solution known as ‘Syn_Gen_Min' (SGM) to improve the class imbalance ratio. SGM involves finding the greedy neighbors for a given minority sample. To the best of the authors' knowledge, the accepted approach for a classifier is to find numeric equivalents for categorical attributes, resulting in a loss of information. The novelty of this contribution is that the categorical attributes are kept in their raw form. Five distinct categorical similarity measures are employed and tested against six real-world datasets derived from the healthcare sector. The application of these similarity methods leads to the generation of different synthetic samples, which significantly improves the performance measures of the classifier. This work further shows that there is no generic similarity measure that fits all datasets.
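As a concrete illustration of comparing categorical attributes in their raw form, the simple overlap measure — one of the most common categorical similarity measures, though the abstract does not name the five the paper evaluates — can be sketched as follows. The function name and the toy minority-class records are illustrative, not from the paper:

```python
def overlap_similarity(a, b):
    """Fraction of categorical attributes on which two records agree."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical minority-class records with four raw categorical attributes
r1 = ("female", "smoker", "A+", "urban")
r2 = ("female", "non-smoker", "A+", "urban")
print(overlap_similarity(r1, r2))  # 0.75 — the records agree on 3 of 4 attributes
```

A measure like this lets a neighbor search (such as SGM's greedy-neighbor step) rank minority samples by closeness without ever encoding the attributes as numbers.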

Author(s):  
S. Karthiga Devi ◽  
B. Arputhamary

Today, the volume of healthcare data generated is increasing rapidly as the number of patients in each hospital grows. These data are vital for decision making and for delivering the best care to patients. Healthcare providers are now faced with collecting, managing, storing and securing huge amounts of sensitive protected health information. As a result, an increasing number of healthcare organizations are turning to cloud-based services. Cloud computing offers a viable, secure alternative to premise-based healthcare solutions. The cloud infrastructure is characterized by high-volume storage and high throughput. Privacy and security are the two most important concerns in cloud-based healthcare services. Healthcare organizations should have electronic medical records in order to use the cloud infrastructure. This paper surveys the challenges of the cloud in healthcare and the benefits of cloud techniques in the healthcare industry.


2020 ◽  
Vol 30 (Supplement_5) ◽  
Author(s):  
S Svanholm ◽  
E Viitasara ◽  
H Carlerby

Abstract Background Previous research has indicated that migrants risk facing inequities both internationally and in Sweden; integration policies are therefore important to study. How health is described in policies affects how health interventions are approached. A discourse analysis offers a way of understanding how health is framed within the integration policies of the Establishment Program. The aim was to critically analyse the health discourses used in Swedish and European Union (EU) integration policies. Methods A critical discourse analysis, inspired by Fairclough, was performed on integration policies related to Sweden at the local, regional, national and EU levels. The policies of the Establishment Program, which focuses on newly arrived migrants (refugees, persons of subsidiary protection and their relatives who arrived through family reunification), were chosen for the analysis, and 17 documents were analysed in total. Results The analysis of the documents showed that although no definition of health was presented, health discourses were expressed in the form of the medicalization of health and the individualization of health. This was evident not only in the terminology used, but also in how the healthcare sector was considered responsible for any health-related issue and how individual health behaviours were the focus of interventions to promote health. Conclusions A pathogenic approach to health was visible in the policies, and individual disease prevention was the main health focus. The results showed similarities to previous research highlighting how a particular understanding of health in a neoliberal context is formed. Key messages Health as a resource is missing in the integration policy documents. Viewing health as an individual quality puts the responsibility of promoting health on the individual.


Algorithms ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 184
Author(s):  
Xia Que ◽  
Siyuan Jiang ◽  
Jiaoyun Yang ◽  
Ning An

Many mixed datasets with both numerical and categorical attributes have been collected in various fields, including medicine, biology, etc. Designing appropriate similarity measurements plays an important role in clustering these datasets. Many traditional measurements treat all attributes equally when measuring similarity. However, different attributes may contribute differently, as the amount of information they contain can vary considerably. In this paper, we propose a similarity measurement with entropy-based weighting for clustering mixed datasets. The numerical data are first transformed into categorical data by an automatic categorization technique. Then, an entropy-based weighting strategy is applied to denote the differing importance of the various attributes. We incorporate the proposed measurement into an iterative clustering algorithm, and extensive experiments show that this algorithm outperforms the OCIL and K-Prototype methods by 2.13% and 4.28%, respectively, in terms of accuracy on six mixed datasets from UCI.
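A minimal sketch of the entropy-based weighting idea. The abstract does not give the exact formula, so this sketch assumes one common convention — attributes whose value distributions have lower entropy (i.e., are more concentrated, so agreement on them is more informative) receive higher weight; the paper's scheme may differ, and the function names and normalization are illustrative:

```python
import math
from collections import Counter

def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute's value distribution."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_weights(dataset):
    """One normalized weight per attribute column; lower entropy -> higher weight
    (an assumed convention -- some schemes invert this)."""
    columns = list(zip(*dataset))           # column-major view of the records
    entropies = [attribute_entropy(col) for col in columns]
    inverse = [1.0 / (1.0 + e) for e in entropies]
    total = sum(inverse)
    return [w / total for w in inverse]

# Column 0 is constant (entropy 0), column 1 is 50/50 (entropy 1 bit),
# so column 0 gets the larger weight under this convention.
records = [("a", "x"), ("a", "y"), ("a", "x"), ("a", "y")]
print(entropy_weights(records))
```

A weighted similarity then multiplies each per-attribute agreement by its weight instead of averaging attributes uniformly.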


2021 ◽  
Author(s):  
Antonios Makris ◽  
Camila Leite da Silva ◽  
Vania Bogorny ◽  
Luis Otavio Alvares ◽  
Jose Antonio Macedo ◽  
...  

Abstract During the last few years, the volumes of data that synthesize trajectories have expanded to unparalleled quantities. This growth is challenging traditional trajectory analysis approaches, and solutions are sought in other domains. In this work, we focus on data compression techniques with the intention of minimizing the size of trajectory data while, at the same time, minimizing the impact on trajectory analysis methods. To this end, we evaluate five lossy compression algorithms: Douglas-Peucker (DP), Time Ratio (TR), Speed Based (SP), Time Ratio Speed Based (TR_SP) and Speed Based Time Ratio (SP_TR). The comparison is performed using four distinct real-world datasets against six different dynamically assigned thresholds. The effectiveness of the compression is evaluated using classification techniques and similarity measures. The results showed that there is a trade-off between the compression rate and the achieved quality. There is no “best algorithm” for every case, and the choice of the proper compression algorithm is an application-dependent process.
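Of the five algorithms, Douglas-Peucker is the classic purely geometric one: it keeps a point only if it deviates from the line between the current segment endpoints by more than a tolerance ε, recursing on both sides of the farthest point (unlike TR and SP, it ignores the temporal dimension entirely). A minimal 2-D sketch:

```python
def _point_line_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:                      # degenerate segment
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * px - dx * py + bx * ay - by * ax) / (dx * dx + dy * dy) ** 0.5

def douglas_peucker(points, epsilon):
    """Keep the endpoints; recurse on the farthest interior point if it exceeds epsilon."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_line_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= epsilon:                          # whole span fits within tolerance
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right                     # drop the duplicated split point

track = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(douglas_peucker(track, 2.0))  # [(0.0, 0.0), (2.0, 0.0)] — middle point dropped
print(douglas_peucker(track, 0.5))  # all three points kept
```

TR and SP replace the perpendicular-offset test with thresholds on time-interpolated position error and speed deviation, respectively, which is why the paper evaluates them separately for trajectory (rather than plain polyline) data.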


Author(s):  
Pijush Kanti Dutta Pramanik ◽  
Saurabh Pal ◽  
Moutan Mukhopadhyay

Like other fields, the healthcare sector has also been greatly impacted by big data. A huge volume of healthcare data and other related data are being continually generated from diverse sources. Tapping and analysing these data suitably would open up new avenues and opportunities for healthcare services. In view of that, this paper aims to present a systematic overview of big data and big data analytics, applicable to modern-day healthcare. Acknowledging the massive upsurge in healthcare data generation, various ‘V's, specific to healthcare big data, are identified. Different types of data analytics, applicable to healthcare, are discussed. Along with presenting the technological backbone of healthcare big data and analytics, the advantages and challenges of healthcare big data are meticulously explained. A brief report on the present and future market of healthcare big data and analytics is also presented. In addition, several applications and use cases are discussed in sufficient detail.


In data mining, many techniques use distance-based measures for data clustering. Improving clustering performance is the fundamental goal in cluster-related tasks. Many techniques are available for clustering numerical data as well as categorical data. Clustering is an unsupervised learning technique in which objects are grouped, or clustered, based on the similarity among them. A new cluster similarity measure, the cosine-like cluster similarity measure (CLCSM), is proposed in this paper. The proposed cluster similarity measure is used for data classification. Extensive experiments are conducted using UCI machine learning datasets. The experimental results show that the proposed cosine-like cluster similarity measure is superior to many existing cluster similarity measures for data classification.
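The familiar cosine similarity that a "cosine-like" measure builds on can be sketched as below. How CLCSM derives its cluster vectors (e.g., from per-cluster attribute-value frequencies) is not given in the abstract, so the vectors here are illustrative:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# e.g. hypothetical attribute-value frequency vectors of two clusters
print(cosine_similarity([3, 1, 0], [6, 2, 0]))  # ~1.0: same direction, different scale
print(cosine_similarity([1, 0], [0, 1]))        # 0.0: orthogonal
```

Because cosine compares direction rather than magnitude, two clusters of very different sizes but similar value distributions still score as highly similar, which is one reason cosine-style measures suit frequency-based cluster representations.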


Author(s):  
Sheik Abdullah A. ◽  
Selvakumar S. ◽  
Parkavi R. ◽  
Suganya R. ◽  
Abirami A. M.

The combination of big data and analytics has made the process of solving various real-world problems simpler. The big data and data science toolbox provides a realm of data preparation, data analysis, implementation processes, and solutions. Connecting to any data source and preparing data for analysis have been made simple by the tremendous range of tools available in data analytics packages. Some of the analytical tools include R programming, Python programming, rapid analytics, and Weka. The patterns and the granularity of the observed data can be fetched through visualizations and data observations. This chapter provides insight into the types of analytics from a big data perspective, with an emphasis on their applicability to healthcare data. The processing paradigms and techniques can also be clearly observed through the chapter contents.


2020 ◽  
pp. 1989-2001
Author(s):  
Wafaa Faisal Mukhtar ◽  
Eltayeb Salih Abuelyaman

Healthcare big data streams from multiple information sources at an alarming volume, velocity, and variety. The challenge facing the healthcare industry is extracting meaningful value from such sources. This chapter investigates the diversity and forms of data in the healthcare sector, reviews the methods used to search and analyze these data over the past years, and surveys the use of machine learning and data mining techniques to mine useful knowledge from such data. The chapter also highlights innovations in particular systems and tools that identify suitable approaches for different healthcare data and raise the standard of care, and it recaps the tools and data collection methods. The authors emphasize some of the ethical issues regarding the processing of these records, as well as some data privacy issues.


Author(s):  
Hongzuo Xu ◽  
Yongjun Wang ◽  
Zhiyue Wu ◽  
Yijie Wang

Non-IID categorical data are ubiquitous in real-world applications. Learning various kinds of couplings has proved to be a reliable measure when detecting outliers in such non-IID data. However, it is a critical yet challenging problem to model, represent, and utilise high-order complex value couplings. Existing outlier detection methods normally focus only on pairwise primary value couplings and fail to uncover the real relations that hide in complex couplings, resulting in suboptimal and unstable performance. This paper introduces a novel unsupervised embedding-based complex value coupling learning framework, EMAC, and its instance, SCAN, to address these issues. SCAN first models primary value couplings. Then, a coupling bias is defined to capture complex value couplings at different granularities and highlight the essence of outliers. An embedding method is performed on the value network constructed via biased value couplings, which further learns high-order complex value couplings and embeds them into a value representation matrix. Bidirectional selective value coupling learning is proposed to show how to estimate value and object outlierness through value couplings. Substantial experiments show that SCAN (i) significantly outperforms five state-of-the-art outlier detection methods on thirteen real-world datasets; and (ii) has much better resilience to noise than its competitors.
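A toy version of the *pairwise* primary value couplings that the paper identifies as the limitation of existing methods: each categorical value is scored by its own rarity blended with how rarely it co-occurs with other values. SCAN's higher-order embedding is far beyond this sketch; the scoring formula, blend weights, and names here are all illustrative:

```python
from collections import Counter
from itertools import combinations

def value_outlierness(records):
    """Score each categorical value in [0, 1]-ish range: rarer values and values
    that co-occur weakly with the rest score higher (more outlying).
    A toy pairwise-coupling baseline, not the paper's SCAN method."""
    n = len(records)
    freq = Counter(v for rec in records for v in rec)
    # Unordered co-occurrence counts of value pairs within the same record
    pair = Counter(frozenset(p) for rec in records for p in combinations(rec, 2))
    scores = {}
    for v, f in freq.items():
        # Mean conditional co-occurrence of v with every other observed value
        co = [pair[frozenset((v, u))] / f for u in freq if u != v]
        mean_co = sum(co) / len(co) if co else 0.0
        scores[v] = 0.5 * (1 - f / n) + 0.5 * (1 - mean_co)
    return scores

records = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "x")]
scores = value_outlierness(records)
print(scores["b"] > scores["a"])  # True: the rare value "b" scores as more outlying
```

An object-level score would then aggregate the scores of an object's values, which is roughly the "value to object outlierness" direction the paper formalizes with its bidirectional selective coupling learning.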


2020 ◽  
Vol 34 (04) ◽  
pp. 6518-6525
Author(s):  
Xiao Xu ◽  
Fang Dong ◽  
Yanghua Li ◽  
Shaojian He ◽  
Xin Li

A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. Two models with disjoint and hybrid payoffs are considered to characterize the phenomenon that users' preferences towards different items vary differently over time. In the disjoint payoff model, the reward of playing an arm is determined by an arm-specific preference vector, which is piecewise-stationary with asynchronous and distinct changes across different arms. An efficient learning algorithm that is adaptive to abrupt reward changes is proposed and theoretical regret analysis is provided to show that a sublinear scaling of regret in the time length T is achieved. The algorithm is further extended to a more general setting with hybrid payoffs where the reward of playing an arm is determined by both an arm-specific preference vector and a joint coefficient vector shared by all arms. Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings.
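The paper's algorithms are contextual, but the core mechanism for adapting to abrupt, asynchronous reward changes can be illustrated with a non-contextual sliding-window UCB sketch: each arm's statistics are computed only over its most recent plays, so stale pre-change rewards age out of the estimate. The class name, window size, and exploration constant are illustrative, not the paper's:

```python
import math
from collections import deque

class SlidingWindowUCB:
    """UCB restricted to the last `window` observed rewards per arm -- a simple
    piecewise-stationary heuristic (a simplification of the paper's setting,
    which also maintains per-arm preference vectors over item features)."""

    def __init__(self, n_arms, window=200, c=2.0):
        self.hist = [deque(maxlen=window) for _ in range(n_arms)]  # recent rewards
        self.c = c          # exploration strength
        self.t = 0          # total number of selection rounds

    def select(self):
        self.t += 1
        for arm, h in enumerate(self.hist):
            if not h:
                return arm  # play every arm once before using UCB scores

        def ucb(arm):
            h = self.hist[arm]
            mean = sum(h) / len(h)
            bonus = math.sqrt(self.c * math.log(self.t) / len(h))
            return mean + bonus

        return max(range(len(self.hist)), key=ucb)

    def update(self, arm, reward):
        self.hist[arm].append(reward)  # deque(maxlen=...) evicts the oldest entry
```

After a change point, the window fills with post-change rewards, and the previously inferior arm overtakes the stale leader; windowing (or discounting) is what makes sublinear regret possible under abrupt, arm-specific changes.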

