Data Skewness: Latest Research Papers

Abstract

Background: Predicting length of stay (LOS) at admission time can give physicians and nurses insight into patients' illness severity and help them avoid adverse events and clinical deterioration. It also helps hospitals manage their resources and staff more effectively.

Methods: Research in this field faces important challenges, such as missing values and skewed LOS data. Moreover, many studies use binary classification, which puts a wide range of patients with different conditions into one category. To address these shortcomings, multivariate imputation techniques are first applied to fill incomplete records; then two resampling techniques, Borderline-SMOTE and SMOGN, are applied to address data skewness in the classification and regression domains, respectively. Finally, machine learning (ML) techniques including neural networks, extreme gradient boosting, random forest, support vector machine, and decision tree are implemented for both approaches to predict the LOS of patients admitted to the Emergency Department of Odense University Hospital between June 2018 and April 2019. The ML models are developed from data obtained at admission time, including pulse rate, arterial blood oxygen saturation, respiratory rate, systolic blood pressure, triage category, arrival ICD-10 codes, age, and gender.

Results: The performance of the predictive models before and after addressing missing values and data skewness is evaluated using four metrics: receiver operating characteristic, area under the curve (AUC), R-squared score (R2), and normalized root mean square error (NRMSE). After addressing these challenges, model performance improves on average by 15.75% for AUC, 32.19% for R2 score, and 11.32% for NRMSE. Moreover, the results indicate a relationship between the missing values rate, data skewness, and patients' illness severity, so it is clinically essential to take incomplete patient records into account and apply proper solutions for interpolating missing values.

Conclusion: We propose a new method comprising three stages: missing values imputation, data skewness handling, and building predictive models based on classification and regression approaches. Addressing these challenges properly significantly enhanced the performance of the models, which led to a more valid prediction of LOS.
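To make the resampling step more concrete, the following is a minimal sketch of the interpolation idea behind SMOTE-style oversampling. It is plain SMOTE on a toy minority class, not the Borderline-SMOTE or SMOGN variants named in the abstract, and the function name `smote_oversample`, the parameter `k`, and the two-feature patient data are all hypothetical.

```python
import math
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Simplified plain SMOTE (an assumption for illustration); Borderline-SMOTE
    would additionally restrict base points to those near the class boundary.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical toy data: (pulse rate, respiratory rate) of rare long-stay patients.
long_stay = [(110.0, 24.0), (118.0, 28.0), (105.0, 22.0), (125.0, 30.0)]
new_points = smote_oversample(long_stay, n_new=6)
```

Each synthetic record lies on a line segment between two real minority records, so the oversampled class stays inside the region already occupied by observed patients.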


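Pengaruh Latihan Bercirikan Ketenteraan Keatas Pembangunan Prestasi Pekerja: Organisasi Sektor Awam di Malaysia [The Influence of Military-Style Training on Employee Performance Development: Public Sector Organisations in Malaysia]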

Malaysian Journal of Social Sciences and Humanities (MJSSH) ◽ 10.47405/mjssh.v6i2.631 ◽ 2021 ◽ Vol 6 (2) ◽ pp. 53-64

Author(s): Mohd Safix Lamsah ◽ Rosniza Aznie Che Rose ◽ Muhammad Daud Johari ◽ Rogis Baker ◽ Mohd Sani Ismail

Keyword(s): Social Sciences ◽ Data Skewness

Workplace training is a critical factor in strengthening human resources, particularly in shaping employee performance. Optimistic human resource management ensures that such training has a positive impact. Nevertheless, most training carried out in most public sector organisations is still at a satisfactory level. This study therefore examines the influence of training based on a military module on human capital development at the Ministry of Defence, Malaysia. Data were collected through questionnaires distributed to 92 respondents, a sample determined using the Taro Yamane formula. Employee performance under the military module was analysed using reliability analysis, normality testing, descriptive analysis, correlation analysis, and linear regression analysis in the Statistical Package for Social Sciences. In the normality test, the skewness and kurtosis values indicated normally distributed data, with skewness falling within the 2.0 cut-off. Descriptive analysis showed that the frequency distribution of employee performance at the highest level was 90.32 percent. Correlation analysis showed that use of the military module had a high correlation. Linear regression analysis also showed a significant relationship between the military module and employee performance, particularly in terms of teamwork. Training based on the military module can therefore train public employees mentally and physically, particularly in reducing work stress within the organisation. Moreover, with this training, employees' cognitive and psychomotor development is well suited to their scope of work. The holistic implementation of military-style training has had a positive impact on achieving the organisation's goals, mission, and vision, while optimising the contribution of human capital in driving a successful national vision.
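The skewness and kurtosis screen mentioned in the abstract can be sketched as follows. This is a minimal illustration that assumes population (uncorrected) moment formulas and applies the 2.0 cut-off to both statistics; statistical packages typically report bias-corrected values, so results will differ slightly on small samples. The function names and the sample scores are hypothetical.

```python
import statistics

def skewness(xs):
    """Moment-based skewness: third central moment over variance^1.5."""
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / len(xs)
    m3 = sum((x - mean) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / len(xs)
    m4 = sum((x - mean) ** 4 for x in xs) / len(xs)
    return m4 / m2 ** 2 - 3.0

def within_cutoff(xs, cutoff=2.0):
    """True when both statistics fall inside +/- cutoff (assumed rule of thumb)."""
    return abs(skewness(xs)) <= cutoff and abs(excess_kurtosis(xs)) <= cutoff

# Hypothetical questionnaire scores: one symmetric set, one heavily skewed set.
symmetric_scores = [1, 2, 3, 4, 5]
skewed_scores = [1] * 9 + [100]

symmetric_ok = within_cutoff(symmetric_scores)
skewed_ok = within_cutoff(skewed_scores)
```

A single extreme value in a small sample is enough to push skewness past the cut-off, which is why the screen is usually run before correlation and regression analyses.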


Handling data-skewness in character based string similarity join using Hadoop

Applied Computing and Informatics ◽ 10.1016/j.aci.2018.11.001 ◽ 2020 ◽ Vol ahead-of-print (ahead-of-print) ◽ Cited By ~ 1

Author(s): Kanak Meena ◽ Devendra K. Tayal ◽ Oscar Castillo ◽ Amita Jain

Keyword(s): Scientific Data ◽ Distribution Law ◽ Similarity Join ◽ String Similarity ◽ Zipf Distribution ◽ Imbalance Problem ◽ Data Skewness ◽ Pair Generation ◽ Set Up ◽ Similarity Joins

The scalability of similarity joins is threatened by data skewness, a pervasive problem in scientific data. Skewness produces an uneven distribution of attributes, which can cause a severe load imbalance problem. When database join operations are applied to such datasets, the skewness grows exponentially. All algorithms developed to date for implementing database joins are highly skew sensitive. This paper presents a new approach for handling data skewness in character-based string similarity joins using the MapReduce framework. No existing work in the literature handles data skewness in character-based string similarity joins, although work exists for set-based string similarity joins. The proposed work is divided into three stages, and every stage is further divided into mapper and reducer phases dedicated to a specific task. The first stage finds the lengths of the strings in a dataset. The second stage uses the MR-Pass Join framework for valid candidate pair generation. The third stage incorporates MRFA concepts into the string similarity join, named “MRFA-SSJ” (MapReduce Frequency Adaptive – String Similarity Join), and is further divided into four MapReduce phases. Hence, MRFA-SSJ is proposed to handle skewness in the string similarity join. The experiments were run with the Hadoop framework on a 15-node cluster using three datasets: DBLP, a query log, and a real dataset of IP addresses and cookies. The skewness factor was analysed following the Zipf distribution law. The proposed algorithm was compared with three known algorithms: all of them fail when data is highly skewed, surviving only up to a Zipf factor of 0.5, whereas the proposed algorithm handles highly skewed data and survives up to a Zipf factor of 1. Hence the proposed algorithm is skew insensitive and ensures scalability with a reasonable query processing time for string similarity database joins. It also ensures an even distribution of attributes.
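The load imbalance that Zipf-distributed keys cause can be illustrated with a small sketch. It is not the MRFA-SSJ algorithm. It only assumes a naive partitioner that assigns keys to 15 reducers by rank modulo the reducer count, with key frequencies proportional to 1/rank^s. The function names, the key count of 1500, and the partitioning rule are hypothetical choices for illustration.

```python
def reducer_loads(num_keys, num_reducers, zipf_s):
    """Expected relative load per reducer when key frequency ~ 1 / rank**zipf_s."""
    loads = [0.0] * num_reducers
    for rank in range(1, num_keys + 1):
        # Naive partitioning (assumed): key of a given rank goes to rank mod reducers
        loads[(rank - 1) % num_reducers] += 1.0 / rank ** zipf_s
    return loads

def imbalance(loads):
    """Ratio of the busiest reducer's load to the mean load (1.0 = balanced)."""
    return max(loads) / (sum(loads) / len(loads))

# Zipf factors 0 (uniform), 0.5, and 1.0, with 15 reducers as in a 15-node cluster.
ratios = {s: imbalance(reducer_loads(1500, 15, s)) for s in (0.0, 0.5, 1.0)}

for s, ratio in ratios.items():
    print(f"Zipf factor {s}: busiest reducer carries {ratio:.2f}x the mean load")
```

As the Zipf factor rises, the reducer holding the most frequent key takes a growing share of the work, which is the straggler effect a skew-insensitive join has to avoid.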
