skewed data
Recently Published Documents


Total documents: 172 (five years: 40)

H-index: 24 (five years: 3)

Author(s):  
Mohammad Zoynul Abedin ◽  
Chi Guotai ◽  
Petr Hajek ◽  
Tong Zhang

In small business credit risk assessment, the default and nondefault classes are highly imbalanced. To overcome this problem, this study proposes an extended ensemble approach rooted in the weighted synthetic minority oversampling technique (WSMOTE), called WSMOTE-ensemble. The proposed ensemble classifier hybridizes WSMOTE and Bagging with sampling composite mixtures to guarantee the robustness and variability of the generated synthetic instances and thus minimize the class-skew constraints linked to default and nondefault instances in small business data. The original small business dataset used in this study comprises 3111 records from a Chinese commercial bank. Through a thorough experimental study of extensively skewed data-modeling scenarios, a multilevel experimental setting was established for a rare event domain. Based on appropriate evaluation measures, this study finds that the random forest classifier used in the WSMOTE-ensemble model provides a good trade-off between performance on the default class and performance on the nondefault class. The ensemble solution improved the accuracy of the minority class by 15.16% in comparison with its competitors. This study also shows that sampling methods outperform nonsampling algorithms. With these contributions, this study fills a noteworthy knowledge gap and adds several unique insights regarding the prediction of small business credit risk.
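As a rough illustration of the synthetic minority oversampling idea that WSMOTE builds on, the sketch below interpolates between a minority instance and one of its nearest minority neighbours. It is plain, unweighted SMOTE-style interpolation, not the WSMOTE-ensemble described in the abstract; the function name `smote_like_oversample` and its parameters are assumptions made for this example.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by neighbour interpolation.

    Hypothetical helper: unweighted SMOTE-style sampling only. The weighting
    and Bagging steps of WSMOTE-ensemble are not reproduced here.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy minority class (e.g. defaulting borrowers) with two features.
defaults = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.7, 0.3)]
new_points = smote_like_oversample(defaults, n_new=6)
```

Because every synthetic point lies on a segment between two real minority points, the generated data stays inside the region the minority class already occupies.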


2021 ◽  
Vol 6 (2) ◽  
pp. 11-29
Author(s):  
Dr. Fazle Malik ◽  
Dr. Muhammad Junaid ◽  
Dr. Muhammad Asif ◽  
Ilyas Sharif

This study explores the effects of pharmaceutical marketing on patients and society in Pakistan. Pharmaceutical marketing is an integral part of the drug industry, channeling product-related information to healthcare professionals. Physicians are the target audience because they prescribe medicine to users. The pharmaceutical industry mobilizes all its resources to influence physicians' prescriptions in favor of its brands. This is commendable from the organizational perspective; however, it leads to unintended negative consequences for society. The primary reason is the blind pursuit of commercial interest and the near-total neglect of ethical behavior in marketing drugs. This study conducted 20 open-ended interviews with primary stakeholders in this issue, including physicians, pharmaceutical managers, and officials of the drug regulatory authority, selected through purposive sampling. The findings show that misleading promotional strategies that influence physicians are responsible for the misuse and abuse of drugs and antibiotics. Pharmaceutical drug incentivization, the personal obligation felt by physicians, skewed data, and inappropriate promotions were the major categories developed during analysis. The study recommends various steps to minimize these ill effects.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Shovan Chowdhury ◽  
Amarjit Kundu ◽  
Bidhan Modok

Purpose: As an alternative to the standard p and np charts and their various modifications, beta control charts are used in the literature for monitoring proportion data. These charts generally use the average of proportions to set up the control limits, assuming the in-control parameters are known. The purpose of the paper is to propose a control chart for detecting shift(s) in the percentiles of a beta distributed process when the in-control parameters are unknown. Such situations arise when a specific percentile of the proportion of conforming or non-conforming units is the quality parameter of interest.

Design/methodology/approach: A parametric bootstrap method is used to develop the control chart for monitoring percentiles of a beta distributed process when the in-control parameters are unknown. Extensive Monte Carlo simulations are conducted for various combinations of percentiles, false-alarm rates and sample sizes to evaluate the in-control performance of the proposed bootstrap control charts in terms of average run length (ARL). The out-of-control behavior and performance of the proposed bootstrap percentile chart are thoroughly investigated for several choices of shifts in the parameters of the beta distribution. The proposed chart is finally applied to two skewed data sets for illustration.

Findings: The simulated values of in-control ARL are found to be close to the theoretical results, implying that the proposed chart for percentiles performs well with both positively and negatively skewed data. Also, the out-of-control ARL values for the percentiles decrease sharply with small, medium and large shifts in the parameters, both downward and upward. This indicates that the chart is effective in detecting shifts in the parameters. However, the speed of detection varies depending on the type of shift, the parameters and the percentile being considered. The proposed chart is found to be effective in comparison with the Shewhart-type chart and the bootstrap-based unit gamma chart.

Originality/value: The beta control charts proposed in the literature use the average of proportions to set up the control limits. In practice, however, a specific percentile of the proportion of conforming or non-conforming items may be more useful than the average as the quality parameter of interest. To the best of the authors' knowledge, no research in the literature addresses a beta control chart for percentiles of proportions. Moreover, the proposed control chart assumes the in-control parameters to be unknown, and hence captures the additional variability introduced into the monitoring scheme through parameter estimation. In this sense, the proposed chart is original and unique.
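The parametric bootstrap idea summarised above can be sketched as follows: fit a beta distribution to in-control reference data, simulate many subgroups from the fitted distribution, and take extreme quantiles of the simulated percentile statistic as control limits. This is a simplified sketch, not the authors' procedure: it assumes method-of-moments fitting and an empirical subgroup percentile, and all function names and default values are hypothetical.

```python
import random


def fit_beta_moments(sample):
    """Method-of-moments estimates (a, b) for a beta distribution (an assumption;
    the paper may use a different estimator)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def sample_percentile(sample, p):
    """Nearest-rank style empirical percentile for p in [0, 1]."""
    ordered = sorted(sample)
    idx = min(len(ordered) - 1, max(0, int(round(p * (len(ordered) - 1)))))
    return ordered[idx]


def bootstrap_percentile_limits(reference, subgroup_size, p=0.1,
                                false_alarm=0.0027, n_boot=2000, seed=0):
    """Return (LCL, UCL) for the p-th percentile of a subgroup."""
    rng = random.Random(seed)
    a, b = fit_beta_moments(reference)
    stats = []
    for _ in range(n_boot):
        # Simulate one in-control subgroup from the fitted beta distribution.
        draw = [rng.betavariate(a, b) for _ in range(subgroup_size)]
        stats.append(sample_percentile(draw, p))
    lcl = sample_percentile(stats, false_alarm / 2.0)
    ucl = sample_percentile(stats, 1.0 - false_alarm / 2.0)
    return lcl, ucl


# Synthetic in-control reference data (illustrative only, positively skewed).
ref_rng = random.Random(42)
reference = [ref_rng.betavariate(2.0, 8.0) for _ in range(200)]
lcl, ucl = bootstrap_percentile_limits(reference, subgroup_size=25)
```

A new subgroup would signal a possible shift when its 10th percentile falls outside the interval from `lcl` to `ucl`. Because the limits come from a distribution fitted to estimated parameters, they reflect the fitted in-control behavior rather than assumed known values.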


2021 ◽  
Vol 11 (16) ◽  
pp. 7461
Author(s):  
Zheng Li ◽  
Jhon Galdames-Retamal

Machine learning techniques generally require or assume balanced datasets. Skewed data can prevent machine learning systems from functioning properly, no matter how carefully the parameters are tuned. A common solution to the problem of high skewness is therefore to pre-process the data (e.g., with a log transformation) before applying machine learning to real-world problems. Nevertheless, this pre-processing strategy cannot be employed for online machine learning, especially in the context of edge computing, because it is barely possible to foresee and store the continuous data flow on IoT devices at the edge. It is thus crucial and valuable to enable skewness monitoring in real time. Unfortunately, there is a surprising gap between practitioners' needs and scientific research on running statistics for monitoring real-time skewness, not to mention the lack of suitable remedies for skewed data at runtime. Inspired by Welford's algorithm, which is the most efficient approach to calculating running variance, this research developed efficient calculation methods for three versions of running skewness. These methods can conveniently be implemented as skewness monitoring modules that are affordable for IoT devices in different edge learning scenarios. Such IoT-friendly skewness monitoring ultimately acts as a cornerstone for developing the research field of skewness-aware online edge learning. By initially validating the usefulness and significance of skewness awareness in edge learning implementations, we also argue that joint research efforts from the relevant communities are needed to boost this promising research field.
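To make the idea of Welford-style running skewness concrete, the sketch below keeps a running count, mean, and second and third central moment sums, and updates them in constant time and memory for each new value. It uses the standard one-pass moment update and reports the population skewness; it is not necessarily one of the three versions developed in the paper, and the class name `RunningSkewness` is an assumption for this example.

```python
import math


class RunningSkewness:
    """One-pass mean, variance and skewness using O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.m3 = 0.0  # running sum of cubed deviations from the mean

    def update(self, x):
        n_prev = self.n
        self.n += 1
        delta = x - self.mean
        delta_n = delta / self.n
        term = delta * delta_n * n_prev
        self.mean += delta_n
        # m3 is updated before m2 because its update needs the old m2.
        self.m3 += term * delta_n * (self.n - 2) - 3.0 * delta_n * self.m2
        self.m2 += term

    def skewness(self):
        """Population skewness g1; 0.0 when it is undefined."""
        if self.n < 2 or self.m2 == 0.0:
            return 0.0
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5


def batch_skewness(values):
    """Two-pass reference computation used to check the running version."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    m3 = sum((v - mean) ** 3 for v in values)
    return math.sqrt(n) * m3 / m2 ** 1.5


# Simulated sensor stream with a long right tail.
stream = [1.0, 2.0, 3.0, 4.0, 10.0, 50.0]
monitor = RunningSkewness()
for reading in stream:
    monitor.update(reading)
```

Since no past readings are stored, a monitor like this fits the constraint described in the abstract that the data flow cannot be kept on the device.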


Author(s):  
Santha Subbulaxmi S ◽  
Arumugam G

Skewed data distributions prevail in many real-world applications. The skewness arises from imbalance in the class distribution, and it degrades the performance of traditional classification algorithms. In this paper, we present a Grey Wolf optimized K-Means cluster-based oversampling algorithm to handle the skewness and solve the imbalanced data classification problem. Experiments are conducted on the proposed algorithm, and it is compared with popular benchmark algorithms. The results reveal that the proposed algorithm outperforms the benchmark algorithms.


2021 ◽  
Author(s):  
Thomas Hiessl

Machine Learning (ML) is increasingly applied in industrial manufacturing, but performance is often limited by insufficient training data. While ML models can benefit from collaboration, individual manufacturers cannot share data directly because of privacy concerns. Federated Learning (FL) enables collaborative training of ML models without revealing raw data. However, current FL approaches fail to take the characteristics and requirements of industrial clients into account. In this work, we propose an FL system consisting of a process description and a software architecture to provide Federated Learning as a Service (FLaaS) to industrial clients deployed to edge devices. Our approach deals with skewed data by organizing clients into cohorts with similar data distributions. We evaluated the system on two industrial datasets. We show how the FLaaS approach provides FL to client processes by considering their requests submitted to the Industrial Federated Learning (IFL) Services API. Experiments on both industrial datasets and with different FL algorithms show that the proposed cohort building can notably increase ML model performance.

