Addressing Class Overlap under Imbalanced Distribution: An Improved Method and Two Metrics

Symmetry ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1649
Author(s):  
Zhuang Li ◽  
Jingyan Qin ◽  
Xiaotong Zhang ◽  
Yadong Wan

Class imbalance, as a phenomenon of asymmetry, has an adverse effect on the performance of most machine learning algorithms, and class overlap is another important factor that affects their classification performance. This paper deals with both factors simultaneously by addressing class overlap under imbalanced distributions. First, a theoretical analysis of existing class overlap metrics is conducted. Then, based on this analysis, an improved method and corresponding metrics for evaluating class overlap under imbalanced distributions are proposed. A well-known collection of imbalanced datasets is used to compare the performance of the different metrics, and performance is evaluated with the Pearson correlation coefficient and the ξ correlation coefficient. The experimental results demonstrate that the proposed class overlap metrics outperform the other metrics compared on the imbalanced datasets, and that the Pearson correlation coefficient with the AUC of eight algorithms can be improved by 34.7488% on average.
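The evaluation above correlates each overlap metric with classifier AUC across datasets. The following minimal sketch computes both coefficients in plain Python. It assumes that "ξ correlation coefficient" refers to Chatterjee's rank-based ξ and uses its no-ties form. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def xi_correlation(x, y):
    """Chatterjee's xi (assumed meaning of 'xi' here), no-ties formula.

    Pairs are sorted by x; r_i counts how many y values are <= the i-th
    y value in that order. xi = 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1).
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    ranks = [sum(1 for v in y_sorted if v <= yi) for yi in y_sorted]
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1 - 3 * total / (n * n - 1)


# Hypothetical overlap-metric values and mean AUCs over five datasets.
overlap = [0.10, 0.25, 0.40, 0.55, 0.70]
auc = [0.95, 0.90, 0.82, 0.75, 0.66]
print(pearson(overlap, auc), xi_correlation(overlap, auc))
```

Pearson's r is symmetric and signed. ξ is neither: it measures how well y is determined by x, whatever the direction of the relationship. This is one reason for reporting both coefficients.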

2020 ◽  
Author(s):  
Yuanyuan Peng ◽  
Xinjian Chen ◽  
Yibiao Rong ◽  
Chi Pui Pang ◽  
Xinjian Chen ◽  
...  

BACKGROUND: Advance prediction of the daily incidence of COVID-19 can aid policy making on preventing the spread of the disease, which can profoundly affect people's livelihoods. Previous studies investigated predictions for a single country or territory or for a small number of them. OBJECTIVE: We aimed to develop models that can be applied for real-time prediction of COVID-19 activity in every individual country and territory worldwide. METHODS: Data on previous daily incidence and infoveillance data (search volume data from Google Trends) were collected from 215 individual countries and territories. A random forest regression algorithm was used to train models to predict daily new confirmed cases 7 days ahead. Several methods were used to optimize the models: clustering the countries and territories, selecting features according to their importance scores, performing multiple-step forecasting, and updating the models at regular intervals. Model performance was assessed using the mean absolute error (MAE), root mean square error (RMSE), Pearson correlation coefficient, and Spearman correlation coefficient. RESULTS: Our models accurately predicted the daily new confirmed cases of COVID-19 in most countries and territories. Of the 215 countries and territories under study, 198 (92.1%) had MAEs <10 and 187 (87.0%) had Pearson correlation coefficients >0.8. Across the 215 countries and territories, the mean MAE was 5.42 (range 0.26-15.32), the mean RMSE was 9.27 (range 1.81-24.40), the mean Pearson correlation coefficient was 0.89 (range 0.08-0.99), and the mean Spearman correlation coefficient was 0.84 (range 0.2-1.00). CONCLUSIONS: By integrating previous incidence and Google Trends data, our machine learning algorithm was able to accurately predict the incidence of COVID-19 7 days ahead in most individual countries and territories.
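The METHODS frame the task as 7-day-ahead prediction and score it with MAE, RMSE, and Pearson and Spearman correlation. The following minimal sketch shows that framing and the four metrics. The window length, function names, and toy numbers are hypothetical. The random forest itself is omitted; any regressor could be fitted to the (X, y) pairs produced by `make_windows`.

```python
import math


def make_windows(series, window, horizon=7):
    """Frame a daily series as (features, target) pairs for h-day-ahead forecasting.

    Features are `window` consecutive days; the target is the value
    `horizon` days after the last feature day.
    """
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        end = start + window  # exclusive end of the feature window
        X.append(series[start:end])
        y.append(series[end - 1 + horizon])
    return X, y


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


actual = [10, 12, 15, 20]      # hypothetical observed daily cases
predicted = [11, 12, 13, 22]   # hypothetical 7-day-ahead predictions
print(mae(actual, predicted), rmse(actual, predicted), spearman(actual, predicted))
```

RMSE squares each error, so it penalises occasional large misses more than MAE does. This explains why the reported mean RMSE (9.27) exceeds the mean MAE (5.42).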


2020 ◽  
Vol 19 (01) ◽  
pp. 2040016
Author(s):  
Fahad Alahmari

Data imbalance with respect to class labels has been recognised as a challenging problem for machine learning techniques because it directly affects the performance of classification models. In an imbalanced dataset, most instances belong to one class, while far fewer instances belong to the remaining classes. Most machine learning algorithms tend to favour the majority class and ignore the minority classes, producing classification models that do not generalise. This paper investigates the problem of class imbalance in a medical application, autism spectrum disorder (ASD) screening, to identify the data resampling method that best stabilises classification performance. To this end, experiments measuring the performance of different oversampling and undersampling techniques were conducted on a real imbalanced ASD dataset of adults. The results produced by multiple classifiers on the considered datasets showed superior specificity, sensitivity, and precision, among other measures, when oversampling techniques were adopted in the pre-processing phase.
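The abstract does not name the specific resampling techniques compared. The following sketch shows only the two simplest variants, assumed here for illustration: random oversampling, which duplicates minority rows, and random undersampling, which discards majority rows. The function names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class equal in size to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        members = [row for row, lab in zip(X, y) if lab == label]
        for row in rng.sample(members, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Hypothetical 8:2 imbalanced toy data.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
print(Counter(random_oversample(X, y)[1]), Counter(random_undersample(X, y)[1]))
```

Resampling is normally applied only to the training split. If duplicated minority rows leak into the test split, the evaluation scores become optimistic.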


2020 ◽  
Vol 16 (1) ◽  
pp. 47-53
Author(s):  
Vicente Benavides-Córdoba ◽  
Mauricio Palacios Gómez

Introduction: Animal models have been used to understand the pathophysiology of pulmonary hypertension, to describe mechanisms of action, and to evaluate promising active ingredients. The monocrotaline-induced pulmonary hypertension model is the most widely used animal model. In this model, invasive and non-invasive hemodynamic variables that resemble human measurements have been used. Aim: To determine whether non-invasive variables can predict hemodynamic measures in the monocrotaline-induced pulmonary hypertension model. Materials and Methods: Twenty 6-week-old male Wistar rats weighing 250-300 g, from the bioterium (animal facility) of the Universidad del Valle (Cali, Colombia), were used to establish whether the relationships between invasive and non-invasive variables hold under different conditions (healthy, hypertrophied, and treated). The animals were divided into three groups: a control group given 0.9% saline solution subcutaneously (sc), a group with pulmonary hypertension induced by a single subcutaneous dose of monocrotaline (30 mg/kg), and a group with pulmonary hypertension induced by monocrotaline (30 mg/kg) and treated with sildenafil. Right ventricular ejection fraction (RVEF), heart rate, right ventricular systolic pressure (RVSP), and the extent of right ventricular hypertrophy (Fulton index) were measured. The relationship between each pair of variables was evaluated with the Pearson correlation coefficient. Results: All correlations were statistically significant (p < 0.01). The strongest correlation was the inverse correlation between RVEF and the Fulton index (r = -0.82). The Fulton index was also strongly correlated with RVSP (r = 0.79). The Pearson correlation coefficient between RVEF and RVSP was -0.81, meaning that the higher the right ventricular systolic pressure, the lower the ejection fraction. Heart rate was significantly correlated with the other three variables, although the correlations were relatively weak.
Conclusion: The correlations obtained in this study indicate that the parameters evaluated in experimental pulmonary hypertension correlate adequately and that the measurements currently made are consistent with each other; that is, they have good predictive capacity.
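The Results report Pearson r values with p < 0.01. The following hedged sketch applies the standard t test for a Pearson coefficient to the reported r values and includes the usual Fulton index definition, RV / (LV + septum). The sample size n = 20 per correlation is an assumption, because the abstract does not say whether every animal contributed to every pair. The function and constant names are hypothetical.

```python
import math

# Two-tailed critical t for alpha = 0.01 with 18 degrees of freedom, from
# standard t tables. Assumes n = 20 rats enter every pairwise correlation.
T_CRIT_DF18_ALPHA01 = 2.878


def fulton_index(rv_weight, lv_plus_septum_weight):
    """Right ventricular hypertrophy index: RV / (LV + septum), same weight units."""
    return rv_weight / lv_plus_septum_weight


def pearson_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Correlations reported in the abstract, tested under the assumed n = 20.
for label, r in [("RVEF vs Fulton", -0.82), ("Fulton vs RVSP", 0.79), ("RVEF vs RVSP", -0.81)]:
    t = pearson_t_statistic(r, 20)
    print(label, round(t, 2), abs(t) > T_CRIT_DF18_ALPHA01)
```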


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 156
Author(s):  
Charles Carlson ◽  
Vanessa-Rose Turpin ◽  
Ahmad Suliman ◽  
Carl Ade ◽  
Steve Warren ◽  
...  

Background: The goal of this work was to create a sharable dataset of heart-driven signals, including ballistocardiograms (BCGs) and time-aligned electrocardiograms (ECGs), photoplethysmograms (PPGs), and blood pressure waveforms. Methods: A custom, bed-based ballistocardiographic system is described in detail. Affiliated cardiopulmonary signals are acquired using a GE Datex CardioCap 5 patient monitor (which collects ECG and PPG data) and a Finapres Medical Systems Finometer PRO (which provides continuous reconstructed brachial artery pressure waveforms and derived cardiovascular parameters). Results: Data were collected from 40 participants, 4 of whom had been or were currently diagnosed with a heart condition at the time they enrolled in the study. An investigation revealed that features extracted from a BCG could be used to track changes in systolic blood pressure (Pearson correlation coefficient of 0.54 ± 0.15), dP/dtmax (Pearson correlation coefficient of 0.51 ± 0.18), and stroke volume (Pearson correlation coefficient of 0.54 ± 0.17). Conclusion: A collection of synchronized, heart-driven signals, including BCGs, ECGs, PPGs, and blood pressure waveforms, was acquired and made publicly available. An initial study indicated that bed-based ballistocardiography can be used to track beat-to-beat changes in systolic blood pressure and stroke volume. Significance: To the best of the authors’ knowledge, no other publicly available database includes time-aligned ECG, PPG, BCG, and continuous blood pressure data. Other researchers could use this dataset for algorithm testing and development in this fast-growing field of health assessment without investing considerable time and resources in hardware development and data collection.
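Values such as 0.54 ± 0.15 read naturally as a mean ± standard deviation of per-participant Pearson coefficients between a BCG-derived feature and a reference measurement. The abstract does not state this explicitly, so the following sketch is only one plausible way to produce such a summary. The data layout and function names are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_sd_of_per_subject_r(subjects):
    """subjects maps a participant ID to (bcg_feature_series, reference_series).

    Returns the mean and sample standard deviation of the per-participant r.
    """
    rs = [pearson(x, y) for x, y in subjects.values()]
    mean = sum(rs) / len(rs)
    sd = math.sqrt(sum((r - mean) ** 2 for r in rs) / (len(rs) - 1))
    return mean, sd


# Hypothetical beat-by-beat feature and systolic pressure series for two people.
subjects = {"A": ([1, 2, 3], [1, 2, 3]), "B": ([1, 2, 3, 4], [1, 3, 2, 4])}
print(mean_sd_of_per_subject_r(subjects))
```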


Water ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 82
Author(s):  
Omolola M. Adisa ◽  
Muthoni Masinde ◽  
Joel O. Botai

This study examines the (dis)similarity between two commonly used indices: the Standardized Precipitation Index (SPI), computed over 1-, 3-, 6-, and 12-month accumulation periods (hereafter SPI-1, SPI-3, SPI-6, and SPI-12, respectively), and the Effective Drought Index (EDI). The analysis is based on two drought monitoring indicators derived from SPI and EDI, namely Drought Duration (DD) and Drought Severity (DS), across the 93 rainfall districts delineated by the South African Weather Service over South Africa from 1980 to 2019. The Pearson correlation coefficient dissimilarity and periodogram dissimilarity estimates were used. The results indicate a positive correlation for the Pearson correlation coefficient dissimilarity and a positive value for the periodogram dissimilarity for both DD and DS. With the Pearson correlation coefficient dissimilarity, the SPI-1/EDI and SPI-3/EDI pairs exhibit the highest similarity for DD, while the SPI-6/EDI pair shows the highest similarity for DS. Dissimilarities are more pronounced in the SPI-12/EDI pair for both DD and DS. With the periodogram dissimilarity, the SPI-1/EDI and SPI-6/EDI pairs exhibit the highest similarity for DD, while the SPI-1/EDI pair displays the highest similarity for DS. Overall, the two measures show that the highest similarity is obtained in the SPI-1/EDI pair for DS. These results contribute to an in-depth understanding of the deviation between EDI and SPI values for South Africa. They indicate that the two indices are interchangeable in some rainfall districts of South Africa for drought monitoring and prediction, which is a step towards selecting appropriate drought indices.
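The abstract does not give formulas for the two dissimilarity measures. The following sketch uses one common form of each, assumed rather than taken from the study. The correlation-based dissimilarity is d = sqrt(2(1 - r)). The periodogram dissimilarity is the Euclidean distance between raw periodograms at the Fourier frequencies 2πk/n, k = 1, ..., ⌊n/2⌋, scaled by 1/n. The function names and the DD/DS-style series are hypothetical.

```python
import cmath
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_dissimilarity(x, y):
    """sqrt(2 * (1 - r)): 0 for perfectly correlated series, 2 for perfectly anti-correlated ones."""
    r = pearson(x, y)
    return math.sqrt(2 * max(0.0, 1 - r))  # clamp guards against r slightly above 1 from rounding


def periodogram(x):
    """Raw periodogram |sum_t x_t exp(-i 2 pi k t / n)|^2 / n for k = 1..n//2."""
    n = len(x)
    values = []
    for k in range(1, n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        values.append(abs(s) ** 2 / n)
    return values


def periodogram_dissimilarity(x, y):
    """Euclidean distance between the two periodograms, scaled by 1/n (assumed normalisation)."""
    px, py = periodogram(x), periodogram(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(px, py))) / len(x)


# Hypothetical yearly drought-duration series for one district from two indices.
spi_dd = [3, 5, 2, 6, 4, 7, 3, 5]
edi_dd = [2, 5, 3, 6, 3, 8, 2, 5]
print(correlation_dissimilarity(spi_dd, edi_dd), periodogram_dissimilarity(spi_dd, edi_dd))
```

The two measures capture different aspects of the series. The correlation-based measure compares the series point by point in time. The periodogram measure compares how variance is distributed across frequencies. The two can therefore rank index pairs differently, as the abstract reports.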


PEDIATRICS ◽  
1991 ◽  
Vol 87 (5) ◽  
pp. 708-711
Author(s):  
Matthew W. Gillman ◽  
Bernard Rosner ◽  
Denis A. Evans ◽  
Laurel A. Smith ◽  
James O. Taylor ◽  
...  

Previous studies of childhood blood pressure have shown tracking correlations, which estimate the magnitude of association between initial and subsequent measurements, to be lower than corresponding adult values. Inasmuch as this disparity could arise from failing to account for a larger week-to-week variability in children, blood pressure was measured for 4 successive years, on four weekly visits in each year, and with three measurements at each visit, using a random-zero sphygmomanometer, in a cohort of 333 schoolchildren aged 8 through 15 at entry. Ninety percent of subjects had measurements in 1 or more years of follow-up. For all follow-up periods (1, 2, and 3 years from baseline), the Pearson correlation coefficient (r) for both systolic and diastolic blood pressure rose substantially with the number of weekly visits used to calculate each subject's yearly blood pressure (P < .0001). For systolic pressure, the 3-year r values for 1, 2, 3, and 4 visits were .45, .55, .64, and .69, respectively. For diastolic pressure (Korotkoff phase 4), the corresponding values were .28, .41, .47, and .54. These higher multiple-visit estimates of tracking approximate published adult values and raise the possibility that prediction of adult blood pressure from childhood measurements may be improved by averaging readings from multiple weekly visits.
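The classical measurement-error (attenuation) model explains why averaging visits raises tracking correlations. Suppose a single visit's reading has reliability ρ. The Spearman-Brown formula then gives the reliability of the mean of k visits as kρ / (1 + (k - 1)ρ). The observed correlation between two yearly means equals the true correlation multiplied by the square root of the product of their reliabilities. The following sketch illustrates that model with hypothetical parameter values. It is not the authors' analysis and is not fitted to their data.

```python
import math


def spearman_brown(single_visit_reliability, k):
    """Reliability of the mean of k parallel measurements."""
    rho = single_visit_reliability
    return k * rho / (1 + (k - 1) * rho)


def expected_tracking_r(true_r, single_visit_reliability, k):
    """Attenuated correlation between two yearly means, each averaged over k visits."""
    rel = spearman_brown(single_visit_reliability, k)
    return true_r * math.sqrt(rel * rel)  # both years assumed to have equal reliability


# Hypothetical values: true year-to-year correlation 0.8, single-visit reliability 0.5.
for k in range(1, 5):
    print(k, round(expected_tracking_r(0.8, 0.5, k), 3))
```

Under this model, the gain from each extra visit shrinks as k grows. This matches the direction, though not a fitted magnitude, of the rise from .45 to .69 reported for systolic pressure.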


2021 ◽  
Vol 58 (8) ◽  
pp. 0810025
Author(s):  
李硕 Li Shuo ◽  
韩迎东 Han Yingdong ◽  
王双 Wang Shuang ◽  
刘琨 Liu Kun ◽  
江俊峰 Jiang Junfeng ◽  
...  
