Monotonic classification: An overview on algorithms, performance measures and data sets

2019 ◽  
Vol 341 ◽  
pp. 168-182 ◽  
Author(s):  
José-Ramón Cano ◽  
Pedro Antonio Gutiérrez ◽  
Bartosz Krawczyk ◽  
Michał Woźniak ◽  
Salvador García
2011 ◽  
Author(s):  
Marina Altynova ◽  
Ed Wasser ◽  
Telford Berkey ◽  
Sanjay Boddhu ◽  
Tin Sa ◽  
...  

2011 ◽  
Author(s):  
John Talburt ◽  
Serhan Dagtas ◽  
Mariofanna Milanova ◽  
Mihail Tudoreanu ◽  
Brian Tsou

Rapid technological growth now generates huge amounts of data. Various tools are used to explore these data, and they provide data mining techniques that can be applied to the resulting data sets. Classification is needed to extract useful information from, or to predict outcomes for, such large volumes of data, and many classification algorithms exist for this purpose. In this paper, we compare the Naive Bayes, K*, and random forest classification algorithms using the Weka tool. To analyze the performance of these three algorithms, we consider three data sets: diabetes, supermarket, and weather. The analysis is based on the confusion matrix and on performance measures such as RMSE, MAE, and ROC.
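The measures named in this abstract can be illustrated with a minimal sketch. It is not the paper's Weka workflow: the toy labels and probabilities are hypothetical, and MAE and RMSE are assumed here to be taken between 0/1 class indicators and predicted probabilities of the "yes" class.

```python
import math

def confusion_matrix(y_true, y_pred, labels):
    """Return nested counts indexed as counts[actual][predicted]."""
    counts = {a: {p: 0 for p in labels} for a in labels}
    for actual, predicted in zip(y_true, y_pred):
        counts[actual][predicted] += 1
    return counts

def mae(targets, estimates):
    """Mean absolute error between target values and estimates."""
    return sum(abs(t - e) for t, e in zip(targets, estimates)) / len(targets)

def rmse(targets, estimates):
    """Root mean squared error between target values and estimates."""
    squared = sum((t - e) ** 2 for t, e in zip(targets, estimates))
    return math.sqrt(squared / len(targets))

# Hypothetical toy data, not taken from the diabetes, supermarket or weather sets.
y_true = ["yes", "yes", "no", "no", "yes"]
y_pred = ["yes", "no", "no", "no", "yes"]
p_yes = [0.9, 0.4, 0.2, 0.1, 0.8]  # assumed predicted probability of "yes"

# Assumption: errors are measured against a 0/1 indicator of the true class.
targets = [1.0 if y == "yes" else 0.0 for y in y_true]

cm = confusion_matrix(y_true, y_pred, ["yes", "no"])
mae_value = mae(targets, p_yes)
rmse_value = rmse(targets, p_yes)
```

Reading `cm["yes"]["no"]` gives the number of actual "yes" instances predicted as "no", which is the kind of cell-level comparison a confusion-matrix analysis relies on.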


Author(s):  
D. Rajakumari

Data mining is the process of analyzing enormous amounts of data and summarizing them into useful knowledge. The use of data mining approaches is growing quickly, particularly classification techniques, which offer an efficient way of classifying data and are important in the decision-making process for medical practitioners. This study presents quantization and validation (OQV) techniques for fast outlier detection in large WDBC data sets. The use of distance metrics makes the algorithm linear for various objects and ensures sequential scanning. The inclusion of a direct quantization technique and explicit cluster discovery ensures simplicity and economy. A comparative analysis of the proposed OQV techniques against triangular boundary-based classification and Weighing-based Feature Selection and Monotonic Classification (WFSMC), in terms of accuracy, precision, recall, and number of attributes, demonstrates the effectiveness of OQV for large data sets.


Author(s):  
Sherif Ishak ◽  
Ciprian Alecsandru

The characteristics of preincident, postincident, and nonincident traffic conditions on freeways are investigated. The characteristics are defined by second-order statistical measures derived from spatiotemporal speed contour maps. Four performance measures are used to quantify properties such as smoothness, homogeneity, and randomness in traffic conditions in a manner similar to texture characterization of digital images. With real-world incident and traffic data sets, statistical analysis was conducted to seek distinctive characteristics of three groups of traffic operating conditions: preincident, postincident, and nonincident. The study results showed that the spatiotemporal characteristics of each of the three groups were not discernible. Although the distributions of performance measures within each group are statistically different, no consistent pattern was detected to imply that certain characteristics could increase the likelihood of incidents or identify precursory conditions to incidents.
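The texture analogy in this abstract can be sketched with a co-occurrence matrix, a common way of deriving second-order statistics from a gridded map. The abstract does not name its four measures, so the energy, homogeneity and entropy below are assumed stand-ins, and the grids of quantized speed levels are hypothetical.

```python
import math

def cooccurrence(grid, dx=1, dy=0):
    """Normalized co-occurrence probabilities of level pairs at offset (dx, dy)."""
    counts = {}
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pair = (grid[r][c], grid[r2][c2])
                counts[pair] = counts.get(pair, 0) + 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def texture_measures(p):
    """Second-order statistics computed from co-occurrence probabilities."""
    energy = sum(v * v for v in p.values())
    homogeneity = sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items())
    entropy = -sum(v * math.log2(v) for v in p.values())
    return {"energy": energy, "homogeneity": homogeneity, "entropy": entropy}

# Hypothetical speed contour maps: rows are time steps, columns are
# freeway segments, values are quantized speed levels.
smooth_map = [[2, 2, 2], [2, 2, 2]]
choppy_map = [[0, 3, 0], [3, 0, 3]]

smooth = texture_measures(cooccurrence(smooth_map))
choppy = texture_measures(cooccurrence(choppy_map))
```

Under these assumptions, a uniform map scores high on homogeneity and low on entropy, while an alternating map does the opposite, which is the sense in which such measures quantify smoothness and randomness.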


2019 ◽  
Author(s):  
Suranga N Kasthurirathne ◽  
Shaun Grannis ◽  
Paul K Halverson ◽  
Justin Morea ◽  
Nir Menachemi ◽  
...  

BACKGROUND Emerging interest in precision health and the increasing availability of patient- and population-level data sets present considerable potential to enable analytical approaches to identify and mitigate the negative effects of social factors on health. These issues are not satisfactorily addressed in typical medical care encounters, and thus, opportunities to improve health outcomes, reduce costs, and improve coordination of care are not realized. Furthermore, methodological expertise on the use of varied patient- and population-level data sets and machine learning to predict need for supplemental services is limited. OBJECTIVE The objective of this study was to leverage a comprehensive range of clinical, behavioral, social risk, and social determinants of health factors in order to develop decision models capable of identifying patients in need of various wraparound social services. METHODS We used comprehensive patient- and population-level data sets to build decision models capable of predicting need for behavioral health, dietitian, social work, or other social service referrals within a safety-net health system using area under the receiver operating characteristic curve (AUROC), sensitivity, precision, F1 score, and specificity. We also evaluated the value of population-level social determinants of health data sets in improving machine learning performance of the models. RESULTS Decision models for each wraparound service demonstrated performance measures ranging between 59.2% and 99.3%.
These results were statistically superior to the performance measures demonstrated by our previous models which used a limited data set and whose performance measures ranged from 38.2% to 88.3% (behavioural health: F1 score P<.001, AUROC P=.01; social work: F1 score P<.001, AUROC P=.03; dietitian: F1 score P=.001, AUROC P=.001; other: F1 score P=.01, AUROC P=.02); however, inclusion of additional population-level social determinants of health did not contribute to any performance improvements (behavioural health: F1 score P=.08, AUROC P=.09; social work: F1 score P=.16, AUROC P=.09; dietitian: F1 score P=.08, AUROC P=.14; other: F1 score P=.33, AUROC P=.21) in predicting the need for referral in our population of vulnerable patients seeking care at a safety-net provider. CONCLUSIONS Precision health–enabled decision models that leverage a wide range of patient- and population-level data sets and advanced machine learning methods are capable of predicting need for various wraparound social services with good performance.


2019 ◽  
Author(s):  
Anna C. Gilbert ◽  
Alexander Vargo

Here, we evaluate the performance of a variety of marker selection methods on scRNA-seq UMI counts data. We test on an assortment of experimental and synthetic data sets that range in size from several thousand to one million cells. In addition, we propose several performance measures for evaluating the quality of a set of markers when there is no known ground truth. According to these metrics, most existing marker selection methods show similar performance on experimental scRNA-seq data; thus, the speed of the algorithm is the most important consideration for large data sets. With this in mind, we introduce RANKCORR, a fast marker selection method with strong mathematical underpinnings that takes a step towards sensible multi-class marker selection.


10.2196/16129 ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. e16129 ◽  
Author(s):  
Suranga N Kasthurirathne ◽  
Shaun Grannis ◽  
Paul K Halverson ◽  
Justin Morea ◽  
Nir Menachemi ◽  
...  

Background Emerging interest in precision health and the increasing availability of patient- and population-level data sets present considerable potential to enable analytical approaches to identify and mitigate the negative effects of social factors on health. These issues are not satisfactorily addressed in typical medical care encounters, and thus, opportunities to improve health outcomes, reduce costs, and improve coordination of care are not realized. Furthermore, methodological expertise on the use of varied patient- and population-level data sets and machine learning to predict need for supplemental services is limited. Objective The objective of this study was to leverage a comprehensive range of clinical, behavioral, social risk, and social determinants of health factors in order to develop decision models capable of identifying patients in need of various wraparound social services. Methods We used comprehensive patient- and population-level data sets to build decision models capable of predicting need for behavioral health, dietitian, social work, or other social service referrals within a safety-net health system using area under the receiver operating characteristic curve (AUROC), sensitivity, precision, F1 score, and specificity. We also evaluated the value of population-level social determinants of health data sets in improving machine learning performance of the models. Results Decision models for each wraparound service demonstrated performance measures ranging between 59.2% and 99.3%.
These results were statistically superior to the performance measures demonstrated by our previous models which used a limited data set and whose performance measures ranged from 38.2% to 88.3% (behavioural health: F1 score P<.001, AUROC P=.01; social work: F1 score P<.001, AUROC P=.03; dietitian: F1 score P=.001, AUROC P=.001; other: F1 score P=.01, AUROC P=.02); however, inclusion of additional population-level social determinants of health did not contribute to any performance improvements (behavioural health: F1 score P=.08, AUROC P=.09; social work: F1 score P=.16, AUROC P=.09; dietitian: F1 score P=.08, AUROC P=.14; other: F1 score P=.33, AUROC P=.21) in predicting the need for referral in our population of vulnerable patients seeking care at a safety-net provider. Conclusions Precision health–enabled decision models that leverage a wide range of patient- and population-level data sets and advanced machine learning methods are capable of predicting need for various wraparound social services with good performance.
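For readers unfamiliar with the measures reported in this abstract, the sketch below shows how sensitivity, specificity, precision, F1 score, and AUROC are commonly computed for a binary referral decision. The counts and scores are hypothetical and are not the study's data; AUROC is computed as the pairwise ranking probability, one standard formulation among several.

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive standard measures from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts for one referral type (for example, dietitian referrals).
metrics = binary_metrics(tp=8, fp=2, tn=6, fn=4)

# Hypothetical model scores: 1 marks a patient who needed the referral.
auc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form of AUROC is quadratic in the number of patients, so it suits small illustrations; rank-based implementations are the usual choice for large data sets.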


Author(s):  
Ana Gainaru ◽  
Hongyang Sun ◽  
Guillaume Aupy ◽  
Yuankai Huo ◽  
Bennett A Landman ◽  
...  

Scientific insights in the coming decade will clearly depend on the effective processing of large data sets generated by dynamic heterogeneous applications typical of workflows in large data centers or of emerging fields like neuroscience. In this article, we show how these big data workflows have a unique set of characteristics that pose challenges for leveraging HPC methodologies, particularly in scheduling. Our findings indicate that execution times for these workflows are highly unpredictable and are not correlated with the size of the data set involved or the precise functions used in the analysis. We characterize this inherent variability and sketch the need for new scheduling approaches by quantifying significant gaps in achievable performance. Through simulations, we show how on-the-fly scheduling approaches can deliver benefits in both system-level and user-level performance measures. On average, we find improvements of up to 35% in system utilization and up to 45% in average stretch of the applications, illustrating the potential of increasing performance through new scheduling approaches.


2020 ◽  
Vol 21 (4) ◽  
Author(s):  
Roman Dębski

One of the key elements of real-time $C^1$-continuous cubic spline interpolation of streaming data is an estimator of the first derivative of the interpolated function that is more accurate than the ones based on finite difference schemas. Two such greedy look-ahead heuristic estimators (denoted as MinBE and MinAJ2) based on Calculus of Variations are formally defined (in closed form) together with the corresponding cubic splines they generate, and then comparatively evaluated in a series of numerical experiments involving different types of performance measures. The results presented show that the cubic Hermite splines generated by heuristic MinAJ2 significantly outperformed those based on finite difference schemas in terms of all tested performance measures (including convergence). The proposed approach is quite general. It can be directly applied to streams of univariate functional data like time-series. Multidimensional curves defined parametrically, after splitting, can be handled as well. The streaming character of the algorithm means that it can also be useful in processing data sets that are too large to fit in memory (e.g., edge computing devices, embedded time-series databases).
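As background for this abstract, the sketch below shows the finite-difference baseline it refers to: a cubic Hermite spline whose first derivatives are estimated by central differences. It does not implement MinBE or MinAJ2, which the abstract does not define; the function names and sample data are hypothetical.

```python
def finite_difference_slopes(xs, ys):
    """Estimate first derivatives: one-sided at the ends, central in the interior."""
    n = len(xs)
    slopes = []
    for i in range(n):
        if i == 0:
            slopes.append((ys[1] - ys[0]) / (xs[1] - xs[0]))
        elif i == n - 1:
            slopes.append((ys[-1] - ys[-2]) / (xs[-1] - xs[-2]))
        else:
            slopes.append((ys[i + 1] - ys[i - 1]) / (xs[i + 1] - xs[i - 1]))
    return slopes

def hermite_segment(x0, x1, y0, y1, m0, m1, x):
    """Evaluate the cubic Hermite polynomial on [x0, x1] at x."""
    h = x1 - x0
    t = (x - x0) / h
    h00 = 2 * t**3 - 3 * t**2 + 1  # weight of y0
    h10 = t**3 - 2 * t**2 + t      # weight of h * m0
    h01 = -2 * t**3 + 3 * t**2     # weight of y1
    h11 = t**3 - t**2              # weight of h * m1
    return h00 * y0 + h10 * h * m0 + h01 * y1 + h11 * h * m1

# Hypothetical samples of y = x^2 arriving as a stream.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [x * x for x in xs]
slopes = finite_difference_slopes(xs, ys)

# Interpolate inside the segment [1, 2]; the central difference at each
# knot needs one sample of look-ahead.
value = hermite_segment(xs[1], xs[2], ys[1], ys[2], slopes[1], slopes[2], 1.5)
```

Because adjacent segments share the same value and slope at each knot, the resulting curve is $C^1$-continuous regardless of how the slopes are estimated; the estimator only affects accuracy.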

