The Elephant in the Machine: Proposing a New Metric of Data Reliability and its Application to a Medical Case to Assess Classification Reliability

2020 ◽  
Vol 10 (11) ◽  
pp. 4014 ◽  
Author(s):  
Federico Cabitza ◽  
Andrea Campagner ◽  
Domenico Albano ◽  
Alberto Aliprandi ◽  
Alberto Bruno ◽  
...  

In this paper, we present and discuss a novel reliability metric to quantify the extent to which a ground truth, generated in multi-rater settings, can serve as a reliable basis for the training and validation of machine learning predictive models. To define this metric, three dimensions are taken into account: agreement (that is, how much a group of raters mutually agree on a single case); confidence (that is, how certain a rater is of each rating expressed); and competence (that is, how accurate a rater is). The metric therefore produces a reliability score weighted for the raters' confidence and competence, but only the former needs to be collected explicitly, as the latter can be estimated from the ratings themselves if no further information is available. We found that our proposal was both more conservative and more robust to known paradoxes than other existing agreement measures, by virtue of a more articulated notion of chance agreement, based on an empirical estimation of the reliability of the individual raters involved. We discuss the above metric within a realistic annotation task that involved 13 expert radiologists labeling the MRNet dataset. We also provide a nomogram by which to assess the actual accuracy of a classification model, given the reliability of its ground truth. In this respect, we also make the point that theoretical estimates of model performance are consistently overestimated if ground truth reliability is not properly taken into account.
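As an illustration only, the following sketch computes a confidence- and competence-weighted agreement score in the spirit described above, with competence estimated from each rater's agreement with the per-case majority label; it is an assumed simplification, not the authors' exact metric or their chance-agreement correction.

```python
# Illustrative sketch only: a confidence- and competence-weighted agreement
# score in the spirit of the paper's metric. The authors' exact formulation
# (including their chance-agreement correction) is not reproduced here.
import numpy as np

def weighted_reliability(ratings, confidence):
    """ratings: (n_cases, n_raters) int labels; confidence: same shape, in (0, 1]."""
    # Estimate competence from the ratings themselves: each rater's rate of
    # agreement with the per-case majority label (a stand-in for accuracy).
    majority = np.array([np.bincount(row).argmax() for row in ratings])
    competence = (ratings == majority[:, None]).mean(axis=0)      # (n_raters,)
    # Per-case agreement: weighted fraction of raters matching the majority,
    # with weights combining each rater's confidence and competence.
    weights = confidence * competence[None, :]                    # (n_cases, n_raters)
    agree = (ratings == majority[:, None]).astype(float)
    per_case = (weights * agree).sum(axis=1) / weights.sum(axis=1)
    return per_case.mean()

# Example: 4 cases rated by 3 raters with stated confidences.
R = np.array([[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]])
C = np.array([[0.9, 0.8, 0.4], [0.7, 0.9, 0.6], [1.0, 0.8, 0.9], [0.5, 0.6, 0.7]])
print(weighted_reliability(R, C))
```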

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
João Lobo ◽  
Rui Henriques ◽  
Sara C. Madeira

Abstract. Background: Three-way data have gained popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions over time, urban dynamics, or complex geophysical phenomena. Triclustering, the subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with the state of the art is paramount. These comparisons are usually performed on real data without a known ground truth, thus limiting the assessment. In this context, we propose G-Tric, a synthetic data generator allowing the creation of datasets with configurable properties and the possibility to plant triclusters. The generator can create datasets resembling real three-way data from biomedical and social domains, with the additional advantage of providing the ground truth (the triclustering solution) as output. Results: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. Conclusions: Triclustering evaluation using G-Tric makes it possible to combine intrinsic and extrinsic metrics to compare solutions, yielding more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the evaluation of new triclustering approaches.
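To make the generator's idea concrete, here is a minimal sketch of planting a constant-pattern tricluster in a random three-way tensor and returning it as ground truth; the function and parameter names are illustrative and do not reflect G-Tric's actual interface.

```python
# Hypothetical sketch of what a G-Tric-style generator does conceptually:
# plant a coherent tricluster in a random three-way (observations x features
# x contexts) tensor and return the ground-truth indices.
import numpy as np

rng = np.random.default_rng(0)

def plant_tricluster(shape=(50, 30, 10), tric_shape=(8, 5, 3), value=5.0, noise=0.1):
    data = rng.normal(0.0, 1.0, size=shape)                   # background distribution
    obs = rng.choice(shape[0], tric_shape[0], replace=False)  # tricluster indices
    feat = rng.choice(shape[1], tric_shape[1], replace=False)
    ctx = rng.choice(shape[2], tric_shape[2], replace=False)
    # Constant-pattern tricluster with additive noise; other pattern types
    # (e.g. additive or multiplicative across a dimension) would vary `value`.
    data[np.ix_(obs, feat, ctx)] = value + rng.normal(0.0, noise, size=tric_shape)
    return data, (obs, feat, ctx)  # tensor plus ground-truth solution

tensor, solution = plant_tricluster()
```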


2021 ◽  
Author(s):  
Ali Abdolali ◽  
Andre van der Westhuysen ◽  
Zaizhong Ma ◽  
Avichal Mehra ◽  
Aron Roland ◽  
...  

Abstract. Various uncertainties exist in a hindcast due to the inability of numerical models to resolve all the complicated atmosphere-sea interactions, and due to the lack of certain ground truth observations. Here, a comprehensive analysis of an atmospheric model in hindcast mode (the Hurricane Weather Research and Forecasting model, HWRF) and its 40 ensemble members during severe events is conducted, evaluating the model's accuracy and uncertainty for hurricane track parameters and for wind speed collected along satellite altimeter tracks and at stationary source point observations. Subsequently, the downstream spectral wave model WAVEWATCH III is forced by two sets of wind fields of 40 members each: the first randomly extracted from the original HWRF simulations, the second based on the spread of best-track parameters. The atmospheric model spread and wave model error along satellite altimeter tracks and at stationary source point observations are estimated. The study of Hurricane Irma reveals that wind and wave observations during this extreme event fall within the ensemble spreads. While both models have wide spreads over areas with landmass, the maximum uncertainty in the atmospheric model is at the hurricane eye, in contrast to the wave model.
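As a rough illustration of the spread analysis described above, the following sketch computes a per-point ensemble spread and checks whether observations fall inside the ensemble envelope; all array names, shapes, and values are assumptions.

```python
# Minimal sketch of the kind of spread statistic used to check whether
# observations fall within the ensemble envelope; names are assumptions.
import numpy as np

# wind_members: (n_members, n_points) wind speeds from 40 ensemble members,
# sampled along a satellite altimeter track; obs: (n_points,) observations.
def within_spread(wind_members, obs):
    lo = wind_members.min(axis=0)       # lower bound of the ensemble envelope
    hi = wind_members.max(axis=0)       # upper bound
    spread = wind_members.std(axis=0)   # per-point ensemble spread
    inside = (obs >= lo) & (obs <= hi)
    return spread, inside.mean()        # spread profile, fraction of obs inside

members = np.random.default_rng(1).normal(25, 3, size=(40, 200))
obs = np.random.default_rng(2).normal(25, 2, size=200)
spread, frac_inside = within_spread(members, obs)
```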


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Young-Gon Kim ◽  
Sungchul Kim ◽  
Cristina Eunbee Cho ◽  
In Hye Song ◽  
Hee Jin Lee ◽  
...  

Abstract. Fast and accurate confirmation of metastasis on frozen tissue sections during intraoperative sentinel lymph node biopsy is an essential tool for critical surgical decisions. However, accurate diagnosis by pathologists is difficult within the time limitations. Training a robust and accurate deep learning model is also difficult owing to the limited number of frozen datasets with high-quality labels. To overcome these issues, we validated the effectiveness of transfer learning from CAMELYON16 to improve the performance of a convolutional neural network (CNN)-based classification model on our frozen dataset (N = 297) from Asan Medical Center (AMC). Among the 297 whole slide images (WSIs), 157 and 40 WSIs were used to train deep learning models with different training dataset ratios of 2, 4, 8, 20, 40, and 100%. The remaining 100 WSIs were used to validate model performance in terms of patch- and slide-level classification. An additional 228 WSIs from Seoul National University Bundang Hospital (SNUBH) were used as an external validation set. Three initial weights, i.e., scratch-based (random initialization), ImageNet-based, and CAMELYON16-based, were compared to validate their effectiveness on the external validation set. In the patch-level classification results on the AMC dataset, CAMELYON16-based models trained with a small dataset (up to 40%, i.e., 62 WSIs) showed a significantly higher area under the curve (AUC) of 0.929 than the scratch- and ImageNet-based models at 0.897 and 0.919, respectively, while CAMELYON16-based and ImageNet-based models trained with 100% of the training dataset showed comparable AUCs of 0.944 and 0.943, respectively. For the external validation, CAMELYON16-based models showed higher AUCs than the scratch- and ImageNet-based models. The feasibility of transfer learning to enhance model performance was thus validated for frozen section datasets of limited size.
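The three initialization schemes can be sketched generically as below, using a torchvision ResNet-18 as a stand-in backbone; the authors' actual architecture is not specified here, and the CAMELYON16 checkpoint path is hypothetical.

```python
# Illustrative sketch of the three initialization schemes compared in the
# paper (scratch, ImageNet, CAMELYON16); ResNet-18 is a stand-in backbone
# and "camelyon16_pretrained.pth" is a hypothetical checkpoint path.
import torch
import torchvision.models as models

def build_model(init="imagenet", n_classes=2):
    if init == "scratch":
        net = models.resnet18(weights=None)   # random initialization
    elif init == "imagenet":
        net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    elif init == "camelyon16":
        net = models.resnet18(weights=None)
        # Hypothetical checkpoint from pretraining on CAMELYON16 patches,
        # assumed to match this backbone's state dict.
        net.load_state_dict(torch.load("camelyon16_pretrained.pth"))
    # Replace the classification head for tumor vs. normal patches.
    net.fc = torch.nn.Linear(net.fc.in_features, n_classes)
    return net

model = build_model("imagenet")
```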


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4050
Author(s):  
Dejan Pavlovic ◽  
Christopher Davison ◽  
Andrew Hamilton ◽  
Oskar Marko ◽  
Robert Atkinson ◽  
...  

Monitoring cattle behaviour is core to the early detection of health and welfare issues and to optimising the fertility of large herds. Accelerometer-based sensor systems that provide activity profiles are now used extensively on commercial farms and have evolved to identify behaviours such as the time spent ruminating and eating at the individual animal level. Acquiring this information at scale is central to informing on-farm management decisions. This paper presents the development of a Convolutional Neural Network (CNN) that classifies cattle behavioural states ('rumination', 'eating', and 'other') using data generated from neck-mounted accelerometer collars. During three farm trials in the United Kingdom (Easter Howgate Farm, Edinburgh, UK), 18 steers were monitored to provide raw acceleration measurements, with ground truth data provided by muzzle-mounted pressure sensor halters. A range of neural network architectures is explored and rigorous hyper-parameter searches are performed to optimise the network. The computational complexity and memory footprint of CNN models are not readily compatible with deployment on low-power processors, which are both memory and energy constrained. Progressive reductions of the CNN were therefore executed with minimal loss of performance to address these practical implementation challenges, characterising the trade-off between model performance on the one hand and computational complexity and memory footprint on the other, so as to permit deployment on micro-controller architectures. The proposed methodology achieves a 14.30-fold compression relative to the unpruned architecture, yet still classifies cattle behaviours accurately with an overall F1 score of 0.82 for both FP32 and FP16 precision, while achieving a battery lifetime in excess of 5.7 years.
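A minimal sketch of this kind of classifier, assuming 256-sample windows of 3-axis accelerometer data and three output classes, might look as follows; it is not the pruned architecture developed in the paper.

```python
# Minimal sketch (not the paper's exact architecture) of a 1D CNN that maps
# windows of 3-axis accelerometer data to the three behavioural classes.
import torch
import torch.nn as nn

class BehaviourCNN(nn.Module):
    def __init__(self, n_classes=3, window=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(3, 16, kernel_size=7, padding=3),  # 3 accelerometer axes in
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        # Two 4x poolings reduce the window by 16x before the linear head.
        self.classifier = nn.Linear(32 * (window // 16), n_classes)

    def forward(self, x):                 # x: (batch, 3, window)
        z = self.features(x)
        return self.classifier(z.flatten(1))

logits = BehaviourCNN()(torch.randn(8, 3, 256))  # (8, 3) class scores
```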


Author(s):  
D. Gritzner ◽  
J. Ostermann

Abstract. Modern machine learning, especially deep learning, which is used in a variety of applications, requires a lot of labelled data for model training. Having an insufficient number of training examples leads to models which do not generalize well to new input instances. This is a particularly significant problem for tasks involving aerial images: often training data are only available for a limited geographical area and a narrow time window, leading to models which perform poorly in different regions, at different times of day, or during different seasons. Domain adaptation can mitigate this issue by using labelled source-domain training examples and unlabeled target-domain images to train a model which performs well on both domains. Modern adversarial domain adaptation approaches use unpaired data. We propose using pairs of semantically similar images, i.e., images whose segmentations are accurate predictions of each other, for improved model performance. In this paper we show that, as an upper limit based on ground truth, using semantically paired aerial images during training almost always increases model performance, with an average improvement of 4.2% accuracy and 0.036 mean intersection-over-union (mIoU). Using a practical estimate of semantic similarity, we still achieve improvements in more than half of all cases, with average improvements of 2.5% accuracy and 0.017 mIoU in those cases.
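Under the assumption that "segmentations are accurate predictions of each other" is operationalized as a high mean intersection-over-union between one image's labels and the model's prediction for the other, the pairing criterion could be sketched as follows; names and the threshold are illustrative.

```python
# Sketch of a semantic-pairing criterion: image b is paired with image a if
# a's segmentation is a good prediction of b's, measured by mIoU. The exact
# criterion used in the paper may differ.
import numpy as np

def miou(seg_a, seg_b, n_classes):
    """Mean intersection-over-union between two integer label maps."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(seg_a == c, seg_b == c).sum()
        union = np.logical_or(seg_a == c, seg_b == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

def is_semantic_pair(label_a, pred_b, n_classes=6, threshold=0.5):
    # label_a: ground-truth labels of image a; pred_b: model prediction for b.
    return miou(label_a, pred_b, n_classes) >= threshold
```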


2021 ◽  
Vol 11 (1) ◽  
pp. 15-24
Author(s):  
Dequan Guo ◽  
Gexiang Zhang ◽  
Hui Peng ◽  
Jianying Yuan ◽  
Prithwineel Paul ◽  
...  

In recent years, cardiovascular and cerebrovascular diseases have attracted much attention as leading causes of death in human beings. To reduce mortality, many efforts focus on early diagnosis and prevention. The thickness of the intima-media membrane of the carotid artery, measured in medical ultrasound images, is an important reference index for cardiovascular disease. This paper proposes a method which finds the region of interest (ROI) with a convolutional neural network and then segments and measures the intima-media membrane mainly using a support vector machine (SVM). Detecting the membrane is essentially an object detection problem, for which this paper adopts You Only Look Once (YOLO), a detection algorithm based on an end-to-end trained convolutional neural network. Firstly, sufficient samples are extracted according to certain characteristics of the special region, and the SVM classification model is trained on them. Secondly, the ROI is processed and all of its pixels are classified into boundary points and non-boundary points by the classification model. Thirdly, the boundary points are selected to obtain the accurate boundary and to calculate the intima-media thickness (IMT). In experiments, two hundred ultrasound images were tested, and the results verify that our algorithm is consistent with the ground truth (GT). The algorithm runs in real time, generalizes well, and computes the intima-media thickness in ultrasound images accurately and quickly, with 95% consistency with the ground truth.
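The final measurement step can be sketched as follows, assuming per-column row indices for the two interfaces inside the detected ROI and an illustrative pixel spacing; this is a simplified stand-in for the paper's procedure.

```python
# Illustrative sketch of the IMT measurement step: given per-column boundary
# points for the lumen-intima (LI) and media-adventitia (MA) interfaces, the
# IMT is their mean vertical separation scaled by the pixel size. Variable
# names and the 0.06 mm/pixel spacing are assumptions.
import numpy as np

def intima_media_thickness(li_rows, ma_rows, mm_per_pixel=0.06):
    """li_rows, ma_rows: (n_columns,) row indices of the two interfaces."""
    thickness_px = np.asarray(ma_rows) - np.asarray(li_rows)  # per-column gap
    return float(thickness_px.mean() * mm_per_pixel)          # IMT in mm

li = np.array([40, 41, 41, 42, 42])
ma = np.array([52, 53, 52, 54, 53])
print(f"IMT = {intima_media_thickness(li, ma):.2f} mm")
```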


Author(s):  
Jaqueline Pels ◽  
Tomás Andrés Kidd

Purpose – The purpose of this paper is to develop a framework that expands the business model innovation literature by including a social goal, the environmental characteristics of emerging markets (EMs), and a bottom-up perspective. Design/methodology/approach – This paper draws on a single case study. Sistema Ser/CEGIN (SER-CEGIN) is an Argentine social business that offers high-quality medical healthcare to BOP users. Findings – The paper presents a new conceptualization of business model innovation that includes three dimensions: firm-centric, environment, and customer-centric. The framework adds to the traditional business model innovation framework the social profit equation, the general and task environment, and the end-user, as well as the dynamics between them. Research limitations/implications – While the authors acknowledge the importance of studying the components of the business model operating levels (economic, operational and strategic) to determine the type of business model innovation (revenue, enterprise and industrial), the framework incorporates the environment and customer-centric dimensions. The suggested framework opens new streams of research both for the business model innovation literature and for the EMs – bottom of the pyramid (BOP) literature. Practical implications – To achieve economic and social goals, particularly in the BOP, firms need to adopt a bottom-up approach to understand which components of their business model need to be modified. Originality/value – The paper proposes a novel business model innovation conceptualization which is useful both for researchers to better study business models in the BOP and for firms to successfully operate in the BOP.


Author(s):  
Lutz Hamel

Classification models and in particular binary classification models are ubiquitous in many branches of science and business. Consider, for example, classification models in bioinformatics that classify catalytic protein structures as being in an active or inactive conformation. As an example from the field of medical informatics we might consider a classification model that, given the parameters of a tumor, will classify it as malignant or benign. Finally, a classification model in a bank might be used to tell the difference between a legal and a fraudulent transaction. Central to constructing, deploying, and using classification models is the question of model performance assessment (Hastie, Tibshirani, & Friedman, 2001). Traditionally this is accomplished by using metrics derived from the confusion matrix or contingency table. However, it has been recognized that (a) a scalar is a poor summary for the performance of a model in particular when deploying non-parametric models such as artificial neural networks or decision trees (Provost, Fawcett, & Kohavi, 1998) and (b) some performance metrics derived from the confusion matrix are sensitive to data anomalies such as class skew (Fawcett & Flach, 2005). Recently it has been observed that Receiver Operating Characteristic (ROC) curves visually convey the same information as the confusion matrix in a much more intuitive and robust fashion (Swets, Dawes, & Monahan, 2000). Here we take a look at model performance metrics derived from the confusion matrix. We highlight their shortcomings and illustrate how ROC curves can be deployed for model assessment in order to provide a much deeper and perhaps more intuitive analysis of the models. We also briefly address the problem of model selection.
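To make the contrast concrete, the following sketch traces a ROC curve directly from classifier scores by sweeping the decision threshold, rather than fixing a single threshold and summarizing it in one confusion matrix; it is a minimal textbook construction, not tied to any of the cited works.

```python
# Sketch: trace a ROC curve by sweeping the decision threshold over sorted
# classifier scores, recording (FPR, TPR) at each cut.
import numpy as np

def roc_points(y_true, y_score):
    order = np.argsort(-y_score)        # sort scores in descending order
    y = np.asarray(y_true)[order]
    tps = np.cumsum(y)                  # true positives accepted at each cut
    fps = np.cumsum(1 - y)              # false positives accepted at each cut
    tpr = tps / y.sum()                 # sensitivity at each threshold
    fpr = fps / (len(y) - y.sum())      # 1 - specificity at each threshold
    return fpr, tpr

y_true = np.array([1, 1, 0, 1, 0, 0])
y_score = np.array([0.95, 0.8, 0.7, 0.6, 0.4, 0.2])
fpr, tpr = roc_points(y_true, y_score)  # points tracing the ROC curve
print(np.trapz(tpr, fpr))               # area under the curve (~0.889 here)
```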


Author(s):  
Carla Sendra-Balcells ◽  
Ricardo Salvador ◽  
Juan B. Pedro ◽  
M C Biagi ◽  
Charlène Aubinet ◽  
...  

Abstract. The segmentation of structural MRI data is an essential step for deriving geometrical information about brain tissues. One important application is in transcranial electrical stimulation (e.g., tDCS), a non-invasive neuromodulatory technique where head modeling is required to determine the electric field (E-field) generated in the cortex in order to predict and optimize its effects. Here we propose a deep learning-based model (StarNEt) to automate white matter (WM) and gray matter (GM) segmentation and compare its performance with FreeSurfer, an established tool. Since a good definition of sulci and gyri on the cortical surface is an important requirement for E-field calculation, StarNEt is specifically designed to output masks at a higher resolution than that of the original input T1w-MRI. StarNEt uses a residual network (ResNet) as the encoder and a fully convolutional neural network with U-Net skip connections as the decoder to segment an MRI slice by slice, with the vertical location of the slice provided as an extra input. The model was trained on scans from 425 patients in the open-access ADNI+IXI datasets, using FreeSurfer segmentations as ground truth. Model performance was evaluated using the Dice coefficient (DC) on a separate subset (N=105) of ADNI+IXI and on two extra testing sets not involved in training. In addition, FreeSurfer and StarNEt were compared to manual segmentations of the MRBrainS18 dataset, also unseen by the model. To study performance in real use cases, we first created electrical head models derived from the FreeSurfer and StarNEt segmentations and used them for montage optimization over a common target region with a standard algorithm (Stimweaver); second, we used StarNEt to successfully segment the brains of minimally conscious state (MCS) patients who had suffered brain trauma, a scenario where FreeSurfer typically fails. Our results indicate that StarNEt matches FreeSurfer performance on the trained tasks while reducing computation time from several hours to a few seconds, with the potential to evolve into an effective technique even when patients present large brain abnormalities.
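A schematic sketch of such an encoder-decoder with a skip connection is shown below at toy scale; layer sizes are illustrative and the extra slice-location input is omitted, so this is not the published StarNEt architecture.

```python
# Schematic sketch of the encoder-decoder layout described above (ResNet-style
# encoder, U-Net-style skip connection, slice-by-slice 2D input); sizes are
# illustrative only.
import torch
import torch.nn as nn

class TinyStarNet(nn.Module):
    def __init__(self, n_classes=3):           # e.g. background / WM / GM
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        # Decoder sees upsampled features concatenated with the skip connection.
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, n_classes, 1)

    def forward(self, x):                       # x: (batch, 1, H, W) MRI slice
        s1 = self.enc1(x)
        s2 = self.enc2(self.down(s1))
        u = self.up(s2)
        return self.head(self.dec(torch.cat([u, s1], dim=1)))

out = TinyStarNet()(torch.randn(2, 1, 64, 64))  # (2, 3, 64, 64) logits
```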


2021 ◽  
Author(s):  
Galina Wind ◽  
Arlindo M. da Silva ◽  
Kerry G. Meyer ◽  
Steven Platnick ◽  
Peter M. Norris

Abstract. The Multi-sensor Cloud and Aerosol Retrieval Simulator (MCARS) produces synthetic radiance data from Goddard Earth Observing System version 5 (GEOS-5) model output as if the Moderate Resolution Imaging Spectroradiometer (MODIS) were viewing a combination of the atmospheric column (including clouds, aerosols, and a variety of gases) and the land/ocean surface at a specific location. In this paper we use MCARS to study the MODIS above-cloud aerosol retrieval algorithm (MOD06ACAERO). MOD06ACAERO is presently a regional research algorithm able to retrieve aerosol optical thickness over clouds, in particular absorbing biomass-burning aerosols overlying marine boundary layer clouds in the southeastern Atlantic Ocean. The algorithm's ability to provide aerosol information in cloudy conditions makes it a valuable source of information for modeling and climate studies in an area where current clear-sky-only operational MODIS aerosol retrievals effectively have a data gap between the months of June and October. We use MCARS for a verification and closure study of the MOD06ACAERO algorithm. Our simulations indicate that the MOD06ACAERO algorithm performs well for marine boundary layer clouds in the SE Atlantic provided some specific screening rules are observed. For the present study, a combination of five simulated MODIS data granules was used, giving a dataset of 13.5 million samples with known input conditions. When pixel retrieval uncertainty was less than 30%, the optical thickness of the underlying cloud layer was greater than 4, and the scattering-angle range within the cloud bow was excluded, MOD06ACAERO retrievals agreed with the underlying ground truth (the GEOS-5 cloud and aerosol profiles used to generate the synthetic radiances) with a slope of 0.913, an offset of 0.06, and an RMSE of 0.107. When only near-nadir pixels were considered (view zenith angle within ±20°), the agreement with the source data further improved (0.977, 0.051, and 0.096, respectively). Algorithm closure was examined using a single case out of the five used for verification; for closure, the MOD06ACAERO code was modified to use GEOS-5 temperature and moisture profiles as ancillary data. Agreement of MOD06ACAERO retrievals with the source data in the closure study had a slope of 0.996, an offset of −0.007, and an RMSE of 0.097 at a pixel uncertainty level of less than 40%, illustrating the benefits of high-quality ancillary atmospheric data for such retrievals.
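The screening-and-validation logic described above could be sketched as follows; the variable names, the exact cloud-bow scattering-angle window, and the regression choice are all assumptions made for illustration.

```python
# Sketch of the screening-and-validation logic: filter retrievals by the
# stated rules, then regress retrieved above-cloud AOT against the GEOS-5
# "truth". The 135-165 degree cloud-bow window is an assumed placeholder.
import numpy as np

def screen_and_validate(aot_ret, aot_true, uncertainty, cot, scat_angle):
    keep = (
        (uncertainty < 0.30)                          # retrieval uncertainty < 30%
        & (cot > 4)                                   # underlying cloud optical thickness
        & ~((scat_angle > 135) & (scat_angle < 165))  # exclude assumed cloud-bow range
    )
    x, y = aot_true[keep], aot_ret[keep]
    slope, offset = np.polyfit(x, y, 1)               # linear agreement with ground truth
    rmse = np.sqrt(np.mean((y - x) ** 2))
    return slope, offset, rmse
```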

