Usage of the Term Big Data in Biomedical Publications: A Text Mining Approach

2019 ◽  
Vol 3 (1) ◽  
pp. 13
Author(s):  
Allard van Altena ◽  
Perry Moerland ◽  
Aeilko Zwinderman ◽  
Sílvia Olabarriaga

In this study, we attempt to assess the value of the term Big Data when used by researchers in their publications. For this purpose, we systematically collected a corpus of biomedical publications, some of which use the term Big Data and some of which do not. These documents were used as input to a machine learning classifier to determine how well they can be separated into two groups and to determine the most distinguishing classification features. We generated 100 classifiers that could correctly distinguish between Big Data and non-Big Data documents with an area under the Receiver Operating Characteristic (ROC) curve of 0.96. The differences between the two groups were characterized by terms specific to Big Data themes, such as 'computational', 'mining', and 'challenges', and also by terms that indicate the research field, such as 'genomics'. The ROC curves plotted for various time intervals showed no difference over time. We conclude that there is a detectable and stable difference between publications that use the term Big Data and those that do not. Furthermore, the use of the term Big Data within a publication seems to indicate a distinct type of research in the biomedical field. Therefore, we conclude that value can be attributed to the term Big Data when used in a publication and that this value has not changed over time.

1978 ◽  
Vol 17 (03) ◽  
pp. 157-161 ◽  
Author(s):  
F. T. De Dombal ◽  
Jane C. Horrocks

This paper uses simple receiver operating characteristic (ROC) curves (i) to study the effect of varying the computer's confidence threshold levels and (ii) to evaluate clinical performance in the diagnosis of acute appendicitis. Over 1300 patients presenting to five centres with abdominal pain of short duration were studied in varying detail. Clinical and computer-aided diagnostic predictions were compared with the »final« diagnosis. From these studies it is concluded that the simplistic setting of a 50/50 confidence threshold for the computer program is as »good« as any other. The proximity of a computer-aided system changed clinical behaviour patterns; a higher overall performance level was achieved, and clinicians' performance levels became associated with the »mildly conservative« end of the computer's ROC curve. Prior forecasts of over-confidence or ultra-caution amongst clinicians using the computer-aided system have not been fulfilled.
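The effect of varying a confidence threshold can be illustrated with a short sketch. It is not the authors' program: the function name `roc_points`, the scores, and the labels are all hypothetical, and a case is assumed to be called positive when its score is at or above the threshold.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int],
               thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positive rate, true positive rate) per threshold.

    Assumption: a case is predicted positive when score >= threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    points = []
    for t in thresholds:
        # Count true and false positives at this threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / negatives, tp / positives))
    return points


# Hypothetical confidence scores and outcomes (not data from the study).
scores = [0.95, 0.80, 0.65, 0.55, 0.45, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for t, fpr, tpr in roc_points(scores, labels, [0.25, 0.50, 0.75]):
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each threshold yields one point on the ROC curve; raising the threshold moves toward the conservative end of the curve (fewer false positives, fewer true positives).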


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 949
Author(s):  
Cecil J. Weale ◽  
Don M. Matshazi ◽  
Saarah F. G. Davids ◽  
Shanel Raghubeer ◽  
Rajiv T. Erasmus ◽  
...  

This cross-sectional study investigated the association of miR-1299, -126-3p and -30e-3p with dysglycaemia, and their diagnostic capability for it, in 1273 (men, n = 345) South Africans aged >20 years. Glycaemic status was assessed by oral glucose tolerance test (OGTT). Whole blood microRNA (miRNA) expressions were assessed using TaqMan-based reverse transcription quantitative-PCR (RT-qPCR). Receiver operating characteristic (ROC) curves assessed the ability of each miRNA to discriminate dysglycaemia, while multivariable logistic regression analyses linked expression with dysglycaemia. In all, 207 (16.2%) and 94 (7.4%) participants had prediabetes and type 2 diabetes mellitus (T2DM), respectively. All three miRNAs were significantly more highly expressed in individuals with prediabetes than in normotolerant patients, p < 0.001. miR-30e-3p and miR-126-3p were also significantly more highly expressed in T2DM than in normotolerant patients, p < 0.001. In multivariable logistic regressions, the three miRNAs were consistently and continuously associated with prediabetes, while only miR-126-3p was associated with T2DM. The ROC analysis indicated all three miRNAs had a significant overall predictive ability to diagnose prediabetes, diabetes and the combination of both (dysglycaemia), with the area under the receiver operating characteristic curve (AUC) being significantly higher for miR-126-3p in prediabetes. For prediabetes diagnosis, miR-126-3p (AUC = 0.760) outperformed HbA1c (AUC = 0.695), p = 0.042. These results suggest that miR-1299, -126-3p and -30e-3p are associated with prediabetes, and measuring miR-126-3p could potentially contribute to diabetes risk screening strategies.


2021 ◽  
Vol 13 (2) ◽  
pp. 1-27
Author(s):  
A. Khalemsky ◽  
R. Gelbard

In dynamic and big data environments the visualization of a segmentation process over time often does not enable the user to simultaneously track entire pieces. The key points are sometimes incomparable, and the user is limited to a static visual presentation of a certain point. The proposed visualization concept, called ExpanDrogram, is designed to support dynamic classifiers that run in a big data environment subject to changes in data characteristics. It offers a wide range of features that seek to maximize the customization of a segmentation problem. The main goal of the ExpanDrogram visualization is to improve comprehensiveness by combining both the individual and segment levels, illustrating the dynamics of the segmentation process over time, providing “version control” that enables the user to observe the history of changes, and more. The method is illustrated using different datasets, with which we demonstrate multiple segmentation parameters, as well as multiple display layers, to highlight points such as new trend detection, outlier detection, tracking changes in original segments, and zoom in/out for more/less detail. The datasets vary in size from a small one to one of more than 12 million records.


1990 ◽  
Vol 36 (7) ◽  
pp. 1317-1322 ◽  
Author(s):  
L V Galbraith ◽  
F Y Leung ◽  
G Jablonsky ◽  
A R Henderson

Using receiver-operating characteristic (ROC) curve and likelihood ratio analysis, we examined the diagnostic utility of total lactate dehydrogenase (LD; EC 1.1.1.27) activity (I), LD isoenzyme-1 activity (II), the LD-1 percentage of total LD activity (III), LD-1/LD-2 (IV), and LD-1/LD-4 (V) in 347 persons admitted to the Cardiac Care Unit (of whom 173 were subsequently proven to have had myocardial infarction). Blood was sampled from these subjects at about 6-h intervals for up to 96 h from the onset of chest pain. Defining an "effective" test as one having an area under the ROC curve ≥ 0.9, we determined the ranked utility (greatest to least) of these tests as V = IV > III > II > I. Tests III, IV, and V had, by this criterion, diagnostic effectiveness equivalent to measurements of creatine kinase-2 in serum, but in samples obtained at later time intervals. The decision thresholds for both high (constant) test sensitivity and specificity varied with time, to differing extents, over the entire 96-h period, a finding with important diagnostic implications. We document positive and negative likelihood ratio values for each of these tests throughout the entire period of study.


Author(s):  
Ugo Indraccolo ◽  
Gennaro Scutiero ◽  
Pantaleo Greco

Objective To analyze whether the sonographic evaluation of the cervix (cervical shortening) is a prognostic marker for vaginal delivery. Methods Women who underwent labor induction with dinoprostone were enrolled. Before the induction and three hours after it, the cervical length was measured by ultrasonography to obtain the cervical shortening. The cervical shortening was entered as an independent variable in logistic regression models and used to calculate receiver operating characteristic (ROC) curves. Results Each centimeter of cervical shortening increases the odds of vaginal delivery by 24.4% within 6 hours, by 16.1% within 24 hours, and by 10.5% within 48 hours. The best predictions for vaginal delivery are achieved for births within 6 and 24 hours, while the cervical shortening poorly predicts vaginal delivery within 48 hours. Conclusion The greater the cervical shortening 3 hours after labor induction, the higher the likelihood of vaginal delivery within 6, 24 and 48 hours.


1995 ◽  
Vol 12 (4) ◽  
pp. 723-741 ◽  
Author(s):  
W. Guido ◽  
S.-M. Lu ◽  
J.W. Vaughan ◽  
Dwayne W. Godwin ◽  
S. Murray Sherman

Relay cells of the lateral geniculate nucleus respond to visual stimuli in one of two modes: burst and tonic. The burst mode depends on the activation of a voltage-dependent Ca²⁺ conductance underlying the low threshold spike. This conductance is inactivated at depolarized membrane potentials, but when activated from hyperpolarized levels, it leads to a large, triangular, nearly all-or-none depolarization. Typically, riding its crest is a high-frequency barrage of action potentials. Low threshold spikes thus provide a nonlinear amplification allowing hyperpolarized relay neurons to respond to depolarizing inputs, including retinal EPSPs. In contrast, the tonic mode is characterized by a steady stream of unitary action potentials that more linearly reflects the visual stimulus. In this study, we tested possible differences in detection between response modes of 103 geniculate neurons by constructing receiver operating characteristic (ROC) curves for responses to visual stimuli (drifting sine-wave gratings and flashing spots). Detectability was determined from the ROC curves by computing the area under each curve, known as the ROC area. Most cells switched between modes during recording, evidently due to small shifts in membrane potential that affected the activation state of the low threshold spike. We found that the more often a cell responded in burst mode, the larger its ROC area. This was true for responses to optimal and nonoptimal visual stimuli, the latter including nonoptimal spatial frequencies and low stimulus contrasts. The larger ROC areas associated with burst mode were due to reduced spontaneous activity and a roughly equivalent level of visually evoked response when compared to tonic mode. We performed a within-cell analysis on a subset of 22 cells that switched modes during recording. Every cell, whether tested with a low contrast or high contrast visual stimulus, exhibited a larger ROC area during its burst response mode than during its tonic mode. We conclude that burst responses better support signal detection than do tonic responses. Thus, burst responses, while less linear and perhaps less useful in providing a detailed analysis of visual stimuli, improve target detection. The tonic mode, with its more linear response, seems better suited for signal analysis than for signal detection.
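As a rough illustration of what an ROC area measures, the sketch below computes it directly as the probability that a randomly chosen "signal" response exceeds a randomly chosen "noise" response, counting ties as one half. This rank-based form is mathematically equal to the area under the empirical ROC curve. The abstract does not say how the authors constructed their curves, so the function name `roc_area` and the spike counts are hypothetical.

```python
from typing import Sequence


def roc_area(noise: Sequence[float], signal: Sequence[float]) -> float:
    """Area under the empirical ROC curve for two samples.

    Computed as P(signal > noise) + 0.5 * P(signal == noise),
    which equals the normalized Mann-Whitney U statistic.
    """
    if len(noise) == 0 or len(signal) == 0:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for s in signal:
        for n in noise:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(signal) * len(noise))


# Hypothetical spike counts per trial (not data from the study).
spontaneous = [2, 3, 1, 4]   # no stimulus present
evoked = [5, 3, 6, 7]        # stimulus present

print(roc_area(spontaneous, evoked))
```

An area of 0.5 means the two response distributions are indistinguishable; an area of 1.0 means every evoked response exceeds every spontaneous one. Lowering spontaneous activity while keeping the evoked response unchanged pushes the area toward 1.0, which is the pattern the abstract reports for burst mode.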


2017 ◽  
Vol 20 (2) ◽  
pp. 122-127 ◽  
Author(s):  
Saverio Paltrinieri ◽  
Marco Fossati ◽  
Valentina Menaballi

Objectives The objective of this study was to evaluate the diagnostic performances of manual and instrumental measurement of reticulocyte percentage (Ret%), reticulocyte number (Ret#) and reticulocyte production index (RPI) to differentiate regenerative anaemia (RA) from non-regenerative anaemia (NRA) in cats. Methods Data from 106 blood samples from anaemic cats with manual counts (n = 74; 68 NRA, six RA) or instrumental counts of reticulocytes (n = 32; 25 NRA, seven RA) collected between 1995 and 2013 were retrospectively analysed. Sensitivity, specificity and positive likelihood ratio (LR+) were calculated using either cut-offs reported in the literature or cut-offs determined from receiver operating characteristic (ROC) curves. Results All the reticulocyte parameters were significantly higher in cats with RA than in cats with NRA. All the ROC curves were significantly different (P < 0.001) from the line of no discrimination, without significant differences between the three parameters. Using the cut-offs published in the literature, the Ret% (cut-off: 0.5%) was sensitive (100%) but not specific (<75%), the RPI (cut-off: 1.0) was specific (>92%) but not sensitive (<15%), and the Ret# (cut-off: 50 × 10³/µl) had a sensitivity and specificity >80% and the highest LR+ (manual count: 14; instrumental count: 6). For all the parameters, sensitivity and specificity approached 100% using the cut-offs determined by the ROC curves. These cut-offs were higher than those reported in the literature for Ret% (manual: 1.70%; instrumental: 3.06%), lower for RPI (manual: 0.39; instrumental: 0.59) and variably different, depending on the method (manual: 41 × 10³/µl; instrumental: 57 × 10³/µl), for Ret#. Using these cut-offs, the RPI had the highest LR+ (manual: 22.7; instrumental: 12.5). Conclusions and relevance This study indicated that all the reticulocyte parameters may confirm regeneration when the pretest probability is high, while when this probability is moderate, RA should be identified using the RPI, provided that cut-offs <1.0 are used.
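The link between LR+ and pretest probability can be made concrete with the standard definitions: LR+ is sensitivity divided by (1 − specificity), and post-test odds are pretest odds multiplied by the likelihood ratio. The sketch below is only an illustration of those formulas; the function names and the pretest probability of 0.30 are hypothetical, while the LR+ of 22.7 is the value the abstract reports for the manual RPI.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity); both inputs are proportions in [0, 1]."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in [0, 1)")
    return sensitivity / (1.0 - specificity)


def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    if not (0.0 <= pretest_probability < 1.0):
        raise ValueError("pretest probability must be in [0, 1)")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical moderate pretest probability combined with the LR+ of 22.7
# reported in the abstract for the manual RPI at the ROC-derived cut-off.
print(round(post_test_probability(0.30, 22.7), 3))
```

With a high LR+, even a moderate pretest probability is raised substantially by a positive result, which is consistent with the abstract's recommendation to rely on the RPI in that situation.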


2017 ◽  
Vol 30 (1) ◽  
pp. 36-41 ◽  
Author(s):  
Michał Czopowicz ◽  
Olga Szaluś-Jordanow ◽  
Agata Moroz ◽  
Marcin Mickiewicz ◽  
Lucjan Witkowski ◽  
...  

Roughly one-fourth of goats infected with small ruminant lentivirus (SRLV) develop caprine arthritis-encephalitis (CAE). We compared the profile of antibody response to surface glycoprotein (SU), and to combined transmembrane glycoprotein and capsid protein (TM/CA), in SRLV-infected arthritic and asymptomatic goats, and determined the ability of 2 commercial ELISAs to distinguish between arthritic and asymptomatic goats. We used sera from 312 dairy goats that were SRLV-seropositive in a whole-virus ELISA; 222 were collected from arthritic goats and 90 from apparently healthy goats. Sera were screened with a competitive inhibition ELISA based on SU antigen (SU-ELISA) and an indirect ELISA based on TM and CA antigens (TM/CA-ELISA). Receiver operating characteristic (ROC) curves were prepared for both ELISAs, and areas under the ROC curves (AUC) were compared. The proportion of goats with a stronger antibody response to SU antigen than to TM/CA antigen was significantly higher among arthritic than asymptomatic goats (58.1% vs. 28.9%; p < 0.001). Antibody response to SU antigen was a good predictor of the arthritic form of CAE: AUC for SU-ELISA was 89.7% (95% CI: 85.2%, 94.2%), compared to 59.3% (95% CI: 51.9%, 66.8%) for TM/CA-ELISA (p < 0.001). With the cutoff set at a percentage of inhibition of 56%, SU-ELISA had a sensitivity of 86.9% (95% CI: 81.9%, 90.7%) and a specificity of 84.4% (95% CI: 75.6%, 90.5%) in discriminating between arthritic and asymptomatic goats.


2021 ◽  
Vol 13 (8) ◽  
pp. 1487
Author(s):  
Peter Lanz ◽  
Armando Marino ◽  
Thomas Brinkhoff ◽  
Frank Köster ◽  
Matthias Möller

Countless people have lost their lives at Europe's southern borders in recent years in the attempt to cross to Europe in small rubber inflatables. This work examines satellite-based approaches for building future systems that can automatically detect those boats. We compare the performance of several automatic vessel detectors using real synthetic aperture radar (SAR) data from X-band and C-band sensors on TerraSAR-X and Sentinel-1. The data were collected in an experimental campaign in which an empty boat lay on a lake's surface, to analyse the influence of the main sensor parameters (incidence angle, polarization mode, spatial resolution) on the detectability of our inflatable. All detectors are implemented with a moving window and use local clutter statistics from the adjacent water surface. Among the tested detectors are well-known intensity-based (CA-CFAR), sublook-based (sublook correlation) and polarimetric-based (PWF, PMF, PNF, entropy, symmetry and iDPolRAD) approaches. Additionally, we introduce a new version of the volume-detecting iDPolRAD aimed at detecting surface anomalies and compare two approaches to combining the volume and the surface in one algorithm, producing two new high-performing detectors. The results are compared using receiver operating characteristic (ROC) curves, enabling us to compare detectors independently of threshold selection.

