Novel Approaches to Smoothing and Comparing SELDI TOF Spectra

Cancer Informatics ◽

10.1177/117693510500100109 ◽

2005 ◽

Vol 1 ◽

pp. 117693510500100 ◽

Cited By ~ 4

Author(s):

Sreelatha Meleth ◽

Isam-Eldin Eltoum ◽

Liu Zhu ◽

Denise Oelschlager ◽

Chandrika Piyathilake ◽

...

Keyword(s):

Spectral Analysis ◽

Fourier Transforms ◽

Area Under The Curve ◽

Maximum Intensity ◽

Data Sets ◽

Intensity Level ◽

Prominent Feature ◽

Data Set ◽

The Third ◽

Novel Approaches

Background Most published literature using SELDI-TOF has used traditional techniques in Spectral Analysis such as Fourier transforms and wavelets for denoising. Most of these publications also compare spectra using their most prominent feature, i.e., peaks or local maxima. Methods The maximum intensity value within each window of differentiable m/z values was used to represent the intensity level in that window. We also calculated the ‘Area under the Curve’ (AUC) spanned by each window. Results Keeping everything else constant, such as pre-processing of the data and the classifier used, the AUC performed much better as a metric of comparison than the peaks in two out of three data sets. In the third data set both metrics performed equivalently. Conclusions This study shows that the feature used to compare spectra can have an impact on the results of a study attempting to identify biomarkers using SELDI-TOF data.
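The two per-window features compared in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `window_features`, the fixed window edges, and the toy spectrum are assumptions for illustration, and the AUC is approximated here with the trapezoidal rule over m/z values that are assumed to be sorted.

```python
from typing import List, Tuple


def window_features(
    mz: List[float], intensity: List[float], edges: List[float]
) -> List[Tuple[float, float]]:
    """Return (max intensity, trapezoidal AUC) for each window [edges[i], edges[i+1]).

    Assumes mz is sorted in ascending order; the window edges are hypothetical.
    """
    features = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Collect the spectrum points that fall inside this window.
        pts = [(m, y) for m, y in zip(mz, intensity) if lo <= m < hi]
        if not pts:
            features.append((0.0, 0.0))
            continue
        peak = max(y for _, y in pts)
        # Trapezoidal rule over consecutive points within the window.
        auc = sum(
            (m2 - m1) * (y1 + y2) / 2.0
            for (m1, y1), (m2, y2) in zip(pts[:-1], pts[1:])
        )
        features.append((peak, auc))
    return features


# Toy spectrum (made-up values), split into two windows.
mz = [1000.0, 1001.0, 1002.0, 1003.0, 1004.0, 1005.0]
intensity = [1.0, 3.0, 2.0, 0.0, 4.0, 4.0]
features = window_features(mz, intensity, edges=[1000.0, 1003.0, 1006.0])
print(features)
```

In this toy case the two windows have similar maxima but different areas, which is the kind of difference between the two metrics that the abstract evaluates.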

Prediction of Progestin Affinity for the Human Progesterone Receptor Based on Corrected RBA Data

Biomedical Chemistry Research and Methods ◽

10.18097/bmcrm00080 ◽

2018 ◽

Vol 1 (4) ◽

pp. e00080

Author(s):

A.V. Mikurova ◽

V.S. Skvortsov

Keyword(s):

Progesterone Receptor ◽

Prediction Equation ◽

Binding Activity ◽

Data Sets ◽

Data Set ◽

The Third ◽

Relative Binding ◽

Human Progesterone Receptor ◽

Nuclear Progesterone Receptor

Complexes of three sets of steroidal and nonsteroidal progestins with the ligand-binding domain of the nuclear progesterone receptor were modeled. Molecular docking, long-term molecular dynamics simulation, and subsequent analysis by MM-PBSA (MM-GBSA) were used to model the complexes. Using the characteristics obtained by the MM-PBSA method for two data sets of steroid compounds obtained by different research groups, a prediction equation for the value of relative binding activity (RBA) was constructed. The RBA value was adjusted so that in all samples the actual activity was compared with the progesterone activity. The third data set of nonsteroidal compounds was used as a test. The resulting equation showed that the prediction could be applied to both steroid molecules and nonsteroidal progestins.

Model Distribution Effects on Likelihood Ratios in Fire Debris Analysis

Separations ◽

10.3390/separations5030044 ◽

2018 ◽

Vol 5 (3) ◽

pp. 44 ◽

Cited By ~ 3

Author(s):

Alyssa Allen ◽

Mary Williams ◽

Nicholas Thurn ◽

Michael Sigman

Keyword(s):

Computational Models ◽

Area Under The Curve ◽

Ground Truth ◽

Data Sets ◽

Likelihood Ratios ◽

Data Set ◽

Discriminant Model ◽

Fire Debris ◽

Characteristic Area ◽

Ignitable Liquid

Computational models for determining the strength of fire debris evidence based on likelihood ratios (LR) were developed and validated against data sets derived from different distributions of ASTM E1618-14 designated ignitable liquid class and substrate pyrolysis contributions using in-silico generated data. The models all perform well in cross validation against the distributions used to generate the model. However, a model generated based on data that does not contain representatives from all of the ASTM E1618-14 classes does not perform well in validation with data sets that contain representatives from the missing classes. A quadratic discriminant model based on a balanced data set (ignitable liquid versus substrate pyrolysis), with a uniform distribution of the ASTM E1618-14 classes, performed well (receiver operating characteristic area under the curve of 0.836) when tested against laboratory-developed casework-relevant samples of known ground truth.
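The abstract does not say how the likelihood ratios were computed from the discriminant model. One common route, sketched below as an assumption rather than the authors' method, is to divide the posterior odds returned by a classifier by the prior odds implied by its training data (Bayes' theorem in odds form). The function name `likelihood_ratio` and the example probabilities are hypothetical.

```python
def likelihood_ratio(posterior_il: float, prior_il: float) -> float:
    """Convert a posterior probability of 'ignitable liquid present' into a likelihood ratio.

    LR = posterior odds / prior odds. Both probabilities must lie strictly between 0 and 1.
    """
    for p in (posterior_il, prior_il):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must be strictly between 0 and 1")
    posterior_odds = posterior_il / (1.0 - posterior_il)
    prior_odds = prior_il / (1.0 - prior_il)
    return posterior_odds / prior_odds


# With a balanced training set (prior 0.5) the prior odds are 1,
# so the LR equals the posterior odds.
lr_balanced = likelihood_ratio(posterior_il=0.8, prior_il=0.5)

# The same posterior from a model trained on 80% ignitable-liquid samples
# carries no evidential weight (LR of about 1).
lr_unbalanced = likelihood_ratio(posterior_il=0.8, prior_il=0.8)
print(lr_balanced, lr_unbalanced)
```

The contrast between the two calls shows one reason the class balance of the training distribution matters when a model's output is interpreted as strength of evidence.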

Spectral analysis and inversion of experimental codas

Geophysics ◽

10.1190/1.1443424 ◽

1993 ◽

Vol 58 (3) ◽

pp. 408-418 ◽

Cited By ~ 3

Author(s):

L. R. Jannaud ◽

P. M. Adler ◽

C. G. Jacquin

Keyword(s):

Spectral Analysis ◽

Characteristic Length ◽

Gaussian Model ◽

Seismic Survey ◽

Data Sets ◽

Data Set ◽

Anisotropic Elastic ◽

And Inversion ◽

Good Agreement

A method developed for the determination of the characteristic lengths of a heterogeneous medium from the spectral analysis of codas is based on an extension of Aki’s theory to anisotropic elastic media. An equivalent Gaussian model is obtained and seems to be in good agreement with the two experimental data sets that illustrate the method. The first set was obtained in a laboratory experiment with an isotropic marble sample. This sample is characterized by a submillimetric length scale that can be directly observed on a thin section. The spectral analysis of codas and their inversion yields an equivalent correlation length that is in good agreement with the observed one. The second data set was obtained in a crosshole experiment at the usual scale of a seismic survey. The codas are recorded, analysed, and inverted. The analysis yields a vertical characteristic length for the studied subsurface that compares well with the characteristic length measured by seismic and stratigraphic logs.

Effects of environmental factors on multiple ovulation of zebu donors

Arquivo Brasileiro de Medicina Veterinária e Zootecnia ◽

10.1590/s0102-09352006000400019 ◽

2006 ◽

Vol 58 (4) ◽

pp. 567-574 ◽

Cited By ~ 3

Author(s):

M.G.C.D. Peixoto ◽

J.A.G. Bergmann ◽

C.G. Fonseca ◽

V.M. Penna ◽

C.S. Pereira

Keyword(s):

Environmental Factors ◽

Least Squares ◽

Inbreeding Coefficient ◽

Corpora Lutea ◽

Data Sets ◽

Data Set ◽

The Third ◽

Multiple Ovulation ◽

The Mean ◽

Best Responses

Data on 1,294 superovulations of Brahman, Gyr, Guzerat and Nellore females were used to evaluate the effects of: breed; herd; year of birth; inbreeding coefficient and age at superovulation of the donor; month, season and year of superovulation; hormone source and dose; and the number of previous treatments on the superovulation results. Four data sets were considered to study the effect of donor elimination after each consecutive superovulation. They contained records of only the first, the first two, the first three, or all superovulations, respectively. The average number of palpated corpora lutea per superovulation varied from 8.6 to 12.6. The total number of recovered structures and viable embryos ranged from 4.1 to 7.3 and from 7.3 to 13.8, respectively. Least squares means of the number of viable embryos at first superovulation were 7.8 ± 6.6 (Brahman), 3.7 ± 4.5 (Gyr), 6.1 ± 5.9 (Guzerat) and 5.2 ± 5.9 (Nellore). The numbers of viable embryos of the second and the third superovulations were not different from those of the first superovulation. The mean intervals between first and second superovulations were 91.8 days for Brahman, 101.8 days for Gyr, 93.1 days for Guzerat and 111.3 days for Nellore donors. Intervals between the second and the third superovulations were 134.3, 110.3, 116.4 and 108.5 days for Brahman, Gyr, Guzerat and Nellore donors, respectively. Effects of herd nested within breed and dose nested within hormone affected all traits. For some data sets, the effects of month and order of superovulation on three traits were important. The maximum number of viable embryos was observed for 7-8 year-old donors. The best responses for corpora lutea and recovered structures were observed for 4-5 year-old donors. The inbreeding coefficient was positively associated with the number of recovered structures when the data set on all superovulations was considered.

An Integration of Cardiovascular Event Data and Machine Learning Models for Cardiac Arrest Predictions

International Journal of Health Sciences and Pharmacy ◽

10.47992/ijhsp.2581.6411.0061 ◽

2021 ◽

pp. 55-71

Author(s):

Krishna Prasad K ◽

Aithal P. S. ◽

Navin N. Bappalige ◽

Soumya S

Keyword(s):

Machine Learning ◽

Cardiac Arrest ◽

Area Under The Curve ◽

Computer Applications ◽

Data Sets ◽

Cardiovascular Risks ◽

Data Set ◽

Average Area ◽

Learning Classifier ◽

Tree Classifier

Purpose: Predicting and then preventing cardiac arrest in an ICU patient is highly challenging even for the most skilled professional. The volume of data collected for a patient in the ICU is huge, and selecting the portion of that data needed to prevent cardiac arrest within a limited time is decisive; analysing such large data and making predictions from it requires an effective system. An effective integration of computer applications and cardiovascular data is necessary to predict cardiovascular risks. Machine learning is a suitable technique for managing patients with cardiac arrest. Methodology: In this work we have collected and merged three data sets: the Cleveland Dataset of US patients with a total of 303 records, the Statlog Dataset of UK patients with 270 records, and the Hungarian dataset of Hungary, Switzerland with 617 records. The combination of all three data sets forms the most comprehensive data set, consisting of 11 common features and 1190 records. Findings/Results: The feature extraction phase extracts 7 features that contribute to the event. The extracted features are used to train the selected machine learning classifier models, the results are evaluated using test data, and final results are drawn. The Extra Tree Classifier has the highest value of 0.957 for average area under the curve (AUC). Originality: The analysis of this combined data set using machine learning classifier models identifies the Extra Tree Classifier as having the highest value of 0.957 for average area under the curve (AUC). Paper Type: Experimental Research Keywords: Cardiac, Machine Learning, Random Forest, XBOOST, ROC AUC, ST Slope.

Assessing Generalizability of Deep Learning Models Trained on Standardized and Nonstandardized Images and Their Performance Against Teledermatologists (Preprint)

10.2196/preprints.35391 ◽

2021 ◽

Author(s):

Ibukun Oloruntoba ◽

Toan D Nguyen ◽

Zongyuan Ge ◽

Tine Vestergaard ◽

Victoria Mar

Keyword(s):

Skin Cancer ◽

Area Under The Curve ◽

Image Data ◽

Skin Lesions ◽

Training Image ◽

Data Sets ◽

Image Capture ◽

Data Set ◽

Unseen Data ◽

Test Variability

BACKGROUND Convolutional neural networks (CNNs) are a type of artificial intelligence that show promise as a diagnostic aid for skin cancer. However, the majority are trained using retrospective image data sets of varying quality and image capture standardization. OBJECTIVE The aim of our study is to use CNN models with the same architecture, but different training image sets, and test variability in performance when classifying skin cancer images in different populations, acquired with different devices. Additionally, we wanted to assess the performance of the models against Danish teledermatologists when tested on images acquired from Denmark. METHODS Three CNNs with the same architecture were trained. CNN-NS was trained on 25,331 nonstandardized images taken from the International Skin Imaging Collaboration using different image capture devices. CNN-S was trained on 235,268 standardized images, and CNN-S2 was trained on 25,331 standardized images (matched for number and classes of training images to CNN-NS). Both standardized data sets (CNN-S and CNN-S2) were provided by Molemap using the same image capture device. A total of 495 Danish patients with 569 images of skin lesions predominantly involving Fitzpatrick skin types II and III were used to test the performance of the models. Four teledermatologists independently diagnosed and assessed the images taken of the lesions. Primary outcome measures were sensitivity, specificity, and area under the curve of the receiver operating characteristic (AUROC). RESULTS A total of 569 images were taken from 495 patients (n=280, 57% women, n=215, 43% men; mean age 55, SD 17 years) for this study. On these images, CNN-S achieved an AUROC of 0.861 (95% CI 0.830-0.889; P<.001), and CNN-S2 achieved an AUROC of 0.831 (95% CI 0.798-0.861; P=.009), with both outperforming CNN-NS, which achieved an AUROC of 0.759 (95% CI 0.722-0.794; P<.001; P=.009). 
When the CNNs were matched to the mean sensitivity and specificity of the teledermatologists, the models’ resultant sensitivities and specificities were surpassed by the teledermatologists. However, when compared to CNN-S, the differences were not statistically significant (P=.10; P=.05). Performance across all CNN models and teledermatologists was influenced by the image quality. CONCLUSIONS CNNs trained on standardized images had improved performance and therefore greater generalizability in skin cancer classification when applied to an unseen data set. This is an important consideration for future algorithm development, regulation, and approval. Further, when tested on these unseen test images, the teledermatologists clinically outperformed all the CNN models; however, the difference was deemed to be statistically insignificant when compared to CNN-S.
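For readers unfamiliar with the primary outcome measure, the AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auroc` and the toy scores and labels are illustrative assumptions and have no connection to the study's data or code.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels use 1 for positive (e.g. malignant) and 0 for negative; ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 2 positives, 3 negatives; 5 of the 6 pairs are ordered correctly.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
value = auroc(scores, labels)
print(round(value, 3))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a test set of a few hundred images; rank-based formulas give the same value more efficiently.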

Analysis of Data Associated with Seemingly Temporal Clustering of a Rare Disease

Methods of Information in Medicine ◽

10.1055/s-0038-1634493 ◽

1998 ◽

Vol 37 (01) ◽

pp. 26-31 ◽

Cited By ~ 5

Author(s):

U. Goldbourt ◽

R. Chen

Keyword(s):

Relative Risk ◽

Cluster Size ◽

Statistical Tests ◽

Time Interval ◽

Intensity Level ◽

Average Increase ◽

Data Set ◽

The Third ◽

Temporal Clustering ◽

Power Curves

Abstract: Three statistical tests aimed at detecting temporal clustering within a given short series of diagnoses are presented. These tests are based on a standardized time interval between consecutive diagnoses. Two of the tests (the Cuscore and the Sets tests) are derived from sequential monitoring techniques which are sensitive to temporal clustering within the data set. The third test (R test) is not sequential and its sensitivity is focused on the average increase in the overall rate of the disease rather than on clustering within the series. Power curves are presented for conditions related to the intensity level of the subtle epidemic, the cluster size and the number of diagnoses. None of the techniques showed the highest efficiency over all the specified conditions. The R test is the most efficient when the relative risk is 2 or less, and the Cuscore test is the most efficient method when the relative risk is ≥2.5.

The 1919 eclipse results that verified general relativity and their later detractors: a story re-told

Notes and Records: the Royal Society Journal of the History of Science ◽

10.1098/rsnr.2020.0040 ◽

2020 ◽

Author(s):

Gerard Gilmore FRS ◽

Gudrun Tausch-Pebody

Keyword(s):

General Relativity ◽

Theoretical Prediction ◽

Data Selection ◽

Experimental Result ◽

Data Sets ◽

The Sun ◽

Data Set ◽

The Third

Einstein became world famous on 7 November 1919, following press publication of a meeting held in London on 6 November 1919 where the results were announced of two British expeditions led by Eddington, Dyson and Davidson to measure how much background starlight is bent as it passes the Sun. Three data sets were obtained: two showed the measured deflection matched the theoretical prediction of Einstein's 1915 Theory of General Relativity, and became the official result; the third was discarded as defective. At the time, the experimental result was accepted by the expert astronomical community. However, in 1980 a study by philosophers of science Earman and Glymour claimed that the data selection in the 1919 analysis was flawed and that the discarded data set was fully valid and was not consistent with the Einstein prediction, and that, therefore, the overall result did not verify General Relativity. This claim, and the resulting accusation of Eddington's bias, was repeated with exaggeration in later literature and has become ubiquitous. The 1919 and 1980 analyses of the same data provide two discordant conclusions. We reanalyse the 1919 data, and identify the error that undermines the conclusions of Earman and Glymour.

Comparison of the third-generation Japanese ocean flux data set J-OFURO3 with numerical simulations of Typhoon Dujuan (2015) traveling south of Okinawa

Journal of Oceanography ◽

10.1007/s10872-020-00554-6 ◽

2020 ◽

Vol 76 (6) ◽

pp. 419-437

Author(s):

Akiyoshi Wada ◽

Hiroyuki Tomita ◽

Shin’ichiro Kako

Keyword(s):

Inner Core ◽

Surface Wind ◽

Sea Surface ◽

Specific Humidity ◽

Data Sets ◽

Third Generation ◽

Data Set ◽

Wind Speeds ◽

The Third ◽

Simulated Surface

Abstract: Insufficient in situ observations in high winds make it difficult to verify climatological data sets and the results of tropical cyclone (TC) simulations. Reliable data sets are necessary for developing numerical models that predict TCs more accurately. This study attempted to compare the third-generation Japanese Ocean Flux Data Sets with Use of Remote-Sensing Observations (J-OFURO3) data with TC simulations conducted by a 2 km mesh coupled atmosphere-wave-ocean model. This is a case study of Typhoon Dujuan (2015), and the area around 20°N, 130°E, south of Okinawa, was selected. The comparison reveals that J-OFURO3 data are reliable for verifying the atmospheric and oceanic components of TC simulations with two different initial sea surface temperature (SST) conditions, although a blank area remains within the inner core area for air temperature, specific humidity, and latent heat flux owing to issues with the construction method. Simulated maximum surface wind speeds (MSWs) are significantly correlated with J-OFURO3 MSWs. The asymmetrical distribution of simulated surface wind speeds within the inner core area can be reproduced well in the J-OFURO3 data set. In terms of the oceanic response to the TC, TC-induced sea surface cooling was reproduced well in the J-OFURO3 data set and is consistent with the simulation results. Unlike simulated SST, simulated surface wind speeds, surface air temperature, and surface specific humidity are still inconsistent with the J-OFURO3 data, even when the J-OFURO3 SST is used as the initial condition. New algorithms, the use of more satellite data, and model improvements are expected in the future.

Assessing Generalizability of Deep Learning Models Trained on Standardized and Nonstandardized Images and Their Performance Against Teledermatologists

Iproceedings ◽

10.2196/35391 ◽

2021 ◽

Vol 6 (1) ◽

pp. e35391

Author(s):

Ibukun Oloruntoba ◽

Toan D Nguyen ◽

Zongyuan Ge ◽

Tine Vestergaard ◽

Victoria Mar

Keyword(s):

Skin Cancer ◽

Conflicts Of Interest ◽

Area Under The Curve ◽

Image Data ◽

Skin Lesions ◽

Training Image ◽

Data Sets ◽

Image Capture ◽

Data Set ◽

Unseen Data

Background Convolutional neural networks (CNNs) are a type of artificial intelligence that show promise as a diagnostic aid for skin cancer. However, the majority are trained using retrospective image data sets of varying quality and image capture standardization. Objective The aim of our study is to use CNN models with the same architecture, but different training image sets, and test variability in performance when classifying skin cancer images in different populations, acquired with different devices. Additionally, we wanted to assess the performance of the models against Danish teledermatologists when tested on images acquired from Denmark. Methods Three CNNs with the same architecture were trained. CNN-NS was trained on 25,331 nonstandardized images taken from the International Skin Imaging Collaboration using different image capture devices. CNN-S was trained on 235,268 standardized images, and CNN-S2 was trained on 25,331 standardized images (matched for number and classes of training images to CNN-NS). Both standardized data sets (CNN-S and CNN-S2) were provided by Molemap using the same image capture device. A total of 495 Danish patients with 569 images of skin lesions predominantly involving Fitzpatrick skin types II and III were used to test the performance of the models. Four teledermatologists independently diagnosed and assessed the images taken of the lesions. Primary outcome measures were sensitivity, specificity, and area under the curve of the receiver operating characteristic (AUROC). Results A total of 569 images were taken from 495 patients (n=280, 57% women, n=215, 43% men; mean age 55, SD 17 years) for this study. On these images, CNN-S achieved an AUROC of 0.861 (95% CI 0.830-0.889; P<.001), and CNN-S2 achieved an AUROC of 0.831 (95% CI 0.798-0.861; P=.009), with both outperforming CNN-NS, which achieved an AUROC of 0.759 (95% CI 0.722-0.794; P<.001; P=.009). 
When the CNNs were matched to the mean sensitivity and specificity of the teledermatologists, the models’ resultant sensitivities and specificities were surpassed by the teledermatologists. However, when compared to CNN-S, the differences were not statistically significant (P=.10; P=.05). Performance across all CNN models and teledermatologists was influenced by the image quality. Conclusions CNNs trained on standardized images had improved performance and therefore greater generalizability in skin cancer classification when applied to an unseen data set. This is an important consideration for future algorithm development, regulation, and approval. Further, when tested on these unseen test images, the teledermatologists clinically outperformed all the CNN models; however, the difference was deemed to be statistically insignificant when compared to CNN-S. Conflicts of Interest VM received speaker fees from Merck, Eli Lilly, Novartis and Bristol Myers Squibb. VM is the principal investigator for a clinical trial funded by the Victorian Department of Health and Human Services with 1:1 contribution from MoleMap.