Comparing Findings From a Friends of Cancer Research Exploratory Analysis of Real-World End Points With the Cancer Analysis System in England

2021 ◽  
pp. 1155-1168
Author(s):  
Pia Horvat ◽  
Christen M. Gray ◽  
Alexandrina Lambova ◽  
Jennifer B. Christian ◽  
Laura Lasiter ◽  
...  

PURPOSE This study compared real-world end points extracted from the Cancer Analysis System (CAS), a national cancer registry with linkage to national mortality and other health care databases in England, with those from diverse US oncology data sources, including electronic health care records, insurance claims, unstructured medical charts, or a combination, that participated in the Friends of Cancer Research Real-World Evidence Pilot Project 1.0. Consistency between data sets in patient characteristics and real-world overall survival (rwOS) was assessed in patients with immunotherapy-treated advanced non–small-cell lung cancer (aNSCLC). PATIENTS AND METHODS Patients with aNSCLC, diagnosed between January 2013 and December 2017, who initiated treatment with approved programmed death ligand-1 (PD-[L]1) inhibitors through March 2018 were included. Real-world end points, including rwOS and real-world time to treatment discontinuation (rwTTD), were assessed using Kaplan-Meier analysis. A synthetic data set, Simulacrum, based on conditional random sampling of the CAS data was used to develop and refine analysis scripts while protecting patient privacy. RESULTS Characteristics (age, sex, and histology) of the 2,035 patients with immunotherapy-treated aNSCLC included in the CAS study were broadly comparable with US data sets. In CAS, a higher proportion (46.7%) of patients received a PD-(L)1 inhibitor in the first line than in US data sets (18%-30%). Median rwOS (11.4 months; 95% CI, 10.4 to 12.7) and rwTTD (4.9 months; 95% CI, 4.7 to 5.1) were within the range of US-based data sets (rwOS, 8.6-13.5 months; rwTTD, 3.2-7.0 months). CONCLUSION The CAS findings were consistent with those from US-based oncology data sets. Such consistency is important for regulatory decision making. Differences observed between data sets may be explained by variation in health care settings, such as the timing of PD-(L)1 approval and reimbursement, and in data capture.
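As a rough illustration of how end points such as median rwOS are derived, the Kaplan-Meier estimate can be computed from right-censored durations. This is a minimal sketch with invented data, not the study's code or the CAS data.

```python
# Minimal Kaplan-Meier estimator for right-censored time-to-event data.
# Illustrative only: the durations below are hypothetical.

def kaplan_meier(times, events):
    """times: durations (e.g. months); events: 1 = event observed, 0 = censored.
    Returns a list of (time, survival probability) steps."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = sum(1 for tt, e in pairs if tt == t and e == 1)
        removed = sum(1 for tt, e in pairs if tt == t)
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
        i += removed
    return curve

def median_survival(curve):
    """First time at which the survival curve drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached

# Hypothetical rwOS-style data: (months, event flag where 0 = censored)
times  = [2.1, 4.9, 6.0, 7.5, 9.0, 11.4, 12.7, 14.0, 15.2, 18.0]
events = [1,   1,   0,   1,   1,   1,    0,    1,    1,    0]
curve = kaplan_meier(times, events)
print(median_survival(curve))
```

The censored patients (event flag 0) leave the risk set without triggering a step in the curve, which is what distinguishes this estimator from a naive fraction-surviving calculation.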

2003 ◽  
Vol 21 (1) ◽  
pp. 123-135 ◽  
Author(s):  
S. Vignudelli ◽  
P. Cipollini ◽  
F. Reseghetti ◽  
G. Fusco ◽  
G. P. Gasparini ◽  
...  

Abstract. From September 1999 to December 2000, eXpendable Bathy-Thermograph (XBT) profiles were collected along the Genova-Palermo shipping route in the framework of the Mediterranean Forecasting System Pilot Project (MFSPP). The route is virtually coincident with track 0044 of the TOPEX/Poseidon satellite altimeter, crossing the Ligurian and Tyrrhenian basins in an approximate N–S direction. This allows a direct comparison between XBT and altimetry, the results of which are presented in this paper. XBT sections reveal the presence of the major features of the regional circulation, namely the eastern boundary of the Ligurian gyre, the Bonifacio gyre and the Modified Atlantic Water inflow along the Sicily coast. Twenty-two comparisons of steric heights derived from the XBT data set with concurrent realizations of single-pass altimetric heights are made. The overall correlation is around 0.55 with an RMS difference of less than 3 cm. In the Tyrrhenian Sea the spectra are remarkably similar in shape, but in general the altimetric heights contain more energy. This difference is explained in terms of oceanographic signals, which are captured with a different intensity by the satellite altimeter and XBTs, as well as computational errors. On scales larger than 100 km, the data sets are also significantly coherent, with increasing coherence values at longer wavelengths. The XBTs were dropped every 18–20 km along the track: as a consequence, the spacing was unable to resolve adequately the internal radius of deformation (< 20 km). Furthermore, few XBT drops were carried out in the Ligurian Sea, due to the limited north-south extent of this basin, so the comparison is problematic there. On the contrary, the major features observed in the XBT data in the Tyrrhenian Sea are also detected by TOPEX/Poseidon. The paper concludes with a discussion of how to integrate the two data sets in order to extract additional information.
In particular, the results emphasize their complementarity in providing a dynamically complete description of the observed structures. Key words. Oceanography: general (descriptive and regional oceanography) Oceanography: physical (sea level variations; instruments and techniques)
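The two headline statistics of the comparison, correlation and RMS difference between the height series, can be sketched as follows. The along-track values below are invented for illustration; they are not the MFSPP data.

```python
# Pearson correlation and RMS difference between two height series,
# as used to compare XBT-derived steric heights with altimetric heights.
# Sample values are made up for illustration.
import math

def pearson_corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rms_difference(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical along-track heights in cm
steric    = [1.0, 2.5, 3.0, 2.0, 0.5, -1.0, -2.0, -0.5]
altimetry = [1.5, 2.0, 3.5, 2.5, 1.0, -0.5, -2.5, 0.0]
print(round(pearson_corr(steric, altimetry), 2),
      round(rms_difference(steric, altimetry), 2))
```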


2019 ◽  
pp. 1-13 ◽  
Author(s):  
Sandra D. Griffith ◽  
Rebecca A. Miksad ◽  
Geoff Calkins ◽  
Paul You ◽  
Nicole G. Lipitz ◽  
...  

PURPOSE Large, generalizable real-world data can enhance traditional clinical trial results. The current study evaluates reliability, clinical relevance, and large-scale feasibility for a previously documented method with which to characterize cancer progression outcomes in advanced non–small-cell lung cancer from electronic health record (EHR) data. METHODS Patients who were diagnosed with advanced non–small-cell lung cancer between January 1, 2011, and February 28, 2018, with two or more EHR-documented visits and one or more systemic therapy line initiated were identified in Flatiron Health’s longitudinal EHR-derived database. After institutional review board approval, we retrospectively characterized real-world progression (rwP) dates, with a random duplicate sample to ascertain interabstractor agreement. We calculated real-world progression-free survival, real-world time to progression, real-world time to next treatment, and overall survival (OS) using the Kaplan-Meier method (index date was the date of first-line therapy initiation), and correlations between OS and other end points were assessed at the patient level (Spearman’s ρ). RESULTS Of 30,276 eligible patients, 16,606 (55%) had one or more rwP event. Of these patients, 11,366 (68%) had subsequent death, treatment discontinuation, or new treatment initiation. Correlation of real-world progression-free survival with OS was moderate to high (Spearman’s ρ, 0.76; 95% CI, 0.75 to 0.77; evaluable patients, n = 20,020), and for real-world time to progression correlation with OS was lower (Spearman’s ρ, 0.69; 95% CI, 0.68 to 0.70; evaluable patients, n = 11,902). Interabstractor agreement on rwP occurrence was 0.94 (duplicate sample, n = 1,065) and on rwP date 0.85 (95% CI, 0.81 to 0.89; evaluable patients n = 358 [patients with two independent event captures within 30 days]). Median rwP abstraction time from individual EHRs was 18.0 minutes (interquartile range, 9.7 to 34.4 minutes).
CONCLUSION We demonstrated that rwP-based end points correlate with OS, and that rwP curation from a large, contemporary EHR data set can be reliable, clinically relevant, and feasible on a large scale.
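The patient-level correlations reported above use Spearman's ρ, i.e., the Pearson correlation of ranks. A minimal sketch with hypothetical paired durations (not the Flatiron data):

```python
# Spearman rank correlation: rank both variables (averaging ranks over
# ties), then take the Pearson correlation of the ranks.

def rank(values):
    """Average ranks (1-based), handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical paired rwPFS and OS durations (months)
rwpfs = [3.0, 5.5, 2.0, 8.0, 4.0, 12.0]
os_m  = [6.0, 9.0, 4.0, 15.0, 10.0, 20.0]
print(spearman_rho(rwpfs, os_m))
```

With no ties this agrees with the textbook formula 1 − 6Σd²/(n(n²−1)), where d is the per-pair rank difference.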


2011 ◽  
Vol 2011 ◽  
pp. 1-14 ◽  
Author(s):  
Chunzhong Li ◽  
Zongben Xu

The structure of a data set is of critical importance in identifying clusters, especially the density-difference feature. In this paper, we present a clustering algorithm based on density consistency, a filtering process that identifies points sharing the same structural feature and classifies them into the same cluster. The method is not restricted by cluster shape or by high-dimensional data sets, and it is robust to noise and outliers. Extensive experiments on synthetic and real-world data sets validate the proposed clustering algorithm.
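The abstract does not specify the density-consistency algorithm itself; as a generic illustration of density-based clustering that likewise handles arbitrary shapes and noise, here is a minimal DBSCAN in pure Python (a different, classical algorithm, shown only for orientation).

```python
# Minimal DBSCAN: points with at least min_pts neighbors within eps are
# "core" points; clusters grow by expanding from core points, and points
# reachable from no core point are labeled noise (-1).

def dbscan(points, eps, min_pts):
    """Returns a label per point: 0, 1, ... for clusters, -1 for noise."""
    def neighbors(i):
        return [j for j in range(len(points))
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1           # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: claim it, don't expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:   # core point: keep expanding
                queue.extend(jn)
    return labels

# Two dense blobs plus one outlier
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=3))
```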


Author(s):  
Drew Levin ◽  
Patrick Finley

Objective: To develop a spatially accurate biosurveillance synthetic data generator for the testing, evaluation, and comparison of new outbreak detection techniques.
Introduction: Development of new methods for the rapid detection of emerging disease outbreaks is a research priority in the field of biosurveillance. Because real-world data are often proprietary in nature, scientists must utilize synthetic data generation methods to evaluate new detection methodologies. Colizza et al. have shown that epidemic spread is dependent on the airline transportation network [1], yet current data generators do not operate over network structures. Here we present a new spatial data generator that models the spread of contagion across a network of cities connected by airline routes. The generator is developed in the R programming language and produces data compatible with the popular ‘surveillance’ software package.
Methods: Colizza et al. demonstrate the power-law relationships between city population, air traffic, and degree distribution [1]. We generate a transportation network as a Chung-Lu random graph [2] that preserves these scale-free relationships (Figure 1). First, given a power-law exponent and a desired number of cities, a probability mass function (PMF) is generated that mirrors the expected degree distribution for the given power-law relationship. Values are then sampled from this PMF to generate an expected degree (number of connected cities) for each city in the network. Edges (airline connections) are added to the network probabilistically as described in [2]. Unconnected graph components are each joined to the largest component using linear preferential attachment. Finally, city sizes are calculated based on an observed three-quarter power-law scaling relationship with the sampled degree distribution. Each city is represented as a customizable stochastic compartmental SIR model. Transportation between cities is modeled similarly to [2]. An infection is initialized in a single random city and infection counts are recorded in each city for a fixed period of time. A consistent fraction of the modeled infection cases are recorded as daily clinic visits. These counts are then added onto statically generated baseline data for each city to produce a full synthetic data set. Alternatively, data sets can be generated using real-world networks, such as the one maintained by the International Air Transport Association.
Results: Dynamics such as the number of cities, degree distribution power-law exponent, traffic flow, and disease kinetics can be customized. In the presented example (Figure 2) the outbreak spreads over a 20-city transportation network. Infection spreads rapidly once the more populated hub cities are infected. Cities that are multiple flights away from the initially infected city are infected late in the process. The generator is capable of creating data sets of arbitrary size, length, and connectivity to better mirror a diverse set of observed network types.
Conclusions: New computational methods for outbreak detection and surveillance must be compared to established approaches. Outbreak mitigation strategies require a realistic model of human transportation behavior to best evaluate impact. These actions require test data that accurately reflect the complexity of the real-world data they would be applied to. The outbreak data generated here represent the complexity of modern transportation networks and are made to be easily integrated with established software packages to allow for rapid testing and deployment.
Figure 1: Randomly generated scale-free transportation network with a power-law degree exponent of λ = 1.8. City and link sizes are scaled to reflect their weight.
Figure 2: An example of observed daily outbreak-related clinic visits across a randomly generated network of 20 cities. Each city is colored by the number of flights required to reach the city from the initial infection location. These generated counts are then added onto baseline data to create a synthetic data set for experimentation.
Keywords: Simulation; Network; Spatial; Synthetic; Data
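The Chung-Lu edge-sampling step in the Methods can be sketched in a few lines: each pair of cities is connected with probability proportional to the product of their expected degrees. The authors implement their generator in R; this Python sketch with made-up weights only illustrates that one step.

```python
# Chung-Lu random graph: edge (i, j) kept with probability
# min(1, w_i * w_j / sum(w)), preserving expected degrees w.
import random

def chung_lu(expected_degrees, seed=0):
    rng = random.Random(seed)
    n = len(expected_degrees)
    total = sum(expected_degrees)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, expected_degrees[i] * expected_degrees[j] / total)
            if rng.random() < p:
                edges.append((i, j))
    return edges

# Heavy-tailed expected degrees for a toy 8-city network
w = [8, 5, 3, 2, 2, 1, 1, 1]
edges = chung_lu(w)
degree = [0] * len(w)
for i, j in edges:
    degree[i] += 1
    degree[j] += 1
print(edges, degree)
```

A full generator would then join stray components via preferential attachment and attach an SIR model per city, as the abstract describes.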


10.2196/22624 ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. e22624 ◽  
Author(s):  
Ranganathan Chandrasekaran ◽  
Vikalp Mehta ◽  
Tejali Valkunde ◽  
Evangelos Moustakas

Background With restrictions on movement and stay-at-home orders in place due to the COVID-19 pandemic, social media platforms such as Twitter have become an outlet for users to express their concerns, opinions, and feelings about the pandemic. Individuals, health agencies, and governments are using Twitter to communicate about COVID-19. Objective The aims of this study were to examine key themes and topics of English-language COVID-19–related tweets posted by individuals and to explore the trends and variations in how the COVID-19–related tweets, key topics, and associated sentiments changed over a period of time from before to after the disease was declared a pandemic. Methods Building on the emergent stream of studies examining COVID-19–related tweets in English, we performed a temporal assessment covering the time period from January 1 to May 9, 2020, and examined variations in tweet topics and sentiment scores to uncover key trends. Combining data from two publicly available COVID-19 tweet data sets with those obtained in our own search, we compiled a data set of 13.9 million English-language COVID-19–related tweets posted by individuals. We used guided latent Dirichlet allocation (LDA) to infer themes and topics underlying the tweets, and we used VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis to compute sentiment scores and examine weekly trends for 17 weeks. Results Topic modeling yielded 26 topics, which were grouped into 10 broader themes underlying the COVID-19–related tweets. Of the 13,937,906 examined tweets, 2,858,316 (20.51%) were about the impact of COVID-19 on the economy and markets, followed by spread and growth in cases (2,154,065, 15.45%), treatment and recovery (1,831,339, 13.14%), impact on the health care sector (1,588,499, 11.40%), and government response (1,559,591, 11.19%).
Average compound sentiment scores were found to be negative throughout the examined time period for the topics of spread and growth of cases, symptoms, racism, source of the outbreak, and political impact of COVID-19. In contrast, we saw a reversal of sentiments from negative to positive for prevention, impact on the economy and markets, government response, impact on the health care industry, and treatment and recovery. Conclusions Identification of dominant themes, topics, sentiments, and changing trends in tweets about the COVID-19 pandemic can help governments, health care agencies, and policy makers frame appropriate responses to prevent and control the spread of the pandemic.
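The weekly-trend step can be sketched as grouping per-tweet compound scores into ISO-week means. In the study the scores come from VADER; the (date, score) pairs below are invented for illustration.

```python
# Aggregate per-tweet compound sentiment scores into ISO-week means.
# Hypothetical data; in practice each score would come from a sentiment
# analyzer such as VADER.
from collections import defaultdict
from datetime import date

tweets = [
    (date(2020, 1, 2), -0.40), (date(2020, 1, 3), -0.20),
    (date(2020, 1, 9),  0.10), (date(2020, 1, 10), 0.30),
    (date(2020, 3, 12), -0.60),
]

weekly = defaultdict(list)
for d, score in tweets:
    iso = d.isocalendar()              # (ISO year, ISO week, weekday)
    weekly[(iso[0], iso[1])].append(score)

trend = {week: sum(s) / len(s) for week, s in sorted(weekly.items())}
print(trend)
```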


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e18725-e18725
Author(s):  
Ravit Geva ◽  
Barliz Waissengrin ◽  
Dan Mirelman ◽  
Felix Bokstein ◽  
Deborah T. Blumenthal ◽  
...  

e18725 Background: Healthcare data sharing is important for the creation of diverse and large data sets, supporting clinical decision making, and accelerating efficient research to improve patient outcomes. This is especially vital in the case of real-world data analysis. However, stakeholders are reluctant to share their data without assurance of patients’ privacy, proper protection of their data sets, and control over the ways the data are used. Homomorphic encryption is a cryptographic capability that can address these issues by enabling computation on encrypted data without ever decrypting it, so that analytics results are obtained without revealing the raw data. The aim of this study is to prove the accuracy of analytics results and the practical efficiency of the technology. Methods: A real-world data set of colorectal cancer patients’ survival data, following two different treatment interventions, including 623 patients and 24 variables and amounting to 14,952 items of data, was encrypted using leveled homomorphic encryption implemented in the PALISADE software library. Statistical analysis of key oncological endpoints was blindly performed on both the raw data and the homomorphically encrypted data using descriptive statistics and survival analysis with Kaplan-Meier curves. Results were then compared with an accuracy goal of two decimal places. Results: For all variables analyzed, the difference between results from the raw data and from the homomorphically encrypted data was within the predetermined accuracy goal; the practical efficiency of the encrypted computation, measured by run time, is presented in the table. Conclusions: This study demonstrates that data encrypted with homomorphic encryption can be statistically analyzed with a precision of at least two decimal places, allowing safe clinical conclusions to be drawn while preserving patients’ privacy and protecting data owners’ data assets. Homomorphic encryption allows efficient computation on encrypted data non-interactively and without requiring decryption during computation time. Utilizing the technology will empower large-scale cross-institution and cross-stakeholder collaboration, allowing safe international collaborations. Clinical trial information: 0048-19-TLV. [Table: see text]
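The study's leveled scheme (PALISADE) is far more capable, but the core idea of computing on ciphertexts can be illustrated with a toy additively homomorphic Paillier scheme: multiplying ciphertexts yields an encryption of the sum, so aggregates can be computed without decrypting individual records. Everything below is didactic: the tiny primes are insecure and the survival figures are invented.

```python
# Toy Paillier cryptosystem (additively homomorphic). NOT the leveled
# scheme used in the study, and NOT secure: primes are far too small.
import math
import random

p, q = 293, 433                  # toy primes; real keys use ~1024-bit primes
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)     # Carmichael function lambda(n)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m, rng=random.Random(1)):
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Homomorphic addition: multiply ciphertexts, then decrypt the sum.
survival_months = [11, 8, 14, 23, 5]      # hypothetical patient data
encrypted_sum = 1
for c in (encrypt(m) for m in survival_months):
    encrypted_sum = (encrypted_sum * c) % n2
print(decrypt(encrypted_sum), sum(survival_months))
```

From such an encrypted sum (and an encrypted count) a mean can be recovered after a single decryption, which is the pattern that lets analysts report aggregate statistics without ever seeing row-level data.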


Author(s):  
Olivia Prosper ◽  
Swati DebRoy ◽  
Austin Mishoe ◽  
Cesar Montalvo ◽  
Niyamat Ali Siddiqui ◽  
...  

Background: Underreporting of Visceral Leishmaniasis (VL) in India remains a problem for public health control. Effective and reliable surveillance systems are critical for monitoring disease outbreaks and public health control programs. However, in India, government surveillance systems are affected by resource scarcity, and therefore uncertainty surrounds the true incidence of asymptomatic and clinical cases, affecting morbidity and mortality rates. The state of Bihar alone contributes up to 40% of the worldwide VL cases. The inefficiency of surveillance systems has multiple causes, including delay in seeking health care, reliance on non-authentic health care clinics, and the existence of a significant number of asymptomatic, self-healing infectious cases. This results in a failure of the system to adequately report true transmission rates and the number of symptomatic cases that have sought medical advice (thus, high underreporting of cases). Objectives and Methods: There are several methods to estimate the extent of underreporting in a surveillance system. In this research, we use a mathematical dynamic model and two different types of data sets, namely, monthly incidence for 2003-2005 and yearly incidence for 2006-2012 from Bihar's 21 most VL-affected districts out of its 38 districts. The goals of the study are to estimate critical metrics of the level of transmission and to compare the estimates between the two data sets and across the 21 districts. In particular, our focus is on (i) estimating infection transmission potential, the level of underreporting in incidence, and the proportion of self-healing cases, (ii) quantifying the basic reproduction number R_0, and (iii) comparing underreporting levels and the proportion of self-healing cases between the two periods 2003-2005 and 2006-2012 and between the 21 districts. Results: Our research suggests that the number of asymptomatic individuals in the population who eventually self-heal may have a significant effect on the dynamics of VL spread. The estimated mean self-healing proportion (out of all infected) is found to be ~0.6, with only 7 of the 21 affected districts having a self-healing proportion less than 0.5 for both data sets. The estimated mean underreporting level is at least 64% for the state of Bihar. The estimates of the basic reproduction number obtained are similar in magnitude for most of the districts, being in the range of (0.88, 2.79) and (0.98, 1.01) for 2003-2005 and 2006-2012, respectively. Conclusions: The estimates for the two types (monthly and yearly) of temporal data suggest that monthly data are better for estimation when fewer data points are available; however, in general, using such a data set results in larger variances in parameters compared with estimates obtained through aggregated yearly data. Estimated values of transmission-related metrics are lower than those obtained from earlier analyses in the literature, and the implications of this for VL control are discussed. The spatial heterogeneity in these control metrics increases the risk of epidemics and makes control strategies more complex.


2020 ◽  
Vol 267 (S1) ◽  
pp. 185-196
Author(s):  
J. Gerb ◽  
S. A. Ahmadi ◽  
E. Kierig ◽  
B. Ertl-Wagner ◽  
M. Dieterich ◽  
...  

Abstract Background Objective and volumetric quantification is a necessary step in the assessment and comparison of endolymphatic hydrops (ELH) results. Here, we introduce a novel tool for automatic volumetric segmentation of the endolymphatic space (ELS) for ELH detection in delayed intravenous gadolinium-enhanced magnetic resonance imaging of inner ear (iMRI) data. Methods The core component is a novel algorithm based on Volumetric Local Thresholding (VOLT). The study included three different data sets: a real-world data set (D1) to develop the novel ELH detection algorithm and two validating data sets, one artificial (D2) and one entirely unseen prospective real-world data set (D3). D1 included 210 inner ears of 105 patients (50 male; mean age 50.4 ± 17.1 years), and D3 included 20 inner ears of 10 patients (5 male; mean age 46.8 ± 14.4 years) with episodic vertigo attacks of different etiology. D1 and D3 did not differ significantly concerning age, gender, the grade of ELH, or data quality. As an artificial data set, D2 provided a known ground truth and consisted of an 8-bit cuboid volume using the same voxel-size and grid as real-world data with different sized cylindrical and cuboid-shaped cutouts (signal) whose grayscale values matched the real-world data set D1 (mean 68.7 ± 7.8; range 48.9–92.8). The evaluation included segmentation accuracy using the Sørensen-Dice overlap coefficient and segmentation precision by comparing the volume of the ELS. Results VOLT resulted in a high level of performance and accuracy in comparison with the respective gold standard. In the case of the artificial data set, VOLT outperformed the gold standard at higher noise levels. Data processing steps are fully automated and run without further user input in less than 60 s. ELS volume measured by automatic segmentation correlated significantly with the clinical grading of the ELS (p < 0.01).
Conclusion VOLT enables an open-source reproducible, reliable, and automatic volumetric quantification of the inner ears’ fluid space using MR volumetric assessment of endolymphatic hydrops. This tool constitutes an important step towards comparable and systematic big data analyses of the ELS in patients with the frequent syndrome of episodic vertigo attacks. A generic version of our three-dimensional thresholding algorithm has been made available to the scientific community via GitHub as an ImageJ-plugin.
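The Sørensen-Dice coefficient used above to score segmentation accuracy is twice the shared voxel count over the total voxels in both masks. A minimal sketch on toy flattened binary masks (not the study's volumes):

```python
# Sørensen-Dice overlap between two binary segmentation masks:
# dice = 2 * |A ∩ B| / (|A| + |B|).

def dice(mask_a, mask_b):
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    size = sum(mask_a) + sum(mask_b)
    return 2 * inter / size if size else 1.0

# Flattened 4x4 binary segmentations (1 = segmented voxel)
gold = [1, 1, 0, 0,  1, 1, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0]
auto = [1, 1, 0, 0,  1, 0, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0]
print(dice(gold, auto))
```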


2003 ◽  
Vol 18 (4) ◽  
pp. 281-284 ◽  
Author(s):  
Svend Erik Rasmussen

Data for the standard material NBS SRM 674, TiO2, were collected on two diffractometers: a) a Philips PW 1050/37 standard diffractometer of the Bragg-Brentano type, equipped with a post-diffraction curved Ge monochromator; b) a Stoe Stadi P diffractometer of transmission type, equipped with a curved incident-beam Ge monochromator. Both monochromators were set to select pure CuKα1 radiation. The reflection-type instrument gives a much larger peak-to-background ratio than the transmission instrument, whose background is much higher. Rietveld refinements were carried out on both data sets with the programs DBWS-9807 and the general structure analysis system (GSAS). The structural parameter of the oxygen atom of rutile depends neither on data set nor on program, whereas, e.g., thermal displacement parameters seem to depend on both data set and program.


2020 ◽  
Vol 125 (3) ◽  
pp. 3085-3108 ◽  
Author(s):  
Tarek Saier ◽  
Michael Färber

Abstract. In recent years, scholarly data sets have been used for various purposes, such as paper recommendation, citation recommendation, citation context analysis, and citation context-based document summarization. The evaluation of approaches to such tasks and their applicability in real-world scenarios heavily depend on the data set used. However, existing scholarly data sets are limited in several regards. In this paper, we propose a new data set based on all publications from all scientific disciplines available on arXiv.org. Apart from providing the papers’ plain text, in-text citations were annotated via global identifiers. Furthermore, citing and cited publications were linked to the Microsoft Academic Graph, providing access to rich metadata. Our data set consists of over one million documents and 29.2 million citation contexts. The data set, which is made freely available for research purposes, can not only enhance the future evaluation of research paper-based and citation context-based approaches, but also serve as a basis for new ways to analyze in-text citations, as we show prototypically in this article.

