Network Data Characteristics

Author(s):  
Yu Wang

Data represent natural phenomena in the real world. Data are organized in rows and columns; usually rows represent the observations and columns represent the variables. Observations, also called subjects, records, or data points, represent a phenomenon in the real world, and variables, also known as data elements or data fields, represent the characteristics of observations in data. Variables take different values for different observations, which can make observations independent of each other. Figure 4.1 illustrates a section of TCP/IP traffic data, in which the rows are individual network connections and the columns, separated by a space, are characteristics of those connections. In this example, the first column is a session index of each connection and the second column is the date when the connection occurred. In this chapter, we will discuss some fundamental features of variables and network data. We will present detailed discussions on variable characteristics and distributions in Sections Random Variables and Variables Distributions, and describe network data modules in Section Network Data Modules. The material covered in this chapter will help readers who do not have a solid background in this area gain an understanding of the basic concepts of variables and data. Additional information can be found in Introduction to the Practice of Statistics by Moore and McCabe (1998).
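
The row-and-column layout described above can be illustrated with a minimal Python sketch. Figure 4.1 is not reproduced here, so the sample lines are invented for illustration; only the first two columns (session index and date) come from the text, and the remaining column names (`start_time`, `duration`, `service`) are assumptions.

```python
# Hypothetical sample lines: Figure 4.1 is not available, so the values and
# every column after the second are illustrative assumptions.
SAMPLE = """\
1 01/23/1998 16:56:12 00:01:26 telnet
2 01/23/1998 16:56:15 00:00:14 smtp
3 01/23/1998 16:57:02 00:00:03 http
"""

# Column names: the first two follow the text, the rest are assumed.
COLUMNS = ["session_index", "date", "start_time", "duration", "service"]


def parse_records(text):
    """Split space-separated lines into one dict per observation (row)."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        records.append(dict(zip(COLUMNS, fields)))
    return records


def column(records, name):
    """Collect the values of one variable (column) across all observations."""
    return [record[name] for record in records]


records = parse_records(SAMPLE)
print(len(records), "observations")
print(column(records, "service"))
```

Each dict is one observation and each key is one variable, which mirrors the distinction the chapter draws between rows and columns.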

2021 ◽  
Vol 12 (01) ◽  
pp. 017-026
Author(s):  
Georg Melzer ◽  
Tim Maiwald ◽  
Hans-Ulrich Prokosch ◽  
Thomas Ganslandt

Abstract Background: Even though clinical trials are indispensable for medical research, they are frequently impaired by delayed or incomplete patient recruitment, resulting in cost overruns or aborted studies. Study protocols based on real-world data with precisely expressed eligibility criteria and realistic cohort estimations are crucial for successful study execution. The increasing availability of routine clinical data in electronic health records (EHRs) provides the opportunity to also support patient recruitment during the prescreening phase. While solutions for electronic recruitment support have been published, to our knowledge, no method for the prioritization of eligibility criteria in this context has been explored. Methods: In the context of the Electronic Health Records for Clinical Research (EHR4CR) project, we examined the eligibility criteria of the KATHERINE trial. Criteria were extracted from the study protocol, deduplicated, and decomposed. A paper chart review and a data warehouse query were executed to retrieve clinical data for the resulting set of simplified criteria separately from both sources. Criteria were scored according to disease specificity, data availability, and discriminatory power based on their content and the clinical dataset. Results: The study protocol contained 35 eligibility criteria, which after simplification yielded 70 atomic criteria. For a cohort of 106 patients with breast cancer and neoadjuvant treatment, 47.9% of data elements were captured through paper chart review, with the data warehouse query yielding 26.9% of data elements. Score application resulted in a prioritized subset of 17 criteria, which yielded a sensitivity of 1.00 and a specificity of 0.57 on EHR data (paper charts: 1.00 and 0.80) compared with actual recruitment in the trial.
Conclusion: It is possible to prioritize clinical trial eligibility criteria based on real-world data to optimize prescreening of patients on a selected subset of relevant and available criteria and reduce implementation efforts for recruitment support. The performance could be further improved by increasing EHR data coverage.
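
For readers unfamiliar with the two metrics reported above, the following Python sketch shows how sensitivity and specificity are conventionally computed when a prescreening result is compared with actual recruitment. The function name and the toy inputs are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for two equal-length boolean lists.

    predicted[i]: the prescreening flagged patient i as eligible
    actual[i]:    patient i was actually recruited (reference standard)
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): five patients, two of whom were recruited.
flagged = [True, True, True, False, False]
recruited = [True, True, False, False, False]
sens, spec = sensitivity_specificity(flagged, recruited)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A sensitivity of 1.00 means that no recruited patient was missed by the prescreening; a lower specificity means that some patients who were not recruited were still flagged.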


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Emil S. Kostov ◽  
Evgeni E. Grigorov ◽  
Hristina V. Lebanova

Summary: Non-interventional studies (NIS) are conducted to obtain additional information about a medicinal product prescribed in the usual manner in compliance with the conditions determined in the marketing authorization. They are a valuable source of real-world data on the effectiveness and safety of medicines. This study aims to assess physicians' knowledge of non-interventional studies in Bulgaria and identify the primary factors and barriers hindering NIS at a national level. An individual anonymous questionnaire with 16 items was distributed among physicians in inpatient and outpatient settings. The results showed that 81.3% (n=147) of the respondents have no experience with non-interventional studies. Physicians' willingness to participate in NIS in the future is high and independent of their previous experience. The main barriers to conducting NIS in Bulgaria are related to the organization, conduct, and design of the trials and, sometimes, to the investigators' concerns. There is a need for proper training of the researchers and for expanding healthcare resources to grow the NIS sector in Bulgaria in line with the tendencies in Europe.


Author(s):  
Flora S. Tsai

This paper proposes probabilistic models for social media mining based on the multiple attributes of social media content, bloggers, and links. The authors present a unique social media classification framework that computes the normalized document-topic matrix. After comparing the results for social media classification on real-world data, the authors find that the model outperforms the other techniques in terms of overall precision and recall. The results demonstrate that additional information contained in social media attributes can improve classification and retrieval results.


2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Rainer Schnell ◽  
Jonas Klingwort ◽  
James M. Farrow

Abstract Background: We introduce and study a recently proposed method for privacy-preserving distance computations which has received little attention in the scientific literature so far. The method, which is based on intersecting sets of randomly labeled grid points and is henceforth denoted ISGP, allows calculating the approximate distances between masked spatial data. Coordinates are replaced by sets of hash values. The method allows the computation of distances between locations L when the locations at different points in time t are not known simultaneously. The distance between $L_1$ and $L_2$ could be computed even when $L_2$ does not exist at $t_1$ and $L_1$ has been deleted at $t_2$. An example would be patients from a medical data set and locations of later hospitalizations. ISGP is a new tool for privacy-preserving data handling of geo-referenced data sets in general. Furthermore, this technique can be used to include geographical identifiers as additional information for privacy-preserving record-linkage. To show that the technique can be implemented in most high-level programming languages with a few lines of code, a complete implementation within the statistical programming language R is given. The properties of the method are explored using simulations based on large-scale real-world data of hospitals ($n = 850$) and residential locations ($n = 13,000$). The method has already been used in a real-world application. Results: ISGP yields very accurate results. Our simulation study showed that, with appropriately chosen parameters, 99% accuracy in the approximated distances is achieved. Conclusion: We discussed a new method for privacy-preserving distance computations in microdata. The method is highly accurate, fast, has low computational burden, and does not require excessive storage.
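
The abstract does not spell out the algorithm, and the authors' implementation is in R, so the following Python sketch is only one possible reading of "intersecting sets of randomly labeled grid points". It assumes a regular grid, labels produced by a keyed hash (HMAC-SHA256), an equal radius for both locations, and distance recovery by inverting the circle-overlap (lens) area. All function names and parameters are hypothetical.

```python
import hashlib
import hmac
import math


def masked_point_set(x, y, radius, key, spacing=1.0):
    """Return keyed-hash labels of all grid points within `radius` of (x, y).

    Only these labels would be stored or shared, not the coordinates.
    """
    labels = set()
    i_min = math.floor((x - radius) / spacing)
    i_max = math.ceil((x + radius) / spacing)
    j_min = math.floor((y - radius) / spacing)
    j_max = math.ceil((y + radius) / spacing)
    for i in range(i_min, i_max + 1):
        for j in range(j_min, j_max + 1):
            gx, gy = i * spacing, j * spacing
            if (gx - x) ** 2 + (gy - y) ** 2 <= radius ** 2:
                msg = f"{i},{j}".encode()
                labels.add(hmac.new(key, msg, hashlib.sha256).hexdigest())
    return labels


def lens_area(d, radius):
    """Overlap area of two circles of equal radius whose centres are d apart."""
    if d >= 2 * radius:
        return 0.0
    return (2 * radius ** 2 * math.acos(d / (2 * radius))
            - (d / 2) * math.sqrt(4 * radius ** 2 - d ** 2))


def estimate_distance(set_a, set_b, radius, spacing=1.0):
    """Estimate the distance from the number of shared labels.

    The shared count times spacing**2 approximates the lens area, which is
    inverted by bisection. Disjoint sets yield 2 * radius, meaning only that
    the true distance is at least that large.
    """
    target = len(set_a & set_b) * spacing ** 2
    lo, hi = 0.0, 2 * radius
    for _ in range(60):
        mid = (lo + hi) / 2
        if lens_area(mid, radius) > target:
            lo = mid  # overlap at mid is too large, so the distance is larger
        else:
            hi = mid
    return (lo + hi) / 2


key = b"shared-secret"  # hypothetical key known only to the data holders
set_1 = masked_point_set(12.3, 45.6, radius=50, key=key)
set_2 = masked_point_set(40.1, 60.2, radius=50, key=key)
estimate = estimate_distance(set_1, set_2, radius=50)
print(round(estimate, 1), "vs. true", round(math.hypot(27.8, 14.6), 1))
```

Because the two label sets can be created at different times and compared later, a sketch like this is consistent with the property that $L_1$ and $L_2$ need not be known simultaneously. The radius and grid spacing control both the accuracy and the largest distance that can be resolved.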


2021 ◽  
Vol 263 (3) ◽  
pp. 3085-3096
Author(s):  
Francisco Soares ◽  
Frederico Pereira ◽  
Emanuel Silva ◽  
Carlos Silva ◽  
Emanuel Sousa ◽  
...  

Recently, several studies on pedestrian safety, particularly those addressing pedestrian crossing behaviour and decision-making, have been performed using virtual reality systems. The use of simulators to assess pedestrian behaviour is conditioned by the feeling of presence and immersion, for which sound is a determining factor. This paper presents an implementation procedure in which tyre-road noise samples are auralized and presented as auditory stimuli in a virtual environment for assessing pedestrian crossing decision-making. The auditory samples, obtained through the Close Proximity (CPX) method and subsequently auralized to represent Controlled Pass-By (CPB) sounds, reproduce the sounds of a vehicle approaching a crosswalk. The auralized sounds, together with the presentation of visual stimuli, composed an experiment which was carried out with 30 participants. Safety indicators, such as the time-to-passage at the moment that participants decided to cross a virtual crosswalk and the minimum time-to-collision, were registered and compared with data obtained in real-world road crossings. The comparison with real-world data points to a close alignment between results obtained in virtual and real environments, indicating a good suitability of the approach for studying pedestrian crossing behaviour.


2019 ◽  
Vol 13 (6) ◽  
pp. 995-1000
Author(s):  
David C. Klonoff ◽  
Alberto Gutierrez ◽  
Alexander Fleming ◽  
David Kerr

Randomized clinical trials (RCTs) are no longer the sole source of data to inform guidelines and regulatory and policy decisions. Real-world data (RWD), collected from registries, electronic health records, insurance claims, pharmacy records, social media, and sensor outputs from devices, form real-world evidence (RWE), which can supplement evidence from RCTs. Benefits of using RWE include less time and cost to produce meaningful data; the ability to capture additional information, including social determinants of health that can impact health outcomes; detection of uncommon adverse events; and the potential to apply machine learning and artificial intelligence to the delivery of health care. Overall, combining data from RCTs and RWE would allow regulators to make ongoing and more evidence-based decisions in approving and monitoring products for diabetes.


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e19270-e19270
Author(s):  
Donna R. R Rivera ◽  
Laura Lasiter ◽  
Jennifer Christian ◽  
Lindsey Enewold ◽  
Janet L. Espirito ◽  
...  

e19270 Background: Friends of Cancer Research convened 9 data partners to identify data elements and common definitions for real world (rw) endpoints to evaluate populations typically excluded from clinical trials. Here we report on rwOS by frontline treatment and comorbidities. Methods: A retrospective observational analysis of patients with aNSCLC initiating frontline platinum doublet chemotherapy (chemo) or PD-(L)1-based immuno-oncologic (IO) therapy (monotherapy or chemo combination) between 1 Jan 2011 and 31 Mar 2018 was conducted using administrative claims, EHR, and cancer registry RWD. We evaluated rwOS from frontline therapy initiation using Kaplan-Meier methods, stratified by ECOG status, brain metastases (ICD), history of chronic kidney or liver disease (CKD/CLD, ICD), and evidence of kidney or liver dysfunction (KD/LD, lab-based). Results: A total of 33,649 patients were included (N 972-17,454) with 10 to 26% of patients receiving IO as frontline therapy. There was a broad range of comorbidity prevalence across datasets, and patients with evidence of comorbidity had comparatively shorter 12-month OS (Table). Conclusions: RWD analyses can generate expanded evidence on patient outcomes for populations routinely excluded from clinical trials and may help inform decision making where sparse data exist on appropriate treatment approaches. Additional understanding of data missingness, sensitivity of definitions, and covariate adjustment is needed to make direct comparisons across regimens and data sources. [Table: see text]


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e19311-e19311
Author(s):  
Lawrence H. Kushi ◽  
Laura Lasiter ◽  
Andrew J. Belli ◽  
Marley Boyd ◽  
Suanna S. Bruinooge ◽  
...  

e19311 Background: Leveraging data from a collaboration with 9 data partners, Friends of Cancer Research convened the Real-world Evidence Pilot 2.0 to examine trends and real world (rw) data endpoints in immunotherapy (IO) use for the frontline treatment of aNSCLC. Methods: This study leveraged parallel analyses of rw data elements across heterogeneous data sources (EHR, administrative claims, and registry) to: a) describe trends in uptake and use of novel IO frontline therapy after advanced diagnosis in NSCLC patients treated in usual care settings and b) examine associations between treatment and rw outcomes at one-year follow-up. The proportion of patients treated on each regimen (IO single agent, chemo, or IO + chemo) from 2011 through 2017 was calculated. Analysis included the proportion of patients across treatment regimens stratified by year to describe post-approval uptake of IO. Kaplan-Meier survival estimates were reported to adjust for follow-up time and stratified by PD-L1 status and stage. Results: Seven datasets identified a range of 999 to 4617 patients per dataset for this analysis. Across datasets, 2508, 3446, and 4176 patients initiated treatment in 2015, 2016, and 2017, respectively. No patients received IO or IO + chemo regimens prior to 2015. Initial approvals for IO use in aNSCLC occurred in October 2015 and for first line in metastatic NSCLC in October 2016. When examining survival at 1 year, overall, OS in PD-(L)1-positive patients appeared longer than in those with a PD-(L)1-negative status. Conclusions: RWE analyses may reveal important trends in clinical cancer patient care, including patterns of off-label use. The timing of IO uptake was heterogeneous across datasets, ranging from immediately after approval to ~12 months post-approval. [Table: see text]


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e18813-e18813
Author(s):  
Samir Courdy ◽  
Mark Hulse ◽  
Sorena Nadaf ◽  
Allen Mao ◽  
Alex Pozhitkov ◽  
...  

e18813 Background: The City of Hope Center for Precision Medicine developed an enterprise-wide platform and precision medicine program to unlock the research potential and clinical value of complex and unique datasets by combining patient data with comprehensive genomic profiling and proprietary analytics. POSEIDON (Precision Oncology Software Environment Interoperable Data Ontologies Network) is a secure, cloud-based Oncology Insights Engine enabling exploration, analysis, visualization, and collaboration on our patient clinico-genomic data along with public data sources. This platform enables investigators to access and visualize clinical and multi-omics data and provides an engine that can be used for tasks ranging from cohort discovery, exploration, and preliminary feasibility testing to deriving patient-specific insights based on real-world data (RWD) and real-world evidence (RWE). Patients are consented through an IRB-approved protocol with active, opt-in participation. Methods: The POSEIDON Common Data Model (PCDM) is a standard, extensible data schema that incorporates patient data to support Precision Medicine. Data are incorporated from disparate data sources and stored in a combined, harmonized manner, promoting consistency of data and meaning across downstream applications. A multi-step process was created to capture and structure multiple data types into the PCDM. Natural language processing (NLP) tools are deployed to automate the extraction and structuring of valuable data elements from unstructured documents, including pathology reports and clinical notes. NLP-augmented software tools were developed to assist manual data abstractors in capturing more complex terms and disease-specific data elements, which can include disease progression, progression-free survival, and other outcomes.
Results: Comprehensive data from 175,000 City of Hope patients are included within this environment for cohort exploration, longitudinal follow-up, outcomes, hypothesis development, and queries for synthetic controls. Data from disease-specific research registries constitute a rich dataset within POSEIDON by disease and tumor type, including lung cancer, colorectal cancer, breast cancer, leukemia, lymphoma, and multiple myeloma, among other disease types. Automated genomic workflows were created to gain access to genomic profiling and whole exome sequencing. Genomic data are associated with the clinical data in the PCDM. Automated data flows from the Enterprise Data Warehouse (EDW) include data that are captured in discrete formats in the EDW and provided for in the PCDM, and they further enrich the data that flow from the disease registries. Statistically rigorous methods for de-identification are applied for collaborative studies. Conclusions: The City of Hope Center for Precision Medicine and the POSEIDON platform offer an exceptional resource for collaborative RWD & RWE studies.

