Accelerated Local Anomaly Detection via Resolving Attributed Networks

Attributed networks, in which both network connectivity and node attributes are available, have been increasingly used to model real-world information systems, such as social media and e-commerce platforms. While outlier detection has been extensively studied to identify anomalies that deviate from a chosen background, existing algorithms cannot be directly applied to attributed networks because of the heterogeneous types of information and the scale of real-world data. Meanwhile, it has been observed that local anomalies, which may appear normal at the global level, are hard for existing algorithms to detect in an interpretable way. Motivated by these observations, in this paper we study the problem of effective and efficient local anomaly detection in attributed networks. In particular, we design a collective approach to modeling heterogeneous network and attribute information, and develop a novel and efficient distributed optimization algorithm to handle large-scale data. In the experiments, we compare the proposed framework with state-of-the-art methods on both real and synthetic datasets, and demonstrate its effectiveness and efficiency through quantitative evaluation and case studies.

Density-preserving projections for large-scale local anomaly detection

Knowledge and Information Systems, Vol 32 (1), pp. 25-52, 2011
DOI: 10.1007/s10115-011-0430-4
Cited By ~ 26

Author(s): Timothy de Vries, Sanjay Chawla, Michael E. Houle

Keyword(s): Anomaly Detection; Large Scale; Local Anomaly

A Real-time Dynamic Simulation Scheme for Large-Scale Flood Hazard Using 3D Real World Data

2007 11th International Conference Information Visualization (IV '07), 2007
DOI: 10.1109/iv.2007.15
Cited By ~ 6

Author(s): C. Wang, T. R. Wan, I. J. Palmer

Keyword(s): Real Time; Dynamic Simulation; Real World; Large Scale; Flood Hazard; Real World Data; World Data; Time Dynamic; Simulation Scheme

Proton Pump Inhibitors and Risk of Dementia: A Hypothesis Generated but Not Adequately Tested

American Journal of Alzheimer's Disease & Other Dementias®, Vol 36, pp. 153331752110624, 2021
DOI: 10.1177/15333175211062413

Author(s): Mishah Azhar, Lawrence Fiedler, Patricio S. Espinosa, Charles H. Hennekens

Keyword(s): Proton Pump Inhibitors; Proton Pump; Real World; Large Scale; Basic Research; The United States; Epidemiological Studies; Real World Data; Health Authorities; Public Health Authorities

We reviewed the evidence on proton pump inhibitors (PPIs) and dementia. PPIs are among the most widely used drugs in the world. Dementia affects roughly 5% of people aged 60 years and older in the United States (US) and worldwide. With respect to PPIs and dementia, basic research has suggested plausible mechanisms, but descriptive and analytic epidemiological studies are inconsistent. In addition, a single large-scale randomized trial showed no association. When the evidence is incomplete, it is appropriate for clinicians and researchers to remain uncertain. Regulatory or public health authorities sometimes need to make real-world decisions based on real-world data. Only when the evidence is complete are the most rational judgments possible for individual patients and for the health of the general public. At present, the evidence on PPIs and dementia suggests more reassurance than alarm. Further large-scale randomized evidence is necessary to make such judgments.

Heart Snapshot: a broadly validated smartphone measure of VO2max for collection of real world data

2020
DOI: 10.1101/2020.07.02.185314

Author(s): Dan E. Webster, Meghasyam Tummalacherla, Michael Higgins, David Wing, Euan Ashley, ...

Keyword(s): Real World; Gold Standard; Large Scale; Digital Health; Clinical Care; Skin Pigmentation; Epidemiologic Studies; Step Test; Real World Data; Laboratory Equipment

Expanding access to precision medicine will increasingly require that patient biometrics can be measured in remote care settings. VO2max, the maximum volume of oxygen usable during intense exercise, is one of the most predictive biometric risk factors for cardiovascular disease, frailty, and overall mortality [1,2]. However, VO2max measurements are rarely performed in clinical care or large-scale epidemiologic studies because of the high cost, participant burden, and need for specialized laboratory equipment and staff [3,4]. To overcome these barriers, we developed two smartphone sensor-based protocols for estimating VO2max: a generalization of a 12-minute run test (12-MRT) and a submaximal 3-minute step test (3-MST). In laboratory settings, Lin's concordance for these two tests relative to gold-standard VO2max testing was ρc = 0.66 for 12-MRT and ρc = 0.61 for 3-MST. Relative to “silver standards” [5] (Cooper/Tecumseh protocols), concordance was ρc = 0.96 and ρc = 0.94, respectively. However, in remote settings, 12-MRT was significantly less concordant with the gold standard (ρc = 0.25) than 3-MST (ρc = 0.61), though both had high test-retest reliability (ICC = 0.88 and 0.86, respectively). These results demonstrate the importance of real-world evidence for validating digital health measurements. To validate 3-MST in a broadly representative population, in accordance with the All of Us Research Program [6] for which this measurement was developed, we investigated the camera-based heart rate measurement for potential bias. No systematic measurement error was observed that corresponded to skin pigmentation level, operating system, or cost of the phone used. The smartphone-based 3-MST protocol, here termed Heart Snapshot, maintained fidelity across demographic variation in age and sex, across diverse skin pigmentation, and between iOS and Android implementations on various smartphone models.
The source code for these smartphone measurements, along with the data used to validate them [6], is openly available to the research community.
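
The concordance figures in this abstract are values of Lin's concordance correlation coefficient, ρc, which penalizes both weak correlation and systematic offset between two measurements of the same quantity. The sketch below assumes the standard population-moment definition, ρc = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²); the function name `lins_ccc` and the paired readings are hypothetical illustrations, not the study's code or data.

```python
from statistics import fmean
from typing import Sequence


def lins_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for paired measurements.

    Uses population (divide-by-n) moments:
        rho_c = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference in the denominator is what punishes bias.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical paired readings (e.g. a phone estimate vs. a lab reference).
reference = [30.0, 35.0, 40.0, 45.0, 50.0]
perfect = list(reference)                # identical series -> rho_c = 1.0
shifted = [v + 5.0 for v in reference]   # same slope, constant +5 offset -> rho_c = 0.8
print(lins_ccc(reference, perfect))
print(lins_ccc(reference, shifted))
```

Pearson's r is 1.0 for both pairs in this toy example; only ρc registers the constant offset, which is why it suits agreement studies that compare a new measurement method against a reference.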

A Joint Learning Approach to Intelligent Job Interview Assessment

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018
DOI: 10.24963/ijcai.2018/492
Cited By ~ 9

Author(s): Dazhong Shen, Hengshu Zhu, Chen Zhu, Tong Xu, Chao Ma, ...

Keyword(s): Real World; Latent Variable; Large Scale; Real World Data; Job Interviews; Variable Model; Job Interview; Joint Learning; Interview Process; The Right

The job interview is considered one of the most essential tasks in talent recruitment, forming a bridge between candidates and employers in fitting the right person to the right job. While substantial efforts have been made to improve the job interview process, biased or inconsistent interview assessments are inevitable owing to the subjective nature of the traditional interview process. To this end, in this paper, we propose a novel approach to intelligent job interview assessment that learns from large-scale real-world interview data. Specifically, we develop a latent variable model named Joint Learning Model on Interview Assessment (JLMIA) to jointly model job descriptions, candidate resumes, and interview assessments. JLMIA can effectively learn the representative perspectives of different job interview processes from historical records of successful job applications. As a result, a variety of applications in job interviews can be enabled, such as person-job fit and interview question recommendation. Extensive experiments conducted on real-world data clearly validate the effectiveness of JLMIA, which can lead to substantially less bias in job interviews and provide a valuable understanding of job interview assessment.

Exploring the Feasibility of Using Real-World Data from a Large Clinical Data Research Network to Simulate Clinical Trials of Alzheimer’s Disease

2020
DOI: 10.1101/2020.06.03.20121491

Author(s): Zhaoyi Chen, Hansi Zhang, Yi Guo, Thomas J George, Mattia Prosperi, ...

Keyword(s): Alzheimer's Disease; Clinical Trials; Clinical Data; Real World; Large Scale; Research Network; Real World Data; World Data; Trial Simulation

Clinical trials are essential but often have high financial costs and long execution times. Trial simulation using real-world data (RWD) could potentially provide insights into a treatment’s efficacy and safety before running a large-scale trial. In this work, we explored the feasibility of using RWD from a large clinical data research network to simulate a randomized controlled trial of Alzheimer’s disease under two scenarios: a one-arm simulation of the standard-of-care control arm, and a two-arm simulation comparing treatment safety between the intervention and control arms with proper patient matching algorithms. We followed the original trial’s design and addressed some key questions, including how to translate trial criteria into database queries and how to establish measures of safety (i.e., serious adverse events) from RWD. Our simulation generated results comparable to those of the original trial, but also exposed gaps in trial simulation methodology as well as issues with the generalizability of clinical trials.

Predicting individual risk for COVID19 complications using EMR data

2020
DOI: 10.1101/2020.06.03.20121574
Cited By ~ 2

Author(s): Yaron Kinar, Alon Lanyado, Avi Shoshan, Rachel Yesharim, Tamar Domany, ...

Keyword(s): High Risk; Real World; Large Scale; Predictive Analytics; Epidemiological Data; Added Value; Individual Risk; Real World Data; MeSH Terms; Age And Sex

Background: The global COVID-19 pandemic has challenged healthcare organizations and caused numerous deaths and hospitalizations worldwide. The need for data-based decision support tools for many aspects of controlling and treating the disease is evident but has been hampered by the scarcity of reliable real-world data. Here we describe two approaches: (a) the use of an existing EMR-based model for predicting complications due to influenza, combined with available epidemiological data, to create a model that identifies individuals at high risk of developing complications due to COVID-19; and (b) a preliminary model trained on existing real-world COVID-19 data.

Methods: We used the computerized data of Maccabi Healthcare Services, a 2.3-million-member state-mandated health organization in Israel. The age- and sex-matched matrix used to train the XGBoost ILI-based model included circa 690,000 rows and 900 features. The available dataset for the COVID-based model included a total of 2137 SARS-CoV-2-positive individuals who were either not hospitalized (n = 1658), hospitalized and marked as mild (n = 332), or marked as having moderate (n = 83) or severe (n = 64) complications.

Findings: We evaluated our models and the priors on the 2137 COVID-19 patients, treating moderate and severe complications as cases and all others as controls. The AUC was 0.852 [0.824–0.879] for the ILI-based model and 0.872 [0.847–0.879] for the COVID19-based model.

Interpretation: These models can effectively identify patients at high risk of complications, thus allowing optimization of resources, more focused follow-up, and early triage of these patients once symptoms worsen.

Funding: There was no funding for this study.

Research in context

Evidence before this study: We searched PubMed for coronavirus[MeSH Major Topic] AND the following MeSH terms: risk score, predictive analytics, algorithm. Only a few studies were found on predictive analytics for developing COVID19 complications using real-world data. Many of the relevant works were based on self-reported information and are therefore difficult to implement at large scale and without patient or physician participation.

Added value of this study: We describe two models for assessing the risk of COVID-19 complications and mortality, based on EMR data. One model was derived by combining a machine-learning model for influenza complications with epidemiological data on age- and sex-dependent mortality rates due to COVID-19. The other was derived directly from initial COVID-19 complications data.

Implications of all the available evidence: The developed models may effectively identify patients at high risk of developing COVID19 complications. Implementing such models in operational data systems may support COVID-19 care workflows and assist in triaging patients.
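
The AUC values reported in the findings can be read as the probability that a randomly chosen case (a patient with moderate or severe complications) receives a higher risk score than a randomly chosen control. A minimal sketch of that rank-based (Mann-Whitney) interpretation follows; the function `pairwise_auc` and the scores are hypothetical illustrations, not the authors' evaluation code or data from the Maccabi cohort.

```python
from typing import Sequence


def pairwise_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as P(score of a random case > score of a random control).

    Every case/control pair is compared; a tie counts as half a win.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model risk scores (higher = predicted higher risk).
cases = [0.9, 0.7, 0.4]           # patients who developed complications
controls = [0.8, 0.3, 0.2, 0.1]   # everyone else
print(pairwise_auc(cases, controls))  # 10 winning pairs out of 12
```

The double loop is quadratic and meant only to make the definition concrete; for data at the scale described above, a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` is the practical choice.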

Real-world longitudinal data collected from the SleepHealth mobile app study

Scientific Data, Vol 7 (1), 2020
DOI: 10.1038/s41597-020-00753-2

Author(s): Sean Deering, Abhishek Pratap, Christine Suver, A. Joseph Borelli, Adam Amdur, ...

Keyword(s): Real World; Large Scale; Digital Health; The United States; Mobile App; Environmental Data; Sleep Habits; Real World Data; Disease Manifestation; Novel Approach

Conducting biomedical research using smartphones is a novel approach to studying health and disease that is only beginning to be meaningfully explored. Gathering large-scale, real-world data in this manner to track disease manifestation and long-term trajectory is practical and largely untapped. Researchers can assess large study cohorts using surveys and sensor-based activities that can be interspersed with participants’ daily routines. In addition, this approach offers a medium for researchers to collect contextual and environmental data via device-based sensors, data aggregator frameworks, and connected wearable devices. The main aim of the SleepHealth Mobile App Study (SHMAS) was to gain a better understanding of the relationship between sleep habits and daytime functioning using a novel digital health approach. Secondary goals included assessing the feasibility of a fully remote approach to obtaining participants’ clinical characteristics, evaluating data validity, and examining user retention patterns and data-sharing preferences. Here, we describe data collected from 7,250 participants living in the United States who chose to share their data broadly with the study team and qualified researchers worldwide.
