Show Me Your Claims and I'll Tell You Your Offenses: Machine Learning-Based Decision Support for Fraud Detection on Medical Claim Data

Objective: We describe the characteristics of patients with high medical costs by matching specific annual medical examination results and medical claim data. Clarifying the relationships between examination items and high medical costs allows the screening of high-risk persons. Design: A cross-sectional study. Subjects: Subjects were persons insured by national health insurance in Hiroshima City, Hiroshima Prefecture, from April 2016 to March 2017. To identify true heart failure (HF) patients, the disease name listed in the medical claim data was compared with drugs prescribed for HF, with extraction of only subjects whose comparative data matched. Data collection and analysis: The specific health examination includes a questionnaire on areas such as lifestyle habits, anthropometry, blood pressure, blood tests and urine tests. The percentage of the total medical costs related to the medical care of subjects with HF was described using Pareto analysis. For specific health examination items, we compared the high-cost and low-cost groups. The normality and homoscedasticity of each variable were checked and Student’s t-tests and χ² tests were applied. Finally, multiple logistic regression analysis was used to detect factors in the health examination items related to high medical costs. Results: Pareto analysis showed that 80% of all medical costs were paid by 30% of the HF patient population. The fees for cardiovascular surgery accounted for 54% of the total surgical cost, 64% of which included preventable diseases. Levels of creatinine (Cr) and γ-glutamyl transpeptidase (γ-GTP) and a history of smoking were found to be related to high medical costs. Conclusion: Analysis of specific health examination results for HF patients revealed the association between high medical costs, γ-GTP, Cr, and smoking. These results can thus serve as a reference for screening persons at high risk of HF and help prevent the exacerbation of HF.
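The Pareto figure in this abstract (a given share of total cost attributed to a fraction of patients) can be illustrated with a minimal sketch. The function name `pareto_patient_share` and the cost values are hypothetical and are not taken from the study; the sketch only shows one common way to compute such a share from per-patient costs.

```python
def pareto_patient_share(costs, cost_share=0.8):
    """Return the smallest fraction of patients whose costs, taken from
    the most expensive downward, reach `cost_share` of the total cost."""
    if not costs:
        raise ValueError("costs must not be empty")
    total = sum(costs)
    if total <= 0:
        raise ValueError("total cost must be positive")

    running = 0.0
    # Rank patients from highest to lowest cost and accumulate.
    for rank, cost in enumerate(sorted(costs, reverse=True), start=1):
        running += cost
        if running >= cost_share * total:
            return rank / len(costs)
    return 1.0


# Hypothetical per-patient annual costs (illustrative only).
example_costs = [60, 30, 4, 3, 3]
share = pareto_patient_share(example_costs, cost_share=0.8)
print(f"{share:.0%} of patients account for at least 80% of costs")
```

In this toy data the two most expensive patients carry 90 of 100 cost units, so the 80% threshold is reached by two of the five patients.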

PCN157 METHODOLOGIC ISSUES OF IDENTIFYING FEBRILE NEUTROPENIA PATIENTS USING MEDICAL CLAIM DATA

Value in Health ◽

10.1016/s1098-3015(10)72245-2 ◽

2010 ◽

Vol 13 (3) ◽

pp. A53-A54

Author(s):

SL Michels ◽

R Barron

Keyword(s):

Febrile Neutropenia ◽

Claim Data ◽

Medical Claim ◽

Medical Claim Data

MIHARI project, a preceding study of MID-NET, adverse event detection database of Ministry Health of Japan—Validation study of the signal detection of adverse events of drugs using export data from EMR and medical claim data

PLoS ONE ◽

10.1371/journal.pone.0255863 ◽

2021 ◽

Vol 16 (9) ◽

pp. e0255863

Author(s):

Hiroshi Watanabe ◽

Kiyoteru Takenouchi ◽

Michio Kimura

Keyword(s):

Renal Failure ◽

Acute Renal Failure ◽

Adverse Events ◽

Medical Records ◽

Claim Data ◽

Prescription Data ◽

Laboratory Examination ◽

Medical Claim ◽

Predictive Values ◽

Medical Claim Data

We studied the effectiveness of direct data collection from electronic medical records (EMR) for monitoring adverse drug events and for detecting already known adverse events. In this study, medical claim data and SS-MIX2 standardized storage data were used to identify four diseases (diabetes, dyslipidemia, hyperthyroidism, and acute renal failure) and the validity of the outcome definitions was evaluated by calculating positive predictive values (PPV). The maximum PPV for diabetes based on medical claim data was 40.7% and that based on prescription data from SS-MIX2 Standardized Storage was 44.7%. The PPV for dyslipidemia was 50% or higher under either of the conditions. The PPV for hyperthyroidism based on disease name data alone was 20–30%, but exceeded 60% when prescription data was included in the evaluation. Acute renal failure was evaluated using information from medical records in addition to the data. The PPV for acute renal failure based on the data of disease names and laboratory examination results was slightly higher at 53.7% and increased to 80–90% when patients who previously had a high serum creatinine (Cre) level were excluded. When defining a disease, it is important to include the condition specific to the disease; furthermore, it is very useful if laboratory examination results are also included. Therefore, the inclusion of laboratory examination results in the definitions, as in the present study, was considered very useful for the analysis of multi-center SS-MIX2 standardized storage data.
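For readers unfamiliar with the metric, the positive predictive value of an outcome definition is the fraction of flagged cases that are confirmed as true cases. The sketch below assumes a simple two-count input; the function name and the counts are hypothetical and do not come from the study.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = confirmed cases / all cases flagged by the outcome definition."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no cases were flagged, PPV is undefined")
    return true_positives / flagged


# Hypothetical chart-review result: 30 confirmed out of 50 flagged patients.
ppv = positive_predictive_value(true_positives=30, false_positives=20)
print(f"PPV = {ppv:.1%}")
```

Tightening a definition (for example, requiring a matching prescription or laboratory result) typically removes more false positives than true positives, which is why the PPV values reported above rise when additional conditions are added.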

A Fraud Detection Decision Support System via Human On-Line Behavior Characterization and Machine Learning

2018 First International Conference on Artificial Intelligence for Industries (AI4I) ◽

10.1109/ai4i.2018.8665694 ◽

2018 ◽

Author(s):

Gian Antonio Susto ◽

Matteo Terzi ◽

Chiara Masiero ◽

Simone Pampuri ◽

Andrea Schirru

Keyword(s):

Machine Learning ◽

Decision Support ◽

Decision Support System ◽

Support System ◽

Fraud Detection ◽

On Line

Multimorbidity among Two Million Adults in China

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17103395 ◽

2020 ◽

Vol 17 (10) ◽

pp. 3395 ◽

Cited By ~ 2

Author(s):

Xiaowen Wang ◽

Shanshan Yao ◽

Mengying Wang ◽

Guiying Cao ◽

Zishuo Chen ◽

...

Keyword(s):

Association Rule ◽

Hierarchical Cluster ◽

Claim Data ◽

Middle Aged ◽

Rule Mining ◽

Medical Claim ◽

Medical Claim Data ◽

Prevalent Disease ◽

Disease Pair ◽

Older Chinese

This study explored the prevalence and patterns of multimorbidity among middle-aged and older adults in China. Data on thirteen chronic diseases were collected from 2,097,150 participants aged over 45 years between January 1st 2011 and December 31st 2015 from Beijing Medical Claim Data for Employees. Association rule mining and hierarchical cluster analysis were applied to assess multimorbidity patterns. Multimorbidity prevalence was 51.6% and 81.3% in the middle-aged and older groups, respectively. The most prevalent disease pair was that of osteoarthritis and rheumatoid arthritis (OARA) with hypertension (HT) (middle-aged: 22.5%; older: 41.8%). Ischaemic heart disease (IHD), HT, and OARA constituted the most common triad combination (middle-aged: 11.0%; older: 31.2%). Among the middle-aged group, the strongest associations were found in a combination of cerebrovascular disease (CBD), OARA, and HT with IHD in males (lift = 3.49), and CBD, OARA, and COPD with IHD in females (lift = 3.24). Among older patients, glaucoma and cataracts in females (lift = 2.95), and IHD, OARA, and glaucoma combined with cataracts in males (lift = 2.45) were observed. Visual impairment clusters, a mixed cluster of OARA, IHD, COPD, and cardiometabolic clusters were detected. Multimorbidity is prevalent among middle-aged and older Chinese individuals. The observations of multimorbidity patterns have implications for improving preventive care and developing appropriate guidelines for morbidity treatment.
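The lift values quoted in this abstract come from association rule mining. As a minimal sketch, lift for a rule "antecedent → consequent" is the observed co-occurrence rate divided by the rate expected if the two were independent. The function name `lift` and the toy patient records are hypothetical; they only reuse the abbreviations HT, IHD, and OARA as labels.

```python
def lift(records, antecedent, consequent):
    """lift(A -> C) = P(A and C) / (P(A) * P(C)) over a list of disease sets."""
    if not records:
        raise ValueError("records must not be empty")
    a, c = set(antecedent), set(consequent)
    n = len(records)

    # Support: fraction of records that contain every item of the set.
    support_a = sum(a <= r for r in records) / n
    support_c = sum(c <= r for r in records) / n
    support_ac = sum((a | c) <= r for r in records) / n

    if support_a == 0 or support_c == 0:
        raise ValueError("antecedent or consequent never occurs")
    return support_ac / (support_a * support_c)


# Hypothetical records: one set of diagnosed conditions per patient.
patients = [{"HT", "IHD"}, {"HT", "IHD"}, {"HT"}, {"OARA"}]
print(round(lift(patients, {"HT"}, {"IHD"}), 3))
```

A lift above 1 means the conditions co-occur more often than independence would predict; a lift of exactly 1 means no association.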

Healthcare Costs Associated with Complications in Patients with Type 2 Diabetes among 1.85 Million Adults in Beijing, China

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18073693 ◽

2021 ◽

Vol 18 (7) ◽

pp. 3693

Author(s):

Jun-Hui Wu ◽

Yao Wu ◽

Zi-Jing Wang ◽

Yi-Qun Wu ◽

Tao Wu ◽

...

Keyword(s):

Type 2 Diabetes ◽

Healthcare Costs ◽

Estimating Equation ◽

Claim Data ◽

Equation Model ◽

First Year ◽

Diabetic Patients ◽

Medical Claim ◽

Medical Claim Data

We aimed to provide reliable regression estimates of expenditures associated with various complications in type 2 diabetics in China. In total, 1,859,039 type 2 diabetes patients with complications were obtained from the Beijing Medical Claim Data for Employees database from 2008 to 2016. We estimated costs for complications using a generalized estimating equation model adjusted for age, sex, and the incidence of various complications. The average total cost for diabetic patients with complications was 17.12 thousand RMB. Prescribed drugs accounted for 63.4% of costs. We observed a significant increase in costs in the first year after the onset of complications. Compared with costs before the incidence of complications, the additional costs per person in the first year and >1 year after the event would be 10,631.16 RMB and 1,150.71 RMB for cardiovascular disease, 1,017.62 RMB and 653.82 RMB for cerebrovascular disease, and 301.14 RMB and 624.00 RMB for kidney disease, respectively. The estimated coefficients for outpatient visits were relatively lower than those of inpatient visits. Complications in diabetics exert a significant impact on total healthcare costs in the first year of their onset and in subsequent years. Our estimates may assist policymakers in quantifying the economic burden of diabetes complications.

Medical Concept Representation Learning from Multi-source Data

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/680 ◽

2019 ◽

Author(s):

Tian Bai ◽

Brian L. Egleston ◽

Richard Bleicher ◽

Slobodan Vucetic

Keyword(s):

Language Processing ◽

Representation Learning ◽

Claim Data ◽

Medical Provider ◽

Medical Claim ◽

Medical Claims ◽

Medical Claim Data ◽

Reference Problem ◽

Low Dimensional ◽

Vector Representations

Representing words as low dimensional vectors is very useful in many natural language processing tasks. This idea has been extended to the medical domain, where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on the type of medical provider, medical claims can use medical codes from different ontologies or from a combination of ontologies, which complicates learning of the representations. To be able to properly utilize such multi-source medical claim data, we propose an approach that represents medical codes from different ontologies in the same vector space. We first modify the Pointwise Mutual Information (PMI) measure of similarity between the codes. We then develop a new negative sampling method for the word2vec model that implicitly factorizes the modified PMI matrix. The new approach was evaluated on the code cross-reference problem, which aims at identifying similar codes across different ontologies. In our experiments, we evaluated cross-referencing between ICD-9 and CPT medical code ontologies. Our results indicate that vector representations of codes learned by the proposed approach provide superior cross-referencing when compared to several existing approaches.
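As background for the PMI measure mentioned in this abstract, the sketch below computes the standard (unmodified) pointwise mutual information between codes that co-occur on the same claim. It does not reproduce the authors' modified PMI or their negative sampling method. The function name `pmi_table` and the code labels such as `dx_A` and `px_X` are hypothetical placeholders, not real ICD-9 or CPT codes.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(claims):
    """Standard PMI for every pair of codes that co-occur on a claim:
    PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ), with probabilities
    estimated as the fraction of claims containing the code(s)."""
    if not claims:
        raise ValueError("claims must not be empty")
    n = len(claims)
    code_counts = Counter()
    pair_counts = Counter()
    for claim in claims:
        codes = sorted(set(claim))  # de-duplicate, fix pair ordering
        code_counts.update(codes)
        pair_counts.update(combinations(codes, 2))

    return {
        (x, y): math.log((joint / n) / ((code_counts[x] / n) * (code_counts[y] / n)))
        for (x, y), joint in pair_counts.items()
    }


# Hypothetical claims mixing diagnosis-like and procedure-like codes.
toy_claims = [
    ["dx_A", "px_X"],
    ["dx_A", "px_X"],
    ["dx_B", "px_Y"],
    ["dx_B", "px_X"],
]
for pair, value in sorted(pmi_table(toy_claims).items()):
    print(pair, round(value, 3))
```

Positive PMI indicates that two codes appear together more often than chance would suggest, which is the kind of signal a cross-referencing method between ontologies can exploit.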

Dynamic prediction of hospital admission with medical claim data

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-019-0734-y ◽

2019 ◽

Vol 19 (S1) ◽

Cited By ~ 2

Author(s):

Tianzhong Yang ◽

Yang Yang ◽

Yugang Jia ◽

Xiao Li

Keyword(s):

Hospital Admission ◽

Claim Data ◽

Medical Claim ◽

Dynamic Prediction ◽

Medical Claim Data

Review of classification studies for machine learning in the development of intelligent management decision support systems

Technology of technosphere safety ◽

10.25257/tts.2020.3.89.20-29 ◽

2020 ◽

Vol 89 ◽

pp. 20-29

Author(s):

Sh. K. Kadiev ◽

R. Sh. Khabibulin ◽

P. P. Godlevskiy ◽

V. L. Semikov ◽

...

Keyword(s):

Machine Learning ◽

Decision Support ◽

Mathematical Models ◽

Decision Support Systems ◽

Support Systems ◽

Management Decision ◽

Classification Methods ◽

Advantages And Disadvantages ◽

Intelligent Management ◽

Management Decision Support

Introduction. An overview of research in the field of classification as a method of machine learning is given. Articles containing mathematical models and algorithms for classification were selected. The use of classification in intelligent management decision support systems in various subject areas is also relevant. Goal and objectives. The purpose of the study is to analyze papers on classification as a machine learning method. To achieve this goal, the following tasks must be solved: 1) to identify the most used classification methods in machine learning; 2) to highlight the advantages and disadvantages of each of the selected methods; 3) to analyze the possibility of using classification methods in intelligent management decision support systems to address forecasting, prevention and elimination of emergencies. Methods. To obtain the results, general scientific and special methods of scientific inquiry were used: analysis, synthesis, generalization, and the classification method. Results and discussion. According to the results of the analysis, studies with a mathematical formulation and available software implementations were identified. The issues of classification in the implementation of machine learning in the development of intelligent decision support systems are considered. Conclusion. The analysis revealed that a sufficient number of algorithms were used to perform classification while sorting the acquired knowledge within the subject area. The implementation of accurate classification is one of the fundamental problems in the development of management decision support systems, including those for fire and emergency prevention and response. Timely and effective decision-making by operational shift officials in disaster management is also relevant. Key words: decision support, analysis, classification, machine learning, algorithm, mathematical models.
