Social disparities in the first wave of COVID-19 infections in Germany: A county-scale explainable machine learning approach

Author(s):  
Gabriele Doblhammer ◽  
Constantin Reinke ◽  
Daniel Kreft

Abstract. Background: Little is known about factors correlated with the geographic spread of the first wave of COVID-19 infections in Germany. Given the lack of individual-level socioeconomic information on COVID-19 cases, we resorted to an ecological study design, exploring regional correlates of COVID-19 diagnoses. Data and Method: We used data from the Robert-Koch-Institute on COVID-19 diagnoses by sex, age (age groups: 0-4, 5-14, 15-34, 35-59, 60-79, 80+), and county (NUTS3 region), differentiating five periods (initial phase: through 15 March; 1st lockdown period: 16 March to 31 March; 2nd lockdown period: 1 April to 15 April; easing period: 16 April to 30 April; post-lockdown period: 1 May through 23 July). For each period we calculated the age-standardized incidence of COVID-19 diagnoses at the county level, using the German age distribution from the year 2018. We characterized the regions by macro variables in nine domains: “Demography”, “Employment”, “Politics, religion, and education”, “Income”, “Settlement structure and environment”, “Health care”, “(structural) Poverty”, “Interrelationship with other regions”, and “Geography”. We trained gradient boosting models to predict the age-standardized incidence rates from the macro structures of the counties, and used SHAP values to characterize the 20 most prominent features in terms of negative/positive correlations with the outcome variable. Results: The change in the age-standardized incidence rates over time is reflected in the changing importance of features as indicated by the mean SHAP values for the five periods. The first COVID-19 wave started as a disease in wealthy rural counties in southern Germany, and ventured into poorer urban and agricultural counties during the course of the first wave. The negative social gradient became more pronounced from the 2nd lockdown period onwards, when wealthy counties appeared to be better protected.
Population density per se does not appear to be a risk factor, and only in the post-lockdown period did connectedness become an important regional characteristic correlated with higher infections. Features related to economic and educational characteristics of the young population in a county played an important role at the beginning of the pandemic up to the 2nd lockdown phase, as did features related to the population living in nursing homes; those related to international migration and a large proportion of foreigners living in a county became important in the post-lockdown period. Discussion: In the absence of individual-level data, explainable machine learning methods based on regional data may help to better understand the changing nature of the drivers of the pandemic. High mobility of high SES groups may drive the pandemic at the beginning of waves, while mitigation measures and beliefs about the seriousness of the pandemic as well as the compliance with mitigation measures put lower SES groups at higher risk later on.
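The age-standardized incidence mentioned in the abstract above is usually obtained by direct standardization: each age-specific rate is weighted by the share of that age group in a standard population. The sketch below illustrates the calculation for the six age groups named in the abstract. The function name, the case counts, the county populations, and the weights are all hypothetical and are not the actual German 2018 age distribution.

```python
def age_standardized_rate(cases, population, standard_weights, per=100_000):
    """Directly standardized rate: age-specific rates weighted by a standard population."""
    if not (len(cases) == len(population) == len(standard_weights)):
        raise ValueError("all inputs need one entry per age group")
    total_weight = sum(standard_weights)
    rate = 0.0
    for c, n, w in zip(cases, population, standard_weights):
        # Age-specific rate (c / n) times the normalized standard weight.
        rate += (c / n) * (w / total_weight)
    return rate * per


# Age groups as listed in the abstract; every number below is made up for illustration.
AGE_GROUPS = ["0-4", "5-14", "15-34", "35-59", "60-79", "80+"]
example_cases = [2, 5, 40, 60, 30, 10]
example_population = [10_000, 20_000, 50_000, 70_000, 40_000, 10_000]
example_weights = [0.05, 0.09, 0.23, 0.35, 0.22, 0.06]  # hypothetical standard shares

example_rate = age_standardized_rate(example_cases, example_population, example_weights)
```

Because the weights are normalized inside the function, they can be given either as shares or as raw standard-population counts.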

2021 ◽  
Author(s):  
Zhe Zheng ◽  
Virginia E. Pitzer ◽  
Eugene D. Shapiro ◽  
Louis J. Bont ◽  
Daniel M. Weinberger

Importance: Respiratory syncytial virus (RSV) is a leading cause of hospitalizations in young children. RSV largely disappeared in 2020 due to precautions taken because of the COVID-19 pandemic. Projecting the timing and intensity of the re-emergence of RSV and the age groups affected is crucial for planning for the administration of prophylactic antibodies and anticipating hospital capacity. Objective: To project the potential timing and intensity of re-emergent RSV epidemics in different age groups. Design, Setting, Participants: Mathematical models were used to reproduce the annual RSV epidemics before the COVID-19 pandemic in New York and California. These models were modified to project the trajectory of RSV epidemics in 2020-2025 under different scenarios with varying stringency of mitigation measures for SARS-CoV-2: 1) constant low RSV transmission rate from March 2020 to March 2021; 2) an immediate decrease in RSV transmission in March 2020 followed by a gradual increase in transmission until April 2021; 3) a decrease in non-household contacts from April to July 2020. Simulations also evaluated factors likely to impact the re-emergence of RSV epidemics, including introduction of virus from out-of-state sources and decreased transplacentally acquired immunity in infants. Main Outcomes and Measures: The primary outcome of this study was defined as the predicted number of RSV hospitalizations each month in the entire population. Secondary outcomes included the age distribution of hospitalizations among children <5 years of age, incidence of any RSV infection, and incidence of RSV lower respiratory tract infection (LRI). Results: In the 2021-2022 RSV season, we expect that the lifting of mitigation measures and build-up of susceptibility will lead to a larger-than-normal RSV outbreak. We predict an earlier-than-usual onset in the upcoming RSV season if there is substantial external introduction of RSV. 
Among children 1-4 years of age, the incidence of RSV infections could be twice that of a typical RSV season, with infants <6 months of age having the greatest seasonal increase in the incidence of both severe RSV LRIs and hospitalizations. Conclusions and Relevance: Pediatric departments, including pediatric intensive care units, should be alert to large RSV outbreaks. Enhanced surveillance is required for both prophylaxis administration and hospital capacity management.


2021 ◽  
Vol 4 ◽  
Author(s):  
Fiona Leonard ◽  
John Gilligan ◽  
Michael J. Barrett

Introduction: Patients boarding in the Emergency Department can contribute to overcrowding, leading to longer waiting times and patients leaving without being seen or completing their treatment. The early identification of potential admissions could act as an additional decision support tool to alert clinicians that a patient needs to be reviewed for admission and would also be of benefit to bed managers in advance bed planning for the patient. We aim to create a low-dimensional model predicting admissions early from the paediatric Emergency Department. Methods and Analysis: The Cross Industry Standard Process for Data Mining (CRISP-DM) methodology will be followed. The dataset will comprise 2 years of data, ~76,000 records. Potential predictors were identified from previous research, comprising demographics, registration details, triage assessment, hospital usage and past medical history. Fifteen models will be developed, combining 3 machine learning algorithms (logistic regression, naïve Bayes and gradient boosting machine) with 5 sampling methods, 4 of which are aimed at addressing class imbalance (undersampling, oversampling, and synthetic oversampling techniques). The variables of importance will then be identified from the optimal model (selected based on the highest area under the curve) and used to develop an additional low-dimensional model for deployment. Discussion: A low-dimensional model composed of routinely collected data, captured up to post-triage assessment, would benefit many hospitals without data-rich platforms for the development of models with a high number of predictors. Novel to the planned study is the use of data from the Republic of Ireland and the application of sampling techniques aimed at improving model performance impacted by an imbalance between admissions and discharges in the outcome variable.
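To make the sampling idea in the protocol above concrete, here is a minimal sketch of two of the simplest rebalancing strategies, random undersampling and random oversampling, written with the Python standard library only. The function names and the toy data are hypothetical; the protocol does not say which implementations will be used, and synthetic oversampling is not shown.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect the rows belonging to each class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Drop random rows from larger classes until every class matches the smallest."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows from smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: 10 admissions (label 1) among 100 visits; purely illustrative.
toy_rows = list(range(100))
toy_labels = [1] * 10 + [0] * 90
_, under_labels = random_undersample(toy_rows, toy_labels)
_, over_labels = random_oversample(toy_rows, toy_labels)
class_counts = (Counter(under_labels), Counter(over_labels))
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keep the original class balance.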


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e14069-e14069
Author(s):  
Oguz Akbilgic ◽  
Ibrahim Karabayir ◽  
Hakan Gunturkun ◽  
Joseph F Pierre ◽  
Ashley C Rashe ◽  
...  

e14069 Background: There is growing interest in the links between cancer and the gut microbiome. However, the effect of chemotherapy upon the gut microbiome remains unknown. We studied whether machine learning can: 1) accurately classify subjects with cancer vs healthy controls and 2) whether this classification model is affected by chemotherapy exposure status. Methods: We used the American Gut Project data to build an extreme gradient boosting (XGBoost) model to distinguish between subjects with cancer vs healthy controls using data on simple demographics and published microbiome. We then further explored the selected features for cancer subjects based on chemotherapy exposure. Results: The cohort included 7,685 subjects consisting of 561 subjects with cancer, 52.5% female, 87.3% White, and average age of 44.7 (SD 17.7). The binary outcome variable represents cancer status. Among 561 subjects with cancer, 94 were treated with chemotherapy agents before sampling of microbiomes. As predictors, there were four demographic variables (sex, race, age, BMI) and 1,812 operational taxonomic units (OTUs) each found in at least 2 subjects via RNA sequencing. We randomly split the data into 80% training and 20% hidden test. We then built an XGBoost model with 5-fold cross-validation using only training data, yielding an AUC (with 95% CI) of 0.79 (0.77, 0.80), and obtained almost the same AUC on the hidden test data. Based on feature importance analysis, we identified 12 most important features (Age, BMI and 12 OTUs; 4C0d-2, Brachyspirae, Methanosphaera, Geodermatophilaceae, Bifidobacteriaceae, Slackia, Staphylococcus, Acidaminococcus, Devosia, Proteus) and rebuilt a model using only these features and obtained AUC of 0.80 (0.77, 0.83) on the hidden test data. The average predicted probabilities for controls, cancer patients who were exposed to chemotherapy, and cancer patients who were not were 0.071 (0.070, 0.073), 0.125 (0.110, 0.140), 0.156 (0.148, 0.164), respectively.
There was no statistically significant difference in the levels of these 12 OTUs between cancer subjects treated with and without chemotherapy. Conclusions: Machine learning achieved moderately high accuracy in identifying patients’ cancer status based on the microbiome. Despite the literature on microbiome and chemotherapy interaction, the levels of the 12 OTUs used in our model were not significantly different for cancer patients with or without chemotherapy exposure. Testing this model on other large population databases is needed for broader validation.
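The AUC values reported above can be read as the probability that a randomly chosen cancer subject receives a higher predicted score than a randomly chosen control, with ties counted as half. The sketch below computes that quantity directly by comparing every positive-negative pair. It is an illustration of the metric only, not the authors' evaluation code, and the function name is hypothetical.

```python
def pairwise_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy scores (hypothetical): three of the four positive-negative pairs are ordered correctly.
toy_auc = pairwise_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; rank-based formulas give the same value more efficiently on large test sets.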


2001 ◽  
Vol 127 (3) ◽  
pp. 501-507 ◽  
Author(s):  
M. P. MUÑOZ ◽  
A. DOMÍNGUEZ ◽  
L. SALLERAS

Varicella is a disease caused by varicella-zoster virus. It is transmitted via the respiratory route, is highly communicable and mainly affects young children. An effective vaccine is now available, whose routine use is advised by health authorities in the USA and which can prevent severe disease, although breakthrough infections do occur. In deciding whether or not to include a vaccine in the routine vaccination schedule, knowledge of the morbidity of the disease in question is fundamental. Although reporting of varicella is compulsory in Catalonia, doctors only have to report the weekly number of cases diagnosed, and not their age distribution. Given that recent data on the prevalence of the infection in Catalonia according to age groups are available, it was considered that, using these data, an estimation of age-related incidence could be made. The objective of the present study was to estimate the incidence of varicella in Catalonia on the basis of the available seroprevalence data. A curve was fitted to the observed prevalence and point prevalence estimates for all ages were obtained. The incidence was derived from the smoothed prevalence for each of these age groups. The estimated variance of the estimated incidence was obtained by the delta method. Predicted prevalence in the 0–4 years age group was calculated from the smoothed prevalence. The model that best fitted the sample prevalence was the exponential function. The estimated number of varicella cases in this study was 46419 (95% CI 40507–52270). As the population of Catalonia in 1996 was 6090040, these results give an incidence rate of 762·2 per 100000 persons/year (95% CI 666·1–858·3). The method described may be applied to the study of incidence rates in relation to the prevalence of diseases if we accept that the infection produces permanent immunity, that the risk of mortality is the same for infected and non-infected subjects, and that the disease incidence and population remain constant in time.
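The incidence rate quoted above follows from dividing the estimated number of cases by the 1996 population and scaling to 100000 persons. A short check of that arithmetic, using only the two figures given in the abstract (the function name is hypothetical):

```python
def incidence_per_100k(cases, population):
    """Annual incidence rate per 100,000 persons."""
    return cases / population * 100_000


# Figures from the abstract: 46419 estimated cases, population 6090040.
varicella_rate = incidence_per_100k(46_419, 6_090_040)
rounded_rate = round(varicella_rate, 1)
```

Rounded to one decimal place, this reproduces the reported point estimate of 762·2 per 100000 persons/year.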


2019 ◽  
Vol 40 (Supplement_1) ◽  
Author(s):  
L J Kjerpeseth ◽  
J Igland ◽  
R Selmer ◽  
H Ellekjaer ◽  
T Berge ◽  
...  

Abstract. Background: The reported incidence and prevalence of atrial fibrillation (AF) has been inconsistent among studies. Purpose: We aimed to study time trends in incidence (first time) of AF hospitalizations or AF deaths in Norway in the period 2004–2014 by age and sex. Methods: Nationwide hospital discharge diagnoses in the Cardiovascular Disease in Norway (CVDNOR) database and in the National Patient Registry were linked to the National Cause of Death Registry. All hospitalizations with AF as primary or secondary diagnosis and out-of-hospital deaths with AF as underlying cause (ICD-9: 427.3 or ICD-10: I48; AF or atrial flutter) in individuals ≥18 years were obtained during 1994–2014. Incident AF was defined as first hospitalization or out-of-hospital death due to AF with no previous hospitalization for AF in the past 10 years. Age-standardized incidence rates with 95% confidence intervals (CIs) were calculated using direct standardization to the age distribution in the Norwegian population per Jan 1st 2004. Age-adjusted average yearly incidence rate ratios (IRR) with 95% CIs were estimated by Poisson regression analyses. Accumulated prevalence during 1994–2014 was assessed in Norwegian residents 18 years and older per Dec 31st 2014. Results: During 39,865,498 person years of follow-up from 2004 to 2014 we identified 175,979 incident AF cases, of which 30% were registered with AF as primary diagnosis, 69% as secondary diagnosis and 1% as out-of-hospital cause of death. The age-standardized incidence rate of AF hospitalization or out-of-hospital death per 100,000 person years was stable at 433 (426–440) in 2004 and 440 (433–447) in 2014. IRR were stable or declining across age groups of both sexes, except for the youngest age group 18–44 years, where incidence rates of AF hospitalization or out-of-hospital death increased by 2% per year, IRR 1.02 (1.01, 1.03).
By 2014, the prevalence of AF assessed from hospital or death records was 2.9% in the adult population 18 years and older. Conclusion: We found overall stable incidence rates of AF from 2004 to 2014 in the adult Norwegian population. Increased incidence rates of AF in the population 18–44 years are worrying and need further investigation. Acknowledgement/Funding: The Norwegian Atrial Fibrillation Research Network
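A yearly incidence rate ratio compounds over time, so the IRR of 1.02 reported for the 18–44 age group implies a larger cumulative change over the study period. The sketch below works this out under the assumption that the same ratio applies to each of the ten yearly steps from 2004 to 2014. That assumption and the function name are illustrative and are not taken from the study.

```python
def cumulative_rate_ratio(yearly_irr, years):
    """Rate ratio accumulated over several years, assuming a constant yearly IRR."""
    return yearly_irr ** years


# IRR 1.02 per year from the abstract; ten yearly steps assumed for 2004 to 2014.
ten_year_ratio = cumulative_rate_ratio(1.02, 10)
percent_increase = (ten_year_ratio - 1.0) * 100.0
```

Under that assumption the incidence rate in this age group would be roughly 22% higher in 2014 than in 2004.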


2019 ◽  
Vol 2 (1) ◽  
pp. 27-40 ◽  
Author(s):  
Md. Ali Hossain ◽  
Tania Akter Asa ◽  
Fazlul Huq ◽  
Mohammad Ali Moni

The incidence and treatment of common eye disorders in Bangladesh are poorly understood. This study aims to provide a comprehensive overview of this clinical challenge to better enable the design of appropriate healthcare strategies. Data on different types of eye disorders were collected from patients aged 1 to 96 years admitted for eye surgery from March 2016 to October 2016 (N = 2390) at the Bangladesh Eye Hospital in Dhaka, Bangladesh. Patient age distribution and types of treatment received were analysed, and incidence rates calculated. Patients (58% male) underwent a total of 43 different types of eye surgeries. Among the surgeries reported, 32.8% were Avastin intravitreal injections, 25.5% were Phaco with IOL, 14.6% were retinal laser therapies, 7.5% were YAG Laser and 6.5% were VR surgery. Notably, among all the eye disorders, ocular, cataract and retinal disorders affected the largest numbers of people, in that order. The number of eye disorder treatments increased with patient age and peaked in the age group 56-60 years, although numbers varied greatly across age groups.


Stats ◽  
2019 ◽  
Vol 2 (3) ◽  
pp. 347-370
Author(s):  
Emad Mohamed ◽  
Sayed A. Mostafa

In this paper, we use a corpus of about 100,000 happy moments written by people of different genders, marital statuses, parenthood statuses, and ages to explore the following questions: Are there differences between men and women, married and unmarried individuals, parents and non-parents, and people of different age groups in terms of their causes of happiness and how they express happiness? Can gender, marital status, parenthood status and/or age be predicted from textual data expressing happiness? The first question is tackled in two steps: first, we transform the happy moments into a set of topics, lemmas, part of speech sequences, and dependency relations; then, we use each set as predictors in multi-variable binary and multinomial logistic regressions to rank these predictors in terms of their influence on each outcome variable (gender, marital status, parenthood status and age). For the prediction task, we use character, lexical, grammatical, semantic, and syntactic features in a machine learning document classification approach. The classification algorithms used include logistic regression, gradient boosting, and fastText. Our results show that textual data expressing moments of happiness can be quite beneficial in understanding the “causes of happiness” for different social groups, and that social characteristics like gender, marital status, parenthood status, and, to some extent, age can be successfully predicted from such textual data. This research aims to bring together elements from philosophy and psychology to be examined by computational corpus linguistics methods in a way that promotes the use of Natural Language Processing for the Humanities.


Diagnostics ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. 102
Author(s):  
Kyoung Hwa Lee ◽  
Jae June Dong ◽  
Subin Kim ◽  
Dayeong Kim ◽  
Jong Hoon Hyun ◽  
...  

Early detection of bacteremia is important to prevent antibiotic abuse. Therefore, we aimed to develop a clinically applicable bacteremia prediction model using machine learning technology. Data from two tertiary medical centers’ electronic medical records during a 12-year period were extracted. Multi-layer perceptron (MLP), random forest, and gradient boosting algorithms were applied for machine learning analysis. Clinical data within 12 and 24 hours of blood culture were analyzed and compared. Out of 622,771 blood cultures, 38,752 episodes of bacteremia were identified. In MLP with 128 hidden layer nodes, the area under the receiver operating characteristic curve (AUROC) of the prediction performance in 12- and 24-h data models was 0.762 (95% confidence interval (CI); 0.7617–0.7623) and 0.753 (95% CI; 0.7520–0.7529), respectively. AUROC of causative-pathogen subgroup analysis predictive value for Acinetobacter baumannii bacteremia was the highest at 0.839 (95% CI; 0.8388–0.8394). Compared to primary bacteremia, AUROC of sepsis caused by pneumonia was highest. Predictive performance of bacteremia was superior in younger age groups. Bacteremia prediction using machine learning technology appeared possible for acute infectious diseases. This model was especially suitable for pneumonia caused by Acinetobacter baumannii. From the 24-h blood culture data, bacteremia was predictable by substituting only the continuously variable values.


Author(s):  
Mario Santana-Cibrian ◽  
Manuel A. Acuna-Zegarra ◽  
Jorge X. Velasco-Hernandez

SARS-CoV-2 has now infected 15 million people and produced more than six hundred thousand deaths around the world. Due to high transmission levels, many governments implemented social-distancing measures and confinement with different levels of required compliance to mitigate the COVID-19 epidemic. In several countries, these measures were effective, and it was possible to flatten the epidemic curve and control it. In others, this objective was not or has not been achieved. In far too many cities around the world rebounds of the epidemic are occurring or, in others, plateau-like states have appeared where high incidence rates remain constant for relatively long periods of time. Nonetheless, faced with the urgent social need to reactivate their economies, many countries have decided to lift mitigation measures at times of high incidence. In this paper, we use a mathematical model to characterize the impact of short-duration transmission events within the confinement period before but close to the epidemic peak. The model also describes the possible consequences for the disease dynamics after mitigation measures are lifted. We use Mexico City as a case study. The results show that events of high mobility may produce either a later, higher peak, a long plateau with relatively constant but high incidence, or the same peak as in the original baseline epidemic curve but with a post-peak interval of slower decay. Finally, we also show the importance of carefully timing the lifting of mitigation measures. If this occurs during a period of high incidence, then disease transmission will rapidly increase unless the effective contact rate keeps decreasing, which will be very difficult to achieve once the population is released.
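The abstract above does not specify its model, so the following is only a generic illustration of the idea of a short high-mobility event: a basic SIR model integrated with Euler steps, in which the contact rate is raised for a few days and then returns to baseline. The compartment structure, the function names, and every parameter value are assumptions chosen for illustration and are not the authors' model or estimates for Mexico City.

```python
def simulate_sir(beta_of_day, gamma, s0, i0, days, steps_per_day=10):
    """Euler integration of a basic SIR model with a day-dependent contact rate."""
    s, i, r = float(s0), float(i0), 0.0
    n = s + i
    dt = 1.0 / steps_per_day
    infectious_by_day = []
    for day in range(days):
        infectious_by_day.append(i)  # state at the start of each day
        beta = beta_of_day(day)
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
    return infectious_by_day, (s, i, r)


def baseline_beta(day):
    """Hypothetical constant contact rate under confinement."""
    return 0.3


def pulse_beta(day):
    """Same baseline, doubled during a hypothetical five-day high-mobility event."""
    return 0.6 if 20 <= day < 25 else 0.3


# Hypothetical population of one million with ten initial infections, 30 days.
_, base_state = simulate_sir(baseline_beta, 0.1, 999_990, 10, 30)
_, pulse_state = simulate_sir(pulse_beta, 0.1, 999_990, 10, 30)
extra_infections = base_state[0] - pulse_state[0]  # susceptibles lost to the pulse
```

With these made-up values, the five-day pulse leaves fewer susceptibles at day 30 than the baseline run, which is the qualitative effect of a short transmission event; the longer-run outcomes described in the abstract depend on the authors' own model and data.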

