scholarly journals Methodological guidelines to estimate population-based health indicators using linked data and/or machine learning techniques

2022 ◽  
Vol 80 (1) ◽  
Author(s):  
Romana Haneef ◽  
Mariken Tijhuis ◽  
Rodolphe Thiébaut ◽  
Ondřej Májek ◽  
Ivan Pristaš ◽  
...  

Abstract Background The capacity to use data linkage and artificial intelligence to estimate and predict health indicators varies across European countries. However, the estimation of health indicators from linked administrative data is challenging due to several reasons such as variability in data sources and data collection methods resulting in reduced interoperability at various levels and timeliness, availability of a large number of variables, lack of skills and capacity to link and analyze big data. The main objective of this study is to develop the methodological guidelines calculating population-based health indicators to guide European countries using linked data and/or machine learning (ML) techniques with new methods. Method We have performed the following step-wise approach systematically to develop the methodological guidelines: i. Scientific literature review, ii. Identification of inspiring examples from European countries, and iii. Developing the checklist of guidelines contents. Results We have developed the methodological guidelines, which provide a systematic approach for studies using linked data and/or ML-techniques to produce population-based health indicators. These guidelines include a detailed checklist of the following items: rationale and objective of the study (i.e., research question), study design, linked data sources, study population/sample size, study outcomes, data preparation, data analysis (i.e., statistical techniques, sensitivity analysis and potential issues during data analysis) and study limitations. Conclusions This is the first study to develop the methodological guidelines for studies focused on population health using linked data and/or machine learning techniques. These guidelines would support researchers to adopt and develop a systematic approach for high-quality research methods. There is a need for high-quality research methodologies using more linked data and ML-techniques to develop a structured cross-disciplinary approach for improving the population health information and thereby the population health.

2021 ◽  
Author(s):  
Romana Haneef ◽  
Mariken Tijhuis ◽  
Rodolphe Thiébaut ◽  
Ondřej Májek ◽  
Ivan Pristaš ◽  
...  

Abstract Background: The capacity to use data linkage and artificial intelligence to estimate and predict health indicators varies across European countries. However, the estimation of health indicators from linked administrative data is challenging due to several reasons such as variability in data sources and data collection methods resulting in reduced interoperability at various levels and timeliness, availability of a large number of variables, lack of skills and capacity to link and analyze big data. The main objective of this study is to develop the methodological guidelines calculating population-based health indicators to guide European countries using linked data and/or machine learning (ML) techniques with new methods. Method: We have performed the following step-wise approach systematically to develop the methodological guidelines: i. Scientific literature review, ii. Identification of inspiring examples from European countries, and iii. Developing the checklist of guidelines contents.Results: We have developed the methodological guidelines, which provide a systematic approach for studies using linked data and/or ML-techniques to produce population-based health indicators. These guidelines include a detailed checklist of the following items: rationale and objective of the study (i.e., research question), study design, linked data sources, study population/sample size, study outcomes, data preparation, data analysis (i.e., statistical techniques, sensitivity analysis and potential issues during data analysis) and study limitations. Conclusions: This is the first study to develop the methodological guidelines for studies focused on population health using linked data and/or machine learning techniques. These guidelines would support researchers to adopt and develop a systematic approach for high-quality research methods. There is a need for high-quality research methodologies using more linked data and ML-techniques to develop a structured cross-disciplinary approach for improving the population health information and thereby the population health.


2020 ◽  
Author(s):  
Francisco Diego Rabelo-da-Ponte ◽  
Jacson Gabriel Feiten ◽  
Benson Mwangi ◽  
Fernando C. Barros ◽  
Fernando C. Wehrmeister ◽  
...  

Author(s):  
Binayak Sen ◽  
Uttam Kumar Mandal ◽  
Sankar Prasad Mondal

Computational approaches like “Black box” predictive modeling approaches are extensively used technique applied in machine learning operations of today. Considering the latest trends, present study compares capabilities of two different “Black box” predictive model like ANFIS and ANN with a population-based evolutionary algorithm GEP for forecasting machining parameters of Inconel 690 material, machined in a CNC-assisted 3-axis milling machine. The aims of this article are to represent considerable data showing, every techniques performance under the criteria of root mean square error (RSME), Correlational coefficient R and Mean absolute percentage error (MAPE). In this chapter, we vigorously demonstrate that the performance of the GEP model is far superior to ANFIS and ANN model.


2017 ◽  
pp. 71-93 ◽  
Author(s):  
I. Goloshchapova ◽  
M. Andreev

The paper proposes a new approach to measure inflation expectations of the Russian population based on text mining of information on the Internet with the help of machine learning techniques. Two indicators were constructed on the base of readers’ comments to inflation news in major Russian economic media available in the web at the period from 2014 through 2016: with the help of words frequency and sentiment analysis of comments content. During the whole considered period of time both indicators were characterized by dynamics adequate to the development of macroeconomic situation and were also able to forecast dynamics of official Bank of Russia indicators of population inflation expectations for approximately one month in advance.


Author(s):  
Axel-Cyrille Ngonga Ngomo ◽  
Mohamed Ahmed Sherif ◽  
Kleanthi Georgala ◽  
Mofeed Mohamed Hassan ◽  
Kevin Dreßler ◽  
...  

AbstractThe Linked Data paradigm builds upon the backbone of distributed knowledge bases connected by typed links. The mere volume of current knowledge bases as well as their sheer number pose two major challenges when aiming to support the computation of links across and within them. The first is that tools for link discovery have to be time-efficient when they compute links. Secondly, these tools have to produce links of high quality to serve the applications built upon Linked Data well. Solutions to the second problem build upon efficient computational approaches developed to solve the first and combine these with dedicated machine learning techniques. The current version of the Limes framework is the product of seven years of research on these two challenges. A series of machine learning techniques and efficient computation approaches were developed and integrated into this framework to address the link discovery problem. The framework combines these diverse algorithms within a generic and extensible architecture. In this article, we give an overview of version 1.7.4 of the open-source release of the framework. In particular, we focus on an overview of the architecture of the framework, an intuition of its inner workings and a brief overview of the approaches it contains. Some descriptions of the applications within which the framework was used complete the paper. Our framework is open-source and available under a GNU license at https://github.com/dice-group/LIMES together with a user manual and a developer manual.


PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0251876
Author(s):  
Ananya Malhotra ◽  
Bernard Rachet ◽  
Audrey Bonaventure ◽  
Stephen P. Pereira ◽  
Laura M. Woods

Background Pancreatic cancer (PC) represents a substantial public health burden. Pancreatic cancer patients have very low survival due to the difficulty of identifying cancers early when the tumour is localised to the site of origin and treatable. Recent progress has been made in identifying biomarkers for PC in the blood and urine, but these cannot be used for population-based screening as this would be prohibitively expensive and potentially harmful. Methods We conducted a case-control study using prospectively-collected electronic health records from primary care individually-linked to cancer registrations. Our cases were comprised of 1,139 patients, aged 15–99 years, diagnosed with pancreatic cancer between January 1, 2005 and June 30, 2009. Each case was age-, sex- and diagnosis time-matched to four non-pancreatic (cancer patient) controls. Disease and prescription codes for the 24 months prior to diagnosis were used to identify 57 individual symptoms. Using a machine learning approach, we trained a logistic regression model on 75% of the data to predict patients who later developed PC and tested the model’s performance on the remaining 25%. Results We were able to identify 41.3% of patients < = 60 years at ‘high risk’ of developing pancreatic cancer up to 20 months prior to diagnosis with 72.5% sensitivity, 59% specificity and, 66% AUC. 43.2% of patients >60 years were similarly identified at 17 months, with 65% sensitivity, 57% specificity and, 61% AUC. We estimate that combining our algorithm with currently available biomarker tests could result in 30 older and 400 younger patients per cancer being identified as ‘potential patients’, and the earlier diagnosis of around 60% of tumours. Conclusion After further work this approach could be applied in the primary care setting and has the potential to be used alongside a non-invasive biomarker test to increase earlier diagnosis. This would result in a greater number of patients surviving this devastating disease.


2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Andrea Bellavia ◽  
Ran S. Rotem ◽  
Aisha S. Dickerson ◽  
Johnni Hansen ◽  
Ole Gredal ◽  
...  

AbstractInvestigating the joint exposure to several risk factors is becoming a key component of epidemiologic studies. Individuals are exposed to multiple factors, often simultaneously, and evaluating patterns of exposures and high-dimension interactions may allow for a better understanding of health risks at the individual level. When jointly evaluating high-dimensional exposures, common statistical methods should be integrated with machine learning techniques that may better account for complex settings. Among these, Logic regression was developed to investigate a large number of binary exposures as they relate to a given outcome. This method may be of interest in several public health settings, yet has never been presented to an epidemiologic audience. In this paper, we review and discuss Logic regression as a potential tool for epidemiological studies, using an example of occupation history (68 binary exposures of primary occupations) and amyotrophic lateral sclerosis in a population-based Danish cohort. Logic regression identifies predictors that are Boolean combinations of the original (binary) exposures, fully operating within the regression framework of interest (e. g. linear, logistic). Combinations of exposures are graphically presented as Logic trees, and techniques for selecting the best Logic model are available and of high importance. While highlighting several advantages of the method, we also discuss specific drawbacks and practical issues that should be considered when using Logic regression in population-based studies. With this paper, we encourage researchers to explore the use of machine learning techniques when evaluating large-dimensional epidemiologic data, as well as advocate the need of further methodological work in the area.


Sign in / Sign up

Export Citation Format

Share Document