Use of artificial intelligence to estimate population health indicators in France

2020 ◽  
Vol 30 (Supplement_5) ◽  
Author(s):  
R Haneef ◽  
S Fuentes ◽  
R Hrzic ◽  
S Fosse-Edorh ◽  
S Kab ◽  
...  

Abstract Background: The use of artificial intelligence to estimate and predict health outcomes from large data sets is increasing. The main objectives were to develop two algorithms using machine learning techniques: one to identify new cases of diabetes (case study I) and one to classify diabetes as type 1 or type 2 (case study II) in France. Methods: We selected the training data set from a cohort study linked with the French National Health Database (i.e., SNDS). Two final data sets were used, one for each objective. A supervised machine learning method comprising the following eight steps was developed: selection of the data set, case definition, coding and standardization of variables, splitting of the data into training and test data sets, variable selection, training, validation, and selection of the model. We planned to apply the trained models to the SNDS to estimate the incidence of diabetes and the prevalence of type 1/2 diabetes. Results: For case study I, 23/3468 and for case study II, 14/3481 SNDS variables were selected based on an optimal balance between variance explained and using the ReliefExp algorithm. We trained four models using different classification algorithms on the training data set. The Linear Discriminant Analysis model performed best in both case studies. The models were assessed on the test data sets and achieved a specificity of 67% and a sensitivity of 62% in case study I, and a specificity of 97% and a sensitivity of 100% in case study II. The case study II model was applied to the SNDS and estimated the 2016 prevalence in France at 0.3% for type 1 diabetes and 4.4% for type 2. The case study I model was not applied to the SNDS. Conclusions: The case study II model to estimate the prevalence of type 1/2 diabetes performs well and will be used in routine surveillance.
The case study I model to identify new cases of diabetes performed poorly because necessary information on the determinants of diabetes was missing; it will need to be improved in further research.
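The sensitivity and specificity figures reported above are ratios taken from a confusion matrix on the test set. A minimal Python sketch of that computation follows. The counts are invented for illustration, chosen only so that they reproduce the 62% and 67% reported for case study I; they are not the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true cases the model detects
    specificity = tn / (tn + fp)  # share of non-cases the model correctly rejects
    return sensitivity, specificity


# Hypothetical counts (not from the study): 100 true cases, 1000 non-cases
sens, spec = sensitivity_specificity(tp=62, fn=38, tn=670, fp=330)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With a rare outcome, a specificity of 67% still yields many false positives relative to true positives, which is one way to read the abstract's conclusion that the case study I model needs improvement.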

2021 ◽  
Author(s):  
Romana Haneef ◽  
Sonsoles Fuentes ◽  
Sandrine Fosse-Edorh ◽  
Rok Hrzic ◽  
Sofiane Kab ◽  
...  

Abstract Background: The use of machine learning techniques is increasing in healthcare, allowing health outcomes to be estimated and predicted more efficiently from large administrative data sets. The main objective of this study was to develop a generic machine learning (ML) algorithm to estimate the incidence of diabetes based on the number of reimbursements over the last 2 years. Methods: We selected a training data set from a population-based epidemiological cohort (i.e., CONSTANCES) linked with the French National Health Database (i.e., SNDS) to develop an ML algorithm for estimating the incidence of diabetes. To develop this algorithm, we adopted a supervised ML approach. The following steps were performed: (i) selection of the final data set, (ii) target definition, (iii) coding of variables for a given time window, (iv) splitting of the final data into training and test data sets, (v) variable selection, (vi) model training, (vii) validation of the model with the test data set, and (viii) selection of the model. Results: The final data set used to develop the algorithm included 44,659 participants from CONSTANCES. Of the 3,468 variables coded, which were similar in the SNDS and the CONSTANCES cohort, 23 were selected to train different algorithms. The final algorithm to estimate the incidence of diabetes was a Linear Discriminant Analysis model based on the number of reimbursements of selected variables related to biological tests, drugs, medical acts and hospitalization without a procedure over the last 2 years. This algorithm has a sensitivity of 62%, a specificity of 67% and an accuracy of 67% [95% CI: 0.66–0.68]. Conclusions: Supervised ML is an innovative tool for the development of new methods to exploit large health administrative databases. In the context of the InfAct project, we have developed and applied for the first time a generic ML algorithm to estimate the incidence of diabetes for public health surveillance. The ML algorithm we have developed has moderate performance.
The next step is to apply this algorithm to the SNDS to estimate the incidence of type 2 diabetes cases. More research is needed to apply various ML techniques (MLTs) to estimate the incidence of various health conditions and to quantify the contribution of various risk factors to the development of type 2 diabetes.
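To illustrate the kind of model selected above, here is a minimal two-class Linear Discriminant Analysis sketch in plain Python. It is not the authors' implementation. It assumes only two features, equal class priors and a shared covariance matrix, and the feature values (counts of two kinds of reimbursement over two years) are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of 2-feature rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def fit_lda(class0, class1):
    """Fit a two-class, two-feature LDA with equal priors (an assumption).

    Returns (w, c): a point x is assigned to class 1 when w . x > c.
    """
    m0, m1 = mean_vector(class0), mean_vector(class1)
    # Pooled within-class scatter, then covariance (2x2), shared by both classes
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((class0, m0), (class1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    dof = len(class0) + len(class1) - 2
    s = [[v / dof for v in row] for row in s]
    # Closed-form inverse of the 2x2 covariance matrix
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # With equal priors the decision threshold sits at the midpoint of the means
    mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    c = w[0] * mid[0] + w[1] * mid[1]
    return w, c


def predict(w, c, x):
    return 1 if w[0] * x[0] + w[1] * x[1] > c else 0


# Hypothetical reimbursement counts: (biological tests, drug deliveries)
no_diabetes = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]
diabetes = [(4, 2), (5, 3), (6, 2), (5, 4), (4, 3)]
w, c = fit_lda(no_diabetes, diabetes)
print(predict(w, c, (0, 0)), predict(w, c, (6, 4)))
```

In practice a library implementation would handle the 23 selected variables and unequal priors. The closed-form 2x2 inverse here only keeps the example dependency-free.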


Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 245
Author(s):  
Konstantinos G. Liakos ◽  
Georgios K. Georgakilas ◽  
Fotis C. Plessas ◽  
Paris Kitsos

Hardware trojan (HT) viruses are a significant problem in the field of hardware security. HTs can be inserted into a circuit at any phase of the circuit production chain. HTs degrade the infected circuit, destroy it or leak encrypted data. Nowadays, efforts are being made to address HTs through machine learning (ML) techniques, mainly for the gate-level netlist (GLN) phase, but there are some restrictions. Specifically, the normal and infected circuits available through free public libraries, such as Trust-HUB, are limited in number and variety because they are based on a few benchmark samples created from large circuits. Thus, it is difficult, based on these data, to develop robust ML-based models against HTs. In this paper, we propose a new deep learning (DL) tool named Generative Artificial Intelligence Netlists SynthesIS (GAINESIS). GAINESIS is based on the Wasserstein Conditional Generative Adversarial Network (WCGAN) algorithm and on area–power analysis features from the GLN phase, and it synthesizes new normal and infected circuit samples for this phase. Using GAINESIS, we synthesized new data sets of different sizes, and developed and compared seven ML classifiers. The results demonstrate that the newly generated data sets significantly enhance the performance of ML classifiers compared with the initial Trust-HUB data set.


2021 ◽  
Vol 79 (1) ◽  
Author(s):  
Romana Haneef ◽  
Sofiane Kab ◽  
Rok Hrzic ◽  
Sonsoles Fuentes ◽  
Sandrine Fosse-Edorh ◽  
...  

Abstract Background: The use of machine learning techniques is increasing in healthcare, allowing health outcomes to be estimated and predicted more efficiently from large administrative data sets. The main objective of this study was to develop a generic machine learning (ML) algorithm to estimate the incidence of diabetes based on the number of reimbursements over the last 2 years. Methods: We selected a final data set from a population-based epidemiological cohort (i.e., CONSTANCES) linked with the French National Health Database (i.e., SNDS). To develop this algorithm, we adopted a supervised ML approach. The following steps were performed: (i) selection of the final data set, (ii) target definition, (iii) coding of variables for a given time window, (iv) splitting of the final data into training and test data sets, (v) variable selection, (vi) model training, (vii) validation of the model with the test data set, and (viii) selection of the model. We used the area under the receiver operating characteristic curve (AUC) to select the best algorithm. Results: The final data set used to develop the algorithm included 44,659 participants from CONSTANCES. Of the 3468 variables coded from the SNDS linked to the CONSTANCES cohort, 23 were selected to train different algorithms. The final algorithm to estimate the incidence of diabetes was a Linear Discriminant Analysis model based on the number of reimbursements of selected variables related to biological tests, drugs, medical acts and hospitalization without a procedure over the last 2 years. This algorithm has a sensitivity of 62%, a specificity of 67% and an accuracy of 67% [95% CI: 0.66–0.68]. Conclusions: Supervised ML is an innovative tool for the development of new methods to exploit large health administrative databases. In the context of the InfAct project, we have developed and applied for the first time a generic ML algorithm to estimate the incidence of diabetes for public health surveillance. The ML algorithm we have developed has moderate performance.
The next step is to apply this algorithm to the SNDS to estimate the incidence of type 2 diabetes cases. More research is needed to apply various ML techniques (MLTs) to estimate the incidence of various health conditions.
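The abstract above states that AUC was used to select the best algorithm. As a sketch of what that comparison involves, the function below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. The scores and the model names are hypothetical and unrelated to the study's results.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical test labels and scores from two candidate models
labels = [0, 0, 1, 1]
candidates = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.3, 0.6, 0.9],
}
aucs = {name: auc(scores, labels) for name, scores in candidates.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)
```

The pairwise loop is quadratic in the number of cases. For a data set of tens of thousands of participants, a rank-based formulation or a library routine would be the practical choice.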


Author(s):  
Christopher MacDonald ◽  
Michael Yang ◽  
Shawn Learn ◽  
Ron Hugo ◽  
Simon Park

Abstract There are several challenges associated with existing rupture detection systems, such as their inability to detect ruptures accurately under transient conditions (such as pump dynamics), their delayed responses, and the difficulty of transferring models to different pipeline configurations. To address these challenges, we employ multiple Artificial Intelligence (AI) classifiers that rely on pattern recognition instead of traditional operator-set thresholds. AI techniques, consisting of two-dimensional (2D) Convolutional Neural Networks (CNN) and Adaptive Neuro-Fuzzy Inference Systems (ANFIS), are used to mimic processes performed by operators during a rupture event. This includes both visualization (using CNN) and rule-based decision making (using ANFIS). The system provides a level of reasoning to an operator through the use of the rule-based AI system. Pump station sensor data is non-dimensionalized prior to AI processing, enabling application to pipeline configurations outside of the training data set. AI algorithms undergo training and testing using two data sets: laboratory-collected data that mimics transient pump-station operations and real operator data that includes Real Time Transient Model (RTTM) simulated ruptures. The use of non-dimensional sensor data enables the system to detect ruptures from pipeline data not used in the training process.


2021 ◽  
Vol 9 (1) ◽  
pp. e001889
Author(s):  
Rodrigo M Carrillo-Larco ◽  
Manuel Castillo-Cara ◽  
Cecilia Anza-Ramirez ◽  
Antonio Bernabé-Ortiz

Introduction: We aimed to identify clusters of people with type 2 diabetes mellitus (T2DM) and to assess whether the frequency of these clusters was consistent across selected countries in Latin America and the Caribbean (LAC). Research design and methods: We analyzed 13 population-based national surveys in nine countries (n=8361). We used k-means to develop a clustering model; predictors were age, sex, body mass index (BMI), waist circumference (WC), systolic/diastolic blood pressure (SBP/DBP), and T2DM family history. The training data set included all surveys, and the clusters were then predicted in each country-year data set. We used Euclidean distance, elbow and silhouette plots to select the optimal number of clusters and described each cluster according to the underlying predictors (means and proportions). Results: The optimal number of clusters was 4. Cluster 0 grouped more men and those with the highest mean SBP/DBP. Cluster 1 had the highest mean BMI and WC, as well as the largest proportion of T2DM family history. We observed the smallest values of all predictors in cluster 2. Cluster 3 had the highest mean age. When we reflected the four clusters in each country-year data set, a different distribution was observed. For example, cluster 3 was the most frequent in the training data set, and so it was in 7 out of 13 other country-year data sets. Conclusions: Using unsupervised machine learning algorithms, it was possible to cluster people with T2DM from the general population in LAC; clusters showed unique profiles that could be used to identify the underlying characteristics of the T2DM population in LAC.
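The clustering procedure described above (k-means with Euclidean distance, an elbow criterion, then prediction of clusters in new data sets) can be sketched in plain Python as follows. This is an illustration, not the authors' code. The two-feature (age, BMI) points and the initial centroids are hypothetical, and the sketch skips feature standardization, which is often applied before k-means when predictors are on different scales.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean)."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]


def kmeans(points, centroids, max_iter=50):
    """Lloyd's algorithm; `centroids` holds the initial guesses."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # An empty cluster keeps its previous centroid
            updated.append(tuple(sum(v) / len(members) for v in zip(*members))
                           if members else old)
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return centroids, assign(points, centroids)


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances, the quantity plotted on an elbow plot."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical (age, BMI) pairs, not survey data
people = [(45, 24), (47, 25), (50, 23), (62, 33), (65, 35), (60, 34)]
centroids, labels = kmeans(people, [people[0], people[3]])
print(labels, round(inertia(people, centroids, labels), 2))

# "Predicting" clusters in another data set means assigning to the nearest centroid
print(assign([(48, 26)], centroids))
```

Repeating the fit for several values of k and plotting `inertia` against k gives the elbow plot mentioned in the abstract.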


2006 ◽  
Vol 18 (1) ◽  
pp. 119-142 ◽  
Author(s):  
Yael Eisenthal ◽  
Gideon Dror ◽  
Eytan Ruppin

This work presents a novel study of the notion of facial attractiveness in a machine learning context. To this end, we collected human beauty ratings for data sets of facial images and used various techniques for learning the attractiveness of a face. The trained predictor achieves a significant correlation of 0.65 with the average human ratings. The results clearly show that facial beauty is a universal concept that a machine can learn. Analysis of the accuracy of the beauty prediction machine as a function of the size of the training data indicates that a machine producing human-like attractiveness rating could be obtained given a moderately larger data set.
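The 0.65 figure above is a correlation between predicted ratings and average human ratings. Assuming it is a Pearson coefficient, which the abstract does not state, a minimal sketch of the computation is shown below. The ratings are invented for illustration and do not come from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical mean human ratings and machine predictions for five faces
human = [1, 2, 3, 4, 5]
predicted = [2, 1, 4, 3, 5]
r = pearson(human, predicted)
print(round(r, 2))
```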


Author(s):  
Ruslan Babudzhan ◽  
Konstantyn Isaienkov ◽  
Danilo Krasiy ◽  
Oleksii Vodka ◽  
Ivan Zadorozhny ◽  
...  

The paper investigates the relationship between the vibration acceleration of bearings and their operational state. To determine these dependencies, a testbench was built and 112 experiments were carried out with different bearings: 100 bearings that developed an internal defect during operation and 12 bearings without a defect. From the obtained records, a dataset was formed, which was used to build classifiers. The dataset is freely available. A method for classifying new and used bearings was proposed, which consists in searching for dependencies and regularities of the signal using descriptive functions: statistical, entropy, fractal dimensions and others. In addition to processing the signal itself, the frequency domain of the bearing operation signal was also used to complement the feature space. The paper considered the possibility of generalizing the classification for its application to signals that were not obtained in the course of laboratory experiments. An extraneous dataset was found in the public domain. This dataset was used to determine how accurate a classifier was when it was trained and tested on significantly different signals. Training and validation were carried out using the bootstrapping method to eradicate the effect of randomness, given the small amount of training data available. To estimate the quality of the classifiers, the F1-measure was used as the main metric due to the imbalance of the data sets. The following supervised machine learning methods were chosen as classifier models: logistic regression, support vector machine, random forest, and K nearest neighbors. The results are presented in the form of plots of density distribution and diagrams.
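The abstract above names two evaluation ingredients: the F1-measure for imbalanced classes and bootstrapping to reduce the effect of randomness. The sketch below shows both in plain Python under a simplifying assumption: it resamples a fixed set of predictions, whereas the paper bootstraps training and validation. The labels and predictions are hypothetical.

```python
import random


def f1_score(y_true, y_pred):
    """F1-measure for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_f1(y_true, y_pred, n_resamples=1000, seed=0):
    """F1 on resamples drawn with replacement; returns the list of scores."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    return scores


# Hypothetical imbalanced labels: 8 defective (1) and 2 healthy (0) bearings
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
point_estimate = f1_score(y_true, y_pred)
scores = bootstrap_f1(y_true, y_pred)
print(round(point_estimate, 2), round(sum(scores) / len(scores), 2))
```

The list of bootstrap scores is what a density plot, such as those mentioned in the abstract, would be drawn from.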


Author(s):  
Sotiris Kotsiantis ◽  
Dimitris Kanellopoulos ◽  
Panayotis Pintelas

In classification learning, the learning scheme is presented with a set of classified examples from which it is expected to learn a way of classifying unseen examples (see Table 1). Formally, the problem can be stated as follows: Given training data {(x1, y1)…(xn, yn)}, produce a classifier h: X → Y that maps an object x ∈ X to its classification label y ∈ Y. A large number of classification techniques have been developed based on artificial intelligence (logic-based techniques, perceptron-based techniques) and statistics (Bayesian networks, instance-based techniques). No single learning algorithm can uniformly outperform other algorithms over all data sets. The concept of combining classifiers is proposed as a new direction for the improvement of the performance of individual machine learning algorithms. Numerous methods have been suggested for the creation of ensembles of classifiers (Dietterich, 2000). Although, or perhaps because, many methods of ensemble creation have been proposed, there is as yet no clear picture of which method is best.
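One simple way to combine classifiers, as discussed above, is majority voting: each base classifier h maps an object x to a label, and the ensemble returns the most common label. The sketch below is illustrative only. The three threshold rules are hypothetical stand-ins for trained classifiers, and majority voting is one of many ensemble schemes, not necessarily the one the chapter recommends.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Return the label predicted by the largest number of base classifiers."""
    votes = Counter(h(x) for h in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical base classifiers h: X -> Y, each looking at one feature
def h1(x):
    return "pos" if x[0] > 0.5 else "neg"


def h2(x):
    return "pos" if x[1] > 0.5 else "neg"


def h3(x):
    return "pos" if x[2] > 0.5 else "neg"


ensemble = [h1, h2, h3]
print(majority_vote(ensemble, (0.7, 0.2, 0.9)))  # h1 and h3 vote "pos"
print(majority_vote(ensemble, (0.1, 0.9, 0.3)))  # h1 and h3 vote "neg"
```

An odd number of base classifiers avoids ties in the two-class case.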

