Predicting the Linguistic Accessibility of Chinese Health Translations: Using Machine Learning Algorithms (Preprint)

2021 ◽  
Author(s):  
Meng Ji ◽  
Pierrette Bouillon

BACKGROUND Linguistic accessibility has an important impact on the reception and utilization of translated health resources among multicultural and multilingual populations. The linguistic understandability of health translations has been under-studied. OBJECTIVE Our study aimed to develop novel machine learning models for the study of the linguistic accessibility of health translations by comparing Chinese translations of World Health Organization health materials with original Chinese health resources developed by the Chinese health authorities. METHODS Using natural language processing tools for the assessment of the readability of Chinese materials, we explored and compared the readability of Chinese health translations from the World Health Organization with original Chinese materials from the China Centre for Disease Control and Prevention. RESULTS Pairwise adjusted t tests showed that three new machine learning models achieved statistically significant improvement over the baseline logistic regression in terms of AUC: C5.0 decision tree (p<0.001, 95% CI: -0.249, -0.152), random forest (p<0.001, 95% CI: 0.139, 0.239) and XGBoost Tree (p<0.001, 95% CI: 0.099, 0.193). There was, however, no significant difference between the C5.0 decision tree and random forest (p=0.513). The extreme gradient boosting (XGBoost) tree was the best model, achieving statistically significant improvement over the C5.0 model (p=0.003) and the random forest model (p=0.006) at the Bonferroni-adjusted p value of 0.008. CONCLUSIONS The development of machine learning algorithms significantly improved the accuracy and reliability of current approaches to the evaluation of the linguistic accessibility of Chinese health information, especially Chinese health translations in relation to original health resources. Although the new algorithms were developed using Chinese health resources, they can be adapted for other languages to advance current research in accessible health translation, communication, and promotion.
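
The Bonferroni-adjusted threshold of 0.008 quoted in the results is consistent with dividing a family-wise alpha of 0.05 by the six pairwise comparisons possible among four models. The sketch below reproduces that arithmetic. The alpha of 0.05 is an assumption (the abstract does not state it), and the three comparisons against the baseline, whose p values are reported only as rounding to 0.000, are entered as a placeholder of 0.0005.

```python
from itertools import combinations

# Model names as given in the abstract.
models = ["logistic regression", "C5.0", "random forest", "XGBoost"]

# Assumption: family-wise alpha of 0.05 (not stated in the abstract).
FAMILY_ALPHA = 0.05

# Four models give 4 * 3 / 2 = 6 pairwise comparisons.
pairs = list(combinations(models, 2))
bonferroni_alpha = FAMILY_ALPHA / len(pairs)  # 0.05 / 6, roughly 0.0083

# p values from the abstract; 0.0005 is a placeholder for values
# reported as rounding to 0.000.
reported_p = {
    ("logistic regression", "C5.0"): 0.0005,
    ("logistic regression", "random forest"): 0.0005,
    ("logistic regression", "XGBoost"): 0.0005,
    ("C5.0", "random forest"): 0.513,
    ("C5.0", "XGBoost"): 0.003,
    ("random forest", "XGBoost"): 0.006,
}

# A comparison counts as significant if its p value is below the
# adjusted threshold.
results = {pair: reported_p[pair] < bonferroni_alpha for pair in pairs}

for pair, is_significant in results.items():
    print(pair, "significant" if is_significant else "not significant")
```

Under these assumptions, every comparison except C5.0 versus random forest falls below the adjusted threshold, which matches the pattern described in the abstract.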


2022 ◽  
pp. 383-393
Author(s):  
Lokesh M. Giripunje ◽  
Tejas Prashant Sonar ◽  
Rohit Shivaji Mali ◽  
Jayant C. Modhave ◽  
Mahesh B. Gaikwad

The risk of heart disease is increasing throughout the world. According to a World Health Organization report, the number of deaths from heart disease is increasing drastically compared with other diseases. Multiple factors are responsible for causing heart-related issues. Many approaches have been suggested for the prediction of heart disease, but none of them has been satisfactory in clinical terms. Available heart disease therapies and operations are costly, as is follow-up care after treatment. This chapter provides a comprehensive survey of existing machine learning algorithms and compares them in terms of accuracy. The authors found the random forest classifier to be the most accurate model and therefore use it for further processing. The machine learning model was deployed as a web application with the help of Flask, HTML, GitHub, and Heroku servers. The webpages take input attributes from the user and return a prediction of whether the patient is at risk of coronary heart disease in the next 10 years.


2019 ◽  
Author(s):  
Thomas M. Kaiser ◽  
Pieter B. Burger

Machine learning continues to make great strides in the prediction of desired properties in drug development. Problematically, the efficacy of machine learning in these arenas relies on highly accurate and abundant data. These two limitations, high accuracy and abundance, are often taken together; however, understanding the dataset accuracy limitation of contemporary machine learning algorithms may show whether non-bench experimental sources of data can be used to generate useful machine learning models where there is a paucity of experimental data. We took highly accurate data across six kinase types, one GPCR, one polymerase, a human protease, and HIV protease, and intentionally introduced error at varying population proportions in the datasets for each target. With the generated error in the data, we explored how the retrospective accuracy of a Naïve Bayes Network, a Random Forest Model, and a Probabilistic Neural Network model decayed as a function of error. Additionally, we explored the ability of a training dataset with an error profile resembling that produced by the Free Energy Perturbation method (FEP+) to generate machine learning models with useful retrospective capabilities. The categorical error tolerance was quite high for the Naïve Bayes Network algorithm, which on average required 39% error in the training set to lose predictivity on the test set. The Random Forest also tolerated a significant degree of categorical error introduced into the training set, with an average error of 29% required to lose predictivity. However, the Probabilistic Neural Network algorithm did not tolerate as much categorical error, requiring an average of 20% error to lose predictivity. Finally, we found that a Naïve Bayes Network and a Random Forest could both use datasets with an error profile resembling that of FEP+. This work demonstrates that computational methods of known error distribution, like FEP+, may be useful in generating machine learning models not based on extensive and expensive in vitro-generated datasets.
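
Introducing categorical error at a chosen proportion of a training set can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes binary labels (for example active = 1, inactive = 0), and the function name `inject_label_error` is hypothetical.

```python
import random


def inject_label_error(labels, error_fraction, seed=0):
    """Return a copy of binary labels with a fixed fraction flipped.

    Assumption: labels are 0 or 1. The abstract speaks only of
    "categorical error", so the binary setting is illustrative.
    """
    if not 0.0 <= error_fraction <= 1.0:
        raise ValueError("error_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded for reproducibility
    n_flip = round(error_fraction * len(labels))
    # Pick distinct positions so exactly n_flip labels change.
    flip_positions = rng.sample(range(len(labels)), n_flip)
    noisy = list(labels)
    for i in flip_positions:
        noisy[i] = 1 - noisy[i]
    return noisy


clean = [0, 1] * 50  # 100 hypothetical training labels
noisy = inject_label_error(clean, 0.2)
changed = sum(a != b for a, b in zip(clean, noisy))
print(changed, "of", len(clean), "labels flipped")
```

Training a classifier on `noisy` labels at increasing values of `error_fraction`, and scoring it on an untouched test set, would give the kind of accuracy-versus-error curve the abstract describes.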


2020 ◽  
Vol 24 (Suppl. 1) ◽  
pp. 131-137
Author(s):  
Azhari Elhag ◽  
Hanaa Abu-Zinadah

Accurate forecasting has acquired great importance in many real-world fields because it helps identify the best ways to achieve a goal. In this paper, we compare the accuracy of statistical models, such as time series models, and deep learning models in forecasting the fertility rate in the Kingdom of Saudi Arabia, using World Health Organization data for the period 1960 to 2019. Model performance was evaluated using the mean absolute percentage error.
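
The mean absolute percentage error (MAPE) used for evaluation can be computed as in the sketch below. The function name `mape` and the numbers in the usage example are hypothetical and are not taken from the study's data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent.

    MAPE = 100 / n * sum(|(actual_t - forecast_t) / actual_t|)
    It is undefined when any actual value is zero.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical values for illustration only.
observed = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(observed, predicted))
```

A lower MAPE indicates a forecast that is closer to the observed series in relative terms, which is what makes it convenient for comparing models fitted to the same data.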


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Tahia Tazin ◽  
Md Nur Alam ◽  
Nahian Nakiba Dola ◽  
Mohammad Sajibul Bari ◽  
Sami Bourouis ◽  
...  

Stroke is a medical disorder in which the blood arteries in the brain are ruptured, causing damage to the brain. When the supply of blood and other nutrients to the brain is interrupted, symptoms might develop. According to the World Health Organization (WHO), stroke is the greatest cause of death and disability globally. Early recognition of the various warning signs of a stroke can help reduce the severity of the stroke. Different machine learning (ML) models have been developed to predict the likelihood of a stroke occurring in the brain. This research uses a range of physiological parameters and machine learning algorithms, such as Logistic Regression (LR), Decision Tree (DT) Classification, Random Forest (RF) Classification, and Voting Classifier, to train four different models for reliable prediction. Random Forest was the best performing algorithm for this task with an accuracy of approximately 96 percent. The dataset used in the development of the method was the open-access Stroke Prediction dataset. The accuracy percentage of the models used in this investigation is significantly higher than that of previous studies, indicating that the models used in this investigation are more reliable. Numerous model comparisons have established their robustness, and the scheme can be deduced from the study analysis.
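
A voting classifier combines the predictions of several base models. The sketch below shows hard (majority) voting over the outputs of three models. Whether the study used hard or soft voting is not stated, so hard voting is an assumption here, and the prediction lists are hypothetical.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Return the majority label per sample across several models.

    predictions_per_model holds one list of labels per model, all of
    the same length. With an odd number of models and binary labels
    there are no ties.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same samples")
    voted = []
    for sample_votes in zip(*predictions_per_model):
        # most_common(1) yields the label with the highest count.
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


# Hypothetical predictions (1 = stroke, 0 = no stroke) from three models.
lr_pred = [1, 0, 1, 0]
dt_pred = [1, 1, 0, 0]
rf_pred = [0, 1, 1, 0]
ensemble_pred = hard_vote([lr_pred, dt_pred, rf_pred])
print(ensemble_pred)
```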


Author(s):  
Aadar Pandita

Heart diseases have been the primary cause of death all over the world. The majority of deaths related to cardiovascular problems are caused by heart attacks and strokes. The World Health Organization (WHO) indicates that approximately 17.9 million people die from such diseases every year. Therefore, it is essential that we find methods to reduce these numbers. In order to minimize the detrimental effects of heart diseases, we must try to predict their presence at earlier stages. Machine learning algorithms can help us predict such outcomes with a high degree of accuracy, which can in turn help doctors and patients detect the onset of such diseases and reduce their impact or prevent them from occurring. Our objective is to create a system that is able to accurately determine the presence of heart disease in a time- and cost-efficient manner.


Author(s):  
A Lakshmanarao ◽  
M Raja Babu ◽  
T Srinivasa Ravi Kiran

The whole world has been experiencing a novel infection, coronavirus disease (COVID-19), since 2019. The main concern about this disease is the absence of effective approved medicine. The World Health Organization (WHO) proposed a few precautionary measures to manage the spread of the illness and to reduce transmission, thereby decreasing cases. In this paper, we analyzed the coronavirus dataset available on Kaggle. Previous contributions from researchers doing comparable work covered a limited number of days. Our paper used COVID-19 data up to May 2021. The numbers of confirmed cases, recovered cases, and deaths are considered for analysis. The cases are analyzed on a daily and weekly basis to gain insight into the dataset. After extensive analysis, we proposed machine learning regressors for COVID-19 prediction. We applied linear regression, polynomial regression, a Decision Tree Regressor, and a Random Forest Regressor. Decision Tree and Random Forest gave an r-square value of 0.99. We also predicted future cases with these four algorithms and were able to predict future cases better with the polynomial regression technique. This prediction can help in taking preventive measures to control COVID-19 in the near future. All experiments were conducted in Python.
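
The r-square value reported for the regressors is the coefficient of determination. A minimal sketch of how it is computed follows. The function name `r_squared` and the sample numbers are hypothetical and are not drawn from the Kaggle dataset.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left after the model's prediction.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variation around the mean of the data.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("r-square is undefined for constant actual values")
    return 1.0 - ss_res / ss_tot


# Hypothetical daily case counts and model predictions.
cases = [1.0, 2.0, 3.0]
model_output = [1.0, 2.0, 4.0]
print(r_squared(cases, model_output))
```

A value near 1 means the predictions account for almost all of the variation in the observed series on the data being scored.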


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0251365
Author(s):  
Amita Sharma ◽  
Willem J. M. I. Verbeke

Anxiety disorders are a group of mental illnesses that cause constant and overwhelming feelings of anxiety and fear. Excessive anxiety can make an individual avoid work, school, family get-togethers, and other social situations that in turn might amplify these symptoms. According to the World Health Organization (WHO), one in thirteen persons globally suffers from anxiety. It is high time to understand the roles of various clinical biomarker measures that can diagnose the types of anxiety disorders. In this study, we apply machine learning (ML) techniques to understand the importance of a set of biomarkers with four types of anxiety disorders—Generalized Anxiety Disorder (GAD), Agoraphobia (AP), Social Anxiety Disorder (SAD) and Panic Disorder (PD). We used several machine learning models and extracted the variable importance contributing to a type of anxiety disorder. The study uses a sample of 11,081 Dutch citizens’ data collected by the Lifelines, Netherlands. The results show that there are significant and low correlations among GAD, AP, PD and SAD and we extracted the variable importance hierarchy of biomarkers with respect to each type of anxiety disorder which will be helpful in designing the experimental setup for clinical trials related to influence of biomarkers on type of anxiety disorder.


2020 ◽  
Author(s):  
Albert Morera ◽  
Juan Martínez de Aragón ◽  
José Antonio Bonet ◽  
Jingjing Liang ◽  
Sergio de-Miguel

BACKGROUND The prediction of biogeographical patterns from a large number of driving factors with complex interactions, correlations and non-linear dependences requires advanced analytical methods and modelling tools. This study compares different statistical and machine learning models for predicting fungal productivity biogeographical patterns as a case study for the thorough assessment of the performance of alternative modelling approaches to provide accurate and ecologically-consistent predictions. METHODS We evaluated and compared the performance of two statistical modelling techniques, namely, generalized linear mixed models and geographically weighted regression, and four machine learning models, namely, random forest, extreme gradient boosting, support vector machine and deep learning, to predict fungal productivity. We used a systematic methodology based on substitution, random, spatial and climatic blocking combined with principal component analysis, together with an evaluation of the ecological consistency of spatially-explicit model predictions. RESULTS Fungal productivity predictions were sensitive to the modelling approach and complexity. Moreover, the importance assigned to different predictors varied between machine learning modelling approaches. Decision tree-based models increased prediction accuracy by ~7% compared to other machine learning approaches and by more than 25% compared to statistical ones, and resulted in higher ecological consistency at the landscape level. CONCLUSIONS Whereas a large number of predictors are often used in machine learning algorithms, in this study we show that proper variable selection is crucial to create robust models for extrapolation in biophysically differentiated areas. When dealing with spatial-temporal data in the analysis of biogeographical patterns, climatic blocking is postulated as a highly informative technique to be used in cross-validation to assess the prediction error over larger scales. Random forest was the best approach for prediction both in sampling-like environments and in extrapolation beyond the spatial and climatic range of the modelling data.


Author(s):  
R. Saradha Devi ◽  
Dr. J. G. R. Sathiaseelan

Coronavirus disease (COVID-19) is an infectious disease that emerged in late 2019. It has expanded exponentially throughout the world and has affected an enormous number of people since then. COVID-19 was declared a pandemic by the World Health Organization (WHO) on March 11, 2020. This research proposes a method for confirming COVID-19 instances after doctors' diagnoses. The goal of this study is to see how similar the projected findings are to the original data in COVID-19 Confirmed-Negative-Released-Death situations using machine learning. For this purpose, the paper suggests a verification approach based on the deep learning neural network concept. Long short-term memory (LSTM) and Gated Recurrent Unit (GRU) networks are also used in this framework to train on the dataset. The outcomes of the forecast match those predicted by clinical doctors.

