Risk factors identification and prediction of anemia among women in Bangladesh using machine learning techniques

2021 ◽  
Vol 17 ◽  
Author(s):  
Md. Merajul Islam ◽  
Md. Jahanur Rahman ◽  
Dulal Chandra Roy ◽  
Md. Moidul Islam ◽  
Most. Tawabunnahar ◽  
...  

Background: Anemia is a major public health problem whose prevalence is rising worldwide, including in Bangladesh. Objectives: To identify the risk factors of anemia among women in Bangladesh and to predict anemia using machine learning (ML) based techniques. Methods: The anemia dataset, comprising 3,020 respondents, was extracted from the Bangladesh Demographic and Health Survey (BDHS). Two feature selection techniques, logistic regression (LR) and random forest (RF), were used to determine the risk factors of anemia. Additionally, eight ML-based techniques, namely LR, linear discriminant analysis (LDA), k-nearest neighbors (KNN), support vector machine (SVM), quadratic discriminant analysis (QDA), neural network (NN), classification and regression tree (CART), and RF, were used to predict anemia among women in Bangladesh. Classification accuracy and area under the curve (AUC) were used to evaluate the performance of these classifiers. Results: LR- and RF-based feature selection indicate that, out of 15 factors, 13 for LR and 14 for RF appear to be significant risk factors for anemia among women. All predictive models achieve their highest classification accuracy (74.10-81.29%) and AUC (0.744-0.819) with RF-selected features. The combination of RF-based feature selection with an RF-based classifier gives the highest classification accuracy (81.29%) and AUC (0.819). Conclusion: Out of eight predictive models, the RF-RF combination shows the best performance for the prediction of anemia. This study suggests that policymakers could use this combination to make appropriate decisions for controlling anemia among Bangladeshi women while saving time and reducing cost.
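The two evaluation metrics named in this abstract, classification accuracy and AUC, can be illustrated with a minimal sketch. The labels and predicted probabilities below are hypothetical and are not taken from the BDHS data; the 0.5 decision threshold is likewise an assumption, and the AUC is computed with the rank-based (pairwise comparison) definition rather than any particular library routine.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = anemic) and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))  # 0.75
print(auc(y_true, scores))       # 0.9375
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason studies like this one report both.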

2018 ◽  
Vol 7 (4) ◽  
pp. 197-201
Author(s):  
Mir M Hassan Bullo ◽  
Mirza Amir Baig ◽  
Jawad Faisal Malik ◽  
Ejaz Ahmad Khan ◽  
Muazam Abbas Ranjha ◽  
...  

Background: Measles is a highly contagious vaccine-preventable disease (VPD) and a major public health problem, considered a leading cause of morbidity and mortality in developing countries like Pakistan. An outbreak of measles was reported in Sharifabad, Islamabad, on 15 April 2017, and an investigation was launched to assess the magnitude of the outbreak, evaluate risk factors and recommend control measures. Methods: A comprehensive house-to-house active case search along with a vaccine coverage survey was conducted from April 19-22, 2017. A case was defined as "onset of maculopapular rash with fever in a resident of Sharifabad with at least one of the following signs/symptoms: coryza, conjunctivitis, cough, otitis media or pneumonia, present between 19 March and 22 April 2017". Four age- and sex-matched controls were selected from the neighborhood. Data were collected by interview using a structured questionnaire, and vaccination coverage was determined using the Epi survey form. Blood samples were sent for laboratory confirmation. Results: A total of eight cases were identified through active case finding, while three were reported by a local practitioner. The mean age of cases was 20 months (range 8-36 months). The most severely affected age group was 1-2 years, with an attack rate of 46%. Around two-thirds (64%) of cases and a few (16%) of controls were unvaccinated against measles. Contact with a measles patient [OR 25.2, CI 3.9-160.1, P=0.00], unvaccinated children [OR 9.2, CI 2.12-40.4, P=0.000], social misconception regarding vaccination [OR 7.8, CI 1.42-42.6, P=0.00], and distance from a healthcare facility [OR 5.7, CI 1.15-28.35, P=0.02] were significant risk factors. Vaccine efficacy was 90%. Conclusion: The main reasons for the outbreak were contact with cases and low vaccination status. We recommended comprehensive measles vaccination and community awareness sessions. On our recommendations, the district health authority of Islamabad carried out a mop-up of the whole area.
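The odds ratios and confidence intervals reported above follow the usual 2x2 case-control calculation, sketched here for the vaccination-status factor. The counts are an assumption: they are back-calculated from the reported percentages (11 cases, 64% unvaccinated; an assumed 44 controls at four per case, 16% unvaccinated) and do not come from the study's data. The interval uses the log-based (Woolf) method, and the efficacy line uses the common case-control approximation of one minus the odds ratio for vaccination; the abstract does not say which methods the authors applied.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Assumed counts, back-calculated from the reported percentages (not study data).
unvacc_cases, vacc_cases = 7, 4
unvacc_controls, vacc_controls = 7, 37

or_, lower, upper = odds_ratio_ci(unvacc_cases, vacc_cases,
                                  unvacc_controls, vacc_controls)

# Case-control approximation: efficacy = 1 - OR for vaccination = 1 - 1/OR for non-vaccination.
vaccine_efficacy = 1 - 1 / or_

print(round(or_, 2), round(lower, 2), round(upper, 2), round(vaccine_efficacy, 2))
```

Under these assumed counts the result lands close to the reported OR of 9.2 (CI 2.12-40.4) and an efficacy of roughly 90%, but the match should be read as a plausibility check rather than a reconstruction of the authors' analysis.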


2021 ◽  
Vol 104 (2) ◽  
pp. 233-239

Background: Tuberculosis (TB) is a major public health problem, including in Thailand. Anti-TB drugs are a very effective treatment, but they can cause hepatotoxicity. Data on the prevalence of anti-TB drug-induced hepatotoxicity (DIH), as well as the contributing risk factors, are scarce in Thailand. Objective: To measure the prevalence of, and identify risk factors associated with, first-line drug (FLD) induced hepatotoxicity in TB patients. Materials and Methods: The present study was a retrospective study in the TB clinic of Suratthani Hospital in Southern Thailand. All patients diagnosed with TB who received FLD between January and December 2017 were eligible for the study. Hepatotoxicity was defined by the following criteria: serum aspartate aminotransferase (AST) or alanine aminotransferase (ALT) levels >5x the upper limit of normal (ULN) without symptoms, or AST or ALT >3x ULN with clinical symptoms. Results: Of the 198 TB cases, 18 were identified as DIH. The prevalence of DIH was 9.1%. Hepatitis after FLD was independently associated with age >60 years (adjusted OR [aOR] 28.49, 95% CI 2.68 to 302.95, p=0.005) and serum albumin <3.5 g/dL (aOR 20.97, 95% CI 2.11 to 208.51, p=0.009). Conclusion: Age of more than 60 years and serum albumin of less than 3.5 g/dL were significant risk factors associated with first-line anti-TB drug-induced hepatotoxicity. Keywords: Hepatotoxicity, Anti-tuberculosis drug, Risk factor, Thailand
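The case definition and the prevalence figure in this abstract can be written out as a short sketch. The function names and the single shared ULN of 40 U/L are assumptions made for illustration (the abstract gives no numeric ULN, and laboratories often use separate limits for AST and ALT); only the 3x/5x thresholds and the 18-of-198 count come from the text.

```python
def is_hepatotoxic(ast, alt, uln, symptomatic):
    """Apply the abstract's criteria: AST or ALT > 5x ULN without symptoms,
    or AST or ALT > 3x ULN with clinical symptoms."""
    peak = max(ast, alt)
    threshold = 3 * uln if symptomatic else 5 * uln
    return peak > threshold


def prevalence(cases, total):
    """Prevalence as a percentage."""
    return 100.0 * cases / total


ULN = 40.0  # assumed upper limit of normal in U/L; not stated in the abstract

# Same enzyme levels, different symptom status (hypothetical patient values).
print(is_hepatotoxic(ast=150, alt=90, uln=ULN, symptomatic=True))   # True: 150 > 3 x 40
print(is_hepatotoxic(ast=150, alt=90, uln=ULN, symptomatic=False))  # False: 150 <= 5 x 40
print(round(prevalence(18, 198), 1))  # 9.1, from the counts reported in the abstract
```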


Author(s):  
Maria Mohammad Yousef ◽  

Medical dataset classification has become one of the major problems in data mining research. Every database has a given number of features, but some of these features can be redundant or even harmful and can disrupt the classification process; this is known as the high dimensionality problem. Dimensionality reduction during data preprocessing is critical for increasing the performance of machine learning algorithms. In addition, feature subset selection, as a form of dimensionality reduction, gives a significant improvement in classification accuracy. In this paper, we propose a new hybrid feature selection approach based on a genetic algorithm (GA) assisted by k-nearest neighbor (kNN) to deal with high dimensionality in biomedical data classification. The proposed method first combines GA and kNN for feature selection to find the optimal subset of features, with the classification accuracy of kNN used as the fitness function for the GA. After the best subset of features is selected, a support vector machine (SVM) is used as the classifier. The proposed method is evaluated on five medical datasets from the UCI Machine Learning Repository. The suggested technique performs well on these datasets, achieving higher classification accuracy while using fewer features.
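The wrapper idea described here, a genetic algorithm searching over feature subsets with k-NN accuracy as the fitness function, can be sketched in a few lines. Everything concrete below is an assumption made for illustration: the toy data, leave-one-out evaluation, truncation selection, one-point crossover, and the population and mutation settings are not taken from the paper, and the final SVM classification step is omitted.

```python
import random


def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN using only features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample on the selected features.
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        correct += predicted == y[i]
    return correct / len(X)


def ga_select(X, y, generations=15, pop_size=12, mutation_rate=0.1, seed=0):
    """Tiny genetic algorithm over binary feature masks; fitness is k-NN accuracy."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return knn_loo_accuracy(X, y, mask)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Hypothetical toy data: features 0-1 carry the class signal, features 2-3 are noise.
data_rng = random.Random(1)
X, y = [], []
for i in range(30):
    label = i % 2
    X.append([label + data_rng.gauss(0, 0.2), label + data_rng.gauss(0, 0.2),
              data_rng.gauss(0, 1), data_rng.gauss(0, 1)])
    y.append(label)

best_mask, best_acc = ga_select(X, y)
print(best_mask, best_acc)
```

In a pipeline like the one the abstract describes, the columns flagged by `best_mask` would then be passed to a separate classifier, which keeps the expensive wrapper search decoupled from the final model.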


2014 ◽  
Vol 31 (2) ◽  
pp. 133-141
Author(s):  
H. Shahpesandy ◽  
M. Oakes ◽  
Ad van Heeswijck

Background: Suicide is a major public health problem, with mental disorders being one of its major risk factors. The high incidence of suicide on the Isle of Wight has motivated this study, the first of its kind on suicide in this small geographic area. Aim: The aim of the study was to identify socio-demographic and clinical risk factors for suicide in the population of service users and non-service users, and gender-related characteristics of suicidal behaviour in a limited geographic region. Method: Data were collected on 68 cases of suicide (ICD-10 X60-X84) from residents of the Isle of Wight District between January 2006 and December 2009. All data were statistically analysed using Pearson’s χ² test and Yates’ correction for continuity. Results: The mean annual suicide rates over the period were 5.65 per 100 000 for women and 19.28 for men. Significantly (p=0.0006) more men than women (male/female ratio 3:1) died as a result of suicide. Relatively (p=0.07) more women (56.2%) than men (32.7%), and significantly more (p=0.05) service users (45.3%) than non-service users (13.3%) were unemployed. Significantly more (p=0.0006) service users (64%) than non-service users (20%) had a history of suicide attempts, and relatively (p=0.06) more service users (50.9%) than non-service users (20%) had attended the accident and emergency department before their death; 69% had an adverse life event within a year before their suicide. Depression, the most common Axis-I illness, was diagnosed in 36% of all cases, but significantly (p=0.008) more often in women (66.6%) than men (17.3%). Relatively (p=0.07) more women (56.2%) than men (32.7%) had contacted services before their death. Suicide by hanging was the most common cause, accounting for the death of 71% of men and 50% of women. Conclusions: The study found that 80% of all suicides occurred in people suffering from mental disorder. Men are at a significant risk of suicide. Depressive disorders in women and stress-related disorders in men were the most common mental disorders. Treating mental disorders and co-morbid conditions seems to be one of the key elements in suicide prevention strategies.
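The statistical test named in the Method section, Pearson's chi-squared test with Yates' continuity correction, has a closed form for a 2x2 table, sketched below. The counts in the example are hypothetical and are not the study's data; the p-value uses the identity that a chi-squared variable with one degree of freedom has upper-tail probability erfc(sqrt(x/2)).

```python
import math


def yates_chi2(a, b, c, d):
    """Pearson chi-squared with Yates' continuity correction for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    # The correction shrinks |ad - bc| by n/2, floored at zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows are two groups, columns are outcome present / absent.
chi2, p_value = yates_chi2(10, 20, 30, 40)
print(round(chi2, 4), round(p_value, 3))
```

The correction matters most for small samples such as the 68 cases analysed here, where the uncorrected statistic tends to overstate significance.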


2021 ◽  
Vol 11 (6) ◽  
pp. 541
Author(s):  
Jin-Woo Kim ◽  
Jeong Yee ◽  
Sang-Hyeon Oh ◽  
Sun-Hyun Kim ◽  
Sun-Jong Kim ◽  
...  

Objective: This nested case–control study aimed to investigate the effects of VEGFA polymorphisms on the development of bisphosphonate-related osteonecrosis of the jaw (BRONJ) in women with osteoporosis. Methods: Eleven single nucleotide polymorphisms (SNPs) of the VEGFA were assessed in a total of 125 patients. Logistic regression was performed for multivariable analysis. Machine learning algorithms, namely, fivefold cross-validated multivariate logistic regression, elastic net, random forest, and support vector machine, were developed to predict risk factors for BRONJ occurrence. Area under the receiver-operating curve (AUROC) analysis was conducted to assess clinical performance. Results: The VEGFA rs881858 was significantly associated with BRONJ development. The odds of BRONJ development were 6.45 times (95% CI, 1.69–24.65) higher among carriers of the wild-type rs881858 allele compared with variant homozygote carriers after adjusting for covariates. Additionally, variant homozygote (GG) carriers of rs10434 had higher odds than those with wild-type allele (OR, 3.16). Age ≥ 65 years (OR, 16.05) and bisphosphonate exposure ≥ 36 months (OR, 3.67) were also significant risk factors for BRONJ occurrence. AUROC values were higher than 0.78 for all machine learning methods employed in this study. Conclusion: Our study showed that the BRONJ occurrence was associated with VEGFA polymorphisms in osteoporotic women.


2012 ◽  
pp. 724-768
Author(s):  
Jesmin Nahar ◽  
Kevin S. Tickle ◽  
A. B.M. Shawkat Ali

Extracting useful information from structured and unstructured biological data is crucial in the health industry. Examples include a medical practitioner’s need to identify breast cancer patients at an early stage, estimate the survival time of a heart disease patient, or recognize uncommon disease characteristics which suddenly appear. Currently there is an explosion in biological data available in databases, but information extraction and true open access to data require time to resolve issues such as ethical clearance. The emergence of novel IT technologies allows health practitioners to carry out comprehensive analyses of medical images, genomes, transcriptomes, and proteomes in health and disease. The information extracted by such technologies may soon dramatically change the pace of medical research and considerably affect the care of patients. The current research will review the existing technologies being used in heart and cancer research. Finally, this research will provide some possible solutions to overcome the limitations of existing technologies. In summary, the primary objective of this research is to investigate how existing modern machine learning techniques (with their strengths and limitations) are being used in the identification of heartbeat-related disease and the early detection of cancer in patients. After an extensive literature review, these are the objectives chosen: to develop a new approach to find the association between diseases such as high blood pressure, stroke and heartbeat; to propose an improved feature selection method to analyze huge image and microarray databases for machine learning algorithms in cancer research; to find an automatic distance function selection method for clustering tasks; to discover the most significant risk factors for specific cancers; and to determine the preventive factors for specific cancers that are aligned with the most significant risk factors.
Therefore we propose a research plan to attain these objectives within this chapter. The possible solutions for the above objectives are: new heartbeat identification techniques that show promising association between heartbeat patterns and diseases; sensitivity-based feature selection methods applied to early cancer patient classification; meta-learning approaches adopted in clustering algorithms to select an automatic distance function; and the Apriori algorithm applied to discover the significant risk and preventive factors for specific cancers. We expect this research will make significant contributions to the medical profession by enabling more accurate diagnosis and better patient care. It will also contribute to other areas such as biomedical modeling, medical image analysis and early disease warning.


2012 ◽  
Vol 9 (1) ◽  
pp. 13-18 ◽  
Author(s):  
J Chataut ◽  
R K Adhikari ◽  
N P Sinha

Background: Hypertension is the commonest cardiovascular disorder and is now regarded as a major public health problem. It is a precursor to major diseases such as myocardial infarction, stroke and renal failure. There are very limited community-based data on hypertension in Nepal, so information on the prevalence of hypertension in the population is desirable. Objectives: To estimate the prevalence of hypertension and to explore the risk factors associated with hypertension. Methods: In a cross-sectional study, a total of 527 subjects (males n=214 and females n=313) aged ≥18 years participated. The participants underwent anthropometric and blood pressure measurement and answered a pretested questionnaire. Hypertension was defined as per JNC VII criteria. Results: The overall prevalence of hypertension was 22.4% (males: 32.7%, females: 15.3%). Age-specific prevalence of hypertension showed a significant progressive increase, ranging from 8% to 35%. Almost 40% of hypertensives did not know about their status. Bivariate analysis showed a significant relationship of hypertension with gender, age, literacy, physical inactivity, body mass index (BMI), smoking and alcohol consumption. Multivariate analysis excluded literacy, but all other risk factors continued to show a positive association with hypertension. Conclusion: Older age, low physical activity, obesity/overweight, smoking and alcohol consumption are significant risk factors for hypertension. Therefore, intervention measures emphasizing modifiable risk factors such as smoking, alcohol consumption, physical activity and obesity are warranted to prevent hypertension. http://dx.doi.org/10.3126/kumj.v9i1.6255 Kathmandu Univ Med J 2011;9(1):13-18


2020 ◽  
Author(s):  
Xiaomao Fan ◽  
Xingxian Huang ◽  
Yang Zhao ◽  
Lin Wang ◽  
Haibo Yu ◽  
...  

Background: Depression is considered to be a major public health problem with significant implications for individuals and society. Patients with depression can be treated with complementary therapies such as acupuncture. Predicting the prognostic effects of acupuncture is of great significance in helping physicians to intervene early for patients with depression and avoid malignant events. Methods: In this work, a novel framework for predicting the prognostic effects of acupuncture for depression based on electroencephalogram (EEG) recordings is presented. Max-relevance and min-redundancy (mRMR), which removes redundant information among selected features while retaining high relevance between the selected features and the response variable, is employed to select important lead-rhythm features extracted from EEG recordings. Then, according to the subjects’ HAMD scores before and after 8 weeks of acupuncture, the reduction rate of the HAMD score is calculated as a measure of the prognostic effects of acupuncture. Finally, five widely used machine learning methods are used to build models predicting the prognostic effects of acupuncture for depression. Results: Experimental results show that non-linear machine learning methods perform better than linear ones in predicting the prognostic effects of acupuncture from EEG recordings. In particular, the support vector machine with Gaussian kernel (SVM-RBF) achieves the best and most stable performance using mRMR with either evaluation criterion, FCD or FCQ, for feature selection. Both mRMR-FCD and mRMR-FCQ obtain the same best performance, with an accuracy of 84.61% and an F1 score of 86.67%. In addition, the lead-rhythm features selected by mRMR-FCD and mRMR-FCQ are analyzed. The top seven selected lead-rhythm features have much higher mRMR evaluation scores, which to some degree supports the good predictive performance of the machine learning methods. Conclusion: The framework presented in this work is effective in predicting the prognostic effects of acupuncture for depression. It can be integrated into an intelligent medical system to provide physicians with information on the prognostic effects of acupuncture. Knowing the prognostic effects of acupuncture for depression in advance and intervening accordingly can greatly reduce the risk of malignant events for patients with mental disorders.
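The greedy selection step behind mRMR can be sketched as follows, under stated assumptions. The function takes precomputed relevance scores and a pairwise redundancy matrix as inputs; in the usual FCD and FCQ variants these would be an F-statistic against the response and the absolute correlation between features, but the abstract does not spell out its computation, and the numbers below are hypothetical rather than EEG-derived.

```python
def mrmr_select(relevance, redundancy, k, scheme="FCD"):
    """Greedy mRMR: repeatedly add the feature with the best trade-off between
    relevance to the target and mean redundancy with features already chosen.

    relevance[i]      -- relevance score of feature i (e.g. an F-statistic)
    redundancy[i][j]  -- redundancy between features i and j (e.g. |correlation|)
    scheme            -- "FCD" uses the difference, "FCQ" the quotient
    """
    remaining = list(range(len(relevance)))
    first = max(remaining, key=lambda i: relevance[i])  # start with the most relevant
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        def score(i):
            mean_red = sum(redundancy[i][j] for j in selected) / len(selected)
            if scheme == "FCD":
                return relevance[i] - mean_red
            return relevance[i] / max(mean_red, 1e-12)  # guard against division by zero
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical scores: features 0 and 1 are both relevant but nearly duplicates.
relevance = [0.90, 0.85, 0.30]
redundancy = [
    [1.00, 0.95, 0.10],
    [0.95, 1.00, 0.10],
    [0.10, 0.10, 1.00],
]

print(mrmr_select(relevance, redundancy, k=2, scheme="FCD"))  # [0, 2]
print(mrmr_select(relevance, redundancy, k=2, scheme="FCQ"))  # [0, 2]
```

In this toy case both schemes skip the second-most-relevant feature because it duplicates the first, which is the behaviour the abstract credits to mRMR: removing redundancy while keeping relevance.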


2020 ◽  
Vol 4 (1) ◽  
pp. 29
Author(s):  
Sasan Sarbast Abdulkhaliq ◽  
Aso Mohammad Darwesh

Nowadays, people from every part of the world use social media and social networks to express their feelings toward different topics and aspects. One of the most popular social media platforms is Twitter, a microblogging website that lets its users publicly share their views and feelings about products, services, events, etc. This makes Twitter one of the most valuable sources for researchers and developers collecting and analyzing data to reveal people's sentiment about different topics, such as the products and services of commercial companies and well-known people such as politicians and athletes, by classifying those sentiments as positive or negative. Classification of people's sentiment can be automated using machine learning algorithms and enhanced using appropriate feature selection methods. We collected the most recent tweets about (Amazon, Trump, Chelsea FC, CR7) using the Twitter Application Programming Interface and assigned sentiment scores using a lexicon rule-based approach. We then proposed a machine learning model to improve classification accuracy using a hybrid feature selection method, namely the filter-based Chi-square (Chi-2) method plus wrapper-based binary coordinate ascent (Chi-2 + BCA), to select an optimal subset of features from term frequency-inverse document frequency (TF-IDF) features for classification with a support vector machine (SVM), and from bag-of-words features for logistic regression (LR) classifiers, using different n-gram ranges. After comparing the hybrid (Chi-2 + BCA) method with Chi-2-selected features, and also with the classifiers without feature subset selection, the results show that the hybrid feature selection method increases classification accuracy in all cases. The maximum accuracy attained with LR is 86.55% using the (1 + 2 + 3-g) range, and with SVM is 85.575% using the unigram range, both on the CR7 dataset.
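The filter stage of this pipeline, scoring n-gram features with a chi-square statistic against the sentiment label, can be sketched as below. This is a simplified illustration, not the authors' implementation: it uses term presence rather than TF-IDF weights, the classic 2x2 contingency form of the statistic, and four hypothetical tweets; the wrapper-based BCA stage and the classifiers are omitted.

```python
def ngrams(text, n_min=1, n_max=3):
    """Word n-grams of a whitespace-tokenized, lower-cased text."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def chi2_scores(docs, labels, n_min=1, n_max=1):
    """Chi-square statistic of each term's presence against a binary label."""
    doc_terms = [ngrams(d, n_min, n_max) for d in docs]
    vocab = set().union(*doc_terms)
    n = len(docs)
    n_pos = sum(labels)
    scores = {}
    for term in vocab:
        a = sum(1 for t, l in zip(doc_terms, labels) if term in t and l == 1)  # present, positive
        b = sum(1 for t, l in zip(doc_terms, labels) if term in t and l == 0)  # present, negative
        c = n_pos - a        # absent, positive
        d = (n - n_pos) - b  # absent, negative
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Hypothetical tweets with lexicon-style labels (1 = positive, 0 = negative).
docs = ["great goal by cr7", "great match today",
        "terrible defending today", "terrible loss"]
labels = [1, 1, 0, 0]

scores = chi2_scores(docs, labels)
top_terms = sorted(scores, key=lambda t: (-scores[t], t))[:2]
print(top_terms)  # ['great', 'terrible']
```

Terms such as "today" that appear equally in both classes score zero and would be dropped by the filter, leaving a smaller candidate set for a wrapper method to refine.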

