Abstract MP10: Microbiome-based Diagnostic Screening Of Cardiovascular Disease Using A Machine Learning Approach

Hypertension ◽  
2020 ◽  
Vol 76 (Suppl_1) ◽  
Author(s):  
Sachin Aryal ◽  
Ahmad Alimadadi ◽  
Ishan Manandhar ◽  
Bina Joe ◽  
Xi Cheng

In recent years, the microbiome has been recognized as an important factor associated with cardiovascular disease (CVD), which is the leading cause of human mortality worldwide. Disparities in gut microbial composition between individuals with and without CVD have been reported. We therefore hypothesized that supervised machine learning (ML) models trained on such microbiome data could serve as a new strategy for evaluating cardiovascular health. To test our hypothesis, we analyzed metagenomics data extracted from the American Gut Project. Specifically, 16S rRNA reads from 478 CVD and 473 non-CVD control stool samples were analyzed using five supervised ML algorithms: random forest (RF), support vector machine with radial kernel (svmRadial), decision tree (DT), elastic net (ENet) and neural networks (NN). Thirty-nine differential bacterial taxa (LEfSe: LDA > 2) were identified between the CVD and non-CVD groups. ML classification using these taxonomic features achieved an AUC (area under the receiver operating characteristic curve) of ~0.58 (RF). However, training the ML models on the top 500 high-variance operational taxonomic unit (OTU) features improved the AUC to ~0.65 (RF). Further, limiting the selection to only the top 25 highly contributing OTU features, to reduce the dimensionality of the feature space, significantly enhanced the AUC to ~0.70 (RF). In summary, this study is the first to demonstrate the successful development of an ML model using microbiome-based datasets for systematic diagnostic screening of CVD.
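The variance-based feature selection mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy abundance table and the choice of population variance are assumptions made only for illustration, and the real study ranked 500 OTU features rather than 2.

```python
from statistics import pvariance


def top_variance_features(rows, k):
    """Return the indices of the k columns with the highest variance."""
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    # Rank column indices by variance, largest first (ties keep column order).
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return ranked[:k]


def select_columns(rows, columns):
    """Keep only the requested columns, in the given order."""
    return [[row[j] for j in columns] for row in rows]


# Hypothetical abundance table: 4 samples x 4 OTUs (values are made up).
otu_table = [
    [0.10, 5.0, 1.0, 0.0],
    [0.10, 1.0, 1.1, 9.0],
    [0.10, 4.0, 0.9, 0.0],
    [0.10, 2.0, 1.0, 9.0],
]

keep = top_variance_features(otu_table, k=2)
reduced = select_columns(otu_table, keep)
print(keep)     # column indices retained
print(reduced)  # table restricted to those columns
```

Variance ranking is unsupervised, so it does not use the CVD labels; the later step described in the abstract (the top 25 "highly contributing" features) would instead depend on a fitted model's feature importances.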

Author(s):  
Ruslan Babudzhan ◽  
Konstantyn Isaienkov ◽  
Danilo Krasiy ◽  
Oleksii Vodka ◽  
Ivan Zadorozhny ◽  
...  

The paper investigates the relationship between the vibration acceleration of bearings and their operational state. To determine these dependencies, a testbench was built and 112 experiments were carried out with different bearings: 100 bearings that developed an internal defect during operation and 12 bearings without a defect. From the obtained records, a dataset was formed, which was used to build classifiers. The dataset is freely available. A method for classifying new and used bearings was proposed, which consists of searching for dependencies and regularities in the signal using descriptive functions: statistical, entropy, fractal dimensions and others. In addition to processing the signal itself, the frequency domain of the bearing operation signal was also used to complement the feature space. The paper considered the possibility of generalizing the classification for application to signals that were not obtained in the course of laboratory experiments. An extraneous dataset was found in the public domain. This dataset was used to determine how accurate a classifier was when it was trained and tested on significantly different signals. Training and validation were carried out using the bootstrapping method to reduce the effect of randomness, given the small amount of training data available. To estimate the quality of the classifiers, the F1-measure was used as the main metric due to the imbalance of the data sets. The following supervised machine learning methods were chosen as classifier models: logistic regression, support vector machine, random forest, and K nearest neighbors. The results are presented in the form of density distribution plots and diagrams.
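As a rough illustration of the evaluation idea described above, the sketch below computes the F1-measure and its spread over bootstrap resamples. It is a simplification under stated assumptions: it resamples a fixed set of predictions rather than retraining a classifier in each round as the paper describes, and the labels, predictions and function names are invented for the example.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_f1(y_true, y_pred, n_rounds=200, seed=0):
    """F1 on n_rounds resamples drawn with replacement from the same pairs."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    return scores


# Hypothetical labels: 1 = defective bearing, 0 = bearing without a defect.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]

point_estimate = f1_score(y_true, y_pred)
scores = bootstrap_f1(y_true, y_pred)
print(point_estimate, min(scores), max(scores))
```

With 100 defective and 12 non-defective bearings, plain accuracy would reward a classifier that always predicts "defective", which is why a precision/recall-based metric such as F1 is the more informative choice here.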


2012 ◽  
Vol 9 (73) ◽  
pp. 1934-1942 ◽  
Author(s):  
Philip J. Hepworth ◽  
Alexey V. Nefedov ◽  
Ilya B. Muchnik ◽  
Kenton L. Morgan

Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with an accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems, could offer significant improvements in broiler health and welfare worldwide.
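Several abstracts on this page report the area under the receiver operating characteristic curve (AUC). A minimal sketch of how that quantity can be computed is given below, using the standard rank interpretation: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are invented, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: 1 = high prevalence, 0 = low prevalence.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; a sort-based rank computation would be the usual choice for large datasets.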


2020 ◽  
Vol 20 (S14) ◽  
Author(s):  
Sadiq Alinsaif ◽  
Jochen Lang

Abstract Background A variety of imaging modalities are available (e.g., magnetic resonance, x-ray, ultrasound, and biopsy), each of which can reveal different structural aspects of tissues. However, the analysis of histological slide images that are captured using a biopsy is considered the gold standard for determining whether cancer exists. Furthermore, it can reveal the stage of cancer. Therefore, supervised machine learning can be used to classify histopathological tissues. Several computational techniques have been proposed to study histopathological images, with varying levels of success. Often, handcrafted techniques based on texture analysis are proposed to classify histopathological tissues, and these can be used with supervised machine learning. Methods In this paper, we construct a novel feature space to automate the classification of tissues in histology images. Our approach integrates various feature sets into a new texture feature representation. All of our descriptors are computed in the complex Shearlet domain. With complex coefficients, we investigate not only the use of magnitude coefficients, but also the effectiveness of incorporating the relative phase (RP) coefficients to create the input feature vector. In our study, four texture-based descriptors are extracted from the Shearlet coefficients: co-occurrence texture features, Local Binary Patterns, Local Oriented Statistic Information Booster, and segmentation-based Fractal Texture Analysis. Each set of these attributes captures significant local and global statistics. Therefore, we study them individually, but additionally integrate them to boost the accuracy of classifying the histopathology tissues when they are fed to classical classifiers. To tackle the problem of high dimensionality, our proposed feature space is reduced using principal component analysis.
In our study, we use two classifiers to evaluate our proposed feature representation: Support Vector Machine (SVM) and Decision Tree Bagger (DTB). Results Our feature representation delivered high performance when used on four public datasets. The best accuracies achieved were 92.56% on multi-class Kather, 91.73% on BreakHis, 98.04% on Epistroma, and 96.29% on Warwick-QU. Conclusions Our proposed method in the Shearlet domain for the classification of histopathological images proved to be effective when it was investigated on four different datasets that exhibit different levels of complexity.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hiroki Kaneko ◽  
Hironobu Umakoshi ◽  
Masatoshi Ogata ◽  
Norio Wada ◽  
Norifusa Iwahashi ◽  
...  

Abstract Primary aldosteronism (PA) is associated with an increased risk of cardiometabolic diseases, especially in the unilateral subtype. Despite its high prevalence, the case detection rate of PA is limited, partly because no clinical models are available in general practice to identify patients with a high suspicion of the unilateral subtype of PA, who should be referred to specialized centers. The aim of this retrospective cross-sectional study was to develop a predictive model for subtype diagnosis of PA based on machine learning methods using clinical data available in general practice. Overall, 91 patients with unilateral and 138 patients with bilateral PA were randomly assigned to the training and test cohorts. Four supervised machine learning classifiers (logistic regression, support vector machines, random forests (RF), and gradient boosting decision trees) were used to develop predictive models from 21 clinical variables. The accuracy and the area under the receiver operating characteristic curve (AUC) for predicting the subtype diagnosis of PA in the test cohort were compared among the optimized classifiers. Of the four classifiers, the accuracy and AUC were highest in RF, with 95.7% and 0.990, respectively. Serum potassium, plasma aldosterone, and serum sodium levels were highlighted as important variables in this model. For feature-selected RF with the three variables, the accuracy and AUC were 89.1% and 0.950, respectively. With an independent external PA cohort, we confirmed a similar accuracy for feature-selected RF (accuracy: 85.1%). Machine learning models developed using blood tests can help predict the subtype diagnosis of PA in general practice.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Conrad J. Harrison ◽  
Chris J. Sidey-Gibbons

Abstract Background Unstructured text, including medical records, patient feedback, and social media comments, can be a rich source of data for clinical research. Natural language processing (NLP) describes a set of techniques used to convert passages of written text into interpretable datasets that can be analysed by statistical and machine learning (ML) models. The purpose of this paper is to provide a practical introduction to contemporary techniques for the analysis of text-data, using freely-available software. Methods We performed three NLP experiments using publicly-available data obtained from medicine review websites. First, we conducted lexicon-based sentiment analysis on open-text patient reviews of four drugs: Levothyroxine, Viagra, Oseltamivir and Apixaban. Next, we used unsupervised ML (latent Dirichlet allocation, LDA) to identify similar drugs in the dataset, based solely on their reviews. Finally, we developed three supervised ML algorithms to predict whether a drug review was associated with a positive or negative rating. These algorithms were: a regularised logistic regression, a support vector machine (SVM), and an artificial neural network (ANN). We compared the performance of these algorithms in terms of classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity and specificity. Results Levothyroxine and Viagra were reviewed with a higher proportion of positive sentiments than Oseltamivir and Apixaban. One of the three LDA clusters clearly represented drugs used to treat mental health problems. A common theme suggested by this cluster was drugs taking weeks or months to work. Another cluster clearly represented drugs used as contraceptives. Supervised machine learning algorithms predicted positive or negative drug ratings with classification accuracies ranging from 0.664, 95% CI [0.608, 0.716] for the regularised regression to 0.720, 95% CI [0.664,0.776] for the SVM. 
Conclusions In this paper, we present a conceptual overview of common techniques used to analyse large volumes of text, and provide reproducible code that can be readily applied to other research studies using open-source software.
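The first experiment in this abstract, lexicon-based sentiment analysis, can be sketched in a few lines. The word lists below are tiny invented stand-ins, not the lexicon used in the paper, and the scoring rule (positive word count minus negative word count) is only one common convention.

```python
import re

# Hypothetical mini-lexicon; real sentiment lexicons contain thousands of entries.
POSITIVE = {"good", "great", "effective", "better", "helped"}
NEGATIVE = {"bad", "worse", "nausea", "headache", "useless"}


def sentiment_score(text):
    """Count positive words minus negative words in a lower-cased review."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return pos - neg


reviews = [
    "This drug helped a lot, great results",
    "Bad headache and nausea, useless",
]
for review in reviews:
    print(sentiment_score(review), review)
```

A score above zero is read as positive sentiment and below zero as negative. This approach ignores negation ("not good") and context, which is one reason the abstract goes on to compare it with supervised models.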


2019 ◽  
Vol 23 (1) ◽  
pp. 12-21 ◽  
Author(s):  
Shikha N. Khera ◽  
Divya

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to the companies. The aim of this research is to develop a model to predict employee attrition and give organizations the opportunity to address issues and improve retention. A predictive model was developed using a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the human resource databases of three IT companies in India, including the employees' employment status (the response variable) at the time of collection. Results from the confusion matrix showed that the SVM model has an accuracy of 85 per cent. The results also show that the model performs better at predicting who will leave the firm than at predicting who will not leave.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Moojung Kim ◽  
Young Jae Kim ◽  
Sung Jin Park ◽  
Kwang Gi Kim ◽  
Pyung Chun Oh ◽  
...  

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3827
Author(s):  
Gemma Urbanos ◽  
Alberto Martín ◽  
Guillermo Vázquez ◽  
Marta Villanueva ◽  
Manuel Villa ◽  
...  

Hyperspectral imaging (HSI) techniques do not require contact with patients and are non-ionizing as well as non-invasive. As a consequence, they have been extensively applied in the medical field. HSI is being combined with machine learning (ML) processes to obtain models to assist in diagnosis. In particular, the combination of these techniques has proven to be a reliable aid in the differentiation of healthy and tumor tissue during brain tumor surgery. ML algorithms such as support vector machine (SVM), random forest (RF) and convolutional neural networks (CNN) are used to make predictions and provide in-vivo visualizations that may assist neurosurgeons in being more precise, hence reducing damage to healthy tissue. In this work, thirteen in-vivo hyperspectral images from twelve different patients with high-grade gliomas (grade III and IV) have been selected to train SVM, RF and CNN classifiers. Five different classes have been defined during the experiments: healthy tissue, tumor, venous blood vessel, arterial blood vessel and dura mater. Overall accuracy (OACC) results vary from 60% to 95% depending on the training conditions. Finally, as far as the contribution of each band to the OACC is concerned, the results obtained in this work are 3.81 times greater than those reported in the literature.


2021 ◽  
Vol 11 (10) ◽  
pp. 4443
Author(s):  
Rokas Štrimaitis ◽  
Pavel Stefanovič ◽  
Simona Ramanauskaitė ◽  
Asta Slotkienė

Financial analysis is not limited to enterprise performance analysis. It is worth analyzing as wide an area as possible to obtain a full impression of a specific enterprise. News website content is a data source that expresses the public's opinion on enterprise operations, status, etc. Therefore, it is worth analyzing the text of news portal articles. Sentiment analysis methods for English texts and for financial texts exist and are accurate, whereas work on the more complex Lithuanian language has mostly concentrated on sentiment analysis of comment texts and does not provide high accuracy. Therefore, in this paper, a supervised machine learning model was implemented to assign sentiment to financial news gathered from Lithuanian-language websites. The analysis was made using three classification algorithms commonly used in the field of sentiment analysis. Hyperparameter optimization using grid search was performed to discover the best parameters of each classifier. All experimental investigations were made using newly collected datasets from four Lithuanian news websites. The results of the applied machine learning algorithms show that the highest accuracy is obtained using a non-balanced dataset with the multinomial Naive Bayes algorithm (71.1%). The accuracies of the other algorithms were slightly lower: long short-term memory (71%) and support vector machine (70.4%).
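The grid search mentioned in this abstract amounts to evaluating every combination of candidate hyperparameter values and keeping the best one. A minimal sketch follows; the parameter names (`alpha`, `ngram_max`), the candidate values and the toy scoring function are assumptions for illustration only, since the abstract does not list the actual grids. In practice the scoring function would train a classifier and return its validation accuracy.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return the best (params, score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:  # the first combination wins on ties
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Stand-in for 'train the model and measure validation accuracy'."""
    return -(params["alpha"] - 0.5) ** 2 - 0.01 * abs(params["ngram_max"] - 2)


# Hypothetical search space: a smoothing strength and a maximum n-gram length.
grid = {"alpha": [0.1, 0.5, 1.0], "ngram_max": [1, 2, 3]}

best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The cost grows multiplicatively with each added hyperparameter (here 3 x 3 = 9 evaluations), which is why grids are usually kept coarse.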


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hyeon-Kyu Park ◽  
Jae-Hyeok Lee ◽  
Jehyun Lee ◽  
Sang-Koog Kim

Abstract The macroscopic properties of permanent magnets and the resultant performance required for real implementations are determined by the magnets' microscopic features. However, earlier micromagnetic simulations and experimental studies required a relatively large amount of work to gain a complete and comprehensive understanding of the relationships between magnets' macroscopic properties and their microstructures. Here, by means of supervised learning, we predict reliable values of coercivity (μ0Hc) and maximum magnetic energy product (BHmax) of granular NdFeB magnets according to their microstructural attributes (e.g. inter-grain decoupling, average grain size, and misalignment of easy axes) based on numerical datasets obtained from micromagnetic simulations. We conducted several tests of a variety of supervised machine learning (ML) models including kernel ridge regression (KRR), support vector regression (SVR), and artificial neural network (ANN) regression. The hyper-parameters of these models were optimized by a very fast simulated annealing (VFSA) algorithm with an adaptive cooling schedule. In our datasets of 1,000 randomly generated polycrystalline NdFeB cuboids with different microstructural attributes, all of the models yielded similar results in predicting both μ0Hc and BHmax. Furthermore, some outliers, which deteriorated the normality of residuals in the prediction of BHmax, were detected and further analyzed. Based on all of our results, we can conclude that our ML approach combined with micromagnetic simulations provides a robust framework for optimal design of microstructures for high-performance NdFeB magnets.

