Unsupervised Learning for Hydrogen Breath Tests

Author(s):  
Michael Netzer ◽  
Friedrich Hanser ◽  
Maximilian Ledochowski ◽  
Daniel Baumgarten

Hydrogen breath tests are a well-established method to help diagnose functional intestinal disorders such as carbohydrate malabsorption or small intestinal bacterial overgrowth. In this work we apply unsupervised machine learning techniques to analyze hydrogen breath test datasets. We propose a method that uses 26 internal cluster validation measures to determine a suitable number of clusters. In an induced external validation step we use a predefined categorization proposed by a medical expert. The results indicate that the majority of the considered internal validation indexes were not able to produce a reasonable clustering. Considering a predefined categorization performed by a medical expert, a novel shape-based method obtained the highest external validation measure in terms of the adjusted Rand index. The predefined clusterings constitute the basis of a supervised machine learning step that is part of our ongoing research.
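As an illustration of the external validation measure named above, the following is a minimal sketch of the adjusted Rand index computed from two label lists. It is not the authors' code; the function name `adjusted_rand_index` and the toy labels are assumptions made for the example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Pairs of items placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: an expert categorization versus a clustering result.
expert = [0, 0, 1, 1]
clustering = [1, 1, 0, 0]  # same grouping, different label names
print(adjusted_rand_index(expert, clustering))
```

The index is invariant to how clusters are named, which is why it suits a comparison between an algorithm's arbitrary cluster IDs and an expert's categories.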

2021 ◽  
Vol 10 ◽  
Author(s):  
Zhizhen Li ◽  
Lei Yuan ◽  
Chen Zhang ◽  
Jiaxing Sun ◽  
Zeyuan Wang ◽  
...  

Background and Objectives: Currently, the prognostic performance of the staging systems proposed by the 8th edition of the American Joint Committee on Cancer (AJCC 8th) and the Liver Cancer Study Group of Japan (LCSGJ) in resectable intrahepatic cholangiocarcinoma (ICC) remains controversial. The aim of this study was to use machine learning techniques to modify existing ICC staging strategies based on clinical data and to demonstrate the accuracy and discrimination capacity in prognostic prediction. Patients and Methods: This is a retrospective study based on 1,390 patients who underwent surgical resection for ICC at Eastern Hepatobiliary Surgery Hospital from 2007 to 2015. External validation was performed for patients from 2015 to 2017. The ensemble of three machine learning algorithms was used to select the most important prognostic factors and stepwise Cox regression was employed to derive a modified scoring system. The discriminative ability and predictive accuracy were assessed using the Concordance Index (C-index) and Brier Score (BS). The results were externally validated through a cohort of 42 patients operated on from the same institution. Results: Six independent prognostic factors were selected and incorporated in the modified scoring system, including carcinoembryonic antigen, carbohydrate antigen 19-9, alpha-fetoprotein, prealbumin, and the T and N categories of ICC staging in the 8th edition of the AJCC. The proposed scoring system showed a more favorable discriminatory ability and model performance than the AJCC 8th and LCSGJ staging systems, with a higher C-index of 0.693 (95% CI, 0.663–0.723) in the internal validation cohort and 0.671 (95% CI, 0.602–0.740) in the external validation cohort, which was then confirmed with lower BS (0.103 in internal validation cohort and 0.169 in external validation cohort). Meanwhile, machine learning techniques for variable selection together with stepwise Cox regression for survival analysis show a better prognostic accuracy than using the stepwise Cox regression method only. Conclusions: This study put forward a modified ICC scoring system based on prognostic factor selection incorporated with machine learning, for individualized prognosis evaluation in patients with ICC.
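To make the C-index used above concrete, here is a minimal sketch of Harrell's concordance index for right-censored survival data. It is an illustration only, not the study's implementation; the function name and the toy data are assumptions, and the Brier Score is left out.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index.

    times       -- observed follow-up time per patient
    events      -- 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores -- model output; a higher score should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's last known time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four patients, all with observed events.
times = [1, 2, 3, 4]
events = [1, 1, 1, 1]
scores = [4, 3, 2, 1]  # highest risk assigned to the shortest survival
print(concordance_index(times, events, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.693 and 0.671 should be read.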


Author(s):  
Jacopo Burrello ◽  
Martina Amongero ◽  
Fabrizio Buffolo ◽  
Elisa Sconfienza ◽  
Vittorio Forestiero ◽  
...  

Context: The diagnostic work-up of primary aldosteronism (PA) includes screening and confirmation steps. Case confirmation is time-consuming, expensive, and there is no consensus on tests and thresholds to be used. Diagnostic algorithms to avoid confirmatory testing may be useful for the management of patients with PA. Objective: Development and validation of diagnostic models to confirm or exclude PA diagnosis in patients with a positive screening test. Design, Patients and Setting: We evaluated 1,024 patients who underwent confirmatory testing for PA. The diagnostic models were developed in a training cohort (n=522), and then tested on an internal validation cohort (n=174) and on an independent external prospective cohort (n=328). Main outcome measure: Different diagnostic models and a 16-point score were developed by machine learning and regression analysis to discriminate patients with a confirmed diagnosis of PA. Results: Male sex, antihypertensive medication, plasma renin activity, aldosterone, potassium levels and presence of organ damage were associated with a confirmed diagnosis of PA. Machine learning-based models displayed an accuracy of 72.9-83.9%. The Primary Aldosteronism Confirmatory Testing (PACT) score correctly classified 84.1% at training and 83.9% and 81.1% at internal and external validation, respectively. A flow chart employing the PACT score to select patients for confirmatory testing correctly managed all patients and resulted in a 22.8% reduction in the number of confirmatory tests. Conclusions: The integration of diagnostic modelling algorithms in clinical practice may improve the management of patients with PA by circumventing unnecessary confirmatory testing.


Materials ◽  
2021 ◽  
Vol 15 (1) ◽  
pp. 58
Author(s):  
Mohsin Ali Khan ◽  
Furqan Farooq ◽  
Mohammad Faisal Javed ◽  
Adeel Zafar ◽  
Krzysztof Adam Ostrowski ◽  
...  

To avoid time-consuming, costly, and laborious experimental tests that require skilled personnel, an effort has been made to formulate the depth of wear of fly-ash concrete using a comparative study of machine learning techniques, namely random forest regression (RFR) and gene expression programming (GEP). A widespread database comprising 216 experimental records was constructed from available research. The database includes depth of wear as a response parameter and nine different explanatory variables, i.e., cement content, fly ash, water content, fine and coarse aggregate, plasticizer, air-entraining agent, age of concrete, and time of testing. The performance of the models was judged via statistical metrics. The GEP model gives better performance, with R² and ρ equal to 0.9667 and 0.0501, respectively, and meets the external validation criteria suggested in the previous literature. The k-fold cross-validation also verifies the accuracy of the model by evaluating R², RSE, MAE, and RMSE. The sensitivity analysis of the GEP equation indicated that the time of testing is the most influential parameter. The results of this research can help designers, practitioners, and researchers to quickly estimate the depth of wear of fly-ash concrete, thus shortening its ecological susceptibilities that push to sustainable and faster construction from the viewpoint of environmentally friendly waste management.
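The cross-validation metrics listed above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes R² is the coefficient of determination and RSE the relative squared error (the abstract does not define them), and the helper names and sample values are hypothetical.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """R2, RSE, MAE and RMSE for one set of predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "RSE": ss_res / ss_tot,  # assumed meaning: relative squared error
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "RMSE": sqrt(ss_res / n),
    }


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    for fold in range(k):
        test = list(range(fold, n_samples, k))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test


# Hypothetical measured and predicted wear depths.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))

# 216 records, as in the database described above; k = 10 is an assumption.
fold_sizes = [len(test) for _, test in k_fold_indices(216, 10)]
print(fold_sizes)
```

In a full k-fold run, a model would be refit on each `train` split and `regression_metrics` evaluated on the matching `test` split, with the per-fold values then averaged.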


2022 ◽  
Vol 8 ◽  
Author(s):  
Jinzhang Li ◽  
Ming Gong ◽  
Yashutosh Joshi ◽  
Lizhong Sun ◽  
Lianjun Huang ◽  
...  

Background: Acute renal failure (ARF) is the most common major complication following cardiac surgery for acute aortic syndrome (AAS) and worsens the postoperative prognosis. Our aim was to establish a machine learning prediction model for ARF occurrence in AAS patients. Methods: We included AAS patient data from nine medical centers (n = 1,637) and analyzed the incidence of ARF and the risk factors for postoperative ARF. We used data from six medical centers to compare the performance of four machine learning models and performed internal validation to identify AAS patients who developed postoperative ARF. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used to compare the performance of the predictive models. We compared the performance of the optimal machine learning prediction model with that of traditional prediction models. Data from three medical centers were used for external validation. Results: The eXtreme Gradient Boosting (XGBoost) algorithm performed best in the internal validation process (AUC = 0.82), which was better than both the logistic regression (LR) prediction model (AUC = 0.77, p < 0.001) and the traditional scoring systems. Upon external validation, the XGBoost prediction model (AUC = 0.81) also performed better than both the LR prediction model (AUC = 0.75, p = 0.03) and the traditional scoring systems. We created an online application based on the XGBoost prediction model. Conclusions: We have developed a machine learning model that has better predictive performance than traditional LR prediction models as well as other existing risk scoring systems for postoperative ARF. This model can be utilized to provide early warnings when high-risk patients are found, enabling clinicians to take prompt measures.
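For readers unfamiliar with the AUC used to compare the models above, the following minimal sketch computes it directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name and the toy data are assumptions; this is not the study's code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # tied scores count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = postoperative ARF) and predicted risks.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of patients, which is fine for illustration; sorting-based implementations are preferable for large cohorts.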


2021 ◽  
Vol 8 ◽  
Author(s):  
Ming-Hui Hung ◽  
Ling-Chieh Shih ◽  
Yu-Ching Wang ◽  
Hsin-Bang Leu ◽  
Po-Hsun Huang ◽  
...  

Objective: This study aimed to develop machine learning-based prediction models to predict masked hypertension and masked uncontrolled hypertension using the clinical characteristics of patients at a single outpatient visit. Methods: Data were derived from two cohorts in Taiwan. The first cohort included 970 hypertensive patients recruited from six medical centers between 2004 and 2005, which were split into a training set (n = 679), a validation set (n = 146), and a test set (n = 145) for model development and internal validation. The second cohort included 416 hypertensive patients recruited from a single medical center between 2012 and 2020, which was used for external validation. We used 33 clinical characteristics as candidate variables to develop models based on logistic regression (LR), random forest (RF), eXtreme Gradient Boosting (XGboost), and artificial neural network (ANN). Results: The four models featured high sensitivity and high negative predictive value (NPV) in internal validation (sensitivity = 0.914–1.000; NPV = 0.853–1.000) and external validation (sensitivity = 0.950–1.000; NPV = 0.875–1.000). The RF, XGboost, and ANN models showed much higher area under the receiver operating characteristic curve (AUC) (0.799–0.851 in internal validation, 0.672–0.837 in external validation) than the LR model. Among the models, the RF model, composed of 6 predictor variables, had the best overall performance in both internal and external validation (AUC = 0.851 and 0.837; sensitivity = 1.000 and 1.000; specificity = 0.609 and 0.580; NPV = 1.000 and 1.000; accuracy = 0.766 and 0.721, respectively). Conclusion: An effective machine learning-based predictive model that requires data from a single clinic visit may help to identify masked hypertension and masked uncontrolled hypertension.
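The relationship between the reported metrics can be sketched from a confusion matrix. The counts below are hypothetical (the abstract does not give them) and were chosen only to show why a sensitivity of 1.000 implies an NPV of 1.000: both follow from having zero false negatives.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, NPV and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly ruled out
        "npv": tn / (tn + fn),          # share of negative predictions that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts with no false negatives.
print(confusion_metrics(tp=50, fp=42, tn=58, fn=0))
```

With no false negatives the model misses no true case, at the cost of a moderate specificity, which is the trade-off pattern visible in the RF results above.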


2020 ◽  
Author(s):  
LEILA F. DANTAS ◽  
IGOR T. PERES ◽  
LEONARDO S.L. BASTOS ◽  
JANAINA F. MARCHESI ◽  
GUILHERME F.G. DE SOUZA ◽  
...  

Background: Tests are scarce resources, especially in low and middle-income countries, and the optimization of testing programs during a pandemic is critical for the effectiveness of disease control. Hence, we aim to use the combination of symptoms to build a regression model as a screening tool to identify people and areas with a higher risk of SARS-CoV-2 infection to be prioritized for testing. Materials and Methods: We applied machine learning techniques and provided a visualization of potential regions with high densities of COVID-19 as a risk map. We performed a retrospective analysis of individuals registered in "Dados do Bem", an app-based symptom tracker in use in Brazil. Results: From April 28 to July 16, 2020, 337,435 individuals registered their symptoms through the app. Of these, 49,721 participants were tested for SARS-CoV-2 infection, of whom 5,888 (11.8%) were positive. Among self-reported symptoms, loss of smell (OR[95%CI]: 4.6 [4.4 - 4.9]), fever (2.6 [2.5 - 2.8]), and shortness of breath (2.1 [1.6-2.7]) were associated with SARS-CoV-2 infection. Our final model obtained a competitive performance, with only 7% of false-negative users among those predicted as negative (NPV = 0.93). From the 287,714 users still not tested, our model estimated that only 34.5% are potentially infected, thus reducing the need for extensive testing of all registered users. The model was incorporated into the "Dados do Bem" app to prioritize users for testing. We performed an external validation in the state of Goias and found that of the 465 users selected, 52% tested positive. Conclusions: Our results showed that the combination of symptoms might predict SARS-CoV-2 infection and, therefore, can be used as a tool by decision-makers to refine testing and disease control strategies.


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1782
Author(s):  
Aurelio López-Fernández ◽  
Domingo S. Rodríguez-Baena ◽  
Francisco Gómez-Vela

Nowadays, Biclustering is one of the most widely used machine learning techniques to discover local patterns in datasets from different areas such as energy consumption, marketing, social networks or bioinformatics, among others. Particularly in bioinformatics, Biclustering techniques have become extremely time-consuming, and the number of results generated has become huge, due to the continuous increase in the size of the databases over the last few years. For this reason, validation techniques must be adapted to this new environment in order to help researchers focus their efforts on a specific subset of results in an efficient, fast and reliable way. The aforementioned situation may well be considered a Big Data context. In this sense, multiple machine learning techniques have been implemented by the application of Graphic Processing Units (GPU) technology and CUDA architecture to accelerate the processing of large databases. However, as far as we know, this technology has not yet been applied to any bicluster validation technique. In this work, a multi-GPU version of one of the most used bicluster validation measures, Mean Squared Residue (MSR), is presented. It takes advantage of all the hardware and memory resources offered by GPU devices. Because of this, gMSR is able to validate a massive number of biclusters in any Biclustering-based study within a Big Data context.
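As a point of reference for the measure being accelerated, here is a minimal single-threaded sketch of the Mean Squared Residue of one bicluster, following the standard definition (the mean of the squared residues a_ij - rowmean_i - colmean_j + overallmean over the selected rows and columns). It is not the gMSR multi-GPU implementation; the function name and the toy matrices are assumptions.

```python
def mean_squared_residue(matrix, rows, cols):
    """MSR of the bicluster defined by the given row and column indices."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)

    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical expression matrix whose rows differ only by a constant shift,
# so the bicluster is perfectly coherent and its MSR is (numerically) zero.
additive = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
print(mean_squared_residue(additive, rows=[0, 1, 2], cols=[0, 1, 2]))

# A checkerboard pattern is not coherent under this measure.
print(mean_squared_residue([[1, 0], [0, 1]], rows=[0, 1], cols=[0, 1]))  # 0.25
```

Each bicluster is scored independently of the others, which is what makes validating a massive number of them a natural fit for parallel hardware.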


Author(s):  
Karel Diéguez-Santana ◽  
Gerardo M. Casañola-Martin ◽  
James R. Green ◽  
Bakhtiyor Rasulev ◽  
Humberto González-Díaz

Background: Checking the connectivity (structure) of complex Metabolic Reaction Network (MRN) models proposed for new microorganisms with promising properties is an important goal for chemical biology. Objective: In principle, we can perform a hands-on check (Manual Curation). However, this is a hard task due to the high number of combinations of pairs of nodes (possible metabolic reactions). Method: In this work, we used Combinatorial, Perturbation Theory, and Machine Learning techniques to seek a CPTML model for the MRNs of >40 organisms compiled by Barabási's group. First, we quantified the local structure of a very large set of nodes in each MRN using a new class of node index called Markov linear indices fk. Next, we calculated CPT operators for 150000 combinations of query and reference nodes of MRNs. Last, we used these CPT operators as inputs of different ML algorithms. Results: The CPTML linear model obtained using the LDA algorithm is able to discriminate nodes (metabolites) with correct assignation of reactions from incorrect nodes with values of accuracy, specificity, and sensitivity in the range of 85-100% in both training and external validation data series. Conclusion: Meanwhile, PTML models based on Bayesian network, J48-Decision Tree and Random Forest algorithms were identified as the three best non-linear models with accuracy greater than 97.5%. The present work opens a door to the study of MRNs of multiple organisms using PTML models.


2021 ◽  
Vol 8 (6) ◽  
pp. 65
Author(s):  
Marco Mamprin ◽  
Ricardo R. Lopes ◽  
Jo M. Zelis ◽  
Pim A. L. Tonino ◽  
Martijn S. van Mourik ◽  
...  

Current prognostic risk scores for transcatheter aortic valve implantation (TAVI) do not yet benefit from modern machine learning techniques, which can improve risk stratification of one-year mortality of patients before TAVI. Despite the advancement of machine learning in healthcare, data sharing regulations are very strict and typically prevent exchanging patient data without the involvement of ethical committees. A very robust validation approach, including 1300 and 631 patients per center, was performed to validate a machine learning model of one center at the other external center with their data, in a mutual fashion. This was achieved without any data exchange but solely by exchanging the models and the data processing pipelines. A dedicated exchange protocol was designed to evaluate and quantify the model’s robustness on the data of the external center. Models developed with the larger dataset offered similar or higher prediction accuracy on the external validation. Logistic regression, random forest and CatBoost led to areas under the ROC curve of 0.65, 0.67 and 0.65 for the internal validation and of 0.62, 0.66 and 0.68 for the external validation, respectively. We propose a scalable exchange protocol which can be further extended to other TAVI centers, and more generally to any other clinical scenario that could benefit from this validation approach.


2020 ◽  
Author(s):  
Chang Seok Bang ◽  
Ji Yong Ahn ◽  
Jie-Hyun Kim ◽  
Young-Il Kim ◽  
Il Ju Choi ◽  
...  

BACKGROUND Undifferentiated type of early gastric cancer (U-EGC) is included among the expanded indications of endoscopic submucosal dissection (ESD); however, the rate of curative resection remains unsatisfactory. Endoscopists predict the probability of curative resection by considering the size and shape of the lesion and whether ulcers are present or not. The location of the lesion, indicating the likely technical difficulty, is also considered. OBJECTIVE The aim of this study was to establish machine learning (ML) models to better predict the possibility of curative resection in U-EGC prior to ESD. METHODS A nationwide cohort of 2703 U-EGCs treated by ESD or surgery was adopted for the training and internal validation cohorts. Separately, an independent data set of the Korean ESD registry (n=275) and an Asan Medical Center data set (n=127) treated by ESD were chosen for external validation. Eighteen ML classifiers were selected to establish prediction models of curative resection with the following variables: age; sex; location, size, and shape of the lesion; and whether ulcers were present or not. RESULTS Among the 18 models, the extreme gradient boosting classifier showed the best performance (internal validation accuracy 93.4%, 95% CI 90.4%-96.4%; precision 92.6%, 95% CI 89.5%-95.7%; recall 99.0%, 95% CI 97.8%-99.9%; and F1 score 95.7%, 95% CI 93.3%-98.1%). Attempts at external validation showed substantial accuracy (first external validation 81.5%, 95% CI 76.9%-86.1% and second external validation 89.8%, 95% CI 84.5%-95.1%). Lesion size was the most important feature in each explainable artificial intelligence analysis. CONCLUSIONS We established an ML model capable of accurately predicting the curative resection of U-EGC before ESD by considering the morphological and ecological characteristics of the lesions.
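The reported F1 score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function name is an assumption, and the inputs are the point estimates quoted in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Internal validation figures quoted above: precision 92.6%, recall 99.0%.
print(round(f1_score(0.926, 0.990), 3))  # 0.957
```

The result, 95.7%, agrees with the F1 score stated for the extreme gradient boosting classifier.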

