Accurate diagnosis of atopic dermatitis by combining transcriptome and microbiota data with supervised machine learning

2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Ziyuan Jiang ◽  
Jiajin Li ◽  
Nahyun Kong ◽  
Jeong-Hyun Kim ◽  
Bong-Soo Kim ◽  
...  

Abstract: Atopic dermatitis (AD) is a common childhood skin disease whose diagnosis requires expertise in dermatology. Recent studies have indicated that host gene–microbe interactions in the gut contribute to human diseases, including AD. We sought to develop an accurate and automated pipeline for AD diagnosis based on transcriptome and microbiota data. Using these data from 161 subjects, including AD patients and healthy controls, we trained a machine learning classifier to predict the risk of AD. We found that the classifier could accurately differentiate subjects with AD from healthy individuals based on the omics data, with an average F1-score of 0.84. With this classifier, we also identified a set of 35 genes and 50 microbiota features that are predictive of AD. Among the selected features, we discovered at least three genes and three microorganisms directly or indirectly associated with AD. Although further replication in other cohorts is needed, our findings suggest that these genes and microbiota features may provide novel biological insights and may be developed into useful biomarkers for AD prediction.
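For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall for the positive class. The sketch below is a minimal illustration only: the function name `f1_score` and the toy labels are assumptions, not the authors' pipeline or data.

```python
def f1_score(true_labels, predicted_labels, positive=1):
    """Return the F1-score for the class marked `positive`."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (1 = AD, 0 = healthy control); hypothetical, not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)
```

With three true positives, one false positive, and one false negative, precision and recall are both 0.75, so the F1-score is also 0.75.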

2020 ◽  
Author(s):  
John T. Halloran ◽  
Gregor Urban ◽  
David Rocke ◽  
Pierre Baldi

Abstract: Semi-supervised machine learning post-processors critically improve peptide identification of shotgun proteomics data. Such post-processors accept the peptide-spectrum matches (PSMs) and feature vectors resulting from a database search, train a machine learning classifier, and recalibrate PSMs using the trained parameters, often yielding significantly more identified peptides across q-value thresholds. However, current state-of-the-art post-processors rely on shallow machine learning methods, such as support vector machines. In contrast, the powerful training capabilities of deep learning models have displayed superior performance to shallow models in an ever-growing number of other fields. In this work, we show that deep models significantly improve the recalibration of PSMs compared to the most accurate and widely used post-processors, such as Percolator and PeptideProphet. Furthermore, we show that deep learning is able to adaptively analyze complex datasets and features for more accurate universal post-processing, leading to both improved Prosit analysis and markedly better recalibration of recently developed database-search functions.
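The phrase "identified peptides across q-value thresholds" can be made concrete with a small sketch. It assumes a target-decoy estimate of the false discovery rate, which the abstract does not spell out; the function name `q_values` and the toy scores are hypothetical and are not taken from any of the tools named above.

```python
def q_values(psms):
    """Assign a q-value to each PSM.

    psms: list of (score, is_decoy) pairs, where a higher score is better.
    Returns (score, is_decoy, q_value) triples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # Estimated FDR at each rank: decoys accepted / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q-value is the smallest FDR at this rank or any lower-scoring rank.
    q = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q.append(running_min)
    q.reverse()

    return [(score, is_decoy, qv) for (score, is_decoy), qv in zip(ranked, q)]


# Toy PSM scores; True marks a decoy match.
toy_psms = [(9.0, False), (8.0, False), (7.0, True),
            (6.0, False), (5.0, False), (4.0, True)]
scored = q_values(toy_psms)
accepted = [s for s, is_decoy, qv in scored if not is_decoy and qv <= 0.25]
```

A post-processor that rescores PSMs so that more targets rank above the decoys will accept more targets at the same q-value threshold, which is the sense in which recalibration yields more identifications.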


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Marcus Alvarez ◽  
Elior Rahmani ◽  
Brandon Jew ◽  
Kristina M. Garske ◽  
Zong Miao ◽  
...  

2021 ◽  
Vol 13 (2) ◽  
pp. 971
Author(s):  
Papiya Debnath ◽  
Pankaj Chittora ◽  
Tulika Chakrabarti ◽  
Prasun Chakrabarti ◽  
Zbigniew Leonowicz ◽  
...  

Earthquakes are among the most devastating natural hazards, so handling the situations they create effectively is crucial. Earthquakes can cost many lives and cause severe economic damage. Forecasting earthquakes is one of the biggest challenges in geoscience, and machine learning can play a vital role in this task. We aim to develop a method for forecasting the magnitude range of earthquakes using machine learning classifier algorithms. Three ranges are defined: fatal, moderate, and mild earthquakes. To distinguish between these classes, seven machine learning classifier algorithms were used to build the models. Six datasets covering India and nearby regions were used for training. The Bayes Net, Random Tree, Simple Logistic, Random Forest, Logistic Model Tree (LMT), ZeroR and Logistic Regression algorithms were applied to each dataset. All models were developed using the Weka tool, and the results were recorded. The Simple Logistic and LMT classifiers performed well in each case.
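Of the seven algorithms listed, ZeroR is the simplest: it ignores all features and always predicts the most frequent class, which makes it a useful floor against which to judge the others. The sketch below illustrates that idea in Python rather than Weka. The magnitude cut-offs are hypothetical, since the abstract names the three ranges but not their boundaries, and the toy magnitudes are not from the study's datasets.

```python
from collections import Counter

# Hypothetical cut-offs; the abstract does not state the real boundaries.
MILD_MAX = 4.0
MODERATE_MAX = 6.0


def magnitude_class(magnitude):
    """Map a magnitude to one of the three range labels."""
    if magnitude < MILD_MAX:
        return "mild"
    if magnitude < MODERATE_MAX:
        return "moderate"
    return "fatal"


class ZeroR:
    """Baseline that always predicts the most frequent training class."""

    def fit(self, labels):
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_samples):
        return [self.majority_] * n_samples


# Toy magnitudes, for illustration only.
magnitudes = [3.1, 3.8, 4.5, 2.9, 6.3, 3.5, 5.1]
labels = [magnitude_class(m) for m in magnitudes]

baseline = ZeroR().fit(labels)
predictions = baseline.predict(len(labels))
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

Any classifier worth reporting should beat this baseline accuracy, which here equals the share of the majority class.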


2020 ◽  
Author(s):  
Ahmed M. Moustafa ◽  
Paul J. Planet

Background: Discrete classification of SARS-CoV-2 viral genotypes can identify emerging strains and detect geographic spread, viral diversity, and transmission events. Methods: We developed a tool (GNUVID) that integrates whole genome multilocus sequence typing and a supervised machine learning random forest-based classifier. We used GNUVID to assign sequence type (ST) profiles to each of 69,686 SARS-CoV-2 complete, high-quality genomes available from GISAID as of October 20, 2020. STs were then clustered into clonal complexes (CCs), which were used to train a machine learning classifier. We used this tool to detect potential introduction and exportation events, and to estimate effective viral diversity across locations and over time in 16 US states. Results: GNUVID is a scalable tool for viral genotype classification (available at https://github.com/ahmedmagds/GNUVID) that can be used to quickly process tens of thousands of genomes. Our genotyping ST/CC analysis uncovered dynamic local changes in ST/CC prevalence and diversity with multiple replacement events in different states. We detected an average of 20.6 putative introductions and 7.5 exportations for each state. Effective viral diversity dropped in all states as shelter-in-place travel restrictions went into effect and increased as restrictions were lifted. Interestingly, our analysis showed a correlation between effective diversity and the date that state-wide mask mandates were imposed. Conclusions: Our classification tool uncovered multiple introduction and exportation events, as well as waves of expansion and replacement of SARS-CoV-2 genotypes in different states. Combined with future genomic sampling, the GNUVID system could be used to track circulating viral diversity and identify emerging clones and hotspots.
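The general idea behind multilocus sequence typing can be sketched as follows: each distinct sequence at a locus receives an allele number, and each distinct combination of allele numbers receives a sequence type (ST). This is an illustration of the concept only, not GNUVID's code; the class name `SequenceTyper`, the choice of loci, and the toy sequences are all assumptions.

```python
class SequenceTyper:
    """Assign sequence types (STs) from per-locus allele profiles."""

    def __init__(self, loci):
        self.loci = list(loci)
        # For each locus: sequence -> allele number (numbered in order seen).
        self.alleles = {locus: {} for locus in self.loci}
        # Allele profile (tuple of allele numbers) -> ST number.
        self.profiles = {}

    def _allele_number(self, locus, sequence):
        known = self.alleles[locus]
        if sequence not in known:
            known[sequence] = len(known) + 1
        return known[sequence]

    def assign(self, genome):
        """genome: dict mapping each locus name to its nucleotide sequence."""
        profile = tuple(
            self._allele_number(locus, genome[locus]) for locus in self.loci
        )
        if profile not in self.profiles:
            self.profiles[profile] = len(self.profiles) + 1
        return self.profiles[profile]


# Toy genomes with very short made-up sequences at three loci.
typer = SequenceTyper(["orf1ab", "S", "N"])
genome_a = {"orf1ab": "ATGGAG", "S": "ATGTTT", "N": "ATGTCT"}
genome_b = dict(genome_a)                 # identical to genome_a
genome_c = dict(genome_a, S="ATGTTC")     # one locus differs

st_a = typer.assign(genome_a)
st_b = typer.assign(genome_b)
st_c = typer.assign(genome_c)
```

Identical genomes share an ST, while a single changed locus produces a new allele number and therefore a new ST. Grouping closely related STs into clonal complexes, as the abstract describes, would be a further step not shown here.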


Nutrients ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 2695 ◽  
Author(s):  
Katja Bezek ◽  
Ana Petelin ◽  
Jure Pražnikar ◽  
Esther Nova ◽  
Noemi Redondo ◽  
...  

The dynamics and diversity of the human gut microbiota, which can remarkably influence the wellbeing and health of the host, change constantly throughout the host’s lifetime in response to various factors. The aim of the present study was to determine a set of parameters that could have a major impact on classifying subjects into a single cluster regarding gut bacteria composition. Therefore, a set of demographic, environmental, and clinical data of healthy adults aged 25–50 years (117 women and 83 men) was collected. Fecal microbiota composition was characterized using Illumina MiSeq 16S rRNA gene amplicon sequencing. Hierarchical clustering was performed to analyze the microbiota data set, and a supervised machine learning model (SVM; Support Vector Machines) was applied for classification. Seventy variables from the collected data were included in the machine learning analysis. The agglomerative clustering algorithm suggested the presence of four distinct community types of the most abundant bacterial phyla. The proportions of bacterial phyla differed significantly between clusters. Regarding prediction, the most important features classifying subjects into clusters were measures of obesity (waist to hip ratio, BMI, and visceral fat index), total body water, blood pressure, energy intake, total fat, olive oil intake, total fiber intake, and water intake. In conclusion, the SVM model proved to be a valuable tool for classifying healthy individuals based on their gut microbiota composition.
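Agglomerative clustering, as mentioned above, starts with every sample in its own cluster and repeatedly merges the two closest clusters. The sketch below is a minimal version under stated assumptions: average linkage and Euclidean distance are choices made here for illustration (the abstract does not name the linkage or distance used), and the toy values stand for relative abundances of two unnamed phyla.

```python
import math


def average_linkage(cluster_a, cluster_b, points):
    """Mean pairwise Euclidean distance between two clusters of indices."""
    total = sum(
        math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy samples: relative abundance of two phyla per subject (made-up values).
samples = [(0.70, 0.20), (0.68, 0.22), (0.30, 0.60), (0.32, 0.58), (0.60, 0.30)]
clusters = agglomerative(samples, n_clusters=2)
```

In this toy example the three samples dominated by the first phylum end up together and the two dominated by the second form the other cluster. The cluster labels produced this way could then serve as targets for a supervised classifier, which is the role the SVM plays in the study.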


Author(s):  
Wataru Takabatake ◽  
Kohei Yamamoto ◽  
Kentaroh Toyoda ◽  
Tomoaki Ohtsuki ◽  
Yohei Shibata ◽  
...  

2021 ◽  
Author(s):  
Zara Yu

The novel coronavirus disease 2019 (COVID-19) has created a serious threat to global health. We developed a new quantum machine learning (QML) assisted diagnostic method that can provide an accurate diagnosis to aid the decision-making of medical providers. One of the key elements of our method was to implement the quantum variational method to efficiently classify data, taking crucial multiple correlations among the features into account. We established and fine-tuned this quantum classifier using a dataset drawn from publicly available COVID-19 cases. We have shown that QML is capable of processing patient information efficiently and accurately for the diagnosis of COVID-19.


Author(s):  
Vaibhavraj Nath Chauhan ◽  
Sanjana Purbia ◽  
Pankaj Chittora ◽  
Prasun Chakrabarti ◽  
Sandeep Poddar

Blood ◽  
2020 ◽  
Vol 136 (Supplement 1) ◽  
pp. 13-14
Author(s):  
Shoshana Revel-Vilk ◽  
Gabriel Chodick ◽  
Varda Shalev ◽  
Noga Gadir

Background: Gaucher disease (GD) is a rare, autosomal recessive condition, characterized by deficiency of the lysosomal enzyme β-glucocerebrosidase. The main disease features are anemia, thrombocytopenia, hepatosplenomegaly, bone infarction, osteonecrosis, and pathological fractures. However, diagnosis of GD can be challenging, especially for non-specialists, owing to wide variability in age at presentation, non-specific features, severity and type of clinical manifestations, and lack of awareness of the early signs and symptoms of the disease. Delayed diagnosis and misdiagnosis of GD may lead to irreversible bone disease, severe growth retardation, and high risk of bleeding; in rare cases, misdiagnosis may be life-threatening. Developing a system for early and accurate diagnosis of GD is thus an essential unmet need. The development of an algorithm for early diagnosis of patients with rare diseases such as GD may help reduce delays in diagnosis, enable prompt and appropriate initiation of therapy and earlier decision-making, prevent potentially irreversible morbidities and unnecessary tests (some invasive), reduce anxiety, and facilitate genetic counseling. This study aims to develop a predictive model for the accurate diagnosis of GD using machine learning based on real-world clinical data. Methods: This study will comprise three parts. Part 1, a retrospective observational database analysis, will use data from the electronic patient database of the Maccabi Healthcare Service (MHS), the second largest Health Maintenance Organization in Israel. The MHS includes 2.2 million health records from 25% of the Israeli population. Clinical records have been fully computerized for >20 years and are fully integrated with automated central laboratory, digitized imaging and pharmacy purchase data. Patients with confirmed GD who have been enrolled in the MHS health plan for ≥1 year will be eligible for inclusion, with approximately 250 patients with GD expected to be enrolled.
Using MHS data from patients with GD, the Gaucher Earlier Diagnosis Consensus (GED-C) scoring system, developed by a consensus panel using Delphi methodology on the signs and co-variables that may be important for the diagnosis of GD, will be evaluated and compared with alternative scores developed directly from clinical data based on supervised machine learning. In Part 2, a clinical study, the best performing modeled scores from Part 1 will be applied to the MHS database to identify individuals who may have undiagnosed GD ('GD suspects'). Samples for diagnostic testing, using a specific and sensitive biomarker (glucosylsphingosine, lyso-Gb1) followed by β-glucocerebrosidase (GBA) genotyping for positive samples, will be collected from the MHS biobank (for individuals who have consented). Individuals not participating in the biobank will be asked to provide a sample. This part of the study will evaluate the predictive value of the modeled scores, and assess the sensitivity and specificity of the model for the diagnosis of new patients with GD. In Part 3, analysis of data from newly diagnosed patients identified in Part 2 will be used to develop machine learning models for the diagnosis of GD (Figure 1). Signs and co-variables included in the GED-C score will be used, eliminating features that are non-informative. Features will be quantitative where possible, and interaction terms will be added for age of onset and trend for key features. A number of methods will be developed, with the best performing, based on its precision at a given sensitivity level, being selected as the final model. External validation of the best identified model is planned, to ensure an unbiased estimate of the model's accuracy. Discussion: The main goal of the study is to develop an algorithm to help detect patients with GD, independent of physicians' ability to recognize signs and symptoms, using the application of machine learning to data from a large health database.
The study is expected to result in a practical tool that will alert physicians to the possibility of GD. The resulting model will also improve our understanding of GD based on the relative importance of features for GD prediction. Such tools will have a positive impact on patient care and quality of life and on healthcare costs and may lead to a change in approach for diagnosing rare diseases. Disclosures: Revel-Vilk: Takeda: Honoraria; sanofi-Genzyme: Honoraria; Pfizer: Honoraria. Chodick: Novartis Pharma AG: Other: Institutional grant. Gadir: Takeda: Current Employment.
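The model-selection criterion described in the Methods, precision at a given sensitivity level, can be illustrated with a short sketch. It is one plausible reading of that criterion, not the study's implementation: the function name `precision_at_sensitivity`, the rule of taking the best precision among thresholds that reach the required sensitivity, and the toy scores are all assumptions.

```python
def precision_at_sensitivity(scores, labels, min_sensitivity):
    """Best precision among score thresholds reaching min_sensitivity.

    scores: model outputs, higher means more likely to be a case.
    labels: 1 for a confirmed case, 0 otherwise.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("at least one positive label is required")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    for index, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Evaluate only where the score changes, so tied scores stay together.
        last_of_tie = index + 1 == len(ranked) or ranked[index + 1][0] != score
        if last_of_tie and tp / positives >= min_sensitivity:
            best = max(best, tp / (tp + fp))
    return best


# Toy scores and labels, for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
full_recall = precision_at_sensitivity(toy_scores, toy_labels, 1.0)
partial_recall = precision_at_sensitivity(toy_scores, toy_labels, 0.6)
```

Requiring every case to be found (sensitivity 1.0) forces the threshold down to 0.6, where one of four flagged individuals is a false positive. For a rare disease, fixing the sensitivity and comparing precision in this way shows how many unnecessary follow-up tests each candidate model would trigger.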

