Quantifying the information content of lake microbiomes using a machine learning-based framework

2020 ◽  
Author(s):  
Theodor Sperlea ◽  
Nico Kreuder ◽  
Daniela Beisser ◽  
Georges Hattab ◽  
Jens Boenigk ◽  
...  

Abstract Background: Bacteria and microbial eukaryotes occupy a wide range of ecological niches and are essential for the functioning of ecosystems. The advent of next-generation sequencing methods enabled the study of environmental microbial community compositions. Yet, many questions regarding the stability and functioning of environmental microbiomes remain open. Results: In the current study, we present a methodological framework to quantify the information shared between the microbial community of a habitat and the abiotic parameters of this habitat. It is built on theoretical considerations of systems ecology and makes use of state-of-the-art machine learning techniques. It can also be used to identify bioindicators. We apply the framework to a dataset containing operational taxonomic units (OTUs) as well as more than twenty physico-chemical and geographic parameters measured in a large-scale survey of European lakes. While a large part of variation (up to 61%) in many physico-chemical parameters can be explained by microbial community composition, some of the examined parameters share only little information with the microbiome. Moreover, we have identified OTUs that act as 'multi-task' bioindicators that could be potential candidates for lake water monitoring schemes. Conclusions: This study demonstrates the benefits of machine learning approaches in microbial ecology. Our results represent, for the first time, a quantification of information shared between the lake microbiome and a wide array of ecosystem parameters. Building on the results and methodology presented here, it will be possible to identify microbial taxa and processes central for the functioning and stability of lake ecosystems.

2020 ◽  
Author(s):  
Mazin Mohammed ◽  
Karrar Hameed Abdulkareem ◽  
Mashael S. Maashi ◽  
Salama A. Mostafa ◽  
Abdullah Baz ◽  
...  

BACKGROUND In recent times, the coronavirus disease (COVID-19) has caused global concern and is considered a global health threat due to its rapid spread across the globe. Machine learning (ML) is a computational method that can be used to automatically learn from experience and improve the accuracy of predictions. OBJECTIVE In this study, machine learning has been applied to a coronavirus dataset of 50 X-ray images to enable the development of directions and detection modalities with risk causes. The dataset contains a wide range of samples of COVID-19 cases alongside SARS, MERS, and ARDS. The experiment was carried out using a total of 50 X-ray images, of which 25 were positive COVID-19 cases, while the other 25 were normal cases. METHODS The Orange tool has been used for data manipulation. To classify patients as carriers or non-carriers of coronavirus, this tool has been employed in developing and analysing seven types of predictive models. Models such as artificial neural network (ANN), support vector machine (SVM), linear kernel and radial basis function (RBF), k-nearest neighbour (k-NN), decision tree (DT), and CN2 rule inducer were used in this study. Furthermore, the standard InceptionV3 model has been used for feature extraction. RESULTS The various machine learning techniques have been trained on the coronavirus disease 2019 (COVID-19) dataset with improved ML technique parameters. The dataset was divided into two parts, training and testing. The model was trained using 70% of the dataset, while the remaining 30% was used to test the model. The results show that the improved SVM achieved an F1 of 97% and an accuracy of 98%. CONCLUSIONS In this study, seven models have been developed to aid the detection of coronavirus. In such cases, the learning performance can be improved through knowledge transfer, whereby time-consuming data labelling efforts are not required. The evaluations of all the models are done in terms of different parameters. It can be concluded that all the models performed well, but the SVM demonstrated the best result for the accuracy metric. Future work will compare classical approaches with deep learning ones and try to obtain better results. CLINICALTRIAL None
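The evaluation protocol in the abstract above (a 70%/30% train/test split of 50 images, scored with F1) can be illustrated with a minimal sketch. This is not the authors' Orange workflow: the function names `split_dataset` and `f1_score`, the fixed seed, and the use of integer image indices are assumptions made purely for illustration, and no model is trained here.

```python
import random

def split_dataset(n_items, train_percent=70, seed=0):
    """Shuffle item indices and split them into train and test index lists."""
    rng = random.Random(seed)           # fixed seed: assumption, for repeatability
    indices = list(range(n_items))
    rng.shuffle(indices)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = (n_items * train_percent) // 100
    return indices[:n_train], indices[n_train:]

def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 50 images, as in the abstract; indices stand in for the images themselves.
train_ids, test_ids = split_dataset(50)
print(len(train_ids), len(test_ids))  # prints "35 15"
```

With only 15 test images, a single misclassification shifts accuracy by almost seven percentage points, which is worth keeping in mind when reading scores obtained on splits of this size.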


Sensors ◽  
2019 ◽  
Vol 19 (7) ◽  
pp. 1678 ◽  
Author(s):  
Ahmed H. Salamah ◽  
Mohamed Tamazin ◽  
Maha A. Sharkas ◽  
Mohamed Khedr ◽  
Mohamed Mahmoud

The smartphone market is rapidly expanding, coupled with several services and applications. Some of these services require knowledge of the exact location of the handset. The Global Positioning System (GPS) suffers from accuracy deterioration and outages in indoor environments. The Wi-Fi fingerprinting approach has been widely used in indoor positioning systems. In this paper, Principal Component Analysis (PCA) is utilized to improve the performance and to reduce the computational complexity of Wi-Fi indoor localization systems based on a machine learning approach. The experimental setup and the proposed method were tested in a real, large-scale indoor environment of 960 m² to analyze the performance of different machine learning approaches. The results show that the proposed method outperforms conventional indoor localization techniques based on machine learning.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Khushnood Abbas ◽  
Alireza Abbasi ◽  
Shi Dong ◽  
Ling Niu ◽  
Laihang Yu ◽  
...  

Abstract Background Technological and research advances have produced large volumes of biomedical data. When represented as a network (graph), these data become useful for modeling entities and interactions in biological and similar complex systems. In the field of network biology and network medicine, there is a particular interest in predicting results from drug–drug, drug–disease, and protein–protein interactions to advance the speed of drug discovery. Existing data and modern computational methods make it possible to identify potentially beneficial and harmful interactions and, therefore, to narrow drug trials ahead of actual clinical trials. Such automated data-driven investigation relies on machine learning techniques. However, traditional machine learning approaches require extensive preprocessing of the data, which makes them impractical for large datasets. This study presents a wide range of machine learning methods for predicting outcomes from biomedical interactions and compares the performance of traditional methods with that of more recent network-based approaches. Results We applied 32 different network-based machine learning models to five commonly available biomedical datasets and evaluated their performance based on three important evaluation metrics, namely AUROC, AUPR, and F1-score. We achieved this by casting the link prediction problem as a binary classification problem: existing links were treated as positive examples, and negative examples were randomly sampled from the set of non-existent links. After experimental evaluation, we found that Prone, ACT and $$LRW_5$$ are the top three performers on all five datasets. Conclusions This work presents a comparative evaluation of network-based machine learning algorithms for predicting network links, with applications in the prediction of drug–target and drug–drug interactions, and applies well-known network-based machine learning methods. Our work is helpful in guiding researchers in the appropriate selection of machine learning methods for pharmaceutical tasks.
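The conversion of link prediction into binary classification described in the abstract above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the function name `build_link_dataset`, the undirected-graph assumption, the 1:1 ratio of negatives to positives, and the node labels are all hypothetical. Enumerating every node pair is only practical for small graphs.

```python
import random
from itertools import combinations

def build_link_dataset(nodes, edges, seed=0):
    """Label existing links 1 and an equal number of sampled non-links 0."""
    rng = random.Random(seed)  # fixed seed: assumption, for repeatability
    # Treat the graph as undirected: store each edge as a sorted pair.
    positives = {tuple(sorted(edge)) for edge in edges}
    # Every node pair without an observed link is a candidate negative.
    candidates = [pair for pair in combinations(sorted(nodes), 2)
                  if pair not in positives]
    n_negatives = min(len(positives), len(candidates))
    negatives = rng.sample(candidates, n_negatives)
    examples = [(u, v, 1) for u, v in sorted(positives)]
    examples += [(u, v, 0) for u, v in negatives]
    return examples

# Hypothetical toy network: five entities and four observed interactions.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "D")]
examples = build_link_dataset(toy_nodes, toy_edges)
for u, v, label in examples:
    print(u, v, label)
```

Sampled negatives are unobserved rather than verified non-interactions, so some of them may be true links that have simply not been recorded yet; this label noise affects any metric computed on such a dataset.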


2021 ◽  
Author(s):  
Theodor Sperlea ◽  
Jan Philip Schenk ◽  
Hagen Dreßler ◽  
Daniela Beisser ◽  
Georges Hattab ◽  
...  

Microbes such as bacteria, archaea, and protists are essential for element cycling and ecosystem functioning, but many questions central to the understanding of the role of microbes in ecology are still open. Here, we analyze the relationship between lake microbiomes and the land cover surrounding the lakes. By applying machine learning methods, we quantify the covariance between land cover categories and the microbial community composition recorded in the largest amplicon sequencing dataset of European lakes available to date. We identify microbial bioindicators for these land cover categories. Combining land cover and physico-chemical bioindicators identified from the same amplicon sequencing dataset, we develop two novel similarity metrics that facilitate insights into the ecology of the lake microbiome. We show that the bioindicator network, i.e., the graph linking OTUs indicative of the same environmental parameters, corresponds to microbial co-occurrence patterns. Taken together, we demonstrate the strength of machine learning approaches to identify correlations between microbial diversity and environmental factors, potentially opening new approaches to integrate environmental molecular diversity into monitoring and water quality assessments.


2018 ◽  
Vol 16 (08) ◽  
pp. 1840009 ◽  
Author(s):  
Sebastien Piat ◽  
Nairi Usher ◽  
Simone Severini ◽  
Mark Herbster ◽  
Tommaso Mansi ◽  
...  

Computer vision has a wide range of applications from medical image analysis to robotics. Over the past few years, the field has been transformed by machine learning and stands to benefit from potential advances in quantum computing. The main challenge for processing images on current and near-term quantum devices is the size of the data such devices can process. Images can be large, multidimensional and have multiple color channels. Current machine learning approaches to computer vision that exploit quantum resources require a significant amount of manual pre-processing of the images in order to be able to fit them onto the device. This paper proposes a framework to address the problem of processing large scale data on small quantum devices. This framework does not require any dataset-specific processing or information and works on large, grayscale and RGB images. Furthermore, it is capable of scaling to larger quantum hardware architectures as they become available. In the proposed approach, a classical autoencoder is trained to compress the image data to a size that can be loaded onto a quantum device. Then, a Restricted Boltzmann Machine (RBM) is trained on the D-Wave device using the compressed data, and the weights from the RBM are then used to initialize a neural network for image classification. Results are demonstrated on two MNIST datasets and two medical imaging datasets.


2021 ◽  
Vol 14 (10) ◽  
pp. 948
Author(s):  
Jiaying You ◽  
Michael Hsing ◽  
Artem Cherkasov

Aging is considered an inevitable process that causes deleterious effects in the functioning and appearance of cells, tissues, and organs. Recent emergence of large-scale gene expression datasets and significant advances in machine learning techniques have enabled drug repurposing efforts in promoting longevity. In this work, we further developed our previous approach—DeepCOP, a quantitative chemogenomic model that predicts gene regulating effects, and extended its application across multiple cell lines presented in LINCS to predict aging gene regulating effects induced by small molecules. As a result, a quantitative chemogenomic Deep Model was trained using gene ontology labels, molecular fingerprints, and cell line descriptors to predict gene expression responses to chemical perturbations. Other state-of-the-art machine learning approaches were also evaluated as benchmarks. Among those, the deep neural network (DNN) classifier has top-ranked known drugs with beneficial effects on aging genes, and some of these drugs were previously shown to promote longevity, illustrating the potential utility of this methodology. These results further demonstrate the capability of “hybrid” chemogenomic models, incorporating quantitative descriptors from biomarkers to capture cell specific drug–gene interactions. Such models can therefore be used for discovering drugs with desired gene regulatory effects associated with longevity.


2018 ◽  
Author(s):  
Sherif Tawfik ◽  
Olexandr Isayev ◽  
Catherine Stampfl ◽  
Joseph Shapter ◽  
David Winkler ◽  
...  

Materials constructed from different van der Waals two-dimensional (2D) heterostructures offer a wide range of benefits, but these systems have been little studied because of their experimental and computational complexity, and because of the very large number of possible combinations of 2D building blocks. The simulation of the interface between two different 2D materials is computationally challenging due to the lattice mismatch problem, which sometimes necessitates the creation of very large simulation cells for performing density-functional theory (DFT) calculations. Here we use a combination of DFT, linear regression and machine learning techniques in order to rapidly determine the interlayer distance between two different 2D heterostructures that are stacked in a bilayer heterostructure, as well as the band gap of the bilayer. Our work provides an excellent proof of concept by quickly and accurately predicting a structural property (the interlayer distance) and an electronic property (the band gap) for a large number of hybrid 2D materials. This work paves the way for rapid computational screening of the vast parameter space of van der Waals heterostructures to identify new hybrid materials with useful and interesting properties.


2019 ◽  
Vol 19 (1) ◽  
pp. 4-16 ◽  
Author(s):  
Qihui Wu ◽  
Hanzhong Ke ◽  
Dongli Li ◽  
Qi Wang ◽  
Jiansong Fang ◽  
...  

Over the past decades, peptides as therapeutic candidates have received increasing attention in drug discovery, especially antimicrobial peptides (AMPs), anticancer peptides (ACPs) and anti-inflammatory peptides (AIPs). It is considered that peptides can regulate various complex diseases that were previously untouchable. In recent years, the critical problem of antimicrobial resistance has driven the pharmaceutical industry to look for new therapeutic agents. Compared to small organic drugs, peptide-based therapy exhibits high specificity and minimal toxicity. Thus, peptides are widely recruited in the design and discovery of new potent drugs. Currently, large-scale screening of peptide activity with traditional approaches is costly, time-consuming and labor-intensive. Hence, in silico methods, mainly machine learning approaches, have been introduced to predict peptide activity because of their accuracy and effectiveness. In this review, we document the recent progress in machine learning-based prediction of peptides, which will be of great benefit to the discovery of potentially active AMPs, ACPs and AIPs.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Sungmin O. ◽  
Rene Orth

Abstract While soil moisture information is essential for a wide range of hydrologic and climate applications, spatially-continuous soil moisture data is only available from satellite observations or model simulations. Here we present a global, long-term dataset of soil moisture derived through machine learning trained with in-situ measurements, SoMo.ml. We train a Long Short-Term Memory (LSTM) model to extrapolate daily soil moisture dynamics in space and in time, based on in-situ data collected from more than 1,000 stations across the globe. SoMo.ml provides multi-layer soil moisture data (0–10 cm, 10–30 cm, and 30–50 cm) at 0.25° spatial and daily temporal resolution over the period 2000–2019. The performance of the resulting dataset is evaluated through cross validation and inter-comparison with existing soil moisture datasets. SoMo.ml performs especially well in terms of temporal dynamics, making it particularly useful for applications requiring time-varying soil moisture, such as anomaly detection and memory analyses. SoMo.ml complements the existing suite of modelled and satellite-based datasets given its distinct derivation, to support large-scale hydrological, meteorological, and ecological analyses.


2019 ◽  
Vol 78 (5) ◽  
pp. 617-628 ◽  
Author(s):  
Erika Van Nieuwenhove ◽  
Vasiliki Lagou ◽  
Lien Van Eyck ◽  
James Dooley ◽  
Ulrich Bodenhofer ◽  
...  

Objectives Juvenile idiopathic arthritis (JIA) is the most common class of childhood rheumatic diseases, with distinct disease subsets that may have diverging pathophysiological origins. Both adaptive and innate immune processes have been proposed as primary drivers, which may account for the observed clinical heterogeneity, but few high-depth studies have been performed. Methods Here we profiled the adaptive immune system of 85 patients with JIA and 43 age-matched controls with in-depth flow cytometry and machine learning approaches. Results Immune profiling identified immunological changes in patients with JIA. This immune signature was shared across a broad spectrum of childhood inflammatory diseases. The immune signature was identified in clinically distinct subsets of JIA, but was accentuated in patients with systemic JIA and those patients with active disease. Despite the extensive overlap in the immunological spectrum exhibited by healthy children and patients with JIA, machine learning analysis of the data set proved capable of discriminating patients with JIA from healthy controls with ~90% accuracy. Conclusions These results pave the way for large-scale immune phenotyping longitudinal studies of JIA. The ability to discriminate between patients with JIA and healthy individuals provides proof of principle for the use of machine learning to identify immune signatures that are predictive of treatment response group.

