biological data
Recently Published Documents


TOTAL DOCUMENTS: 3247 (FIVE YEARS: 1086)

H-INDEX: 69 (FIVE YEARS: 15)

2022 ◽  
Author(s):  
Stephen Coleman ◽  
Xaquin Castro Dopico ◽  
Gunilla B Karlsson Hedestam ◽  
Paul DW Kirk ◽  
Chris Wallace

Systematic differences between batches of samples present significant challenges when analysing biological data. Such batch effects are well-studied and are liable to occur in any setting where multiple batches are assayed. Many existing methods for accounting for these have focused on high-dimensional data such as RNA-seq and have assumptions that reflect this. Here we focus on batch-correction in low-dimensional classification problems. We propose a semi-supervised Bayesian generative classifier based on mixture models that jointly predicts class labels and models batch effects. Our model allows observations to be probabilistically assigned to classes in a way that incorporates uncertainty arising from batch effects. We explore two choices for the within-class densities: the multivariate normal (MVN) and the multivariate t (MVT). A simulation study demonstrates that our method performs well compared to popular off-the-shelf machine learning methods and is also quick, performing 15,000 iterations on a dataset of 500 samples with 2 measurements each in 7.3 seconds for the MVN mixture model and 11.9 seconds for the MVT mixture model. We apply our model to two datasets generated using the enzyme-linked immunosorbent assay (ELISA), a spectrophotometric assay often used to screen for antibodies. The examples we consider were collected in 2020 and measure seropositivity for SARS-CoV-2. We use our model to estimate seroprevalence in the populations studied. We implement the models in C++ using a Metropolis-within-Gibbs algorithm; this is available in the R package at https://github.com/stcolema/BatchMixtureModel. Scripts to recreate our analysis are at https://github.com/stcolema/BatchClassifierPaper.
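The idea of assigning observations to classes while accounting for batch effects can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes a single measurement, normal within-class densities, an additive batch shift only, and parameters that are already known (the paper instead infers them by MCMC). All names and numbers below are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def class_probabilities(x, batch_shift, class_means, class_sds, class_weights):
    """Posterior class probabilities for one observation.

    Assumption: the batch effect is a known additive shift, removed
    before the observation is compared with the class densities.
    """
    corrected = x - batch_shift
    joint = [
        w * normal_pdf(corrected, m, s)
        for w, m, s in zip(class_weights, class_means, class_sds)
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-class setting: class 0 centred at 0, class 1 at 3.
means, sds, weights = [0.0, 3.0], [1.0, 1.0], [0.5, 0.5]

# The same reading of 1.5 is ambiguous without a batch shift...
p_no_shift = class_probabilities(1.5, 0.0, means, sds, weights)
# ...but clearly favours class 0 if its batch reads 1.0 too high.
p_shifted = class_probabilities(1.5, 1.0, means, sds, weights)
```

In this toy setting, ignoring the batch shift would leave the observation undecided between the two classes, which is the kind of error a joint model of classes and batches is meant to avoid.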


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Lucille Lopez-Delisle ◽  
Jean-Baptiste Delisle

Background: The number of studies using single-cell RNA sequencing (scRNA-seq) is constantly growing. This powerful technique provides a sampling of the whole transcriptome of a cell. However, sparsity of the data can be a major hurdle when studying the distribution of the expression of a specific gene or the correlation between the expressions of two genes. Results: We show that the main technical noise associated with these scRNA-seq experiments is due to the sampling, i.e., Poisson noise. We present a new tool named baredSC, for Bayesian Approach to Retrieve Expression Distribution of Single-Cell data, which infers the intrinsic expression distribution in scRNA-seq data using a Gaussian mixture model. baredSC can be used to obtain the distribution in one dimension for individual genes and in two dimensions for pairs of genes, in particular to estimate the correlation in the two genes’ expressions. We apply baredSC to simulated scRNA-seq data and show that the algorithm is able to uncover the expression distribution used to simulate the data, even in multi-modal cases with very sparse data. We also apply baredSC to two real biological data sets. First, we use it to measure the anti-correlation between Hoxd13 and Hoxa11, two genes with known genetic interaction in the embryonic limb. Then, we study the expression of Pitx1 in the embryonic hindlimb, for which a trimodal distribution has been identified through flow cytometry. While other methods to analyze scRNA-seq are too sensitive to sampling noise, baredSC reveals this trimodal distribution. Conclusion: baredSC is a powerful tool which aims at retrieving the expression distribution of a few genes of interest from scRNA-seq data.
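The claim that sampling (Poisson) noise dominates can be made concrete with a small sketch. It is not baredSC itself: it assumes that the count observed for a gene in one cell is Poisson distributed with rate equal to sequencing depth times the gene's expression fraction, and it uses a flat prior over a hypothetical grid of expression values. Function names are illustrative.

```python
import math


def poisson_pmf(k, rate):
    """Probability of observing k counts from a Poisson with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def prob_dropout(depth, expression_fraction):
    """Probability of a zero count for a gene that is actually expressed."""
    return poisson_pmf(0, depth * expression_fraction)


def grid_posterior(k, depth, grid):
    """Posterior over candidate expression fractions, flat prior on the grid."""
    likelihood = [poisson_pmf(k, depth * x) for x in grid]
    total = sum(likelihood)
    return [value / total for value in likelihood]


# Hypothetical cell with 2000 detected molecules and a gene making up
# 0.01% of the transcriptome: most such cells will show a zero count.
p_zero = prob_dropout(2000, 1e-4)

# A zero count therefore leaves real posterior mass on nonzero expression.
grid = [0.0, 1e-4, 5e-4, 1e-3]
posterior = grid_posterior(0, 2000, grid)
```

Under these assumptions a zero is weak evidence of absence, which is why a raw histogram of counts can hide a multi-modal expression distribution.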


Author(s):  
Nikolay L Kazanskiy ◽  
Muhammad A Butt ◽  
Svetlana N Khonina

Conventional personal healthcare techniques rely mostly on cumbersome tools and complicated processes, which can be time-consuming and inconvenient. They also require heavy equipment, blood draws, and traditional bench-top testing procedures, and invasive ways of acquiring test samples can cause patients discomfort and distress. Wearable sensors, by contrast, are an emerging analytical tool that can be attached to many areas of the body to capture diverse biochemical and physiological characteristics. Physical, chemical, and biological data transferred via the skin are used to monitor health in various circumstances. Thanks to non-invasive sampling and high accuracy, wearable sensors can detect abnormal physical or chemical conditions of the human body in real time and so reveal the state of the body promptly. Most commercially available wearable devices are mechanically rigid components attached to bands and worn on the wrist, with form factors ultimately constrained by the size and weight of the batteries required for the power supply. Wearable devices with “skin-like” qualities are a new type of technology that is only starting to make its way out of research labs and into pre-commercial prototypes. In this paper, we review recent advances in battery-powered wearable sensors based on optical phenomena and in skin-like battery-free sensors, which represent a breakthrough in wearable sensing technology.


Proteomes ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 2
Author(s):  
Aarón Millán-Oropeza ◽  
Mélisande Blein-Nicolas ◽  
Véronique Monnet ◽  
Michel Zivy ◽  
Céline Henry

In proteomics, it is essential to quantify proteins in absolute terms if we wish to compare results among studies and integrate high-throughput biological data into genome-scale metabolic models. While labeling target peptides with stable isotopes allows protein abundance to be accurately quantified, the utility of this technique is constrained by the low number of quantifiable proteins that it yields. Recently, label-free shotgun proteomics has become the “gold standard” for carrying out global assessments of biological samples containing thousands of proteins. However, this tool must be further improved if we wish to accurately quantify absolute levels of proteins. Here, we used different label-free quantification techniques to estimate absolute protein abundance in the model yeast Saccharomyces cerevisiae. More specifically, we evaluated the performance of seven different quantification methods, based either on spectral counting (SC) or extracted-ion chromatogram (XIC), which were applied to samples from five different proteome backgrounds. We also compared the accuracy and reproducibility of two strategies for transforming relative abundance into absolute abundance: a UPS2-based strategy and the total protein approach (TPA). This study discusses technical challenges related to UPS2 use and proposes ways of addressing them, including utilizing a smaller, more highly optimized amount of UPS2. Overall, three SC-based methods (PAI, SAF, and NSAF) yielded the best results because they struck a good balance between experimental performance and protein quantification.
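Two of the spectral-counting measures named above can be sketched in a few lines, using their usual textbook definitions: SAF divides a protein's spectral count by its length, and NSAF normalizes each SAF by the sum of SAFs in the sample. This is an illustration with hypothetical counts and lengths, not the pipeline used in the study.

```python
def saf(spectral_count, length):
    """Spectral abundance factor: spectral count per residue of protein length."""
    return spectral_count / length


def nsaf(spectral_counts, lengths):
    """Normalized SAF: each protein's SAF divided by the sum over the sample."""
    safs = [saf(count, length) for count, length in zip(spectral_counts, lengths)]
    total = sum(safs)
    return [value / total for value in safs]


# Hypothetical sample with three proteins.
counts = [10, 20, 30]      # spectra assigned to each protein
lengths = [100, 400, 300]  # protein lengths in amino acids

relative_abundance = nsaf(counts, lengths)
```

Note that the second protein has more spectra than the first but a lower NSAF, because its counts are spread over a longer sequence. NSAF values are relative and sum to one; converting them to absolute amounts still needs a calibration step such as the UPS2 or TPA strategies mentioned in the abstract.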


2022 ◽  
Vol 9 (1) ◽  
Author(s):  
Mariam Laatifi ◽  
Samira Douzi ◽  
Abdelaziz Bouklouz ◽  
Hind Ezzine ◽  
Jaafar Jaafari ◽  
...  

The purpose of this study is to develop and test machine learning-based models for COVID-19 severity prediction. COVID-19 test samples from 337 COVID-19 positive patients at Cheikh Zaid Hospital were grouped according to the severity of their illness. Ours is the first study to estimate illness severity by combining biological and non-biological data from patients with COVID-19. Moreover, the use of ML for therapeutic purposes in Morocco is currently restricted, and ours is the first study to investigate the severity of COVID-19. When data analysis approaches were used to uncover patterns and essential characteristics in the data, C-reactive protein, platelets, and D-dimers were found to be the features most strongly associated with COVID-19 severity. In this research, several data reduction algorithms were used, and machine learning models were trained to predict the severity of sickness using patient data. A new feature engineering method based on topological data analysis, Uniform Manifold Approximation and Projection (UMAP), was shown to achieve better results. It reached 100% on accuracy, specificity, sensitivity, and the ROC curve in prognostic prediction using different machine learning classifiers such as XGBoost, AdaBoost, Random Forest, and ExtraTrees. The proposed approach aims to assist hospitals and medical facilities in determining who should be seen first and who has a higher priority for admission to the hospital.
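For reference, the accuracy, sensitivity, and specificity figures quoted above are all derived from the confusion matrix of a binary classifier. The sketch below shows the standard definitions on made-up labels (1 = severe, 0 = non-severe); the data and the function name are hypothetical and unrelated to the study's cohort.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
    }


# Made-up example: one severe case missed, one non-severe case over-called.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

When such metrics are reported for a small cohort, they are normally computed on held-out data (for example by cross-validation) rather than on the samples used for training.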


2022 ◽  
Vol 12 ◽  
Author(s):  
Kejie Li ◽  
Jessica Hurt ◽  
Christopher D. Whelan ◽  
Ravi Challa ◽  
Dongdong Lin ◽  
...  

Many fit-for-purpose bioinformatics tools generate plots to interpret complex biological data and illustrate findings. However, assembling individual plots in different formats from various sources into one high-resolution figure in the desired layout requires mastery of commercial tools or even programming skills. In addition, it is a time-consuming and sometimes frustrating process even for a computationally savvy scientist, who frequently takes an iterative trial-and-error approach to get satisfactory results. To address this challenge, we developed bioInfograph, a web-based tool that allows users to interactively arrange high-resolution images in diverse formats, mainly Scalable Vector Graphics (SVG), to produce one multi-panel publication-quality composite figure in both PDF and HTML formats in a user-friendly manner, requiring no programming skills. It solves stylesheet conflicts of coexisting SVG plots, integrates a rich-text editor, and allows creative design by providing advanced functionalities like image transparency, controlled vertical stacking of plots, versatile image formats, and layout templates. Notably, the shareable interactive HTML output with zoom-in function is a unique feature not seen in any similar tool. We make the online tool publicly available at https://baohongz.github.io/bioInfograph and release the source code at https://github.com/baohongz/bioInfograph under the MIT open-source license.


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Nathan P. Gill ◽  
Raji Balasubramanian ◽  
James R. Bain ◽  
Michael J. Muehlbauer ◽  
William L. Lowe ◽  
...  

Background: Construction of networks from cross-sectional biological data is increasingly common. Many recent methods have been based on Gaussian graphical modeling, and prioritize estimation of conditional pairwise dependencies among nodes in the network. However, challenges remain in understanding how specific paths through the resultant network contribute to overall ‘network-level’ correlations. For biological applications, understanding these relationships is particularly relevant for parsing structural information contained in complex subnetworks. Results: We propose the pair-path subscore (PPS), a method for interpreting Gaussian graphical models at the level of individual network paths. The scoring is based on the relative importance of such paths in determining the Pearson correlation between their terminal nodes. PPS is validated using human metabolomics data from the Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study, with observations confirming well-documented biological relationships among the metabolites. We also highlight how the PPS can be used in an exploratory fashion to generate new biological hypotheses. Our method is implemented in an R package available at https://github.com/nathan-gill/pps. Conclusions: The PPS can be used to probe network structure on a finer scale by investigating which paths in a potentially intricate topology contribute most substantially to marginal behavior. Adding PPS to the network analysis toolkit may enable researchers to ask new questions about the relationships among nodes in network data.
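The distinction between conditional dependencies and marginal correlations can be illustrated on a three-node chain A - B - C. The sketch below is not the PPS; it only uses the standard Gaussian graphical model identity that the partial correlation between nodes i and j is the negated precision-matrix entry scaled by the two diagonal entries. The matrices are a hypothetical example in which `covariance` is the exact inverse of `precision`.

```python
import math


def partial_correlations(precision):
    """Partial correlations implied by a precision (inverse covariance) matrix."""
    n = len(precision)
    result = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                result[i][j] = -precision[i][j] / scale
    return result


def marginal_correlation(covariance, i, j):
    """Pearson correlation between nodes i and j from a covariance matrix."""
    return covariance[i][j] / math.sqrt(covariance[i][i] * covariance[j][j])


# Hypothetical chain A - B - C: no direct edge between A and C.
precision = [
    [2, -1, 0],
    [-1, 2, -1],
    [0, -1, 2],
]
# Inverse of the precision matrix above.
covariance = [
    [0.75, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 0.75],
]

partial = partial_correlations(precision)
direct_ac = partial[0][2]                             # no conditional dependence
marginal_ac = marginal_correlation(covariance, 0, 2)  # still correlated via B
```

Here A and C are conditionally independent given B, yet their Pearson correlation is nonzero because of the path through B. Attributing marginal correlation to individual paths in larger networks is the question the PPS is designed to address.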


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Hanjing Jiang ◽  
Yabing Huang

Background: Drug-disease associations (DDAs) can provide important information for exploring the potential efficacy of drugs. However, few DDAs have so far been verified by experiments. Previous evidence indicates that combining information sources would be conducive to the discovery of new DDAs. How to integrate different biological data sources and identify the most effective drugs for a certain disease based on drug-disease coupled mechanisms is still a challenging problem. Results: In this paper, we propose a novel computational model for DDA prediction based on graph representation learning over a multi-biomolecular network (GRLMN). More specifically, we first constructed a large-scale molecular association network (MAN) by integrating the associations among drugs, diseases, proteins, miRNAs, and lncRNAs. Then, a graph embedding model was used to learn vector representations for all drugs and diseases in the MAN. Finally, the combined features were fed to a random forest (RF) model to predict new DDAs. The proposed model was evaluated on the SCMFDD-S data set using five-fold cross-validation. Experimental results showed that the GRLMN model was very accurate, with an area under the ROC curve (AUC) of 87.9%, which outperformed all previous works in terms of both accuracy and AUC on the benchmark dataset. To further verify the high performance of GRLMN, we carried out case studies for two common diseases. In the ranking of drugs predicted to be related to certain diseases (such as kidney disease and fever), 15 of the top 20 drugs have been experimentally confirmed. Conclusions: The experimental results show that our model performs well in the prediction of DDAs. GRLMN is an effective prioritization tool for screening reliable DDAs for follow-up studies concerning drug repositioning.
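The AUC reported above has a simple interpretation: it is the probability that a randomly chosen true association receives a higher score than a randomly chosen non-association. The sketch below computes it directly from that definition on made-up scores; the function name and data are hypothetical and unrelated to the GRLMN results.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predictions for six drug-disease pairs (1 = known association).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; rank-based formulas give the same value more efficiently on large benchmark sets.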


Nutrients ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 204
Author(s):  
Teofana Otilia Bizerea-Moga ◽  
Laura Pitulice ◽  
Cristina Loredana Pantea ◽  
Orsolya Olah ◽  
Otilia Marginean ◽  
...  

Small and large birth weights (BWs) for gestational age (GA) represent extremes, but the correlation between extreme BW and metabolic syndrome (MetS) has not been fully elucidated. In this study, we examined this correlation in obese children based on changes in their metabolic profile from childhood to adolescence. A retrospective observational study was performed on 535 obese patients aged 0–18 years in the Clinical and Emergency Hospital for Children “Louis Turcanu” in Timisoara, Romania, based on clinical and biological data from January 2015 to December 2019. We emphasized the links between extreme BW and obesity, extreme BW and cardiometabolic risk, obesity and cardiometabolic risk, and extreme BW, obesity and MetS. Children born large for gestational age (LGA) predominated over those born small for gestational age (SGA). Our findings showed that BW has an independent effect on triglycerides and insulin resistance, whereas obesity had a direct influence on hypertension, impaired glucose metabolism and hypertriglyceridemia. The influences of BW and obesity on the development of MetS and its components are difficult to separate; therefore, large prospective studies in normal-weight patients are needed.

