human dataset
Recently Published Documents


TOTAL DOCUMENTS

37
(FIVE YEARS 11)

H-INDEX

6
(FIVE YEARS 1)

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Pierre Morisse ◽  
Camille Marchet ◽  
Antoine Limasset ◽  
Thierry Lecroq ◽  
Arnaud Lefebvre

Abstract: Third-generation sequencing technologies allow the sequencing of long reads of tens of kbp, which are expected to solve various problems. However, they display high error rates, currently capped around 10%. Self-correction is thus regularly used in long-read analysis projects. We introduce CONSENT, a new self-correction method that relies both on multiple sequence alignment and local de Bruijn graphs. To ensure scalability, multiple sequence alignment computation benefits from a new and efficient segmentation strategy, allowing a massive speedup. CONSENT compares well to the state-of-the-art, and performs better on real Oxford Nanopore data. Specifically, CONSENT is the only method that efficiently scales to ultra-long reads, and can process a full human dataset, containing reads reaching up to 1.5 Mbp, in 10 days. Moreover, our experiments show that error correction with CONSENT improves the quality of Flye assemblies. Additionally, CONSENT implements a polishing feature that corrects raw assemblies. Our experiments show that CONSENT is 2-38x faster than other polishing tools, while providing comparable results. Furthermore, we show that, on a human dataset, assembling the raw data and polishing the assembly is less resource consuming than correcting and then assembling the reads, while providing better results. CONSENT is available at https://github.com/morispi/CONSENT.
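As a rough illustration of the "local de Bruijn graph" idea mentioned in the abstract, the sketch below builds a small graph from the solid (frequently occurring) k-mers of a window of overlapping read segments. It is not CONSENT's implementation: the function names, the `min_count` threshold, and the toy window are assumptions made for this example only.

```python
from collections import Counter, defaultdict


def solid_kmers(sequences, k, min_count):
    """Return the k-mers seen at least min_count times across the sequences."""
    counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    return {kmer for kmer, n in counts.items() if n >= min_count}


def build_de_bruijn(kmers):
    """Link each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical window: two agreeing segments and one carrying an error.
window = ["ACGTAC", "ACGTAC", "ACGAAC"]
graph = build_de_bruijn(solid_kmers(window, k=3, min_count=2))
```

In this toy window, the k-mers introduced by the erroneous segment occur only once, so they fall below the threshold and never enter the graph; a path through the remaining nodes spells out the sequence supported by the majority.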


2020 ◽  
Author(s):  
Rogier L.C. Plas ◽  
Guido Hooiveld ◽  
Renger F. Witkamp ◽  
Klaske van Norren

Abstract
Background: Cancer cachexia is a complex and multi-factorial syndrome. As currently available therapeutic options are limited, more in-depth knowledge on cachexia pathophysiology and the underlying molecular mechanisms remains warranted. Studies with animal models provide useful insights but they only mimic the human situation to a certain degree. Furthermore, there is heterogeneity in the design of published animal studies and outcomes. To further address this issue, we performed a comparative study analysing muscle whole genome gene expression of different cachexia studies in mice and humans.
Methods: We selected data sets from the NCBI Gene Expression Omnibus database containing muscle gene expression data measured by micro-array or RNA-sequencing, at least comprising a cachectic/tumour bearing group (n>3) and a non-cachectic/control group (n>3). This provided 12 datasets; 9 from mouse models and 3 human datasets. All datasets were quality checked, normalised and annotated. Datasets were merged and compared at different levels. General similarity and differences in gene expression were determined using ordered list analysis and principal component analysis (PCA). Moreover, similarities and differences at pathway level were studied by applying gene set enrichment analysis (GSEA) of KEGG pathways.
Results: Animal models displayed similarities to each other and to human datasets at different levels and with different processes. At the gene level, a similarity analysis indicated little similarity between the animal models and the human datasets, while the animal models showed high similarity to each other. Only one of the C26 mouse models (GSE121972) showed significant similarity to more than one human dataset. Moreover, one human dataset comparing cachectic and non-cachectic cancer patients showed no similarity to any of the other datasets. PCA results indicated that a xenograft model differed most in expression from the other datasets and that the Lewis lung carcinoma model differed least from the human datasets. GSEA results showed four pathways clearly standing out across experiments, with downregulation of the oxidative phosphorylation and thermogenesis pathways, and upregulation of the proteasome and RNA transport pathways. However, these pathways were not consistently changed in the human datasets.
Conclusions: Our comparative analysis showed that there is currently no basis to define a preferred animal model for human cachexia. More human datasets containing proper controls are needed. Repetition of the current analysis upon publication of additional human datasets is warranted.


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Wenzheng Ma ◽  
Yi Cao ◽  
Wenzheng Bao ◽  
Bin Yang ◽  
Yuehui Chen

Interactions between proteins play important roles in organisms and are involved in almost all activities in the cell. Research on protein-protein interactions (PPIs) can make a huge contribution to the prevention and treatment of diseases. Many machine-learning-based methods have been proposed to predict PPIs. In this article, we propose a novel method, ACT-SVM, that can effectively predict PPIs. The ACT-SVM model maps protein sequences to digital features, performs feature extraction twice on the protein sequence to obtain vector A and descriptor CT, and combines them into a vector. Then, the feature vectors of the protein pair are merged as the input of the support vector machine (SVM) classifier. We use nonredundant H. pylori and human datasets to verify the prediction performance of our method. The proposed method achieves a prediction accuracy of 0.727897 on the H. pylori data and 0.838799 on the human dataset. The results suggest that this method is a stable and reliable prediction model for PPIs.
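To make the "map sequences to features, then merge the pair" step concrete, here is a minimal sketch of one common sequence descriptor. It assumes that "descriptor CT" refers to the conjoint triad encoding, which the abstract does not state, and it uses a widely cited seven-class amino acid grouping with simple frequency normalization. Vector A and the SVM step are not shown, and all names are hypothetical.

```python
# Seven amino acid classes commonly used for conjoint triad encoding
# (an assumption here; the abstract does not specify the grouping).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}


def conjoint_triad(sequence):
    """Return a 343-dimensional vector of class-triad frequencies."""
    counts = [0] * (7 ** 3)
    for i in range(len(sequence) - 2):
        triad = sequence[i:i + 3]
        # Skip triads containing residues outside the 20 standard amino acids.
        if all(aa in AA_TO_CLASS for aa in triad):
            a, b, c = (AA_TO_CLASS[aa] for aa in triad)
            counts[a * 49 + b * 7 + c] += 1
    total = sum(counts)
    return [n / total for n in counts] if total else [0.0] * len(counts)


def pair_features(seq_a, seq_b):
    """Concatenate the descriptors of both proteins into one input vector."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)


vector = conjoint_triad("AGVC")
pair = pair_features("AGVC", "RKDE")
```

Concatenation is only one way to merge the two proteins' vectors; it makes the representation depend on the order of the pair, which a real pipeline might need to account for.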


2019 ◽  
Vol 8 (12) ◽  
pp. 2080
Author(s):  
Muhammad I. Achakzai ◽  
Christos Argyropoulos ◽  
Maria-Eleni Roumelioti

In this study, we introduce a novel framework for the estimation of residual renal function (RRF), based on the population compartmental kinetic behavior of beta 2 microglobulin (B2M) and its dialytic removal. Using this model, we simulated a large cohort of patients with various levels of RRF receiving either conventional high-flux hemodialysis or on-line hemodiafiltration. These simulations were used to estimate a novel population kinetic (PK) equation for RRF (PK-RRF) that was validated in an external public dataset of real patients. We assessed the performance of the resulting equation(s) against their ability to estimate urea clearance using cross-validation. Our equations were derived entirely from computer simulations and advanced statistical modeling and had extremely high discrimination (Area Under the Curve, AUC 0.888–0.909) when applied to a human dataset of measurements of RRF. A clearance-based equation that utilized predialysis and postdialysis B2M measurements, patient weight, treatment duration and ultrafiltration had higher discrimination than an equation previously derived in humans. Furthermore, the derived equations appeared to have higher clinical usefulness as assessed by Decision Curve Analysis, potentially supporting decisions for individualizing dialysis prescriptions in patients with preserved RRF.


Entropy ◽  
2019 ◽  
Vol 21 (12) ◽  
pp. 1139 ◽  
Author(s):  
Francisco Gómez-Vela ◽  
Fernando M. Delgado-Chaves ◽  
Domingo S. Rodríguez-Baena ◽  
Miguel García-Torres ◽  
Federico Divina

Gene networks have become a powerful tool in the comprehensive analysis of gene expression. Due to the increasing amount of available data, computational methods for network generation must deal with the so-called curse of dimensionality in the quest for the reliability of the obtained results. In this context, ensemble strategies have significantly improved the precision of results by combining different measures or methods. On the other hand, structure optimization techniques are also important in the reduction of the size of the networks, not only improving their topology but also keeping a positive prediction ratio. In this work, we present Ensemble and Greedy networks (EnGNet), a novel two-step method for gene network inference. First, EnGNet uses an ensemble strategy for co-expression network generation. Second, a greedy algorithm optimizes both the size and the topological features of the network. The results show not only that this method is able to obtain reliable networks, but also that it significantly improves topological features. Moreover, the usefulness of the method is demonstrated by an application to a human dataset on post-traumatic stress disorder, revealing an innate immunity-mediated response to this pathology. These results are indicative of the method’s potential in the field of biomarker discovery and characterization.


2019 ◽  
Author(s):  
Mohammad I Achakzai ◽  
Christos Argyropoulos ◽  
Maria-Eleni Roumelioti

Abstract: In this study, we introduce a novel framework for the estimation of residual renal function (RRF), based on the population compartmental kinetic behavior of Beta 2 Microglobulin (B2M) and its dialytic removal. Using this model, we simulated a large cohort of patients with various levels of RRF receiving either conventional high-flux hemodialysis or on-line hemodiafiltration. These simulations were used to estimate a novel population kinetic (PK) equation for RRF (PK-RRF) that was validated in an external public dataset of real patients. We assessed the performance of the resulting equation(s) against their ability to estimate urea clearance using cross-validation. Our equations were derived entirely from computer simulations and advanced statistical modeling, and had extremely high discrimination (AUC 0.888 – 0.909) when applied to a human dataset of measurements of RRF. A clearance-based equation that utilized pre- and post-dialysis B2M measurements, patient weight, treatment duration and ultrafiltration had higher discrimination than an equation previously derived in humans. Furthermore, the derived equations appeared to have higher clinical usefulness as assessed by Decision Curve Analysis, potentially supporting decisions for individualizing dialysis frequency in patients with preserved RRF.


Author(s):  
Mohammad Achakzai ◽  
Christos Argyropoulos ◽  
Maria Eleni Roumelioti

In this study, we introduce a novel framework for the estimation of residual renal function (RRF), based on the population compartmental kinetic behavior of Beta 2 Microglobulin (B2M) and its dialytic removal. Using this model, we simulated a large cohort of patients with various levels of RRF receiving either conventional high-flux hemodialysis or on-line hemodiafiltration. These simulations were used to estimate a novel population kinetic (PK) equation for RRF (PK-RRF) that was validated in an external public dataset of real patients. We assessed the performance of the resulting equation(s) against their ability to estimate urea clearance using cross-validation. Our equations were derived entirely from computer simulations and advanced statistical modeling, and had extremely high discrimination (AUC 0.808 – 0.909) when applied to a human dataset of measurements of RRF. A clearance-based equation that utilized pre- and post-dialysis B2M measurements, patient weight, treatment duration and ultrafiltration had higher discrimination than an equation previously derived in humans. Furthermore, the derived equations appeared to have higher clinical usefulness as assessed by Decision Curve Analysis, potentially supporting decisions for individualizing dialysis frequency in patients with preserved RRF.


2019 ◽  
Author(s):  
Chuanyi Zhang ◽  
Idoia Ochoa

Abstract
Motivation: Variant discovery is crucial in medical and clinical research, especially in the setting of personalized medicine. As such, precision in variant identification is paramount. However, variants identified by current genomic analysis pipelines contain many false positives (i.e., incorrectly called variants). These can potentially be eliminated by applying state-of-the-art filtering tools, such as the Variant Quality Score Recalibration (VQSR) or the Hard Filtering (HF), both proposed by GATK. However, these methods are very user-dependent and fail to run in some cases. We propose VEF, a variant filtering tool based on ensemble methods that overcomes the main drawbacks of VQSR and the HF. Contrary to these methods, we treat filtering as a supervised learning problem. This is possible by training on variant call data for which the set of “true” variants is known, i.e., a gold standard exists. Hence, we can classify each variant in the training VCF file as true or false using the gold standard, and further use the annotations of each variant as features for the classification problem. Once trained, VEF can be directly applied to filter the variants contained in a given VCF file. Analysis of several ensemble methods revealed random forest as offering the best performance, and hence VEF uses a random forest for the classification task.
Results: After training VEF on a Whole Genome Sequencing (WGS) Human dataset of sample NA12878, we tested its performance on a WGS Human dataset of sample NA24385. For these two samples, the set of high-confidence variants has been produced and made available. Results show that the proposed filtering tool VEF consistently outperforms VQSR and HF. In addition, we show that VEF generalizes well even when some features have missing values, and when the training and testing datasets differ either in coverage or in the sequencing machine that was used to generate the data. Finally, since the training needs to be performed only once, there is a significant saving in running time when compared to VQSR (50 minutes versus 4 minutes approximately for filtering the SNPs of WGS Human sample NA24385). Code and scripts available at: github.com/ChuanyiZ/vef.
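The labelling step described in the abstract, marking each called variant as true or false against a gold standard and using its annotations as features, can be sketched as follows. This is an illustration under stated assumptions rather than VEF's code: the helper names, the toy VCF lines, and the choice of `QD` and `MQ` as features are hypothetical, and missing annotations are simply recorded as NaN.

```python
def parse_vcf_line(line):
    """Split a VCF data line into a variant key and its INFO annotations."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    annotations = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            annotations[key] = value
    return (chrom, int(pos), ref, alt), annotations


def build_training_set(vcf_lines, gold_keys, feature_names):
    """Return (features, labels); a label is True if the variant is in the gold set."""
    features, labels = [], []
    for line in vcf_lines:
        if line.startswith("#"):  # skip header lines
            continue
        key, annotations = parse_vcf_line(line)
        row = []
        for name in feature_names:
            try:
                row.append(float(annotations[name]))
            except (KeyError, ValueError):
                row.append(float("nan"))  # annotation missing or non-numeric
        features.append(row)
        labels.append(key in gold_keys)
    return features, labels


# Hypothetical calls and gold-standard set, for illustration only.
lines = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tA\tG\t50\tPASS\tQD=12.5;MQ=60",
    "chr1\t200\t.\tC\tT\t20\tPASS\tQD=2.0",
]
gold = {("chr1", 100, "A", "G")}
features, labels = build_training_set(lines, gold, ["QD", "MQ"])
```

The resulting feature matrix and labels could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, after choosing how to handle the NaN entries.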

