LC-N2G: a local consistency approach for nutrigenomics data analysis

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Xiangnan Xu ◽  
Samantha M. Solon-Biet ◽  
Alistair Senior ◽  
David Raubenheimer ◽  
Stephen J. Simpson ◽  
...  

Abstract. Background: Nutrigenomics aims at understanding the interaction between nutrition and gene information. Due to the complex interactions of nutrients and genes, their relationship exhibits non-linearity. One of the most effective and efficient methods to explore their relationship is the nutritional geometry framework, which fits a response surface for the gene expression over two prespecified nutrition variables. However, when the number of nutrients involved is large, it is challenging to find combinations of informative nutrients with respect to a certain gene and to test whether the relationship is stronger than chance. Methods for identifying informative combinations are essential to understanding the relationship between nutrients and genes. Results: We introduce Local Consistency Nutrition to Graphics (LC-N2G), a novel approach for ranking and identifying combinations of nutrients with gene expression. In LC-N2G, we first propose a model-free quantity called the Local Consistency (LC) statistic to measure whether there is a non-random relationship between combinations of nutrients and gene expression measurements, based on (1) the similarity between samples in the nutrient space and (2) their difference in gene expression. Then combinations with small LC are selected and a permutation test is performed to evaluate their significance. Finally, the response surfaces are generated for the subset of significant relationships. Evaluation on simulated data and real data shows that LC-N2G can accurately find combinations that are correlated with gene expression. Conclusion: LC-N2G is practically powerful for identifying the informative nutrition variables correlated with gene expression. Therefore, LC-N2G is important in the area of nutrigenomics for understanding the relationship between nutrition and gene expression information.
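
The two ingredients named in the abstract (similarity in nutrient space, difference in gene expression) followed by a permutation test can be sketched as below. The abstract does not give the formula of the LC statistic, so `local_consistency` is only an illustrative stand-in: it averages squared expression differences over each sample's `k` nearest neighbours in nutrient space. The function names, the choice of `k`, and the toy data are all assumptions made for this sketch.

```python
import math
import random


def local_consistency(nutrients, expression, k=3):
    """Mean squared expression difference between each sample and its k nearest
    neighbours in nutrient space. Small values mean that samples with similar
    nutrient intake also have similar expression. Illustrative stand-in only,
    not the statistic defined in the paper."""
    n = len(nutrients)
    total = 0.0
    for i in range(n):
        # Order the other samples by Euclidean distance in nutrient space.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(nutrients[i], nutrients[j]),
        )
        for j in others[:k]:
            total += (expression[i] - expression[j]) ** 2
    return total / (n * k)


def permutation_p_value(nutrients, expression, k=3, n_perm=200, seed=0):
    """Share of shuffled data sets whose statistic is at least as small as the
    observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = local_consistency(nutrients, expression, k)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any nutrient-expression relationship
        if local_consistency(nutrients, shuffled, k) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (assumed): expression depends smoothly and non-linearly on two nutrients.
rng = random.Random(1)
nutrients = [(rng.random(), rng.random()) for _ in range(40)]
expression = [math.sin(3 * p) + c ** 2 for p, c in nutrients]

lc, p_value = permutation_p_value(nutrients, expression)
print(f"LC-like statistic = {lc:.4f}, permutation p-value = {p_value:.4f}")
```

A small statistic together with a small p-value would flag the nutrient pair as a candidate for fitting a response surface; repeating the call over every pair of nutrient columns gives a ranking.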

Author(s):  
Guro Dørum ◽  
Lars Snipen ◽  
Margrete Solheim ◽  
Solve Saebo

Gene set analysis methods have become a widely used tool for including prior biological knowledge in the statistical analysis of gene expression data. Advantages of these methods include increased sensitivity, easier interpretation and more conformity in the results. However, gene set methods do not employ all the available information about gene relations. Genes are arranged in complex networks where the network distances contain detailed information about inter-gene dependencies. We propose a method that uses gene networks to smooth gene expression data with the aim of reducing the number of false positives and identifying important subnetworks. Gene dependencies are extracted from the network topology and are used to smooth genewise test statistics. To find the optimal degree of smoothing, we propose using a criterion that considers the correlation between the network and the data. The network smoothing is shown to improve the ability to identify important genes in simulated data. Applied to a real data set, the smoothing accentuates parts of the network with a high density of differentially expressed genes.
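
As a rough illustration of smoothing genewise test statistics over a network, the sketch below replaces each gene's statistic with a weighted average over the genes it can reach, with weights that shrink with shortest-path distance. The abstract does not specify the smoothing kernel or the selection criterion, so the `decay ** distance` weighting, the `decay` parameter, and the toy network are assumptions for this example only.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def smooth_statistics(adjacency, stats, decay=0.5):
    """Replace each gene's statistic by a weighted average over reachable
    genes, with weight decay**distance (weight 1 for the gene itself).
    decay=0 leaves the statistics unchanged."""
    smoothed = {}
    for gene in adjacency:
        dist = shortest_path_lengths(adjacency, gene)
        weights = {g: decay ** d for g, d in dist.items()}
        norm = sum(weights.values())
        smoothed[gene] = sum(w * stats[g] for g, w in weights.items()) / norm
    return smoothed


# Toy network (assumed): a chain A - B - C - D plus an isolated gene E.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
stats = {"A": 4.0, "B": 3.5, "C": 0.2, "D": 0.1, "E": 3.0}

smoothed = smooth_statistics(adjacency, stats, decay=0.5)
for gene in adjacency:
    print(gene, round(stats[gene], 2), "->", round(smoothed[gene], 2))
```

In this toy setting the isolated gene E keeps its statistic, while neighbouring genes pull each other's values together; choosing `decay` is where a data-driven criterion like the one mentioned in the abstract would come in.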


2019 ◽  
Vol 35 (23) ◽  
pp. 4955-4961
Author(s):  
Yongzhuang Liu ◽  
Jian Liu ◽  
Yadong Wang

Abstract. Motivation: Whole-genome sequencing (WGS) of tumor–normal sample pairs is a powerful approach for comprehensively characterizing germline copy number variations (CNVs) and somatic copy number alterations (SCNAs) in cancer research and clinical practice. Existing computational approaches for detecting copy number events cannot detect germline CNVs and SCNAs simultaneously, and yield low accuracy for SCNAs. Results: In this study, we developed TumorCNV, a novel approach for jointly detecting germline CNVs and SCNAs from WGS data of a matched tumor–normal sample pair. We compared TumorCNV with existing copy number event detection approaches using simulated data and real data for the COLO-829 melanoma cell line. The experimental results showed that TumorCNV outperformed existing approaches. Availability and implementation: The software TumorCNV is implemented using a combination of Java and R, and it is freely available from the website at https://github.com/yongzhuang/TumorCNV. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Zhanpeng Wang ◽  
Jiaping Wang ◽  
Michael Kourakos ◽  
Nhung Hoang ◽  
Hyong Hark Lee ◽  
...  

Abstract. Population genetics relies heavily on simulated data for validation, inference, and intuition. In particular, since real data is always limited, simulated data is crucial for training machine learning methods. Simulation software can accurately model evolutionary processes, but requires many hand-selected input parameters. As a result, simulated data often fails to mirror the properties of real genetic data, which limits the scope of methods that rely on it. In this work, we develop a novel approach to estimating parameters in population genetic models that automatically adapts to data from any population. Our method is based on a generative adversarial network that gradually learns to generate realistic synthetic data. We demonstrate that our method is able to recover input parameters in a simulated isolation-with-migration model. We then apply our method to human data from the 1000 Genomes Project, and show that we can accurately recapitulate the features of real data.


Genetics ◽  
1994 ◽  
Vol 138 (3) ◽  
pp. 963-971 ◽  
Author(s):  
G A Churchill ◽  
R W Doerge

Abstract. The detection of genes that control quantitative characters is a problem of great interest to the genetic mapping community. Methods for locating these quantitative trait loci (QTL) relative to maps of genetic markers are now widely used. This paper addresses an issue common to all QTL mapping methods, that of determining an appropriate threshold value for declaring significant QTL effects. An empirical method is described, based on the concept of a permutation test, for estimating threshold values that are tailored to the experimental data at hand. The method is demonstrated using two real data sets derived from F2 and recombinant inbred plant populations. An example using simulated data from a backcross design illustrates the effect of marker density on threshold values.
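
The idea of a permutation-based threshold can be sketched as follows: shuffle the trait values across individuals, record the largest test statistic over all markers for each shuffle, and take an upper quantile of those maxima as the genome-wide threshold. The abstract does not name a test statistic, so the absolute marker-trait correlation used here is a stand-in, and the toy backcross-style data (independent 0/1 markers with no linkage) is a simplifying assumption.

```python
import random


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def max_statistic(genotypes, trait):
    """Largest |correlation| between the trait and any marker."""
    return max(abs(correlation(marker, trait)) for marker in genotypes)


def permutation_threshold(genotypes, trait, n_perm=500, alpha=0.05, seed=0):
    """Empirical upper-alpha quantile of the genome-wide maximum statistic
    when trait values are shuffled across individuals."""
    rng = random.Random(seed)
    shuffled = list(trait)
    null_maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # genotypes stay fixed, so marker structure is kept
        null_maxima.append(max_statistic(genotypes, shuffled))
    null_maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_maxima[index]


# Toy data (assumed): 20 independent 0/1 markers, 100 individuals,
# and a trait influenced by marker 5 plus Gaussian noise.
rng = random.Random(42)
n_individuals, n_markers = 100, 20
genotypes = [[rng.randint(0, 1) for _ in range(n_individuals)] for _ in range(n_markers)]
trait = [2.0 * genotypes[5][i] + rng.gauss(0, 1) for i in range(n_individuals)]

threshold = permutation_threshold(genotypes, trait)
observed = max_statistic(genotypes, trait)
print(f"observed max = {observed:.3f}, 5% threshold = {threshold:.3f}")
```

Because the maximum is taken over all markers inside each shuffle, the resulting threshold accounts for the number of markers tested without assuming a particular null distribution.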


2006 ◽  
Vol 04 (04) ◽  
pp. 911-993 ◽  
Author(s):  
HAIFENG LI ◽  
XIN CHEN ◽  
KESHU ZHANG ◽  
TAO JIANG

A large number of biclustering methods have been proposed to detect patterns in gene expression data. All these methods try to find some type of bicluster, but none of them can discover all the types of patterns in the data. Furthermore, researchers have to design new algorithms in order to find new types of biclusters/patterns that interest biologists. In this paper, we propose a novel approach for biclustering that, in general, can be used to discover all computable patterns in gene expression data. The method is based on the theory of Kolmogorov complexity. More precisely, we use Kolmogorov complexity to measure the randomness of submatrices as the merit of biclusters because randomness naturally consists in a lack of regularity, which is a common property of all types of patterns. On the basis of algorithmic probability measure, we develop a Markov Chain Monte Carlo algorithm to search for biclusters. Our method can also be easily extended to solve the problems of conventional clustering and checkerboard type biclustering. The preliminary experiments on simulated as well as real data show that our approach is very versatile and promising.


Author(s):  
Lianbo Yu ◽  
Parul Gulati ◽  
Soledad Fernandez ◽  
Michael Pennell ◽  
Lawrence Kirschner ◽  
...  

Gene expression microarray experiments with few replications lead to great variability in estimates of gene variances. Several Bayesian methods have been developed to reduce this variability and to increase power. Thus far, moderated t methods assumed a constant coefficient of variation (CV) for the gene variances. We provide evidence against this assumption, and extend the method by allowing the CV to vary with gene expression. Our CV varying method, which we refer to as the fully moderated t-statistic, was compared to three other methods (ordinary t, and two moderated t predecessors). A simulation study and a familiar spike-in data set were used to assess the performance of the testing methods. The results showed that our CV varying method had higher power than the other three methods, identified a greater number of true positives in spike-in data, fit simulated data under varying assumptions very well, and in a real data set better identified higher expressing genes that were consistent with functional pathways associated with the experiments.


2005 ◽  
Vol 15 (04) ◽  
pp. 311-322 ◽  
Author(s):  
CARLA S. MÖLLER-LEVET ◽  
HUJUN YIN

In this paper a novel approach is introduced for modeling and clustering gene expression time-series. Radial basis function neural networks are used to produce a generalized and smooth characterization of the expression time-series. A co-expression coefficient is defined to evaluate the similarities of the models based on their temporal shapes and the distribution of the time points. The profiles are grouped using a fuzzy clustering algorithm that incorporates the proposed co-expression coefficient metric. The results on artificial and real data are presented to illustrate the advantages of the metric and method in grouping temporal profiles. The proposed metric has also been compared with the commonly used correlation coefficient under the same procedures, and the results show that the proposed method produces more biologically relevant clusters.


2021 ◽  
Author(s):  
Alessandro Scano ◽  
Robert Mihai Mira ◽  
Andrea d'Avella

Synergistic models have been employed to investigate motor coordination separately in the muscular and kinematic domains. However, the relationship between muscle synergies, constrained to be non-negative, and kinematic synergies, whose elements can be positive and negative, has received limited attention. Existing algorithms for extracting synergies from combined kinematic and muscular data either do not enforce non-negativity constraints or separate non-negative variables into positive and negative components. We propose a mixed matrix factorization (MMF) algorithm based on a gradient descent update rule which overcomes these limitations. It directly assesses the relationship between kinematic and muscle activity variables by enforcing the non-negativity constraint on a subset of variables. We validated the algorithm on simulated kinematic-muscular data generated from known spatial synergies and temporal coefficients, by assessing the similarity between extracted and ground truth synergies and temporal coefficients when the data are corrupted by different noise levels. We also compared the performance of MMF to that of non-negative matrix factorization applied to separate positive and negative components (NMFpn). Finally, we factorized kinematic and EMG data collected during upper-limb movements to demonstrate the potential of the algorithm. MMF achieved almost perfect reconstruction on noiseless simulated data. It performed better than NMFpn in recovering the correct spatial synergies and temporal coefficients with noisy simulated data. It also allowed the original number of ground truth synergies to be selected correctly. We showed meaningful applicability to real data. MMF can also be applied to any multivariate data that contains both non-negative and unconstrained variables.
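
A factorization that keeps only some variables non-negative can be sketched with projected gradient descent: take a gradient step on the squared reconstruction error, then clip the constrained entries at zero. The abstract does not state the MMF update rule, so the code below is a generic stand-in rather than the authors' algorithm. Clipping the temporal coefficients `H` as well as the selected rows of `W`, the fixed learning rate, and all names and toy data are assumptions for illustration.

```python
import random


def reconstruct(W, H):
    """Matrix product W @ H for lists of lists."""
    return [[sum(w * H[k][j] for k, w in enumerate(row)) for j in range(len(H[0]))]
            for row in W]


def mixed_factorization(X, nonneg_rows, rank, steps=2000, lr=0.01, seed=0):
    """Approximate X (variables x samples) by W @ H using projected gradient
    descent on 0.5 * ||W H - X||^2. Rows of W listed in nonneg_rows, and all
    of H (an assumption), are clipped at zero after every step."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(rank)]
    history = []  # squared error before each update
    for _ in range(steps):
        approx = reconstruct(W, H)
        R = [[approx[i][j] - X[i][j] for j in range(n)] for i in range(m)]
        history.append(sum(r * r for row in R for r in row))
        # Gradients: dL/dW = R H^T, dL/dH = W^T R
        grad_W = [[sum(R[i][j] * H[k][j] for j in range(n)) for k in range(rank)]
                  for i in range(m)]
        grad_H = [[sum(W[i][k] * R[i][j] for i in range(m)) for j in range(n)]
                  for k in range(rank)]
        for i in range(m):
            for k in range(rank):
                W[i][k] -= lr * grad_W[i][k]
                if i in nonneg_rows:
                    W[i][k] = max(0.0, W[i][k])  # projection for constrained rows
        for k in range(rank):
            for j in range(n):
                H[k][j] = max(0.0, H[k][j] - lr * grad_H[k][j])
    return W, H, history


# Toy data (assumed): rows 0-1 play the role of kinematic variables (any sign),
# rows 2-3 the role of muscle variables (non-negative).
rng = random.Random(7)
W_true = ([[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
          + [[rng.uniform(0, 1) for _ in range(2)] for _ in range(2)])
H_true = [[rng.uniform(0, 1) for _ in range(6)] for _ in range(2)]
X = reconstruct(W_true, H_true)

W, H, history = mixed_factorization(X, nonneg_rows={2, 3}, rank=2)
print(f"squared error: first step {history[0]:.3f}, last step {history[-1]:.6f}")
```

The projection step is what distinguishes this from plain gradient descent: unconstrained rows are free to take either sign, while the constrained entries can never leave the non-negative region.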


2015 ◽  
Author(s):  
Marla Johnson ◽  
Elizabeth Purdom

Current sequencing of mRNA can provide estimates of the levels of individual isoforms within the cell, where isoforms are the different distinct mRNA products or proteins created by a gene. It remains to adapt many standard statistical methods commonly used for analyzing gene expression levels to take advantage of this additional information. One novel question is whether we can find groupings or clusters of samples that are distinguished not by their gene expression but by their isoform usage. Such clusters in tumors, for example, could be the result of shared disruption to the splicing system that creates the different isoforms. We propose a novel approach to clustering mRNA-Seq data that identifies clusters of samples with common isoform usage. We show via simulation that our methods are more sensitive to finding clusters of similar alternative splicing patterns than standard clustering techniques applied directly to the estimates of isoform levels. We further demonstrate that clustering on isoform usage is more accurate than clustering directly on isoform levels by examining real data that contains a technical artifact that resulted in different batches having different isoform usage patterns.
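
The difference between isoform levels and isoform usage can be made concrete with a small sketch. It assumes that "usage" means each isoform's share of its gene's total level, which the abstract does not state explicitly, and it uses a simple mean absolute difference as the distance between samples. The function names and the toy samples are hypothetical, and the clustering method itself is not reproduced here.

```python
def isoform_usage(levels_by_gene):
    """Convert isoform levels into within-gene usage proportions."""
    usage = {}
    for gene, levels in levels_by_gene.items():
        total = sum(levels)
        # A gene with no expression carries no usage information.
        usage[gene] = [x / total for x in levels] if total > 0 else None
    return usage


def usage_distance(sample_a, sample_b):
    """Mean absolute difference in usage over genes expressed in both samples."""
    ua, ub = isoform_usage(sample_a), isoform_usage(sample_b)
    diffs = [
        abs(p - q)
        for gene in ua
        if ua[gene] is not None and ub.get(gene) is not None
        for p, q in zip(ua[gene], ub[gene])
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Toy samples (assumed): one gene with two isoforms.
sample_1 = {"G1": [10, 30]}    # usage 0.25 / 0.75
sample_2 = {"G1": [100, 300]}  # ten times the level, same usage
sample_3 = {"G1": [30, 10]}    # same total level as sample_1, swapped usage

print(usage_distance(sample_1, sample_2))
print(usage_distance(sample_1, sample_3))
```

Samples 1 and 2 differ tenfold in isoform levels yet are identical in usage, whereas samples 1 and 3 have the same gene-level total but opposite usage; a distance computed on raw levels would rank these pairs the other way round.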


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Muhammad Farooq ◽  
Qamruz Zaman ◽  
Muhammad Ijaz ◽  
Said Farooq Shah ◽  
Mutua Kilai

In practice, data sets with extreme values arise in many fields such as engineering, lifetime analysis, business, and economics. Many probability distributions have been derived and presented to increase model flexibility in the presence of such values. The current study derives a new probability model, the New Flexible Family (NFF) of distributions. The significance of NFF is illustrated using the Weibull distribution, giving the New Flexible Weibull (NFW) distribution. Various mathematical properties of NFW are discussed, including the estimation of parameters and entropy measures. Two real data sets with extreme values are analyzed and a simulation study is conducted to illustrate the importance of NFW. Furthermore, NFW is compared with other existing probability distributions; numerically, the new mechanism for producing lifetime probability distributions yields better predictions about the population than the others on data sets with extreme values.

