Boolean Network Learning in Vector Spaces for Genome-wide Network Analysis

10.24963/kr.2021/53 ◽

2021 ◽

Author(s):

Taisuke Sato ◽

Ryosuke Kojima

Keyword(s):

Prediction Accuracy ◽

State Transition ◽

Real Data ◽

Boolean Networks ◽

State Transitions ◽

Data Sets ◽

Integer Vector ◽

Binary Matrix ◽

Genome Wide ◽

Boolean Formulas

Boolean networks (BNs) are one of the standard tools for modeling gene regulatory networks in biology, but their learning has been limited to small networks due to computational difficulty. Aiming at unprecedented scalability, we focus on a subclass of BNs called AND/OR Boolean networks, where Boolean formulas are restricted to a conjunction or a disjunction of literals. We represent an AND/OR BN with N nodes by an N x 2N binary matrix Q paired with an N-dimensional integer vector theta called a threshold vector, a state of the BN by an N-dimensional binary state vector s, and a state transition by matrix operations on Q, theta and s. Given a list of state transitions S = s_0...s_L, we learn Q and theta in a continuous space by minimizing a cost function J(Q*,theta,S) w.r.t. a real number matrix Q* and theta while thresholding Q* into a binary matrix Q using theta, so that Q represents an AND/OR BN realizing the target state transitions S. We conducted experiments with artificial and real data sets to check the scalability and accuracy of our learning algorithm. First, we randomly generated AND/OR BNs with up to N=5,000 nodes and empirically confirmed O(N^2) learning time behavior using them. We also observed 99.8% bit-by-bit prediction accuracy (prediction accuracy = 1 - test error) with state transition data generated by AND/OR BNs. For real data, we learned genome-wide AND/OR BNs with 10,928 nodes for budding yeast from transcription profiling data sets, each containing 10,928 mRNAs and 40 transitions, and achieved, for instance, 84.3% prediction accuracy; we also successfully extracted more than 6,000 small AND/ORs whose average prediction accuracy reaches a much higher 94.9%.
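
The matrix form of a state transition described in this abstract can be illustrated with a minimal sketch. The abstract does not spell out the encoding, so the following is an assumption: the first N columns of Q select positive literals, the last N columns select negated literals, and node i becomes 1 when the number of true selected literals reaches theta[i] (the literal count for an AND node, 1 for an OR node). The function name `step` and the 3-node network are hypothetical, and the learning procedure itself is not shown.

```python
from typing import List


def step(Q: List[List[int]], theta: List[int], s: List[int]) -> List[int]:
    """One synchronous transition of an AND/OR Boolean network.

    Assumed encoding (not given in the abstract): column j < N of Q selects
    the literal x_j, column N + j selects NOT x_j, and node i is set to 1
    when the count of true selected literals is at least theta[i].
    """
    n = len(s)
    dual = s + [1 - b for b in s]  # 2N-vector [s ; 1 - s]
    return [
        1 if sum(q * d for q, d in zip(Q[i], dual)) >= theta[i] else 0
        for i in range(n)
    ]


# Hypothetical 3-node network:
#   x0' = x1 AND (NOT x2)  -> two literals, threshold 2
#   x1' = x0 OR x2         -> threshold 1
#   x2' = NOT x0           -> single literal, threshold 1
Q = [
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
theta = [2, 1, 1]

s0 = [0, 1, 0]
s1 = step(Q, theta, s0)
s2 = step(Q, theta, s1)
print(s0, s1, s2)
```

Under this encoding, the same thresholded product handles conjunctions and disjunctions alike; only the entry in theta differs.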

A COMPARISON OF SCORING METRICS FOR PREDICTING THE NEXT NAVIGATION STEP WITH MARKOV MODEL-BASED SYSTEMS

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622010003956 ◽

2010 ◽

Vol 09 (04) ◽

pp. 547-573 ◽

Cited By ~ 4

Author(s):

JOSÉ BORGES ◽

MARK LEVENE

Keyword(s):

Markov Model ◽

Prediction Accuracy ◽

Prediction Models ◽

Markov Models ◽

Real Data ◽

Absolute Error ◽

Brier Score ◽

Data Sets ◽

Extensive Evaluation ◽

The Impact

The problem of predicting the next request during a user's navigation session has been extensively studied. In this context, higher-order Markov models have been widely used to model navigation sessions and to predict the next navigation step, while prediction accuracy has been mainly evaluated with the hit and miss score. We claim that this score, although useful, is not sufficient for evaluating next link prediction models with the aim of finding a sufficient order of the model, the size of a recommendation set, and assessing the impact of unexpected events on the prediction accuracy. Herein, we make use of a variable length Markov model to compare the usefulness of three alternatives to the hit and miss score: the Mean Absolute Error, the Ignorance Score, and the Brier score. We present an extensive evaluation of the methods on real data sets and a comprehensive comparison of the scoring methods.
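
To make the contrast between the scores concrete, here is a small sketch that evaluates one predicted next-link distribution. It uses the textbook definitions of the hit and miss score, the multi-category Brier score, and the Ignorance Score (negative base-2 logarithm of the probability given to the observed outcome). These may differ in detail from the definitions used in the paper, and the Mean Absolute Error is omitted because the abstract does not define it. The function names and link labels are hypothetical.

```python
import math
from typing import Dict


def hit_and_miss(pred: Dict[str, float], actual: str) -> int:
    """1 if the most probable link is the one the user followed, else 0."""
    return int(max(pred, key=pred.get) == actual)


def brier_score(pred: Dict[str, float], actual: str) -> float:
    """Sum of squared differences between predicted and observed indicators."""
    score = sum((p - (1.0 if link == actual else 0.0)) ** 2
                for link, p in pred.items())
    if actual not in pred:  # unexpected event: predicted probability was 0
        score += 1.0
    return score


def ignorance_score(pred: Dict[str, float], actual: str) -> float:
    """-log2 of the probability assigned to the observed link."""
    p = pred.get(actual, 0.0)
    return math.inf if p == 0.0 else -math.log2(p)


# Hypothetical prediction over three candidate links; the user follows "B".
pred = {"A": 0.5, "B": 0.25, "C": 0.25}
print(hit_and_miss(pred, "B"), brier_score(pred, "B"), ignorance_score(pred, "B"))
```

The hit and miss score only records that the top-ranked link was wrong, whereas the other two scores also reflect how much probability was placed on the link actually followed.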

FINDING ATTRACTORS IN ASYNCHRONOUS BOOLEAN DYNAMICS

Advances in Complex Systems ◽

10.1142/s0219525911003098 ◽

2011 ◽

Vol 14 (03) ◽

pp. 439-449 ◽

Cited By ~ 11

Author(s):

THOMAS SKODAWESSELY ◽

KONSTANTIN KLEMM

Keyword(s):

Fixed Points ◽

State Transition ◽

Boolean Networks ◽

Computational Method ◽

State Transitions ◽

Transition Graph ◽

Numerical Tests ◽

State Transition Graph ◽

Boolean Dynamics ◽

Reduced State

We present a computational method for finding attractors (ergodic sets of states) of Boolean networks under asynchronous update. The approach is based on a systematic removal of state transitions to render the state transition graph acyclic. In this reduced state transition graph, all attractors are fixed points that can be enumerated with little effort in most instances. This attractor set is then extended to the attractor set of the original dynamics. Our numerical tests on standard Kauffman networks indicate that the method is efficient in the sense that the total number of state vectors visited grows moderately with the number of states contained in attractors.
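
For orientation, the objects being computed can be shown with a brute-force baseline. This is not the reduction method of the paper: it enumerates all 2^N states, which is only feasible for very small networks. The sketch assumes an asynchronous update that changes exactly one node per step, and treats an attractor as a set of states that is closed under the dynamics and mutually reachable. All function names are hypothetical.

```python
from itertools import product
from typing import Callable, FrozenSet, Set, Tuple

State = Tuple[int, ...]
Update = Callable[[State], State]


def async_successors(f: Update, x: State) -> Set[State]:
    """States obtained by updating exactly one node whose value would change."""
    target = f(x)
    return {x[:i] + (target[i],) + x[i + 1:]
            for i in range(len(x)) if target[i] != x[i]}


def reachable(f: Update, x: State) -> FrozenSet[State]:
    """All states reachable from x (including x) under asynchronous update."""
    seen = {x}
    stack = [x]
    while stack:
        y = stack.pop()
        for z in async_successors(f, y):
            if z not in seen:
                seen.add(z)
                stack.append(z)
    return frozenset(seen)


def attractors(f: Update, n: int) -> Set[FrozenSet[State]]:
    """Sets of states from which every reachable state can reach back."""
    reach = {x: reachable(f, x) for x in product((0, 1), repeat=n)}
    return {r for r in reach.values() if all(reach[y] == r for y in r)}


# Two hypothetical 2-node networks.
swap = lambda x: (x[1], x[0])        # each node copies the other
loop = lambda x: (1 - x[1], x[0])    # negative feedback loop

print(attractors(swap, 2))  # two fixed points
print(attractors(loop, 2))  # one cyclic attractor with four states
```

In the `swap` network, the states (0, 1) and (1, 0) alternate under synchronous update, but under asynchronous update they fall into one of the two fixed points, which shows why the update scheme matters.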

Faculty Opinions recommendation of Genome-wide chromatin state transitions associated with developmental and environmental cues.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.717978055.793470964 ◽

2013 ◽

Author(s):

Stephen Safe ◽

Kyounghyun Kim

Keyword(s):

Chromatin State ◽

State Transitions ◽

Environmental Cues ◽

Genome Wide

Transforming variables to central normality

Machine Learning ◽

10.1007/s10994-021-05960-5 ◽

2021 ◽

Author(s):

Jakob Raymaekers ◽

Peter J. Rousseeuw

Keyword(s):

Maximum Likelihood ◽

Maximum Likelihood Estimator ◽

Simulation Study ◽

Real Data ◽

Data Sets ◽

Transformation Parameter ◽

Likelihood Estimator ◽

Extensive Simulation ◽

Highly Sensitive

Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
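
As a reminder of what is being modified, the classical Box–Cox and Yeo–Johnson transformations can be written as below. This sketch covers only the standard definitions with a given transformation parameter `lam`. The modified transformations and the robust estimator proposed in the paper are not reproduced here, and the function names are hypothetical.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Classical Box-Cox transform; defined for strictly positive x only."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x: float, lam: float) -> float:
    """Classical Yeo-Johnson transform; defined for any real x."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)


# lam = 1 leaves the data essentially unchanged; smaller values pull in a
# long right tail, as with the outlying value 50.0 in this made-up sample.
sample = [0.5, 1.0, 2.0, 50.0]
print([round(box_cox(v, 0.0), 3) for v in sample])
print([round(yeo_johnson(v, 0.0), 3) for v in sample])
```

Because both transforms are monotone in x, a single extreme value can dominate the likelihood for `lam`, which is the sensitivity the abstract describes.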

A New Extension of Thinning-Based Integer-Valued Autoregressive Models for Count Data

Entropy ◽

10.3390/e23010062 ◽

2020 ◽

Vol 23 (1) ◽

pp. 62

Author(s):

Zhengwei Liu ◽

Fukang Zhu

Keyword(s):

Likelihood Estimation ◽

Real Data ◽

Autoregressive Models ◽

Superior Performance ◽

Data Sets ◽

Binomial Thinning ◽

Free Case ◽

Two Parameters ◽

Conditional Maximum ◽

Thinning Operator

Thinning operators play an important role in the analysis of integer-valued autoregressive models, and the most widely used is binomial thinning. Inspired by the theory of extended Pascal triangles, a new thinning operator named extended binomial is introduced, which generalizes binomial thinning. Compared to the binomial thinning operator, the extended binomial thinning operator has two parameters and is more flexible in modeling. Based on the proposed operator, a new integer-valued autoregressive model is introduced, which can accurately and flexibly capture the dispersion features of count time series. Two-step conditional least squares (CLS) estimation is investigated for the innovation-free case, and conditional maximum likelihood estimation is also discussed. We have also obtained the asymptotic property of the two-step CLS estimator. Finally, three overdispersed or underdispersed real data sets are considered to illustrate the superior performance of the proposed model.
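
The baseline that the paper extends, binomial thinning inside a first-order integer-valued autoregressive (INAR(1)) model, can be sketched as follows. The extended binomial operator is not defined in the abstract, so it is not implemented here. Poisson innovations are an assumption chosen for illustration, and all function names and parameter values are hypothetical.

```python
import math
import random
from typing import List


def binomial_thinning(alpha: float, x: int, rng: random.Random) -> int:
    """alpha o x: each of the x units survives independently with prob. alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw via Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_inar1(alpha: float, lam: float, n: int, x0: int,
                   rng: random.Random) -> List[int]:
    """X_t = alpha o X_{t-1} + e_t with e_t ~ Poisson(lam) (assumed here)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(binomial_thinning(alpha, xs[-1], rng) + poisson(lam, rng))
    return xs


rng = random.Random(0)
series = simulate_inar1(alpha=0.5, lam=2.0, n=200, x0=4, rng=rng)
print(series[:10], sum(series) / len(series))
```

With Poisson innovations, the stationary mean of this model is lam / (1 - alpha), so the sample mean printed above should be in the neighbourhood of 4 for these settings.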

Accuracy and sensitivity of different Bayesian methods for genomic prediction using simulation and real data

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2019-0007 ◽

2020 ◽

Vol 19 (3) ◽

Author(s):

Saheb Foroutaifar

Keyword(s):

Data Analysis ◽

Bayesian Methods ◽

Milk Fat ◽

Prediction Accuracy ◽

Real Data ◽

Fat Percentage ◽

Real Data Analysis ◽

Wide Range ◽

High Heritability ◽

Qtl Effects

The main objectives of this study were to compare the prediction accuracy of different Bayesian methods for traits with a wide range of genetic architectures using simulation and real data, and to assess the sensitivity of these methods to the violation of their assumptions. For the simulation study, different scenarios were implemented based on two traits with low or high heritability and different numbers of QTL and distributions of their effects. For real data analysis, a German Holstein dataset for milk fat percentage, milk yield, and somatic cell score was used. The simulation results showed that, with the exception of Bayes R, the methods were sensitive to changes in the number of QTLs and the distribution of QTL effects. Having a distribution of QTL effects similar to what the different Bayesian methods assume for estimating marker effects did not improve their prediction accuracy. The Bayes B method gave accuracy higher than or equal to that of the other methods. The real data analysis showed that, similar to the simulation scenarios with a large number of QTLs, there was no difference between the accuracies of the different methods for any of the traits.

Goodness-of-Fit Tests for Bivariate Time Series of Counts

Econometrics ◽

10.3390/econometrics9010010 ◽

2021 ◽

Vol 9 (1) ◽

pp. 10

Author(s):

Šárka Hudecová ◽

Marie Hušková ◽

Simos G. Meintanis

Keyword(s):

Goodness Of Fit ◽

Probability Generating Function ◽

Parametric Bootstrap ◽

Real Data ◽

Data Sets ◽

Test Statistics ◽

Finite Sample ◽

Generalized Poisson ◽

Goodness Of Fit Tests ◽

Monte Carlo Experiments

This article considers goodness-of-fit tests for bivariate INAR and bivariate Poisson autoregression models. The test statistics are based on an L2-type distance between two estimators of the probability generating function of the observations: one entirely nonparametric and the other semiparametric, computed under the corresponding null hypothesis. The asymptotic distribution of the proposed test statistics, both under the null hypotheses and under alternatives, is derived, and consistency is proved. The case of testing bivariate generalized Poisson autoregression and the extension of the methods to dimensions higher than two are also discussed. The finite-sample performance of a parametric bootstrap version of the tests is illustrated via a series of Monte Carlo experiments. The article concludes with applications to real data sets and a discussion.
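
The two ingredients named in this abstract, an empirical probability generating function (PGF) and an L2-type distance between PGFs, can be illustrated with a rough sketch. Several assumptions are made: the distance is an unweighted integral over the unit square approximated by a midpoint rule, and the reference PGF is that of two independent Poisson components. The semiparametric estimator, any weight function, and the bootstrap used in the paper are not reproduced. All names and data values are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Pgf = Callable[[float, float], float]


def empirical_pgf(data: List[Tuple[int, int]]) -> Pgf:
    """Nonparametric estimate g_n(u1, u2) = mean of u1**X1 * u2**X2."""
    n = len(data)

    def g(u1: float, u2: float) -> float:
        return sum(u1 ** a * u2 ** b for a, b in data) / n

    return g


def independent_poisson_pgf(lam1: float, lam2: float) -> Pgf:
    """PGF of two independent Poisson counts (an assumed reference model)."""
    return lambda u1, u2: math.exp(lam1 * (u1 - 1.0) + lam2 * (u2 - 1.0))


def l2_distance(g: Pgf, h: Pgf, grid: int = 50) -> float:
    """Midpoint-rule approximation of the integral of (g - h)^2 on [0, 1]^2."""
    step = 1.0 / grid
    total = 0.0
    for i in range(grid):
        for j in range(grid):
            u1, u2 = (i + 0.5) * step, (j + 0.5) * step
            total += (g(u1, u2) - h(u1, u2)) ** 2
    return total * step * step


# Made-up bivariate counts, compared with a reference using the sample means.
data = [(0, 1), (2, 1), (1, 0), (3, 2), (1, 1), (0, 0)]
g_hat = empirical_pgf(data)
mean1 = sum(a for a, _ in data) / len(data)
mean2 = sum(b for _, b in data) / len(data)
print(l2_distance(g_hat, independent_poisson_pgf(mean1, mean2)))
```

A test statistic of this type is typically the distance scaled by the sample size, with critical values obtained by resampling, as the abstract indicates for the parametric bootstrap.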

TraceAll: A Real-Time Processing for Contact Tracing Using Indoor Trajectories

Information ◽

10.3390/info12050202 ◽

2021 ◽

Vol 12 (5) ◽

pp. 202

Author(s):

Louai Alarabi ◽

Saleh Basalamah ◽

Abdeltawab Hendawi ◽

Mohammed Abdalla

Keyword(s):

Infectious Diseases ◽

Infected Patient ◽

Public Health Problem ◽

Real Data ◽

Exposure Period ◽

Contact Tracing ◽

Data Sets ◽

Major Public Health Problem ◽

Real Time Processing ◽

Recent Developments

The rapid spread of infectious diseases is a major public health problem. Recent developments in fighting these diseases have heightened the need for a contact tracing process. Contact tracing can be considered an ideal method for controlling the transmission of infectious diseases. The contact tracing process leads to diagnostic tests, treatment or self-isolation for suspected cases, and then treatment for infected persons; this eventually limits the spread of diseases. This paper proposes a technique named TraceAll that traces all contacts exposed to the infected patient and produces a list of these contacts to be considered potentially infected patients. First, it treats the infected patient as the querying user and fetches the contacts exposed to that user. Second, it obtains all the trajectories that belong to the objects that moved near the querying user. Next, it examines these trajectories, considering the social distance and exposure period, to identify whether these objects have become infected. The experimental evaluation of the proposed technique with real data sets illustrates the effectiveness of this solution. Comparative analysis experiments confirm that TraceAll outperforms baseline methods by 40% in the efficiency of answering contact tracing queries.
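
The distance-and-duration criterion described above can be illustrated with a naive scan. This is not the TraceAll algorithm, whose indexing and real-time processing are not described in the abstract. The sketch assumes that trajectories are sampled at integer time steps and that a contact means staying within `max_dist` of the patient for at least `min_steps` consecutive steps. All names and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Trajectory = Dict[int, Tuple[float, float]]  # time step -> (x, y)


def find_contacts(trajs: Dict[str, Trajectory], patient: str,
                  max_dist: float, min_steps: int) -> List[str]:
    """Objects within max_dist of the patient for min_steps consecutive steps."""
    source = trajs[patient]
    contacts = []
    for obj, traj in trajs.items():
        if obj == patient:
            continue
        run, prev_t = 0, None
        for t in sorted(source):
            close = t in traj and math.dist(source[t], traj[t]) <= max_dist
            if close:
                # Extend the run only if the previous close step was t - 1.
                run = run + 1 if prev_t == t - 1 else 1
                prev_t = t
            else:
                run, prev_t = 0, None
            if run >= min_steps:
                contacts.append(obj)
                break
    return contacts


trajs = {
    "p": {t: (float(t), 0.0) for t in range(5)},                       # patient
    "a": {0: (0, 1), 1: (1, 1), 2: (2, 1), 3: (3, 5), 4: (4, 5)},      # near for 3 steps
    "b": {t: (10.0, 10.0) for t in range(5)},                          # never near
    "c": {0: (0, 1), 1: (1, 9), 2: (2, 1), 3: (3, 9), 4: (4, 1)},      # near, but not consecutively
}
print(find_contacts(trajs, "p", max_dist=2.0, min_steps=3))
```

This scan compares the patient against every other object at every time step, so its cost grows with the total number of trajectory points; efficient query answering, as reported in the abstract, would need something better than this exhaustive comparison.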

ALPINE: Active Link Prediction Using Network Embedding

Applied Sciences ◽

10.3390/app11115043 ◽

2021 ◽

Vol 11 (11) ◽

pp. 5043

Author(s):

Xi Chen ◽

Bo Kang ◽

Jefrey Lijffijt ◽

Tijl De Bie

Keyword(s):

Active Learning ◽

Protein Interactions ◽

Link Prediction ◽

Prediction Accuracy ◽

Real Data ◽

Network Embedding ◽

Protein Protein Interactions ◽

Additional Information ◽

The Cost ◽

Active Link

Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, the prediction of protein–protein interactions, and the identification of hidden relationships in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by relying only on the observed part of the network. Often, whether two nodes are linked can be queried, albeit at a substantial cost (e.g., by questionnaires, wet lab experiments, or undercover work). Such additional information can improve the link prediction accuracy, but owing to the cost, the queries must be made with due consideration. Thus, we argue that an active learning approach is of great potential interest, and we developed ALPINE (Active Link Prediction usIng Network Embedding), a framework that identifies the most useful link status by estimating the improvement in link prediction accuracy to be gained by querying it. We proposed several query strategies for use in combination with ALPINE, inspired by the optimal experimental design and active learning literature. Experimental results on real data not only showed that ALPINE was scalable and boosted link prediction accuracy with far fewer queries, but also shed light on the relative merits of the strategies, providing actionable guidance for practitioners.

The Flexible Burr X-G Family: Properties, Inference, and Applications in Engineering Science

Symmetry ◽

10.3390/sym13030474 ◽

2021 ◽

Vol 13 (3) ◽

pp. 474

Author(s):

Abdulhakim A. Al-Babtain ◽

Ibrahim Elbatal ◽

Hazem Al-Mofleh ◽

Ahmed M. Gemeay ◽

Ahmed Z. Afify ◽

...

Keyword(s):

Numerical Simulations ◽

Exponential Distribution ◽

Real Data ◽

Exponential Model ◽

Statistical Properties ◽

Engineering Science ◽

Data Sets ◽

Engineering Sciences ◽

General Statistical ◽

Anderson Darling

In this paper, we introduce a new flexible generator of continuous distributions called the transmuted Burr X-G (TBX-G) family to extend and increase the flexibility of the Burr X generator. The general statistical properties of the TBX-G family are calculated. One special sub-model, TBX-exponential distribution, is studied in detail. We discuss eight estimation approaches to estimating the TBX-exponential parameters, and numerical simulations are conducted to compare the suggested approaches based on partial and overall ranks. Based on our study, the Anderson–Darling estimators are recommended to estimate the TBX-exponential parameters. Using two skewed real data sets from the engineering sciences, we illustrate the importance and flexibility of the TBX-exponential model compared with other existing competing distributions.