Mining the Hidden Link Structure from Distribution Flows for a Spatial Social Network

Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-17
Author(s):  
Yanqiao Zheng ◽  
Xiaobing Zhao ◽  
Xiaoqi Zhang ◽  
Xinyue Ye ◽  
Qiwen Dai

This study aims to develop a non-(semi-)parametric method for extracting the hidden network structure from {0,1}-valued distribution flow data with missing observations on the links between nodes. This input data type arises widely in studies of information propagation processes, such as rumor spreading through social media. In that setting, a social network does exist as the medium of the spreading process, but its link structure is completely unobservable; it is therefore important to infer the structure (links) of the hidden network. Unlike previous studies on this topic, which consider only abstract networks, we believe that apart from the link structure, the socioeconomic features and geographic locations of nodes can also play critical roles in shaping the spreading process and have to be taken into account. To uncover the hidden link structure and its dependence on the external socioeconomic features of the node set, a multidimensional spatial social network model is constructed in this study, with the spatial dimension large enough to account for all influential socioeconomic factors. Based on the spatial network, we propose a nonparametric mean-field equation to govern the rumor-spreading process and apply the likelihood estimator to infer the unknown link structure from the observed rumor distribution flows. Our method turns out to be easily extensible to the class of block networks that are useful in most real applications. The method is tested on simulated data and demonstrated on a data set of rumor spreading on Twitter.
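To make the setup concrete, here is a minimal sketch of the forward process that such inference inverts: a latent spatial network whose link strengths decay with distance generates {0,1}-valued distribution flows under a mean-field update. All parameters (node count, the exponential kernel, the transmission rate beta, the 0.5 threshold) are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n nodes embedded in a 2-D "socioeconomic" space;
# link strength decays with distance (this is the hidden structure to infer).
n = 100
pos = rng.uniform(0, 1, size=(n, 2))
dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
adj_prob = np.exp(-5.0 * dist)           # latent link-strength kernel (assumed)
np.fill_diagonal(adj_prob, 0.0)

# Mean-field propagation: p[i] is the probability that node i holds the rumor.
beta = 0.3                                # transmission rate (assumed)
p = np.zeros(n)
p[rng.integers(n)] = 1.0                  # one random seed node
flows = [p.copy()]
for _ in range(20):
    # probability of NOT being infected by any neighbour in this step
    no_infect = np.prod(1.0 - beta * adj_prob * p[None, :], axis=1)
    p = 1.0 - (1.0 - p) * no_infect
    flows.append(p.copy())

# Thresholding each snapshot yields the {0,1}-valued distribution flows
# that serve as the observable input of the inference problem.
observed = (np.array(flows) > 0.5).astype(int)
print(observed.sum(axis=1))               # rumor prevalence over time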

2020 ◽  
pp. 1-12
Author(s):  
Jing Yi ◽  
Peiyu Liu ◽  
Zhihao Wang ◽  
Wenfeng Liu

In the field of rumor-spreading research, dispelling rumors is essential for controlling their spread and reducing their harmful influence. In previous studies, rumor clarification mostly relied on external media or news reports rather than on intervention and control from inside the network, so the speed of clarification fell far below the speed of spreading and the clarification effect was unsatisfactory. In this paper, a new Twin-SIR spreading model is proposed, in which a rumor-clarifying node with spreading ability, named the "rumor dispeller", is introduced. The rumor dispeller participates in the spreading process alongside the rumor spreader to control the spread of the rumor and thereby achieve clarification. In building the model, we also incorporate traditional media as a spreading parameter of the process. We derive the mean-field equations of the model and then analyze it further on homogeneous and heterogeneous networks. Experimental simulations show that the "rumor dispeller" can reduce rumor spreading, that the selection of the initial "rumor dispeller" node affects the spreading outcome, and that external media have an important influence on rumor clarification. These conclusions offer new guidance for studying the mechanism of rumor spreading.
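A hedged sketch of what mean-field equations of this flavor can look like: the compartments (ignorant S, spreader I, dispeller D, stifler R), the contact degree k, and all rates (lam, mu, m for the media term, delta for forgetting) below are illustrative assumptions rather than the paper's actual system, but they capture the idea of a dispeller class competing with the spreader class under a constant media pressure.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Assumed rates: rumor transmission lam, clarification-on-contact mu,
# media clarification pressure m, forgetting rate delta, mean degree k.
lam, mu, m, delta = 0.6, 0.5, 0.05, 0.1
k = 6

def rhs(t, y):
    S, I, D, R = y
    dS = -k * S * (lam * I + mu * D) - m * S
    dI = k * lam * S * I - k * mu * I * D - (delta + m) * I
    dD = k * mu * S * D + m * S            # dispellers recruited by contact and media
    dR = k * mu * I * D + (delta + m) * I  # spreaders silenced by dispellers/media
    return [dS, dI, dD, dR]

# Start with 1% spreaders and 1% dispellers in an otherwise ignorant population.
sol = solve_ivp(rhs, (0, 50), [0.98, 0.01, 0.01, 0.0], dense_output=True)
for ti in np.linspace(0, 50, 11):
    S, I, D, R = sol.sol(ti)
    print(f"t={ti:5.1f}  S={S:.3f}  I={I:.3f}  D={D:.3f}  R={R:.3f}")
```

The four right-hand sides sum to zero, so the population fractions are conserved; raising mu or m in this toy system shortens the rumor's lifetime, mirroring the qualitative conclusions above.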


Author(s):  
M D MacNeil ◽  
J W Buchanan ◽  
M L Spangler ◽  
E Hay

Abstract The objective of this study was to evaluate the effects of various data structures on the genetic evaluation for the binary phenotype of reproductive success. The data were simulated based on an existing pedigree and an underlying fertility phenotype with a heritability of 0.10. A data set of complete observations was generated for all cows. This data set was then modified to mimic the culling of cows when they first failed to reproduce, cows having a missing observation at either their second or fifth opportunity to reproduce as if they had been selected as donors for embryo transfer, and the censoring of records following the sixth opportunity to reproduce as in a cull-for-age strategy. The data were analyzed using a third-order polynomial random regression model. The EBV of interest for each animal was the sum of the age-specific EBV over the first 10 observations (reproductive success at ages 2-11). Thus, the EBV may be interpreted as the genetic expectation of the number of calves produced when a female is given ten opportunities to calve. Culling open cows reduced the EBV for 3-year-old cows from 8.27 ± 0.03 when open cows were retained to 7.60 ± 0.02 when they were culled. The magnitude of this effect decreased as cows grew older when they first failed to reproduce and were subsequently culled. Cows that did not fail over the 11 years of simulated data had an EBV of 9.43 ± 0.01 and 9.35 ± 0.01 based on analyses of the complete data and of the data in which cows that failed to reproduce were culled, respectively. Cows that had a missing observation for their second record had a significantly reduced EBV, but the corresponding effect at the fifth record was negligible. The current study illustrates that culling and management decisions, particularly those that impact the beginning of the trajectory of sustained reproductive success, can influence both the magnitude and the accuracy of the resulting EBV.
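As a purely mechanical illustration of the summation step, the sketch below evaluates a third-order Legendre basis at standardized ages 2-11 and sums the age-specific EBVs implied by hypothetical regression coefficients. The coefficients and their scale are invented for the example and carry none of the paper's simulation design.

```python
import numpy as np
from numpy.polynomial import legendre

# Map ages 2-11 (ten calving opportunities) onto [-1, 1] and build the
# third-order Legendre design matrix Phi (10 ages x 4 basis functions).
ages = np.arange(2, 12)
x = 2 * (ages - ages.min()) / (ages.max() - ages.min()) - 1
Phi = np.stack([legendre.legval(x, np.eye(4)[j]) for j in range(4)], axis=1)

rng = np.random.default_rng(1)
coef = rng.normal(0, 0.1, size=(5, 4))    # 5 example animals (purely simulated)
age_specific_ebv = Phi @ coef.T           # (10 ages, 5 animals)
summed_ebv = age_specific_ebv.sum(axis=0) # summed EBV over the 10 opportunities
print(np.round(summed_ebv, 3))
```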


2021 ◽  
Vol 4 (1) ◽  
pp. 251524592095492
Author(s):  
Marco Del Giudice ◽  
Steven W. Gangestad

Decisions made by researchers while analyzing data (e.g., how to measure variables, how to handle outliers) are sometimes arbitrary, without an objective justification for choosing one alternative over another. Multiverse-style methods (e.g., specification curve, vibration of effects) estimate an effect across an entire set of possible specifications to expose the impact of hidden degrees of freedom and/or obtain robust, less biased estimates of the effect of interest. However, if specifications are not truly arbitrary, multiverse-style analyses can produce misleading results, potentially hiding meaningful effects within a mass of poorly justified alternatives. So far, a key question has received scant attention: How does one decide whether alternatives are arbitrary? We offer a framework and conceptual tools for doing so. We discuss three kinds of a priori nonequivalence among alternatives—measurement nonequivalence, effect nonequivalence, and power/precision nonequivalence. The criteria we review lead to three decision scenarios: Type E decisions (principled equivalence), Type N decisions (principled nonequivalence), and Type U decisions (uncertainty). In uncertain scenarios, multiverse-style analysis should be conducted in a deliberately exploratory fashion. The framework is discussed with reference to published examples and illustrated with the help of a simulated data set. Our framework will help researchers reap the benefits of multiverse-style methods while avoiding their pitfalls.
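For readers unfamiliar with the mechanics, a deliberately tiny specification-curve sketch follows. The data set, the three analytic decisions, and the estimator are all hypothetical; in the authors' terms, such a loop should span only alternatives judged equivalent a priori (Type E decisions), or else be framed explicitly as exploratory (Type U).

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: outcome y depends weakly on x and more strongly on z.
n = 500
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.2 * x + 0.5 * z + rng.normal(size=n)

# Three hypothetical analytic decisions, each with two alternatives.
def estimate(adjust_z, drop_outliers, winsorize_y):
    xx, zz, yy = x.copy(), z.copy(), y.copy()
    if drop_outliers:                      # decision 1: outlier handling
        keep = np.abs(xx) < 2.5
        xx, zz, yy = xx[keep], zz[keep], yy[keep]
    if winsorize_y:                        # decision 2: outcome winsorizing
        lo, hi = np.percentile(yy, [5, 95])
        yy = np.clip(yy, lo, hi)
    cols = [np.ones_like(xx), xx] + ([zz] if adjust_z else [])  # decision 3
    beta = np.linalg.lstsq(np.column_stack(cols), yy, rcond=None)[0]
    return beta[1]                         # effect of x under this specification

specs = list(itertools.product([False, True], repeat=3))
effects = sorted(estimate(*s) for s in specs)
print([round(e, 3) for e in effects])      # the "specification curve" of estimates
```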


2008 ◽  
Vol 20 (5) ◽  
pp. 1211-1238 ◽  
Author(s):  
Gaby Schneider

Oscillatory correlograms are widely used to study neuronal activity that shows a joint periodic rhythm. In most cases, the statistical analysis of cross-correlation histograms (CCH) features is based on the null model of independent processes, and the resulting conclusions about the underlying processes remain qualitative. Therefore, we propose a spike train model for synchronous oscillatory firing activity that directly links characteristics of the CCH to parameters of the underlying processes. The model focuses particularly on asymmetric central peaks, which differ in slope and width on the two sides. Asymmetric peaks can be associated with phase offsets in the (sub-) millisecond range. These spatiotemporal firing patterns can be highly consistent across units yet invisible in the underlying processes. The proposed model includes a single temporal parameter that accounts for this peak asymmetry. The model provides approaches for the analysis of oscillatory correlograms, taking into account dependencies and nonstationarities in the underlying processes. In particular, the auto- and the cross-correlogram can be investigated in a joint analysis because they depend on the same spike train parameters. Particular temporal interactions such as the degree to which different units synchronize in a common oscillatory rhythm can also be investigated. The analysis is demonstrated by application to a simulated data set.
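A minimal sketch of the object being modeled: two synthetic spike trains sharing an oscillatory rhythm, with an assumed few-millisecond phase offset, and the cross-correlation histogram of their pairwise lags, whose central peak becomes asymmetric under the offset. The rates, frequency, and offset are illustrative values, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(3)

# Inhomogeneous-Poisson-style trains on a 1 ms grid, modulated at f Hz;
# the second train is shifted by a 4 ms phase offset (assumed).
T, rate, f, offset = 100.0, 20.0, 8.0, 0.004   # s, Hz, Hz, s
t_grid = np.arange(0, T, 0.001)
p1 = rate * (1 + np.cos(2 * np.pi * f * t_grid)) * 0.001
p2 = rate * (1 + np.cos(2 * np.pi * f * (t_grid - offset))) * 0.001
spikes1 = t_grid[rng.random(t_grid.size) < p1]
spikes2 = t_grid[rng.random(t_grid.size) < p2]

# Cross-correlation histogram: lags of all spike pairs within +/- 50 ms,
# in 2 ms bins; the offset skews the central peak's two flanks.
lags = (spikes2[None, :] - spikes1[:, None]).ravel()
lags = lags[np.abs(lags) <= 0.05]
counts, edges = np.histogram(lags, bins=np.arange(-0.05, 0.051, 0.002))
print(counts)
```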


2004 ◽  
Vol 2004 (8) ◽  
pp. 421-429 ◽  
Author(s):  
Souad Assoudou ◽  
Belkheir Essebbar

This note is concerned with Bayesian estimation of the transition probabilities of a binary Markov chain observed from heterogeneous individuals. The model is founded on Jeffreys' prior, which allows the transition probabilities to be correlated. The Bayesian estimator is approximated by means of Markov chain Monte Carlo (MCMC) techniques. The performance of the Bayesian estimates is illustrated by analyzing a small simulated data set.
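A hedged sketch of the MCMC step on a small simulated chain: to keep the example short, the paper's joint Jeffreys prior over the two transition probabilities is replaced by independent per-row Jeffreys priors Beta(1/2, 1/2), a simplification that drops the correlation the note emphasizes; the sampler is a plain random-walk Metropolis.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate a binary chain with true p = P(1|0) = 0.3 and q = P(0|1) = 0.6.
p_true, q_true = 0.3, 0.6
x = [0]
for _ in range(200):
    x.append(int(rng.random() < (p_true if x[-1] == 0 else 1 - q_true)))

# Transition counts n[a, b] = number of observed a -> b moves.
n = np.zeros((2, 2))
for a, b in zip(x[:-1], x[1:]):
    n[a, b] += 1

def log_post(p, q):
    # Multinomial transition likelihood plus the simplified per-row
    # Jeffreys priors (a stand-in for the paper's joint Jeffreys prior).
    if not (0 < p < 1 and 0 < q < 1):
        return -np.inf
    ll = (n[0, 1] * np.log(p) + n[0, 0] * np.log(1 - p)
          + n[1, 0] * np.log(q) + n[1, 1] * np.log(1 - q))
    prior = -0.5 * (np.log(p) + np.log(1 - p) + np.log(q) + np.log(1 - q))
    return ll + prior

# Random-walk Metropolis over (p, q) with a 1000-iteration burn-in.
p, q, samples = 0.5, 0.5, []
for it in range(5000):
    p_new, q_new = p + rng.normal(0, 0.05), q + rng.normal(0, 0.05)
    if np.log(rng.random()) < log_post(p_new, q_new) - log_post(p, q):
        p, q = p_new, q_new
    if it >= 1000:
        samples.append((p, q))
print(np.mean(samples, axis=0))            # posterior means of (p, q)
```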


2021 ◽  
Vol 29 ◽  
pp. 287-295
Author(s):  
Zhiming Zhou ◽  
Haihui Huang ◽  
Yong Liang

BACKGROUND: In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. The logistic regression model is a powerful discrimination method that offers a clear statistical interpretation and yields classification probabilities for the class labels. However, it cannot by itself perform biomarker selection. OBJECTIVE: The aim of this paper is to give the model an efficient gene selection capability. METHODS: We propose a new penalized logsum network-based regularization logistic regression model for gene selection and cancer classification. RESULTS: Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. On a large data set, the proposed method achieved 89.66% (training) and 90.02% (testing) AUC, which is, on average, 5.17% (training) and 4.49% (testing) better than mainstream methods. CONCLUSIONS: The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.
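A rough sketch of the selection mechanism on simulated data: logistic loss plus a log-sum penalty minimized by plain gradient descent. This stands in for the paper's logsum network-based regularizer with the network term omitted, and all dimensions, rates, and thresholds are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated high-dimensional data: only the first 5 of 100 "genes" matter.
n, d = 200, 100
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:5] = 1.5
y = (1 / (1 + np.exp(-X @ w_true)) > rng.random(n)).astype(float)

# Log-sum penalty lam * sum(log(|w|/eps + 1)); its gradient w.r.t. w is
# lam * sign(w) / (|w| + eps), which shrinks small weights aggressively.
lam, eps, lr = 0.5, 0.1, 0.05
w = np.zeros(d)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    grad = X.T @ (p - y) / n + lam * np.sign(w) / (np.abs(w) + eps)
    w -= lr * grad

selected = np.where(np.abs(w) > 0.1)[0]
print("selected genes:", selected)         # should concentrate near indices 0-4
```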


Genetics ◽  
2001 ◽  
Vol 157 (3) ◽  
pp. 1369-1385 ◽  
Author(s):  
Z W Luo ◽  
C A Hackett ◽  
J E Bradshaw ◽  
J W McNicol ◽  
D Milbourne

Abstract This article presents methodology for the construction of a linkage map in an autotetraploid species, using either codominant or dominant molecular markers scored on two parents and their full-sib progeny. The steps of the analysis are as follows: identification of parental genotypes from the parental and offspring phenotypes; testing for independent segregation of markers; partition of markers into linkage groups using cluster analysis; maximum-likelihood estimation of the phase, recombination frequency, and LOD score for all pairs of markers in the same linkage group using the EM algorithm; ordering the markers and estimating distances between them; and reconstructing their linkage phases. The information from different marker configurations about the recombination frequency is examined and found to vary considerably, depending on the number of different alleles, the number of alleles shared by the parents, and the phase of the markers. The methods are applied to a simulated data set and to a small set of SSR and AFLP markers scored in a full-sib population of tetraploid potato.
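The autotetraploid two-point analysis above is involved; as a far simpler stand-in, the classic multinomial-linkage EM example illustrates the E- and M-steps used for recombination-related parameters. Here offspring fall into four phenotype classes with probabilities ((2+t)/4, (1-t)/4, (1-t)/4, t/4), where t is a function of the recombination fraction, and the first class mixes two latent genotype classes that EM separates. The counts are the textbook illustrative data, not from this article.

```python
import numpy as np

y = np.array([125, 18, 20, 34])   # illustrative class counts (textbook example)
t = 0.5                           # initial guess for the linkage parameter
for _ in range(50):
    # E-step: expected count of the latent "t/4" portion of the first class
    x1 = y[0] * t / (2 + t)
    # M-step: maximize the complete-data multinomial likelihood in t
    t = (x1 + y[3]) / (x1 + y[1] + y[2] + y[3])
print(round(t, 4))                # converges to the MLE (about 0.6268)
```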


Author(s):  
Valentin Raileanu

The article briefly describes the history and fields of application of the theory of extreme values, including climatology. The data format, the Generalized Extreme Value (GEV) probability distributions with Block Maxima, the Generalized Pareto (GP) distributions with Peaks Over Threshold (POT), and the analysis methods are presented. The distribution parameters are estimated using the maximum likelihood estimation (MLE) method. Installation of the free R software, the minimum set of required commands, and the in2extRemes graphical (GUI) package are described. As an example, the results of a GEV analysis of a simulated data set in in2extRemes are presented.
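The article works in R with in2extRemes; purely as a language-neutral illustration of the same block-maxima-plus-MLE workflow, here is a short sketch using scipy on simulated data. Note that scipy's shape parameter uses the opposite sign convention from the xi commonly reported in climate applications.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(6)

# Block maxima: 50 simulated "years" of daily values, reduced to annual maxima.
daily = rng.gumbel(loc=20, scale=5, size=(50, 365))
annual_max = daily.max(axis=1)

# GEV fit by maximum likelihood (scipy's c = -xi in the climate convention).
shape, loc, scale = genextreme.fit(annual_max)
print(f"shape={shape:.3f}  loc={loc:.2f}  scale={scale:.2f}")

# 100-year return level: the level exceeded with probability 1/100 per year.
rl100 = genextreme.ppf(1 - 1 / 100, shape, loc, scale)
print("100-year return level:", round(rl100, 2))
```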


2010 ◽  
Vol 88 (8) ◽  
pp. 575-584 ◽  
Author(s):  
M. K. Ghosh ◽  
P. K. Haldar ◽  
S. K. Manna ◽  
A. Mukhopadhyay ◽  
G. Singh

In this paper we present some results on the nonstatistical fluctuation in the 1-dimensional (1-d) density distribution of singly charged produced particles in the framework of the intermittency phenomenon. A set of nuclear emulsion data on 16O-Ag/Br interactions at an incident momentum of 200A GeV/c was analyzed in terms of different statistical methods that are related to the self-similar fractal properties of the particle density function. A comparison of the present experiment with a similar experiment induced by 32S nuclei, and also with a set of results simulated by the Lund Monte Carlo code FRITIOF, is presented. A similar comparison between this experiment and a simulated data set generated from pseudo-random numbers is also made. The analysis reveals the presence of a weak intermittency in the 1-d phase-space distribution of the produced particles. The results also indicate the occurrence of a nonthermal phase transition during emission of final-state hadrons. Our results on factorial correlators suggest that short-range correlations are present in the angular distribution of charged hadrons, whereas those on oscillatory moments show that such correlations are not restricted to only a few particles. In almost all cases, the simulated results fail to replicate their experimental counterparts.
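A sketch of the core intermittency measure, the scaled factorial moment F_q(M): divide the 1-d phase-space interval into M bins and average n(n-1)...(n-q+1) over bins and events; a power-law rise of F_q with M signals intermittent fluctuations. The simulated events below are flat pseudo-random (the null case), so F_2 stays roughly constant in M; all sizes are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# 500 simulated events, 40 particles each, uniform in a unit 1-d interval.
n_events, n_particles = 500, 40
events = rng.random((n_events, n_particles))

def scaled_factorial_moment(events, M, q):
    # Bin counts per event, then the q-th factorial power n(n-1)...(n-q+1).
    counts = np.stack([np.histogram(ev, bins=M, range=(0, 1))[0] for ev in events])
    n_fact = counts.astype(float)
    for j in range(1, q):
        n_fact = n_fact * (counts - j)
    nbar = counts.mean()                  # mean count over events and bins
    return n_fact.mean() / nbar**q

for M in (2, 5, 10, 20, 40):
    print(M, round(scaled_factorial_moment(events, M, 2), 3))
```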


2015 ◽  
Vol 8 (2) ◽  
pp. 203-211 ◽  
Author(s):  
Wilfredo Robles ◽  
John D. Madsen ◽  
Ryan M. Wersal

Waterhyacinth is a free-floating aquatic weed that is considered a nuisance worldwide. Excessive growth of waterhyacinth limits recreational use of water bodies and interferes with many ecological processes. Accurate estimates of biomass are useful for assessing the effectiveness of control methods for this aquatic weed. While large water bodies require significant labor inputs for ground-truth surveys, available technology such as remote sensing could provide temporal and spatial information on a target area at a much reduced cost. Studies were conducted at Lakes Columbus and Aberdeen (Mississippi) during the growing seasons of 2005 and 2006 over established populations of waterhyacinth. The objective was to estimate biomass by nondestructive methods using the normalized difference vegetation index (NDVI) derived from Landsat 5 TM simulated data. Biomass was collected monthly using a 0.10-m2 quadrat at 25 randomly selected locations at each site. Morphometric plant parameters were also collected to enhance the use of NDVI for biomass estimation. Reflectance measurements using a hyperspectral sensor were taken every month at each site during biomass collection. These spectral signatures were then transformed into a Landsat 5 TM simulated data set using MATLAB® software. A positive linear relationship (r2 = 0.28) was found between measured waterhyacinth biomass and NDVI values from the simulated data set. While this relationship appears weak, the addition of morphological parameters such as leaf area index (LAI) and leaf length strengthened the relationship, yielding an r2 = 0.66. Empirically, NDVI saturates at high LAI, which may limit its use for estimating biomass in very dense vegetation. Further studies using NDVI calculated from spectral bands narrower than those of Landsat 5 TM are recommended.
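A sketch of the regression step on synthetic numbers: NDVI computed from red and near-infrared reflectance (Landsat 5 TM bands 3 and 4), then biomass regressed on NDVI alone versus NDVI plus LAI. Every value below is simulated for illustration; the functional forms and noise levels are assumptions, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(8)

# 25 synthetic sample points: reflectance responds to LAI with saturation.
n = 25
lai = rng.uniform(0.5, 6.0, n)
red = 0.25 * np.exp(-0.4 * lai) + rng.normal(0, 0.01, n)
nir = 0.15 + 0.08 * np.log1p(lai) + rng.normal(0, 0.01, n)
ndvi = (nir - red) / (nir + red)
biomass = 300 * lai + rng.normal(0, 150, n)     # g/m2, illustrative scale

def r_squared(X, y):
    # Ordinary least squares with an intercept; returns the fit's r2.
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

print("NDVI only   r2 =", round(r_squared(ndvi, biomass), 2))
print("NDVI + LAI  r2 =", round(r_squared(np.column_stack([ndvi, lai]), biomass), 2))
```

As in the study, adding LAI as a covariate typically recovers much of the fit lost to NDVI saturation at high canopy density.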

