Spillover as Movement Agenda Setting: Using Computational and Network Techniques for Improved Rare Event Identification

The increasing availability of data, along with sophisticated computational methods for analyzing them, presents researchers with new opportunities and challenges. In this article, we address both by describing computational and network methods that can be used to identify cases of rare phenomena. We evaluate each method’s relative utility in the identification of a specific rare phenomenon of interest to social movement researchers: the spillover of social movement claims from one movement to another. We identify and test five different approaches to detecting cases of spillover in the largest data set of protest events currently available, finding that an ensemble approach that combines clique and correspondence analysis and an ensemble approach combining all methods perform considerably better than others. Our approach is preferable to other ways of analyzing such cases; compared to qualitative approaches, our computational process identifies many more cases of spillover—some of which are surprising and would likely not be otherwise investigated. At the same time, compared to crude quantitative measures, our approach substantially reduces the “noise,” or identification of false-positive cases, of movement spillover. We argue that this technique, which can be adapted to other research topics, is a good illustration of how the thoughtful implementation of computational methods can allow for the efficient identification of rare events and also bridge deductive and inductive approaches to scientific inquiry.

Evaluation for estimating of the PDF and the CDF of Generalized Inverted Exponential Distribution with Application in Industry

Advances in Mathematics: Scientific Journal ◽ 10.37418/amsj.9.1.39 ◽ 2020 ◽ pp. 507-522

Author(s): Parisa Torkaman

Keyword(s): Least Squares ◽ Exponential Distribution ◽ Mean Squared Error ◽ Weighted Least Squares ◽ Real Data ◽ Minimum Variance ◽ Cumulative Distribution ◽ Estimation Methods ◽ Data Set ◽ Better Than

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, the estimation of the probability density function and the cumulative distribution function of this distribution is considered with five different estimation methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS), and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulation, based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others and that, when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than LS, WLS, and PC. Finally, a real data set is analyzed.
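The MSE-based comparison described above can be illustrated with a small simulation. The sketch below is not the paper's code. It assumes the commonly used two-parameter form of the distribution, F(x) = 1 - (1 - exp(-lam/x))^alpha for x > 0, which the abstract does not state, and it treats lam as known so that the ML estimate of alpha has a closed form. All function names are hypothetical, and only the ML plug-in estimator of the CDF is shown.

```python
import math
import random


def gied_cdf(x, alpha, lam):
    """Assumed CDF: F(x) = 1 - (1 - exp(-lam/x))**alpha for x > 0."""
    if x <= 0.0:
        return 0.0
    return 1.0 - (-math.expm1(-lam / x)) ** alpha


def gied_pdf(x, alpha, lam):
    """Density obtained by differentiating the assumed CDF."""
    if x <= 0.0:
        return 0.0
    e = math.exp(-lam / x)
    return (alpha * lam / x ** 2) * e * (-math.expm1(-lam / x)) ** (alpha - 1.0)


def gied_quantile(u, alpha, lam):
    """Inverse of the assumed CDF for 0 < u < 1."""
    return -lam / math.log1p(-((1.0 - u) ** (1.0 / alpha)))


def gied_sample(rng, alpha, lam):
    """Inverse-transform sampling; redraw the (very unlikely) u == 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return gied_quantile(u, alpha, lam)


def ml_alpha(sample, lam):
    """Closed-form ML estimate of alpha when lam is treated as known."""
    return -len(sample) / sum(math.log(-math.expm1(-lam / x)) for x in sample)


def mse(estimates, truth):
    """Mean squared error of a list of estimates around the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def simulate_mse_of_cdf(x0, alpha, lam, n, reps, seed=0):
    """Monte Carlo MSE of the ML plug-in estimate of F(x0)."""
    rng = random.Random(seed)
    truth = gied_cdf(x0, alpha, lam)
    estimates = []
    for _ in range(reps):
        sample = [gied_sample(rng, alpha, lam) for _ in range(n)]
        estimates.append(gied_cdf(x0, ml_alpha(sample, lam), lam))
    return mse(estimates, truth)
```

Calling `simulate_mse_of_cdf(1.0, 2.0, 1.0, n=20, reps=200)` and again with `n=200` gives a rough view of how the MSE shrinks with sample size. Comparing estimators would mean swapping `ml_alpha` for another rule inside the same loop.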

A Computational Method for the Identification of Endolysins and Autolysins

Protein and Peptide Letters ◽ 10.2174/0929866526666191002104735 ◽ 2020 ◽ Vol 27 (4) ◽ pp. 329-336 ◽ Cited By ~ 1

Author(s): Lei Xu ◽ Guangmin Liang ◽ Baowen Chen ◽ Xu Tan ◽ Huaikun Xiang ◽ ...

Keyword(s): Support Vector Machine ◽ Cell Wall ◽ Experimental Results ◽ Computational Method ◽ Lytic Enzyme ◽ Support Vector ◽ Lytic Enzymes ◽ Data Set ◽ Optimal Feature ◽ Better Than

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause a serious drug resistance problem in pathogenic bacteria. The study of cell wall lytic enzymes therefore aims at finding an efficient way to cure bacterial infections. With the use of antibiotics, the problem of drug resistance is becoming more serious, so cell lytic enzymes are a good choice for curing bacterial infections. Cell lytic enzymes include endolysins and autolysins, and the difference between them lies in the purpose of breaking the cell wall. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: In this article, our motivation is to predict the type of cell lytic enzyme. Cell lytic enzymes are helpful for killing bacteria, so it is meaningful to study their type. However, detecting the type of a cell lytic enzyme by experimental methods is time consuming. Thus, an efficient computational method for predicting the type of cell lytic enzyme is proposed in our work. Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Then each protein is represented by its tripeptide composition. The features with the larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors. The learned classifier is used to predict the type of cell lytic enzyme. Results: Following the proposed method, the experimental results show that the overall accuracy reaches 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06-92.9)/92.9). The performance of our proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptide optimal feature set. Conclusion: The paper proposes an efficient method for identifying endolysins and autolysins. In this paper, a support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy of identifying the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
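The tripeptide representation and the relative-improvement arithmetic quoted in the abstract can be sketched as follows. This is an illustration only. It assumes the usual definition of tripeptide composition (the frequency of each of the 20^3 = 8000 possible tripeptides over all overlapping windows of the sequence), which the abstract does not spell out. The function names are hypothetical, and the feature selection and SVM training steps are not reproduced.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]  # 8000 entries
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return an 8000-dimensional vector of tripeptide frequencies."""
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    seq = sequence.upper()
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip windows containing non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


def relative_improvement(new_accuracy, old_accuracy):
    """Relative improvement in percent, as in (97.06 - 92.9) / 92.9."""
    return 100.0 * (new_accuracy - old_accuracy) / old_accuracy
```

With these definitions, `relative_improvement(97.06, 92.9)` evaluates to about 4.48, matching the "nearly 4.5%" figure reported against Ding's method.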

Applying Artificial Neural Networks. I. Estimating Nicotine in Tobacco from near Infrared Data

Journal of Near Infrared Spectroscopy ◽ 10.1255/jnirs.64 ◽ 1995 ◽ Vol 3 (3) ◽ pp. 133-142 ◽ Cited By ~ 10

Author(s): M. Hana ◽ W.F. McClure ◽ T.B. Whitaker ◽ M. White ◽ D.R. Bahler

Keyword(s): Linear Regression ◽ Regression Model ◽ Linear Regression Model ◽ Near Infrared ◽ Back Propagation ◽ Linear Network ◽ Data Set ◽ Input Layer ◽ Propagation Network ◽ Better Than

Two artificial neural network models were used to estimate the nicotine content of tobacco: (i) a back-propagation network and (ii) a linear network. The back-propagation network consisted of an input layer, an output layer and one hidden layer. The linear network consisted of an input layer and an output layer. Both networks used the generalised delta rule for learning. The performance of both networks was compared to the multiple linear regression (MLR) method of calibration. The nicotine content in tobacco samples was estimated for two different data sets. Data set A contained 110 near infrared (NIR) spectra, each consisting of reflected energy at eight wavelengths. Data set B consisted of 200 NIR spectra, with each spectrum having 840 spectral data points. The fast Fourier transform was applied to data set B in order to compress each spectrum into 13 Fourier coefficients. For data set A, the linear regression model gave the best results, followed by the back-propagation network and then the linear network. The true performance of the linear regression model was better than that of the back-propagation and linear networks by 14.0% and 18.1%, respectively. For data set B, the back-propagation network gave the best result, followed by MLR and the linear network. The linear network and MLR models gave almost the same results. The true performance of the back-propagation network model was better than that of the MLR and linear network by 35.14%.
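The compression step for data set B can be sketched as below. The abstract does not say which 13 coefficients were retained or how complex values were handled, so this sketch assumes the 13 lowest-frequency coefficients and returns them as complex numbers. It computes them with a direct discrete Fourier transform, which gives the same values as an FFT for those indices. The function name is hypothetical.

```python
import cmath
import math


def first_fourier_coefficients(spectrum, k=13):
    """Return the k lowest-frequency DFT coefficients of a spectrum.

    Direct O(n * k) evaluation of X[m] = sum_t x[t] * exp(-2*pi*i*m*t/n);
    an FFT would return the same values for m = 0 .. k-1.
    """
    n = len(spectrum)
    coeffs = []
    for m in range(k):
        total = 0j
        for t, value in enumerate(spectrum):
            total += value * cmath.exp(-2j * math.pi * m * t / n)
        coeffs.append(total)
    return coeffs
```

For an 840-point spectrum this reduces each sample from 840 inputs to 13 complex numbers, which keeps the input layer of a network small at the cost of discarding high-frequency detail.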

Diffusion in CP Stars: The Quest for Accuracy

Highlights of Astronomy ◽ 10.1017/s1539299600018414 ◽ 1998 ◽ Vol 11 (2) ◽ pp. 671-673

Author(s): G. Alecian

Keyword(s): Computational Methods ◽ Stellar Evolution ◽ Diffusion Processes ◽ Stellar Atmospheres ◽ Time Dependent ◽ Data Bases ◽ Self Consistent ◽ Stellar Envelopes ◽ Better Than

We present a brief review of recent progress in the study of diffusion processes in CP stars. The most spectacular advance concerns the calculation of radiative accelerations in stellar envelopes, for which an accuracy better than 30% can now be reached for a large number of ions. This improvement is mainly due to the large and accurate atomic and opacity data bases available since the beginning of the 1990s. Efficient computational methods have been developed to take advantage of these new data. These advances have, in turn, led to a better understanding of how element stratification builds up with time. The computation of self-consistent stellar evolution models, including time-dependent diffusion, is now within reach in the next few years. However, the progress mentioned above does not apply to stellar atmospheres and the upper layers of envelopes.

Bayesian Computational Methods for Sampling from the Posterior Distribution of a Bivariate Survival Model, Based on AMH Copula in the Presence of Right-Censored Data

Entropy ◽ 10.3390/e20090642 ◽ 2018 ◽ Vol 20 (9) ◽ pp. 642 ◽ Cited By ~ 2

Author(s): Erlandson Saraiva ◽ Adriano Suzuki ◽ Luis Milan

Keyword(s): Computational Methods ◽ Posterior Distribution ◽ Estimation Procedure ◽ Small Sample ◽ Survival Model ◽ Data Set ◽ Right Censored Data ◽ Slice Sampling ◽ Bivariate Survival ◽ Model Based

In this paper, we study the performance of Bayesian computational methods to estimate the parameters of a bivariate survival model based on the Ali–Mikhail–Haq copula with marginal distributions given by Weibull distributions. The estimation procedure was based on Markov chain Monte Carlo (MCMC) algorithms. We present three versions of the Metropolis–Hastings algorithm: Independent Metropolis–Hastings (IMH), Random Walk Metropolis (RWM) and Metropolis–Hastings with a natural-candidate generating density (MH). Since the creation of a good candidate generating density in IMH and RWM may be difficult, we also describe how to update a parameter of interest using the slice sampling (SS) method. A simulation study was carried out to compare the performances of IMH, RWM and SS. The comparison used the sample root mean square error as an indicator of performance. Results obtained from the simulations show that the SS algorithm is an effective alternative to the IMH and RWM methods when simulating values from the posterior distribution, especially for small sample sizes. We also applied these methods to a real data set.
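The appeal of slice sampling, that it needs no hand-tuned candidate density, can be seen in a minimal univariate sketch. The code below is a generic slice sampler with stepping-out and shrinkage, not the authors' implementation. It does not model the AMH copula posterior. The Weibull log-density is included only as a stand-in target because the abstract mentions Weibull marginals. All names and default values are assumptions, and the stepping-out loops assume a proper target whose tails decay.

```python
import math
import random


def weibull_log_density(x, shape=2.0, scale=1.0):
    """Log-density of a Weibull distribution; -inf outside the support."""
    if x <= 0.0:
        return float("-inf")
    z = x / scale
    return math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape


def slice_sample(log_density, x0, n_samples, width=1.0, seed=0):
    """Univariate slice sampler; x0 must have a finite log-density."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Auxiliary height drawn uniformly under the density at x (log scale).
        log_y = log_density(x) + math.log(1.0 - rng.random())
        # Stepping out: place an interval of the given width around x at random
        # and widen it until both ends fall outside the slice.
        left = x - width * rng.random()
        right = left + width
        while log_density(left) > log_y:
            left -= width
        while log_density(right) > log_y:
            right += width
        # Shrinkage: sample inside the interval, shrinking it on each rejection.
        while True:
            candidate = rng.uniform(left, right)
            if log_density(candidate) > log_y:
                x = candidate
                break
            if candidate < x:
                left = candidate
            else:
                right = candidate
        samples.append(x)
    return samples
```

A call such as `slice_sample(weibull_log_density, 1.0, 5000)` returns draws whose only tuning parameter is the initial interval width. A random-walk Metropolis sampler would instead require choosing a proposal scale that yields a reasonable acceptance rate.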

Towards global empirical upscaling of FLUXNET eddy covariance observations: validation of a model tree ensemble approach using a biosphere model

Biogeosciences Discussions ◽ 10.5194/bgd-6-5271-2009 ◽ 2009 ◽ Vol 6 (3) ◽ pp. 5271-5304 ◽ Cited By ~ 22

Author(s): M. Jung ◽ M. Reichstein ◽ A. Bondeau

Keyword(s): Eddy Covariance ◽ Learning Algorithm ◽ Gross Primary Production ◽ Global Network ◽ Data Set ◽ Model Trees ◽ Ensemble Approach ◽ Model Tree ◽ Biosphere Model ◽ Variance Explained

Abstract. Global, spatially and temporally explicit estimates of carbon and water fluxes derived from empirical upscaling of eddy covariance measurements would constitute a new and possibly powerful data stream to study the variability of the global terrestrial carbon and water cycle. This paper introduces and validates a machine learning approach dedicated to the upscaling of observations from the current global network of eddy covariance towers (FLUXNET). We present a new model TRee Induction ALgorithm (TRIAL) that performs hierarchical stratification of the data set into units where particular multiple regressions for a target variable hold. We propose an ensemble approach (Evolving tRees with RandOm gRowth, ERROR) where the base learning algorithm is perturbed in order to gain a diverse sequence of different model trees which evolves over time. We evaluate the efficiency of the model tree ensemble approach using an artificial data set derived from the Lund-Potsdam-Jena managed Land (LPJmL) biosphere model. We aim at reproducing global monthly gross primary production as simulated by LPJmL from 1998–2005 using only locations and months where high quality FLUXNET data exist for the training of the model trees. The model trees are trained with the LPJmL land cover and meteorological input data, climate data, and the fraction of absorbed photosynthetically active radiation simulated by LPJmL. Given that we know the "true result" in the form of global LPJmL simulations, we can effectively study the performance of the model tree ensemble upscaling and associated problems of extrapolation capacity. We show that the model tree ensemble is able to explain 92% of the variability of the global LPJmL GPP simulations.
The mean spatial pattern and the seasonal variability of GPP that constitute the largest sources of variance are very well reproduced (96% and 94% of variance explained respectively) while the monthly interannual anomalies which occupy much less variance are less well matched (41% of variance explained). We demonstrate the substantially improved accuracy of the model tree ensemble over individual model trees in particular for the monthly anomalies and for situations of extrapolation. We estimate that roughly one fifth of the domain is subject to extrapolation while the model tree ensemble is still able to reproduce 73% of the LPJmL GPP variability here. This paper presents for the first time a benchmark for a global FLUXNET upscaling approach that will be employed in future studies. Although the real world FLUXNET upscaling is more complicated than for a noise free and reduced complexity biosphere model as presented here, our results show that an empirical upscaling from the current FLUXNET network with a model tree ensemble is feasible and able to extract global patterns of carbon flux variability.

Fast effect size shrinkage software for beta-binomial models of allelic imbalance

F1000Research ◽ 10.12688/f1000research.20916.2 ◽ 2020 ◽ Vol 8 ◽ pp. 2024

Author(s): Joshua P. Zitovsky ◽ Michael I. Love

Keyword(s): Allelic Imbalance ◽ Real Data ◽ Shrinkage Estimators ◽ Data Set ◽ Bayesian Shrinkage ◽ In Cis ◽ Posterior Estimation ◽ Binomial Models ◽ Better Than ◽ Diploid Organism

Allelic imbalance occurs when the two alleles of a gene are differentially expressed within a diploid organism and can indicate important differences in cis-regulation and epigenetic state across the two chromosomes. Because of this, the ability to accurately quantify the proportion at which each allele of a gene is expressed is of great interest to researchers. This becomes challenging in the presence of small read counts and/or sample sizes, which can cause estimators for allelic expression proportions to have high variance. Investigators have traditionally dealt with this problem by filtering out genes with small counts and samples. However, this may inadvertently remove important genes that have truly large allelic imbalances. Another option is to use pseudocounts or Bayesian estimators to reduce the variance. To this end, we evaluated the accuracy of four different estimators, the latter two of which are Bayesian shrinkage estimators: maximum likelihood, adding a pseudocount to each allele, approximate posterior estimation of GLM coefficients (apeglm) and adaptive shrinkage (ash). We also wrote C++ code to quickly calculate ML and apeglm estimates and integrated it into the apeglm package. The four methods were evaluated on two simulations and one real data set. Apeglm consistently performed better than ML according to a variety of criteria, and generally outperformed use of pseudocounts as well. Ash also performed better than ML in one of the simulations, but in the other performance was more mixed. Finally, when compared to five other packages that also fit beta-binomial models, the apeglm package was substantially faster and more numerically reliable, making our package useful for quick and reliable analyses of allelic imbalance. Apeglm is available as an R/Bioconductor package at http://bioconductor.org/packages/apeglm.
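The variance problem with small read counts, and the way a pseudocount tempers it, can be shown with the two simplest estimators named above. This is a sketch under stated assumptions and does not reproduce apeglm or ash. The function names are hypothetical, the pseudocount of 1 per allele is an assumed default, and the estimate is computed for a single gene in a single sample rather than through a beta-binomial GLM.

```python
def ml_proportion(alt_count, ref_count):
    """Maximum likelihood estimate of the allelic proportion for one gene."""
    total = alt_count + ref_count
    if total == 0:
        raise ValueError("no reads observed for this gene")
    return alt_count / total


def pseudocount_proportion(alt_count, ref_count, pseudocount=1.0):
    """Allelic proportion after adding a pseudocount to each allele.

    The added counts pull the estimate toward 0.5, and the pull is
    strongest when few reads are observed.
    """
    return (alt_count + pseudocount) / (alt_count + ref_count + 2.0 * pseudocount)
```

With three reads that all come from one allele, `ml_proportion(3, 0)` returns 1.0, while `pseudocount_proportion(3, 0)` returns 0.8. With 300 such reads the two estimates nearly coincide, so the shrinkage mostly affects genes with small counts.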

Extreme Learning Machine with sigmoid activation function on large data

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.b1433.0982s1119 ◽ 2019 ◽ Vol 8 (2S11) ◽ pp. 3523-3526

Keyword(s): Efficient Algorithm ◽ Large Data ◽ Activation Function ◽ Large Data Sets ◽ Data Sets ◽ Data Set ◽ Learning Machine ◽ Sigmoid Activation Function ◽ State Of Art ◽ Better Than

This paper describes an efficient algorithm for classification in large data sets. While many algorithms exist for classification, they are not suitable for larger contents and different data sets. For working with large data sets, various extreme learning machine (ELM) algorithms are available in the literature. However, the existing algorithms use a fixed activation function, which may lead to deficiencies when working with large data. In this paper, we propose a novel ELM with a sigmoid activation function. The experimental evaluations demonstrate that our ELM-S algorithm performs better than ELM, SVM, and other state-of-the-art algorithms on large data sets.

A Novel Network Traffic Anomaly Detection Based on Multi-Scale Fusion

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.48-49.102 ◽ 2011 ◽ Vol 48-49 ◽ pp. 102-105

Author(s): Guo Zhen Cheng ◽ Dong Nian Cheng ◽ He Lei

Keyword(s): Anomaly Detection ◽ False Alarm ◽ False Alarm Rate ◽ Network Traffic ◽ Self Similarity ◽ Data Set ◽ Multi Scale ◽ Traffic Anomaly ◽ Detection Evaluation ◽ Better Than

Detecting network traffic anomalies is very important for network security. However, existing detection suffers from a high false alarm rate and a low detection rate, and it cannot perform real-time detection in the backbone very well, owing to the nonlinearity, nonstationarity and self-similarity of the traffic. We therefore propose a novel detection method, EMD-DS, and prove that it can efficiently reduce the mean error rate of anomaly detection after EMD. On the KDD CUP 1999 intrusion detection evaluation data set, this detector detects 85.1% of attacks at a low false alarm rate, which is better than some other systems.

A Novel Evolutionary Biclustering Approach using MapReduce(EBC-MR)

International Journal of Knowledge Discovery in Bioinformatics ◽ 10.4018/ijkdb.2016010103 ◽ 2016 ◽ Vol 6 (1) ◽ pp. 26-36 ◽ Cited By ~ 1

Author(s): Rathipriya R.

Keyword(s): Genetic Algorithm ◽ Expression Data ◽ Correlation Measure ◽ Web Data ◽ Mapreduce Framework ◽ Average Correlation ◽ Data Set ◽ Cluster Data ◽ Value Measure ◽ Better Than

A novel biclustering approach is proposed in this paper, which can be used to cluster data (such as web data or gene expression data) into local patterns using the MapReduce framework. The proposed biclustering approach extracts highly coherent biclusters using a correlation measure called the Average Correlation Value measure. Furthermore, a MapReduce-based genetic algorithm is applied to the biclustering of web data for the first time. This method largely avoids local convergence in the optimization algorithm. The MSWeb data set and the MSNBC web usage data set are used to test the performance of the new MapReduce-based evolutionary biclustering algorithm. The experimental study compares the proposed algorithm with a traditional genetic algorithm for biclustering. The results reveal that the proposed approach performs better than the existing evolutionary biclustering approach.
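The coherence score that drives the search can be sketched as follows. The abstract names the Average Correlation Value (ACV) measure but does not define it. This sketch assumes the form commonly given in the biclustering literature: the larger of the mean absolute pairwise Pearson correlation among rows and the same quantity among columns. The function names are hypothetical, zero-variance vectors are treated as uncorrelated by assumption, and the genetic algorithm and MapReduce parts are not shown.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return cov / (norm_u * norm_v)


def mean_abs_pairwise_correlation(vectors):
    """Average |correlation| over all ordered pairs of distinct vectors."""
    m = len(vectors)
    if m < 2:
        return 0.0
    total = sum(
        abs(pearson(vectors[i], vectors[j]))
        for i in range(m)
        for j in range(m)
        if i != j
    )
    return total / (m * m - m)


def average_correlation_value(bicluster):
    """Assumed ACV: max of the row-wise and column-wise mean |correlation|."""
    rows = [list(r) for r in bicluster]
    cols = [list(c) for c in zip(*rows)]
    return max(mean_abs_pairwise_correlation(rows), mean_abs_pairwise_correlation(cols))
```

Under this definition a bicluster whose rows are scaled or shifted copies of one another scores 1.0, so a fitness function that rewards high ACV favours submatrices with a shared pattern rather than merely similar values.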
