Influence Maximization Algorithm Based on Reverse Reachable Set

Author(s):  
Gengxin Sun ◽  
Chih-Cheng Chen

Most of the existing influence maximization algorithms are not suitable for large-scale social networks due to their high time complexity or limited influence propagation range. Therefore, a D-RIS influence maximization algorithm is proposed based on the independent cascade model and combined with reverse reachable set sampling. Under the premise that the influence propagation function satisfies monotonicity and submodularity, the D-RIS algorithm uses an automatic debugging method to determine the critical value of the number of reverse reachable sets, which not only obtains a better influence propagation range but also greatly reduces the time complexity. The experimental results on the two real data sets of Slashdot and Epinions show that the D-RIS algorithm is close to the CELF algorithm and higher than the RIS, HighDegree, LIR, and pBmH algorithms in influence propagation range. At the same time, it is significantly better than the CELF and RIS algorithms in running time, which indicates that the D-RIS algorithm is more suitable for large-scale social networks.

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Gengxin Sun ◽  
Chih-Cheng Chen

Most of the existing influence maximization algorithms are not suitable for large-scale social networks due to their high time complexity or limited influence propagation range. Therefore, a D-RIS (dynamic-reverse reachable set) influence maximization algorithm is proposed based on the independent cascade model and combined with reverse reachable set sampling. Under the premise that the influence propagation function satisfies monotonicity and submodularity, the D-RIS algorithm uses an automatic debugging method to determine the critical value of the number of reverse reachable sets, which not only obtains a better influence propagation range but also greatly reduces the time complexity. The experimental results on the two real datasets of Slashdot and Epinions show that the D-RIS algorithm is close to the CELF (cost-effective lazy-forward) algorithm and higher than the RIS, HighDegree, LIR, and pBmH (population-based metaheuristics) algorithms in influence propagation range. At the same time, it is significantly better than the CELF and RIS algorithms in running time, which indicates that the D-RIS algorithm is more suitable for large-scale social networks.
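The reverse reachable (RR) set sampling that the abstract builds on can be illustrated with a minimal Python sketch. It is not the authors' D-RIS implementation: the function names, the uniform edge probability `p`, and the fixed number of RR sets are assumptions made for illustration, and the abstract's automatic method for choosing the number of RR sets is not reproduced here.

```python
import random
from collections import defaultdict


def sample_rr_set(nodes, in_neighbors, p, rng):
    """Sample one RR set under the independent cascade model.

    Assumption: every edge is live with the same probability p.
    """
    root = rng.choice(nodes)
    rr_set = {root}
    frontier = [root]
    while frontier:
        v = frontier.pop()
        for u in in_neighbors.get(v, ()):
            # Walk edge (u, v) backwards if it turns out to be live.
            if u not in rr_set and rng.random() < p:
                rr_set.add(u)
                frontier.append(u)
    return rr_set


def select_seeds(rr_sets, k):
    """Greedy maximum coverage: pick up to k nodes hitting the most RR sets."""
    covers = defaultdict(set)  # node -> indices of the RR sets containing it
    for i, rr in enumerate(rr_sets):
        for node in rr:
            covers[node].add(i)
    seeds, covered = [], set()
    for _ in range(k):
        best = max(covers, key=lambda n: len(covers[n] - covered), default=None)
        if best is None or not covers[best] - covered:
            break  # no remaining node covers a new RR set
        seeds.append(best)
        covered |= covers[best]
    # The covered fraction, scaled by the node count, estimates the spread.
    return seeds, len(covered) / len(rr_sets)
```

To use the sketch, build `in_neighbors` as a mapping from each node to the nodes with an edge pointing to it, draw a chosen number of RR sets with `sample_rr_set`, and pass them to `select_seeds`. The greedy step relies on the monotonicity and submodularity mentioned in the abstract.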


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of the K-means. The first phase executes the CA. CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm. This algorithm can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperform those of the existing algorithms under both sequential and parallel conditions.
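The second phase described above, the Lloyd iteration, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the covering algorithm of the first phase is not specified in the abstract, so the sketch simply takes the initial centers as an argument, and the function name `lloyd` and the Euclidean distance are choices made here, not details from the paper.

```python
import math


def lloyd(points, centers, max_iter=100):
    """Run Lloyd iterations from given initial centers; k is len(centers)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels
```

In this sketch the number of clusters is fixed by however many initial centers are passed in, which is the role the abstract assigns to the first phase.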


2013 ◽  
Vol 312 ◽  
pp. 771-776
Author(s):  
Min Juan Zheng ◽  
Guo Jian Cheng ◽  
Fei Zhao

The quadratic programming problem in the standard support vector machine (SVM) algorithm has high time and space complexity on large-scale problems, which becomes a bottleneck in SVM applications. The Ball Vector Machine (BVM) converts the quadratic programming problem of the traditional SVM into the minimum enclosing ball (MEB) problem. It obtains the solution of the quadratic program indirectly by solving the MEB problem, which significantly reduces the time and space complexity. Experiments on five large-scale, high-dimensional data sets show that the BVM and the standard SVM achieve comparable accuracy, but the BVM is faster and requires less space than the standard SVM.
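To give a feel for the MEB problem that the BVM reduces to, here is a minimal Python sketch of a simple iterative MEB approximation (the Badoiu-Clarkson update, which repeatedly steps toward the farthest point). It is an assumption-laden illustration, not the BVM itself: it works on plain Euclidean points rather than in a kernel feature space, and the function name `approx_meb` and the iteration count are hypothetical.

```python
import math


def approx_meb(points, iterations=200):
    """Approximate the minimum enclosing ball of a list of points.

    Uses the farthest-point update c <- c + (p - c) / (i + 1);
    more iterations give a tighter approximation.
    """
    center = tuple(float(x) for x in points[0])
    for i in range(1, iterations + 1):
        # Find the point currently farthest from the center.
        far = max(points, key=lambda p: math.dist(p, center))
        # Move the center a shrinking step toward that point.
        center = tuple(c + (f - c) / (i + 1) for c, f in zip(center, far))
    radius = max(math.dist(p, center) for p in points)
    return center, radius
```

Each step touches only the farthest point, which hints at why an MEB formulation can be cheaper than solving the full quadratic program directly.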


2019 ◽  
Author(s):  
Jan A Freudenthal ◽  
Simon Pfaff ◽  
Niklas Terhoeven ◽  
Arthur Korte ◽  
Markus J Ankenbrand ◽  
...  

Abstract Background Chloroplasts are intracellular organelles that enable plants to conduct photosynthesis. They arose through the symbiotic integration of a prokaryotic cell into a eukaryotic host cell and still contain their own genomes with distinct genomic information. Plastid genomes accommodate essential genes and are regularly utilized in biotechnology or phylogenetics. Different assemblers that are able to assess the plastid genome have been developed. These assemblers often use data from whole-genome sequencing experiments, which usually contain reads from the complete chloroplast genome. Results The performance of different assembly tools has never been systematically compared. Here we present a benchmark of seven chloroplast assembly tools, capable of succeeding in more than 60% of known real data sets. Our results show significant differences between the tested assemblers in terms of generating whole chloroplast genome sequences and computational requirements. The examination of 105 data sets from species with unknown plastid genomes leads to the assembly of 20 novel chloroplast genomes. Conclusions We create docker images for each tested tool that are freely available for the scientific community and ensure reproducibility of the analyses. These containers allow the analysis and screening of data sets for chloroplast genomes using standard computational infrastructure. Thus, large-scale screening for chloroplasts within genomic sequencing data is feasible.


2011 ◽  
Vol 271-273 ◽  
pp. 1451-1454
Author(s):  
Gang Zhang ◽  
Jian Yin ◽  
Liang Lun Cheng ◽  
Chun Ru Wang

Teaching quality is a key metric in evaluating the effectiveness and ability of college teaching. In much of the previous literature, evaluation of this metric depends merely on the subjective judgment of a few experts based on their experience, which leads to false, biased, or unstable results. Moreover, purely human-based evaluation is expensive and difficult to extend to a large scale. With the application of information technology, much information about college teaching is recorded and stored electronically, which lays the foundation for computer-aided analysis. In this paper, we perform teaching quality evaluation within a machine learning framework, focusing on learning and modeling electronic information associated with the quality of teaching, to obtain a stable model that describes the underlying principles of teaching quality. An Artificial Neural Network (ANN) is selected as the main model in this work. Experimental results on real data sets consisting of 4 subjects / 8 semesters show the effectiveness of the proposed method.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Jan A. Freudenthal ◽  
Simon Pfaff ◽  
Niklas Terhoeven ◽  
Arthur Korte ◽  
Markus J. Ankenbrand ◽  
...  

Abstract Background Chloroplasts are intracellular organelles that enable plants to conduct photosynthesis. They arose through the symbiotic integration of a prokaryotic cell into a eukaryotic host cell and still contain their own genomes with distinct genomic information. Plastid genomes accommodate essential genes and are regularly utilized in biotechnology or phylogenetics. Different assemblers that are able to assess the plastid genome have been developed. These assemblers often use data from whole-genome sequencing experiments, which usually contain reads from the complete chloroplast genome. Results The performance of different assembly tools has never been systematically compared. Here, we present a benchmark of seven chloroplast assembly tools, capable of succeeding in more than 60% of known real data sets. Our results show significant differences between the tested assemblers in terms of generating whole chloroplast genome sequences and computational requirements. The examination of 105 data sets from species with unknown plastid genomes leads to the assembly of 20 novel chloroplast genomes. Conclusions We create docker images for each tested tool that are freely available for the scientific community and ensure reproducibility of the analyses. These containers allow the analysis and screening of data sets for chloroplast genomes using standard computational infrastructure. Thus, large-scale screening for chloroplasts within genomic sequencing data is feasible.


2014 ◽  
Vol 10 (S306) ◽  
pp. 51-53
Author(s):  
Sebastian Dorn ◽  
Erandy Ramirez ◽  
Kerstin E. Kunze ◽  
Stefan Hofmann ◽  
Torsten A. Enßlin

Abstract The presence of multiple fields during inflation might seed a detectable amount of non-Gaussianity in the curvature perturbations, which in turn becomes observable in present data sets like the cosmic microwave background (CMB) or the large-scale structure (LSS). In this proceeding we present a fully analytic method to infer inflationary parameters from observations by exploiting higher-order statistics of the curvature perturbations. To keep this analyticity, and thereby to dispense with numerically expensive sampling techniques, a saddle-point approximation is introduced whose precision has been validated for a numerical toy example. Applied to real data, this approach might make it possible to discriminate among the still viable models of inflation.


Energies ◽  
2020 ◽  
Vol 13 (5) ◽  
pp. 1085
Author(s):  
Syed Naeem Haider ◽  
Qianchuan Zhao ◽  
Xueliang Li

Prediction of a battery’s health in data centers plays a significant role in Battery Management Systems (BMS). Data centers use thousands of batteries, and their lifespan ultimately decreases over time. Predicting a battery’s degradation status before the first failure is encountered during its discharge cycle is critical, but it is also a very difficult task in practice. Therefore, a framework is proposed to improve the accuracy of the Auto-Regressive Integrated Moving Average (ARIMA) model for forecasting battery health with clustered predictors. Clustering approaches, such as Dynamic Time Warping (DTW) or k-shape-based clustering, are useful for finding patterns in data sets with multiple time series. The large number of batteries in a data center is exploited to cluster the voltage patterns, which are then used to improve the accuracy of the ARIMA model. Our proposed work shows that the forecasting accuracy of the ARIMA model is significantly improved by applying the results of the clustered predictor for batteries in a real data center. This paper uses actual historical data from 40 batteries in a large-scale data center over one whole year to validate the effectiveness of the proposed methodology.
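The Dynamic Time Warping distance mentioned above as a basis for clustering voltage patterns can be sketched with the textbook dynamic program. This is an illustration only: the function name `dtw_distance`, the absolute-difference local cost, and the absence of a warping window are assumptions, and the abstract does not say which DTW variant the authors use.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance between two numeric series."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessors.
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

Two series with the same shape but different timing get a small distance, which is why such a measure suits grouping batteries whose voltage curves look alike but are shifted or stretched in time. A pairwise distance matrix built this way could then be passed to any clustering routine that accepts precomputed distances.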


2021 ◽  
Author(s):  
Kieran Elmes ◽  
Astra Heywood ◽  
Zhiyi Huang ◽  
Alex Gavryushkin

Abstract Large-scale genotype-phenotype screens provide a wealth of data for identifying molecular alterations associated with a phenotype. Epistatic effects play an important role in such association studies. For example, siRNA perturbation screens can be used to identify pairwise gene-silencing effects. In bacteria, epistasis has practical consequences in determining antimicrobial resistance as the genetic background of a strain plays an important role in determining resistance. Existing computational tools that account for epistasis do not scale to human exome-wide screens and struggle with genetically diverse bacterial species such as Pseudomonas aeruginosa. Combining earlier work in interaction detection with recent advances in integer compression, we present a method for epistatic interaction detection on sparse (human) exome-scale data, and an R implementation in the package Pint. Our method takes advantage of sparsity in the input data and recent progress in integer compression to perform lasso-penalised linear regression on all pairwise combinations of the input, estimating up to 200 million potential effects, including epistatic interactions. Hence the human exome is within the reach of our method, assuming one parameter per gene and one parameter per epistatic effect for every pair of genes. We demonstrate Pint on both simulated and real data sets, including antibiotic resistance testing and siRNA perturbation screens.
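The parameter count behind "up to 200 million potential effects" can be checked with a short Python sketch. The figure of roughly 20,000 genes is an assumption made here for illustration and is not stated in the abstract. The second function shows one way pairwise interaction columns can be formed cheaply when the input is sparse and binary; it is a hypothetical illustration and not the Pint implementation.

```python
from itertools import combinations


def n_effects(n_genes):
    """One parameter per gene plus one per unordered pair of genes."""
    return n_genes + n_genes * (n_genes - 1) // 2


def interaction_columns(columns):
    """Build pairwise interaction columns for sparse binary data.

    `columns` maps a gene name to the set of row indices where it is 1.
    An interaction column is the intersection of two such sets; empty
    interactions are skipped, so sparsity keeps the output small.
    """
    out = {}
    for a, b in combinations(sorted(columns), 2):
        rows = columns[a] & columns[b]
        if rows:
            out[(a, b)] = rows
    return out


# With the assumed 20,000 genes: 20,000 main effects + 199,990,000 pairs.
print(n_effects(20_000))  # 200010000
```

Under that assumption the total is just over 200 million, which is consistent with the scale quoted in the abstract.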

