A linear-time algorithm that avoids inverses and computes jackknife (leave-one-out) products like convolutions or other operators in commutative semigroups

2020 ◽  
Author(s):  
John Spouge ◽  
Joseph M. Ziegelbauer ◽  
Mileidy Gonzalez

Abstract

Background: Data about herpesvirus microRNA motifs on human circular RNAs suggested the following statistical question. Consider independent random counts, not necessarily identically distributed. Conditioned on the sum, decide whether one of the counts is unusually large. Exact computation of the p-value leads to a specific algorithmic problem. Given $n$ elements $g_0, g_1, \ldots, g_{n-1}$ in a set $G$ with the closure and associative properties and a commutative product without inverses, compute the jackknife (leave-one-out) products $\bar{g}_j = g_0 g_1 \cdots g_{j-1} g_{j+1} \cdots g_{n-1}$ ($0 \le j < n$).

Results: This article gives a linear-time Jackknife Product algorithm. Its upward phase constructs a standard segment tree for computing segment products like $g_{[i,j)} = g_i g_{i+1} \cdots g_{j-1}$; its novel downward phase mirrors the upward phase while exploiting the symmetry of $g_j$ and its complement $\bar{g}_j$. The algorithm requires storage for $2n$ elements of $G$ and only about $3n$ products. In contrast, the standard segment tree algorithms require about $n$ products for construction and $\log_2 n$ products for calculating each $\bar{g}_j$, i.e., about $n \log_2 n$ products in total; and a naïve quadratic algorithm using $n-2$ element-by-element products to compute each $\bar{g}_j$ requires $n(n-2)$ products.

Conclusions: In the herpesvirus application, the Jackknife Product algorithm required 15 minutes; standard segment tree algorithms would have taken an estimated 3 hours; and the quadratic algorithm, an estimated 1 month. The Jackknife Product algorithm has many possible uses in bioinformatics and statistics.
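
The upward/downward idea can be illustrated with a minimal Python sketch, assuming `op` is any associative, commutative two-argument function (no identity or inverse is required). The name `jackknife_products` and the array layout (a 2n-slot list with leaves at indices n to 2n - 1, internal node i having children 2i and 2i + 1) are illustrative choices, not the authors' published code. The upward loop makes n - 1 calls to `op` and the downward loop 2(n - 2), so about 3n in total.

```python
def jackknife_products(g, op):
    """Return [product of all g[i] with i != j, for j in range(len(g))].

    `op` must be associative and commutative; it is never inverted.
    """
    n = len(g)
    if n < 2:
        raise ValueError("need at least two elements")
    # Segment tree in a 2n-slot list (slot 0 unused): leaves at tree[n:2n],
    # internal node i has children 2i and 2i + 1.
    tree: list = [None] * n + list(g)
    # Upward phase: each internal node holds the product of the leaves below it.
    for i in range(n - 1, 0, -1):
        tree[i] = op(tree[2 * i], tree[2 * i + 1])
    # Downward phase: overwrite every node with its complement, i.e. the
    # product of all leaves NOT below it. The root's complement is empty,
    # so the root's two children simply take each other's values.
    tree[2], tree[3] = tree[3], tree[2]
    for i in range(2, n):
        # tree[i] now holds the complement of node i (its parent was done first).
        left, right = tree[2 * i], tree[2 * i + 1]
        tree[2 * i] = op(tree[i], right)     # complement of the left child
        tree[2 * i + 1] = op(tree[i], left)  # complement of the right child
    # Each leaf now holds its jackknife product.
    return tree[n:]
```

Because nothing is ever divided out, zeros cause no trouble: `jackknife_products([2, 0, 3, 5], lambda a, b: a * b)` gives `[0, 30, 0, 0]`, whereas "total product divided by $g_j$" would fail. Any commutative semigroup works, e.g. `max` or set union.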

2020 ◽  
Author(s):  
John Spouge ◽  
Joseph M. Ziegelbauer ◽  
Mileidy Gonzalez

Abstract

Background: Data about herpesvirus microRNA motifs on human circular RNAs suggested the following statistical question. Consider independent random counts, not necessarily identically distributed. Conditioned on the sum, decide whether one of the counts is unusually large. Exact computation of the p-value leads to a specific algorithmic problem. Given $n$ elements $g_0, g_1, \ldots, g_{n-1}$ in a set $G$ with the closure and associative properties and a commutative product without inverses, compute the jackknife (leave-one-out) products $\bar{g}_j = g_0 g_1 \cdots g_{j-1} g_{j+1} \cdots g_{n-1}$ ($0 \le j < n$).

Results: This article gives a linear-time Jackknife Product algorithm. Its upward phase constructs a standard segment tree for computing segment products like $g_{[i,j)} = g_i g_{i+1} \cdots g_{j-1}$; its novel downward phase mirrors the upward phase while exploiting the symmetry of $g_j$ and its complement $\bar{g}_j$. The algorithm requires storage for $2n$ elements of $G$ and only about $3n$ products. In contrast, the standard segment tree algorithms require about $n$ products for construction and $\log_2 n$ products for calculating each $\bar{g}_j$, i.e., about $n \log_2 n$ products in total; and a naïve quadratic algorithm using $n-2$ element-by-element products to compute each $\bar{g}_j$ requires $n(n-2)$ products.

Conclusions: In the herpesvirus application, the Jackknife Product algorithm required 15 minutes; standard segment tree algorithms would have taken an estimated 3 hours; and the quadratic algorithm, an estimated 1 month. The Jackknife Product algorithm has many possible uses in bioinformatics and statistics.
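
The statistical question in the Background can be made concrete. If $S$ is the sum of all counts and $S_{-j}$ the sum of all counts except $X_j$, independence gives $P(X_j = k \mid S = s) = P(X_j = k)\,P(S_{-j} = s - k) / P(S = s)$, so each conditional p-value needs the distribution of $S_{-j}$, which is a jackknife convolution. The Python sketch below assumes each count has a finite probability mass function given as a list, and that the observed total has nonzero probability. It uses plain prefix and suffix convolutions, a simpler inverse-free scheme substituted here for illustration rather than the paper's segment-tree algorithm; the function names and the one-sided upper-tail p-value are assumptions, not details taken from the abstract.

```python
from typing import List

Pmf = List[float]  # pmf[k] = P(X = k) for k = 0, 1, ..., len(pmf) - 1


def convolve(a: Pmf, b: Pmf) -> Pmf:
    """pmf of X + Y for independent X ~ a and Y ~ b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, p in enumerate(a):
        for k, q in enumerate(b):
            out[i + k] += p * q
    return out


def leave_one_out_sums(pmfs: List[Pmf]) -> List[Pmf]:
    """For each j, the pmf of the sum of all counts except X_j.

    Uses prefix/suffix convolutions; no deconvolution (inverse) is needed.
    """
    n = len(pmfs)
    prefix: List[Pmf] = [[1.0]]  # point mass at 0 = pmf of an empty sum
    for p in pmfs[:-1]:
        prefix.append(convolve(prefix[-1], p))  # prefix[j]: X_0 + ... + X_{j-1}
    out: List[Pmf] = [[] for _ in range(n)]
    suffix: Pmf = [1.0]  # running pmf of X_{j+1} + ... + X_{n-1}
    for j in range(n - 1, -1, -1):
        out[j] = convolve(prefix[j], suffix)
        suffix = convolve(suffix, pmfs[j])
    return out


def conditional_upper_pvalues(pmfs: List[Pmf], counts: List[int]) -> List[float]:
    """P(X_j >= counts[j] | X_0 + ... + X_{n-1} = sum(counts)) for each j."""
    s = sum(counts)
    pvals = []
    for pmf, rest, c in zip(pmfs, leave_one_out_sums(pmfs), counts):
        # joint[k] = P(X_j = k) * P(sum of the other counts = s - k)
        joint = [
            pmf[k] * rest[s - k] if k < len(pmf) and s - k < len(rest) else 0.0
            for k in range(s + 1)
        ]
        pvals.append(sum(joint[c:]) / sum(joint))  # sum(joint) = P(S = s)
    return pvals
```

For example, with three fair 0/1 counts and observed counts `[1, 0, 1]`, the p-value for the first count is $P(X_0 \ge 1 \mid S = 2) = 2/3$.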


2020 ◽  
Vol 15 (1) ◽  
Author(s):  
John L. Spouge ◽  
Joseph M. Ziegelbauer ◽  
Mileidy Gonzalez

Abstract

Background: Data about herpesvirus microRNA motifs on human circular RNAs suggested the following statistical question. Consider independent random counts, not necessarily identically distributed. Conditioned on the sum, decide whether one of the counts is unusually large. Exact computation of the p-value leads to a specific algorithmic problem. Given $n$ elements $g_0, g_1, \ldots, g_{n-1}$ in a set $G$ with the closure and associative properties and a commutative product without inverses, compute the jackknife (leave-one-out) products $\bar{g}_j = g_0 g_1 \cdots g_{j-1} g_{j+1} \cdots g_{n-1}$ ($0 \le j < n$).

Results: This article gives a linear-time Jackknife Product algorithm. Its upward phase constructs a standard segment tree for computing segment products like $g_{[i,j)} = g_i g_{i+1} \cdots g_{j-1}$; its novel downward phase mirrors the upward phase while exploiting the symmetry of $g_j$ and its complement $\bar{g}_j$. The algorithm requires storage for $2n$ elements of $G$ and only about $3n$ products. In contrast, the standard segment tree algorithms require about $n$ products for construction and $\log_2 n$ products for calculating each $\bar{g}_j$, i.e., about $n \log_2 n$ products in total; and a naïve quadratic algorithm using $n-2$ element-by-element products to compute each $\bar{g}_j$ requires $n(n-2)$ products.

Conclusions: In the herpesvirus application, the Jackknife Product algorithm required 15 min; standard segment tree algorithms would have taken an estimated 3 h; and the quadratic algorithm, an estimated 1 month. The Jackknife Product algorithm has many possible uses in bioinformatics and statistics.
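
As a baseline for the product counts quoted above, the naïve quadratic method is short enough to write directly. The helper below is a hypothetical Python illustration, not code from the paper. It wraps `op` in a counter so the $n(n-2)$ figure can be checked: each of the $n$ leave-one-out products multiplies $n-1$ elements and therefore needs $n-2$ calls. It assumes at least two elements.

```python
from functools import reduce
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def naive_jackknife(g: List[T], op: Callable[[T, T], T]) -> Tuple[List[T], int]:
    """Quadratic baseline: for each j, multiply the other n - 1 elements.

    Returns the jackknife products and the number of calls made to `op`.
    """
    calls = 0

    def counted(a: T, b: T) -> T:
        nonlocal calls
        calls += 1  # one semigroup product
        return op(a, b)

    # reduce over n - 1 elements without an initial value makes n - 2 calls.
    out = [reduce(counted, g[:j] + g[j + 1:]) for j in range(len(g))]
    return out, calls
```

For example, `naive_jackknife([2, 0, 3, 5], lambda a, b: a * b)` returns `([0, 30, 0, 0], 8)`, and indeed $8 = 4(4-2)$. It also serves as a brute-force reference when testing a faster jackknife routine.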


2017 ◽  
Author(s):  
Florian Wagner

The nonparametric minimum hypergeometric (mHG) test is a popular alternative to Kolmogorov-Smirnov (KS)-type tests for determining gene set enrichment. However, these approaches have not been compared to each other in a quantitative manner. Here, I first perform a simulation study to show that the mHG test is significantly more powerful than the one-sided KS test for detecting gene set enrichment. I then illustrate a shortcoming of the mHG test, which has motivated a semiparametric generalization of the test, termed the XL-mHG test. I describe an improved quadratic-time algorithm for the efficient calculation of exact XL-mHG p-values, as well as a linear-time algorithm for calculating a tighter upper bound for the p-value. Finally, I demonstrate that the XL-mHG test outperforms the one-sided KS test when applied to a reference gene expression study, and discuss general principles for analyzing gene set enrichment using the XL-mHG test. An efficient open-source Python/Cython implementation of the XL-mHG test is provided in the xlmhg package, available from PyPI and GitHub (https://github.com/flo-compbio/xlmhg) under an OSI-approved license.
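
The mHG statistic itself is easy to state: for a ranked gene list, take every cutoff $n$, count the $k$ gene-set members among the top $n$, compute the hypergeometric probability of seeing at least $k$ members, and keep the minimum. The Python sketch below computes only that statistic, using exact integer binomial coefficients. The function names are hypothetical, and it does not implement the XL-mHG restrictions, the exact p-value algorithm, or anything from the xlmhg package.

```python
from math import comb
from typing import Sequence, Tuple


def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    # math.comb(a, b) returns 0 when b > a, which handles impossible terms.
    num = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return num / comb(N, n)


def mhg_statistic(labels: Sequence[int]) -> Tuple[float, int]:
    """Minimum hypergeometric tail over all cutoffs of a ranked 0/1 list.

    labels[i] == 1 if the gene at rank i belongs to the gene set.
    Returns (mHG statistic, cutoff attaining it); cutoff 0 means no cutoff
    gave a tail probability below 1.
    """
    N, K = len(labels), sum(labels)
    best, best_n, k = 1.0, 0, 0
    for n in range(1, N + 1):
        k += labels[n - 1]  # gene-set members among the top n
        p = hypergeom_tail(N, K, n, k)
        if p < best:
            best, best_n = p, n
    return best, best_n
```

For the ranking `[1, 1, 0, 0]`, the minimum is reached at cutoff 2, where both gene-set members are on top: $\binom{2}{2}\binom{2}{0}/\binom{4}{2} = 1/6$.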


2019 ◽  
Vol 17 ◽  
Author(s):  
Xiaoli Yu ◽  
Lu Zhang ◽  
Na Li ◽  
Peng Hu ◽  
Zhaoqin Zhu ◽  
...  

Aim: We aimed to identify new plasma biomarkers for the diagnosis of pulmonary tuberculosis (PTB).

Background: Tuberculosis is an ancient infectious disease that remains one of the major global health problems. Effective, convenient, and affordable methods for diagnosing pulmonary tuberculosis are still lacking.

Objective: This study used label-free LC-MS/MS-based comparative proteomics of plasma from six tuberculosis patients and six healthy controls to identify differentially expressed proteins (DEPs).

Method: To reduce the influence of high-abundance proteins, albumin and globulin were removed from the plasma samples using affinity gels. DEPs were then identified using a label-free Quadrupole-Orbitrap LC-MS/MS system, and mass spectra were matched to peptides with the protein database search algorithm SEQUEST-HT. The predictive abilities of combinations of host markers were investigated by general discriminant analysis (GDA) with leave-one-out cross-validation (LOOCV).

Results: A total of 572 proteins were identified and 549 were quantified. With the threshold for differential expression set at an adjusted p-value < 0.05 and a fold change ≥ 1.5 or ≤ 0.6667, 32 DEPs were found. ClusterVis, TBtools, and STRING were used to find new potential biomarkers of PTB. Six proteins, LY6D, DSC3, CDSN, FABP5, SERPINB12, and SLURP1, performed well in LOOCV validation and were designated potential biomarkers. Both the percentage of original grouped cases and the percentage of cross-validated grouped cases correctly classified were at least 91.7%.

Conclusion: We successfully identified five candidate biomarkers for the immunodiagnosis of PTB in plasma: LY6D, DSC3, CDSN, SERPINB12, and SLURP1. Our work supports this group of proteins as potential biomarkers for pulmonary tuberculosis that are worthy of further validation.
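
Leave-one-out cross-validation, used in this study to validate the marker panel, refits the classifier once per subject, each time holding that subject out and predicting its group. The Python sketch below assumes each subject is a list of marker intensities. Because general discriminant analysis is beyond a short example, it substitutes a simple nearest-centroid classifier; the function names and toy data are hypothetical and unrelated to the study's data.

```python
from math import dist
from typing import Dict, Hashable, List


def nearest_centroid_predict(train_x: List[List[float]], train_y: List[Hashable],
                             x: List[float]) -> Hashable:
    """Predict the label whose class mean (centroid) is closest to x."""
    centroids: Dict[Hashable, List[float]] = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: dist(centroids[lab], x))


def loocv_accuracy(xs: List[List[float]], ys: List[Hashable]) -> float:
    """Fraction of subjects classified correctly when each is held out in turn."""
    correct = 0
    for j in range(len(xs)):
        # Train on everyone except subject j, then predict subject j.
        train_x = xs[:j] + xs[j + 1:]
        train_y = ys[:j] + ys[j + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[j]) == ys[j]:
            correct += 1
    return correct / len(xs)
```

On well-separated toy groups every held-out subject is recovered; moving one group-1 subject close to group 0 (for instance the one-marker data `[[0], [1], [2], [10], [11], [3]]` with labels `[0, 0, 0, 1, 1, 1]`) drops the LOOCV accuracy to 5/6.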


Mathematics ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 293
Author(s):  
Xinyue Liu ◽  
Huiqin Jiang ◽  
Pu Wu ◽  
Zehui Shao

For a simple graph $G=(V,E)$ with no isolated vertices, a total Roman {3}-dominating function (TR3DF) on $G$ is a function $f: V(G) \to \{0,1,2,3\}$ such that, for every vertex $v \in V(G)$: (i) $\sum_{w \in N(v)} f(w) \ge 3$ if $f(v) = 0$; (ii) $\sum_{w \in N(v)} f(w) \ge 2$ if $f(v) = 1$; and (iii) if $f(v) \ne 0$, then $v$ has a neighbor $u$ with $f(u) \ne 0$. The weight of a TR3DF $f$ is $f(V) = \sum_{v \in V(G)} f(v)$, and the minimum weight of a TR3DF on $G$ is called the total Roman {3}-domination number, denoted $\gamma_{t\{R3\}}(G)$. In this paper, we show that the total Roman {3}-domination problem is NP-complete for planar graphs and chordal bipartite graphs. Finally, we present a linear-time algorithm to compute $\gamma_{t\{R3\}}$ for trees.
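
The three conditions translate directly into a checker. The Python sketch below, with hypothetical function names and graphs given as adjacency lists, verifies a candidate TR3DF and finds $\gamma_{t\{R3\}}$ by exhaustive search over all $4^{|V|}$ labelings. It is only a brute-force reference for tiny graphs (for example, to test a faster implementation) and is not the paper's linear-time algorithm for trees.

```python
from itertools import product
from typing import Dict, List


def is_tr3df(adj: Dict[int, List[int]], f: Dict[int, int]) -> bool:
    """Check conditions (i)-(iii) for every vertex of a graph without isolated vertices."""
    for v, nbrs in adj.items():
        s = sum(f[u] for u in nbrs)  # total label on the open neighborhood N(v)
        if f[v] == 0 and s < 3:      # condition (i)
            return False
        if f[v] == 1 and s < 2:      # condition (ii)
            return False
        if f[v] != 0 and all(f[u] == 0 for u in nbrs):  # condition (iii)
            return False
    return True


def total_roman3_number(adj: Dict[int, List[int]]) -> int:
    """Minimum weight of a TR3DF, by brute force over all 4^|V| labelings."""
    verts = list(adj)
    best = 3 * len(verts)  # labeling every vertex 3 is always a TR3DF
    for values in product(range(4), repeat=len(verts)):
        f = dict(zip(verts, values))
        if sum(values) < best and is_tr3df(adj, f):
            best = sum(values)
    return best
```

On the path with two vertices this returns 3 (labels 2 and 1), and on the path with three vertices it returns 4 (labels 1, 2, 1).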


1976 ◽  
Author(s):  
A. K. Jones ◽  
R. J. Lipton ◽  
L. Snyder
