Parallel graph Laplacian for large datasets

Author(s):  
Abhishek ◽  
Eti Goel ◽  
Tulika Saxena ◽  
Shekhar Verma
2017 ◽  
Vol 25 (2) ◽  
pp. 927-960
Author(s):  
Jarod Jacobs

In this article, I discuss three statistical tools that have proven pivotal in linguistic research, particularly those studies that seek to evaluate large datasets. These tools are the Gaussian Curve, significance tests, and hierarchical clustering. I present a brief description of these tools and their general uses. Then, I apply them to an analysis of the variations between the “biblical” DSS and our other witnesses, focusing upon variations involving particles. Finally, I engage the recent debate surrounding the diachronic study of Biblical Hebrew. This article serves a dual function. First, it presents statistical tools that are useful for many linguistic studies. Second, it develops an analysis of the he-locale, as it is used in the “biblical” Dead Sea Scrolls, Masoretic Text, and Samaritan Pentateuch. Through that analysis, this article highlights the value of inferential statistical tools as we attempt to better understand the Hebrew of our ancient witnesses.


2018 ◽  
Author(s):  
Andrew Dalke ◽  
Jerome Hert ◽  
Christian Kramer

We present mmpdb, an open source Matched Molecular Pair (MMP) platform to create, compile, store, retrieve, and use MMP rules. mmpdb is suitable for the large datasets typically found in pharmaceutical and agrochemical companies and provides new algorithms for fragment canonicalization and stereochemistry handling. The platform is written in Python and based on the RDKit toolkit. mmpdb is freely available.


2012 ◽  
Vol 38 (11) ◽  
pp. 1831
Author(s):  
Wen-Jun HU ◽  
Shi-Tong WANG ◽  
Juan WANG ◽  
Wen-Hao YING

Author(s):  
Mark Newman

An introduction to the mathematical tools used in the study of networks. Topics discussed include: the adjacency matrix; weighted, directed, acyclic, and bipartite networks; multilayer and dynamic networks; trees; planar networks. Some basic properties of networks are then discussed, including degrees, density and sparsity, paths on networks, component structure, and connectivity and cut sets. The final part of the chapter focuses on the graph Laplacian and its applications to network visualization, graph partitioning, the theory of random walks, and other problems.


2020 ◽  
Vol 196 ◽  
pp. 105777
Author(s):  
Jadson Jose Monteiro Oliveira ◽  
Robson Leonardo Ferreira Cordeiro

2019 ◽  
Vol 7 (2) ◽  
pp. 24
Author(s):  
Aju J. Fenn ◽  
Lucas Gerdes ◽  
Samuel Rothstein

Using data from 2005 to 2016, this paper examines if players in the National Hockey League (NHL) are being paid a positive differential for their services due to the competition from the Kontinental Hockey League (KHL) and the Swedish Hockey League (SHL). In order to control for performance, we use two different large datasets, (N = 4046) and (N = 1717). In keeping with the existing literature, we use lagged performance statistics and dummy variables to control for the type of NHL contract. The first dataset contains lagged career performance statistics, while the performance statistics are based on the statistics generated during the years under the player’s previous contract. Fixed effects least squares (FELS) and quantile regression results suggest that player production statistics, contract status, and country of origin are significant determinants of NHL player salaries.


Author(s):  
Jürgen Jost ◽  
Raffaella Mulas ◽  
Florentin Münch

AbstractWe offer a new method for proving that the maxima eigenvalue of the normalized graph Laplacian of a graph with n vertices is at least $$\frac{n+1}{n-1}$$ n + 1 n - 1 provided the graph is not complete and that equality is attained if and only if the complement graph is a single edge or a complete bipartite graph with both parts of size $$\frac{n-1}{2}$$ n - 1 2 . With the same method, we also prove a new lower bound to the largest eigenvalue in terms of the minimum vertex degree, provided this is at most $$\frac{n-1}{2}$$ n - 1 2 .


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Danqing Xu ◽  
Chen Wang ◽  
Atlas Khan ◽  
Ning Shang ◽  
Zihuai He ◽  
...  

AbstractLabeling clinical data from electronic health records (EHR) in health systems requires extensive knowledge of human expert, and painstaking review by clinicians. Furthermore, existing phenotyping algorithms are not uniformly applied across large datasets and can suffer from inconsistencies in case definitions across different algorithms. We describe here quantitative disease risk scores based on almost unsupervised methods that require minimal input from clinicians, can be applied to large datasets, and alleviate some of the main weaknesses of existing phenotyping algorithms. We show applications to phenotypic data on approximately 100,000 individuals in eMERGE, and focus on several complex diseases, including Chronic Kidney Disease, Coronary Artery Disease, Type 2 Diabetes, Heart Failure, and a few others. We demonstrate that relative to existing approaches, the proposed methods have higher prediction accuracy, can better identify phenotypic features relevant to the disease under consideration, can perform better at clinical risk stratification, and can identify undiagnosed cases based on phenotypic features available in the EHR. Using genetic data from the eMERGE-seq panel that includes sequencing data for 109 genes on 21,363 individuals from multiple ethnicities, we also show how the new quantitative disease risk scores help improve the power of genetic association studies relative to the standard use of disease phenotypes. The results demonstrate the effectiveness of quantitative disease risk scores derived from rich phenotypic EHR databases to provide a more meaningful characterization of clinical risk for diseases of interest beyond the prevalent binary (case-control) classification.


Sign in / Sign up

Export Citation Format

Share Document