Predicting the NHL playoffs with PageRank

Author(s):  
Nathan Swanson ◽  
Donald Koban ◽  
Patrick Brundage

Abstract Applying Google’s PageRank model to sports is a popular concept in contemporary sports ranking. However, there is limited evidence that rankings generated with PageRank models do well at predicting the winners of playoff series. In this paper, we use a PageRank model to predict the outcomes of the 2008–2016 NHL playoffs. Unlike previous studies that use a uniform personalization vector, we incorporate Corsi statistics into a personalization vector, use a nine-fold cross validation to identify tuning parameters, and evaluate the prediction accuracy of the tuned model. We found our ratings had a 70% accuracy for predicting the outcome of playoff series, outperforming the Colley, Massey, Bradley-Terry, Maher, and Generalized Markov models by 5%. The implication of our results is that fitting parameter values and adding a personalization vector can lead to improved performance when using PageRank models.
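As a rough illustration of the idea in this abstract, the sketch below runs PageRank by power iteration with a non-uniform personalization vector. It is not the authors' model: the convention that each loss is a "vote" from loser to winner, the three-team schedule, the `corsi_share` values, and the damping factor of 0.85 are all assumptions made for the example.

```python
def personalized_pagerank(losses, personalization, damping=0.85,
                          tol=1e-12, max_iter=1000):
    """Power iteration for PageRank with a personalization vector.

    losses[i][j] is the number of times team i lost to team j, so every
    loss acts as a vote from the loser to the winner (an assumed convention).
    """
    n = len(losses)
    total = sum(personalization)
    v = [p / total for p in personalization]  # normalize to sum to 1
    rank = v[:]
    for _ in range(max_iter):
        # Teleportation term, weighted by the personalization vector.
        new_rank = [(1.0 - damping) * v[j] for j in range(n)]
        for i in range(n):
            out = sum(losses[i])
            for j in range(n):
                # A team with no losses casts no votes; spread its rank by v.
                share = losses[i][j] / out if out > 0 else v[j]
                new_rank[j] += damping * rank[i] * share
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical three-team season (teams A, B, C in index order 0, 1, 2).
losses = [
    [0, 0, 1],  # A lost once to C
    [2, 0, 0],  # B lost twice to A
    [1, 1, 0],  # C lost once to A and once to B
]
corsi_share = [0.52, 0.49, 0.47]  # made-up shot-attempt shares per team
ratings = personalized_pagerank(losses, corsi_share)
```

Under these assumptions, a series prediction would simply pick the team with the higher rating. The damping factor is the kind of tuning parameter that cross-validation could select.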

2021 ◽  
Vol 17 (2) ◽  
pp. e1008767
Author(s):  
Zutan Li ◽  
Hangjin Jiang ◽  
Lingpeng Kong ◽  
Yuanyuan Chen ◽  
Kun Lang ◽  
...  

N6-methyladenine (6mA) is an important DNA modification form associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA’s biological functions. However, the existing experimental techniques for detecting 6mA sites are cost-ineffective, which implies a great need for new computational methods for this problem. In this paper, we developed, without requiring any prior knowledge of 6mA or manually crafted sequence features, a deep learning framework named Deep6mA to identify DNA 6mA sites, and its performance is superior to other DNA 6mA prediction tools. Specifically, the 5-fold cross-validation on a benchmark dataset of rice gives the sensitivity and specificity of Deep6mA as 92.96% and 95.06%, respectively, and the overall prediction accuracy is 94%. Importantly, we find that the sequences with 6mA sites share similar patterns across different species. The model trained with rice data predicts the 6mA sites of three other species well: Arabidopsis thaliana, Fragaria vesca and Rosa chinensis, with a prediction accuracy over 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which means the sequence near the 6mA site may be conserved; (2) 6mA is enriched in the TATA box of the promoter, which may be the main source of its regulating downstream gene expression.
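The three reported figures are linked through the confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical, chosen so that a balanced set of 10,000 positive and 10,000 negative sequences reproduces the quoted sensitivity and specificity. The abstract does not state the class balance, so that part is an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)  # share of true 6mA sites that are detected
    specificity = tn / (tn + fp)  # share of non-6mA sites correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for a balanced benchmark (assumption, not from the paper).
sens, spec, acc = classification_metrics(tp=9296, fn=704, tn=9506, fp=494)
```

With balanced classes, accuracy is the mean of sensitivity and specificity, which is consistent with the roughly 94% overall accuracy quoted above.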


Author(s):  
Navneet Kaur ◽  
Santosh Srivatsav ◽  
Nemiraja Jadiyappa ◽  
Parneet Kaur

Modern portfolio theory claims that diversification into non-correlated or negatively correlated activities reduces the overall risk of a portfolio. Considering the total income of a bank as a portfolio of interest income and non-interest income, this paper investigates how the variability of interest income and non-interest income, and the covariance between interest income and non-interest income, influence the various risk factors of banks. We set out a study in the Indian context. We have extracted data for the period 2005-2017 and employed an extended version of Ridge, Lasso and Elastic Net regression to take care of multicollinearity in our data. We have used 10-fold cross-validation to get optimal values of tuning parameters for Ridge, Lasso, and Elastic Net regression (which is a convex combination of Ridge and Lasso). We have compared the different regression techniques by comparing RMSE and R². We observe that non-interest income is positively correlated with interest income in the Indian context, but it does stabilize variance, idiosyncratic risk and market risk (Beta) of Indian banks.
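The tuning step described here can be sketched in a few lines. The example below is a deliberate simplification and not the authors' procedure: it tunes only a ridge penalty, for a single predictor without an intercept, so that the fit has a closed form. The synthetic data, the penalty grid, and all function names are assumptions for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for one predictor, no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_rmse(xs, ys, lam, k=10, seed=0):
    """Out-of-fold RMSE for a given ridge penalty lam."""
    sq_err = 0.0
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        beta = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        sq_err += sum((ys[i] - beta * xs[i]) ** 2 for i in held_out)
    return (sq_err / len(xs)) ** 0.5


# Synthetic data: true slope 2 plus Gaussian noise (purely illustrative).
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

grid = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_rmse(xs, ys, lam) for lam in grid}
best_lam = min(scores, key=scores.get)  # penalty with the lowest CV RMSE
```

The same loop extends to Lasso or Elastic Net by swapping the fitting function and searching over the additional mixing parameter.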


2020 ◽  
Vol 10 (7) ◽  
pp. 2265-2273 ◽  
Author(s):  
Ahmad H. Sallam ◽  
Emily Conley ◽  
Dzianis Prakapenka ◽  
Yang Da ◽  
James A. Anderson

The use of haplotypes may improve the accuracy of genomic prediction over single SNPs because haplotypes can better capture linkage disequilibrium and genomic similarity in different lines and may capture local high-order allelic interactions. Additionally, prediction accuracy could be improved by portraying population structure in the calibration set. A set of 383 advanced lines and cultivars that represent the diversity of the University of Minnesota wheat breeding program was phenotyped for yield, test weight, and protein content and genotyped using the Illumina 90K SNP Assay. Population structure was confirmed using single SNPs. Haplotype blocks of 5, 10, 15, and 20 adjacent markers were constructed for all chromosomes. A multi-allelic haplotype prediction algorithm was implemented and compared with single SNPs using both k-fold cross validation and stratified sampling optimization. After confirming population structure, the stratified sampling improved the predictive ability compared with k-fold cross validation for yield and protein content, but reduced the predictive ability for test weight. In all cases, haplotype predictions outperformed single SNPs. Haplotypes of 15 adjacent markers showed the best improvement in accuracy for all traits; however, this was more pronounced in yield and protein content. The combined use of haplotypes of 15 adjacent markers and training population optimization significantly improved the predictive ability for yield and protein content by 14.3% (four percentage points) and 16.8% (seven percentage points), respectively, compared with using single SNPs and k-fold cross validation. These results emphasize the effectiveness of using haplotypes in genomic selection to increase genetic gain in self-fertilized crops.
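To make the block construction concrete, the sketch below groups adjacent markers into fixed-size blocks and gives each distinct haplotype within a block its own integer code, which is one simple way to obtain multi-allelic predictors. It is an illustration, not the algorithm used in the study. The marker strings, the block size of 2, and the assumption of fully homozygous lines (one allele per marker) are hypothetical.

```python
def haplotype_blocks(genotype, block_size):
    """Split one line's markers into blocks of adjacent markers.

    A trailing block shorter than block_size is kept as-is.
    """
    return [tuple(genotype[i:i + block_size])
            for i in range(0, len(genotype), block_size)]


def encode_multiallelic(lines, block_size):
    """Code each distinct haplotype within a block as an integer allele."""
    blocks_per_line = [haplotype_blocks(g, block_size) for g in lines]
    n_blocks = len(blocks_per_line[0])
    encoded = [[0] * n_blocks for _ in lines]
    for b in range(n_blocks):
        codes = {}  # haplotype -> integer code, in order of first appearance
        for li, blocks in enumerate(blocks_per_line):
            encoded[li][b] = codes.setdefault(blocks[b], len(codes))
    return encoded


# Three hypothetical inbred lines, six markers each, blocks of 2 markers.
lines = ["AACCGT", "AACCGA", "AGCCGT"]
encoded = encode_multiallelic(lines, block_size=2)
```

Lines that share a code in a block carry the same local haplotype, so a block can distinguish more than the two states available to a single biallelic SNP.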


Author(s):  
Zhihao Ke ◽  
Xiaoning Liu ◽  
Yining Chen ◽  
Hongfu Shi ◽  
Zigang Deng

Abstract By virtue of its self-stability and low energy consumption, high temperature superconducting (HTS) maglev has the potential to become a novel mode of transportation. As a key index for guaranteeing the lateral self-stability of HTS maglev, the guiding force is strongly non-linear and is determined by many factors, and these complexities impede further research. Compared with traditional finite element and polynomial fitting methods, deep learning algorithms could provide another approach to guiding force prediction, but this approach has not yet been verified. Therefore, this paper establishes 5 different neural network models (RBF, DNN, CNN, RNN, LSTM) to predict HTS maglev guiding force, and compares their prediction efficiency based on 3720 pieces of collected data. Meanwhile, two adaptively iterative algorithms for parameter matrix and learning rate adjustment are proposed, which could effectively reduce computing time and unnecessary iterations. The results reveal that the DNN model shows the best goodness of fit, while the LSTM model displays the smoothest fitting curve for guiding force prediction. Based on this finding, the effects of learning rate and number of iterations on the prediction accuracy of the constructed DNN model are studied. The learning rate and number of iterations at the highest guiding force prediction accuracy are 0.00025 and 90000, respectively. Moreover, the K-fold cross validation method is also applied to this DNN model, and the result demonstrates the model’s generalization and robustness. The necessity of the K-fold cross validation method for ensuring the universality of the guiding force prediction model is likewise assessed. This paper is the first to combine HTS maglev guiding force prediction with deep learning algorithms, considering different field cooling heights, real-time magnetic flux density, liquid nitrogen temperature and motion direction of the bulk. Additionally, this paper gives a convenient and efficient method for HTS guiding force prediction and parameter optimization.


2015 ◽  
Vol 11 (1) ◽  
pp. 13-19 ◽  
Author(s):  
Mohamed N. Triba ◽  
Laurence Le Moyec ◽  
Roland Amathieu ◽  
Corentine Goossens ◽  
Nadia Bouchemal ◽  
...  

In some cases, quality parameter values (the number of significant components, Q², CV-ANOVA p-value, …) of PLS/OPLS models calculated with K-fold cross-validation can be strongly determined by the composition of the different validation subsets.


Author(s):  
Marcus O. Olatoye ◽  
Zhenbin Hu ◽  
Geoffrey P. Morris

Abstract Modifying plant architecture is often necessary for yield improvement and climate adaptation, but we lack understanding of the genotype-phenotype map for plant morphology in sorghum. Here, we use a nested association mapping (NAM) population that captures global allelic diversity of sorghum to characterize the genetics of leaf erectness, leaf width (at two stages), and stem diameter. Recombinant inbred lines (n = 2200) were phenotyped in multiple environments (35,200 observations) and joint linkage mapping was performed with ∼93,000 markers. Fifty-four QTL of small to large effect were identified for trait BLUPs (9–16 per trait) each explaining 0.4–4% of variation across the NAM population. While some of these QTL colocalize with sorghum homologs of grass genes [e.g. involved in hormone synthesis (maize spi1), floral transition (SbCN8), and transcriptional regulation of development (rice Ideal plant architecture1)], most QTL did not colocalize with an a priori candidate gene (82%). Genomic prediction accuracy was generally high in five-fold cross-validation (0.65–0.83), and varied from low to high in leave-one-family-out cross-validation (0.04–0.61). The findings provide a foundation to identify the molecular basis of architecture variation in sorghum and establish genomic-enabled breeding for improved plant architecture.

Core ideas
Understanding the genetics of plant architecture could facilitate the development of crop ideotypes for yield and adaptation.
The genetics of plant architecture traits was characterized in sorghum using multi-environment phenotyping in a global nested association mapping population.
Fifty-five quantitative trait loci were identified; some colocalize with homologs of known developmental regulators but most do not.
Genomic prediction accuracy was consistently high in five-fold cross-validation, but accuracy varied considerably in leave-one-family-out predictions.
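The two validation schemes compared in this abstract differ only in how lines are assigned to the held-out set. A minimal sketch of the leave-one-family-out split is given below. The family labels are hypothetical and the function name is an assumption for illustration.

```python
def leave_one_family_out(families):
    """Yield (held_out_family, train_indices, test_indices) per family.

    families[i] is the family label of line i. Every line from the
    held-out family goes to the test set, so no relative from that
    family is left in the training set.
    """
    for fam in sorted(set(families)):
        test = [i for i, f in enumerate(families) if f == fam]
        train = [i for i, f in enumerate(families) if f != fam]
        yield fam, train, test


# Six hypothetical recombinant inbred lines from three NAM families.
families = ["F1", "F1", "F2", "F2", "F2", "F3"]
splits = list(leave_one_family_out(families))
```

In random five-fold cross-validation, siblings of each test line usually remain in the training set. This is a plausible reason for the higher accuracies reported under that scheme, although the abstract itself does not give an explanation.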


2021 ◽  
Vol 11 (16) ◽  
pp. 7731
Author(s):  
Rao Zeng ◽  
Minghong Liao

DNA methylation is one of the most extensive epigenetic modifications. DNA N6-methyladenine (6mA) plays a key role in many biological regulation processes. An accurate and reliable genome-wide identification of 6mA sites is crucial for systematically understanding its biological functions. Some machine learning tools can identify 6mA sites, but their limited prediction accuracy and lack of robustness limit their usability in epigenetic studies, which implies a great need for new computational methods for this problem. In this paper, we developed a novel computational predictor, namely 6mAPred-MSFF, which is a deep learning framework based on a multi-scale feature fusion mechanism to identify 6mA sites across different species. In the predictor, we integrate the inverted residual block and multi-scale attention mechanism to build lightweight and deep neural networks. Compared with existing predictors using traditional machine learning, our deep learning framework needs no prior knowledge of 6mA or manually crafted sequence features and captures the characteristics of 6mA sites better. In benchmark comparisons, our deep learning method outperforms the state-of-the-art methods on the 5-fold cross-validation test on the seven datasets of six species, demonstrating that the proposed 6mAPred-MSFF is more effective and generic. Specifically, our proposed 6mAPred-MSFF gives the sensitivity and specificity of the 5-fold cross-validation on the 6mA-rice-Lv dataset as 97.88% and 94.64%, respectively. Our model trained with the rice data predicts well the 6mA sites of five other species: Arabidopsis thaliana, Fragaria vesca, Rosa chinensis, Homo sapiens, and Drosophila melanogaster with a prediction accuracy of 98.51%, 93.02%, and 91.53%, respectively. Moreover, via experimental comparison, we explored the performance impact of training and testing our proposed model under different encoding schemes and feature descriptors.


2020 ◽  
Vol 25 (6) ◽  
pp. 4805-4830
Author(s):  
Davide Falessi ◽  
Jacky Huang ◽  
Likhita Narayana ◽  
Jennifer Fong Thai ◽  
Burak Turhan

Abstract We are in the shoes of a practitioner who uses previous project releases’ data to predict which classes of the current release are defect-prone. In this scenario, the practitioner would like to use the most accurate classifier among the many available ones. A validation technique, hereinafter “technique”, defines how to measure the prediction accuracy of a classifier. Several previous research efforts analyzed several techniques. However, no previous study compared validation techniques in the within-project across-release class-level context or considered techniques that preserve the order of data. In this paper, we investigate which technique recommends the most accurate classifier. We use the last release of a project as the ground truth to evaluate the classifier’s accuracy and hence the ability of a technique to recommend an accurate classifier. We consider nine classifiers, two industry and 13 open projects, and three validation techniques: namely 10-fold cross-validation (i.e., the most used technique), bootstrap (i.e., the recommended technique), and walk-forward (i.e., a technique preserving the order of data). Our results show that: 1) classifiers differ in accuracy in all datasets regardless of their entity per value, 2) walk-forward outperforms both 10-fold cross-validation and bootstrap statistically in all three accuracy metrics: AUC of the selected classifier, bias and absolute bias, 3) surprisingly, all techniques proved more prone to overestimate than to underestimate the performances of classifiers, and 4) the defect rate changed between the second and first half in both industry projects and 83% of open-source datasets. This study recommends the use of techniques that preserve the order of data, such as walk-forward, over 10-fold cross-validation and bootstrap in the within-project across-release class-level context, given the above empirical results and that walk-forward is by nature simpler, less expensive, and more stable than the other two techniques.
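The order-preserving property of walk-forward can be shown with a short sketch. It assumes the common formulation in which the model is trained on all releases before release t and tested on release t. The paper's exact setup may differ, and the function name is hypothetical.

```python
def walk_forward_splits(n_releases):
    """Return (train_releases, test_release) pairs in chronological order.

    Release indices start at 0. Each split trains on every earlier
    release and tests on the next one, so no future data is used.
    """
    return [(list(range(t)), t) for t in range(1, n_releases)]


# Four hypothetical releases of one project.
splits = walk_forward_splits(4)
for train, test in splits:
    print(f"train on releases {train}, test on release {test}")
```

By contrast, 10-fold cross-validation shuffles classes across folds, so a model can be trained on data from a later release than the one it is tested on.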


2019 ◽  
Author(s):  
Zutan Li ◽  
Hangjin Jiang ◽  
Lingpeng Kong ◽  
Yuanyuan Chen ◽  
Liangyun Zhang ◽  
...  

Abstract N6-methyladenine (6mA) is an important DNA modification form associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA’s biological functions. In this paper, we developed, without requiring any prior knowledge of 6mA or manually crafted sequence features, a deep learning framework named Deep6mA to identify DNA 6mA sites, and its performance is superior to other DNA 6mA prediction tools. Specifically, the 5-fold cross-validation on a benchmark dataset of rice gives the sensitivity and specificity of Deep6mA as 92.96% and 95.06%, respectively, and the overall prediction accuracy is 94%. Importantly, we find that the sequences with 6mA sites share similar patterns across different species. The model trained with rice data predicts the 6mA sites of three other species well: Arabidopsis thaliana, Fragaria vesca, and Rosa chinensis, with a prediction accuracy over 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which means the sequence near the 6mA site may be conserved; (2) 6mA is enriched in the TATA box of the promoter, which may be the main source of its regulating downstream gene expression.


2018 ◽  
Vol 1 (1) ◽  
pp. 120-130 ◽  
Author(s):  
Chunxiang Qian ◽  
Wence Kang ◽  
Hao Ling ◽  
Hua Dong ◽  
Chengyao Liang ◽  
...  

A Support Vector Machine (SVM) model optimized by K-fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Meanwhile, several other models, such as Artificial Neural Network (ANN) and Decision Tree (DT), were also built and compared with the SVM to determine which one could make the most accurate predictions. The material factors and environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻ and Cl⁻, temperature and exposure time. It was concluded from the prediction results that the optimized SVM model appeared to perform better than the other models in predicting the concrete strength. Based on the SVM model, a simulation method of variable limitation was used to determine the sensitivity of various factors and the degree of influence of these factors on the degradation of concrete strength.

