lassosum2: an updated version complementing LDpred2

2021 ◽  
Author(s):  
Florian Privé ◽  
Bjarni J. Vilhjálmsson ◽  
Timothy S. H. Mak

Abstract We present lassosum2, a new version of the polygenic score method lassosum, which we re-implement in the R package bigsnpr. This new version uses the exact same input data as LDpred2 and is also very fast, which means that it can be run with almost no extra coding and no extra computational time when LDpred2 is already being run. It can also be more robust than LDpred2, e.g. in the case of a large GWAS sample size misspecification. Therefore, lassosum2 is complementary to LDpred2.
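As a rough illustration of how little extra code this takes, the following R sketch assumes the snp_lassosum2() interface documented in bigsnpr, which reuses the same corr (an SFBM LD matrix) and df_beta (data frame with beta, beta_se and n_eff columns) objects already prepared for LDpred2:

    # Minimal sketch, assuming 'corr' and 'df_beta' were already built for LDpred2
    library(bigsnpr)
    beta_grid <- snp_lassosum2(corr, df_beta)   # one column of effects per (lambda, delta) pair
    params    <- attr(beta_grid, "grid_param")  # the tuning grid actually used
    # The columns of beta_grid can then be scored and validated like the LDpred2 grid models.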

2020 ◽  
Vol 16 (3) ◽  
pp. 1061-1074 ◽  
Author(s):  
Jörg Franke ◽  
Veronika Valler ◽  
Stefan Brönnimann ◽  
Raphael Neukom ◽  
Fernando Jaume-Santero

Abstract. Differences between paleoclimatic reconstructions are caused by two factors: the method and the input data. While many studies compare methods, in this study we focus on the consequences of the choice of input data in a state-of-the-art Kalman-filter paleoclimate data assimilation approach. We evaluate reconstruction quality in the 20th century based on three collections of tree-ring records: (1) 54 of the best temperature-sensitive tree-ring chronologies chosen by experts; (2) 415 temperature-sensitive tree-ring records chosen less strictly by regional working groups and statistical screening; (3) 2287 tree-ring series that are not screened for climate sensitivity. The three data sets cover the range from small sample size, small spatial coverage and strict screening for temperature sensitivity to large sample size and spatial coverage but no screening. Additionally, we explore a combination of these data sets plus screening methods to improve the reconstruction quality. A large, unscreened collection generally leads to poor reconstruction skill. A small expert selection of extratropical Northern Hemisphere records allows for a skillful high-latitude temperature reconstruction but cannot be expected to provide information for other regions and other variables. We achieve the best reconstruction skill across all variables and regions by combining all available input data but rejecting records with insignificant climatic information (p value of the regression model > 0.05) and removing duplicate records. It is important to use a tree-ring proxy system model that includes both major growth limitations: temperature and moisture.
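A minimal R sketch of the screening criterion described above (a generic illustration, not the authors' code), keeping only tree-ring series whose regression against instrumental temperature has a slope p-value below 0.05:

    # 'proxies' is a matrix with one tree-ring series per column,
    # 'temperature' the co-located instrumental series over the same years
    screen_records <- function(proxies, temperature, alpha = 0.05) {
      keep <- vapply(seq_len(ncol(proxies)), function(j) {
        fit <- lm(proxies[, j] ~ temperature)
        summary(fit)$coefficients[2, 4] < alpha   # p-value of the regression slope
      }, logical(1))
      proxies[, keep, drop = FALSE]
    }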


Author(s):  
Samara F. Kiihl ◽  
Maria Jose Martinez-Garrido ◽  
Arce Domingo-Relloso ◽  
Jose Bermudez ◽  
Maria Tellez-Plaza

Abstract Accurately measuring epigenetic marks such as 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC) at the single-nucleotide level requires combining data from DNA processing methods including traditional (BS), oxidative (oxBS) or Tet-assisted (TAB) bisulfite conversion. We introduce the R package MLML2R, which provides maximum likelihood estimates (MLE) of 5-mC and 5-hmC proportions. While all other available R packages provide 5-mC and 5-hmC MLEs only for the oxBS+BS combination, MLML2R also provides MLEs for TAB combinations. For combinations of any two of the methods, we derived the pool-adjacent-violators algorithm (PAVA) exact constrained MLE in analytical form. For the three-method combination, we implemented both the iterative method by Qu et al. [Qu, J., M. Zhou, Q. Song, E. E. Hong and A. D. Smith (2013): "MLML: consistent simultaneous estimates of DNA methylation and hydroxymethylation," Bioinformatics, 29, 2645–2646] and a novel non-iterative approximation using Lagrange multipliers. The newly proposed non-iterative solutions greatly decrease computational time, a common bottleneck when processing high-throughput data. The MLML2R package is flexible, as it takes as input both preprocessed intensities from Infinium methylation arrays and counts from next-generation sequencing technologies. The MLML2R package is freely available at https://CRAN.R-project.org/package=MLML2R.
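A minimal sketch of the oxBS+BS use case, with argument names and return components as given in the package vignette (the count matrices themselves are assumed to have been prepared beforehand):

    library(MLML2R)
    # methylated/unmethylated signals per CpG and sample from the BS and oxBS assays
    fit <- MLML(T.matrix = MethylatedBS,    U.matrix = UnMethylatedBS,
                L.matrix = UnMethylatedOxBS, M.matrix = MethylatedOxBS)
    fit$mC    # estimated 5-mC proportions
    fit$hmC   # estimated 5-hmC proportions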


2019 ◽  
Vol 35 (24) ◽  
pp. 5146-5154 ◽  
Author(s):  
Joanna Zyla ◽  
Michal Marczyk ◽  
Teresa Domaszewska ◽  
Stefan H E Kaufmann ◽  
Joanna Polanska ◽  
...  

Abstract Motivation: Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests when applied to related data from different studies. Results: We evaluated eight established and one novel algorithm for reproducibility, sensitivity, prioritization, false positive rate and computational time. In addition to the eight established algorithms, we included Coincident Extreme Ranks in Numerical Observations (CERNO), a flexible and fast algorithm based on modified Fisher P-value integration. Using real-world datasets, we demonstrate that CERNO is robust to ranking metrics, as well as to sample and GS size. CERNO had the highest reproducibility while remaining sensitive, specific and fast. In the overall ranking, Pathway Analysis with Down-weighting of Overlapping Genes, CERNO and over-representation analysis performed best, while CERNO and GeneSetTest scored high in terms of reproducibility. Availability and implementation: The tmod package implementing the CERNO algorithm is available from CRAN (cran.r-project.org/web/packages/tmod/index.html) and an online implementation can be found at http://tmod.online/. The datasets analyzed in this study are publicly available in the KEGGdzPathwaysGEO and KEGGandMetacoreDzPathwaysGEO R packages and the GEO repository. Supplementary information: Supplementary data are available at Bioinformatics online.
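A minimal sketch of running the CERNO test with tmod on a ranked gene list (the differential-expression results object res_de is hypothetical; tmodCERNOtest() uses the package's built-in gene set collection by default):

    library(tmod)
    # rank genes by the chosen metric, e.g. differential-expression p-value
    ranked_genes <- res_de$gene[order(res_de$pval)]
    cerno <- tmodCERNOtest(ranked_genes)
    head(cerno)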


Author(s):  
Zheng Li ◽  
Robert Kluger ◽  
Xianbiao Hu ◽  
Yao-Jan Wu ◽  
Xiaoyu Zhu

The primary objective of this study was to increase the sample size of public probe vehicle-based arterial travel time estimation. The complete methodology for increasing sample size using incomplete trajectories was built on a k-Nearest Neighbors (k-NN) regression algorithm. The virtual travel time of an incomplete trajectory was represented by similar complete trajectories. As incomplete trajectories were not used to calculate travel time in previous studies, the sample size of travel time estimation can be increased without collecting extra data. A case study was conducted on a major arterial in the city of Tucson, Arizona, including 13 links. In the case study, probe vehicle data were collected from a smartphone application used for navigation and guidance. The case study showed that the method could significantly increase the number of link travel time samples, but there were still limitations. In addition, sensitivity analysis was conducted using leave-one-out cross-validation to verify the performance of the k-NN model under different parameters and input data. The data analysis showed that the algorithm performed differently under different parameters and input data. Our study suggests that optimal parameters should be selected using a historical dataset before real-world application.
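A generic illustration (not the authors' implementation) of the core idea: the travel time of an incomplete trajectory is approximated by averaging the travel times of its k most similar complete trajectories:

    # 'complete_features' is an n x p matrix of complete-trajectory features,
    # 'complete_times' the corresponding n link travel times,
    # 'query' the feature vector of one incomplete trajectory
    knn_travel_time <- function(query, complete_features, complete_times, k = 5) {
      diffs <- sweep(complete_features, 2, query)      # feature differences per trajectory
      d     <- sqrt(rowSums(diffs^2))                  # Euclidean distances
      mean(complete_times[order(d)[seq_len(k)]])       # average of the k nearest
    }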


2014 ◽  
Vol 42 (15) ◽  
pp. e121-e121 ◽  
Author(s):  
Hari Krishna Yalamanchili ◽  
Zhaoyuan Li ◽  
Panwen Wang ◽  
Maria P. Wong ◽  
Jianfeng Yao ◽  
...  

Abstract Conventionally, overall gene expressions from microarrays are used to infer gene networks, but it is challenging to account for splicing isoforms. High-throughput RNA sequencing has made splice variant profiling practical. However, its true merit in quantifying splicing isoforms and isoform-specific exon expressions is not well explored in inferring gene networks. This study demonstrates SpliceNet, a method to infer isoform-specific co-expression networks from exon-level RNA-Seq data, using large dimensional trace. It goes beyond differentially expressed genes and infers splicing isoform network changes between normal and diseased samples. It eases the sample size bottleneck; evaluations on simulated data and on lung cancer-specific ERBB2 and MAPK signaling pathways, with varying numbers of samples, demonstrate its merit in handling datasets with a high exon-to-sample-size ratio. Inferred network rewiring of well-established Bcl-x- and EGFR-centered networks from lung adenocarcinoma expression data is in good agreement with the literature. Gene-level evaluations demonstrate substantially better performance of SpliceNet than canonical correlation analysis, a method that is currently applied to exon-level RNA-Seq data. SpliceNet can also be applied to exon array data. SpliceNet is distributed as an R package available at http://www.jjwanglab.org/SpliceNet.
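As a much simplified generic illustration of isoform-level network rewiring (SpliceNet itself uses a large dimensional trace estimator rather than plain correlation), one can contrast co-expression matrices computed from exon- or isoform-level expression in normal and diseased samples:

    # 'expr_normal' and 'expr_tumour' are assumed isoform x sample expression matrices
    coexpr   <- function(expr) cor(t(expr))                       # isoform co-expression
    rewiring <- abs(coexpr(expr_tumour) - coexpr(expr_normal))    # edges that change the most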


Author(s):  
Lalu Zulfikar Muslim ◽  
I Gede Pasek Suta Wijaya ◽  
Fitri Bimantoro

The classification of fruit quality on a computer using image data is very useful, and it can also support decisions and policies related to business strategies in industry. In this research, watermelon quality was classified using a weighted k-means algorithm into three groups: fresh, medium, and rotten. The classification process in the system is divided into two stages, training and testing. The data input to the system are watermelon images in YCbCr format. In the training phase, the input data are images that have already been labeled, while in the testing/classification phase the input data are arbitrary, unlabeled images. The results of the watermelon case study with the weighted k-means algorithm show that the larger the amount of training data, the longer the computing time needed for training and testing, but the better the accuracy, precision, and recall of the classification. Likewise, the larger the value of k, the longer the computational time needed for training and testing, but the accuracy, precision, and recall of the classification results decrease.
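A minimal R sketch of a feature-weighted k-means of the kind described (a generic illustration; the paper's exact weighting scheme is not reproduced), applied to per-image mean YCbCr colour features:

    # X: images x 3 matrix of mean Y, Cb, Cr values; w: per-channel weights
    weighted_kmeans <- function(X, k, w, iters = 100) {
      centers <- X[sample(nrow(X), k), , drop = FALSE]
      cluster <- rep(1L, nrow(X))
      for (i in seq_len(iters)) {
        d <- sapply(seq_len(k), function(j)
          colSums(w * (t(X) - centers[j, ])^2))        # weighted squared distances
        cluster <- max.col(-d)                         # nearest centre per image
        centers <- t(sapply(seq_len(k), function(j) {
          pts <- X[cluster == j, , drop = FALSE]
          if (nrow(pts) == 0) centers[j, ] else colMeans(pts)
        }))
      }
      list(cluster = cluster, centers = centers)
    }
    # e.g. fit <- weighted_kmeans(X, k = 3, w = c(1, 2, 2))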


2021 ◽  
Vol 17 (7) ◽  
pp. e1009131
Author(s):  
Maciej Migdal ◽  
Dan Fu Ruan ◽  
William F. Forrest ◽  
Amir Horowitz ◽  
Christian Hammer

Human immunogenetic variation in the form of HLA and KIR types has been shown to be strongly associated with a multitude of immune-related phenotypes. However, association studies involving immunogenetic loci most commonly involve simple analyses of classical HLA allelic diversity, resulting in limitations regarding the interpretability and reproducibility of results. Here we present MiDAS, a comprehensive R package for immunogenetic data transformation and statistical analysis. MiDAS recodes input data in the form of HLA alleles and KIR types into biologically meaningful variables, allowing HLA amino acid fine mapping, analyses of HLA evolutionary divergence as well as of experimentally validated HLA-KIR interactions. Further, MiDAS enables comprehensive statistical association analysis workflows with phenotypes of diverse measurement scales. MiDAS thus closes the gap between the inference of immunogenetic variation and its efficient utilization to make relevant discoveries related to immune and disease biology. It is freely available under an MIT license.
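As a generic illustration of the kind of recoding involved (not the MiDAS API itself): turning a classical HLA allele call table into an allele dosage variable and testing its association with a binary phenotype; hla_calls, its column names and phenotype are hypothetical:

    # two columns per locus, e.g. A_1 and A_2 hold the two HLA-A allele calls
    dose_A0201 <- rowSums(hla_calls[, c("A_1", "A_2")] == "A*02:01")
    fit <- glm(phenotype ~ dose_A0201, family = binomial)
    summary(fit)$coefficients   # effect estimate and p-value for the allele dosage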


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Furqan Awan ◽  
Muhammad Muddassir Ali ◽  
Muhammad Hamid ◽  
Muhammad Huzair Awan ◽  
Muhammad Hassan Mushtaq ◽  
...  

The main aim of this study was to develop a set of functions that can analyze genomic data with less time and memory consumption. Epi-gene is presented as a solution to the problems of handling large sequence files and long computational times. It requires less time and fewer programming skills to work with a large number of genomes. In the current study, some features of the Epi-gene R package are described and illustrated using a dataset of 14 Aeromonas hydrophila genomes. Joining, relabeling, and conversion functions are also included in this package to handle FASTA-formatted sequences. Various Epi-gene functions were used to calculate the subsets of core genes, accessory genes, and unique genes. Heat maps and phylogenetic genome trees were also constructed. This whole procedure was completed in less than 30 minutes. The package only works on Windows operating systems. Functions from other packages available in the R computing environment, such as dplyr and ggtree, were also used.
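A generic sketch of how core, accessory and unique gene sets can be derived from a gene presence/absence matrix (an illustration of the concept, not the Epi-gene API; the matrix pa is assumed to have genes as rows and genomes as columns):

    n_genomes <- ncol(pa)
    counts    <- rowSums(pa > 0)
    core      <- rownames(pa)[counts == n_genomes]             # present in every genome
    unique_g  <- rownames(pa)[counts == 1]                     # present in exactly one genome
    accessory <- rownames(pa)[counts > 1 & counts < n_genomes] # everything in between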


2021 ◽  
Author(s):  
Marton Kovacs ◽  
Don van Ravenzwaaij ◽  
Rink Hoekstra ◽  
Balazs Aczel

Planning sample size often requires researchers to identify a statistical technique and to make several choices during their calculations. Currently, there is a lack of clear guidelines for researchers to find and use the applicable procedure. In the present tutorial, we introduce a web app and R package that offer nine different procedures to determine and justify the sample size for independent two-group study designs. The application highlights the most important decision points for each procedure and suggests example justifications for them. The resulting sample size report can serve as a template for preregistrations and manuscripts.
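One of the simplest such procedures, an a priori power analysis for an independent two-group design, can be illustrated with base R; the effect size, alpha and power below are example assumptions, not recommendations:

    # required sample size per group for delta = 0.5 SD, alpha = 0.05, power = 0.90
    power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.90,
                 type = "two.sample", alternative = "two.sided")
    # yields roughly 85 participants per group for these settings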


2021 ◽  
Author(s):  
Ramiro Magno ◽  
Isabel Duarte ◽  
Ana-Teresa Maia

Abstract Motivation: The Polygenic Score (PGS) Catalog is a recently established open database of published polygenic scores that, to date, has collected, curated, and made available 721 polygenic scores from over 133 publications. The PGS Catalog REST API is the only method allowing programmatic access to this resource. Results: Here, we describe quincunx, an R package that provides the first client interface to the PGS Catalog REST API. quincunx enables users to query and quickly retrieve, filter and integrate metadata associated with polygenic scores, as well as polygenic scoring files, in tidy table format. Availability: quincunx is freely available under an MIT License, and can be accessed from https://github.com/maialab/quincunx.
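A minimal usage sketch (the retrieval function below is the one named in the quincunx documentation; treat it as an assumption if your installed version differs):

    library(quincunx)
    scores <- get_scores(pgs_id = "PGS000001")   # query the PGS Catalog REST API
    scores                                       # tidy tables of score metadata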

