skater: An R package for SNP-based Kinship Analysis, Testing, and Evaluation

2021 ◽  
Author(s):  
Stephen D. Turner ◽  
V. P. Nagraj ◽  
Matthew Scholz ◽  
Shakeel Jessa ◽  
Carlos Acevedo ◽  
...  

Motivation: SNP-based kinship analysis with genome-wide relationship estimation and IBD segment analysis methods produces results that often require further downstream processing and manipulation. A dedicated software package that consistently and intuitively implements this analysis functionality is needed. Results: Here we present the skater R package for SNP-based kinship analysis, testing, and evaluation with R. The skater package contains a suite of well-documented tools for importing, parsing, and analyzing pedigree data, performing relationship degree inference, benchmarking relationship degree classification, and summarizing IBD segment data. Availability: The skater package is implemented as an R package and is released under the MIT license at https://github.com/signaturescience/skater. Documentation is available at https://signaturescience.github.io/skater.
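The relationship degree inference mentioned above can be illustrated with a minimal Python sketch. It is not skater's implementation or API: the function name `kinship_to_degree` is hypothetical, and the cutoffs are an assumption based on the widely used convention that expected kinship halves with each degree (0.5, 0.25, 0.125, 0.0625), with class boundaries at the geometric midpoints 2^-(d + 1.5).

```python
def kinship_to_degree(kinship, max_degree=3):
    """Map a kinship coefficient to a relationship degree (hypothetical helper).

    Degree 0 means identical (duplicate sample or monozygotic twin).
    Returns None when the pair looks unrelated beyond `max_degree`.
    """
    if kinship <= 0:
        return None
    for degree in range(max_degree + 1):
        # Lower boundary of this degree's class: geometric midpoint between
        # the expected kinship of this degree and of the next one.
        lower = 2.0 ** -(degree + 1.5)
        if kinship > lower:
            return degree
    return None


# Expected kinship values for the first few degrees fall in their own class.
degrees = [kinship_to_degree(k) for k in (0.5, 0.25, 0.125, 0.0625, 0.01)]
```

With these assumed cutoffs, a pair with kinship 0.25 is classed as first degree and a pair with kinship 0.01 is treated as unrelated; other tools may place the boundaries differently.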

F1000Research ◽  
2022 ◽  
Vol 11 ◽  
pp. 18
Author(s):  
Stephen D. Turner ◽  
V.P. Nagraj ◽  
Matthew Scholz ◽  
Shakeel Jessa ◽  
Carlos Acevedo ◽  
...  

Motivation: SNP-based kinship analysis with genome-wide relationship estimation and IBD segment analysis methods produces results that often require further downstream processing and manipulation. A dedicated software package that consistently and intuitively implements this analysis functionality is needed. Results: Here we present the skater R package for SNP-based kinship analysis, testing, and evaluation with R. The skater package contains a suite of well-documented tools for importing, parsing, and analyzing pedigree data, performing relationship degree inference, benchmarking relationship degree classification, and summarizing IBD segment data. Availability: The skater package is implemented as an R package and is released under the MIT license at https://github.com/signaturescience/skater. Documentation is available at https://signaturescience.github.io/skater.


2014 ◽  
Vol 17 (4) ◽  
Author(s):  
Raymond K. Walters ◽  
Charles Laurin ◽  
Gitta H. Lubke

Epistasis is a growing area of research in genome-wide studies, but the differences between alternative definitions of epistasis remain a source of confusion for many researchers. One problem is that models for epistasis are presented in a number of formats, some of which have difficult-to-interpret parameters. In addition, the relation between the different models is rarely explained. Existing software for testing epistatic interactions between single-nucleotide polymorphisms (SNPs) does not provide the flexibility to compare the available model parameterizations. For that reason we have developed an R package for investigating epistatic and penetrance models, EpiPen, to aid users who wish to easily compare, interpret, and utilize models for two-locus epistatic interactions. EpiPen facilitates research on SNP-SNP interactions by allowing the R user to easily convert between common parametric forms for two-locus interactions, generate data for simulation studies, and perform power analyses for the selected model with a continuous or dichotomous phenotype. The usefulness of the package for model interpretation and power analysis is illustrated using data on rheumatoid arthritis.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jovana Maksimovic ◽  
Alicia Oshlack ◽  
Belinda Phipson

Abstract DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalization, and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.


2012 ◽  
Vol 13 (10) ◽  
pp. R87 ◽  
Author(s):  
Altuna Akalin ◽  
Matthias Kormaksson ◽  
Sheng Li ◽  
Francine E Garrett-Bakelman ◽  
Maria E Figueroa ◽  
...  

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Bing Song ◽  
August E. Woerner ◽  
John Planz

Abstract Background Multi-locus genotype data are widely used in population genetics and disease studies. In evaluating the utility of multi-locus data, the independence of markers is commonly considered in many genomic assessments. Generally, pairwise non-random associations are tested by linkage disequilibrium; however, the dependence within a panel might involve triplets, quartets, or larger sets of markers. Therefore, compatible and user-friendly software is necessary for testing and assessing the global linkage disequilibrium among mixed genetic data. Results This study describes a software package for testing the mutual independence of mixed genetic datasets. Mutual independence is defined as no non-random associations among all subsets of the tested panel. The new R package “mixIndependR” calculates basic genetic parameters like allele frequency, genotype frequency, heterozygosity, Hardy–Weinberg equilibrium, and linkage disequilibrium (LD) by mutual independence from population data, regardless of the type of markers, such as simple nucleotide polymorphisms, short tandem repeats, insertions and deletions, and any other genetic markers. A novel method of assessing the dependence of mixed genetic panels is developed in this study and functionally analyzed in the software package. By comparing the observed distribution of two common summary statistics (the number of heterozygous loci [K] and the number of shared alleles [X]) with their expected distributions under the assumption of mutual independence, the overall independence is tested. Conclusion The package “mixIndependR” is compatible with all categories of genetic markers and detects the overall non-random associations. Compared to pairwise disequilibrium, the approach described herein tends to have higher power, especially when the number of markers is large. With this package, more multi-functional or stronger genetic panels can be developed, like mixed panels with different kinds of markers. In population genetics, the package “mixIndependR” makes it possible to discover more about admixture of populations, natural selection, genetic drift, and population demographics, as a more powerful method of detecting LD. Moreover, this new approach can optimize variant selection in disease studies and contribute to panel combination for treatments in multimorbidity. Application of this approach to real data is expected in the future, and this might bring a leap in the field of genetic technology. Availability The R package mixIndependR is available on the Comprehensive R Archive Network (CRAN) at: https://cran.r-project.org/web/packages/mixIndependR/index.html.
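To make the idea of an expected distribution for the number of heterozygous loci K concrete, here is a minimal Python sketch. It is not mixIndependR's code or API. It assumes that each locus is heterozygous independently with its own probability, so that K follows a Poisson-binomial distribution; the function name `heterozygous_count_pmf` and the example heterozygosities are hypothetical.

```python
def heterozygous_count_pmf(heterozygosities):
    """Return P(K = k) for k = 0..n, where K is the number of heterozygous loci.

    Assumes loci are mutually independent and locus i is heterozygous with
    probability heterozygosities[i] (a Poisson-binomial distribution).
    """
    pmf = [1.0]  # with zero loci, K = 0 with certainty
    for h in heterozygosities:
        updated = [0.0] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            updated[k] += prob * (1.0 - h)   # this locus is homozygous
            updated[k + 1] += prob * h       # this locus is heterozygous
        pmf = updated
    return pmf


# Two loci, each heterozygous with probability 0.5 (illustrative values).
expected = heterozygous_count_pmf([0.5, 0.5])
```

For the two-locus example the expected probabilities of K = 0, 1, 2 are 0.25, 0.5, and 0.25. An observed distribution of K that departs strongly from such an expectation would be evidence against mutual independence; how the package quantifies that departure is not stated in the abstract.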


2021 ◽  
Author(s):  
Qingqing Chen ◽  
Ate Poorthuis

Identifying meaningful locations, such as home or work, from human mobility data has become an increasingly common prerequisite for geographic research. Although location-based services (LBS) and other mobile technology have rapidly grown in recent years, it can be challenging to infer meaningful places from such data, which, compared to conventional datasets, can be devoid of context. Existing approaches are often developed ad hoc and can lack transparency and reproducibility. To address this, we introduce an R software package for inferring home locations from LBS data. The package implements pre-existing algorithms and provides building blocks to make writing algorithmic ‘recipes’ more convenient. We evaluate this approach by analyzing a de-identified LBS dataset from Singapore that aims to balance ethics and privacy with the research goal of identifying meaningful locations. We show that ensemble approaches, combining multiple algorithms, can be especially valuable in this regard as the resulting patterns of inferred home locations closely correlate with the distribution of residential population. We hope this package, and others like it, will contribute to an increase in use and sharing of comparable algorithms, research code and data. This will increase transparency and reproducibility in mobility analyses and further the ongoing discourse around ethical big data research.


2019 ◽  
Author(s):  
Ying Sheng ◽  
Chiung-Yu Huang ◽  
Siarhei Lobach ◽  
Lydia Zablotska ◽  
Iryna Lobach ◽  
...  

ABSTRACT Large-scale genome-wide analysis scans provide massive volumes of genetic variants on large numbers of cases and controls that can be used to estimate genetic effects. Yet the sets of non-genetic variables available in public databases are often brief. It is known that omitting a continuous variable from a logistic regression model can result in biased estimates of odds ratios (OR) (e.g., Gail et al. (1984), Neuhaus et al. (1993), Hauck et al. (1991), Zeger et al. (1988)). We are interested in assessing what information is needed to recover the bias in the OR estimate of genotype due to omitting a continuous variable in settings where the actual values of the omitted variable are not available. We derive two estimating procedures that can recover the degree of bias based on a conditional density of the omitted variable or on knowledge of the distribution of the omitted variable. Importantly, our derivations show that omitting a continuous variable can result in either under- or over-estimation of the genetic effects. We performed extensive simulation studies to examine bias, variability, false positive rate, and power in the model that omits a continuous variable. We show the application to two genome-wide studies of Alzheimer’s disease. Data Availability Statement The data that support the findings of this study are openly available in the Database of Genotypes and Phenotypes at [https://www.ncbi.nlm.nih.gov/projects/gap/cgibin/study.cgi?study_id=phs000372.v1.p1], reference number [phs000372.v1.p1] and at the Alzheimer’s Disease Neuroimaging Initiative http://adni.loni.usc.edu/.
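The bias from omitting a continuous variable can be seen in a small numerical sketch. This is not the authors' estimating procedure. It assumes a logistic model with a binary genotype G and a standard normal covariate X that is independent of G, and it computes the odds ratio for G that results when X is averaged out; all function names and coefficient values are hypothetical.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_risk(genotype, b0, b_g, b_x, n_grid=2001, width=8.0):
    """P(D = 1 | G) after averaging over X ~ N(0, 1) with the trapezoidal rule."""
    step = 2.0 * width / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = -width + i * step
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * expit(b0 + b_g * genotype + b_x * x) * density * step
    return total


def marginal_odds_ratio(b0, b_g, b_x):
    """Odds ratio for G = 1 versus G = 0 when X is left out of the model."""
    p0 = marginal_risk(0, b0, b_g, b_x)
    p1 = marginal_risk(1, b0, b_g, b_x)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


conditional_or = 2.0  # assumed OR for G in the full model
or_without_x_effect = marginal_odds_ratio(-1.0, math.log(conditional_or), 0.0)
or_with_x_omitted = marginal_odds_ratio(-1.0, math.log(conditional_or), 2.0)
```

In this particular setup the odds ratio obtained without X is pulled toward 1 relative to the conditional value of 2, which is one direction of bias. The abstract notes that other settings can produce over-estimation instead, for example when the omitted variable is not independent of genotype.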


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
B. V. Binoy ◽  
M. A. Naseer ◽  
P. P. Anil Kumar ◽  
Nina Lazar

Purpose Real estate valuation studies gained popularity with the availability of large-scale property transaction data in the latter part of the twentieth century. Hedonic price modeling (HPM) was the most popular method in the initial years until it was overtaken by advanced modeling methods in the twenty-first century. Even though a few literature reviews exist on this topic, no comprehensive bibliometric analysis has been conducted in this area. To gain a better understanding of the dynamics of property valuation studies, this paper aims to conduct a bibliometric analysis. Design/methodology/approach A comprehensive search in the Scopus database, followed by detailed screening, resulted in 1,400 articles. The identified research articles, spanning over five decades (1964–2019), are analyzed using the open-source R package “bibliometrix.” Findings The study found the USA to be the most productive country in various aspects, such as number of publications, number of authors and publication hotspots. The findings also cover publication trends, journals, citations, keywords, co-citation and collaboration networks. The number of publications rose sharply after the year 2000, owing to improved data availability and better modeling techniques. Research limitations/implications This study is significant in understanding the major research areas and modeling techniques used in property valuation. Future studies can incorporate multiple database sources and include more articles. Originality/value The current study is one of the first bibliometric studies on property valuation. Previous studies have not explored the possibilities of geographic information systems in bibliometric research. Spatial mapping and analysis of publications provide a geographical perspective on valuation research.


2020 ◽  
Vol 36 (15) ◽  
pp. 4374-4376
Author(s):  
Ninon Mounier ◽  
Zoltán Kutalik

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation The bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.
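How a prior effect and an observed effect might be combined can be sketched with a textbook normal-normal model in Python. This is an illustration only and is not taken from the bGWAS package: it assumes the observed effect is normally distributed around the true effect with a known standard error, and that the prior on the true effect is normal with a given mean and variance. The function names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bayes_factor(observed, se, prior_mean, prior_var):
    """Evidence for 'effect drawn from the prior' versus 'effect is exactly zero'.

    Under the prior, the observed effect has mean prior_mean and variance
    se**2 + prior_var; under the null it has mean 0 and variance se**2.
    """
    return normal_pdf(observed, prior_mean, se ** 2 + prior_var) / normal_pdf(
        observed, 0.0, se ** 2
    )


def posterior_effect(observed, se, prior_mean, prior_var):
    """Precision-weighted posterior mean and variance of the true effect."""
    w_obs = 1.0 / se ** 2
    w_prior = 1.0 / prior_var
    post_var = 1.0 / (w_obs + w_prior)
    post_mean = post_var * (w_obs * observed + w_prior * prior_mean)
    return post_mean, post_var


# Hypothetical variant: observed effect 0.2 (SE 0.1), prior effect 0.2 (variance 0.01).
bf = bayes_factor(0.2, 0.1, 0.2, 0.01)
post_mean, post_var = posterior_effect(0.1, 0.1, 0.3, 0.01)
```

When the prior agrees with the observed effect, the Bayes factor exceeds 1, which is how informative priors can raise power for variants that act through the related risk factors. The posterior mean lands between the observed and prior effects, weighted by their precisions.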

