VariantKey: A Reversible Numerical Representation of Human Genetic Variants

2018 ◽  
Author(s):  
Nicola Asuni ◽  
Steven Wilder

Abstract Human genetic variants are usually represented by four values with variable length: chromosome, position, reference and alternate alleles. There is no guarantee that these components are represented in a consistent way across different data sources, and processing variant-based data can be inefficient because four different comparison operations are needed for each variant, three of which are string comparisons. Existing variant identifiers do not typically represent every possible variant we may be interested in, nor are they directly reversible. Similarly, genomic regions are typically represented inconsistently by three or four values. Working with strings, in contrast to numbers, poses extra challenges for computer memory allocation and data representation. To overcome these limitations, a novel reversible numerical encoding schema for human genetic variants (VariantKey) and genomic regions (RegionKey) is presented here alongside a multi-language open-source software implementation (https://github.com/Genomicsplc/variantkey). VariantKey and RegionKey represent variants and regions as single 64-bit numeric entities, while preserving the ability to be searched and sorted by chromosome and position. The individual components of short variants can be directly read back from the VariantKey, while long variants are supported with a fast lookup table.
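
The idea of a single 64-bit key that still sorts by chromosome and position can be illustrated with a minimal sketch. The field widths below (5 bits for a chromosome code, 28 bits for the position, 31 bits for an opaque reference/alternate allele code) and the function names `pack_key` and `unpack_key` are assumptions made for this illustration; the abstract does not state the layout, and this is not the VariantKey library's API. How alleles are turned into the 31-bit code is deliberately left out.

```python
# Illustrative sketch only: the bit widths and names are assumptions,
# not taken from the abstract or from the VariantKey implementation.
CHROM_BITS = 5    # assumed width of the numeric chromosome code
POS_BITS = 28     # assumed width of the position field
ALLELE_BITS = 31  # assumed width of an opaque REF+ALT code (5 + 28 + 31 = 64)


def pack_key(chrom_code: int, pos: int, allele_code: int) -> int:
    """Pack three non-negative integers into one 64-bit integer."""
    for value, bits in ((chrom_code, CHROM_BITS), (pos, POS_BITS), (allele_code, ALLELE_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
    # Chromosome occupies the most significant bits, then position, so that
    # plain integer comparison orders keys by chromosome first, then position.
    return (chrom_code << (POS_BITS + ALLELE_BITS)) | (pos << ALLELE_BITS) | allele_code


def unpack_key(key: int) -> tuple[int, int, int]:
    """Recover (chrom_code, pos, allele_code) from a packed key."""
    allele_code = key & ((1 << ALLELE_BITS) - 1)
    pos = (key >> ALLELE_BITS) & ((1 << POS_BITS) - 1)
    chrom_code = key >> (POS_BITS + ALLELE_BITS)
    return chrom_code, pos, allele_code
```

Because the chromosome and position sit in the high-order bits, sorting a list of such integers groups variants by chromosome and orders them by position with a single numeric comparison per pair, which is the property the abstract highlights.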

2018 ◽  
Author(s):  
Mark Y. Fang ◽  
Sebastian Markmiller ◽  
William E. Dowdle ◽  
Anthony Q. Vu ◽  
Paul J. Bushway ◽  
...  

ABSTRACT Human genetic variants are usually represented by four values with variable length: chromosome, position, reference and alternate alleles. There is no guarantee that these components are represented in a consistent way across different data sources, and processing variant-based data can be inefficient because four different comparison operations are needed for each variant, three of which are string comparisons. Working with strings, in contrast to numbers, poses extra challenges for computer memory allocation and data representation. Existing variant identifiers do not typically represent every possible variant we may be interested in, nor are they directly reversible. To overcome these limitations, VariantKey, a novel reversible numerical encoding schema for human genetic variants, is presented here alongside a multi-language open-source software implementation (http://github.com/genomicspls/variantkey). VariantKey represents variants as single 64-bit numeric entities, while preserving the ability to be searched and sorted by chromosome and position. The individual components of short variants can be directly read back from the VariantKey, while long variants are supported with a fast lookup table.

Highlights: ~100 compounds identified by high-content screen inhibit SGs in HEK293, NPCs and iPS-MNs. ALS-associated RBPs are recruited to SGs in an RNA-dependent manner. Molecules with planar moieties prevent recruitment of ALS-associated RBPs to SGs. Compounds inhibit TDP-43 accumulation in SGs and in TARDBP mutant iPS-MNs.


2008 ◽  
Vol 69 (9) ◽  
pp. 1510-1516 ◽  
Author(s):  
V. V. Mazalov ◽  
M. Tamaki ◽  
S. V. Vinnichenko

2015 ◽  
Vol 19 (4) ◽  
pp. 791-813 ◽  
Author(s):  
Zilia Iskoujina ◽  
Joanne Roberts

Purpose – This paper aims to add to the understanding of knowledge sharing in online communities through an investigation of the relationship between individual participant’s motivations and management in open source software (OSS) communities. Drawing on a review of literature concerning knowledge sharing in organisations, the factors that motivate participants to share their knowledge in OSS communities, and the management of such communities, it is hypothesised that the quality of management influences the extent to which the motivations of members actually result in knowledge sharing. Design/methodology/approach – To test the hypothesis, quantitative data were collected through an online questionnaire survey of OSS web developers with the aim of gathering respondents’ opinions concerning knowledge sharing, motivations to share knowledge and satisfaction with the management of OSS projects. Factor analysis, descriptive analysis, correlation analysis and regression analysis were used to explore the survey data. Findings – The analysis of the data reveals that the individual participant’s satisfaction with the management of an OSS project is an important factor influencing the extent of their personal contribution to a community. Originality/value – Little attention has been devoted to understanding the impact of management in OSS communities. Focused on OSS developers specialising in web development, the findings of this paper offer an important original contribution to understanding the connections between individual members’ satisfaction with management and their motivations to contribute to an OSS project. The findings reveal that motivations to share knowledge in online communities are influenced by the quality of management. Consequently, the findings suggest that appropriate management can enhance knowledge sharing in OSS projects and online communities, and organisations more generally.


2021 ◽  
Author(s):  
Fabio Calefato ◽  
Marco Aurelio Gerosa ◽  
Giuseppe Iaffaldano ◽  
Filippo Lanubile ◽  
Igor Fabio Steinmacher

Abstract Several Open-Source Software (OSS) projects depend on the continuity of their development communities to remain sustainable. Understanding how developers become inactive or why they take breaks can help communities prevent abandonment and incentivize developers to come back. In this paper, we propose a novel method to identify developers’ inactive periods by analyzing the individual rhythm of contributions to the projects. Using this method, we quantitatively analyze the inactivity of core developers in 18 OSS organizations hosted on GitHub. We also survey core developers to receive their feedback about the identified breaks and transitions. Our results show that our method was effective for identifying developers’ breaks. About 94% of the surveyed core developers agreed with our state model of inactivity; 71% and 79% of them acknowledged their breaks and state transitions, respectively. We also show that all core developers take breaks (at least once) and about half of them (~45%) have completely disengaged from a project for at least one year. We also analyzed the probability of transitions to/from inactivity and found that developers who pause their activity have a ~35% to ~55% chance of returning to an active state; yet, if the break lasts for a year or longer, the probability of resuming activities drops to ~21–26%, with a ~54% chance of complete disengagement. These results may support the creation of policies and mechanisms to make OSS community managers aware of breaks and potential project abandonment.


Author(s):  
Nico Wunderling ◽  
Jonathan Krönke ◽  
Valentin Wohlfarth ◽  
Jan Kohler ◽  
Jobst Heitzig ◽  
...  

Abstract Tipping elements occur in various systems such as in socio-economics, ecology and the climate system. In many cases, the individual tipping elements are not independent of each other, but they interact across scales in time and space. To model systems of interacting tipping elements, we here introduce the PyCascades open source software package for studying interacting tipping elements (10.5281/zenodo.4153102). PyCascades is an object-oriented and easily extendable package written in the programming language Python. It allows for investigating under which conditions potentially dangerous cascades can emerge between interacting dynamical systems, with a focus on tipping elements. With PyCascades it is possible to use different types of tipping elements such as double-fold and Hopf types and interactions between them. PyCascades can be applied to arbitrary complex network structures and has recently been extended to stochastic dynamical systems. This paper provides an overview of the functionality of PyCascades by introducing the basic concepts and the methodology behind it. In the end, three examples are discussed, showing three different applications of the software package. First, the moisture recycling network of the Amazon rainforest is investigated. Second, a model of interacting Earth system tipping elements is discussed. And third, the PyCascades modelling framework is applied to a global trade network.
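
To make the notion of a cascade between interacting tipping elements concrete, the following is a minimal sketch in plain Python. It does not use the PyCascades API. It assumes the textbook cubic normal form of a double-fold element, dx/dt = -x^3 + x + c, and a simple linear coupling from the first element to the second; the function name `simulate`, the parameter values and the forward-Euler integration are all choices made for this illustration.

```python
# Illustrative sketch, not the PyCascades API: two double-fold elements in the
# assumed normal form dx/dt = -x**3 + x + c, with element 1 driving element 2.
def simulate(coupling: float, c1: float = 0.5, c2: float = 0.0,
             dt: float = 0.01, steps: int = 5000) -> tuple[float, float]:
    """Integrate both elements with forward Euler and return their final states."""
    x1, x2 = -1.0, -1.0  # both start in the lower ("untipped") stable state
    for _ in range(steps):
        # Element 1 is forced by c1 only; c1 above 2 / (3 * sqrt(3)) ~ 0.385
        # removes its lower stable state, so it tips to the upper branch.
        dx1 = -x1**3 + x1 + c1
        # Element 2 feels an extra forcing proportional to how far element 1
        # has moved away from its untipped state at x = -1 (assumed coupling).
        dx2 = -x2**3 + x2 + c2 + coupling * (x1 + 1.0)
        x1 += dt * dx1
        x2 += dt * dx2
    return x1, x2


if __name__ == "__main__":
    for d in (0.1, 0.3):
        x1, x2 = simulate(d)
        print(f"coupling={d}: x1={x1:.2f}, x2={x2:.2f}")
```

In this toy setting the first element always tips; whether the second follows depends on the coupling strength, which is the kind of condition for a cascade that the abstract refers to. With a weak coupling the second element should remain on its lower branch, while a sufficiently strong coupling pushes it past its own fold.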


Author(s):  
Frühling Rijsdijk ◽  
Paul F. O’Reilly

This chapter demonstrates the principles behind some of the major genetic study designs used in psychiatry research. The first part focuses on behavioural genetic designs, while the second part describes designs for ‘gene mapping’. Behavioural genetics examines the genetic basis of behavioural phenotypes, including both disorders and ‘normal’ dimensional traits. The theoretical basis is derived from population genetics, including properties such as segregation ratios, random mating, genetic variance, and genetic correlation between relatives. The second part of the chapter deals with gene mapping designs, in which specific genetic variants or genomic regions associated with a disorder or trait are identified. A brief outline of the most popular current approaches to the analysis of the genetics of complex human disorders is also provided.


2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 2-3
Author(s):  
Francisco A Paredes-Sanchez ◽  
Eduardo Casas ◽  
G M Parra-Bracamonte ◽  
W Arellano-Vera ◽  
David G Riley ◽  
...  

Abstract The objective of this study was to identify genomic regions and genes associated with beef cattle temperament. Temperament, measured as exit velocity (EV; m/s), was recorded in 1,370 Brahman cattle from Texas A&M AgriLife Research at Overton, TX. We identified two groups of temperament-contrasting animals. Cows were classified as calm if their EV was 0.16–3.41 m/s and bulls if their EV was 0.4–3.12 m/s (n = 119). Cows were classified as temperamental if their EV was 3.55–7.66 m/s and bulls if their EV was 3.13–10.83 m/s (n = 79). The 198 animals were genotyped using the GGP-HD-150K chip. A total of 139,376 SNPs were evaluated for association with temperament. Thirteen SNPs were associated with EV (P < 4.0E-05). The SNPs GABRG2-26484, NRXN3-26436 and TBX20-191081 are located in introns of the GABRG2, NRXN3 and TBX20 genes, respectively. The GABRG2 gene encodes a receptor for GABA, the major inhibitory neurotransmitter in the mammalian brain. The NRXN3 gene encodes receptor proteins related to chemical transmission at synapses. TBX20 is a member of the T-box transcription factor family expressed in the developing stages of heart, limbs, eye and ventral neural tube. To test the effect of these three SNPs on EV, Pen-Score and Temperament-Score (TS), a general linear model was fitted including the fixed effects of sex of calf and year of birth, and the individual effect of the three SNPs. The marker TBX20-191081 was associated with the three traits evaluated (P < 0.01), where the GG genotype was associated with the calmest temperament. The GG genotype had a significant effect on EV (P < 0.0001), which was 1.35 and 1.95 m/s slower than for AG and AA, respectively. For TS, the GG genotype had a TS that was 1.41 and 1.24 DS less than those of the AA and GA genotypes. Our study indicates that genetic control of cattle temperament involves a wide network of genes with divergent functions and genetic background specificity.


2020 ◽  
Vol 21 (2) ◽  
pp. 543 ◽  
Author(s):  
Berhanu Tadesse Ertiro ◽  
Michael Olsen ◽  
Biswanath Das ◽  
Manje Gowda ◽  
Maryke Labuschagne

Understanding the genetic basis of maize grain yield and other traits under low-nitrogen (N) stressed environments could improve selection efficiency. In this study, five doubled haploid (DH) populations were evaluated under optimum and N-stressed conditions, during the main rainy season and off-season in Kenya and Rwanda, from 2014 to 2015. Identifying the genomic regions associated with grain yield (GY), anthesis date (AD), anthesis-silking interval (ASI), plant height (PH), ear height (EH), ear position (EPO), and leaf senescence (SEN) under optimum and N-stressed environments could facilitate the use of marker-assisted selection to develop N-use-efficient maize varieties. DH lines were genotyped using genotyping by sequencing. A total of 13, 43, 13, 25, 30, 21, and 10 QTL were identified for GY, AD, ASI, PH, EH, EPO, and SEN, respectively. For GY, PH, EH, and SEN, the highest number of QTL was found under low-N environments. No common QTL between optimum and low-N stressed conditions were identified for GY and ASI. For secondary traits, there were some common QTL for optimum and low-N conditions. Most QTL conferring tolerance to N stress were at chromosome positions different from those of QTL found under optimum conditions.


2015 ◽  
Vol 27 (10) ◽  
pp. 2039-2096 ◽  
Author(s):  
Frank-Michael Schleif ◽  
Peter Tino

Efficient learning of a data analysis task strongly depends on the data representation. Most methods rely on (symmetric) similarity or dissimilarity representations by means of metric inner products or distances, providing easy access to powerful mathematical formalisms like kernel or branch-and-bound approaches. Similarities and dissimilarities are, however, often naturally obtained by nonmetric proximity measures that cannot easily be handled by classical learning algorithms. Major efforts have been undertaken to provide approaches that can either directly be used for such data or to make standard methods available for these types of data. We provide a comprehensive survey for the field of learning with nonmetric proximities. First, we introduce the formalism used in nonmetric spaces and motivate specific treatments for nonmetric proximity data. Second, we provide a systematization of the various approaches. For each category of approaches, we provide a comparative discussion of the individual algorithms and address complexity issues and generalization properties. In a summarizing section, we provide a larger experimental study for the majority of the algorithms on standard data sets. We also address the problem of large-scale proximity learning, which is often overlooked in this context and of major importance to make the method relevant in practice. The algorithms we discuss are in general applicable for proximity-based clustering, one-class classification, classification, regression, and embedding approaches. In the experimental part, we focus on classification tasks.

