Efficient haplotype matching between a query and a panel for genealogical search

Abstract Motivation With the wide availability of whole-genome genotype data, there is an increasing need for conducting genetic genealogical searches efficiently. Computationally, this task amounts to identifying shared DNA segments between a query individual and a very large panel containing millions of haplotypes. The celebrated Positional Burrows-Wheeler Transform (PBWT) data structure is a pre-computed index of the panel that enables constant time matching at each position between one haplotype and an arbitrarily large panel. However, the existing algorithm (Durbin’s Algorithm 5) can only identify set-maximal matches, the longest matches ending at any location in a panel, while in real genealogical search scenarios, multiple ‘good enough’ matches are desired. Results In this work, we developed two algorithmic extensions of Durbin’s Algorithm 5 that can find all L-long matches, that is, matches of length at least a given L, between a query and a panel. In the first algorithm, PBWT-Query, we introduce ‘virtual insertion’ of the query into the PBWT matrix of the panel, and then scan up and down for the PBWT match blocks with length at least L. In our second algorithm, L-PBWT-Query, we further speed up PBWT-Query by introducing additional data structures that allow us to avoid iterating through blocks of incomplete matches. The efficiency of PBWT-Query and L-PBWT-Query is demonstrated using simulated data and the UK Biobank data. Our results show that the proposed algorithms can efficiently detect related individuals for a given query in very large cohorts, which enables fast on-line query search. Availability and implementation genome.ucf.edu/pbwt-query Supplementary information Supplementary data are available at Bioinformatics online.
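To make the notion of an L-long match concrete, the following is a minimal brute-force sketch in Python. It is not PBWT-Query or L-PBWT-Query: it simply compares the query against every panel haplotype site by site, which is the per-query cost a PBWT index is meant to avoid. The function name `l_long_matches`, the string encoding of haplotypes, and the half-open `(start, end)` convention are assumptions made for illustration, and all haplotypes are assumed to have the same length as the query.

```python
from typing import List, Tuple


def l_long_matches(query: str, panel: List[str], min_len: int) -> List[Tuple[int, int, int]]:
    """Naive baseline: report every maximal identical run of at least min_len sites.

    Returns (haplotype_index, start, end) with end exclusive.
    Assumes every panel haplotype has the same length as the query.
    """
    matches = []
    for idx, hap in enumerate(panel):
        start = 0  # first site of the current run of identical alleles
        for site in range(len(query) + 1):
            # A run ends at a mismatch or at the end of the sequence.
            if site == len(query) or hap[site] != query[site]:
                if site - start >= min_len:
                    matches.append((idx, start, site))
                start = site + 1
    return matches


if __name__ == "__main__":
    panel = ["0110111", "1110100", "0000100"]
    print(l_long_matches("0110100", panel, 4))
```

With the toy panel above and L = 4, each of the three haplotypes contributes one match; raising L to 5 drops the third haplotype, whose shared run spans only four sites.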


IBDkin: fast estimation of kinship coefficients from identity by descent segments

Bioinformatics ◽

10.1093/bioinformatics/btaa569 ◽

2020 ◽

Vol 36 (16) ◽

pp. 4519-4520

Author(s):

Ying Zhou ◽

Sharon R Browning ◽

Brian L Browning

Keyword(s):

Software Package ◽

Large Datasets ◽

Supplementary Information ◽

Supplementary Data ◽

UK Biobank ◽

Identity By Descent ◽

Fast Estimation ◽

Kinship Coefficients ◽

Related Individuals ◽

The UK

Abstract Motivation Estimation of pairwise kinship coefficients in large datasets is computationally challenging because the number of pairs of related individuals increases quadratically with sample size. Results We present IBDkin, a software package written in C for estimating kinship coefficients from identity by descent (IBD) segments. We use IBDkin to estimate kinship coefficients for 7.95 billion pairs of individuals in the UK Biobank who share at least one detected IBD segment with length ≥ 4 cM. Availability and implementation https://github.com/YingZhou001/IBDkin. Supplementary information Supplementary data are available at Bioinformatics online.
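As a rough illustration of how IBD sharing relates to a kinship coefficient, the sketch below uses the textbook relation kinship = k1/4 + k2/2, where k1 and k2 are the genome fractions shared IBD on one and on both haplotypes. This is an assumption-laden sketch in Python, not IBDkin's estimator (IBDkin itself is written in C); the function names and the 3500 cM genome length in the example are hypothetical. The second helper shows why pairwise work grows quadratically with sample size.

```python
def kinship_from_ibd(ibd1_cm: float, ibd2_cm: float, genome_cm: float) -> float:
    """Textbook kinship coefficient from total IBD1 and IBD2 sharing (in cM).

    k1 and k2 are the fractions of the genome shared IBD on one haplotype
    and on both haplotypes; kinship = k1 / 4 + k2 / 2.
    """
    k1 = ibd1_cm / genome_cm
    k2 = ibd2_cm / genome_cm
    return 0.25 * k1 + 0.5 * k2


def pair_count(n: int) -> int:
    """Number of unordered pairs among n individuals: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Hypothetical genome length of 3500 cM, used only for illustration.
    print(kinship_from_ibd(3500.0, 0.0, 3500.0))   # parent-child pattern
    print(kinship_from_ibd(1750.0, 875.0, 3500.0))  # expected full-sibling pattern
    print(pair_count(1000))
```

Both example calls give 0.25, the expected kinship of first-degree relatives under this relation.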


A non-linear regression method for estimation of gene-environment heritability

Bioinformatics ◽

10.1093/bioinformatics/btaa1079 ◽

2020 ◽

Author(s):

Matthew Kerin ◽

Jonathan Marchini

Keyword(s):

Linear Regression ◽

Simulated Data ◽

Real Data ◽

Regression Method ◽

Supplementary Information ◽

Linear Regression Method ◽

UK Biobank ◽

Gene Environment ◽

Non Linear ◽

The UK

Abstract Motivation Gene-environment (GxE) interactions are one of the least studied aspects of the genetic architecture of human traits and diseases. The environment of an individual is inherently high dimensional, evolves through time and can be expensive and time-consuming to measure. The UK Biobank study, with all 500,000 participants having undergone an extensive baseline questionnaire, represents a unique opportunity to assess GxE heritability for many traits and diseases in a well-powered setting. Results We have developed a randomized Haseman-Elston non-linear regression method applicable when many environmental variables have been measured on each individual. The method (GPLEMMA) simultaneously estimates a linear environmental score (ES) and its GxE heritability. We compare the method via simulation to a whole-genome regression approach (LEMMA) for estimating GxE heritability. We show that GPLEMMA is more computationally efficient than LEMMA on large datasets, and produces results highly correlated with those from LEMMA when applied to simulated data and real data from the UK Biobank. Availability Software implementing the GPLEMMA method is available from https://jmarchini.org/gplemma/. Supplementary information Supplementary data are available at Bioinformatics online.
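For readers unfamiliar with Haseman-Elston regression, the sketch below shows the classic linear form only: the products of phenotypes for each pair of individuals are regressed on the pair's relatedness, and the slope is read as a heritability estimate. It is a Python sketch under stated assumptions and does not reproduce GPLEMMA's randomized, non-linear estimator or its environmental score. The function name `haseman_elston_slope` is hypothetical, the phenotype vector `y` is assumed to be standardized, and `kernel` is assumed to be a precomputed relatedness matrix.

```python
from itertools import combinations
from typing import List


def haseman_elston_slope(y: List[float], kernel: List[List[float]]) -> float:
    """Least-squares slope of y[i] * y[j] on kernel[i][j] over all pairs i < j.

    With a standardized phenotype y and a relatedness matrix as kernel,
    the slope is the classic Haseman-Elston heritability estimate.
    """
    pairs = list(combinations(range(len(y)), 2))
    products = [y[i] * y[j] for i, j in pairs]
    relatedness = [kernel[i][j] for i, j in pairs]
    mean_p = sum(products) / len(pairs)
    mean_k = sum(relatedness) / len(pairs)
    cov = sum((k - mean_k) * (p - mean_p) for k, p in zip(relatedness, products))
    var = sum((k - mean_k) ** 2 for k in relatedness)
    return cov / var
```

The explicit loop over all pairs is quadratic in sample size, which is impractical at biobank scale and is the cost that randomized estimators are designed to avoid.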


French and Spanish Queer Film

10.3366/edinburgh/9780748699193.001.0001 ◽

2016 ◽

Cited By ~ 4

Author(s):

Chris Perriam ◽

Darren Waldron

Keyword(s):

Focus Groups ◽

Sense Of Community ◽

National Identities ◽

Audience Research ◽

Large Sample ◽

Current State ◽

Trans People ◽

Screening Questionnaires ◽

On Line ◽

The UK

This book advances the current state of film audience research and of our knowledge of sexuality in transnational contexts, by analysing how French LGBTQ films are seen in Spain and Spanish ones in France, as well as how these films are seen in the UK. It studies films from various genres and examines their reception across four languages (Spanish, French, Catalan, English) and engages with participants across a range of digital and physical audience locations. A focus on LGBTQ festivals and on issues relating to LGBTQ experience in both countries allows for the consideration of issues such as ageing, sense of community and isolation, affiliation and investment, and the representation of issues affecting trans people. The book examines films that chronicle the local, national and sub-national identities while also addressing foreign audiences. It draws on a large sample of individual responses through post-screening questionnaires and focus groups as well as on the work of professional film critics and on-line commentators.


Upgrading to nutrient removal by means of internal carbon from sludge hydrolysis

Water Science & Technology ◽

10.2166/wst.1994.0576 ◽

1994 ◽

Vol 29 (12) ◽

pp. 31-40 ◽

Cited By ~ 13

Author(s):

Pia Prohaska Brinch ◽

Kim Rindel ◽

Kathryn Kalb

Keyword(s):

Reaction Rate ◽

Wastewater Treatment Plants ◽

Secondary Treatment ◽

N Removal ◽

Sludge Hydrolysis ◽

Speed Up ◽

On Line ◽

Full Scale Tests ◽

Biological Phosphorus ◽

Hydrolysis Of

Due to the introduction of stricter nutrient effluent standards, many existing wastewater treatment plants performing only primary or secondary treatment are about to be upgraded. As the space available at the plants is, however, often limited, processes are required which will accommodate the need for increased treatment capacity without requiring much more space. The hydrolysis of primary or pre-precipitated sludge produces directly degradable organic carbon, which can speed up the reaction rate and increase both biological phosphorus and nitrogen removal. Full-scale tests with dosing of hydrolysate for biological P and N removal, respectively, have shown that this is a most viable process. The use of on-line monitoring has improved the process further.


SwissTargetPrediction: updated data and new features for efficient prediction of protein targets of small molecules

Nucleic Acids Research ◽

10.1093/nar/gkz382 ◽

2019 ◽

Vol 47 (W1) ◽

pp. W357-W364 ◽

Cited By ~ 121

Author(s):

Antoine Daina ◽

Olivier Michielin ◽

Vincent Zoete

Keyword(s):

Small Molecules ◽

Predictive Performance ◽

Web Interface ◽

Protein Targets ◽

Human Proteins ◽

Speed Up ◽

On Line ◽

Efficient Prediction ◽

2D And 3D ◽

Similarity Thresholds

Abstract SwissTargetPrediction is a web tool, on-line since 2014, that aims to predict the most probable protein targets of small molecules. Predictions are based on the similarity principle, through reverse screening. Here, we describe the 2019 version, which represents a major update in terms of underlying data, backend and web interface. The bioactivity data were updated, the model retrained and similarity thresholds redefined. In the new version, the predictions are performed by searching for similar molecules, in 2D and 3D, within a larger collection of 376 342 compounds known to be experimentally active on an extended set of 3068 macromolecular targets. An efficient backend implementation speeds up the process, which returns results for a druglike molecule on human proteins in 15–20 s. The refreshed web interface enhances user experience with new features for easy input and improved analysis. Interoperability capacity enables straightforward submission of any input or output molecule to other on-line computer-aided drug design tools developed by the SIB Swiss Institute of Bioinformatics. High levels of predictive performance were maintained despite more extended biological and chemical spaces to be explored, e.g. achieving at least one correct human target in the top 15 predictions for >70% of external compounds. The new SwissTargetPrediction is available free of charge (www.swisstargetprediction.ch).
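The similarity principle behind reverse screening can be sketched in a few lines of Python. The example below ranks targets by the best Tanimoto similarity between a query fingerprint and the fingerprints of known active compounds. It is an illustrative sketch only and is not SwissTargetPrediction's scoring model, which combines 2D and 3D similarity in a trained model; the function names, the representation of fingerprints as sets of integer feature identifiers, and the threshold values are all assumptions.

```python
from typing import FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_targets(
    query_fp: FrozenSet[int],
    known_actives: List[Tuple[str, FrozenSet[int]]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Rank targets by the most similar known active at or above the threshold."""
    best = {}
    for target, fp in known_actives:
        score = tanimoto(query_fp, fp)
        if score >= threshold and score > best.get(target, 0.0):
            best[target] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = frozenset({1, 2, 3, 4})
    actives = [
        ("T1", frozenset({1, 2, 3, 4, 5})),
        ("T2", frozenset({1, 2, 9, 10})),
        ("T1", frozenset({1, 2})),
    ]
    print(rank_targets(query, actives, 0.3))
```

Keeping only the best-scoring active per target is one simple design choice; a production system would typically combine evidence from several similar ligands.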


They teach, but do they apply?

International Journal of Lean Six Sigma ◽

10.1108/ijlss-07-2017-0089 ◽

2019 ◽

Vol 10 (3) ◽

pp. 743-766

Author(s):

Anete Petrusch ◽

Guilherme Luís Roehe Vaccaro ◽

Juliane Luchese

Keyword(s):

Design Methodology ◽

Additional Data ◽

Effective Sample Size ◽

Exploratory Research ◽

Lean Thinking ◽

Content Type ◽

Administrative Services ◽

The USA ◽

And Performance ◽

The UK

Purpose Although discussed for more than 20 years, information about Lean adoption in higher education institutions (HEIs) is scarce, especially in developing countries. This research aims to investigate the degree of Lean thinking adoption in administrative services of Brazilian private HEIs. The results are compared to studies from the USA and the UK, highlighting the maturity of enablers, principles, tools and performance measures related to Lean. Design/methodology/approach A quantitative survey was carried out. The instrument is adapted for HEIs from the proposal of Malmbrandt and Åhlström (2013) for Lean services. Cronbach’s alpha and factor analysis were used to validate the adapted instrument. Additional data analysis was based on non-parametric tests. Findings No evidence of broad implementation of Lean thinking in administrative processes of Brazilian private HEIs was found, with the adoption being incipient. The results converge with those presented by other studies in the USA and the UK. There is a gap between the existing knowledge about Lean in the academic sphere of the HEIs and its application in their academic processes. Research limitations/implications The effective sample size was 47, despite contacts being sent to 2,090 institutions. This sample allows exploratory research, although further research is required. Results are consistent with those found in research from other countries. Originality/value The research presents descriptive and exploratory results regarding the adoption of Lean in Brazilian HEIs. No previous similar research was found in the literature.
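The abstract mentions Cronbach's alpha as one of the checks used to validate the survey instrument. As a reminder of what that statistic computes, here is a minimal Python sketch of the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The function name and the data layout (one list of scores per questionnaire item) are assumptions for illustration; the study's own data and software are not shown in the source.

```python
from statistics import variance
from typing import List


def cronbach_alpha(items: List[List[float]]) -> float:
    """Cronbach's alpha for k items, each a list with one score per respondent.

    Uses sample variances: alpha = k / (k - 1) * (1 - sum(item variances) / variance(totals)).
    """
    k = len(items)
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

Two perfectly concordant items give an alpha of 1, while items that vary independently of one another push the value toward 0.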


GEMPROT: visualization of the impact on the protein of the genetic variants found on each haplotype

Bioinformatics ◽

10.1093/bioinformatics/bty993 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2492-2494

Author(s):

Tania Cuppens ◽

Thomas E Ludwig ◽

Pascal Trouvé ◽

Emmanuelle Genin

Keyword(s):

Genetic Variants ◽

Protein Sequence ◽

Sequence Data ◽

Protein Sequences ◽

Supplementary Information ◽

Analysis Tool ◽

Functional Protein ◽

Key Players ◽

On Line ◽

The Impact

Abstract Summary When analyzing sequence data, genetic variants are considered one by one, taking no account of whether or not they are found in the same individual. However, variant combinations might be key players in some diseases, as variants that are neutral on their own can become deleterious when associated together. GEMPROT is a new analysis tool that allows users to visualize, from a phased VCF file, the consequences of the genetic variants on the protein. At the level of an individual, the program shows the variants on each of the two protein sequences and the Pfam functional protein domains. When data on several individuals are available, GEMPROT lists the haplotypes found in the sample and can compare the haplotype distributions between different sub-groups of individuals. By offering a global visualization of the gene with the genetic variants present, GEMPROT makes it possible to better understand the impact of combinations of genetic variants on the protein sequence. Availability and implementation GEMPROT is freely available at https://github.com/TaniaCuppens/GEMPROT. An on-line version is also available at http://med-laennec.univ-brest.fr/GEMPROT/. Supplementary information Supplementary data are available at Bioinformatics online.


Analysis of On-Line Clinical Laboratory Manuals and Practical Recommendations

Archives of Pathology & Laboratory Medicine ◽

10.5858/2004-128-476-aooclm ◽

2004 ◽

Vol 128 (4) ◽

pp. 476-479

Author(s):

Bruce Beckwith ◽

Robert Schwartz ◽

Liron Pantanowitz

Keyword(s):

Reference Range ◽

Clinical Laboratory ◽

Critical Values ◽

Supplementary Information ◽

Reference Ranges ◽

Individual Test ◽

Test Methodology ◽

Laboratory Manuals ◽

On Line ◽

Data Elements

Abstract Context.—On-line clinical laboratory manuals are a valuable resource for medical professionals. To our knowledge, no recommendations currently exist for their content or design. Objective.—To analyze publicly accessible on-line clinical laboratory manuals and to propose guidelines for their content. Design.—We conducted an Internet search for clinical laboratory manuals written in English with individual test listings. Four individual test listings in each manual were evaluated for 16 data elements, including sample requirements, test methodology, units of measure, reference range, and critical values. Web sites were also evaluated for supplementary information and search functions. Results.—We identified 48 on-line laboratory manuals, including 24 academic or community hospital laboratories and 24 commercial or reference laboratories. All manuals had search engines and/or test indices. No single manual contained all 16 data elements evaluated. An average of 8.9 (56%) elements were present (range, 4–14). Basic sample requirements (specimen and volume needed) were the elements most commonly present (98% of manuals). The frequency of the remaining data elements varied from 10% to 90%. Conclusions.—On-line clinical laboratory manuals originate from both hospital and commercial laboratories. While most manuals were user-friendly and contained adequate specimen-collection information, other important elements, such as reference ranges, were frequently absent. To ensure that clinical laboratory manuals are of maximal utility, we propose the following 13 data elements be included in individual test listings: test name, synonyms, test description, test methodology, sample requirements, volume requirements, collection guidelines, transport guidelines, units of measure, reference range, critical values, test availability, and date of latest revision.


Haplotype assembly of autotetraploid potato using integer linear programing

Bioinformatics ◽

10.1093/bioinformatics/btz060 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3279-3286 ◽

Cited By ~ 4

Author(s):

Enrico Siragusa ◽

Niina Haiminen ◽

Richard Finkers ◽

Richard Visser ◽

Laxmi Parida

Keyword(s):

Experimental Data ◽

Experimental Studies ◽

Simulated Data ◽

Supplementary Information ◽

Linear Programs ◽

Plant Genomics ◽

Optimal Method ◽

Haplotype Assembly ◽

Open Issue ◽

Removal Model

Abstract Summary Haplotype assembly of polyploids is an open issue in plant genomics. Recent experimental studies on highly heterozygous autotetraploid potato have shown that available methods do not deliver satisfying results in practice. We propose an optimal method to assemble haplotypes of highly heterozygous polyploids from Illumina short-sequencing reads. Our method is based on a generalization of the existing minimum fragment removal model to the polyploid case and on new integer linear programs to reconstruct optimal haplotypes. We validate our methods experimentally by means of a combined evaluation on simulated and experimental data based on 83 previously sequenced autotetraploid potato cultivars. Results on simulated data show that our methods produce highly accurate haplotype assemblies, while results on experimental data confirm a sensible improvement over the state of the art. Availability and implementation Executables for Linux at http://github.com/ComputationalGenomics/HaplotypeAssembler. Supplementary information Supplementary data are available at Bioinformatics online.


A network of networks approach for modeling interconnected brain tissue-specific networks

Bioinformatics ◽

10.1093/bioinformatics/btz032 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3092-3101 ◽

Cited By ~ 1

Author(s):

Hideko Kawakubo ◽

Yusuke Matsui ◽

Itaru Kushima ◽

Norio Ozaki ◽

Teppei Shimamura

Keyword(s):

Learning Algorithm ◽

Simulated Data ◽

Autism Spectrum ◽

Supplementary Information ◽

Sparse Learning ◽

Topological Information ◽

Infinite Point ◽

Neurogenetic Disorders ◽

Information Matrices ◽

Network Of Networks

Abstract Motivation Recent sequence-based analyses have identified many gene variants that may contribute to neurogenetic disorders such as autism spectrum disorder and schizophrenia. Several state-of-the-art network-based analyses have been proposed for mechanistic understanding of genetic variants in neurogenetic disorders. However, these methods were mainly designed for modeling and analyzing single networks that do not interact with or depend on other networks, and thus cannot capture the properties between interdependent systems in brain-specific tissues, circuits and regions, which are connected to each other and affect behavior and cognitive processes. Results We introduce a novel and efficient framework, called a ‘Network of Networks’ approach, to infer the interconnectivity structure between multiple networks where the response and the predictor variables are topological information matrices of given networks. We also propose Graph-Oriented SParsE Learning, a new sparse structural learning algorithm for network data to identify a subset of the topological information matrices of the predictors related to the response. We demonstrate on simulated data that Graph-Oriented SParsE Learning outperforms existing kernel-based algorithms in terms of F-measure. On real data from human brain region-specific functional networks associated with the autism risk genes, we show that the ‘Network of Networks’ model provides insights into the autism-associated interconnectivity structure between functional interaction networks and a comprehensive understanding of the genetic basis of autism across diverse regions of the brain. Availability and implementation Our software is available from https://github.com/infinite-point/GOSPEL. Supplementary information Supplementary data are available at Bioinformatics online.
