SVhound: Detection of future Structural Variation hotspots

2021 ◽  
Author(s):  
Luis Felipe Paulin ◽  
Muthuswamy Raveendran ◽  
Ronald Alan Harris ◽  
Jeffrey Rogers ◽  
Arndt von Haeseler ◽  
...  

Population studies continue to grow in sample size in order to capture the diversity of a given population or species. These studies reveal new polymorphisms that provide important insights into the mechanisms of evolution and are also important for the interpretation of these variants. Nevertheless, while the full catalog of variation across an entire species remains unknown, we can predict which regions harbor additional, as yet hidden variations and investigate their properties, thereby enhancing the analysis of potentially missed variants. To achieve this we implemented SVhound (https://github.com/lfpaulin/SVhound), which, based on a population-level SV dataset, predicts regions that harbor novel SV alleles. We tested SVhound on subsets of the 1000 Genomes Project data and showed that its predictions correlate well with those from the full data set (average correlation over 2,800 tests: r = 0.7136). Next, we utilized SVhound to investigate potentially missed or understudied regions across 1KGP and CCDG, several of which include multiple genes. Lastly, we demonstrate the applicability of SVhound on a small, novel SV call set for rhesus macaque (Macaca mulatta) and discuss the impact and choice of its parameters. Overall, SVhound is a unique method to identify regions that potentially harbor hidden diversity in model and non-model organisms and can also be used to help ensure high quality of SV call sets.
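To illustrate the kind of per-window summary such a prediction can start from, here is a minimal sketch that counts distinct SV alleles per fixed-size window of a population callset and uses a Good-Turing style singleton fraction as a crude score for hidden diversity. This is an illustrative stand-in, not SVhound's actual model; all names and the window size are hypothetical.

```python
# Illustrative only: a Good-Turing style estimate of the chance that a genomic
# window still harbours unseen SV alleles, computed from a population SV callset.
# This is NOT SVhound's model; see the repository for the real method.
from collections import Counter, defaultdict

def window_hotspot_scores(sv_calls, window_size=100_000):
    """sv_calls: iterable of (chrom, pos, allele_id) tuples, one per carrier.

    Returns {(chrom, window_index): P(next sampled allele is unseen)} using the
    Good-Turing estimate n1 / N (singleton alleles over total observations).
    """
    windows = defaultdict(Counter)
    for chrom, pos, allele_id in sv_calls:
        windows[(chrom, pos // window_size)][allele_id] += 1

    scores = {}
    for key, allele_counts in windows.items():
        total = sum(allele_counts.values())
        singletons = sum(1 for c in allele_counts.values() if c == 1)
        scores[key] = singletons / total if total else 0.0
    return scores

# Toy usage: window 0 is full of singleton alleles (likely hidden diversity),
# window 1 contains one allele seen three times.
toy_calls = [("chr1", 10_000, "DEL_a"), ("chr1", 20_000, "DEL_b"),
             ("chr1", 30_000, "INS_c"), ("chr1", 150_000, "DEL_d"),
             ("chr1", 160_000, "DEL_d"), ("chr1", 170_000, "DEL_d")]
print(window_hotspot_scores(toy_calls))
```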

Acta Naturae ◽  
2013 ◽  
Vol 5 (3) ◽  
pp. 6-12 ◽  
Author(s):  
A. N. Libkind ◽  
V. A. Markusova ◽  
L. E. Mindeli

A representative empirical bibliometric analysis of Russian journals included in the Journal Citation Reports-Science Edition (JCR-SE) for the period 1995–2010 was conducted at the macro level (excluding the subject categories). It was found that the growth in the number of articles covered by JCR-SE (a 1.8-fold increase compared to 1995) outpaced the growth of Russian publications (a 1.2-fold increase). Hence, the share of Russian articles covered by JCR-SE fell from 2.5% in 1995 to 1.7% in 2010. It was determined that the number of articles published in an average Russian journal decreased by 20% compared to the number of articles in an average journal of the full data set. These facts could partly shed light on why Russian research output is stagnating (approximately 30,000 articles per year), even though the coverage of Russian journals has expanded to 150 titles. Over the past 15 years, a twofold increase in the impact factor of Russian journals has been observed, which is higher than that for the full data set of journals (a 1.4-fold increase). Measures to improve the quality of Russian journals are proposed.
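A quick consistency check of the share figures quoted above (approximate, since the analysis is at the macro level):

```python
# If total JCR-SE output grew 1.8-fold while Russian output grew 1.2-fold,
# the 1995 share of 2.5% should scale accordingly.
russia_1995_share = 2.5                 # % of JCR-SE articles in 1995
jcr_growth, russian_growth = 1.8, 1.2   # fold increases, 1995 -> 2010
print(round(russia_1995_share * russian_growth / jcr_growth, 2))  # ~1.67%, i.e. about 1.7%
```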


2018 ◽  
Author(s):  
Saurabh Belsare ◽  
Michal Sakin-Levy ◽  
Yulia Mostovoy ◽  
Steffen Durinck ◽  
Subhra Chaudhry ◽  
...  

ABSTRACT Data from the 1000 Genomes Project are quite often used as a reference for human genomic analysis. However, their accuracy needs to be assessed to understand the quality of predictions made using this reference. We present here an assessment of genotype, phasing, and imputation accuracy in the 1000 Genomes Project data. We compare the phased haplotype calls from the 1000 Genomes Project to experimentally phased haplotypes for 28 of the same individuals sequenced using the 10X Genomics platform. We observe that phasing and imputation for rare variants are unreliable, which likely reflects the limited sample size of the 1000 Genomes Project data. Further, it appears that using a population-specific reference panel does not improve the accuracy of imputation over using the entire 1000 Genomes data set as a reference panel. We also note that the error rates and trends depend on the choice of definition of error, and hence any error reporting needs to take these definitions into account.
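As an illustration of one common error definition in phasing comparisons, the sketch below computes a switch error rate between two phasings of the same ordered heterozygous sites. The function and inputs are hypothetical; real comparisons first require matching sites and alleles between call sets.

```python
# Minimal sketch of a switch-error computation between two phasings of the same
# heterozygous sites (e.g., statistical phasing vs. 10X read-backed phasing).
def switch_error_rate(phase_a, phase_b):
    """phase_a, phase_b: lists of 0/1 haplotype assignments for the same ordered
    heterozygous sites. A 'switch' is a change in relative orientation between
    consecutive sites; the rate is switches / (number of adjacent site pairs)."""
    assert len(phase_a) == len(phase_b) and len(phase_a) >= 2
    # Relative orientation at each site: do the two phasings agree (0) or disagree (1)?
    rel = [a ^ b for a, b in zip(phase_a, phase_b)]
    switches = sum(1 for prev, cur in zip(rel, rel[1:]) if prev != cur)
    return switches / (len(rel) - 1)

print(switch_error_rate([0, 0, 1, 1, 0], [0, 1, 0, 0, 1]))  # 0.25: one switch in four pairs
```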


2021 ◽  
Vol 10 (7) ◽  
pp. 436
Author(s):  
Amerah Alghanim ◽  
Musfira Jilani ◽  
Michela Bertolotto ◽  
Gavin McArdle

Volunteered Geographic Information (VGI) is often collected by non-expert users, which raises concerns about the quality and veracity of such data. There has been much effort to understand and quantify the quality of VGI. Extrinsic measures, which compare VGI to authoritative data sources such as national mapping agencies, are common, but the cost and slow update frequency of such data hinder the task. On the other hand, intrinsic measures, which compare the data to heuristics or models built from the VGI data itself, are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality, as they can infer and predict the properties of spatial data. In this article we are interested in assessing the quality of semantic information, such as the road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Using our proposed approach we obtained an average classification accuracy of 84.12%, which outperforms existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is also important. To address this issue we have developed a new trustworthiness measure based on direct and indirect characteristics of OSM data, such as its edit history, along with an assessment of the users who contributed the data. An evaluation of the impact of data determined to be trustworthy shows that the trusted data collected with the new approach improves the prediction accuracy of our machine learning technique: the classification accuracy of our model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. Consequently, such results can be used to assess the quality of OSM and suggest improvements to the data set.
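To make the supervised setup concrete, here is a minimal sketch of a road-type classifier trained on intrinsic OSM-style features. The feature names, synthetic data, and model choice are hypothetical placeholders rather than the authors' actual feature set or pipeline.

```python
# Sketch of a supervised road-type classifier built from intrinsic features.
# Features and labels below are synthetic stand-ins, not real OSM data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 2000
# Hypothetical intrinsic features: segment length, connectivity, sinuosity, edit count.
X = np.column_stack([
    rng.lognormal(5, 1, n),        # segment_length_m
    rng.integers(1, 8, n),         # n_connected_ways
    rng.uniform(1.0, 2.0, n),      # sinuosity
    rng.poisson(3, n),             # n_edits
])
# Toy labels loosely driven by length and connectivity (stand-in for "road type").
y = np.where(X[:, 0] > 200, "primary", np.where(X[:, 1] > 4, "residential", "service"))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```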


Author(s):  
Anna Ferrante ◽  
James Boyd ◽  
Sean Randall ◽  
Adrian Brown ◽  
James Semmens

ABSTRACT Objectives: Record linkage is a powerful technique which transforms discrete episode data into longitudinal person-based records. These records enable the construction and analysis of complex pathways of health and disease progression, and service use. Achieving high linkage quality is essential for ensuring the quality and integrity of research based on linked data. The methods used to assess linkage quality will depend on the volume and characteristics of the datasets involved, the processes used for linkage and the additional information available for quality assessment. This paper proposes and evaluates two methods to routinely assess linkage quality. Approach: Linkage units currently use a range of methods to measure, monitor and improve linkage quality; however, no common approach or standards exist. There is an urgent need to develop "best practices" in evaluating, reporting and benchmarking linkage quality. In assessing linkage quality, of primary interest is knowing the number of true matches and non-matches identified as links and non-links. Any misclassification of matches within these groups introduces linkage errors. We present efforts to develop sharable methods to measure linkage quality in Australia. This includes a sampling-based method to estimate both precision (accuracy) and recall (sensitivity) following record linkage, and a benchmarking method: a transparent and transportable methodology to benchmark the quality of linkages across different operational environments. Results: The sampling-based method achieved estimates of linkage quality that were very close to actual linkage quality metrics. This method presents a feasible means of accurately estimating matching quality and refining linkages in population-level linkage studies. The benchmarking method provides a systematic approach to estimating linkage quality with a set of open and shareable datasets and a set of well-defined, established performance metrics. The method provides an opportunity to benchmark the linkage quality of different record linkage operations. Both methods have the potential to assess the inter-rater reliability of clerical reviews. Conclusions: Both methods produce reliable estimates of linkage quality, enabling the exchange of information within and between linkage communities. It is important that researchers can assess risk in studies using record linkage techniques. Understanding the impact of linkage quality on research outputs highlights a need for standard methods to routinely measure linkage quality. These two methods provide a good start to the quality process, but it is important to identify standards and good practices in all parts of the linkage process (pre-processing, standardising activities, linkage, grouping and extracting).
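The general shape of a sampling-based estimate can be sketched as follows: clerically review random samples of links and of non-linked candidate pairs, then scale the sample proportions to the full files to estimate precision and recall. The helper and counts below are illustrative, not the method as implemented by the linkage units.

```python
# Minimal sketch of sampling-based precision/recall estimation after record linkage.
# A real estimate would also account for blocking and weight samples appropriately.
def estimate_precision_recall(sampled_links, sampled_nonlinks,
                              n_links_total, n_nonlinks_total):
    """Each sample is a list of booleans: True if clerical review says 'true match'."""
    precision = sum(sampled_links) / len(sampled_links)
    # Scale sample proportions up to the full populations of links / non-links.
    est_true_links = precision * n_links_total
    est_missed = (sum(sampled_nonlinks) / len(sampled_nonlinks)) * n_nonlinks_total
    recall = est_true_links / (est_true_links + est_missed)
    return precision, recall

links_sample = [True] * 96 + [False] * 4        # 96% of reviewed links are true matches
nonlinks_sample = [True] * 2 + [False] * 998    # 0.2% of reviewed non-links were missed matches
print(estimate_precision_recall(links_sample, nonlinks_sample,
                                n_links_total=50_000, n_nonlinks_total=2_000_000))
```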


2017 ◽  
Vol 6 (3) ◽  
pp. 71 ◽  
Author(s):  
Claudio Parente ◽  
Massimiliano Pepe

The purpose of this paper is to investigate the impact of weights in pan-sharpening methods applied to satellite images. Different data sets of weights have been considered and compared in the IHS and Brovey methods. The first data set assigns the same weight to each band, while the second uses the weights obtained from the spectral radiance response; these two data sets are the most common in pan-sharpening applications. The third data set results from a new method, which computes the first-order inertial moment of each band taking into account its spectral response. To test the impact of the weights of the different data sets, WorldView-3 satellite images have been considered. In particular, two different scenes (the first an urban landscape, the second a rural landscape) have been investigated. The quality of the pan-sharpened images has been analysed using three different quality indexes: Root mean square error (RMSE), Relative average spectral error (RASE) and Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS).
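As a rough illustration of how per-band weights enter the fusion and how a quality index is computed, the sketch below implements a weighted Brovey-style fusion and ERGAS on synthetic arrays. The exact formulations used in the paper (including the IHS method and the spectral-response and inertial-moment weights) differ in detail.

```python
# Sketch of a weighted Brovey-style fusion and the ERGAS quality index on toy data.
import numpy as np

def brovey_fusion(ms, pan, weights):
    """ms: (bands, H, W) multispectral image (upsampled to the PAN grid);
    pan: (H, W) panchromatic image; weights: per-band weights for the intensity."""
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    intensity = np.tensordot(weights, ms, axes=1)   # weighted band sum
    ratio = pan / (intensity + 1e-9)
    return ms * ratio                               # each band scaled by PAN / intensity

def ergas(reference, fused, ratio_h_l=0.25):
    """ERGAS = 100 * (h/l) * sqrt(mean_i(RMSE_i^2 / mean_i^2)); lower is better.
    ratio_h_l is PAN over MS pixel size (about 0.31/1.24 = 0.25 for WorldView-3)."""
    terms = []
    for ref_b, fus_b in zip(reference, fused):
        rmse = np.sqrt(np.mean((ref_b - fus_b) ** 2))
        terms.append((rmse / np.mean(ref_b)) ** 2)
    return 100.0 * ratio_h_l * np.sqrt(np.mean(terms))

rng = np.random.default_rng(0)
ms = rng.uniform(50, 200, size=(4, 64, 64))                # 4 toy bands
pan = ms.mean(axis=0) + rng.normal(0, 2, size=(64, 64))    # toy panchromatic image
fused = brovey_fusion(ms, pan, weights=[1, 1, 1, 1])       # equal weights (data set 1)
print("ERGAS:", ergas(ms, fused))
```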


2021 ◽  
Author(s):  
Caesar Al Jewari ◽  
Sandra L Baldauf

Phylogenomics uses multiple genetic loci to reconstruct evolutionary trees, under the stipulation that all combined loci share a common phylogenetic history, i.e., they are congruent. Congruence is primarily evaluated via single-gene trees, but these trees invariably lack sufficient signal to resolve deep nodes, making it difficult to assess congruence at these levels. Two methods were developed to systematically assess congruence in multi-locus data. Protocol 1 uses gene jackknifing to measure deviation from a central mean, identifying taxon-specific incongruencies in the form of persistent outliers. Protocol 2 assesses congruence at the sub-gene level using a sliding window. Both protocols were tested on a controversial data set of 76 mitochondrial proteins previously used in various combinations to assess the eukaryote root. Protocol 1 showed a concentration of outliers in under-sampled taxa, including the pivotal taxon Discoba. Further analysis of Discoba using Protocol 2 detected a surprising number of apparently exogenous gene fragments, some of which overlap with Protocol 1 outliers and others that do not. Phylogenetic analyses of the full data using the static LG-gamma evolutionary model support a neozoan-excavate root for eukaryotes (Discoba sister), which rises to 99–100% bootstrap support with data masked according to either Protocol 1 or Protocol 2. In contrast, site-heterogeneous (mixture) models perform inconsistently with these data, yielding all three possible roots depending on the presence, absence, or type of masking and/or the extent of missing data. The neozoan-excavate root places Amorphea (including animals and fungi) and Diaphoretickes (including plants) as more closely related to each other than either is to Discoba (Jakobida, Heterolobosea, and Euglenozoa), regardless of the presence or absence of additional taxa.
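A minimal sketch of the gene-jackknife idea behind Protocol 1, using a placeholder per-gene, per-taxon statistic (e.g., a branch-length or likelihood contribution); the authors' actual statistic and outlier criterion are not reproduced here.

```python
# Illustrative gene-jackknife outlier detection: drop one locus at a time, recompute
# a per-taxon summary on the remaining loci, and flag taxa that persistently deviate
# from the jackknife mean.
import numpy as np

def jackknife_outliers(per_gene_stats, z_threshold=3.0):
    """per_gene_stats: (n_genes, n_taxa) array of a per-gene, per-taxon statistic.
    Returns indices of taxa that are outliers in a majority of jackknife replicates."""
    n_genes, n_taxa = per_gene_stats.shape
    outlier_votes = np.zeros(n_taxa, dtype=int)
    for drop in range(n_genes):
        kept = np.delete(per_gene_stats, drop, axis=0)
        taxon_means = kept.mean(axis=0)
        z = (taxon_means - taxon_means.mean()) / (taxon_means.std() + 1e-12)
        outlier_votes += (np.abs(z) > z_threshold)
    return np.where(outlier_votes > n_genes / 2)[0]

rng = np.random.default_rng(1)
stats = rng.normal(1.0, 0.1, size=(76, 40))   # 76 loci, 40 taxa (toy values)
stats[:, 7] += 1.5                            # one taxon made a persistent outlier
print(jackknife_outliers(stats))              # -> [7]
```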


2019 ◽  
Author(s):  
Céline N. Martineau ◽  
André E. X. Brown ◽  
Patrick Laurent

Abstract Ageing affects a wide range of phenotypes at all scales, but an objective measure of ageing remains challenging, even in simple model organisms. We assumed that a wide range of phenotypes at the organismal scale, rather than a limited number of biomarkers of ageing, would best describe the ageing process. Hundreds of morphological, postural and behavioural features are extracted at once from high-resolution videos. A quantitative model using this multi-parametric dataset can predict the biological age and lifespan of individual C. elegans. We show that the quality of predictions on a held-out data set increases with the number of features added to the model, supporting our initial hypothesis. Despite the large diversity of ageing mechanisms, including stochastic insults, our results highlight a robust ageing trajectory, but variable ageing rates along that trajectory. We show that healthspan, which we defined as the range of abilities of the animals, is correlated with lifespan in wild-type worms.
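The held-out evaluation idea can be sketched as follows, with synthetic data standing in for the worm feature matrix; the model and features are illustrative, not the authors'.

```python
# Sketch: predict age from many behavioural features and watch held-out performance
# as more features are added. Data are synthetic, with signal spread over many features.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_worms, n_features = 400, 200
X = rng.normal(size=(n_worms, n_features))
age = X[:, :50] @ rng.normal(size=50) * 0.2 + rng.normal(size=n_worms)  # many weak signals

X_tr, X_te, y_tr, y_te = train_test_split(X, age, test_size=0.25, random_state=0)
for k in (10, 50, 200):                      # grow the feature set
    model = RidgeCV().fit(X_tr[:, :k], y_tr)
    print(f"{k:>3} features: held-out R^2 = {model.score(X_te[:, :k], y_te):.2f}")
```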


2017 ◽  
Vol 61 (5) ◽  
Author(s):  
Joseph J. Carreno ◽  
Ben Lomaestro ◽  
John Tietjan ◽  
Thomas P. Lodise

ABSTRACT This study evaluated the predictive performance of a Bayesian PK estimation method (ADAPT V) to estimate the 24-h vancomycin area under the curve (AUC) with limited pharmacokinetic (PK) sampling in adult obese patients receiving vancomycin for suspected or confirmed Gram-positive infections. This was an Albany Medical Center Institutional Review Board-approved prospective evaluation of 12 patients. Patients had a median (95% confidence interval) age of 61 years (39 to 71 years), a median creatinine clearance of 86 ml/min (75 to 120 ml/min), and a median body mass index of 45 kg/m2 (40 to 52 kg/m2). For each patient, five PK concentrations were measured, and four different vancomycin population PK models were used as Bayesian priors to estimate the vancomycin AUC (AUCFULL). Using each PK model as a prior, data-depleted PK subsets were used to estimate the 24-h AUC (i.e., peak and trough data [AUCPT], midpoint and trough data [AUCMT], and trough-only data [AUCT]). The 24-h AUC derived from the full data set (AUCFULL) was compared to the AUC derived from data-depleted subsets (AUCPT, AUCMT, and AUCT) for each model. For the four sets of analyses, AUCFULL estimates ranged from 437 to 489 mg·h/liter. The AUCPT provided the best approximation of the AUCFULL; AUCMT and AUCT tended to overestimate AUCFULL. Further prospective studies are needed to evaluate the impact of AUC monitoring in clinical practice, but the findings from this study suggest that the vancomycin AUC can be estimated with good precision and accuracy with limited PK sampling using Bayesian PK estimation software.
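For orientation only, the sketch below computes a linear-trapezoidal AUC from measured concentrations over a dosing interval. This is not the Bayesian (ADAPT) estimation evaluated in the study, which fits a population PK model as a prior; the sampling times and concentrations are invented.

```python
# Simple non-Bayesian reference point: linear-trapezoidal AUC from sampled
# vancomycin concentrations, doubled to approximate a 24-h AUC for q12h dosing.
def auc_trapezoidal(times_h, conc_mg_l):
    """Linear trapezoidal AUC over the sampled interval, in mg*h/liter."""
    return sum((c0 + c1) / 2 * (t1 - t0)
               for (t0, c0), (t1, c1) in zip(zip(times_h, conc_mg_l),
                                             zip(times_h[1:], conc_mg_l[1:])))

times = [0.0, 1.0, 2.0, 6.0, 12.0]       # h after start of infusion (illustrative)
concs = [0.0, 38.0, 30.0, 20.0, 12.0]    # mg/liter (illustrative)
auc_0_12 = auc_trapezoidal(times, concs)
print(f"AUC over one 12-h interval: {auc_0_12:.0f} mg*h/L; "
      f"approx. 24-h AUC at steady state: {2 * auc_0_12:.0f} mg*h/L")
```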


2011 ◽  
Vol 149 (4) ◽  
pp. 507-517 ◽  
Author(s):  
P. WILSON

SUMMARY The UK dairy sector has undergone considerable structural change in recent years, with a decrease in the number of producers accompanied by an increased average herd size and increased concentrate use and milk yields. One of the key drivers to producers remaining in the industry is the profitability of their herds. The current paper adopts a holistic approach to decomposing the variation in dairy profitability through an analysis of net margin data explained by physical input–output measures, milk price variation, labour utilization and managerial behaviours and characteristics. Data are drawn from the Farm Business Survey (FBS) for England in 2007/08 for 228 dairy enterprises. Average yields are 7100 litres/cow/yr, from a herd size of 110 cows that use 0·56 forage ha/cow/yr and 43·2 labour h/cow/yr. An average milk price of 22·57 pence per litre (ppl) produced milk output of £1602/cow/yr, which, after accounting for calf sales, herd replacements and quota leasing costs, gave an average dairy output of £1516/cow/yr. After total costs of £1464/cow/yr this left a net margin of £52/cow/yr (0·73 ppl). There is wide variation in performance, with the most profitable (as measured by net margin per cow) quartile of producers achieving 2000 litres/cow/yr more than the least profitable quartile, returning a net margin of £335/cow/yr compared to a loss of £361/cow/yr for the least profitable. The most profitable producers operate larger, higher yielding herds and achieve a greater milk price for their output. In addition, a significantly greater number of the most profitable producers undertake financial benchmarking within their businesses and operate specialist dairy farms. When examining the full data set, the most profitable enterprises included significantly greater numbers of organic producers. The most profitable tend to have a greater reliance on independent technical advice, but this finding is not statistically significant. Decomposing the variation in net margin performance between the most and least profitable groups, an approximate ratio of 65:23:12 is observed for higher yields: lower costs: higher milk price. This result indicates that yield differentials are the key performance driver in dairy profitability. Lower costs per cow are dominated by the significantly lower cost of farmer and spouse labour per cow of the most profitable group, flowing directly from the upper quartile expending 37·7 labour h/cow/yr in comparison with 58·8 h/cow/yr for the lower quartile. The upper quartile's greater milk price is argued to be achieved through contract negotiations and higher milk quality, and this accounts for 0·12 of the variation in net margin performance. The average economic return to the sample of dairy enterprises in this survey year was less than £6000/farm/yr. However, the most profitable quartile returned an average economic return of approximately £50 000/farm/yr. Structural change in the UK dairy sector is likely to continue with the least profitable and typically smaller dairy enterprises being replaced by a smaller number of expanding dairy production units.
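A quick check of the per-cow arithmetic quoted above, using the figures from the summary:

```python
# Milk output, net margin, and net margin in pence per litre for the average herd.
yield_l = 7100                                   # litres/cow/yr
milk_price_ppl = 22.57                           # pence per litre
milk_output = yield_l * milk_price_ppl / 100     # ~ £1602/cow/yr
dairy_output = 1516                              # £/cow/yr after calf sales, replacements, quota leasing
total_costs = 1464                               # £/cow/yr
net_margin = dairy_output - total_costs          # £52/cow/yr
print(round(milk_output), net_margin, round(100 * net_margin / yield_l, 2))  # 1602, 52, 0.73 ppl
```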


2002 ◽  
Vol 94 (3) ◽  
pp. 883-895 ◽  
Author(s):  
Alison Gump ◽  
Miriam Legare ◽  
Deborah L. Hunt

Cerebral palsy is a condition that results in motor abnormalities as a direct consequence of injury to the developing brain. Fitts' law, which describes a speed-accuracy tradeoff in visually guided movements, has been shown to characterize the motor behavior of normal subjects during aiming tasks. To assess whether Fitts' law can also describe the aimed movements of persons with cerebral palsy, eight adults with cerebral palsy participated in an aimed-movement study. Twelve targets were used, with Indices of Difficulty ranging from 2.19 to 6.00 bits. The impact of Gan and Hoffmann's 1988 ballistic movement factor, A, and Fitts' 1954 Index of Difficulty on subjects' movement and reaction times was examined using multivariate linear models. The analysis of the full data set yielded a significant effect of A on movement times and no significant adherence to Fitts' law. However, high error rates that could be the result of oculomotor problems among the subject group were noted, and the method of handling errors had a large effect on the results. Tracking eye position during a Fitts' law task would provide information regarding the effect of oculomotor difficulties on aiming tasks in this subject group.
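To make the two competing models concrete, the sketch below computes Fitts' Index of Difficulty, ID = log2(2A/W), and compares linear fits of movement time against ID and against the ballistic term sqrt(A). The amplitudes, widths, and movement times are synthetic, not the study's data.

```python
# Fitts' law: MT = a + b * ID, with ID = log2(2A/W) in bits.
# Gan & Hoffmann's ballistic model uses sqrt(A) as the predictor instead.
import numpy as np

A = np.array([40, 80, 160, 320, 40, 80, 160, 320], dtype=float)   # movement amplitude (mm)
W = np.array([10, 10, 10, 10, 35, 35, 35, 35], dtype=float)       # target width (mm)
ID = np.log2(2 * A / W)                                            # Index of Difficulty (bits)
# Toy movement times dominated by the ballistic term, plus noise.
MT = 120 + 55 * np.sqrt(A) + np.random.default_rng(0).normal(0, 10, A.size)

for name, x in (("Fitts ID", ID), ("sqrt(A)", np.sqrt(A))):
    b, a = np.polyfit(x, MT, 1)                 # slope, intercept
    r = np.corrcoef(x, MT)[0, 1]
    print(f"MT = {a:.0f} + {b:.1f} * {name};  r = {r:.2f}")
```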

