f-statistics estimation and admixture graph construction with Pool-Seq or allele count data using the R package poolfstat

2021 ◽  
Author(s):  
Mathieu Gautier ◽  
Renaud VITALIS ◽  
Laurence Flori ◽  
Arnaud Estoup

By capturing various patterns of the structuring of genetic variation across populations, f-statistics have proved highly effective for the inference of demographic history. Such statistics are defined as covariances of SNP allele frequency differences among sets of populations without requiring haplotype information and are hence particularly relevant for the analysis of pooled sequencing (Pool-Seq) data. We here propose a reinterpretation of the F (and D) parameters in terms of probability of gene identity and derive from this unified definition unbiased estimators for both Pool-Seq and standard allele count data obtained from individual genotypes. We implemented these estimators in a new version of the R package poolfstat, which now includes a wide range of inference methods: (i) three-population test of admixture; (ii) four-population test of treeness; (iii) F4-ratio estimation of admixture rates; and (iv) fitting, visualization and (semi-automatic) construction of admixture graphs. A comprehensive evaluation of the methods implemented in poolfstat on both simulated Pool-Seq (with various sequencing coverages and error rates) and allele count data confirmed the accuracy of these approaches, even for the most cost-effective Pool-Seq design involving low sequencing coverages. We further analyzed a real Pool-Seq data set comprising 14 populations of the invasive species Drosophila suzukii, which allowed refining both the demographic history of native populations and the invasion routes followed by this emblematic pest. Our new package poolfstat provides the community with a user-friendly and efficient all-in-one tool to unravel complex population genetic histories from large-size Pool-Seq or allele count SNP data.
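The definition of f-statistics as covariances of allele frequency differences can be illustrated with a minimal Python sketch. It uses naive plug-in estimators computed directly from population allele frequencies; it is not the unbiased estimators derived in the paper and it does not call poolfstat. The function names `f2`, `f3` and `f4` and the simulated frequencies are assumptions made for illustration only.

```python
import numpy as np

def f2(p_a, p_b):
    """Naive F2(A,B): mean squared allele frequency difference over SNPs."""
    p_a, p_b = np.asarray(p_a, float), np.asarray(p_b, float)
    return float(np.mean((p_a - p_b) ** 2))

def f3(p_c, p_a, p_b):
    """Naive F3(C;A,B): mean of (c - a)(c - b); negative values suggest C is admixed."""
    p_c, p_a, p_b = (np.asarray(p, float) for p in (p_c, p_a, p_b))
    return float(np.mean((p_c - p_a) * (p_c - p_b)))

def f4(p_a, p_b, p_c, p_d):
    """Naive F4(A,B;C,D): mean of (a - b)(c - d); near zero under treeness."""
    p_a, p_b, p_c, p_d = (np.asarray(p, float) for p in (p_a, p_b, p_c, p_d))
    return float(np.mean((p_a - p_b) * (p_c - p_d)))

# Hypothetical true allele frequencies at 1000 SNPs (no sampling noise).
rng = np.random.default_rng(1)
p_a = rng.uniform(0.05, 0.95, size=1000)
p_b = rng.uniform(0.05, 0.95, size=1000)
p_c = 0.5 * p_a + 0.5 * p_b  # C is an equal mixture of A and B

print(f2(p_a, p_b))          # positive
print(f3(p_c, p_a, p_b))     # negative: equals -0.25 * F2(A,B) for this mixture
```

With real Pool-Seq or allele count data, these plug-in values are biased by sampling of individuals and reads, which is the issue the unbiased estimators in the abstract are meant to address.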

Author(s):  
Subrata Mukherjee ◽  
Xuhui Huang ◽  
Lalita Udpa ◽  
Yiming Deng

Abstract Systems in service continue to degrade with the passage of time. Pipelines are among the most common systems that wear away with usage. For public safety it is of utmost importance to monitor pipelines and detect new defects within them. Magnetic flux leakage (MFL) testing is a widely used nondestructive evaluation (NDE) technique for defect detection in pipelines, particularly those composed of ferromagnetic materials. A pipeline inspection gauge (PIG) procedure based on line-scans or 2D-scans can collect accurate MFL readings for defect detection. However, in real-world applications involving large pipe-sectors, such extensive scanning techniques are extremely time consuming and costly. In this paper, we develop a fast and cheap methodology that does not need MFL readings at all the points used in traditional PIG procedures but conducts defect detection with similar accuracy. We consider an under-sampling based scheme that collects MFL at uniformly chosen random scan-points over large lattices instead of extensive PIG scans over all lattice points. Based on readings for the chosen random scan points, we use Kriging to reconstruct MFL readings over the entire pipe-sectors. Thereafter, we use thresholding-based segmentation on the reconstructed data for detecting defective areas. We demonstrate the applicability of our methodology on synthetic data generated using popular finite element models as well as on MFL data collected via laboratory experiments. In these experiments spanning a wide range of defect types, our proposed novel MFL based NDE methodology is shown to have operating characteristics within the acceptable threshold of PIG based traditional methods and thus provides an extremely cost-effective, fast procedure with competitive error rates that can be successfully used for scanning massive pipeline sectors.
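The three steps described above (random under-sampling of the lattice, Kriging reconstruction, thresholding) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses simple Kriging with a plug-in mean and an exponential covariance, and the synthetic "defect" signal, the 25% sampling rate, the correlation range and the threshold of 0.5 are all hypothetical choices.

```python
import numpy as np

def exponential_cov(a, b, corr_range):
    """Exponential covariance exp(-d / corr_range) between two point sets."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    return np.exp(-d / corr_range)

def krige(sample_xy, sample_values, target_xy, corr_range=5.0, nugget=1e-6):
    """Simple Kriging predictor using the sample mean as the (assumed) known mean."""
    mean = sample_values.mean()
    K = exponential_cov(sample_xy, sample_xy, corr_range)
    K += nugget * np.eye(len(sample_xy))  # small nugget for numerical stability
    weights = np.linalg.solve(K, sample_values - mean)
    return mean + exponential_cov(target_xy, sample_xy, corr_range) @ weights

# Synthetic 30 x 30 lattice with one smooth bump standing in for an MFL defect signal.
n = 30
yy, xx = np.mgrid[0:n, 0:n]
grid_xy = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
truth = np.exp(-((xx - 20) ** 2 + (yy - 10) ** 2) / (2 * 3.0 ** 2))

# Step 1: read the signal at uniformly chosen random scan points (25% of the lattice).
rng = np.random.default_rng(0)
idx = rng.choice(n * n, size=225, replace=False)

# Step 2: reconstruct the full lattice from the sampled readings.
recon = krige(grid_xy[idx], truth.ravel()[idx], grid_xy).reshape(n, n)

# Step 3: thresholding-based segmentation of the reconstructed signal.
defect_mask = recon > 0.5
print(defect_mask.sum(), "lattice points flagged as defective")
```

In practice the covariance model and its parameters would be estimated from the data (for example from an empirical variogram), and the threshold would be calibrated against the acceptable error rates mentioned in the abstract.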


Author(s):  
Subrata Mukherjee ◽  
Xuhui Huang ◽  
Lalita Udpa ◽  
Yiming Deng

Abstract Systems in service continue to degrade with the passage of time. Pipelines are among the most common systems that wear away with usage. Magnetic flux leakage (MFL) testing is a widely used non-destructive evaluation (NDE) technique for defect detection in pipelines, particularly those composed of ferromagnetic materials. A pipeline inspection gauge (PIG) procedure based on line-scans can collect accurate MFL readings for defect detection. However, in real-world applications involving large pipe-sectors, such extensive scanning techniques are extremely time consuming and costly. In this paper, we develop a fast and cheap methodology that does not need MFL readings at all the points used in traditional PIG procedures but conducts defect detection with similar accuracy. We consider an under-sampling based scheme that collects MFL at uniformly chosen random scan-points over large lattices instead of extensive PIG scans over all lattice points. Based on readings for the chosen random scan points, we use Kriging to reconstruct MFL readings. Thereafter, we use thresholding-based segmentation on the reconstructed data for detecting defective areas. We demonstrate the applicability of our methodology on synthetic data generated using finite element models as well as on MFL data collected via laboratory experiments. In these experiments spanning a wide range of defect types, our proposed novel MFL based NDE methodology is shown to have operating characteristics within the acceptable threshold of PIG based traditional methods and thus provides an extremely cost-effective, fast procedure with competitive error rates.


2020 ◽  
Vol 37 (12) ◽  
pp. 3684-3698 ◽  
Author(s):  
Ruidong Li ◽  
Han Qu ◽  
Jinfeng Chen ◽  
Shibo Wang ◽  
John M Chater ◽  
...  

Abstract Compared with genomic data of individual markers, haplotype data provide higher resolution for DNA variants, advancing our knowledge in genetics and evolution. Although many computational and experimental phasing methods have been developed for analyzing diploid genomes, it remains challenging to reconstruct chromosome-scale haplotypes at low cost, which constrains the utility of this valuable genetic resource. Gamete cells, the natural packaging of haploid complements, are ideal materials for phasing entire chromosomes because the majority of the haplotypic allele combinations has been preserved. Therefore, compared with the current diploid-based phasing methods, using haploid genomic data of single gametes may substantially reduce the complexity in inferring the donor’s chromosomal haplotypes. In this study, we developed the first easy-to-use R package, Hapi, for inferring chromosome-length haplotypes of individual diploid genomes with only a few gametes. Hapi outperformed other phasing methods when analyzing both simulated and real single gamete cell sequencing data sets. The results also suggested that chromosome-scale haplotypes may be inferred by using as few as three gametes, which has pushed the boundary to its possible limit. The single gamete cell sequencing technology allied with the cost-effective Hapi method will make large-scale haplotype-based genetic studies feasible and affordable, promoting the use of haplotype data in a wide range of research.


2019 ◽  
Vol 115 (531) ◽  
pp. 1472-1487 ◽  
Author(s):  
Jack Kamm ◽  
Jonathan Terhorst ◽  
Richard Durbin ◽  
Yun S. Song

2021 ◽  
Author(s):  
Emily LaVerriere ◽  
Philipp Schwabl ◽  
Manuela Carrasquilla ◽  
Aimee R. Taylor ◽  
Zachary M. Johnson ◽  
...  

Multiplexed PCR amplicon sequencing (AmpSeq) is an increasingly popular application for cost-effective monitoring of threatened species and managed wildlife populations, and shows strong potential for genomic epidemiology of infectious disease. AmpSeq data for infectious microbes can inform disease control in multiple ways, including measuring drug resistance marker prevalence, distinguishing imported from local cases, and determining the effectiveness of therapeutics. We describe the design and comparative evaluation of two new AmpSeq assays for Plasmodium falciparum malaria parasites: a four-locus panel ('4CAST') composed of highly diverse antigens, and a 129-locus panel ('AMPLseq') composed of drug resistance markers, highly diverse loci for measuring relatedness, and a locus to detect Plasmodium vivax co-infections. We explore the performance of each panel in various public health use cases with in silico simulations as well as empirical experiments. We find that the smaller 4CAST panel performs reliably across a wide range of parasitemia levels without DNA pre-amplification, and could be highly informative for evaluating the number of distinct parasite strains within samples (complexity of infection) and distinguishing recrudescent infections from new infections in therapeutic efficacy studies. The AMPLseq panel performs similarly to two existing panels of comparable size for relatedness measurement, despite differences in the data and approach used for designing each panel. Finally, we describe an R package (paneljudge) that facilitates design and comparative evaluation of AmpSeq panels for relatedness estimation, and we provide general guidance on the design and implementation of AmpSeq panels for genomic epidemiology of infectious disease.


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 450-450
Author(s):  
Tatiana Evgenievna Deniskova ◽  
Arsen V Dotsev ◽  
Marina I Selionova ◽  
Margaret S Fornara ◽  
Henry Reyer ◽  
...  

Abstract Specific environmental conditions and local livestock management systems have resulted in the creation of valuable native breeds. Timely monitoring of genetic diversity within native breeds using high-throughput DNA arrays will prevent their irreparable loss. In this regard, we aimed to assess genome-wide diversity and to study the demographic history of Russian native goat breeds (Altai Mountain, Orenburg, Soviet Mohair, Dagestan Milk, Dagestan Local, Dagestan Fluff and Karachaev) based on SNP data. A total of 200 goats were genotyped using the Goat 50K SNP BeadChip (Illumina, USA). Quality control and SNP filtering were performed in PLINK v1.9. The R package ‘diveRsity’ was used to calculate observed heterozygosity (Ho), expected heterozygosity (He), and inbreeding coefficient (Fis). Effective population sizes (Ne) were estimated in SNeP software. Observed heterozygosity was high and exceeded 0.402 in five out of seven breeds. The Orenburg, Soviet Mohair, Dagestan Milk, and Karachaev breeds showed a slight excess of heterozygotes, varying from 0.6% (Fis = -0.015) in Orenburg to 1.7% (Fis = -0.04) in the Karachaev breed. Traces of insignificant inbreeding were found in the Dagestan Local (Fis = 0.005) and Dagestan Fluff (Fis = 0.01) breeds. The recent effective population sizes estimated for four generations ago varied from 140 in the Karachaev to 472 in the Orenburg breed. Analysis of historical trends in effective population sizes estimated for sixty generations ago revealed an obvious decrease ranging from 10.25% in Dagestan Local to 34.65% in the Orenburg breed. However, recent effective sizes in Russian native goats are higher than the critical threshold (Ne = 100) that is essential for breed maintenance in the future. Our research findings provide evidence that Russian native goat breeds are not in endangered status, but development of effective utilization programs is highly recommended. The genotyping of 96 goats was funded by RSF No. 19-76-20006.
The reported study was funded by RFBR according to the research project № 18-316-20006.
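The relationship between the three diversity measures named in the abstract can be made concrete with a small Python sketch for a single biallelic SNP. It uses the textbook definitions (Ho as the fraction of heterozygous individuals, He = 2p(1 - p), Fis = 1 - Ho/He); the exact estimators used by ‘diveRsity’ may differ, for instance through sample-size corrections, and the function name and genotype counts below are hypothetical.

```python
def heterozygosity_stats(n_aa, n_ab, n_bb):
    """Return (Ho, He, Fis) for one biallelic SNP from genotype counts.

    Textbook definitions without sample-size correction (an assumption;
    dedicated packages may apply corrected estimators).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    ho = n_ab / n                     # observed heterozygosity
    he = 2 * p * (1 - p)              # expected heterozygosity under Hardy-Weinberg
    if he == 0:
        raise ValueError("monomorphic SNP: Fis is undefined")
    fis = 1 - ho / he                 # negative when heterozygotes are in excess
    return ho, he, fis

# Hypothetical counts: an excess of heterozygotes gives a negative Fis.
print(heterozygosity_stats(20, 60, 20))
# Counts in exact Hardy-Weinberg proportions give Fis = 0.
print(heterozygosity_stats(25, 50, 25))
```

A negative Fis, as reported for four of the breeds, corresponds to Ho exceeding He, while a small positive Fis indicates a slight deficit of heterozygotes.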


2020 ◽  
pp. 1192-1198
Author(s):  
M.S. Mohammad ◽  
Tibebe Tesfaye ◽  
Kim Ki-Seong

Ultrasonic thickness gauges are easy to operate and reliable, and can be used to measure a wide range of thicknesses and inspect all engineering materials. Supplementing the simple ultrasonic thickness gauges that present results in either a digital readout or as an A-scan with systems that enable correlating the measured values to their positions on the inspected surface to produce a two-dimensional (2D) thickness representation can extend their benefits and provide a cost-effective alternative to expensive advanced C-scan machines. In previous work, the authors introduced a system for the positioning and mapping of the values measured by the ultrasonic thickness gauges and flaw detectors (Tesfaye et al. 2019). The system is an alternative to the systems that use mechanical scanners, encoders, and sophisticated UT machines. It used a camera to record the probe’s movement and a projected laser grid obtained by a laser pattern generator to locate the probe on the inspected surface. In this paper, a novel system is proposed to be applied to flat surfaces, in addition to overcoming the other limitations posed due to the use of the laser projection. The proposed system uses two video cameras, one to monitor the probe’s movement on the inspected surface and the other to capture the corresponding digital readout of the thickness gauge. The acquired images of the probe’s position and thickness gauge readout are processed to plot the measured data in a 2D color-coded map. The system is meant to be simpler and more effective than the previous development.
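The final mapping step, in which probe positions and gauge readouts are combined into a 2D thickness map, can be sketched in Python as below. This is only an illustration of the gridding idea, assuming the positions and readouts have already been extracted from the two video streams; the function name `thickness_map`, the cell size and the sample values are hypothetical and are not taken from the paper.

```python
import numpy as np

def thickness_map(x, y, thickness, shape, cell_size):
    """Average thickness readings per grid cell; cells never visited are NaN.

    x, y are probe positions in the same units as cell_size and are assumed
    to lie inside the grid (positions outside it raise an IndexError).
    """
    rows = (np.asarray(y) // cell_size).astype(int)
    cols = (np.asarray(x) // cell_size).astype(int)
    total = np.zeros(shape)
    count = np.zeros(shape)
    np.add.at(total, (rows, cols), thickness)  # accumulate repeated visits
    np.add.at(count, (rows, cols), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)

# Hypothetical readings: two in the first cell, one in the neighbouring cell.
grid = thickness_map(
    x=[1, 2, 12], y=[1, 3, 1], thickness=[5.0, 5.2, 4.0],
    shape=(2, 2), cell_size=10,
)
print(grid)
```

The resulting array can then be passed to any plotting routine that renders a matrix with a colour scale to obtain the colour-coded map.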


Author(s):  
Allan Matthews ◽  
Adrian Leyland

Over the past twenty years or so, there have been major steps forward both in the understanding of tribological mechanisms and in the development of new coating and treatment techniques to better “engineer” surfaces to achieve reductions in wear and friction. Particularly in the coatings tribology field, improved techniques and theories which enable us to study and understand the mechanisms occurring at the “nano”, “micro” and “macro” scale have allowed considerable progress to be made in (for example) understanding contact mechanisms and the influence of “third bodies” [1–5]. Over the same period, we have seen the emergence of the discipline which we now call “Surface Engineering”, by which, ideally, a bulk material (the ‘substrate’) and a coating are combined in a way that provides a cost-effective performance enhancement of which neither would be capable without the presence of the other. It is probably fair to say that the emergence and recognition of Surface Engineering as a field in its own right has been driven largely by the availability of “plasma”-based coating and treatment processes, which can provide surface properties which were previously unachievable. In particular, plasma-assisted (PA) physical vapour deposition (PVD) techniques, allowing wear-resistant ceramic thin films such as titanium nitride (TiN) to be deposited on a wide range of industrial tooling, gave a step-change in industrial productivity and manufactured product quality, and caught the attention of engineers due to the remarkable cost savings and performance improvements obtained. Subsequently, so-called 2nd- and 3rd-generation ceramic coatings (with multilayered or nanocomposite structures) have recently been developed [6–9], to further extend tool performance — the objective typically being to increase coating hardness further, or extend hardness capabilities to higher temperatures.


Biostatistics ◽  
2019 ◽  
Author(s):  
Dane R Van Domelen ◽  
Emily M Mitchell ◽  
Neil J Perkins ◽  
Enrique F Schisterman ◽  
Amita K Manatunga ◽  
...  

SUMMARY Measuring a biomarker in pooled samples from multiple cases or controls can lead to cost-effective estimation of a covariate-adjusted odds ratio, particularly for expensive assays. But pooled measurements may be affected by assay-related measurement error (ME) and/or pooling-related processing error (PE), which can induce bias if ignored. Building on recently developed methods for a normal biomarker subject to additive errors, we present two related estimators for a right-skewed biomarker subject to multiplicative errors: one based on logistic regression and the other based on a Gamma discriminant function model. Applied to a reproductive health dataset with a right-skewed cytokine measured in pools of size 1 and 2, both methods suggest no association with spontaneous abortion. The fitted models indicate little ME but fairly severe PE, the latter of which is much too large to ignore. Simulations mimicking these data with a non-unity odds ratio confirm validity of the estimators and illustrate how PE can detract from pooling-related gains in statistical efficiency. These methods address a key issue associated with the homogeneous pools study design and should facilitate valid odds ratio estimation at a lower cost in a wide range of scenarios.

