Accurate estimation of haplotype frequency from pooled sequencing data and cost-effective identification of rare haplotype carriers by overlapping pool sequencing

2014 ◽  
Vol 31 (4) ◽  
pp. 515-522 ◽  
Author(s):  
Chang-Chang Cao ◽  
Xiao Sun
2016 ◽  
Vol 14 (04) ◽  
pp. 1650017
Author(s):  
Chang-Chang Cao ◽  
Xiao Sun

To reduce the cost of large-scale re-sequencing, multiple individuals can be pooled together and sequenced, an approach called pooled sequencing. Pooled sequencing can provide a cost-effective alternative to sequencing individuals separately. To facilitate the application of pooled sequencing in haplotype-based disease association analysis, the critical step is to accurately estimate haplotype frequencies from pooled samples. Here we present Ehapp2 for estimating haplotype frequencies from pooled sequencing data by utilizing a database that provides prior information on known haplotypes. We first translate the problem of estimating the frequency of each haplotype into finding a sparse solution to a system of linear equations, where the NNREG algorithm is employed to obtain the solution. Simulation experiments reveal that Ehapp2 is robust to sequencing errors and able to estimate haplotype frequencies with less than 3% average relative difference for pooled sequencing of a mixture of real Drosophila haplotypes with 50× total coverage, even when the sequencing error rate is as high as 0.05. Because proportions for local haplotypes spanning multiple SNPs are accurately calculated first, Ehapp2 retains excellent estimation for recombinant haplotypes resulting from chromosomal crossover. Comparisons with existing methods reveal that Ehapp2 is state-of-the-art for many sequencing study designs and more suitable for current massively parallel sequencing.
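The linear-system formulation in this abstract can be illustrated with a small sketch. It is not the NNREG algorithm or the Ehapp2 implementation: it assumes the pooled alternate-allele frequency at each SNP is the frequency-weighted sum of the known haplotypes' alleles, and it solves the resulting non-negative least-squares problem by plain projected gradient descent, with no explicit sparsity term. The function name, the 0/1 haplotype encoding, and the toy data are all hypothetical.

```python
def estimate_haplotype_frequencies(haplotypes, pooled_alt_freqs, iterations=5000):
    """Estimate non-negative haplotype frequencies from pooled allele frequencies.

    haplotypes: list of known haplotypes, each a list of 0/1 alleles per SNP.
    pooled_alt_freqs: observed alternate-allele frequency of the pool at each SNP.
    """
    k = len(haplotypes)
    # One equation per SNP: sum_k f_k * allele_k = pooled frequency at that SNP.
    rows = [[float(hap[j]) for hap in haplotypes]
            for j in range(len(pooled_alt_freqs))]
    rhs = [float(b) for b in pooled_alt_freqs]
    # Extra equation asking the frequencies to sum to one.
    rows.append([1.0] * k)
    rhs.append(1.0)

    # Safe step size: 1 / (squared Frobenius norm), which bounds the
    # largest eigenvalue of A^T A from above.
    step = 1.0 / sum(a * a for row in rows for a in row)

    freqs = [1.0 / k] * k  # start from a uniform guess
    for _ in range(iterations):
        residuals = [sum(a * f for a, f in zip(row, freqs)) - b
                     for row, b in zip(rows, rhs)]
        gradient = [sum(rows[i][c] * residuals[i] for i in range(len(rows)))
                    for c in range(k)]
        # Gradient step followed by projection onto the non-negative orthant.
        freqs = [max(0.0, f - step * g) for f, g in zip(freqs, gradient)]

    total = sum(freqs)
    return [f / total for f in freqs]


# Toy pool: three known haplotypes over four SNPs, mixed at 0.5 / 0.3 / 0.2.
known_haplotypes = [[0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
observed = [0.3, 0.2, 0.8, 0.7]
estimated = estimate_haplotype_frequencies(known_haplotypes, observed)
```

In this noise-free toy case the system has a unique solution, so the estimate should match the mixing proportions closely. With real read data the right-hand side would be noisy allele-frequency estimates, and a sparsity-promoting solver would matter far more.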


2021 ◽  
Author(s):  
Sonia E Eynard ◽  
Alain Vignal ◽  
Benjamin Basso ◽  
Yves Le Conte ◽  
Axel Decourtye ◽  
...  

Background: Eusocial insects, particularly the honeybee (Apis mellifera), an important pollinator, play a central role in many ecosystems. One approach to facilitate their study in molecular genetics is to consider whole colonies as single individuals by combining DNA of multiple individuals in a single pool sequencing experiment. Such a technique comes with the drawback of producing data that require dedicated analytical methods to be fully exploited. Despite this limitation, pool sequencing data have been shown to be informative and cost-effective when working on random mating populations. Here, we present new statistical methods for exploiting pool sequencing data of eusocial colonies in order to reconstruct the genotype of the colony founder, the queen. This opens the possibility of monitoring genetic diversity, performing genomic-based studies or implementing selective breeding. Results: Using simulations and real honeybee data, we show that the methods allow for a fast and accurate estimation of the genetic ancestry, with correlations of 0.9 with that obtained from individual genotyping, and for an accurate reconstruction of the queen genotype, with 2% genotyping error. We further validate the inference using experimental data on colonies with both pool sequencing and individual genotyping of drones. Conclusion: In this study we present statistical models to accurately estimate the genetic ancestry and reconstruct the genotype of the queen from pool sequencing data from workers of a eusocial colony. Such information allows pool sequencing to be exploited for traditional population genetics, association studies and selective breeding. While validated in Apis mellifera, these methods are applicable to other eusocial Hymenoptera species.


2017 ◽  
Vol 18 (2) ◽  
pp. 194-203 ◽  
Author(s):  
Nicolas O. Rode ◽  
Yan Holtz ◽  
Karine Loridon ◽  
Sylvain Santoni ◽  
Joëlle Ronfort ◽  
...  

Author(s):  
Eric S Tvedte ◽  
Mark Gasser ◽  
Benjamin C Sparklin ◽  
Jane Michalski ◽  
Carl E Hjelmen ◽  
...  

Abstract The newest generation of DNA sequencing technology is highlighted by the ability to generate sequence reads hundreds of kilobases in length. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have pioneered competitive long read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. We used whole-genome sequencing data produced by three PacBio protocols (Sequel II CLR, Sequel II HiFi, RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. In both organisms tested, Sequel II assemblies had the highest consensus accuracy, even after accounting for differences in sequencing throughput. ONT and PacBio CLR had the longest reads sequenced compared to PacBio RS II and HiFi, and genome contiguity was highest when assembling these datasets. ONT Rapid Sequencing libraries had the fewest chimeric reads in addition to superior quantification of E. coli plasmids versus ligation-based libraries. The quality of assemblies can be enhanced by adopting hybrid approaches using Illumina libraries for bacterial genome assembly or polishing eukaryotic genome assemblies, and an ONT-Illumina hybrid approach would be more cost-effective for many users. Genome-wide DNA methylation could be detected using both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs. The ideal choice of long read technology may depend on several factors, including the question or hypothesis under examination. No single technology outperformed others in all metrics examined.


Author(s):  
Miss Payal W. Paratpure

Tracking the location of public buses normally requires a GPS device to be installed, and many bus operators in developing countries do not have such a solution in place to provide an accurate estimated time of arrival (ETA). Without ETA information, it is very difficult for the general public to plan their journeys effectively. This paper discusses the implementation of an innovative IoT solution that tracks the real-time location of buses without requiring the deployment of a GPS device. It uses a Bluetooth Low Energy (BLE) proximity beacon to track the journey of a bus by deploying an Estimate location beacon on the bus. BLE detection devices (Raspberry Pi 4) are installed at selected bus stops along the route to detect the arrival of buses. Once a bus is detected, its location is submitted to a cloud server that computes the bus ETAs. A field trial is currently being conducted in Johor, Malaysia, together with a local bus operator on a single route. Our test results showed that the detection of BLE beacons is extremely accurate and that it is feasible to track the location of buses in a cost-effective way without employing a GPS device.
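The abstract does not say how the cloud server turns a beacon detection into ETAs. One simple possibility is sketched below, assuming the server keeps an average travel time for each segment between consecutive stops and adds these up from the stop where the bus was last detected. The function name, stop names, and travel times are hypothetical and are not taken from the paper.

```python
from datetime import datetime, timedelta


def compute_etas(stops, segment_minutes, detected_stop, detected_at):
    """Return {stop: ETA} for every stop downstream of the detecting stop.

    stops: ordered stop names along the route.
    segment_minutes: segment_minutes[i] is the assumed average travel time
        in minutes from stops[i] to stops[i + 1].
    detected_stop: stop whose BLE detector just saw the bus beacon.
    detected_at: datetime of that detection.
    """
    if len(segment_minutes) != len(stops) - 1:
        raise ValueError("need exactly one travel time per route segment")
    index = stops.index(detected_stop)
    etas = {}
    eta = detected_at
    # Accumulate segment travel times from the detection point onwards.
    for next_stop, minutes in zip(stops[index + 1:], segment_minutes[index:]):
        eta = eta + timedelta(minutes=minutes)
        etas[next_stop] = eta
    return etas


# Hypothetical route with four stops; the bus is detected at stop "B" at 08:00.
route = ["A", "B", "C", "D"]
travel_minutes = [5, 7, 4]
etas = compute_etas(route, travel_minutes, "B", datetime(2024, 1, 1, 8, 0))
```

A deployed system would presumably refresh the segment averages from past detections and account for dwell time at stops; this sketch leaves both out.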


2015 ◽  
Vol 8 (11) ◽  
pp. 4817-4830 ◽  
Author(s):  
X. Xi ◽  
V. Natraj ◽  
R. L. Shia ◽  
M. Luo ◽  
Q. Zhang ◽  
...  

Abstract. The Geostationary Fourier Transform Spectrometer (GeoFTS) is designed to measure high-resolution spectra of reflected sunlight in three near-infrared bands centered around 0.76, 1.6, and 2.3 μm and to deliver simultaneous retrievals of column-averaged dry air mole fractions of CO2, CH4, CO, and H2O (denoted XCO2, XCH4, XCO, and XH2O, respectively) at different times of day over North America. In this study, we perform radiative transfer simulations over both clear-sky and all-sky scenes expected to be observed by GeoFTS and estimate the prospective performance of retrievals based on results from Bayesian error analysis and characterization. We find that, for simulated clear-sky retrievals, the average retrieval biases and single-measurement precisions are < 0.2 % for XCO2, XCH4, and XH2O, and < 2 % for XCO, when the a priori values have a bias of 3 % and an uncertainty of 3 %. In addition, an increase in the amount of aerosols and ice clouds leads to a notable increase in the retrieval biases and slight worsening of the retrieval precisions. Furthermore, retrieval precision is a strong function of signal-to-noise ratio and spectral resolution. This simulation study can help guide decisions on the design of the GeoFTS observing system, which can result in cost-effective measurement strategies while achieving satisfactory levels of retrieval precisions and biases. The simultaneous retrievals at different times of day will be important for more accurate estimation of carbon sources and sinks on fine spatiotemporal scales and for studies related to the atmospheric component of the water cycle.


2020 ◽  
Author(s):  
Timour Baslan ◽  
Sam Kovaka ◽  
Fritz J. Sedlazeck ◽  
Yanming Zhang ◽  
Robert Wappel ◽  
...  

ABSTRACT Genome copy number is an important source of genetic variation in health and disease. In cancer, clinically actionable Copy Number Alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging Nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, Nanopore sequencing devices are limited in terms of the number of retrievable sequencing reads/molecules compared to short-read sequencing platforms. This represents a challenge for applications that require high read counts, such as CNA inference. To address this limitation, we targeted the sequencing of short-length DNA molecules loaded at optimized concentration in an effort to increase sequence read/molecule yield from a single nanopore run. We show that sequencing short DNA molecules reproducibly returns high read counts and allows high-quality CNA inference. We demonstrate the clinical relevance of this approach by accurately inferring CNAs in acute myeloid leukemia samples. The data show that, compared to traditional approaches such as chromosome analysis/cytogenetics, short molecule nanopore sequencing returns more sensitive, accurate copy number information in a cost-effective and expeditious manner, including for multiplex samples. Our results provide a framework for the sequencing of relatively short DNA molecules on nanopore devices with applications in research and medicine that include, but are not limited to, CNAs.
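To illustrate why read counts matter for CNA inference, the sketch below shows the basic read-depth idea in its simplest form: count reads in fixed genomic bins, take the median bin as a diploid baseline, and scale each bin's count to a copy-number estimate. This is an assumption-laden illustration and not the authors' pipeline. Real analyses also correct for GC content and mappability and segment the profile. The function name and toy counts are hypothetical.

```python
from statistics import median


def estimate_copy_numbers(bin_read_counts, baseline_ploidy=2):
    """Scale per-bin read counts to copy-number estimates.

    Assumes the median bin represents the baseline ploidy (2 for a
    diploid genome), so a bin with 1.5x the median depth maps to 3 copies.
    """
    baseline = median(bin_read_counts)
    if baseline <= 0:
        raise ValueError("median read count must be positive")
    return [baseline_ploidy * count / baseline for count in bin_read_counts]


# Toy profile: one bin with a single-copy gain, one with a single-copy loss.
counts = [100, 100, 150, 100, 50]
copy_numbers = estimate_copy_numbers(counts)
```

Because each estimate is a ratio of counts, its sampling noise shrinks as the number of reads per bin grows, which is why read yield per run is the limiting factor the abstract describes.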


2021 ◽  
Author(s):  
Karan K. Budhraja ◽  
Bradon R. McDonald ◽  
Michelle D. Stephens ◽  
Tania Contente-Cuomo ◽  
Havell Markus ◽  
...  

Abstract Fragmentation patterns observed in plasma DNA reflect chromatin accessibility in contributing cells. Since DNA shed from cancer cells and blood cells may differ in fragmentation patterns, we investigated whether analysis of genomic positioning and nucleotide sequence at fragment ends can reveal the presence of tumor DNA in blood and aid cancer diagnostics. We analyzed whole genome sequencing data from >2700 plasma DNA samples including healthy individuals and patients with 11 different cancer types. We observed higher fractions of fragments with aberrantly positioned ends in patients with cancer, driven by contribution of tumor DNA into plasma. Genome-wide analysis of fragment ends using machine learning showed an overall area under the receiver operating characteristic curve of 0.96 for detection of cancer. Our findings remained robust with as few as 1 million fragments analyzed per sample, suggesting that analysis of fragment ends can become a cost-effective and accessible approach for cancer detection and monitoring. One-sentence summary: Analyzing the positioning and nucleotide sequence at fragment ends in plasma DNA may enable cancer diagnostics.


Author(s):  
S. Rubinacci ◽  
D.M. Ribeiro ◽  
R. Hofmeister ◽  
O. Delaneau

Abstract Low-coverage whole genome sequencing followed by imputation has been proposed as a cost-effective genotyping approach for disease and population genetics studies. However, its competitiveness against SNP arrays is undermined as current imputation methods are computationally expensive and unable to leverage large reference panels. Here, we describe a method, GLIMPSE, for phasing and imputation of low-coverage sequencing datasets from modern reference panels. We demonstrate its remarkable performance across different coverages and human populations. It achieves imputation of a full genome for less than $1, outperforming existing methods by orders of magnitude, with an increased accuracy of more than 20% at rare variants. We also show that 1x coverage enables effective association studies and is better suited than dense SNP arrays to assess the impact of rare variations. Overall, this study demonstrates the promising potential of low-coverage imputation and suggests a paradigm shift in the design of future genomic studies.


2021 ◽  
Vol 22 (16) ◽  
pp. 8498
Author(s):  
Margaritis Avgeris ◽  
Panagiotis G. Adamopoulos ◽  
Aikaterini Galani ◽  
Marieta Xagorari ◽  
Dimitrios Gourgiotis ◽  
...  

Considering the lack of effective treatments against COVID-19, wastewater-based epidemiology (WBE) is emerging as a cost-effective approach for real-time population-wide SARS-CoV-2 monitoring. Here, we report novel molecular assays for sensitive detection and mutational/variant analysis of SARS-CoV-2 in wastewater. Highly stable regions of SARS-CoV-2 RNA were identified by RNA stability analysis and targeted for the development of novel nested PCR assays. Targeted DNA sequencing (DNA-seq) was applied for the analysis and quantification of SARS-CoV-2 mutations/variants, following hexamer-based reverse transcription and nested PCR-based amplification of targeted regions. Three-dimensional (3D) structure models were generated to examine the predicted structural modification caused by genomic variants. WBE of SARS-CoV-2 proved to be assay dependent, with significantly improved sensitivity achieved by assay combination (94%) vs. single-assay screening (30%–60%). Targeted DNA-seq allowed the quantification of SARS-CoV-2 mutations/variants in wastewater, which agreed with COVID-19 patients’ sequencing data. A mutational analysis indicated the prevalence of D614G (S) and P323L (RdRP) variants, as well as of the B.1.1.7/alpha variant of concern, in agreement with the frequency of the B.1.1.7/alpha variant in clinical samples from the same period of the third pandemic wave at the national level. Our assays provide an innovative cost-effective platform for real-time monitoring and early identification of SARS-CoV-2 variants at community/population levels.

