nQuire: A Statistical Framework For Ploidy Estimation Using Next Generation Sequencing

2017 ◽  
Author(s):  
Clemens L. Weiß ◽  
Marina Pais ◽  
Liliana M. Cano ◽  
Sophien Kamoun ◽  
Hernán A. Burbano

Abstract Intraspecific variation in ploidy occurs in a wide range of species including pathogenic and nonpathogenic eukaryotes such as yeasts and oomycetes. Ploidy can be inferred indirectly - without measuring DNA content - from experiments using next-generation sequencing (NGS). We present nQuire, a statistical framework that distinguishes between diploids, triploids and tetraploids using NGS. The command-line tool models the distribution of base frequencies at variable sites using a Gaussian Mixture Model, and uses maximum likelihood to select the most plausible ploidy model. nQuire handles large genomes at high coverage efficiently and uses standard input file formats. We demonstrate the utility of nQuire analyzing individual samples of the pathogenic oomycete Phytophthora infestans and the baker’s yeast Saccharomyces cerevisiae. Using these organisms we show the dependence between reliability of the ploidy assignment and sequencing depth. Additionally, we employ normalized maximized log-likelihoods generated by nQuire to ascertain ploidy level in a population of samples with ploidy heterogeneity. Using these normalized values we cluster samples in three dimensions using multivariate Gaussian mixtures. The cluster assignments retrieved from a S. cerevisiae population recovered the true ploidy level in over 96% of samples. Finally, we show that nQuire can be used regionally to identify chromosomal aneuploidies. nQuire provides a statistical framework to study organisms with intraspecific variation in ploidy. nQuire is likely to be useful in epidemiological studies of pathogens, artificial selection experiments, and for historical or ancient samples where intact nuclei are not preserved. It is implemented as a stand-alone Linux command line tool in the C programming language and is available at github.com/clwgg/nQuire under the MIT license.
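The model-selection idea in the abstract can be illustrated with a minimal sketch. This is not nQuire's implementation: the component means (0.5 for diploids, 1/3 and 2/3 for triploids, 0.25, 0.5 and 0.75 for tetraploids), the equal mixture weights, the fixed standard deviation `SIGMA`, and all function names are assumptions made for illustration only.

```python
import math

# Assumed component means of the base-frequency distribution at biallelic
# variable sites under each ploidy model (illustrative, not taken from nQuire).
MODELS = {
    "diploid": [0.5],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [0.25, 0.5, 0.75],
}
SIGMA = 0.05  # assumed, fixed standard deviation shared by all components


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def log_likelihood(freqs, means, sd=SIGMA):
    """Log-likelihood of base frequencies under an equal-weight Gaussian mixture."""
    weight = 1.0 / len(means)
    return sum(
        math.log(sum(weight * gaussian_pdf(f, m, sd) for m in means))
        for f in freqs
    )


def most_plausible_ploidy(freqs):
    """Return the model name with the highest log-likelihood, plus all scores."""
    scores = {name: log_likelihood(freqs, means) for name, means in MODELS.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical base frequencies clustered near 1/3 and 2/3.
    sample = [0.31, 0.33, 0.35, 0.65, 0.67, 0.69]
    best, scores = most_plausible_ploidy(sample)
    print(best, scores)
```

A real analysis would derive base frequencies from read counts in an alignment file and would need to account for sequencing noise and coverage, which the abstract notes affects the reliability of the assignment.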

Author(s):  
Hyungtaek Jung ◽  
Brendan Jeon ◽  
Daniel Ortiz-Barrientos

Storing and manipulating Next Generation Sequencing (NGS) file formats for understanding biological phenomena is an essential but difficult task in the life sciences. Yet, most methods for analysing NGS data require complex command-line tools in high-performance computing (HPC) or web-based servers and have not yet been implemented in comprehensive, easy-to-use software. Here we present easyfm (easy file manipulation), a free standalone Graphical User Interface (GUI) software with Python support that facilitates the rapid discovery of target sequences (or other regions of interest to the user) in NGS datasets, making such analyses more accessible to novice users and biologists. It enables them to perform end-to-end reproducible data analyses using a desktop application (Windows, Mac and Linux). Unlike existing tools, the GUI-based easyfm is not dependent on any HPC system and can be operated without an internet connection. For user-friendliness and convenience, easyfm was developed with four work modules and a secondary GUI window, covering different aspects of NGS data analysis, including post-processing, filtering, format conversion, generating results, real-time log, and help. In combination with the executable tools (BLAST+ and BLAT) and Python, easyfm allows the user to set analysis parameters, select/extract regions of interest, examine the input and output results, and convert to a wide range of file formats. To help augment the functionality of existing web-based and command-line tools, easyfm, a self-contained program, comes with extensive documentation (https://github.com/TaekAndBrendan/easyfm). This specific benefit allows easyfm to seamlessly integrate visual and interactive representations of NGS files, supporting a wider scope of bioinformatics applications in the life sciences.


2018 ◽  
Vol 19 (1) ◽  
Author(s):  
Clemens L. Weiß ◽  
Marina Pais ◽  
Liliana M. Cano ◽  
Sophien Kamoun ◽  
Hernán A. Burbano

2021 ◽  
Vol 12 ◽  
Author(s):  
Tyler Dang ◽  
Irene Lavagi-Craddock ◽  
Sohrab Bodaghi ◽  
Georgios Vidalakis

Citrus dwarfing viroid (CDVd) induces stunting on sweet orange trees [Citrus sinensis (L.) Osbeck], propagated on trifoliate orange rootstock [Citrus trifoliata (L.), syn. Poncirus trifoliata (L.) Raf.]. MicroRNAs (miRNAs) are a class of non-coding small RNAs (sRNAs) that play important roles in the regulation of tree gene expression. To identify miRNAs in dwarfed citrus trees, grown in high-density plantings, and their response to CDVd infection, sRNA next-generation sequencing was performed on CDVd-infected and non-infected controls. A total of 1,290 and 628 miRNAs were identified in stem and root tissues, respectively, and among those, 60 were conserved in each of these two tissue types. Three conserved miRNAs (csi-miR479, csi-miR171b, and csi-miR156) were significantly downregulated (adjusted p-value < 0.05) in the stems of CDVd-infected trees compared to the non-infected controls. The three stem downregulated miRNAs are known to be involved in various physiological and developmental processes some of which may be related to the characteristic dwarfed phenotype displayed by CDVd-infected C. sinensis on C. trifoliata rootstock field trees. Only one miRNA (csi-miR535) was significantly downregulated in CDVd-infected roots and it was predicted to target genes controlling a wide range of cellular functions. Reverse transcription quantitative polymerase chain reaction analysis performed on selected miRNA targets validated the negative correlation between the expression levels of these targets and their corresponding miRNAs in CDVd-infected trees. Our results indicate that CDVd-responsive plant miRNAs play a role in regulating important citrus growth and developmental processes that may participate in the cellular changes leading to the observed citrus dwarf phenotype.


2021 ◽  
Vol 8 (Supplement_1) ◽  
pp. S281-S282
Author(s):  
Heather L Wells ◽  
Joseph Barrows ◽  
Mara Couto-Rodriguez ◽  
Xavier O Jirau Serrano ◽  
Marilyne Debieu ◽  
...  

Abstract Background: The quantitative level of pathogens present in a host is a major driver of infectious disease (ID) state and outcome. However, the majority of ID diagnostics are qualitative. Next-generation sequencing (NGS) is an emerging ID diagnostics and research tool to provide insights, including tracking transmission, evolution, and identifying novel strains. Methods: We built a novel likelihood-based computational method to leverage pathogen-specific genome-wide NGS data to detect SARS-CoV-2, profile genetic variants, and furthermore quantify levels of these pathogens. We used de-identified clinical specimens tested for SARS-CoV-2 using RT-PCR, SARS-CoV-2 NGS Assay (hybrid capture, Twist Bioscience), or ARTIC (amplicon-based) platform, and COVID-DX software. A training set (n=87) and a validation set (n=22) were selected to establish the strength of our quantification model. We fit non-uniform probabilistic error profiles to a deterministic sigmoidal equation that more realistically represents observed data and used likelihood maximized over several different read depths to improve accuracy over a wide range of values of viral load. Given the proportion of the genome covered at varying depths for a single sample as input data, our model estimated the Ct of that sample as the value that produces the maximum likelihood of generating the observed genome coverage data. Results: The model fit on 87 SARS-CoV-2 NGS Assay training samples produced a good fit to the 22 validation samples, with a coefficient of correlation (r²) of ~0.8. The accuracy of the model was high (mean absolute % error of ~10%, meaning our model is able to predict the Ct value of each sample within a margin of ±10% on average). Because of the nature of the commonly used ARTIC protocol, we found that all quantitative signals in this data were lost during PCR amplification and the model is not applicable for quantification of samples captured this way. The ability to model quantification is a major advantage of the SARS-CoV-2 NGS assay protocol.
Figure: The likelihood-based model to estimate SARS-CoV-2 viral titer. LEFT: Observed genome coverage (y-axis) plotted against Ct value (x-axis). The best-fitting logistic curve is shown as a red line, with shaded areas above and below representing the fitted error profile. RIGHT: Model-estimated Ct values (y-axis) compared to laboratory Ct values (x-axis), with grey bars representing estimated confidence intervals. The 1:1 diagonal is shown as a dotted line.
Conclusion: To our knowledge, this is the first model to incorporate sequence data mapped across the genome of a pathogen to quantify the level of that pathogen in a clinical specimen. This has implications in ID diagnostics, research, and metagenomics. Disclosures: Heather L. Wells, MPH, Biotia, Inc. (Consultant) Joseph Barrows, MS, Biotia (Employee) Mara Couto-Rodriguez, MS, Biotia (Employee) Xavier O. Jirau Serrano, B.S., Biotia (Employee) Marilyne Debieu, PhD, Biotia (Employee) Karen Wessel, PhD, Labor Zotz/Klimas (Employee) Christopher Mason, PhD, Biotia (Board Member, Advisor or Review Panel member, Shareholder) Dorottya Nagy-Szakal, MD PhD, Biotia Inc (Employee, Shareholder) Niamh B. O’Hara, PhD, Biotia (Board Member, Employee, Shareholder)
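The estimation step described in the Methods above (choosing the Ct that maximizes the likelihood of the observed genome coverage at several read depths) might be sketched as follows. Everything specific here is hypothetical: the logistic parameters in `CURVES`, the depth thresholds, and the function names are invented for illustration, and a constant Gaussian error `SIGMA` is swapped in for the non-uniform error profiles the authors describe.

```python
import math

# Hypothetical logistic parameters per read-depth threshold: (midpoint Ct, slope).
# In practice these would be fitted on training samples; the values are invented.
CURVES = {1: (32.0, 0.6), 10: (28.0, 0.6), 100: (24.0, 0.6)}
SIGMA = 0.05  # assumed constant error; the abstract describes non-uniform profiles


def expected_coverage(ct, midpoint, slope):
    """Logistic curve: fraction of the genome covered decreases as Ct increases."""
    return 1.0 / (1.0 + math.exp(slope * (ct - midpoint)))


def log_likelihood(ct, observed):
    """Gaussian log-likelihood of observed coverage fractions at a candidate Ct."""
    total = 0.0
    for depth, coverage in observed.items():
        midpoint, slope = CURVES[depth]
        resid = coverage - expected_coverage(ct, midpoint, slope)
        total += -0.5 * (resid / SIGMA) ** 2 - math.log(SIGMA * math.sqrt(2 * math.pi))
    return total


def estimate_ct(observed, lo=10.0, hi=40.0, step=0.1):
    """Grid search for the Ct value that maximizes the log-likelihood."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda ct: log_likelihood(ct, observed))


if __name__ == "__main__":
    # Simulate noise-free coverage for a sample with Ct = 25, then recover it.
    observed = {d: expected_coverage(25.0, *CURVES[d]) for d in CURVES}
    print(round(estimate_ct(observed), 1))
```

Combining several depth thresholds helps because each logistic curve is informative only near its midpoint; together they cover a wider range of viral loads, which matches the motivation given in the abstract.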


2015 ◽  
Vol 6 (1) ◽  
pp. 29-40 ◽  
Author(s):  
Julien Boutte ◽  
Benoît Aliaga ◽  
Oscar Lima ◽  
Julie Ferreira de Carvalho ◽  
Abdelkader Ainouche ◽  
...  

2018 ◽  
Vol 56 (7) ◽  
pp. 1046-1053 ◽  
Author(s):  
Anne Bergougnoux ◽  
Valeria D’Argenio ◽  
Stefanie Sollfrank ◽  
Fanny Verneau ◽  
Antonella Telese ◽  
...  

Abstract Background: Many European laboratories offer molecular genetic analysis of the CFTR gene using a wide range of methods to identify mutations causative of cystic fibrosis (CF) and CFTR-related disorders (CFTR-RDs). Next-generation sequencing (NGS) strategies are widely used in diagnostic practice, and CE marking is now required for most in vitro diagnostic (IVD) tests in Europe. The aim of this multicenter study, which involved three European laboratories specialized in CF molecular analysis, was to evaluate the performance of Multiplicom’s CFTR MASTR Dx kit to obtain CE-IVD certification. Methods: A total of 164 samples, previously analyzed with well-established “reference” methods for the molecular diagnosis of the CFTR gene, were selected and re-sequenced using the Illumina MiSeq benchtop NGS platform. Sequencing data were analyzed using two different bioinformatic pipelines. Annotated variants were then compared to the previously obtained reference data. Results and conclusions: The analytical sensitivity, specificity and accuracy rates of the Multiplicom CFTR MASTR assay exceeded 99%. Because different types of CFTR mutations can be detected in a single workflow, the CFTR MASTR assay simplifies the overall process and is consequently well suited for routine diagnostics.


2016 ◽  
Vol 192 ◽  
pp. 788-798 ◽  
Author(s):  
Sander Willems ◽  
Marie-Alice Fraiture ◽  
Dieter Deforce ◽  
Sigrid C.J. De Keersmaecker ◽  
Marc De Loose ◽  
...  

2021 ◽  
Author(s):  
Jasmina Damnjanović ◽  
Nana Odake ◽  
Jicheng Fan ◽  
Beixi Jia ◽  
Takaaki Kojima ◽  
...  

Abstract cDNA display is an in vitro display technology based on a covalent linkage between a protein and its corresponding mRNA/cDNA, where a stable complex is formed suitable for a wide range of selection conditions. A great advantage of cDNA display is the ability to handle an enormous library size (10¹²) at microtube scale, in a matter of days. To harness its benefits, we aimed at developing a platform which combines the advantages of cDNA display with the high throughput and accuracy of next-generation sequencing (NGS) for the selection of preferred substrate peptides of transglutaminase 2 (TG2), a protein cross-linking enzyme. After the optimization of the platform by the repeated screening of binary model libraries consisting of the substrate and non-substrate peptides at different ratios, screening and selection of a combinatorial peptide library randomized at positions -1, +1, +2, and +3 from the glutamine residue was carried out. Enriched cDNA complexes were analyzed by NGS and bioinformatics, revealing the comprehensive amino acid preference of TG2 at targeted positions of the peptide backbone. This is the first report on the cDNA display/NGS screening system to yield comprehensive data on TG substrate preference. Although some issues remain to be solved, this platform can be applied to the selection of other TGs and easily adjusted for the selection of other peptide substrates and even larger biomolecules.


2016 ◽  
Author(s):  
Marco Fantini ◽  
Luca Pandolfini ◽  
Simonetta Lisi ◽  
Michele Chirichella ◽  
Ivan Arisi ◽  
...  

Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its diversity, i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody against a given antigen, of sufficiently high affinity. Quantitative evaluation of antibody library diversity and quality has long been inadequately addressed, due to the high similarity and length of the sequences of the library. Diversity was usually inferred from the transformation efficiency and tested by fingerprinting and/or sequencing of a few hundred random library elements. Inferring diversity from such a small sample is, however, very rudimentary and gives limited information about the real complexity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the quality assessment of antibody library diversity. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, we present a new, PCR-free NGS approach to sequence antibody libraries on the Illumina platform, coupled to a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that allows reliable estimation of the diversity, taking the sequencing error into consideration.
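The point that a few hundred sequenced clones say little about true library complexity can be illustrated with a short calculation. This is not the DEAL method: it is a simplified sketch that assumes every library element is equally frequent and that clones are sampled with replacement, and the function name and library sizes are chosen purely for illustration.

```python
import math


def expected_distinct(library_size, sample_size):
    """Expected number of distinct elements seen when drawing sample_size clones
    uniformly at random, with replacement, from library_size equally frequent
    elements: D * (1 - (1 - 1/D)^n), computed in a numerically stable way."""
    return library_size * -math.expm1(sample_size * math.log1p(-1.0 / library_size))


if __name__ == "__main__":
    # With 300 sequenced clones, libraries of very different sizes give almost
    # the same observed count, so the sample barely constrains the diversity.
    for size in (10**3, 10**5, 10**7, 10**9):
        print(size, round(expected_distinct(size, 300), 2))
```

Under these assumptions the observed count saturates near the sample size once the library is much larger than the sample, which is why far deeper sampling, such as NGS provides, is needed to distinguish large libraries from one another.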

