AmpUMI: Design and analysis of unique molecular identifiers for deep amplicon sequencing

2018 ◽  
Author(s):  
Kendell Clement ◽  
Rick Farouni ◽  
Daniel E. Bauer ◽  
Luca Pinello

Abstract Motivation: Unique molecular identifiers (UMIs) are added to DNA fragments before PCR amplification to discriminate between alleles arising from the same genomic locus and sequencing reads produced by PCR amplification. While computational methods have been developed to take into account UMI information in genome-wide and single-cell sequencing studies, they are not designed for modern amplicon-based sequencing experiments, especially in cases of high allelic diversity. Importantly, no guidelines are provided for the design of optimal UMI length for amplicon-based sequencing experiments. Results: Based on the total number of DNA fragments and the distribution of allele frequencies, we present a model for the determination of the minimum UMI length required to prevent UMI collisions and reduce allelic distortion. We also introduce a user-friendly software tool called AmpUMI to assist in the design and the analysis of UMI-based amplicon sequencing studies. AmpUMI provides quality control metrics on frequency and quality of UMIs, and trims and deduplicates amplicon sequences with user-specified parameters for use in downstream analysis. AmpUMI is open-source and freely available at http://github.com/pinellolab/[email protected]
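The idea of choosing a minimum UMI length to avoid collisions can be illustrated with a simple birthday-problem calculation. The sketch below is not the AmpUMI model: it assumes every UMI of a given length is equally likely and it ignores the allele frequency distribution that the abstract says the published model accounts for. The function names and the 0.95 default threshold are hypothetical choices made for illustration.

```python
import math


def collision_free_probability(n_molecules: int, umi_length: int) -> float:
    """Probability that n molecules all receive distinct UMIs.

    Assumes (for illustration only) that UMIs are drawn uniformly at random
    from the 4**umi_length possible sequences.
    """
    n_umis = 4 ** umi_length
    if n_molecules > n_umis:
        return 0.0  # more molecules than UMIs: a collision is certain
    # Product of (1 - i/N) for i = 0..n-1, summed in log space for stability.
    log_p = sum(math.log1p(-i / n_umis) for i in range(n_molecules))
    return math.exp(log_p)


def min_umi_length(n_molecules: int, min_probability: float = 0.95,
                   max_length: int = 30) -> int:
    """Smallest UMI length whose collision-free probability meets the threshold."""
    for length in range(1, max_length + 1):
        if collision_free_probability(n_molecules, length) >= min_probability:
            return length
    raise ValueError("no UMI length up to max_length meets the threshold")
```

Under these assumptions, calling `min_umi_length(n)` with the expected number of input molecules gives a rough lower bound on UMI length; a model that accounts for allele frequencies, as the abstract describes, may recommend a different value.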


2019 ◽  
Vol 47 (W1) ◽  
pp. W530-W535 ◽  
Author(s):  
Ernesto Aparicio-Puerta ◽  
Ricardo Lebrón ◽  
Antonio Rueda ◽  
Cristina Gómez-Martín ◽  
Stavros Giannoukakos ◽  
...  

Abstract Since the original publication of sRNAtoolbox in 2015, small RNA research experienced notable advances in different directions. New protocols for small RNA sequencing have become available to address important issues such as adapter ligation bias, PCR amplification artefacts or to include internal controls such as spike-in sequences. New microRNA reference databases were developed with different foci, either prioritizing accuracy (low number of false positives) or completeness (low number of false negatives). Additionally, other small RNA molecules as well as microRNA sequence and length variants (isomiRs) have continued to gain importance. Finally, the number of microRNA sequencing studies deposited in GEO nearly triplicated from 2014 (280) to 2018 (764). These developments imply that fast and easy-to-use tools for expression profiling and subsequent downstream analysis of miRNA-seq data are essential to many researchers. Key features in this sRNAtoolbox release include addition of all major RNA library preparation protocols to sRNAbench and improvements in sRNAde, a tool that summarizes several aspects of small RNA sequencing studies including the detection of consensus differential expression. A special emphasis was put on the user-friendliness of the tools, for instance sRNAbench now supports parallel launching of several jobs to improve reproducibility and user time efficiency.



2014 ◽  
Vol 10 ◽  
pp. 1826-1833 ◽  
Author(s):  
Kevin M Bradley ◽  
Steven A Benner

Synthetic biologists wishing to self-assemble large DNA (L-DNA) constructs from small DNA fragments made by automated synthesis need fragments that hybridize predictably. Such predictability is difficult to obtain with nucleotides built from just the four standard nucleotides. Natural DNA's peculiar combination of strong and weak G:C and A:T pairs, the context-dependence of the strengths of those pairs, unimolecular strand folding that competes with desired interstrand hybridization, and non-Watson–Crick interactions available to standard DNA, all contribute to this unpredictability. In principle, adding extra nucleotides to the genetic alphabet can improve the predictability and reliability of autonomous DNA self-assembly, simply by increasing the information density of oligonucleotide sequences. These extra nucleotides are now available as parts of artificially expanded genetic information systems (AEGIS), and tools are now available to generate entirely standard DNA from AEGIS DNA during PCR amplification. Here, we describe the OligArch (for "oligonucleotide architecting") software, an application that permits synthetic biologists to engineer optimally self-assembling DNA constructs from both six- and eight-letter AEGIS alphabets. This software has been used to design oligonucleotides that self-assemble to form complete genes from 20 or more single-stranded synthetic oligonucleotides. OligArch is therefore a key element of a scalable and integrated infrastructure for the rapid and designed engineering of biology.



2021 ◽  
Author(s):  
Afonso Bravo ◽  
Athanasios Typas ◽  
Jan-Willem Veening

The increasingly widespread use of next generation sequencing protocols has brought the need for the development of user-friendly raw data processing tools. Here, we present 2FAST2Q, a versatile and intuitive standalone program capable of extracting and counting feature occurrences in FASTQ files. 2FAST2Q can be used in any experimental setup that requires feature extraction from raw reads, being able to quickly handle mismatch alignments, nucleotide-wise Phred score filtering, custom read trimming, and sequence searching within a single program. Using published CRISPRi datasets in which Escherichia coli and Mycobacterium tuberculosis gene essentiality, as well as host-cell sensitivity towards SARS-CoV2 infectivity, were tested, we demonstrate that 2FAST2Q efficiently recapitulates the read counts per provided feature obtained with traditional pipelines. Moreover, we show how different FASTQ read filtering parameters impact downstream analysis, and suggest a default usage protocol. 2FAST2Q has a familiar user interface and uses a custom sequence mismatch search algorithm, taking advantage of the JIT runtime speeds of Python's numba module. It is thus easier to use and faster than currently available tools, efficiently processing large CRISPRi-Seq or random-barcode sequencing datasets on any up-to-date laptop. 2FAST2Q is available as an executable file for all current operating systems without installation and as a Python3 module on the PyPI repository (available at https://veeninglab.com/2fast2q). We expect that 2FAST2Q will not only be useful for people working in microbiology but also for other fields in which amplicon sequencing data is generated.
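The steps named in this abstract (read trimming, nucleotide-wise Phred filtering, and mismatch-tolerant feature counting) can be sketched in a few lines of plain Python. This is a minimal illustration and not 2FAST2Q's own algorithm, which the abstract describes as a custom numba-accelerated search. The function names, the Phred+33 quality encoding, the fixed trimming window, and the rule of discarding ambiguous matches are all assumptions made here.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_features(records, features, start, length,
                   min_phred=30, max_mismatches=1):
    """Count feature occurrences in reads.

    records  : iterable of (sequence, quality_string) pairs from a FASTQ file
    features : dict mapping feature name -> feature sequence
    start, length : position and size of the window trimmed from each read
    Quality strings are assumed to be Phred+33 encoded.
    """
    counts = Counter()
    for seq, qual in records:
        window = seq[start:start + length]
        window_qual = qual[start:start + length]
        if len(window) < length:
            continue  # read too short to contain the feature window
        # Nucleotide-wise filter: every base in the window must pass.
        if any(ord(c) - 33 < min_phred for c in window_qual):
            continue
        hits = [name for name, fseq in features.items()
                if len(fseq) == length and hamming(window, fseq) <= max_mismatches]
        exact = [name for name in hits if features[name] == window]
        if exact:
            counts[exact[0]] += 1
        elif len(hits) == 1:
            counts[hits[0]] += 1
        # Reads matching several features equally well are discarded here.
    return counts
```

A brute-force comparison against every feature is slow for large libraries; it is used here only because it keeps the filtering and matching logic easy to follow.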



Genetics ◽  
2002 ◽  
Vol 160 (1) ◽  
pp. 305-311
Author(s):  
G Pielberg ◽  
C Olsson ◽  
A-C Syvänen ◽  
L Andersson

Abstract Mutations in KIT encoding the mast/stem cell growth factor receptor (MGF) are responsible for coat color variation in domestic pigs. The dominant white phenotype is caused by two mutations, a gene duplication and a splice mutation in one of the copies leading to skipping of exon 17. Here we applied minisequencing and pyrosequencing for quantitative analysis of the number of copies with the splice form. An unexpectedly high genetic diversity was revealed in white pigs. We found four different KIT alleles in a small sample of eight Large White females used as founder animals in a wild boar intercross. A similar number of KIT alleles was found in commercial populations of white Landrace and Large White pigs. We provide evidence for at least two new KIT alleles in pigs, both with a triplication of the gene. The results imply that KIT alleles with the duplication are genetically unstable and new alleles are most likely generated by unequal crossing over. This study provides an improved method for genotyping the complicated Dominant white/KIT locus in pigs. The results also suggest that some alleles may be associated with negative pleiotropic effects on other traits.



2014 ◽  
Vol 10 ◽  
pp. 2348-2360 ◽  
Author(s):  
Kristen K Merritt ◽  
Kevin M Bradley ◽  
Daniel Hutter ◽  
Mariko F Matsuura ◽  
Diane J Rowold ◽  
...  

Background: Many synthetic biologists seek to increase the degree of autonomy in the assembly of long DNA (L-DNA) constructs from short synthetic DNA fragments, which are today quite inexpensive because of automated solid-phase synthesis. However, the low information density of DNA built from just four nucleotide “letters”, the presence of strong (G:C) and weak (A:T) nucleobase pairs, the non-canonical folded structures that compete with Watson–Crick pairing, and other features intrinsic to natural DNA, generally prevent the autonomous assembly of short single-stranded oligonucleotides greater than a dozen or so. Results: We describe a new strategy to autonomously assemble L-DNA constructs from fragments of synthetic single-stranded DNA. This strategy uses an artificially expanded genetic information system (AEGIS) that adds nucleotides to the four (G, A, C, and T) found in standard DNA by shuffling hydrogen-bonding units on the nucleobases, all while retaining the overall Watson–Crick base-pairing geometry. The added information density allows larger numbers of synthetic fragments to self-assemble without off-target hybridization, hairpin formation, and non-canonical folding interactions. The AEGIS pairs are then converted into standard pairs to produce a fully natural L-DNA product. Here, we report the autonomous assembly of a gene encoding kanamycin resistance using this strategy. Synthetic fragments were built from a six-letter alphabet having two AEGIS components, 5-methyl-2’-deoxyisocytidine and 2’-deoxyisoguanosine (respectively S and B), at their overlapping ends. Gaps in the overlapped assembly were then filled in using DNA polymerases, and the nicks were sealed by ligase. The S:B pairs in the ligated construct were then converted to T:A pairs during PCR amplification. When cloned into a plasmid, the product was shown to make Escherichia coli resistant to kanamycin. 
A parallel study that attempted to assemble similarly sized genes with optimally designed standard nucleotides lacking AEGIS components gave successful assemblies of up to 16 fragments, but generally failed when larger autonomous assemblies were attempted. Conclusion: AEGIS nucleotides, by increasing the information density of DNA, allow larger numbers of DNA fragments to autonomously self-assemble into large DNA constructs. This technology can therefore increase the size of DNA constructs that might be used in synthetic biology.



2020 ◽  
Author(s):  
Chen Chen ◽  
Wanyu Xu ◽  
Ningning Gou ◽  
Lasu Bai ◽  
Lin Wang ◽  
...  

Abstract Background Bud dormancy in deciduous fruit trees enables plants to survive cold weather. The buds adopt a dormant state and resume growth after the chilling requirements are satisfied. Chilling requirements play a key role in flowering time. So far, several chilling models, including the ≤ 7.2 °C model, the 0–7.2 °C model, the Utah model, and the Dynamic Model, have been developed; however, determining the chilling requirements with any of these models is still time-consuming. This calls for efficient tools that can analyze data. Results In this study, we developed a novel software tool, Chilling and Heat Requirement (CHR), by flexibly integrating data conversions, model selection, calculations, statistical analysis, and plotting. Conclusion CHR is a tool for chilling requirements estimation, which will be very useful to researchers. It is very simple, easy, and user-friendly.
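The two simplest models named above can be illustrated by counting chilling hours from hourly temperature records. The sketch below is not the CHR software. It assumes one temperature reading per hour in degrees Celsius and treats both bounds of the 0–7.2 °C model as inclusive, a convention that varies between studies. The function name and model labels are hypothetical.

```python
def chilling_hours(hourly_temps_c, model="le_7.2"):
    """Count chilling hours from hourly temperatures in degrees Celsius.

    model="le_7.2"   : every hour at or below 7.2 °C counts as one chilling hour
    model="0_to_7.2" : only hours between 0 °C and 7.2 °C (inclusive, by
                       assumption) count as one chilling hour
    """
    if model == "le_7.2":
        return sum(1 for t in hourly_temps_c if t <= 7.2)
    if model == "0_to_7.2":
        return sum(1 for t in hourly_temps_c if 0.0 <= t <= 7.2)
    raise ValueError(f"unknown model: {model!r}")
```

The Utah model and the Dynamic Model weight temperatures differently and are not covered by this sketch.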



2010 ◽  
Vol 163-167 ◽  
pp. 4564-4569 ◽  
Author(s):  
Ahmad Firman Masudi ◽  
Che Rosmani Che Hassan ◽  
Noor Zalina Mahmood ◽  
Siti Nazziera Mokhtar ◽  
Nik Meriam Sulaiman

Estimation of the amount of construction and demolition (C&D) waste is crucial for implementing waste minimization programs. Estimating the amount of C&D waste generated is a means of assessing the potential for waste reduction. Thus, a better understanding of C&D waste generation in terms of causes and sources can be achieved. The aim of this paper is to review the construction waste quantification methods available from previous studies, which have been used in certain countries, while attempting to choose the most suitable and applicable method, and to direct future studies toward better quantification methods. This review applies only to building construction projects and does not include civil/infrastructure, demolition, renovation, and excavation projects. Six quantification methods and/or waste audit tools available from the literature are discussed, including their limitations and future directions. It is believed that some combination of these quantification methods could make a good impact on the accurate numerical estimation of the amount of construction waste generated in building construction projects. A strong and accurate database as presented by Soliz-Guzman, combined with the effective, vital, and resourceful estimation suggested by Jalali’s Global Index (GI), together with the aid of a user-friendly software tool like SMARTAudit, could provide effective and reliable waste quantification.





2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Marius Welzel ◽  
Anja Lange ◽  
Dominik Heider ◽  
Michael Schwarz ◽  
Bernd Freisleben ◽  
...  

Abstract Background Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity and elucidate evolutionary relationships and ecological processes among complex microbial communities. The analysis of large numbers of samples at high sequencing depths generated by high throughput sequencing technologies requires efficient, flexible, and reproducible bioinformatics pipelines. Only a few existing workflows can be run in a user-friendly, scalable, and reproducible manner on different computing devices using an efficient workflow management system. Results We present Natrix, an open-source bioinformatics workflow for preprocessing raw amplicon sequencing data. The workflow contains all analysis steps from quality assessment, read assembly, dereplication, chimera detection, split-sample merging, sequence representative assignment (OTUs or ASVs) to the taxonomic assignment of sequence representatives. The workflow is written using Snakemake, a workflow management engine for developing data analysis workflows. In addition, Conda is used for version control. Thus, Snakemake ensures reproducibility and Conda offers version control of the utilized programs. The encapsulation of rules and their dependencies support hassle-free sharing of rules between workflows and easy adaptation and extension of existing workflows. Natrix is freely available on GitHub (https://github.com/MW55/Natrix) or as a Docker container on DockerHub (https://hub.docker.com/r/mw55/natrix). Conclusion Natrix is a user-friendly and highly extensible workflow for processing Illumina amplicon data.
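Of the preprocessing steps listed above, dereplication is the easiest to show in isolation: identical reads are collapsed into one representative sequence with an abundance count. The sketch below is a generic illustration and not Natrix code. The function name and the case-insensitive comparison are assumptions, and the abstract does not say how Natrix performs this step internally.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences into (sequence, abundance) pairs.

    Sequences are compared case-insensitively (an assumption), and the result
    is sorted by decreasing abundance, then alphabetically for determinism.
    """
    counts = Counter(seq.upper() for seq in sequences)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Sorting by abundance mirrors the common practice of handing the most frequent sequences first to later steps such as chimera detection or clustering.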


