base calling
Recently Published Documents


Total documents: 131 (five years: 42)
H-index: 20 (five years: 5)

2022 ◽  
Vol 2 ◽  
Author(s):  
August Yue Huang ◽  
Eunjung Alice Lee

Somatic mutations are DNA variants that occur after the fertilization of zygotes and accumulate during the developmental and aging processes in the human lifespan. Somatic mutations have long been known to cause cancer, and more recently have been implicated in a variety of non-cancer diseases. The patterns of somatic mutations, or mutational signatures, also shed light on the underlying mechanisms of the mutational process. Advances in next-generation sequencing over the decades have enabled genome-wide profiling of DNA variants in a high-throughput manner; however, unlike germline mutations, somatic mutations are carried only by a subset of the cell population. Thus, sensitive bioinformatic methods are required to distinguish mutant alleles from sequencing and base calling errors in bulk tissue samples. An alternative way to study somatic mutations, especially those present in an extremely small number of cells or even in a single cell, is to sequence single-cell genomes after whole-genome amplification (WGA); however, it is critical and technically challenging to exclude numerous technical artifacts arising during error-prone and uneven genome amplification in current WGA methods. To address these challenges, multiple bioinformatic tools have been developed. In this review, we summarize the latest progress in methods for identification of somatic mutations and the challenges that remain to be addressed in the future.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Alessio Marcozzi ◽  
Myrthe Jager ◽  
Martin Elferink ◽  
Roy Straver ◽  
Joost H. van Ginkel ◽  
...  

Levels of circulating tumor DNA (ctDNA) in liquid biopsies may serve as a sensitive biomarker for real-time, minimally invasive tumor diagnostics and monitoring. However, detecting ctDNA is challenging, as typically far less than 5% of the cell-free DNA in the blood originates from the tumor. To detect low-abundance ctDNA molecules based on somatic variants, extremely sensitive sequencing methods are required. Here, we describe a new technique, CyclomicsSeq, which is based on Oxford Nanopore sequencing of concatenated copies of a single DNA molecule. Consensus calling of the DNA copies increased the base-calling accuracy ~60×, enabling accurate detection of TP53 mutations at frequencies down to 0.02%. We demonstrate that a TP53-specific CyclomicsSeq assay can be successfully used to monitor tumor burden during treatment for head-and-neck cancer patients. CyclomicsSeq can be applied to any genomic locus and offers an accurate diagnostic liquid biopsy approach that can be implemented in clinical workflows.
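
The idea of consensus calling across repeated copies of one molecule can be illustrated with a per-position majority vote. This is only a minimal sketch, not the CyclomicsSeq pipeline: it assumes the copies have already been split and aligned to equal length, and the function name `consensus` and the toy reads are hypothetical.

```python
from collections import Counter


def consensus(copies):
    """Majority-vote consensus over pre-aligned, equal-length copies.

    Assumption: the copies of the molecule are already aligned, so
    position i in every copy refers to the same original base.
    """
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be pre-aligned to equal length")
    # zip(*copies) yields one column of base calls per position.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*copies))


# Five noisy copies of the same toy molecule; each error is outvoted.
toy_copies = ["ACGTAC", "ACGTTC", "ACGTAC", "ACCTAC", "ACGTAC"]
print(consensus(toy_copies))  # ACGTAC
```

Because independent errors rarely hit the same position in most copies, the vote suppresses random base-calling errors; systematic errors that recur in every copy would not be removed this way.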


2021 ◽  
Author(s):  
Kathy E Raven ◽  
Danielle Leek ◽  
Beth Blane ◽  
Sophia Girgis ◽  
Asha Akram ◽  
...  

Enterococcus faecium is an important nosocomial pathogen associated with hospital transmission and outbreaks. Based on growing evidence that bacterial whole genome sequencing enhances hospital outbreak investigation of other bacterial species, our aim was to develop and evaluate methods for low-volume clinical sequencing of E. faecium. Using a test panel of 22 E. faecium isolates associated previously with hospital transmission, we developed laboratory protocols for DNA extraction and library preparation, which in combination with the Illumina MiniSeq can generate sequence data within 24 hours. The final laboratory protocol took 3.5 hours and showed 98% reproducibility in producing sufficient DNA for sequencing. Repeatability and reproducibility assays based on the laboratory protocol and sequencing demonstrated 100% accuracy in assigning species, sequence type (ST) and (when present) detecting vanA or vanB, with all isolates passing the quality control metrics. Minor variation was detected in the bases called for the same isolate genome when tested repeatedly, owing to variations in mapping and base calling, but application of a SNP cut-off (<15 SNPs) to assign isolates to outbreak clusters showed 100% reproducibility. An evaluation of contamination showed that controls and test E. faecium sequence files contained <0.34% and <2.12% of fragments matching another species, respectively. Deliberate contamination experiments confirmed that this was insufficient to affect data interpretation. Further work is required to develop informatic tools prior to implementation into clinical practice.
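
To make the SNP cut-off concrete, the sketch below groups isolates whose pairwise SNP distance falls under a threshold. The abstract gives only the cut-off (<15 SNPs); the single-linkage grouping, the helper names, and the toy sequences are assumptions for illustration, and the input is assumed to be equal-length aligned sequences.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count differing positions between two aligned sequences, ignoring N."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def cluster_isolates(seqs, cutoff=15):
    """Single-linkage clusters: link isolates with distance < cutoff.

    Assumption: single linkage is used here only for illustration; the
    source does not state how clusters were formed beyond the cut-off.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) < cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy data with a cut-off of 2 so the short sequences show the behaviour.
toy = {"iso1": "AAAAAAAA", "iso2": "AAAAAAAT", "iso3": "TTTTTTTT"}
print(cluster_isolates(toy, cutoff=2))  # [['iso1', 'iso2'], ['iso3']]
```

A threshold well above the run-to-run variation absorbs minor differences introduced by mapping and base calling, which is consistent with the reproducible cluster assignment the abstract reports.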


2021 ◽  
Author(s):  
Sepideh Tavakoli ◽  
Mohammad Nabizadehmashhadtoroghi ◽  
Amr Makhamreh ◽  
Howard Gamper ◽  
Neda Rezapour ◽  
...  

Enzyme-mediated chemical modifications to mRNAs have the potential to fine-tune gene expression in response to environmental stimuli. Notably, pseudouridine-modified mRNAs are more resistant to RNase-mediated degradation, more responsive to cellular stress, and have the potential to modulate immunogenicity and enhance translation in vivo. However, the precise biological functions of pseudouridine modification on mRNAs remain unclear due to the lack of sensitive and accurate tools for mapping. We developed a semi-quantitative method for mapping pseudouridylated sites with high confidence directly on mammalian mRNA transcripts via direct RNA, long-read nanopore sequencing. By analysis of a modification-free transcriptome, we demonstrate that the depth of coverage and intrinsic errors associated with specific k-mer sequences are critical parameters for accurate base-calling. We adjust these parameters for high-confidence U-to-C base-calling errors that occur at pseudouridylated sites, which are benchmarked against sites that were identified previously by biochemical methods. We also uncovered new pseudouridylated sites, many of which fall on genes that encode RNA binding proteins and on uridine-rich k-mers. Sites identified by U-to-C base-calling error were verified using 1000-mer synthetic RNA controls bearing a single pseudouridine in the center position, demonstrating that (1) the U-to-C base-calling error occurs at the site of pseudouridylation, and (2) the base-calling error systematically under-calls the pseudouridylated sites. High-occupancy sites with >40% U-to-C base-calling error are classified as sites of type I hypermodification, whereas genes with more than one site of pseudouridylation are classified as having type II hypermodification, which is confirmed by single-molecule analysis. We report the discovery of mRNAs with up to 7 unique sites of pseudouridine modification.
Here we establish an innovative pipeline for direct identification, quantification, and detection of pseudouridine modifications and type I/II hypermodifications on native RNA molecules using long-read sequencing without resorting to RNA amplification, chemical reactions on RNA, enzyme-based replication, or DNA sequencing steps.
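
The >40% U-to-C criterion can be sketched as a simple per-site calculation. This is an illustration only, not the authors' pipeline: the 40% threshold comes from the abstract, while the minimum depth of 20 reads, the function names, and the base-count dictionaries are hypothetical.

```python
def u_to_c_fraction(base_counts):
    """Fraction of reads calling C at a position whose reference base is U."""
    depth = sum(base_counts.values())
    if depth == 0:
        return 0.0
    return base_counts.get("C", 0) / depth


def is_type_i_candidate(base_counts, threshold=0.40, min_depth=20):
    """Flag a site whose U-to-C error exceeds the threshold.

    threshold=0.40 follows the abstract; min_depth=20 is an assumed
    coverage filter, since the abstract names depth as a critical
    parameter but gives no value.
    """
    depth = sum(base_counts.values())
    return depth >= min_depth and u_to_c_fraction(base_counts) > threshold


print(is_type_i_candidate({"U": 50, "C": 50}))  # True
print(is_type_i_candidate({"U": 95, "C": 5}))   # False
print(is_type_i_candidate({"U": 3, "C": 7}))    # False (depth too low)
```

Since the abstract notes that the base-calling error under-calls pseudouridylated sites, the mismatch fraction should be read as a lower bound on occupancy rather than a direct measurement.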


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jin Young Lee ◽  
Minyoung Kong ◽  
Jinjoo Oh ◽  
JinSoo Lim ◽  
Sung Hee Chung ◽  
...  

Assembling high-quality microbial genomes using only cost-effective Nanopore long-read systems such as Flongle is important to accelerate research on the microbial genome, and the most critical step for this is the polishing process. In this study, we evaluated eight state-of-the-art Nanopore polishing tools and their available combinations for microbial genome assembly, based on BUSCO and Prokka gene prediction. In the evaluation of individual tools, Homopolish, PEPPER, and Medaka demonstrated better results than others. In combination polishing, second-round Homopolish and the PEPPER × Medaka combination also showed better results than others. However, individual tools and combinations have specific limitations on usage and results. Depending on the target organism and the purpose of the downstream research, we confirmed that some difficulties remain in fully replacing hybrid polishing, which relies on additional short-read data. Nevertheless, through continuous improvement of the protein pores, related base-calling algorithms, and polishing tools based on improved error models, a high-quality microbial genome can be achieved using only Nanopore reads without the production of additional short-read data. The polishing strategy proposed in this study is expected to provide useful information for assembling the microbial genome using only Nanopore reads depending on the target microorganism and the purpose of the research.


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 257-258
Author(s):  
Hanna Ostrovski ◽  
Rodrigo Pelicioni Savegnago ◽  
Wen Huang ◽  
Cedric Gondro

Most quantitative geneticists are traditionally trained for data analysis in genetic evaluation and genomic prediction, but rarely have extensive knowledge of molecular genetics or experience in experimental labs. Recent products, such as those launched by Oxford Nanopore Technologies (ONT), give those quantitative geneticists a comprehensible and hands-on toolkit to explore DNA sequencing. The ‘MinION’, a small DNA sequencer, is of interest for quantitative geneticists due to both the minimal learning curve and the non-proprietary USB connectivity. This device is small enough to be portable, allowing for potential real-time, on-farm sequencing. The objective of this project is to compare the whole genome sequence (WGS) output of the MinION sequencer to that of the Illumina HiSeq 4000. Blood was collected from a 6-month-old Akaushi calf born on a Michigan State University farm. DNA was extracted from the sample using the QIAamp DNA Blood Kit from Qiagen, and the ligation library preparation kit (SQK-LSK109) from ONT was used. After base-calling with guppy software (provided by ONT), the data were preprocessed and experimental runs with the MinION were compared using quality control. Finally, the data were aligned with guppy software and compared to the aligned WGS obtained with Illumina HiSeq. Quality results from each MinION run indicate that, despite the low amount of sequence collected in each run (~225,303 reads per run), the quality of bases sequenced was high (Q≥7). The aligned data from the Illumina sequencer provided 40x coverage of the genome, with a total of 739,339,742 reads. Although the amount of data obtained with MinION is much smaller than that of Illumina HiSeq, the high quality of MinION’s data combined with its ease of use offers an opportunity for genomic sequencing to users who are either inexperienced or do not have access to large genomic sequencing devices.
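
For readers unfamiliar with the Q notation, the standard Phred relation Q = -10·log10(p) converts a quality score into an estimated per-base error probability. The short sketch below applies that relation; the function names are illustrative and not taken from guppy or any other tool.

```python
import math


def phred_to_error(q):
    """Estimated per-base error probability for Phred quality q."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality for an error probability p in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


print(round(phred_to_error(7), 3))   # 0.2
print(round(error_to_phred(0.01)))   # 20
```

Under this relation, a threshold of Q≥7 corresponds to an estimated error probability of at most about 0.2 per base.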


2021 ◽  
Author(s):  
Mikhail Pavlenok ◽  
Luning Yu ◽  
Dominik Herrmann ◽  
Meni Wanunu ◽  
Michael Niederweis

Transmembrane protein channels enable fast and highly sensitive electrical detection of single molecules. Nanopore sequencing of DNA was achieved using an engineered Mycobacterium smegmatis porin A (MspA) in combination with a motor enzyme. Due to its favorable channel geometry, the octameric MspA pore exhibits the highest current level compared with other pore proteins. To date, MspA is the only protein nanopore with a published record of DNA sequencing. While widely used in commercial devices, nanopore sequencing of DNA suffers from significant base-calling errors due to stochastic events of the complex DNA-motor-pore combination and the contribution of up to five nucleotides to the signal at each position. Asymmetric mutations within subunits of the channel protein offer an enormous potential to improve nucleotide resolution and sequencing accuracy. However, random subunit assembly does not allow control of the channel composition of MspA and other oligomeric protein pores. In this study, we showed that it is feasible to convert octameric MspA into a single-chain pore by connecting eight subunits using peptide linkers. We constructed single-chain MspA trimers, pentamers, hexamers and heptamers to demonstrate that it is feasible to alter the subunit stoichiometry and the MspA pore diameter. All single-chain MspA proteins formed functional channels in lipid bilayer experiments. Importantly, we demonstrated that single-chain MspA discriminated all four nucleotides identically to MspA produced from monomers. Thus, single-chain MspA constitutes a new milestone in its development and adaptation as a biosensor for DNA sequencing and many other applications.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hatim Almutairi ◽  
Michael D. Urbaniak ◽  
Michelle D. Bates ◽  
Narissara Jariyapan ◽  
Godwin Kwakye-Nuako ◽  
...  

We provide the raw and processed data produced during the genome sequencing of isolates from six species of parasites from the sub-family Leishmaniinae: Leishmania martiniquensis (Thailand), Leishmania orientalis (Thailand), Leishmania enriettii (Brazil), Leishmania sp. Ghana, Leishmania sp. Namibia and Porcisia hertigi (Panama). De novo assembly was performed using Nanopore long reads to construct chromosome backbone scaffolds. We then corrected erroneous base calling by mapping short Illumina paired-end reads onto the initial assembly. Data has been deposited at NCBI as follows: raw sequencing output in the Sequence Read Archive, finished genomes in GenBank, and ancillary data in BioSample and BioProject. Derived data such as quality scoring, SAM files, genome annotations and repeat sequence lists have been deposited in Lancaster University’s electronic data archive with DOIs provided for each item. Our coding workflow has been deposited in GitHub and Zenodo repositories. This data constitutes a resource for the comparative genomics of parasites and for further applications in general and clinical parasitology.


2021 ◽  
Author(s):  
Hui-Su Kim ◽  
Changjae Kim ◽  
George McDonald Church ◽  
Jong Bhak

PGP1 is the first participant of the Personal Genome Project. We present PGP1's chromosome-scale genome assembly. It was constructed using 255 Gb ultra-long PromethION reads and 97 Gb short paired-end reads. To reduce base-calling errors, we corrected PromethION reads using 72 Gb PacBio HiFi reads. 327 Gb Hi-C chromosomal mapping data were utilized to maximize the assembly's contiguity. PGP1's contig assembly was 3.01 Gb in length, comprising 4,234 contigs with an N50 value of 33.8 Mb. After scaffolding with Hi-C data and extensive manual curation, we obtained a chromosome-scale assembly that represents 3,880 scaffolds with an N50 value of 142 Mb. From the Merqury assessment, the PGP1 assembly achieved a high QV score of Q45.45. For gene annotation, we predicted 106,789 genes with a liftover from Gencode 38 and an assembly of transcriptome data.
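
The N50 values quoted for contigs and scaffolds follow the usual definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size. A minimal sketch with made-up lengths (not PGP1 data):

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy lengths: total is 20, and 6 + 5 = 11 first reaches half of it.
print(n50([2, 3, 4, 5, 6]))  # 5
```

A higher N50 with fewer sequences, as between the contig and scaffold figures in the abstract, indicates that more of the assembly sits in long, contiguous pieces.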


2021 ◽  
Author(s):  
Cynthia J Burrows ◽  
Nicole J Mathewson ◽  
Aaron M Fleming

Nanopore devices can directly sequence RNA, and the method has the potential to determine locations of epitranscriptomic modifications that have grown in significance because of their roles in cell regulation and stress response. Pseudouridine (Ψ), the most common modification in RNA, was sequenced with a nanopore system using a protein sensor with a helicase brake in synthetic RNAs with 100% modification at 18 known human pseudouridinylation sites. The new signals were compared to native uridine (U) control strands to characterize base calling and associated errors as well as ion current and dwell time changes. The data point to strong sequence context effects in which Ψ can easily be detected in some contexts while in others Ψ yields signals similar to U that would be false negatives in an unknown sample. We identified that the passage of Ψ through the helicase brake slowed the translocation kinetics compared to U and showed a smaller sequence bias that could permit detection of this modification in RNA. The unique signals from Ψ relative to U are proposed to reflect the syn-anti conformational flexibility of Ψ not found in U, and the difference in π stacking between these bases. This observation permitted analysis of SARS-CoV-2 nanopore sequencing data to identify five conserved Ψ sites on the 3′ end of the viral sub-genomic RNAs, and other less conserved Ψ sites. Using the helicase as a sensor protein in nanopore sequencing experiments enables detection of this modification in a greater number of relevant sequence contexts. The data are discussed concerning their analytical and biological significance.

