Hi-C chromosome conformation capture sequencing of avian genomes using the BGISEQ-500 platform

GigaScience ◽  
2020 ◽  
Vol 9 (8) ◽  
Author(s):  
Marcela Sandoval-Velasco ◽  
Juan Antonio Rodríguez ◽  
Cynthia Perez Estrada ◽  
Guojie Zhang ◽  
Erez Lieberman Aiden ◽  
...  

Abstract Background Hi-C experiments couple DNA-DNA proximity with next-generation sequencing to yield an unbiased description of genome-wide interactions. Previous methods describing Hi-C experiments have focused on the industry-standard Illumina sequencing. With new next-generation sequencing platforms such as BGISEQ-500 becoming more widely available, protocol adaptations to fit platform-specific requirements are useful to give increased choice to researchers who routinely generate sequencing data. Results We describe an in situ Hi-C protocol adapted to be compatible with the BGISEQ-500 high-throughput sequencing platform. Using zebra finch (Taeniopygia guttata) as a biological sample, we demonstrate how Hi-C libraries can be constructed to generate informative data using the BGISEQ-500 platform, following circularization and DNA nanoball generation. Our protocol is a modification of an Illumina-compatible method, based around blunt-end ligations in library construction, using un-barcoded, distally overhanging double-stranded adapters, followed by amplification using indexed primers. The resulting libraries are ready for circularization and subsequent sequencing on the BGISEQ series of platforms and yield data similar to what can be expected using Illumina-compatible approaches. Conclusions Our straightforward modification to an Illumina-compatible in situ Hi-C protocol enables data generation on the BGISEQ series of platforms, thus expanding the options available for researchers who wish to utilize the powerful Hi-C techniques in their research.

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yongfeng Liu ◽  
Ran Han ◽  
Letian Zhou ◽  
Mingjie Luo ◽  
Lidong Zeng ◽  
...  

Abstract Background GenoLab M is a recently established next-generation sequencing platform from GeneMind Biosciences. Presently, Illumina sequencers are the globally leading sequencing platform in the next-generation sequencing market. Here, we present the first report to compare the transcriptome and LncRNA sequencing data of the GenoLab M sequencer with those of the NovaSeq 6000 platform in various types of analysis. Results We tested 16 libraries in three species using various library kits from different companies. We compared the data quality, gene expression, alternatively spliced (AS) events, single nucleotide polymorphisms (SNPs), and insertions–deletions (InDels) between the two sequencing platforms. The data suggested that the two platforms have comparable sensitivity and accuracy in terms of quantification of gene expression levels with technical compatibility. Conclusions GenoLab M is a promising next-generation sequencing platform for transcriptomics and LncRNA studies with high performance at low costs.


2021 ◽  
Author(s):  
Yongfeng Liu ◽  
Ran Han ◽  
Letian Zhou ◽  
Mingjie Luo ◽  
Lidong Zeng ◽  
...  

Abstract Background: GenoLab M is a recently established next-generation sequencing platform from GeneMind Biosciences. Presently, Illumina sequencers are the globally leading sequencing platform in the next-generation sequencing market. Here, we present the first report to compare the transcriptome and LncRNA sequencing data of the GenoLab M sequencer with those of the NovaSeq 6000 platform in various types of analysis. Results: We tested 16 libraries in three species using various library kits from different companies. We compared the data quality, gene expression, alternatively spliced (AS) events, single nucleotide polymorphisms (SNPs), and insertions–deletions (InDels) between the two sequencing platforms. The data suggested that the two platforms have comparable sensitivity and accuracy in terms of quantification of gene expression levels with technical compatibility. Conclusions: GenoLab M is a promising sequencing platform for transcriptomics and LncRNA studies with high performance at low costs.


2019 ◽  
Author(s):  
Kate Chkhaidze ◽  
Timon Heide ◽  
Benjamin Werner ◽  
Marc J. Williams ◽  
Weini Huang ◽  
...  

Abstract Quantification of the effect of spatial tumour sampling on the patterns of mutations detected in next-generation sequencing data is largely lacking. Here we use a spatial stochastic cellular automaton model of tumour growth that accounts for somatic mutations, selection, drift and spatial constraints, to simulate multi-region sequencing data derived from spatial sampling of a neoplasm. We show that the spatial structure of a solid cancer has a major impact on the detection of clonal selection and genetic drift from bulk sequencing data and single-cell sequencing data. Our results indicate that spatial constraints can introduce significant sampling biases when performing multi-region bulk sampling and that such bias becomes a major confounding factor for the measurement of the evolutionary dynamics of human tumours. We present a statistical inference framework that takes into account the spatial effects of a growing tumour and allows inferring the evolutionary dynamics from patient genomic data. Our analysis shows that measuring cancer evolution using next-generation sequencing while accounting for the numerous confounding factors requires a mechanistic model-based approach that captures the sources of noise in the data.
Summary Sequencing the DNA of cancer cells from human tumours has become one of the main tools to study cancer biology. However, sequencing data are complex and often difficult to interpret. In particular, the way in which the tissue is sampled and the data are collected significantly affects the interpretation of the results. We argue that understanding cancer genomic data requires mathematical models and computer simulations that tell us what we expect the data to look like, with the aim of understanding the impact of confounding factors and biases in the data generation step. In this study, we develop a spatial simulation of tumour growth that also simulates the data generation process, and demonstrate that biases in the sampling step and current technological limitations severely impact the interpretation of the results. We then provide a statistical framework that can be used to overcome these biases and more robustly measure aspects of the biology of tumours from the data.
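To make the idea of spatially constrained growth and regional sampling concrete, here is a deliberately minimal toy sketch. It is not the authors' model: the lattice size, the rule that only the daughter cell gains a mutation, the neutral (selection-free) dynamics, and the names grow_tumour and bulk_sample are all illustrative assumptions.

```python
import random

# Von Neumann neighbourhood on a square lattice (an assumption of this toy).
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def grow_tumour(size=41, n_cells=400, mut_prob=0.5, seed=0):
    """Grow a toy tumour on a size x size lattice.

    Returns a dict mapping (x, y) -> frozenset of mutation ids.
    A cell can divide only into an empty neighbouring site, which is
    the spatial constraint; there is no selection in this sketch.
    """
    if n_cells > size * size:
        raise ValueError("n_cells cannot exceed the number of lattice sites")
    rng = random.Random(seed)
    centre = (size // 2, size // 2)
    grid = {centre: frozenset()}
    cells = [centre]          # insertion-ordered list for reproducible choice
    next_mutation = 0
    while len(cells) < n_cells:
        x, y = rng.choice(cells)
        free = [
            (x + dx, y + dy)
            for dx, dy in NEIGHBOURS
            if 0 <= x + dx < size
            and 0 <= y + dy < size
            and (x + dx, y + dy) not in grid
        ]
        if not free:
            continue          # fully surrounded cells cannot divide
        mutations = set(grid[(x, y)])
        if rng.random() < mut_prob:
            mutations.add(next_mutation)   # hypothetical: daughter gains one mutation
            next_mutation += 1
        site = rng.choice(free)
        grid[site] = frozenset(mutations)
        cells.append(site)
    return grid


def bulk_sample(grid, centre, radius):
    """Fraction of cells carrying each mutation in a square region."""
    cx, cy = centre
    region = [
        muts for (x, y), muts in grid.items()
        if abs(x - cx) <= radius and abs(y - cy) <= radius
    ]
    counts = {}
    for muts in region:
        for m in muts:
            counts[m] = counts.get(m, 0) + 1
    return {m: c / len(region) for m, c in counts.items()} if region else {}
```

Calling bulk_sample on the same grid with two different centres gives two regional mutation-frequency profiles; comparing them shows how the sampled location alone can change which mutations appear frequent.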


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Ulykbek Kairov ◽  
Askhat Molkenov ◽  
Saule Rakhimova ◽  
Ulan Kozhamkulov ◽  
Aigul Sharip ◽  
...  

Abstract Objectives Kazakhstan is a Central Asian crossroads of European and Asian populations situated along the Great Silk Road. The territory of Kazakhstan has historically been inhabited by nomadic tribes and today is a multi-ethnic country with a dominant Kazakh ethnic group. We sequenced and analyzed the whole genomes of five healthy ethnic Kazakh individuals with high coverage using a next-generation sequencing platform. These whole-genome sequence data of healthy Kazakh individuals can be a valuable reference for biomedical studies investigating disease associations and for population-wide genomic studies of the ethnically diverse Central Asian region. Data description Blood samples were collected from five healthy ethnic Kazakh individuals living in Kazakhstan. The genomic DNA was extracted from blood and sequenced. Sequencing was performed on the Illumina HiSeq2000 next-generation sequencing platform. We sequenced and analyzed the whole genomes of ethnic Kazakh individuals with coverage ranging from 26 to 32X. Between 98.85 and 99.58% of base pairs were mapped and aligned to the human reference genome GRCh37 (hg19). Het/Hom and Ts/Tv ratios for each whole genome ranged from 1.35 to 1.49 and from 2.07 to 2.08, respectively. Sequencing data are available in the National Center for Biotechnology Information SRA database under the accession number PRJNA374772.
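For readers unfamiliar with the two quality metrics quoted above, the following minimal sketch shows how Ts/Tv and Het/Hom ratios are conventionally defined. The input representation (plain tuples rather than VCF records) and the function names are assumptions made for illustration; the abstract does not say which tool produced the reported values.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snvs):
    """Transition/transversion ratio over (ref, alt) single-base changes."""
    ts = tv = 0
    for ref, alt in snvs:
        if ref.upper() == alt.upper():
            continue                      # not a substitution
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions: ratio undefined")
    return ts / tv


def het_hom_ratio(genotypes):
    """Heterozygous / homozygous-alternate ratio.

    genotypes: (allele1, allele2) index pairs, 0 = reference allele.
    Homozygous-reference calls (0, 0) are ignored.
    """
    het = hom = 0
    for a, b in genotypes:
        if a == 0 and b == 0:
            continue
        if a != b:
            het += 1
        else:
            hom += 1
    if hom == 0:
        raise ValueError("no homozygous-alternate calls: ratio undefined")
    return het / hom
```

Both functions expect calls that have already been filtered to the variant sites of interest.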


2018 ◽  
Author(s):  
Jesse Farek ◽  
Daniel Hughes ◽  
Adam Mansfield ◽  
Olga Krasheninina ◽  
Waleed Nasser ◽  
...  

Abstract Motivation The rapid development of next-generation sequencing (NGS) technologies has lowered the barriers to genomic data generation, resulting in millions of samples sequenced across diverse experimental designs. The growing volume and heterogeneity of these sequencing data complicate the further optimization of methods for identifying DNA variation, especially considering that curated high-confidence variant call sets commonly used to evaluate these methods are generally developed by reference to results from the analysis of comparatively small and homogeneous sample sets. Results We have developed xAtlas, an application for the identification of single nucleotide variants (SNV) and small insertions and deletions (indels) in NGS data. xAtlas is easily scalable and enables execution and retraining with rapid development cycles. Generation of variant calls in VCF or gVCF format from BAM or CRAM alignments is accomplished in less than one CPU-hour per 30× short-read human whole-genome. The retraining capabilities of xAtlas allow its core variant evaluation models to be optimized on new sample data and user-defined truth sets. Obtaining SNV and indel calls from xAtlas can be achieved more than 40 times faster than established methods while retaining the same accuracy. Availability Freely available under a BSD 3-clause license at https://github.com/jfarek/[email protected] Supplementary information Supplementary data are available at Bioinformatics online.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Xing Wu ◽  
Christopher Heffelfinger ◽  
Hongyu Zhao ◽  
Stephen L. Dellaporta

Abstract Background The ability to accurately and comprehensively identify genomic variations is critical for plant studies utilizing high-throughput sequencing. Most bioinformatics tools for processing next-generation sequencing data were originally developed and tested in human studies, raising questions as to their efficacy for plant research. A detailed evaluation of the entire variant calling pipeline, including alignment, variant calling, variant filtering, and imputation, was performed on different programs using both simulated and real plant genomic datasets. Results A comparison of SOAP2, Bowtie2, and BWA-MEM found that BWA-MEM was consistently able to align the most reads with high accuracy, whereas Bowtie2 had the highest overall accuracy. Comparative results of GATK HaplotypeCaller versus SAMtools mpileup indicated that the choice of variant caller affected precision and recall differentially depending on the levels of diversity, sequence coverage and genome complexity. A cross-reference experiment of S. lycopersicum and S. pennellii reference genomes revealed the inadequacy of a single reference genome for variant discovery that includes distantly related plant individuals. A machine-learning-based variant filtering strategy outperformed the traditional hard-cutoff strategy, resulting in a higher number of true positive variants and fewer false positive variants. A 2-step imputation method, which utilized a set of high-confidence SNPs as the reference panel, showed up to 60% higher accuracy than direct LD-based imputation. Conclusions Programs in the variant discovery pipeline perform differently on plant genomic datasets. The choice of programs is subject to the goal of the study and available resources. This study provides important guidance for plant biologists utilizing next-generation sequencing data for diversity characterization and crop improvement.
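The precision and recall figures used to compare variant callers can be illustrated with a small sketch. It assumes that calls and truth variants are represented as hashable (chromosome, position, ref, alt) tuples and that exact matching is sufficient; real benchmarks usually normalise variant representation first, and the function name is hypothetical.

```python
def precision_recall(called, truth):
    """Precision and recall of a call set against a truth set.

    called, truth: iterables of hashable variant keys, for example
    (chrom, pos, ref, alt) tuples. Matching is by exact equality.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)      # true positives: called and real
    fp = len(called - truth)      # false positives: called but not real
    fn = len(truth - called)      # false negatives: real but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: four true variants, three calls, two of them correct.
truth = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 40, "G", "A"), ("chr2", 900, "T", "C")]
calls = [("chr1", 100, "A", "G"), ("chr2", 40, "G", "A"),
         ("chr2", 555, "A", "T")]
precision, recall = precision_recall(calls, truth)
```

A filtering strategy that removes false positives without discarding true positives raises precision while leaving recall unchanged, which is the trade-off the comparison of filtering strategies is measuring.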


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Xiaohong Wang ◽  
Yaw A. Afrane ◽  
Guiyun Yan ◽  
Jun Li

Anopheles gambiae is the major malaria vector in Africa. Examining the molecular basis of A. gambiae traits requires knowledge of both genetic variation and a genome-wide linkage disequilibrium (LD) map of wild A. gambiae populations from malaria-endemic areas. We sequenced the genomes of nine wild A. gambiae mosquitoes individually using next-generation sequencing technologies and detected 2,219,815 common single nucleotide polymorphisms (SNPs), 88% of which are novel. SNPs are not evenly distributed across A. gambiae chromosomes. The low SNP-frequency regions overlay heterochromatin and chromosome inversion domains, consistent with the lower recombination rates in these regions. Nearly one million SNPs that were genotyped correctly in all individual mosquitoes with 99.6% confidence were extracted from these high-throughput sequencing data. Based on these SNP genotypes, we constructed a genome-wide LD map for wild A. gambiae from malaria-endemic areas in Kenya and made it available through a public website. The average size of LD blocks is less than 40 bp, and several large LD blocks were also discovered clustered around the para gene, which is consistent with the effect of insecticide selective sweeps. The SNPs and the LD map will be valuable resources for scientific communities to dissect the A. gambiae genome.
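As a reminder of what an LD map summarises, the sketch below computes the standard pairwise r² statistic between two biallelic SNPs. It assumes phased haplotypes coded 0/1, which is a simplification: the study worked from genotypes of individual mosquitoes, and the abstract does not state how LD was estimated. The function name is hypothetical.

```python
def ld_r_squared(haplotypes):
    """Pairwise LD (r^2) between two biallelic SNPs.

    haplotypes: sequence of (a, b) pairs, where a and b are 0/1 allele
    codes at SNP A and SNP B on the same phased haplotype.
    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a monomorphic SNP makes r^2 undefined")
    d = p_ab - p_a * p_b
    return d * d / denom
```

Pairs of SNPs whose r² stays above a chosen threshold over a contiguous stretch are what an LD block groups together, so a short average block size reflects rapid decay of r² with distance.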


2015 ◽  
Vol 33 (3_suppl) ◽  
pp. 67-67
Author(s):  
Carlos Gomez-Martin ◽  
Roberto A. Pazo Cid ◽  
Antonieta Salud ◽  
Paula J. Fonseca ◽  
Ana Leon ◽  
...  

67 Background: HER2 amplified cases are the only subset of gastric carcinoma (GC) patients with an approved targeted therapy (≈20%). In GC it is still unknown if there is a mutually exclusive pattern of mutations in major driver oncogenes. We performed a systematic search for targetable oncogenes in a cohort of HER2 amplified GC patients. Methods: Fifty-three formalin-fixed, paraffin-embedded samples from HER2 amplified GC patients (43 tumor and 10 normal samples) were selected for next-generation sequencing (NGS). Before DNA extraction, a macrodissection procedure was performed to guarantee at least 30% tumor in all cases. DNA samples were sequenced using the Ion Torrent Personal Genome Machine (PGM) sequencing platform (Life Technologies, Carlsbad, CA, USA). The Ion AmpliSeq Cancer Hotspot Panel v2 was used. This panel encompasses more than 2800 mutational hotspots of 50 oncogenes and tumor suppressor genes. Data were processed using the Ion Torrent platform-specific pipeline software Torrent Suite v4.2. Moreover, sequencing data were analyzed with Ion Reporter software 4.2 to detect any copy number alteration of the genes included in the panel. Results: We successfully sequenced all samples. We identified 89 mutations in 12 genes (range, 1 to 9). The most frequent significant mutations included TP53 mutations (30), PIK3CA (3), SMAD4 (3), CDKN2A (4), CTNNB1 (3) and MET (3). Other mutations were found in KRAS, NOTCH, APC, and VHL genes. We also detected potential amplifications in the KRAS (4), EGFR (9), PIK3CA (11), AKT (6), FGFR (6), CDKN2A (4) and CDH1 (8) genes. Among 43 tumor specimens, 86% harbored at least one genetic alteration, most of them linked to actionable mutations or amplifications. Conclusions: Within HER2 amplified GCs, there are additional subsets with a potentially targetable oncogene.
Future testing for these targets will benefit from including HER2 amplified GC patients. Supported by the Spanish Ministry of Health, Fondo de Investigaciones Sanitarias grant PI11/01005 and European FEDER (PN I+D+I 2008-2011).

