Database construction for Vietnamese catfish genome

2020 ◽  
Vol 17 (3) ◽  
pp. 449-454
Author(s):  
Nguyen Hoang Vu ◽  
Nguyen Thanh Phuong ◽  
Le Thi Nguyen Binh ◽  
Kim Thi Phuong Oanh

Molecular biological research plays an important role in aquaculture, contributing to the efficient improvement of broodstock. Recently, with the development of next-generation sequencing (NGS) technology, genomic studies have increased rapidly, and data organisation and management hold a crucial position in them. After obtaining NGS sequencing data of Vietnamese catfish (Pangasianodon hypophthalmus), we analysed and annotated the catfish genome and constructed a database for efficient use of the results. The database is built upon open source software following a three-layer model (interface, Web service and database) with a convenient interface through Web browsers. Users can look up sequence and annotation data as well as visualize sequences through the JBrowse genome browser. This database is an important resource for functional genomics and genetic improvement of the catfish.

2017 ◽  
Vol 2 ◽  
pp. 35 ◽  
Author(s):  
Shazia Mahamdallie ◽  
Elise Ruark ◽  
Shawn Yost ◽  
Emma Ramsay ◽  
Imran Uddin ◽  
...  

Detection of deletions and duplications of whole exons (exon CNVs) is a key requirement of genetic testing. Accurate detection of this variant type has proved very challenging in targeted next-generation sequencing (NGS) data, particularly if only a single exon is involved. Many different NGS exon CNV calling methods have been developed over the last five years. Such methods are usually evaluated using simulated and/or in-house data due to a lack of publicly available datasets with orthogonally generated results. This hinders tool comparisons, transparency and reproducibility. To provide a community resource for assessment of exon CNV calling methods in targeted NGS data, we here present the ICR96 exon CNV validation series. The dataset includes high-quality sequencing data from a targeted NGS assay (the TruSight Cancer Panel) together with Multiplex Ligation-dependent Probe Amplification (MLPA) results for 96 independent samples. 66 samples contain at least one validated exon CNV and 30 samples have validated negative results for exon CNVs in 26 genes. The dataset includes 46 exon CNVs in BRCA1, BRCA2, TP53, MLH1, MSH2, MSH6, PMS2, EPCAM or PTEN, giving excellent representation of the cancer predisposition genes most frequently tested in clinical practice. Moreover, the validated exon CNVs include 25 single exon CNVs, the most difficult type of exon CNV to detect. The FASTQ files for the ICR96 exon CNV validation series can be accessed through the European Genome-phenome Archive (EGA) under the accession number EGAS00001002428.


2020 ◽  
Author(s):  
Dongqiang Zeng ◽  
Zilan Ye ◽  
Guangchuang Yu ◽  
Jiani Wu ◽  
Yi Xiong ◽  
...  

Motivation: Recent advances in next-generation sequencing have triggered the rapid accumulation of publicly available multi-omics datasets. The application of integrated omics to exploring robust signatures for clinical translation is increasingly highlighted, owing to the clinical success of immune checkpoint blockade in diverse malignancies. However, effective tools to comprehensively interpret multi-omics data are still needed to provide increased granularity into the intrinsic mechanisms of oncogenesis and immunotherapeutic sensitivity. Results: We developed a computational tool for effective Immuno-Oncology Biological Research (IOBR), providing comprehensive estimation of reported or user-built signatures, TME deconvolution and signature construction based on multi-omics data. Notably, IOBR offers batch analyses of these signatures and their correlations with clinical phenotypes, lncRNA profiling, genomic characteristics and signatures generated from single-cell RNA sequencing data in different cancer settings. Additionally, IOBR integrates multiple existing microenvironmental deconvolution methodologies and signature construction tools for convenient comparison and selection. Collectively, IOBR is a user-friendly tool that leverages multi-omics data to facilitate immuno-oncology exploration, unveil tumor-immune interactions and accelerate precision immunotherapy.


Genome ◽  
2016 ◽  
Vol 59 (1) ◽  
pp. 51-58 ◽  
Author(s):  
Sabah AlMomin ◽  
Vinod Kumar ◽  
Sami Al-Amad ◽  
Mohsen Al-Hussaini ◽  
Talal Dashti ◽  
...  

Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.
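
The N50 figure quoted above is a standard assembly contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. The following is a minimal sketch of that definition in Python. The function name `n50` and the example lengths are illustrative assumptions and are not taken from the paper or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    assembled length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical scaffold lengths in bp, for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

A low N50 relative to the total assembled length, as reported here, indicates that the assembly consists of many short scaffolds.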


Open Biology ◽  
2012 ◽  
Vol 2 (8) ◽  
pp. 120093 ◽  
Author(s):  
Markus Ralser ◽  
Heiner Kuhl ◽  
Meryem Ralser ◽  
Martin Werber ◽  
Hans Lehrach ◽  
...  

Saccharomyces cerevisiae strain W303 is a widely used model organism. However, little is known about its genetic origins, as it was created in the 1970s from crossing yeast strains of uncertain genealogy. To obtain insights into its ancestry and physiology, we sequenced the genome of its variant W303-K6001, a yeast model of ageing research. The combination of two next-generation sequencing (NGS) technologies (Illumina and Roche/454 sequencing) yielded an 11.8 Mb genome assembly at an N50 contig length of 262 kb. Although sequencing was substantially more precise and sensitive than whole-genome tiling arrays, both NGS platforms produced a number of false positives. At a 378× average coverage, only 74 per cent of called differences to the S288c reference genome were confirmed by both techniques. The consensus W303-K6001 genome differs in 8133 positions from S288c, predicting altered amino acid sequence in 799 proteins, including factors of ageing and stress resistance. Most of the W303-K6001 genome (85.4%) is virtually identical (≤0.5 variations per kb) to S288c, and thus originates in the same ancestor. Non-S288c regions distribute unequally over the genome, with chromosome XVI the most (99.6%) and chromosome XI the least (54.5%) S288c-like. Several of these clusters are shared with Σ1278B, another widely used S288c-related model, indicating that these strains share a second ancestor. Thus, the W303-K6001 genome pictures details of complex genetic relationships between the model strains that date back to the early days of experimental yeast genetics. Moreover, this study underlines the necessity of combining multiple NGS and genome-assembling techniques for achieving accurate variant calling in genomic studies.


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Santosh Anand ◽  
Eleonora Mangano ◽  
Nadia Barizzone ◽  
Roberta Bordoni ◽  
Melissa Sorosina ◽  
...  

Sequencing a large number of individuals, which is often needed for population genetics studies, is still economically challenging despite the falling costs of Next Generation Sequencing (NGS). Pool-seq is an alternative cost- and time-effective option in which DNA from several individuals is pooled for sequencing. However, pooling of DNA creates new problems and challenges for accurate variant calling and allele frequency (AF) estimation. In particular, sequencing errors are confounded with alleles present at low frequency in the pools, possibly giving rise to false positive variants. We sequenced 996 individuals in 83 pools (12 individuals/pool) in a targeted re-sequencing experiment. We show that Pool-seq AFs are robust and reliable by comparing them with public variant databases and in-house SNP-genotyping data of individual subjects of pools. Furthermore, we propose a simple filtering guideline for the removal of spurious variants based on the Kolmogorov-Smirnov statistical test. We experimentally validated our filters by comparing Pool-seq to individual sequencing data, showing that the filters remove most of the false variants while retaining the majority of true variants. The proposed guideline is fairly generic in nature and could be easily applied in other Pool-seq experiments.
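
To make the low-frequency problem concrete, the sketch below estimates a pooled allele frequency as the fraction of reads carrying the alternate allele, and computes the lowest frequency a real allele can have in a pool, which is one copy among all chromosomes in the pool. It assumes diploid individuals contributing equal amounts of DNA. The function names are hypothetical, and this is not the Kolmogorov-Smirnov filter proposed by the authors, whose details the abstract does not give.

```python
def pool_allele_frequency(alt_reads, total_reads):
    """Estimate allele frequency in a pool as alt reads / total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


def min_expected_frequency(individuals_per_pool, ploidy=2):
    """Frequency of a single allele copy in the pool.

    Assumes every individual contributes the same amount of DNA and
    has the given ploidy (diploid by default, an assumption here).
    """
    if individuals_per_pool <= 0 or ploidy <= 0:
        raise ValueError("pool size and ploidy must be positive")
    return 1 / (individuals_per_pool * ploidy)


# Pools of 12 individuals, as in the abstract: 24 chromosomes per pool.
singleton_af = min_expected_frequency(12)

# Hypothetical read counts for one site, for illustration only.
observed_af = pool_allele_frequency(alt_reads=5, total_reads=100)
```

Under these assumptions a single heterozygous carrier in a pool of 12 gives an expected frequency of 1/24, about 4.2%, so an observed frequency well below that value is more likely to reflect sequencing error than a real variant.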


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3982 ◽  
Author(s):  
RuiJuan Feng ◽  
Xin Wang ◽  
Min Tao ◽  
Guanchao Du ◽  
Qishuo Wang

Vallisneria spinulosa is a freshwater aquatic plant of ecological and economic importance. However, there is limited cytogenetic and genomics information on Vallisneria. In this study, we measured the nuclear DNA content of Vallisneria spinulosa by flow cytometry, performed a de novo assembly, and annotated repetitive sequences by using a combination of next-generation sequencing (NGS) and bioinformatics tools. The genome size of Vallisneria spinulosa is approximately 3,595 Mbp, in which nearly 60% of the genome consists of repetitive sequences. The majority of the repetitive sequences are LTR-retrotransposons comprising 43% of the genome. Although the amount of sequencing data used in this study was not sufficient for a whole-genome assembly, it could generate an overview of representative elements in the genome. These results will lay a new foundation for further studies on various species that belong to the Vallisneria genus.


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0252414
Author(s):  
Mônica Silva de Oliveira ◽  
Jorianne Thyeska Castro Alves ◽  
Pablo Henrique Caracciolo Gomes de Sá ◽  
Adonney Allan de Oliveira Veras

Advances in next-generation sequencing (NGS) platforms have had a positive impact on biological research, leading to the development of numerous omics approaches, including genomics, transcriptomics, metagenomics, and pangenomics. These analyses provide insights into the gene contents of various organisms. However, to understand the evolutionary processes of these genes, comparative analysis, which is an important tool for annotation, is required. Using comparative analysis, it is possible to infer the functions of gene contents and identify orthologous and paralogous genes via their homology. Although several comparative analysis tools currently exist, most of them are limited to complete genomes. PAN2HGENE, a computational tool that allows identification of gene products missing from the original genome sequence, with automated comparative analysis for both complete and draft genomes, can be used to address this limitation. In this study, PAN2HGENE was used to identify new products, which altered the alpha value behavior in the pangenome without altering the original genomic sequence. Our findings indicate that this tool represents an efficient alternative for comparative analysis, with a simple and intuitive graphical interface. PAN2HGENE has been uploaded to SourceForge and is available at: https://sourceforge.net/projects/pan2hgene-software


2019 ◽  
Vol 35 (21) ◽  
pp. 4419-4421 ◽  
Author(s):  
Sun Ah Kim ◽  
Myriam Brossard ◽  
Delnaz Roshandel ◽  
Andrew D Paterson ◽  
Shelley B Bull ◽  
...  

Summary: For the analysis of high-throughput genomic data produced by next-generation sequencing (NGS) technologies, researchers need to identify linkage disequilibrium (LD) structure in the genome. In this work, we developed an R package gpart which provides clustering algorithms to define LD blocks or analysis units consisting of SNPs. The visualization tool in gpart can display the LD structure and gene positions for up to 20 000 SNPs in one image. The gpart functions facilitate construction of LD blocks and SNP partitions for vast amounts of genome sequencing data within reasonable time and memory limits in personal computing environments. Availability and implementation: The R package is available at https://bioconductor.org/packages/gpart. Supplementary information: Supplementary data are available at Bioinformatics online.
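
LD between two SNPs is commonly summarised by the squared correlation r², computed from haplotype frequencies as D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB. The Python sketch below computes this pairwise measure from phased haplotypes coded as 0/1. It is only an illustration of the quantity being clustered: the function name is hypothetical, and it does not reproduce gpart's clustering algorithms or its R interface.

```python
def ld_r_squared(haplotypes):
    """Pairwise LD (r^2) between two biallelic SNPs.

    `haplotypes` is a sequence of (a, b) pairs, one per phased
    haplotype, where a and b are 0 or 1 for the allele carried at
    the first and second SNP respectively.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("haplotypes must not be empty")
    p_a = sum(a for a, _ in haplotypes) / n        # freq of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n        # freq of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    d = p_ab - p_a * p_b                           # disequilibrium coefficient D
    return d * d / denominator


# Illustrative haplotypes: alleles always travel together (complete LD).
linked = [(0, 0), (1, 1), (0, 0), (1, 1)]
# Illustrative haplotypes: every allele combination equally common (no LD).
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]

r2_linked = ld_r_squared(linked)
r2_unlinked = ld_r_squared(unlinked)
```

Block-partitioning methods group SNPs whose pairwise LD values are consistently high; the threshold and the grouping rule are method-specific and are not described in the abstract.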


2015 ◽  
Vol 61 (1) ◽  
pp. 124-135 ◽  
Author(s):  
Gavin R Oliver ◽  
Steven N Hart ◽  
Eric W Klee

Background: Next generation sequencing (NGS)-based assays continue to redefine the field of genetic testing. Owing to the complexity of the data, bioinformatics has become a necessary component in any laboratory implementing a clinical NGS test. Content: The computational components of an NGS-based workflow can be conceptualized as primary, secondary, and tertiary analytics. Each of these components addresses a necessary step in the transformation of raw data into clinically actionable knowledge. Understanding the basic concepts of these analysis steps is important in assessing and addressing the informatics needs of a molecular diagnostics laboratory. Equally critical is a familiarity with the regulatory requirements addressing the bioinformatics analyses. These and other topics are covered in this review article. Summary: Bioinformatics has become an important component in clinical laboratories generating, analyzing, maintaining, and interpreting data from molecular genetics testing. Given the rapid adoption of NGS-based clinical testing, service providers must develop informatics workflows that adhere to the rigor of clinical laboratory standards, yet are flexible to changes as the chemistry and software for analyzing sequencing data mature.


2021 ◽  
Author(s):  
Hyungtaek Jung ◽  
Brendan Jeon ◽  
Daniel Ortiz-Barrientos

Storing and manipulating Next Generation Sequencing (NGS) file formats is an essential but difficult task in biological data analysis. The easyfm (easy file manipulation) toolkit (https://github.com/TaekAndBrendan/easyfm) makes manipulating commonly used NGS files more accessible to biologists. It enables them to perform end-to-end reproducible data analyses using a free standalone desktop application (available on Windows, Mac and Linux). Unlike existing tools (e.g. Galaxy), the Graphical User Interface (GUI)-based easyfm is not dependent on any high-performance computing (HPC) system and can be operated without an internet connection. This specific benefit allows easyfm to seamlessly integrate visual and interactive representations of NGS files, supporting a wider scope of bioinformatics applications in the life sciences.

