genome analysis toolkit
Recently Published Documents



2021 ◽  
Author(s):  
Qian Zhang ◽  
Hao Liu ◽  
Fengxiao Bu

Rapid advances in next-generation sequencing (NGS) have facilitated ultralarge population and cohort studies that use whole-genome sequencing (WGS) to identify DNA variants that may impact gene function. Massive sequencing data require highly efficient bioinformatics tools to complete read alignment and variant calling, the fundamental steps of the analysis. Multiple software and hardware acceleration strategies have been developed to boost analysis speed. This study comprehensively evaluated the germline variant calling of a GPU-based acceleration tool, BaseNumber, using WGS datasets from several sources, including gold-standard samples from the Genome in a Bottle (GIAB) project and the Golden Standard of China Genome (GSCG) project, resequenced GSCG samples, and 100 in-house samples from the China Deafness Genetics Consortium (CDGC) project. Sequencing data were analyzed on a GPU server using BaseNumber, and its variant calling outputs were compared to the reference VCF or to the results generated by the Burrows-Wheeler Aligner (BWA) + Genome Analysis Toolkit (GATK) pipeline on a generic CPU server. BaseNumber demonstrated high precision (99.32%) and recall (99.86%) in variant calls compared to the standard reference. The variant calling outputs of the BaseNumber and GATK pipelines were very similar, with a mean F1 of 99.69%. Additionally, BaseNumber took only 23 minutes on average to analyze a 48X WGS sample, which was 215.33 times faster than the GATK workflow. The GPU-based BaseNumber provides highly accurate and ultrafast variant calling, significantly improving WGS analysis efficiency and facilitating time-sensitive tests, such as clinical WGS genetic diagnosis, and it sheds light on the GPU-based acceleration of other omics data analyses.
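The precision and recall quoted above can be combined into an F1 score, which is their harmonic mean. The following is a minimal Python sketch of that arithmetic, assuming both figures refer to the same comparison against the reference VCF; the function name `f1_score` is illustrative and is not part of BaseNumber or GATK.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted in the abstract for BaseNumber versus the
# standard reference (assumed here to come from the same comparison).
f1_vs_reference = f1_score(0.9932, 0.9986)
print(f"F1 versus reference: {f1_vs_reference:.4f}")
```

This value is separate from the 99.69% mean F1 reported in the abstract, which compares BaseNumber output against the GATK pipeline rather than against the reference.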


Cancers ◽  
2021 ◽  
Vol 13 (20) ◽  
pp. 5233
Author(s):  
Jucimara Colombo ◽  
Marina Gobbe Moschetta-Pinheiro ◽  
Adriana Alonso Novais ◽  
Bruna Ribeiro Stoppe ◽  
Enrico Dumbra Bonini ◽  
...  

Introduction: Breast cancer (BC) is the malignant neoplasm with the highest mortality rate in women, and female dogs are good models to study BC. Objective: We investigated the efficacy of liquid biopsy to detect gene mutations in the diagnosis and follow-up of women and female dogs with BC. Materials and Methods: In this study, 57 and 37 BC samples were collected from women and female dogs, respectively. After core biopsy and plasma samples were collected, the DNA of the tumor fragments and the ctDNA of the plasma were processed for a next-generation sequencing (NGS) assay. After preprocessing, the data were submitted to the Genome Analysis ToolKit (GATK). Results: In women, 1788 variants were identified in tumor fragments and 221 variants in plasma; 66 variants were simultaneously detected in tumors and plasma. Conversely, in female dogs, 1430 variants were found in plasma and 695 variants in tumor fragments; 59 variants were simultaneously identified in tumors and plasma. The most frequently mutated genes in the tumor fragments of women were USH2A, ATM, and IGF2R; in female dogs, they were USH2A, BRCA2, and RRM2. Plasma of women showed the most frequent genetic variations in the MAP3K1, BRCA1, and GRB7 genes, whereas plasma from female dogs had variations in the NF1, ERBB2, and KRT17 genes. Mutations in the AKT1, PIK3CA, and BRIP genes were associated with tumor recurrence, with a highly pathogenic variant in PIK3CA being particularly prominent. We also detected gain-of-function mutations in the GRB7, MAP3K1, and MLH1 genes. Conclusion: Liquid biopsy is useful to identify specific genetic variations at the onset of BC and can be used throughout the entire follow-up period, thereby supporting clinicians in refining interventions.


2021 ◽  
Author(s):  
Murat Karamese ◽  
Didem Ozgur ◽  
Emin E Tutuncu

Aims: We present the sequence and single-nucleotide polymorphism (SNP) analysis of 47 complete genomes of SARS-CoV-2 isolates from Turkish patients. Methods: The Illumina MiSeq platform was used for sequencing the libraries. The SNPs were detected using Genome Analysis Toolkit – HaplotypeCaller v.3.8.0 and were inspected in GenomeBrowse v2.1.2. Results: All viral genome sequences of our isolates were located in lineage B, under different clusters such as B.1 (n = 3), B.1.1 (n = 28) and B.1.9 (n = 16). According to the Global Initiative on Sharing All Influenza Data nomenclature, all of our complete genomes were placed in the G, GR and GH clades. In our study, 549 total and 53 unique SNPs were detected. Conclusion: The results indicate that the SARS-CoV-2 sequences of our isolates have great similarity with all Turkish and European sequences.


BMC Medicine ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Timothy Farinholt ◽  
Harsha Doddapaneni ◽  
Xiang Qin ◽  
Vipin Menon ◽  
Qingchang Meng ◽  
...  

Background: This study aims to identify the causative strain of SARS-CoV-2 in a cluster of vaccine breakthroughs. Vaccine breakthrough by a highly transmissible SARS-CoV-2 strain is a risk to global public health. Methods: Nasopharyngeal swabs from suspected vaccine breakthrough cases were tested for SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) by qPCR (quantitative polymerase chain reaction) for Wuhan-Hu1 and the alpha variant. Positive samples were then sequenced using Swift Normalase Amplicon Panels to determine the causal variant. GATK (Genome Analysis Toolkit) variants were filtered with allele fraction ≥80 and a minimum read depth of 30x. Results: Viral sequencing revealed an infection cluster of 6 vaccinated patients infected with the delta (B.1.617.2) SARS-CoV-2 variant. With no history of vaccine breakthrough, this suggests the delta variant may possess immune evasion in patients who received the Pfizer BNT162b2, Moderna mRNA-1273, and Covaxin BBV152 vaccines. Conclusions: The delta variant may pose the highest risk of any currently circulating SARS-CoV-2 variant, with previously described increased transmissibility over the alpha variant and now possible vaccine breakthrough. Funding: Parts of this work were supported by the National Institute of Allergy and Infectious Diseases (1U19AI144297) and Baylor College of Medicine internal funding.
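The filtering step described in the Methods can be illustrated with a short Python sketch. It assumes that "allele fraction ≥80" means 80% (0.80 as a fraction), which the abstract does not state explicitly, and it uses hypothetical variant records with made-up field names (`pos`, `af`, `dp`); it is not the authors' pipeline and does not call GATK.

```python
from typing import Dict, List

# Assumption: the abstract's "allele fraction ≥80" is read as 80%.
MIN_ALLELE_FRACTION = 0.80
# Minimum read depth of 30x, as stated in the abstract.
MIN_READ_DEPTH = 30


def passes_filter(variant: Dict[str, float]) -> bool:
    """Return True if a variant meets both the allele-fraction and depth thresholds."""
    return variant["af"] >= MIN_ALLELE_FRACTION and variant["dp"] >= MIN_READ_DEPTH


def filter_variants(variants: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep only the variants that pass both thresholds."""
    return [v for v in variants if passes_filter(v)]


# Hypothetical calls, for illustration only (not data from the study).
example_calls = [
    {"pos": 100, "af": 0.95, "dp": 120},  # passes both thresholds
    {"pos": 200, "af": 0.45, "dp": 300},  # fails on allele fraction
    {"pos": 300, "af": 0.99, "dp": 12},   # fails on read depth
]
kept = filter_variants(example_calls)
print([v["pos"] for v in kept])
```

In practice the same thresholds would be applied to the allele-fraction and depth annotations of a VCF file; plain dictionaries are used here only to keep the sketch self-contained.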


Author(s):  
Jun-Yu Li ◽  
Wei-Xuan Li ◽  
An-Tai Wang ◽  
Zhang Yu

Summary: MitoFlex is a Linux-based mitochondrial genome analysis toolkit, which provides a complete workflow of raw data filtering, de novo assembly, mitochondrial genome identification and annotation for animal high-throughput sequencing data. The overall performance was compared between MitoFlex and its analogue MitoZ, in terms of protein-coding gene recovery, memory consumption and processing speed. Availability: MitoFlex is available at https://github.com/Prunoideae/MitoFlex under the GPLv3 license. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jeong Hoon Lee ◽  
Solbi Kweon ◽  
Yu Rang Park

Genetic variants causing underlying pharmacogenetic and disease phenotypes have been used as the basis for clinical decision-making. However, due to the lack of standards for next-generation sequencing (NGS) pipelines, reproducing genetic variants among institutions is still difficult. The aim of this study is to show how many variants important for clinical decisions can be individually detected using different pipelines. Genetic variants were derived from 105 breast cancer patient target DNA sequences via three different variant-calling pipelines. HaplotypeCaller and Mutect2 tumor-only mode in the Genome Analysis ToolKit (GATK), and VarScan, were used for variant calling from sequence read data processed by the same NGS preprocessing tools, with annotation by Variant Effect Predictor. GATK HaplotypeCaller, VarScan, and Mutect2 found 25,130, 16,972, and 4232 variants, comprising 1491, 1400, and 321 annotated variants with ClinVar significance, respectively. The average number of ClinVar significant variants in the patients was 769.43, and 16.50% of the variants were detected by only one variant caller. Even for variants with a significant impact on clinical decision-making, the detected variants differ between algorithms. To utilize genetic variants in the clinical field, a strict standard for NGS pipelines is essential.
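The share of variants detected by only one caller can be computed by counting, for each distinct variant, how many callers reported it. The Python sketch below shows one way to do this under stated assumptions: variants are keyed by a hypothetical (chromosome, position, reference allele, alternate allele) tuple, and the toy call sets are invented for illustration and are not data from the study.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def fraction_unique_to_one_caller(calls: Dict[str, Set[Variant]]) -> float:
    """Fraction of distinct variants that were reported by exactly one caller."""
    # Count how many callers reported each distinct variant.
    counts = Counter(v for variants in calls.values() for v in variants)
    if not counts:
        return 0.0
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


# Toy call sets, for illustration only.
a = ("chr17", 100, "A", "G")
b = ("chr17", 200, "C", "T")
c = ("chr13", 300, "G", "A")
d = ("chr13", 400, "T", "C")
toy_calls = {
    "HaplotypeCaller": {a, b, c},
    "Mutect2": {a},
    "VarScan": {a, b, d},
}
unique_fraction = fraction_unique_to_one_caller(toy_calls)
print(f"{unique_fraction:.2%} of variants were reported by only one caller")
```

Exact tuple matching is a simplification; comparing real call sets usually requires normalizing variant representation first so that the same event is not counted as two different variants.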


2021 ◽  
Vol 11 ◽  
Author(s):  
Jin Ok Yang ◽  
Min-Hyuk Choi ◽  
Ji-Yong Yoon ◽  
Jeong-Ju Lee ◽  
Sang Ook Nam ◽  
...  

Lennox-Gastaut syndrome (LGS) is a severe type of childhood-onset epilepsy characterized by multiple types of seizures, specific discharges on electroencephalography, and intellectual disability. Most patients with LGS do not respond well to drug treatment and show poor long-term prognosis. Approximately 30% of patients without brain abnormalities have unidentifiable causes. Therefore, accurate diagnosis and treatment of LGS remain challenging. To identify causative mutations of LGS, we analyzed the whole-exome sequencing data of 17 unrelated Korean families, including patients with LGS and LGS-like epilepsy without brain abnormalities, using the Genome Analysis Toolkit. We identified 14 mutations in 14 genes as causes of LGS or LGS-like epilepsy. Sixty-four percent of the identified genes had previously been reported as LGS or epilepsy-related genes. Many of these variations were novel and considered pathogenic or likely pathogenic. Network analysis classified the identified genes into two network clusters: neuronal signal transmission or neuronal development. Additionally, knockdown of two candidate genes with insufficient evidence of neuronal functions, SLC25A39 and TBC1D8, decreased neurite outgrowth and the expression level of MAP2, a neuronal marker. These results expand the spectrum of genetic variations and may aid the diagnosis and management of individuals with LGS.


2020 ◽  
Vol 10 ◽  
Author(s):  
Yongbo Yu ◽  
Chengwen Gao ◽  
Yuanbin Chen ◽  
Meilan Wang ◽  
Jianfeng Zhang ◽  
...  

Objectives: To evaluate copy number alterations (CNAs) in genes associated with penile cancer (PeC) and determine their correlation and prognostic ability with PeC. Methods: Whole-exome sequencing was performed for tumor tissue and matched normal DNA of 35 patients diagnosed with penile squamous cell carcinoma from 2011 to 2016. Somatic CNAs were detected using the Genome Analysis Toolkit (GATK). Retrospective clinical data were collected and analyzed. All the data were statistically analyzed using SPSS 16.0 software. The cancer-specific survival rates were estimated by Kaplan-Meier curves and compared with the log-rank test. Results: CNAs in the MYCN gene were detected in 19 (amplification: 54.29%) patients. Other CNA gene targets were FAK (amplification: 45.72%, deletion: 8.57%), TP53 (amplification: 2.86%, deletion: 51.43%), TRKA (amplification: 34.29%, deletion: 2.86%), p75NTR (amplification: 5.71%, deletion: 42.86%), Miz-1 (amplification: 14.29%, deletion: 20.00%), Max (amplification: 17.14%, deletion: 2.86%), Bmi1 (amplification: 14.29%, deletion: 48.57%), and MDM2 (amplification: 5.71%, deletion: 45.72%). The CNAs in MYCN and FAK correlated significantly with patient prognosis (P<0.05). The 3-year recurrence-free survival rate was 87.10% among patients followed up. The 5-year survival rate of patients with MYCN amplification was 69.2%, compared to 94.4% in the non-amplification group. The 5-year survival rate of patients with FAK amplification was 65.6%, compared to 94.7% in the non-amplification group. The PPI network showed that TP53 and MYCN might play meaningful functional roles in PeC. Conclusion: MYCN and FAK amplification and TP53 deletion were apparent in PeC. MYCN and TP53 were hub genes in PeC. MYCN and FAK amplification was also detected and analyzed, and the findings indicated that these two genes are predictors of poor prognosis in PeC.


2020 ◽  
Author(s):  
Karamese Murat ◽  
Ozgur Didem ◽  
Tutuncu Emin Ediz

Introduction: We present the sequence analysis of 47 complete genomes of SARS-CoV-2 isolates from Turkish patients. To identify their genetic similarity, phylogenetic analysis was performed by comparing worldwide SARS-CoV-2 sequences, selected from GISAID, to the complete genomes from Turkish isolates. In addition, we focused on variation analysis to show the mutations in SARS-CoV-2 genomes. Methods: The Illumina MiSeq platform was used for sequencing the libraries. The raw reads were aligned to the known SARS-CoV-2 genome (GenBank: MN908947.3) using the Burrows-Wheeler aligner (v.0.7.1). The phylogenetic tree was constructed using Phylip v.3.6 with the Neighbor-Joining and composite likelihood method. The variants were detected using Genome Analysis Toolkit-HaplotypeCaller v.3.8.0 and were inspected in GenomeBrowse v2.1.2. Results: All viral genome sequences of our isolates were located in lineage B, under different clusters such as B.1 (n=3), B.1.1 (n=28), and B.1.9 (n=16). According to the GISAID nomenclature, all our complete genomes were placed in the G, GR and GH clades. Five hundred forty-nine total and 53 unique variants were detected. All 47 genomes exhibited different kinds of variants. The distinct variants consist of 274 missense, 225 synonymous, and 50 non-coding alleles. Conclusion: The results indicated that the SARS-CoV-2 sequences of our isolates have great similarity with all Turkish and European sequences. Further studies should be performed for better comparison of strains once more complete genome sequences are released. We also believe that collecting and sharing data about the SARS-CoV-2 virus and COVID-19 will be effective and may help related studies.


Author(s):  
Zekun Yin ◽  
Xiaoming Xu ◽  
Jinxiao Zhang ◽  
Yanjie Wei ◽  
Bertil Schmidt ◽  
...  

Motivation: Mash is a popular hash-based genome analysis toolkit with applications to important downstream analysis tasks such as clustering and assembly. However, Mash is currently not able to fully exploit the capabilities of modern multi-core architectures, which in turn leads to high runtimes for large-scale genomic datasets. Results: We present RabbitMash, an efficient, highly optimized implementation of Mash which can take full advantage of modern hardware including multi-threading, vectorization and fast I/O. We show that our approach achieves speedups of at least 1.3, 9.8, 8.5 and 4.4 compared to Mash for the operations sketch, dist, triangle and screen, respectively. Furthermore, RabbitMash is able to compute the all-versus-all distances of 100 321 genomes in <5 min on a 40-core workstation while Mash requires over 40 min. Availability and implementation: RabbitMash is available at https://github.com/ZekunYin/RabbitMash. Supplementary information: Supplementary data are available at Bioinformatics online.
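The sketch and dist operations named above follow the MinHash idea: each sequence is reduced to its smallest k-mer hash values, and two such sketches are compared to estimate similarity. The Python sketch below is a simplified illustration, not the Mash or RabbitMash implementation: it hashes with BLAKE2b from the standard library instead of the hash function those tools use, ignores reverse complements, and all function names and the toy sequence are assumptions made for this example. The distance formula, D = -ln(2j / (1 + j)) / k for a Jaccard estimate j and k-mer size k, is the Mash distance.

```python
import hashlib
import math
from typing import List, Set


def kmer_hashes(seq: str, k: int) -> Set[int]:
    """Hash every k-mer of seq to a 64-bit integer (BLAKE2b is a stand-in hash)."""
    return {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }


def sketch(seq: str, k: int = 5, size: int = 16) -> List[int]:
    """Bottom-s sketch: the `size` smallest k-mer hash values, in sorted order."""
    return sorted(kmer_hashes(seq, k))[:size]


def jaccard_estimate(a: List[int], b: List[int], size: int) -> float:
    """Estimate Jaccard similarity from the `size` smallest hashes of the merged sketches."""
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j: float, k: int) -> float:
    """Mash distance from a Jaccard estimate j and k-mer size k."""
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k


# Toy sequence, for illustration only.
s1 = sketch("ACGTACGTTGCAACGT")
self_similarity = jaccard_estimate(s1, s1, 16)
print(self_similarity, mash_distance(self_similarity, 5))
```

Sketching is independent per genome and distance computation is independent per pair, which is what makes these operations natural targets for the multi-threading and vectorization the abstract describes.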

