Genomics & Informatics

Development of an RNA sequencing panel to detect gene fusions in thyroid cancer

Genomics & Informatics | 10.5808/gi.21061 | 2021 | Vol 19 (4) | pp. e41

Author(s): Dongmoung Kim, Seung-Hyun Jung, Yeun-Jun Chung

Keyword(s): Quality Control, Thyroid Cancer, RNA Sequencing, Copy Number, Gene Fusion, Sanger Sequencing, Limit of Detection, Housekeeping Genes, Gene Fusions, Copy Number Alterations

In addition to mutations and copy number alterations, gene fusions are commonly identified in cancers. In thyroid cancer, fusions involving important cancer-related genes have been frequently reported; however, existing panels do not cover all clinically important gene fusions. In this study, we aimed to develop a custom RNA-based sequencing panel to identify the key fusions in thyroid cancer. Our ThyChase panel was designed to detect 87 types of gene fusion. For RNA sequencing quality control, five housekeeping genes were included in the panel. When we applied the panel to a fusion-containing reference RNA (HD796), the three expected fusions (EML4-ALK, CCDC6-RET, and TPM3-NTRK1) were successfully identified, and we confirmed their breakpoint sequences by Sanger sequencing. Regarding the limit of detection, the panel detected the target fusions in a tumor sample in which fusion-positive tumor cells made up 1% of the cellular fraction. Taken together, our ThyChase panel may be useful for identifying gene fusions in clinical practice.
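The abstract describes a quality-control step based on five housekeeping genes followed by fusion detection, but it reports no cut-offs. As a minimal sketch of how such a gate might be wired, assuming hypothetical read-count thresholds and placeholder gene labels (none of these values come from the study), a sample is reported only if enough housekeeping genes are detected, and a fusion is called only if it has enough junction-spanning reads:

```python
from dataclasses import dataclass, field

# Hypothetical thresholds: the ThyChase abstract does not report QC or calling cut-offs.
MIN_HOUSEKEEPING_READS = 50  # reads needed to count a housekeeping gene as detected
MIN_HOUSEKEEPING_PASS = 4    # detected housekeeping genes required (the panel includes five)
MIN_FUSION_READS = 5         # junction-spanning reads needed to report a fusion


@dataclass
class SampleCounts:
    """Read counts for one sample from a targeted RNA fusion panel."""
    housekeeping: dict[str, int] = field(default_factory=dict)  # gene label -> read count
    fusions: dict[str, int] = field(default_factory=dict)       # "GENE1-GENE2" -> junction reads


def passes_qc(sample: SampleCounts) -> bool:
    """RNA quality gate: enough housekeeping genes must be detected."""
    detected = sum(1 for reads in sample.housekeeping.values() if reads >= MIN_HOUSEKEEPING_READS)
    return detected >= MIN_HOUSEKEEPING_PASS


def call_fusions(sample: SampleCounts) -> list[str]:
    """Report fusions with enough junction reads; report nothing if the sample fails QC."""
    if not passes_qc(sample):
        return []
    return sorted(name for name, reads in sample.fusions.items() if reads >= MIN_FUSION_READS)
```

Gating fusion calls on housekeeping expression means a negative result from degraded RNA is reported as a QC failure rather than as "no fusion detected".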

Editor’s introduction to this issue (G&I 19:4, 2021)

Genomics & Informatics | 10.5808/gi.19.4.e1 | 2021 | Vol 19 (4) | pp. e35

Author(s): Taesung Park

High-performance computing for SARS-CoV-2 RNAs clustering: a data science‒based genomics approach

Genomics & Informatics | 10.5808/gi.21056 | 2021 | Vol 19 (4) | pp. e49

Author(s): Anas Oujja, Mohamed Riduan Abid, Jaouad Boumhidi, Safae Bourhnane, Asmaa Mourhir, ...

Keyword(s): High Performance Computing, High Performance, Data Science, Longest Common Subsequence, RNA Sequences, Hadoop MapReduce, Common Subsequence, ICT Tools, Clustering Approach, Performance Computing

Genomic data is now one of the fastest-growing datasets in the world. By 2025, it is expected to become the fourth-largest source of Big Data, which calls for adequate high-performance computing (HPC) platforms to process it. Given the unprecedented and unpredictable mutations recently observed in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the research community urgently needs ICT tools to process SARS-CoV-2 RNA data, for example by clustering the sequences, thereby helping to track virus mutations and predict future ones. In this paper, we present an HPC-based SARS-CoV-2 RNA clustering tool. We adopt a data science approach that spans data collection, analysis, and visualization. For the analysis step, we describe how our clustering approach leverages HPC and the longest common subsequence (LCS) algorithm. The approach uses the Hadoop MapReduce programming paradigm and adapts the LCS algorithm to efficiently compute the LCS length for each pair of SARS-CoV-2 RNA sequences, which are retrieved from the U.S. National Center for Biotechnology Information (NCBI) Virus repository. The computed LCS lengths are used to measure the dissimilarity between RNA sequences and thereby identify existing clusters. In addition, we present a comparative study of LCS algorithm performance under varying workloads and different numbers of Hadoop worker nodes.
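The abstract gives neither the dissimilarity formula nor the Hadoop job layout. The sketch below shows the standard dynamic-programming LCS length and, assuming a normalization of 1 - LCS/max(length) (our assumption, not the authors' stated measure), turns it into a pairwise dissimilarity. The `map_phase`/`reduce_phase` functions are hypothetical, in-process imitations of a map step that emits one key per sequence pair and a reduce step that scores each pair; this is plain Python, not Hadoop code.

```python
from itertools import combinations


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence: O(len(a)*len(b)) time, O(len(b)) memory."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1  # extend the LCS of both prefixes
            else:
                curr[j] = max(prev[j], curr[j - 1])  # drop a character from one side
        prev = curr
    return prev[-1]


def dissimilarity(a: str, b: str) -> float:
    """Assumed normalization: 0 for identical sequences, 1 if nothing is shared."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else 1.0 - lcs_length(a, b) / longest


def map_phase(records: dict[str, str]):
    """'Map' step: emit one (pair key, pair of sequences) record per unordered pair of IDs."""
    for i, j in combinations(sorted(records), 2):
        yield (i, j), (records[i], records[j])


def reduce_phase(mapped) -> dict[tuple[str, str], float]:
    """'Reduce' step: score every emitted pair."""
    return {key: dissimilarity(seq_i, seq_j) for key, (seq_i, seq_j) in mapped}
```

Because each pair costs time proportional to the product of the two sequence lengths, and the number of pairs grows quadratically with the number of genomes, the pairwise step is the natural unit to distribute across worker nodes.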

COVID-19 pandemic: Is it the right time to develop interconnected national biomedical registries?

Genomics & Informatics | 10.5808/gi.21021 | 2021 | Vol 19 (4) | pp. e50

Author(s): Athanasios S. Kotoulas

Keyword(s): The Right

Estimation of the journal distance of Genomics & Informatics from other bioinformatics journals, 2003-2018

Genomics & Informatics | 10.5808/gi.21074 | 2021 | Vol 19 (4) | pp. e51

Author(s): Ji-Hye Oh, Hee-Jo Nam, Hyun-Seok Park

Keyword(s): Content Analysis, Deep Learning, Clustering Analysis, Human Genetics, Descriptive Analysis, Research Articles, Scholarly Journals, Main Method, Disease Markers, Learning Techniques

This study explored trends in Genomics & Informatics during 2003-2018 in comparison with 11 other scholarly journals: BMC Bioinformatics, Algorithms for Molecular Biology: AMB, BMC Systems Biology, Journal of Computational Biology, Briefings in Bioinformatics, BMC Genomics, Nucleic Acids Research, American Journal of Human Genetics, Oncogenesis, Disease Markers, and Microarrays. In total, 22,423 research articles were reviewed. Content analysis was the main method employed. The results were interpreted using descriptive analysis, clustering analysis, word embedding, and deep learning techniques. Trends are discussed for the 12 journals, both individually and collectively. This study extends our previous work (PMCID: PMC6808643).
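The abstract does not define how the distance between journals is computed. One common way to express such a distance, shown here only as an illustrative assumption, is the cosine distance between pooled term-frequency vectors of each journal's articles. The whitespace tokenizer and the function names below are hypothetical simplifications, not the authors' pipeline (which also used word embeddings and deep learning).

```python
import math
from collections import Counter


def term_vector(texts: list[str]) -> Counter:
    """Bag-of-words counts pooled over one journal's article texts (naive whitespace tokens)."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine_distance(u: Counter, v: Counter) -> float:
    """1 - cosine similarity of two sparse count vectors; 1.0 if either vector is empty."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())  # only shared terms contribute
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)
```

Cosine distance ignores overall vector length, so a journal with many more articles is not judged "far" from a smaller one merely because of its volume.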

Potential biomarkers and signaling pathways associated with the pathogenesis of primary salivary gland carcinoma: a bioinformatics study

Genomics & Informatics | 10.5808/gi.21052 | 2021 | Vol 19 (4) | pp. e42

Author(s): Zeynab Bayat, Fatemeh Ahmadi-Motamayel, Mohadeseh Salimi Parsa, Amir Taherkhani

Keyword(s): Salivary Gland, Signaling Pathways, Gene Expression Omnibus, Differentially Expressed, P Value, Cancer Tissue, Salivary Gland Carcinoma, Hub Genes, Master Regulators, Gland Carcinoma

Salivary gland carcinoma (SGC) is a rare cancer, accounting for 6% of neoplasms in the head and neck region. The genes and pathways primarily responsible for its pathology are not fully understood. We aimed to identify the differentially expressed genes (DEGs), the most critical hub genes, transcription factors, signaling pathways, and biological processes (BPs) associated with the pathogenesis of primary SGC. The mRNA dataset GSE153283 in the Gene Expression Omnibus database was re-analyzed to determine DEGs in cancer tissue from patients with primary SGC compared with adjacent normal tissue (adjusted p-value < 0.001; |log2 fold change| > 1). A protein interaction map (PIM) was built, and the main modules within the network were identified and subjected to pathway and BP analyses. The hub genes of the PIM were identified, and their associated gene regulatory network was built to determine the master regulators involved in the pathogenesis of primary SGC. A total of 137 genes were differentially expressed in primary SGC. The most significant pathways and BPs deregulated in primary SGC were associated with the cell cycle and fibroblast proliferation. TP53, EGF, FN1, NOTCH1, EZH2, COL1A1, SPP1, CDKN2A, WNT5A, PDGFRB, CCNB1, and H2AFX were identified as the most critical genes linked with primary SGC. SPIB, FOXM1, and POLR2A significantly regulated all the hub genes. This study identified several hub genes and their master regulators that might be suitable therapeutic targets in primary SGC.
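The DEG cut-offs (adjusted p < 0.001, |log2 fold change| > 1) come from the abstract, but the adjustment method and the hub-gene criterion do not. As a minimal sketch, assuming Benjamini-Hochberg adjustment and ranking hubs by node degree in the interaction network (both are our assumptions; hub selection is often based on several centrality measures), the two filtering steps could look like this:

```python
from collections import Counter


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, pvalues, log2fc, alpha=0.001, min_abs_lfc=1.0):
    """Keep genes with adjusted p < alpha and |log2 fold change| > min_abs_lfc."""
    padj = benjamini_hochberg(pvalues)
    return [g for g, q, lfc in zip(genes, padj, log2fc) if q < alpha and abs(lfc) > min_abs_lfc]


def top_hubs_by_degree(edges, k):
    """Rank nodes of an undirected interaction network by degree; ties broken by name."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [node for node, _ in ranked[:k]]
```

Both cut-offs must hold at once, so a gene with a large fold change but a weak adjusted p-value (or the reverse) is excluded.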

Validation and genetic heritability estimation of known type 2 diabetes related variants in the Korean population

Genomics & Informatics | 10.5808/gi.21071 | 2021 | Vol 19 (4) | pp. e37

Author(s): Hye-Mi Jang, Mi Yeong Hwang, Bong-Jo Kim, Young Jin Kim

Keyword(s): Type 2 Diabetes, Precision Medicine, Association Studies, Scientific Evidence, Korean Population, European Ancestry, Genome Wide Association Studies, Genetic Heritability, Heritability Estimation

Genome-wide association studies (GWASs) have facilitated the discovery of numerous disease-associated variants. However, GWASs have mostly been conducted in samples of European ancestry. Recent studies have reported that disease prediction based on these European-derived association results may be less accurate when applied to non-Europeans. Therefore, previously reported variants should be validated in non-European populations to establish reliable scientific evidence for precision medicine. In this study, we validated known associations with type 2 diabetes (T2D) and related metabolic traits in 125,850 samples from a Korean population genotyped with the Korea Biobank Array (KBA). As of the end of December 2020, the GWAS Catalog listed 8,823 variants associated with glycemic traits, lipids, liver enzymes, and T2D. Considering the availability of imputed datasets in the KBA genome data, publicly available East Asian T2D summary statistics, and the linkage disequilibrium among the variants (r2 < 0.2), 2,900 independent variants were selected for further analysis. Of these, 1,837 variants (63.3%) were statistically significant (p < 0.05). Most of the non-replicated variants (n = 1,063) showed insufficient statistical power and lower minor allele frequencies than the replicated variants. Moreover, the heritability explained by the known variants was <10%. These results could provide valuable scientific evidence for designing future studies, assessing the current power of GWASs, and applying precision medicine in the Korean population.
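The reported figures are internally consistent: 1,837 / 2,900 ≈ 0.633 (63.3%), and 2,900 - 1,837 = 1,063 non-replicated variants. The sketch below illustrates the three computations the abstract alludes to, under stated assumptions: greedy LD pruning at r2 < 0.2 that visits variants in order of p-value (the abstract does not give the pruning procedure), a replication count at p < 0.05, and the standard additive approximation of variance explained, the sum of 2p(1 - p)β² over independent variants with β in phenotypic standard-deviation units. That last formula applies to quantitative traits; a binary trait such as T2D would need a liability-scale conversion, which is not shown. Variant IDs are hypothetical.

```python
def ld_prune(pvalues: dict[str, float], r2: dict[frozenset, float], threshold: float = 0.2) -> list[str]:
    """Greedy pruning: visit variants by ascending p-value and keep one only if its
    r2 with every already-kept variant is below the threshold (missing pairs count as 0)."""
    kept: list[str] = []
    for vid in sorted(pvalues, key=pvalues.get):
        if all(r2.get(frozenset((vid, k)), 0.0) < threshold for k in kept):
            kept.append(vid)
    return kept


def replication_summary(replication_pvalues: list[float], alpha: float = 0.05) -> tuple[int, float]:
    """Number of variants with p < alpha in the replication sample, and that fraction."""
    n_rep = sum(1 for p in replication_pvalues if p < alpha)
    return n_rep, n_rep / len(replication_pvalues)


def variance_explained(effects: list[tuple[float, float]]) -> float:
    """Sum of 2 * p * (1 - p) * beta**2 over (allele frequency p, per-allele beta in SD units)."""
    return sum(2 * p * (1 - p) * beta ** 2 for p, beta in effects)
```

The 2p(1 - p) factor also shows why lower minor allele frequencies reduce power: a variant with the same effect size explains less variance when it is rarer.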

Genetic analysis of the postsynaptic transmembrane X-linked neuroligin 3 gene in autism

Genomics & Informatics | 10.5808/gi.21029 | 2021 | Vol 19 (4) | pp. e44

Author(s): Rajat Hegde, Smita Hegde, Suyamindra S. Kulkarni, Aditya Pandurangi, Pramod B. Gai, ...

Keyword(s): Transmembrane Protein, Neurodevelopmental Disorder, Cognitive Disorders, Missense Variant, Sequence Variant, Type I, The North, Increased Risk, Coding Variants, Autistic Population

Autism is a complex neurodevelopmental disorder whose prevalence has increased drastically in India in recent years. Neuroligin is a type I transmembrane protein that plays a crucial role in synaptogenesis. Alterations in synaptic genes are most commonly implicated in autism and other cognitive disorders. The present study investigated the neuroligin 3 gene in the Indian autistic population by sequencing and in silico pathogenicity prediction of molecular changes. In total, 108 clinically characterized individuals with autism from the North Karnataka region of India were included, along with 150 age-, sex-, and ethnicity-matched healthy controls. Genomic DNA was extracted from peripheral blood, and the exonic regions were sequenced. The functional and structural effects of the variants on the neuroligin 3 protein were predicted. One coding sequence variant (a missense variant) and four non-coding variants (two 5'-untranslated region [UTR] variants and two 3'-UTR variants) were recorded. The novel missense variant was found in 25% of the autistic population. The C/C genotype of c.551T>C was significantly more common in autistic children than in controls (p = 0.001) and was associated with a significantly increased risk of autism (24.7-fold; p = 0.001). The missense variant was predicted to be pathogenic and to affect an evolutionarily highly conserved part of the neuroligin 3 protein. In the present study, we report a novel missense variant, V184A, which results in an abnormal neuroligin 3 protein and occurs at high frequency in the Indian autistic population. Therefore, neuroligin 3 is a candidate gene for future molecular investigations and functional analyses in the Indian autistic population.
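A "24.7-fold" increase in risk in a case-control design is usually reported as an odds ratio from a 2×2 table of genotype (for example, C/C versus other genotypes) by case status. The abstract does not state which test or interval the authors used, so the sketch below is only a generic illustration: a Wald 95% confidence interval on the log odds ratio, with a Haldane-Anscombe correction (add 0.5 to every cell) when any cell is zero. It is not intended to reproduce the study's numbers.

```python
import math


def odds_ratio_ci(a: float, b: float, c: float, d: float, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with a Wald confidence interval for the 2x2 table

                  exposed  unexposed
        cases        a         b
        controls     c         d

    Adds 0.5 to every cell if any cell is zero (Haldane-Anscombe correction)."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper
```

With small control counts in the "exposed" column, the interval can be very wide even when the point estimate is large, which is worth reporting alongside any fold-change figure.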

Microsecond molecular dynamics simulations revealed the inhibitory potency of amiloride analogs against SARS-CoV-2 E viroporin

Genomics & Informatics | 10.5808/gi.21040 | 2021 | Vol 19 (4) | pp. e48

Author(s): Abdullah All Jaber, Zeshan Mahmud Chowdhury, Arittra Bhattacharjee, Muntahi Mourin, Chaman Ara Keya, ...

Keyword(s): Molecular Dynamics, Molecular Docking, Envelope Protein, Docking Simulation, E Proteins, Molecular Docking Simulation, Dynamics Simulations, Molecular Docking and Dynamics, Do So, Amiloride Analogs

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) encodes a small envelope protein (E) that plays a major role in viral assembly, release, pathogenesis, and host inflammation. Previous studies demonstrated that amiloride analogs containing a pyrazine ring inhibit this protein in different types of coronavirus, including the SARS-CoV-1 small envelope protein E (SARS-CoV-1 E). SARS-CoV-1 E shares 93.42% sequence identity with SARS-CoV-2 E, and the two proteins share a conserved domain, NS3/small envelope protein (NS3_envE). The amiloride analog hexamethylene amiloride (HMA) can inhibit SARS-CoV-1 E. We therefore performed molecular docking and dynamics simulations to explore whether amiloride analogs are effective in inhibiting SARS-CoV-2 E. To do so, the SARS-CoV-1 E and SARS-CoV-2 E proteins were used as receptors, and HMA and 3-amino-5-(azepan-1-yl)-N-(diaminomethylidene)-6-pyrimidin-5-ylpyrazine-2-carboxamide (3A5NP2C) were selected as ligands. Molecular docking showed higher binding affinity scores of HMA and 3A5NP2C for SARS-CoV-2 E than for SARS-CoV-1 E. Moreover, HMA and 3A5NP2C interacted with more amino acid residues in SARS-CoV-2 E. Molecular dynamics (MD) simulations of 1 μs (1,000 ns) revealed that these ligands could alter the native structure and flexibility of the proteins. Our study suggests that suitable amiloride analogs might yield a candidate drug against coronavirus disease 2019.
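Changes in protein flexibility over an MD trajectory are commonly summarized by the per-atom root-mean-square fluctuation (RMSF) around each atom's time-averaged position. The abstract does not name the metrics used, so the pure-Python sketch below is an illustration of that standard quantity only, assuming the frames have already been superimposed on a reference structure (no fitting is done here). Comparing RMSF profiles of ligand-bound and apo simulations is one way to show a ligand stiffening or loosening particular regions.

```python
import math


def rmsf(frames: list[list[tuple[float, float, float]]]) -> list[float]:
    """Per-atom root-mean-square fluctuation.

    frames: one list of (x, y, z) coordinates per frame, same atom order in every frame.
    Assumes frames are already aligned to a common reference."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for atom in range(n_atoms):
        # Time-averaged position of this atom.
        mean = [sum(frame[atom][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared distance from the average position across frames.
        msd = sum(
            sum((frame[atom][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result
```

For a 1 μs trajectory, RMSF is normally computed per residue (for example, on Cα atoms) after discarding an initial equilibration segment.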

Statistical models and computational tools for predicting complex traits and diseases

Genomics & Informatics | 10.5808/gi.21053 | 2021 | Vol 19 (4) | pp. e36

Author(s): Wonil Chung

Keyword(s): Genetic Variants, Statistical Models, Complex Traits, Association Studies, Polygenic Risk Score, Genome Wide Association Studies, Computational Tools, Genome Wide, Individual Traits, Wide Range

Predicting individual traits and diseases from genetic variants is critical to fulfilling the promise of personalized medicine. Genetic variants from genome-wide association studies (GWAS), including variants well below the GWAS significance threshold, can be aggregated into highly significant predictions across a wide range of complex traits and diseases. The recent arrival of large-sample public biobanks enables highly accurate polygenic prediction based on genetic variants across the whole genome. Various statistical methodologies and computational tools have been developed to compute the polygenic risk score (PRS) more accurately. However, many researchers use PRS tools without a thorough understanding of the underlying model or of how to specify the parameters for the best performance. It is therefore advantageous to study the statistical models implemented in PRS computational tools and the formulas for the parameters that must be specified. Here, we review a variety of recent statistical methodologies and computational tools for PRS computation.
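The simplest PRS model is a weighted allele count, PRS_i = Σ_j β̂_j x_ij, where x_ij is individual i's dosage (0 to 2) of the effect allele at variant j and β̂_j is the GWAS effect estimate. The sketch below implements the p-value thresholding part of the familiar "clumping and thresholding" approach, assuming the input variants have already been LD-clumped; the function name and dictionary inputs are hypothetical and do not correspond to any specific tool's interface. Methods reviewed for more accurate PRS replace the hard p-value cut-off with shrinkage of the β̂_j.

```python
def prs_p_threshold(
    genotypes: dict[str, float],
    weights: dict[str, float],
    pvalues: dict[str, float],
    p_threshold: float,
) -> float:
    """Weighted allele count over variants whose GWAS p-value is below p_threshold.

    genotypes: variant ID -> effect-allele dosage (0..2) for one individual
    weights:   variant ID -> per-allele effect size (beta or log odds ratio)
    pvalues:   variant ID -> GWAS p-value (variants assumed already LD-clumped)
    Variants missing a weight or a p-value are skipped."""
    return sum(
        weights[v] * dosage
        for v, dosage in genotypes.items()
        if v in weights and pvalues.get(v, 1.0) < p_threshold
    )
```

In practice the threshold itself is a tuning parameter, typically chosen by comparing prediction accuracy across several cut-offs in a validation sample.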

Published by Korea Genome Organization
