Community Research Amid COVID-19 Pandemic: Genomics Analysis of SARS-CoV-2 over Public GALAXY server

Author(s):  
Ambarish Kumar ◽  
Ali Haider Bangash ◽  
Bjoern Gruening

Citizen science has emerged as a way to perform analytics on the SARS-CoV-2 genome. Public GALAXY servers provide an automated platform for genomics analysis. This study designs GALAXY workflows for RNASEQ assembly and annotation and for genomic variant discovery, and applies them to four samples from SARS-CoV-2-infected humans from the local population of Wuhan, China. It provides information about transcriptomics and genomic variants across the SARS-CoV-2 genome. The study can be extended to evolutionary and comparative analysis across coronavirus species. Integrating it with cheminformatics and immunoinformatics would be a way forward for drug discovery and vaccine development.

2020 ◽  
Author(s):  
Ambarish Kumar ◽  
Ali Haider Bangash

Abstract: Genomics has emerged as one of the major sources of big data. The data-driven challenges this poses for bioinformatics can be met using parallel and distributed computing technologies. GATK4 tools for genomic variant detection are enabled for a high-performance computing platform, the SPARK MapReduce framework. GATK4+WDL+CROMWELL+SPARK+DOCKER is proposed as the way forward in achieving automation, reproducibility, reusability, customization, portability and scalability. SPARK-based tools perform as well in genomic variant detection as the standard command-line implementation of GATK4 tools. Implementing workflows on cloud-based high-performance computing platforms will enhance usability and will be a way forward in community research and infrastructure development for genomic variant discovery.


Biometrika ◽  
2021 ◽  
Author(s):  
Lorenzo Masoero ◽  
Federico Camerlenghi ◽  
Stefano Favaro ◽  
Tamara Broderick

Abstract While the cost of sequencing genomes has decreased dramatically in recent years, this expense often remains non-trivial. Under a fixed budget, scientists face a natural trade-off between quantity and quality: spending resources to sequence a greater number of genomes or spending resources to sequence genomes with increased accuracy. Our goal is to find the optimal allocation of resources between quantity and quality. Optimizing resource allocation promises to reveal as many new variations in the genome as possible. In this paper, we introduce a Bayesian nonparametric methodology to predict the number of new variants in a follow-up study based on a pilot study. When experimental conditions are kept constant between the pilot and follow-up, we find that our prediction is competitive with the best existing methods. Unlike current methods, though, our new method allows practitioners to change experimental conditions between the pilot and the follow-up. We demonstrate how this distinction allows our method to be used for more realistic predictions and for optimal allocation of a fixed budget between quality and quantity.


2020 ◽  
Vol 34 (01) ◽  
pp. 598-605
Author(s):  
Chaoran Cheng ◽  
Fei Tan ◽  
Zhi Wei

We consider the problem of Named Entity Recognition (NER) on biomedical scientific literature, and more specifically the recognition of genomic variants. Significant success has been achieved for NER on canonical tasks in recent years, where large data sets are generally available. However, it remains a challenging problem in many domain-specific areas, especially domains where only small sets of gold annotations can be obtained. In addition, genomic variant entities exhibit diverse linguistic heterogeneity, differing greatly from those characterized in existing canonical NER tasks. State-of-the-art machine learning approaches rely heavily on arduous feature engineering to characterize those unique patterns. In this work, we present the first successful end-to-end deep learning approach to bridge the gap between generic NER algorithms and low-resource applications through genomic variant recognition. Our proposed model achieves promising performance without any hand-crafted features or post-processing rules. Our extensive experiments and results may shed light on other similar low-resource NER applications.


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. 11045-11045
Author(s):  
Nathan David Seligson ◽  
Achal Awasthi ◽  
Sherri Z. Millis ◽  
David A. Liebner ◽  
John L. Hays ◽  
...  

11045 Background: Epithelioid hemangioendothelioma (EHE) is a rare vascular sarcoma characterized by the WWTR1-CAMTA1 fusion (WC-F) in a majority of cases. EHE demonstrates a biphasic clinical course, remaining indolent for many years before suddenly demonstrating aggressive progression. Cell cycle mutations have been previously noted to account for some secondary alterations; however, little is known regarding the chronicity of these secondary alterations or their clinical implications. Here we present the largest assessment of secondary genomic variants and their clinical import. Methods: Comprehensive genomic profiling data from 45 WC-F positive EHE patients (pts) were provided by Foundation Medicine (FMI). 8 of these 45 pts were treated at The Ohio State University (OSU) and were evaluated retrospectively through chart review. Known deleterious alterations, variants of unknown significance (VUS), and genomic copy quantification for the WC-F were included in our analysis. Targetable gene variants were defined by OncoKB. Chi-square and Student's t-tests were used as appropriate. Results: Genomic copy number of the WC-F best fit a log-normal distribution (range: 13-2,131 copies). 20 pts (44%) did not exhibit any secondary genomic variants. The most commonly altered genes included: CDKN2A/B (7 variants), RB1 (3 variants), and ATRX (3 variants). Commonly identified pathways included: cell cycle (9 pts, 20.0%), epigenetic modulators (7 pts, 15.6%), and DNA damage repair (7 pts, 15.6%). Eight pts exhibited targetable gene variants (18%) as defined by OncoKB. Subjects ≥50 years of age exhibited a greater proportion of clinically targetable variants (27.6% vs 0%; p = 0.02). Pts with a secondary genomic variant exhibited elevated WC-F copy numbers (p < 0.001).
OSU pts with aggressive EHE were more likely to have a secondary genomic variant (80% vs 0%; p = 0.03) when compared to those with indolent EHE, with trends toward higher WC-F copy numbers (809±315 vs 207±147; p = 0.2) and older age at diagnosis (59.5±5.5 vs 36.7±8.8; p = 0.1). Conclusions: In this study, secondary genomic variants in WC-F driven EHE are more common in older patients (> 50 yo). Further, the presence of secondary genomic variants is associated with an aggressive phenotype and may drive poor prognosis. Prospective research is needed to confirm these findings.


2015 ◽  
Vol 59 (5) ◽  
pp. 755-769 ◽  
Author(s):  
Corinna Jentzsch ◽  
Stathis N. Kalyvas ◽  
Livia Isabella Schubiger

Militias are an empirical phenomenon that has been overlooked by current research on civil war. Yet, it is a phenomenon that is crucial for understanding political violence, civil war, post-conflict politics, and authoritarianism. Militias or paramilitaries are armed groups that operate alongside regular security forces or work independently of the state to shield the local population from insurgents. We review existing uses of the term, explore the range of empirical manifestations of militias, and highlight recent findings, including those supplied by the articles in this special issue. We focus on areas where the recognition of the importance of militias challenges and complements current theories of civil war. We conclude by introducing a research agenda advocating the integrated study of militias and rebel groups.


2021 ◽  
Author(s):  
Mohammad Fazle Alam Rabbi ◽  
Md. Imran Khan ◽  
Saam Hasan ◽  
Mauricio Chalita ◽  
Kazi Nadim Hasan ◽  
...  

Abstract Rationale: Global public health is in serious crisis due to the emergence of the SARS-CoV-2 virus. Studies are ongoing to reveal the genomic variants of the virus circulating in various parts of the world. However, data generated from low- and middle-income countries are scarce due to resource limitations. This study performed whole genome sequencing of 151 SARS-CoV-2 isolates from COVID-19 positive Bangladeshi patients. The goal of this study was to identify the genomic variants among the SARS-CoV-2 virus isolates in Bangladesh, to determine the molecular epidemiology, and to establish a relationship between host clinical traits and the virus genomic variants. Method: Suspected patients were tested for COVID-19 using a one-step commercial qPCR kit for SARS-CoV-2 virus. Viral RNA was extracted from positive patients and converted to cDNA, which was amplified using the Ion AmpliSeq™ SARS-CoV-2 Research Panel. Massively parallel sequencing was carried out using the Ion AmpliSeq™ Library Kit Plus. Assembly of raw data was done by aligning the reads to a pre-defined reference genome (NC_045512.2) while retaining the unique variations of the input raw data by creating a consensus genome. A random forest-based association analysis was carried out to correlate the viral genomic variants with the clinical traits present in the host. Result: Among the 151 viral isolates, we observed 413 unique variants. Among these, 8 variants occurred in more than 80% of cases: 241C to T, 1163A to T, 3037C to T, 14408C to T, 23403A to G, 28881G to A, 28882G to A, and 28883G to C. Phylogenetic analysis revealed a predominance of variants belonging to the GR clade, which has a strong geographical presence in Europe, indicating possible introduction of the SARS-CoV-2 virus into Bangladesh through a European channel.
However, other possibilities such as a route of entry from China cannot be ruled out, as a viral isolate belonging to the L clade with a close relationship to the Wuhan reference genome was also detected. We observed a total of 37 genomic variants to be strongly associated with clinical symptoms such as fever, sore throat, overall symptomatic status, etc. (Fisher's Exact Test p-value < 0.05). The most noteworthy among those were 3916C to T (associated with causing sore throat, p-value 0.0005), 14408C to T (associated with protection from developing cough, p-value = 0.027), and the 28881G to A, 28882G to A, and 28883G to C variant (associated with causing chest pain, p-value 0.025). Conclusion: To our knowledge, this study is the first large-scale phylogenomic study of SARS-CoV-2 virus circulating in Bangladesh. The observed epidemiological and genomic features may inform future research platforms for disease management, vaccine development and epidemiological study.
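
The abstract above reports variant-symptom associations using Fisher's exact test. As a minimal sketch of how such a test works on a single 2x2 table (variant present/absent against symptom present/absent), the following Python uses only the standard library. The function name and the example counts are hypothetical illustrations and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be "variant present" / "variant absent" and columns
    "symptom present" / "symptom absent" (hypothetical labelling).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 3 of 4 carriers and 1 of 4 non-carriers report the symptom.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

When many variants are tested against many symptoms, as in a screen of hundreds of variants, the resulting p-values would normally also need a multiple-testing correction; the abstract does not say whether one was applied.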


Author(s):  
George S Heriot ◽  
Euzebiusz Jamrozik ◽  
Michael J Selgelid

Background: Human infection challenge studies (HICS) with SARS-CoV-2 are under consideration as a way of accelerating vaccine development. We evaluate potential vaccine research strategies under a range of epidemic conditions determined, in part, by the intensity of public health interventions. Methods: We constructed a compartmental epidemiological model incorporating public health interventions, vaccine efficacy trials and a post-trial population vaccination campaign. The model was used to estimate the duration and benefits of large-scale field trials in comparison with HICS accompanied by an expanded safety trial, and to assess the marginal risk faced by HICS participants. Results: Field trials may demonstrate vaccine efficacy more rapidly than a HICS strategy under epidemic conditions consistent with moderate mitigation policies. A HICS strategy is the only feasible option for testing vaccine efficacy under epidemic suppression, and maximises the benefits of post-trial vaccination. Less successful or absent mitigation results in minimal or no benefit from post-trial vaccination, irrespective of trial design. Conclusions: SARS-CoV-2 HICS are the optimal method of vaccine testing for populations maintained under epidemic suppression, where vaccination offers the greatest benefits to the local population.
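
The abstract above relies on a compartmental epidemiological model. The authors' model also includes interventions, trials and a vaccination campaign, none of which are specified here, so the following Python is only a minimal sketch of the basic susceptible-infectious-recovered (SIR) structure such models build on. The parameter values are assumptions chosen for illustration, and lowering `beta` is used as a crude stand-in for stronger mitigation.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, days=200, dt=0.1):
    """Integrate a basic SIR model with forward Euler steps.

    beta: transmission rate per day, gamma: recovery rate per day.
    s0, i0, r0: initial population fractions. Returns a list of (s, i, r).
    """
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


def peak_prevalence(trajectory):
    """Largest infectious fraction reached during the simulation."""
    return max(i for _, i, _ in trajectory)


# Illustrative (assumed) scenarios: weaker versus stronger mitigation.
unmitigated = simulate_sir(beta=0.30, gamma=0.10, s0=0.999, i0=0.001)
mitigated = simulate_sir(beta=0.15, gamma=0.10, s0=0.999, i0=0.001)
print(peak_prevalence(unmitigated) > peak_prevalence(mitigated))
```

A lower infection rate in the community means fewer infections among field-trial participants over a given period, which is consistent with the abstract's point that field trials become slower to read out as suppression succeeds.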


2019 ◽  
Author(s):  
Joris Galland ◽  
Stéphanie Ducharme-Bénard

UNSTRUCTURED The exponential development of AI is due to the improvement of processor computing power, deep learning technology, and the free sharing of Big Data. AI can learn independently, without human intervention, and gains ground in medicine: image processing, diagnosis and treatment of cancer through genome study, vaccine development, histological analyses, predictive analyses, etc. Nevertheless, the medical literature considered that AI cannot replace the physician who is essential for social interaction and clinical examination. What will happen in the 2020s? Many improvements are reported in the development of AI: the increase of computing power, the use of new algorithm technologies based on neuroscience, etc. We imagine 4 possible hypotheses for the future: 1) the physician and AI are complementary; the physician examines and interacts with the patient, and AI helps for diagnosis and treatment, 2) AI becomes a “strong” AI, mimics empathy and feelings, and replaces the physician, 3) AI does not progress. The practice of medicine changes very little. AI only interprets imaging studies or acts as a prognostic aid, 4) AI allows transhumanism to flourish. Humans are grafted with neural implants that increase their cognitive functions, allowing them to remain competitive against AI.


2020 ◽  
Vol 221 (11) ◽  
pp. 1855-1863
Author(s):  
Cory J Arrouzet ◽  
Karen Ellis ◽  
Anita Kambhampati ◽  
Yingxi Chen ◽  
Molly Steele ◽  
...  

Abstract Background: Noroviruses are a leading cause of acute gastroenteritis. Genogroup 2 type 4 (GII.4) has been the dominant norovirus genotype worldwide since its emergence in the mid-1990s. Individuals with a functional fucosyltransferase-2 gene, known as secretors, have increased susceptibility to GII.4 noroviruses. We hypothesized that this individual-level trait may drive GII.4 norovirus predominance at the human population level. Methods: We conducted a systematic review for studies reporting norovirus outbreak or sporadic case genotypes and merged this with data on proportions of human secretor status in various countries from a separate systematic review. We used inverse variance-weighted linear regression to estimate the magnitude of the population secretor-GII.4 proportion association. Results: Two hundred nineteen genotype and 112 secretor studies with data from 38 countries were included in the analysis. Study-level GII.4 proportion among all noroviruses ranged from 0% to 100%. Country secretor proportion ranged from 43.8% to 93.9%. We observed a 0.69% (95% confidence interval, 0.19–1.18) increase in GII.4 proportion for each percentage increase in human secretor proportion, controlling for Human Development Index. Conclusions: Norovirus evolution and diversity may be driven by local population human host genetics. Our results may have vaccine development implications including whether specific antigenic formulations would be required for different populations.
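
The abstract above estimates its association with inverse variance-weighted linear regression. As a minimal sketch of that technique, the Python below fits a single-predictor weighted least-squares line in which each study is weighted by the reciprocal of its variance. The function name and the data are hypothetical, and the study's actual model additionally controls for Human Development Index, which this one-predictor version does not.

```python
def inverse_variance_wls(x, y, variances):
    """Fit y = intercept + slope * x, weighting each point by 1 / variance.

    Returns (slope, intercept). Points with large variance (less precise
    studies) have less influence on the fitted line.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    # Weighted means of predictor and outcome.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    # Weighted covariance and variance give the closed-form slope.
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: secretor proportion (%) versus GII.4 proportion (%),
# with a made-up sampling variance for each study.
secretor = [50.0, 60.0, 70.0, 80.0, 90.0]
gii4 = [40.0, 48.0, 52.0, 61.0, 66.0]
study_variance = [4.0, 9.0, 2.0, 16.0, 3.0]
slope, intercept = inverse_variance_wls(secretor, gii4, study_variance)
print(slope, intercept)
```

The slope is read the same way as the abstract's estimate: the change in GII.4 proportion, in percentage points, per one-point increase in secretor proportion.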


2021 ◽  
Vol 41 ◽  
pp. 02001
Author(s):  
Mayumi Kamada

In genome medicine, which is now being implemented in medical care, variants detected by genome analysis such as next-generation sequencers are clinically interpreted to determine the diagnosis and treatment plan. The clinical interpretation is performed based on the detailed clinical background and the information from journal papers and public databases, such as frequencies in the population and their relationship to the disease. A large amount of genomic data has been accumulated so far, and many genomic variant databases related to diseases have been developed, including ClinVar. On the other hand, the genes and variants involved in diseases are different between populations with different genetic backgrounds. Furthermore, it has been reported that there is a racial bias in the information shared in current public databases, which affects clinical interpretation. Therefore, increasing the diversity of genomic variant data has become an important issue worldwide. In Japan, the Japan Agency for Medical Research and Development (AMED) launched a project to develop an integrated clinical genome information database in 2016. This project targeted “Cancer,” “Rare/Intractable diseases,” “Infectious diseases,” “Dementia,” and “Hearing loss”, and in collaboration with research institutes that provide genomic medicine in Japan, we developed an integrated database named MGeND (Medical Genomics Japan Database). The MGeND is a freely accessible database, which provides disease-related genomic information detected from the Japanese population. The MGeND widely collects variant data for monogenic diseases represented by rare diseases and polygenic diseases such as dementia and infectious disease. The genome variant data are integrated by genomic position for these diseases and can be searched across diseases. The useful genome analysis methods differ depending on the disease area. 
Therefore, in addition to “SNV, short indel, SV, and CNV” data handled by ClinVar, MGeND includes GWAS (Genome-Wide Association Study) data, which is widely used in studies of polygenic diseases, and HLA (Human Leukocyte Antigen) allele frequency data, which is used in immune-related diseases such as infectious diseases. As of September 2021, more than 150,000 variants have been registered in MGeND, and 60,000 unique variants have been made public. Of these variants, about 70% were variants registered only in MGeND and not registered in ClinVar. This fact shows the importance of the efforts to collect genomic information by each ethnic group. On the other hand, many variants have not been annotated with any clinical interpretation because the effects on molecular function and the mechanisms of disease are not clear at this time. These variants of uncertain significance (VUS) are a bottleneck for genomic medicine because they cannot be used for diagnosis or treatment selection. The evaluation of VUS requires detailed experimental validation and a vast amount of knowledge integration, which is costly. In order to understand the molecular function and disease relevance of VUS and to enable optimal drug selection, we have been developing a machine learning-based method for predicting the pathogenicity of variants and a computational platform for estimating the effect of variants on drug sensitivity. Many methods for predicting the pathogenicity of genomic variants using machine learning have been developed. Most of them use the conservation of amino acid or nucleotide sequences among closely related species and the physicochemical properties of proteins as features for prediction. There are also many prediction methods based on ensemble learning that aggregate the predicted scores from existing tools. These approaches focus on individual genes and variants and evaluate their effects.
However, in many diseases, multiple molecules play a complex role in pathogenesis. In other words, to assess the pathological significance of variants more accurately, it is necessary to consider the associations between molecules. Therefore, we constructed a knowledge graph based on molecular networks, genomic variants, and scores predicted by existing methods, and proposed a prediction model using a Graph Convolutional Network (GCN). The prediction performance evaluation using a benchmark set showed that the GCN-based method outperformed existing methods. It is known that variants can affect the interaction between a molecule and a drug. For optimal drug selection, it is necessary to clarify the effect of the variant on drug affinity. It is time-consuming and costly to perform experiments on a large number of VUS. Our previous studies show that molecular dynamics calculations can evaluate the affinity between mutants and drugs energetically and estimate it with high accuracy. We are currently working on a project to estimate the effects of a large number of VUS using the supercomputer Fugaku. To realize calculations for many VUS in this project, we are developing a data platform for seamlessly performing molecular dynamics simulations from genome information. Moreover, we are constructing a database to publish calculation results and their outcomes to contribute to the selection of optimal drugs. In the presentation, I will introduce the development of the databases and prediction methods to improve the efficiency of genomic medicine.

