Generation of small interfering RNA (siRNA) database from SARS-CoV-2 genome sequences

Abstract This protocol describes how to build a database of SARS-CoV-2 targets for siRNA approaches. Starting from the virus reference genome, we derive sequences 18 to 21 nt long and check their similarity against the human genome and its coding and non-coding transcriptome, as well as against genomes of related viruses. We also calculate a set of thermodynamic features for those sequences and infer their efficiencies using three different predictors. The protocol has two main phases: first, we align the sequences against reference genomes; second, we extract the features. The duration of the first phase varies with the computational power of the running machine and the number of reference genomes. In contrast, the second phase takes about thirty minutes, although this also depends on the number of cores of the running machine. The resulting database aims to speed up the design process by providing a broad set of possible SARS-CoV-2 target sequences and siRNA sequences.
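The first step of the protocol, deriving every 18 to 21 nt sequence from the reference genome, is a sliding-window enumeration. The sketch below is illustrative only and is not the authors' pipeline: the function names `candidate_targets`, `antisense_sirna` and `gc_content` are hypothetical, the reference is assumed to be written in the DNA alphabet (T read as U), and GC content stands in as one simple base-composition feature, not the thermodynamic features or efficiency predictors the protocol uses.

```python
from typing import Iterator, Tuple

# Bases accepted in a window; windows with ambiguity codes (e.g. N) are skipped.
_VALID = frozenset("ACGTU")

# Watson-Crick complements written in the RNA alphabet (T and U both pair with A).
_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A", "U": "A"}


def candidate_targets(genome: str, min_len: int = 18,
                      max_len: int = 21) -> Iterator[Tuple[int, int, str]]:
    """Yield (0-based start, length, target) for every window of min_len..max_len nt."""
    genome = genome.upper()
    for length in range(min_len, max_len + 1):
        for start in range(len(genome) - length + 1):
            window = genome[start:start + length]
            if set(window) <= _VALID:
                yield start, length, window


def antisense_sirna(target: str) -> str:
    """Return the antisense (guide) strand: the reverse complement of target, in RNA.

    Raises KeyError on bases outside A/C/G/T/U.
    """
    return "".join(_RNA_COMPLEMENT[base] for base in reversed(target.upper()))


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```

As a sanity check, a genome of length L yields (L − 17) + (L − 18) + (L − 19) + (L − 20) windows, so a hypothetical 30-nt toy sequence gives 46 candidates.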

A small interfering RNA (siRNA) database for SARS-CoV-2

10.1101/2020.09.30.321596 ◽

2020 ◽

Cited By ~ 1

Author(s):

Inácio Gomes Medeiros ◽

André Salim Khayat ◽

Beatriz Stransky ◽

Sidney Emanuel Batista dos Santos ◽

Paulo Pimentel de Assumpção ◽

...

Keyword(s):

Human Genome ◽

Design Process ◽

Small Interfering Rna ◽

Target Genes ◽

Information Base ◽

Global Pandemic ◽

The World ◽

Rna Genome ◽

Target Sequences ◽

Interfering Rna

ABSTRACT Coronavirus disease 2019 (COVID-19) rapidly became a global pandemic, creating demand for antivirals capable of targeting the SARS-CoV-2 RNA genome and blocking the activity of its genes. In this work, we propose a database of SARS-CoV-2 targets for siRNA approaches, aiming to speed up the design process by providing a broad set of possible targets and siRNA sequences. Beyond the target sequences, it provides more than 170 features, including thermodynamic information, base context, target genes, and alignment information of the sequences against the human genome and diverse SARS-CoV-2 strains, to assess whether the siRNAs would bind off-target sequences. The dataset is available as a single spreadsheet file containing four tables, corresponding to sequences 18, 19, 20, and 21 nucleotides long, respectively, to accommodate the diversity of technology and expertise among labs around the world in designing siRNAs of different sizes between 18 and 21 nt. We hope this database helps speed up the development of new targeted antivirals for SARS-CoV-2, contributing to faster and more effective responses to the COVID-19 pandemic.
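The off-target assessment above relies on alignment information. As a simplified stand-in, and not the alignment procedure behind the database, the sketch below scans a reference for ungapped matches with at most a given number of substitutions. The names `off_target_hits` and `is_specific` and the default of 2 mismatches are hypothetical choices; the scan covers the forward strand only and is practical only for small toy references, not a whole human genome, where an indexed aligner would be used.

```python
from typing import Iterable, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def off_target_hits(target: str, reference: str, max_mismatches: int = 2) -> List[int]:
    """Return 0-based start positions in reference where target matches with
    at most max_mismatches substitutions (no gaps, forward strand only)."""
    target, reference = target.upper(), reference.upper()
    k = len(target)
    return [
        start
        for start in range(len(reference) - k + 1)
        if hamming(target, reference[start:start + k]) <= max_mismatches
    ]


def is_specific(target: str, references: Iterable[str], max_mismatches: int = 2) -> bool:
    """True if target has no near-match in any of the off-target references."""
    return not any(off_target_hits(target, ref, max_mismatches) for ref in references)
```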

benchNGS : An approach to benchmark short reads alignment tools

10.7287/peerj.preprints.1007v1 ◽

2015 ◽

Author(s):

Farzana Rahman ◽

Mehedi Hassan ◽

Alona Kryshchenko ◽

Inna Dubchak ◽

Tatiana V Tatarinova ◽

...

Keyword(s):

Reference Genome ◽

Global Alignment ◽

Short Report ◽

Related Genome ◽

Genome Sequences ◽

Short Reads ◽

Relevant Reference ◽

Next Generation Sequencing Ngs ◽

Reference Genomes ◽

Generation Sequencing

In the last decade, a number of algorithms and associated software tools were developed to align next-generation sequencing (NGS) reads to relevant reference genomes. The results of these programs may vary significantly, especially when the NGS reads contain mutations not found in the reference genome. Yet there is no standard way to compare these programs and assess their biological relevance. We propose a benchmark to assess the accuracy of short-read mapping based on a precomputed global alignment of closely related genome sequences. In this paper, we outline the method and present a short report of an experiment performed on five popular alignment tools.
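The core idea, using a precomputed global alignment between a closely related genome A and the reference B as ground truth for where reads should map, can be sketched as follows. This is a generic illustration under stated assumptions, not benchNGS's implementation: the names `coordinate_map` and `mapping_accuracy`, the `tolerance` parameter, and the representation of each read as (origin in A, position reported in B) are all hypothetical, and read simulation is left out.

```python
from typing import Dict, List, Tuple


def coordinate_map(aligned_a: str, aligned_b: str) -> Dict[int, int]:
    """Map 0-based positions in ungapped sequence A to positions in ungapped
    sequence B using a gapped global alignment ('-' marks a gap).
    Positions of A that face a gap in B have no counterpart and are omitted."""
    mapping: Dict[int, int] = {}
    pos_a = pos_b = 0
    for char_a, char_b in zip(aligned_a, aligned_b):
        if char_a != "-" and char_b != "-":
            mapping[pos_a] = pos_b
        if char_a != "-":
            pos_a += 1
        if char_b != "-":
            pos_b += 1
    return mapping


def mapping_accuracy(reads: List[Tuple[int, int]], mapping: Dict[int, int],
                     tolerance: int = 0) -> float:
    """Fraction of reads whose reported position in B lies within `tolerance`
    of the position expected from the global alignment.

    reads: (origin position in A, position reported by the aligner in B).
    Reads whose origin has no counterpart in B are not scored."""
    scored = [(mapping[origin], reported) for origin, reported in reads
              if origin in mapping]
    if not scored:
        return 0.0
    correct = sum(abs(expected - reported) <= tolerance
                  for expected, reported in scored)
    return correct / len(scored)
```

Allowing a small `tolerance` is one way to avoid penalizing aligners for shifting a read by a base or two around an indel, where several placements can be equally good.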

benchNGS : An approach to benchmark short reads alignment tools

10.1101/018234 ◽

2015 ◽

Cited By ~ 2

Author(s):

Farzana Rahman ◽

Mehedi Hassan ◽

Alona Martin Kryshchenko ◽

Inna Dubchak ◽

Nikolai Nickolai Alexandrov ◽

...

Keyword(s):

Reference Genome ◽

Global Alignment ◽

Short Report ◽

Related Genome ◽

Genome Sequences ◽

Short Reads ◽

Relevant Reference ◽

Next Generation Sequencing Ngs ◽

Reference Genomes ◽

Generation Sequencing

In the last decade, a number of algorithms and associated software tools were developed to align next-generation sequencing (NGS) reads to relevant reference genomes. The results of these programs may vary significantly, especially when the NGS reads contain mutations not found in the reference genome. Yet there is no standard way to compare these programs and assess their biological relevance. We propose a benchmark to assess the accuracy of short-read mapping based on a precomputed global alignment of closely related genome sequences. In this paper, we outline the method and present a short report of an experiment performed on five popular alignment tools.

benchNGS : An approach to benchmark short reads alignment tools

10.7287/peerj.preprints.1007 ◽

2015 ◽

Author(s):

Farzana Rahman ◽

Mehedi Hassan ◽

Alona Kryshchenko ◽

Inna Dubchak ◽

Tatiana V Tatarinova ◽

...

Keyword(s):

Reference Genome ◽

Global Alignment ◽

Short Report ◽

Related Genome ◽

Genome Sequences ◽

Short Reads ◽

Relevant Reference ◽

Next Generation Sequencing Ngs ◽

Reference Genomes ◽

Generation Sequencing

Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes

F1000Research ◽

10.12688/f1000research.2-244.v1 ◽

2013 ◽

Vol 2 ◽

pp. 244 ◽

Cited By ~ 4

Author(s):

Ted Kalbfleisch ◽

Michael P. Heaton

Keyword(s):

Gene Function ◽

Reference Genome ◽

Sequence Data ◽

Association Studies ◽

Mammalian Species ◽

Variant Calling ◽

Ovis Aries ◽

Genome Sequences ◽

Mammalian Gene ◽

Reference Genomes

Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High-quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these species have provided unique insights into mammalian gene function. However, the number of species with reference genomes is small compared to the number needed for studying molecular evolutionary relationships in the tree of life. For example, among the even-toed ungulates there are approximately 300 species whose phylogenetic relationships have been calculated in the 10k trees project. Only six of these have reference genomes: cattle, swine, sheep, goat, water buffalo, and bison. Although reference sequences will eventually be developed for additional hoof stock, the time, money, infrastructure, and expertise required to develop a quality reference genome may be unattainable for most species for at least another decade. In this work, we mapped 35 Gb of next-generation sequence data from a Katahdin sheep to its own species’ reference genome (Ovis aries Oar3.1) and to that of a species that diverged 15 to 30 million years ago (Bos taurus UMD3.1). In total, 56% of reads covered 76% of UMD3.1 to an average depth of 6.8 reads per site, and 83 million variants were identified, of which 78 million were homozygous and likely represent interspecies nucleotide differences. Excluding genome repeat regions and sex chromosomes, approximately 3.7 million heterozygous sites were identified in this animal vs. bovine UMD3.1, representing polymorphisms occurring in sheep. Of these, 41% could be readily mapped to orthologous positions in ovine Oar3.1, with 80% corroborated as heterozygous. These variant sites, identified via interspecies mapping, could be used for comparative genomics and disease association studies, and ultimately to understand mammalian gene function.
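The abstract reports two distinct coverage statistics: breadth (76% of UMD3.1 covered) and mean depth (6.8 reads per site). The sketch below shows how both can be derived from alignment intervals with a difference array. It is an illustration under assumptions, not the authors' workflow: alignments are reduced to hypothetical half-open [start, end) intervals on a single sequence, mapping quality and duplicates are ignored, and because the abstract does not say whether depth is averaged over all sites or only covered sites, both are returned. The names `depth_profile` and `coverage_summary` are hypothetical.

```python
from typing import List, Tuple


def depth_profile(genome_length: int, intervals: List[Tuple[int, int]]) -> List[int]:
    """Per-site read depth from half-open [start, end) alignment intervals,
    computed with a difference array in O(genome_length + number of reads)."""
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for delta in diff[:-1]:
        running += delta
        depth.append(running)
    return depth


def coverage_summary(depth: List[int]) -> Tuple[float, float, float]:
    """Return (breadth, mean depth over all sites, mean depth over covered sites)."""
    covered = [d for d in depth if d > 0]
    breadth = len(covered) / len(depth)
    mean_all = sum(depth) / len(depth)
    mean_covered = sum(covered) / len(covered) if covered else 0.0
    return breadth, mean_all, mean_covered
```

For example, two reads spanning [0, 4) and [2, 6) on a 10-site sequence cover 60% of the sites, with a mean depth of 0.8 over all sites and about 1.33 over covered sites.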

Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes

F1000Research ◽

10.12688/f1000research.2-244.v2 ◽

2014 ◽

Vol 2 ◽

pp. 244 ◽

Cited By ~ 5

Author(s):

Ted Kalbfleisch ◽

Michael P. Heaton

Keyword(s):

Gene Function ◽

Reference Genome ◽

Sequence Data ◽

Association Studies ◽

Mammalian Species ◽

Variant Calling ◽

Ovis Aries ◽

Genome Sequences ◽

Mammalian Gene ◽

Reference Genomes

Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High-quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these species have provided unique insights into mammalian gene function. However, the number of species with reference genomes is small compared to the number needed for studying molecular evolutionary relationships in the tree of life. For example, among the even-toed ungulates there are approximately 300 species whose phylogenetic relationships have been calculated in the 10k trees project. Only six of these have reference genomes: cattle, swine, sheep, goat, water buffalo, and bison. Although reference sequences will eventually be developed for additional hoof stock, the time, money, infrastructure, and expertise required to develop a quality reference genome may be unattainable for most species for at least another decade. In this work, we mapped 35 Gb of next-generation sequence data from a Katahdin sheep to its own species’ reference genome (Ovis aries Oar3.1) and to that of a species that diverged 15 to 30 million years ago (Bos taurus UMD3.1). In total, 56% of reads covered 76% of UMD3.1 to an average depth of 6.8 reads per site, and 83 million variants were identified, of which 78 million were homozygous and likely represent interspecies nucleotide differences. Excluding repeat regions and sex chromosomes, nearly 3.7 million heterozygous sites were identified in this animal vs. bovine UMD3.1, representing polymorphisms occurring in sheep. Of these, 41% could be readily mapped to orthologous positions in ovine Oar3.1, with 80% corroborated as heterozygous. These variant sites, identified via interspecies mapping, could be used for comparative genomics and disease association studies, and ultimately to understand mammalian gene function.

Featherweight long read alignment using partitioned reference indexes

10.1101/386847 ◽

2018 ◽

Author(s):

Hasindu Gamaarachchi ◽

Sri Parameswaran ◽

Martin A. Smith

Keyword(s):

Mobile Computing ◽

Human Genome ◽

Parameter Optimization ◽

Reference Genome ◽

State Of The Art ◽

Genomic Research ◽

Nanopore Sequencing ◽

Read Alignment ◽

Long Read ◽

Reference Genomes

Abstract The advent of nanopore sequencing has realised portable genomic research and applications. However, state-of-the-art long-read aligners and large reference genomes are not compatible with most mobile computing devices due to their high memory requirements. We show how memory requirements can be reduced through parameter optimization and reference genome partitioning, but highlight the associated limitations and caveats of these approaches. We then demonstrate how these issues can be overcome through an appropriate merging technique. We extend the Minimap2 aligner and demonstrate that long-read alignment to the human genome can be performed on a system with 2 GB of RAM with negligible impact on accuracy.
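When the reference is split into partitions, each partial index sees only its own candidate hits, so a read can look uniquely placed within one partition even though an almost equally good hit exists in another. The sketch below shows the basic idea of merging per-partition results for one read: keep the global best hit and call it unique only if it beats the runner-up from any partition by a margin. This is a hypothetical illustration, not the merging technique implemented in the paper's Minimap2 extension; `Hit`, `merge_partition_hits` and `min_margin` are invented names, and the margin rule is an assumed stand-in for a proper mapping-quality calculation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    partition: int  # index of the reference partition that produced the hit
    contig: str     # reference sequence name
    position: int   # 0-based alignment start on the contig
    score: int      # alignment score (higher is better)


def merge_partition_hits(hits: List[Hit],
                         min_margin: int = 10) -> Tuple[Optional[Hit], bool]:
    """Pick the best-scoring hit for one read across all partitions.

    Returns (best_hit, is_unique). The best hit counts as unique only if it
    beats the runner-up from any partition by at least min_margin, a check a
    single partition cannot make because it never sees the other partitions' hits.
    """
    if not hits:
        return None, False
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    best = ranked[0]
    unique = len(ranked) == 1 or best.score - ranked[1].score >= min_margin
    return best, unique
```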

Eleven High-Quality Reference Genome Sequences and 360 Draft Assemblies of Shiga Toxin-Producing Escherichia coli Isolates from Human, Food, Animal, and Environmental Sources in Canada

Microbiology Resource Announcements ◽

10.1128/mra.00625-19 ◽

2019 ◽

Vol 8 (41) ◽

Cited By ~ 2

Author(s):

Shari Tyson ◽

Christy-Lynn Peterson ◽

Adam Olson ◽

Shaun Tyler ◽

Natalie Knox ◽

...

Keyword(s):

Escherichia Coli ◽

Shiga Toxin ◽

Reference Genome ◽

Genome Sequences ◽

High Quality ◽

Content Type ◽

Food Animal ◽

Bovine Strain ◽

Human Infections ◽

Reference Genomes

We report high-quality closed reference genomes for 1 bovine strain and 10 human Shiga toxin (Stx)-producing Escherichia coli (STEC) strains from serogroups O26, O45, O91, O103, O104, O111, O113, O121, O145, and O157. We also report draft assemblies, with standardized metadata, for 360 STEC strains isolated from watersheds, animals, farms, food, and human infections.
