A Fast Parallel Algorithm for Indexing Human Genome Sequences

2014 ◽  
Vol E97.D (5) ◽  
pp. 1345-1348
Author(s):  
Woong-Kee LOH ◽  
Kyoung-Soo HAN
2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Chao-Hsin Chen ◽  
Chao-Yu Pan ◽  
Wen-chang Lin

Abstract The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes also exist in human genome sequences. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 of them, approximately a quarter of all human protein-coding genes, overlapped with their adjacent genes. We observed clusters of overlapping protein-coding genes of different sizes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the 5ʹ-tandem overlapping and 3ʹ-tandem overlapping subtypes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distributions of human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.
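The abstract does not say how overlaps were computed, so the following is only a minimal sketch of one common approach: sort genes by chromosome and start coordinate, then sweep through them and merge any gene that starts before the current cluster ends. The tuple layout `(gene_id, chromosome, start, end)` and the function name `overlap_clusters` are assumptions made for illustration; they are not taken from the paper or from the Ensembl API. The sketch ignores strand, so it does not reproduce the four subtypes of paired overlapping genes.

```python
from typing import List, Tuple

# Hypothetical record layout: (gene_id, chromosome, start, end),
# with 1-based inclusive coordinates. Not an Ensembl data structure.
Gene = Tuple[str, str, int, int]


def overlap_clusters(genes: List[Gene]) -> List[List[str]]:
    """Group genes whose coordinates overlap on the same chromosome.

    Returns only clusters of two or more genes; genes that overlap
    nothing are omitted.
    """
    clusters: List[List[str]] = []
    current: List[str] = []
    current_chrom = None
    current_end = -1

    # Sort by chromosome, then start, then end, so a single pass suffices.
    for gene_id, chrom, start, end in sorted(genes, key=lambda g: (g[1], g[2], g[3])):
        if chrom == current_chrom and start <= current_end:
            # Gene starts inside the running cluster: extend the cluster.
            current.append(gene_id)
            current_end = max(current_end, end)
        else:
            # Gap or new chromosome: close the previous cluster.
            if len(current) > 1:
                clusters.append(current)
            current = [gene_id]
            current_chrom = chrom
            current_end = end

    if len(current) > 1:
        clusters.append(current)
    return clusters
```

Under these assumptions, a cluster of length two corresponds to a pair of overlapping genes, and longer lists correspond to the larger clusters the abstract mentions.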


2013 ◽  
Vol 11 (2) ◽  
pp. 77-85 ◽  
Author(s):  
Ben C. Shirley ◽  
Eliseos J. Mucaki ◽  
Tyson Whitehead ◽  
Paul I. Costea ◽  
Pelin Akan ◽  
...  

2010 ◽  
Vol 11 (8) ◽  
pp. R88 ◽  
Author(s):  
Martin G Reese ◽  
Barry Moore ◽  
Colin Batchelor ◽  
Fidel Salas ◽  
Fiona Cunningham ◽  
...  

2020 ◽  
Author(s):  
Inácio Gomes Medeiros ◽  
André Salim Khayat ◽  
Beatriz Stransky ◽  
Sidney Emanuel Batista dos Santos ◽  
Paulo Pimentel de Assumpção ◽  
...  

Abstract This protocol describes the building of a database of SARS-CoV-2 targets for siRNA approaches. Starting from the virus reference genome, we will derive sequences 18 to 21 nt long and verify their similarity against the human genome and the coding and non-coding transcriptome, as well as against genomes from related viruses. We will also calculate a set of thermodynamic features for those sequences and infer their efficiencies using three different predictors. The protocol has two main phases: in the first, we align sequences against reference genomes; in the second, we extract the features. The duration of the first phase varies with the computational power of the running machine and the number of reference genomes. The second phase takes about thirty minutes to execute, also depending on the number of cores of the running machine. The constructed database aims to speed up the design process by providing a broad set of possible SARS-CoV-2 target sequences and siRNA sequences.
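The first step, deriving every 18 to 21 nt sequence from the reference genome, can be pictured as a sliding window over the genome string. The sketch below is an illustration under assumptions: the function name `candidate_windows` and its parameters are hypothetical, and the protocol's actual tooling is not described in the abstract. Alignment against reference genomes and feature extraction are not shown.

```python
from typing import Iterator, Tuple


def candidate_windows(
    genome: str, min_len: int = 18, max_len: int = 21
) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, subsequence) for every window of each length.

    The default lengths follow the 18 to 21 nt range stated in the abstract;
    everything else here is an illustrative assumption.
    """
    for length in range(min_len, max_len + 1):
        # range() is empty when the genome is shorter than the window.
        for start in range(len(genome) - length + 1):
            yield start, genome[start:start + length]
```

For a genome of length n that is at least 21 nt long, this yields (n - 17) + (n - 18) + (n - 19) + (n - 20) windows, which indicates how the size of the alignment phase grows with genome length.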


2017 ◽  
Vol 26 (16) ◽  
pp. 4145-4157 ◽  
Author(s):  
Michael D. Martin ◽  
Flora Jay ◽  
Sergi Castellano ◽  
Montgomery Slatkin

2001 ◽  
Vol 12 (9) ◽  
pp. 673-677 ◽  
Author(s):  
Shinji Kondo ◽  
Akira Shinagawa ◽  
Tetsuya Saito ◽  
Hidenori Kiyosawa ◽  
Itaru Yamanaka ◽  
...  

Mutagenesis ◽  
2002 ◽  
Vol 17 (6) ◽  
pp. 457-461 ◽  
Author(s):  
I. Dunham

2017 ◽  
Vol 13 (2) ◽  
Author(s):  
Monika Piwowar ◽  
Ewa Matczyńska ◽  
Maciej Malawski ◽  
Tomasz Szapieniec ◽  
Irena Roterman-Konieczna

Abstract The presented results concern proteins that were “never born in nature”. The paper focuses on identifying stretches of genetic information for protein sequences that have not been identified as existing in nature. The aim of the work was to find traces of “never born proteins” (NBP) anywhere in completely sequenced genomes, including regions not expected to carry genetic information. The analyses search the genome sequences of species from different levels of the evolutionary tree, from yeast through plants up to the human genome. Statistical details are presented, such as sequence frequencies, sequence lengths, percent identity and similarity of alignments, and the E-values of the sequences found. Computations were performed in a gLite-based grid environment. The results showed that the NBP genetic record is absent from the genomes of the studied organisms at a significant level in terms of identity of content and length of the sequences found. Most of the sequences considered similar do not exceed 50% of the length of the NBP output sequences, which confirms that the genetic record of proteins is not accidental, both in the composition of gene sequences and in the place of recording in the genomes of living organisms.

