repeat detection
Recently Published Documents


TOTAL DOCUMENTS

34
(FIVE YEARS 13)

H-INDEX

9
(FIVE YEARS 2)

2021 ◽  
Vol 1 ◽  
Author(s):  
Matteo Delucchi ◽  
Paulina Näf ◽  
Spencer Bliven ◽  
Maria Anisimova

The Tandem Repeat Annotation Library (TRAL) focuses on analyzing tandem repeat units in genomic sequences. TRAL can integrate and harmonize tandem repeat annotations from a large number of external tools, and provides a statistical model for evaluating and filtering the detected repeats. TRAL version 2.0 includes new features such as a module for identifying repeats from circular profile hidden Markov models, a new repeat alignment method based on the progressive Poisson Indel Process, an improved installation procedure and a Docker container. TRAL is an open-source Python 3 library and is available, together with documentation and tutorials, via vital-it.ch/software/tral.


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
David Mary Rajathei ◽  
Subbiah Parthasarathy ◽  
Samuel Selvaraj

Abstract Amino acid repeats play important roles in both the structure and function of proteins. They are found in all kingdoms of life, especially in eukaryotes, and a large fraction of human proteins is composed of repeats. Further, abnormal expansions of shorter repeats cause various diseases in humans. Therefore, analyzing the repeats of the entire human proteome together with functional, mutational and disease information would help to better understand their roles in proteins. To fulfill this need, we developed a web database, HPREP (http://bioinfo.bdu.ac.in/hprep), for human proteome repeats using Perl and HTML. We identified different categories of well-characterized repeats and domain repeats present in the human proteome of UniProtKB/Swiss-Prot using in-house Perl programs, and novel repeats using the repeat detection tool T-REKS as well as the XSTREAM web server. Further, these proteins are annotated with functional, mutational and disease information and grouped according to specific repeat types. The database enables users to search by specific repeat type in order to understand their involvement in proteins. Thus, the HPREP database is expected to be a useful resource for gaining better insight into the different repeats in the human proteome and their biological roles.


Marine Drugs ◽  
2020 ◽  
Vol 18 (9) ◽  
pp. 464
Author(s):  
Xinjia Li ◽  
Wanyi Chen ◽  
Dongting Zhangsun ◽  
Sulan Luo

The venom of various Conus species is composed of a rich variety of unique bioactive peptides, commonly referred to as conotoxins (conopeptides). Most conopeptides have specific receptors or ion channels as physiologically relevant targets. In this paper, high-throughput transcriptome sequencing was performed to analyze putative conotoxin transcripts from the venom duct of a vermivorous cone snail species, Conus litteratus, native to the South China Sea. A total of 128 putative conotoxins were identified, most of them belonging to 22 known superfamilies, with 43 conotoxins being regarded as belonging to new superfamilies. Notably, the M superfamily was the most abundant in conotoxins among the known superfamilies. A total of 15 known cysteine frameworks were also described. The largest proportion of cysteine frameworks were VI/VII (C-C-CC-C-C), IX (C-C-C-C-C-C) and XIV (C-C-C-C). In addition, five novel cysteine patterns were also discovered. Simple sequence repeat detection results showed that di-nucleotide was the major type of repetition, and the codon usage bias results indicated that the codon usage bias of the conotoxin genes was weak, but the M, O1, O2 superfamilies differed in codon preference. Gene cloning indicated that there was no intron in conotoxins of the B1- or J superfamily, whereas one intron of 1273–1339 bp existed in a mature region of the F superfamily, which differs from the previously reported gene structure of conotoxins from other superfamilies. This study will enhance our understanding of conotoxin diversity, and the new conotoxins discovered in this paper will provide more potential candidates for the development of pharmacological probes and marine peptide drugs.


Author(s):  
Cong Feng ◽  
Min Dai ◽  
Yongjing Liu ◽  
Ming Chen

Abstract DNA repeats are abundant in eukaryotic genomes and have been proved to play a vital role in genome evolution and regulation. A large number of approaches have been proposed to identify various repeats in the genome. Some de novo repeat identification tools can efficiently generate sequence repetitive scores based on k-mer counting for repeat detection. However, we noticed that these tools can still be improved in terms of repetitive score calculation, sensitivity to segmental duplications and detection specificity. Therefore, here, we present a new computational approach named Repeat Locator (RepLoc), which is based on weighted k-mer coverage to quantify the genome sequence repetitiveness and locate the repetitive sequences. According to the repetitiveness map of the human genome generated by RepLoc, we found that there may be relationships between sequence repetitiveness and genome structures. A comprehensive benchmark shows that RepLoc is a more efficient k-mer counting based tool for de novo repeat detection. The RepLoc software is freely available at http://bis.zju.edu.cn/reploc.
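The abstract above describes scoring sequence repetitiveness from k-mer counts. As a rough illustration of that general idea only, the following Python sketch assigns each base the mean occurrence count of the k-mers that cover it. This is not RepLoc's weighted k-mer coverage, whose weighting scheme is not given in the abstract; the function name `kmer_repetitiveness`, the unweighted averaging, and the default `k` are assumptions made for this example.

```python
from collections import Counter
from typing import List


def kmer_repetitiveness(seq: str, k: int = 4) -> List[float]:
    """Score each base by the mean count of the k-mers that cover it.

    Hypothetical, unweighted simplification; not RepLoc's actual scoring.
    """
    seq = seq.upper()
    n = len(seq)
    if n < k:
        # No complete k-mer fits, so nothing can be scored.
        return [0.0] * n

    # Count how often every k-mer occurs in the sequence.
    kmers = [seq[i:i + k] for i in range(n - k + 1)]
    counts = Counter(kmers)

    totals = [0.0] * n  # summed counts of the k-mers covering each base
    cover = [0] * n     # number of k-mers covering each base
    for i, kmer in enumerate(kmers):
        for j in range(i, i + k):
            totals[j] += counts[kmer]
            cover[j] += 1

    # Every base is covered by at least one k-mer when n >= k.
    return [t / c for t, c in zip(totals, cover)]


if __name__ == "__main__":
    # The repeated "AC" prefix should score higher than the unique "GT" tail.
    print(kmer_repetitiveness("ACACACGT", k=2))
```

Bases inside repeated stretches receive higher scores than bases in unique sequence, so thresholding the score gives a crude way to locate candidate repeats. A genome-scale tool would need a memory-efficient k-mer counter rather than an in-memory `Counter`.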


2020 ◽  
Vol 36 (10) ◽  
pp. 3260-3262 ◽  
Author(s):  
Vladimir Perovic ◽  
Jeremy Y Leclercq ◽  
Neven Sumonja ◽  
Francois D Richard ◽  
Nevena Veljkovic ◽  
...  

Abstract Motivation Proteins containing tandem repeats (TRs) are abundant, frequently fold into elongated non-globular structures and perform vital functions. A number of computational tools have been developed to detect TRs in protein sequences. The blurred boundary between imperfect TR motifs and non-repetitive sequences gave rise to the need to validate the detected TRs. Results Tally-2.0 is a scoring tool based on a machine learning (ML) approach that validates the results of TR detection. It was upgraded by using improved training datasets and additional ML features. Tally-2.0 performs at a level of 93% sensitivity, 83% specificity and an area under the receiver operating characteristic curve of 95%. Availability and implementation Tally-2.0 is available as a web tool and as a standalone application published under Apache License 2.0 at https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=27. It is supported on Linux. Source code is available upon request. Supplementary information Supplementary data are available at Bioinformatics online.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e8169 ◽  
Author(s):  
Lore Goetschalckx ◽  
Johan Wagemans

Images differ in their memorability in consistent ways across observers. What makes an image memorable is not fully understood to date. Most of the current insight is in terms of high-level semantic aspects, related to the content. However, research still shows consistent differences within semantic categories, suggesting a role for factors at other levels of processing in the visual hierarchy. To aid investigations into this role as well as contributions to the understanding of image memorability more generally, we present MemCat. MemCat is a category-based image set, consisting of 10K images representing five broader, memorability-relevant categories (animal, food, landscape, sports, and vehicle) and further divided into subcategories (e.g., bear). They were sampled from existing source image sets that offer bounding box annotations or more detailed segmentation masks. We collected memorability scores for all 10K images, each score based on the responses of on average 99 participants in a repeat-detection memory task. Replicating previous research, the collected memorability scores show high levels of consistency across observers. Currently, MemCat is the second largest memorability image set and the largest offering a category-based structure. MemCat can be used to study the factors underlying the variability in image memorability, including the variability within semantic categories. In addition, it offers a new benchmark dataset for the automatic prediction of memorability scores (e.g., with convolutional neural networks). Finally, MemCat allows the study of neural and behavioral correlates of memorability while controlling for semantic category.


2019 ◽  
Vol 35 (14) ◽  
pp. i200-i207 ◽  
Author(s):  
Yan Gao ◽  
Bo Liu ◽  
Yadong Wang ◽  
Yi Xing

Abstract Motivation Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) sequencing technologies can produce long reads of up to tens of kilobases, but with high error rates. In order to reduce sequencing error, Rolling Circle Amplification (RCA) has been used to improve library preparation by amplifying circularized template molecules. Linear products of the RCA contain multiple tandem copies of the template molecule. By integrating additional in silico processing steps, these tandem sequences can be collapsed into a consensus sequence with a higher accuracy than the original raw reads. Existing pipelines that use alignment-based methods to discover the tandem repeat patterns in the long reads are either inefficient or lack sensitivity. Results We present a novel tandem repeat detection and consensus calling tool, TideHunter, to efficiently discover tandem repeat patterns and generate high-quality consensus sequences from amplified tandemly repeated long-read sequencing data. TideHunter works with noisy long reads (PacBio and ONT) at error rates of up to 20% and does not impose any limit on the maximal repeat pattern size. We benchmarked TideHunter using simulated and real datasets with varying error rates and repeat pattern sizes. TideHunter is tens of times faster than state-of-the-art methods and has a higher sensitivity and accuracy. Availability and implementation TideHunter is written in C, is open source, and is available at https://github.com/yangao07/TideHunter.
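The abstract above describes collapsing tandem copies of a template into a consensus sequence. The Python sketch below is a deliberately naive illustration of that idea and is not TideHunter's algorithm: it assumes the read contains substitution errors only (no insertions or deletions, which real PacBio and ONT reads do have) and that the read starts at a repeat-unit boundary. The function names `estimate_period` and `consensus` and the `min_identity` threshold are hypothetical choices for this example.

```python
from collections import Counter
from typing import Optional


def estimate_period(read: str, min_period: int = 2,
                    min_identity: float = 0.8) -> Optional[int]:
    """Return the smallest shift at which the read matches itself well.

    Taking the smallest qualifying shift avoids reporting a multiple of
    the true repeat unit. Returns None if no shift reaches min_identity.
    """
    for shift in range(min_period, len(read) // 2 + 1):
        # Compare each base with the base `shift` positions downstream.
        matches = sum(a == b for a, b in zip(read, read[shift:]))
        if matches / (len(read) - shift) >= min_identity:
            return shift
    return None


def consensus(read: str, period: int) -> str:
    """Majority-vote consensus over the complete copies of the repeat unit."""
    copies = [read[i:i + period]
              for i in range(0, len(read) - period + 1, period)]
    # zip(*copies) walks the copies column by column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*copies))


if __name__ == "__main__":
    # Four copies of "ACGTT" with one substitution (G -> C) in the third copy.
    read = "ACGTTACGTTACCTTACGTT"
    period = estimate_period(read)
    print(period, consensus(read, period))
```

Handling indels would require aligning the copies before voting, which is where most of the real difficulty lies.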


2019 ◽  
Author(s):  
Lore Goetschalckx ◽  
Johan Wagemans

This is a preprint. Please find the published, peer-reviewed version of the paper here: https://peerj.com/articles/8169/. Images differ in their memorability in consistent ways across observers. What makes an image memorable is not fully understood to date. Most of the current insight is in terms of high-level semantic aspects, related to the content. However, research still shows consistent differences within semantic categories, suggesting a role for factors at other levels of processing in the visual hierarchy. To aid investigations into this role as well as contributions to the understanding of image memorability more generally, we present MemCat. MemCat is a category-based image set, consisting of 10K images representing five broader, memorability-relevant categories (animal, food, landscape, sports, and vehicle) and further divided into subcategories (e.g., bear). They were sampled from existing source image sets that offer bounding box annotations or more detailed segmentation masks. We collected memorability scores for all 10K images, each score based on the responses of on average 99 participants in a repeat-detection memory task. Replicating previous research, the collected memorability scores show high levels of consistency across observers. Currently, MemCat is the second largest memorability image set and the largest offering a category-based structure. MemCat can be used to study the factors underlying the variability in image memorability, including the variability within semantic categories. In addition, it offers a new benchmark dataset for the automatic prediction of memorability scores (e.g., with convolutional neural networks). Finally, MemCat allows the study of neural and behavioral correlates of memorability while controlling for semantic category.

