OPTIMIR, a novel algorithm for integrating available genome-wide genotype data into miRNA sequence alignment analysis

2018 ◽  
Author(s):  
Florian Thibord ◽  
Claire Perret ◽  
Maguelonne Roux ◽  
Pierre Suchon ◽  
Marine Germain ◽  
...  

Abstract: Next-generation sequencing is an increasingly popular and efficient approach to characterize the full set of microRNAs (miRNAs) present in human biosamples. Detecting and quantifying miRNAs remains a challenge because they can undergo different post-transcriptional modifications and may harbor genetic variations (polymiRs) that affect the alignment step. We present a novel algorithm, OPTIMIR, that incorporates biological knowledge on miRNA editing and genome-wide genotype data available in the processed samples to improve alignment accuracy. OPTIMIR was applied to 391 human plasma samples that had been typed with genome-wide genotyping arrays. OPTIMIR was able to detect genotyping errors, suggested the existence of novel miRNAs and highlighted the allelic imbalance in expression of polymiRs in heterozygous carriers. OPTIMIR is written in Python and is freely available on the GENMED website (http://www.genmed.fr/index.php/fr/) and on GitHub (github.com/FlorianThibord/OptimiR).
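The idea of using genotype data during alignment can be illustrated with a minimal sketch. This is not OPTIMIR's actual code: the function names, the toy sequence, and the consistency rule are all assumptions made for illustration. It builds a reference and an alternate sequence for a polymiR, then checks whether a read assigned to one allele is compatible with the array genotype of the sample.

```python
def allele_sequences(reference, offset, ref_allele, alt_allele):
    """Return the reference and alternate sequences of a polymiR.

    `offset` is the 0-based position of the variant in `reference`.
    All names here are hypothetical, not taken from OPTIMIR.
    """
    if reference[offset] != ref_allele:
        raise ValueError("reference allele does not match the sequence")
    alternate = reference[:offset] + alt_allele + reference[offset + 1:]
    return {"ref": reference, "alt": alternate}


def consistent_with_genotype(aligned_allele, alt_allele_count):
    """Check a read's allele against the genotype from the array.

    `alt_allele_count` is the number of alternate alleles carried
    by the sample (0, 1 or 2). A read on the alternate sequence in a
    sample with no alternate allele is flagged as inconsistent, and
    likewise for a reference read in a homozygous alternate sample.
    """
    if aligned_allele == "ref":
        return alt_allele_count < 2
    return alt_allele_count > 0


# Toy sequence, not a real miRNA.
library = allele_sequences("ACGUACGU", 2, "G", "A")
```

An inconsistent read could point either to a misalignment or to a genotyping error, which is the kind of signal the abstract reports OPTIMIR detecting.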

RNA ◽  
2019 ◽  
Vol 25 (6) ◽  
pp. 657-668 ◽  
Author(s):  
Florian Thibord ◽  
Claire Perret ◽  
Maguelonne Roux ◽  
Pierre Suchon ◽  
Marine Germain ◽  
...  

2018 ◽  
Author(s):  
Alexander Lachmann ◽  
Zhuorui Xie ◽  
Avi Ma’ayan

Motivation: RNA-sequencing (RNA-seq) is currently the leading technology for genome-wide transcript quantification. Mapping the raw reads to transcript and gene level counts can be achieved by a variety of aligners and pipelines. The diversity of processing options reduces interoperability. In addition, the alignment step requires significant computational resources and basic programming knowledge. Elysium enables users of all skill levels to perform a uniform and free RNA-seq alignment in the cloud. Results: The Elysium infrastructure comprises three components: a file upload API that enables storage of FASTQ files on Amazon S3 without Amazon credentials; an API to handle the cloud alignment job scheduling for uploaded files; and a graphical user interface (GUI) to provide intuitive access to users who do not have command-line skills. Availability: The Elysium source code is available under the Apache Licence 2.0 on GitHub at: https://github.com/maayanlab/elysium. The cloud-based RNA-seq alignment service is freely accessible through the Elysium GUI at: http://elysium.cloud


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1294 ◽  
Author(s):  
Ilya Y. Zhbannikov ◽  
Konstantin G. Arbeev ◽  
Anatoliy I. Yashin

Simulation is important in evaluating novel methods when input data are not easily obtainable or specific assumptions are needed. We present cophesim, a software tool that adds phenotypes to genotype data generated with a genetic simulator. The output of cophesim can be used as direct input for different genome-wide association study tools. cophesim is available from https://bitbucket.org/izhbannikov/cophesim.
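Adding a phenotype to simulated genotypes can be sketched as follows. This is a minimal illustration under an assumed additive model with Gaussian noise, not cophesim's implementation; the function name, parameters, and effect sizes are hypothetical.

```python
import random


def simulate_phenotypes(genotypes, effects, noise_sd=1.0, seed=0):
    """Simulate a continuous phenotype under an additive model.

    genotypes: list of rows, one per individual, each a list of
               allele counts (0, 1 or 2) per marker.
    effects:   dict mapping marker index to its effect size;
               markers not listed have no effect.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    phenotypes = []
    for row in genotypes:
        genetic_value = sum(effects.get(j, 0.0) * g for j, g in enumerate(row))
        phenotypes.append(genetic_value + rng.gauss(0.0, noise_sd))
    return phenotypes


# Two individuals, three markers; markers 0 and 2 are causal.
toy_genotypes = [[0, 1, 2], [2, 0, 0]]
toy_effects = {0: 0.5, 2: -1.0}
noiseless = simulate_phenotypes(toy_genotypes, toy_effects, noise_sd=0.0)
```

With `noise_sd=0.0` the output is just the genetic value of each individual, which makes the model easy to check by hand before adding noise.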


Author(s):  
Dominic A. Stoll ◽  
Nicolas Danylec ◽  
Christina Grimmler ◽  
Sabine E. Kulling ◽  
Melanie Huch

The strain Adlercreutzia caecicola DSM 22242T (=CCUG 57646T=NR06T) was taxonomically described in 2013 and named as Parvibacter caecicola Clavel et al. 2013. In 2018, the name of the strain DSM 22242T was changed to Adlercreutzia caecicola (Clavel et al. 2013) Nouioui et al. 2018 due to taxonomic investigations of the closely related genera Adlercreutzia, Asaccharobacter and Enterorhabdus within the phylum Actinobacteria. However, the first whole draft genome of strain DSM 22242T was published by our group in 2019. Therefore, the genome was not available within the study of Nouioui et al. (2018). The results of the polyphasic approach within this study, including phenotypic and biochemical analyses and genome-based taxonomic investigations [genome-wide average nucleotide identity (gANI), alignment fraction (AF), average amino acid identity (AAI), percentage of orthologous conserved proteins (POCP) and genome blast distance phylogeny (GBDP) tree], indicated that the proposed change of the name Parvibacter caecicola to Adlercreutzia caecicola was not correct. Therefore, it is proposed that the correct name of Adlercreutzia caecicola (Clavel et al. 2013) Nouioui et al. 2018 strain DSM 22242T is Parvibacter caecicola Clavel et al. 2013.


2019 ◽  
Author(s):  
Ying Sheng ◽  
Chiung-Yu Huang ◽  
Siarhei Lobach ◽  
Lydia Zablotska ◽  
Iryna Lobach ◽  
...  

Abstract: Large-scale genome-wide scans provide massive volumes of genetic variants on large numbers of cases and controls that can be used to estimate genetic effects. Yet, the sets of non-genetic variables available in public databases are often brief. It is known that omitting a continuous variable from a logistic regression model can result in biased estimates of odds ratios (OR) (e.g., Gail et al (1984), Neuhaus et al (1993), Hauck et al (1991), Zeger et al (1988)). We assess what information is needed to correct the bias in the OR estimate of genotype caused by omitting a continuous variable when the actual values of the omitted variable are not available. We derive two estimating procedures that can recover the degree of bias based on a conditional density of the omitted variable or on knowledge of the distribution of the omitted variable. Importantly, our derivations show that omitting a continuous variable can result in either under- or over-estimation of the genetic effects. We performed extensive simulation studies to examine bias, variability, false positive rate, and power in the model that omits a continuous variable. We show the application to two genome-wide studies of Alzheimer’s disease. Data Availability Statement: The data that support the findings of this study are openly available in the Database of Genotypes and Phenotypes at [https://www.ncbi.nlm.nih.gov/projects/gap/cgibin/study.cgi?study_id=phs000372.v1.p1], reference number [phs000372.v1.p1] and at the Alzheimer’s Disease Neuroimaging Initiative http://adni.loni.usc.edu/.
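The bias from omitting a continuous variable can be made concrete with a small numerical sketch. It is not one of the authors' two estimating procedures. It assumes a logistic model with a binary genotype indicator and a standard normal covariate that is independent of genotype, and it computes the log odds ratio that a model without the covariate would target. All names and coefficient values are hypothetical.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_log_or(beta0, beta_g, beta_x, n_grid=4001, half_width=8.0):
    """Genotype log OR after averaging out an omitted covariate.

    Assumes P(Y=1 | G, X) = expit(beta0 + beta_g*G + beta_x*X),
    with X ~ N(0, 1) independent of G. The average over X is taken
    on a regular grid with normalized Gaussian weights.
    """
    step = 2.0 * half_width / (n_grid - 1)
    xs = [-half_width + i * step for i in range(n_grid)]
    weights = [math.exp(-0.5 * x * x) for x in xs]
    total = sum(weights)

    def prob(g):
        return sum(
            w * expit(beta0 + beta_g * g + beta_x * x)
            for w, x in zip(weights, xs)
        ) / total

    return logit(prob(1)) - logit(prob(0))


conditional = 1.0  # assumed true genotype log OR
no_omitted_effect = marginal_log_or(0.0, conditional, 0.0)
with_omitted_effect = marginal_log_or(0.0, conditional, 2.0)
```

In this independent-covariate setting the marginal log OR is pulled toward zero whenever the omitted variable has an effect. The abstract's point that the bias can go in either direction concerns more general settings, which this sketch does not cover.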


Author(s):  
Jiao Huang ◽  
Ying Huang

A novel filamentous Actinobacterium, designated strain FXJ1.1311T, was isolated from soil collected in Ngari (Ali) Prefecture, Qinghai-Tibet Plateau, western PR China. The strain showed antimicrobial activity against Gram-positive bacteria and Fusarium oxysporum. Results of phylogenetic analysis based on 16S rRNA gene sequences indicated that strain FXJ1.1311T belonged to the genus Lentzea and showed the highest sequence similarity to Lentzea guizhouensis DHS C013T (98.04%). Morphological and chemotaxonomic characteristics supported its assignment to the genus Lentzea. The genome-wide average nucleotide identity between strain FXJ1.1311T and L. guizhouensis DHS C013T as well as other Lentzea type strains was <82.2%. Strain FXJ1.1311T also formed a monophyletic line distinct from the known Lentzea species in the phylogenomic tree. In addition, physiological and chemotaxonomic characteristics allowed phenotypic differentiation of the novel strain from L. guizhouensis. Based on the evidence presented here, strain FXJ1.1311T represents a novel species of the genus Lentzea, for which the name Lentzea tibetensis sp. nov. is proposed. The type strain is FXJ1.1311T (=CGMCC 4.7383T=DSM 104975T).


2019 ◽  
Vol 35 (19) ◽  
pp. 3852-3854 ◽  
Author(s):  
You Tang ◽  
Xiaolei Liu

Motivation: Many genome-wide association study (GWAS) methods have been developed for mapping genetic markers that are associated with human diseases and agricultural economic traits. Computer simulation is a useful tool to test the performance of various GWAS methods under certain scenarios. Existing tools are either inefficient in terms of computation and memory or inconvenient to use for simulating big, realistic genotype and phenotype data to evaluate available GWAS methods. Results: Here, we present a GWAS simulation tool named G2P that can be used to simulate genotype data and phenotype data and to perform power evaluation of GWAS methods. G2P is a user-friendly tool with all functions provided in both graphical user interface and pipeline manners, and it is available for Windows, Mac and Linux environments. Furthermore, G2P achieves maximum efficiency in terms of both memory usage and simulation speed; with G2P, the simulation of genotype data that includes 1 000 000 samples and 2 000 000 markers can be accomplished in 5 h. Availability and implementation: The G2P software, user manual, and example datasets are freely available at GitHub: https://github.com/XiaoleiLiuBio/G2P. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Matthias Munz ◽  
Inken Wohlers ◽  
Eric Simon ◽  
Tobias Reinberger ◽  
Hauke Busch ◽  
...  

Abstract: Exploration of genetic variant-to-gene relationships by quantitative trait loci such as expression QTLs is a frequently used tool in genome-wide association studies. However, the wide range of public QTL databases and the lack of batch annotation features complicate a comprehensive annotation of GWAS results. In this work, we introduce the tool “Qtlizer” for annotating lists of human variants with associated changes in gene expression and protein abundance using an integrated database of published QTLs. Features include incorporation of variants in linkage disequilibrium and reverse search by gene names. Analyzing the database for base pair distances between best significant eQTLs and their affected genes suggests that the commonly used cis-distance limit of 1,000,000 base pairs might be too restrictive, implying a substantial number of wrongly assigned and as yet undetected eQTLs. We also ranked genes with respect to the maximum number of tissue-specific eQTL studies in which a most significant eQTL signal was consistent. For the top 100 genes we observed the strongest enrichment with housekeeping genes (P = 2 × 10⁻⁶) and with the 10% highest expressed genes (P = 0.005) after grouping eQTLs by r² > 0.95, underlining the relevance of LD information in eQTL analyses. Qtlizer can be accessed via https://genehopper.de/qtlizer or by using the respective Bioconductor R-package (https://doi.org/10.18129/B9.bioc.Qtlizer).
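The effect of the cis-distance limit of 1,000,000 base pairs can be illustrated with a short sketch. It is not part of Qtlizer: the `Gene` class, the function names, and the choice to measure distance to the nearest gene boundary (rather than, say, the transcription start site) are assumptions.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 1_000_000  # commonly used cis-distance limit


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int  # inclusive coordinate
    end: int    # inclusive coordinate


def distance_to_gene(chrom, pos, gene):
    """Distance in bp from a variant to the nearest gene boundary.

    Returns 0 if the variant lies inside the gene and None if it is
    on a different chromosome.
    """
    if chrom != gene.chrom:
        return None
    if pos < gene.start:
        return gene.start - pos
    if pos > gene.end:
        return pos - gene.end
    return 0


def is_cis(chrom, pos, gene, window=CIS_WINDOW_BP):
    """True if the variant falls within `window` bp of the gene."""
    distance = distance_to_gene(chrom, pos, gene)
    return distance is not None and distance <= window


# Hypothetical gene and variant, 1.5 Mb apart.
toy_gene = Gene(chrom="1", start=5_000_000, end=5_010_000)
toy_distance = distance_to_gene("1", 3_500_000, toy_gene)
```

A variant 1.5 Mb away, as in the toy example, is excluded by the default window but included once the window is widened to 2,000,000 bp, which is the kind of eQTL a strict limit would miss.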

