Bioinformatics Advances
Latest Publications


Total documents: 48 (five years: 48)
H-index: 0 (five years: 0)

Published by Oxford University Press (OUP)

ISSN: 2635-0041

Author(s): Tomasz Konopka, Letizia Vestito, Damian Smedley

Abstract
Animal models have long been used to study gene function and the impact of genetic mutations on phenotype. Through the research efforts of thousands of research groups, systematic curation of published literature, and high-throughput phenotyping screens, the collective body of knowledge for the mouse now covers the majority of protein-coding genes. Here we collected data for over 53,000 mouse models with mutations in over 15,000 genomic markers, characterized by more than 254,000 annotations using more than 9,000 distinct ontology terms. We investigated dimensional reduction and embedding techniques as a means to facilitate access to this diverse, high-dimensional information. Our analyses provide the first visual maps of the landscape of mouse phenotypic diversity. We also summarize some of the difficulties in producing and interpreting embeddings of sparse phenotypic data. In particular, we show that data preprocessing, filtering, and encoding have as much impact on the final embeddings as the dimensional reduction itself. Nonetheless, techniques developed for dimensional reduction create opportunities for exploratory analysis of this large pool of public data, including searching for mouse models suited to studying human diseases.
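The abstract's point that encoding matters as much as the reduction step can be illustrated with a toy example. This is a minimal sketch, not the authors' pipeline: it assumes a handful of made-up annotation sets (placeholder term IDs, not real ontology identifiers), compares a plain binary encoding against an IDF-style weighting, and uses ordinary PCA via NumPy's SVD as a stand-in for whatever embedding methods the paper evaluates. The helper names `encode_binary`, `idf_weight`, and `pca_embed` are hypothetical.

```python
import numpy as np

# Hypothetical toy data: each mouse model maps to the set of ontology terms
# annotating it. Term IDs are placeholders, not real ontology identifiers.
MODELS = {
    "model_A": {"term1", "term2"},
    "model_B": {"term2", "term3"},
    "model_C": {"term1", "term2", "term3"},
    "model_D": {"term4"},
}


def encode_binary(models):
    """One-hot encode annotation sets into a 0/1 matrix (models x terms)."""
    terms = sorted(set().union(*models.values()))
    col = {t: j for j, t in enumerate(terms)}
    names = sorted(models)
    X = np.zeros((len(names), len(terms)))
    for i, name in enumerate(names):
        for t in models[name]:
            X[i, col[t]] = 1.0
    return names, terms, X


def idf_weight(X):
    """Alternative encoding: scale each term column by log(n_models / n_annotated),
    so rare terms weigh more than common ones."""
    n_annotated = X.sum(axis=0)  # every column has >= 1 annotation by construction
    return X * np.log(X.shape[0] / n_annotated)


def pca_embed(X, k=2):
    """Project rows of X onto the top-k principal components (SVD of centered X)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


names, terms, X = encode_binary(MODELS)
coords_binary = pca_embed(X)
coords_idf = pca_embed(idf_weight(X))
for name, b, w in zip(names, coords_binary, coords_idf):
    print(name, "binary:", np.round(b, 3), "idf:", np.round(w, 3))
```

The same four models land at different relative positions under the two encodings, even though the reduction step is identical. Note that PCA axes are only defined up to sign, so coordinates should be compared by relative layout rather than raw values.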


Author(s): Pierre Morisse, Claire Lemaitre, Fabrice Legeai

Abstract
Motivation: Linked-Reads technologies combine the high quality and low cost of short-read sequencing with long-range information, through the use of barcodes tagging reads that originate from a common long DNA molecule. This technology has been employed in a broad range of applications, including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists.
Results: We introduce LRez, a C++ API and toolkit that allows easy management of Linked-Reads data. LRez includes functionality for computing the number of common barcodes between genomic regions, extracting barcodes from BAM files, and indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies, and can thus be used in any tool or pipeline requiring barcode processing or indexing, to improve its performance.
Availability and implementation: LRez is implemented in C++, supported on Unix-based platforms, and available under the AGPL-3.0 License at https://github.com/morispi/LRez, and as a bioconda module.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
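LRez itself is a C++ API, and the sketch below does not reproduce its interface. It only illustrates, in Python, two ideas the abstract describes: an index from barcode to the alignments it tags, and a count of barcodes shared by two genomic regions. It assumes barcodes are stored in the `BX:Z` optional SAM tag (the 10x Genomics convention; other technologies may differ), reads alignments as SAM text, and uses a linear scan instead of an on-disk index. All function names and records are hypothetical.

```python
from collections import defaultdict

# Assumption: barcodes live in the BX:Z optional SAM tag (10x Genomics
# convention); other Linked-Reads technologies may use a different tag.
MANDATORY_SAM_FIELDS = 11


def sam_line(qname, chrom, pos, barcode=None):
    """Build a minimal, hypothetical SAM record for the demo below."""
    fields = [qname, "0", chrom, str(pos), "60", "50M", "*", "0", "0", "*", "*"]
    if barcode is not None:
        fields.append(f"BX:Z:{barcode}")
    return "\t".join(fields)


def get_barcode(fields, tag="BX"):
    """Return the value of a Z-type optional tag, or None if absent."""
    prefix = f"{tag}:Z:"
    for field in fields[MANDATORY_SAM_FIELDS:]:
        if field.startswith(prefix):
            return field[len(prefix):]
    return None


def build_barcode_index(lines, tag="BX"):
    """Map each barcode to the (chrom, pos, qname) of every alignment it tags."""
    index = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        barcode = get_barcode(fields, tag)
        if barcode is not None:
            index[barcode].append((fields[2], int(fields[3]), fields[0]))
    return index


def barcodes_in_region(lines, chrom, start, end, tag="BX"):
    """Barcodes of alignments whose 1-based POS lies in [start, end] on chrom."""
    found = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[2] == chrom and start <= int(fields[3]) <= end:
            barcode = get_barcode(fields, tag)
            if barcode is not None:
                found.add(barcode)
    return found


def common_barcodes(lines, region_a, region_b, tag="BX"):
    """Number of barcodes shared by two (chrom, start, end) regions."""
    return len(barcodes_in_region(lines, *region_a, tag=tag)
               & barcodes_in_region(lines, *region_b, tag=tag))


ALIGNMENTS = [
    sam_line("r1", "chr1", 100, "AAAC-1"),
    sam_line("r2", "chr1", 150, "AAAG-1"),
    sam_line("r3", "chr1", 5000, "AAAC-1"),
    sam_line("r4", "chr1", 5100, "AAAT-1"),
    sam_line("r5", "chr1", 5200),  # read without a barcode
    sam_line("r6", "chr2", 120, "AAAG-1"),
]

index = build_barcode_index(ALIGNMENTS)
print(index["AAAC-1"])
print(common_barcodes(ALIGNMENTS, ("chr1", 1, 1000), ("chr1", 4000, 6000)))  # → 1
```

A high count of shared barcodes between two distant regions is the long-range signal that scaffolding and structural-variant tools exploit; a real implementation would read BAM through an indexed library rather than scanning SAM text.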


Author(s): Rashedul Islam, Misha Bilenky, Andrew P Weng, Joseph M Connors, Martin Hirst

Abstract
Motivation: B-cells display remarkable diversity in producing B-cell receptors through recombination of immunoglobulin V-D-J genes. Somatic hypermutation of immunoglobulin heavy chain variable (IGHV) genes is used as a prognostic marker in B-cell malignancies. Clinically, IGHV mutation status is determined by targeted Sanger sequencing, which is a resource-intensive, low-throughput procedure. Here we describe a bioinformatic pipeline, CRIS (Complete Reconstruction of Immunoglobulin IGHV-D-J Sequences), that uses RNA sequencing (RNA-seq) datasets to reconstruct IGHV-D-J sequences and determine IGHV somatic hypermutation status.
Results: CRIS extracts RNA-seq reads aligned to immunoglobulin (Ig) gene loci, performs assembly of Ig transcripts, and aligns the resulting contigs to reference Ig sequences to enumerate and classify somatic hypermutations in the IGHV gene sequence. By using de novo assembly, CRIS improves on existing tools that infer the B-cell receptor (BCR) repertoire from RNA-seq data using only a portion of the IGHV gene segment. We show that the somatic hypermutation status identified by CRIS using the entire IGHV gene segment is highly concordant with clinical classification in three independent chronic lymphocytic leukemia patient cohorts.
Availability: The CRIS pipeline is available under the MIT License from https://github.com/Rashedul/CRIS.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
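The final step the abstract describes, comparing an assembled contig with its reference IGHV sequence and calling mutation status, can be sketched as follows. This is an illustrative sketch under stated assumptions, not CRIS's code: it assumes the contig is already aligned to its closest germline IGHV allele (gaps written as `-`), and it applies the widely used clinical convention of calling a sequence "mutated" below 98% identity to germline, a cutoff the abstract itself does not state. Helper names and the toy sequences are hypothetical.

```python
# Assumption: the conventional clinical cutoff, under which a sequence with
# < 98% identity to its closest germline IGHV allele is called "mutated".
MUTATION_CUTOFF = 98.0


def aligned_columns(query_aln, germline_aln):
    """Yield (column, germline_base, query_base) for gap-free aligned columns."""
    if len(query_aln) != len(germline_aln):
        raise ValueError("aligned sequences must have equal length")
    for i, (q, g) in enumerate(zip(query_aln.upper(), germline_aln.upper())):
        if q != "-" and g != "-":
            yield i, g, q


def list_substitutions(query_aln, germline_aln):
    """Enumerate point differences from germline as (column, germline, query)."""
    return [(i, g, q) for i, g, q in aligned_columns(query_aln, germline_aln) if g != q]


def percent_identity(query_aln, germline_aln):
    """Percent of gap-free aligned columns where the query matches germline."""
    columns = list(aligned_columns(query_aln, germline_aln))
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for _, g, q in columns if g == q)
    return 100.0 * matches / len(columns)


def hypermutation_status(query_aln, germline_aln, cutoff=MUTATION_CUTOFF):
    """Classify as 'mutated' below the identity cutoff, otherwise 'unmutated'."""
    if percent_identity(query_aln, germline_aln) < cutoff:
        return "mutated"
    return "unmutated"


# Hypothetical toy data: a 100-nt "germline" and copies carrying transitions.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def with_transitions(seq, positions):
    """Return seq with a transition substitution at each given 0-based position."""
    bases = list(seq)
    for p in positions:
        bases[p] = TRANSITION[bases[p]]
    return "".join(bases)


germline = "ACGT" * 25
contig_3 = with_transitions(germline, [5, 40, 77])  # 97% identity
contig_2 = with_transitions(germline, [10, 20])     # 98% identity
print(list_substitutions(contig_3, germline))
print(hypermutation_status(contig_3, germline), hypermutation_status(contig_2, germline))
```

A fixed-threshold call is the simplest possible decision rule. Because the abstract does not say how CRIS classifies individual mutations, treat the cutoff and helper names here as illustrative only.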

