Journal of Bioinformatics and Computational Biology
Latest Publications


TOTAL DOCUMENTS

1199
(FIVE YEARS 183)

H-INDEX

42
(FIVE YEARS 4)

Published By World Scientific

1757-6334, 0219-7200

Author(s):  
Xuan Song ◽  
Hai Yun Gao ◽  
Karl Herrup ◽  
Ronald P. Hart

Gene expression studies using xenograft transplants or co-culture systems, usually with mixed human and mouse cells, have proven valuable for uncovering cellular dynamics during development or in disease models. However, the mRNA sequence similarities among species present a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species, which are then re-aligned with individual genomes, generating [Formula: see text] accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract the conserved patterns of sequences from two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. While non-alignment strategies successfully partitioned reads by species, the more traditional approach of mixed-genome alignment followed by optimized separation of reads proved more successful, with lower error rates.
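As a rough illustration of the pooled-reference strategy described in this abstract, the following Python sketch assigns each read to a species by comparing its best alignment scores against human and mouse sequences. The input tuple format, the `min_margin` parameter, and the function name are hypothetical; a real pipeline would take scores from an aligner's output, and this is not the authors' implementation.

```python
from collections import defaultdict

def classify_reads(alignments, min_margin=1):
    """Assign each read to the species of its best-scoring alignment.

    alignments: iterable of (read_id, species, score) tuples, as might be
    parsed from alignments against a pooled human+mouse index (hypothetical
    input format). Reads whose best scores for the two species differ by
    less than min_margin are labeled "ambiguous".
    """
    best = defaultdict(dict)
    for read_id, species, score in alignments:
        # Keep only the highest score seen per read and species.
        if score > best[read_id].get(species, float("-inf")):
            best[read_id][species] = score

    labels = {}
    for read_id, scores in best.items():
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= min_margin:
            labels[read_id] = ranked[0][0]
        else:
            labels[read_id] = "ambiguous"
    return labels


# Toy scores (made up for illustration only).
example = [
    ("r1", "human", 98), ("r1", "mouse", 90),
    ("r2", "mouse", 100),
    ("r3", "human", 95), ("r3", "mouse", 95),
]
labels = classify_reads(example)
```

Reads labeled by species could then be re-aligned to the corresponding single-species genome, as the abstract describes.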


Author(s):  
Ming Cao ◽  
Qinke Peng ◽  
Ze-Gang Wei ◽  
Fei Liu ◽  
Yi-Fan Hou

The development of high-throughput technologies has produced increasing amounts of sequence data and an increasing need for efficient clustering algorithms that can process massive volumes of sequencing data for downstream analysis. Heuristic clustering methods are widely applied for sequence clustering because of their low computational complexity. Although numerous heuristic clustering methods have been developed, they suffer from two limitations: overestimation of inferred clusters and low clustering sensitivity. To address these issues, we present a new sequence clustering method (edClust) based on Edlib, a C/C++ library for fast, exact semi-global sequence alignment, to group similar sequences. The new method edClust was tested on three large-scale sequence databases, and we compared edClust to several classic heuristic clustering methods, such as UCLUST, CD-HIT, and VSEARCH. Evaluations based on the metrics of cluster number and seed sensitivity (SS) demonstrate that edClust produces fewer clusters than the other methods and that its SS is higher than theirs. The source code of edClust is available from https://github.com/zhang134/EdClust.git under the GNU GPL license.
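To make the idea of heuristic seed-based clustering concrete, here is a minimal Python sketch of a generic greedy incremental scheme driven by edit distance. It is not edClust itself: the plain Levenshtein function stands in for Edlib's semi-global alignment, and the `max_dist` threshold and the length-first ordering are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Plain dynamic-programming Levenshtein distance (stand-in for Edlib)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def greedy_cluster(sequences, max_dist):
    """Greedy incremental clustering: each sequence joins the first seed
    within max_dist edits; otherwise it becomes a new seed."""
    clusters = []  # list of (seed, members) pairs
    # Process longer sequences first (an assumed, commonly used ordering).
    for seq in sorted(sequences, key=len, reverse=True):
        for seed, members in clusters:
            if edit_distance(seq, seed) <= max_dist:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


# Toy sequences (made up for illustration only).
seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC", "ACGTACG"]
clusters = greedy_cluster(seqs, max_dist=1)
```

Because a sequence is compared only against seeds, the number of alignments stays far below the all-pairs count, which is the usual motivation for this family of heuristics.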


Author(s):  
Akram Emdadi ◽  
Changiz Eslahchi

Predicting tumor drug response using cancer cell line drug response values for a large number of anti-cancer drugs is a significant challenge in personalized medicine. Predicting patient response to drugs from data obtained from preclinical models is made easier by the availability of different kinds of knowledge about cell lines and drugs. This paper proposes the TCLMF method, a model for predicting drug response in tumor samples that is trained on preclinical samples and is based on the logistic matrix factorization approach. The TCLMF model is designed based on gene expression profiles, tissue type information, the chemical structure of drugs and drug sensitivity (IC50) data from cancer cell lines. We use preclinical data from the Genomics of Drug Sensitivity in Cancer (GDSC) dataset to train the proposed drug response model, which we then use to predict the drug sensitivity of samples from The Cancer Genome Atlas (TCGA) dataset. The TCLMF approach focuses on identifying successful features of cell lines and drugs in order to calculate the probability of the tumor samples being sensitive to drugs. In this study, the closest cell line neighbours of each tumor sample are determined using a similarity measure between tumor samples and cell lines. The drug response for a new tumor is then calculated by averaging the low-rank features obtained from its neighboring cell lines. We compare the results of the TCLMF model with the results of previously proposed methods using two databases and two approaches to test the model’s performance. In the first approach, 12 drugs with enough known clinical drug response, considered in previous methods, are studied. For 7 drugs out of 12, TCLMF can significantly distinguish between patients who are resistant to these drugs and patients who are sensitive to them. In the second approach, the methods are converted to classification models using a threshold, and the results are compared. The results demonstrate that the TCLMF method provides accurate predictions compared with the other algorithms. Finally, we accurately classify tumor tissue type using the latent vectors obtained from TCLMF’s logistic matrix factorization process. These findings demonstrate that the TCLMF approach produces effective latent vectors for tumor samples. The source code of the TCLMF method is available at https://github.com/emdadi/TCLMF.
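The neighbour-averaging step described in this abstract can be sketched in a few lines of Python. This is only an illustration under assumptions: the latent vectors are toy values rather than learned factors, the neighbour list is given rather than computed from a similarity measure, and the scoring function uses the standard logistic form (sigmoid of a dot product plus an optional bias), which may differ in detail from the TCLMF model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def average_latent(neighbor_vectors):
    """Average the latent vectors of a tumor's nearest cell lines."""
    n = len(neighbor_vectors)
    dim = len(neighbor_vectors[0])
    return [sum(vec[k] for vec in neighbor_vectors) / n for k in range(dim)]

def sensitivity_probability(tumor_vec, drug_vec, bias=0.0):
    """Logistic matrix factorization score: sigmoid of the dot product."""
    dot = sum(t * d for t, d in zip(tumor_vec, drug_vec))
    return sigmoid(dot + bias)

# Hypothetical latent vectors (toy values, not learned from GDSC).
cell_line_latent = {"CL1": [1.0, 0.0], "CL2": [0.0, 1.0], "CL3": [-1.0, -1.0]}
drug_latent = [2.0, 2.0]

# Assume CL1 and CL2 were found to be the tumor's closest cell lines.
neighbors = ["CL1", "CL2"]
tumor_vec = average_latent([cell_line_latent[c] for c in neighbors])
p = sensitivity_probability(tumor_vec, drug_latent)
```

Thresholding `p` would turn the probability into a sensitive/resistant label, which corresponds to the classification view mentioned in the abstract.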


Author(s):  
Quentin Delorme ◽  
Rémy Costa ◽  
Yasmine Mansour ◽  
Anna-Sophie Fiston-Lavier ◽  
Annie Chateau

In this paper, we investigate through a preliminary study the influence of repeat elements during the assembly process. We analyze the link between the presence and nature of one type of repeat element, the transposable element (TE), and misassembly events in genome assemblies. We propose to improve assemblies by taking into account the presence of repeat elements, including TEs, during the scaffolding step. We analyze the results and relate the misassemblies to TEs before and after correction.


Author(s):  
Firda Aminy Maruf ◽  
Rian Pratama ◽  
Giltae Song

Detection of somatic mutations in whole-exome sequencing data can help elucidate the mechanism of tumor progression. Most computational approaches require exome sequencing for both tumor and normal samples. However, it is more common to sequence exomes for tumor samples only, without the paired normal samples. To include these types of data in extensive studies of the process of tumorigenesis, it is necessary to develop an approach for identifying somatic mutations using tumor exome sequencing data only. In this study, we designed a machine learning approach using a Deep Neural Network (DNN) and XGBoost to identify somatic mutations in tumor-only exome sequencing data, and we integrated this into a pipeline called DNN-Boost. The XGBoost algorithm is used to extract the features from the results of variant callers, and these features are then fed into the DNN model as input. The XGBoost algorithm resolves issues of missing values and overfitting. We evaluated our proposed model and compared its performance with other existing benchmark methods. We noted that the DNN-Boost classification model outperformed the benchmark method in classifying somatic mutations from paired tumor-normal exome data and tumor-only exome data.


Author(s):  
Remie Janssen ◽  
Pengyu Liu

Phylogenetic networks represent the evolutionary history of species and can record natural reticulate evolutionary processes such as horizontal gene transfer and gene recombination. This makes phylogenetic networks a more comprehensive representation of evolutionary history than phylogenetic trees. Stochastic processes for generating random trees or networks are important tools in evolutionary analysis, especially in phylogeny reconstruction, where they can be utilized for validation or serve as priors for Bayesian methods. However, as more network generators are developed, there is a lack of discussion or comparison of the different generators. To bridge this gap, we compare a set of phylogenetic network generators by profiling topological summary statistics of the generated networks over the number of reticulations and comparing the topological profiles.
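For readers unfamiliar with the terminology, the Python sketch below computes a few elementary statistics of a rooted network given as directed edges, including the number of reticulations (nodes with more than one parent). The choice of statistics and the edge-list representation are assumptions for illustration; the abstract does not say which summary statistics the authors profile.

```python
from collections import defaultdict

def network_summary(edges):
    """Return simple topological statistics of a rooted network given as
    directed (parent, child) edges."""
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        outdeg[parent] += 1
        indeg[child] += 1
        nodes.update((parent, child))
    return {
        "nodes": len(nodes),
        "leaves": sum(1 for v in nodes if outdeg[v] == 0),
        # A reticulation node has more than one parent.
        "reticulations": sum(1 for v in nodes if indeg[v] >= 2),
    }

# Toy network: root r, tree nodes a and b, reticulation h, leaves x, y, z.
edges = [("r", "a"), ("r", "b"), ("a", "x"), ("a", "h"),
         ("b", "h"), ("b", "z"), ("h", "y")]
stats = network_summary(edges)
```

Grouping such statistics by reticulation count across many generated networks would give the kind of per-generator profile the abstract refers to.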


Author(s):  
Daiane Aparecida Zuanetti ◽  
Luis Aparecido Milan

In this paper, we propose a new Bayesian approach for QTL mapping of family data. The main purpose is to model a phenotype as a function of QTLs’ effects. The model considers the detailed familial dependence and does not rely on random effects. It combines the probability for Mendelian inheritance of parents’ genotypes and the correlation between flanking markers and QTLs. This is an advance when compared with models which use only Mendelian segregation or only the correlation between markers and QTLs to estimate transmission probabilities. We use the Bayesian approach to estimate the number of QTLs, their location and the additive and dominance effects. We compare the performance of the proposed method with variance component and LASSO models using simulated and GAW17 data sets. Under tested conditions, the proposed method outperforms other methods in aspects such as estimating the number of QTLs, the accuracy of the QTLs’ position and the estimate of their effects. The results of the application of the proposed method to data sets exceeded all of our expectations.
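The Mendelian component mentioned in this abstract can be illustrated with a short Python sketch that computes the offspring genotype distribution at a single biallelic locus from the parents' genotypes. It covers only textbook Mendelian segregation; the combination with flanking-marker information and the Bayesian model itself are not shown, and the function names and genotype encoding are hypothetical.

```python
from itertools import product

def allele_probs(genotype):
    """Probability of transmitting each allele from a diploid genotype,
    e.g. 'Aa' -> {'A': 0.5, 'a': 0.5}."""
    probs = {}
    for allele in genotype:
        probs[allele] = probs.get(allele, 0.0) + 0.5
    return probs

def offspring_distribution(mother, father):
    """Mendelian distribution over unordered offspring genotypes."""
    dist = {}
    for (m, pm), (f, pf) in product(allele_probs(mother).items(),
                                    allele_probs(father).items()):
        # Sort alleles so that 'aA' and 'Aa' count as the same genotype.
        child = "".join(sorted((m, f)))
        dist[child] = dist.get(child, 0.0) + pm * pf
    return dist

dist = offspring_distribution("Aa", "Aa")
```

For two heterozygous parents this reproduces the familiar 1:2:1 ratio of AA, Aa and aa offspring.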


Author(s):  
Qiaoji Xu ◽  
Xiaomeng Zhang ◽  
Yue Zhang ◽  
Chunfang Zheng ◽  
James H. Leebens-Mack ◽  
...  

Using RACCROCHE, a method for reconstructing the gene content and order of ancestral chromosomes from a phylogeny of extant genomes represented by the gene orders on their chromosomes, we study the evolution of three orders of woody plants. The method retrieves the monoploid complement of each ancestor in a phylogeny, consisting of a complete set of distinct chromosomes, despite some of the extant genomes being recently or historically polyploidized. The three orders are the Sapindales, the Fagales and the Malvales. All of these are independently estimated to have ancestral monoploid number [Formula: see text].

