Investigating gene methylation signatures for fetal intolerance prediction

Pregnancy is a long and complex process during which one or more offspring develop inside a woman. A short period of oxygen shortage after birth is quite normal for most babies and does not threaten their health. However, a prolonged period of oxygen shortage indicates pathological fetal intolerance, which is likely to cause death. Distinguishing pathological fetal intolerance from physiological oxygen shortage has long been an important clinical problem in obstetrics. Clinically, five symptoms typically indicate that a baby may be suffering from fetal intolerance. With the rapid development of molecular sequencing technologies, liquid biopsy combined with high-throughput sequencing or mass spectrometry now provides a quick way to detect real-time alterations in peripheral blood at multiple levels. Gene methylation is functionally correlated with gene expression; thus, combining gene methylation and expression information would help in screening out the key regulators of the pathogenesis of fetal intolerance. For the first time, we combined gene methylation and expression features and screened out the optimal features, including gene expression and methylation signatures, for fetal intolerance prediction. In addition, we applied various computational methods to construct a comprehensive computational pipeline for identifying potential biomarkers of fetal intolerance based on liquid biopsy samples. We set up qualitative and quantitative computational models for predicting fetal intolerance during pregnancy. Moreover, we provided a new perspective on the detailed pathological mechanism of fetal intolerance. This work can provide a solid foundation for further experimental research and contribute to the application of liquid biopsy in antenatal care.

scLink: Inferring Sparse Gene Co-expression Networks from Single-cell Expression Data

10.1101/2020.09.19.304956 ◽

2020 ◽

Author(s):

Wei Vivian Li ◽

Yanzeng Li

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rapid Development ◽

Real Data ◽

System Level ◽

Expression Data ◽

Sequencing Technologies ◽

Gene Network Analysis ◽

Cell Gene Expression ◽

Cell Gene

Abstract A system-level understanding of the regulation and coordination mechanisms of gene expression is essential to understanding the complexity of biological processes in health and disease. With the rapid development of single-cell RNA sequencing technologies, it is now possible to investigate gene interactions in a cell-type-specific manner. Here we propose the scLink method, which uses statistical network modeling to understand the co-expression relationships among genes and to construct sparse gene co-expression networks from single-cell gene expression data. We use both simulation and real data studies to demonstrate the advantages of scLink and its ability to improve single-cell gene network analysis. The source code used in this article is available at https://github.com/Vivianstats/scLink.
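To make the idea of a sparse co-expression network concrete, the sketch below builds one by thresholding pairwise Pearson correlations. This is a naive baseline for illustration only, not the scLink method, which the abstract describes as statistical network modeling; the function names, the toy data, and the 0.8 threshold are all assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant gene carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expr, threshold=0.8):
    """Keep gene pairs whose absolute correlation across cells meets the threshold.

    expr maps a gene name to its expression values, one value per cell.
    """
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Hypothetical toy data: three genes measured in five cells.
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises in step with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],   # only weakly related to both
}
network = coexpression_edges(toy_expr, threshold=0.8)
```

With this toy input only the geneA/geneB pair survives the threshold, so the resulting network has a single edge. Plain correlation thresholding of this kind does nothing to account for the many zero counts typical of single-cell data.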

A comprehensive review of scaffolding methods in genome assembly

Briefings in Bioinformatics ◽

10.1093/bib/bbab033 ◽

2021 ◽

Author(s):

Junwei Luo ◽

Yawei Wei ◽

Mengna Lyu ◽

Zhengjiang Wu ◽

Xiaoyan Liu ◽

...

Keyword(s):

Genome Assembly ◽

High Throughput Sequencing ◽

Rapid Development ◽

Genomic Research ◽

Future Research ◽

Sequencing Data ◽

Sequencing Technologies ◽

Biological Studies ◽

Downstream Analysis

Abstract In the field of genome assembly, scaffolding methods make it possible to obtain a more complete and contiguous reference genome, which is the cornerstone of genomic research. Scaffolding methods typically utilize the alignments between contigs and sequencing data (reads) to determine the orientation and order among contigs and to produce longer scaffolds, which are helpful for genomic downstream analysis. With the rapid development of high-throughput sequencing technologies, diverse types of reads have emerged over the past decade, especially in long-range sequencing, which have greatly enhanced the assembly quality of scaffolding methods. As the number of scaffolding methods increases, biology and bioinformatics researchers need to perform in-depth analyses of state-of-the-art scaffolding methods. In this article, we focus on the difficulties in scaffolding, the differences in characteristics among various kinds of reads, the methods by which current scaffolding methods address these difficulties, and future research opportunities. We hope this work will benefit the design of new scaffolding methods and the selection of appropriate scaffolding methods for specific biological studies.
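As a rough illustration of how read-derived links can be used to order contigs, the sketch below chains contigs greedily, taking the best-supported links first. It is a simplified assumption-laden example rather than any specific tool reviewed in the article: contig orientation and gap-size estimation are omitted, and the function name, link counts, and contig names are hypothetical.

```python
def greedy_scaffold(contigs, links):
    """Chain contigs into scaffolds using the best-supported links first.

    links maps a pair of contig names to the number of reads linking them.
    """
    parent = {c: c for c in contigs}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    neighbours = {c: [] for c in contigs}
    for (a, b), _support in sorted(links.items(), key=lambda kv: -kv[1]):
        # A contig has two ends, so it can join at most two neighbours;
        # skipping pairs already in the same chain prevents cycles.
        if len(neighbours[a]) < 2 and len(neighbours[b]) < 2 and find(a) != find(b):
            neighbours[a].append(b)
            neighbours[b].append(a)
            parent[find(a)] = find(b)

    scaffolds, seen = [], set()
    for start in contigs:
        if start in seen or len(neighbours[start]) == 2:
            continue  # begin walks only at chain ends or isolated contigs
        path, prev, cur = [], None, start
        while cur is not None:
            path.append(cur)
            seen.add(cur)
            onward = [n for n in neighbours[cur] if n != prev]
            prev, cur = cur, (onward[0] if onward else None)
        scaffolds.append(path)
    return scaffolds


# Hypothetical link counts between four contigs.
toy_links = {("c1", "c2"): 12, ("c2", "c3"): 9, ("c1", "c3"): 2}
scaffolds = greedy_scaffold(["c1", "c2", "c3", "c4"], toy_links)
```

Here the weak c1/c3 link is rejected because both contigs already belong to the same chain, giving one scaffold of three contigs and leaving c4 unplaced.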

Prediction of Long Non-Coding RNAs Based on Deep Learning

Genes ◽

10.3390/genes10040273 ◽

2019 ◽

Vol 10 (4) ◽

pp. 273 ◽

Cited By ~ 6

Author(s):

Xiu-Qin Liu ◽

Bing-Xiu Li ◽

Guan-Rong Zeng ◽

Qiao-Yue Liu ◽

Dong-Mei Ai

Keyword(s):

Deep Learning ◽

High Throughput Sequencing ◽

Short Term Memory ◽

Rapid Development ◽

Classification Performance ◽

Sequencing Technology ◽

Learning Framework ◽

Non Coding Rnas ◽

Set Up ◽

Deep Learning Model

With the rapid development of high-throughput sequencing technology, a large number of transcript sequences have been discovered, and identifying long non-coding RNAs (lncRNAs) among these transcripts is a challenging task. The identification and inclusion of lncRNAs can not only help us understand life activities more clearly, but also help further explore and study disease at the molecular level. At present, lncRNAs are detected mainly in two ways: by computation and by experiment. Due to the limitations of bio-sequencing technology and unavoidable errors in sequencing processes, the detection performance of these methods is not very satisfactory. In this paper, we constructed a deep-learning model to effectively distinguish lncRNAs from mRNAs. We used k-mer embedding vectors obtained by training the GloVe algorithm as input features and set up the deep learning framework to include a bidirectional long short-term memory (BLSTM) layer and a convolutional neural network (CNN) layer with three additional hidden layers. In testing, our model obtained best values of 97.9%, 96.4% and 99.0% in F1 score, accuracy and auROC, respectively, showing better classification performance than the traditional PLEK, CNCI and CPC methods for identifying lncRNAs. We hope that our model will provide effective help in distinguishing mature mRNAs from lncRNAs, and become a potential tool to help understand and detect the diseases associated with lncRNAs.
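The k-mer input features mentioned above start from splitting each transcript into short overlapping words. A minimal sketch of that tokenization step follows; the abstract does not state the value of k or the stride, so k=3 and a stride of 1 are assumptions, and the helper names are hypothetical. Training GloVe embeddings on the resulting tokens is not shown.

```python
def kmer_tokens(sequence, k=3):
    """Split a nucleotide sequence into overlapping k-mers (stride 1)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocabulary(token_lists):
    """Give each distinct k-mer an integer index in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


# Hypothetical toy transcript.
tokens = kmer_tokens("atgcga", k=3)
vocab = build_vocabulary([tokens])
token_ids = [vocab[t] for t in tokens]
```

The integer ids are what an embedding layer would consume; each id would be looked up in a table of pre-trained k-mer vectors before being passed to the recurrent and convolutional layers.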

Genomic Variation Prediction: A Summary From Different Views

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.795883 ◽

2021 ◽

Vol 9 ◽

Author(s):

Xiuchun Lin

Keyword(s):

High Throughput Sequencing ◽

Prediction Models ◽

Rapid Development ◽

Genomic Variation ◽

Learning Theories ◽

Paper Machine ◽

Structural Variations ◽

Synonymous Mutations ◽

Sequencing Technologies ◽

Mutation Data

Structural variations in the genome are closely related to human health and the occurrence and development of various diseases. To understand the mechanisms of diseases, find pathogenic targets, and carry out personalized precision medicine, it is critical to detect such variations. The rapid development of high-throughput sequencing technologies has accelerated the accumulation of large amounts of genomic mutation data, including synonymous mutations. Identifying pathogenic synonymous mutations that play important roles in the occurrence and development of diseases from all the available mutation data is of great importance. In this paper, machine learning theories and methods are reviewed, efficient and accurate pathogenic synonymous mutation prediction methods are developed, and a standardized three-level variant analysis framework is constructed. In addition, multiple variation tolerance prediction models are studied and integrated, and new ideas for structural variation detection based on deep information mining are explored.

A Transcriptome Post-Scaffolding Method for Assembling High Quality Contigs

Computational Biology Journal ◽

10.1155/2014/961823 ◽

2014 ◽

Vol 2014 ◽

pp. 1-4 ◽

Cited By ~ 11

Author(s):

Mingming Liu ◽

Zach N. Adelman ◽

Kevin M. Myles ◽

Liqing Zhang

Keyword(s):

High Throughput Sequencing ◽

De Novo ◽

Rapid Development ◽

High Quality ◽

High Coverage ◽

Sequencing Technologies ◽

Sequence Variations ◽

Downstream Analysis ◽

Genome Assemblies ◽

Redundant Contigs

With the rapid development of high throughput sequencing technologies, new transcriptomes can be sequenced for little cost with high coverage. Sequence assembly approaches have been modified to meet the requirements for de novo transcriptomes, which have complications not found in traditional genome assemblies such as variation in coverage for each candidate mRNA and alternative splicing. As a consequence, de novo assembly strategies tend to generate a large number of redundant contigs due to sequence variations, which adversely affects downstream analysis and experiments. In this work we proposed TransPS, a transcriptome post-scaffolding method, to generate high quality, nonredundant de novo transcriptomes. TransPS shows promising results on the test transcriptome datasets, where redundancy is greatly reduced by more than 50% and, at the same time, coverage is improved considerably. The web server and source code are available.

A hierarchical integration deep flexible neural forest framework for cancer subtype classification by integrating multi-omics data

BMC Bioinformatics ◽

10.1186/s12859-019-3116-7 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 6

Author(s):

Jing Xu ◽

Peng Wu ◽

Yuehui Chen ◽

Qingfang Meng ◽

Hussain Dawood ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

High Throughput Sequencing ◽

Omics Data ◽

Expression Data ◽

Cancer Subtypes ◽

Stacked Autoencoder ◽

Subtype Classification ◽

Sequencing Technologies ◽

Cancer Subtype

Abstract Background: Cancer subtype classification is of great importance for accurate diagnosis and personalized treatment of cancer. The latest developments in high-throughput sequencing technologies have rapidly produced multi-omics data for the same cancer sample. Many computational methods have been proposed to classify cancer subtypes; however, most of them build the model using only gene expression data. It has been shown that integrating multi-omics data contributes to cancer subtype classification. Results: A new hierarchical integration deep flexible neural forest framework, named HI-DFNForest, is proposed to integrate multi-omics data for cancer subtype classification. A stacked autoencoder (SAE) is used to learn high-level representations of each omics data type; the complex representations are then learned by integrating all learned representations into a layer of autoencoder. The final learned data representations (from the stacked autoencoder) are used to classify patients into different cancer subtypes using the deep flexible neural forest (DFNForest) model. Cancer subtype classification is verified on the BRCA, GBM and OV data sets from TCGA by integrating gene expression, miRNA expression and DNA methylation data. The results demonstrate that integrating multiple omics data improves the accuracy of cancer subtype classification compared with using gene expression data alone, and that the proposed framework achieves better performance than other conventional methods. Conclusion: The new hierarchical integration deep flexible neural forest framework (HI-DFNForest) is an effective method for integrating multi-omics data to classify cancer subtypes.

A New Approach for Predicting the Value of Gene Expression: Two-way Collaborative Filtering

Current Bioinformatics ◽

10.2174/1574893614666190126144139 ◽

2019 ◽

Vol 14 (6) ◽

pp. 480-490 ◽

Cited By ~ 1

Author(s):

Tuncay Bayrak ◽

Hasan Oğul

Keyword(s):

Gene Expression ◽

Regression Model ◽

Collaborative Filtering ◽

High Throughput Sequencing ◽

Mean Squared Error ◽

Experimental Studies ◽

Kernel Functions ◽

Feature Representation ◽

Sequencing Analysis ◽

Computational Systems Biology

Background: Predicting the value of gene expression in a given condition is a challenging topic in computational systems biology. Only a limited number of studies in this area have provided solutions to predict the expression in a particular pattern, whether or not it can be done effectively. However, the value of expression for the measurement is usually needed for further meta-data analysis. Methods: We treat the problem as a regression task in which a feature representation of the gene under consideration is fed into a trained model to predict a continuous variable that refers to its exact expression level. To support this task, we introduced a novel feature representation scheme based on two-way collaborative filtering. Our main argument is that the expressions of other genes in the current condition are as important as the expression of the current gene in other conditions. For regression analysis, linear regression and a recently popularized method called the Relevance Vector Machine (RVM) are used. Pearson and Spearman correlation coefficients and Root Mean Squared Error are used for evaluation. The effects of regression model type, RVM kernel functions, and parameters were analysed on a gene expression profiling dataset comprising a set of prostate cancer samples. Results: According to the findings of this study, in addition to promising results from the experimental studies, integrating data from another disease type, such as colon cancer in our case, can significantly improve the prediction performance of the regression model. Conclusion: The results also showed that the new feature representation approach and the RVM regression model are promising for many machine learning problems in microarray and high-throughput sequencing analysis.
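One way to read the two-way idea is that the features for a single (gene, condition) cell combine the same gene's values in the other conditions with the other genes' values in the same condition. The sketch below follows that reading, which is an interpretation of the abstract rather than the authors' published scheme; the function names and the toy matrix are hypothetical. A root mean squared error helper is included because the abstract names it as an evaluation measure.

```python
from math import sqrt


def two_way_features(matrix, gene, condition):
    """Build features for predicting matrix[gene][condition].

    matrix is a list of rows (genes), each a list of values (conditions).
    The target cell itself is excluded from both halves of the feature vector.
    """
    same_gene = [v for j, v in enumerate(matrix[gene]) if j != condition]
    same_condition = [row[condition] for i, row in enumerate(matrix) if i != gene]
    return same_gene + same_condition


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical 3-gene by 3-condition expression matrix.
toy_matrix = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
features = two_way_features(toy_matrix, gene=1, condition=2)
target = toy_matrix[1][2]
error = rmse([1.0, 2.0], [1.0, 4.0])
```

Each (features, target) pair would then serve as one training example for a regressor such as linear regression or an RVM.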

Reassortment of Genome Segments Creates Stable Lineages Among Strains of Orchid Fleck Virus Infecting Citrus in Mexico

Phytopathology ◽

10.1094/phyto-07-19-0253-fi ◽

2020 ◽

Vol 110 (1) ◽

pp. 106-120 ◽

Cited By ~ 1

Author(s):

Avijit Roy ◽

Andrew L. Stone ◽

Gabriel Otero-Colina ◽

Gang Wei ◽

Ronald H. Brlansky ◽

...

Keyword(s):

High Throughput Sequencing ◽

Sensu Stricto ◽

Genome Segment ◽

Rt Pcr ◽

Sequence Comparisons ◽

Orchid Fleck Virus ◽

Reverse Transcription Pcr ◽

Sequencing Technologies ◽

Negative Sense

The genus Dichorhavirus contains viruses with bipartite, negative-sense, single-stranded RNA genomes that are transmitted by flat mites to hosts that include orchids, coffee, the genus Clerodendrum, and citrus. A dichorhavirus infecting citrus in Mexico is classified as a citrus strain of orchid fleck virus (OFV-Cit). We previously used RNA sequencing technologies on OFV-Cit samples from Mexico to develop an OFV-Cit–specific reverse transcription PCR (RT-PCR) assay. During assay validation, OFV-Cit–specific RT-PCR failed to produce an amplicon from some samples with clear symptoms of OFV-Cit. Characterization of this virus revealed that dichorhavirus-like particles were found in the nucleus. High-throughput sequencing of small RNAs from these citrus plants revealed a novel citrus strain of OFV, OFV-Cit2. Sequence comparisons with known orchid and citrus strains of OFV showed variation in the protein products encoded by genome segment 1 (RNA1). Strains of OFV clustered together based on host of origin, whether orchid or citrus, and were clearly separated from other dichorhaviruses described from infected citrus in Brazil. The variation in RNA1 between the original (now OFV-Cit1) and the new (OFV-Cit2) strain was not observed with genome segment 2 (RNA2), but instead, a common RNA2 molecule was shared among strains of OFV-Cit1 and -Cit2, a situation strikingly similar to OFV infecting orchids. We also collected mites at the affected groves, identified them as Brevipalpus californicus sensu stricto, and confirmed that they were infected by OFV-Cit1 or with both OFV-Cit1 and -Cit2. OFV-Cit1 and -Cit2 have coexisted at the same site in Toliman, Queretaro, Mexico since 2012. OFV strain-specific diagnostic tests were developed.

Application of Oxford Nanopore Technology to Plant Virus Detection

Viruses ◽

10.3390/v13081424 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1424

Author(s):

Lia W. Liefting ◽

David W. Waite ◽

Jeremy R. Thompson

Keyword(s):

Plant Virus ◽

High Throughput Sequencing ◽

Virus Detection ◽

Diagnostic Methods ◽

Plant Virus Detection ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Virus Diagnostics ◽

Post Entry ◽

Read Accuracy

The adoption of Oxford Nanopore Technologies (ONT) sequencing as a tool in plant virology has been relatively slow despite its promise in more recent years to yield large quantities of long nucleotide sequences in real time without the need for prior amplification. The portability of the MinION and Flongle platforms combined with lowering costs and continued improvements in read accuracy make ONT an attractive method for both low- and high-scale virus diagnostics. Here, we provide a detailed step-by-step protocol using the ONT Flongle platform that we have developed for the routine application on a range of symptomatic post-entry quarantine and domestic surveillance plant samples. The aim of this methods paper is to highlight ONT’s feasibility as a valuable component to the diagnostician’s toolkit and to hopefully stimulate other laboratories towards the eventual goal of integrating high-throughput sequencing technologies as validated plant virus diagnostic methods in their own right.

Fast and Efficient 5′P Degradome Library Preparation for Analysis of Co-Translational Decay in Arabidopsis

Plants ◽

10.3390/plants10030466 ◽

2021 ◽

Vol 10 (3) ◽

pp. 466

Author(s):

Marie-Christine Carpentier ◽

Cécile Bousquet-Antonelli ◽

Rémy Merret

Keyword(s):

Gene Expression ◽

Quality Control ◽

Transcriptional Regulation ◽

High Throughput ◽

Library Preparation ◽

Working Day ◽

Set Up ◽

Post Transcriptional Regulation ◽

Commercial Kit

The recent development of high-throughput technologies based on RNA sequencing has allowed a better description of the role of post-transcriptional regulation in gene expression. In particular, the development of degradome approaches based on the capture of 5′ monophosphate decay intermediates allowed the discovery of a new decay pathway called co-translational mRNA decay. Thanks to these approaches, ribosome dynamics can now be revealed by analysing the accumulation of 5′P reads. However, library preparation can be difficult to set up for non-specialists. Here, we present a fast and efficient 5′P degradome library preparation for Arabidopsis samples. Our protocol was designed without a commercial kit or gel purification and can easily be done in one working day. We demonstrated the robustness and reproducibility of our protocol. Finally, we present the bioinformatic readouts necessary to assess library quality control.
