Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification

Background. 16S rRNA text data are informative for classifying microbiota-associated diseases. However, the raw text data must be processed systematically before classification features can be defined and extracted; moreover, the high-dimensional feature spaces generated by the text data pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend phylogenetic approaches to perform supervised learning on microbiota text data and discriminate the pathological states of pneumonia and dental caries. The results show that PMF may enhance the efficiency and reliability of analyzing high-dimensional text data.


Casein Phosphopeptide-Amorphous Calcium Phosphate (CPP-ACP) Nanocomplex in Dentistry: State of the Art

Revista Facultad de Odontología ◽ 10.17533/udea.rfo.v30n2a10 ◽ 2019 ◽ Vol 30 (2) ◽ Cited By ~ 1

Author(s): Cristhian Camilo Madrid Troconis ◽ Sthefanie Del Carmen Perez Puello

Keyword(s): Dental Caries ◽ Chemical Properties ◽ Potassium Nitrate ◽ Surface Pretreatment ◽ Physico Chemical ◽ External Agents ◽ Risk Patients ◽ Casein Phosphopeptide ◽ Dental Surface ◽ Better Than

Saliva and external agents containing different concentrations of sodium fluoride (NaF) promote the dental remineralization process. However, these resources may not be sufficient to counteract the multiple factors involved in dental caries, especially in high-risk patients. Alternatives have been extensively researched, such as casein phosphopeptide-amorphous calcium phosphate (CPP-ACP), which provides essential ions such as phosphate and calcium and acts as an adjuvant in the remineralization process. Manufacturers of CPP-ACP-based products also suggest that it can produce desensitizing effects. This nanocomplex has been used experimentally with some dental cements and adhesive systems, but it is important to clarify the effects of this procedure and the remineralizing and desensitizing advantages it offers. The objective of this topic review was to present the state of the art on the CPP-ACP nanocomplex. In terms of dental caries prevention, this remineralizing option is not better than NaF. CPP-ACP provides a dental desensitizing action, but it is temporary and less effective than alternatives such as potassium nitrate or NaF. The experimental incorporation of CPP-ACP into dental cements should be controlled so as not to compromise the physico-chemical properties of the material. The use of dental products based on this nanocomplex as a dental surface pretreatment may decrease the bond strength of adhesive materials, but this effect is material-dependent.


Sentiment Classification Using Convolutional Neural Networks

Applied Sciences ◽ 10.3390/app9112347 ◽ 2019 ◽ Vol 9 (11) ◽ pp. 2347 ◽ Cited By ~ 18

Author(s): Hannah Kim ◽ Young-Seob Jeong

Keyword(s): Neural Network ◽ Neural Networks ◽ Convolutional Neural Networks ◽ Text Classification ◽ State Of The Art ◽ Sentiment Classification ◽ Learning Models ◽ Text Data ◽ Textual Data ◽ Better Than

As the volume of textual data grows exponentially, it becomes more important to develop models that analyze text data automatically. Texts may carry various labels such as gender, age, country, sentiment, and so forth. Using such labels may benefit some industrial fields, so many studies of text classification have appeared. Recently, the Convolutional Neural Network (CNN) has been adopted for text classification and has shown quite successful results. In this paper, we propose convolutional neural networks for the task of sentiment classification. Through experiments with three well-known datasets, we show that employing consecutive convolutional layers is effective for relatively longer texts, and that our networks are better than other state-of-the-art deep learning models.
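
One general reason stacked convolutional layers can help on longer texts is that each additional layer widens the span of tokens a single output unit can see. The sketch below computes that span (the receptive field) for a stack of 1D convolutions. It illustrates this general property only and is not the architecture from the paper; the function name `receptive_field` and the kernel sizes are hypothetical choices for illustration.

```python
def receptive_field(kernel_sizes, strides=None):
    """Return how many input tokens one output unit of stacked 1D convolutions sees.

    kernel_sizes: kernel width of each layer, first layer first.
    strides: stride of each layer (defaults to 1 for every layer).
    """
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf = 1    # a single unit sees one token before any convolution
    jump = 1  # distance in input tokens between adjacent units at the current layer
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


# Hypothetical kernel widths: each extra width-3 layer adds two tokens of context.
print(receptive_field([3]))        # 3
print(receptive_field([3, 3]))     # 5
print(receptive_field([3, 3, 3]))  # 7
```

With stride 1, the span grows linearly with depth, so a deeper stack can relate words that lie farther apart in a long review than a single layer with the same kernel width.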


ProSampler: an ultrafast and accurate motif finder in large ChIP-seq datasets for combinatory motif discovery

Bioinformatics ◽ 10.1093/bioinformatics/btz290 ◽ 2019 ◽ Vol 35 (22) ◽ pp. 4632-4639 ◽ Cited By ~ 1

Author(s): Yang Li ◽ Pengyu Ni ◽ Shaoqiang Zhang ◽ Guojun Li ◽ Zhengchang Su

Keyword(s): Transcription Factors ◽ Gibbs Sampler ◽ Binding Sites ◽ Motif Discovery ◽ Source Code ◽ Motif Finding ◽ Supplementary Information ◽ Highly Efficient ◽ Motif Finder ◽ Motif Finding Algorithm

Motivation: The availability of numerous ChIP-seq datasets for transcription factors (TFs) has provided an unprecedented opportunity to identify all TF binding sites in genomes. However, progress has been hindered by the lack of a highly efficient and accurate tool that finds not only the target motifs but also cooperative motifs in very large datasets. Results: We herein present an ultrafast and accurate motif-finding algorithm, ProSampler, based on a novel numeration method and Gibbs sampler. ProSampler runs orders of magnitude faster than the fastest existing tools while often more accurately identifying motifs of both the target TFs and cooperators. Thus, ProSampler can greatly facilitate efforts to identify the entire cis-regulatory code in genomes. Availability and implementation: Source code and binaries are freely available for download at https://github.com/zhengchangsulab/prosampler. It was implemented in C++ and is supported on Linux, macOS and MS Windows platforms. Supplementary information: Supplementary materials are available at Bioinformatics online.


An efficient motif finding algorithm for large DNA data sets

2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽ 10.1109/bibm.2014.6999191 ◽ 2014 ◽ Cited By ~ 4

Author(s): Qiang Yu ◽ Hongwei Huo ◽ Xiaoyang Chen ◽ Haitao Guo ◽ Jeffrey Scott Vitter ◽ ...

Keyword(s): Motif Finding ◽ Data Sets ◽ Motif Finding Algorithm


A Motif Finding Algorithm Based on Color Coding Technology

Journal of Software ◽ 10.1360/jos181298 ◽ 2007 ◽ Vol 18 (4) ◽ pp. 1298 ◽ Cited By ~ 1

Author(s): Jian-Xin WANG

Keyword(s): Motif Finding ◽ Color Coding ◽ Motif Finding Algorithm


RANGI: A Fast List-Colored Graph Motif Finding Algorithm

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽ 10.1109/tcbb.2012.167 ◽ 2013 ◽ Vol 10 (2) ◽ pp. 504-513 ◽ Cited By ~ 5

Author(s): Ali Gholami Rudi ◽ Saeed Shahrivari ◽ Saeed Jalili ◽ Zahra Razaghi Moghadam Kashani

Keyword(s): Motif Finding ◽ Colored Graph ◽ Graph Motif ◽ Motif Finding Algorithm


A deterministic motif finding algorithm with application to the human genome

Bioinformatics ◽ 10.1093/bioinformatics/btl037 ◽ 2006 ◽ Vol 22 (9) ◽ pp. 1047-1054 ◽ Cited By ~ 11

Author(s): Lawrence S Hon ◽ Ajay N Jain

Keyword(s): Human Genome ◽ Motif Finding ◽ Motif Finding Algorithm


On the unsupervised analysis of domain-specific Chinese texts

Proceedings of the National Academy of Sciences ◽ 10.1073/pnas.1516510113 ◽ 2016 ◽ Vol 113 (22) ◽ pp. 6154-6159 ◽ Cited By ~ 5

Author(s): Ke Deng ◽ Peter K. Bol ◽ Kate J. Li ◽ Jun S. Liu

Keyword(s): Chinese Text ◽ Context Analysis ◽ Text Data ◽ Training Corpus ◽ Domain Specific ◽ Association Pattern ◽ Supervised Segmentation ◽ Chinese Texts ◽ Chinese Text Mining ◽ Better Than

With the growing availability of digitized text data both publicly and privately, there is a great need for effective computational tools to automatically extract information from texts. Because the Chinese language differs most significantly from alphabet-based languages in not specifying word boundaries, most existing Chinese text-mining methods require a prespecified vocabulary and/or a large relevant training corpus, which may not be available in some applications. We introduce an unsupervised method, top-down word discovery and segmentation (TopWORDS), for simultaneously discovering and segmenting words and phrases from large volumes of unstructured Chinese texts, and propose ways to order discovered words and conduct higher-level context analyses. TopWORDS is particularly useful for mining online and domain-specific texts where the underlying vocabulary is unknown or the texts of interest differ significantly from available training corpora. When outputs from TopWORDS are fed into context analysis tools such as topic modeling, word embedding, and association pattern finding, the results are as good as or better than those obtained using outputs of a supervised segmentation method.


A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets

BioMed Research International ◽ 10.1155/2015/218068 ◽ 2015 ◽ Vol 2015 ◽ pp. 1-10 ◽ Cited By ~ 4

Author(s): Yipu Zhang ◽ Ping Wang

Keyword(s): High Throughput ◽ Motif Discovery ◽ Large Scale ◽ High Throughput Sequencing ◽ Es Cells ◽ Motif Finding ◽ Data Sets ◽ Data Set ◽ Binding Motifs ◽ Motif Finding Algorithm

ChIP-seq, a new high-throughput technique that couples chromatin immunoprecipitation with high-throughput sequencing, has extended the identification of transcription factor binding locations to genome-wide regions. However, most existing motif discovery algorithms are time-consuming and limited in their ability to identify binding motifs in ChIP-seq data, which are typically large in scale. To improve efficiency, we propose a fast cluster motif finding algorithm, named FCmotif, to identify (l, d) motifs in large-scale ChIP-seq data sets. It is inspired by the emerging substrings mining strategy: it finds the enriched substrings and then searches the neighborhood instances to construct PWMs and cluster motifs of different lengths. FCmotif does not follow the OOPS model constraint and can find long motifs. The effectiveness of the proposed algorithm has been demonstrated by experiments on ChIP-seq data sets from mouse ES cells. Detection of the real binding motifs and processing of the full-size data of several megabytes finished in a few minutes. The experimental results show that FCmotif is well suited to (l, d) motif finding in ChIP-seq data; it also demonstrates better performance than other widely used algorithms such as MEME, Weeder, ChIPMunk, and DREME.
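
To make the (l, d) motif terminology concrete: an instance of an (l, d) motif is a length-l substring that differs from the motif in at most d positions. The sketch below scans sequences for such instances around a given seed substring and summarizes them as a per-position frequency matrix, the usual starting point for a position weight matrix (PWM). It is a simplified illustration under stated assumptions, not the FCmotif implementation: the function names, the toy sequences, and the choice to pass the seed in by hand (rather than mining enriched substrings) are all hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_instances(seed, sequences, d):
    """Collect every window within Hamming distance d of the seed."""
    l = len(seed)
    hits = []
    for seq in sequences:
        for i in range(len(seq) - l + 1):
            window = seq[i:i + l]
            if hamming(seed, window) <= d:
                hits.append(window)
    return hits


def position_frequency_matrix(instances, alphabet="ACGT"):
    """Return one dict per motif position mapping each base to its frequency."""
    n = len(instances)
    matrix = []
    for pos in range(len(instances[0])):
        counts = Counter(inst[pos] for inst in instances)
        matrix.append({base: counts[base] / n for base in alphabet})
    return matrix


# Toy data (hypothetical): a (4, 1) motif search around the seed "ACGT".
seqs = ["TTACGTAA", "GGACGAAC", "CCTCGTTT"]
hits = find_instances("ACGT", seqs, d=1)
pfm = position_frequency_matrix(hits)
print(hits)       # ['ACGT', 'ACGA', 'TCGT']
print(pfm[1])     # {'A': 0.0, 'C': 1.0, 'G': 0.0, 'T': 0.0}
```

Because every window is collected, a sequence may contribute zero, one, or several instances, which is the kind of flexibility that dropping a one-occurrence-per-sequence assumption allows. This brute-force scan costs time proportional to the total sequence length times l per seed, so a practical tool needs a faster way to pick seeds and locate neighbors.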


Comparison of result differences in multiple implementations of a stochastic motif finding algorithm

2015 E-Health and Bioengineering Conference (EHB) ◽ 10.1109/ehb.2015.7391409 ◽ 2015

Author(s): Mihai Isaroiu ◽ Luca Dan Serbanati

Keyword(s): Motif Finding ◽ Motif Finding Algorithm
