An event based topic learning pipeline for neuroimaging literature mining

Abstract: Neuroimaging text mining extracts knowledge from neuroimaging texts and has received widespread attention. Topic learning is an important research focus of neuroimaging text mining. However, current research on neuroimaging topic learning has mainly used traditional probabilistic topic models to extract topics from the literature, and these models cannot obtain high-quality neuroimaging topics. Existing topic learning methods also cannot meet the requirements of topic learning oriented to full-text neuroimaging literature. In this paper, three types of neuroimaging research topic events are defined to describe the process and results of neuroimaging research. An event-based topic learning pipeline, called neuroimaging Event-BTM, is proposed to realize topic learning from full-text neuroimaging literature. Experimental results on the PLoS One data set show that the accuracy and completeness of the proposed method are significantly better than those of the main existing topic learning methods.


An Event Based Topic Learning Pipeline for Neuroimaging Literature Mining

10.21203/rs.3.rs-95392/v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Lihong Chen ◽

Jianzhuo Yan ◽

Jianhui Chen ◽

Ying Sheng ◽

Zhe Xu ◽

...

Keyword(s):

Text Mining ◽

Full Text ◽

Research Topic ◽

Literature Mining ◽

Important Research ◽

Research Focus ◽

Learning Methods ◽

Data Set ◽

Event Based ◽

Better Than

Abstract: Neuroimaging text mining extracts knowledge from neuroimaging text and has received widespread attention. Topic learning is an important research focus of neuroimaging text mining. However, current research on neuroimaging topic learning mainly uses traditional probabilistic topic models to extract topics from the literature, and these models cannot obtain high-quality neuroimaging topics. Existing topic learning methods cannot meet the requirements of topic learning oriented to full-text neuroimaging literature. In this paper, three types of neuroimaging research topic events are defined to describe the process and results of neuroimaging research. An event-based topic learning pipeline, called neuroimaging Event-BTM, is proposed to realize knowledge extraction from full-text neuroimaging literature. Experimental results on the PLoS One data set show that the accuracy and completeness of the proposed method are significantly better than those of the main existing topic learning methods.


Negotiating a Text Mining License for Faculty Researchers

Information Technology and Libraries ◽

10.6017/ital.v33i3.5485 ◽

2014 ◽

Vol 33 (3) ◽

pp. 5 ◽

Cited By ~ 6

Author(s):

Leslie A. Williams ◽

Lynne M Fox ◽

Christophe Roeder ◽

Lawrence Hunter

Keyword(s):

Text Mining ◽

Language Processing ◽

Full Text ◽

Publishing Industry ◽

Journal Articles ◽

Data Set ◽

Pubmed Central ◽

Extensible Markup ◽

The Right

This case study examines strategies used to leverage the library’s existing journal licenses to obtain a large collection of full-text journal articles in extensible markup language (XML) format; the right to text mine the collection; and the right to use the collection and the data mined from it for grant-funded research to develop biomedical natural language processing (BNLP) tools. Researchers first attempted to obtain content directly from PubMed Central (PMC). This attempt failed due to limits on the use of content in PMC. Next, researchers and their library liaison attempted to obtain content from contacts in the technical divisions of the publishing industry. This resulted in an incomplete research data set. Then researchers, the library liaison, and the acquisitions librarian collaborated with the sales and technical staff of a major science, technology, engineering, and medical (STEM) publisher to successfully create a method for obtaining XML content as an extension of the library’s typical acquisition process for electronic resources. Our experience led us to realize that text mining rights for full-text articles in XML format should routinely be included in the negotiation of the library’s licenses.


Mining Indirect Temporal Sequential Patterns in Large Transaction Databases

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.385-386.1362 ◽

2013 ◽

Vol 385-386 ◽

pp. 1362-1365

Author(s):

Wei Min Ouyang ◽

Qin Hua Huang

Keyword(s):

Data Mining ◽

Knowledge Discovery ◽

Research Topic ◽

Sequential Pattern ◽

The Other ◽

Sequential Patterns ◽

Important Research ◽

Data Set ◽

Important Research Topic ◽

The One

Sequential pattern mining is an important research topic in data mining and knowledge discovery. Traditional algorithms for mining sequential patterns focus on frequent sequences and consider neither infrequent sequences nor the lifespan of each sequence. On the one hand, some infrequent patterns can provide very useful insight into the data set; on the other hand, if the lifespan of each sequence is not taken into account, some discovered patterns may be invalid and some useful patterns may go undiscovered. In this paper, we therefore extend sequential patterns to indirect temporal sequential patterns and put forward an algorithm to discover them.


The Investigation of Different Loss Functions with Capsule Networks for Speech Emotion Recognition

Scientific Programming ◽

10.1155/2021/9916915 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Anfernee Joan B. Ng ◽

Kun-Hong Liu

Keyword(s):

Emotion Recognition ◽

Spatial Information ◽

Image Features ◽

Research Topic ◽

Loss Functions ◽

Speech Emotion Recognition ◽

Important Research ◽

Important Research Topic ◽

The Common ◽

Better Than

Speech emotion recognition (SER) is an important research topic. Image features such as spectrograms are a common way of extracting information from speech. In the area of image recognition, a relatively novel type of network called the capsule network has shown promising results. This study uses capsule networks to encode spatial information from spectrograms and analyses their performance when paired with different loss functions. Experiments comparing the capsule network with models from previous works show that the capsule network outperforms them.


Development and validation of prognosis model of mortality risk in patients with COVID-19

Epidemiology and Infection ◽

10.1017/s0950268820001727 ◽

2020 ◽

Vol 148 ◽

Cited By ~ 2

Author(s):

Xuedi Ma ◽

Michael Ng ◽

Shuang Xu ◽

Zhouming Xu ◽

Hui Qiu ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Clinical Features ◽

Mortality Risk ◽

Operating Characteristics ◽

Multivariate Logistic Regression ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods ◽

Better Than

Abstract: This study aimed to identify clinical features for prognosing mortality risk in patients with coronavirus disease 2019 (COVID-19) using machine-learning methods. A retrospective study of inpatients with COVID-19 admitted from 15 January to 15 March 2020 in Wuhan is reported. Data on symptoms, comorbidities, demographics, vital signs, CT scan results and laboratory test results on admission were collected. Machine-learning methods (Random Forest and XGBoost) were used to rank clinical features for mortality risk. Multivariate logistic regression models were applied to identify clinical features with statistical significance. Based on 500 bootstrapped samples, the predictors of mortality were lactate dehydrogenase (LDH), C-reactive protein (CRP) and age. A multivariate logistic regression model was formed to predict mortality in 292 in-sample patients, with an area under the receiver operating characteristic curve (AUROC) of 0.9521, which was better than CURB-65 (AUROC of 0.8501) and the machine-learning-based model (AUROC of 0.4530). An out-of-sample data set of 13 patients was further tested and showed that our model (AUROC of 0.6061) was also better than CURB-65 (AUROC of 0.4608) and the machine-learning-based model (AUROC of 0.2292). LDH, CRP and age can be used to identify severe patients with COVID-19 on hospital admission.
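
The AUROC figures quoted in this abstract can be read as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. The sketch below computes that quantity by direct pair counting. The function name `auroc` and the scores and labels are hypothetical and purely illustrative; they are not the study's data or code.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and outcomes (1 = died, 0 = survived).
example_scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
example_labels = [1, 1, 0, 1, 0, 0]
example_auroc = auroc(example_scores, example_labels)
print(round(example_auroc, 4))
```

An AUROC of 0.5 corresponds to random ranking, so values below 0.5, such as some of those reported above, mean the scores rank patients worse than chance on that sample.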


Evaluation for estimating of the PDF and the CDF of Generalized Inverted Exponential Distribution with Application in Industry

Advances in Mathematics: Scientific Journal ◽

10.37418/amsj.9.1.39 ◽

2020 ◽

pp. 507-522

Author(s):

Parisa Torkaman

Keyword(s):

Least Squares ◽

Exponential Distribution ◽

Mean Squared Error ◽

Weighted Least Squares ◽

Real Data ◽

Minimum Variance ◽

Cumulative Distribution ◽

Estimation Methods ◽

Data Set ◽

Better Than

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, estimation of the probability density function and the cumulative distribution function of this distribution is considered using five different estimators: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC). The performance of these estimation procedures is compared by numerical simulation on the basis of the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others and that, when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than LS, WLS and PC. Finally, results for a real data set are analyzed.


A Computational Method for the Identification of Endolysins and Autolysins

Protein and Peptide Letters ◽

10.2174/0929866526666191002104735 ◽

2020 ◽

Vol 27 (4) ◽

pp. 329-336 ◽

Cited By ~ 1

Author(s):

Lei Xu ◽

Guangmin Liang ◽

Baowen Chen ◽

Xu Tan ◽

Huaikun Xiang ◽

...

Keyword(s):

Support Vector Machine ◽

Cell Wall ◽

Experimental Results ◽

Computational Method ◽

Lytic Enzyme ◽

Support Vector ◽

Lytic Enzymes ◽

Data Set ◽

Optimal Feature ◽

Better Than

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria, whereas with antibiotics the problem of drug resistance is becoming more serious. The study of cell wall lytic enzymes therefore aims at finding an efficient way of treating bacterial infections, and cell lytic enzymes are a good choice for this purpose. Cell lytic enzymes include endolysins and autolysins, which differ in the purpose for which the cell wall is broken. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: Our motivation in this article is to predict the type of cell lytic enzyme. Because cell lytic enzymes help kill bacteria, studying their type is meaningful. However, detecting the type of a cell lytic enzyme by experimental methods is time consuming. We therefore propose an efficient computational method for predicting the type of cell lytic enzyme. Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Each protein is then represented by its tripeptide composition. The features with larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors, and the learned classifier is used to predict the type of cell lytic enzyme. Results: With the proposed method, the experimental results show that the overall accuracy reaches 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% in relative terms ((97.06-92.9)/92.9). The performance of the proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptide optimal feature set. Conclusion: The paper proposes an efficient method for identifying endolysins and autolysins, using a support vector machine to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy of identifying the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
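
As a rough illustration of the feature representation named in this abstract, the sketch below computes a tripeptide composition vector under the common definition: the relative frequency of each of the 20^3 = 8000 possible three-residue windows. This definition, the function names and the toy sequence are assumptions; the abstract does not give the authors' exact feature definition, and their confidence-based feature selection and SVM training are not reproduced here. The second helper reproduces the relative-improvement arithmetic quoted in the Results.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]  # 8000 entries


def tripeptide_composition(sequence):
    """Relative frequency of every overlapping 3-residue window in `sequence`.

    Windows containing non-standard residue letters are skipped.
    """
    windows = (sequence[i:i + 3] for i in range(len(sequence) - 2))
    counts = Counter(w for w in windows if all(c in AMINO_ACIDS for c in w))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [counts[t] / total for t in TRIPEPTIDES]


def relative_gain(new_accuracy, old_accuracy):
    """Relative improvement in percent, e.g. (97.06 - 92.9) / 92.9 * 100."""
    return (new_accuracy - old_accuracy) / old_accuracy * 100.0


# Toy sequence, not taken from the paper's data set.
toy_vector = tripeptide_composition("ACDACD")
print(len(toy_vector), toy_vector[TRIPEPTIDES.index("ACD")])
print(round(relative_gain(97.06, 92.9), 2))
```

Because the raw vector has 8000 dimensions and the data set has only 68 proteins, some form of feature selection, such as the 44-feature subset reported above, is needed before training a classifier.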


Large-scale literature mining to assess the relation between anti-cancer drugs and cancer types

Journal of Translational Medicine ◽

10.1186/s12967-021-02941-z ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Chris Bauer ◽

Ralf Herwig ◽

Matthias Lienhard ◽

Paul Prasse ◽

Tobias Scheffer ◽

...

Keyword(s):

Text Mining ◽

Knowledge Base ◽

Survival Data ◽

Scientific Literature ◽

Entity Recognition ◽

Literature Mining ◽

Cancer Drugs ◽

Classical Text ◽

Anti Cancer ◽

Cancer Types

Abstract. Background: There is a huge body of scientific literature describing the relation between tumor types and anti-cancer drugs. The vast amount of scientific literature makes it impossible for researchers and physicians to extract all relevant information manually. Methods: To cope with the large amount of literature, we applied an automated text mining approach to assess the relations between the 30 most frequent cancer types and 270 anti-cancer drugs. We applied two different approaches: classical text mining based on named entity recognition and an AI-based approach employing word embeddings. The consistency of the literature mining results was validated with 3 independent methods: first, using data from FDA approvals; second, using experimentally measured IC-50 cell line data; and third, using clinical patient survival data. Results: We demonstrated that automated text mining was able to successfully assess the relation between cancer types and anti-cancer drugs. All validation methods showed good correspondence between the results from literature mining and the independent confirmatory approaches. The relations between the most frequent cancer types and the drugs employed for their treatment were visualized in a large heatmap. All results are accessible in an interactive web-based knowledge base using the following link: https://knowledgebase.microdiscovery.de/heatmap. Conclusions: Our approach is able to assess the relations between compounds and cancer types in an automated manner. Both cancer types and compounds could be grouped into different clusters. Researchers can use the interactive knowledge base to inspect the presented results and follow their own research questions, for example the identification of novel indication areas for known drugs.


Applying Artificial Neural Networks. I. Estimating Nicotine in Tobacco from near Infrared Data

Journal of Near Infrared Spectroscopy ◽

10.1255/jnirs.64 ◽

1995 ◽

Vol 3 (3) ◽

pp. 133-142 ◽

Cited By ~ 10

Author(s):

M. Hana ◽

W.F. McClure ◽

T.B. Whitaker ◽

M. White ◽

D.R. Bahler

Keyword(s):

Linear Regression ◽

Regression Model ◽

Linear Regression Model ◽

Near Infrared ◽

Back Propagation ◽

Linear Network ◽

Data Set ◽

Input Layer ◽

Propagation Network ◽

Better Than

Two artificial neural network models were used to estimate the nicotine in tobacco: (i) a back-propagation network and (ii) a linear network. The back-propagation network consisted of an input layer, an output layer and one hidden layer. The linear network consisted of an input layer and an output layer. Both networks used the generalised delta rule for learning. The performance of both networks was compared with that of the multiple linear regression (MLR) calibration method. The nicotine content in tobacco samples was estimated for two different data sets. Data set A contained 110 near infrared (NIR) spectra, each consisting of reflected energy at eight wavelengths. Data set B consisted of 200 NIR spectra, each with 840 spectral data points. The fast Fourier transform was applied to data set B in order to compress each spectrum into 13 Fourier coefficients. For data set A, the linear regression model gave the best results, followed by the back-propagation network and then the linear network. The true performance of the linear regression model was better than that of the back-propagation and linear networks by 14.0% and 18.1%, respectively. For data set B, the back-propagation network gave the best result, followed by MLR and the linear network, which gave almost the same results. The true performance of the back-propagation network model was better than that of the MLR and linear network by 35.14%.
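
The compression step described for data set B can be sketched as follows. The abstract does not say which 13 Fourier coefficients were kept or how they were encoded, so this example assumes the 13 lowest-frequency complex coefficients. For self-containment it uses a naive discrete Fourier transform, which yields the same coefficients a fast Fourier transform would, only more slowly. The function name and the synthetic 840-point spectrum are hypothetical.

```python
import cmath
import math


def first_dft_coefficients(signal, k=13):
    """Return the k lowest-frequency DFT coefficients of `signal`.

    Naive O(n * k) evaluation of X[f] = sum_t x[t] * exp(-2*pi*i*f*t/n).
    """
    n = len(signal)
    coeffs = []
    for f in range(k):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(-2j * math.pi * f * t / n)
        coeffs.append(acc)
    return coeffs


# Synthetic smooth "spectrum" with 840 points (not real NIR data):
# a constant baseline plus one slow cosine component at frequency index 3.
n_points = 840
spectrum = [1.0 + 0.5 * math.cos(2 * math.pi * 3 * t / n_points)
            for t in range(n_points)]
compressed = first_dft_coefficients(spectrum, k=13)
print(len(compressed), round(abs(compressed[0]), 3), round(abs(compressed[3]), 3))
```

Keeping only low-frequency coefficients works well when spectra are smooth, which is the usual motivation for reducing 840 data points to a much smaller network input.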


Conformal Optimization Algorithm for Undevelopable Surfaces and its Application

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.602-605.3570 ◽

2014 ◽

Vol 602-605 ◽

pp. 3570-3574

Author(s):

Zhen Hua Luo ◽

Fen Jiang

Keyword(s):

Approximation Algorithms ◽

Optimization Algorithm ◽

Manufacturing Process ◽

Research Topic ◽

Important Research ◽

Computer Aided Geometric Design ◽

Industrial Manufacturing ◽

Important Research Topic ◽

Computer Aided ◽

Planar Materials

In industrial manufacturing, most kinds of surfaces are produced from planar materials, but undevelopable surfaces are difficult to develop onto the plane. Approximation algorithms for developing an undevelopable surface are an important research topic in Computer Aided Geometric Design (CAGD). In this paper, we propose a new optimization algorithm based on approximation algorithms. We guarantee that the deformation vector undergoes minimal change during the development process. Some numerical examples are given to illustrate that our method is effective.
