An event based topic learning pipeline for neuroimaging literature mining

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Lihong Chen ◽  
Jianzhuo Yan ◽  
Jianhui Chen ◽  
Ying Sheng ◽  
Zhe Xu ◽  
...  

Abstract Neuroimaging text mining extracts knowledge from neuroimaging texts and has received widespread attention. Topic learning is an important research focus of neuroimaging text mining. However, current neuroimaging topic learning research mainly uses traditional probabilistic topic models to extract topics from literature and cannot obtain high-quality neuroimaging topics. Existing topic learning methods also cannot meet the requirements of topic learning oriented to full-text neuroimaging literature. In this paper, three types of neuroimaging research topic events are defined to describe the process and results of neuroimaging research. An event-based topic learning pipeline, called neuroimaging Event-BTM, is proposed to realize topic learning from full-text neuroimaging literature. Experimental results on the PLoS One data set show that the accuracy and completeness of the proposed method are significantly better than those of the existing mainstream topic learning methods.


2014 ◽  
Vol 33 (3) ◽  
pp. 5 ◽  
Author(s):  
Leslie A. Williams ◽  
Lynne M Fox ◽  
Christophe Roeder ◽  
Lawrence Hunter

This case study examines strategies used to leverage the library’s existing journal licenses to obtain a large collection of full-text journal articles in extensible markup language (XML) format; the right to text mine the collection; and the right to use the collection and the data mined from it for grant-funded research to develop biomedical natural language processing (BNLP) tools. Researchers first attempted to obtain content directly from PubMed Central (PMC). This attempt failed due to limits on the use of content in PMC. Next, researchers and their library liaison attempted to obtain content from contacts in the technical divisions of the publishing industry. This resulted in an incomplete research data set. Then researchers, the library liaison, and the acquisitions librarian collaborated with the sales and technical staff of a major science, technology, engineering, and medical (STEM) publisher to successfully create a method for obtaining XML content as an extension of the library’s typical acquisition process for electronic resources. Our experience led us to realize that text mining rights for full-text articles in XML format should routinely be included in the negotiation of the library’s licenses.


2013 ◽  
Vol 385-386 ◽  
pp. 1362-1365
Author(s):  
Wei Min Ouyang ◽  
Qin Hua Huang

Sequential pattern mining is an important research topic in data mining and knowledge discovery. Traditional algorithms for mining sequential patterns focus on frequent sequences and consider neither infrequent sequences nor the lifespan of each sequence. On the one hand, some infrequent patterns can provide very useful insight into the data set. On the other hand, if the lifespan of each sequence is not taken into account, some discovered patterns may be invalid and some useful patterns may not be discovered. We therefore extend sequential patterns to indirect temporal sequential patterns and put forward an algorithm to discover them.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Anfernee Joan B. Ng ◽  
Kun-Hong Liu

Speech emotion recognition (SER) is an important research topic. Image features such as spectrograms are a common way of extracting information from speech. In the area of image recognition, a relatively novel type of network called the capsule network has shown promising results. This study aims to use capsule networks to encode spatial information from spectrograms and to analyse their performance when paired with different loss functions. Experiments comparing the capsule network with models from previous works show that the capsule network performs better.


2020 ◽  
Vol 148 ◽  
Author(s):  
Xuedi Ma ◽  
Michael Ng ◽  
Shuang Xu ◽  
Zhouming Xu ◽  
Hui Qiu ◽  
...  

Abstract This study aimed to identify clinical features for prognosing mortality risk using machine-learning methods in patients with coronavirus disease 2019 (COVID-19). A retrospective study of inpatients with COVID-19 admitted from 15 January to 15 March 2020 in Wuhan is reported. Data on symptoms, comorbidities, demographics, vital signs, CT scan results and laboratory test results on admission were collected. Machine-learning methods (Random Forest and XGBoost) were used to rank clinical features for mortality risk. Multivariate logistic regression models were applied to identify clinical features with statistical significance. The predictors of mortality were lactate dehydrogenase (LDH), C-reactive protein (CRP) and age, based on 500 bootstrapped samples. A multivariate logistic regression model was formed to predict mortality in 292 in-sample patients with an area under the receiver operating characteristic curve (AUROC) of 0.9521, which was better than CURB-65 (AUROC of 0.8501) and the machine-learning-based model (AUROC of 0.4530). An out-of-sample data set of 13 patients was further tested, showing that our model (AUROC of 0.6061) was also better than CURB-65 (AUROC of 0.4608) and the machine-learning-based model (AUROC of 0.2292). LDH, CRP and age can be used to identify severe patients with COVID-19 on hospital admission.
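The AUROC values quoted above can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The following is a minimal sketch of that pairwise definition, not the study's own code; the function name and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: 0/1 outcomes (1 = positive class, e.g. death in this illustration)
    scores: predicted risk, where a higher score means "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not from the study.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.1]
toy_auc = auroc(toy_labels, toy_scores)  # 5 of 6 pairs are ranked correctly
```

Under this definition a value of 0.5 corresponds to chance-level ranking, so values below 0.5 indicate a ranking that is worse than chance on that sample. With only 13 out-of-sample patients, the number of positive/negative pairs is small and each pair moves the estimate noticeably.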


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, estimation of the probability density function and the cumulative distribution function of this distribution is considered with five different estimation methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulation, based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others and that, when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than LS, WLS and PC. Finally, results for a real data set are analyzed.
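The abstract does not state the parameterization it uses. As a minimal sketch, the code below assumes the common two-parameter form with shape alpha and scale lam, whose cumulative distribution function is F(x) = 1 - (1 - exp(-lam/x))^alpha for x > 0. It also includes the MSE criterion used to compare estimators. All function and parameter names are illustrative.

```python
import math


def gied_cdf(x, alpha, lam):
    """CDF of the generalized inverted exponential distribution (assumed form)."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 - math.exp(-lam / x)) ** alpha


def gied_pdf(x, alpha, lam):
    """PDF obtained by differentiating gied_cdf with respect to x."""
    if x <= 0:
        return 0.0
    e = math.exp(-lam / x)
    return (alpha * lam / x ** 2) * e * (1.0 - e) ** (alpha - 1.0)


def mean_squared_error(estimates, true_value):
    """Average squared deviation of repeated estimates from the true value."""
    return sum((est - true_value) ** 2 for est in estimates) / len(estimates)
```

In a simulation study of the kind described, one would draw many samples, compute each estimator of the PDF or CDF at a fixed point, and compare the resulting `mean_squared_error` values across estimators.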


2020 ◽  
Vol 27 (4) ◽  
pp. 329-336 ◽  
Author(s):  
Lei Xu ◽  
Guangmin Liang ◽  
Baowen Chen ◽  
Xu Tan ◽  
Huaikun Xiang ◽  
...  

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria. The study of cell wall lytic enzymes therefore aims at finding an efficient way to cure bacterial infections. With the use of antibiotics, the problem of drug resistance becomes more serious, so cell lytic enzymes are a good choice for curing bacterial infections. Cell lytic enzymes include endolysins and autolysins, and the difference between them is the purpose of breaking the cell wall. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: In this article, our motivation is to predict the type of cell lytic enzyme. Cell lytic enzymes help kill bacteria, so it is meaningful to study their type. However, detecting the type of a cell lytic enzyme by experimental methods is time consuming. Thus, an efficient computational method for predicting the type of cell lytic enzyme is proposed in our work. Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Then each protein is represented by its tripeptide composition. The features with larger confidence degree are selected. Finally, a classifier is trained on the labeled vectors using a support vector machine. The learned classifier is used to predict the type of cell lytic enzyme. Results: The experimental results show that the overall accuracy reaches 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% in relative terms ((97.06 − 92.9)/92.9 ≈ 4.48%). The performance of the proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18 percentage points (94.12 − 76.2 = 17.92) when using the tripeptide optimal feature set. Conclusion: The paper proposes an efficient method for identifying endolysins and autolysins. A support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy of identifying the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
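The tripeptide composition representation mentioned in the Method section can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the 20 standard amino acids, overlapping windows of length three, and normalization by the number of valid windows. The function name and the example sequence are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 20^3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]
INDEX = {tripeptide: i for i, tripeptide in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return an 8000-dimensional vector of tripeptide frequencies."""
    seq = sequence.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip windows containing non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


# Hypothetical 10-residue sequence: 8 overlapping windows, each seen once.
toy_vector = tripeptide_composition("MKTAYIAKQR")
```

With 8000 candidate features and only 68 proteins in the data set, a feature selection step such as the one that yields the 44 selected features is needed before a classifier is trained.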


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Chris Bauer ◽  
Ralf Herwig ◽  
Matthias Lienhard ◽  
Paul Prasse ◽  
Tobias Scheffer ◽  
...  

Abstract Background: There is a huge body of scientific literature describing the relation between tumor types and anti-cancer drugs. The vast amount of scientific literature makes it impossible for researchers and physicians to extract all relevant information manually. Methods: In order to cope with the large amount of literature, we applied an automated text mining approach to assess the relations between the 30 most frequent cancer types and 270 anti-cancer drugs. We applied two different approaches: classical text mining based on named entity recognition and an AI-based approach employing word embeddings. The consistency of the literature mining results was validated with 3 independent methods: first, using data from FDA approvals; second, using experimentally measured IC-50 cell line data; and third, using clinical patient survival data. Results: We demonstrated that automated text mining was able to successfully assess the relation between cancer types and anti-cancer drugs. All validation methods showed good correspondence between the results from literature mining and the independent confirmatory approaches. The relations between the most frequent cancer types and the drugs employed for their treatment were visualized in a large heatmap. All results are accessible in an interactive web-based knowledge base using the following link: https://knowledgebase.microdiscovery.de/heatmap. Conclusions: Our approach is able to assess the relations between compounds and cancer types in an automated manner. Both cancer types and compounds could be grouped into different clusters. Researchers can use the interactive knowledge base to inspect the presented results and follow their own research questions, for example the identification of novel indication areas for known drugs.
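The abstract does not describe how the entity-recognition approach scores a cancer-drug relation. One simple way to illustrate the general idea is dictionary matching followed by sentence-level co-occurrence counting, sketched below. The choice of co-occurrence counting, the function names, the term lists and the example sentences are all assumptions made for illustration and are not taken from the paper.

```python
import re
from collections import Counter


def find_terms(text, terms):
    """Return the dictionary terms that occur in text as whole words."""
    lowered = text.lower()
    return {
        term for term in terms
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    }


def cooccurrence_counts(sentences, cancer_terms, drug_terms):
    """Count sentences in which a cancer term and a drug term appear together."""
    counts = Counter()
    for sentence in sentences:
        cancers = find_terms(sentence, cancer_terms)
        drugs = find_terms(sentence, drug_terms)
        for cancer in cancers:
            for drug in drugs:
                counts[(cancer, drug)] += 1
    return counts


# Made-up example sentences and term lists, for illustration only.
toy_sentences = [
    "Tamoxifen was given to patients with breast cancer.",
    "Breast cancer cell lines were treated with tamoxifen.",
    "Imatinib was evaluated in chronic myeloid leukemia.",
]
toy_counts = cooccurrence_counts(
    toy_sentences,
    cancer_terms=["breast cancer", "chronic myeloid leukemia"],
    drug_terms=["tamoxifen", "imatinib"],
)
```

A matrix of such counts, with cancer types as rows and drugs as columns, is the kind of structure that could be rendered as a heatmap.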


1995 ◽  
Vol 3 (3) ◽  
pp. 133-142 ◽  
Author(s):  
M. Hana ◽  
W.F. McClure ◽  
T.B. Whitaker ◽  
M. White ◽  
D.R. Bahler

Two artificial neural network models were used to estimate the nicotine in tobacco: (i) a back-propagation network and (ii) a linear network. The back-propagation network consisted of an input layer, an output layer and one hidden layer. The linear network consisted of an input layer and an output layer. Both networks used the generalised delta rule for learning. The performance of both networks was compared with the multiple linear regression (MLR) method of calibration. The nicotine content in tobacco samples was estimated for two different data sets. Data set A contained 110 near infrared (NIR) spectra, each consisting of reflected energy at eight wavelengths. Data set B consisted of 200 NIR spectra, each having 840 spectral data points. The fast Fourier transform was applied to data set B in order to compress each spectrum into 13 Fourier coefficients. For data set A, the linear regression model gave the best results, followed by the back-propagation network and then the linear network. The true performance of the linear regression model was better than that of the back-propagation and linear networks by 14.0% and 18.1%, respectively. For data set B, the back-propagation network gave the best result, followed by MLR and the linear network. The linear network and MLR models gave almost the same results. The true performance of the back-propagation network model was better than that of the MLR and linear network models by 35.14%.
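The compression step for data set B can be illustrated with a small sketch. The abstract does not say which 13 coefficients were kept or how complex values were handled, so this example assumes the 13 lowest-frequency coefficients. It uses a direct discrete Fourier transform instead of an FFT, which gives the same coefficient values for this purpose. The function name and the synthetic spectrum are hypothetical.

```python
import cmath
import math


def first_dft_coefficients(signal, k_max):
    """Return DFT coefficients X[0] .. X[k_max - 1] of a real-valued signal."""
    n = len(signal)
    coeffs = []
    for k in range(k_max):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(-2j * cmath.pi * k * t / n)
        coeffs.append(acc)
    return coeffs


# Synthetic smooth "spectrum" with 840 points: a constant plus one slow cosine.
n_points = 840
toy_spectrum = [
    1.0 + 0.5 * math.cos(2 * math.pi * 3 * t / n_points) for t in range(n_points)
]
compressed = first_dft_coefficients(toy_spectrum, 13)  # 840 values -> 13 coefficients
```

For a smooth spectrum most of the signal energy sits in the low-frequency coefficients, which is why a short vector of coefficients can stand in for hundreds of raw data points as network input.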


2014 ◽  
Vol 602-605 ◽  
pp. 3570-3574
Author(s):  
Zhen Hua Luo ◽  
Fen Jiang

In industrial manufacturing, most kinds of surfaces are made from planar materials, but undevelopable surfaces are difficult to develop onto a plane. Approximation algorithms for developing an undevelopable surface are an important research topic in Computer Aided Geometric Design (CAGD). In this paper, we propose a new approximation algorithm based on an optimization algorithm. We ensure that the deformation vector changes as little as possible during the developing process. Some numerical examples are given to illustrate that our method is effective.

