Correlation Analysis and Text Classification of Chemical Accident Cases Based on Word Embedding

Author(s):  
Sifeng Jing ◽  
Xiwei Liu ◽  
Xiaoyan Gong ◽  
Ying Tang ◽  
Gang Xiong ◽  
...  
2020 ◽  
Author(s):  
Luiz Fernando Spillere de Souza ◽  
Alexandre Leopoldo Gonçalves

Text classification aims to extract knowledge from patterns in unstructured text. Word embedding is a representation technique in which words with similar meanings receive similar representations, so that the representation captures characteristics of their use and meaning. This article reviews published work on word embeddings applied to text classification and proposes a practical application that demonstrates their effectiveness. The study provides evidence that word embeddings are effective for text classification, reaching an accuracy of around 73%.
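
As a rough illustration of the idea (not the authors' method), one simple way to classify text with word embeddings is to average the word vectors of a document and pick the label whose centroid is closest by cosine similarity. The vectors, labels, and function names below are hypothetical and hand-made for the sketch; real embeddings would be learned from a corpus.

```python
import math

# Hypothetical 3-dimensional word vectors, hand-made for illustration only.
# Real embeddings (for example word2vec or GloVe) are learned from a corpus.
EMBEDDINGS = {
    "leak":      [0.9, 0.1, 0.0],
    "spill":     [0.8, 0.2, 0.1],
    "gas":       [0.7, 0.3, 0.0],
    "fire":      [0.1, 0.9, 0.1],
    "explosion": [0.2, 0.8, 0.0],
    "burn":      [0.1, 0.8, 0.2],
}


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(labeled_texts):
    """Compute one mean document vector (centroid) per label."""
    grouped = {}
    for text, label in labeled_texts:
        vector = embed(text)
        if vector is not None:
            grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(text, centroids):
    """Return the label with the most similar centroid, or None if no word is known."""
    vector = embed(text)
    if vector is None:
        return None
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Hypothetical training examples with made-up labels.
TRAIN = [
    ("gas leak", "release"),
    ("spill", "release"),
    ("fire explosion", "combustion"),
    ("burn", "combustion"),
]
centroids = train_centroids(TRAIN)
print(classify("gas spill near the tank", centroids))
```

Averaging discards word order, so this is only a baseline; it does show why words with similar vectors push a document toward the same class.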


Author(s):  
Padmavathi .S ◽  
M. Chidambaram

Text classification has become increasingly important for managing and organizing text data, owing to the tremendous growth of online information. It assigns documents to a fixed number of predefined categories. There are two approaches to text classification: rule-based and machine learning. In the rule-based approach, documents are classified according to manually defined rules. In the machine learning approach, the classification rules, or classifier, are derived automatically from example documents. The machine learning approach offers higher recall and faster processing. This paper presents an investigation of text classification using different machine learning techniques.
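
The contrast between the two approaches can be sketched in a few lines. The example below is only an illustration under stated assumptions: the keyword rules, the training sentences, and the category names are hypothetical, and multinomial naive Bayes stands in for "a machine learning technique" (the abstract does not say which techniques the paper examines).

```python
import math
from collections import Counter, defaultdict

# Rule-based approach: hypothetical hand-written rules mapping keyword -> category.
RULES = {"goal": "sports", "match": "sports", "election": "politics", "vote": "politics"}


def rule_based_classify(text, default="unknown"):
    """Return the category of the first keyword rule that fires."""
    for word in text.lower().split():
        if word in RULES:
            return RULES[word]
    return default


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing, learned from examples."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocabulary:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical example documents; the classifier is derived from these automatically.
DOCS = [
    "the team won the match",
    "a late goal decided the game",
    "voters backed the election result",
    "parliament held a vote",
]
LABELS = ["sports", "sports", "politics", "politics"]
model = NaiveBayes().fit(DOCS, LABELS)

print(rule_based_classify("the election is next week"))
print(model.predict("the team scored a goal"))
```

The rule-based function only recognizes keywords someone wrote down, whereas the learned model can use any word that appeared in the examples, which is one reason learned classifiers tend to reach higher recall.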


2018 ◽  
Vol 21 (2) ◽  
pp. 125-137
Author(s):  
Jolanta Stasiak ◽  
Marcin Koba ◽  
Marcin Gackowski ◽  
Tomasz Baczek

Aim and Objective: In this study, chemometric methods such as correlation analysis, cluster analysis (CA), principal component analysis (PCA), and factor analysis (FA) were used to reduce the number of chromatographic parameters (logk/logkw) and various (e.g., 0D, 1D, 2D, 3D) structural descriptors for three groups of drugs (12 analgesic drugs, 11 cardiovascular drugs, and 36 “other” compounds), and in particular to select the most important data among them. Material and Methods: All chemometric analyses were carried out, presented graphically, and discussed for each group of drugs. First, the compounds’ structural and chromatographic parameters were correlated. The best correlation coefficients were R = 0.93, R = 0.88, and R = 0.91 for cardiac medications, analgesic drugs, and the 36 “other” compounds, respectively. Next, part of the molecular and HPLC experimental data from each group of drugs was submitted to FA/PCA and CA. Results: In almost all FA and PCA results, the first two or three factors explained 84.28%, 76.38%, and 69.71% of the total data variance of all analyzed parameters (experimental and calculated) for cardiovascular drugs, analgesic drugs, and the 36 “other” compounds, respectively. The clustering of compounds obtained by CA had characteristics similar to those obtained by FA/PCA. The statistical classification of these drugs is characterized and discussed extensively with respect to their molecular structure and pharmacological activity. Conclusion: The proposed QSAR strategy with a reduced number of parameters could be a useful starting point for further statistical analysis and could support the design of new drugs and the prediction of their possible activity.
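
For readers unfamiliar with the two quantities reported above, the following sketch shows how a correlation coefficient R and the share of variance explained by principal components can be computed for the simplest case of two variables. It is an illustration only: the numbers are made up, the variable names are hypothetical, and the study itself works with many more parameters.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def explained_variance_2d(x, y):
    """Share of total variance carried by each principal component of two variables.

    Uses the closed-form eigenvalues of the 2x2 sample covariance matrix.
    Assumes at least one of the variables is not constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    gap = math.sqrt(max(trace ** 2 - 4 * det, 0.0))
    first, second = (trace + gap) / 2, (trace - gap) / 2
    return first / trace, second / trace


# Made-up values standing in for a chromatographic parameter and a structural descriptor.
log_k = [0.2, 0.5, 0.9, 1.4, 1.8]
descriptor = [1.1, 1.9, 3.2, 4.1, 5.3]

print(pearson_r(log_k, descriptor))
print(explained_variance_2d(log_k, descriptor))
```

When two parameters are strongly correlated, the first component carries almost all of the variance, which is the sense in which a few factors can replace a larger set of descriptors.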


2019 ◽  
Vol 54 (2) ◽  
pp. 279-286
Author(s):  
Sung-Hwa Lee ◽  
You-Soon Chang

Author(s):  
Ravi Kauthale

Abstract: The aim here is to explore methods for automating the labelling of information in bug trackers and client support systems. Labelling relies mainly on classifying the content according to criteria such as priority or product area. Labelling tickets is important because it enables effective and efficient handling and leads to quicker and more comprehensive resolution. The main goal of the project is to analyze the existing methodologies for automated labelling, then apply a newer approach and compare the results. The existing methodologies include both those based on neural networks and those that are not. In this project, a newer approach based on recurrent neural networks with hierarchical attention will be used. Keywords: Automate Labeling, Recurrent Neural Networks, Hierarchical Attention, Multi-class Text Classification, GRU

