Classification Analysis: Machine Learning Applied to Text

Author(s):  
Murugan Anandarajan ◽  
Chelsey Hill ◽  
Thomas Nolan
2018 ◽  
Vol 110 ◽  
pp. 206-215 ◽  
Author(s):  
Fernando López-Martínez ◽  
Aron Schwarcz, MD ◽  
Edward Rolando Núñez-Valdez ◽  
Vicente García-Díaz

Author(s):  
В’ячеслав Васильович Москаленко ◽  
Микола Олександрович Зарецький ◽  
Артем Геннадійович Коробов ◽  
Ярослав Юрійович Ковальський ◽  
Артур Фанісович Шаєхов ◽  
...  

Models and training methods for water-level classification in sewage pipe inspection footage have been developed and investigated. The object of the research is the process of water-level recognition, taking into account the spatial and temporal context during sewage pipe inspection. The subject of the research is a model and a machine learning method for water-level classification on video sequences of pipe inspections when the training set is small and unbalanced. A four-stage algorithm for training the classifier is proposed. In the first stage, the network is trained with a softmax triplet loss function and a regularizing component that penalizes the error of rounding the network output to a binary code. In the second stage, a binary code (reference vector) is defined for each class according to the principles of error-correcting output codes, while accounting for intraclass and interclass relations. In the third stage, the computed reference vector of each class is used as the target label of the sample for further training with the joint cross-entropy loss function. In the last stage, the parameters of the decision rules are optimized with an information criterion to account for the bounds within which the binary representations of each class's observations deviate from the corresponding reference vector. The classifier model combines a 2D convolutional feature extractor applied to each frame with a temporal network that analyzes inter-frame dependencies. Several variants of the temporal network are compared: a 1D regular convolutional network with dilated convolutions, a 1D causal convolutional network with dilated convolutions, a recurrent LSTM network, and a recurrent GRU network. Model performance is compared by the micro-averaged F1 metric computed on the test subset.
The results obtained on the dataset from Ace Pipe Cleaning (Kansas City, USA) confirm the suitability of the model and training method for practical use: the obtained F1 value is 0.88. The proposed training method was compared with the traditional method and was shown to provide a 9 % increase in the micro-averaged F1-measure.
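
The decoding and evaluation ideas in this abstract can be illustrated with a minimal Python sketch. It rounds network outputs to binary codes, assigns each sample to the class whose reference vector is nearest in Hamming distance, and computes the micro-averaged F1. The 4-bit reference codes, the sample outputs, and the function names are hypothetical and chosen only for illustration; the sketch does not reproduce the authors' optimized decision-rule bounds or their training procedure.

```python
import numpy as np

def decode_to_class(outputs, reference_codes):
    """Assign each sample to the class with the nearest reference vector (Hamming distance)."""
    bits = (np.asarray(outputs) >= 0.5).astype(int)  # round outputs to a binary code
    codes = np.asarray(reference_codes)
    # distances[i, k] = number of bits in which sample i differs from class k's code
    distances = (bits[:, None, :] != codes[None, :, :]).sum(axis=2)
    return distances.argmin(axis=1)

def micro_f1(y_true, y_pred, n_classes):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute one F1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = fp = fn = 0
    for k in range(n_classes):
        tp += int(np.sum((y_pred == k) & (y_true == k)))
        fp += int(np.sum((y_pred == k) & (y_true != k)))
        fn += int(np.sum((y_pred != k) & (y_true == k)))
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical 4-bit reference vectors for three water-level classes.
reference_codes = [[0, 0, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 1, 1]]

# Hypothetical network outputs for four clips, with their true classes.
outputs = [[0.1, 0.2, 0.0, 0.3],
           [0.9, 0.8, 0.4, 0.1],
           [0.2, 0.1, 0.7, 0.9],
           [0.9, 0.6, 0.2, 0.6]]
y_true = [0, 1, 2, 2]

y_pred = decode_to_class(outputs, reference_codes)
score = micro_f1(y_true, y_pred, n_classes=3)
print(y_pred.tolist(), score)
```

For single-label multiclass data such as this, the micro-averaged F1 coincides with plain accuracy, because every misclassified sample counts once as a false positive and once as a false negative.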


2020 ◽  
Author(s):  
Victorien Delannée ◽  
Marc Nicklaus

In the past two decades, many different formats for molecules and reactions have been created. These formats were mostly developed to serve as identifiers and for representation, classification, analysis, and data exchange. Much effort has gone into molecule formats but little into reaction formats, where most of the work has been done by companies and has led to proprietary formats. Here, we developed a new open-source format that encodes and decodes a reaction as multi-layer machine-readable code, which aggregates reactants and products into a condensed graph of reaction (CGR). This format is flexible and can be used for reaction similarity searching and classification. It is also designed for database organization, machine learning applications, and as a new reaction transform language.


2020 ◽  
Vol 12 (1) ◽  
Author(s):  
Victorien Delannée ◽  
Marc C. Nicklaus

In the past two decades, many different formats for molecules and reactions have been created. These formats were mostly developed to serve as identifiers and for representation, classification, analysis, and data exchange. Much effort has gone into molecule formats but little into reaction formats, where most of the work has been done by companies and has led to proprietary formats. Here, we present ReactionCode: a new open-source format that allows one to encode and decode a reaction into multi-layer machine-readable code, which aggregates reactants and products into a condensed graph of reaction (CGR). This format is flexible and can be used for reaction similarity searching and classification. It is also designed for database organization, machine learning applications, and as a new reaction transform language.
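
The general idea of a condensed graph of reaction can be sketched in a few lines of Python: reactant and product bonds, keyed by atom-map numbers, are merged into a single edge table in which each bond carries its order before and after the reaction. This is only an illustration of the CGR concept under assumed inputs (plain dictionaries, implicit hydrogens, a hypothetical atom mapping for bromoethane reacting with hydroxide); it is not the ReactionCode format or its API.

```python
def condensed_graph(reactant_bonds, product_bonds):
    """Merge two bond tables into one CGR-style edge table.

    Keys are pairs of atom-map numbers; each edge maps to
    (order_in_reactants, order_in_products), where 0 means "no bond".
    """
    def normalize(bonds):
        # Store every bond with its atom-map pair in ascending order.
        return {tuple(sorted(pair)): order for pair, order in bonds.items()}

    before, after = normalize(reactant_bonds), normalize(product_bonds)
    return {pair: (before.get(pair, 0), after.get(pair, 0))
            for pair in sorted(set(before) | set(after))}

def changed_bonds(cgr):
    """Return only the edges whose bond order differs between the two sides."""
    return {pair: orders for pair, orders in cgr.items() if orders[0] != orders[1]}

# Hypothetical atom mapping: C=1, C=2, Br=3, O=4 (hydrogens left implicit).
reactants = {(1, 2): 1, (3, 2): 1}   # CH3-CH2-Br, plus a free hydroxide
products = {(1, 2): 1, (2, 4): 1}    # CH3-CH2-OH, plus a free bromide

cgr = condensed_graph(reactants, products)
reaction_center = changed_bonds(cgr)
print(cgr)
print(reaction_center)
```

Edges with equal before and after orders describe the unchanged skeleton, while the remaining edges mark the reaction center, which is the part a similarity search or classifier would typically focus on.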

