Exploring a knowledge-based approach to predicting NACE codes of enterprises based on web page texts

2020 ◽  
Vol 36 (3) ◽  
pp. 807-821
Author(s):  
Heidi Kühnemann ◽  
Arnout van Delden ◽  
Dick Windmeijer

Classification of enterprises by main economic activity according to NACE codes is a challenging but important task for national statistical institutes. Since manual editing is time-consuming, we investigated the automatic prediction from dedicated website texts using a knowledge-based approach. To that end, concept features were derived from a set of domain-specific keywords. Furthermore, we compared flat classification to a specific two-level hierarchy which was based on an approach used by manual editors. We limited ourselves to Naïve Bayes and Support Vector Machines models and only used texts from the main web pages. As a first step, we trained a filter model that classifies whether websites contain information about economic activity. The resulting filtered data set was subsequently used to predict 111 NACE classes. We found that using concept features did not improve the model performance compared to a model with character n-grams, i.e. non-informative features. Neither did the two-level hierarchy improve the performance relative to a flat classification. Nonetheless, prediction of the best three NACE classes clearly improved the overall prediction performance compared to a top-one prediction. We conclude that more effort is needed in order to achieve good results with a knowledge-based approach and discuss ideas for improvement.
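The difference between top-one and top-three prediction mentioned in this abstract can be illustrated with a minimal sketch. The function name `top_k_accuracy`, the class codes, and the scores below are hypothetical illustration data, not values or code from the paper.

```python
from typing import Dict, List


def top_k_accuracy(scores: List[Dict[str, float]], labels: List[str], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    hits = 0
    for class_scores, true_label in zip(scores, labels):
        # Rank classes by descending score and keep the k best.
        ranked = sorted(class_scores, key=class_scores.get, reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical classifier scores for three websites over four made-up class codes.
example_scores = [
    {"4711": 0.50, "5610": 0.30, "6201": 0.15, "4120": 0.05},
    {"4711": 0.35, "5610": 0.40, "6201": 0.20, "4120": 0.05},
    {"4711": 0.40, "5610": 0.30, "6201": 0.20, "4120": 0.10},
]
example_labels = ["4711", "4711", "4120"]

top1 = top_k_accuracy(example_scores, example_labels, k=1)
top3 = top_k_accuracy(example_scores, example_labels, k=3)
```

In this toy data the second website is misclassified at top-one but its true class is ranked second, so it counts as a hit at top-three. That is the kind of effect that makes a top-three metric more lenient than a top-one metric.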

Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5896
Author(s):  
Eddi Miller ◽  
Vladyslav Borysenko ◽  
Moritz Heusinger ◽  
Niklas Niedner ◽  
Bastian Engelmann ◽  
...  

Changeover times are an important element when evaluating the Overall Equipment Effectiveness (OEE) of a production machine. The article presents a machine learning (ML) approach that is based on an external sensor setup to automatically detect changeovers in a shopfloor environment. The door statuses, coolant flow, power consumption, and operator indoor GPS data of a milling machine were used in the ML approach. As ML methods, Decision Trees, Support Vector Machines, (Balanced) Random Forest algorithms, and Neural Networks were chosen, and their performance was compared. The best results were achieved with the Random Forest ML model (97% F1 score, 99.72% AUC score). It was also found that model performance is optimal when only a binary classification into a changeover phase and a production phase is considered and fewer subphases of the changeover process are distinguished.
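For readers unfamiliar with the F1 score reported here, the following is a minimal sketch of how it is computed for a binary changeover/production labelling. The label encoding (1 = changeover, 0 = production), the function name `f1_score_binary`, and the example sequences are assumptions made for illustration and are not data from the article.

```python
from typing import Sequence


def f1_score_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score (harmonic mean of precision and recall) for the positive class 1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive prediction: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical time windows: 1 = changeover phase, 0 = production phase.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

f1 = f1_score_binary(y_true, y_pred)
```

With three true positives, one false positive, and one false negative, precision and recall are both 0.75, so the F1 score is also 0.75.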


Author(s):  
Dorian Ruiz Alonso ◽  
Claudia Zepeda Cortés ◽  
Hilda Castillo Zacatelco ◽  
José Luis Carballido Carranza

In this work, we propose an extension of a methodology for the multi-label classification of feedback according to the Hattie and Timperley feedback model, incorporating a hyperparameter tuning stage. We analyze whether adding this stage before running the support vector machine, random forest and multi-label k-nearest neighbors algorithms improves the performance metrics of the multi-label classifiers. These classifiers automatically assign the feedback that a teacher gives on activities submitted by students in online courses on the Blackboard platform to the task, process, regulation, praise and other levels proposed in the Hattie and Timperley feedback model. The grid search strategy is used to refine the hyperparameters of each algorithm. The results show that tuning the hyperparameters improves the performance metrics for the data set used.
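The grid search strategy named in this abstract can be sketched in a few lines: every combination of candidate hyperparameter values is scored and the best one is kept. The helper `grid_search`, the parameter names `C` and `gamma`, and the scoring function `toy_validation_score` are hypothetical stand-ins; in practice the score would be a cross-validated metric of a trained classifier, which the abstract does not specify.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def grid_search(
    param_grid: Dict[str, List[Any]],
    score_fn: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Exhaustively score every combination in param_grid and return the best."""
    names = list(param_grid)
    best_params = None
    best_score = float("-inf")
    # Cartesian product of all candidate values, one combination at a time.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    if best_params is None:
        raise ValueError("param_grid contains an empty list of candidates")
    return best_params, best_score


def toy_validation_score(params: Dict[str, Any]) -> float:
    # Assumption: a made-up score surface that peaks at C=10, gamma=0.1.
    # It replaces training and validating a real model for this sketch.
    return 1.0 - 0.01 * abs(params["C"] - 10) - abs(params["gamma"] - 0.1)


grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_validation_score)
```

The cost grows with the product of the candidate list lengths (here 4 × 3 = 12 evaluations), which is why grid search is usually restricted to a small number of hyperparameters per algorithm.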


2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Tatdow Pansombut ◽  
Siripen Wikaisuksakul ◽  
Kittiya Khongkraphan ◽  
Aniruth Phon-on

This paper addresses the recognition of acute lymphoblastic leukaemia (ALL) subtypes according to the WHO classification. The two ALL subtypes considered are T-lymphoblastic leukaemia (pre-T) and B-lymphoblastic leukaemia (pre-B). They exhibit various characteristics which make it difficult to distinguish the subtypes from their mature cells, lymphocytes. In a common approach, handcrafted features must be well designed for this complex domain-specific problem. With a deep learning approach, handcrafted feature engineering can be eliminated because a deep learning method can automate this task through the multilayer architecture of a convolutional neural network (CNN). In this work, we implement a CNN classifier to explore the feasibility of a deep learning approach to identifying lymphocytes and ALL subtypes, and this approach is benchmarked against the dominant approach of support vector machines (SVMs) applied to handcrafted features. Additionally, two traditional machine learning classifiers, multilayer perceptron (MLP) and random forest, are also applied for comparison. The experiments show that our CNN classifier delivers better performance in identifying normal lymphocytes and pre-B cells. This shows great potential for image classification without the multiple preprocessing steps required by feature engineering.


Author(s):  
Marianne Maktabi ◽  
Hannes Köhler ◽  
Magarita Ivanova ◽  
Thomas Neumuth ◽  
Nada Rayes ◽  
...  

2011 ◽  
Vol 61 (9) ◽  
pp. 2874-2878 ◽  
Author(s):  
L. Gonzalez-Abril ◽  
F. Velasco ◽  
J.A. Ortega ◽  
L. Franco

Author(s):  
Rakesh Kumar ◽  
Avinash M. Jade ◽  
Valadi K. Jayaraman ◽  
Bhaskar D. Kulkarni

A hybrid strategy of using (i) locally linear embedding for nonlinear dimensionality reduction of high dimensional data and (ii) support vector machines for classification of the resultant features is proposed as a robust methodology for process monitoring. Illustrative examples substantiate the methodology vis-à-vis current practice.


2004 ◽  
Vol 44 (2) ◽  
pp. 499-507 ◽  
Author(s):  
Omowunmi Sadik ◽  
Walker H. Land, ◽  
Adam K. Wanekaya ◽  
Michiko Uematsu ◽  
Mark J. Embrechts ◽  
...  
