Automatic classification of older electronic texts into the Universal Decimal Classification–UDC

2020
Vol ahead-of-print (ahead-of-print)
Author(s):
Matjaž Kragelj
Mirjana Kljajić Borštnar

Purpose: The purpose of this study is to develop a model for automated classification of old digitised texts to the Universal Decimal Classification (UDC), using machine-learning methods.
Design/methodology/approach: The study follows a design science research approach, in which the problem of UDC assignment of old, digitised texts is addressed by developing a machine-learning classification model. A corpus of 70,000 scholarly texts, fully bibliographically processed by librarians, was used to train and test the model, which was then used to classify old texts in a corpus of 200,000 items. Human experts evaluated the performance of the model.
Findings: Results suggest that machine-learning models can correctly assign the UDC at some level for almost any scholarly text. Furthermore, the model can be recommended for the UDC assignment of older texts. Ten librarians corroborated this on 150 randomly selected texts.
Research limitations/implications: The main limitations of this study were the unavailability of labelled older texts and the limited availability of librarians.
Practical implications: The classification model can provide recommendations to librarians during their classification work; furthermore, it can be implemented as an add-on to full-text search in library databases.
Social implications: The proposed methodology supports librarians by recommending UDC classifiers, thus saving time in their daily work. By automatically classifying older texts, digital libraries can provide a better user experience by enabling structured searches. These benefits contribute to making knowledge more widely available and usable.
Originality/value: These findings contribute to the field of automated classification of bibliographical information using full texts, especially in cases in which the texts are old and unstructured and use archaic language and vocabulary.
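As a rough illustration of the kind of supervised text classification this abstract describes, the sketch below trains a tiny multinomial naive Bayes model that maps full text to a top-level UDC class. The abstract does not name the algorithm the authors used, so naive Bayes is swapped in purely as an assumption, and the six training sentences, their labels, and the function names `tokenize`, `train` and `predict` are invented for this example.

```python
import math
from collections import Counter, defaultdict

# Toy labelled corpus, invented for illustration: (text, top-level UDC class).
# 5 = natural sciences, 6 = applied sciences, 8 = language and literature.
TRAIN = [
    ("the motion of planets and the laws of physics", "5"),
    ("chemistry of acids and the study of plants", "5"),
    ("medicine for the treatment of disease", "6"),
    ("engineering of bridges and farming methods", "6"),
    ("poetry and the history of literature", "8"),
    ("grammar of the language and old poems", "8"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(samples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict(model, "old poems and literature"))
```

In a recommendation setting like the one the abstract mentions, the per-class scores could be sorted and the top few classes shown to a librarian instead of returning only the best one.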

2020
Vol 13 (5)
pp. 508-523
Author(s):
Guan‐Hua Huang
Chih‐Hsuan Lin
Yu‐Ren Cai
Tai‐Been Chen
Shih‐Yen Hsu
...  

2021
Vol 79
pp. 52-58
Author(s):
Arnaldo Stanzione
Renato Cuocolo
Francesco Verde
Roberta Galatola
Valeria Romeo
...  

Algorithms
2021
Vol 14 (6)
pp. 187
Author(s):
Aaron Barbosa
Elijah Pelofske
Georg Hahn
Hristo N. Djidjev

Quantum annealers, such as the device built by D-Wave Systems, Inc., offer a way to compute solutions of NP-hard problems that can be expressed in Ising or quadratic unconstrained binary optimization (QUBO) form. Although such solutions are typically of very high quality, problem instances are usually not solved to optimality due to imperfections of the current generation of quantum annealers. In this contribution, we aim to understand some of the factors contributing to the hardness of a problem instance, and to use machine learning models to predict the accuracy of the D-Wave 2000Q annealer for solving specific problems. We focus on the maximum clique problem, a classic NP-hard problem with important applications in network analysis, bioinformatics, and computational chemistry. By training a machine learning classification model on basic problem characteristics, such as the number of edges in the graph, and on annealing parameters, such as D-Wave's chain strength, we are able to rank certain features in the order of their contribution to the solution hardness, and present a simple decision tree that predicts whether a problem will be solvable to optimality with the D-Wave 2000Q. We extend these results by training a machine learning regression model that predicts the clique size found by D-Wave.
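The feature ranking and decision tree mentioned in this abstract can be illustrated with a depth-one tree (a decision stump) scored by Gini impurity reduction. This is only a minimal sketch under stated assumptions: the abstract does not give the authors' tree-building procedure or data, so the six rows below are made-up toy values, and the names `gini`, `best_split` and `rank_features` are hypothetical helpers.

```python
def gini(labels):
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (impurity_reduction, threshold) of the best one-threshold split."""
    parent = gini(labels)
    n = len(labels)
    best = (0.0, None)
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > best[0]:
            best = (gain, thr)
    return best


def rank_features(rows, labels):
    """Rank features by the impurity reduction of their best single split."""
    scored = []
    for name in rows[0]:
        gain, thr = best_split([r[name] for r in rows], labels)
        scored.append((name, gain, thr))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy instances (invented values): graph size and an annealing parameter.
ROWS = [
    {"num_edges": 10, "chain_strength": 1.0},
    {"num_edges": 20, "chain_strength": 2.0},
    {"num_edges": 30, "chain_strength": 1.0},
    {"num_edges": 40, "chain_strength": 2.0},
    {"num_edges": 50, "chain_strength": 1.0},
    {"num_edges": 60, "chain_strength": 2.0},
]
# True means the (hypothetical) instance was solved to optimality.
SOLVED = [True, True, True, False, False, False]

ranking = rank_features(ROWS, SOLVED)
for name, gain, thr in ranking:
    print(name, round(gain, 3), thr)
```

On this toy data the stump splits on `num_edges`, since that feature separates the two outcomes perfectly; a full decision tree would repeat the same split search recursively on each side.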


2021
Vol 13 (11)
pp. 6376
Author(s):
Junseo Bae
Sang-Guk Yum
Ji-Myong Kim

Given their highly visible nature, transportation infrastructure construction projects are often exposed to numerous unexpected events, compared to other types of construction projects. Despite the importance of predicting financial losses caused by risk, it is still difficult to determine which risk factors are generally critical and when these risks tend to occur, without benchmarkable references. Most existing methods are prediction-focused and project type-specific, and they ignore the timing aspect of risk. This study filled these knowledge gaps by developing a neural network-driven machine-learning classification model that can categorize causes of financial losses depending on insurance claim payout proportions and risk occurrence timing, drawing on 625 transportation infrastructure construction projects including bridges, roads, and tunnels. The developed network model showed acceptable classification accuracy of 74.1%, 69.4%, and 71.8% in training, cross-validation, and test sets, respectively. This study is the first of its kind to provide benchmarkable classification references of economic damage trends in transportation infrastructure projects. The proposed holistic approach will help construction practitioners proactively consider the uncertainty of project management and the potential impact of natural hazards, along with the risk occurrence timing trends. This study will also assist insurance companies with developing sustainable financial management plans for transportation infrastructure projects.


Heliyon
2021
Vol 7 (2)
pp. e06257
Author(s):
Ennio Idrobo-Ávila
Humberto Loaiza-Correa
Rubiel Vargas-Cañas
Flavio Muñoz-Bolaños
Leon van Noorden
