Automatic classification of older electronic texts into the Universal Decimal Classification–UDC

PurposeThe purpose of this study is to develop a model for automated classification of old digitised texts to the Universal Decimal Classification (UDC), using machine-learning methods.Design/methodology/approachThe general research approach is inherent to design science research, in which the problem of UDC assignment of the old, digitised texts is addressed by developing a machine-learning classification model. A corpus of 70,000 scholarly texts, fully bibliographically processed by librarians, was used to train and test the model, which was used for classification of old texts on a corpus of 200,000 items. Human experts evaluated the performance of the model.FindingsResults suggest that machine-learning models can correctly assign the UDC at some level for almost any scholarly text. Furthermore, the model can be recommended for the UDC assignment of older texts. Ten librarians corroborated this on 150 randomly selected texts.Research limitations/implicationsThe main limitations of this study were unavailability of labelled older texts and the limited availability of librarians.Practical implicationsThe classification model can provide a recommendation to the librarians during their classification work; furthermore, it can be implemented as an add-on to full-text search in the library databases.Social implicationsThe proposed methodology supports librarians by recommending UDC classifiers, thus saving time in their daily work. By automatically classifying older texts, digital libraries can provide a better user experience by enabling structured searches. These contribute to making knowledge more widely available and useable.Originality/valueThese findings contribute to the field of automated classification of bibliographical information with the usage of full texts, especially in cases in which the texts are old, unstructured and in which archaic language and vocabulary are used.

Download Full-text

Machine Learning Classification of Spinal Lesions: Compared Accuracy of Texture Parameters Extracted by Different Software

10.1055/s-0039-1692578 ◽

2019 ◽

Author(s):

V. Chianca ◽

D. Albano ◽

R. Cuocolo ◽

C. Messina ◽

S. Gitto ◽

...

Keyword(s):

Machine Learning ◽

Machine Learning Classification ◽

Spinal Lesions ◽

Texture Parameters

Download Full-text

Machine Learning Classification of Low-grade and High-grade Chondrosarcomas Based on MRI-based Texture Analysis

10.1055/s-0039-1692575 ◽

2019 ◽

Author(s):

S. Gitto ◽

D. Albano ◽

V. Chianca ◽

R. Cuocolo ◽

L. Ugga ◽

...

Keyword(s):

Machine Learning ◽

Texture Analysis ◽

Low Grade ◽

High Grade ◽

Machine Learning Classification

Download Full-text

Multiclass machine learning classification of functional brain images for Parkinson's disease stage prediction

Statistical Analysis and Data Mining The ASA Data Science Journal ◽

10.1002/sam.11480 ◽

2020 ◽

Vol 13 (5) ◽

pp. 508-523 ◽

Cited By ~ 1

Author(s):

Guan‐Hua Huang ◽

Chih‐Hsuan Lin ◽

Yu‐Ren Cai ◽

Tai‐Been Chen ◽

Shih‐Yen Hsu ◽

...

Keyword(s):

Machine Learning ◽

Parkinson’S Disease ◽

Parkinson's Disease ◽

Disease Stage ◽

Brain Images ◽

Functional Brain ◽

Machine Learning Classification

Download Full-text

Handcrafted MRI radiomics and machine learning: Classification of indeterminate solid adrenal lesions

Magnetic Resonance Imaging ◽

10.1016/j.mri.2021.03.009 ◽

2021 ◽

Vol 79 ◽

pp. 52-58

Author(s):

Arnaldo Stanzione ◽

Renato Cuocolo ◽

Francesco Verde ◽

Roberta Galatola ◽

Valeria Romeo ◽

...

Keyword(s):

Machine Learning ◽

Machine Learning Classification

Download Full-text

Using Machine Learning for Quantum Annealing Accuracy Prediction

Algorithms ◽

10.3390/a14060187 ◽

2021 ◽

Vol 14 (6) ◽

pp. 187

Author(s):

Aaron Barbosa ◽

Elijah Pelofske ◽

Georg Hahn ◽

Hristo N. Djidjev

Keyword(s):

Machine Learning ◽

Maximum Clique ◽

Classification Model ◽

Maximum Clique Problem ◽

Problem Instance ◽

Np Hard ◽

Machine Learning Classification ◽

Hard Problems ◽

Problem Instances ◽

D Wave

Quantum annealers, such as the device built by D-Wave Systems, Inc., offer a way to compute solutions of NP-hard problems that can be expressed in Ising or quadratic unconstrained binary optimization (QUBO) form. Although such solutions are typically of very high quality, problem instances are usually not solved to optimality due to imperfections of the current generations quantum annealers. In this contribution, we aim to understand some of the factors contributing to the hardness of a problem instance, and to use machine learning models to predict the accuracy of the D-Wave 2000Q annealer for solving specific problems. We focus on the maximum clique problem, a classic NP-hard problem with important applications in network analysis, bioinformatics, and computational chemistry. By training a machine learning classification model on basic problem characteristics such as the number of edges in the graph, or annealing parameters, such as the D-Wave’s chain strength, we are able to rank certain features in the order of their contribution to the solution hardness, and present a simple decision tree which allows to predict whether a problem will be solvable to optimality with the D-Wave 2000Q. We extend these results by training a machine learning regression model that predicts the clique size found by D-Wave.

Download Full-text

Harnessing Machine Learning for Classifying Economic Damage Trends in Transportation Infrastructure Projects

Sustainability ◽

10.3390/su13116376 ◽

2021 ◽

Vol 13 (11) ◽

pp. 6376

Author(s):

Junseo Bae ◽

Sang-Guk Yum ◽

Ji-Myong Kim

Keyword(s):

Machine Learning ◽

Financial Management ◽

Construction Projects ◽

Transportation Infrastructure ◽

Classification Model ◽

Insurance Companies ◽

Economic Damage ◽

Infrastructure Projects ◽

Machine Learning Classification ◽

Financial Losses

Given the highly visible nature, transportation infrastructure construction projects are often exposed to numerous unexpected events, compared to other types of construction projects. Despite the importance of predicting financial losses caused by risk, it is still difficult to determine which risk factors are generally critical and when these risks tend to occur, without benchmarkable references. Most of existing methods are prediction-focused, project type-specific, while ignoring the timing aspect of risk. This study filled these knowledge gaps by developing a neural network-driven machine-learning classification model that can categorize causes of financial losses depending on insurance claim payout proportions and risk occurrence timing, drawing on 625 transportation infrastructure construction projects including bridges, roads, and tunnels. The developed network model showed acceptable classification accuracy of 74.1%, 69.4%, and 71.8% in training, cross-validation, and test sets, respectively. This study is the first of its kind by providing benchmarkable classification references of economic damage trends in transportation infrastructure projects. The proposed holistic approach will help construction practitioners consider the uncertainty of project management and the potential impact of natural hazards proactively, with the risk occurrence timing trends. This study will also assist insurance companies with developing sustainable financial management plans for transportation infrastructure projects.

Download Full-text