Labelling the past: data set creation and multi-label classification of Dutch archaeological excavation reports

Author(s): Alex Brandsen, Martin Koole

Abstract: The extraction of information from Dutch archaeological grey literature has recently been investigated by the AGNES project. AGNES aims to disclose relevant information by means of a web search engine, to enable researchers to search through excavation reports. In this paper, we focus on the multi-labelling of archaeological excavation reports with time periods and site types, and provide a manually labelled reference set to this end. We propose a series of approaches, pre-processing methods, and various modifications of the training set to address the often low quality of both texts and labels. We find that despite those issues, our proposed methods lead to promising results.
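For illustration, a minimal multi-label setup in the spirit described above might look as follows, assuming scikit-learn and a binary-relevance strategy (one binary classifier per period or site-type label). The texts, labels, and model choices below are toy assumptions, not the AGNES pipeline itself.

```python
# Minimal sketch of multi-label report classification; the AGNES paper's
# actual models, features, and label inventory may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: report texts and their period/site-type labels.
reports = ["opgraving met sporen uit de Romeinse tijd ...",
           "nederzetting en grafveld uit de vroege middeleeuwen ..."]
labels = [["Roman", "settlement"], ["Early Medieval", "settlement", "cemetery"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # one binary column per label

# Binary relevance: one independent binary classifier per label.
clf = make_pipeline(TfidfVectorizer(max_features=20000),
                    OneVsRestClassifier(LogisticRegression(max_iter=1000)))
clf.fit(reports, Y)

pred = clf.predict(["waterput en aardewerk uit de Romeinse periode"])
print(mlb.inverse_transform(pred))
```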

2014, Vol. 539, pp. 181-184. Author(s): Wan Li Zuo, Zhi Yan Wang, Ning Ma, Hong Liang

Accurate text classification is a basic prerequisite for efficiently extracting various types of information from the Web and for making proper use of network resources. In this paper, a new text classification method is proposed. Consistency analysis is an iterative algorithm that trains several different (weak) classifiers on the same training set and then combines them to test how consistently the various classifiers label the same text, thereby exposing the knowledge captured by each type of classifier. At each iteration, the method updates the weight of each sample according to whether that sample was classified correctly in the current round and according to the accuracy of the previous overall classification; the reweighted data set is then passed to the subordinate classifier for training. Finally, the classifiers obtained during training are integrated into the final decision classifier. A classifier built with consistency analysis can eliminate unnecessary training-data characteristics and concentrate the key words on the key training data. According to the experimental results, the average accuracy of this method is 91.0%, while the average recall is 88.1%.
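The iterative sample-reweighting scheme described is close in spirit to boosting. As a hedged sketch only (the abstract does not give the exact consistency-analysis algorithm), the following uses scikit-learn's AdaBoost with decision stumps over TF-IDF features; all data and parameters are illustrative.

```python
# AdaBoost-style stand-in for the reweighting loop: each round reweights
# samples by whether the previous weak classifier got them right, then
# trains the next weak classifier on the reweighted data.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

texts = ["stock markets fell sharply", "the team won the final",
         "parliament passed the bill", "striker scores twice"]
labels = ["finance", "sport", "politics", "sport"]  # toy training data

model = make_pipeline(
    TfidfVectorizer(),
    AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                       n_estimators=50))
model.fit(texts, labels)
print(model.predict(["the midfielder was injured"]))
```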


2016, Vol. 7 (1), pp. 33-49. Author(s): Suruchi Chawla

In this paper, a novel method is proposed that uses a hybrid of a Genetic Algorithm (GA) and a Back-Propagation (BP) Artificial Neural Network (ANN) to learn the classification of user queries to clusters for effective personalized web search. The GA-BP ANN is trained offline to classify input queries and user query session profiles to a specific cluster, based on clustered web query sessions. During online web search, the trained GA-BP ANN classifies new user queries to a cluster, and the selected cluster is then used for web page recommendations. This process of classification and recommendation continues until the search is effectively personalized to the information need of the user. An experiment was conducted on a data set of web user query sessions to evaluate the effectiveness of personalized web search using the GA-optimized BP ANN, and the results confirm an improvement in the precision of search results.
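A minimal sketch of the two phases might look as follows: a GA evolves candidate weight vectors for a small feed-forward network (with classification accuracy of queries to clusters as the fitness), and the best candidate is then refined by backpropagation. The layer sizes, GA settings, and synthetic data are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of GA-seeded backprop training for query-to-cluster mapping.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 20, 8, 4             # query features -> hidden -> clusters
X = rng.normal(size=(200, n_in))          # hypothetical query session features
y = rng.integers(0, n_out, size=200)      # hypothetical cluster labels
dim = n_in * n_hid + n_hid * n_out

def unpack(w):
    W1 = w[:n_in * n_hid].reshape(n_in, n_hid)
    W2 = w[n_in * n_hid:].reshape(n_hid, n_out)
    return W1, W2

def fitness(w):
    W1, W2 = unpack(w)
    scores = np.tanh(X @ W1) @ W2
    return (scores.argmax(axis=1) == y).mean()    # classification accuracy

# GA phase: evolve candidate weight vectors by selection/crossover/mutation.
pop = rng.normal(scale=0.5, size=(30, dim))
for gen in range(40):
    ranked = pop[np.argsort([fitness(w) for w in pop])]
    parents = ranked[-10:]                        # keep the fittest
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(0, 10, size=2)]
        mask = rng.random(dim) < 0.5              # uniform crossover
        child = np.where(mask, a, b)
        child += rng.normal(scale=0.05, size=dim) # mutation
        children.append(child)
    pop = np.array(children)
best = max(pop, key=fitness)

# BP phase: refine the GA-selected weights with gradient descent.
W1, W2 = unpack(best.copy())
onehot = np.eye(n_out)[y]
for step in range(200):
    H = np.tanh(X @ W1)
    Z = H @ W2
    P = np.exp(Z - Z.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    dZ = (P - onehot) / len(X)          # softmax cross-entropy gradient
    dH = (dZ @ W2.T) * (1 - H ** 2)     # backprop through tanh
    W2 -= 0.5 * H.T @ dZ
    W1 -= 0.5 * X.T @ dH
print("train accuracy:", fitness(np.concatenate([W1.ravel(), W2.ravel()])))
```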


Sci, 2020, Vol. 2 (2), p. 37. Author(s): Yuanyuan Ma, Ognjen Arandjelović

Ancient numismatics, that is, the study of ancient currencies (predominantly coins), is an interesting domain for the application of computer vision and machine learning, and has been receiving an increasing amount of attention in recent years. Notwithstanding the number of articles published on the topic, the variety of different methodological approaches described, and the mounting realisation that the relevant problems in the field are most challenging indeed, all research to date has entirely ignored one specific, readily accessible modality: colour. Invariably, colour is discarded and images of coins are treated as greyscale. The present article is the first one to question this decision (and indeed, it is a decision). We discuss the reasons behind the said choice, present a case for why it ought to be reexamined, and in turn investigate the issue for the first time in the published literature. Specifically, we propose two new colour-based representations designed specifically for ancient coin analysis, and argue why it is sensible to employ them in the first stages of the classification process as a means of drastically reducing the initially enormous number of classes involved in type matching ancient coins (tens of thousands, just for Ancient Roman Imperial coins). Furthermore, we introduce a new data set collected with the specific aim of denomination-based categorisation of ancient coins, where we hypothesised colour could be of potential use, and evaluate the proposed representations. Lastly, we report surprisingly successful performances which go further than confirming our hypothesis; rather, they convincingly demonstrate a much higher relevant information content carried by colour than even we expected. We therefore trust that our findings will be noted by others in the field and that more attention and further research will be devoted to the use of colour in automatic ancient coin analysis.
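The two proposed representations are not detailed in the abstract, so the following is only a generic sketch of the underlying idea: a cheap colour descriptor (here an HSV histogram, compared by a chi-squared distance) used as a first-stage filter to prune the candidate class list before expensive type matching. All names and parameters are hypothetical.

```python
# Generic colour-histogram pre-filter, not the article's representations.
import colorsys
import numpy as np

def colour_descriptor(rgb_image, bins=(8, 4, 4)):
    """rgb_image: float array in [0, 1] with shape (H, W, 3)."""
    hsv = np.apply_along_axis(lambda p: colorsys.rgb_to_hsv(*p), 2, rgb_image)
    hist, _ = np.histogramdd(hsv.reshape(-1, 3), bins=bins,
                             range=((0, 1), (0, 1), (0, 1)))
    hist = hist.ravel()
    return hist / hist.sum()   # normalised so images of any size compare

def prune_classes(query, class_prototypes, keep=50):
    """Keep the classes whose prototype descriptor is closest (chi-squared)."""
    def chi2(p, q):
        return 0.5 * np.sum((p - q) ** 2 / (p + q + 1e-12))
    scores = {c: chi2(query, proto) for c, proto in class_prototypes.items()}
    return sorted(scores, key=scores.get)[:keep]

# Hypothetical usage: per-class mean descriptors, then prune for a query coin.
protos = {f"type_{i}": np.random.dirichlet(np.ones(128)) for i in range(1000)}
query = colour_descriptor(np.random.rand(64, 64, 3))
candidates = prune_classes(query, protos, keep=50)
```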


2020, Vol. 25 (6), pp. 655-664. Author(s): Wienand A. Omta, Roy G. van Heesbeen, Ian Shen, Jacob de Nobel, Desmond Robers, ...

There has been an increase in the use of machine learning and artificial intelligence (AI) for the analysis of image-based cellular screens. The accuracy of these analyses, however, is greatly dependent on the quality of the training sets used for building the machine learning models. We propose that unsupervised exploratory methods should first be applied to the data set to gain a better insight into the quality of the data. This improves the selection and labeling of data for creating training sets before the application of machine learning. We demonstrate this using a high-content genome-wide small interfering RNA screen. We perform an unsupervised exploratory data analysis to facilitate the identification of four robust phenotypes, which we subsequently use as a training set for building a high-quality random forest machine learning model to differentiate four phenotypes with an accuracy of 91.1% and a kappa of 0.85. Our approach enhanced our ability to extract new knowledge from the screen when compared with the use of unsupervised methods alone.
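A condensed sketch of that workflow, assuming scikit-learn, might look like the following; the features, cluster count, and split are illustrative, and in the real setting an expert curates the clusters into robust phenotype labels rather than using raw cluster ids.

```python
# Explore the screen data without labels first, then use the identified
# phenotypes as a curated training set for a random forest.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

X = np.random.rand(2000, 50)   # hypothetical per-cell image features

# 1) Unsupervised exploration: project and cluster to inspect data quality
#    and surface candidate phenotypes worth labelling.
emb = PCA(n_components=10).fit_transform(X)
candidate = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(emb)

# 2) In practice an expert inspects the clusters and keeps only the robust
#    phenotypes as labels; here the raw cluster ids stand in for those labels.
X_tr, X_te, y_tr, y_te = train_test_split(X, candidate, test_size=0.3,
                                          random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred),
      "kappa:", cohen_kappa_score(y_te, pred))
```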


Sensors, 2019, Vol. 19 (23), p. 5097. Author(s): David Agis, Francesc Pozo

This work presents a structural health monitoring (SHM) approach for the detection and classification of structural changes. The proposed strategy is based on t-distributed stochastic neighbor embedding (t-SNE), a nonlinear procedure that is able to represent the local structure of high-dimensional data in a low-dimensional space. The steps of the detection and classification procedure are: (i) the collected data are scaled using mean-centered group scaling (MCGS); (ii) principal component analysis (PCA) is applied to reduce the dimensionality of the data set; (iii) t-SNE is applied to represent the scaled and reduced data as points in a plane, defining as many clusters as there are structural states; and (iv) the current structure to be diagnosed is associated with a cluster, or structural state, based on three strategies: (a) the smallest point-centroid distance; (b) majority voting; and (c) the sum of the inverse distances. The combination of PCA and t-SNE improves the quality of the clusters related to the structural states. The method is evaluated using experimental data from an aluminum plate with four piezoelectric transducers (PZTs). Results are illustrated in the frequency domain, and they demonstrate the high classification accuracy and strong performance of this method.
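A sketch of steps (i)-(iv), assuming scikit-learn, follows. Since t-SNE has no out-of-sample transform, the sample under diagnosis is embedded jointly with the baseline data, and MCGS is approximated here by per-channel mean centering; the data, dimensions, and neighbor count are illustrative.

```python
# Four-step pipeline sketch: scaling, PCA, t-SNE, then three decision rules.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.rand(300, 64)            # hypothetical PZT features per experiment
states = np.repeat(np.arange(5), 60)   # known structural state of each sample
x_new = np.random.rand(1, 64)          # current structure to diagnose

# (i) MCGS-style scaling (approximated): center each sensor channel.
mu = X.mean(axis=0)
Xs, xs_new = X - mu, x_new - mu

# (ii) PCA for dimensionality reduction; (iii) t-SNE to the plane.
Z = PCA(n_components=20).fit_transform(np.vstack([Xs, xs_new]))
Y = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(Z)
Y_ref, y_new = Y[:-1], Y[-1]

# (iv-a) smallest point-centroid distance
centroids = np.array([Y_ref[states == s].mean(axis=0) for s in range(5)])
print("centroid rule:", np.linalg.norm(centroids - y_new, axis=1).argmin())

# (iv-b) majority vote among the k nearest points
d = np.linalg.norm(Y_ref - y_new, axis=1)
k_nearest = states[np.argsort(d)[:15]]
print("majority vote:", np.bincount(k_nearest).argmax())

# (iv-c) largest sum of inverse distances per state
inv = np.array([np.sum(1.0 / (d[states == s] + 1e-9)) for s in range(5)])
print("inverse-distance rule:", inv.argmax())
```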


2018, pp. 1-8. Author(s): Okyaz Eminaga, Nurettin Eminaga, Axel Semjonow, Bernhard Breil

Purpose: The recognition of cystoscopic findings remains challenging for young colleagues and depends on the examiner's skills. Computer-aided diagnosis tools using feature extraction and deep learning show promise as instruments to perform diagnostic classification.

Materials and Methods: Our study considered 479 patient cases that represented 44 urologic findings. Image color was linearly normalized and equalized by applying contrast-limited adaptive histogram equalization. Because these findings can be viewed via cystoscopy from every possible angle and side, we generated images rotated in 10-degree increments and flipped them vertically or horizontally, which resulted in 18,681 images. After image preprocessing, we developed deep convolutional neural network (CNN) models (ResNet50, VGG-19, VGG-16, InceptionV3, and Xception) and evaluated these models using F1 scores. Furthermore, we proposed two CNN concepts: 90%-previous-layer filter size and harmonic-series filter size. A training set (60%), a validation set (10%), and a test set (30%) were randomly generated from the study data set. All models were trained on the training set, validated on the validation set, and evaluated on the test set.

Results: The Xception-based model achieved the highest F1 score (99.52%), followed by the models based on ResNet50 (99.48%) and the harmonic-series concept (99.45%). All images with cancer lesions were correctly identified by these models. Among the images misclassified by the best-performing model, 7.86% of images showing bladder stones with an indwelling catheter and 1.43% of images showing bladder diverticula were falsely classified.

Conclusion: The results of this study show the potential of deep learning for the diagnostic classification of cystoscopic images. Future work will focus on integrating artificial intelligence-aided cystoscopy into clinical routines and possibly expanding it to other clinical endoscopy applications.
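A hedged sketch of the augmentation and transfer-learning setup, assuming Keras/TensorFlow and SciPy; the paper's two bespoke filter-size concepts are not reproduced, and the image size and training details below are illustrative.

```python
# Rotation/flip augmentation plus an Xception backbone for 44 findings.
import numpy as np
import tensorflow as tf
from scipy.ndimage import rotate

def augment(image):
    """Rotations in 10-degree increments plus vertical/horizontal flips."""
    views = [rotate(image, angle, reshape=False, mode="nearest")
             for angle in range(0, 360, 10)]
    views += [np.fliplr(image), np.flipud(image)]
    return views  # 38 views per source image

base = tf.keras.applications.Xception(include_top=False, weights="imagenet",
                                      input_shape=(299, 299, 3), pooling="avg")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(44, activation="softmax"),  # 44 urologic findings
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, validation_data=(val_images, val_labels))
```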


Author(s): Oliver Ray, Amy Conroy, Rozano Imansyah

This paper introduces a method called SUmmarisation with Majority Opinion (SUMO) that integrates and extends two prior approaches for abstractively and extractively summarising UK House of Lords cases. We show how combining two previously distinct lines of work allows us to better address the challenges arising from this court's unusual tradition of publishing the opinions of multiple judges with no formal statement of the reasoning (if any) agreed by a majority. We do this by applying natural language processing and machine learning, specifically Conditional Random Fields (CRFs), to a data set we created by fusing expert-annotated sentence labels from the HOLJ corpus (rhetorical role and summary relevance) with the ASMO corpus (agreement statement and majority opinion). Using CRFs and a bespoke summary generator on our enriched data set, we show a significant quantitative improvement of 10–15% in F1 score for rhetorical role and relevance classification over the state-of-the-art SUM system, and a significant qualitative improvement in the quality of our summaries, which closely resemble gold-standard multi-judge abstracts according to a proof-of-principle user study.
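For concreteness, sentence-level rhetorical-role labelling with a linear-chain CRF might be set up as below, assuming the sklearn-crfsuite package; the features and tags shown are illustrative stand-ins, not the paper's feature set or label scheme.

```python
# Linear-chain CRF over sentences of a judge's opinion.
import sklearn_crfsuite

def sent_features(sentences, i):
    s = sentences[i]
    return {
        "bias": 1.0,
        "position": i / len(sentences),                # where in the opinion
        "n_words": float(len(s.split())),
        "has_citation": float("v." in s or "[" in s),  # crude citation cue
        "first_person": float(s.startswith("I ")),     # opinion-voice cue
    }

# One sequence per judge's opinion: sentences paired with role labels.
docs = [["My Lords, I have had the advantage of reading the opinion.",
         "I agree with it and would dismiss the appeal."]]
roles = [["FRAMING", "DISPOSAL"]]   # hypothetical rhetorical-role tags

X = [[sent_features(d, i) for i in range(len(d))] for d in docs]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
crf.fit(X, roles)
print(crf.predict(X))
```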


Geophysics, 2013, Vol. 78 (1), pp. E41-E46. Author(s): Laurens Beran, Barry Zelt, Leonard Pasion, Stephen Billings, Kevin Kingdon, ...

We have developed practical strategies for discriminating between buried unexploded ordnance (UXO) and metallic clutter. These methods are applicable to time-domain electromagnetic data acquired with multistatic, multicomponent sensors designed for UXO classification. Each detected target is characterized by dipole polarizabilities estimated via inversion of the observed sensor data. The polarizabilities are intrinsic target features and so are used to distinguish between UXO and clutter. We tested this processing with four data sets from recent field demonstrations, with each data set characterized by metrics of data and model quality. We then developed techniques for building a representative training data set and determined how the variable quality of estimated features affects overall classification performance. Finally, we devised a technique to optimize classification performance by adapting features during target prioritization.
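The exact inversion and classifier are not described above, so the following is only an illustrative sketch of the final step: turning estimated principal polarizability decays into simple features (amplitude, decay rate, axial symmetry) and ranking detected targets by predicted UXO probability. All data are synthetic and all parameters are assumptions.

```python
# Feature extraction from dipole polarizability decays, then target ranking.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
t = np.logspace(-4, -2, 30)   # time gates (s), hypothetical sensor timing

def polarizability_features(L):
    """L: (3, len(t)) principal dipole polarizability decays for one target."""
    amp = L[:, 0]                        # early-time amplitudes
    rate = np.log(L[:, 0] / L[:, -1])    # decay over the measured window
    sym = amp[1] / (amp[2] + 1e-12)      # axial-symmetry cue (UXO-like)
    return np.concatenate([amp, rate, [sym]])

# Synthetic training library: axially symmetric UXO vs. irregular clutter.
uxo = [np.outer(np.array([3.0, 1.0, 1.0]) * rng.uniform(0.8, 1.2),
                np.exp(-t / rng.uniform(1.5e-3, 3e-3))) for _ in range(50)]
clutter = [np.outer(rng.uniform(0.2, 2.0, 3),
                    np.exp(-t / rng.uniform(2e-4, 1e-3))) for _ in range(50)]
X = np.array([polarizability_features(L) for L in uxo + clutter])
y = np.array([1] * 50 + [0] * 50)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
p_uxo = clf.predict_proba(X)[:, 1]   # ranked dig list: likely UXO first
dig_list = np.argsort(-p_uxo)
```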


Author(s): Roelof K. Brouwer

The main contribution of this paper is the development of an Integer Recurrent Artificial Neural Network (IRANN) for the classification of feature vectors. The network consists of both threshold units, or perceptrons, and counters, which are non-threshold units with binary input and integer output. The input and output of the network are vectors of natural numbers that may be used to represent feature vectors. For classification, representatives of sets are stored by calculating a connection matrix such that all elements of a training set are attracted to members of the same training set. An arbitrary element is then classified by the class of its attractor, provided the attractor is a member of one of the original training sets. The network is successfully applied to the classification of sugar diabetes data, credit application data, and the iris data set.
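IRANN's integer units and counters are not reproduced here; as a loose analogue only, the following Hopfield-style sketch shows the attractor idea: training patterns are stored in a connection matrix, an input settles to an attractor, and the element is classified by which stored prototype it converges to (or left unclassified otherwise).

```python
# Hopfield-style attractor classification; a stand-in for IRANN, not IRANN.
import numpy as np

def train(patterns):
    """Hebbian connection matrix storing +/-1 patterns as attractors."""
    P = np.array(patterns)
    W = P.T @ P / len(P)
    np.fill_diagonal(W, 0)
    return W

def settle(W, x, steps=20):
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1
    return x

# Two hypothetical classes, each represented by a stored binary prototype.
protos = {"class_a": np.array([1, 1, 1, 1, -1, -1, -1, -1]),
          "class_b": np.array([1, -1, 1, -1, 1, -1, 1, -1])}
W = train(list(protos.values()))

noisy = np.array([1, 1, -1, 1, -1, -1, -1, -1])   # class_a with one flipped bit
attractor = settle(W, noisy)
label = next((c for c, p in protos.items() if np.array_equal(p, attractor)),
             None)   # unclassified if the attractor is not a stored prototype
print(label)
```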

