Stopping Active Learning Based on Predicted Change of F Measure for Text Classification

Headnotes are precise explanations and summaries of the legal points in an issued judgment. Law journals hire experienced lawyers to write them, and they help the reader quickly determine the issue discussed in a case. A headnote comprises two parts: the first states the topic discussed in the judgment, and the second summarizes the judgment. In this thesis, we design, develop and evaluate headnote prediction using machine learning, without human involvement. We divide the task into a two-step process. In the first step, we predict the law points used in the judgment with text classification algorithms. In the second step, we generate a summary of the judgment with text summarization techniques. To this end, we created a databank by extracting data from different law sources in Pakistan, and we labelled training data generated from Pakistan law websites. We tested different feature extraction methods on the judiciary data to improve our system and, using these methods, developed a dictionary of terminology for ease of reference and utility. Our approach achieves 65% accuracy using Linear Support Vector Classification with tri-grams and without a stemmer. With active learning, the system can continue to improve its accuracy as its users provide more labelled examples.
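The abstract above names tri-gram features without stemming as its best-performing setup. The sketch below illustrates only that feature step, under assumptions the abstract does not confirm: it treats tri-grams as word-level (they could equally be character-level), and the function names `word_ngrams` and `trigram_counts` and the sample sentence are hypothetical. It is not the thesis's implementation.

```python
import re
from collections import Counter


def word_ngrams(text, n=3):
    """Return word n-grams from lowercased text; tokens are not stemmed."""
    # Assumption: simple alphanumeric tokenization, chosen for illustration.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def trigram_counts(text):
    """Build a bag-of-trigrams feature map for one document."""
    return Counter(word_ngrams(text, 3))


# Hypothetical sample sentence, not taken from the thesis data.
features = trigram_counts("The court held that the appeal was dismissed")
```

Feature maps of this kind could then be vectorized and passed to a linear classifier; the choice of tokenizer and of n is where a real system would need tuning.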


Deep Active Learning for Text Classification

Proceedings of the 2nd International Conference on Vision, Image and Signal Processing - ICVISP 2018 ◽ 10.1145/3271553.3271578 ◽ 2018 ◽ Cited By ~ 2

Author(s): Bang An ◽ Wenjun Wu ◽ Huimin Han

Keyword(s): Active Learning ◽ Text Classification


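Les méthodes de classification non supervisées appliquées aux textes : mesure de la performance des résultats de clustering de documents [Unsupervised classification methods applied to texts: measuring the performance of document clustering results]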

Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI ◽ 10.29173/cais355 ◽ 2013

Author(s): Pascal Cuxac ◽ Jean-Charles Lamirel ◽ Maha Ghribi

Keyword(s): Text Classification ◽ Experimental Comparison ◽ Alternative Approach ◽ Bibliographic Data ◽ F Measure

This paper presents an alternative approach to measuring the quality of unsupervised text classification, based on unsupervised recall, precision and F-measure criteria that exploit the descriptors associated with the classes. The behaviour of the classical criteria is compared experimentally with that of our approach on bibliographic data.
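For orientation, the sketch below computes the standard, supervised precision, recall and F-measure from raw counts. It is not the unsupervised variant proposed in the paper, whose definition over class descriptors is not given in this abstract; the function names and the example counts are hypothetical.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts, using 0.0 for empty denominators."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 correct assignments, 2 spurious, 8 missed.
p, r = precision_recall(8, 2, 8)
score = f_measure(p, r)
```

Because the F-measure is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is commonly preferred to accuracy on imbalanced classes.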


Uncertainty-based active learning with instability estimation for text classification

ACM Transactions on Speech and Language Processing ◽ 10.1145/2093153.2093154 ◽ 2012 ◽ Vol 8 (4) ◽ pp. 1-21 ◽ Cited By ~ 10

Author(s): Jingbo Zhu ◽ Matthew Ma

Keyword(s): Active Learning ◽ Text Classification


The Use of Unlabeled Data Versus Labeled Data for Stopping Active Learning for Text Classification

2019 IEEE 13th International Conference on Semantic Computing (ICSC) ◽ 10.1109/icosc.2019.8665546 ◽ 2019 ◽ Cited By ~ 2

Author(s): Garrett Beatty ◽ Ethan Kochis ◽ Michael Bloodgood

Keyword(s): Active Learning ◽ Text Classification ◽ Unlabeled Data


Reducing the Deterioration of Sentiment Analysis Results Due to the Time Impact

Information ◽ 10.3390/info9080184 ◽ 2018 ◽ Vol 9 (8) ◽ pp. 184 ◽ Cited By ~ 2

Author(s): Yuliya Rubtsova

Keyword(s): Computational Complexity ◽ Text Classification ◽ Feature Space ◽ Sentiment Classification ◽ Text Collections ◽ Word Representation ◽ F Measure ◽ Over Time

The research identifies and substantiates the problem of quality deterioration in the sentiment classification of text collections that are identical in composition and characteristics but staggered over time. It shows that the quality of sentiment classification can drop by up to 15% in terms of the F-measure over a year and a half. The paper presents three approaches to improving sentiment classification of continuously updated text collections in Russian: a weighting scheme with linear computational complexity, the addition of lexicons of emotional vocabulary to the feature space, and distributed word representations. All methods are compared, and the paper shows which method is most applicable in which cases. Experiments comparing the methods on sufficiently representative text collections are described. The results show that the suggested approaches can reduce the deterioration of sentiment classification results for collections staggered over time.
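Of the three approaches listed, adding lexicon information to the feature space is the simplest to illustrate. The sketch below appends two lexicon-hit counts to a bag-of-words feature map. It is only one plausible reading of the idea: the abstract does not say how the lexicon features are encoded, and the tiny English lexicon, the feature names and the `featurize` function are hypothetical stand-ins for the Russian resources the paper uses.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; real emotional-vocabulary lexicons are far larger.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def featurize(text):
    """Bag-of-words counts plus two aggregate lexicon-hit features."""
    tokens = tokenize(text)
    features = Counter("word=" + t for t in tokens)
    # Lexicon features: number of tokens found in each polarity list.
    features["lex_positive"] = sum(t in POSITIVE for t in tokens)
    features["lex_negative"] = sum(t in NEGATIVE for t in tokens)
    return features


# Hypothetical example text.
example = featurize("Great phone, but the battery is awful")
```

The aggregate lexicon counts depend on a fixed vocabulary list rather than on the words seen in one training period, which may be why such features are a candidate for collections whose vocabulary shifts over time.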
