A Text Classification Method with an Effective Feature Extraction Based on Category Analysis

Author(s):  
Yun Li ◽  
Yan Sheng ◽  
Luan Luan ◽  
Ling Chen
Author(s):  
Sarmad Mahar ◽  
Sahar Zafar ◽  
Kamran Nishat

Headnotes are the precise explanation and summary of legal points in an issued judgment. Law journals hire experienced lawyers to write these headnotes. These headnotes help the reader quickly determine the issue discussed in the case. Headnotes comprise two parts. The first part comprises the topic discussed in the judgment, and the second part contains a summary of that judgment. In this thesis, we design, develop and evaluate headnote prediction using machine learning, without involving human involvement. We divided this task into a two steps process. In the first step, we predict law points used in the judgment by using text classification algorithms. The second step generates a summary of the judgment using text summarization techniques. To achieve this task, we created a Databank by extracting data from different law sources in Pakistan. We labelled training data generated based on Pakistan law websites. We tested different feature extraction methods on judiciary data to improve our system. Using these feature extraction methods, we developed a dictionary of terminology for ease of reference and utility. Our approach achieves 65% accuracy by using Linear Support Vector Classification with tri-gram and without stemmer. Using active learning our system can continuously improve the accuracy with the increased labelled examples provided by the users of the system.


2014 ◽  
Vol 1046 ◽  
pp. 444-448 ◽  
Author(s):  
Lu Chen ◽  
Tao Zhang ◽  
Yuan Yuan Ma ◽  
Cheng Zhou

With the rapid development of Internet technology and information technology, the emergence of a large number of document data, text classification techniques for handling massive amounts of data is becoming increasingly important. This paper presents a distributed text feature extraction method based on distributed computing model—MapReduce. In the process of mass text processing, solve the problem of processing text size limit and inadequate performance, provide the research of text feature extraction method a new way of thinking.


Sign in / Sign up

Export Citation Format

Share Document