AFE-MERT: imbalanced text classification with abstract feature extraction

Headnotes are precise explanations and summaries of the legal points in an issued judgment. Law journals hire experienced lawyers to write them, and they help the reader quickly determine the issue discussed in a case. A headnote comprises two parts: the first states the topic discussed in the judgment, and the second contains a summary of that judgment. In this thesis, we design, develop and evaluate headnote prediction using machine learning, without human involvement. We divide the task into a two-step process. In the first step, we predict the law points used in the judgment with text classification algorithms. In the second step, we generate a summary of the judgment using text summarization techniques. To this end, we created a databank by extracting data from different law sources in Pakistan, and we labelled the training data based on Pakistan law websites. We tested different feature extraction methods on judiciary data to improve our system. Using these feature extraction methods, we developed a dictionary of terminology for ease of reference and utility. Our approach achieves 65% accuracy using Linear Support Vector Classification with tri-grams and without a stemmer. With active learning, the system can continuously improve its accuracy as its users provide more labelled examples.

Applied-Information Technology with Distributed Text Feature Extraction Method Based on MapReduce

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.1046.444 ◽

2014 ◽

Vol 1046 ◽

pp. 444-448 ◽

Cited By ~ 1

Author(s):

Lu Chen ◽

Tao Zhang ◽

Yuan Yuan Ma ◽

Cheng Zhou

Keyword(s):

Information Technology ◽

Feature Extraction ◽

Text Classification ◽

Extraction Method ◽

Text Processing ◽

Rapid Development ◽

Internet Technology ◽

Feature Extraction Method ◽

Computing Model ◽

Text Feature

With the rapid development of Internet technology and information technology, large volumes of document data have emerged, and text classification techniques for handling massive amounts of data are becoming increasingly important. This paper presents a distributed text feature extraction method based on the distributed computing model MapReduce. In mass text processing, the method addresses the limit on processable text size and the inadequate performance, and it offers a new way of thinking for research on text feature extraction methods.
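The abstract does not say which features the method extracts or how the job is laid out, so the following is only a minimal single-process sketch of the map/shuffle/reduce pattern, applied to one common text feature, document frequency. The function names (`map_document`, `shuffle`, `reduce_term`) and the toy corpus are hypothetical and are not taken from the paper; a real MapReduce framework would run the map and reduce steps on separate workers.

```python
from collections import defaultdict
from itertools import chain


def map_document(doc_id, text):
    """Map step: emit (term, 1) once for each distinct term in one document."""
    return [(term, 1) for term in sorted(set(text.lower().split()))]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_term(term, counts):
    """Reduce step: sum the per-document ones into a document frequency."""
    return term, sum(counts)


def document_frequency(docs):
    """Run map, shuffle and reduce in sequence over a dict of id -> text."""
    mapped = chain.from_iterable(map_document(i, t) for i, t in docs.items())
    return dict(reduce_term(t, c) for t, c in shuffle(mapped).items())


# Toy corpus (hypothetical data, for illustration only).
docs = {
    "d1": "court grants appeal",
    "d2": "court denies appeal appeal",
    "d3": "tax appeal filed",
}
df = document_frequency(docs)
```

Because each document is mapped independently and each term is reduced independently, both steps can be spread across machines, which is what removes the single-machine limit on corpus size.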

Efficient text feature extraction by integrating the average linkage and K-medoids clustering

Modern Physics Letters B ◽

10.1142/s0217984921501517 ◽

2021 ◽

pp. 2150151

Author(s):

Dasong Sun

Keyword(s):

Feature Extraction ◽

Text Classification ◽

Experimental Results ◽

The Other ◽

Central Feature ◽

Number Of Clusters ◽

Average Linkage ◽

Text Feature

By clustering feature words, we can not only reduce the dimension of feature subsets, but also eliminate redundancy among the features. However, for a feature set with very large dimensions, it is difficult to estimate the value of K accurately for the traditional K-medoids algorithm. Moreover, the clustering results of the average linkage (AL) algorithm cannot be divided again, and the AL algorithm cannot be used directly for text classification. To overcome the limitations of AL and K-medoids, in this paper we combine the two algorithms so that they complement each other. In particular, to serve the purpose of text classification, we improve the AL algorithm and propose the [Formula: see text] testing statistics to obtain the approximate number of clusters. Finally, the central feature words are preserved, and the other feature words are deleted. The experimental results show that the new algorithm largely eliminates the redundancy of the features. Compared with the traditional TF-IDF algorithms, the new algorithm improves text classification performance.
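The general idea of grouping feature words and keeping one central word per group can be sketched as follows. This is a loose illustration, not the paper's algorithm: it uses plain average-linkage agglomerative clustering followed by a medoid pick, the number of clusters `k` is set by hand rather than estimated by the paper's testing statistics, and the word vectors are made-up per-class weights.

```python
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


def average_linkage(ca, cb, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(vectors[a], vectors[b]) for a in ca for b in cb)
    return total / (len(ca) * len(cb))


def cluster_words(vectors, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[w] for w in sorted(vectors)]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations guarantees i < j
    return clusters


def medoid(cluster, vectors):
    """Member with the smallest total distance to the rest of its cluster."""
    return min(
        cluster,
        key=lambda w: sum(euclidean(vectors[w], vectors[o]) for o in cluster),
    )


# Hypothetical feature words with made-up weights for two classes.
vectors = {
    "goal": (0.9, 0.1),
    "match": (0.8, 0.2),
    "team": (0.85, 0.15),
    "vote": (0.1, 0.9),
    "law": (0.2, 0.8),
}
clusters = cluster_words(vectors, k=2)
kept = [medoid(c, vectors) for c in clusters]  # one central word per cluster
```

Only the words in `kept` would be retained as features; the remaining words in each cluster are treated as redundant.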

Introduction to Text Classification: Impact of Stemming and Comparing TF-IDF and Count Vectorization as Feature Extraction Technique

10.1007/978-3-030-85521-5_19 ◽

2021 ◽

pp. 289-300

Author(s):

André Wendland ◽

Marco Zenere ◽

Jörg Niemann

Keyword(s):

Feature Extraction ◽

Text Classification ◽

Extraction Technique ◽

Feature Extraction Technique

Text Classification Feature Extraction Method Based on Deep Learning for Unbalanced Data Sets

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Advanced Hybrid Information Processing ◽

10.1007/978-3-030-67871-5_29 ◽

2021 ◽

pp. 320-331

Author(s):

Li Lin ◽

Shu-xin Guo

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Text Classification ◽

Extraction Method ◽

Unbalanced Data ◽

Data Sets ◽

Feature Extraction Method ◽

Classification Feature

Arabic Stemming Techniques as Feature Extraction Applied in Arabic Text Classification

Lecture Notes in Networks and Systems - Advanced Information Technology, Services and Systems ◽

10.1007/978-3-319-69137-4_31 ◽

2017 ◽

pp. 349-361

Author(s):

Samir Boukil ◽

Fatiha El Adnani ◽

Abd Elmajid El Moutaouakkil ◽

Loubna Cherrat ◽

Mostafa Ezziyyani

Keyword(s):

Feature Extraction ◽

Text Classification ◽

Arabic Text ◽

Arabic Text Classification ◽

Arabic Stemming

Composite Feature Extraction and Selection for Text Classification

IEEE Access ◽

10.1109/access.2019.2904602 ◽

2019 ◽

Vol 7 ◽

pp. 35208-35219 ◽

Cited By ~ 2

Author(s):

Chuan Wan ◽

Yuling Wang ◽

Yaoze Liu ◽

Jinchao Ji ◽

Guozhong Feng

Keyword(s):

Feature Extraction ◽

Text Classification ◽

Feature Extraction And Selection ◽

Composite Feature ◽

Selection For

Feature extraction and performance measure of requirement engineering (RE) document using text classification technique

2018 4th International Conference on Recent Advances in Information Technology (RAIT) ◽

10.1109/rait.2018.8389074 ◽

2018 ◽

Cited By ~ 1

Author(s):

L. P. Saikia ◽

Shilpi Singh

Keyword(s):

Feature Extraction ◽

Text Classification ◽

Requirement Engineering ◽

Performance Measure ◽

Classification Technique ◽

And Performance

A Text Classification Method with an Effective Feature Extraction Based on Category Analysis

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery ◽

10.1109/fskd.2009.304 ◽

2009 ◽

Cited By ~ 2

Author(s):

Yun Li ◽

Yan Sheng ◽

Luan Luan ◽

Ling Chen

Keyword(s):

Feature Extraction ◽

Text Classification ◽

Classification Method

Comparison and Improvements of Feature Extraction Methods for Text Categorization

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.599-601.1824 ◽

2014 ◽

Vol 599-601 ◽

pp. 1824-1828

Author(s):

Juan Wang ◽

Zhi Xun Zhang ◽

Yong Dong Wang

Keyword(s):

Feature Extraction ◽

Mutual Information ◽

Text Classification ◽

Text Categorization ◽

Information Gain ◽

Extraction Methods ◽

Improved Method ◽

Document Frequency ◽

Text Feature

Feature extraction is a key step in text categorization [1]. The accuracy of extraction directly affects the accuracy of text classification. This paper introduces and compares four commonly used methods of text feature extraction: IG (information gain), MI (mutual information), CHI (chi-square statistic) and DF (document frequency), and proposes an improved method based on CHI. Experimental results show that the proposed method can improve the accuracy of text categorization.
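Two of the four measures can be illustrated with a short sketch. It uses the standard textbook chi-square score for a term and a category, computed from a 2x2 contingency table, alongside plain document frequency. It does not reproduce the improved CHI method, which the abstract does not specify, and the helper names and toy documents are hypothetical.

```python
def contingency(docs, term, category):
    """Count documents by (contains term?, belongs to category?).

    Returns (a, b, c, d):
      a = in category with term,    b = outside category with term,
      c = in category without term, d = outside category without term.
    """
    a = b = c = d = 0
    for tokens, label in docs:
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 table; 0.0 if any margin is empty."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def document_frequency(docs, term):
    """Number of documents that contain the term."""
    return sum(1 for tokens, _ in docs if term in tokens)


# Toy labelled corpus (hypothetical data, for illustration only).
docs = [
    ({"goal", "match"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"vote", "match"}, "politics"),
    ({"vote", "law"}, "politics"),
]
chi_goal = chi_square(*contingency(docs, "goal", "sport"))
chi_match = chi_square(*contingency(docs, "match", "sport"))
```

In this toy corpus "goal" and "match" have the same document frequency of 2, yet "goal" scores 4.0 on chi-square and "match" scores 0.0, because only "goal" is associated with one category. That difference is why DF and CHI can rank features differently.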
