Advances in Data Mining and Database Management - Developing a Keyword Extractor and Document Classifier

System Analysis and Design for Document Classification

Advances in Data Mining and Database Management - Developing a Keyword Extractor and Document Classifier

10.4018/978-1-7998-3772-5.ch008

2021

pp. 137-144

Keyword(s): Missing Values, System Analysis, Keyword Search, Text Processing, Document Classification, Analysis And Design, Keywords And Phrases, Text Collections, Specific Concept, Keyword List

The text-mining process starts with a keyword search in text collections. Current text-processing technology supports searches that go beyond simple Boolean matching by accepting natural-language queries. However, because search engines can recognize thousands of keywords and phrases but not the concepts behind the text, researchers need to build an automatic keyword extractor that generates a “Keyword List” for each document. This list can later act as the knowledge base for assigning unorganized documents to meaningful classes. Failure to identify the keywords for a certain concept results in missing values (missing data) for that concept.
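To make the idea of a per-document “Keyword List” concrete, the following is a minimal Python sketch that ranks candidate keywords by term frequency after discarding a small stopword list. The stopword set, the names `keyword_list` and `build_keyword_index`, and the frequency-based ranking are illustrative assumptions, not the chapter's own extraction algorithm.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list (an assumption for this sketch).
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "be", "by", "with", "as", "it", "this",
    "that", "can", "will", "from", "at", "into",
}


def keyword_list(text: str, top_k: int = 5) -> list[str]:
    """Return up to top_k candidate keywords ranked by term frequency.

    Ties are broken by first occurrence, so the output is deterministic.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    candidates = [t for t in tokens if t not in STOPWORDS and len(t) > 2]
    counts = Counter(candidates)
    first_seen: dict[str, int] = {}
    for i, tok in enumerate(candidates):
        first_seen.setdefault(tok, i)
    ranked = sorted(counts, key=lambda t: (-counts[t], first_seen[t]))
    return ranked[:top_k]


def build_keyword_index(docs: dict[str, str], top_k: int = 5) -> dict[str, list[str]]:
    """Map each document id to its Keyword List (the knowledge base for classification)."""
    return {doc_id: keyword_list(text, top_k) for doc_id, text in docs.items()}


if __name__ == "__main__":
    docs = {
        "d1": "Keyword extraction builds a keyword list; the keyword list supports classification.",
        "d2": "Test paper generation selects questions for a test paper template.",
    }
    for doc_id, kws in build_keyword_index(docs, top_k=3).items():
        print(doc_id, kws)
```

Pure frequency ranking is only a baseline: a concept whose keywords never rank highly simply has no entry in the list, which is the missing-value problem described above.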

Dynamic Template Generation

Advances in Data Mining and Database Management - Developing a Keyword Extractor and Document Classifier

10.4018/978-1-7998-3772-5.ch001

2021

pp. 1-37

Keyword(s): Optimization Problems, Real Life, Dynamic Test, Test Paper, Multi Objective Optimization, Multi Objective, Life Problems, Conflicting Objectives, Complex Optimization, Dynamic Template

A test blueprint (test template), also known as the table of specifications, represents the structure of a test. Assessment textbooks strongly recommend preparing a test from a test blueprint. This chapter focuses on modeling a dynamic test paper template using a multi-objective optimization algorithm and on using that template to dynamically generate examination test papers. Multi-objective optimization-based models are realistic models for many complex optimization problems. Like many real-life problems, modeling a dynamic test paper template involves solving multiple conflicting objectives while satisfying the template specifications.
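The core notion behind “multiple conflicting objectives” is Pareto dominance. The minimal Python sketch below scores hypothetical candidate templates on two objectives to be minimized (deviation from the required unit weightage and deviation from the required difficulty mix) and keeps only the non-dominated ones. The two objectives, the `Template` type, and the sample scores are assumptions for illustration; the chapter's actual optimization algorithm is not reproduced here.

```python
from typing import NamedTuple


class Template(NamedTuple):
    """A hypothetical candidate test template scored on two objectives (lower is better)."""
    name: str
    unit_deviation: float        # distance from the required unit/module weightage
    difficulty_deviation: float  # distance from the required difficulty mix


def dominates(a: Template, b: Template) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on at least one."""
    oa = (a.unit_deviation, a.difficulty_deviation)
    ob = (b.unit_deviation, b.difficulty_deviation)
    return all(x <= y for x, y in zip(oa, ob)) and any(x < y for x, y in zip(oa, ob))


def pareto_front(candidates: list[Template]) -> list[Template]:
    """Keep only candidates that no other candidate dominates (a candidate never dominates itself)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


if __name__ == "__main__":
    cands = [
        Template("T1", 2, 5),
        Template("T2", 4, 1),
        Template("T3", 5, 6),
        Template("T4", 3, 3),
    ]
    print([t.name for t in pareto_front(cands)])
```

T3 is discarded because T1 is better on both objectives, while T1, T2 and T4 each trade one objective against the other, so none of them can be ranked above the others without further preferences.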

Document Classification

Advances in Data Mining and Database Management - Developing a Keyword Extractor and Document Classifier

10.4018/978-1-7998-3772-5.ch007

2021

pp. 132-136

Keyword(s): Machine Learning, Contextual Effects, Document Classification, Feature Vectors, Learning Scheme, Document Collection

Keywords can be used as attributes for mining rules or as a basis for measuring the similarity of new (unclassified) documents to existing (classified) ones. The focus here is on extracting keywords from a document collection so that they can be used as attributes for document classification. Document classification is an active topic in machine learning. Typical approaches extract “features,” generally words, from each document and use the resulting feature vectors as input to a machine learning scheme that learns how to classify documents. This “bag of keywords” model ignores keyword order and contextual effects.
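A minimal Python sketch of the “bag of keywords” pipeline follows: each document becomes a vector of keyword counts over a fixed vocabulary, and a new document is assigned to the class whose summed training vector (same direction as its centroid) has the highest cosine similarity. The vocabulary, the sample documents and the nearest-centroid scheme are assumptions chosen for illustration; the chapter does not prescribe this particular learner.

```python
import math
from collections import Counter


def bag_of_keywords(text: str, vocabulary: list[str]) -> list[int]:
    """Count each vocabulary keyword in the text; word order is discarded."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(text: str, labelled: list[tuple[str, str]], vocabulary: list[str]) -> str:
    """Nearest-centroid classification over bag-of-keywords vectors."""
    sums: dict[str, list[int]] = {}
    for doc, label in labelled:
        acc = sums.setdefault(label, [0] * len(vocabulary))
        for i, x in enumerate(bag_of_keywords(doc, vocabulary)):
            acc[i] += x
    x = bag_of_keywords(text, vocabulary)
    return max(sums, key=lambda label: cosine(x, sums[label]))


if __name__ == "__main__":
    vocab = ["keyword", "classifier", "question", "exam"]
    train = [
        ("keyword extraction for a classifier", "mining"),
        ("classifier uses keyword features", "mining"),
        ("exam question selection", "testing"),
        ("question paper for an exam", "testing"),
    ]
    print(classify("which keyword helps the classifier", train, vocab))
```

Because `bag_of_keywords("keyword classifier", vocab)` and `bag_of_keywords("classifier keyword", vocab)` produce the same vector, the sketch also shows directly why this model cannot capture keyword order.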

Input Output for Document Classifier

Advances in Data Mining and Database Management - Developing a Keyword Extractor and Document Classifier

10.4018/978-1-7998-3772-5.ch009

2021

pp. 145-158

Keyword(s): Input Output, Maximum Weight, Overlapping Classes, Selection Of

The generated report lists the automatically extracted keywords for each document. A document may have any number of keywords. Because keywords can be generated in any pass of the loop, there is no restriction on keyword width. A second report lists the class assigned to each document. If a document matches more than one class (overlapping classes), its final class is selected as the one in which its keywords carry the maximum weight.
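The overlapping-class rule can be sketched in a few lines of Python. The sketch assumes that each class has a table of keyword weights and that a class's score is the sum of the weights of the document keywords it contains; both the weight tables and the summing rule are assumptions, since the text only states that the maximum keyword weight decides.

```python
from typing import Optional


def class_scores(doc_keywords: set[str],
                 class_keywords: dict[str, dict[str, float]]) -> dict[str, float]:
    """Total weight of the document's keywords in every class that matches at least one."""
    scores: dict[str, float] = {}
    for label, weights in class_keywords.items():
        matched = weights.keys() & doc_keywords
        if matched:
            scores[label] = sum(weights[k] for k in sorted(matched))
    return scores


def select_class(doc_keywords: set[str],
                 class_keywords: dict[str, dict[str, float]]) -> Optional[str]:
    """Resolve overlapping classes by the maximum keyword weight; None if nothing matches."""
    scores = class_scores(doc_keywords, class_keywords)
    if not scores:
        return None
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical weighted keyword lists per class.
    classes = {
        "databases": {"sql": 0.9, "index": 0.6, "query": 0.5},
        "ir": {"query": 0.8, "index": 0.7, "ranking": 0.9},
    }
    print(class_scores({"query", "index"}, classes))
    print(select_class({"query", "index"}, classes))
```

Here the document matches both classes, scoring about 1.1 for “databases” and 1.5 for “ir”, so it is assigned to “ir”.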

Keyword Extraction

Advances in Data Mining and Database Management - Developing a Keyword Extractor and Document Classifier

10.4018/978-1-7998-3772-5.ch006

2021

pp. 119-131

Keyword(s): Information Retrieval, Keyword Extraction

Keywords are defined as phrases that capture the main topics discussed in a document. Because they offer a brief yet precise summary of document content, they can be used in various applications. In an information retrieval (IR) environment, they indicate document relevance to users: the keyword list can quickly help a user decide whether a given document matches their interest. Because keywords reflect a document's main topics, they can also be used to classify documents into groups by measuring the overlap between the keywords assigned to them. Keywords are also used proactively in information retrieval (i.e., in indexing).
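One simple way to measure the keyword overlap mentioned above is the Jaccard coefficient. The Python sketch below uses it to group documents in a single greedy pass. Both the Jaccard measure and the threshold of 0.3 are assumptions picked for illustration; the chapter does not name a specific overlap measure.

```python
def keyword_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard overlap |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_documents(keywords: dict[str, set[str]], threshold: float = 0.3) -> list[set[str]]:
    """Greedy single pass: a document joins the first group containing a member whose
    keyword overlap reaches the threshold; otherwise it starts a new group."""
    groups: list[set[str]] = []
    for doc_id, kws in keywords.items():
        for group in groups:
            if any(keyword_overlap(kws, keywords[m]) >= threshold for m in group):
                group.add(doc_id)
                break
        else:
            groups.append({doc_id})
    return groups


if __name__ == "__main__":
    docs = {
        "d1": {"keyword", "extraction", "classifier"},
        "d2": {"keyword", "classifier", "svm"},
        "d3": {"exam", "question", "template"},
        "d4": {"exam", "question", "syllabus"},
    }
    print(group_documents(docs))
```

The single-pass design keeps the sketch short, but its result depends on document order; a clustering algorithm would be the more robust choice for larger collections.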

Question Selection in Template-Based Test Paper Models

Advances in Data Mining and Database Management - Developing a Keyword Extractor and Document Classifier

10.4018/978-1-7998-3772-5.ch003

2021

pp. 52-82

Keyword(s): Learning Process, Educational Program, Evaluation System, Optimization Problem, Generation Process, Test Paper, Concurrent Optimization, Unit Module, Question Selection, Paper Format

The success of any educational program depends on its evaluation system. Examinations are part of the learning process and act as one element of evaluation. A test paper generation process would help universities and academic institutions conduct their examinations smoothly. However, composing an examination test paper is a multi-constraint concurrent optimization problem. Question selection plays a key role in test paper generation systems and is also the most significant and time-consuming activity. Traditional test paper generation systems handle question selection with a specified test paper format that lists the weightage allotted to each unit/module of the syllabus.
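The traditional weightage-driven approach can be sketched as a greedy selection per unit. The following minimal Python example assumes a question bank tagged with unit and marks, plus a paper format giving the marks allotted to each unit. The greedy strategy is a simplifying assumption and is not the concurrent optimization the chapter develops.

```python
from typing import NamedTuple


class Question(NamedTuple):
    qid: str
    unit: str
    marks: int


def select_questions(bank: list[Question], weightage: dict[str, int]) -> list[Question]:
    """Greedy per-unit selection: for each unit, take the highest-mark questions that
    still fit the unit's mark quota. Raises ValueError if a quota cannot be met exactly."""
    paper: list[Question] = []
    for unit, quota in weightage.items():
        remaining = quota
        pool = sorted((q for q in bank if q.unit == unit), key=lambda q: -q.marks)
        for q in pool:
            if q.marks <= remaining:
                paper.append(q)
                remaining -= q.marks
        if remaining:
            raise ValueError(f"unit {unit!r}: {remaining} marks could not be allocated")
    return paper


if __name__ == "__main__":
    bank = [
        Question("Q1", "U1", 10), Question("Q2", "U1", 5), Question("Q3", "U1", 5),
        Question("Q4", "U2", 10), Question("Q5", "U2", 4), Question("Q6", "U2", 6),
    ]
    # Hypothetical paper format: marks allotted to each unit/module.
    print([q.qid for q in select_questions(bank, {"U1": 15, "U2": 16})])
```

Greedy selection can fail even when a valid paper exists: with questions worth 10, 6 and 6 marks and a quota of 12, it takes the 10 and then gets stuck, although 6 + 6 would work. Adding further constraints (difficulty, cognitive level, no repeated topics) makes such failures more likely, which is one reason to treat question selection as an optimization problem.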

Answer Evaluation of Short Descriptive Questions

Advances in Data Mining and Database Management - Developing a Keyword Extractor and Document Classifier

10.4018/978-1-7998-3772-5.ch005

2021

pp. 104-118

Keyword(s): Similarity Measure, Assessment System, Similarity Matrix, Test Paper, Continuous Assessment, Simple Interpretation, Cognitive Levels, Stages Of Learning, Syntactic Similarity, Measure Word

Reforms in the educational system place greater emphasis on continuous assessment. Compared with an objective test paper, a descriptive examination test paper is a better aid in continuous assessment for testing a student's progress across various cognitive levels at different stages of learning. Unfortunately, instructors find the assessment of descriptive answers tedious and time-consuming because of the growing number of examinations in a continuous assessment system. This chapter addresses the automatic evaluation of descriptive answers using a vector-based similarity matrix with an order-based word-to-word syntactic similarity measure. The word order similarity measure remains one of the best measures of similarity between sequential words in sentences, and it is gaining popularity because it is simple to interpret and easy to compute.
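As a worked illustration of why word order similarity is easy to compute, the Python sketch below follows a commonly used formulation. First build the joint word set of the two sentences. Then, for each sentence, form an order vector holding the 1-based position of each joint word, or 0 if the word is absent. The score is S_r = 1 - ||r1 - r2|| / ||r1 + r2||. The sketch assumes exact word matching and whitespace tokenization. It covers only the word-order component, not the chapter's full similarity-matrix method.

```python
import math


def order_vector(sentence: list[str], joint: list[str]) -> list[int]:
    """1-based position of each joint-set word in the sentence (first occurrence), 0 if absent."""
    return [sentence.index(w) + 1 if w in sentence else 0 for w in joint]


def word_order_similarity(s1: str, s2: str) -> float:
    """S_r = 1 - ||r1 - r2|| / ||r1 + r2|| over the joint word set (exact matches only)."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    joint = list(dict.fromkeys(t1 + t2))  # unique words in first-appearance order
    r1, r2 = order_vector(t1, joint), order_vector(t2, joint)
    denom = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    if denom == 0:
        return 1.0  # both sentences are empty
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - num / denom


if __name__ == "__main__":
    print(word_order_similarity("a quick brown fox", "a quick brown fox"))
    print(round(word_order_similarity("a quick brown fox", "a brown quick fox"), 3))
```

For the swapped pair, r1 = [1, 2, 3, 4] and r2 = [1, 3, 2, 4], giving 1 - √2/√118 ≈ 0.870. The score drops below 1 even though both sentences use the same words, which is exactly the order information a bag-of-words comparison misses.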

Software Tool for Test Paper Generation

Advances in Data Mining and Database Management - Developing a Keyword Extractor and Document Classifier

10.4018/978-1-7998-3772-5.ch011

2021

pp. 170-198

Keyword(s): Research Work, Software Tool, Conflict Detection, Test Paper, Generation System, Question Selection

In this chapter, the authors describe the features of the tool they developed from the algorithms designed and implemented in this research. They named it the test paper generation system (TPGS); in some places, its alias, question paper generation system (QPGS), is used instead. The main modules of the tool are (1) test paper template generation, (2) question conflict detection, (3) test paper template-based question selection, (4) a syllabus coverage evaluator for test papers, and (5) an answer paper evaluator.

Implementation and Testing Details of Document Classification

Advances in Data Mining and Database Management - Developing a Keyword Extractor and Document Classifier

10.4018/978-1-7998-3772-5.ch010

2021

pp. 159-169

Keyword(s): Document Classification, Frequent Occurrence, The Stability, Moderate Length

It is trivial to achieve a recall of 100% by returning all documents in response to any query. Recall alone is therefore not enough; one also needs to measure how many non-relevant documents are returned, for example by computing precision. The analysis was performed on 30 documents to ensure stable precision and recall values. Precision for large documents was observed to be lower than for moderate-length documents, because some unimportant keywords get extracted. This may be attributed to such words occurring frequently while playing an unimportant role in the sentence.
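The following Python sketch shows how precision and recall are computed and why returning everything gives perfect recall. The same function applies to extracted keywords versus a reference keyword set. The document ids and relevance judgements are made up for illustration.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / |retrieved|, recall = hits / |relevant| (0.0 for empty sets)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    all_docs = {f"d{i}" for i in range(1, 11)}  # hypothetical 10-document collection
    relevant = {"d1", "d2"}                      # hypothetical relevance judgements
    print(precision_recall(all_docs, relevant))      # return everything
    print(precision_recall({"d1", "d3"}, relevant))  # a more selective answer
```

Returning all ten documents yields recall 1.0 but precision only 0.2. The selective answer yields 0.5 for both, which illustrates why the two measures are reported together.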

Syllabus Coverage Evaluation in Test Paper Models

Advances in Data Mining and Database Management - Developing a Keyword Extractor and Document Classifier

10.4018/978-1-7998-3772-5.ch004

2021

pp. 83-103

Keyword(s): Learning Outcomes, Teaching Methods, Bloom's Taxonomy, Educational Institutions, Test Paper

A syllabus is a detailed instructional plan of materials, resources, teaching methods, and evaluation plans, designed primarily to inform students about the standards, requirements, and learning outcomes expected of them in the course. It also expresses an “informal agreement” between the instructor and the students on delivering the syllabus content over the course. A syllabus also informs other educational institutions of its content coverage so that they can determine whether it is equivalent to a similar course offered at their institutions. A modularized syllabus assigns weightages to the different units/modules of a subject. Criteria such as Bloom's taxonomy and learning outcomes have been used to evaluate the syllabus coverage of a test paper.
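For the unit-weightage criterion, a coverage evaluator can compare how a paper's marks are distributed across units with the weightages in a modularized syllabus. The Python sketch below uses 1 minus the total variation distance between the two distributions as its coverage score. That metric, the function names and the sample numbers are assumptions for illustration. The chapter's own evaluator, including its use of Bloom's taxonomy, is not reproduced here.

```python
def unit_gaps(paper_marks: dict[str, float], weightage: dict[str, float]) -> dict[str, float]:
    """Per-unit share of marks minus syllabus share; negative means the unit is under-covered."""
    total_marks = sum(paper_marks.values())
    total_weight = sum(weightage.values())
    if total_marks <= 0 or total_weight <= 0:
        raise ValueError("both the paper and the syllabus need positive totals")
    units = sorted(set(paper_marks) | set(weightage))
    return {u: paper_marks.get(u, 0) / total_marks - weightage.get(u, 0) / total_weight
            for u in units}


def coverage_score(paper_marks: dict[str, float], weightage: dict[str, float]) -> float:
    """1 - total variation distance: 1.0 is a perfect match, 0.0 means no shared coverage."""
    return 1.0 - 0.5 * sum(abs(g) for g in unit_gaps(paper_marks, weightage).values())


if __name__ == "__main__":
    syllabus = {"U1": 20, "U2": 30, "U3": 50}  # hypothetical unit weightages (%)
    paper = {"U1": 40, "U2": 30, "U3": 30}     # hypothetical marks per unit in a paper
    print(round(coverage_score(paper, syllabus), 3))
    print({u: round(g, 2) for u, g in unit_gaps(paper, syllabus).items()})
```

In this example U1 is over-represented by 20 percentage points and U3 under-represented by the same amount, so the paper scores 0.8 on this hypothetical scale.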

Advances in Data Mining and Database Management - Developing a Keyword Extractor and Document Classifier

Published By IGI Global

System Analysis and Design for Document Classification

Dynamic Template Generation

Document Classification

Input Output for Document Classifier

Keyword Extraction

Question Selection in Template-Based Test Paper Models

Answer Evaluation of Short Descriptive Questions

Software Tool for Test Paper Generation

Implementation and Testing Details of Document Classification

Syllabus Coverage Evaluation in Test Paper Models
