A Machine Learning Based Framework for Enterprise Document Classification

Keywords can be used as attributes for mining rules or as a basis for measuring the similarity of new (unclassified) documents to existing (classified) ones. The focus is on extracting keywords from a document collection for use as attributes in document classification. Document classification is an active research topic in machine learning. Typical approaches extract “features,” generally words, from documents and use the feature vectors as input to a machine learning scheme that learns how to classify documents. This “bag of keywords” model neglects keyword order and contextual effects.
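The bag-of-keywords idea described in this abstract can be illustrated with a minimal sketch. It is not the framework's actual implementation: the vocabulary, the sample documents, and the function names `bag_of_keywords`, `cosine_similarity`, and `classify` are all hypothetical, and cosine similarity is assumed here as one common choice of similarity measure.

```python
import math
from collections import Counter

def bag_of_keywords(text, vocabulary):
    """Count how often each vocabulary keyword occurs in the text.

    Word order is discarded, which is exactly the limitation the
    abstract attributes to this model.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between two keyword-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document with no known keywords matches nothing
    return dot / (norm_a * norm_b)

def classify(new_text, labelled_docs, vocabulary):
    """Return the label of the most similar already-classified document."""
    new_vec = bag_of_keywords(new_text, vocabulary)
    best_text, best_label = max(
        labelled_docs,
        key=lambda pair: cosine_similarity(
            new_vec, bag_of_keywords(pair[0], vocabulary)
        ),
    )
    return best_label

# Hypothetical keywords and classified documents, for illustration only.
VOCABULARY = ["invoice", "payment", "contract", "clause"]
LABELLED_DOCS = [
    ("invoice payment due invoice", "finance"),
    ("contract clause termination clause", "legal"),
]

print(classify("payment for invoice received", LABELLED_DOCS, VOCABULARY))
```

Because the vectors hold only counts, "payment invoice" and "invoice payment" map to the same vector, which shows concretely why keyword order and context are lost.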


Self-Organising Maps in Document Classification: A Comparison with Six Machine Learning Methods

Adaptive and Natural Computing Algorithms - Lecture Notes in Computer Science ◽

10.1007/978-3-642-20282-7_27 ◽

2011 ◽

pp. 260-269 ◽

Cited By ~ 7

Author(s):

Jyri Saarikoski ◽

Jorma Laurikkala ◽

Kalervo Järvelin ◽

Martti Juhola

Keyword(s):

Machine Learning ◽

Document Classification ◽

Learning Methods ◽

Machine Learning Methods


Application of Machine Learning for Document Classification and Processing in Adaptive Information Systems

Intelligent Algorithms in Software Engineering - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-030-51965-0_25 ◽

2020 ◽

pp. 291-300

Author(s):

Artem Obukhov ◽

Mikhail Krasnyanskiy

Keyword(s):

Machine Learning ◽

Information Systems ◽

Document Classification


AUTOMATIC SUBJECT LABELING IN DOCUMENTS BY USING ONTOLOGY AND GRAPH DATABASES

Journal of Science and Technology - IUH ◽

10.46242/jst-iuh.v38i02.292 ◽

2020 ◽

Vol 38 (02) ◽

Author(s):

TẠ DUY CÔNG CHIẾN

Keyword(s):

Machine Learning ◽

Language Processing ◽

Data Science ◽

Document Classification ◽

Graph Database ◽

Graph Databases ◽

Text Documents ◽

Domain Specific ◽

Text Document ◽

Text Document Classification

Ontologies have been used in many applications in recent years, such as information retrieval, information extraction, and text document classification. The purpose of a domain-specific ontology is to enrich the identification of concepts and their interrelationships. In our research, we use an ontology to specify a set of generic subjects (concepts) that characterize the domain, along with their definitions and interrelationships. This paper introduces a system for labeling the subjects of a text document based on the differential layers of a domain-specific ontology, which contains the information and vocabularies related to the computer domain. A document can contain several subjects, such as data science, databases, and machine learning. The subjects in text document classification are determined based on the differential layers of the domain-specific ontology. We combine Natural Language Processing methodologies with the domain ontology to determine the subjects in a text document. To increase performance, we use a graph database to store and access the ontology. The paper also evaluates our proposed algorithm against some other methods. Experimental results show that our proposed algorithm yields performance significantly


Machine Learning Based Text Document Classification for E-Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e5748.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 194-201

Keyword(s):

Machine Learning ◽

Web Sites ◽

Document Classification ◽

Classification Model ◽

Classification Algorithms ◽

Testing Time ◽

Machine Learning Classification ◽

Text Document ◽

E Learning ◽

The Web

The number of e-learning websites and the volume of e-content have been increasing exponentially over the years, and it often becomes cumbersome for a learner to find suitable e-content because the learner is overwhelmed by the sheer amount of content available. The proposed work focuses on evaluating the efficiency of different classification algorithms for identifying e-learning content by difficulty level. The data is collected from many e-learning websites through web scraping. The web scraper downloads the web pages and parses them into text files. The text files are run through many machine learning classification algorithms to find the classification model that achieves the highest score with minimum training and testing time. This method helps to understand the performance of different text classification algorithms on e-learning content and identifies the classifier with high accuracy for document classification.
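The comparison described in this abstract (score each classifier, and time its training and testing) can be sketched as follows. This is an illustration under stated assumptions, not the paper's setup: the abstract does not name its algorithms or data, so `MajorityClassifier` and `KeywordOverlapClassifier` are hypothetical toy stand-ins, the `TRAIN` and `TEST` samples are invented, and accuracy is assumed as the score.

```python
import time
from collections import Counter, defaultdict

class MajorityClassifier:
    """Toy baseline: always predicts the most frequent training label."""
    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]

    def predict(self, texts):
        return [self.label for _ in texts]

class KeywordOverlapClassifier:
    """Toy model: picks the label whose training words overlap most."""
    def fit(self, texts, labels):
        self.words = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.words[label].update(text.lower().split())

    def predict(self, texts):
        predictions = []
        for text in texts:
            tokens = text.lower().split()
            predictions.append(max(
                self.words,
                key=lambda label: sum(self.words[label][t] for t in tokens),
            ))
        return predictions

def evaluate(model, train, test):
    """Measure accuracy plus wall-clock training and testing time."""
    train_texts, train_labels = zip(*train)
    test_texts, test_labels = zip(*test)
    start = time.perf_counter()
    model.fit(train_texts, train_labels)
    train_time = time.perf_counter() - start
    start = time.perf_counter()
    predicted = model.predict(test_texts)
    test_time = time.perf_counter() - start
    correct = sum(p == t for p, t in zip(predicted, test_labels))
    return {
        "accuracy": correct / len(test_labels),
        "train_time": train_time,
        "test_time": test_time,
    }

# Invented e-learning snippets labelled by difficulty level.
TRAIN = [
    ("introduction to variables and loops", "beginner"),
    ("basic syntax and simple loops", "beginner"),
    ("what is a variable", "beginner"),
    ("advanced concurrency and memory models", "advanced"),
    ("optimizing compilers and memory layout", "advanced"),
]
TEST = [
    ("simple loops for starters", "beginner"),
    ("memory models in depth", "advanced"),
]

results = {
    type(model).__name__: evaluate(model, TRAIN, TEST)
    for model in (MajorityClassifier(), KeywordOverlapClassifier())
}
for name, metrics in results.items():
    print(name, metrics)
```

Any classifier exposing `fit` and `predict` could be passed to `evaluate` in the same way, so the harness stays separate from the choice of algorithm.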


Machine Learning approach to Document Classification using Concept based Features

International Journal of Computer Applications ◽

10.5120/20864-3578 ◽

2015 ◽

Vol 118 (20) ◽

pp. 33-36

Author(s):

C. SaranyaJothi ◽

D. Thenmozhi

Keyword(s):

Machine Learning ◽

Document Classification ◽

Learning Approach ◽

Machine Learning Approach


Document Classification of Accreditation Documents Using Machine Learning Algorithm

CEEHSS-19,ELBIS-19,ECBMS-19,TSETDM-19 Jan. 29-30, 2019 Cebu (Philippines) ◽

10.17758/erpub3.er01192016 ◽

2019 ◽

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Document Classification ◽

Machine Learning Algorithm


A Document Classification using NLP and Recurrent Neural Network

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8087.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 632-636

Keyword(s):

Neural Network ◽

Machine Learning ◽

Recurrent Neural Network ◽

Time Complexity ◽

Learning Task ◽

Document Classification ◽

Machine Learning Algorithms ◽

Class Label ◽

Abstract Section ◽

Document Models

Classification is among the most important techniques for supervised and semi-supervised machine learning tasks. Many classification algorithms have already been introduced for existing systems. Class-label classification is an important machine learning task wherein one assigns a subset of candidate labels to an object. Existing techniques surveyed in the literature classify documents using various document models based on short text, metadata, and heading levels. Reading and processing the whole data can sometimes take considerable time, which increases the time complexity of the entire system. We propose a new document classification method based on deep learning, using NLP and a machine learning approach. The system has several attractive properties: it first captures some metadata from the entire abstract section and builds the training set. Once all documents have been processed, it applies an optimization algorithm. A Recurrent Neural Network is used to categorize individual objects according to their weights, and it provides the final class label for the entire test dataset. Based on various experimental analyses, the system provides higher classification accuracy and lower time complexity than classical machine learning algorithms.
