Evaluation of a digital text classifier based on semantic content through ontologies

Nowadays, the generation of information through digital text documents has increased exponentially, so there is a need to store documents in mass storage devices such as high-capacity hard disks, storage servers, the cloud, and others. However, this storage lacks thematic organization, which makes searching for information difficult. To address this problem, this publication describes the development of a system that classifies a digital text document based on its thematic content. The system uses ontologies to achieve a better classification by taking advantage of their characteristics. The system is divided into five tasks: the first counts words to create a frequency vector; the second refines the frequency vector by eliminating sentence connectors and prepositions; the third sorts the vector from the highest to the lowest frequency; the fourth takes the most significant entries of the frequency vector, applies a domain ontology to them, and looks for the relations among the words to determine the theme of the document; and the fifth organizes the documents in a folder structure based on the identified domains. The system was developed with the incremental development methodology. To validate its operation, a set of tests was carried out in a controlled scenario to verify that documents were classified correctly.
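
The first four tasks described above can be sketched roughly as follows. This is only a minimal illustration, not the authors' system: the stop-word list, the toy `ONTOLOGY` mapping (domain name to related terms), and the scoring rule are all assumptions, since the abstract does not specify them. The fifth task would then file the document under a folder named after the returned domain.

```python
import re
from collections import Counter

# Hypothetical stop-word list and toy "ontology" (domain -> related terms).
# The abstract does not give the real ones; these are placeholders.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to", "is", "for", "with"}
ONTOLOGY = {
    "medicine": {"patient", "disease", "treatment", "clinical"},
    "computing": {"algorithm", "software", "network", "data"},
}


def classify(text, top_n=10):
    """Return the best-matching domain name, or None if no term matches."""
    # Task 1: count words to build the frequency vector.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # Task 2: drop connectors and prepositions.
    for word in STOP_WORDS:
        counts.pop(word, None)
    # Task 3: sort by descending frequency and keep the most frequent terms.
    top_terms = counts.most_common(top_n)
    # Task 4: score each domain by the total frequency of its matching terms
    # (an assumed scoring rule).
    scores = {
        domain: sum(freq for word, freq in top_terms if word in terms)
        for domain, terms in ONTOLOGY.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A real ontology would also carry relations between concepts; a flat term set is used here only to keep the sketch short.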

An Improved B-hill Climbing Optimization Technique for Solving the Text Documents Clustering Problem

Current Medical Imaging Formerly Current Medical Imaging Reviews ◽

10.2174/1573405614666180903112541 ◽

2020 ◽

Vol 16 (4) ◽

pp. 296-306 ◽

Cited By ~ 3

Author(s):

Laith Mohammad Abualigah ◽

Essam Said Hanandeh ◽

Ahamad Tajudin Khader ◽

Mohammed Abdallh Otair ◽

Shishir Kumar Shandilya

Keyword(s):

Optimization Technique ◽

Document Clustering ◽

Text Clustering ◽

Hill Climbing ◽

Text Documents ◽

Clustering Problem ◽

Text Document ◽

Text Information ◽

Amount Of Knowledge ◽

The Hill

Background: Given the increasing volume of text document information on Internet pages, dealing with such a large amount of knowledge becomes highly complex. Text clustering is a common optimization problem in which a large amount of text information is organized into a set of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, namely β-hill climbing, which solves the text document clustering problem by partitioning similar documents into the same cluster. Methods: The β parameter is the primary innovation in the β-hill climbing technique. It is introduced to balance local and global search. Local search methods such as the k-medoid and k-means techniques have been applied successfully to the text document clustering problem. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics taken from the Laboratory of Computational Intelligence (LABIC). The results showed that the proposed β-hill climbing achieved better results than the original hill climbing technique in solving the text clustering problem. Conclusion: Adding the β operator to hill climbing improves text clustering performance.
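
The idea of adding a β operator to plain hill climbing can be illustrated with the sketch below. It is a simplified reading of the abstract, not the paper's implementation: the objective (sum of squared distances to cluster centroids), the neighbourhood move, and all parameter defaults are assumptions made here for illustration.

```python
import random


def sse(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    return total


def beta_hill_climbing(points, k, beta=0.05, iterations=2000, seed=0):
    """Assign each point a cluster label; returns (labels, cost)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    best = sse(points, labels, k)
    for _ in range(iterations):
        candidate = labels[:]
        # Neighbourhood move (local search): reassign one random document.
        candidate[rng.randrange(len(points))] = rng.randrange(k)
        # Beta operator (global search): each label is reset to a random
        # cluster with probability beta.
        for i in range(len(candidate)):
            if rng.random() < beta:
                candidate[i] = rng.randrange(k)
        cost = sse(points, candidate, k)
        # Accept the candidate only if it is no worse than the current solution.
        if cost <= best:
            labels, best = candidate, cost
    return labels, best
```

Setting `beta=0` reduces this sketch to ordinary hill climbing, which makes the role of the extra operator easy to compare on a small dataset.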

Text Document Summarization Using POS tagging for Kannada Text Documents

2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence) ◽

10.1109/confluence51648.2021.9377106 ◽

2021 ◽

Author(s):

Jayashree R ◽

Basavaraj S Anami ◽

Poornima B K

Keyword(s):

Text Documents ◽

Document Summarization ◽

Pos Tagging ◽

Text Document

A germanium and zinc chalcogenide as an anode for a high-capacity and long cycle life lithium battery

RSC Advances ◽

10.1039/c9ra06023e ◽

2019 ◽

Vol 9 (60) ◽

pp. 35045-35049

Author(s):

Xu Chen ◽

Jian Zhou ◽

Jiarui Li ◽

Haiyan Luo ◽

Lin Mei ◽

...

Keyword(s):

Energy Storage ◽

High Performance ◽

Lithium Battery ◽

Large Scale ◽

High Capacity ◽

Lithium Ion ◽

Energy Storage Devices ◽

Storage Devices ◽

Long Cycle Life ◽

Zinc Chalcogenide

High-performance lithium ion batteries are ideal energy storage devices for both grid-scale and large-scale applications.

Flexible all-solid-state fiber-shaped Ni–Fe batteries with high electrochemical performance

Journal of Materials Chemistry A ◽

10.1039/c8ta09822k ◽

2019 ◽

Vol 7 (2) ◽

pp. 520-530 ◽

Cited By ~ 31

Author(s):

Qiulong Li ◽

Qichong Zhang ◽

Chenglong Liu ◽

Juan Sun ◽

Jiabin Guo ◽

...

Keyword(s):

Energy Storage ◽

Solid State ◽

Electrochemical Performance ◽

High Capacity ◽

Core Shell ◽

Next Generation ◽

Energy Storage Devices ◽

Storage Devices

The fiber-shaped Ni–Fe battery takes advantage of high capacity of hierarchical CoP@Ni(OH)2 NWAs/CNTF core–shell heterostructure and spindle-like α-Fe2O3/CNTF electrodes to yield outstanding electrochemical performance, demonstrating great potential for next-generation portable wearable energy storage devices.

Secure control protocol for universal serial bus mass storage devices

IET Computers & Digital Techniques ◽

10.1049/iet-cdt.2014.0196 ◽

2015 ◽

Vol 9 (6) ◽

pp. 321-327 ◽

Cited By ~ 4

Author(s):

Jianghong Wei ◽

Wenfen Liu ◽

Xuexian Hu

Keyword(s):

Mass Storage ◽

Universal Serial Bus ◽

Storage Devices ◽

Control Protocol ◽

Secure Control

R-Opitools – An Opinion Analytical Tool for Big Digital Text Document (DTD)

The Journal of Open Source Software ◽

10.21105/joss.03605 ◽

2021 ◽

Vol 6 (64) ◽

pp. 3605

Author(s):

Monsuru Adepeju

Keyword(s):

Analytical Tool ◽

Digital Text ◽

Text Document

Development of the documents comparison module for an electronic document management system

Information Technology and Nanotechnology ◽

10.18287/1613-0073-2019-2416-527-533 ◽

2019 ◽

pp. 527-533

Author(s):

M A Mikheev ◽

P Y Yakimov

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Document Management ◽

Electronic Document ◽

Text Documents ◽

Text Document ◽

Document Management System ◽

Optical Character ◽

Electronic Document Management ◽

Scanned Image

The article addresses the problem of comparing document versions in electronic document management systems. Analogous systems were reviewed, and the process of comparing text documents was studied. To recognize the text in a scanned image, optical character recognition technology and its implementation in the Tesseract library were chosen. The Myers algorithm is applied to compare the recognized texts. The text document comparison module was implemented in software using the solutions described above.
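
For readers unfamiliar with the Myers algorithm mentioned above, the sketch below shows its greedy core: it computes the length of the shortest edit script (insertions plus deletions) between two sequences. It is a generic illustration and not the authors' module; the function name `myers_distance` is chosen here, and a full comparison tool would also need to recover the edit script itself.

```python
def myers_distance(a, b):
    """Length of the shortest edit script (insertions + deletions) from a to b."""
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] holds the furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # step down: take an element from b
            else:
                x = v[k - 1] + 1    # step right: drop an element from a
            y = x - k
            # Follow the diagonal "snake" while the elements match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d
```

The inputs can be strings (character-level comparison) or lists of lines or words, which is usually more practical for OCR output.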

The Evaluation of Accuracy Performance in an Enhanced Embedded Feature Selection for Unstructured Text Classification

Iraqi Journal of Science ◽

10.24996/ijs.2020.61.12.28 ◽

2020 ◽

pp. 3397-3407

Author(s):

Nur Syafiqah Mohd Nafis ◽

Suryanti Awang

Keyword(s):

Feature Selection ◽

Text Classification ◽

Training Dataset ◽

Recursive Feature Elimination ◽

High Dimensional ◽

Significant Feature ◽

Support Vector ◽

Svm Classifier ◽

Text Documents ◽

Text Document

Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant features from the sparse feature space. Thus, this paper proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high-dimensional text classification. This technique can measure a feature's importance in a high-dimensional text document. In addition, it aims to increase the efficiency of feature selection and, in turn, to obtain promising text classification accuracy. In the first stage, TF-IDF acts as a filter approach that measures the importance of the features of the text documents. In the second stage, SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets. This research runs sets of experiments using a text document retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing is applied to extract relevant features. After that, the pre-processed features are divided into training and testing datasets. Next, feature selection is carried out on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is then applied for feature ranking as the next feature selection step. Only top-ranked features are selected for text classification using the SVM classifier. The experiments show that the proposed technique achieves 98% accuracy, outperforming other existing techniques. In conclusion, the proposed technique is able to select the significant features in unstructured and high-dimensional text documents.
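
The first (filter) stage described above can be sketched in plain Python as follows. This covers TF-IDF scoring only; the SVM-RFE stage is not shown. The exact TF-IDF variant used in the paper is not stated in the abstract, so the formula here (relative term frequency times the natural log of inverse document frequency) and the rule for ranking terms by their maximum score are assumptions.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def top_features(docs, k):
    """Keep the k terms with the highest maximum tf-idf across documents."""
    best = {}
    for doc_scores in tfidf_scores(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    return sorted(best, key=best.get, reverse=True)[:k]
```

With this variant, a term that occurs in every document gets a score of zero, so it is filtered out before any ranking by a classifier takes place.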

High Capacity, Rate-Capability, and Power Delivery at High-Temperature by an Oxygen-Deficient Perovskite Oxide as Proton Insertion Anodes for Energy Storage Devices

Journal of The Electrochemical Society ◽

10.1149/1945-7111/ac131f ◽

2021 ◽

Author(s):

Aman Bhardwaj ◽

Hohan Bae ◽

In-Ho Kim ◽

Lakshya Mathur ◽

Jun-Young Park ◽

...

Keyword(s):

High Temperature ◽

Energy Storage ◽

High Capacity ◽

Rate Capability ◽

Power Delivery ◽

Perovskite Oxide ◽

Energy Storage Devices ◽

Storage Devices

Assessment of Twitter Data Clusters with Cosine-Based Validation Metrics Using Hybrid Topic Models

Ingénierie des systèmes d information ◽

10.18280/isi.250606 ◽

2020 ◽

Vol 25 (6) ◽

pp. 755-769

Author(s):

Noorullah R. Mohammed ◽

Moulana Mohammed

Keyword(s):

Data Clustering ◽

Topic Models ◽

Cluster Validity ◽

Text Documents ◽

Text Data ◽

Validity Assessment ◽

Text Document ◽

Cluster Validity Indices ◽

Validity Indices ◽

Data Clusters

Text data clustering organizes a set of text documents into the desired number of coherent and meaningful sub-clusters. Modeling the text documents in terms of derived topics is a vital task in text data clustering. Each tweet is considered a text document, and various topic models are used to model the tweets. In existing topic models, the clustering tendency of tweets is assessed initially based on Euclidean dissimilarity features. The cosine metric is more suitable for a more informative assessment, especially for text clustering. Thus, this paper develops a novel cosine-based external and internal validity assessment of cluster tendency to improve the computational efficiency of tweet data clustering. In the experiments, tweet data clustering results are evaluated using cluster validity indices. The experiments show that the cosine-based internal and external validity metrics outperform the others on benchmark and Twitter-based datasets.
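
The contrast between Euclidean and cosine measures on text vectors can be seen in a small sketch. The term-count vectors below are made up for illustration and do not come from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two term-count vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def euclidean_distance(u, v):
    """Straight-line distance between two term-count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical term counts: same word proportions, different lengths.
short_tweet = [1, 1, 0]
long_tweet = [3, 3, 0]
```

Here `cosine_similarity(short_tweet, long_tweet)` is 1 up to floating-point rounding, because the two vectors point in the same direction, while `euclidean_distance` is about 2.83 simply because one text is longer than the other.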