English Poems Categorization Using Text Mining and Rough Set Theory

2020 ◽  
Vol 9 (4) ◽  
pp. 1701-1710
Author(s):  
Saif Ali Alsaidi ◽  
Ahmed T. Sadeq ◽  
Hasanen S. Abdullah

In recent years, text mining has become an important topic because of the growth of digital text data from many sources such as government documents, email, social media, and websites. English poems are one such form of text data, and categorizing them calls for text categorization, a method that classifies documents into one or more predefined categories based on their textual content. In this paper we solve the problem of assigning an English poem to one of several poem categories using text mining techniques and a machine learning algorithm. Our data set consists of seven poem categories and is divided into two parts: training (learning) data and testing data. In the proposed model we apply text preprocessing to the document files to reduce the number of features and the dimensionality: the preprocessing step converts each poem into features and removes irrelevant ones through text mining operations (tokenization, stop-word removal, and stemming). To further reduce the remaining feature vector, we use two feature selection methods and apply rough set theory as the machine learning algorithm to perform the categorization. The proposed model achieves 88% classification accuracy.
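The preprocessing pipeline described above (tokenization, stop-word removal, stemming) can be sketched in Python. The stop-word list and the suffix stemmer below are simplified stand-ins, not the exact components used in the paper:

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (the paper's list would be larger).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "are"}

def tokenize(text):
    """Lowercase a poem and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    """Very small suffix-stripping stemmer (illustrative, not Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def poem_to_features(text):
    """Tokenize, drop stop words, stem, and count term frequencies."""
    tokens = [stem(t) for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(tokens)

features = poem_to_features("The roses are blooming and the winds are singing")
```

Each surviving stem becomes one feature; the feature-selection step and the rough-set classifier then operate on these counts.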

2011 ◽  
Vol 271-273 ◽  
pp. 1239-1242
Author(s):  
Shao Jun Chen

The most important issue for online courses is to provide learners with high-quality satisfaction. To address this question and evaluate course satisfaction, rough set theory is proposed in this article, by which we reduce 10 attributes to 5 and obtain an index of assessment value. As a result, teachers can adjust their teaching to achieve better results by taking advantage of the method. The proposed model can be applied not only to a network environment but also to a remote educational environment.
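The attribute-reduction step (reducing 10 satisfaction attributes to 5) can be illustrated with a brute-force reduct computation over a decision table. The table, attribute names, and values below are hypothetical, not the article's actual survey data:

```python
from itertools import combinations

def partition(rows, attrs):
    """Group row indices into blocks with identical values on attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(blocks.values())

def positive_region(rows, cond, decision):
    """Rows whose condition block is consistent on the decision attribute."""
    pos = set()
    for block in partition(rows, cond):
        if len({rows[i][decision] for i in block}) == 1:
            pos |= block
    return pos

def reduct(rows, cond, decision):
    """Smallest attribute subset preserving the full positive region."""
    full = positive_region(rows, cond, decision)
    for k in range(1, len(cond) + 1):
        for subset in combinations(cond, k):
            if positive_region(rows, list(subset), decision) == full:
                return list(subset)
    return list(cond)

# Hypothetical course-satisfaction table (not the article's data).
rows = [
    {"content": 1, "pace": 0, "media": 1, "sat": "high"},
    {"content": 1, "pace": 1, "media": 0, "sat": "high"},
    {"content": 0, "pace": 0, "media": 1, "sat": "low"},
    {"content": 0, "pace": 1, "media": 0, "sat": "low"},
]
core = reduct(rows, ["content", "pace", "media"], "sat")
```

For this toy table a single attribute already determines satisfaction, so the reduct collapses three attributes to one; the article performs the analogous reduction from 10 to 5.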


2012 ◽  
Vol 9 (3) ◽  
pp. 1-17 ◽  
Author(s):  
D. Calvo-Dmgz ◽  
J. F. Gálvez ◽  
D. Glez-Peña ◽  
S. Gómez-Meire ◽  
F. Fdez-Riverola

Summary DNA microarrays have contributed to the exponential growth of genomic and experimental data in the last decade. This large amount of gene expression data has been used by researchers seeking to diagnose diseases such as cancer using machine learning methods. In turn, explicit biological knowledge about gene functions has also grown tremendously over the last decade. This work integrates explicit biological knowledge, provided as gene sets, into the classification process by means of Variable Precision Rough Set Theory (VPRS). The proposed model is able to highlight which part of the provided biological knowledge has been important for classification. This paper presents a novel model for microarray data classification that can incorporate prior biological knowledge in the form of gene sets. Based on this knowledge, we transform the input microarray data into supergenes, and then apply rough set theory to select the most promising supergenes and to derive a set of easily interpretable classification rules. The proposed model is evaluated on three breast cancer microarray datasets, obtaining successful results compared with classical classification techniques. The experimental results show that there are no significant differences between our model and classical techniques, but our model is able to provide a biologically interpretable explanation of how it classifies new samples.
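The supergene transformation can be sketched as aggregating expression values over each gene set; averaging is assumed here for illustration, and the gene and set names are invented, not drawn from the evaluated datasets:

```python
def to_supergenes(sample, gene_sets):
    """Collapse a gene-expression profile into supergene values by
    averaging expression over each gene set's members present in the sample."""
    supergenes = {}
    for name, genes in gene_sets.items():
        present = [sample[g] for g in genes if g in sample]
        if present:  # skip sets with no measured member genes
            supergenes[name] = sum(present) / len(present)
    return supergenes

# Illustrative expression profile and gene sets (hypothetical values).
sample = {"BRCA1": 2.0, "BRCA2": 4.0, "TP53": 1.0}
gene_sets = {"dna_repair": ["BRCA1", "BRCA2"], "apoptosis": ["TP53", "BAX"]}
supergenes = to_supergenes(sample, gene_sets)
```

Each supergene then plays the role of a single conditional attribute in the VPRS selection and rule-derivation steps, which is what makes the resulting rules readable in terms of biological function.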


2018 ◽  
Vol 7 (2) ◽  
pp. 75-84 ◽  
Author(s):  
Shivam Shreevastava ◽  
Anoop Kumar Tiwari ◽  
Tanmoy Som

Feature selection is one of the most widely used pre-processing techniques for dealing with large data sets. In this context, rough set theory has been successfully applied to feature selection of discrete data sets, but continuous data sets require discretization, which may cause information loss. Fuzzy rough set approaches have also been used successfully to resolve this issue, as they can handle continuous data directly. Moreover, almost all feature selection techniques handle only homogeneous data sets. In this article, the focus is on heterogeneous feature subset reduction. A novel intuitionistic fuzzy neighborhood model is proposed by combining intuitionistic fuzzy sets and neighborhood rough set models, taking an appropriate pair of lower and upper approximations and generalizing it for feature selection, supported with theory and validation. An appropriate algorithm, along with its application to a data set, is also presented.
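A minimal sketch of the neighborhood side of such a model, assuming a mixed numeric/nominal distance and a crisp δ-neighborhood lower approximation (the intuitionistic-fuzzy membership/non-membership pair of the proposed model is omitted for brevity, and the data are invented):

```python
def distance(x, y, numeric, nominal):
    """Mixed-attribute distance: normalized-numeric gaps plus nominal mismatches,
    averaged over all attributes (numeric values assumed scaled to [0, 1])."""
    d = sum(abs(x[a] - y[a]) for a in numeric)
    d += sum(0 if x[a] == y[a] else 1 for a in nominal)
    return d / (len(numeric) + len(nominal))

def neighborhood(rows, i, delta, numeric, nominal):
    """Indices of all samples within distance delta of sample i."""
    return {j for j in range(len(rows))
            if distance(rows[i], rows[j], numeric, nominal) <= delta}

def lower_approximation(rows, target, delta, numeric, nominal, decision):
    """Samples whose entire delta-neighborhood shares the target decision."""
    return {i for i in range(len(rows))
            if all(rows[j][decision] == target
                   for j in neighborhood(rows, i, delta, numeric, nominal))}

# Hypothetical heterogeneous data: one numeric and one nominal attribute.
rows = [
    {"age": 0.10, "sex": "m", "d": "yes"},
    {"age": 0.15, "sex": "m", "d": "yes"},
    {"age": 0.90, "sex": "f", "d": "no"},
]
lower = lower_approximation(rows, "yes", 0.1, ["age"], ["sex"], "d")
```

Feature selection then keeps attribute subsets whose lower approximations (and hence dependency degree) match those of the full attribute set.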


Knowledge extracted through machine learning techniques in general lacks perfection in its predictions, in terms of minimal error or accuracy. Recently, researchers have been enjoying the fruits of Rough Set Theory (RST), uncovering hidden patterns with its simplicity and expressive power. In RST, the issue of attribute reduction is mainly tackled through the notion of ‘reducts’, using the lower and upper approximations of rough sets based on a given information table with conditional and decision attributes. Hence, among the many methods proposed for dimension reduction, the RST approach has been shown to be simple and efficient for text mining tasks. The area of text mining has focused on patterns in text files or corpora, initially preprocessed to identify and remove irrelevant and replicated words without inducing any information loss for the classifying models later generated and tested. In the current work, this hypothesis is taken as the core and tested on feedback for e-learning courses, using RST’s attribute reduction and generating distinct n-gram models; finally, the results are presented for selecting the most efficient model.
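The n-gram models mentioned above can be built from feedback text with a small helper; the sample comment below is invented for illustration:

```python
import re

def ngrams(text, n):
    """Word n-grams of a comment: lowercase, strip punctuation,
    then slide an n-word window over the token sequence."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = ngrams("The course videos were very clear", 2)
```

Each distinct n-gram then becomes one conditional attribute (present/absent per feedback entry) in the information table that RST's attribute reduction operates on.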


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Hengrong Ju ◽  
Huili Dou ◽  
Yong Qi ◽  
Hualong Yu ◽  
Dongjun Yu ◽  
...  

Decision-theoretic rough sets are quite useful rough sets obtained by introducing decision costs into probabilistic approximations of the target. However, Yao’s decision-theoretic rough set is based on the classical indiscernibility relation, and such a relation may be too strict in many applications. To solve this problem, a δ-cut decision-theoretic rough set is proposed, based on the δ-cut quantitative indiscernibility relation. Furthermore, with respect to the criteria of decision-monotonicity and cost decrease, two different algorithms are designed to compute reducts. The comparison between these two algorithms shows the following: (1) with respect to the original data set, the reducts based on the decision-monotonicity criterion can generate more rules supported by the lower approximation region and fewer rules supported by the boundary region, and it follows that the uncertainty that comes from the boundary region can be decreased; (2) with respect to the reducts based on the decision-monotonicity criterion, the reducts based on the cost-minimum criterion can obtain the lowest decision costs and the largest approximation qualities. This study suggests potential application areas and new research trends concerning rough set theory.
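The three-way regions of a decision-theoretic rough set can be sketched as thresholding the conditional probability of each indiscernibility block. In the paper the thresholds α and β derive from decision costs and the blocks come from the δ-cut relation; here both are hand-made for illustration:

```python
def probabilistic_regions(blocks, target, alpha, beta):
    """Assign each indiscernibility block to the positive, boundary, or
    negative region by its conditional probability Pr(target | block),
    with alpha > beta (three-way decision thresholds)."""
    pos, bnd, neg = set(), set(), set()
    for block in blocks:
        p = len(block & target) / len(block)
        if p >= alpha:
            pos |= block
        elif p <= beta:
            neg |= block
        else:
            bnd |= block
    return pos, bnd, neg

# Hand-made blocks and target concept for illustration.
blocks = [{0, 1, 2}, {3, 4}, {5}]
target = {0, 1, 3, 5}
pos, bnd, neg = probabilistic_regions(blocks, target, alpha=0.7, beta=0.3)
```

Reduct comparison in the paper then amounts to asking how attribute subsets reshape these regions: decision-monotonicity favors a large positive region, while the cost criterion minimizes the total misclassification and deferment cost.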


Author(s):  
TAGHI M. KHOSHGOFTAAR ◽  
LOFTON A. BULLARD ◽  
KEHAN GAO

Finding techniques to reduce software development effort and produce highly reliable software is an extremely vital goal for software developers. One method that has proven quite useful is the application of software metrics-based classification models. Classification models can be constructed to identify faulty components in a software system with high accuracy. Significant research has been dedicated to developing methods for improving the quality of software metrics-based classification models. Several studies have shown that the accuracy of these models improves when irrelevant attributes are identified and eliminated from the training data set. This study presents a rough set theory approach, based on classical set theory, for identifying and eliminating irrelevant attributes from a training data set. Rough set theory is used to find small groups of attributes, determined by the relationships that exist between the objects in a data set, with discernibility comparable to that of larger sets of attributes. This allows for the development of simpler classification models that are easy for analysts to understand and explain to others. We built case-based reasoning models in order to evaluate their classification performance on the smaller subsets of attributes selected using rough set theory. The empirical studies demonstrated that by applying a rough set approach to find small subsets of attributes, we can build case-based reasoning models with an accuracy comparable to, and in some cases better than, that of a case-based reasoning model built with the complete set of attributes.
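A case-based reasoning classifier over a rough-set-reduced metric subset can be sketched as nearest-neighbour retrieval; the metric names, case library, and squared-Euclidean similarity below are hypothetical, and the study's actual similarity function may differ:

```python
def classify_cbr(case, library, attrs):
    """Retrieve the most similar past case (smallest squared Euclidean
    distance over the selected metric subset) and reuse its fault label."""
    def dist(a, b):
        return sum((a[m] - b[m]) ** 2 for m in attrs)
    nearest = min(library, key=lambda c: dist(case, c))
    return nearest["faulty"]

# Hypothetical module metrics remaining after rough-set attribute selection.
library = [
    {"loc": 100, "complexity": 4, "faulty": False},
    {"loc": 900, "complexity": 25, "faulty": True},
]
prediction = classify_cbr({"loc": 850, "complexity": 20},
                          library, ["loc", "complexity"])
```

Restricting `attrs` to the reduct keeps the retrieval step cheap and the resulting model easy to explain, which is the point the study makes empirically.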

