Data Mining of Text Documents

ANALYSIS OF THESIS ADVISING PATTERN IN MASTER PROGRAM ON FACULTY OF AGRICULTURE IPB USING DATA MINING APPROACH

Edulib ◽

10.17509/edulib.v8i2.12321 ◽

2018 ◽

Vol 8 (2) ◽

pp. 194

Author(s):

Lilis Syarifah ◽

Imas Sukaesih Sitanggang ◽

Pudji Muljono

Keyword(s):

Data Mining ◽

Upland Rice ◽

Future Research ◽

Rule Mining ◽

The comparative study of text documents clustering algorithms

Environment Conservation Journal ◽

10.36953/ecj.2015.se1614 ◽

2015 ◽

Vol 16 (SE) ◽

pp. 133-138

Author(s):

Mohammad Eiman Jamnezhad ◽

Reza Fattahi

Keyword(s):

Data Mining ◽

Dna Analysis ◽

Clustering Algorithms ◽

Research Area ◽

Large Set ◽

Text Documents ◽

Web Documents ◽

Significant Research ◽

The Comparative Study ◽

F Measure

Clustering is one of the most significant research areas in data mining and is considered an important tool in the fast-developing era of information explosion. Clustering systems are used more and more often in text mining, especially for analyzing texts and extracting the knowledge they contain. Data are grouped into clusters so that data in the same group are similar and data in different groups are dissimilar; that is, clustering aims to maximize intra-cluster similarity and minimize inter-cluster similarity. Clustering is useful for obtaining interesting patterns and structures from a large set of data. It can be applied in many areas, such as DNA analysis, marketing studies, web documents, and classification. This paper studies and compares three text document clustering algorithms, namely k-means, k-medoids, and SOM (self-organizing map), using the F-measure.
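The F-measure used for the comparison can be sketched as follows. This assumes the class-size-weighted clustering F-measure that is common in document-clustering studies, where each reference class is matched to its best-scoring cluster; the abstract does not state the exact variant, and `clustering_f_measure` is a hypothetical name.

```python
from collections import Counter


def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted F-measure of a clustering (hypothetical helper).

    For reference class i and cluster j:
      P(i, j) = n_ij / n_j,  R(i, j) = n_ij / n_i,  F(i, j) = 2PR / (P + R).
    Overall score: F = sum_i (n_i / n) * max_j F(i, j).
    """
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))  # n_ij for every (class, cluster)

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, clu), 0)
            if n_ij == 0:
                continue  # F(i, j) = 0 when class and cluster share no documents
            precision, recall = n_ij / n_j, n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best  # weight each class by its share of documents
    return total
```

For example, with reference classes `a, a, a, b, b, b` and cluster labels `0, 0, 1, 1, 1, 1`, the score is 0.4 + 3/7 ≈ 0.829; a clustering that reproduces the classes exactly scores 1.0. The same function can score the output of k-means, k-medoids, or SOM alike, since it only needs one cluster label per document.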

Use of text mining techniques for unsupervised organization of digital procedural acts

Revista de Informática Teórica e Aplicada ◽

10.22456/2175-2745.83581 ◽

2018 ◽

Vol 25 (4) ◽

pp. 74

Author(s):

Alfredo Silveira Araújo Neto ◽

Marcos Negreiros

Keyword(s):

Data Mining ◽

Text Mining ◽

Text Documents ◽

Digital Format ◽

Large Databases ◽

Context Data ◽

Self Discovery ◽

The Many ◽

Many Sources ◽

And Storage

Rapid advances in technologies for capturing and storing data in digital format have allowed organizations to accumulate extremely large volumes of information, a large proportion of which is unstructured data in the form of text. However, retrieving useful information from these large repositories has proven very challenging. In this context, data mining is presented as a self-discovery process that acts on large databases and enables knowledge extraction from raw text documents. Among the many sources of textual documents are electronic justice gazettes, which are intended to make all acts of the Judiciary officially public. Although digital publication has removed imperfections associated with dissemination in printed format, applying data mining methods could enable faster analysis of their contents. This article therefore establishes a tool capable of automatically grouping and categorizing digital procedural acts, based on an evaluation of text mining techniques applied to the task of determining groups. In addition, the strategy for defining group descriptors, which is usually based on the most frequent words in the documents, was evaluated and redesigned to use the concepts most regularly identified in the texts instead of words.
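The baseline descriptor strategy mentioned above, labeling each group with its most frequent words, can be sketched as follows. This covers only the word-frequency baseline, not the concept-based descriptors the authors propose; the regular-expression tokenizer, the optional stopword set, and the name `cluster_descriptors` are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict


def cluster_descriptors(docs, labels, top_n=3, stopwords=frozenset()):
    """Hypothetical helper: describe each cluster by its top_n most frequent words.

    docs and labels are parallel lists (one cluster label per document).
    Ties in frequency are broken alphabetically so the output is deterministic.
    """
    counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        tokens = re.findall(r"\w+", doc.lower())  # simple word tokenizer (assumption)
        counts[label].update(t for t in tokens if t not in stopwords)
    return {
        label: [w for w, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for label, c in counts.items()
    }
```

On toy input such as `["appeal denied appeal", "appeal granted", "hearing scheduled hearing", "hearing postponed"]` with labels `[0, 0, 1, 1]` and `top_n=2`, cluster 0 is described by `appeal, denied` and cluster 1 by `hearing, postponed`. Frequent but uninformative words are exactly the weakness that motivates replacing words with concepts.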

Sentiment Analysis and Summarization of Social Media Content using Topic Modeling

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b8102.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 209-212

Keyword(s):

Data Mining ◽

Social Media ◽

Sentiment Analysis ◽

Topic Modeling ◽

Opinion Mining ◽

Text Documents ◽

Topic Identification ◽

Key Factor ◽

Polarity Classification ◽

Using Data

The explosion of Web 2.0 has made social media platforms such as Facebook, Twitter, and blogs a data hub for data mining. Sentiment analysis, or opinion mining, is an automated process for understanding the opinions expressed by customers. Using data mining techniques, sentiment analysis helps determine the polarity (positive, negative, or neutral) of views expressed by end users. Today, terabytes of data are available on almost any topic, whether advertising, politics, or the work of survey companies. CSAT (customer satisfaction) is the key factor for these survey companies. In this paper, we used topic modeling with the LDA (Latent Dirichlet Allocation) algorithm to find topics related to social media, using a dataset of 900 records. The analysis identified three important topics in the survey/response dataset: Customers, Agents, and Products/Services. The results present the CSAT score by positive, negative, and neutral responses. Topic modeling is a statistical modeling technique for categorizing text documents into different topics. This approach enables better summarization of data through topic identification and shows the polarity classification of the sentiments expressed.
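To make the LDA step concrete, here is a small from-scratch sketch of LDA fitted with collapsed Gibbs sampling, one standard inference method. The abstract does not say which implementation, inference algorithm, or hyperparameters the authors used, so `alpha`, `beta`, `iterations`, and the names `lda_gibbs` and `top_words` are assumptions; documents are assumed to be already tokenized.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical helper: LDA via collapsed Gibbs sampling on tokenized docs.

    Returns (doc_topic, topic_word): doc_topic[d][k] estimates P(topic k | doc d)
    and topic_word[k][w] estimates P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]                      # topic counts per document
    nkw = [dict.fromkeys(vocab, 0) for _ in range(K)]  # word counts per topic
    nk = [0] * K                                       # total tokens per topic
    z = []                                             # topic assignment of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (nkw[k][w] + beta) / (nk[k] + V * beta) for w in vocab} for k in range(K)
    ]
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Most probable words of each topic, for naming topics by hand."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in topic_word]
```

Inspecting `top_words` is one way an analyst could attach human labels such as "Agents" or "Products/Services" to the discovered topics; the sentiment polarity and CSAT scoring described in the abstract are separate steps not shown here.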

Robust Face Recognition for Data Mining

Data Warehousing and Mining ◽

10.4018/978-1-59904-951-9.ch226 ◽

2008 ◽

pp. 3621-3629

Author(s):

Brian C. Lovell ◽

Shaokang Chen

Keyword(s):

Data Mining ◽

Interactive Television ◽

Multimedia Data ◽

The Internet ◽

Data Types ◽

Text Documents ◽

Text Data ◽

Internet Backbone ◽

Large Databases ◽

Robust Face Recognition

While the technology for mining text documents in large databases could be said to be relatively mature, the same cannot be said for mining other important data types such as speech, music, images and video. Yet these forms of multimedia data are becoming increasingly prevalent on the Internet and intranets as bandwidth rapidly increases due to continuing advances in computing hardware and consumer demand. An emerging major problem is the lack of accurate and efficient tools to query these multimedia data directly, so we are usually forced to rely on available metadata, such as manual labeling. Currently the most effective way to label data to allow for searching of multimedia archives is for humans to physically review the material. This is already uneconomic or, in an increasing number of application areas, quite impossible because these data are being collected much faster than any group of humans could meaningfully label them — and the pace is accelerating, forming a veritable explosion of non-text data. Some driver applications are emerging from heightened security demands in the 21st century, post-production of digital interactive television, and the recent deployment of a planetary sensor network overlaid on the Internet backbone.

Automatic Building of an Ontology from a Corpus of Text Documents Using Data Mining Tools

Journal of Applied Research and Technology ◽

10.22201/icat.16656423.2012.10.3.395 ◽

2012 ◽

Vol 10 (3) ◽

Cited By ~ 5

Author(s):

J. I. Toledo-Alvarado ◽

G. L. Martínez-Luna

Keyword(s):

Data Mining ◽

Text Documents ◽

Using Data ◽

Mining Tools

In this article we present a procedure for automatically building an ontology from a corpus of text documents without external help such as dictionaries or thesauri. The proposed method finds relevant concepts in the form of topical phrases in the document corpus, as well as non-hierarchical relations between them, in an unsupervised manner.
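The abstract gives no algorithmic detail, so the following is only a toy stand-in and not the authors' procedure: recurring word bigrams play the role of topical phrases, and phrases that co-occur in several documents are linked by an unlabeled, non-hierarchical relation. The thresholds and the name `phrases_and_relations` are hypothetical; a real system would also filter stopwords and score phrases more carefully.

```python
from collections import Counter
from itertools import combinations


def phrases_and_relations(docs, min_phrase_freq=2, min_cooccur=2):
    """Hypothetical helper: unsupervised concept and relation extraction (toy).

    Concepts: word bigrams occurring at least min_phrase_freq times in the corpus.
    Relations: unordered concept pairs co-occurring in at least min_cooccur documents.
    """
    tokenized = [doc.lower().split() for doc in docs]
    bigram_counts = Counter(
        (a, b) for toks in tokenized for a, b in zip(toks, toks[1:])
    )
    concepts = {" ".join(bg) for bg, c in bigram_counts.items() if c >= min_phrase_freq}

    # Count, per document, which concept pairs appear together.
    pair_counts = Counter()
    for toks in tokenized:
        present = {" ".join(bg) for bg in zip(toks, toks[1:])} & concepts
        pair_counts.update(combinations(sorted(present), 2))
    relations = {pair for pair, c in pair_counts.items() if c >= min_cooccur}
    return concepts, relations
```

With the three toy documents `"data mining uses text documents"`, `"text documents feed data mining"`, and `"weather report today"`, the concepts are `data mining` and `text documents`, linked by one relation because they appear together in two documents.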

Inspecting Hybrid Data Mining Approaches in Decision Support Systems for Humanities Texts Criticism

Iraqi Journal of Science ◽

10.24996/ijs.2021.62.11.30 ◽

2021 ◽

pp. 4101-4109

Author(s):

Baraa Hasan Hadi ◽

Tareef Kamil Mustafa

Keyword(s):

Data Mining ◽

Decision Support ◽

Text Mining ◽

Language Processing ◽

Hybrid Approach ◽

Decision Support Techniques ◽

Relevant Information ◽

Delta Method ◽

Technological Advancement ◽

Text Documents

Most systems based on natural language processing (NLP) and artificial intelligence (AI) can assist in making automated and automatically supported decisions. However, these systems may struggle to identify the information required (characterization) to elicit a decision by extracting or summarizing relevant information from large text documents or colossal content. This is especially true of documents obtained online, for instance from social networks and social media, where the volume of textual content is growing remarkably. The main objective of the present study is to survey the latest developments in applying text-mining techniques in the humanities for summarization and automated decision-making. This process relies on technological advancement and considers (1) the automated decision-support techniques commonly used in the humanities, (2) the evolution of performance and the use of the stylometric approach in text mining, and (3) comparisons of the results of chunking text using different attributes in Burrows' Delta method. This study also provides an overview of the efficiency of applying selected data-mining (DM) methods with various text-mining techniques to support critics' decisions in artistry, one field of the humanities. In this field, the automatic choice of criticism was supported by a hybrid approach combining these procedures.
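Burrows' Delta, mentioned in point (3), can be sketched in its classic form: take the most frequent words of the corpus, turn each text's relative word frequencies into z-scores across the candidate texts, and average the absolute z-score differences. The chunking experiments of the study are not reproduced; the use of the population standard deviation, the default `n_words=30`, and the function names are assumptions.

```python
import statistics
from collections import Counter


def relative_frequencies(tokens, vocab):
    """Relative frequency of each vocabulary word in one token list."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]


def burrows_delta(corpus, test_tokens, n_words=30):
    """Hypothetical helper: classic Burrows' Delta of a test text to each candidate.

    corpus maps a candidate name (e.g. an author) to its token list.
    A lower Delta means the test text is stylistically closer to that candidate.
    """
    # 1. Features: the n most frequent words across the whole corpus.
    overall = Counter(t for toks in corpus.values() for t in toks)
    vocab = [w for w, _ in overall.most_common(n_words)]

    # 2. Relative-frequency profile of each candidate text.
    profiles = {name: relative_frequencies(toks, vocab) for name, toks in corpus.items()}

    # 3. Mean and population standard deviation of each word across candidates.
    columns = list(zip(*profiles.values()))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    keep = [i for i, s in enumerate(stdevs) if s > 0]  # z-score undefined if s == 0
    if not keep:
        raise ValueError("no feature word varies across the corpus")

    def z_scores(profile):
        return [(profile[i] - means[i]) / stdevs[i] for i in keep]

    # 4. Delta = mean absolute difference of z-scores over the kept words.
    test_z = z_scores(relative_frequencies(test_tokens, vocab))
    return {
        name: sum(abs(a - b) for a, b in zip(test_z, z_scores(p))) / len(keep)
        for name, p in profiles.items()
    }
```

In the chunking setting described above, each candidate's token list would be a chunk of a larger text rather than a whole work; how chunks are formed is exactly the attribute the study compares.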

TEXT AND DATA MINING TECHNIQUES IN ASPECT OF KNOWLEDGE ACQUISITION FOR DECISION SUPPORT SYSTEM IN CONSTRUCTION INDUSTRY / DUOMENŲ RINKIMO METODAI STATYBOS SPRENDIMŲ PARAMOS SISTEMAI

Technological and Economic Development of Economy ◽

10.3846/tede.2010.14 ◽

2010 ◽

Vol 16 (2) ◽

pp. 219-232 ◽

Cited By ~ 25

Author(s):

Marcin Gajzler

Keyword(s):

Data Mining ◽

Decision Support ◽

Text Mining ◽

Knowledge Acquisition ◽

Construction Industry ◽

Support System ◽

Text Documents ◽

Fundamental Feature ◽

Mining Technique ◽

Text And Data Mining

This article presents the possibilities of using mining techniques in building decision support systems. One of the biggest problems is acquiring data and knowledge, representing them, and using them together. Data and knowledge make up the resources of the system and are its key link. It has been estimated that 70% to 80% of the sources available for general use are text documents. Text mining is defined as a process that aims to extract previously unknown information from text resources (e.g. technological cards). The fundamental feature of text mining is the ability to convert text documents into a formal representation, which opens up great possibilities for further analysis. This article presents selected IT tools that use text mining, along with the elements of text mining analysis. The main objectives are to simplify, automate, and shorten the process of knowledge acquisition and to create ready-made models containing knowledge. Previous approaches to knowledge acquisition (surveys, questionnaires) were time-consuming and demanding for experts.
Summary (translated from Lithuanian): The article presents the possibilities of applying information-gathering methods to decision support systems in construction. Most problems arise in obtaining information and in representing and using it appropriately. Data are the main resource of the system. It has been established that 70 to 80% of all available general-use information sources are text documents. The technique of gathering text information is understood as a process aimed at extracting previously unknown information from text documents (for example, technological cards). The main feature of this technique is the ability to present the information in text documents in a formalized form, which opens up broad possibilities for further analysis. This article presents selected IT tools used for gathering text information. The author's aim is to simplify information gathering, to automate and shorten it, and to create models that encompass the information. Earlier methods of accumulating information (surveys, questionnaires) required a great deal of experts' work and time.
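The central idea here, converting text documents into a formal representation for further analysis, can be illustrated with a bag-of-words term-document matrix. This is a generic sketch, not one of the IT tools the article presents; the tokenizer, the optional stopword set, and the name `term_document_matrix` are assumptions, and the example sentences only imitate technological-card wording.

```python
import re
from collections import Counter


def term_document_matrix(docs, stopwords=frozenset()):
    """Hypothetical helper: return (vocabulary, matrix), matrix[d][t] = count of term t in doc d."""
    tokenized = [
        [t for t in re.findall(r"\w+", d.lower()) if t not in stopwords] for d in docs
    ]
    vocab = sorted({t for toks in tokenized for t in toks})  # alphabetical term order
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[t] for t in vocab])  # one row per document
    return vocab, matrix
```

For `["Pour concrete, cure concrete.", "Install formwork"]` the vocabulary is `concrete, cure, formwork, install, pour` and the rows are `[2, 1, 0, 0, 1]` and `[0, 0, 1, 1, 0]`. Once text is in this numeric form, standard data mining methods (clustering, classification, association rules) can be applied to it.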

Robust Face Recognition for Data Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch182 ◽

2011 ◽

pp. 965-972

Author(s):

Brian C. Lovell ◽

Shaokang Chen

Keyword(s):

Data Mining ◽

Interactive Television ◽

Multimedia Data ◽

The Internet ◽

Data Types ◽

Text Documents ◽

Text Data ◽

Internet Backbone ◽

Large Databases ◽

Robust Face Recognition

While the technology for mining text documents in large databases could be said to be relatively mature, the same cannot be said for mining other important data types such as speech, music, images and video. Yet these forms of multimedia data are becoming increasingly prevalent on the Internet and intranets as bandwidth rapidly increases due to continuing advances in computing hardware and consumer demand. An emerging major problem is the lack of accurate and efficient tools to query these multimedia data directly, so we are usually forced to rely on available metadata, such as manual labeling. Currently the most effective way to label data to allow for searching of multimedia archives is for humans to physically review the material. This is already uneconomic or, in an increasing number of application areas, quite impossible because these data are being collected much faster than any group of humans could meaningfully label them — and the pace is accelerating, forming a veritable explosion of non-text data. Some driver applications are emerging from heightened security demands in the 21st century, post-production of digital interactive television, and the recent deployment of a planetary sensor network overlaid on the Internet backbone.

OCEAN: 2 1/2D Interactive Visual Data Mining of Text Documents

Tenth International Conference on Information Visualisation (IV'06) ◽

10.1109/iv.2006.78 ◽

2006 ◽

Cited By ~ 1

Author(s):

C. Jacquemin ◽

H. Folch ◽

S. Nugier

Keyword(s):

Data Mining ◽

Visual Data ◽

Visual Data Mining ◽

Text Documents ◽

Interactive Visual Data Mining