Mining Text with the Prototype-Matching Method

Author(s):  
A. Durfee ◽  
A. Visa ◽  
H. Vanharanta ◽  
S. Schneberger ◽  
B. Back

Text documents are the most common means for exchanging formal knowledge among people. Text is a rich medium that can contain a vast range of information, but text can be difficult to decipher automatically. Many organizations have vast repositories of textual data but few means of automatically mining that text. Text mining methods seek to use an understanding of natural language text to extract information relevant to user needs. This article evaluates a new text mining methodology, prototype-matching for text clustering, developed by the authors’ research group. The methodology was applied to four applications: clustering documents based on their abstracts, analyzing financial data, distinguishing authorship, and evaluating the similarity of multiple translations. The results are discussed in terms of common business applications and possible future research.
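The abstract does not describe how the prototypes are built or compared. To illustrate the general idea of matching documents against prototypes, here is a minimal sketch under stated assumptions: documents are bag-of-words term-frequency vectors, each prototype is the mean vector of a few hand-picked seed documents, and a document is assigned to the prototype with the highest cosine similarity. The helper names, seed texts and labels are hypothetical. This is a generic nearest-prototype scheme, not the authors' method.

```python
import math
from collections import Counter

def tf_vector(text: str) -> Counter:
    """Bag-of-words term frequencies (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_prototype(seed_docs: list[str]) -> Counter:
    """Hypothetical prototype: the mean term-frequency vector of the seed documents."""
    total = Counter()
    for doc in seed_docs:
        total.update(tf_vector(doc))
    return Counter({t: c / len(seed_docs) for t, c in total.items()})

def match(doc: str, prototypes: dict[str, Counter]) -> str:
    """Return the label of the prototype most similar to the document."""
    vec = tf_vector(doc)
    return max(prototypes, key=lambda label: cosine(vec, prototypes[label]))

prototypes = {
    "finance": build_prototype(["quarterly revenue and profit rose",
                                "the company reported net profit growth"]),
    "biology": build_prototype(["protein sequences were aligned",
                                "gene expression in the cell"]),
}
print(match("profit and revenue growth this quarter", prototypes))
```

The query shares "profit", "revenue", "and" and "growth" with the finance prototype and no terms with the biology one, so it is assigned to "finance". A real system would use richer features than raw word counts.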

2021 ◽  
Author(s):  
Diogo J. S. Machado ◽  
Camilla Reginatto De Pierri ◽  
Leticia Graziela Costa Santos ◽  
Fabio O. Pedrosa ◽  
Roberto Tadeu Raittz

The large amount of existing textual data justifies the development of new text mining tools. Bioinformatics tools can be brought to Text Mining, increasing the arsenal of available resources. Here, we present Biotext, a package of strategies for converting natural language text into biological-like information data. It provides a general protocol with standardized functions that allow textual data to be shared, encoded as amino acid data, and decoded. The package was used to encode the arbitrary information present in the headings of the biological sequences found in a BLAST survey. The protocol implemented in this study consists of 12 steps, which can be easily executed and/or changed by the user, depending on the study area. Biotext empowers users to perform text mining using bioinformatics tools. Biotext is freely available at https://pypi.org/project/biotext/ (Python package) and https://sourceforge.net/projects/biotext-tools/files/AMINOcode_GUI/ (standalone tool).
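The abstract does not specify Biotext's actual encoding (its AMINOcode scheme). To show what a reversible text-to-amino-acid mapping can look like, the sketch below uses an assumed scheme: each UTF-8 byte is written as two base-20 digits over the 20 standard one-letter amino acid codes. The alphabet order, the two-letters-per-byte layout and the `encode`/`decode` names are hypothetical illustrations, not the Biotext API or its format.

```python
AMINO = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter amino acid codes
INDEX = {aa: i for i, aa in enumerate(AMINO)}

def encode(text: str) -> str:
    """Map each UTF-8 byte (0-255) to two amino acid letters (20 * 20 = 400 >= 256)."""
    out = []
    for byte in text.encode("utf-8"):
        high, low = divmod(byte, 20)
        out.append(AMINO[high] + AMINO[low])
    return "".join(out)

def decode(seq: str) -> str:
    """Invert encode(): read letter pairs back into bytes, then into text."""
    if len(seq) % 2:
        raise ValueError("encoded sequence must have an even length")
    data = bytes(INDEX[seq[i]] * 20 + INDEX[seq[i + 1]] for i in range(0, len(seq), 2))
    return data.decode("utf-8")

protein_like = encode("BLAST hit: nitrogenase")
print(protein_like)
print(decode(protein_like))
```

The mapping is lossless for any UTF-8 text, at the cost of doubling the length; a scheme designed for sequence-alignment tools would likely choose letters so that similar words yield similar sequences, which this byte-level sketch does not attempt.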


2017 ◽  
Vol 13 (21) ◽  
pp. 429
Author(s):  
Nadeem Ur-Rahman

Business Intelligence solutions are key to enabling industrial organisations (either manufacturing or construction) to remain competitive in the market. These solutions are achieved through the analysis of data that is collected, retrieved and re-used for prediction and classification purposes. However, many sources of industrial data are not being fully utilised to improve the business processes of the associated industry. It is generally left to the decision makers or managers within a company to make effective decisions based on the information available throughout product design and manufacture or from the operation of business or production processes. Substantial effort, in terms of both time and money, is required to identify and exploit the appropriate information available in the data. Data Mining techniques have long been applied mainly to numerical data from various sources, but their application to semi-structured or unstructured databases is still limited to a few specific domains. Applying these techniques in combination with Text Mining methods based on statistical, natural language processing and visualisation techniques could give beneficial results. Text Mining methods mainly deal with document clustering, text summarisation and classification, and rely mainly on methods and techniques from the area of Information Retrieval (IR). These help to uncover the hidden information in text documents at an initial level. This paper investigates applications of Text Mining in terms of Textual Data Mining (TDM) methods, which share techniques from IR and data mining. These techniques may be used to analyse textual databases in general, but they are demonstrated here using Post Project Reviews (PPRs) from the construction industry as a case study. The research focuses on finding key single- or multiple-term phrases for classifying the documents into two classes, i.e., good-information and bad-information documents, to help decision makers or project managers identify the key issues discussed in PPRs, which can be used as a guide for future project management processes.
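To make "key single- or multiple-term phrases" concrete, here is a minimal sketch under assumptions: phrases are word unigrams and bigrams, and each phrase is scored by the difference between the fraction of good-information and bad-information documents that contain it. The scoring rule, function names and toy reports are hypothetical, not the paper's TDM procedure; the top-ranked phrases could then serve as features for a two-class classifier.

```python
from collections import Counter

def ngrams(text: str, max_n: int = 2) -> set[str]:
    """Unique single- and multiple-term phrases (word n-grams) in a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}

def doc_freq(docs: list[str]) -> Counter:
    """Number of documents containing each phrase."""
    df = Counter()
    for doc in docs:
        df.update(ngrams(doc))
    return df

def key_phrases(good: list[str], bad: list[str], top: int = 3) -> list[tuple[str, float]]:
    """Rank phrases by how much more often they occur in good than in bad documents."""
    df_good, df_bad = doc_freq(good), doc_freq(bad)
    scores = {p: df_good[p] / len(good) - df_bad[p] / len(bad)
              for p in set(df_good) | set(df_bad)}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

good_reports = ["lessons learned on site safety", "site safety improved the schedule"]
bad_reports = ["meeting held on monday", "the meeting was held"]
print(key_phrases(good_reports, bad_reports))
```

Phrases with strongly negative scores (here "meeting" and "held") would be the corresponding indicators of bad-information documents.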


Author(s):  
Maurizio Romano ◽  
Francesco Mola ◽  
Claudio Conversano

The importance of word of mouth is growing day by day in many areas. This phenomenon is evident in everyday life, e.g., in the rise of influencers and social media managers. If more people speak positively about specific products, even more people are encouraged to buy them, and vice versa. This effect is directly influenced by the relationship between the potential customer and the reviewer. Moreover, given the evident negative reporting bias, word-of-mouth analysis is of great interest in many fields. We propose an algorithm to extract the sentiment from a natural language text corpus. Combining neural networks, which have high predictive power but are harder to interpret, with simpler but informative models allows us to quantify sentiment as a numeric value and to predict whether a sentence has a positive (or negative) sentiment. Assessing an objective quantity improves the interpretation of the results in many fields. For example, it is possible to identify specific crucial sectors that require intervention, improving the company's services, while also finding the strengths of the company itself (useful for advertising campaigns). Moreover, since time information is usually available in textual data of web origin, trends on macro and micro topics can also be analyzed. After showing how to properly reduce the dimensionality of the textual data with a data-cleaning phase, we show how to combine WordEmbedding, K-Means clustering, SentiWordNet, and the Threshold-based Naïve Bayes classifier. We apply this method to Booking.com and TripAdvisor.com data, analyzing the sentiment of people who discuss a particular issue and providing an example of customer satisfaction analysis.
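The abstract names a Threshold-based Naïve Bayes classifier without defining it. The sketch below illustrates only that last component, under assumptions: bag-of-words features, a multinomial Naïve Bayes with Laplace smoothing, a log-odds score per sentence used as the numeric sentiment value, and a threshold `tau` chosen to maximize training accuracy instead of the default cut-off at zero. The toy reviews and function names are hypothetical; this is a generic illustration, not the authors' exact formulation.

```python
import math
from collections import Counter

def train(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing.

    labels are 1 (positive) or 0 (negative); both classes must be present.
    Returns the prior log-odds and a log-likelihood ratio for every word.
    """
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(doc.lower().split())
    vocab = set(counts[0]) | set(counts[1])
    totals = {y: sum(counts[y].values()) for y in (0, 1)}

    def log_lik(word, y):
        return math.log((counts[y][word] + alpha) / (totals[y] + alpha * len(vocab)))

    prior = math.log(labels.count(1) / labels.count(0))
    weights = {w: log_lik(w, 1) - log_lik(w, 0) for w in vocab}
    return prior, weights

def score(doc, model):
    """Log-odds of the positive class (numeric sentiment); unseen words are ignored."""
    prior, weights = model
    return prior + sum(weights.get(w, 0.0) for w in doc.lower().split())

def fit_threshold(docs, labels, model):
    """Pick the threshold tau that maximizes training accuracy of 'score > tau'."""
    scores = sorted({score(d, model) for d in docs})
    candidates = [scores[0] - 1.0] + [(a + b) / 2 for a, b in zip(scores, scores[1:])]

    def accuracy(tau):
        return sum((score(d, model) > tau) == bool(y) for d, y in zip(docs, labels))

    return max(candidates, key=accuracy)

docs = ["great room and friendly staff", "excellent breakfast great view",
        "dirty room and rude staff", "noisy room terrible breakfast"]
labels = [1, 1, 0, 0]
model = train(docs, labels)
tau = fit_threshold(docs, labels, model)
for review in ["great view", "rude staff"]:
    print(review, round(score(review, model), 2), score(review, model) > tau)
```

Keeping the score continuous is what lets the same model both quantify sentiment and classify it; in practice the threshold would be tuned on held-out data rather than on the training set.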


2020 ◽  
Vol 5 (2) ◽  
pp. 43-52
Author(s):  
Nor Anis Asma Sulaiman ◽  
Leelavathi Rajamanickam ◽  

This study aims to analyse the feelings that users express in comments posted on social media. Text mining and emotion mining can both be performed using Natural Language Processing (NLP) techniques. Most previous text mining studies used unsupervised techniques and referred to Ekman’s Emotion Model (EEM), but they have limited coverage of polarity shifters and negations and do not handle emoticons. This study proposes a Naïve Bayes algorithm as a tool for producing users’ emotion patterns. The most important contribution of this study is to visualize emotion theory together with text sentiment, based on computational methods for classifying users’ feelings from natural language text. Finally, a general system framework for moving from opinion extraction to emotion mining is produced, which can be used in any domain.
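The abstract points to limited handling of negations and emoticons in earlier work. One common way to expose them to a Naïve Bayes model is at the tokenization stage; the sketch below maps emoticons to dedicated tokens and prefixes every word inside a negation scope with `NOT_` until the next punctuation mark. The emoticon table, negation list and the `NOT_` convention are illustrative assumptions, not taken from the study.

```python
import re

# Illustrative emoticon and negation lists (assumptions, not taken from the study).
EMOTICONS = {":)": "EMO_HAPPY", ":-)": "EMO_HAPPY", ":(": "EMO_SAD",
             ":-(": "EMO_SAD", "<3": "EMO_LOVE"}
NEGATIONS = {"not", "no", "never", "don't", "didn't", "isn't", "wasn't", "can't"}
CLAUSE_END = {".", ",", "!", "?", ";"}

# Emoticons first (longest first), then words with an optional apostrophe part, then punctuation.
TOKEN_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
    + r"|[a-z]+(?:'[a-z]+)?|[.,!?;]"
)

def tokenize(comment: str) -> list[str]:
    """Keep emoticons as tokens and prefix words in a negation scope with NOT_."""
    tokens, negated = [], False
    for tok in TOKEN_RE.findall(comment.lower()):
        if tok in EMOTICONS:
            tokens.append(EMOTICONS[tok])
        elif tok in CLAUSE_END:
            negated = False  # punctuation closes the negation scope
        elif tok in NEGATIONS:
            tokens.append(tok)
            negated = True
        else:
            tokens.append("NOT_" + tok if negated else tok)
    return tokens

print(tokenize("The room was not clean, but the staff were nice :)"))
```

Because "clean" and "NOT_clean" become different features, a Naïve Bayes classifier can learn opposite emotion weights for them, and emoticon tokens contribute evidence like any other word.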


2018 ◽  
Vol 22 (4) ◽  
pp. 941-968 ◽  
Author(s):  
Theresa Schmiedel ◽  
Oliver Müller ◽  
Jan vom Brocke

Research has emphasized the limitations of qualitative and quantitative approaches to studying organizational phenomena. For example, in-depth interviews are resource-intensive, while questionnaires with closed-ended questions can only measure predefined constructs. With the recent availability of large textual data sets and increased computational power, text mining has become an attractive method that has the potential to mitigate some of these limitations. Thus, we suggest applying topic modeling, a specific text mining technique, as a new and complementary strategy of inquiry to study organizational phenomena. In particular, we outline the potential of structural topic modeling for organizational research and provide a step-by-step tutorial on how to apply it. Our application example builds on 428,492 reviews of Fortune 500 companies from the online platform Glassdoor, on which employees can evaluate organizations. We demonstrate how structural topic models allow researchers to inductively identify topics that matter to employees and to quantify their relationship with employees’ perception of organizational culture. We discuss the advantages and limitations of topic modeling as a research method and outline how future research can apply the technique to study organizational phenomena.
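Structural topic models extend latent Dirichlet allocation (LDA) with document-level covariates and are usually fitted with the R `stm` package. A full STM does not fit in a short example, so the sketch below shows only plain LDA fitted by collapsed Gibbs sampling, the base model that STM generalizes. It has no covariates, and the toy reviews and hyperparameters (`alpha`, `beta`, `iters`) are illustrative assumptions.

```python
import random

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit plain LDA by collapsed Gibbs sampling (no covariates, unlike STM).

    docs: list of token lists. Returns (vocab, phi, theta) where phi[k][v] is
    p(word v | topic k) and theta[d][k] is p(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    ndk = [[0] * K for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # total tokens per topic
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], w_id[w]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | rest), up to a normalizing constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta

def top_words(vocab, phi, n=3):
    """Most probable words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]

reviews = [
    "pay salary bonus pay benefits".split(),
    "salary bonus benefits pay".split(),
    "manager culture team respect".split(),
    "team culture manager support".split(),
]
vocab, phi, theta = lda_gibbs(reviews)
print(top_words(vocab, phi))
```

STM would additionally let topic prevalence (the `theta` rows) depend on document metadata such as a company's rating, which is what enables the relationship analysis described above.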


Author(s):  
John Atkinson

This chapter introduces a novel evolutionary model for intelligent text mining. The model deals, in an integrated way, with issues concerning shallow text representation and processing for mining purposes. It aims to look for interesting explanatory knowledge across text documents. The approach uses natural-language technology and genetic algorithms to produce novel, explanatory hidden patterns. The proposed approach involves a mixture of techniques from evolutionary computation and other kinds of text mining methods. Accordingly, new kinds of genetic operations suitable for text mining are proposed. Experiments and their results, together with their assessment by human experts, are discussed; they indicate the plausibility of the model for effective knowledge discovery from texts. With this chapter, the authors hope that readers will understand the principles, theoretical foundations, implications and challenges of a promising linguistically motivated approach to text mining.
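The chapter's genetic operators and evaluation criteria are not detailed in the abstract. As a generic illustration only, the sketch below evolves candidate patterns (small sets of co-occurring terms) with standard operators: tournament selection, crossover that draws terms from both parents, add/drop mutation, and elitism. The fitness function, support times pattern size, is a hypothetical stand-in that rewards patterns that are both frequent and specific; it does not measure the explanatory novelty the chapter targets, which involves assessment by human experts.

```python
import random

def fitness(pattern, docs):
    """Hypothetical fitness: share of documents containing every term, times pattern size."""
    support = sum(pattern <= doc for doc in docs) / len(docs)
    return support * len(pattern)

def evolve(docs, pop_size=20, generations=30, max_terms=3, seed=1):
    """Evolve term-set patterns; assumes the vocabulary has at least max_terms words."""
    rng = random.Random(seed)
    vocab = sorted(set().union(*docs))

    def random_pattern():
        return frozenset(rng.sample(vocab, rng.randint(1, max_terms)))

    def crossover(a, b):
        # The child draws its terms from the union of both parents.
        pool = sorted(a | b)
        return frozenset(rng.sample(pool, rng.randint(1, min(max_terms, len(pool)))))

    def mutate(p):
        terms = set(p)
        if len(terms) > 1 and rng.random() < 0.5:
            terms.discard(rng.choice(sorted(terms)))  # drop a term
        elif len(terms) < max_terms:
            terms.add(rng.choice(vocab))  # add a term (no change if already present)
        return frozenset(terms)

    def tournament(population):
        return max(rng.sample(population, 3), key=lambda p: fitness(p, docs))

    population = [random_pattern() for _ in range(pop_size)]
    best = max(population, key=lambda p: fitness(p, docs))
    for _ in range(generations):
        children = [best]  # elitism: the best pattern so far always survives
        while len(children) < pop_size:
            child = crossover(tournament(population), tournament(population))
            if rng.random() < 0.3:
                child = mutate(child)
            children.append(child)
        population = children
        best = max(population, key=lambda p: fitness(p, docs))
    return best, fitness(best, docs)

docs = [
    {"drug", "reduces", "fever", "headache"},
    {"drug", "reduces", "inflammation"},
    {"fever", "headache", "virus"},
    {"drug", "reduces", "fever"},
]
pattern, score = evolve(docs)
print(sorted(pattern), score)
```

Elitism guarantees that the best fitness never decreases between generations; a text-mining GA in the spirit of the chapter would replace both the representation and the fitness with linguistically informed ones.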


Author(s):  
Jonathan S. Lewis

Text mining presents an efficient, scalable method to separate signal from noise in large-scale text data, and therefore to effectively analyze open-ended survey responses as well as the tremendous amount of text that students, faculty, and staff produce through their interactions online. Traditional qualitative methods are impractical when working with these data, and text mining methods are consonant with current literature on thematic analysis. This chapter provides a tutorial for researchers new to this method, including a lengthy discussion of preprocessing tasks, knowledge extraction through both supervised and unsupervised approaches, potential data sources, and the range of software (both proprietary and open-source) available to them. Examples of text mining at work are provided throughout the chapter from two studies involving data collected from college students. Limitations of this method and implications for future research and policy are discussed.
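Typical preprocessing steps of the kind the tutorial discusses can be sketched as follows: lowercasing, tokenizing, stopword removal, crude suffix stripping, and building a document-term matrix. The tiny stopword list and suffix rules are hypothetical placeholders; real projects would normally use an established stopword list and a proper stemmer or lemmatizer.

```python
import re
from collections import Counter

# Placeholder resources for illustration only.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was", "i", "me", "my", "it"}
SUFFIXES = ("ing", "ed", "es", "s")  # crude stemming rules, checked in this order

def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on letters, drop stopwords, strip one common suffix."""
    out = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOPWORDS:
            continue
        for suf in SUFFIXES:
            # Only strip if at least three characters remain.
            if tok.endswith(suf) and len(tok) - len(suf) >= 3:
                tok = tok[: -len(suf)]
                break
        out.append(tok)
    return out

def document_term_matrix(texts: list[str]) -> tuple[list[str], list[list[int]]]:
    """Rows are documents, columns are the sorted vocabulary, cells are term counts."""
    docs = [Counter(preprocess(t)) for t in texts]
    vocab = sorted(set().union(*docs))
    return vocab, [[doc[term] for term in vocab] for doc in docs]

responses = ["I enjoyed the advising sessions",
             "Advising was helpful and the sessions helped me"]
vocab, matrix = document_term_matrix(responses)
print(vocab)
print(matrix)
```

The crude rules produce stems such as "advis", which is typical of rule-based stemming; the resulting matrix is the usual input to both supervised classifiers and unsupervised methods such as clustering or topic modeling.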


2020 ◽  
Vol 13 (5) ◽  
pp. 917-925
Author(s):  
Monika Arora ◽  
Vineet Kansal

Background: E-commerce/M-commerce has emerged as a new way of doing business in the present world, one that requires understanding customers’ needs with the utmost precision and appropriateness. With the advent of technology, mobile devices have become vital tools in today’s world; in fact, smartphones have changed the way people communicate. Users can access any information with a single click. Text messages have become the basic channel of communication for interaction. Customers’ use of informal text messages has created a challenge for businesses: when needs are expressed informally via short message service (SMS), a gap opens between what customers actually require and how that requirement is represented. Objective: Informally written text messages have become a focus for researchers seeking to analyze and normalize such textual data. In this paper, SMS data are analyzed for information retrieval using the Soundex phonetic algorithm and its variations. Methods: Two datasets, the SMS-based FAQ dataset of FIRE 2012 and a self-generated survey dataset, were used to evaluate the performance of the proposed Soundex phonetic algorithm. Results: Applying Soundex with Inverse Edit Term Frequency significantly improved the lexical similarity between SMS words and natural language text, as the reported results show. Conclusion: Among the various Soundex variations, Soundex with the Inverse Edit Term Frequency Distribution algorithm is the best suited. This algorithm normalizes informally written text and finds the exact match in the bag of words.
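For reference, here is a sketch of the classic American Soundex code (first letter plus three digits, where letters with the same code separated only by h or w are coded once), followed by a hypothetical normalization step that maps an SMS token to dictionary words sharing its code. Breaking ties by Levenshtein edit distance is an assumption made for illustration. It is not the Inverse Edit Term Frequency weighting evaluated in the paper, whose details the abstract does not give, and `normalize` is an illustrative name.

```python
# American Soundex digit classes; vowels, y, h and w get no digit.
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}

def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    prev = CODES.get(letters[0], "")
    digits = []
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        if c not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a rolling single-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def normalize(sms_word: str, dictionary: list[str]) -> str:
    """Hypothetical: pick the same-Soundex dictionary word closest in edit distance."""
    candidates = [w for w in dictionary if soundex(w) == soundex(sms_word)]
    if not candidates:
        return sms_word
    return min(candidates, key=lambda w: (edit_distance(sms_word.lower(), w), w))

print(soundex("Robert"), soundex("Ashcraft"))
print(normalize("tomoro", ["tomorrow", "tamer", "today"]))
```

Phonetic codes group spellings such as "tomoro", "tomorrow" and even "tamer" together, which is why some secondary ranking is needed; the paper's contribution lies in how that ranking is done.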


2020 ◽  
pp. 22-31
Author(s):  
Vladimir Vasilyev ◽  
Alexey Vulfin ◽  
Nailya Kuchkarova ◽  
...  

Purpose: the development of an automated system for analysing software vulnerabilities in information-control systems on the basis of intelligent analysis of texts written in natural language (Text Mining). Methods: the investigation method is based on matching the set of extracted software vulnerabilities with the relevant information security threats by evaluating semantic similarity metrics of their textual descriptions using Text Mining methods. Practical relevance: the architecture of the automated system for software vulnerability analysis is developed; applying it allows us to evaluate the criticality level of vulnerabilities and to match each one with the most suitable (i.e., semantically similar) threats from the Bank of information security threats of FSTEC Russia while ensuring that vulnerabilities and threats are matched. The main software modules of the system have been developed. Computational experiments were carried out to assess the effectiveness of its application. The results of a comparative analysis show that the system increases the credibility of evaluating the criticality of vulnerabilities while considerably decreasing the time needed to search for and match vulnerabilities and threats.
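The abstract does not name the semantic similarity metric. A minimal sketch of one common choice is shown below: TF-IDF vectors with cosine similarity, used to rank threat descriptions against a vulnerability description. The smoothed IDF formula is one standard variant, and the threat texts are invented placeholders, not entries from the FSTEC Bank of information security threats.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(texts: list[str]) -> list[dict[str, float]]:
    """TF-IDF with smoothed idf = log((1 + N) / (1 + df)) + 1."""
    docs = [Counter(tokens(t)) for t in texts]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_threats(vulnerability: str, threats: list[str]) -> list[tuple[float, int]]:
    """Return (similarity, threat index) pairs, most similar first."""
    vecs = tfidf_vectors([vulnerability] + threats)
    query, threat_vecs = vecs[0], vecs[1:]
    return sorted(((cosine(query, v), i) for i, v in enumerate(threat_vecs)), reverse=True)

threats = [
    "Threat of unauthorized access via SQL injection into the database",
    "Threat of denial of service by flooding network traffic",
    "Threat of privilege escalation through a buffer overflow in the kernel",
]
vulnerability = "Buffer overflow in kernel driver allows local privilege escalation"
for sim, idx in rank_threats(vulnerability, threats):
    print(f"{sim:.3f}  {threats[idx]}")
```

Lexical TF-IDF similarity only captures shared vocabulary; a system aiming at semantic similarity would likely add synonym handling or embedding-based representations, particularly for Russian-language descriptions.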


Author(s):  
Pavel Netolický ◽  
Jonáš Petrovský ◽  
František Dařena

Large amounts of text data are generated every day. These data come from various sources and may contain valuable information. In this article, we use text mining methods to discover whether there is a connection between news articles and changes in the S&P 500 stock index. The index values and documents were divided into time windows according to the direction of the index value changes. We achieved a classification accuracy of 65–74%.
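One possible reading of "time windows according to the direction of the index value changes" is sketched below with hypothetical data: consecutive trading days whose closing value moves in the same direction form one window labeled "up" or "down", and each news article inherits the label of the window covering its date, which gives the class labels for a text classifier. The paper's exact windowing may differ, and treating an unchanged close as "down" is an arbitrary assumption.

```python
from datetime import date

def direction_windows(prices):
    """Split date-sorted (date, close) pairs into runs of same-direction daily changes.

    Returns (start_date, end_date, label) tuples with label 'up' or 'down'.
    An unchanged close is treated as 'down' (an arbitrary assumption).
    """
    windows = []
    for (_, p0), (d1, p1) in zip(prices, prices[1:]):
        label = "up" if p1 > p0 else "down"
        if windows and windows[-1][2] == label:
            start = windows[-1][0]
            windows[-1] = (start, d1, label)  # extend the current run
        else:
            windows.append((d1, d1, label))  # start a new run
    return windows

def label_articles(articles, windows):
    """Attach the window label to each (date, text) article that falls inside a window."""
    labeled = []
    for day, text in articles:
        for start, end, label in windows:
            if start <= day <= end:
                labeled.append((text, label))
                break
    return labeled

prices = [(date(2020, 1, 2), 100.0), (date(2020, 1, 3), 101.0), (date(2020, 1, 6), 102.5),
          (date(2020, 1, 7), 101.0), (date(2020, 1, 8), 100.2), (date(2020, 1, 9), 103.0)]
articles = [(date(2020, 1, 6), "Strong jobs report lifts markets"),
            (date(2020, 1, 8), "Trade tensions weigh on stocks"),
            (date(2020, 1, 2), "Index opens the year flat")]
windows = direction_windows(prices)
print(windows)
print(label_articles(articles, windows))
```

Articles dated before the first change (or in gaps between windows) receive no label and are dropped; a real study would also need to decide whether news should be aligned with the same window or with the following one.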

