Development of a multilingual text mining approach for knowledge discovery in patents

Author(s):  
Chung-Hong Lee ◽  
Hsin-Chang Yang ◽  
Yi-Ju Li
2012 ◽  
Vol 35 (1) ◽  
pp. 87-109 ◽  
Author(s):  
César de Pablo-Sánchez ◽  
Isabel Segura-Bedmar ◽  
Paloma Martínez ◽  
Ana Iglesias-Maqueda

PLoS ONE ◽  
2012 ◽  
Vol 7 (4) ◽  
pp. e33427 ◽  
Author(s):  
Anna Korhonen ◽  
Diarmuid Ó Séaghdha ◽  
Ilona Silins ◽  
Lin Sun ◽  
Johan Högberg ◽  
...  

Author(s):  
Hércules Antonio do Prado ◽  
José Palazzo Moreira de Oliveira ◽  
Edilson Ferneda ◽  
Leandro Krug Wives ◽  
Edilberto Magalhães Silva ◽  
...  

Information about the external environment and organizational processes is among the most valuable input for business intelligence (BI). Today, companies hold plenty of information in structured or textual form, gathered either from external monitoring or from corporate systems. In recent years, the structured part of this information stock has been extensively explored by means of data-mining (DM) techniques (Wang, 2003), generating models that enable analysts to gain insight into solutions for organizational problems. On the text-mining (TM) side, new applications have been developed at a much slower pace. In an informal poll carried out in 2002 (KDnuggets), just 4% of knowledge-discovery-from-databases (KDD) practitioners were applying TM techniques. This is as intriguing as it is surprising, considering that 80% of all information available in an organization comes in textual form (Tan, 1999).


2019 ◽  
Vol 53 (3) ◽  
pp. 333-372 ◽  
Author(s):  
Marcio Pereira Basilio ◽  
Valdecy Pereira ◽  
Gabrielle Brum

Purpose: The purpose of this paper is to develop a methodology for knowledge discovery in emergency response service databases based on police occurrence reports, generating information that helps law enforcement agencies plan actions to investigate and combat criminal activities.

Design/methodology/approach: The developed model employs a knowledge-discovery methodology involving text mining techniques and uses latent Dirichlet allocation (LDA) with collapsed Gibbs sampling to obtain topics related to crime.

Findings: The method enabled identification of the most common crimes that occurred from 1 January to 31 December 2016. An analysis of the identified topics reaffirmed that crimes do not occur in a linear manner in a given locality. In this study, 40 per cent of the crimes identified in integrated public safety area 5 (AISP 5, the historic centre of the city of Rio de Janeiro) had no correlation with AISP 19 (Copacabana, Rio de Janeiro), and 33 per cent of the crimes in AISP 19 were not identified in AISP 5.

Research limitations/implications: The collected data represent the social dynamics of neighbourhoods in the central and southern zones of the city of Rio de Janeiro during the specific period from January 2013 to December 2016. This limitation implies that the results cannot be generalised to areas with different characteristics.

Practical implications: The developed methodology complements the identification of criminal practices and their characteristics, drawing on police occurrence reports stored in emergency response databases. The generated knowledge enables law enforcement experts to assess, reformulate and construct differentiated strategies for combating crime in a given locality.

Social implications: Producing knowledge from the emergency service database helps the government integrate this information with other databases, thus enabling improved strategies to combat local crime. The proposed model contributes to research on big data, innovation and decision support, because it breaks with a paradigm for analysing criminal information.

Originality/value: The originality of the study lies in integrating text mining techniques and LDA to detect crimes in a given locality on the basis of criminal occurrence reports stored in emergency response service databases.
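The abstract names LDA with collapsed Gibbs sampling but gives no implementation details. As a rough illustration of what that inference step involves, here is a minimal, pure-Python sketch of the standard collapsed Gibbs update for LDA; it is not the authors' code. The function names (`lda_collapsed_gibbs`, `top_words`), the symmetric priors `alpha` and `beta`, the iteration count, and the toy "occurrence report" corpus are all hypothetical choices for illustration.

```python
import random


def lda_collapsed_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Sketch of LDA inference by collapsed Gibbs sampling (hypothetical helper).

    docs: list of token lists. Returns (topic assignments z, topic-word counts, vocabulary).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Assign every token a random initial topic and record it in the counts.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return z, n_kw, vocab


def top_words(n_kw, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in n_kw
    ]


if __name__ == "__main__":
    # Hypothetical tokenised report snippets, not data from the study.
    reports = [
        ["theft", "phone", "street"],
        ["theft", "wallet", "street"],
        ["assault", "bar", "night"],
        ["assault", "fight", "night"],
    ]
    z, n_kw, vocab = lda_collapsed_gibbs(reports, num_topics=2)
    print(top_words(n_kw, vocab))
```

"Collapsed" means the topic and word distributions are integrated out, so only the token-level topic assignments are sampled; the per-topic word distributions can afterwards be estimated from `n_kw`. A real pipeline over occurrence reports would add tokenisation, stopword removal and convergence checks.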


2018 ◽  
Vol 22 (7) ◽  
pp. 1471-1488 ◽  
Author(s):  
Antonio Usai ◽  
Marco Pironti ◽  
Monika Mital ◽  
Chiraz Aouina Mejri

Purpose: The aim of this work is to increase awareness of the potential of text mining for knowledge discovery and to further promote research collaboration between the knowledge management and information technology communities. Since its emergence, text mining has involved multidisciplinary studies, focused primarily on database technology, Web-based collaborative writing, text analysis, machine learning and knowledge discovery. However, owing to the large amount of research in this field, it is becoming increasingly difficult to identify existing studies and therefore to suggest new topics.

Design/methodology/approach: This article offers a systematic review of 85 academic outputs (articles and books) focused on knowledge discovery derived from the text mining technique. The systematic review is conducted by applying “text mining at the term level, in which knowledge discovery takes place on a more focused collection of words and phrases that are extracted from and label each document” (Feldman et al., 1998, p. 1).

Findings: The results revealed that the keywords extracted in association with the main labels, i.e. knowledge discovery and text mining, fall into two periods. From 1998 to 2009, the terms "knowledge" and "text" were always used. From 2010 to 2017, in addition to these terms, "sentiment analysis", "review manipulation", "microblogging data" and "knowledgeable users" were also frequently used. Moreover, each term present in the first decade has a technical, engineering nature, whereas a diverse range of fields such as business, marketing and finance emerged from 2010 to 2017 owing to a greater interest in the online environment.

Originality/value: This is the first comprehensive systematic review of knowledge discovery and text mining that uses a text mining technique at the term level, which helps reduce redundant research and avoid missing relevant publications.
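To make "text mining at the term level" concrete, the following is a minimal sketch, under stated assumptions, of labelling each document with its most frequent terms and tallying those labels per period, split at 2010 as in the review. It is not the review's procedure: the stopword list, the regex tokeniser, `top_n`, the helper names `label_terms` and `terms_by_period`, and the sample corpus are all hypothetical, and the sketch handles single words only, whereas the review's labels include phrases such as "sentiment analysis".

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); a real study would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on", "with", "from"}


def label_terms(text, top_n=3):
    """Label a document with its top_n most frequent non-stopword terms."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return [term for term, _ in Counter(tokens).most_common(top_n)]


def terms_by_period(docs, split_year, top_n=3):
    """Count, per period, how many documents each label term is attached to.

    docs: iterable of (year, text) pairs. Documents published before split_year
    go to the first period, the rest to the second.
    """
    early, late = Counter(), Counter()
    for year, text in docs:
        labels = label_terms(text, top_n)
        (early if year < split_year else late).update(labels)
    return early, late


if __name__ == "__main__":
    # Hypothetical (year, abstract) pairs, not data from the review.
    corpus = [
        (2005, "Knowledge discovery from text and text mining"),
        (2014, "Sentiment analysis of microblogging data and text"),
    ]
    early, late = terms_by_period(corpus, split_year=2010)
    print("1998-2009:", early.most_common())
    print("2010-2017:", late.most_common())
```

Because each document contributes each label at most once, the per-period counters give document frequencies of label terms, which is the kind of figure needed to say which terms were "always used" in one period and newly frequent in the next.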


Author(s):  
Peter A. Chew

The principles of text mining are fundamental to technology in everyday use. The World Wide Web (WWW) has in many senses driven research in text mining, and with the growth of the WWW, applications of text mining (like search engines) have become commonplace. In a way that was not true even a decade ago, it is taken for granted that the ‘needle in the haystack’ can quickly be found among large volumes of text. In most cases, however, users still expect search engines to return results in the same language as the query: perhaps the language best understood by the user, or the language in which text is most likely to be available. The distribution of languages on the WWW does not match the distribution of languages spoken by the world’s population in general. For example, while English is spoken by under 10% of the world’s population (Gordon 2005), it is still predominant on the WWW, accounting for perhaps two-thirds of documents. There are a variety of possible reasons for this disparity, including technological inequities between different parts of the world and the fact that the WWW had its genesis in an English-speaking country. Whatever the cause of the dominance of English, the fact that two-thirds of the WWW is in one language is, in all likelihood, a major reason that multilingual text mining is still a relatively new concept. Until recently, there simply has not been a significant and widespread need for multilingual text mining. A number of recent developments have begun to change the situation, however. Perhaps these developments can be grouped under the general rubric of ‘globalization’. They include the increasing adoption, use, and popularization of the WWW in non-English-speaking societies; the trend towards political integration of diverse linguistic communities (highly evident, for example, in the European Union); and a growing interest in understanding social, technological and political developments in other parts of the world. All these developments contribute to a greater demand for multilingual text processing: essentially, methods for handling, managing, and comparing documents in multiple languages, some of which may not even be known to the end user.

