Development of a multilingual text mining approach for knowledge discovery in patents

Information about the external environment and organizational processes is among the most valuable inputs for business intelligence (BI). Companies today hold plenty of information in structured or textual form, gathered either from external monitoring or from corporate systems. In recent years, the structured part of this information stock has been explored extensively by means of data-mining (DM) techniques (Wang, 2003), generating models that give analysts insight into solutions for organizational problems. On the text-mining (TM) side, new applications have developed more slowly. In an informal poll carried out in 2002 (Kdnuggets), just 4% of knowledge-discovery-from-databases (KDD) practitioners were applying TM techniques. This is surprising given that 80% of all information available in an organization comes in textual form (Tan, 1999).

Identification of operational demand in law enforcement agencies

Data Technologies and Applications, 2019, Vol 53 (3), pp. 333-372

DOI: 10.1108/dta-12-2018-0109

Cited By ~ 1

Author(s): Marcio Pereira Basilio, Valdecy Pereira, Gabrielle Brum

Keyword(s): Law Enforcement, Text Mining, Knowledge Discovery, Emergency Response, Latent Dirichlet Allocation, Social Dynamics, Law Enforcement Agencies, Content Type, The Government, The City

Purpose: The purpose of this paper is to develop a methodology for knowledge discovery in emergency response service databases based on police occurrence reports, generating information to help law enforcement agencies plan actions to investigate and combat criminal activities.

Design/methodology/approach: The developed model employs a methodology for knowledge discovery involving text mining techniques and uses latent Dirichlet allocation (LDA) with collapsed Gibbs sampling to obtain topics related to crime.

Findings: The method used in this study enabled identification of the most common crimes that occurred in the period from 1 January to 31 December of 2016. An analysis of the identified topics reaffirmed that crimes do not occur in a linear manner in a given locality. In this study, 40 per cent of the crimes identified in integrated public safety area 5, or AISP 5 (the historic centre of the city of RJ), had no correlation with AISP 19 (Copacabana – RJ), and 33 per cent of the crimes in AISP 19 were not identified in AISP 5.

Research limitations/implications: The collected data represent the social dynamics of neighbourhoods in the central and southern zones of the city of Rio de Janeiro during the specific period from January 2013 to December 2016. This limitation implies that the results cannot be generalised to areas with different characteristics.

Practical implications: The developed methodology contributes in a complementary manner to the identification of criminal practices and their characteristics based on police occurrence reports stored in emergency response databases. The generated knowledge enables law enforcement experts to assess, reformulate and construct differentiated strategies for combating crimes in a given locality.

Social implications: The production of knowledge from the emergency service database contributes to the government integrating information with other databases, thus enabling the improvement of strategies to combat local crime. The proposed model contributes to research on big data, on the innovation aspect and on decision support, for it breaks with a paradigm of analysis of criminal information.

Originality/value: The originality of the study lies in the integration of text mining techniques and LDA to detect crimes in a given locality on the basis of the criminal occurrence reports stored in emergency response service databases.
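The abstract names LDA with collapsed Gibbs sampling but gives no implementation details. The following is a minimal, generic sketch of that sampler in plain Python, not the authors' code. The function names, the hyperparameters `alpha` and `beta`, and the `toy_reports` tokens are all assumptions made up for illustration; they are not data from the study.

```python
import random


def lda_collapsed_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs is a list of token lists. Returns (doc_topic, topic_word, vocab),
    where doc_topic[d][k] counts tokens of document d assigned to topic k
    and topic_word[k][v] counts assignments of vocabulary word v to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Invented toy tokens, standing in for preprocessed occurrence reports.
toy_reports = [
    ["theft", "phone", "street", "theft"],
    ["noise", "party", "neighbour", "noise"],
    ["phone", "theft", "bus", "street"],
    ["party", "noise", "music", "neighbour"],
]

if __name__ == "__main__":
    _, tw, vocab = lda_collapsed_gibbs(toy_reports, num_topics=2)
    print(top_words(tw, vocab))
```

On a real corpus the reports would first need tokenisation and stop-word removal, and the number of topics would have to be chosen; the abstract does not say how either was done.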

Knowledge discovery out of text data: a systematic review via text mining

Journal of Knowledge Management, 2018, Vol 22 (7), pp. 1471-1488

DOI: 10.1108/jkm-11-2017-0517

Cited By ~ 11

Author(s): Antonio Usai, Marco Pironti, Monika Mital, Chiraz Aouina Mejri

Keyword(s): Systematic Review, Text Mining, Knowledge Discovery, Research Collaboration, Collaborative Writing, Web Based, Diverse Range, Content Type, Mining Technique, Database Technology

Purpose: The aim of this work is to increase awareness of the potential of text mining to discover knowledge and to further promote research collaboration between the knowledge management and information technology communities. Since its emergence, text mining has involved multidisciplinary studies, focused primarily on database technology, Web-based collaborative writing, text analysis, machine learning and knowledge discovery. However, owing to the large amount of research in this field, it is becoming increasingly difficult to identify existing studies and therefore to suggest new topics.

Design/methodology/approach: This article offers a systematic review of 85 academic outputs (articles and books) focused on knowledge discovery derived from the text mining technique. The systematic review is conducted by applying “text mining at the term level, in which knowledge discovery takes place on a more focused collection of words and phrases that are extracted from and label each document” (Feldman et al., 1998, p. 1).

Findings: The results revealed that the keywords associated with the main labels, i.e., knowledge discovery and text mining, fall into two periods. From 1998 to 2009, the terms knowledge and text were always used. From 2010 to 2017, in addition to these terms, sentiment analysis, review manipulation, microblogging data and knowledgeable users were also frequently used. Moreover, the terms of the first period have a technical, engineering nature, whereas a diverse range of fields such as business, marketing and finance emerged from 2010 to 2017 owing to a greater interest in the online environment.

Originality/value: This is a first comprehensive systematic review on knowledge discovery and text mining through the use of a text mining technique at term level, which offers to reduce redundant research and to avoid the possibility of missing relevant publications.

Biomedical text mining for semantic search and knowledge discovery

Proceedings of the 2nd ACM SIGHIT symposium on International health informatics - IHI '12, 2012

DOI: 10.1145/2110363.2110365

Cited By ~ 1

Author(s): Sophia Ananiadou

Keyword(s): Text Mining, Knowledge Discovery, Semantic Search, Biomedical Text, Biomedical Text Mining

A multilingual text mining approach to web cross-lingual text retrieval

Knowledge-Based Systems, 2004, Vol 17 (5-6), pp. 219-227

DOI: 10.1016/j.knosys.2004.04.001

Cited By ~ 17

Author(s): Rowena Chau, Chung-Hsing Yeh

Keyword(s): Text Mining, Text Retrieval, Multilingual Text, Cross Lingual

Text Mining to Facilitate Domain Knowledge Discovery

Cyberspace, 2020

DOI: 10.5772/intechopen.85362

Author(s): Chengbin Wang, Xiaogang Ma

Keyword(s): Text Mining, Knowledge Discovery, Domain Knowledge

Multilingual Text Mining

Encyclopedia of Data Warehousing and Mining, Second Edition, 2011, pp. 1380-1385

DOI: 10.4018/978-1-60566-010-3.ch213

Cited By ~ 1

Author(s): Peter A. Chew

Keyword(s): Text Mining, Search Engines, Text Processing, The European Union, End User, The World, Recent Developments, English Speaking, Multilingual Text, Multiple Languages

The principles of text mining are fundamental to technology in everyday use. The world wide web (WWW) has in many senses driven research in text mining, and with the growth of the WWW, applications of text mining (like search engines) have by now become commonplace. In a way that was not true even less than a decade ago, it is taken for granted that the ‘needle in the haystack’ can quickly be found among large volumes of text. In most cases, however, users still expect search engines to return results in the same language as that of the query, perhaps the language best understood by the user, or the language in which text is most likely to be available. The distribution of languages on the WWW does not match the distribution of languages spoken in general by the world’s population. For example, while English is spoken by under 10% of the world’s population (Gordon 2005), it is still predominant on the WWW, accounting for perhaps two-thirds of documents. There are a variety of possible reasons for this disparity, including technological inequities between different parts of the world and the fact that the WWW had its genesis in an English-speaking country. Whatever the cause for the dominance of English, the fact that two-thirds of the WWW is in one language is, in all likelihood, a major reason that the concept of multilingual text mining is still relatively new. Until recently, there simply has not been a significant and widespread need for multilingual text mining. A number of recent developments have begun to change the situation, however. Perhaps these developments can be grouped under the general rubric of ‘globalization’. They include the increasing adoption, use, and popularization of the WWW in non-English-speaking societies; the trend towards political integration of diverse linguistic communities (highly evident, for example, in the European Union); and a growing interest in understanding social, technological and political developments in other parts of the world. All these developments contribute to a greater demand for multilingual text processing – essentially, methods for handling, managing, and comparing documents in multiple languages, some of which may not even be known to the end user.
