Text Mining in Cybersecurity: Exploring Threats and Opportunities

Maaike H. T. de Boer; Babette J. Bakker; Erik Boertjes; Mike Wilmer; Stephan Raaijmakers; Rick van der Kleij

doi:10.3390/mti3030062

Multimodal Technologies and Interaction ◽

10.3390/mti3030062 ◽

2019 ◽

Vol 3 (3) ◽

pp. 62 ◽

Cited By ~ 1

Author(s):

Maaike H. T. de Boer ◽

Babette J. Bakker ◽

Erik Boertjes ◽

Mike Wilmer ◽

Stephan Raaijmakers ◽

...

Keyword(s):

Information Retrieval ◽

Text Mining ◽

State Of The Art ◽

Data Sources ◽

Added Value ◽

Design Decision ◽

User Evaluation ◽

User Query ◽

Overall Evaluation ◽

Speed Up

The number of cyberattacks on organizations is growing. To increase cyber resilience, organizations need to obtain foresight to anticipate cybersecurity vulnerabilities, developments, and potential threats. This paper describes a tool that combines state-of-the-art text mining and information retrieval techniques to explore the opportunities of using these techniques in the cybersecurity domain. Our tool, the Horizon Scanner, can scrape and store data from websites, blogs and PDF articles; search a database based on a user query; show textual entities in a graph; and provide and visualize potential trends. The aim of the Horizon Scanner is to help experts explore relevant data sources for potential threats and trends and to speed up the process of foresight. In a requirements session and a user evaluation of the tool with cyber experts from the Dutch Defense Cyber Command, we explored whether the Horizon Scanner has the potential to fulfill its aim in the cybersecurity domain. Although the overall evaluation of the tool was not as good as expected, some aspects of the tool were found to have added value, providing us with valuable insights into how to design decision support for forecasting analysts.

Sentiment Analysis of Code-Mixed Text: A Review

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i3.1239 ◽

2021 ◽

Vol 12 (3) ◽

pp. 2469-2478

Author(s):

Nurul Husna Mahadzir et al.

Keyword(s):

Information Retrieval ◽

Text Mining ◽

Sentiment Analysis ◽

State Of The Art ◽

The State ◽

The Internet ◽

Qualitative Comparison ◽

Analysis Process ◽

Active Research ◽

Multiple Languages

In recent times, sentiment analysis has become one of the most active and increasingly popular research areas in information retrieval and text mining. To date, sentiment analysis has been applied in various domains, such as product, movie, sport and political reviews. Most previous work in this field has focused on analyzing a single language, especially English. However, with globalization and the growing number of Internet users worldwide, it is common to see posts written in multiple languages. Moreover, in unstructured content such as Twitter posts, people tend to mix languages within one sentence, which makes the sentiment analysis process even harder and more challenging. This paper reviews the state of the art of sentiment analysis for code-mixed text, including detailed discussions of each focus area, a qualitative comparison and the limitations of current approaches. It also highlights challenges along this line of research and suggests several recommendations for future work that should be explored.

An Efficient Topic Modeling Approach for Text Mining and Information Retrieval through K-means Clustering

Mehran University Research Journal of Engineering and Technology ◽

10.22581/muet1982.2001.20 ◽

2020 ◽

Vol 39 (1) ◽

pp. 213-222

Author(s):

Junaid Rashid ◽

Syed Muhammad Adnan Shah ◽

Aun Irtaza

Keyword(s):

Information Retrieval ◽

Text Mining ◽

Topic Modeling ◽

Clustering Algorithm ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

State Of The Art ◽

Text Documents ◽

New Perspective ◽

Better Than

Topic modeling is an effective text mining and information retrieval approach for organizing knowledge with various contents under specific topics. Text documents in the form of news articles are increasing very fast on the web, and analyzing them is very important in the fields of text mining and information retrieval. Extracting meaningful information from these documents is a challenging task. Topic modeling is one approach for discovering themes in text documents, but it still needs a new perspective to improve its performance. In topic modeling, documents have topics, and topics are collections of words. In this paper, we propose a new k-means topic modeling (KTM) approach based on the k-means clustering algorithm. KTM discovers better semantic topics from a collection of documents. Experiments on two real-world datasets, Reuters-21578 and BBC News, show that KTM performs better than state-of-the-art topic models such as LDA (Latent Dirichlet Allocation) and LSA (Latent Semantic Analysis). KTM is also applicable to classification and clustering tasks in text mining, where it achieves higher performance than its competitors, LDA and LSA.
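
The abstract does not spell out how KTM turns clusters into topics. As a rough illustration of the general idea only, and not the authors' KTM algorithm, the sketch below clusters TF-IDF document vectors with plain k-means and reads each centroid's highest-weighted terms as a "topic". The naive tokenizer, the smoothed IDF formula, the farthest-first initialisation and all function names are assumptions chosen to keep the example dependency-free.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic runs; deliberately naive (no stop-word removal)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """One sparse, L2-normalised TF-IDF vector (dict: term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Assumed smoothed IDF: log((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def centroid(members):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for vec in members:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(members) for term, weight in total.items()}


def kmeans(vectors, k, max_iter=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = []
    for _ in range(max_iter):
        # Assign every document to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Recompute each centroid; keep the old one if its cluster is empty.
        updated = []
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            updated.append(centroid(members) if members else centroids[j])
        if updated == centroids:  # centroids unchanged, so assignments are stable
            break
        centroids = updated
    return labels, centroids


def top_terms(centroids, n=3):
    """Read each centroid's highest-weighted terms as that cluster's 'topic'."""
    return [
        [term for term, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for c in centroids
    ]


docs = [
    "stock market shares fell as investors sold shares",
    "market rally lifts stock prices and shares",
    "football team wins the league match",
    "striker scores twice as team wins match",
]
labels, centroids = kmeans(tfidf_vectors(docs), k=2)
print(labels)                # cluster id per document
print(top_terms(centroids))  # top terms per cluster
```

On these four toy headlines the two finance documents and the two football documents end up in separate clusters. A real system would add stop-word removal and proper tokenization, and would likely use a library implementation such as scikit-learn's `KMeans` rather than this hand-rolled loop.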

Designing a Chat-Bot for College Information using Information Retrieval and Automatic Text Summarization Techniques

Current Chinese Computer Science ◽

10.2174/2665997201999201022191540 ◽

2020 ◽

Vol 01 ◽

Author(s):

Radha Guha

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Text Summarization ◽

The Internet ◽

Specific Domain ◽

User Query ◽

College Information ◽

Chat Bot

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast amount of information available on the Internet. Even for a specific domain such as a college or university website, it may be difficult for a user to browse through all the links to get relevant answers quickly. Objective: In this scenario, a chat-bot that can answer questions related to college information and compare colleges would be very useful and novel. Methods: In this paper, a novel conversational chat-bot application with information retrieval and text summarization skills is designed and implemented. First, the chat-bot has a simple dialog skill: when it can understand the intent of the user query, it responds from a stored collection of answers. Second, for unknown queries, the chat-bot can search the Internet and then perform text summarization using advanced techniques of natural language processing (NLP) and text mining (TM). Results: The NLP techniques for information retrieval and text summarization, namely Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe) and TextRank, are first reviewed and compared in this paper before being implemented for the chat-bot design. The chat-bot greatly improves user experience by answering specific queries concisely, which takes less time than reading the entire document. Students, parents and faculty can more efficiently get answers on a variety of topics such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers and patents. Conclusion: The purpose of this paper was to follow the advancement of NLP technologies and implement them in a novel application.
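
The two-tier behaviour described in the Methods (answer from stored replies when the intent is recognised, otherwise summarise retrieved text) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: intent detection is approximated by Jaccard word overlap against stored questions with an assumed threshold of 0.5, the web search is replaced by a passed-in `fallback_text`, and the summariser is a simplified TextRank (weighted PageRank over a sentence-overlap graph, using the overlap / (log|Si| + log|Sj|) similarity). All function names and the FAQ content are hypothetical.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def similarity(a, b):
    """TextRank-style overlap: shared words divided by log(|a|) + log(|b|)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph."""
    bags = [word_set(s) for s in sentences]
    n = len(bags)
    w = [[similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    """Return the k highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = textrank_scores(sentences)
    best = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return " ".join(sentences[i] for i in sorted(best))


def answer(query, stored_answers, fallback_text, threshold=0.5):
    """Tier 1: stored reply if the query overlaps a known question enough
    (Jaccard, a stand-in for intent detection). Tier 2: summarise fallback_text,
    a stand-in for documents retrieved by a web search."""
    q = word_set(query)
    best_score, best_reply = 0.0, None
    for question, reply in stored_answers.items():
        union = q | word_set(question)
        score = len(q & word_set(question)) / len(union) if union else 0.0
        if score > best_score:
            best_score, best_reply = score, reply
    if best_reply is not None and best_score >= threshold:
        return best_reply
    return summarize(fallback_text)


# Hypothetical FAQ entry and page text, for illustration only.
faq = {"what are the admission criteria": "Admission is based on the entrance examination."}
page = (
    "The college offers engineering courses. "
    "Engineering courses include civil and mechanical engineering. "
    "The canteen opens at noon."
)
print(answer("What are the admission criteria?", faq, page))      # stored reply
print(answer("Which courses does the college offer?", faq, page))  # TextRank summary
```

The second query matches no stored question, so the sketch falls back to the summariser, which keeps the two mutually similar sentences about courses and drops the unrelated one. The paper also compares LSA, LDA, Word2Vec and GloVe; none of those are modelled here.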

Review on biomass feedstocks, pyrolysis mechanism and physicochemical properties of biochar: State-of-the-art framework to speed up vision of circular bioeconomy

Journal of Cleaner Production ◽

10.1016/j.jclepro.2021.126645 ◽

2021 ◽

Vol 297 ◽

pp. 126645

Author(s):

Gajanan Sampatrao Ghodake ◽

Surendra Krushna Shinde ◽

Avinash Ashok Kadam ◽

Rijuta Ganesh Saratale ◽

Ganesh Dattatraya Saratale ◽

...

Keyword(s):

Physicochemical Properties ◽

State Of The Art ◽

Pyrolysis Mechanism ◽

Biomass Feedstocks ◽

Speed Up

Report on the 4th Joint Workshop on Bibliometric-Enhanced Information Retrieval and Natural Language Processing for Digital Libraries at SIGIR 2019

ACM SIGIR Forum ◽

10.1145/3458553.3458554 ◽

2019 ◽

Vol 53 (2) ◽

pp. 3-10

Author(s):

Muthu Kumar Chandrasekaran ◽

Philipp Mayr

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Research And Development ◽

Language Processing ◽

Digital Libraries ◽

State Of The Art ◽

Shared Task ◽

Processing Information ◽

Joint Workshop

The 4th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and recommendation techniques that can advance the state of the art in scholarly document understanding, analysis, and retrieval at scale. The workshop incorporated different paper sessions and the 5th edition of the CL-SciSumm Shared Task.

Support for Managing Partly Configurable Modular Systems

Proceedings of the Design Society ◽

10.1017/pds.2021.540 ◽

2021 ◽

Vol 1 ◽

pp. 2791-2800

Author(s):

Jarkko Pakkanen ◽

Teuvo Heikkinen ◽

Nillo Adlin ◽

Timo Lehtonen ◽

Janne Mämmelä ◽

...

Keyword(s):

Portfolio Management ◽

Product Variety ◽

Added Value ◽

Modular System ◽

Product Management ◽

Design Decision ◽

Product Portfolio ◽

Modular Systems ◽

Engineer To Order ◽

The Impact

The paper studies what kind of support could be applied to the management of partly configurable modular systems. The main tasks of product management, product portfolio management and product variety management are defined. In addition, a partly configurable product structure and modular system are defined. Because of the limited support in the literature for managing partly configurable modular systems, the article reviews previous product development cases in which the authors have been involved, on a lessons-learnt basis, i.e., whether the methods and tools used in those cases could provide support for the research objective. As a result, the existing definition of the modular system should be extended by the concepts of non-module and design decision sequence description when dealing with partly configurable modular systems. This is because engineer-to-order should be made possible in cases where it brings clear added value to the customer compared to completely pre-defined solutions, which may limit the customer's interest in the offering. Tools to assess the impact of changes to the product offering are required, and these should be taken into account in the frameworks used in method and tool development.

Text mining for economic geographical sectoral analysis of the pulp and paper industry in European Russia

Regional'nye issledovaniya ◽

10.5922/1994-5280-2021-1-2 ◽

2021 ◽

Vol 71 (1) ◽

pp. 18-33

Author(s):

I.F. Kuzminov ◽

P.A. Lobanova

Keyword(s):

Text Mining ◽

Paper Industry ◽

Pulp And Paper Industry ◽

Spatial Development ◽

Pulp And Paper ◽

European Russia ◽

Data Sources ◽

Management Decisions ◽

Timber Industry

The authors show the need for, and some existing opportunities in, analyzing non-traditional data sources to obtain a complete and more relevant picture of the spatial development of industries. The research methodology includes the use of text mining for economic and geographical studies. The relevance of the research is determined by the insufficient completeness of official statistical data, the falling cost of relevant information processing technologies and the abundance of large open-access text data sources. The article discusses the role of the pulp and paper industry (as a key part of the timber industry) in the economic and spatial development of modern Russia. The authors identify the main trends in the economic and spatial development of the pulp and paper industry of European Russia, draw conclusions on expected industry trends and give recommendations for strategic management decisions to respond to industry challenges. The authors claim that the industry needs liberalization and stabilization, primarily through moratoriums on policy changes. The article emphasizes the role of big data, and of text mining in particular, in economic and geographical research for forming reasonable and objective conclusions that can be used to make timely and balanced management decisions in the timber industry and the pulp and paper industry.

Semantic Information Retrieval on Medical Texts

ACM Computing Surveys ◽

10.1145/3462476 ◽

2022 ◽

Vol 54 (7) ◽

pp. 1-38

Author(s):

Lynda Tamine ◽

Lorraine Goeuriot

Keyword(s):

Information Retrieval ◽

Health Informatics ◽

Medical Information ◽

State Of The Art ◽

Lessons Learned ◽

Semantic Search ◽

Future Research ◽

Cross Model ◽

Wide Range ◽

Search Systems

The explosive growth and widespread accessibility of medical information on the Internet have led to a surge of research activity in a wide range of scientific communities, including health informatics and information retrieval (IR). One of the common concerns of this research, across these disciplines, is how to design either clinical decision support systems or medical search engines capable of providing adequate support for both novices (e.g., patients and their next-of-kin) and experts (e.g., physicians, clinicians) tackling complex tasks (e.g., search for diagnosis, search for a treatment). However, despite significant multidisciplinary research advances, current medical search systems exhibit low levels of performance. This survey provides an overview of the state of the art in the disciplines of IR and health informatics and, bridging these disciplines, shows how semantic search techniques can facilitate medical IR. First, we give a broad picture of semantic search and medical IR and then highlight the major scientific challenges. Second, focusing on the semantic gap challenge, we discuss representative state-of-the-art work related to feature-based as well as semantic-based representation and matching models that support medical search systems. In addition to seminal works, we present recent works that rely on research advancements in deep learning. Third, we make a thorough cross-model analysis and provide some findings and lessons learned. Finally, we discuss some open issues and promising directions for future research.

Semantic Representation of a Geo-Social User Profile for a Personalised Information Retrieval

Journal of Information & Knowledge Management ◽

10.1142/s0219649221500441 ◽

2021 ◽

pp. 2150044

Author(s):

Tahar Rafa ◽

Samir Kechid

Keyword(s):

Information Retrieval ◽

Information Search ◽

Semantic Representation ◽

User Profile ◽

Search Process ◽

Search System ◽

User Interactions ◽

User Interests ◽

Situational Contexts ◽

User Query

User-centred information retrieval needs to introduce semantics into user modelling for a meaningful representation of user interests. The semantic representation of user interests helps to improve the identification of the user's future cognitive needs. In this paper, we present a semantic-based approach for personalised information retrieval. This approach is based on the design and exploitation of a user profile that represents the user and his interests. In this user profile, we combine an ontological semantics derived from the WordNet ontology and a personal semantics derived from the user's interactions with the search system and from the social and situational contexts of his previous searches. The personal semantics treats the co-occurrence relations between relevant components of the user profile as semantic links. The user profile is used to improve two important phases of the information search process: (i) expansion of the initial user query and (ii) adaptation of the search results to the user's interests.
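
The personal-semantics half of this approach (co-occurrence relations used as links for query expansion) can be illustrated with a small sketch. It is a hypothetical simplification, not the authors' model: it ignores the WordNet ontology and the social and situational contexts, treats a whole profile document as the co-occurrence window, and uses assumed parameters `min_count` and `per_term`. All function names are illustrative.

```python
import re
from collections import Counter
from itertools import combinations


def terms(text, stopwords=frozenset()):
    """Lowercased alphabetic tokens minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]


def cooccurrence_counts(documents, stopwords=frozenset()):
    """Number of profile documents in which each unordered term pair co-occurs."""
    counts = Counter()
    for doc in documents:
        # Sorting makes every pair key canonical: (a, b) with a < b.
        for a, b in combinations(sorted(set(terms(doc, stopwords))), 2):
            counts[(a, b)] += 1
    return counts


def expand_query(query, profile_documents, stopwords=frozenset(), per_term=1, min_count=2):
    """Append, for each query term, its strongest co-occurring profile terms."""
    counts = cooccurrence_counts(profile_documents, stopwords)
    expanded = terms(query, stopwords)
    for term in list(expanded):
        neighbours = Counter()
        for (a, b), c in counts.items():
            if c < min_count:
                continue  # ignore weak, possibly accidental co-occurrences
            if a == term:
                neighbours[b] = c
            elif b == term:
                neighbours[a] = c
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
        for other, _ in ranked[:per_term]:
            if other not in expanded:
                expanded.append(other)
    return expanded


# Hypothetical profile: documents this user previously judged relevant.
profile = [
    "python programming tutorial",
    "learning python programming",
    "python snake species",
]
print(expand_query("python", profile))  # → ['python', 'programming']
```

Because this user's history links "python" to programming twice and to the snake only once, the expansion leans toward the programming sense, which is the kind of personalisation the profile is meant to provide.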

Melanocytic Lesions of the Conjunctiva

Archives of Pathology & Laboratory Medicine ◽

10.5858/2009-0522-rar.1 ◽

2010 ◽

Vol 134 (12) ◽

pp. 1785-1792 ◽

Cited By ~ 5

Author(s):

Artur Zembowicz ◽

Rajni V. Mandal ◽

Pitipol Choopong

Keyword(s):

Personal Experience ◽

State Of The Art ◽

Data Sources ◽

Clinical Aspects ◽

Review Of The Literature ◽

Melanocytic Lesions ◽

Pathologic Features

Context: Melanocytic proliferations are among the most common neoplasms of the conjunctiva. They often represent challenging lesions for pathologists unfamiliar with the unique histologic features of melanocytic proliferations in this location and with the nomenclature used by ophthalmologists. Objective: To comprehensively review the clinical aspects, pathologic features, and management of melanocytic proliferations of the conjunctiva. Data Sources: Review of the literature and the personal experience of the authors. Conclusions: The classification, state of the art, and practical aspects of the pathology of melanocytic proliferations of the conjunctiva are discussed.
