Applying NLP techniques to malware detection in a practical environment

Author(s):  
Mamoru Mimura ◽  
Ryo Ito

Abstract
Executable files remain a popular means of compromising endpoint computers. These executables are often obfuscated to evade anti-virus programs, and dynamic analysis of every suspicious file from the Internet takes too much time, so a fast filtering method is required. With recent advances in natural language processing (NLP), printable strings have become more effective features for detecting malware, and the combination of printable strings and NLP techniques can serve as such a filter. In this paper, we apply NLP techniques to malware detection and show that printable strings with NLP techniques are effective for detecting malware in a practical environment. Our dataset consists of more than 500,000 samples obtained from multiple sources. Experimental results demonstrate that our method is effective not only against subspecies of existing malware but also against new malware, and that it remains effective against packed malware and anti-debugging techniques.
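
A minimal sketch of this kind of string-based filter, assuming TF-IDF features and a random-forest classifier (the paper's actual feature extraction and model may differ); the file paths are placeholders:

```python
# Sketch: filter executables by classifying their printable strings.
# The string-extraction regex, TF-IDF features, and random-forest classifier
# are illustrative assumptions, not the exact pipeline used in the paper.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

def printable_strings(path, min_len=4):
    """Extract runs of printable ASCII characters from a binary file."""
    with open(path, "rb") as f:
        data = f.read()
    return b" ".join(re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)).decode("ascii")

# Placeholder paths and labels; in practice these come from a labeled corpus
# of benign (0) and malicious (1) executables.
train_paths = ["benign_sample.exe", "malicious_sample.exe"]
train_labels = [0, 1]

docs = [printable_strings(p) for p in train_paths]
vectorizer = TfidfVectorizer(max_features=10000)
X = vectorizer.fit_transform(docs)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X, train_labels)

# Score an unseen executable; high-scoring files are forwarded to dynamic analysis.
score = clf.predict_proba(vectorizer.transform([printable_strings("sample.exe")]))[0, 1]
print(score)
```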

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Lisa Grossman Liu ◽  
Raymond H. Grossman ◽  
Elliot G. Mitchell ◽  
Chunhua Weng ◽  
Karthik Natarajan ◽  
...  

Abstract
The recognition, disambiguation, and expansion of medical abbreviations and acronyms is of utmost importance for preventing medically dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories across multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses. Automated cross-mapping of synonymous records using state-of-the-art machine learning reduced redundancy, which simplifies future application. Additional features include semi-automated quality control to remove errors. The Meta-Inventory demonstrated high completeness and coverage of abbreviations and senses in new clinical text, a substantial improvement over the next largest repository (6–14% increase in abbreviation coverage; 28–52% increase in sense coverage). To our knowledge, the Meta-Inventory is the most complete compilation of medical abbreviations and acronyms in American English to date. Its multiple sources and high coverage support application in varied specialties and settings, enabling cross-institutional natural language processing, which previous inventories did not support. The Meta-Inventory is available at https://bit.ly/github-clinical-abbreviations.
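
As a rough illustration of how such an inventory might be consumed, the sketch below loads abbreviation-sense pairs from a CSV file and looks up candidate expansions; the file name and column names are assumptions rather than the repository's actual schema:

```python
# Sketch: look up candidate senses for an abbreviation in an inventory file.
# "meta_inventory.csv", "SF" (short form), and "LF" (long form) are assumed
# names; consult the repository's documentation for the real schema.
import csv
from collections import defaultdict

senses = defaultdict(list)
with open("meta_inventory.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        senses[row["SF"].lower()].append(row["LF"])

# Candidate expansions for an ambiguous abbreviation found in clinical text;
# a downstream disambiguation model would pick the correct sense from context.
print(senses["ra"])   # e.g. ['rheumatoid arthritis', 'right atrium', ...]
```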


Designs ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 42
Author(s):  
Eric Lazarski ◽  
Mahmood Al-Khassaweneh ◽  
Cynthia Howard

In recent years, disinformation and “fake news” have been spreading throughout the internet at rates never seen before. This has created the need for fact-checking organizations, groups that seek out claims and assess their veracity, which have sprung up worldwide to stem the tide of misinformation. However, even with the many human-powered fact-checking organizations currently in operation, disinformation continues to run rampant throughout the Web, and existing organizations are unable to keep up. This paper discusses in detail recent advances in using natural language processing to automate fact checking. It follows the entire automated fact-checking process, from detecting claims to checking them to outputting results. In summary, automated fact checking works well in some cases, though generalized fact checking still needs improvement before widespread use.
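
The claim-detection stage of such a pipeline can be illustrated with a small sketch; the toy training sentences and logistic-regression model are assumptions, not a system from the surveyed literature:

```python
# Sketch of the claim-detection stage of an automated fact-checking pipeline:
# classify sentences as check-worthy claims or not. The tiny inline training
# set and the logistic-regression model are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "Unemployment fell to 3.5 percent last year.",   # check-worthy claim
    "I think the weather is lovely today.",          # not a claim
    "The vaccine was tested on 40,000 volunteers.",  # check-worthy claim
    "Thanks everyone for coming to the event.",      # not a claim
]
labels = [1, 0, 1, 0]

claim_detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
claim_detector.fit(sentences, labels)

# Detected claims would then be passed on to evidence retrieval and verification.
print(claim_detector.predict(["The city spent 2 million dollars on the bridge."]))
```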


Author(s):  
Fredrik Johansson ◽  
Lisa Kaati ◽  
Magnus Sahlgren

The ability to disseminate information instantaneously over vast geographical regions makes the Internet a key facilitator in the radicalisation process and preparations for terrorist attacks. This can be both an asset and a challenge for security agencies. One of the main challenges for security agencies is the sheer amount of information available on the Internet. It is impossible for human analysts to read through everything that is written online. In this chapter we will discuss the possibility of detecting violent extremism by identifying signs of warning behaviours in written text – what we call linguistic markers – using computers, or more specifically, natural language processing.
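
A very rough sketch of marker-based flagging, with invented marker phrases standing in for the linguistic markers discussed in the chapter; real systems rely on far richer lexical, syntactic, and semantic features:

```python
# Sketch: flag texts that contain simple linguistic markers of warning
# behaviours. The marker categories and phrases below are invented placeholders.
markers = {
    "identification": ["we must", "our duty", "true believers"],
    "fixation": ["again and again", "cannot stop thinking"],
    "leakage": ["they will pay", "you will see what happens"],
}

def marker_hits(text):
    """Return which marker categories occur in a text."""
    text = text.lower()
    return {cat: [p for p in phrases if p in text] for cat, phrases in markers.items()}

post = "They will pay for what they did. It is our duty to act."
print({cat: hits for cat, hits in marker_hits(post).items() if hits})
```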


2021 ◽  
Vol 1 (2) ◽  
pp. 18-22
Author(s):  
Strahil Sokolov ◽  
Stanislava Georgieva

This paper presents a new approach to the processing and categorization of text from patient documents in the Bulgarian language using Natural Language Processing and Edge AI. The proposed algorithm contains several phases: personal data anonymization, pre-processing and conversion of text to vectors, model training, and recognition. The experimental results, in terms of achieved accuracy, are comparable with modern approaches.
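
A minimal sketch of such a pipeline, with assumed regular-expression anonymization rules and an assumed TF-IDF plus linear-SVM model rather than the authors' implementation:

```python
# Sketch of the described phases: anonymize personal data, convert text to
# vectors, and train a categorization model. Patterns, documents, and the
# classifier are illustrative assumptions.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

def anonymize(text):
    """Replace simple identifiers (ID numbers, dates) with placeholder tokens."""
    text = re.sub(r"\b\d{10}\b", "<ID>", text)                 # national ID number
    text = re.sub(r"\b\d{2}\.\d{2}\.\d{4}\b", "<DATE>", text)  # dd.mm.yyyy dates
    return text

documents = ["Пациентът е приет на 01.02.2021 с диагноза пневмония.",
             "Контролен преглед, без оплаквания."]
categories = ["admission", "follow-up"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit([anonymize(d) for d in documents], categories)
print(model.predict([anonymize("Пациентът е приет с висока температура.")]))
```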


Author(s):  
Rafael Jiménez ◽  
Vicente García ◽  
Abraham López ◽  
Alejandra Mendoza Carreón ◽  
Alan Ponce

The Autonomous University of Ciudad Juárez (UACJ) performs an instructor evaluation each semester to find strengths, weaknesses, and areas of opportunity in the teaching process. In this chapter, the authors show how opinion mining can be useful for labeling student comments as positive or negative. For this purpose, a database was created using real opinions obtained from five professors of the UACJ over the last four years, covering a total of 20 subjects. Natural language processing techniques were used on the database to normalize its data. Experimental results using 1-NN and Bagging classifiers show that it is possible to automatically label positive and negative comments with an accuracy of 80.13%.
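
A small sketch of this labeling setup with the two classifiers named in the abstract (1-NN and Bagging); the example comments are invented placeholders, not the UACJ dataset:

```python
# Sketch: label student comments as positive or negative with 1-NN and Bagging.
# The Spanish comments below are invented placeholders for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.pipeline import make_pipeline

comments = ["Excelente profesor, explica muy bien.",
            "No responde las dudas de los alumnos.",
            "Las clases son claras y ordenadas.",
            "Llega tarde y no sigue el programa."]
labels = ["positive", "negative", "positive", "negative"]

for name, clf in [("1-NN", KNeighborsClassifier(n_neighbors=1)),
                  ("Bagging", BaggingClassifier(n_estimators=10))]:
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(comments, labels)
    print(name, model.predict(["Muy buen maestro, siempre puntual."]))
```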


2021 ◽  
pp. 1-13
Author(s):  
Deguang Chen ◽  
Ziping Ma ◽  
Lin Wei ◽  
Yanbin Zhu ◽  
Jinlin Ma ◽  
...  

Text-based reading comprehension models have great research significance and market value and are one of the main directions of natural language processing. Reading comprehension models that produce single-span answers have recently attracted more attention and achieved significant results. In contrast, multi-span answer models for reading comprehension have been less investigated and their performance needs improvement. To address this issue, in this paper we propose a text-based multi-span network for reading comprehension, ALBERT_SBoundary, and build a multi-span answer corpus, MultiSpan_NMU. We also conduct extensive experiments on the public multi-span corpus, MultiSpan_DROP, and on our corpus, MultiSpan_NMU, and compare the proposed method with the state of the art. The experimental results show that our method achieves F1 scores of 84.10 and 92.88 on the MultiSpan_DROP and MultiSpan_NMU datasets, respectively, while also having fewer parameters and a shorter training time.
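
As a rough illustration of the multi-span setting, the sketch below decodes several non-overlapping answer spans from per-token start and end scores; the greedy decoding heuristic is an assumption for illustration, not the ALBERT_SBoundary decoder:

```python
# Sketch: decode multiple answer spans from per-token boundary scores.
# A boundary model (e.g., an ALBERT encoder with start/end heads) would
# produce these scores; the greedy non-overlapping decoding below is an
# illustrative assumption, not the authors' exact algorithm.
def decode_spans(start_scores, end_scores, threshold=0.5, max_len=10):
    """Greedily pick non-overlapping (start, end) spans scoring above a threshold."""
    candidates = []
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = (s_score + end_scores[e]) / 2
            if score >= threshold:
                candidates.append((score, s, e))
    candidates.sort(reverse=True)
    chosen, used = [], set()
    for score, s, e in candidates:
        if not any(i in used for i in range(s, e + 1)):
            chosen.append((s, e))
            used.update(range(s, e + 1))
    return sorted(chosen)

# Token positions 2-3 and 7 stand out, so two answer spans are returned.
start = [0.1, 0.1, 0.9, 0.2, 0.1, 0.1, 0.1, 0.8, 0.1]
end   = [0.1, 0.1, 0.2, 0.9, 0.1, 0.1, 0.1, 0.8, 0.1]
print(decode_spans(start, end))   # [(2, 3), (7, 7)]
```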


Author(s):  
Jalal S. Alowibdi ◽  
Abdulrahman A. Alshdadi ◽  
Ali Daud ◽  
Mohamed M. Dessouky ◽  
Essa Ali Alhazmi

People are afraid of COVID-19 and are actively talking about it on social media platforms such as Twitter, openly expressing their emotions in their tweets. It is very important to perform sentiment analysis on these tweets to find COVID-19's impact on people's lives. Natural language processing, text processing, computational linguistics, and biometrics are applied to perform sentiment analysis and to identify and extract the emotions. In this work, sentiment analysis is carried out on a large Twitter dataset of English tweets, and ten emotional themes are investigated. Experimental results show that COVID-19 has spread fear/anxiety, gratitude, happiness and hope, and other mixed emotions among people for different reasons. Specifically, it is observed that positive news from top officials, such as Trump presenting chloroquine as a cure for COVID-19, suddenly lowered fear in the sentiment, while happiness, gratitude, and hope began to rise. However, once the FDA stated that chloroquine is not an effective cure, fear started to rise again.
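
A minimal sketch of theme-based emotion tagging with keyword lexicons; the theme names and keywords are invented placeholders, and the study's actual method may differ:

```python
# Sketch: tag tweets with emotional themes using small keyword lexicons.
# Theme names and keyword lists are invented placeholders for illustration.
themes = {
    "fear/anxiety": {"afraid", "scared", "anxious", "worried"},
    "gratitude":    {"thank", "grateful", "appreciate"},
    "hope":         {"hope", "hopeful", "optimistic"},
}

def tag_themes(tweet):
    """Return the emotional themes whose keywords appear in a tweet."""
    words = [w.strip(".,!?") for w in tweet.lower().split()]
    return [t for t, kw in themes.items() if any(w in kw for w in words)]

tweets = ["So worried about my parents during this pandemic.",
          "Thank you to all the nurses and doctors, truly grateful!",
          "I hope we get through this together."]
for t in tweets:
    print(tag_themes(t), "-", t)
```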


2016 ◽  
Vol 25 (01) ◽  
pp. 234-239 ◽  
Author(s):  
P. Zweigenbaum ◽  
A. Névéol ◽  

Summary
Objective: To summarize recent research and present a selection of the best papers published in 2015 in the field of clinical Natural Language Processing (NLP).
Method: A systematic review of the literature was performed by the two section editors of the IMIA Yearbook NLP section by searching bibliographic databases with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. The section editors first selected a shortlist of candidate best papers that were then peer-reviewed by independent external reviewers.
Results: The clinical NLP best paper selection shows that clinical NLP is making use of a variety of texts of clinical interest to contribute to the analysis of clinical information and the building of a body of clinical knowledge. The full review process highlighted five papers analyzing patient-authored texts or seeking to connect and aggregate multiple sources of information. They contribute to the development of methods, resources, applications, and sometimes a combination of these aspects.
Conclusions: The field of clinical NLP continues to thrive through the contributions of both NLP researchers and healthcare professionals interested in applying NLP techniques to impact clinical practice. Foundational progress in the field makes it possible to leverage a larger variety of texts of clinical interest for healthcare purposes.


2013 ◽  
Vol 21 (1) ◽  
pp. 113-138 ◽  
Author(s):  
MUHUA ZHU ◽  
JINGBO ZHU ◽  
HUIZHEN WANG

Abstract
Shift-reduce parsing has been studied extensively for diverse grammars due to its simplicity and running efficiency. However, in the field of constituency parsing, shift-reduce parsers lag behind state-of-the-art parsers. In this paper we propose a semi-supervised approach for advancing shift-reduce constituency parsing. First, we apply the uptraining approach (Petrov, S. et al. 2010. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, USA, pp. 705–713) to improve part-of-speech taggers so that they provide better part-of-speech tags to subsequent shift-reduce parsers. Second, we enhance shift-reduce parsing models with novel features defined on lexical dependency information. Both stages depend on the use of large-scale unlabeled data. Experimental results show that the approach achieves overall improvements of 1.5 percent and 2.1 percent on English and Chinese data, respectively. Moreover, the final parsing accuracies reach 90.9 percent and 82.2 percent, respectively, which are comparable with the accuracy of state-of-the-art parsers.
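
The transition system underlying shift-reduce constituency parsing can be sketched briefly; here a fixed gold action sequence stands in for the learned action classifier that the paper's semi-supervised approach improves:

```python
# Sketch of the shift-reduce transition system for constituency parsing:
# a buffer of words, a stack of partial trees, and SHIFT / UNARY / REDUCE
# actions. In a real parser a trained model scores and selects each action;
# the fixed action sequence below is for illustration only.
def shift_reduce(words, actions):
    """Execute a sequence of parser actions and return the resulting tree."""
    buffer = list(words)
    stack = []
    for act in actions:
        if act == "SHIFT":                      # move the next word onto the stack
            stack.append(buffer.pop(0))
        elif act.startswith("UNARY-"):          # e.g. UNARY-NP: wrap the top item
            stack.append((act[6:], [stack.pop()]))
        elif act.startswith("REDUCE-"):         # e.g. REDUCE-VP: combine the top two items
            right, left = stack.pop(), stack.pop()
            stack.append((act[7:], [left, right]))
    return stack[0]

words = ["She", "reads", "books"]
gold = ["SHIFT", "UNARY-NP", "SHIFT", "SHIFT", "UNARY-NP", "REDUCE-VP", "REDUCE-S"]
print(shift_reduce(words, gold))
# ('S', [('NP', ['She']), ('VP', ['reads', ('NP', ['books'])])])
```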

