TEMPORAL CONDENSATION OF TAMIL NEWS

2021 ◽  
Vol 06 (07) ◽  
Author(s):  
Shreenidhi S

Since the dawn of the Internet, we have been inundated with an excess of information, and the volume available online is expected to keep growing exponentially. This creates a need for summarization, making it one of the most sought-after topics in natural language processing. It is essential to be informed about vital happenings, and newspapers have served this purpose for a very long time. Sadly, there is a perception among the general public that no news agency today can be unequivocally trusted, so the credibility of news articles is uncertain. Therefore, one has to read news articles from various sources to get an unbiased view on a topic. When a query related to an event is entered in a search engine such as Google, the search returns an overwhelming number of responses, and it is humanly impossible to read all of them. To address these problems, a condensation of news articles covering the Tamil Nadu Legislative Assembly election is performed. The articles were collected from various news sources over a period of two months and translated from Tamil to English. Because the collection covered many different events, k-means clustering was performed on the dataset to segregate the Tamil Nadu election news. The relevant articles were then pre-processed to remove ambiguity and translation mistakes. Each article was summarized individually using a linear regression model that weighted features such as named entities and the number of words shared with the title. The individual summaries were then summarized together using a BERT extractive summarizer to reduce redundancy. When the generated summary was compared with the introduction of each article, or with the title when no introduction was available, a precision of 0.512, a recall of 0.25, and an F-measure of 0.31 were obtained.
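A minimal sketch of the topic-segregation step described above, using scikit-learn; the article strings, cluster count, and election keyword list are illustrative placeholders, not the authors' code.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy stand-ins for the translated news articles
articles = [
    "The assembly election campaign intensified across Tamil Nadu today.",
    "The cricket team sealed the series with a decisive victory.",
]

vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(articles)

# Cluster the articles, then keep the cluster whose top centroid terms
# overlap an election-related vocabulary.
kmeans = KMeans(n_clusters=min(5, len(articles)), random_state=0, n_init=10).fit(X)

ELECTION_TERMS = {"election", "assembly", "constituency", "candidate"}
vocab = vectorizer.get_feature_names_out()
for cid, centroid in enumerate(kmeans.cluster_centers_):
    top_terms = {vocab[i] for i in centroid.argsort()[-20:]}
    if top_terms & ELECTION_TERMS:
        relevant = [a for a, lbl in zip(articles, kmeans.labels_) if lbl == cid]

The per-article summaries could then be passed to an extractive summarizer (for example, the open-source bert-extractive-summarizer package) to remove redundancy across articles, as the abstract describes.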

Designs ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 42 ◽
Author(s):  
Eric Lazarski ◽  
Mahmood Al-Khassaweneh ◽  
Cynthia Howard

In recent years, disinformation and “fake news” have been spreading throughout the internet at rates never seen before. This has led fact-checking organizations, groups that seek out claims and comment on their veracity, to spring up worldwide in an attempt to stem the tide of misinformation. However, even with the many human-powered fact-checking organizations currently in operation, disinformation continues to run rampant throughout the Web, and the existing organizations are unable to keep up. This paper discusses in detail recent advances in computer science that use natural language processing to automate fact checking. It follows the entire automated fact-checking process, from detecting claims, to checking them, to outputting results. In summary, automated fact checking works well in some cases, though generalized fact checking still needs improvement before widespread use.
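As a rough illustration of the first pipeline stage, claim detection, a sentence classifier can be trained to flag check-worthy factual claims; the tiny labeled set and the model choice below are assumptions for demonstration, not the paper's setup.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "The unemployment rate fell to 3.5 percent last year.",   # check-worthy
    "What a wonderful day it is!",                            # not a claim
    "The city spent 2 million dollars on the project.",       # check-worthy
    "I really enjoyed the concert.",                          # not a claim
]
labels = [1, 0, 1, 0]  # 1 = check-worthy factual claim

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, labels)

print(clf.predict(["Crime dropped by 10 percent in 2020."]))  # expect [1]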


Clinical parsing is useful in the medical domain. Clinical narratives are difficult to understand because they are in an unstructured format, and medical natural language processing systems are used to render these narratives in a readable form. A clinical parser combines natural language processing with a medical lexicon, and parsing is the technique used to make clinical narratives understandable. In this paper we discuss a constituency parser for clinical narratives that is based on phrase-structure grammar; it converts unstructured clinical sentences into structured reports. For each sentence, recall, precision, and bracketing F-measure are calculated.
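A minimal sketch of phrase-structure (constituency) parsing applied to a clinical-style sentence, using NLTK with a toy grammar; a real clinical parser would use a far larger grammar backed by a medical lexicon, as the paper describes.

import nltk

# Toy phrase-structure grammar for a clinical-style sentence
grammar = nltk.CFG.fromstring("""
S   -> NP VP
NP  -> Det N | N
VP  -> V NP
Det -> 'the'
N   -> 'patient' | 'nausea' | 'fever'
V   -> 'reports' | 'denies'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the patient reports nausea".split()):
    tree.pretty_print()  # prints the bracketed phrase structure

Bracketing precision, recall, and F-measure are then computed by comparing the constituent spans of each predicted tree against gold-standard trees.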


2021 ◽  
Author(s):  
Carolinne Roque e Faria ◽  
Cinthyan Renata Sachs Camerlengo de Barb

Technology is becoming remarkably popular among agribusiness producers and is advancing in every agricultural area. One of the difficulties in this context is handling natural language data to solve problems in the field of agriculture. To build up dialogs and support rich queries, the present work uses Natural Language Processing (NLP) techniques to develop an automatic and effective computer system that interacts with the user and assists in the identification of pests and diseases in soybean farming, with the knowledge stored in a database repository so that accurate diagnoses can simplify the work of agricultural professionals and of anyone who deals with large amounts of information in this area. Information on 108 pests and 19 diseases that damage Brazilian soybean was collected from Brazilian bibliographic manuals, with the purpose of organizing the data and improving production. The spaCy library was used for the syntactic analysis: it allowed pre-processing the texts, recognizing named entities, calculating similarity between words, and verifying dependency parses, and it also supported the development requirements of the CAROLINA tool (Robotized Agronomic Conversation in Natural Language), which uses the vocabulary of the agricultural domain.
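A small sketch of the spaCy operations mentioned above: pre-processing, named-entity recognition, similarity, and dependency parsing. The English model and the example sentence are stand-ins (the actual tool works on Portuguese agricultural texts), and the model must be downloaded beforehand.

import spacy

nlp = spacy.load("en_core_web_md")  # medium model ships with word vectors

doc = nlp("Soybean rust caused severe losses in Brazil last season.")

print([(ent.text, ent.label_) for ent in doc.ents])   # named entities
print([(t.text, t.dep_, t.head.text) for t in doc])   # dependency parse
print(nlp("rust").similarity(nlp("disease")))         # word-vector similarity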


Author(s):  
Fredrik Johansson ◽  
Lisa Kaati ◽  
Magnus Sahlgren

The ability to disseminate information instantaneously over vast geographical regions makes the Internet a key facilitator in the radicalisation process and preparations for terrorist attacks. This can be both an asset and a challenge for security agencies. One of the main challenges for security agencies is the sheer amount of information available on the Internet. It is impossible for human analysts to read through everything that is written online. In this chapter we will discuss the possibility of detecting violent extremism by identifying signs of warning behaviours in written text – what we call linguistic markers – using computers, or more specifically, natural language processing.
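A deliberately simple sketch of the linguistic-marker idea: counting occurrences of warning-behaviour phrases in a text. The marker list below is a hypothetical placeholder; the chapter's actual markers are not reproduced here.

import re
from collections import Counter

MARKERS = ["no other choice", "must act", "they will pay"]  # hypothetical examples

def marker_counts(text):
    text = text.lower()
    return Counter({m: len(re.findall(re.escape(m), text)) for m in MARKERS})

print(marker_counts("They will pay for this. We must act now."))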


10.29007/f4j4 ◽  
2018 ◽  
Author(s):  
Behnam Sabeti ◽  
Pedram Hosseini ◽  
Gholamreza Ghassem-Sani ◽  
Seyed Abolghasem Mirroshandel

Sentiment analysis refers to the use of natural language processing to identify and extract subjective information from textual resources. One approach for sentiment extraction is using a sentiment lexicon. A sentiment lexicon is a set of words associated with the sentiment orientation that they express. In this paper, we describe the process of generating a general purpose sentiment lexicon for Persian. A new graph-based method is introduced for seed selection and expansion based on an ontology. Sentiment lexicon generation is then mapped to a document classification problem. We used the K-nearest neighbors and nearest centroid methods for classification. These classifiers have been evaluated based on a set of hand labeled synsets. The final sentiment lexicon has been generated by the best classifier. The results show an acceptable performance in terms of accuracy and F-measure in the generated sentiment lexicon.
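A sketch of the final classification step with the two methods named above, using scikit-learn; the synset vectors here are synthetic toy data, whereas the real features come from the ontology-based graph expansion.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(20, 50))   # toy vectors for positive synsets
X_neg = rng.normal(loc=-1.0, size=(20, 50))  # toy vectors for negative synsets
X = np.vstack([X_pos, X_neg])
y = [1] * 20 + [0] * 20

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
nc = NearestCentroid().fit(X, y)

query = rng.normal(loc=1.0, size=(1, 50))     # an unseen synset vector
print(knn.predict(query), nc.predict(query))  # both should predict positive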


Author(s):  
Deniz Caliskan ◽  
Jakob Zierk ◽  
Detlef Kraska ◽  
Stefan Schulz ◽  
Philipp Daumke ◽  
...  

Introduction: The aim of this study is to evaluate the use of natural language processing (NLP) software to extract medication statements from unstructured medical discharge letters. Methods: Ten randomly selected discharge letters were extracted from the data warehouse of the University Hospital Erlangen (UHE) and manually annotated to create a gold standard. The AHD NLP tool, provided by MIRACUM’s industry partner, was used to annotate these discharge letters. Annotations by the NLP tool were then compared to the gold standard on two levels: phrase precision (whether the whole medication statement was identified correctly) and token precision (whether the medication name was identified correctly within correctly discovered medication phrases). Results: The NLP tool detected medication-related phrases with an overall F-measure of 0.852. The medication name was identified correctly with an overall F-measure of 0.936. Discussion: This proof-of-concept study is a first step towards an automated, scalable evaluation system for the NLP tool of MIRACUM’s industry partner, based on a gold standard. Medication phrases and names were correctly identified by the NLP system in most cases. Future effort needs to be put into extending and validating the gold standard.
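A sketch of the evaluation idea: comparing tool annotations against a gold standard and computing an F-measure over annotation spans. The (start, end) character offsets below are invented examples, not data from the study.

def f_measure(gold, predicted):
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold_phrases = {(10, 42), (60, 95), (120, 150)}   # gold medication statements
pred_phrases = {(10, 42), (60, 90), (120, 150)}   # NLP tool output

print(f"phrase F-measure: {f_measure(gold_phrases, pred_phrases):.3f}")

Token-level precision would then be computed the same way, but only within the phrases that were discovered correctly.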


2014 ◽  
Vol 22 (1) ◽  
pp. 132-142 ◽  
Author(s):  
Ching-Heng Lin ◽  
Nai-Yuan Wu ◽  
Wei-Shao Lai ◽  
Der-Ming Liou

Abstract Background and objective Electronic medical records with encoded entries should enhance the semantic interoperability of document exchange. However, it remains a challenge to encode the narrative concept and to transform the coded concepts into a standard entry-level document. This study aimed to use a novel approach for the generation of entry-level interoperable clinical documents. Methods Using HL7 clinical document architecture (CDA) as the example, we developed three pipelines to generate entry-level CDA documents. The first approach was a semi-automatic annotation pipeline (SAAP), the second was a natural language processing (NLP) pipeline, and the third merged the above two pipelines. We randomly selected 50 test documents from the i2b2 corpora to evaluate the performance of the three pipelines. Results The 50 randomly selected test documents contained 9365 words, including 588 Observation terms and 123 Procedure terms. For the Observation terms, the merged pipeline had a significantly higher F-measure than the NLP pipeline (0.89 vs 0.80, p<0.0001), but a similar F-measure to that of the SAAP (0.89 vs 0.87). For the Procedure terms, the F-measure was not significantly different among the three pipelines. Conclusions The combination of a semi-automatic annotation approach and the NLP application seems to be a solution for generating entry-level interoperable clinical documents.
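For orientation, the snippet below builds a heavily simplified fragment of the kind of entry-level coded observation such a pipeline emits; real CDA entries carry many more required elements and must validate against the HL7 schema, so this is an illustration only.

import xml.etree.ElementTree as ET

obs = ET.Element("observation", classCode="OBS", moodCode="EVN")
ET.SubElement(obs, "code",
              code="386661006",                     # SNOMED CT: fever (example)
              codeSystem="2.16.840.1.113883.6.96")  # SNOMED CT code system OID

print(ET.tostring(obs, encoding="unicode"))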


2018 ◽  
Vol 25 (1) ◽  
pp. 211-217 ◽  
Author(s):  
ROBERT DALE

Abstract The law has language at its heart, so it’s not surprising that software that operates on natural language has played a role in some areas of the legal profession for a long time. But the last few years have seen an increased interest in applying modern techniques to a wider range of problems, so I look here at how natural language processing is being used in the legal sector today.


Author(s):  
Mamoru Mimura ◽  
Ryo Ito

Abstract Executable files remain a popular means of compromising endpoint computers, and they are often obfuscated to evade anti-virus programs. Examining every suspicious file from the Internet with dynamic analysis takes too much time, so a fast filtering method is required. With the recent development of natural language processing (NLP) techniques, printable strings have become more effective for detecting malware, and the combination of printable strings and NLP techniques can serve as such a filter. In this paper, we apply NLP techniques to malware detection and show that printable strings with NLP techniques are effective for detecting malware in a practical environment. Our dataset consists of more than 500,000 samples obtained from multiple sources. Our experimental results demonstrate that our method is effective not only against subspecies of existing malware but also against new malware, and that it remains effective against packed malware and anti-debugging techniques.
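A sketch of the filtering idea: extract printable strings from a binary and treat them as a document for standard NLP feature extraction. The byte samples, labels, and classifier below are toy placeholders, not the paper's models or dataset.

import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

def printable_strings(data, min_len=4):
    # ASCII printable runs of at least min_len characters
    runs = re.findall(rb"[ -~]{%d,}" % min_len, data)
    return " ".join(r.decode("ascii") for r in runs)

# Toy stand-ins for executable file contents read from disk
samples = [b"\x00\x01GetProcAddress\x00kernel32.dll\xff", b"\x90\x90cmd.exe /c del\x00"]
docs = [printable_strings(s) for s in samples]
labels = [0, 1]  # 0 = benign, 1 = malicious (toy labels)

X = TfidfVectorizer(token_pattern=r"\S+").fit_transform(docs)
clf = RandomForestClassifier(random_state=0).fit(X, labels)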


As the Internet becomes part of our daily routine, online news reading has suddenly grown in popularity. Such news can become a major issue for the public and government bodies (especially politically) if it is fake; hence, authentication is necessary. It is essential to flag fake news before it goes viral and misleads society. In this paper, various natural language processing techniques, along with a number of classifiers, are used to assess news content for credibility. This technique can further be applied to tasks such as plagiarism checking and checking criminal records.
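A compact sketch of the approach: one shared TF-IDF representation fed to several classifiers so their credibility predictions can be compared. The four headlines and their labels are a toy stand-in for a real labeled corpus.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

headlines = [
    "Scientists confirm traces of water on the lunar surface",
    "Celebrity spotted riding a dragon over the city",
    "Parliament passes the annual budget bill",
    "Miracle pill cures all diseases overnight, doctors stunned",
]
labels = [0, 1, 0, 1]  # 0 = credible, 1 = fake (toy labels)

X = TfidfVectorizer(stop_words="english").fit_transform(headlines)

for clf in (MultinomialNB(), PassiveAggressiveClassifier(), LinearSVC()):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(X))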

