TEMPORAL CONDENSATION OF TAMIL NEWS

2021 ◽  
Vol 06 (07) ◽  
Author(s):  
Shreenidhi S

Since the dawn of the Internet, we have been inundated with an excess of information, and the volume available online is expected to keep growing exponentially. This creates a need for summarization, making it one of the most sought-after topics in natural language processing. It is essential to be informed about vital happenings, and newspapers have served this purpose for a very long time. Sadly, there is a perception among the general public that no news agency today can be unequivocally trusted, so the credibility of news articles is uncertain. Therefore, one has to read news articles from various sources to get an unbiased view on a topic. When a query related to an event is entered in a search engine such as Google, the search returns an overwhelming number of responses, and it is humanly impossible to read all of them. To address these problems, a condensation of news articles covering the Tamil Nadu Legislative Assembly election is performed. The articles were collected from various news sources over a period of two months and translated from Tamil to English. Because the collection covered many different events, k-means clustering was performed on the dataset to segregate the Tamil Nadu election news. The relevant articles were then pre-processed to remove ambiguity and translation mistakes. Each article was summarized individually using a linear regression model that weighted features such as named entities and the number of words shared with the title. The individual summaries were then summarized together using a BERT extractive summarizer to reduce redundancy. When the generated summary was compared with the introduction of each article, or with the title when no introduction was available, a precision of 0.512, a recall of 0.25, and an F-measure of 0.31 were obtained.
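A minimal sketch of the topic-segregation step described above, using scikit-learn; the article strings, cluster count, and election keyword list are illustrative placeholders, not the authors' code.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy stand-ins for the translated news articles
articles = [
    "The assembly election campaign intensified across Tamil Nadu today.",
    "The cricket team sealed the series with a decisive victory.",
]

vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(articles)

# Cluster the articles, then keep the cluster whose top centroid terms
# overlap an election-related vocabulary.
kmeans = KMeans(n_clusters=min(5, len(articles)), random_state=0, n_init=10).fit(X)

ELECTION_TERMS = {"election", "assembly", "constituency", "candidate"}
vocab = vectorizer.get_feature_names_out()
for cid, centroid in enumerate(kmeans.cluster_centers_):
    top_terms = {vocab[i] for i in centroid.argsort()[-20:]}
    if top_terms & ELECTION_TERMS:
        relevant = [a for a, lbl in zip(articles, kmeans.labels_) if lbl == cid]

The per-article summaries could then be passed to an extractive summarizer (for example, the open-source bert-extractive-summarizer package) to remove redundancy across articles, as the abstract describes.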

Designs ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 42 ◽
Author(s):  
Eric Lazarski ◽  
Mahmood Al-Khassaweneh ◽  
Cynthia Howard

In recent years, disinformation and “fake news” have been spreading throughout the internet at rates never seen before. This has led fact-checking organizations, groups that seek out claims and comment on their veracity, to spring up worldwide in an attempt to stem the tide of misinformation. However, even with the many human-powered fact-checking organizations currently in operation, disinformation continues to run rampant throughout the Web, and the existing organizations are unable to keep up. This paper discusses in detail recent advances in computer science that use natural language processing to automate fact checking. It follows the entire automated fact-checking process, from detecting claims, to checking them, to outputting results. In summary, automated fact checking works well in some cases, though generalized fact checking still needs improvement before widespread use.
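As a rough illustration of the first pipeline stage, claim detection, a sentence classifier can be trained to flag check-worthy factual claims; the tiny labeled set and the model choice below are assumptions for demonstration, not the paper's setup.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "The unemployment rate fell to 3.5 percent last year.",   # check-worthy
    "What a wonderful day it is!",                            # not a claim
    "The city spent 2 million dollars on the project.",       # check-worthy
    "I really enjoyed the concert.",                          # not a claim
]
labels = [1, 0, 1, 0]  # 1 = check-worthy factual claim

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, labels)

print(clf.predict(["Crime dropped by 10 percent in 2020."]))  # expect [1]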


Clinical parsing is useful in the medical domain. Clinical narratives are difficult to understand because they are in an unstructured format, and medical natural language processing systems are used to render these narratives in a readable form. A clinical parser combines natural language processing with a medical lexicon, and parsing is the technique used to make clinical narratives understandable. In this paper we discuss a constituency parser for clinical narratives that is based on phrase-structure grammar; it converts unstructured clinical sentences into structured reports. For each sentence, recall, precision, and bracketing F-measure are calculated.
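A minimal sketch of phrase-structure (constituency) parsing applied to a clinical-style sentence, using NLTK with a toy grammar; a real clinical parser would use a far larger grammar backed by a medical lexicon, as the paper describes.

import nltk

# Toy phrase-structure grammar for a clinical-style sentence
grammar = nltk.CFG.fromstring("""
S   -> NP VP
NP  -> Det N | N
VP  -> V NP
Det -> 'the'
N   -> 'patient' | 'nausea' | 'fever'
V   -> 'reports' | 'denies'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the patient reports nausea".split()):
    tree.pretty_print()  # prints the bracketed phrase structure

Bracketing precision, recall, and F-measure are then computed by comparing the constituent spans of each predicted tree against gold-standard trees.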


2021 ◽  
Author(s):  
Carolinne Roque e Faria ◽  
Cinthyan Renata Sachs Camerlengo de Barb

Technology is becoming remarkably popular among agribusiness producers and is advancing in every agricultural area. One of the difficulties in this context is handling natural language data to solve problems in the field of agriculture. To build up dialogs and support rich queries, the present work uses Natural Language Processing (NLP) techniques to develop an automatic and effective computer system that interacts with the user and assists in the identification of pests and diseases in soybean farming, with the knowledge stored in a database repository so that accurate diagnoses can simplify the work of agricultural professionals and of anyone who deals with large amounts of information in this area. Information on 108 pests and 19 diseases that damage Brazilian soybean was collected from Brazilian bibliographic manuals, with the purpose of organizing the data and improving production. The spaCy library was used for the syntactic analysis: it allowed pre-processing the texts, recognizing named entities, calculating similarity between words, and verifying dependency parses, and it also supported the development requirements of the CAROLINA tool (Robotized Agronomic Conversation in Natural Language), which uses the vocabulary of the agricultural domain.
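A small sketch of the spaCy operations mentioned above: pre-processing, named-entity recognition, similarity, and dependency parsing. The English model and the example sentence are stand-ins (the actual tool works on Portuguese agricultural texts), and the model must be downloaded beforehand.

import spacy

nlp = spacy.load("en_core_web_md")  # medium model ships with word vectors

doc = nlp("Soybean rust caused severe losses in Brazil last season.")

print([(ent.text, ent.label_) for ent in doc.ents])   # named entities
print([(t.text, t.dep_, t.head.text) for t in doc])   # dependency parse
print(nlp("rust").similarity(nlp("disease")))         # word-vector similarity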


Author(s):  
Fredrik Johansson ◽  
Lisa Kaati ◽  
Magnus Sahlgren

The ability to disseminate information instantaneously over vast geographical regions makes the Internet a key facilitator in the radicalisation process and preparations for terrorist attacks. This can be both an asset and a challenge for security agencies. One of the main challenges for security agencies is the sheer amount of information available on the Internet. It is impossible for human analysts to read through everything that is written online. In this chapter we will discuss the possibility of detecting violent extremism by identifying signs of warning behaviours in written text – what we call linguistic markers – using computers, or more specifically, natural language processing.
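A deliberately simple sketch of the linguistic-marker idea: counting occurrences of warning-behaviour phrases in a text. The marker list below is a hypothetical placeholder; the chapter's actual markers are not reproduced here.

import re
from collections import Counter

MARKERS = ["no other choice", "must act", "they will pay"]  # hypothetical examples

def marker_counts(text):
    text = text.lower()
    return Counter({m: len(re.findall(re.escape(m), text)) for m in MARKERS})

print(marker_counts("They will pay for this. We must act now."))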


10.29007/f4j4 ◽  
2018 ◽  
Author(s):  
Behnam Sabeti ◽  
Pedram Hosseini ◽  
Gholamreza Ghassem-Sani ◽  
Seyed Abolghasem Mirroshandel

Sentiment analysis refers to the use of natural language processing to identify and extract subjective information from textual resources. One approach for sentiment extraction is using a sentiment lexicon. A sentiment lexicon is a set of words associated with the sentiment orientation that they express. In this paper, we describe the process of generating a general purpose sentiment lexicon for Persian. A new graph-based method is introduced for seed selection and expansion based on an ontology. Sentiment lexicon generation is then mapped to a document classification problem. We used the K-nearest neighbors and nearest centroid methods for classification. These classifiers have been evaluated based on a set of hand labeled synsets. The final sentiment lexicon has been generated by the best classifier. The results show an acceptable performance in terms of accuracy and F-measure in the generated sentiment lexicon.
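A sketch of the final classification step with the two methods named above, using scikit-learn; the synset vectors here are synthetic toy data, whereas the real features come from the ontology-based graph expansion.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(20, 50))   # toy vectors for positive synsets
X_neg = rng.normal(loc=-1.0, size=(20, 50))  # toy vectors for negative synsets
X = np.vstack([X_pos, X_neg])
y = [1] * 20 + [0] * 20

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
nc = NearestCentroid().fit(X, y)

query = rng.normal(loc=1.0, size=(1, 50))     # an unseen synset vector
print(knn.predict(query), nc.predict(query))  # both should predict positive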


Author(s):  
Deniz Caliskan ◽  
Jakob Zierk ◽  
Detlef Kraska ◽  
Stefan Schulz ◽  
Philipp Daumke ◽  
...  

Introduction: The aim of this study is to evaluate the use of natural language processing (NLP) software to extract medication statements from unstructured medical discharge letters. Methods: Ten randomly selected discharge letters were extracted from the data warehouse of the University Hospital Erlangen (UHE) and manually annotated to create a gold standard. The AHD NLP tool, provided by MIRACUM’s industry partner, was used to annotate these discharge letters. Annotations by the NLP tool were then compared to the gold standard on two levels: phrase precision (whether the whole medication statement was identified correctly) and token precision (whether the medication name was identified correctly within correctly discovered medication phrases). Results: The NLP tool detected medication-related phrases with an overall F-measure of 0.852. The medication name was identified correctly with an overall F-measure of 0.936. Discussion: This proof-of-concept study is a first step towards an automated, scalable evaluation system for the NLP tool of MIRACUM’s industry partner, based on a gold standard. Medication phrases and names were correctly identified by the NLP system in most cases. Future effort needs to be put into extending and validating the gold standard.
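A sketch of the evaluation idea: comparing tool annotations against a gold standard and computing an F-measure over annotation spans. The (start, end) character offsets below are invented examples, not data from the study.

def f_measure(gold, predicted):
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold_phrases = {(10, 42), (60, 95), (120, 150)}   # gold medication statements
pred_phrases = {(10, 42), (60, 90), (120, 150)}   # NLP tool output

print(f"phrase F-measure: {f_measure(gold_phrases, pred_phrases):.3f}")

Token-level precision would then be computed the same way, but only within the phrases that were discovered correctly.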


2014 ◽  
Vol 22 (1) ◽  
pp. 132-142 ◽  
Author(s):  
Ching-Heng Lin ◽  
Nai-Yuan Wu ◽  
Wei-Shao Lai ◽  
Der-Ming Liou

Abstract Background and objective Electronic medical records with encoded entries should enhance the semantic interoperability of document exchange. However, it remains a challenge to encode the narrative concept and to transform the coded concepts into a standard entry-level document. This study aimed to use a novel approach for the generation of entry-level interoperable clinical documents. Methods Using HL7 clinical document architecture (CDA) as the example, we developed three pipelines to generate entry-level CDA documents. The first approach was a semi-automatic annotation pipeline (SAAP), the second was a natural language processing (NLP) pipeline, and the third merged the above two pipelines. We randomly selected 50 test documents from the i2b2 corpora to evaluate the performance of the three pipelines. Results The 50 randomly selected test documents contained 9365 words, including 588 Observation terms and 123 Procedure terms. For the Observation terms, the merged pipeline had a significantly higher F-measure than the NLP pipeline (0.89 vs 0.80, p<0.0001), but a similar F-measure to that of the SAAP (0.89 vs 0.87). For the Procedure terms, the F-measure was not significantly different among the three pipelines. Conclusions The combination of a semi-automatic annotation approach and the NLP application seems to be a solution for generating entry-level interoperable clinical documents.
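For orientation, the snippet below builds a heavily simplified fragment of the kind of entry-level coded observation such a pipeline emits; real CDA entries carry many more required elements and must validate against the HL7 schema, so this is an illustration only.

import xml.etree.ElementTree as ET

obs = ET.Element("observation", classCode="OBS", moodCode="EVN")
ET.SubElement(obs, "code",
              code="386661006",                     # SNOMED CT: fever (example)
              codeSystem="2.16.840.1.113883.6.96")  # SNOMED CT code system OID

print(ET.tostring(obs, encoding="unicode"))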


2018 ◽  
Vol 25 (1) ◽  
pp. 211-217 ◽  
Author(s):  
ROBERT DALE

Abstract The law has language at its heart, so it’s not surprising that software that operates on natural language has played a role in some areas of the legal profession for a long time. But the last few years have seen an increased interest in applying modern techniques to a wider range of problems, so I look here at how natural language processing is being used in the legal sector today.


Author(s):  
Mamoru Mimura ◽  
Ryo Ito

Abstract Executable files remain a popular means of compromising endpoint computers, and they are often obfuscated to evade anti-virus programs. Examining every suspicious file from the Internet with dynamic analysis takes too much time, so a fast filtering method is required. With the recent development of natural language processing (NLP) techniques, printable strings have become more effective for detecting malware, and the combination of printable strings and NLP techniques can serve as such a filter. In this paper, we apply NLP techniques to malware detection and show that printable strings with NLP techniques are effective for detecting malware in a practical environment. Our dataset consists of more than 500,000 samples obtained from multiple sources. Our experimental results demonstrate that our method is effective not only against subspecies of existing malware but also against new malware, and that it remains effective against packed malware and anti-debugging techniques.
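A sketch of the filtering idea: extract printable strings from a binary and treat them as a document for standard NLP feature extraction. The byte samples, labels, and classifier below are toy placeholders, not the paper's models or dataset.

import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

def printable_strings(data, min_len=4):
    # ASCII printable runs of at least min_len characters
    runs = re.findall(rb"[ -~]{%d,}" % min_len, data)
    return " ".join(r.decode("ascii") for r in runs)

# Toy stand-ins for executable file contents read from disk
samples = [b"\x00\x01GetProcAddress\x00kernel32.dll\xff", b"\x90\x90cmd.exe /c del\x00"]
docs = [printable_strings(s) for s in samples]
labels = [0, 1]  # 0 = benign, 1 = malicious (toy labels)

X = TfidfVectorizer(token_pattern=r"\S+").fit_transform(docs)
clf = RandomForestClassifier(random_state=0).fit(X, labels)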


As the Internet becomes part of our daily routine, online news reading has suddenly grown in popularity. Such news can become a major issue for the public and government bodies (especially politically) if it is fake; hence, authentication is necessary. It is essential to flag fake news before it goes viral and misleads society. In this paper, various natural language processing techniques, along with a number of classifiers, are used to assess news content for credibility. This technique can further be applied to tasks such as plagiarism checking and checking criminal records.
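A compact sketch of the approach: one shared TF-IDF representation fed to several classifiers so their credibility predictions can be compared. The four headlines and their labels are a toy stand-in for a real labeled corpus.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

headlines = [
    "Scientists confirm traces of water on the lunar surface",
    "Celebrity spotted riding a dragon over the city",
    "Parliament passes the annual budget bill",
    "Miracle pill cures all diseases overnight, doctors stunned",
]
labels = [0, 1, 0, 1]  # 0 = credible, 1 = fake (toy labels)

X = TfidfVectorizer(stop_words="english").fit_transform(headlines)

for clf in (MultinomialNB(), PassiveAggressiveClassifier(), LinearSVC()):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(X))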

