language processing
Recently Published Documents





2022 ◽  
Vol 9 (3) ◽  
pp. 0-0

This paper presents work on recommending healthcare-related journal papers by understanding the semantics of terms in the papers users have read in the past. In other words, user profiles reflecting user interests within the healthcare domain are constructed from the kinds of journal papers read by each user. Multiple profiles are constructed per user, one for each category of papers read. The proposed approach goes down to the granular level of extrinsic and intrinsic relationships between terms and clusters highly semantically related domain terms, where each cluster represents a user interest area. The semantic analysis of terms starts with co-occurrence analysis to extract intra-couplings between terms; inter-couplings are then derived from the intra-couplings, and finally clusters of highly related terms are formed. Experiments showed improved precision for the proposed approach compared to the state-of-the-art technique, with a mean reciprocal rank of 0.76.
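As a hedged illustration of the coupling idea only (not the paper's actual algorithm), the sketch below counts intra-couplings as document-level co-occurrences and derives an inter-coupling between two terms from the cosine similarity of their co-occurrence profiles; the term lists are invented examples.

```python
from collections import defaultdict
from itertools import combinations
import math

def intra_couplings(docs):
    """Count how often each (sorted) pair of terms co-occurs in a document."""
    counts = defaultdict(int)
    for doc in docs:
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return counts

def inter_coupling(term_x, term_y, counts, vocab):
    """Cosine similarity of two terms' co-occurrence profiles: terms that
    co-occur with similar neighbours are related even if they rarely
    co-occur directly."""
    context = [v for v in vocab if v not in (term_x, term_y)]
    def profile(t):
        return [counts.get(tuple(sorted((t, v))), 0) for v in context]
    px, py = profile(term_x), profile(term_y)
    norm = math.sqrt(sum(a * a for a in px)) * math.sqrt(sum(b * b for b in py))
    return sum(a * b for a, b in zip(px, py)) / norm if norm else 0.0

docs = [
    ["heart", "disease", "risk"],
    ["heart", "attack", "risk"],
    ["disease", "attack", "cancer"],
]
counts = intra_couplings(docs)
vocab = sorted({t for d in docs for t in d})
print(inter_coupling("heart", "disease", counts, vocab))
```

Terms whose inter-coupling exceeds a threshold would then be grouped into a cluster representing one user interest area.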

2022 ◽  
Vol 30 (6) ◽  
pp. 1-21
Lei Li ◽  
Shaojun Ma ◽  
Runqi Wang ◽  
Yiping Wang ◽  
Yilin Zheng

Abundant natural resources are the basis of urbanisation and industrialisation. Citizens are the key factor in promoting a sustainable supply of natural resources and the high-quality development of urban areas. This study focuses on the co-production behaviours of citizens regarding urban natural resource assets in the age of big data, and uses the latent Dirichlet allocation algorithm and stepwise regression analysis to evaluate citizens’ experiences and feelings related to the urban capitalisation of natural resources. Results show that, firstly, a machine learning algorithm based on natural language processing can effectively identify and process demands on urban natural resource assets. Secondly, in experiencing urban natural resources, citizens pay more attention to the combination of history, culture, infrastructure and natural landscape; unique natural resources can enhance citizens’ sense of participation. Finally, scenery, entertainment, and the quality and value of urban natural resources are the factors influencing citizens’ satisfaction.
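A minimal sketch of the kind of pipeline the study describes, assuming scikit-learn: latent topics are extracted from citizen comments with LDA, and the per-comment topic proportions could then feed a stepwise regression against a satisfaction score. The comments and topic count below are illustrative, not the study's data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# hypothetical citizen comments about urban natural resources
comments = [
    "beautiful lake park with historic temple",
    "crowded park poor facilities and litter",
    "historic canal walk peaceful scenery",
    "litter and broken facilities near the lake",
]

vec = CountVectorizer()
X = vec.fit_transform(comments)          # term-frequency matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)             # per-comment topic proportions

# each row of theta sums to 1; each column is a candidate regressor
# for a satisfaction model, as in the study's stepwise regression step
print(theta.shape)
```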

Cognition ◽  
2022 ◽  
Vol 221 ◽  
pp. 104988
Duygu Özge ◽  
Jaklin Kornfilt ◽  
Katja Maquate ◽  
Aylin C. Küntay ◽  
Jesse Snedeker

Semantic Web technology is not as new as most of us assume; it has evolved over the years. Linked Data is the terminology recently given to the Semantic Web. The Semantic Web is a continuation of Web 2.0 and is intended to replace existing technologies. It is built on natural language processing and provides solutions to most of the prevailing issues. Web 3.0 is the version of the Semantic Web that caters to the information needs of half of the population on earth. This paper links two important current concerns, the security of information and enforced online education due to COVID-19, with the Semantic Web. The steganography requirement for the Semantic Web is discussed in detail, since encryption alone is inadequate to provide protection. Web 2.0 issues concerning online education, and Semantic Web solutions to them, are discussed. An extensive literature survey has been conducted on the architecture of Web 3.0, the history of online education, and security architecture. Finally, the Semantic Web is here to stay, and data hiding along with encryption makes it robust.

2022 ◽  
Vol 22 (2) ◽  
pp. 1-21
Syed Atif Moqurrab ◽  
Adeel Anjum ◽  
Abid Khan ◽  
Mansoor Ahmed ◽  
Awais Ahmad ◽  

Due to the evolution of the Internet of Things, clinical data is growing exponentially and uses smart technologies. The generated big biomedical data is confidential, as it contains a patient’s personal information and findings. Usually, big biomedical data is stored in the cloud, making it convenient to access and share. In this view, data shared for research purposes helps to reveal useful and unexposed aspects. Unfortunately, sharing such sensitive data also leads to certain privacy threats. Generally, clinical data is available in textual format (e.g., perception reports). Under the domain of natural language processing, many research studies have been published to mitigate privacy breaches in textual clinical data. However, there are still limitations and shortcomings in current studies that must be addressed. In this article, a novel framework for textual medical data privacy, Deep-Confidentiality, is proposed. The framework improves Medical Entity Recognition (MER) using deep neural networks and sanitization compared to the current state-of-the-art techniques. Moreover, a new and generic utility metric is proposed, which overcomes the shortcomings of the existing utility metric and provides a truer representation of sanitized documents relative to the originals. To check the proposed framework’s effectiveness, it is evaluated on the i2b2-2010 NLP challenge dataset, considered one of the more complex medical datasets for MER. The proposed framework improves MER by 7.8% recall, 7% precision, and 3.8% F1-score compared to existing deep learning models. It also improves the data utility of sanitized documents by up to 13.79%, where the value of k is 3.
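A hedged sketch of the sanitization step such a framework performs: once a Medical Entity Recognizer has tagged spans, each entity is replaced by a more general term. The recognizer itself (a deep model in the paper) is stubbed out here with a hypothetical entity dictionary; the report text and generalizations are invented.

```python
import re

# hypothetical MER output: recognized surface form -> generalized term
ENTITY_GENERALIZATIONS = {
    "myocardial infarction": "cardiac condition",
    "warfarin": "anticoagulant",
    "John Smith": "PATIENT",
}

def sanitize(text, entities=ENTITY_GENERALIZATIONS):
    """Replace each recognized entity with its generalized term,
    case-insensitively, leaving the rest of the report intact."""
    for surface, general in entities.items():
        text = re.sub(re.escape(surface), general, text, flags=re.IGNORECASE)
    return text

report = "John Smith was prescribed warfarin after a myocardial infarction."
print(sanitize(report))
# -> "PATIENT was prescribed anticoagulant after a cardiac condition."
```

A utility metric like the one the article proposes would then compare the sanitized document against the original to quantify how much analytical value survives.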

Mir Ragib Ishraq ◽  
Nitesh Khadka ◽  
Asif Mohammed Samir ◽  
M. Shahidur Rahman

Three Indic/Indo-Aryan languages - Bengali, Hindi and Nepali - are explored here at the character level to find similarities and dissimilarities. Sharing the same root, Sanskrit, the Indic languages bear common characteristics, so computer and language scientists can take the opportunity to develop common Natural Language Processing (NLP) techniques and algorithms. With this in mind, we compare and analyze these three languages character by character. As an application of the hypothesis, we also developed a uniform sorting algorithm in two steps: first for the Bengali and Nepali languages only, and then extended to Hindi in the second step. Our thorough investigation with more than 30,000 words from each language suggests that the algorithm attains full accuracy, as judged by the local language authorities of the respective languages, with good efficiency.
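A minimal sketch of sorting by a custom collation order, the core idea behind such a uniform sorting algorithm: each character is mapped to its rank in the language's collation table rather than its Unicode code point. The alphabet fragment and word list below are tiny illustrative placeholders, not the full Bengali/Hindi/Nepali tables from the paper.

```python
# illustrative fragment of a Bengali vowel ordering (not the full table)
ORDER = {ch: i for i, ch in enumerate("অআইঈউ")}

def collation_key(word):
    """Map each character to its rank in the collation table; characters
    outside the table sort after all listed ones."""
    return [ORDER.get(ch, len(ORDER)) for ch in word]

words = ["ইট", "আম", "অনেক"]
print(sorted(words, key=collation_key))
```

The same key function extends to Hindi and Nepali once their character ranks are merged into one shared table, which is what makes a single uniform algorithm possible.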

Sunita Warjri ◽  
Partha Pakray ◽  
Saralin A. Lyngdoh ◽  
Arnab Kumar Maji

Part-of-speech (POS) tagging is one of the challenging research fields in natural language processing (NLP). It requires good knowledge of a particular language along with large amounts of data, or corpora, for feature engineering, which can lead to good tagger performance. Our main contribution in this research work is the designed Khasi POS corpus. To date, no Khasi corpus of any kind has been formally developed. In the present Khasi POS corpus, each word is tagged manually using the designed tagset. Deep learning methods have been used to experiment with the corpus: POS taggers based on BiLSTM, a combination of BiLSTM with CRF, and character-based embedding with BiLSTM are presented. The main challenges that computational linguistics faces in understanding and handling natural language are anticipated. In the presently designed corpus, we have tried to resolve ambiguities of words with respect to their context of use, as well as orthography problems that arise in the POS corpus. The Khasi corpus size is around 96,100 tokens and consists of 6,616 distinct words. Initially, when running the first sets of data of around 41,000 tokens in our experiments, the taggers yielded considerably accurate results. When the corpus size was increased to 96,100 tokens, accuracy increased and the analyses became more pertinent. As results, an accuracy of 96.81% is achieved for the BiLSTM method, 96.98% for BiLSTM with CRF, and 95.86% for character-based embedding with BiLSTM. Concerning substantial research from the NLP perspective for Khasi, we also present some recently existing POS taggers and other NLP works on the Khasi language for comparison.
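To make the corpus design concrete, here is an illustrative sketch of the word/tag annotation format and a most-frequent-tag baseline of the kind often reported alongside neural taggers; the Khasi tokens and tags below are hypothetical placeholders, not entries from the actual corpus.

```python
from collections import Counter, defaultdict

# each sentence is a list of (word, tag) pairs, as in manual annotation
corpus = [
    [("nga", "PRON"), ("bam", "VERB"), ("ja", "NOUN")],
    [("nga", "PRON"), ("leit", "VERB")],
    [("ja", "NOUN"), ("bam", "VERB")],
]

def train_baseline(sentences):
    """Assign each word its most frequent tag in the training data."""
    tag_counts = defaultdict(Counter)
    for sent in sentences:
        for word, tag in sent:
            tag_counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}

model = train_baseline(corpus)
print([model.get(w, "NOUN") for w in ["nga", "bam"]])
```

A BiLSTM or BiLSTM-CRF tagger consumes the same (word, tag) sequences but learns contextual representations, which is how it resolves the ambiguous words this baseline cannot.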

The COVID-19 outbreak has created havoc around the world and brought life to a disturbing halt, claiming thousands of lives worldwide, with infected cases rising every day. With technological advancements in Artificial Intelligence (AI), AI-based platforms can be used to deal with the COVID-19 pandemic and accelerate processes ranging from crowd surveillance to medical diagnosis. This paper renders a response to battle the virus through various AI techniques, making use of its subsets such as Machine Learning (ML), Deep Learning (DL) and Natural Language Processing (NLP). A survey of promising AI methods that could facilitate processes in this pandemic, along with the potential of AI and the challenges imposed, is discussed thoroughly. This paper relies on the findings of the most recent research publications and journals on COVID-19 and suggests numerous relevant strategies. A case study on the impact of COVID-19 on various economic sectors is also discussed. Potential research challenges and future directions are also presented.

Md. Saddam Hossain Mukta ◽  
Md. Adnanul Islam ◽  
Faisal Ahamed Khan ◽  
Afjal Hossain ◽  
Shuvanon Razik ◽  

Sentiment Analysis (SA) is a Natural Language Processing (NLP) and Information Extraction (IE) task that primarily aims to determine a writer’s feelings, expressed as positive or negative, by analyzing a large number of documents. SA is also widely studied in the fields of data mining, web mining, text mining, and information retrieval. The fundamental task in sentiment analysis is to classify the polarity of a given content as Positive, Negative, or Neutral. Although extensive research has been conducted in this area of computational linguistics, most of it has been carried out in the context of the English language. However, Bengali sentiment expression has varying degrees of sentiment labels, which can be plausibly distinct from English. Therefore, sentiment assessment for the Bengali language undeniably needs to be developed and executed properly. In sentiment analysis, the prediction potential of an automatic model depends entirely on the quality of dataset annotation. Bengali sentiment annotation is a challenging task due to the diversified structures (syntax) of the language and its different degrees of innate sentiment (i.e., weakly and strongly positive/negative sentiments). Thus, in this article, we propose a novel and precise guideline for researchers, linguistic experts, and referees to annotate Bengali sentences immaculately, with a view to building effective datasets for automatic sentiment prediction.
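A hedged sketch of the graded polarity labels the guideline distinguishes (weakly vs. strongly positive/negative); the score range and thresholds here are illustrative choices for the sake of the example, not the article's annotation rules.

```python
def polarity_label(score):
    """Map a sentiment score in [-1, 1] to a graded polarity label."""
    if score >= 0.5:
        return "strongly positive"
    if score > 0.0:
        return "weakly positive"
    if score == 0.0:
        return "neutral"
    if score > -0.5:
        return "weakly negative"
    return "strongly negative"

print(polarity_label(0.7), polarity_label(-0.2))
# -> strongly positive weakly negative
```

In an annotation workflow, agreement between referees would be measured over these graded labels, which is stricter than agreement over a plain positive/negative split.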

Ali Saeed ◽  
Rao Muhammad Adeel Nawab ◽  
Mark Stevenson

Word Sense Disambiguation (WSD), the process of automatically identifying the correct meaning of a word used in a given context, is a significant challenge in Natural Language Processing. A range of approaches to the problem have been explored by the research community. The majority of these efforts have focused on a relatively small set of languages, particularly English. Research on WSD for South Asian languages, particularly Urdu, is still in its infancy. In recent years, deep learning methods have proved extremely successful for a range of Natural Language Processing tasks. The main aim of this study is to apply, evaluate, and compare a range of deep learning approaches to Urdu WSD (both Lexical Sample and All-Words), including Simple Recurrent Neural Networks, Long Short-Term Memory, Gated Recurrent Units, Bidirectional Long Short-Term Memory, and Ensemble Learning. The evaluation was carried out on two benchmark corpora: (1) the ULS-WSD-18 corpus and (2) the UAW-WSD-18 corpus. Results (Accuracy = 63.25% and F1-Measure = 0.49) show that a deep learning approach outperforms previously reported results for the Urdu All-Words WSD task, whereas performance using deep learning approaches (Accuracy = 72.63% and F1-Measure = 0.60) is lower than previously reported results for the Urdu Lexical Sample task.
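To show what the Lexical Sample setting amounts to, here is a minimal sketch of WSD as supervised classification: context words around an ambiguous target vote for a sense. The English "bank" examples and sense labels are illustrative stand-ins, not material from the Urdu corpora, and the voting scheme is a simple baseline rather than the study's neural models.

```python
from collections import Counter, defaultdict

# hypothetical training data: (context words, gold sense) pairs
train = [
    (["river", "water", "muddy"], "bank/river"),
    (["loan", "money", "deposit"], "bank/finance"),
    (["cash", "account"], "bank/finance"),
]

def train_wsd(examples):
    """Count, per context word, how often it appears with each sense."""
    weights = defaultdict(Counter)
    for context, sense in examples:
        for word in context:
            weights[word][sense] += 1
    return weights

def disambiguate(context, weights):
    """Let each context word vote for the senses it was seen with."""
    votes = Counter()
    for word in context:
        votes.update(weights.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else None

w = train_wsd(train)
print(disambiguate(["money", "deposit", "river"], w))
# -> bank/finance
```

An LSTM-based system replaces the counting with learned contextual representations, but the task framing (context in, sense label out) is the same.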
