Application of Deep Learning Model Convolution Neural Network for Effective Web Information Retrieval

Fixed Length ◽

Web Information ◽

The Rich

The convolutional neural network (CNN) is one of the most popular deep learning methods and has been used in applications such as image recognition, computer vision, and natural language processing. This chapter explains the application of CNNs to web query session mining for effective information retrieval. The CNN is used for document analysis to capture the rich contextual structure in a search query or in document content. The document content, represented in matrix form using Word2Vec, is passed through the CNN's convolution and max-pooling operations to generate a fixed-length document feature vector. This feature vector is the input to a fully connected neural network (FNN), which generates the semantic document vector. The semantic document vectors are then clustered to group similar documents for effective web information retrieval. An experiment performed on a data set of web query sessions confirms the effectiveness of CNNs in web query session mining for information retrieval.
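The convolution and max-pooling step described in this abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: random vectors stand in for Word2Vec embeddings, the filters are random rather than learned, the ReLU activation is an assumption, and the function name `conv_maxpool` is hypothetical. The FNN and clustering stages are omitted. The point is only that max pooling over all window positions yields a feature vector whose length depends on the number of filters, not on the document length.

```python
import random


def conv_maxpool(doc_matrix, filters, window):
    """Apply 1-D convolution over word windows, then max-over-time pooling.

    doc_matrix: list of word vectors (one per token), each of length d.
    filters: list of flat weight lists, each of length window * d.
    Returns one value per filter, regardless of how many tokens the document has.
    """
    n = len(doc_matrix)
    if n < window:
        raise ValueError("document shorter than the convolution window")
    features = []
    for weights in filters:
        best = float("-inf")
        for start in range(n - window + 1):
            # Concatenate the word vectors inside the current window.
            patch = [x for vec in doc_matrix[start:start + window] for x in vec]
            # Dot product followed by ReLU (the activation is an assumption).
            response = max(0.0, sum(w * x for w, x in zip(weights, patch)))
            best = max(best, response)
        features.append(best)
    return features


# Illustrative data only: random stand-ins for embeddings and learned filters.
rng = random.Random(0)
dim, window, num_filters = 4, 2, 3
filters = [[rng.uniform(-1, 1) for _ in range(window * dim)]
           for _ in range(num_filters)]
short_doc = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
long_doc = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(12)]

short_vec = conv_maxpool(short_doc, filters, window)
long_vec = conv_maxpool(long_doc, filters, window)
print(len(short_vec), len(long_vec))  # both equal num_filters
```

A 5-token and a 12-token document both map to a vector of length `num_filters`, which is what allows a fixed-size fully connected layer to follow.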

A Highest Sense Count Based Method for Disambiguation of Web Queries for Hindi Language Web Information Retrieval

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2012100101 ◽

2012 ◽

Vol 2 (4) ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Sanjay K. Dwivedi

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Similarity Score ◽

Average Precision ◽

Web Information ◽

Average Similarity ◽

Hindi Language ◽

Ambiguity Detection ◽

Web Queries

Ambiguity in word senses is recognized as a major challenge for information retrieval systems. Hindi language web information retrieval, like retrieval in other languages, faces the problem of sense ambiguity, which degrades the performance of natural language processing (NLP) applications in general. In this paper, the author formalizes an approach to sense disambiguation that improves the performance of Hindi web information retrieval. The system performs ambiguity detection before disambiguating web queries. A test sample of 100 queries was selected; ambiguity detection found 43% of them to be unambiguous. The remaining queries are disambiguated using an approach based on HSC (Highest Sense Count), and query disambiguation is followed by query expansion. The expanded query generates a new result set with higher precision and a higher similarity score. The 57 expanded queries were tested against 1000 test document instances. The overall improvement is 45% in average precision and 23% in interpolated average precision, along with a significant improvement in the average similarity score of the newly generated result set. The overall accuracy of the approach is 61.4%, and it improves the performance of the system by 45%.
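For readers unfamiliar with the two precision measures reported here, the following sketch shows one common way to compute them for a single ranked result list. The abstract does not say which variant the author used, so the 11-point interpolation and the function names are assumptions, and the relevance judgments are made-up illustrative data.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Mean of precision values taken at each rank where a relevant document appears.

    ranked_relevance: list of booleans in rank order (True = relevant).
    total_relevant: number of relevant documents overall; defaults to those retrieved.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    denom = total_relevant if total_relevant is not None else hits
    return precision_sum / denom if denom else 0.0


def interpolated_average_precision(ranked_relevance, total_relevant=None, levels=11):
    """Average of interpolated precision at evenly spaced recall levels (0.0 to 1.0)."""
    total = total_relevant if total_relevant is not None else sum(ranked_relevance)
    if total == 0:
        return 0.0
    points = []  # (recall, precision) at each relevant document
    hits = 0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            points.append((hits / total, hits / rank))
    score = 0.0
    for i in range(levels):
        recall_level = i / (levels - 1)
        # Interpolated precision: best precision at any recall >= this level.
        candidates = [p for r, p in points if r >= recall_level]
        score += max(candidates) if candidates else 0.0
    return score / levels


# Illustrative ranking: relevant documents at ranks 1 and 3.
ranking = [True, False, True, False]
ap = average_precision(ranking)
iap = interpolated_average_precision(ranking)
print(round(ap, 4), round(iap, 4))
```

Averaging either measure over all test queries gives the system-level figure that is compared before and after query expansion.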

Multi-agent Web Information Retrieval: Neural Network Based Approach

Advances in Intelligent Data Analysis - Lecture Notes in Computer Science ◽

10.1007/3-540-48412-4_42 ◽

1999 ◽

pp. 499-511 ◽

Cited By ~ 2

Author(s):

Yong S. Choi ◽

Suk I. Yoo

Keyword(s):

Neural Network ◽

Information Retrieval ◽

Web Information ◽

Multi Agent

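Analisis Sentimen Pilkada di Tengah Pandemi Covid-19 Menggunakan Convolution Neural Network (CNN) [Sentiment Analysis of Regional Elections During the Covid-19 Pandemic Using a Convolution Neural Network (CNN)]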

Jurnal Pendidikan dan Teknologi Indonesia ◽

10.52436/1.jpti.60 ◽

2021 ◽

Vol 1 (7) ◽

pp. 261-268

Author(s):

Sukma Nindi Listyarini ◽

Dimas Aryo Anggoro

Keyword(s):

Neural Network ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Convolution Neural Network

The 2020 regional head elections (pilkada) were controversial because they were held in the middle of the Covid-19 pandemic. Comments appeared across social media platforms such as Twitter. Many people agreed that the elections should go ahead, but many others argued for postponing them until the pandemic ended. Given this difference of opinion, sentiment analysis is needed to obtain the public's perception, or general picture, of the holding of the 2020 elections during the Covid-19 pandemic. A total of 500 tweets were obtained by crawling data from the Twitter API using the tweepy library, based on predetermined keywords. The resulting dataset was labeled into two classes, negative and positive. This study proposes a deep learning approach using the Convolution Neural Network (CNN) algorithm for classification, which has proven effective for Natural Language Processing (NLP) tasks and can achieve good performance in sentence classification. Experiments were carried out with 4 convolutional layers while observing the effect of the number of epochs on model accuracy. The epoch settings used were 50, 75, and 100. The results show that the CNN method on the pandemic-era pilkada dataset achieved its highest accuracy, 90%, with 4 convolutional layers and 100 epochs. The results also show that accuracy tends to increase as the number of epochs used in the model increases.

Comparing DBpedia, Wikidata, and YAGO for Web Information Retrieval

Intelligent and Interactive Computing - Lecture Notes in Networks and Systems ◽

10.1007/978-981-13-6031-2_40 ◽

2019 ◽

pp. 525-535 ◽

Cited By ~ 2

Author(s):

Sini Govinda Pillai ◽

Lay-Ki Soon ◽

Su-Cheng Haw

Keyword(s):

Information Retrieval ◽

2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology ◽

4th International Workshop on Web Information Retrieval Support Systems (WIRSS 2011)

10.1109/wi-iat.2011.308 ◽

2011 ◽

Keyword(s):

Information Retrieval ◽

Support Systems ◽

International Workshop ◽

Text-Independent Speaker Identification Using Deep Learning Model of Convolution Neural Network

International Journal of Machine Learning and Computing ◽

10.18178/ijmlc.2019.9.2.778 ◽

2019 ◽

Vol 9 (2) ◽

pp. 143-148 ◽

Cited By ~ 4

Author(s):

Supaporn Bunrit ◽

Thuttaphol Inkian ◽

Nittaya Kerdprasop ◽

Kittisak Kerdprasop

Keyword(s):

Neural Network ◽

Deep Learning ◽

Speaker Identification ◽

Learning Model ◽

Convolution Neural Network ◽

Deep Learning Model

Proceedings of the International Conference for Phoenixes on Emerging Current Trends in Engineering and Management (PECTEAM 2018) ◽

Literature Survey: Analysis on Semantic Web Information Retrieval Methodologies

10.2991/pecteam-18.2018.18 ◽

2018 ◽

Cited By ~ 1

Author(s):

K Ezhilarasi ◽

G. Maria Kalavathy

Keyword(s):

Information Retrieval ◽

Semantic Web ◽

Literature Survey ◽

Survey Analysis ◽

Web Information Retrieval

International Journal of Computer Applications ◽

10.5120/13213-0595 ◽

2013 ◽

Vol 76 (1) ◽

pp. 29-32

Author(s):

Vikas Thada ◽

Vivek Jaglan

Keyword(s):

Information Retrieval ◽

Automatic ICD-10 Coding and Training System: Deep Neural Network Based on Supervised Learning

JMIR Medical Informatics ◽

10.2196/23230 ◽

2021 ◽

Vol 9 (8) ◽

pp. e23230

Author(s):

Pei-Fu Chen ◽

Ssu-Ming Wang ◽

Wei-Chih Liao ◽

Lu-Cheng Kuo ◽

Kuan-Chih Chen ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Deep Neural Network ◽

University Hospital ◽

Classification Model ◽

Icd 10 ◽

And Training

Background: The International Classification of Diseases (ICD) code is widely used as a reference in medical systems and for billing purposes. However, classifying diseases into ICD codes still relies mainly on humans reading a large amount of written material as the basis for coding, which is both laborious and time-consuming. Since the conversion from ICD-9 to ICD-10, the coding task has become much more complicated, and approaches based on deep learning and natural language processing have been studied to assist disease coders. Objective: This paper aims to construct a deep learning model for ICD-10 coding that automatically determines the corresponding diagnosis and procedure codes based solely on free-text medical notes, in order to improve accuracy and reduce human effort. Methods: We used diagnosis records of the National Taiwan University Hospital as resources and applied natural language processing techniques, including global vectors (GloVe), word2vec, embeddings from language models (ELMo), bidirectional encoder representations from transformers (BERT), and a single head attention recurrent neural network, on a deep neural network architecture to implement ICD-10 auto-coding. In addition, we introduced an attention mechanism into the classification model to extract keywords from diagnoses and visualize the coding reference for training novice ICD-10 coders. Sixty discharge notes were randomly selected to examine the change in the F1-score and in coding time for coders before and after using our model. Results: In experiments on the medical data set of National Taiwan University Hospital, our predictions achieved F1-scores of 0.715 and 0.618 for the ICD-10 Clinical Modification code and Procedure Coding System code, respectively, using a BERT embedding approach in the Gated Recurrent Unit classification model. The trained models were deployed on an ICD-10 web service for coding and for training ICD-10 users. With this service, coders' F1-score increased significantly from a median of 0.832 to 0.922 (P<.05), but coding time was not reduced. Conclusions: The proposed model significantly improved the F1-score but did not decrease the time disease coders spent on coding.
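As a reminder of what the F1-score measures in a multi-label coding task, here is a minimal sketch. The abstract does not state whether the authors used micro or macro averaging, so micro averaging is an assumption, as is the function name `micro_f1`. The ICD-10 codes in the example are illustrative and are not taken from the study's data.

```python
def micro_f1(true_codes, predicted_codes):
    """Micro-averaged F1 over per-note sets of assigned codes.

    true_codes, predicted_codes: parallel lists of iterables of code strings,
    one entry per clinical note.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_codes, predicted_codes):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # codes correctly assigned
        fp += len(pred - truth)   # codes assigned but not in the reference
        fn += len(truth - pred)   # reference codes that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative notes: two notes with reference codes and predicted codes.
reference = [{"I10", "E11.9"}, {"J18.9"}]
predicted = [{"I10"}, {"J18.9", "N39.0"}]
score = micro_f1(reference, predicted)
print(round(score, 3))
```

Here two codes are correct, one is missed, and one is spurious, so precision and recall are both 2/3 and the F1-score is also 2/3.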