Application of Deep Learning Model Convolution Neural Network for Effective Web Information Retrieval

Author(s):  
Suruchi Chawla

Convolution neural network (CNN) is one of the most popular deep learning methods and has been used for applications such as image recognition, computer vision, and natural language processing. This chapter explains how CNN can be applied to web query session mining for effective information retrieval. The CNN is used for document analysis to capture the rich contextual structure of a search query or document content. Document content, represented in matrix form using Word2Vec, is passed through the CNN's convolution and max-pooling operations to generate a fixed-length document feature vector. This feature vector is then input to a fully connected neural network (FNN), which generates a semantic document vector. The semantic document vectors are clustered to group similar documents for effective web information retrieval. An experiment performed on a data set of web query sessions confirms the effectiveness of CNN in web query session mining for effective information retrieval.
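The pipeline described above (Word2Vec matrix, convolution, max-pooling, fully connected layer, clustering) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the chapter's implementation: random vectors stand in for trained Word2Vec embeddings and for untrained CNN/FNN weights, the vocabulary and layer sizes are hypothetical, and k-means is swapped in as the clustering step because the abstract does not name the algorithm used.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4     # hypothetical Word2Vec dimension (real models often use 100-300)
WINDOW = 2        # convolution window: consecutive words covered by each filter
NUM_FILTERS = 3   # number of filters = length of the pooled feature vector
SEMANTIC_DIM = 2  # hypothetical size of the semantic document vector


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Random vectors stand in for a trained Word2Vec lookup table (hypothetical vocabulary).
VOCAB = ["python", "tutorial", "code", "snake", "habitat", "jungle"]
EMBEDDINGS = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in VOCAB}
FILTERS = [rand_matrix(WINDOW, EMBED_DIM) for _ in range(NUM_FILTERS)]
FNN_WEIGHTS = rand_matrix(SEMANTIC_DIM, NUM_FILTERS)


def document_matrix(words):
    """Stack the word vectors of a document: one row per word (len(words) x EMBED_DIM)."""
    return [EMBEDDINGS[w] for w in words]


def conv_maxpool(matrix):
    """Convolve each filter over every WINDOW-word span, then max-pool over positions.

    Max-pooling keeps one value per filter, so the output length is NUM_FILTERS
    no matter how many words the document has.
    """
    if len(matrix) < WINDOW:
        raise ValueError("document shorter than the convolution window")
    features = []
    for f in FILTERS:
        scores = [
            sum(f[j][i] * matrix[t + j][i] for j in range(WINDOW) for i in range(EMBED_DIM))
            for t in range(len(matrix) - WINDOW + 1)
        ]
        features.append(max(scores))
    return features


def fnn(features):
    """A single fully connected layer with tanh activation -> semantic document vector."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in FNN_WEIGHTS]


def kmeans(vectors, k, iterations=10):
    """Plain Euclidean k-means over semantic vectors; returns one cluster label per vector."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    ["python", "tutorial", "code"],
    ["python", "code"],
    ["snake", "habitat", "jungle", "python"],
    ["jungle", "snake"],
]
vectors = [fnn(conv_maxpool(document_matrix(d))) for d in docs]
labels = kmeans(vectors, k=2)
print(labels)
```

Documents of different lengths (two to four words here) all map to vectors of the same size, which is what makes the clustering step possible; with trained weights the cluster labels would reflect semantic similarity rather than random projections.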

2012, Vol 2 (4), pp. 1-11
Author(s):  
Sanjay K. Dwivedi

Ambiguity in word senses has been recognized as a major challenge for information retrieval systems. Like web information retrieval in other languages, Hindi web information retrieval faces the problem of sense ambiguity, which degrades the performance of every natural language processing (NLP) application. In this paper, the author formalizes an approach to sense disambiguation that improves the performance of Hindi web information retrieval. Our system performs ambiguity detection before disambiguating web queries. A test sample of 100 queries was selected, and ambiguity detection classified 43% of them as unambiguous. The remaining queries are then disambiguated using an approach based on HSC (Highest Sense Count), and query disambiguation is followed by query expansion. Each expanded query generates a new result set with higher precision and a higher similarity score. The 57 expanded queries (the 100 test queries minus the 43 detected as unambiguous) were tested against 1,000 test document instances. The overall improvement is 45% in average precision and 23% in interpolated average precision, with a significant improvement in the average similarity score of the newly generated result set. The overall accuracy of our approach was 61.4%, and it improves system performance by 45%.
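The two evaluation measures reported above can be computed per query as follows. This sketch assumes the standard non-interpolated definition of average precision and the common 11-point definition of interpolated average precision; the paper's exact evaluation protocol is not given in the abstract, and the ranking and relevance judgments below are hypothetical.

```python
def average_precision(ranked, relevant):
    """Non-interpolated AP: mean of precision@k at each rank k holding a relevant document.

    Relevant documents that are never retrieved contribute a precision of 0.
    """
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def interpolated_ap_11pt(ranked, relevant):
    """11-point interpolated AP: average, over recall levels 0.0, 0.1, ..., 1.0,
    of the highest precision reached at any recall >= that level."""
    if not relevant:
        return 0.0
    hits, points = 0, []
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))  # (recall, precision) at rank k
    total = 0.0
    for level in (i / 10 for i in range(11)):
        candidates = [p for r, p in points if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / 11


# Hypothetical ranking for one query: relevant d1 and d2 are retrieved, d5 is missed.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
print(round(average_precision(ranked, relevant), 4))     # 0.3333
print(round(interpolated_ap_11pt(ranked, relevant), 4))  # 0.3182
```

Averaging either function over all test queries gives the mean figures; comparing those means before and after query expansion is one way to arrive at improvement percentages like the ones quoted.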


2021, Vol 1 (7), pp. 261-268
Author(s):  
Sukma Nindi Listyarini, Dimas Aryo Anggoro

The 2020 regional head elections (pilkada) became controversial because they were held in the middle of the COVID-19 pandemic. Comments appeared across social media platforms such as Twitter. Many people agreed that the pilkada should go ahead, but many others argued for postponing it until the pandemic was over. Given these differing opinions, sentiment analysis is needed to obtain the public's perception, or general picture, of holding the 2020 pilkada during the COVID-19 pandemic. A total of 500 tweets were obtained by crawling data from the Twitter API with the tweepy library, based on predetermined keywords. The resulting dataset was labeled into two classes, negative and positive. This study proposes a deep learning approach using the Convolution Neural Network (CNN) algorithm for classification, which has proven effective for Natural Language Processing (NLP) tasks and achieves good performance in sentence classification. Experiments were carried out with four convolutional layers, observing the effect of the number of epochs on model accuracy. The epoch values used were 50, 75, and 100. The results show that the CNN method on the pilkada-during-the-pandemic dataset achieved a highest accuracy of 90% with four convolutional layers and 100 epochs. The results also show that accuracy tends to increase as the number of epochs increases.
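A CNN sentence classifier of the kind described, trained for the three epoch counts, can be sketched in plain Python. This is a simplified illustration, not the study's model: it uses a single convolutional layer instead of four, random 3-dimensional word vectors instead of learned embeddings, a handful of hypothetical toy tweets instead of the 500 labeled tweets, and per-example gradient descent on binary cross-entropy.

```python
import math
import random

random.seed(42)

DIM, WINDOW, FILTERS = 3, 2, 4  # toy sizes; one conv layer here, the study stacks four

# Hypothetical toy tweets (1 = positive, 0 = negative); not the study's data set.
VOCAB = ["pilkada", "lanjut", "setuju", "aman", "tunda", "bahaya", "covid", "takut"]
EMB = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in VOCAB}
DATA = [
    (["pilkada", "lanjut", "aman"], 1),
    (["setuju", "pilkada", "lanjut"], 1),
    (["aman", "setuju"], 1),
    (["tunda", "pilkada", "covid"], 0),
    (["bahaya", "covid", "takut"], 0),
    (["takut", "tunda"], 0),
]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def forward(words, params):
    """Convolution over word windows -> max-pool over time -> logistic output."""
    W, b, v, c = params
    x = [EMB[w] for w in words]
    pooled, best = [], []
    for f in range(FILTERS):
        scores = [
            sum(W[f][j][i] * x[t + j][i] for j in range(WINDOW) for i in range(DIM)) + b[f]
            for t in range(len(x) - WINDOW + 1)
        ]
        t_best = max(range(len(scores)), key=scores.__getitem__)
        pooled.append(scores[t_best])
        best.append(t_best)
    z = sum(vf * hf for vf, hf in zip(v, pooled)) + c
    return x, pooled, best, sigmoid(z)


def train(epochs, lr=0.1):
    """Per-example SGD on binary cross-entropy; epochs=0 returns the untrained model."""
    rnd = random.Random(0)
    W = [[[rnd.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(WINDOW)] for _ in range(FILTERS)]
    b = [0.0] * FILTERS
    v = [rnd.uniform(-0.5, 0.5) for _ in range(FILTERS)]
    c = 0.0
    for _ in range(epochs):
        for words, y in DATA:
            x, pooled, best, p = forward(words, (W, b, v, c))
            g = p - y  # dLoss/dz for sigmoid + cross-entropy
            for f in range(FILTERS):
                gh = g * v[f]  # gradient reaching the pooled value of filter f
                v[f] -= lr * g * pooled[f]
                t = best[f]  # max-pooling routes the gradient to the winning window only
                for j in range(WINDOW):
                    for i in range(DIM):
                        W[f][j][i] -= lr * gh * x[t + j][i]
                b[f] -= lr * gh
            c -= lr * g
    return W, b, v, c


def mean_loss(params):
    total = 0.0
    for words, y in DATA:
        p = min(max(forward(words, params)[3], 1e-12), 1 - 1e-12)
        total += -math.log(p if y == 1 else 1 - p)
    return total / len(DATA)


def accuracy(params):
    hits = sum((forward(words, params)[3] >= 0.5) == (y == 1) for words, y in DATA)
    return hits / len(DATA)


for epochs in (50, 75, 100):  # the epoch variations used in the study
    model = train(epochs)
    print(epochs, round(mean_loss(model), 4), accuracy(model))
```

Because max-pooling keeps only the strongest window per filter, each filter's gradient flows back only through that window; a framework's automatic differentiation applies the same routing. Training accuracy on six toy examples says nothing about the 90% figure reported in the study.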


2013, Vol 76 (1), pp. 29-32
Author(s):  
Vikas Thada, Vivek Jaglan

10.2196/23230, 2021, Vol 9 (8), pp. e23230
Author(s):  
Pei-Fu Chen, Ssu-Ming Wang, Wei-Chih Liao, Lu-Cheng Kuo, Kuan-Chih Chen, ...

Background: The International Classification of Diseases (ICD) code is widely used as a reference in medical systems and for billing purposes. However, classifying diseases into ICD codes still relies mainly on humans reading large amounts of written material as the basis for coding, which is both laborious and time-consuming. Since the conversion from ICD-9 to ICD-10, the coding task has become much more complicated, and approaches based on deep learning and natural language processing have been studied to assist disease coders.

Objective: This paper aims to construct a deep learning model for ICD-10 coding that automatically determines the corresponding diagnosis and procedure codes based solely on free-text medical notes, in order to improve accuracy and reduce human effort.

Methods: We used diagnosis records from National Taiwan University Hospital as resources and applied natural language processing techniques, including global vectors (GloVe), word to vectors (word2vec), embeddings from language models (ELMo), bidirectional encoder representations from transformers (BERT), and a single-head attention recurrent neural network, within a deep neural network architecture to implement ICD-10 auto-coding. In addition, we introduced an attention mechanism into the classification model to extract keywords from diagnoses and to visualize the coding reference for training newcomers to ICD-10. Sixty discharge notes were randomly selected to examine the change in coders' F1-scores and coding time before and after using our model.

Results: In experiments on the medical data set of National Taiwan University Hospital, our prediction results revealed F1-scores of 0.715 for ICD-10 Clinical Modification (ICD-10-CM) codes and 0.618 for Procedure Coding System (ICD-10-PCS) codes, using BERT embeddings in the Gated Recurrent Unit (GRU) classification model. The trained models were deployed in an ICD-10 web service for coding and for training ICD-10 users. With this service, coders' median F1-score increased significantly from 0.832 to 0.922 (P<.05), but their coding time was not reduced.

Conclusions: The proposed model significantly improved the F1-score but did not decrease the time disease coders spent on coding.
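Extracting keywords with attention, as described in the Methods, can be illustrated with single-head dot-product attention: each token's hidden state is scored against a query vector, the scores are normalized with softmax, and the highest-weighted tokens are read off as keywords. This is a hypothetical sketch: the tokens, hidden states, and query vector are made-up numbers standing in for the GRU outputs and learned parameters of the paper's model, whose exact attention formulation the abstract does not specify.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_keywords(tokens, hidden, query, top_k=2):
    """Single-head dot-product attention over per-token hidden states.

    Returns (attention weights, attention-pooled note vector, top_k tokens by weight).
    """
    scores = [sum(h * q for h, q in zip(vec, query)) for vec in hidden]
    weights = softmax(scores)
    pooled = [sum(w * vec[d] for w, vec in zip(weights, hidden)) for d in range(len(query))]
    ranked = sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)
    return weights, pooled, [tok for tok, _ in ranked[:top_k]]


# Hypothetical hidden states (e.g. GRU outputs over BERT embeddings) and query vector.
tokens = ["acute", "appendicitis", "with", "peritonitis"]
hidden = [[0.2, 0.1], [1.5, 0.9], [0.0, -0.3], [1.1, 1.2]]
query = [1.0, 1.0]

weights, pooled, keywords = attention_keywords(tokens, hidden, query)
print(keywords)  # ['appendicitis', 'peritonitis']
```

The pooled vector is what a classification layer would consume, while the weights themselves are what a coding interface could highlight for a trainee; the same scores therefore serve both prediction and visualization.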

