Image Captioning According to Kuhn and Popper's Scientific Revolution

2020 ◽  
Vol 10 (2) ◽  
pp. 110-121
Author(s):  
Agus Nursikuwagus ◽  
Rinaldi Munir ◽  
Masayu Layla Khodra

Generating captions for images is a new area of development in artificial intelligence. Image captioning combines several fields, such as computer vision, natural language processing, and machine learning. A key concern in image captioning is the suitability of the modeled neural network architecture for producing results as close as possible to the ground truth provided by a human. Several existing studies still produce sentences that remain far from that ground truth. The problems generally discussed in image captioning concern the image generator and the text generator, namely the use of deep learning models such as CNNs and LSTMs to solve the captioning task. This motivates new contributions to image captioning covering the image extractor, the text generator, and the evaluator to be used in the proposed model. From the perspective of Kuhn and Popper on image captioning, it is found that captioning in the field of geology is urgently needed and has reached a crisis stage. A newly proposed method is therefore required to provide captions for geological images.

2020 ◽  
Vol 3 (1) ◽  
pp. 138-146
Author(s):  
Subash Pandey ◽  
Rabin Kumar Dhamala ◽  
Bikram Karki ◽  
Saroj Dahal ◽  
Rama Bastola

Automatically generating a natural language description of an image is a major challenge in the field of artificial intelligence. Generating a description of an image brings together two fields: Natural Language Processing and Computer Vision. There are two types of approaches, i.e., top-down and bottom-up. In this paper, we follow the top-down approach, which starts from the image and converts it into words. The image is passed to a Convolutional Neural Network (CNN) encoder, and its output is fed to a Recurrent Neural Network (RNN) decoder that generates meaningful captions. We generated image descriptions both for real-time images from a smartphone camera and for test images from the dataset. To evaluate model performance, we used the BLEU (Bilingual Evaluation Understudy) score, which matches predicted words against the original caption.
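A minimal sketch of this encoder-decoder pattern is shown below, assuming PyTorch, a ResNet-18 backbone as the CNN encoder, and an LSTM as the RNN decoder; the layer sizes and names are illustrative and not the exact architecture of the paper.

```python
# Minimal CNN-encoder / RNN-decoder captioning sketch (illustrative, not the authors' exact model).
import torch
import torch.nn as nn
import torchvision.models as models

class CNNEncoder(nn.Module):
    def __init__(self, embed_size):
        super().__init__()
        backbone = models.resnet18(weights=None)      # pretrained weights would normally be used
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier head
        self.fc = nn.Linear(backbone.fc.in_features, embed_size)

    def forward(self, images):                        # images: (B, 3, 224, 224)
        x = self.features(images).flatten(1)          # (B, 512) pooled image feature
        return self.fc(x)                             # (B, embed_size)

class RNNDecoder(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, img_feat, captions):            # captions: (B, T) token ids
        tokens = self.embed(captions)                 # (B, T, embed_size)
        inputs = torch.cat([img_feat.unsqueeze(1), tokens], dim=1)  # prepend the image feature
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                       # (B, T+1, vocab_size) word scores

encoder, decoder = CNNEncoder(256), RNNDecoder(256, 512, vocab_size=5000)
scores = decoder(encoder(torch.randn(2, 3, 224, 224)), torch.randint(0, 5000, (2, 12)))
```

At evaluation time, a sentence-level BLEU score over the generated tokens can be computed with, for example, nltk.translate.bleu_score.sentence_bleu.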


Author(s):  
Santosh Kumar Mishra ◽  
Rijul Dhir ◽  
Sriparna Saha ◽  
Pushpak Bhattacharyya

Image captioning is the process of generating a textual description of an image that aims to describe the salient parts of the given image. It is an important problem, as it involves computer vision and natural language processing, where computer vision is used for understanding images, and natural language processing is used for language modeling. A lot of work has been done on image captioning for the English language. In this article, we have developed a model for image captioning in the Hindi language. Hindi is the official language of India, and it is the fourth most spoken language in the world, spoken in India and South Asia. To the best of our knowledge, this is the first attempt to generate image captions in the Hindi language. A dataset is manually created by translating the well-known MSCOCO dataset from English to Hindi. Finally, different types of attention-based architectures are developed for image captioning in the Hindi language. These attention mechanisms are new for the Hindi language, as they have not previously been applied to it. The obtained results of the proposed model are compared with several baselines in terms of BLEU scores, and the results show that our model performs better than the others. Manual evaluation of the obtained captions in terms of adequacy and fluency also reveals the effectiveness of our proposed approach. Availability of resources: The code of the article is available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language ; the dataset will be made available at http://www.iitp.ac.in/∼ai-nlp-ml/resources.html .
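As one illustration of the attention idea referred to above, the sketch below shows a generic Bahdanau-style additive attention over spatial image features; it is not the authors' exact architecture, and all dimensions and names are assumptions.

```python
# Additive (Bahdanau-style) attention over spatial image features — an illustrative sketch only.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, attn_dim)     # project image region features
        self.w_hidden = nn.Linear(hidden_dim, attn_dim) # project the decoder state
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (B, R, feat_dim) region features; hidden: (B, hidden_dim) decoder state
        scores = self.v(torch.tanh(self.w_feat(feats) + self.w_hidden(hidden).unsqueeze(1)))
        alpha = torch.softmax(scores, dim=1)            # (B, R, 1) attention weights
        context = (alpha * feats).sum(dim=1)            # (B, feat_dim) weighted image summary
        return context, alpha.squeeze(-1)

attn = AdditiveAttention(feat_dim=2048, hidden_dim=512, attn_dim=256)
context, alpha = attn(torch.randn(2, 49, 2048), torch.randn(2, 512))
```

The context vector is then typically concatenated with the previous word embedding before being fed to the caption decoder at each time step.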


2021 ◽  
Vol 336 ◽  
pp. 07004
Author(s):  
Ruoyu Fang ◽  
Cheng Cai

Obstacle detection and target tracking are two major issues for intelligent autonomous vehicles. This paper proposes a new scheme to achieve target tracking and real-time obstacle detection based on computer vision. A ResNet-18 deep neural network is used for obstacle detection, and a YOLOv3 deep neural network is employed for real-time target tracking. The two trained models can be deployed on an autonomous vehicle equipped with an NVIDIA Jetson Nano board. The autonomous vehicle avoids obstacles and follows tracked targets using its camera. Adjusting the steering and movement of the vehicle with a PID algorithm during motion therefore helps the proposed vehicle achieve stable and precise tracking.
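A minimal discrete PID update of the kind mentioned above is sketched below; the gains and the choice of error signal (the tracked target's horizontal offset in the camera frame) are illustrative assumptions, not the paper's tuned values.

```python
# Minimal discrete PID controller for steering correction (illustrative gains and error source).
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

steering = PID(kp=0.6, ki=0.05, kd=0.1)
# error: horizontal offset of the tracked target from the image centre, in pixels
correction = steering.update(error=32.0, dt=0.05)   # steering adjustment for this frame
```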


10.2196/23230 ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. e23230
Author(s):  
Pei-Fu Chen ◽  
Ssu-Ming Wang ◽  
Wei-Chih Liao ◽  
Lu-Cheng Kuo ◽  
Kuan-Chih Chen ◽  
...  

Background: The International Classification of Diseases (ICD) code is widely used as a reference in medical systems and for billing purposes. However, classifying diseases into ICD codes still mainly relies on humans reading a large amount of written material as the basis for coding. Coding is both laborious and time-consuming. Since the conversion from ICD-9 to ICD-10, the coding task has become much more complicated, and approaches based on deep learning and natural language processing have been studied to assist disease coders. Objective: This paper aims to construct a deep learning model for ICD-10 coding, where the model is meant to automatically determine the corresponding diagnosis and procedure codes based solely on free-text medical notes, in order to improve accuracy and reduce human effort. Methods: We used diagnosis records of the National Taiwan University Hospital as resources and applied natural language processing techniques, including global vectors, word to vectors, embeddings from language models, bidirectional encoder representations from transformers, and single head attention recurrent neural network, on the deep neural network architecture to implement ICD-10 auto-coding. In addition, we introduced an attention mechanism into the classification model to extract keywords from diagnoses and to visualize the coding reference for training newcomers to ICD-10. Sixty discharge notes were randomly selected to examine the change in F1-score and in coding time by coders before and after using our model. Results: In experiments on the medical data set of National Taiwan University Hospital, our prediction results revealed F1-scores of 0.715 and 0.618 for the ICD-10 Clinical Modification code and Procedure Coding System code, respectively, with a bidirectional encoder representations from transformers embedding approach in the gated recurrent unit classification model. The well-trained models were deployed on the ICD-10 web service for coding and for training ICD-10 users. With this service, the coders' F1-score increased significantly from a median of 0.832 to 0.922 (P<.05), but the time spent coding was not reduced. Conclusions: The proposed model significantly improved the F1-score but did not decrease the time consumed in coding by disease coders.
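A minimal sketch of an embedding-plus-GRU classifier of the kind described is given below (PyTorch). The embedding dimension, hidden size, and number of ICD-10 labels are placeholders, and a multi-label sigmoid output over contextual token embeddings (e.g. BERT outputs) is assumed.

```python
# Sketch of a GRU classifier over pre-computed contextual embeddings (e.g. BERT outputs).
# Dimensions and label count are placeholders, not the values used in the study.
import torch
import torch.nn as nn

class ICD10Coder(nn.Module):
    def __init__(self, embed_dim=768, hidden_dim=256, num_codes=1000):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)          # token-level attention weights
        self.classifier = nn.Linear(2 * hidden_dim, num_codes)

    def forward(self, token_embeddings):                  # (B, T, embed_dim) from an encoder
        states, _ = self.gru(token_embeddings)            # (B, T, 2*hidden_dim)
        alpha = torch.softmax(self.attn(states), dim=1)   # highlights keyword tokens
        pooled = (alpha * states).sum(dim=1)
        return torch.sigmoid(self.classifier(pooled))     # multi-label code probabilities

model = ICD10Coder()
probs = model(torch.randn(2, 128, 768))                   # two notes, 128 tokens each
```

The attention weights alpha can also be inspected to visualize which words in a diagnosis drove a given code, in the spirit of the coding reference described above.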


News is a routine part of everyone's life. It helps in enhancing our knowledge of what happens around the world. Fake news is fictional information made up with the intention to delude, and hence the knowledge acquired from it becomes useless. As fake news spreads extensively, it has a negative impact on society, and so fake news detection has become an emerging research area. This paper deals with a solution to fake news detection using deep learning and Natural Language Processing. A deep neural network is trained on the dataset. The dataset needs to be well formatted before being given to the network, which is made possible using Natural Language Processing techniques; the model then predicts whether a news item is fake or not.
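As a rough sketch of such a pipeline, the snippet below formats text with TF-IDF preprocessing and feeds it to a small dense network; the example texts, feature size, and layer widths are assumptions, not the paper's setup.

```python
# Illustrative fake-news pipeline: TF-IDF text preprocessing + a small dense classifier.
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["Breaking: celebrity endorses miracle cure", "Parliament passes the annual budget bill"]
labels = torch.tensor([1.0, 0.0])                       # 1 = fake, 0 = real (toy labels)

vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
features = torch.tensor(vectorizer.fit_transform(texts).toarray(), dtype=torch.float32)

model = nn.Sequential(
    nn.Linear(features.shape[1], 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),                    # probability that the news item is fake
)
loss = nn.BCELoss()(model(features).squeeze(1), labels)
loss.backward()                                         # one illustrative training step
```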


2020 ◽  
Vol 6 (2) ◽  
pp. 115-121
Author(s):  
Ari Purno Wahyu ◽  
Heri Heryono ◽  
Muhammad Benny Chaniago ◽  
Dani Hamdani

Health is one of the most important parts of our lives, and illness often arises from our eating patterns; for those of us with extremely busy schedules there is often no time for breakfast, and we prefer the fast food that is widely available in canteens or cafes. This does not mean fast food is unhealthy, but it becomes a problem when consumed excessively without paying attention to serving size or the nutritional content of the food. Several approaches are possible, such as maintaining eating habits through a diet or using the nutrition-calculation applications that are available on the market and free to download. This kind of application is still not very effective, since it only provides estimates and cannot be used in real time. Previous research has used computer vision techniques, with an image serving as the reader of the food we are about to eat. Such an application can read the nutritional content as well as the price of the food; the image-processing technique used is a deep learning neural network, an approach shown to achieve higher accuracy and better data reading than other algorithms. An image-based neural network application can be implemented on cash registers in canteens or cafes and can also be built as a mobile application so that it is easier to use. This computerized technique with a deep learning neural network has been shown to be applicable in canteens and cafes.
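One way such an image-based reader could be structured is sketched below: a CNN classifies the food image and the predicted class is looked up in a nutrition/price table. The class names, nutrition values, prices, and backbone choice are all illustrative assumptions.

```python
# Illustrative sketch: classify a food image with a CNN, then look up nutrition and price.
import torch
import torch.nn as nn
import torchvision.models as models

FOOD_CLASSES = ["fried_rice", "chicken_noodle", "vegetable_salad"]   # placeholder menu
NUTRITION = {  # placeholder kcal / price table, not real canteen data
    "fried_rice": {"kcal": 630, "price": 15000},
    "chicken_noodle": {"kcal": 450, "price": 12000},
    "vegetable_salad": {"kcal": 180, "price": 10000},
}

backbone = models.mobilenet_v2(weights=None)             # pretrained weights would normally be used
backbone.classifier[1] = nn.Linear(backbone.last_channel, len(FOOD_CLASSES))

def read_tray(image):                                    # image: (1, 3, 224, 224) tensor
    idx = backbone(image).argmax(dim=1).item()
    label = FOOD_CLASSES[idx]
    return label, NUTRITION[label]

label, info = read_tray(torch.randn(1, 3, 224, 224))
print(label, info["kcal"], "kcal, Rp", info["price"])
```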


2021 ◽  
Vol 17 (1) ◽  
Author(s):  
Daniel G. E. Thiem ◽  
Paul Römer ◽  
Matthias Gielisch ◽  
Bilal Al-Nawas ◽  
Martin Schlüter ◽  
...  

Abstract Background: Hyperspectral imaging (HSI) is a promising non-contact approach to tissue diagnostics, generating large amounts of raw data for whose processing computer vision (i.e. deep learning) is particularly suitable. The aim of this proof-of-principle study was the classification of hyperspectral (HS) reflectance values into the human oral tissue types fat, muscle and mucosa using deep learning methods. Furthermore, the tissue-specific hyperspectral signatures collected will serve as a representative reference for the future assessment of oral pathological changes in the sense of an HS library. Methods: A total of about 316 samples of healthy human oral fat, muscle and oral mucosa were collected from 174 different patients and imaged using an HS camera covering the wavelength range from 500 nm to 1000 nm. The HS raw data were labelled and processed for tissue classification using a light-weight 6-layer deep neural network (DNN). Results: The reflectance values differed significantly (p < .001) for fat, muscle and oral mucosa at almost all wavelengths, with the signature of muscle differing the most. The deep neural network distinguished the tissue types with an accuracy of > 80% each. Conclusion: Oral fat, muscle and mucosa can be classified sufficiently well and automatically by their specific HS signatures using a deep learning approach. Early detection of premalignant mucosal lesions using hyperspectral imaging and deep learning is so far rarely represented in the medical and computer vision research domains, but it has high potential and is part of subsequent studies.
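A minimal sketch of a light-weight fully connected network of the kind described is shown below, treating each pixel's reflectance spectrum as the input vector. The number of spectral bands and the layer widths are illustrative; the authors' exact 6-layer design is not reproduced here.

```python
# Illustrative light-weight DNN classifying per-pixel reflectance spectra into fat / muscle / mucosa.
import torch
import torch.nn as nn

NUM_BANDS = 100        # e.g. reflectance sampled between 500 nm and 1000 nm (placeholder count)
NUM_CLASSES = 3        # fat, muscle, oral mucosa

model = nn.Sequential(           # six weight layers in total
    nn.Linear(NUM_BANDS, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, NUM_CLASSES),  # logits; softmax/cross-entropy applied during training
)

spectra = torch.randn(8, NUM_BANDS)          # a batch of eight pixel spectra
logits = model(spectra)                      # (8, 3) class scores
```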


Author(s):  
Tamanna Sharma ◽  
Anu Bajaj ◽  
Om Prakash Sangwan

Sentiment analysis is the computational measurement of attitudes, opinions, and emotions (like positive/negative) with the help of text mining and natural language processing of words and phrases. Incorporating machine learning techniques with natural language processing helps in analysing and predicting sentiments in a more precise manner. But sometimes, machine learning techniques are incapable of predicting sentiments due to the unavailability of labelled data. To overcome this problem, an advanced computational technique called deep learning comes into play. This chapter highlights the latest studies regarding the use of deep learning techniques like convolutional neural networks, recurrent neural networks, etc. in sentiment analysis.


Author(s):  
S Gopi Naik

Abstract: The plan is to establish an integrated system that can manage high-quality visual information and also detect weapons quickly and efficiently. This is obtained by integrating ARM-based computer vision and optimization algorithms with deep neural networks able to detect the presence of a threat. The whole system is connected to a Raspberry Pi module, which captures the live broadcast and evaluates it using a deep convolutional neural network. Because object identification is tightly coupled with real-time video and image analysis, approaches that generate sophisticated ensembles combining various low-level picture features with high-level information from object detectors and scenario classifiers can quickly plateau in performance. Deep learning models, which can learn semantic, high-level, deeper features, have been developed to overcome the issues present in such optimization algorithms. This paper presents a review of deep learning based object detection frameworks that use Convolutional Neural Network layers for a better understanding of object detection. The Mobile-Net SSD model behaves differently in network design, training methods, and optimization functions, among other things. The crime rate in suspicious areas has been reduced as a consequence of weapon detection. However, security is always a major concern in human life. The Raspberry Pi module, combined with computer vision, has been extensively used in the detection and monitoring of weapons. With the growing demands of human safety, privacy protection, and the integration of live broadcasting systems that can detect and analyse images, the monitoring of suspicious areas is becoming indispensable in intelligence work. This process uses a Mobile-Net SSD algorithm to achieve automatic weapon and object detection. Keywords: Computer Vision, Weapon and Object Detection, Raspberry Pi Camera, RTSP, SMTP, Mobile-Net SSD, CNN, Artificial Intelligence.
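A common way to run a MobileNet-SSD detector of this kind on a Raspberry Pi is via OpenCV's DNN module. The sketch below assumes a Caffe-format MobileNet-SSD model file on disk and a frame read from an image; it is not the exact weapon-detection model trained in the paper.

```python
# Illustrative MobileNet-SSD inference with OpenCV's DNN module (model files are assumed to exist).
import cv2

net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt", "MobileNetSSD_deploy.caffemodel")

def detect(frame, conf_threshold=0.5):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()                 # shape (1, 1, N, 7)
    boxes = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence >= conf_threshold:
            class_id = int(detections[0, 0, i, 1])
            x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * [w, h, w, h]).astype(int)
            boxes.append((class_id, confidence, (x1, y1, x2, y2)))
    return boxes

frame = cv2.imread("frame.jpg")                # e.g. a frame grabbed from the Pi camera / RTSP stream
if frame is not None:
    for class_id, conf, box in detect(frame):
        print(class_id, round(conf, 2), box)
```

In a deployed system the detected class would trigger the alerting path (e.g. SMTP notification) mentioned in the keywords.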


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Venkateswara Rao Kota ◽  
Shyamala Devi Munisamy

Purpose: A neural network (NN)-based deep learning (DL) approach is considered for sentiment analysis (SA) by incorporating a convolutional neural network (CNN), bi-directional long short-term memory (Bi-LSTM) and attention methods. Unlike conventional supervised machine learning natural language processing algorithms, the authors have used unsupervised deep learning algorithms. Design/methodology/approach: The method presented for sentiment analysis is designed using a CNN, Bi-LSTM and the attention mechanism. Word2vec word embedding is used for natural language processing (NLP). The discussed approach is designed for sentence-level SA and consists of one embedding layer, two convolutional layers with max-pooling, one LSTM layer and two fully connected (FC) layers. Overall, the system training time is 30 min. Findings: The method's performance is analyzed using metrics like precision, recall, F1 score, and accuracy. The CNN helps reduce the complexity, and the Bi-LSTM helps process long input text sequences. Originality/value: The attention mechanism is adopted to decide the significance of every hidden state and give a weighted sum of all the features fed as input.
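A compact sketch of the described layer stack (embedding, two convolutions with max-pooling, a Bi-LSTM, an attention-weighted sum, and fully connected output) is given below in PyTorch. The dimensions and vocabulary size are assumptions, and Word2vec vectors would normally initialise the embedding layer.

```python
# Illustrative CNN + Bi-LSTM + attention stack for sentence-level sentiment analysis.
import torch
import torch.nn as nn

class SentimentNet(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=300, channels=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)    # Word2vec vectors could be loaded here
        self.conv1 = nn.Conv1d(embed_dim, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=2)
        self.bilstm = nn.LSTM(channels, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.fc = nn.Sequential(nn.Linear(2 * hidden_dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, tokens):                               # tokens: (B, T) word indices
        x = self.embed(tokens).transpose(1, 2)               # (B, embed_dim, T) for Conv1d
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x))).transpose(1, 2)
        states, _ = self.bilstm(x)                           # (B, T/4, 2*hidden_dim)
        alpha = torch.softmax(self.attn(states), dim=1)      # significance of every hidden state
        sentence = (alpha * states).sum(dim=1)               # attention-weighted sum of features
        return self.fc(sentence)                             # positive / negative logits

logits = SentimentNet()(torch.randint(0, 20000, (4, 40)))
```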

